
I'm the founder of Cynapta Software, an ISV based in India. I'm also a Windows Azure MVP. Gaurav is a DZone MVB, is not an employee of DZone, and has posted 35 posts at DZone. You can read more from him at his website.

Uploading Large Files in Windows Azure Blob Storage

02.18.2013

In my last post I talked about the Shared Access Signature (SAS) feature of Windows Azure Storage. You can read that post here: http://gauravmantri.com/2013/02/13/revisiting-windows-azure-shared-access-signature/. In this post we’ll put that to some practical use :)

One thing I wanted to accomplish recently was the ability to upload very large files into Windows Azure Blob Storage from a web application. The general approach is to read the file through your web application using the “File” HTML control and upload the entire file to some server-side code, which then uploads it to blob storage. This approach works fine for smaller files but fails badly for moderately large to very large files. The file upload control sends the entire file to the server (for bigger files this can cause timeouts, depending on your Internet connection), and the file then sits in server memory before anything can be done with it (again, for bigger files this causes performance issues if you have thousands of users uploading thousands of large files).

In this blog post, we’ll talk about how we can accomplish that using the File API in HTML5.

Please be warned that I’m no JavaScript/HTML/CSS expert :) About four or five years ago I used to do a lot of JavaScript development and built some really insane applications with it (mail-merge-style features, office automation, native device interfacing, all in JS), but that was then. For the last four or five years I have mostly been doing desktop application development, so most of the JS code you’ll see below is copied from various sites and StackOverflow :)

Objectives

The following were some of the objectives I had in mind:

  1. The interface should be web based (possibly pure HTML).
  2. I don’t have to share my storage account credentials with the end users.
  3. If possible, there should be no server-side code (ASP.NET/PHP etc.), i.e. it should be pure HTML and the communication should be between the user’s web browser and the storage account.
  4. If possible, the solution should upload the file in chunks so that bigger files can be uploaded without reading them in completely in a single go.

Now let’s see how we can meet our objectives.

Solution

Here’s what I did to achieve the objectives listed above.

Chunking File

The first challenge I ran into was how to upload the file in chunks. All the examples I saw uploaded the entire file, which is not what I wanted. Finally I came across this HTML5 Rocks article about the File API: http://www.html5rocks.com/en/tutorials/file/dndfiles/. What caught my interest there is the “slicing” feature of HTML5’s File interface. In short, calling slice on a file returns a Blob that represents just a portion of the file; that portion can then be read asynchronously with a FileReader. This was exactly what I was looking for. Once I found that, I knew 75% of my work was done! Everything else was a breeze :)

Securing Storage Account Credentials

Next on my list was securing the storage account credentials, and this was rather painless because I knew exactly what I had to do: use a Shared Access Signature. A SAS gave me a signed URI through which users can upload files to my storage account without my giving them the storage account credentials. I covered this extensively in my post Revisiting Windows Azure Shared Access Signature.
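Before uploading, it can help to sanity-check the SAS URI a user pastes in. The sketch below uses a hypothetical helper, describeSas (not part of the original code), that reads the standard sp (permissions) and se (expiry) query parameters with the built-in URL class. It assumes permissions and expiry are carried in the URI itself; a SAS that references a stored access policy through si may not expose them.

```javascript
// Hypothetical helper (not part of the original uploader): inspect a SAS URI so a
// read-only or expired SAS fails fast with a clear message.
// Assumption: "sp" and "se" are present in the URI (not hidden in a stored policy).
function describeSas(sasUri, now = new Date()) {
  const params = new URL(sasUri).searchParams; // values come back percent-decoded
  const permissions = params.get("sp") || "";
  const expiry = params.get("se") ? new Date(params.get("se")) : null;
  return {
    canWrite: permissions.includes("w"),
    expired: expiry !== null && expiry.getTime() <= now.getTime(),
    hasSignature: params.has("sig"),
  };
}

// Example: a container SAS with read/write permission expiring at 10:00 UTC.
const info = describeSas(
  "https://myaccount.blob.core.windows.net/uploads?sv=2012-02-12&sr=c&sp=rw&se=2013-02-18T10%3A00%3A00Z&sig=abc",
  new Date("2013-02-18T09:00:00Z")
);
console.log(info); // { canWrite: true, expired: false, hasSignature: true }
```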

Direct Communication Between Client Application and Windows Azure Storage

Next was enabling direct communication between the client application and storage. Windows Azure Storage exposes a REST API, so I can simply use AJAX to talk to it. One important thing to understand is that, at the time of writing this post, Windows Azure Storage does not support Cross-Origin Resource Sharing (CORS). This means that your web application and blob storage must share the same origin (scheme, host and port). The workaround is to host your HTML application in a public blob container in the same storage account your users will upload files to. I’ve been told that CORS support is coming soon to Windows Azure Storage. Once it arrives, you won’t need to host the application in that storage account, but until then you will have to live with this limitation.
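The same-origin requirement can be checked up front, for example to warn a user who opens the uploader from somewhere other than the storage account. isSameOrigin is a hypothetical helper built on the standard URL class. An origin consists of the scheme, host and port, so http and https URLs for the same account do not match.

```javascript
// Hypothetical helper: without CORS, the page hosting the uploader and the SAS URI
// must share an origin (scheme + host + port) for the AJAX calls to be allowed.
function isSameOrigin(pageUrl, sasUri) {
  return new URL(pageUrl).origin === new URL(sasUri).origin;
}

const sas = "https://myaccount.blob.core.windows.net/uploads?sv=2012-02-12&sr=c&sp=w&sig=abc";
// Page served from a public container in the same storage account:
console.log(isSameOrigin("https://myaccount.blob.core.windows.net/web/uploader.html", sas)); // true
// Page served from some other site:
console.log(isSameOrigin("https://www.example.com/uploader.html", sas)); // false
```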

The Code

Now let’s look at the code.

HTML Interface

Since I was hacking my way through the code, I kept the interface rather simple. Here’s what my HTML looks like:

<body>
    <form>
        <div style="margin-left: 20px;">
            <h1>File Uploader</h1>
            <p>
                <strong>SAS URI</strong>:
                <br/>
                <span class="input-control text">
                    <input type="text" id="sasUrl" style="width: 50%"
                           value=""/>
                </span>
            </p>
            <p>
                <strong>File To Upload</strong>:
                <br/>
                <span class="input-control text">
                    <input type="file" id="file" name="file" style="width: 50%"/>
                </span>
            </p>
            <div id="output">
                 
                <strong>File Properties:</strong>
                <br/>
                <p>
                    Name: <span id="fileName"></span>
                </p>
                <p>
                    File Size: <span id="fileSize"></span> bytes.
                </p>
                <p>
                    File Type: <span id="fileType"></span>
                </p>
                <p>
                    <input type="button" value="Upload File" onclick="uploadFileInBlocks()"/>
                </p>
                <p>
                    <strong>Progress</strong>: <span id="fileUploadProgress">0.00 %</span>
                </p>
            </div>
        </div>
        <div>
        </div>
    </form>
</body>


All it has is a text box for the user to enter the SAS URI and an HTML file input. Once the user selects a file, I display the file properties and the “Upload File” button to start uploading the file.

Reading File Properties

To display the file properties, I handled the “change” event of the file input element. The event gives me a list of files. Since I was uploading just one file, I picked the first file from that list and read its name (the blob gets that name), size (used for the blob’s size and for working out the number of chunks) and type (used to set the blob’s content type property).

//Bind the change event.
$("#file").bind('change', handleFileSelect);

//Display the selected file's properties.
function handleFileSelect(e) {
    var files = e.target.files;
    selectedFile = files[0];
    $("#fileName").text(selectedFile.name);
    $("#fileSize").text(selectedFile.size);
    $("#fileType").text(selectedFile.type);
}
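The type read here ends up as the blob’s content type. Browsers report an empty string for File.type when they can’t determine the MIME type, so a fallback may be worth having. The sketch below shows the idea. blobContentType is a hypothetical helper, and application/octet-stream is my choice of fallback. The sketch runs outside the browser because it only reads the type property.

```javascript
// Hypothetical helper: File.type is "" when the browser can't determine the MIME
// type, so fall back to a generic binary type before using it as the blob's
// content type (the x-ms-blob-content-type header sent when committing).
function blobContentType(file) {
  return file.type || "application/octet-stream";
}

// Plain objects stand in for File instances; only the "type" property is read.
console.log(blobContentType({ name: "photo.jpg", type: "image/jpeg" })); // image/jpeg
console.log(blobContentType({ name: "data.xyz", type: "" }));            // application/octet-stream
```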

Chunking

In my application I assumed that the file would be split into chunks of 256 KB. Once I know the file’s size, I compute the total number of chunks, rounding up so that the last, possibly shorter, chunk is counted.

//Read the file and find out how many blocks we would need to split it.
function handleFileSelect(e) {
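    maxBlockSize = 256 * 1024;
    currentFilePointer = 0;
    totalBytesRemaining = 0;
    // Reset per-file upload state so a second file doesn't reuse old block IDs.
    blockIds = new Array();
    bytesUploaded = 0;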
    var files = e.target.files;
    selectedFile = files[0];
    $("#output").show();
    $("#fileName").text(selectedFile.name);
    $("#fileSize").text(selectedFile.size);
    $("#fileType").text(selectedFile.type);
    var fileSize = selectedFile.size;
    if (fileSize < maxBlockSize) {
        maxBlockSize = fileSize;
        console.log("max block size = " + maxBlockSize);
    }
    totalBytesRemaining = fileSize;
    if (fileSize % maxBlockSize == 0) {
        numberOfBlocks = fileSize / maxBlockSize;
    } else {
        numberOfBlocks = parseInt(fileSize / maxBlockSize, 10) + 1;
    }
    console.log("total blocks = " + numberOfBlocks);
}
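The chunk count in handleFileSelect boils down to rounding fileSize / maxBlockSize up. The standalone sketch below applies the same arithmetic to produce the byte range of each block, which is what uploadFileInBlocks walks through one slice at a time. blockRanges is a hypothetical name and does not appear in the original code.

```javascript
// Split a file of fileSize bytes into blocks of at most maxBlockSize bytes.
// The number of ranges equals Math.ceil(fileSize / maxBlockSize), the same count
// as the "% == 0 ? ... : parseInt(...) + 1" branch in handleFileSelect.
function blockRanges(fileSize, maxBlockSize) {
  const ranges = [];
  for (let start = 0; start < fileSize; start += maxBlockSize) {
    // The last block is shorter when fileSize is not a multiple of maxBlockSize.
    ranges.push({ start, end: Math.min(start + maxBlockSize, fileSize) });
  }
  return ranges;
}

const KB = 1024;
const ranges = blockRanges(600 * KB, 256 * KB);
console.log(ranges.length);                    // 3
console.log(ranges.map(r => r.end - r.start)); // [ 262144, 262144, 90112 ]
```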

Endpoint for File Uploading

The SAS URI actually represents the blob container. To get an endpoint for uploading the file, I split the URI into its path and query, appended the file name to the path, and then re-appended the query to the end.

var baseUrl = $("#sasUrl").val();
var indexOfQueryStart = baseUrl.indexOf("?");
submitUri = baseUrl.substring(0, indexOfQueryStart) + '/' + encodeURIComponent(selectedFile.name) + baseUrl.substring(indexOfQueryStart);
console.log(submitUri);
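The string manipulation above can be wrapped in a small function. The sketch below assumes the SAS URI points at a container and carries its signature in the query string. blobUriFromContainerSas is a hypothetical name. It URL-encodes the blob name so that names containing spaces or “#” still produce a valid URI. It also fails loudly when the URI has no “?”, since indexOf would otherwise return -1 and silently produce a broken URI.

```javascript
// Hypothetical helper: insert the blob name between the container path and the
// SAS query string, URL-encoding the name and rejecting URIs without a query.
function blobUriFromContainerSas(containerSasUri, blobName) {
  const queryStart = containerSasUri.indexOf("?");
  if (queryStart < 0) {
    throw new Error("Expected a SAS URI with a query string");
  }
  // Drop any trailing slash so we don't produce "container//name".
  const containerPath = containerSasUri.substring(0, queryStart).replace(/\/+$/, "");
  const query = containerSasUri.substring(queryStart); // keeps the leading "?"
  return containerPath + "/" + encodeURIComponent(blobName) + query;
}

const containerSas = "https://myaccount.blob.core.windows.net/uploads?sv=2012-02-12&sr=c&sp=w&sig=abc";
console.log(blobUriFromContainerSas(containerSas, "my file #1.txt"));
// https://myaccount.blob.core.windows.net/uploads/my%20file%20%231.txt?sv=2012-02-12&sr=c&sp=w&sig=abc
```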

Reading Chunk

This is where the File API’s slice method comes into the picture. When the user clicks the upload button, I take a slice of the file, read it asynchronously with a FileReader and get back an ArrayBuffer. That data is what gets uploaded.

var reader = new FileReader();
var fileContent = selectedFile.slice(currentFilePointer, currentFilePointer + maxBlockSize);
reader.readAsArrayBuffer(fileContent);
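Outside the browser, the same slice-and-read pattern can be tried with Blob.arrayBuffer(). This is a promise-based way to read a Blob that modern browsers and Node.js 18+ both provide. It replaces FileReader here so the sketch runs without a page. readInChunks is a hypothetical name.

```javascript
// Slice-then-read using Blob.arrayBuffer() in place of FileReader.readAsArrayBuffer.
// Assumes Node.js 18+ or a modern browser (Blob is global); a File is a Blob.
async function* readInChunks(blob, chunkSize) {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() only describes a byte range (clamped to the blob's size);
    // the bytes are actually read when arrayBuffer() is awaited.
    const chunk = blob.slice(offset, offset + chunkSize);
    yield new Uint8Array(await chunk.arrayBuffer());
  }
}

async function demo() {
  const blob = new Blob(["0123456789"]); // 10 bytes of ASCII text
  const sizes = [];
  for await (const bytes of readInChunks(blob, 4)) {
    sizes.push(bytes.length); // each chunk would become one Put Block request
  }
  console.log(sizes); // [ 4, 4, 2 ]
}

demo();
```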

Uploading Chunk

Since I wanted to upload in chunks, as soon as a chunk has been read from the file I create a Put Block request (following the Put Block REST API specification) with jQuery’s AJAX function and pass the chunk as the request data. Once this request completes successfully, I read the next chunk and repeat the process until all chunks have been processed.

reader.onloadend = function (evt) {
    if (evt.target.readyState == FileReader.DONE) { // DONE == 2
        var uri = submitUri + '&comp=block&blockid=' + encodeURIComponent(blockIds[blockIds.length - 1]);
        var requestData = new Uint8Array(evt.target.result);
        $.ajax({
            url: uri,
            type: "PUT",
            data: requestData,
            processData: false,
            beforeSend: function(xhr) {
                xhr.setRequestHeader('x-ms-blob-type', 'BlockBlob');
                // Content-Length is computed by the browser; scripts are not allowed to set it.
            },
            success: function (data, status) {
                console.log(data);
                console.log(status);
                bytesUploaded += requestData.length;
                var percentComplete = ((parseFloat(bytesUploaded) / parseFloat(selectedFile.size)) * 100).toFixed(2);
                $("#fileUploadProgress").text(percentComplete + " %");
                uploadFileInBlocks();
            },
            error: function(xhr, desc, err) {
                console.log(desc);
                console.log(err);
            }
        });
    }
};
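The standalone sketch below shows how block IDs and Put Block URLs are formed. makeBlockId and putBlockUri are hypothetical names. The Put Block API requires block IDs to be Base64 strings, and all IDs within one blob must have the same length, which the zero-padding guarantees. Buffer replaces the browser’s btoa so the sketch runs in Node.js. The ID is URL-encoded because Base64 can contain “+”, “/” and “=”.

```javascript
// Build a Base64 block ID from a zero-padded counter, like
// btoa(blockIdPrefix + pad(index, 6)) in the uploader.
function makeBlockId(index, prefix = "block-", width = 6) {
  const raw = prefix + String(index).padStart(width, "0");
  return Buffer.from(raw, "utf8").toString("base64");
}

// Assumes blobSasUri already has a query string (the SAS), so "&" is correct.
function putBlockUri(blobSasUri, blockId) {
  return blobSasUri + "&comp=block&blockid=" + encodeURIComponent(blockId);
}

const blobUri = "https://myaccount.blob.core.windows.net/uploads/video.mp4?sv=2012-02-12&sr=c&sp=w&sig=abc";
console.log(makeBlockId(0)); // YmxvY2stMDAwMDAw
console.log(putBlockUri(blobUri, makeBlockId(0)));
```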

Committing Blob

The last step in this process is to commit the blob in blob storage. For this I create a Put Block List request (following the Put Block List REST API specification), send it again with jQuery’s AJAX function and pass the block list as data. This completes the process.

function commitBlockList() {
    var uri = submitUri + '&comp=blocklist';
    console.log(uri);
    var requestBody = '<?xml version="1.0" encoding="utf-8"?><BlockList>';
    for (var i = 0; i < blockIds.length; i++) {
        requestBody += '<Latest>' + blockIds[i] + '</Latest>';
    }
    requestBody += '</BlockList>';
    console.log(requestBody);
    $.ajax({
        url: uri,
        type: "PUT",
        data: requestBody,
        beforeSend: function (xhr) {
            xhr.setRequestHeader('x-ms-blob-content-type', selectedFile.type);
            // Content-Length is computed by the browser; scripts are not allowed to set it.
        },
        success: function (data, status) {
            console.log(data);
            console.log(status);
        },
        error: function (xhr, desc, err) {
            console.log(desc);
            console.log(err);
        }
    });
}
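The request body is simple enough to build and check in isolation. The sketch below produces the same XML as commitBlockList. blockListXml is a hypothetical name. Each <Latest> element tells the service to use the most recently uploaded version of that block, and the element order determines the data order in the committed blob.

```javascript
// Build the Put Block List body: one <Latest> element per Base64 block ID,
// in the order the blocks should appear in the final blob.
function blockListXml(blockIds) {
  const items = blockIds.map(id => "<Latest>" + id + "</Latest>").join("");
  return '<?xml version="1.0" encoding="utf-8"?><BlockList>' + items + "</BlockList>";
}

// Block IDs encoded the same way as btoa("block-" + pad(i, 6)) in the uploader.
const ids = [0, 1].map(i => Buffer.from("block-" + String(i).padStart(6, "0")).toString("base64"));
console.log(blockListXml(ids));
```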

Complete Code

Here’s the complete code. For CSS, I used Metro UI CSS (http://metroui.org.ua/). If you’re planning to build web applications and want to give them the look and feel of Windows 8 applications, do give it a try. It’s pretty awesome, and it’s open source. Really, there is no reason not to give it a try!

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>File Uploader</title>
    <script src="js/jquery-1.7.1.js"></script>
    <link rel="stylesheet" href="css/modern.css"/>
    <script>
        var maxBlockSize = 256 * 1024;//Each file will be split in 256 KB.
        var numberOfBlocks = 1;
        var selectedFile = null;
        var currentFilePointer = 0;
        var totalBytesRemaining = 0;
        var blockIds = new Array();
        var blockIdPrefix = "block-";
        var submitUri = null;
        var bytesUploaded = 0;
         
        $(document).ready(function () {
            $("#output").hide();
            $("#file").bind('change', handleFileSelect);
            if (window.File && window.FileReader && window.FileList && window.Blob) {
                // Great success! All the File APIs are supported.
            } else {
                alert('The File APIs are not fully supported in this browser.');
            }
        });
         
        //Read the file and find out how many blocks we would need to split it.
        function handleFileSelect(e) {
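            maxBlockSize = 256 * 1024;
            currentFilePointer = 0;
            totalBytesRemaining = 0;
            // Reset per-file upload state so a second file doesn't reuse old block IDs.
            blockIds = new Array();
            bytesUploaded = 0;
            $("#fileUploadProgress").text("0.00 %");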
            var files = e.target.files;
            selectedFile = files[0];
            $("#output").show();
            $("#fileName").text(selectedFile.name);
            $("#fileSize").text(selectedFile.size);
            $("#fileType").text(selectedFile.type);
            var fileSize = selectedFile.size;
            if (fileSize < maxBlockSize) {
                maxBlockSize = fileSize;
                console.log("max block size = " + maxBlockSize);
            }
            totalBytesRemaining = fileSize;
            if (fileSize % maxBlockSize == 0) {
                numberOfBlocks = fileSize / maxBlockSize;
            } else {
                numberOfBlocks = parseInt(fileSize / maxBlockSize, 10) + 1;
            }
            console.log("total blocks = " + numberOfBlocks);
            var baseUrl = $("#sasUrl").val();
            var indexOfQueryStart = baseUrl.indexOf("?");
            submitUri = baseUrl.substring(0, indexOfQueryStart) + '/' + encodeURIComponent(selectedFile.name) + baseUrl.substring(indexOfQueryStart);
            console.log(submitUri);
        }
 
        var reader = new FileReader();
 
        reader.onloadend = function (evt) {
            if (evt.target.readyState == FileReader.DONE) { // DONE == 2
                var uri = submitUri + '&comp=block&blockid=' + encodeURIComponent(blockIds[blockIds.length - 1]);
                var requestData = new Uint8Array(evt.target.result);
                $.ajax({
                    url: uri,
                    type: "PUT",
                    data: requestData,
                    processData: false,
                    beforeSend: function(xhr) {
                        xhr.setRequestHeader('x-ms-blob-type', 'BlockBlob');
                        // Content-Length is computed by the browser; scripts are not allowed to set it.
                    },
                    success: function (data, status) {
                        console.log(data);
                        console.log(status);
                        bytesUploaded += requestData.length;
                        var percentComplete = ((parseFloat(bytesUploaded) / parseFloat(selectedFile.size)) * 100).toFixed(2);
                        $("#fileUploadProgress").text(percentComplete + " %");
                        uploadFileInBlocks();
                    },
                    error: function(xhr, desc, err) {
                        console.log(desc);
                        console.log(err);
                    }
                });
            }
        };
 
        function uploadFileInBlocks() {
            if (totalBytesRemaining > 0) {
                console.log("current file pointer = " + currentFilePointer + " bytes read = " + maxBlockSize);
                var fileContent = selectedFile.slice(currentFilePointer, currentFilePointer + maxBlockSize);
                var blockId = blockIdPrefix + pad(blockIds.length, 6);
                console.log("block id = " + blockId);
                blockIds.push(btoa(blockId));
                reader.readAsArrayBuffer(fileContent);
                currentFilePointer += maxBlockSize;
                totalBytesRemaining -= maxBlockSize;
                if (totalBytesRemaining < maxBlockSize) {
                    maxBlockSize = totalBytesRemaining;
                }
            } else {
                commitBlockList();
            }
        }
         
        function commitBlockList() {
            var uri = submitUri + '&comp=blocklist';
            console.log(uri);
            var requestBody = '<?xml version="1.0" encoding="utf-8"?><BlockList>';
            for (var i = 0; i < blockIds.length; i++) {
                requestBody += '<Latest>' + blockIds[i] + '</Latest>';
            }
            requestBody += '</BlockList>';
            console.log(requestBody);
            $.ajax({
                url: uri,
                type: "PUT",
                data: requestBody,
                beforeSend: function (xhr) {
                    xhr.setRequestHeader('x-ms-blob-content-type', selectedFile.type);
                    // Content-Length is computed by the browser; scripts are not allowed to set it.
                },
                success: function (data, status) {
                    console.log(data);
                    console.log(status);
                },
                error: function (xhr, desc, err) {
                    console.log(desc);
                    console.log(err);
                }
            });
 
        }
        function pad(number, length) {
            var str = '' + number;
            while (str.length < length) {
                str = '0' + str;
            }
            return str;
        }
    </script>
</head>
<body>
    <form>
        <div style="margin-left: 20px;">
            <h1>File Uploader</h1>
            <p>
                <strong>SAS URI</strong>:
                <br/>
                <span class="input-control text">
                    <input type="text" id="sasUrl" style="width: 50%"
                           value=""/>
                </span>
            </p>
            <p>
                <strong>File To Upload</strong>:
                <br/>
                <span class="input-control text">
                    <input type="file" id="file" name="file" style="width: 50%"/>
                </span>
            </p>
            <div id="output">
                 
                <strong>File Properties:</strong>
                <br/>
                <p>
                    Name: <span id="fileName"></span>
                </p>
                <p>
                    File Size: <span id="fileSize"></span> bytes.
                </p>
                <p>
                    File Type: <span id="fileType"></span>
                </p>
                <p>
                    <input type="button" value="Upload File" onclick="uploadFileInBlocks()"/>
                </p>
                <p>
                    <strong>Progress</strong>: <span id="fileUploadProgress">0.00 %</span>
                </p>
            </div>
        </div>
        <div>
        </div>
    </form>
</body>
</html>

Some Caveats

This approach uses the HTML5 File API, and while all new browsers support it, the same can’t be said of older browsers. If your users will access an application like this with older browsers, you will need to consider alternative approaches. You could either use SWF File Uploader or write an application using Silverlight. Steve Marx wrote a blog post about uploading files using Shared Access Signature and Silverlight, which you can read here: http://blog.smarx.com/posts/uploading-windows-azure-blobs-from-silverlight-part-1-shared-access-signatures.

I found the code working in IE 10 and Google Chrome (version 24.0.1312.57 m) on my Windows 8 machine. I got errors when I tried to run it in Firefox (version 18.0.2) and Safari (version 5.1.7), so browser compatibility is definitely something to keep in mind.

Enhancements

I hacked this code together in about 4 hours, and my knowledge of JavaScript and CSS is admittedly limited, so a lot can be improved on that front :) However, here are some other features I could think of:

Generate SAS on demand: You could have a server-side component that generates the SAS URI on demand instead of having the user enter it manually.

Multiple file uploads: This application can certainly be extended to upload multiple files. A user would select multiple files (or maybe even a folder) and have the application upload all of them.

Drag/drop support: This application can certainly be extended to support drag/drop scenario where users could drag files from their desktop and drop them to upload.

Do upload in Web Worker: Another improvement would be to perform the uploads in a background thread using HTML5 Web Workers.

Parallel uploads: Currently the code uploads one chunk at a time. A modification could be to upload multiple chunks simultaneously.

Transient error handling: Since Windows Azure Storage is a remote shared resource, you may encounter transient errors. You could modify the application to handle these transient errors. For more details on transient errors, please see this blog post of mine: http://gauravmantri.com/2013/01/11/some-best-practices-for-building-windows-azure-cloud-applications/.
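The last two ideas, parallel uploads and transient error handling, can be sketched together. Neither helper is part of the original uploader, and both names are hypothetical. withRetry re-runs an async operation with exponential backoff. runWithConcurrency keeps at most a fixed number of tasks in flight. Uploading blocks out of order should be fine with Put Block, because the final order comes from the block list. Retrying a single block with the same ID should also be safe. As a simplifying assumption, every failure is treated as transient here.

```javascript
// Assumption: every failure is treated as transient; a real uploader would retry
// only timeouts, throttling and server errors.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Re-run an async operation with exponential backoff: baseDelayMs, 2x, 4x, ...
async function withRetry(operation, maxAttempts = 4, baseDelayMs = 100) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up and surface the last error
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Run async tasks with at most `limit` in flight; results keep the task order.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const index = next++; // claimed synchronously, so no two workers share an index
      results[index] = await tasks[index]();
    }
  }
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Demo: fake "Put Block" calls that fail on their first attempt only.
const attempts = {};
const fakePutBlock = blockId => async () => {
  attempts[blockId] = (attempts[blockId] || 0) + 1;
  if (attempts[blockId] === 1) throw new Error("simulated timeout for " + blockId);
  return blockId;
};
const uploads = ["b0", "b1", "b2", "b3"].map(id => () => withRetry(fakePutBlock(id), 3, 10));
runWithConcurrency(uploads, 2).then(results => console.log(results, attempts));
```

In the uploader, each task would wrap one Put Block request for a block ID computed up front. The block list would then be committed in the original order once every task has resolved.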

Summary

So that’s it for this post! As you saw, it is quite easy to implement a very simple HTML/JS-based application for getting data into Windows Azure Blob Storage. Obviously there are some limitations and cross-browser compatibility issues to consider, but once those are sorted out, it opens up a lot of exciting opportunities. I hope you’ve found this post useful. As always, if you find any issues with the post, please let me know and I’ll fix them ASAP.

Happy Coding!





Published at DZone with permission of Gaurav Mantri, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)