
Azure Storage Blobs (Day 18 of 31)

Short Description 
Azure Storage is one of the foundation components of Microsoft Azure. It offers scalable and redundant cloud storage for multiple use cases. It can store petabytes of data and, at the same time, is flexible enough to handle heavy traffic.
At the moment Azure Storage has 4 components:

  • Blobs
  • Tables
  • Queues
  • Files

… it is pretty hard for me to talk about all of them in a single post. Because of this I decided to split the discussion into 4 separate posts, one for each of these services.
This post is dedicated to Azure Storage Blobs.

Short Description (second try)
Azure Storage Blobs allow us to store any kind of data in Azure. From documents, to videos, to database backups, anything that is binary content can be stored there. For normal use cases Blob storage is like Santa’s sack – bottomless; you can put content there without worrying that it will fill up.

Main Features 
The first thing that we should talk about is the size of Azure Blob storage, which can reach up to 500TB of data per account. On top of this you can create multiple storage accounts, so you can end up with more than 500TB of data.
Also keep in mind that this limit, like other Azure limits, can change over time (so far always upward). This means that in the future you might be able to store more than 500TB of data per account.
We can reach up to 10 gigabits per second for inbound traffic and 20 gigabits per second for outbound traffic. On top of this, if you use premium storage you can reach up to 50 gigabits per second for inbound and outbound traffic combined (inbound + outbound = 50Gb).
I deliberately wrote gigabits to avoid confusion between GB and Gb.
Azure Blob storage gives us multiple options when we talk about redundancy. By default we have redundancy at data center level, which means that there are always 3 copies of the same content (and you pay for only one). On top of this there are 3 other redundancy options, which I describe below together with the default one:

  • LRS (Locally Redundant Storage) – Content is replicated 3 times in the same data center (a facility unit within a region)
  • ZRS (Zone Redundant Storage) – Content is replicated 3 times across 2 data centers (facilities) in the same region, where 2 data centers are available. Otherwise the content is replicated 3 times across 2 different regions
  • GRS (Geo Redundant Storage) – Content is replicated 6 times across 2 regions (3 times in the primary region and 3 times in a second region)
  • RA-GRS (Read Access Geo Redundant Storage) – Content is replicated in the same way as for GRS, but you also have read-only access to the second region. With GRS, even though the data exists in the second region, you cannot access it directly.

Pay only what you use
At the end of the month you pay only for the space that you used. Clients don’t pay in advance for space that they might use, and they don’t keep paying for space that is no longer used.
Native Libraries
There is full support for different programming languages and technologies. There are native libraries for .NET, Java, Node.js, C++ and PowerShell.
On top of the native libraries, there is full support for a REST API. This means that we can connect to and manage storage from any device.
SSD (Premium Storage)
The default storage used for blobs (and Azure Storage in general) runs on HDDs and is called Standard Storage. On top of this, we can use Premium Storage, which runs on SSDs. This means that your I/O performance increases drastically, with very low latency.
Types of blobs
There are two types of blobs:

  • Block Blobs – used to manage large files. They are formed from multiple blocks and work well when you need to access binary content as a stream
  • Page Blobs – used to manage files where random access is required. Based on an offset you can access a specific location in a page blob
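To make the random access of page blobs more concrete, here is a minimal sketch using the classic .NET storage client library (Microsoft.WindowsAzure.Storage). The blob name "disk.vhd", the blob size and the helper class are assumptions made only for illustration; the page passed in is assumed to be exactly 512 bytes long.

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage.Blob;

static class PageBlobSample
{
    // Page blobs are read and written in 512-byte aligned pages.
    const int PageSize = 512;

    // Writes one page at the given page index, then reads the same range back.
    // 'page' is assumed to contain exactly PageSize bytes.
    public static byte[] WriteAndReadPage(CloudBlobContainer container, int pageIndex, byte[] page)
    {
        // "disk.vhd" is a hypothetical blob name.
        CloudPageBlob pageBlob = container.GetPageBlobReference("disk.vhd");

        // Reserve room for 1024 pages; the size must be a multiple of 512 bytes.
        // Note: Create replaces an existing page blob with the same name.
        pageBlob.Create(1024 * PageSize);

        long offset = (long)pageIndex * PageSize;
        using (var data = new MemoryStream(page))
        {
            // Write directly at the requested offset, no need to touch the rest of the blob.
            pageBlob.WritePages(data, offset);
        }

        var buffer = new byte[PageSize];
        // Read only the range we are interested in.
        pageBlob.DownloadRangeToByteArray(buffer, 0, offset, PageSize);
        return buffer;
    }
}
```

A block blob, by contrast, would typically be uploaded and downloaded as a whole stream.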

ETag
The ETag is a unique tag used for versioning. It is a good way to find out whether a blob was modified since the last time an application accessed its content.
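As an illustration of how this versioning tag (the blob's ETag) can be used, the sketch below downloads a blob only when its content changed since a previously seen tag value. It assumes the classic .NET storage client library (Microsoft.WindowsAzure.Storage); the helper class and method names are hypothetical.

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

static class ETagSample
{
    // Returns true when the blob changed and fresh content was written to 'target',
    // false when the blob still has the ETag we already know.
    public static bool DownloadIfChanged(CloudBlockBlob blob, string knownETag, Stream target)
    {
        try
        {
            // The service only returns content if the current ETag differs from knownETag.
            blob.DownloadToStream(target, AccessCondition.GenerateIfNoneMatchCondition(knownETag));
            return true;
        }
        catch (StorageException ex)
        {
            // 304 Not Modified: the blob was not changed since we last saw it.
            if (ex.RequestInformation.HttpStatusCode == 304)
            {
                return false;
            }
            throw;
        }
    }
}
```

The known tag would normally come from an earlier call, for example by calling blob.FetchAttributes() and reading blob.Properties.ETag.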
Snapshot support
You can create snapshots of blobs, which is a good way to create backups or checkpoints. A snapshot is identified by the name of the blob and the date and time when it was taken.
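A snapshot can be taken with a single call. The sketch below assumes the classic .NET storage client library (Microsoft.WindowsAzure.Storage); the helper class and method names are hypothetical.

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Blob;

static class SnapshotSample
{
    // Takes a read-only checkpoint of the blob and returns the time that identifies it.
    public static DateTimeOffset? TakeCheckpoint(CloudBlockBlob blob)
    {
        // The returned object refers to the snapshot, not to the live blob.
        CloudBlockBlob snapshot = blob.CreateSnapshot();

        // The snapshot keeps the blob name; SnapshotTime tells the snapshots apart.
        return snapshot.SnapshotTime;
    }
}
```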
Containers
Containers are used to group one or more blobs under the same ‘folder’. For each storage account, users can define containers and group content under them. This can be a good way to manage access rights.
Root Container
The root container needs to be created manually ($root) and plays the role of the default container of the storage account. Content stored in it can be accessed without including the container name in the path of the URL.
Timeouts
We can configure timeouts for different operations (access, upload, download). Based on these timeouts, an exception is thrown if a client doesn’t complete a specific action within the maximum allowed time.
For example, the maximum timeout to write a block list is 1 minute.
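On the client side, timeouts can be passed per request. The following is a minimal sketch assuming the classic .NET storage client library (Microsoft.WindowsAzure.Storage); the concrete timeout values and the helper names are arbitrary choices for illustration, not recommendations.

```csharp
using System;
using System.IO;
using Microsoft.WindowsAzure.Storage.Blob;

static class TimeoutSample
{
    // Uploads content, failing with an exception if the configured limits are exceeded.
    public static void UploadWithTimeouts(CloudBlockBlob blob, Stream content)
    {
        var options = new BlobRequestOptions
        {
            // Time the service is allowed to spend on a single request.
            ServerTimeout = TimeSpan.FromSeconds(30),
            // Upper bound for the whole operation, including retries.
            MaximumExecutionTime = TimeSpan.FromMinutes(2)
        };

        // No access condition is used here, so null is passed for it.
        blob.UploadFromStream(content, null, options);
    }
}
```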
Metadata
Metadata is basically a set of HTTP headers that allows us to attach additional information to content (there is also support for properties). We can add it as (key, value) pairs at container or blob level.
We can use metadata to store any kind of information related to blobs that we need.
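Setting and reading metadata on a blob could look like the sketch below. It assumes the classic .NET storage client library (Microsoft.WindowsAzure.Storage) and a blob that already exists; the keys and values are invented for illustration.

```csharp
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage.Blob;

static class MetadataSample
{
    // Attaches two (key, value) pairs to an existing blob and reads them back.
    public static IDictionary<string, string> TagAndRead(CloudBlockBlob blob)
    {
        // Hypothetical keys and values.
        blob.Metadata["category"] = "cars";
        blob.Metadata["owner"] = "demo-user";

        // Nothing is sent to the service until SetMetadata is called.
        blob.SetMetadata();

        // Refresh the local copy of properties and metadata from the service.
        blob.FetchAttributes();
        return blob.Metadata;
    }
}
```

Containers expose the same Metadata collection and SetMetadata call.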
Access Management
There are 3 different methods to manage access to your blob content:

  • Container Permissions – you can manage access at container level. In this way you can allow anonymous users (the rest of the internet) to read the content of the container and its blobs
  • Shared Access Signature – you have full control over read, write and manage access at container and blob level, for each item. In this way different users can have different access levels
  • Stored Access Policy – similar to a Shared Access Signature, but with a better management mechanism. For a specific key (SAS key) we can manage the access rights from the backend without having to revoke the key.
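The three methods above can be sketched as follows, assuming the classic .NET storage client library (Microsoft.WindowsAzure.Storage). The policy name "read-only", the expiry times and the helper names are assumptions made for illustration.

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Blob;

static class AccessSample
{
    // Container permissions: allow anonymous read access to the blobs of a container.
    // Note: a new permissions object replaces any stored access policies on the container.
    public static void MakeBlobsPublic(CloudBlobContainer container)
    {
        container.SetPermissions(new BlobContainerPermissions
        {
            PublicAccess = BlobContainerPublicAccessType.Blob
        });
    }

    // Shared Access Signature: a read-only link to one blob, valid for one hour.
    public static string CreateReadOnlyLink(CloudBlockBlob blob)
    {
        var policy = new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(1)
        };

        // The signature is returned as a query string that is appended to the blob URL.
        return blob.Uri + blob.GetSharedAccessSignature(policy);
    }

    // Stored Access Policy: the rights live on the container under a name,
    // so they can be changed later without issuing new links.
    public static string CreatePolicyLink(CloudBlobContainer container, CloudBlockBlob blob)
    {
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.SharedAccessPolicies["read-only"] = new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddDays(7)
        };
        container.SetPermissions(permissions);

        // The signature only references the stored policy by name.
        return blob.Uri + blob.GetSharedAccessSignature(null, "read-only");
    }
}
```

With the stored policy, changing or removing the "read-only" entry on the container affects every link that references it.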

Tracing capabilities
Azure Storage has tracing abilities over blobs and containers. Information like access time, client IP and how the request ended can be stored automatically and accessed later. In this way we can have a full audit of the storage content.
Analytics Metrics
Important metrics are stored automatically and can be used to find out the load on our storage. They can help us decide how to mitigate different problems and when to scale.
Below you can find the metrics that are generated:

  • Transactions
  • Storage that is used
  • Number of containers
  • Number of committed/uncommitted block or page blobs

There are some special tables that are used for this purpose. The names of these tables start with “$Metrics”.
Import/Export support
There is full API support for executing import/export operations on blob storage.

Limitations
Premium Storage Max Size
The maximum capacity of a premium storage account is limited to 32TB at the moment. Don’t forget that you can have multiple storage accounts.
Block Blob Max Size
The maximum size of a block blob at the moment is 200GB.
Page Blob Max Size
The maximum size of a page blob at the moment is 1TB.
Size of Metadata
For each container or blob, the size of the metadata cannot exceed 8KB (keys + values).

There are also some limits on the number of containers, blobs and so on, but they are very high and in normal use cases you will not hit them. You only need to take them into account if you know that you will put a very heavy load on the storage infrastructure.

Applicable Use Cases 
Below you can find 3 use cases where I would use blob storage.
Sharing pictures between different users
Sharing pictures can be pretty hard when we take into account that we need to handle a lot of content and also manage the access rights on each picture. Each user must be able to manage the access rights for their own pictures. Using blob storage, our work becomes simple. We don’t need to build our own scaling, redundancy or access rights management – simple as that.
Backup on-premises content (or personal content like pictures)
I would use Azure Blob storage to back up encrypted content from on-premises servers. In this way we don’t need to buy and manage our own backup infrastructure, which can become expensive and hard to maintain.
Store applications content
When you design an application that needs to handle and store binary content, you can use blobs to store it. In this way you have a binary repository that stores data very cheaply and is available 24 hours a day.

Code Sample 

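// Namespaces of the storage client library used below
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Get storage account reference.
// The original snippet leaves the value out; parsing a connection string is assumed here,
// and the string below is only a placeholder.
CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse("<your-storage-connection-string>");

// Get the client class that allows us to access blobs and containers
CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();

// Get a reference to a specific container
CloudBlobContainer cloudBlobContainer = cloudBlobClient.GetContainerReference("cars");

// Get the content of the container as a list (no prefix, virtual directories not flattened)
var blobItemList = cloudBlobContainer.ListBlobs(null, false);

// Get a reference to a container and create it if it doesn't exist
CloudBlobContainer phonesContainer = cloudBlobClient.GetContainerReference("phones");
phonesContainer.CreateIfNotExists();

// Get a reference to a blob
CloudBlockBlob bmwBlob = cloudBlobContainer.GetBlockBlobReference("bmw.mpg");

// Download blob content to a file.
// myLocalPath is a string variable that you define with your own local file path.
using (var localFileStream = System.IO.File.OpenWrite(myLocalPath))
{
    bmwBlob.DownloadToStream(localFileStream);
}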

Pros and Cons 
Pros
(There are so many; for me the most important ones are:)

  • Redundancy support
  • Shared Access Signature
  • Size of the storage
  • Availability (up to 99.99%)

Cons
Only one, and it can be handled from client applications:

  • There is no mechanism to notify users when replication to the second facility is done (for us this is an important thing)

Pricing
When you start to calculate the cost of Azure Blob Storage you should take into account the following things:

  • Capacity (size)
  • Number of Transactions
  • Outbound traffic
  • Traffic between facilities (data centers)

In conclusion I would say that Azure Blob Storage is one of the greatest services offered by Azure. Even though it is a base service, it has a lot of features that can improve the quality of our products. I would even say that it is one of the Cloud wonders :-).

