Azure Storage Blobs (Day 18 of 31)

List of all posts from this series: http://vunvulearadu.blogspot.ro/2014/11/azure-blog-post-marathon-is-ready-to.html

Short Description 
Azure Storage is one of the foundation components of Microsoft Azure. It offers scalable and redundant cloud storage for multiple use cases. It can store petabytes of data without any problem, and at the same time it is flexible and can handle heavy traffic very easily.
At this moment Azure Storage has 4 components that define it:

  • Blobs
  • Tables
  • Queues
  • Files

It is pretty hard for me to talk about all of them in only one post. Because of this I decided to split this discussion into 4 separate posts, one for each of these services.
This post is dedicated to Azure Storage Blobs.

Short Description (second try)
Azure Storage Blobs allow us to store any kind of data in Azure. From documents, to videos, to database backups, anything that is binary content can be stored there. For normal use cases Blob storage is like Santa’s sack: bottomless. You can put content there without worrying that it will fill up.


Main Features 
Size
The first thing that we should talk about is the size of Azure Blobs, which can reach up to 500TB of data per storage account. On top of this you can create multiple storage accounts. In this way you can end up with more than 500TB of data.
Also keep in mind that this limit, like other Azure limits, can change over time (so far they have only gone up). This means that in the future you might be able to store more than 500TB of data per account.
Speed
We can reach up to 10 gigabits per second for inbound traffic and 20 gigabits per second for outbound traffic. On top of this, if you use premium storage you can reach up to 50 gigabits per second for inbound and outbound traffic combined (inbound + outbound = 50Gb).
I deliberately wrote gigabits in full to avoid confusion between GB (gigabytes) and Gb (gigabits).
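To make the difference between gigabits and gigabytes concrete, here is a minimal sketch that converts the figures above. The class and method names are made up for this illustration; the only fact used is that 1 byte has 8 bits.

```csharp
using System;

public static class Throughput
{
    // 1 byte = 8 bits, so a rate in gigabits per second divided by 8
    // gives the same rate in gigabytes per second.
    public static double GigabitsToGigabytesPerSecond(double gigabitsPerSecond)
    {
        return gigabitsPerSecond / 8.0;
    }

    // Seconds needed to move a payload (in gigabytes) over a link
    // whose speed is given in gigabits per second.
    public static double TransferSeconds(double payloadGigabytes, double gigabitsPerSecond)
    {
        if (gigabitsPerSecond <= 0)
        {
            throw new ArgumentOutOfRangeException("gigabitsPerSecond");
        }
        return payloadGigabytes / GigabitsToGigabytesPerSecond(gigabitsPerSecond);
    }
}
```

With these helpers, 10 gigabits per second of inbound traffic is 1.25 gigabytes per second, so uploading 100 gigabytes at full speed would take about 80 seconds.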
Redundancy
Azure Blob storage gives us multiple options when we talk about redundancy. By default we have redundancy at data center level. This means that at all times there are 3 copies of the same content (and you pay for only one). On top of this there are 3 other redundancy options. All 4 options, including the default one, are described below:

  • LRS (Locally Redundant Storage) – Content is replicated 3 times in the same data center (a facility unit of a region)
  • ZRS (Zone Redundant Storage) – Content is replicated 3 times across 2 data centers (facilities) of the same region, where 2 data centers are available. Otherwise the content is replicated 3 times across 2 different regions
  • GRS (Geo Redundant Storage) – Content is replicated 6 times across 2 regions (3 times in the primary region and 3 times in a second region)
  • RA-GRS (Read Access Geo Redundant Storage) – Content is replicated in the same way as for GRS, but you also have read-only access in the second region. With plain GRS, even though the data exists in the second region, you cannot access it directly.

Pay only what you use
At the end of the month you pay only for the space that you used. Because of this, clients don’t pay in advance for space that they might use, and they don’t pay for space that is no longer used.
Native Libraries
There is full support for different programming languages and technologies. There are native libraries for .NET, Java, Node.js, C++ and PowerShell.
REST API
On top of the native libraries, there is full support for a REST API. This means that we can connect to and manage storage from any device.
SSD (Premium Storage)
The storage that is used by default for blobs (and Azure Storage in general) is backed by HDDs and is called Standard Storage. On top of this, we can use Premium Storage, which is backed by SSDs. This means that your I/O performance increases drastically, with very low latency.
Types of blobs
There are two types of blobs:

  • Block Blobs – used to manage large files. They are formed from multiple blocks and work well when you need to access binary content as a stream
  • Page Blobs – used to manage files where random access is required. Based on an offset (cursor) you can access a specific location of a page blob

ETag
It is a unique tag used for versioning. It can be used to find out whether the blob was modified since the last time its content was accessed by an application.
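A typical use of the ETag is optimistic concurrency: overwrite a blob only if nobody changed it since we last read it. The sketch below assumes the same .NET storage library that is used in the Code Sample section; the class and method names are made up for this example.

```csharp
using System.IO;
using System.Text;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class ETagExample
{
    // Overwrites the blob only if its ETag still equals 'knownETag'.
    // Returns false when another writer modified the blob in the meantime.
    public static bool TryOverwrite(CloudBlockBlob blob, string knownETag, string newContent)
    {
        AccessCondition ifUnchanged = AccessCondition.GenerateIfMatchCondition(knownETag);
        try
        {
            using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(newContent)))
            {
                blob.UploadFromStream(stream, ifUnchanged);
            }
            return true;
        }
        catch (StorageException ex)
        {
            // HTTP 412 (Precondition Failed) means the ETag no longer matches.
            if (ex.RequestInformation.HttpStatusCode == 412)
            {
                return false;
            }
            throw;
        }
    }
}
```

The known ETag would normally come from an earlier read, for example by calling blob.FetchAttributes() and then reading blob.Properties.ETag.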
Snapshot support
There is the ability to create snapshots of blobs. They can be used to create backups or checkpoints of blobs. A snapshot is identified by the name of the blob and the date and time when it was taken.
Containers
Containers are used to group one or more blobs under the same ‘folder’. For each storage account, users can define containers and group content under them. They can be a good way to manage access rights.
Root Container
It needs to be created manually ($root) and plays the role of the default container of the storage account. Content of this container can be accessed without having to include the container name in the path of the URL.
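The effect of the root container on blob URLs can be shown with a small helper. This is only a sketch: the class name is invented and the account name used in the usage note is a placeholder.

```csharp
using System;

public static class BlobUrls
{
    // Builds the URL of a blob. For the root container the
    // container segment is left out of the path.
    public static Uri Build(string accountName, string containerName, string blobName)
    {
        string baseUrl = "https://" + accountName + ".blob.core.windows.net/";
        if (containerName == "$root")
        {
            return new Uri(baseUrl + blobName);
        }
        return new Uri(baseUrl + containerName + "/" + blobName);
    }
}
```

For a hypothetical account named myaccount, a blob logo.png in $root is reachable at https://myaccount.blob.core.windows.net/logo.png, while bmw.mpg in the cars container needs the container name in the path: https://myaccount.blob.core.windows.net/cars/bmw.mpg.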
Timeouts
We can configure timeouts for different operations (access, upload, download). Based on these timeouts, an operation on a blob fails with an exception if it does not complete within the maximum allowed time.
For example the maximum timeout to write a block list is 1 minute.
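In the .NET library used in the Code Sample section, timeouts are passed per request through BlobRequestOptions. The sketch below shows one possible configuration. The class name and the concrete values (30 seconds, 2 minutes, 3 retries) are arbitrary choices for this example, not recommendations.

```csharp
using System;
using System.IO;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.RetryPolicies;

public static class TimeoutExample
{
    public static BlobRequestOptions CreateOptions()
    {
        return new BlobRequestOptions
        {
            // Time the service may spend on a single request.
            ServerTimeout = TimeSpan.FromSeconds(30),
            // Upper bound for the whole operation, retries included.
            MaximumExecutionTime = TimeSpan.FromMinutes(2),
            // Retry up to 3 times, waiting longer after each attempt.
            RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(2), 3)
        };
    }

    // Downloads a blob using the options above. If a limit is
    // exceeded, the call fails with an exception.
    public static void Download(CloudBlockBlob blob, Stream target)
    {
        blob.DownloadToStream(target, null, CreateOptions());
    }
}
```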
Metadata
Metadata are basically HTTP headers and allow us to add additional information related to content (there is also support for properties). We can add this type of content as (key, value) pairs at container or blob level.
We can use metadata to store any kind of information related to blobs that we need.  
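A minimal sketch of writing metadata with the .NET library from the Code Sample section is shown below. The class name is invented. The size check against the 8KB limit from the Limitations section counts the UTF-8 bytes of keys and values, which is an assumption about how the service measures the size, so treat it as an approximation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.WindowsAzure.Storage.Blob;

public static class MetadataExample
{
    // Limit from the Limitations section: 8KB for keys + values.
    public const int MaxMetadataBytes = 8 * 1024;

    // Approximate size: UTF-8 bytes of all keys plus all values.
    public static int SizeInBytes(IDictionary<string, string> metadata)
    {
        int total = 0;
        foreach (KeyValuePair<string, string> pair in metadata)
        {
            total += Encoding.UTF8.GetByteCount(pair.Key);
            total += Encoding.UTF8.GetByteCount(pair.Value);
        }
        return total;
    }

    // Copies the pairs onto an existing blob and sends them to the service.
    public static void Save(CloudBlockBlob blob, IDictionary<string, string> metadata)
    {
        if (SizeInBytes(metadata) > MaxMetadataBytes)
        {
            throw new ArgumentException("Metadata is larger than 8KB.", "metadata");
        }
        foreach (KeyValuePair<string, string> pair in metadata)
        {
            blob.Metadata[pair.Key] = pair.Value; // local change only
        }
        blob.SetMetadata(); // one request that stores the metadata
    }
}
```

To read the values back, call blob.FetchAttributes() first and then read blob.Metadata.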
Security
There are 3 different methods to manage the access to your blob content:

  • Container Permissions – you can manage access at container level. In this way you can allow anonymous users (the rest of the internet) to read the content of containers and blobs
  • Shared Access Signature – you have full control over read, write and manage access at container and blob level, for each item. In this way different users can have different access levels
  • Stored Access Policy – similar to Shared Access Signature, but with a better management mechanism. For a specific key (SAS key) we can manage the access rights from the backend without having to revoke the key.
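
The first two options can be sketched with the .NET library from the Code Sample section. The class and method names below are made up for this example, and the validity period is chosen by the caller.

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Blob;

public static class AccessExample
{
    // Container permissions: allows anonymous read access to the
    // blobs of a container (but not listing the container itself).
    public static void MakeBlobsPublic(CloudBlobContainer container)
    {
        container.SetPermissions(new BlobContainerPermissions
        {
            PublicAccess = BlobContainerPublicAccessType.Blob
        });
    }

    // Shared Access Signature: returns a URL that grants read-only
    // access to a single blob for a limited period of time.
    public static string CreateReadOnlyUrl(CloudBlockBlob blob, TimeSpan validFor)
    {
        var policy = new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.Add(validFor)
        };
        // The token is computed locally from the account key.
        string sasToken = blob.GetSharedAccessSignature(policy);
        return blob.Uri.AbsoluteUri + sasToken;
    }
}
```

A URL created this way can be handed to a user who has no account key. Once the expiry time passes, the same URL stops working.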

Tracing capabilities
Azure Storage has tracing abilities over blobs and containers. Information like access time, client IP and how the request ended can be stored automatically and accessed later. In this way we can have a full audit over storage content.
Analytics Metrics
Important metrics are stored automatically and can be used to know the load on our storage. They can help us to mitigate different problems and to scale if needed.
Below you can find the metrics that are generated:

  • Transactions
  • Storage that is used
  • Number of containers
  • Number of committed/uncommitted block or page blobs

There are some special tables that are used for this purpose. The names of these tables start with “$Metrics”.
Import/Export support
There is full support over an API to execute import/export operations over blob storage.

Limitations 
Premium Storage Max Size
The maximum capacity of a premium storage account is limited to 32TB at this moment. Don’t forget that you can have multiple storage accounts.
Block Blob Max Size
The maximum size of a block blob at this moment is 200GB.
Page Blob Max Size
The maximum size of a page blob at this moment is 1TB.
Size of Metadata
For each container or blob the size of the metadata cannot exceed 8KB (keys + values).

There are also some limits on the number of containers, blobs and so on, but they are very high and in normal use cases you will not hit them. You only need to take them into account if you know that you will hit the storage infrastructure pretty hard.

Applicable Use Cases 
Below you can find 3 use cases where I would use blob storage.
Sharing pictures between different users
Sharing pictures can be pretty hard when we take into account that we need to handle a lot of content and also manage the access rights on each picture. We need to let each user manage the access rights for their own pictures. Using blob storage our work becomes very simple. We don’t need to think about scaling, redundancy or how to manage access rights. Simple like that.
Backup on-premises content (or personal content like pictures)
I would use Azure blob storage to back up encrypted content from on-premises servers. In this way we don’t need to buy and manage our own backup infrastructure, which can become expensive and hard to do.
Store applications content
When you design an application that needs to handle and store binary content, you can use blobs to store it. In this way you will have a binary repository that stores data very cheaply and is available 24 hours a day.

Code Sample 

// Namespaces of the Azure Storage client library for .NET.
// CloudConfigurationManager comes from the Azure configuration manager package.
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Get the storage account reference from the connection string in the configuration
CloudStorageAccount cloudStorageAccount = 
    CloudStorageAccount.Parse(
        CloudConfigurationManager.GetSetting("FooStorageConnectionString"));
        
// Get a reference to the client class that allows us to access blobs and containers
CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();

// Get a reference to a specific container
CloudBlobContainer cloudBlobContainer = cloudBlobClient.GetContainerReference("cars");

// Get the content of the container as a list (no prefix filter, hierarchical listing)
var blobItemList = cloudBlobContainer.ListBlobs(null, false);

// Get a reference to a container and create it if it doesn't exist
CloudBlobContainer phonesContainer = cloudBlobClient.GetContainerReference("phones");
phonesContainer.CreateIfNotExists();

// Get a reference to a blob (no request is sent to the service yet)
CloudBlockBlob bmwBlob = cloudBlobContainer.GetBlockBlobReference("bmw.mpg");

// Download the blob content to a local file.
// myLocalPath is a placeholder: set it to the destination path on your machine.
using (var localFileStream = System.IO.File.OpenWrite(myLocalPath))
{
    bmwBlob.DownloadToStream(localFileStream);
}

Pros and Cons 
Pros
(there are so many things; for me the most important ones would be)

  • Redundancy support
  • Shared Access Signature
  • Size of the storage
  • Availability (up to 99.99%)

Cons
Only one, but it can be managed from client applications:

  • There is no mechanism to notify users when replication to the second facility is done (for us this is an important thing)


Pricing 
When you start to calculate the cost of Azure Blob Storage you should take into account the following things:

  • Capacity (size)
  • Number of Transactions
  • Outbound traffic
  • Traffic between facilities (data centers)
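
The four items above can be combined into a simple monthly estimate. The sketch below is only an illustration: the class names are invented and every rate has to be supplied by the caller from the current price list, because no real prices are built in.

```csharp
using System;

// All rates are inputs. Take them from the current Azure price list.
public sealed class StorageRates
{
    public decimal PerGigabyteMonth;            // capacity
    public decimal PerTenThousandTransactions;  // transactions
    public decimal PerGigabyteOutbound;         // outbound traffic
    public decimal PerGigabyteBetweenFacilities; // traffic between data centers
}

public static class CostEstimator
{
    // Sums the four cost components for one month.
    public static decimal MonthlyCost(
        StorageRates rates,
        decimal storedGigabytes,
        long transactions,
        decimal outboundGigabytes,
        decimal betweenFacilitiesGigabytes)
    {
        return storedGigabytes * rates.PerGigabyteMonth
            + (transactions / 10000m) * rates.PerTenThousandTransactions
            + outboundGigabytes * rates.PerGigabyteOutbound
            + betweenFacilitiesGigabytes * rates.PerGigabyteBetweenFacilities;
    }
}
```

Charging transactions per block of 10,000 is an assumption of this sketch. Adjust the unit to whatever the price list uses.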


Conclusion
In conclusion I would say that Azure Blob Storage is one of the greatest services offered by Azure. Even if it is a base service, it has a lot of features that can improve the quality of our products. I would even say that it is one of the Cloud wonders :-).
