Azure Storage Blobs (Day 18 of 31)

List of all posts from this series: http://vunvulearadu.blogspot.ro/2014/11/azure-blog-post-marathon-is-ready-to.html

Short Description
Azure Storage is one of the foundation components of Microsoft Azure. It offers scalable and redundant cloud storage for multiple use cases. It can store petabytes of data without any kind of problem, and at the same time it is flexible and can handle heavy traffic very easily.
At this moment Azure Storage has 4 components that define it:

Blobs
Tables
Queues
Files

… It is pretty hard for me to talk about all of them in only one post. Because of this I decided to split the discussion into 4 separate posts, one for each of these services.
This post is dedicated to Azure Storage Blobs.

Short Description (second try)
Azure Storage Blobs allow us to store any kind of data in Azure. From documents to videos to database backups, anything that is binary content can be stored there. For normal use cases Blob storage is like Santa’s sack: bottomless. You can put content there without worrying that it will fill up.

Main Features
Size
The first thing that we should talk about is the size of Azure Blob storage, which can reach up to 500TB of data per account. On top of this you can create multiple storage accounts. In this way you can end up with more than 500TB of data.
Also keep in mind that this limit, like other Azure limits, can change over time (so far always upward). This means that in the future you might be able to store more than 500TB of data per account.
Speed
We can reach up to 10 gigabits per second for inbound traffic and 20 gigabits per second for outbound traffic. On top of this, if you use premium storage you can reach up to 50 gigabits per second for inbound and outbound traffic combined (inbound + outbound = 50Gb).
I deliberately wrote out gigabits to avoid confusion between GB and Gb.
Redundant
Azure Blob storage gives us multiple options when we talk about redundancy. By default we have redundancy at data center level. This means that there are always 3 copies of the same content (and you pay for only one). In total there are 4 redundancy options, described below (the default one that we already talked about plus 3 others):

LRS (Locally Redundant Storage) – Content is replicated 3 times in the same data center (a facility within a region)
ZRS (Zone Redundant Storage) – Content is replicated 3 times in the same region across 2 datacenters (facilities) where 2 datacenters are available. Otherwise the content is replicated 3 times across 2 different regions
GRS (Geo Redundant Storage) – Content is replicated 6 times across 2 regions (3 times in the primary region and 3 times in a second region)
RA-GRS (Read Access Geo Redundant Storage) – Content is replicated in the same way as for GRS, but you have read-only access to the second region. With GRS, even though the data exists in the second region, you cannot access it directly.

Pay only what you use
At the end of the month you pay only for the space that you used. Clients don’t pay in advance for space that they might use, and they don’t pay for space that is not used anymore.
Native Libraries
There is full support for different programming languages and technologies. There are native libraries for .NET, Java, Node.js, C++ and PowerShell.
REST API
On top of the native libraries, there is a full REST API. This means that we can connect to and manage storage from any device.
SSD (Premium Storage)
The standard storage that is used for blobs (and Azure Storage in general) uses HDDs and is called Standard Storage. On top of this, we can use Premium Storage, which uses SSDs. This means that I/O performance increases drastically, with very low latency.
Types of blobs
There are two types of blobs:

Block Blobs – used to manage large files. They are formed from multiple blocks and work well when you need to access binary content as a stream
Page Blobs – used to manage files where random access is required. Based on a cursor (offset) you can access a specific location of a page blob
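
To make the difference between the two blob types more concrete, here is a minimal sketch that plans how a file would be cut into blocks and how a byte range maps to pages. It makes a few assumptions that are not part of this post: a 4 MB per-block limit, 512-byte pages, and block ids built from a zero-padded counter encoded as Base64. The class and method names (BlobLayout, PlanBlocks, AlignToPages) are hypothetical helpers, not part of the storage library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BlobLayout
{
    // Assumed per-block limit (4 MB); adjust to the current service limit.
    public const int MaxBlockSize = 4 * 1024 * 1024;

    // Assumed page size for page blobs (512 bytes).
    public const int PageSize = 512;

    public sealed class Block
    {
        public string Id;    // Base64 id; every id of one blob has the same length
        public long Offset;  // position of the block inside the source file
        public int Length;   // number of bytes in this block
    }

    // Splits a file of fileSize bytes into consecutive blocks of at most blockSize bytes.
    public static List<Block> PlanBlocks(long fileSize, int blockSize = MaxBlockSize)
    {
        if (fileSize < 0) throw new ArgumentOutOfRangeException(nameof(fileSize));
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));

        var blocks = new List<Block>();
        for (long offset = 0; offset < fileSize; offset += blockSize)
        {
            // A zero-padded counter keeps all ids the same length before encoding.
            string rawId = blocks.Count.ToString("D6");
            blocks.Add(new Block
            {
                Id = Convert.ToBase64String(Encoding.UTF8.GetBytes(rawId)),
                Offset = offset,
                Length = (int)Math.Min(blockSize, fileSize - offset)
            });
        }
        return blocks;
    }

    // Widens [offset, offset + length) to the enclosing page boundaries.
    public static (long Start, long End) AlignToPages(long offset, long length)
    {
        long start = offset / PageSize * PageSize;
        long end = (offset + length + PageSize - 1) / PageSize * PageSize;
        return (start, end);
    }
}
```

With these helpers, BlobLayout.PlanBlocks(fileSize) gives the list of blocks that a client would upload one by one before committing the block list, while BlobLayout.AlignToPages(offset, length) gives the page-aligned range that a random-access read or write on a page blob would touch.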

ETag
It is a unique tag used for versioning. It can be used to find out whether the blob was modified since the last time the content was accessed by an application.
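
One way to use the ETag over the REST API is a conditional GET: the client sends the ETag it cached in an If-None-Match header and the service can answer that nothing changed. The sketch below only builds such a request with the standard .NET HTTP types. The class name ETagRequests is a hypothetical helper, the blob URL and ETag value come from the caller, and the ETag is assumed to be a strong one (no W/ prefix). Authentication is left out, so as written it would only work against a publicly readable blob.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class ETagRequests
{
    // Builds a GET that asks for the blob only if its ETag differs
    // from the one the caller already has.
    public static HttpRequestMessage BuildConditionalGet(Uri blobUri, string cachedETag)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, blobUri);

        // EntityTagHeaderValue expects the tag wrapped in double quotes.
        string quoted = cachedETag.StartsWith("\"")
            ? cachedETag
            : "\"" + cachedETag + "\"";

        request.Headers.IfNoneMatch.Add(new EntityTagHeaderValue(quoted));
        return request;
    }
}
```

When such a request is sent with HttpClient, a 304 (Not Modified) status code means the cached copy is still current, and a 200 response carries the new content together with its new ETag.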
Snapshot support
There is the ability to create snapshots of blobs. Snapshots can be used to create backups or checkpoints of blobs. A snapshot is identified by the name of the blob and the date and time when it was taken.
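
Because a snapshot is identified by the blob name plus a timestamp, its address can be derived from the address of the base blob. The sketch below assumes, to the best of my knowledge of the REST API, that the timestamp is passed in a snapshot query parameter as a UTC value with 7 fractional digits. SnapshotUris is a hypothetical helper name and the blob URL is supplied by the caller.

```csharp
using System;
using System.Globalization;

public static class SnapshotUris
{
    // Appends the snapshot timestamp to the URL of the base blob.
    // Assumes blobUrl has no query string of its own.
    public static string Build(string blobUrl, DateTimeOffset takenAt)
    {
        // Format the moment in UTC with 7 fractional digits.
        string stamp = takenAt.UtcDateTime.ToString(
            "yyyy-MM-dd'T'HH:mm:ss.fffffff'Z'", CultureInfo.InvariantCulture);

        return blobUrl + "?snapshot=" + stamp;
    }
}
```

In practice the timestamp is not invented by the client. It is returned by the service when the snapshot is created, so the helper is only useful for rebuilding an address from a stored value.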
Containers
Containers are used to group one or more blobs under the same ‘folder’. For each storage account, users can define containers and group content under them. Containers can also be a good way to manage access rights.
Root Container
The root container ($root) needs to be created manually and plays the role of the default container of the storage account. Users can access the content of this container without having to specify the container name in the path of the URL.
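
The effect of the root container on blob addresses can be shown with a small URL builder. This is only a sketch: BlobUrls is a hypothetical helper, the account, container and blob names are supplied by the caller and are assumed to be URL-safe already, and the blob.core.windows.net host suffix is the public Azure endpoint.

```csharp
public static class BlobUrls
{
    // Builds the address of a blob; blobs in $root need no container segment.
    public static string Build(string account, string container, string blobName)
    {
        string baseUrl = "https://" + account + ".blob.core.windows.net/";

        if (container == "$root")
        {
            // Root container: the blob sits directly under the account address.
            return baseUrl + blobName;
        }

        return baseUrl + container + "/" + blobName;
    }
}
```

For the same blob name, the address built for the $root container is shorter than the one built for a named container such as "cars", because the container segment is left out.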
Timeouts
We can configure timeouts for different operations (access, upload, download). Based on these timeouts, an exception can be thrown if a client doesn’t complete a specific action within the maximum allowed time.
For example, the maximum timeout to write a block list is 1 minute.
Metadata
Metadata are basically HTTP headers and allow us to add additional information related to content (there is also support for properties). We can add this type of content as (key, value) pairs at container or blob level.
We can use metadata to store any kind of information related to blobs that we need.
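
Since metadata travels as HTTP headers, a client can prepare and check it before sending anything. The sketch below turns (key, value) pairs into headers with the x-ms-meta- prefix used by the REST API and checks them against the 8KB limit listed under Limitations. BlobMetadata is a hypothetical helper, and counting UTF-8 bytes of keys plus values is my assumption about how the size is measured.

```csharp
using System.Collections.Generic;
using System.Text;

public static class BlobMetadata
{
    // 8KB limit for keys + values, as listed under Limitations.
    public const int MaxTotalBytes = 8 * 1024;

    // Total size of all keys and values, counted as UTF-8 bytes (assumption).
    public static int SizeInBytes(IDictionary<string, string> metadata)
    {
        int total = 0;
        foreach (var pair in metadata)
        {
            total += Encoding.UTF8.GetByteCount(pair.Key);
            total += Encoding.UTF8.GetByteCount(pair.Value ?? string.Empty);
        }
        return total;
    }

    public static bool FitsLimit(IDictionary<string, string> metadata)
    {
        return SizeInBytes(metadata) <= MaxTotalBytes;
    }

    // Maps each metadata key to the header name used on the wire.
    public static Dictionary<string, string> ToHeaders(IDictionary<string, string> metadata)
    {
        var headers = new Dictionary<string, string>();
        foreach (var pair in metadata)
        {
            headers["x-ms-meta-" + pair.Key] = pair.Value ?? string.Empty;
        }
        return headers;
    }
}
```

Checking FitsLimit on the client before an upload avoids a round trip that the service would reject anyway.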
Security
There are 3 different methods to manage the access to your blob content:

Container Permissions – you can manage access at container level. In this way you can allow anonymous users (the rest of the internet) to read the content of the container and its blobs
Shared Access Signature – you have full control over read, write and manage access at container and blob level, for each item. In this way different users can have different access levels
Stored Access Policy – similar to Shared Access Signature, but gives us a better management mechanism. For a specific key (SAS key) we can manage the access rights from the backend without having to revoke the key.

Tracing capabilities
Azure Storage has tracing abilities over blobs and containers. Information like access time, client IP and how the request ended can be stored automatically and accessed later. In this way we can have a full audit over storage content.
Analytics Metrics
Important metrics are stored automatically and can be used to know the load on our storage. They can help us decide how to mitigate different problems and when to scale.
Below you can find the metrics that are generated:

Transactions
Storage that is used
Number of containers
Number of committed/uncommitted blocks or page blobs

There are some special tables that are used for this purpose. The names of these tables start with “$Metrics”.
Import/Export support
There is full API support for executing import/export operations over blob storage.

Limitations
Premium Storage Max Size
The maximum capacity of a premium storage account is limited to 32TB at this moment. Don’t forget that you can have multiple storage accounts.
Block Blob Max Size
The maximum size of a block blob at this moment is 200GB.
Page Blob Max Size
The maximum size of a page blob at this moment is 1TB.
Size of Metadata
For each container or blob the size of metadata cannot exceed 8KB (keys + values).

There are also some limits on the number of containers, blobs and so on, but they are very high and in normal use cases you will not hit them. You only need to take them into account if you know that you will put a very heavy load on the storage infrastructure.

Applicable Use Cases
Below you can find 3 use cases where I would use blob storage.
Sharing pictures between different users
Sharing pictures can be pretty hard when we take into account that we need to handle a lot of content and also manage the access rights on each picture. We need to let each user manage the access rights for their own pictures. Using blob storage, our work becomes very simple. We don’t need to think about scaling, redundancy or managing access rights. Simple as that.
Backup on-premises content (or personal content like pictures)
I would use Azure blob storage to back up encrypted content from on-premises servers. In this way we don’t need to buy and manage our own backup infrastructure, which can become an expensive and hard thing to do.
Store applications content
When you design an application that needs to handle and store binary content, you can use blobs to store it. In this way you have a binary repository that stores data very cheaply and is available 24 hours a day.

Code Sample

using Microsoft.WindowsAzure;               // CloudConfigurationManager (Microsoft.Azure in newer package versions)
using Microsoft.WindowsAzure.Storage;       // CloudStorageAccount
using Microsoft.WindowsAzure.Storage.Blob;  // CloudBlobClient, CloudBlobContainer, CloudBlockBlob

// Get storage account reference from the connection string stored in configuration
CloudStorageAccount cloudStorageAccount = 
    CloudStorageAccount.Parse(
        CloudConfigurationManager.GetSetting("FooStorageConnectionString"));
        
// Get the reference to the client class that allows us to access blobs and containers
CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();

// Get a reference to a specific container
CloudBlobContainer cloudBlobContainer = cloudBlobClient.GetContainerReference("cars");

// Get the content of the container as a list (no prefix, non-flat listing)
var blobItemList = cloudBlobContainer.ListBlobs(null, false);

// Get a reference to a container and create it if it doesn't exist
CloudBlobContainer phonesContainer = cloudBlobClient.GetContainerReference("phones");
phonesContainer.CreateIfNotExists();

// Get a reference to a blob
CloudBlockBlob bmwBlob = cloudBlobContainer.GetBlockBlobReference("bmw.mpg");

// Download blob content to a file
// myLocalPath is a string variable holding the destination path on your machine
using (var localFileStream = System.IO.File.OpenWrite(myLocalPath))
{
    bmwBlob.DownloadToStream(localFileStream);
}

Pros and Cons
Pros
(there are so many things; for me the most important ones would be)

Redundancy support
Shared Access Signature
Size of the storage
Availability (up to 99.99%)

Cons
Only one, and it can be worked around from client applications:

There is no mechanism to notify users when replication to the second facility has finished (for us this is an important thing)

Pricing
When you start to calculate the cost of Azure Blob Storage you should take into account the following things:

Capacity (size)
Number of Transactions
Outbound traffic
Traffic between facilities (data centers)
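
The four items above can be combined into a simple monthly estimate. The sketch below only shows the shape of the calculation. StorageRates, StorageUsage and CostEstimator are hypothetical names, and every rate has to be filled in by the caller from the current Azure price list, because no real prices are assumed here.

```csharp
public sealed class StorageRates
{
    public decimal PerGbStored;           // price per GB stored per month
    public decimal Per10kTransactions;    // price per 10,000 transactions
    public decimal PerGbOutbound;         // price per GB of outbound traffic
    public decimal PerGbBetweenFacilities; // price per GB moved between data centers
}

public sealed class StorageUsage
{
    public decimal GbStored;
    public long Transactions;
    public decimal GbOutbound;
    public decimal GbBetweenFacilities;
}

public static class CostEstimator
{
    // Sums the four cost drivers: capacity, transactions,
    // outbound traffic and traffic between facilities.
    public static decimal MonthlyCost(StorageUsage usage, StorageRates rates)
    {
        decimal capacity = usage.GbStored * rates.PerGbStored;
        decimal transactions = usage.Transactions / 10000m * rates.Per10kTransactions;
        decimal outbound = usage.GbOutbound * rates.PerGbOutbound;
        decimal betweenFacilities = usage.GbBetweenFacilities * rates.PerGbBetweenFacilities;

        return capacity + transactions + outbound + betweenFacilities;
    }
}
```

Keeping the rates in a separate object makes it easy to compare redundancy options or regions by running the same usage against different rate sets.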

Conclusion
In conclusion I would say that Azure Blob Storage is one of the greatest services offered by Azure. Even though it is a base service, it has a lot of features that can improve the quality of our products. I would even say that it is one of the Cloud wonders :-).

Cloud as a Story - Vunvulea Radu
