Azure Tables Storage (Day 19 of 31)

List of all posts from this series: http://vunvulearadu.blogspot.ro/2014/11/azure-blog-post-marathon-is-ready-to.html

Short Description 
Azure Table Storage is part of Azure Storage and lets us store large amounts of data in a structured way. It allows us to store collections of entities in so-called tables. Looking at this service from the NoSQL side, we could say that the tables in Azure Table Storage are the collections of the NoSQL world.


Main Features 
Some of the features of Azure Table are similar to those of Azure Blob Storage. This is because both services are built on the same infrastructure and share a common base.
Different entities schema in the same table
We are allowed to store entities with different schemas in the same table. For example, we can store ‘Car’ and ‘House’ entities in the same table, as long as we can detect the type of each entity at the moment we retrieve it.
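One way to make the entity type detectable at read time is to store a discriminator property next to the data. The sketch below assumes the same storage client library as the code sample in this post and uses DynamicTableEntity; the property name "EntityType" and the helper names are hypothetical choices for illustration, not something the service requires.

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Table;

public static class MixedSchemaSketch
{
    // Hypothetical discriminator property used to tell entity kinds apart.
    private const string TypeProperty = "EntityType";

    // Builds a 'Car' entity with its own set of properties.
    public static DynamicTableEntity NewCar(string partitionKey, string rowKey, string model)
    {
        var entity = new DynamicTableEntity(partitionKey, rowKey);
        entity.Properties[TypeProperty] = new EntityProperty("Car");
        entity.Properties["Model"] = new EntityProperty(model);
        return entity;
    }

    // Builds a 'House' entity; its schema differs from 'Car' but it can live in the same table.
    public static DynamicTableEntity NewHouse(string partitionKey, string rowKey, string address)
    {
        var entity = new DynamicTableEntity(partitionKey, rowKey);
        entity.Properties[TypeProperty] = new EntityProperty("House");
        entity.Properties["Address"] = new EntityProperty(address);
        return entity;
    }

    // Reads the discriminator back; returns null when the entity carries none.
    public static string GetEntityType(DynamicTableEntity entity)
    {
        EntityProperty type;
        return entity.Properties.TryGetValue(TypeProperty, out type) ? type.StringValue : null;
    }
}
```

After retrieving an entity, the caller can switch on GetEntityType(entity) and read only the properties that belong to that kind of entity.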
Size
The size of a table is limited only by the capacity of the storage account. We can store large collections of entities in Azure Table without any problem (a table can grow well beyond 1 TB). The maximum size is the same as for Azure Blob Storage – 500TB per storage account (for now).
RESTful and OData
All access is made over a REST API that supports the OData standard. This means that we can query the storage very easily from any kind of device.
Tables Count
There is no real limit on the number of tables that we can create under the same storage account. Theoretically, we can create as many tables as we want, as long as we have enough space.
Simple hierarchy
The hierarchy of an Azure Table database is very simple. Under a Storage Account we have 0..N tables. Each table can contain 0..M entities. Each entity can have 3..255 properties, where the 3 mandatory ones are the system properties Partition Key, Row Key and Timestamp.
Partition Key
It is used to group similar entities within a table. Azure Table uses the partition key to spread a table across different nodes when the table grows too big.
Row Key
The unique ID of an entity inside a partition. In a table, multiple entities can have the same Row Key value as long as the partition key of each entity is different.
Entity Key (ID)
The unique ID of each entity is constructed from partition key + row key. This key is unique per table and can be used to retrieve entities.
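As a minimal sketch of retrieving an entity by its full key, assuming the same storage client library as the code sample in this post, a point lookup could look like the following. The class and method names are hypothetical; the call needs a CloudTable that points at a real storage account.

```csharp
using Microsoft.WindowsAzure.Storage.Table;

public static class PointQuerySketch
{
    // Returns the entity identified by (partitionKey, rowKey), or null when no such entity exists.
    public static DynamicTableEntity Find(CloudTable table, string partitionKey, string rowKey)
    {
        // A retrieve operation addresses exactly one entity through both parts of its key.
        TableOperation retrieve = TableOperation.Retrieve<DynamicTableEntity>(partitionKey, rowKey);
        TableResult result = table.Execute(retrieve);
        return result.Result as DynamicTableEntity;
    }
}
```

Because both parts of the key are supplied, the service can locate the entity directly instead of scanning a partition.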
Timestamp
This property tells us when the entity was last changed. It can be used to detect whether the entity changed during a specific time period (ETag).
This value cannot be set by the client. It is set automatically by the system, and any attempt to change this property is ignored.
Batch Support
We have the ability to execute a batch over Azure Table. A batch can contain a maximum of 100 actions, and all changes in a batch must affect a single partition of a table.
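The two batch rules above (one partition per batch, at most 100 actions) can be respected by planning the batches on the client before sending anything. The helper below is a sketch in plain C#; BatchPlanner and Plan are hypothetical names, and the caller supplies a function that returns the partition key of each item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchPlanner
{
    public const int MaxActionsPerBatch = 100;

    // Splits items into batches that each share one partition key and hold at most 100 items.
    public static List<List<T>> Plan<T>(IEnumerable<T> items, Func<T, string> partitionKeyOf)
    {
        var batches = new List<List<T>>();
        foreach (var group in items.GroupBy(partitionKeyOf))
        {
            var current = new List<T>();
            foreach (var item in group)
            {
                current.Add(item);
                if (current.Count == MaxActionsPerBatch)
                {
                    // The batch is full: close it and start a new one for the same partition.
                    batches.Add(current);
                    current = new List<T>();
                }
            }
            if (current.Count > 0)
            {
                batches.Add(current);
            }
        }
        return batches;
    }
}
```

With the storage client library, each planned batch could then be copied into a TableBatchOperation and sent with table.ExecuteBatch. The 4MB payload limit is not checked by this sketch.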
Property Supported Types
The range of types supported by properties is fairly large, from Int32 and String to bool and byte arrays. Below you can find the list of all supported types:

  • byte[]
  • bool
  • DateTime
  • Double
  • Guid
  • String
  • Int32
  • Int64

Security
The following methods can be used to manage access to your table content:

  • Shared Access Signature – you have full control over query, add, update and delete access at table level, optionally limited to a range of partition keys and row keys. In this way different users can have different access levels
  • ACL – Similar to Shared Access Signature, but offers a better management mechanism. For a specific key (SAS key) we can manage the access rights from the backend without having to revoke the key.

Pagination Support
When we execute queries that retrieve large amounts of data, we can use the continuation token to iterate over all the entities returned by the query. This token contains 3 parts:

  • NextTableName
  • NextPartitionKey
  • NextRowKey
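With the storage client library used in the code sample of this post, the continuation token is exposed as a TableContinuationToken, so the three parts do not have to be handled by hand. The loop below is a sketch; PagingSketch and ReadAll are hypothetical names, and the method needs a CloudTable that points at a real storage account.

```csharp
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage.Table;

public static class PagingSketch
{
    // Reads every entity matched by the query, one segment (page) at a time.
    public static List<DynamicTableEntity> ReadAll(CloudTable table, TableQuery<DynamicTableEntity> query)
    {
        var results = new List<DynamicTableEntity>();
        TableContinuationToken token = null; // null requests the first segment

        do
        {
            TableQuerySegment<DynamicTableEntity> segment = table.ExecuteQuerySegmented(query, token);
            results.AddRange(segment.Results);

            // The service returns a token while more entities remain; null means we are done.
            token = segment.ContinuationToken;
        } while (token != null);

        return results;
    }
}
```

Collecting everything in memory is only reasonable for small result sets; for large ones it is usually better to process each segment inside the loop.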

Query
Azure Tables have query support. The queries support the following components:

  • $filter – used to filter entities based on client rules
  • $top – returns the first N entities 
  • $select – returns only the properties that the client requests

$filter
It supports basic comparison rules: equal, not equal, greater than, greater than or equal, less than, less than or equal. On top of this we have support for Boolean operators (and, or, not).
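To make the query options more concrete, the sketch below assembles the query string part of a REST request in plain C#. In OData the operators are written as eq, ne, gt, ge, lt and le, and string literals are wrapped in single quotes. ODataQuerySketch and its methods are hypothetical helpers; the property names used in the usage note (PartitionKey aside) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ODataQuerySketch
{
    // OData string literals use single quotes; a quote inside the value is doubled.
    public static string Quote(string value)
    {
        return "'" + value.Replace("'", "''") + "'";
    }

    // Builds "$filter=...&$top=...&$select=..." and skips the options that are not set.
    public static string BuildQueryString(string filter, int? top, IEnumerable<string> select)
    {
        var parts = new List<string>();
        if (!string.IsNullOrEmpty(filter))
        {
            parts.Add("$filter=" + Uri.EscapeDataString(filter));
        }
        if (top.HasValue)
        {
            parts.Add("$top=" + top.Value);
        }
        if (select != null)
        {
            string columns = string.Join(",", select);
            if (columns.Length > 0)
            {
                parts.Add("$select=" + Uri.EscapeDataString(columns));
            }
        }
        return string.Join("&", parts);
    }
}
```

A filter such as "PartitionKey eq " + ODataQuerySketch.Quote("Smith") + " and Age ge 30" combined with a $top of 10 and a $select of Email would ask for the Email property of the first 10 matching entities.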
Redundancy
Azure Table Storage gives us multiple options when we talk about redundancy. By default we have redundancy at data center level – this means that there are always 3 copies of the same content (and you pay for only one). On top of this there are 3 other redundancy options, described below together with the default one:

  • LRS (Locally Redundant Storage) – Content is replicated 3 times in the same data center (facility unit of a region)
  • ZRS (Zone Redundant Storage) – Content is replicated 3 times across 2 data centers (facilities) in the same region, where 2 data centers are available. Otherwise the content is replicated 3 times across 2 different regions
  • GRS (Geo Redundant Storage) – Content is replicated 6 times across 2 regions (3 times in the primary region and 3 times in a secondary region)
  • RA-GRS (Read Access Geo Redundant Storage) – Content is replicated in the same way as for GRS, but you have read-only access to the second region. With GRS, even though the data exists in the second region, you cannot access it directly.

Limitations 
  • Like other NoSQL stores, Azure Table cannot handle complex joins, foreign keys or stored procedures. You have support for querying, but keep in mind that queries are fast only on the indexed properties.
  • Each entity can have a maximum of 255 properties, including the 3 system properties. The maximum size of an entity is 1MB.
  • A batch can contain a maximum of 100 actions, can be executed only over a single partition of a table, and has a maximum payload size of 4MB.
  • The length of a table name must be between 3 and 63 characters.
  • The maximum size of a Partition Key or Row Key is 1KB (each of them).
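Some of these limits can be checked on the client before a request is sent. The sketch below assumes the naming rules published for the Table service: table names are alphanumeric, start with a letter and are 3 to 63 characters long, and key values may not contain '/', '\', '#', '?' or control characters. TableLimitsSketch and its methods are hypothetical names, and the size limit of the keys is not checked here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TableLimitsSketch
{
    // Assumed rule: a letter followed by 2 to 62 letters or digits (3 to 63 characters in total).
    private static readonly Regex TableNamePattern = new Regex("^[A-Za-z][A-Za-z0-9]{2,62}$");

    public static bool IsValidTableName(string name)
    {
        return name != null && TableNamePattern.IsMatch(name);
    }

    // Rejects the characters that are not allowed in Partition Key and Row Key values.
    public static bool IsValidKey(string key)
    {
        if (key == null)
        {
            return false;
        }
        return !key.Any(c => c == '/' || c == '\\' || c == '#' || c == '?' || char.IsControl(c));
    }
}
```

A check like this gives an early, readable error instead of a rejected request, for example when a date such as 2014/11/19 is used directly as a Row Key.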

Applicable Use Cases
Below you can find 3 use cases where I would use Azure Tables.
Logs
I would use Azure Table to store large amounts of logs, split into different tables based on date and source. In this way the tracing and cleanup steps are pretty simple.
Software updates assignments
If we have a system that needs to store the software update assignments for each device, then we could think about using Azure Tables. We can store the assignments for each system in its own table. We can group the assignments of each system by application type using the partition key.
Content tags and description
If we are using Azure Blobs, for example, to store our binary content, then we may need a place to store the list of tags for each binary content, plus descriptions and other information. For this case we could use Azure Table with success.

Code Sample 


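// Namespaces used by this sample. CloudConfigurationManager lives in
// Microsoft.WindowsAzure in older packages and in Microsoft.Azure in newer ones.
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Retrieve storage account from connection string
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("StorageConnectionString"));

// Create the table client
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

// Get a reference to the CloudTable that represents the "people" table (this does not create the table).
CloudTable table = tableClient.GetTableReference("people");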

// Define the query, and only select the Email property
TableQuery<DynamicTableEntity> projectionQuery = new TableQuery<DynamicTableEntity>().Select(new string[] { "Email" });

// Define an entity resolver to work with the entity after retrieval.
EntityResolver<string> resolver = (pk, rk, ts, props, etag) => props.ContainsKey("Email") ? props["Email"].StringValue : null;

foreach (string projectedEmail in table.ExecuteQuery(projectionQuery, resolver, null, null))
{
    Console.WriteLine(projectedEmail);
}

Source: http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-tables/
Pros and Cons 
Pros

  • Unlimited number of tables and entities
  • Very high maximum database size (same as for Azure Storage)
  • Fast and easy to query
  • Querying by partition key and row key is very fast
  • Cheap

Cons

  • Not all properties are indexed (only partition key and row key)
  • Limited number of properties per entity
  • Querying on properties that are not indexed is slow, because the entities have to be scanned or brought from Azure to the client machine and filtered there


Pricing
When you start to calculate the cost of Azure Table Storage you should take into account the following things:

  • Capacity (size)
  • Number of Transactions
  • Outbound traffic
  • Traffic between facilities (data centers)


Conclusion
Yes, Azure Table is a good option if we need a NoSQL solution to store content in (key, value) format. If you need more than that, I recommend looking at DocumentDB. There are use cases where Azure Table is the perfect solution.
Personal note: Take into account the number of transactions that you are executing, because it can affect the costs at the end of the month.
