
Blueprint of a cloud data store for objects whose state is updated often (Azure and AWS)

This post focuses on how to design a cloud system, on Azure and on AWS, that needs to handle objects whose state changes very often.

Proposition 
I want a system that can store objects with a flexible schema. A part of the objects is updated every day, and queries are per object. Aggregation reports are produced inside a data warehouse solution, not directly inside this storage.

Requirements  
Let’s imagine a client that needs a cloud solution with the following requirements: 
  • 500M objects stored
  • 1M new objects added every day
  • The state of 10M objects changes every day
  • An update operation shall complete in under 0.3s
  • A query for a single object shall complete in under 0.2s
  • 30M queries per day that check the object state
  • Dynamic schema for the objects
  • Except for the object state, all other attributes are written only once (during the insert operation)
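To put these numbers into perspective, below is a quick back-of-envelope calculation of the average request rates they imply, assuming traffic is spread evenly over 24 hours (real traffic is usually spikier, so peak rates will be higher):

```python
# Back-of-envelope average request rates implied by the requirements above,
# assuming an even spread over 24 hours (peaks will be higher in practice).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

inserts_per_day = 1_000_000     # new objects
updates_per_day = 10_000_000    # object state changes
queries_per_day = 30_000_000    # object state lookups

print(f"Inserts: ~{inserts_per_day / SECONDS_PER_DAY:.0f} ops/s")  # ~12 ops/s
print(f"Updates: ~{updates_per_day / SECONDS_PER_DAY:.0f} ops/s")  # ~116 ops/s
print(f"Queries: ~{queries_per_day / SECONDS_PER_DAY:.0f} ops/s")  # ~347 ops/s
```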
Solution overview 
We need a NoSQL solution to store the objects. Even so, the challenging part is to design a solution that enables fast updates on the object state while keeping the cost under control. With a NoSQL solution, the size of the storage is not a problem: having 500M or 1,000M objects is the same thing as long as we do the partitioning the right way from the beginning.
Because most of the updates and queries are on the state attribute of the object, we can optimize the storage by adding an index on the state field if necessary.
Even with a NoSQL solution, a high number of operations creates bottlenecks similar to those of a relational database. Besides this, we need to take the cost into account and try to optimize the consumption as much as possible.
The proposed solution is a hybrid that combines two different types of NoSQL storage. Object attributes are stored inside a document DB storage, except for the state attribute. The state attribute is stored inside a key-value storage, which is optimized for a high number of writes and reads.
The latency may increase a little because loading an object completely requires querying two storages, but at the same time, thanks to the key-value storage, you can easily retrieve the object state based on the object ID.
The cost of storing items that are updated very often inside a key-value database is much lower in comparison with a document DB storage.
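As a minimal sketch of this split (the names and shapes below are illustrative, not a concrete API), each object is persisted as two records: a write-once document with the static attributes and a small key-value item that holds only the frequently changing state. A full read merges the two, while state reads and updates touch only the key-value side:

```python
# Illustrative sketch of the hybrid data model. Dictionaries stand in for the
# two stores; in a real deployment these would be the document DB and the
# key-value storage.
document_store = {}  # object_id -> write-once attributes (flexible schema)
state_store = {}     # object_id -> frequently updated state

def insert_object(object_id, attributes, initial_state):
    # All attributes except the state are written once, at insert time.
    document_store[object_id] = attributes
    state_store[object_id] = initial_state

def update_state(object_id, new_state):
    # Hot path: touches only the cheap key-value store.
    state_store[object_id] = new_state

def get_state(object_id):
    # Hot path: a single key lookup.
    return state_store[object_id]

def get_full_object(object_id):
    # A full read pays the price of two lookups, one per store.
    return {**document_store[object_id], "state": state_store[object_id]}

insert_object("obj-42", {"name": "sensor-A", "location": "DC-1"}, "active")
update_state("obj-42", "inactive")
print(get_full_object("obj-42"))
```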
In the next part of the post we take a look at how the solution would look inside Azure and AWS.

Azure Approach 
The data layer would use two different types of storage from Azure Cosmos DB. The first one is DocumentDB, used to store the object information inside Azure Cosmos DB. All object attributes are stored inside it except the object state attribute.
The object state attribute is stored inside Tables. This key-value store is optimized for a high number of writes. To reduce the running cost even more, we could replace the Tables that are part of Azure Cosmos DB with Azure Tables. Azure Tables are a good option for us as long as we limit our queries to lookups by object ID (key) and we don't try to run complex queries.
Inside Azure Cosmos DB we have control at the partitioning level, but for Azure Tables we might hit some limitations. Because of this, if we go with an approach where we use Azure Tables, the partition key shall be a hash of the object ID and the row key the object ID itself. Also, if the number of transactions is higher than 20K/second per storage account, multiple Storage accounts might be required. If you don't want to manage these possible issues and want to reduce risk, then you should go with Tables from Azure Cosmos DB.
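A minimal sketch of the Azure Tables variant in Python, using the azure-data-tables SDK (the connection string, table name, and number of hash buckets are assumptions; retries and error handling are omitted):

```python
# Sketch: object state kept in Azure Table storage, with PartitionKey = hash
# of the object ID and RowKey = the object ID itself. Assumes a table named
# "objectstate" already exists.
import hashlib
from azure.data.tables import TableClient, UpdateMode

CONNECTION_STRING = "<storage-account-connection-string>"  # placeholder
table = TableClient.from_connection_string(CONNECTION_STRING, table_name="objectstate")

def partition_for(object_id: str, buckets: int = 1024) -> str:
    # Hash the object ID into a fixed number of partitions to spread the load.
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()
    return str(int(digest, 16) % buckets)

def upsert_state(object_id: str, state: str) -> None:
    entity = {
        "PartitionKey": partition_for(object_id),
        "RowKey": object_id,
        "State": state,
    }
    table.upsert_entity(entity, mode=UpdateMode.REPLACE)

def get_state(object_id: str) -> str:
    entity = table.get_entity(partition_key=partition_for(object_id), row_key=object_id)
    return entity["State"]
```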
Azure Cosmos DB can scale automatically and has a DR strategy that is very powerful and easy to use. It is one of the best NoSQL solutions on the market, and when it is well configured it is amazing. Automatic DR and data replication across regions are available, but everything comes with a cost, especially on the operations side ($$$).

AWS Approach 
The approach inside AWS is similar, but it is built on top of AWS DocumentDB to store the object attributes and AWS DynamoDB to store the object state. AWS DynamoDB is one of the best key-value data stores available on the market. When strong data consistency and DR are not top priorities, AWS DynamoDB is your best choice. Besides its scaling and speed capabilities, AWS DynamoDB enables us to push a stream of data changes towards AWS Redshift, so any update of the data can reach the data warehouse almost out of the box, with little custom code.
AWS DocumentDB does its job very well, being able to store 500M objects without any issues.
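A minimal sketch of the AWS side in Python, with boto3 for DynamoDB and pymongo for DocumentDB (the table name, partition key, database and collection names, and the cluster endpoint are assumptions; TLS and credentials for the DocumentDB cluster are omitted):

```python
# Sketch: mutable object state in DynamoDB, write-once attributes in
# DocumentDB (MongoDB-compatible, accessed through pymongo). Names are illustrative.
import boto3
from pymongo import MongoClient

dynamodb = boto3.resource("dynamodb")
state_table = dynamodb.Table("ObjectState")  # assumed table with partition key "ObjectId"

documents = MongoClient("<documentdb-cluster-endpoint>")["objects_db"]["objects"]  # assumed

def insert_object(object_id: str, attributes: dict, initial_state: str) -> None:
    # Write-once attributes go to DocumentDB...
    documents.insert_one({"_id": object_id, **attributes})
    # ...while the frequently updated state lives in DynamoDB.
    state_table.put_item(Item={"ObjectId": object_id, "State": initial_state})

def update_state(object_id: str, new_state: str) -> None:
    state_table.update_item(
        Key={"ObjectId": object_id},
        UpdateExpression="SET #s = :s",
        ExpressionAttributeNames={"#s": "State"},  # alias avoids clashes with reserved words
        ExpressionAttributeValues={":s": new_state},
    )

def get_full_object(object_id: str) -> dict:
    doc = documents.find_one({"_id": object_id}) or {}
    item = state_table.get_item(Key={"ObjectId": object_id}).get("Item", {})
    return {**doc, "state": item.get("State")}
```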

Final thoughts  
Splitting the data storage into two different types of storage can be the right choice when only a small subset of the fields is updated very often and the rest of them are written only once, during the insert operation. Combining the power of document DB storage with key-value storage enables us to design a system that can easily manage a high throughput.
Both cloud providers offer services that match our needs and that are highly scalable and cheap from the operational perspective. Inside Azure, this can be achieved by combining DocumentDB and Tables from Azure Cosmos DB. For the AWS ecosystem we would need to use AWS DocumentDB and AWS DynamoDB.
