
Scaling dimensions inside Azure CosmosDB



Azure CosmosDB is the non-relational database available inside Azure, offering multi-model support and global distribution. Documents are stored in collections, which can be queried to fetch content.
In general, people draw the following comparison between DocumentDB and SQL:

  • A Document is similar to a Row
  • A Collection is similar to a Table
  • A Database is similar to a SQL DB
Even so, when you talk about scaling and throughput, things are a little more complicated inside Azure CosmosDB - there are two different dimensions that can be used to provision throughput.

Container Level
The first one is at the container level. What is a container? Well, for DocumentDB it is represented by the collection itself. You have the ability to specify the number of resources (RU - Request Units) that you want to reserve for a specific container.
When you specify the level of throughput, you are also required to specify a partition key, which is used to generate logical partitions. These are generated behind the scenes; a logical partition is a group of documents that share the same partition key, and logical partitions are used to distribute the load across the collection.
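To make the idea concrete, here is a small sketch of how documents group into logical partitions. The documents and the `city` partition key are invented for illustration:

```python
from collections import defaultdict

# Hypothetical documents; "city" is the assumed partition key.
documents = [
    {"id": "1", "city": "Seattle"},
    {"id": "2", "city": "London"},
    {"id": "3", "city": "Seattle"},
]

# Each distinct partition key value becomes one logical partition.
logical_partitions = defaultdict(list)
for doc in documents:
    logical_partitions[doc["city"]].append(doc["id"])

print(dict(logical_partitions))  # {'Seattle': ['1', '3'], 'London': ['2']}
```

Documents "1" and "3" land in the same logical partition because they share the same partition key value.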
From one or multiple logical partitions, Azure CosmosDB generates physical partitions, each mapped to one or more logical partitions. There is no control over the number of physical partitions; they are fully managed by Azure CosmosDB. Each replica has the same number of physical partitions, with the same number of resources reserved.
When we allocate resources at the container level, we are reserving resources for a collection. The resources at the collection level are shared between all the physical partitions. Because of this, if one partition is under heavy load, the other partitions will suffer from a lack of resources. We don't have the ability to reserve resources and lock them at the partition level.
You should be aware that when you specify a partition key at the collection level for throughput configuration, it will be used by the container to partition the data; it does not reserve dedicated resources per partition. The resources are per collection.
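A naive model of the shared container-level budget described above (the numbers are hypothetical, and this is not how the service internally meters requests):

```python
# Container provisioned with 10,000 RU/s, shared by all its partitions.
container_ru = 10_000

# A hypothetical spike: one hot logical partition consumes most of the budget.
consumed_by_hot_partition = 9_000

# Whatever the hot partition eats is no longer available to its siblings.
left_for_everything_else = container_ru - consumed_by_hot_partition
print(left_for_everything_else)  # 1000 RU/s left for every other partition
```

This is why choosing a partition key that spreads the load evenly matters: there is no per-partition reservation to protect quiet partitions from a hot one.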

Database Level
The second is at the database level. Resources are shared across all the collections under the database. You might have lower costs, but no predictable performance at the collection level. The performance can vary depending on the load at the database level, being affected by:
  • No. of containers
  • No. of partition keys per collection
  • Load distribution across logical partitions
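As a rough illustration of why performance at the collection level is unpredictable, here is a naive proportional split of a shared database-level budget (the real allocator works differently; all numbers are hypothetical):

```python
# Database-level budget shared by three collections.
db_ru = 10_000
demand = {"C1": 2_000, "C2": 9_000, "C3": 1_000}  # hypothetical RU/s demand
total_demand = sum(demand.values())                # 12,000 > the 10,000 provisioned

# Naive proportional split when demand exceeds the shared budget.
effective = {c: d * db_ru / total_demand for c, d in demand.items()}
print(effective)
```

Even though C1 only asks for 2,000 RU/s, its effective share shrinks because its noisy neighbour C2 is consuming most of the pool.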
Mixing Database and Container level scalability
There is the ability to reserve dedicated resources at both the database and the container level. Let's assume that you have a database (D1) with 4 collections (C1, C2, C3, C4). We can reserve 10,000 RU/s at the database level and 2,000 additional RU/s for C2.
Doing such provisioning means that:
  • You pay for 12,000 RU/s
  • 10,000 RU/s are shared between the C1, C3 and C4 collections
  • 2,000 RU/s are fully dedicated to C2, with a clear SLA and response time for that collection
When the load on C2 exceeds the 2,000 RU/s reserved, a throughput exception is generated, even if resources might still be available at the database level.
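When a throughput exception (HTTP 429) occurs, the usual pattern is to back off for the suggested interval and retry. A minimal, SDK-agnostic sketch; the exception type here is a stand-in, since the real exception name and retry-after property depend on the SDK you use:

```python
import time

class TooManyRequests(Exception):
    """Stand-in for the 429 (request rate too large) error a CosmosDB SDK raises."""
    def __init__(self, retry_after_ms: int):
        super().__init__("request rate too large")
        self.retry_after_ms = retry_after_ms

def call_with_retry(operation, max_attempts: int = 3):
    """Run an operation that may be throttled, honouring the suggested back-off."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TooManyRequests as error:
            if attempt == max_attempts - 1:
                raise  # out of attempts - surface the throttling error
            time.sleep(error.retry_after_ms / 1000.0)
```

Most CosmosDB client SDKs can perform this retry automatically, but it is worth understanding what happens when a collection's reserved throughput is exhausted.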

Resource allocation
In the initial phases of a project, you can take the approach of allocating all resources at the database level. This is a big advantage for DEV and TEST environments, where you can limit the CosmosDB costs. Once you identify the collections where the load is high and the query complexity requires more resources, you can allocate dedicated resources to each of those collections.
A common mistake is to start with resources allocated at the container level. This forces you to pay high initial costs and leaves you no ability to share resources between collections that have a low load.
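A back-of-the-envelope comparison of the two starting points for a DEV/TEST environment (all numbers are hypothetical):

```python
# Hypothetical sizing: 4 collections, each peaking at ~500 RU/s,
# but in DEV/TEST the peaks rarely overlap.
collections = 4
peak_ru_per_collection = 500

# Container-level start: every collection gets its own dedicated reservation.
dedicated_total = collections * peak_ru_per_collection  # 2000 RU/s paid for

# Database-level start: a shared pool sized for, say, two simultaneous peaks.
shared_total = 2 * peak_ru_per_collection               # 1000 RU/s paid for

print(dedicated_total, shared_total)  # 2000 1000
```

Under these assumptions, starting at the database level halves the provisioned throughput you pay for, at the cost of less predictable per-collection performance.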

