
Scaling dimensions inside Azure CosmosDB



Azure CosmosDB is the non-relational database service available in Azure, offering multi-model support and global distribution. Documents are stored in collections, which can be queried to fetch content.
People often draw the following comparison between DocumentDB and SQL:

  • A Document is similar to a Row
  • A Collection is similar to a Table
  • A Database is similar to a SQL DB
Even so, when it comes to scaling and throughput, things are a little more complicated in Azure CosmosDB: throughput can be provisioned along 2 different dimensions.

Container Level
The first one is at the container level. What is a container? For DocumentDB, it is the collection itself. You can specify the amount of resources (RU, Request Units) that you want to reserve for a specific container.
When you specify the level of throughput, you are also required to specify a partition key, which is used to generate the logical partitions. A logical partition is created behind the scenes and is the group of documents that share the same partition key value. Logical partitions are used to distribute the load across the collection.
Azure CosmosDB then creates physical partitions, each of which hosts one or more logical partitions. You have no control over the number of physical partitions; they are fully managed by Azure CosmosDB. Each replica has the same number of physical partitions, with the same amount of resources reserved.
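The relation between documents, logical partitions and physical partitions can be sketched in a few lines of Python. This is only an illustration: the `country` partition key, the sample documents and the SHA-256 based placement are assumptions made for the example. Azure CosmosDB uses its own internal hashing and decides the number of physical partitions by itself.

```python
import hashlib
from collections import defaultdict


def logical_partitions(documents, key_field):
    """Group documents by partition key value: one group per logical partition."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[key_field]].append(doc)
    return dict(groups)


def physical_partition_for(key_value, physical_count):
    """Map a partition key value to a physical partition with a stable hash.

    Illustrative only: the real service uses its own hashing scheme.
    """
    digest = hashlib.sha256(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count


# Hypothetical documents partitioned by a hypothetical "country" key.
docs = [
    {"id": "1", "country": "RO"},
    {"id": "2", "country": "DE"},
    {"id": "3", "country": "RO"},
    {"id": "4", "country": "FR"},
]

logical = logical_partitions(docs, "country")
# Every logical partition lands on exactly one physical partition.
placement = {key: physical_partition_for(key, 2) for key in logical}
```

All documents with the same key value end up in the same logical partition, and several logical partitions can share one physical partition.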
When we allocate resources at the container level, we reserve them for the collection as a whole. The reserved throughput is spread across all the physical partitions of the collection, and we cannot reserve resources and lock them for a specific partition. Because of this, a partition under high load can run out of resources and be throttled even when the collection as a whole is below its reserved throughput.
Be aware that the partition key you specify at the collection level when configuring throughput is only used to partition the data inside the container. It does not reserve dedicated resources per partition; the resources are per collection.

Database Level
The second one is at the database level. Resources are shared across all the collections under the database. You might have lower costs, but performance at the collection level is not predictable. It can vary depending on the load at the database level and is affected by:
  • Number of containers
  • Number of partition keys per collection
  • Load distribution across logical partitions
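The effect of a shared budget can be shown with a toy model. This is a minimal sketch, assuming a single one-second window and a hypothetical `SharedThroughput` class; it is not how the service schedules requests internally, it only shows why one busy collection makes the others unpredictable.

```python
from collections import defaultdict


class SharedThroughput:
    """One RU/s budget shared by every container in a database (toy model)."""

    def __init__(self, ru_per_second):
        self.remaining = ru_per_second
        self.used = defaultdict(int)  # RU consumed per container

    def request(self, container, cost):
        """Return True if served, False if rejected within this one-second window."""
        if cost > self.remaining:
            return False
        self.remaining -= cost
        self.used[container] += cost
        return True


db = SharedThroughput(1000)
served_c1 = db.request("C1", 900)  # C1 takes most of the shared budget
served_c2 = db.request("C2", 200)  # C2 is rejected: only 100 RU are left
served_c3 = db.request("C3", 100)  # C3 still fits in what remains
```

In this run C2 is rejected only because C1 was busy in the same window, which is exactly the lack of predictability described above.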

Mixing Database and Container level scalability
Throughput can be reserved at the database level and at the container level at the same time. Let's assume that you have a database (D1) with 4 collections (C1, C2, C3, C4). We can reserve 10,000 RU/s at the database level and 2,000 additional RU/s for C2.
This provisioning means that:
  • You pay for 12,000 RU/s
  • 10,000 RU/s are shared between the C1, C3 and C4 collections
  • 2,000 RU/s are fully dedicated to C2, with a clear SLA and response time for that collection
When the load on C2 exceeds the 2,000 RU/s reserved, a throughput exception (request rate too large, HTTP 429) is generated, even if resources are still available at the database level.
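The D1 example can be checked with a small model. This is a sketch under stated assumptions: the `Database` class and the load figures are hypothetical, and throttling is reduced to comparing the load per second with the reserved budget.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    shared_ru: int                                     # RU/s reserved at database level
    dedicated_ru: dict = field(default_factory=dict)   # RU/s reserved per container

    def billed_ru(self):
        """Total RU/s you pay for: shared plus every dedicated reservation."""
        return self.shared_ru + sum(self.dedicated_ru.values())

    def is_throttled(self, load_by_container):
        """Tell, per container, whether its load exceeds the budget it draws from."""
        shared_load = sum(
            load for name, load in load_by_container.items()
            if name not in self.dedicated_ru
        )
        result = {}
        for name, load in load_by_container.items():
            if name in self.dedicated_ru:
                # Dedicated containers never borrow from the shared budget.
                result[name] = load > self.dedicated_ru[name]
            else:
                result[name] = shared_load > self.shared_ru
        return result


d1 = Database(shared_ru=10_000, dedicated_ru={"C2": 2_000})
# Hypothetical load in RU/s for each collection.
load = {"C1": 1_000, "C2": 2_500, "C3": 500, "C4": 500}
throttled = d1.is_throttled(load)
```

With this load, C2 is throttled at 2,500 RU/s although C1, C3 and C4 together use only 2,000 of the 10,000 shared RU/s.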

Resource allocation
In the initial phases of a project, you can allocate all the resources at the database level. This is a big advantage for DEV and TEST environments, where you can limit the CosmosDB costs. Once you identify the collections where the load is high and the query complexity requires more resources, you can allocate dedicated resources for each of them.
A common mistake is to start with the resources allocated at the container level. This forces high initial costs on you and leaves no ability to share resources between collections that have a low load.
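One possible way to pick the collections that deserve dedicated resources is to compare their observed peak load with the shared budget. The sketch below is only a heuristic: the `containers_needing_dedicated` helper, the 50% threshold and the peak values are assumptions made for the example, not a rule from Azure CosmosDB.

```python
def containers_needing_dedicated(peak_ru, shared_ru, share_limit=0.5):
    """Return the containers whose observed peak exceeds a share of the shared budget.

    share_limit is an arbitrary, hypothetical threshold; tune it to your workload.
    """
    return sorted(
        name for name, peak in peak_ru.items()
        if peak > share_limit * shared_ru
    )


# Hypothetical peak RU/s observed per collection during a test run.
peaks = {"C1": 300, "C2": 6_000, "C3": 900, "C4": 5_200}
candidates = containers_needing_dedicated(peaks, shared_ru=10_000)
```

Here C2 and C4 would be the candidates for dedicated throughput, while C1 and C3 keep sharing the database budget.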

