
How to migrate an Azure Cosmos DB instance

In this post, we talk about how we can migrate an Azure Cosmos DB database from one tenant to another.

There are cases when you need to migrate your database, especially as your systems evolve. In our situation, moving the database from one Azure subscription to another was not enough; it had to move to a different tenant.

The migration was required because of the business context and the way the product evolved. When you need to do such a migration, there are a few questions that you need to ask yourself.
1. Which Azure Cosmos DB API are you using? We used the SQL API.
2. What is the maintenance window? We had 24 hours for the entire system, with a 3-hour window for Azure Cosmos DB.
3. What is the database size? 20 GB.
4. What is the index size? 2.2 GB.
5. What are the average throughput and request rate? Throughput is around 200 RU/s, with around 25 requests/s.
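
As a quick sanity check on these numbers, the minimum sustained copy rate needed to fit the data into the window can be worked out directly. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it ignores the index, retries and throttling.

```python
def required_rate_mb_per_s(size_gb: float, window_hours: float) -> float:
    """Minimum sustained copy rate (MB/s) to move size_gb within window_hours."""
    size_mb = size_gb * 1024          # 1 GB taken as 1024 MB
    window_s = window_hours * 3600    # hours to seconds
    return size_mb / window_s


# Figures from the list above: 20 GB of data, 3-hour window for Cosmos DB.
rate = required_rate_mb_per_s(20, 3)
print(f"{rate:.2f} MB/s")
```

For our case this comes out at just under 2 MB/s, so raw bandwidth is not the issue. The real limits are the provisioned throughput on the destination and how the migration reacts to errors.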

Current options
At this moment in time, Azure Cosmos DB does not have a self-service feature that allows us to create and restore backups of our databases. There is a feature request on Azure Feedback for it, but we still need to wait for it.

[1] Azure Ticket – Backup & Restore
This involves opening a ticket with the Azure Support Team to create a backup. Once the backup has been created, another ticket must be opened to restore it. This approach should work without any incidents as long as you don't have a fixed time window. There is no SLA for how fast the support team executes the backup and the restore, so we have no guarantee that the two actions are done within the 3-hour window.
Keep in mind that backups are taken automatically every 4 hours, so in theory you only need to create the restore ticket, as long as you can keep the database free of changes for 4 hours before opening the ticket.
If you want to find more about it you can check the following two resources:

[2] Data migration tool
The second option is the Data Migration tool provided by the Azure team. It enables us to migrate content from one Azure Cosmos DB container to another, or from other sources such as JSON, HBase, AWS DynamoDB and so on.
There are two interfaces that can be used. One is a CLI, great when we want to automate the migration, and the other is a friendly UI, similar to the one from SQL Import/Export. The tool is well documented, but there were question marks about how fast the import can be done and what happens when an error occurs.
The tool lets you specify the number of retries in case of a failure. This, combined with the option to update (or not) existing documents, makes it possible to resume a failed import job. When you use this tool, it is necessary to overprovision the destination Azure Cosmos DB to ensure that there are enough resources to handle the import.
During an import test, the full migration took around 2.5 hours. Even so, some people on the team had concerns and wanted to see what other options were available.
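
To get a feeling for how much to overprovision, a rough estimate like the one below can help. It is a sketch under loud assumptions: the cost of 5 RU per KB written is a rule of thumb and not a number measured in this migration (the real cost depends on the indexing policy and the shape of the documents), the 20% headroom is arbitrary, and the function name is hypothetical.

```python
import math


def estimate_import_rus(size_gb: float, ru_per_kb_write: float,
                        duration_hours: float, headroom: float = 1.2) -> int:
    """Rough RU/s to provision so size_gb can be written in duration_hours.

    Assumes the write cost grows linearly with document size, which is
    only an approximation.
    """
    total_kb = size_gb * 1024 * 1024
    total_ru = total_kb * ru_per_kb_write           # RUs for the whole import
    ru_per_s = total_ru / (duration_hours * 3600)   # spread over the duration
    # Add headroom and round up to the next multiple of 100 RU/s.
    return math.ceil(ru_per_s * headroom / 100) * 100


# 20 GB, assumed 5 RU per KB written, 2.5 hours as in our import test.
print(estimate_import_rus(20, 5, 2.5))
```

Even as a rough number, the result is far above the roughly 200 RU/s the database consumes in normal operation. That is why the destination has to be scaled up for the import and scaled back down afterwards.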

[3] Restore + Change Feed
This hybrid approach combines a backup of Azure Cosmos DB, restored by opening a ticket with the support team, with the change feed. It works well when the update window is small and the number of changes on the database is not very high (under 20% of the documents).
The steps that you should do are the following:

(1) Enable the Change Feed on your Azure Cosmos DB database, so that it listens to all changes made to your documents and streams them to Event Hub or Blob Storage. For this scenario, Azure Event Hub works much better.
(2) 3 or 4 days before the migration, create a ticket with the support team and request that a backup of Azure Cosmos DB be restored on the target tenant.
(3) Day 0 - Migration: the current Azure Cosmos DB database goes into read-only mode.
(4) Define and run an Azure Function that consumes the change feed stored in Event Hub and pushes the changes to the new instance of Azure Cosmos DB. Be careful with parallel processing of events and data consistency: the function can run in parallel for multiple partitions of a container. You can consume the feed starting from the time when the backup was taken.
(5) Once the Azure Function has consumed all the events, the database is fully restored.
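
The replay logic from step (4) could look roughly like the sketch below. It is not the function we ran in production, only an illustration of the two things to be careful about: skip the events that are already contained in the backup, and apply the events of one partition in order. `_ts` is the last-modified timestamp that Cosmos DB sets on every document; the `partitionKey` field name and the `upsert` callback are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List


def replay_changes(events: Iterable[dict],
                   upsert: Callable[[dict], None],
                   backup_ts: int) -> int:
    """Apply change-feed events recorded since the backup to the target.

    Events are grouped by partition key and applied in timestamp order
    inside each partition. Returns the number of events applied.
    """
    by_partition: Dict[str, List[dict]] = {}
    for event in events:
        if event["_ts"] < backup_ts:
            continue  # already part of the restored backup
        by_partition.setdefault(event["partitionKey"], []).append(event)

    applied = 0
    for partition_events in by_partition.values():
        # sorted() is stable, so events with equal _ts keep arrival order.
        for event in sorted(partition_events, key=lambda e: e["_ts"]):
            upsert(event)  # upsert keeps the replay safe to run again
            applied += 1
    return applied


# Usage with an in-memory dict standing in for the target container.
target: Dict[str, dict] = {}
changes = [
    {"id": "a", "partitionKey": "p1", "_ts": 300, "value": 3},
    {"id": "a", "partitionKey": "p1", "_ts": 200, "value": 2},
    {"id": "b", "partitionKey": "p2", "_ts": 50, "value": 9},
]
count = replay_changes(changes, lambda d: target.__setitem__(d["id"], d), 100)
print(count, target["a"]["value"])
```

In a real function the `upsert` callback would write to the new instance, for example through the `upsert_item` method of the container client from the azure-cosmos Python SDK. Because every write is an upsert, the function can be re-run after a failure without creating duplicates.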

Taking into account the context (20 GB of data and a 3-hour time window), the best options are no. 2 and no. 3: the Data Migration tool or Restore + Change Feed. From a cost point of view, I would recommend option no. 2, because it is the safest one. I would, however, run the migration tool from an Azure VM that is in the same Azure region as the destination or source database instance.
Option no. 3 can introduce bugs into the system, because the function needs to be implemented. At the same time, for bigger databases option no. 3 is the best one, as long as changes to the database don't occur too often.

