
How to migrate an Azure Cosmos DB instance


In this post, we talk about how to migrate an Azure Cosmos DB database from one tenant to another.

Why?
There are cases when you need to migrate your database, especially as your systems evolve. In such situations, simply moving the database from one Azure subscription to another is not enough.

Context
The migration was required by the business context and the way the product evolved. When you need to do such a migration, there are a few questions you should ask yourself:
1. What kind of Azure Cosmos DB API are you using? We used the SQL API.
2. What is the maintenance window? We had 24 hours for the entire system, with a 3-hour window for Azure Cosmos DB.
3. What is the database size? 20 GB.
4. What is the index size? 2.2 GB.
5. What are the average request rate and throughput? Around 200 RU/s, with around 25 requests/s.

Current options
At this moment, Azure Cosmos DB does not have a self-service feature that allows us to create and restore backups of our databases. There is a feature request for it on Azure Feedback, but we still need to wait for it.

[1] Azure Ticket – Backup & Restore
This involves opening a ticket with the Azure support team to create a backup. Once the backup is created, another ticket must be opened to restore it. This approach should work without incident as long as you don't have a fixed time window. There is no SLA on how fast the support team executes the backup and the restore - we have no assurance that the two actions are completed within a 3-hour window.
Keep in mind that backups are taken automatically every 4 hours, so in theory you only need to create the restore ticket - as long as you can keep the database free of changes for the 4 hours before opening it.

[2] Data migration tool
The second option is the Data Migration tool available from the Azure team. It enables us to migrate content from one Azure Cosmos DB container to another, or from other sources such as JSON files, HBase, AWS DynamoDB and so on.
There are two interfaces. One is a CLI, which is great for automation; the other is a friendly UI, similar to the SQL Import/Export wizard. The tool is well documented, but there were question marks about how fast the import can run and what happens when an error occurs.
The tool lets you specify the number of retries in case of failure. Combined with the option to update (or not) existing documents, this makes it possible to resume an import job. When you use this tool, you need to overprovision the Azure Cosmos DB account to ensure there are enough resources to handle the import (see the command-line sketch below).
During an import test, the full migration took around 2.5 hours. Even so, some members of the team wanted to see what other options were available.
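As a rough illustration, a container-to-container copy with the CLI (dt.exe) looks along these lines. This is a sketch, not a verified command: the account names, keys, database, container and throughput values are placeholders, and the exact flags should be checked against the tool's documentation.

dt.exe /s:DocumentDB
       /s.ConnectionString:"AccountEndpoint=https://<source-account>.documents.azure.com:443/;AccountKey=<source-key>;Database=<source-db>"
       /s.Collection:<source-container>
       /t:DocumentDBBulk
       /t.ConnectionString:"AccountEndpoint=https://<target-account>.documents.azure.com:443/;AccountKey=<target-key>;Database=<target-db>"
       /t.Collection:<target-container>
       /t.CollectionThroughput:10000

The /t.CollectionThroughput value applies when the tool creates the target container; provisioning the RU/s high for the import and scaling them back down afterwards is where the overprovisioning happens.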

[3] Restore + Change Feed
This hybrid approach combines a backup of Azure Cosmos DB (requested by opening a ticket with the support team) with the change feed. It works well when the update window is small and the number of changes in the database is not very high (under 20% of the documents).
The steps are the following:

(1) Enable a change feed listener on your Azure Cosmos DB database that captures all changes happening to your documents and streams them to Event Hub or Blob Storage. For this situation, Azure Event Hub works much better (see the first function sketch after this list).
(2) Three or four days before the migration, create a support ticket and request that a backup of Azure Cosmos DB be restored on the specific tenant.
(3) Day 0 - migration: the current Azure Cosmos DB database goes into read-only mode.
(4) Define and run an Azure Function that consumes the change feed stored in Event Hub and pushes the changes to the new Azure Cosmos DB instance (see the second sketch after this list). Be careful with parallel processing of events and data consistency - keep in mind that the function can run in parallel for multiple partitions of a container. You can consume the feed starting from the time when the backup was taken.
(5) Once the Azure Function finishes consuming all the events, the database is fully restored.
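To make steps (1) and (4) more concrete, here are two minimal Azure Functions sketches in C#. They are illustrative only: the database, container, hub and app-setting names (SourceDb, migration-feed, EventHubConnection and so on) are assumptions, not values from the actual migration.

The first function listens to the change feed of the source database and forwards every changed document to Event Hub as JSON:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;

public static class ChangeFeedToEventHub
{
    [FunctionName("ChangeFeedToEventHub")]
    public static async Task Run(
        // Change feed trigger on the source container; the lease container
        // tracks how far each partition has been read.
        [CosmosDBTrigger(
            databaseName: "SourceDb",
            collectionName: "SourceContainer",
            ConnectionStringSetting = "SourceCosmosConnection",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> changes,
        [EventHub("migration-feed", Connection = "EventHubConnection")] IAsyncCollector<string> output)
    {
        // Forward each changed document to Event Hub as raw JSON.
        foreach (var document in changes)
        {
            await output.AddAsync(document.ToString());
        }
    }
}

The second function consumes the stream from Event Hub and writes each document to the new Azure Cosmos DB instance. The output binding performs upserts, so replaying the same event twice overwrites the document instead of duplicating it, which is what makes retries safe:

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Newtonsoft.Json.Linq;

public static class EventHubToCosmos
{
    [FunctionName("EventHubToCosmos")]
    public static async Task Run(
        [EventHubTrigger("migration-feed", Connection = "EventHubConnection",
            ConsumerGroup = "cosmos-restore")] string[] events,
        // Output binding against the target container; documents are upserted.
        [CosmosDB(
            databaseName: "TargetDb",
            collectionName: "TargetContainer",
            ConnectionStringSetting = "TargetCosmosConnection")] IAsyncCollector<object> output)
    {
        foreach (var json in events)
        {
            // Each document keeps its original id, so the upsert overwrites
            // any earlier version of that document.
            await output.AddAsync(JObject.Parse(json));
        }
    }
}

One caveat: with a fresh consumer group, the Event Hub trigger typically starts from the beginning of the stream, so it replays everything captured since the change feed was enabled in step (1). If you only want the changes made after the backup was taken, filter on the event's enqueued time inside the function.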

Conclusion
Taking into account the context (20 GB of data and a 3-hour time window), the best options are no. 2 and no. 3 - the Data Migration tool or Restore + Change Feed. From the cost point of view, I would recommend going with option no. 2; it is also the safest one. But I would run the migration tool from an Azure VM in the same Azure region as the destination or source database instance.
Option no. 3 can introduce bugs into the system because the function needs to be implemented. At the same time, for bigger databases option no. 3 is the best one, as long as changes to the database don't occur too often.
