How to migrate an Azure Cosmos DB instance


In this post, we talk about how we can migrate an Azure Cosmos DB database from one tenant to another.

Why?
There are cases when you need to migrate your database, especially as your systems evolve. In our situation, moving the database from one Azure subscription to another was not enough; the data had to land in a different tenant.

Context
The migration was required because of the business context and the way the product evolved. When you need to do such a migration, there are a few questions that you need to ask yourself.
1. What kind of Azure Cosmos DB API are you using? We used the SQL API.
2. What is the maintenance window? We had 24 hours for the entire system, with a 3-hour window for Azure Cosmos DB.
3. What is the database size? 20GB
4. What is the index size? 2.2GB
5. What are the average request rate and throughput? Around 25 requests/s, consuming around 200 RU/s.
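The answers above allow a quick feasibility check before picking an option. The sketch below estimates how many documents per second and how many RU/s a copy would need to fit in the window. The average document size (2 KB) and the request charge per write (10 RU) are placeholder assumptions, not measurements from this migration; the real request charge should be read from your own workload.

```python
import math

GIB = 1024 ** 3  # bytes in one GiB


def required_write_rate(total_bytes: int, avg_doc_bytes: int, window_seconds: int) -> float:
    """Documents per second needed to copy all the data inside the window."""
    doc_count = math.ceil(total_bytes / avg_doc_bytes)
    return doc_count / window_seconds


def required_throughput(total_bytes: int, avg_doc_bytes: int,
                        window_seconds: int, ru_per_write: float) -> int:
    """RU/s to provision on the destination, rounded up to a whole RU."""
    rate = required_write_rate(total_bytes, avg_doc_bytes, window_seconds)
    return math.ceil(rate * ru_per_write)


# 20GB database, 3-hour window (from the context above).
# avg_doc_bytes and ru_per_write are ASSUMED values for illustration only.
estimate = required_throughput(
    total_bytes=20 * GIB,
    avg_doc_bytes=2048,
    window_seconds=3 * 3600,
    ru_per_write=10.0,
)
print(f"Estimated destination throughput: {estimate} RU/s")
```

Under those assumed values the estimate is far above the 200 RU/s the database uses in normal operation, which is why the throughput has to be raised temporarily for the duration of an import.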

Current options
At this moment in time, Azure Cosmos DB does not have a self-service feature that allows us to create and restore backups of our databases. There is a feature request on Azure Feedback for it, but we still need to wait for it.

[1] Azure Ticket – Backup & Restore
This involves opening a ticket with the Azure Support team to create a backup. Once the backup is created, another ticket has to be opened to restore it. This approach should work without incidents as long as you don't have a fixed time window. There is no SLA for how fast the support team performs the backup and the restore, so we have no assurance that both actions complete within the 3-hour window.
Keep in mind that backups are taken automatically every 4 hours, so in theory you only need to create the restore ticket, as long as you can keep the database free of changes for 4 hours before opening it.
If you want to find more about it you can check the following two resources:

[2] Data migration tool
The second option is based on the Data Migration tool provided by the Azure team. It enables us to migrate content from one Azure Cosmos DB container to another, or from other sources such as JSON files, HBase, Amazon DynamoDB and so on.
There are two interfaces that can be used. One is a CLI, great when we want automation; the other is a friendly UI, similar to the one from SQL Import/Export. The tool is well documented, but there were question marks about how fast the import can be done and what happens when an error occurs.
The tool lets you specify the number of retries in case of a failure. Combined with the option to update (or not) existing documents, this makes it possible to resume an interrupted import job. When you use this tool, it is necessary to overprovision Azure Cosmos DB throughput so that there are enough resources to handle the import.
During an import test, the full migration took around 2.5 hours. Even so, some members of the team had concerns and wanted to see what other options were available.
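The reason retries plus the update-existing-documents option make an import resumable is that an upsert is idempotent: writing the same document twice leaves the same result. The sketch below illustrates that idea only; it is not the Data Migration tool's implementation. `TransientError`, `import_with_retries` and the `upsert` callback are hypothetical names introduced for this example, and the destination is whatever the callback writes to.

```python
import time
from typing import Callable, Dict, Iterable, List


class TransientError(Exception):
    """Hypothetical error type standing in for throttling or a network failure."""


def import_with_retries(docs: Iterable[Dict],
                        upsert: Callable[[Dict], None],
                        max_retries: int = 3,
                        backoff_seconds: float = 0.0) -> List[str]:
    """Upsert every document, retrying transient failures.

    Returns the ids that still failed after all retries. Because upserts
    are idempotent, the whole job can simply be run again to resume.
    """
    failed: List[str] = []
    for doc in docs:
        for attempt in range(max_retries + 1):
            try:
                upsert(doc)
                break  # document written, move on to the next one
            except TransientError:
                if attempt == max_retries:
                    failed.append(doc["id"])
                else:
                    # Linear backoff before the next attempt.
                    time.sleep(backoff_seconds * (attempt + 1))
    return failed
```

Running the job a second time over the same input rewrites documents that already exist instead of duplicating them, so a failed run can be restarted from the beginning without cleanup.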

[3] Restore + Change Feed
This hybrid approach combines a backup of Azure Cosmos DB, restored by opening a ticket with the support team, with the change feed. It works well when the migration window is small and the number of changes to the database is not very high (under 20% of the documents).
The steps are the following:

(1)    Enable the change feed on your Azure Cosmos DB database, so that it listens to all changes made to your documents and streams them to Event Hub or Blob Storage. For this scenario, Azure Event Hub works much better.
(2)    3 or 4 days before the migration, open a ticket with the support team and request that a backup of Azure Cosmos DB be restored to the target tenant.
(3)    Day 0 - Migration: the current Azure Cosmos DB database goes into read-only mode.
(4)    Define and run an Azure Function that consumes the changes stored in Event Hub and pushes them to the new instance of Azure Cosmos DB. Be careful with parallel processing of events and data consistency: the function can run in parallel for multiple partitions of a container. You can consume the feed starting from the time when the backup was taken.
(5)    Once the Azure Function has consumed all the events, the database is fully restored.
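The consistency concern in step (4) can be sketched as follows: partitions can be processed independently, but the events of a single partition should be applied in order so that the latest version of a document wins. This is a minimal in-memory illustration and not Azure Function code. The event shape (`partition_key`, `ts`, `doc`) and the `replay_changes` helper are assumptions made for the example, and the target is a plain dictionary standing in for the restored database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def replay_changes(events: Iterable[Dict],
                   target: Dict[Tuple[str, str], Dict],
                   backup_ts: int) -> int:
    """Apply captured changes on top of a restored backup.

    Events older than the backup are skipped. The rest are grouped by
    partition key and applied in timestamp order as upserts, so replaying
    an event that the backup already contains is harmless.
    Returns the number of events applied.
    """
    by_partition: Dict[str, List[Dict]] = defaultdict(list)
    for event in events:
        if event["ts"] >= backup_ts:
            by_partition[event["partition_key"]].append(event)

    applied = 0
    # Each partition is independent and could be handled by its own worker.
    for partition_key, partition_events in by_partition.items():
        # sorted() is stable: events with equal timestamps keep arrival order.
        for event in sorted(partition_events, key=lambda e: e["ts"]):
            target[(partition_key, event["doc"]["id"])] = event["doc"]
            applied += 1
    return applied
```

Starting the replay at the backup timestamp (inclusive) trades a few redundant writes for the guarantee that no change made between the backup and the read-only switch is missed.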

Conclusion
Taking into account the context (20GB of data and a 3-hour time window), the best options are no. 2 and no. 3: Data migration tool or Restore + Change Feed. From the cost point of view, I would recommend going with option no. 2 because it is the safest one. However, I would run the migration tool from an Azure VM that is in the same Azure region as the destination or source database instance.
Option no. 3 can introduce bugs into the system because the function needs to be implemented. At the same time, for bigger databases option no. 3 is the best one, as long as changes to the database don't occur too often.
