
Azure Storage: How to reduce distribution time from 16 days to 8h with only 11 cents

The cloud is sold as an environment without limits, where you can have as many resources as you want. The truth is that each service offered by a cloud provider has its own limitations – like any normal service. For each service there are different ways to mitigate these boundaries.
But what happens when we reach the limits of a service? For some of these services there are best practices that describe how to scale beyond a service limitation. For others, we need to put our minds to work and find our own ways to scale above the cloud limits.
In today's post we will talk about Azure Storage and what we can do if we want to scale over the Azure Storage limits.
Before jumping to the problem, let's look at some of the limitations of Azure Storage.

Context
The biggest limitation is not the storage capacity, which can reach 500 TB of data per storage account – the biggest problem is the bandwidth. Depending on the next two characteristics, the available outbound bandwidth can vary from 10 to 30 gigabits per second:
  • Region (location where the account was created)
  • Type of storage redundancy

An outbound bandwidth of 10 gigabits per second already sounds very good, but we should take into account that this value is per Azure Storage account, not per file. On top of the total bandwidth that can be reached using a storage account, there are also some limitations at partition level.
Before going deeper on this topic, let's see what a partition means in this case. A partition is represented by an object of Azure Storage and determines how load balancing and traffic distribution work.
For each component of Azure Storage, a partition is represented by a different scalability unit:
  1. Azure Blobs – A partition is represented by Container + Blob
  2. Azure Tables – A partition is represented by Table + Partition Key
  3. Azure Queues – A partition is represented by a Queue

What does this mean? It means that there is a maximum outbound traffic limit that can be reached by a partition, and we need to be aware of it.
At this moment, the maximum throughput limits that can be reached on each partition are:
  1. Azure Blobs – 60MB/s or 500 requests/s
  2. Azure Tables – 2.000 entities/s
  3. Azure Queues – 2.000 messages/s

We have already identified some limitations at partition level that can affect our system. Let's see what we can do to work around them.

The Problem
Imagine a system that stores large amounts of data on Azure Storage. This data contains software updates that will reach millions of devices worldwide. Because we need a secure mechanism to access this content (Shared Access Signature – token-based access), we will need to store it in blobs.

Azure CDN is not used because it has no support for Shared Access Signature or any other security mechanism. A possible solution would be to encrypt the content and distribute it over the Azure CDN network, but the cost of decryption at device level is too high (from a CPU perspective).
Because of this, at this moment we are limited to 60 MB/s or up to 500 requests/s per blob. If, on average, the download bandwidth of each device is 512 KB/s, we can have a maximum of 120 devices downloading the content at full speed, or 500 devices downloading the content at a rate of about 122 KB/s.
  • 120 devices at 512 KB/s
  • 500 devices at 122 KB/s

Going further, a package of 100 MB will reach a maximum of 500 devices every 14 minutes. If we also put on the table the network connectivity issues that can appear at device level, we end up with a maximum of about 300 devices that will successfully download the package every 14 minutes.
But in the world of IoT, where we can have 50.000 devices, this would mean that we are able to distribute this package to all our devices in about 39 hours.
Oops… as a client I expect more than that; I need to be able to push a package to these devices in a shorter period of time.

Before going further, let's make a summary.
The download limit for a blob is 60 MB/s or up to 500 requests per second. A package of 100 MB will be delivered to 50.000 devices in about 39 hours.
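These numbers are easy to verify. Below is a small back-of-the-envelope sketch in C#; the bandwidth values and the 300 successful downloads per 14-minute wave are the assumptions from above, not measured values.

// Back-of-the-envelope check of the numbers above, using the assumptions
// from this post (60 MB/s or 500 requests/s per blob, 100 MB package,
// ~300 successful downloads per 14-minute wave).
double blobBandwidthMBps = 60;
double requestLimit = 500;
double deviceBandwidthKBps = 512;

double devicesAtFullSpeed = blobBandwidthMBps * 1024 / deviceBandwidthKBps;   // ~120 devices
double speedAtRequestLimitKBps = blobBandwidthMBps * 1024 / requestLimit;     // ~122 KB/s

double packageMB = 100;
double waveMinutes = packageMB * 1024 / speedAtRequestLimitKBps / 60;         // ~14 minutes
double devicesPerWave = 300;                                                  // after connectivity issues

double totalDevices = 50000;
double totalHours = Math.Ceiling(totalDevices / devicesPerWave) * waveMinutes / 60;  // ~39 hours

Console.WriteLine($"{waveMinutes:F0} minutes per wave, {totalHours:F0} hours in total");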

Solution
To be able to reduce the package distribution time we need to be clever. Having only one location where the content is stored is simple to maintain, but it does not allow us to scale over the account limitations.
Nothing stops us from copying a package to multiple regions and datacenters. In this way we can have a package distributed to 3 different datacenters using 3 different storage accounts. From a performance perspective, this means that we can distribute a package to 900 devices every 14 minutes, which means that we would be able to distribute our package to all 50.000 devices in only about 13 hours.
  • 1 replica…300 devices reached every 14 minutes
  • 3 replicas…900 devices reached every 14 minutes


But what if we need to distribute the same package to 500.000 devices? We can apply the same technique, only in a more clever way. Let's take the following deployment configuration:

  • Replicate content in 5 datacenters
  • In each datacenter replicate the content on 3 different Azure Storage accounts

Remarks: We could replicate the content in the same Azure Storage account in 3 different blobs, but personally I prefer to use different Azure Storage accounts (in this way the risk of reaching the storage account limits is lower).
Having the content replicated in 5 datacenters on 3 Azure Storage accounts would mean that we have the content replicated in 15 different locations.
The above configuration would allow us to reach:
  • 4.500 devices every 14 minutes    
  • 50.000 devices in 2.5 hours
  • 100.000 devices in 5.2 hours
  • 500.000 devices in 26 hours

At this moment there are 16 different Azure datacenters publicly available worldwide. Imagine deploying a package to all of them, with 3 different storage accounts per datacenter. This would allow us to reach 14.400 devices every 14 minutes and 500.000 devices in about 8.2 hours.
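The same calculation, generalized to any number of replicas, reproduces the figures above. This is only a sketch; the 300 devices per replica per 14-minute wave is the same assumption as before.

// Distribution time as a function of the number of replicas.
static double HoursToDistribute(int totalDevices, int replicas,
                                int devicesPerWavePerReplica = 300,
                                double waveMinutes = 14)
{
    double waves = Math.Ceiling((double)totalDevices / (replicas * devicesPerWavePerReplica));
    return waves * waveMinutes / 60;
}

Console.WriteLine(HoursToDistribute(50000, 3));     // ~13 hours
Console.WriteLine(HoursToDistribute(500000, 15));   // ~26 hours
Console.WriteLine(HoursToDistribute(500000, 48));   // ~8.2 hours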
From a technical perspective, it is pretty clear that we would add some complexity to our system – no pain, no gain (smile).
There will be two components that will become more complex.
The first one is the component that manages the package upload. This component needs to be able to replicate the package payload to all the datacenters and storage accounts where we want to store the package.
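A minimal sketch of this component could look like the code below, assuming the classic Microsoft.WindowsAzure.Storage client library and one connection string per storage account. The container name "packages" and the sequential upload loop are only for illustration; in a real system the uploads would run in parallel or use server-side copy.

using System.Collections.Generic;
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class PackageReplicator
{
    // Pushes the same package to every storage account in the list
    // (15 or 48 connection strings in the scenarios described above).
    public static void Replicate(string packagePath, string blobName,
                                 IEnumerable<string> connectionStrings)
    {
        foreach (string connectionString in connectionStrings)
        {
            CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
            CloudBlobClient client = account.CreateCloudBlobClient();
            CloudBlobContainer container = client.GetContainerReference("packages");
            container.CreateIfNotExists();

            CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
            using (FileStream stream = File.OpenRead(packagePath))
            {
                blob.UploadFromStream(stream);
            }
        }
    }
}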
The second component that will become more complex is the one that resolves the blob download URL for each device. This component needs to be able to distribute the load uniformly across all available nodes. If you can partition your devices based on region, location or anything else, then you will be able to do this pretty easily.
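A sketch of such a resolver, again assuming the classic storage client library: it hashes the device id to pick a replica (a simple way to get a roughly uniform distribution) and returns a time-limited Shared Access Signature URL for that replica. The one-hour expiry and the container name are only examples.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class DownloadUrlResolver
{
    // Picks one replica for the given device and returns a read-only SAS URL.
    public static string Resolve(string deviceId, string blobName,
                                 string[] replicaConnectionStrings)
    {
        int index = (deviceId.GetHashCode() & 0x7fffffff) % replicaConnectionStrings.Length;

        CloudStorageAccount account = CloudStorageAccount.Parse(replicaConnectionStrings[index]);
        CloudBlockBlob blob = account.CreateCloudBlobClient()
                                     .GetContainerReference("packages")
                                     .GetBlockBlobReference(blobName);

        string sasToken = blob.GetSharedAccessSignature(new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(1)
        });

        return blob.Uri + sasToken;
    }
}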
In the near future I will write some posts that talk more about these two components.

Cost
Let's see how the cost is affected by a solution that replicates the content in multiple datacenters, and how much it would cost us to have 100 MB of data downloaded from Azure Blobs by 500.000 devices.
Remarks: Because the download cost is different for each Zone (datacenter location), we will assume that all datacenters are in Zone 1, with a traffic of 60 TB per month – the outbound traffic cost per GB would be €0.0522. We also assume a storage account of 30 TB with only Locally Redundant Storage – the storage cost per month for 1 GB of data would be €0.022.
Download Cost:
     500.000 devices X 100 MB / 1024 = 48828.125 GB = 47.68 TB
     48829 GB X €0.0522 = €2548.8738

Storage Cost:
     For 100 MB:
     With only one copy: 100 MB = 0.098 GB X €0.022 = €0.002156
     With 3 replications: €0.006468
     With 15 replications: €0.03234
     With 48 replications: €0.103488 (16 datacenters X 3 Storage Accounts per datacenter)
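These figures can be reproduced with a few lines of code, using the prices assumed in the remarks above.

// Download and storage cost for a 100 MB package, using the prices assumed above.
double packageGB = 100.0 / 1024;                       // ~0.098 GB
double downloadCost = 500000 * packageGB * 0.0522;     // ~48828 GB of outbound traffic -> ~€2549
double storagePerReplica = packageGB * 0.022;          // ~€0.002 per month
double storageFor48Replicas = storagePerReplica * 48;  // ~€0.10 per month

Console.WriteLine($"Download: €{downloadCost:F2}, storage for 48 replicas: €{storageFor48Replicas:F2}");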

As you can see, the cost of storage is only a very small part of the total price. The download cost is the one that dominates the total cost.
If we think that we only add an extra cost of less than €0.11 to reduce the distribution time for 500.000 devices from 16 days to 8.2 hours, it is pretty amazing what we can do with a clever solution.
We can keep the package content replicated in 48 storage accounts only at the moment when we release the package. Once the package is released and deployed, we can keep the content replicated in only 3 nodes, for example.

Conclusion
In conclusion I would like to say that even if Azure Storage accounts have some limitations, now that the number of Azure datacenters worldwide has reached 16, we can do pretty amazing things if we scale horizontally.

With only 11 cents per month, we can reduce the distribution time of a package from 16 days to 8.2 hours. This is the power of a system that is distributed worldwide.
