The cloud is sold as an environment without limits, where you can have as many resources as you want. The truth is that each service offered by a cloud provider has limitations, like any normal service. For each service we can mitigate these boundaries in different ways.
But what happens when we reach the limits of a service? For some of these services there are best practices on how to scale past a service limitation. For other services we need to put our minds to work and find different ways to scale above the cloud limits.
In today's post we will talk about Azure Storage and what we can do if we want to scale over the Azure Storage limits.
Before jumping into the problem, let's look at some of the limitations of Azure Storage.
Context
The biggest limitation is not the storage capacity, which can reach 500 TB of data per storage account – the biggest problem is the bandwidth. Depending on the next two characteristics, the available outbound bandwidth can vary from 10 to 30 gigabits per second:
- Region (location where the account was created)
- Type of storage redundancy
Ten gigabits per second of outbound traffic already sounds very good, but we should take into account that this value is per Azure Storage account and not per file. On top of the total bandwidth that a storage account can reach, there are also limitations at partition level.
Before going deeper into this topic, let's see what a partition means in this case. A partition is an object of Azure Storage that determines how load balancing and traffic distribution work.
For each component of Azure Storage a partition is represented
by a different scalability unit:
- Azure Blobs - A partition is represented by Container + Blob
- Azure Tables – A partition is represented by Table + Partition Key
- Azure Queues – A partition is represented by a Queue
What does this mean? It means that there is a maximum amount of traffic that a single partition can serve, and we need to be aware of it.
At this moment, the maximum throughput limits that can be reached on each partition are:
- Azure Blobs – 60 MB/s or 500 requests/s
- Azure Tables – 2,000 entities/s
- Azure Queues – 2,000 messages/s
We have already identified some limitations at partition level that can affect our system. Let's see what we can do about them.
The Problem
Imagine a system that stores large amounts of data on Azure Storage. This data contains software updates that need to reach millions of devices worldwide. Because we need a secure mechanism to access this content (Shared Access Signature – token-based access), we will store it in blobs.
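To make the access mechanism concrete, here is a minimal sketch of how a short-lived, read-only SAS URL for a package blob could be generated, assuming the azure-storage-blob Python package; the account, container and blob names are hypothetical placeholders.

```python
from datetime import datetime, timedelta

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Hypothetical names, for illustration only.
ACCOUNT_NAME = "updatesstore01"
ACCOUNT_KEY = "<storage-account-key>"
CONTAINER = "packages"
BLOB = "firmware-1.2.0.bin"

# Generate a read-only SAS token that expires after one hour.
sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    blob_name=BLOB,
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=1),
)

# The device downloads the package with this URL, without ever seeing the account key.
download_url = (
    f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}/{BLOB}?{sas_token}"
)
print(download_url)
```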
Azure CDN is not used because it has no support for Shared Access Signatures or any other security mechanism. A possible solution would be to encrypt the content and distribute it over the Azure CDN network, but the cost of decryption at device level is too high (from a CPU perspective).
Because of this, we are limited to 60 MB/s or up to 500 requests/s per blob. If the average download bandwidth of a device is 512 KB/s, then at most 120 devices can download the content at full speed. Alternatively, 500 devices can download the content at a rate of about 122 KB/s each.
- 120 devices at 512 KB/s
- 500 devices at 122 KB/s
Going further, a 100 MB package will reach at most 500 devices in about 14 minutes. If we also put on the table the network connectivity issues that can appear at device level, we end up with roughly 300 devices that successfully download the package every 14 minutes.
But in the world of IoT, where we can easily have 50,000 devices, this means we would be able to distribute the package to all our devices only after about 39 hours.
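As a sanity check, here is the back-of-envelope math as a short sketch; the per-blob limits come from the list above, and the figure of 300 successful downloads per window is the assumption made earlier.

```python
# Back-of-envelope estimate for a single blob serving the whole fleet.
BLOB_BANDWIDTH_KBPS = 60 * 1024      # 60 MB/s per blob, expressed in KB/s
MAX_REQUESTS = 500                   # request limit per blob
DEVICE_BANDWIDTH_KBPS = 512          # average device download speed
PACKAGE_MB = 100
FLEET_SIZE = 50_000
SUCCESSFUL_PER_WINDOW = 300          # downloads that actually finish per window

# How many devices can download at full speed, and how fast 500 devices would go.
devices_at_full_speed = BLOB_BANDWIDTH_KBPS // DEVICE_BANDWIDTH_KBPS      # 120
speed_with_max_requests = BLOB_BANDWIDTH_KBPS / MAX_REQUESTS              # ~122.9 KB/s

# One download window: 100 MB at ~122 KB/s.
window_minutes = (PACKAGE_MB * 1024) / speed_with_max_requests / 60       # ~14 minutes

# The whole fleet, assuming ~300 successful downloads per window.
fleet_hours = (FLEET_SIZE / SUCCESSFUL_PER_WINDOW) * window_minutes / 60  # ~39 hours

print(devices_at_full_speed, round(window_minutes), round(fleet_hours))
```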
Oops… as a client I expect more than that; I need to be able to push a package to these devices in a shorter period of time.
Before going further, let's summarize: the download limit for a blob is 60 MB/s or up to 500 requests per second, which means a 100 MB package will be delivered to 50,000 devices in about 39 hours.
Solution
To reduce the package distribution time we need to be clever. Having only one location where the content is stored may be simple to maintain, but it doesn't allow us to scale over the account limitations.
Nothing stops us from copying a package to multiple regions and datacenters. In this way we can have a package replicated to 3 different datacenters using 3 different storage accounts. From a performance perspective, this means we can distribute the package to 900 devices every 14 minutes, so we would be able to reach all 50,000 devices in only about 13 hours.
- 1 replica … 300 devices reached every 14 minutes
- 3 replicas … 900 devices reached every 14 minutes
But what if we need to distribute the same package to 500,000 devices? We can apply the same technique, only more aggressively. Let's use the following deployment configuration:
- Replicate content in 5 datacenters
- In each datacenter replicate the content on 3 different Azure Storage accounts
Remarks: We could replicate the content inside the same Azure Storage account as 3 different blobs, but personally I prefer to use different Azure Storage accounts (this way the risk of hitting the storage account limits is lower).
Having the content replicated in 5 datacenters, on 3 Azure Storage accounts each, means that the content is replicated in 15 different locations.
The above configuration would allow us to reach:
- 4,500 devices every 14 minutes
- 50,000 devices in 2.5 hours
- 100,000 devices in 5.2 hours
- 500,000 devices in 26 hours
At this moment there are 16 different Azure datacenters publicly available worldwide. Imagine making a package deployment in all of them, with 3 different storage accounts per datacenter. This would allow us to reach 14,400 devices every 14 minutes and 500,000 devices in about 8.2 hours.
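The same arithmetic, parameterized by the number of replicas, reproduces the numbers above; a small sketch, assuming the same 14-minute window and roughly 300 successful downloads per replica per window.

```python
# Distribution time as a function of how many replicas of the package exist.
WINDOW_MINUTES = 14            # one download round for a 100 MB package
DEVICES_PER_REPLICA = 300      # successful downloads per replica per window

def distribution_hours(devices: int, replicas: int) -> float:
    """Estimated time to reach all devices with the given number of replicas."""
    devices_per_window = replicas * DEVICES_PER_REPLICA
    return (devices / devices_per_window) * WINDOW_MINUTES / 60

print(distribution_hours(50_000, 1))     # ~39 hours  - a single blob
print(distribution_hours(50_000, 15))    # ~2.6 hours - 5 datacenters x 3 accounts
print(distribution_hours(500_000, 15))   # ~26 hours
print(distribution_hours(500_000, 48))   # ~8.1 hours - 16 datacenters x 3 accounts
```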
From a technical perspective, it is pretty clear that we add some complexity to our system – no pain, no gain (smile). There are two components that become more complex:
The first one is the component that manages the package upload. This component needs to be able to replicate the package payload to all the datacenters and storage accounts where we want to store the package.
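A minimal sketch of what this replication step could look like, assuming the azure-storage-blob Python package and an asynchronous server-side copy; the target accounts and the source SAS URL are hypothetical placeholders, and the target containers are assumed to exist already.

```python
from azure.storage.blob import BlobClient

# Hypothetical target accounts: one (account URL, account key) pair per replica.
TARGET_ACCOUNTS = [
    ("https://updateseurope01.blob.core.windows.net", "<key-1>"),
    ("https://updatesus01.blob.core.windows.net", "<key-2>"),
    ("https://updatesasia01.blob.core.windows.net", "<key-3>"),
]

def replicate_package(source_sas_url: str, container: str, blob_name: str) -> None:
    """Start a server-side copy of the package into every target storage account."""
    for account_url, key in TARGET_ACCOUNTS:
        target = BlobClient(
            account_url=account_url,
            container_name=container,
            blob_name=blob_name,
            credential=key,
        )
        # Azure pulls the data directly from the source blob, so the payload
        # never has to flow through the upload component itself.
        target.start_copy_from_url(source_sas_url)

# Usage: the source URL must carry a SAS token so the other accounts can read it.
# replicate_package(
#     "https://updatesstore01.blob.core.windows.net/packages/firmware-1.2.0.bin?<sas>",
#     "packages",
#     "firmware-1.2.0.bin",
# )
```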
The second component that becomes more complex is the one that resolves the blob download URL for each device. It needs to distribute the load uniformly across all available replicas. If you can partition your devices based on region, location or anything else, then you will be able to do this pretty easily.
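One simple way to do this is to hash the device id onto the list of replicas, so each device is assigned deterministically and roughly uniformly to one storage account. A sketch, with made-up replica URLs; in the real system the returned URL would also carry a SAS token.

```python
import hashlib

# Hypothetical replica URLs for the same package blob.
REPLICA_URLS = [
    "https://updateseurope01.blob.core.windows.net/packages/firmware-1.2.0.bin",
    "https://updatesus01.blob.core.windows.net/packages/firmware-1.2.0.bin",
    "https://updatesasia01.blob.core.windows.net/packages/firmware-1.2.0.bin",
]

def resolve_download_url(device_id: str) -> str:
    """Map a device onto one replica, spreading the fleet roughly uniformly."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(REPLICA_URLS)
    return REPLICA_URLS[index]

print(resolve_download_url("device-000042"))
```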
In the near future I will write some posts that talk in more detail about these two components.
Cost
Let's see how the cost is affected by a solution that replicates the content in multiple datacenters – specifically, how much it would cost for 500,000 devices to download 100 MB of data from Azure Blobs.
Remarks: Because the download cost differs based on the zone (datacenter location), we will assume that all datacenters are in Zone 1, with 60 TB of traffic per month – the outbound traffic cost per GB would be €0.0522. We also assume a 30 TB storage account with Locally Redundant Storage only – the storage cost per month for 1 GB of data would be €0.022.
Download cost:
500,000 devices × 100 MB / 1,024 = 48,828.125 GB ≈ 47.68 TB
48,829 GB × €0.0522 ≈ €2,548.87
Storage cost (for 100 MB):
With only one copy: 100 MB = 0.098 GB × €0.022 = €0.002156
With 3 replicas: €0.006468
With 15 replicas: €0.03234
With 48 replicas (16 datacenters × 3 storage accounts per datacenter): €0.103488
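The same figures, reproduced as a short sketch with the assumed Zone 1 prices:

```python
# Cost estimate for the scenario above, using the assumed Zone 1 prices.
EGRESS_EUR_PER_GB = 0.0522     # outbound traffic, Zone 1, 60 TB/month tier
STORAGE_EUR_PER_GB = 0.022     # LRS storage, per GB per month
PACKAGE_GB = 100 / 1024        # the 100 MB package expressed in GB
DEVICES = 500_000

# Download cost: every device pulls the full package once.
download_cost = DEVICES * PACKAGE_GB * EGRESS_EUR_PER_GB       # ~ €2,549

# Storage cost per month for different replica counts.
for replicas in (1, 3, 15, 48):
    storage_cost = replicas * PACKAGE_GB * STORAGE_EUR_PER_GB
    print(f"{replicas:2d} replicas: €{storage_cost:.6f} per month")

print(f"download: €{download_cost:.2f}")
```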
As you can see, the storage cost is only a tiny part of the total price; the download cost is the one that dominates.
Considering that we add an extra cost of less than €0.11 in order to reduce the distribution time for 500,000 devices from about 16 days to 8.2 hours, it is pretty amazing what we can do with a clever solution.
We can keep the package content replicated in 48 storage accounts only at the moment when we release the package. Once it has been released and deployed, we can keep the content replicated in only 3 locations, for example.
Conclusion
In conclusion, I would like to say that even if Azure Storage accounts have some limitations, now that the number of datacenters worldwide has reached 16 we can do pretty amazing things by scaling horizontally.
For only about 11 extra cents per month, we can reduce the distribution time of a package from 16 days to 8.2 hours. This is the power of a system that is distributed worldwide.