Skip to main content

Deep dive in clouds providers SLAs

In the era of cloud we are bombarded with different cloud services. New features of cloud providers are lunched every day and prices are dropping every month. In this moment the most known cloud providers are Amazon, Google and Microsoft.
Looking over their services we will see SLAs (Service Level Agreement) that reach 99.9% availability, 99.95% availability or even 99.99% availability. This article will jump into cloud providers SLAs, trying to explain why the SLAs are so important, what are the benefits of them and last but not least how much money we could get back if a service goes down.

What does a SLA mean?
“A service level agreement (SLA) is a contract between a service provider (either internal or external) and the end user that defines the level of service expected from the service provider. SLAs are output-based in that their purpose is specifically to define what the customer will receive. SLAs do not define how the service itself is provided or delivered.”
Source: https://www.paloaltonetworks.com


A SLA is a contract between a service provider and the consumer that specify the ‘quality’ of a service that will be provided to the consumer. For example if we think about a service that give you the exact time, in the SLA will define how much time the service will be up and running in a year (99.99%).
Beside that the SLA defines what are the warranties that are provided if the SLA is not reached. For example if the exact time service is down more than 0.01% per month, the service provider will reduce the total cost the bill with 50%.
What areas are covered?
Based on what type of service or business we are talking about, the things that can be covered can different. It is very common a SLA to cover the fallowing attributes of a service:
Volume
Speed
Responsiveness
Efficiency
Quality of work
Looking again over our example with the exact time service we could have an SLA that says: “The exact time service is up 99.99% per year, the response time from the moment when a request hit the service is 0.0001 seconds and the time precision is 0.00000001 second.”

Cloud SLAs
In general if we are talking about cloud providers and services that are offered by them, the areas that are covered by all of them is the uptime. Beside uptime, there are other areas that are covered, but varies based on service type.
Microsoft, Google and Amazon have a clear SLA that specify the uptime for each service that is provided by them.  Even if they are different companies, the SLAs are very similar to each other.
For example if we are looking at cloud storage service that it is not replicated to different datacenters or nodes we will discover that Google is offering a 99.9% uptime SLA, Microsoft is offering a 99.9% uptime SLA, Amazon is offering a 99.95% uptime SLA (with the remark that if we have for use Read Access-Geo Redundant Storage from Microsoft we can reach even 99.99%).
As we could in the above example the SLAs are very similar +/-0.05%.

How the service is measured?
This question is very important, because each cloud provider SLA specify very clear how, who, when, based on what the service SLA is measured.
In all the cases the service uptime is measured internally, by their own system.  This doesn’t mean that the measure is not real. No, it is very real, but if the cause of the downtime is an external factor like network issues on client side, than it is not their problem – and it is normal.
The SLA is not applicable for cases when the service is used outside specific boundaries. For example the SLA is applicable only when the number of requests per second is under 10 million requests or when there are at least two instances of that deployment.

Warranty
Very often people assume that if they have a service in cloud that generate 1000$ per hour, in a case of a downtime, the provider will give them the amount of money that service would generate. Another wrong assumption is that a cloud provider will cover all the lost that are generated by a downtime.
Both assumptions are wrong. In this moment I don’t know a cloud provider or a service provider that would cover the loss of a downtime.
It might sounds strange but it is normal. First off all it is hard to measure and calculate the loss and secondly the SLA is referring to the service that you are using, not the system and services that you are providing over it.
Google, Microsoft and Amazon are offering very similar warranties. Based on the downtime period at the end of the month, a specific amount of service credit is offered to the customer. For example if the uptime of a service was under 99.9%, the customer will receive 25% of that service cost of that specific month as credit. This credit will be used to reduce the bill cost in the next month.
Also the SLAs specify if a specific incident or event cause a downtime too more than one cloud services than the client can submit a claim only for one services that was affected by this event.
For example if a datacenter goes down because of the same software update, where storage, computation and messaging systems are affected, than the customer can claim the credit for only one of the services.

Amazon, Google and Microsoft Warranty
Let’s take a look to warranties that are offered by this cloud providers in a case of a downtime of their storage service.
Amazon
Monthly Uptime Percentage Service Credit Percentage
99% < 99.95%                         10%
<99.0%                                 30%

Google
Monthly Uptime Percentage Service Credit Percentage
99.0% = < 99.9%                 10%
95.0% = < 99.0%                  25%
<95%                                  50%

Microsoft
Monthly Uptime Percentage Service Credit Percentage
99% < 99.99%                         10%
<99%                                  25%

The offer is not 100% the same, but is pretty similar. Even if Google offers 50% storage credit in the case of a downtime, I don’t want to be in the case when the uptime is only 90% for example. The credit that is offered when the uptime is between 99.xx% and 99% is the same. Every ‘9’ that is offered on top of 99% is very expensive and hard to obtain. That nines are representing the real battle and can make a different between a service and a great service.

When and how I receive the credit?
All cloud providers have different mechanism to notify customers when a service is not working as expecting (over a web sites, using an API or via email). In all this cases, even if a service is down more than it is specified in SLA, the customer will not receive the credit that we talked above (by default).
In the moment when customers are affected by an incident they needs to notify the cloud provider and open a ticket at customer support level. They need to specify what service was affected and when.  Based on this information, the cloud provider will check the internal audit system and the level of error rate at that specific time interval.

Trust
This is the key world around cloud providers and their customers. The most important thing is the trust that exist between them. We, the customers, trust our cloud providers that will respect their SLAs. The same thing is happening with any external provider.
In general all the SLAs that are offered by cloud providers are respected. The cases when incidents exists are very rarely and isolated.

Cloud service uptime is not our product uptime
An important thing that we need to take into consideration is that when we are constructing a product over cloud the uptime of our system is not the same with the uptime cloud services.
For example if we have a product that is constructed using 20 services from cloud, than the uptime of our system will need to be calculated taking in considerations the uptime of all cloud services. If each of cloud services has an uptime of 99.9% than the uptime of our system could reach around 98% uptime.

Conclusion
As we seen above, the SLAs that are offered by different cloud providers are pretty similar. The most important thing is to know exactly what does the SLA cover and how to handle downtime periods.    

References
Amazon EC2 SLA: http://aws.amazon.com/ec2/sla/
Google SLA: https://cloud.google.com/storage/sla
Microsoft SLA: http://azure.microsoft.com/en-us/support/legal/sla/

Comments

Popular posts from this blog

How to check in AngularJS if a service was register or not

There are cases when you need to check in a service or a controller was register in AngularJS.
For example a valid use case is when you have the same implementation running on multiple application. In this case, you may want to intercept the HTTP provider and add a custom step there. This step don’t needs to run on all the application, only in the one where the service exist and register.
A solution for this case would be to have a flag in the configuration that specify this. In the core you would have an IF that would check the value of this flag.
Another solution is to check if a specific service was register in AngularJS or not. If the service was register that you would execute your own logic.
To check if a service was register or not in AngularJS container you need to call the ‘has’ method of ‘inhector’. It will return TRUE if the service was register.
if ($injector.has('httpInterceptorService')) { $httpProvider.interceptors.push('httpInterceptorService&#…

ADO.NET provider with invariant name 'System.Data.SqlClient' could not be loaded

Today blog post will be started with the following error when running DB tests on the CI machine:
threw exception: System.InvalidOperationException: The Entity Framework provider type 'System.Data.Entity.SqlServer.SqlProviderServices, EntityFramework.SqlServer' registered in the application config file for the ADO.NET provider with invariant name 'System.Data.SqlClient' could not be loaded. Make sure that the assembly-qualified name is used and that the assembly is available to the running application. See http://go.microsoft.com/fwlink/?LinkId=260882 for more information. at System.Data.Entity.Infrastructure.DependencyResolution.ProviderServicesFactory.GetInstance(String providerTypeName, String providerInvariantName) This error happened only on the Continuous Integration machine. On the devs machines, everything has fine. The classic problem – on my machine it’s working. The CI has the following configuration:

TeamCity.NET 4.51EF 6.0.2VS2013
It seems that there …

[Post-Event] Codecamp Conference Cluj-Napoca - Nov 19, 2016

Last day I was invited to another Codecamp Conference, that took place in Cluj-Napoca. Like other Codecamp Conferences, the event was very big, with more than 1.000 participants and 70 sessions. There were 10 tracks in parallel, so it was pretty hard to decide at  what session you want to join.
It was great to join this conference and I hope that you discovered something new during the conference.
At this event I talked about Azure IoT Hub and how we can use it to connect devices from the field. I had a lot of demos using Raspberry PI 3 and Simplelink SensorTag. Most of the samples were written in C++ and Node.JS and people were impressed that even if we are using Microsoft technologies, we are not limited to C# and .NET. World and Microsoft are changing so fast. Just looking and Azure IoT Hub and new features that were launched and I'm pressed (Jobs, Methods, Device Twin).
On backend my demos covered Stream Analytics, Event Hub, Azure Object Storage and DocumentDB.

Title:
What abo…