
My thoughts about Azure Cache for Redis (Enterprise)

In today's post, we talk about the flavours of Redis Cache available on Microsoft Azure and how to decipher undocumented errors that Redis can return during the provisioning phase.

On Microsoft Azure, we have two main options for running Redis Cache:

- Azure Cache for Redis: a managed service provided by Microsoft that runs open-source (OSS) Redis

- Azure Cache for Redis Enterprise: a fully managed service, hosted by Microsoft, that runs Redis Enterprise software

In 90% of cases, Azure Cache for Redis is the best managed cache solution available in Microsoft Azure, offering a 99.9% availability SLA and supporting up to 120 GB of memory and 40,000 client connections. I have had a great experience with it, as long as you understand the connection model of Redis (a minimal sketch follows below).
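To make that connection model concrete, here is a minimal sketch in Python with the redis-py client. It creates a single client per process and reuses its internal connection pool instead of opening a new connection per request; the hostname and access key are placeholders, and max_connections is an illustrative value, not a recommendation.

    import redis

    # One client per process: redis-py keeps an internal connection pool,
    # so all callers share a bounded set of TCP connections instead of
    # opening (and leaking) a new one per request.
    cache = redis.Redis(
        host="mycache.redis.cache.windows.net",  # placeholder endpoint
        port=6380,                               # Azure exposes TLS on port 6380
        password="<access-key>",                 # placeholder access key
        ssl=True,
        max_connections=50,                      # stay well below the tier's connection limit
    )

    def get_profile(user_id: str):
        # Every call reuses the shared pool.
        return cache.get(f"user:{user_id}:profile")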

Azure Cache for Redis Enterprise provides more power: up to 13 TB of memory, 2 million client connections, a 99.999% availability SLA, 1 million operations per second, and all the features of Redis Enterprise, such as active geo-replication, modules, time series support and Redis on Flash.
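As a taste of the module support, here is a hedged sketch that exercises the RedisTimeSeries module from Python. It issues the raw module commands through redis-py's execute_command, so it does not depend on any client-specific helper; the Enterprise endpoint and access key are placeholders.

    import time
    import redis

    r = redis.Redis(
        host="mycache.region.redisenterprise.cache.azure.net",  # placeholder endpoint
        port=10000,               # default port for the Enterprise tiers
        password="<access-key>",  # placeholder access key
        ssl=True,
    )

    # Create a series with a 24h retention window, append a sample, read it back.
    r.execute_command("TS.CREATE", "sensor:temp", "RETENTION", 86400000)
    r.execute_command("TS.ADD", "sensor:temp", int(time.time() * 1000), 21.7)
    print(r.execute_command("TS.RANGE", "sensor:temp", 0, "+"))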

Going with the Redis Enterprise tiers comes at a price, but it is a good offer, especially when you need active geo-replication. Keep in mind that active geo-replication requires two instances of Redis Enterprise, and the pricing model accounts for both replicas, since active geo-replication is the most common motivation to move to the Enterprise tier.

From the performance point of view, you should expect up to 70-75% more operations per second and around 40% better latency when you move from the Premium tier of Azure Cache for Redis to Azure Cache for Redis Enterprise.

From the cost point of view, it is hard to compare, but if we put the P5 Premium tier of Azure Cache for Redis next to the E100 Enterprise tier, which is similar in cache size, the running cost is almost double, BUT you get two data nodes. The real cost hit comes when you run on a Standard tier such as C5 and need to move to Enterprise, for example for active geo-replication: in that case the running cost is 7-10 times higher.

An important difference between the two services is who provides support. Azure Cache for Redis is fully managed by Microsoft and well documented. Azure Cache for Redis Enterprise is managed by Microsoft, and you get good support from the Redis team, but you need to keep in mind that this support does not come directly from Microsoft.


When should I use the Enterprise tier?

The no. 1 feature that makes customers move to the Enterprise tier is active geo-replication, together with the JSON and time series features. The performance provided by the Premium tier of Azure Cache for Redis is very good; so far, I have not been involved in a project where migration to the Enterprise tier was driven by performance. Yes, we were running two, four, even six Premium-tier instances deployed across regions without issues. But when geo-replication was required, the Enterprise tier was the best option, even in comparison with other solutions available on the market.

When you consider active geo-replication of Redis, there are three main use cases:

(1) Geo-distributed applications: where you want to replicate content across multiple locations in near-real time

(2) Handling region failures: ensuring a fully replicated failover node

(3) Roaming user sessions: serving the same user session from two different locations (see the sketch after this list)
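To make case (3) concrete, here is an illustrative Python sketch: with two Enterprise instances linked by active geo-replication, a session written through one regional endpoint becomes readable through the other. Both hostnames and the access keys are placeholders for the two caches in the geo-replication group.

    import redis

    # Two regional endpoints of the same active geo-replication group
    # (placeholder hostnames and access keys).
    west = redis.Redis(host="mycache-weu.westeurope.redisenterprise.cache.azure.net",
                       port=10000, password="<access-key>", ssl=True)
    north = redis.Redis(host="mycache-neu.northeurope.redisenterprise.cache.azure.net",
                        port=10000, password="<access-key>", ssl=True)

    # The user logs in through the West Europe front end...
    west.set("session:alice", "cart=3;lang=en", ex=3600)

    # ...and a later request lands in North Europe. Replication is
    # asynchronous, so a freshly written key can lag for a short interval
    # (near-real time, not instantaneous).
    print(north.get("session:alice"))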

Issues with the Enterprise tier

A few weeks ago, we had an interesting experience with Redis Enterprise. For a few days, the team received the error below when trying to spin up an instance of Redis Enterprise.

    "status": "Failed",

    "error": {

        "code": "ResourceDeploymentFailure",

        "message": "The resource operation completed with terminal provisioning state 'Failed'."

    }

}

The error message is cryptic and not very clear. You can make a lot of assumptions, and you don't know whether the problem is on your side or on the Redis Enterprise side.

No additional information was provided, and the ARM scripts were correct. The same error message appeared when a new Redis Enterprise instance was created from the Azure Portal. We were trying to do a PoC targeting the active geo-replication feature, and it was not the best experience for the technical team, which was stuck. We opened an incident ticket with Redis about it.
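If you hit the same wall, the per-operation records of the failed deployment sometimes carry more detail than the terminal error above. Here is a hedged sketch using the azure-identity and azure-mgmt-resource Python packages; the subscription ID, resource group and deployment names are placeholders.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ResourceManagementClient

    client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # List the individual operations behind the failed deployment and print
    # whatever status detail ARM recorded for each of them.
    for op in client.deployment_operations.list("my-rg", "redis-enterprise-deploy"):
        props = op.properties
        print(props.provisioning_state, props.status_code, props.status_message)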

The cause of the incident was a lack of resources available to Redis in the given region. After a few days, we were able to create a new instance without a problem, but I still have a concern: what if this happened in a production environment during an incident? Would the customer's solution be down for a few days?
