
Decouple external communication inside AKS (Azure Kubernetes Service)

In a world increasingly oriented toward microservices, reliable asynchronous communication becomes vital for our systems. In this post, we will identify the different locations where asynchronous communication based on a messaging system could be a good fit.
Let’s imagine that we have a system hosted inside an Azure Kubernetes Service (AKS) cluster. It is a high-availability, high-performance system that shall be protected from any kind of load peak or temporary outage.
One of the most important characteristics of such a system is that it does not affect other systems. This means that it shall be able to accept requests even when the internal load does not allow it to process additional requests right away. Besides this, specific groups of services shall be able to send requests to other services during temporary outages, and the services shall be able to recover from these outages with minimum effort, using a simple solution.

The first location where a dedicated messaging system would increase our availability is at the ingestion layer, between our system and the external world (1). External peaks and high load can be ‘buffered’ in a messaging system that accepts new requests and stores them until the system is ready to process them. Such a solution also provides metrics that can be used to increase or decrease the size of the AKS cluster, taking into account the number of messages that are waiting in the queue to be processed.
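The scaling decision based on queue depth can be sketched as a small function. This is only an illustration of the idea, not a real autoscaler; the function name, the throughput parameter, and the replica bounds are all assumptions chosen for the example.

```python
import math

def desired_replicas(queue_length, msgs_per_replica_per_minute,
                     min_replicas=2, max_replicas=20):
    # Size the processing deployment from the queue backlog:
    # enough replicas to drain the backlog in roughly one minute,
    # clamped to a safe operating range.
    if queue_length <= 0:
        return min_replicas
    needed = math.ceil(queue_length / msgs_per_replica_per_minute)
    return max(min_replicas, min(max_replicas, needed))
```

In a real AKS setup, a component such as an autoscaler would read the queue length metric from the messaging system and apply a rule of this shape to the deployment.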
The second location where such a dedicated messaging system would improve the quality metrics is the communication between our system and any other external system. For example, our AKS cluster can communicate with another AKS cluster that is responsible for a different set of functionality. Here, asynchronous communication based on a messaging system would fully decouple all the integration points.
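The decoupling works because the two clusters agree only on the message contract, not on each other’s internals. A minimal sketch of such an envelope might look like this; the field names and the `orders-cluster` source label are illustrative assumptions, not a standard.

```python
import json
import uuid
from datetime import datetime, timezone

def build_message(payload, source="orders-cluster"):
    # Wrap a request in a self-describing envelope so the receiving
    # cluster needs no knowledge of the sender's internal services.
    return json.dumps({
        "id": str(uuid.uuid4()),          # correlation id for tracing
        "source": source,                 # logical origin, not a host name
        "sent_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    })
```

Either side can evolve independently as long as this contract stays stable, which is exactly what makes the integration points fully decoupled.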
In both scenarios, a messaging system like Azure Service Bus can be used hand in hand with AKS. Being an external service outside AKS, it gives us an additional redundancy layer in case the microservices hosted inside AKS fail. With no extra effort, we can spin up a new AKS cluster and redeploy our system; message backup and redundancy are provided out of the box by Azure Service Bus.

Another location where a messaging system can be used is between the services inside AKS, where we have the option to use direct calls or a messaging system. For internal communication, I don’t recommend using external messaging systems that are not hosted inside the cluster. External communication would increase latency, which might affect the end-to-end processing duration. For internal communication, a message broker hosted inside the cluster, like KubeMQ or Redis, is a better fit.
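The benefit of a broker between internal services is that the caller hands the work off and returns immediately, instead of blocking on a direct call. A language-level analogy of that handoff, using only an in-process queue rather than KubeMQ or Redis, might look like this (the doubling step stands in for real processing):

```python
import queue
import threading

def start_worker(tasks, results):
    # Consume tasks asynchronously; callers never block on processing.
    def run():
        while True:
            item = tasks.get()
            if item is None:          # sentinel: stop the worker
                break
            results.append(item * 2)  # placeholder for real processing
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t

tasks = queue.Queue()
results = []
worker = start_worker(tasks, results)
for i in (1, 2, 3):
    tasks.put(i)   # the producer returns immediately after enqueuing
tasks.put(None)    # signal shutdown
worker.join()
```

A broker inside the cluster gives the same producer/consumer shape across pods, with the queue surviving outside any single caller.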

In general, I would try to avoid solutions outside the AKS cluster for internal communication, with some isolated exceptions where special requirements are defined. You don't want to add extra latency and complexity by calling external services/systems.

Once you decide to use Azure Service Bus for the communication described above, you should know that it offers out-of-the-box features that resolve common issues related to failure and redundancy.
On the premium tier, you get predictable performance and the ability to scale the resources available to the messaging system up and down using Messaging Units (MUs). Geo-replication and geo-disaster recovery are supported, allowing us to execute a failover or a replication procedure.
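Scaling the Messaging Units can be done from the Azure CLI; a sketch of such a command is shown below, where the resource group and namespace names are placeholders:

```shell
# Scale a premium Service Bus namespace to 2 Messaging Units.
# Resource group and namespace names are placeholders.
az servicebus namespace update \
  --resource-group my-resource-group \
  --name my-servicebus-namespace \
  --capacity 2
```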
Even so, part of these actions needs to be managed manually from PowerShell, the Azure Portal, or code.
In the next post, we will look at how you can implement a replication or failover procedure.
