
Decouple external communication inside AKS (Azure Kubernetes Service)

In a world that is more and more oriented toward microservices, reliable asynchronous communication becomes vital for our systems. In this post, we will try to identify different locations where asynchronous communication based on a messaging system could be a good fit.
Let’s imagine that we have a system hosted inside an Azure Kubernetes Service (AKS) cluster. The system is a high-availability, high-performance system that shall be protected from any kind of peaks or temporary outages.
One of the most important characteristics of such a system is that it shall not affect other systems. This means that it shall be able to accept requests even if the internal load does not allow it to process them right away. Besides this, specific groups of services inside the cluster shall be able to send requests to other services during temporary outages. The services shall be able to recover from these outages with minimum effort by using a simple solution.

The first location where a dedicated messaging system would increase our availability is at the ingest layer, between our system and the external world (1). External peaks and high load can be ‘buffered’ in a messaging system that accepts new requests and stores them until the system is ready to process them. Such a solution also provides metrics that can be used to increase or decrease the size of the AKS cluster, taking into account the number of messages that are waiting in the queue to be processed.
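To make the idea concrete, here is a minimal sketch of the buffering side, assuming the azure-servicebus Python SDK; the connection string, the ingress-requests queue name and the payload shape are illustrative placeholders, not part of the original design.

```python
# Minimal sketch: an ingress-facing endpoint buffers incoming requests into an
# Azure Service Bus queue instead of processing them synchronously.
# Connection string, queue name and payload shape are hypothetical.
import json

from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONNECTION_STR = "<service-bus-connection-string>"   # placeholder
QUEUE_NAME = "ingress-requests"                      # placeholder


def enqueue_request(payload: dict) -> None:
    """Accept an external request and store it until the cluster can process it."""
    with ServiceBusClient.from_connection_string(CONNECTION_STR) as client:
        with client.get_queue_sender(QUEUE_NAME) as sender:
            sender.send_messages(ServiceBusMessage(json.dumps(payload)))


if __name__ == "__main__":
    # The body is a placeholder; the real system would receive it from its public API.
    enqueue_request({"orderId": "1234", "action": "create"})
```

Because the queue absorbs the peak, the ingress path stays responsive even when the services behind it are saturated or temporarily down.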
The second location where such a dedicated messaging system would improve the quality metrics is the communication between the system and any other external system. For example, our AKS cluster can communicate with another AKS cluster that is responsible for a different set of functionality. Here, asynchronous communication based on a messaging system would fully decouple all the integration points.
In both scenarios, a messaging system like Azure Service Bus can be used hand in hand with AKS. Being an external service outside AKS, it gives us an additional redundancy layer in case the microservices hosted inside AKS fail. With no extra effort, we can spin up a new AKS cluster and deploy our system, while message backup and redundancy are provided out of the box by Azure Service Bus.
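The consuming side is equally simple. Below is a minimal sketch of a worker running inside AKS that drains the queue at its own pace; the queue name and connection string are the same hypothetical values as in the previous snippet.

```python
# Minimal sketch of a worker inside AKS draining the Service Bus queue.
from azure.servicebus import ServiceBusClient

CONNECTION_STR = "<service-bus-connection-string>"   # placeholder
QUEUE_NAME = "ingress-requests"                      # placeholder


def process(body: str) -> None:
    # Placeholder for the real business logic hosted in the cluster.
    print(f"processing: {body}")


def run_worker() -> None:
    with ServiceBusClient.from_connection_string(CONNECTION_STR) as client:
        with client.get_queue_receiver(QUEUE_NAME, max_wait_time=30) as receiver:
            for message in receiver:
                process(str(message))
                # Completing the message removes it from the queue; if the pod
                # dies before this call, the message becomes visible again.
                receiver.complete_message(message)


if __name__ == "__main__":
    run_worker()
```

If the whole cluster is lost, the messages simply wait in the queue until a new cluster is provisioned and the worker is redeployed.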

Another location where a messaging system can be used is between the services inside AKS, where we have the option to use direct calls or a messaging system. For internal communication, I don’t recommend using external messaging systems that are not hosted inside the cluster. External communication would increase the latency of the system, which might affect the end-to-end processing duration. For internal communication, a message broker like KubeMQ or Redis hosted inside the cluster is a better fit.
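As an illustration of the in-cluster option, here is a minimal sketch that uses a Redis list as a lightweight work queue; the "redis" host name assumes a Redis service deployed inside the same AKS cluster, and the key names are illustrative only.

```python
# Minimal sketch of internal, in-cluster asynchronous communication using a
# Redis list as a lightweight work queue. Host and key names are hypothetical.
import json

import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)


def send_internal(task: dict) -> None:
    """Producer side: a service in the cluster pushes a work item."""
    r.lpush("internal-tasks", json.dumps(task))


def consume_internal() -> None:
    """Consumer side: block until a work item is available, then handle it."""
    while True:
        _, raw = r.brpop("internal-tasks")
        task = json.loads(raw)
        print(f"handling internal task: {task}")


if __name__ == "__main__":
    send_internal({"type": "resize-image", "id": 42})
```

Because both sides talk to a broker running next to them in the cluster, the extra latency stays in the sub-millisecond range of the internal network.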

In general, I would try to avoid using a communication solution outside the AKS cluster for internal communication, with some isolated exceptions where special requirements are defined. You don't want to add the extra latency and complexity of calling external services/systems.

Once you decide to use Azure Service Bus for the communication above, you should know that you get out-of-the-box features that resolve common issues related to failure and redundancy.
On the Premium tier, you already have predictable performance and the ability to scale the resources available for the messaging system up or down using MUs (Messaging Units). Geo-replication and geo-disaster recovery are supported, allowing us to execute a failover or a replication procedure.
Even so, part of these actions needs to be managed manually from PowerShell, the Azure Portal, or code, as sketched below.
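As one example of managing this from code, here is a hedged sketch of scaling the Messaging Units of a Premium namespace, assuming the azure-mgmt-servicebus and azure-identity packages; the subscription, resource group, namespace and region values are placeholders, and exact model names may vary between SDK versions.

```python
# Hedged sketch: changing the Messaging Units (capacity) of a Premium namespace.
# All identifiers are placeholders; model names assume a recent azure-mgmt-servicebus.
from azure.identity import DefaultAzureCredential
from azure.mgmt.servicebus import ServiceBusManagementClient
from azure.mgmt.servicebus.models import SBNamespace, SBSku

SUBSCRIPTION_ID = "<subscription-id>"      # placeholder
RESOURCE_GROUP = "my-resource-group"       # placeholder
NAMESPACE = "my-servicebus-namespace"      # placeholder

client = ServiceBusManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Re-deploying the namespace with a higher capacity raises the number of
# Messaging Units; lowering the value scales it back down.
poller = client.namespaces.begin_create_or_update(
    RESOURCE_GROUP,
    NAMESPACE,
    SBNamespace(
        location="westeurope",
        sku=SBSku(name="Premium", tier="Premium", capacity=2),
    ),
)
poller.result()
```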
In the next post, we will look at how you can implement a replication or failover procedure.
