
(Auto) Scaling dimensions inside Azure Kubernetes Service (AKS)

Azure Kubernetes Service (AKS) is the perfect place to deploy and run your Kubernetes workloads. You just deploy your application, focus on business needs, and let Microsoft manage the infrastructure.
Once your application is a big success and you have a lot of visitors or clients, you will need to scale up or down based on different factors like the number of users, the number of views, or many other specific counters. Let's see how you can scale your Kubernetes solution inside AKS and what Microsoft has prepared for us to support dynamic scaling.

One of the most appealing features of a microservice architecture is scaling, which is more granular than in a classical deployment, giving us the flexibility to scale only the components that are under stress. In this way, we can optimize how we consume our computation resources.

Scaling dimensions
Inside Kubernetes, there are different directions in which we can scale our applications. Two of them are more on the software side and can be done automatically. The third one is more related to hardware and computation resources, and even with a cloud provider like Azure or AWS it can take a few minutes.

The first dimension is the number of instances of a specific service or pod (a pod can include one or multiple types of services). This involves increasing the number of instances of a service that is under pressure. By having a Kubernetes cluster with multiple computation units (VM nodes), we can balance the number of service instances based on our needs without having to spin up new nodes.

The second dimension is the cluster size. Each Kubernetes cluster contains multiple VM nodes. When the load on the physical or virtual nodes is high, we can increase the cluster size by adding additional nodes. When scaling in this way, you need to have the computation resources available. AKS and Kubernetes can automatically do all the configuration of the new nodes and move the service instances onto them.
In comparison with the first dimension, this scaling activity can take from a few minutes to even half an hour or more. Why? The infrastructure provider needs to provision a VM with your specific OS configuration. This activity is similar to creating and setting up a VM from scratch. To decrease the time, Microsoft and other providers keep a queue of machines already prepared with different configurations. This is why you can spin up an AKS cluster in just 2-3 minutes.

The third dimension is the VM size (tier). Scaling in this way is not common because it involves replacing some of the nodes in the cluster with different VMs. When you scale using this approach, you add more powerful nodes to the cluster, migrate the service instances to them, and retire a part of the existing nodes.
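On AKS, one way to do this is to add a node pool with a larger VM size and then cordon and drain the old nodes so that Kubernetes reschedules the pods on the new ones. A rough sketch, assuming the cluster supports multiple node pools and using hypothetical resource group, cluster, and node names:
az aks nodepool add --resource-group rv-rg --cluster-name rv-aks --name largepool --node-count 3 --node-vm-size Standard_DS3_v2
kubectl cordon aks-nodepool1-12345678-0
kubectl drain aks-nodepool1-12345678-0 --ignore-daemonsets --delete-local-data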


Manual scaling
I'm pretty sure that you already know about manual scaling and are not too interested in it. Manual scaling can happen at the pod level. Using the shell, we can specify the number of replicas that we want to have. In the example below we set the number of replicas (instances) of the Card pod from 5 to 9.
kubectl scale --replicas=9 deployment/rv-epi-card

Using the 'kubectl get pods' command, we can see the list of pods that we have:
$ kubectl get pods

NAME                           READY     STATUS    RESTARTS   AGE
rv-epi-card-3545353453-2hfh0   1/1       Running   0          3m
rv-epi-card-3545353453-bzt05   1/1       Running   0          3m
rv-epi-card-3545353453-fvcvm   1/1       Running   0          3m
rv-epi-card-3545353453-hrbf2   1/1       Running   0          15m
rv-epi-card-3545353453-qphz8   1/1       Running   0          3m
rv-epi-card-3545353453-th3h1   1/1       Running   0          3m
rv-epi-card-3545353453-5z504   1/1       Running   0          20m
rv-epi-card-3545353453-rvfvm   1/1       Running   0          3m
rv-epi-card-3545353453-1rbgf   1/1       Running   0          15m

We can scale the size of the cluster directly from the Azure Portal or from the shell by changing the number of nodes inside the cluster.
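From the Azure CLI, this looks roughly like the following (the resource group and cluster names are hypothetical):
az aks scale --resource-group rv-rg --name rv-aks --node-count 5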

Auto Scaling
When we talk about auto scaling inside AKS, we need to keep in mind that there are two dimensions of scaling that we can automate with out-of-the-box features. One is controlled by AKS and Kubernetes and is related to the number of instances of the same type (services or pods) - so-called replicas inside Kubernetes. The second one is the cluster size, where we can add or remove nodes dynamically based on different counters and formulas that we define and control.
Horizontal Pod Autoscaler (HPA)
Horizontal Pod Autoscaler (HPA) can monitor the load of pods and resources and can decide to increase or decrease the number of replicas for each pod. The HPA is the same component that we have in any Kubernetes cluster with version 1.8 or higher. It checks the load on pods and replicas every 30 seconds, and it can decide to decrease or increase the number of replicas. The Metrics Server collects the counter information from the worker nodes and can provide input for the HPA (e.g., CPU, memory, network).
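For a simple CPU-based target, an HPA can be created directly from the shell; a minimal sketch, reusing the rv-epi-card deployment from the earlier example and an assumed 50% CPU target:
kubectl autoscale deployment rv-epi-card --cpu-percent=50 --min=2 --max=8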
There are moments when this is not enough, and you need to collect or use custom metrics. For these situations, we can install and configure other monitoring systems like Prometheus. It is widely used mainly when we have custom metrics that we want to use to achieve auto-scaling at the pod level. Metrics from Prometheus are exposed in the same format as the Metrics Server and can be consumed by the HPA through an adapter (Prometheus Adapter) that exposes the metrics to the HPA.
apiVersion: autoscaling/v2alpha1
kind: HorizontalPodAutoscaler
metadata:
  name: rv-epi-card-hpa
spec:
  scaleTargetRef:
    kind: Deployment
    name: rv-epi-card
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: sample-metrics-app
      metricName: http_request
      targetValue: 256
The above configuration, specified in the *.yaml file, configures the HPA to target 256 HTTP requests per second. To achieve this, it will increase and decrease the number of replicas of our pod between 2 and 8 to keep the rate around 256 HTTP requests per second.
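Assuming the manifest above is saved in a file such as rv-epi-card-hpa.yaml (the file name is just an example), it can be deployed and inspected from the shell:
kubectl apply -f rv-epi-card-hpa.yaml
kubectl get hpa rv-epi-card-hpa
kubectl describe hpa rv-epi-card-hpa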
Cluster Autoscaler (CA)
In comparison with the HPA, the Cluster Autoscaler (CA) is a more Azure-specific piece of functionality. Every 10 seconds it checks the load of the cluster and whether the number of nodes in the cluster needs to be increased or decreased.
The current integration with the HPA enables the CA to remove unused nodes if no pods have been running on them for more than 10 minutes. Every 10 seconds, the CA checks whether there are enough nodes for the pods requested by the HPA and increases the number of nodes if there are not. When there are not enough resources to increase the number of pods, the HPA exposes specific metrics/flags that can be read by the CA.
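On AKS, the CA is enabled per cluster from the Azure CLI; a rough sketch, assuming an existing cluster rv-aks in resource group rv-rg and an arbitrary node range (depending on the AKS version, the flag may need to be set at cluster creation time instead):
az aks update --resource-group rv-rg --name rv-aks --enable-cluster-autoscaler --min-count 2 --max-count 10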
A difference between the CA and the HPA is in how they look at the resources consumed. The HPA looks at the actual resources consumed by a pod replica. In contrast, the default behavior of the CA is to look at the resources requested in the *.yaml file for a specific service.
resources:
  requests:
    cpu: 250m
    memory: 96Mi
In the above example, we specified that the service instance requests a quarter of a vCPU and 96 MiB of memory.
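For context, the requests block sits under each container in the deployment manifest; a minimal sketch with a hypothetical image name, with limits added to cap consumption:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rv-epi-card
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rv-epi-card
  template:
    metadata:
      labels:
        app: rv-epi-card
    spec:
      containers:
      - name: rv-epi-card
        image: rvepi/card:1.0
        resources:
          requests:
            cpu: 250m
            memory: 96Mi
          limits:
            cpu: 500m
            memory: 192Mi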

On-demand fast scaling
The CA and HPA work great, but it is impossible to add a new node to the cluster in just a few seconds. There is no SLA for how long it takes AKS to increase the cluster size. For these kinds of situations, what can we do to be able to handle a load peak on our application?
Imagine that you have a website that sells food, and on Friday afternoon you have a peak in the number of clients that want to order. If your application is slow for a few minutes, the clients will use another system to order their dinner.
In a standard Kubernetes deployment, you would not be able to do too much if nodes are not physically available. Inside AWS or Azure, systems like the CA can increase the number of nodes automatically, but still, there is a latency of a few minutes during which you can lose business.
For these kinds of situations, Microsoft gives us the ability to extend our cluster into Azure Container Instances (ACI). ACI is a serverless solution inside Azure to host and run our microservices. By integrating ACI with AKS, we can scale out our cluster in just a few seconds inside ACI.
This is possible because of the way ACI was integrated into AKS. ACI is seen as a set of nodes of the AKS cluster. From the Kubernetes perspective, it is just a pool of nodes like any other nodes in the cluster.
Behind the scenes, ACI is mapped as virtual nodes that can run replicas of the pods. The virtual nodes are added inside a subnet that is securely connected to AKS. Even if the computation runs inside ACI, there is a sandbox that guarantees that our pods are isolated from other users.
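Enabling the virtual node integration looks roughly like this from the Azure CLI (the resource group, cluster, and subnet names are hypothetical, and the add-on requires the cluster to use advanced/Azure CNI networking):
az aks enable-addons --resource-group rv-rg --name rv-aks --addons virtual-node --subnet-name rv-aci-subnet
After that, pods are typically scheduled on the virtual nodes by adding a nodeSelector of type: virtual-kubelet and a toleration for the virtual-kubelet.io/provider taint to the pod spec.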

Overview
Scaling inside AKS is simple but at the same time complex. We have all the tools necessary to configure autoscaling, but the trick, as usual, is to know how to configure it to respond to your specific needs. Even if the integration with ACI is only at the beginning, the base concept is powerful and full of potential for high loads. The HPA works as it would work on an on-premises cluster, allowing us to use the same configuration without having to take into account whether we are on-premises or inside Azure.
The CA is useful when we need autoscaling, but it can be tricky to configure when the default behavior is not enough for us. The good part is that it is not too common to need to configure autoscaling at the node level.
