Azure Kubernetes Service (AKS) is a great place to deploy and run your Kubernetes workloads. You deploy your application and focus on business needs, while Microsoft manages the infrastructure.
Once your application is a big success and you have a lot of visitors or clients, you will need to scale up or down based on factors such as the number of users, the number of views, or other specific counters. Let's see how you can scale your Kubernetes solution inside AKS and what Microsoft has prepared to support dynamic scaling.
One of the most appealing features of a microservice architecture is scaling, which is more granular than in a classical deployment and gives us the flexibility to scale only the components that are under stress. In this way, we can optimize how we consume compute resources.
Inside Kubernetes, there are different dimensions along which we can scale our applications. Two of them are more on the software side and can be done automatically. The third is more related to hardware and compute resources, and even on a cloud provider like Azure or AWS it takes a few minutes.
The first dimension is the number of instances of a specific service or pod (a pod can include one or multiple types of services). This means increasing the number of instances of a service that is under pressure. Because a Kubernetes cluster has multiple compute units (VM nodes), we can balance the number of service instances based on our needs without having to spin up new nodes.
The second dimension is the cluster size. Each Kubernetes cluster contains multiple VM nodes. When the load on the physical or virtual nodes is high, we can increase the cluster size by adding nodes. When scaling in this way, you need to have the compute resources available. AKS and Kubernetes can automatically configure the new nodes and move service instances onto them.
In comparison with the first dimension, this scaling activity can take from a few minutes to half an hour or more. Why? The infrastructure provider needs to configure a VM with your specific OS configuration, which is similar to creating and setting up a VM from scratch. To reduce this time, Microsoft and other providers keep a pool of machines that are already prepared with different configurations. This is why you can spin up an AKS cluster in just 2-3 minutes.
The third dimension is the VM size (tier). Scaling in this way is not common because it involves replacing some nodes of the cluster with different VMs. When you scale using this approach, you add more powerful nodes to the cluster, migrate service instances to them, and retire some of the existing nodes.
You probably already know about manual scaling, so we will cover it only briefly. Manual scaling can happen at pod level: using the shell, we specify the number of replicas that we want. In the example below, we change the number of replicas (instances) of the Card pod from 5 to 9:
kubectl scale --replicas=9 deployment/rv-epi-card
Using the 'get pods' command, we can see the list of pods that we have:
$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
rv-epi-card-3545353453-2hfh0   1/1     Running   0          3m
rv-epi-card-3545353453-bzt05   1/1     Running   0          3m
rv-epi-card-3545353453-fvcvm   1/1     Running   0          3m
rv-epi-card-3545353453-hrbf2   1/1     Running   0          15m
rv-epi-card-3545353453-qphz8   1/1     Running   0          3m
rv-epi-card-3545353453-th3h1   1/1     Running   0          3m
rv-epi-card-3545353453-5z504   1/1     Running   0          20m
rv-epi-card-3545353453-rvfvm   1/1     Running   0          3m
rv-epi-card-3545353453-1rbgf   1/1     Running   0          15m
We can scale the size of the cluster directly from the Azure Portal or from the shell by changing the number of nodes inside the cluster.
When we talk about autoscaling inside AKS, we need to keep in mind that there are two dimensions of scaling that we can automate with out-of-the-box features. The first is controlled by AKS and Kubernetes and relates to the number of instances of the same type (services or pods), the so-called replicas inside Kubernetes. The second is the cluster size, where we can add or remove nodes dynamically based on counters and formulas that we define and control.
Horizontal Pod Autoscaler (HPA)
The Horizontal Pod Autoscaler (HPA) monitors the load of pods and resources and decides whether to increase or decrease the number of replicas of each pod. It is the same HPA that is available in any Kubernetes cluster running version 1.8 or higher. It checks the load on pods and replicas every 30 seconds. The Metrics Server collects the counters from the worker nodes and provides input for the HPA (e.g., CPU and memory).
There are moments when this is not enough, and you need to collect or use custom metrics. For these situations, we can install and configure other monitoring systems like Prometheus. It is widely used, mainly when we have custom metrics that we want to use to achieve autoscaling at pod level. Prometheus metrics reach the HPA through an adapter (Prometheus Adapter), which serves them through the Kubernetes custom metrics API in the same way that the Metrics Server serves resource metrics.
- type: Object
The above configuration, specified in the *.yaml file, sets the HPA target to 265 HTTP requests per second. To stay close to this target, the HPA increases or decreases the number of replicas of our pod, keeping it between 2 and 8.
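To make the scaling rule concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes documentation: the desired count is the current count multiplied by the ratio between the observed metric and the target, rounded up and then kept inside the configured bounds. The function name desired_replicas and the sample traffic figures are assumptions made for illustration; the real controller also applies a tolerance and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float,
                     min_replicas: int = 2,
                     max_replicas: int = 8) -> int:
    """Approximate the HPA rule for a per-replica average metric.

    Simplified on purpose: tolerance and stabilization are ignored.
    """
    if current_replicas <= 0 or target_value <= 0:
        raise ValueError("current_replicas and target_value must be positive")
    # Core rule: ceil(current replicas * observed value / target value).
    raw = math.ceil(current_replicas * current_value / target_value)
    # The result always stays inside the configured [min, max] range.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical load: 3 replicas, each seeing about 400 requests per second,
    # against the 265 requests per second target from the text.
    print(desired_replicas(3, 400.0, 265.0))
```

With these hypothetical numbers the sketch asks for 5 replicas; a much larger spike is capped at 8 and a quiet period never drops below 2.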
Cluster Autoscaler (CA)
Compared with HPA, cluster autoscaling is a more Azure-specific functionality. Every 10 seconds, it checks the load of the cluster and whether the number of nodes needs to be increased or decreased.
The current integration with HPA enables CA to remove unused nodes if no pods have been running on them for more than 10 minutes. CA checks with HPA whether there are enough nodes for the pods (every 10 seconds) and increases the number of nodes if there are not. When there are not enough resources to increase the number of pods, the HPA exposes a specific metric/flag that can be read by CA.
One difference between CA and HPA is how they look at consumed resources. HPA looks at the actual resources consumed by a pod replica. In contrast, the default behavior of CA is to look at the resources requested in the *.yaml file for a specific service.
In the above example, we specified that the service instance requires a quarter of a vCPU and 96 MB of memory.
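The scale-down rule can be pictured with a small Python sketch. It is only an illustration of the 10-minute idea above, not the autoscaler's actual code: the function nodes_to_remove, the empty_since mapping, and the min_nodes floor are hypothetical names introduced here.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Idle period quoted in the text before an unused node may be removed.
UNNEEDED_THRESHOLD = timedelta(minutes=10)


def nodes_to_remove(empty_since: Dict[str, Optional[datetime]],
                    now: datetime,
                    min_nodes: int = 1,
                    threshold: timedelta = UNNEEDED_THRESHOLD) -> List[str]:
    """Return nodes that have been empty longer than the threshold.

    empty_since maps a node name to the moment it stopped running pods,
    or to None when the node still runs pods.
    """
    # Longest-idle nodes come first.
    candidates = sorted(
        (name for name, since in empty_since.items()
         if since is not None and now - since > threshold),
        key=lambda name: empty_since[name],
    )
    # Never shrink the cluster below the configured minimum size.
    allowed = max(0, len(empty_since) - min_nodes)
    return candidates[:allowed]
```

A loop that calls this function every 10 seconds with fresh data would mimic the periodic check; in a real cluster the decision also depends on whether the remaining pods can be rescheduled elsewhere.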
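Because CA reasons about requested resources rather than measured usage, a rough capacity estimate needs only the request values and the node size. The Python sketch below uses the request from the text (a quarter of a vCPU and 96 MB of memory); the node size of 2 vCPU and 4096 MB, and the names Resources, replicas_per_node, and nodes_needed, are assumptions made for illustration. It ignores system pods and mixed workloads.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int  # 1000 millicores = 1 vCPU
    memory_mb: int


def replicas_per_node(node: Resources, pod_request: Resources) -> int:
    """How many identical pods fit on one node, judged by requests only."""
    if pod_request.cpu_millicores <= 0 or pod_request.memory_mb <= 0:
        raise ValueError("pod requests must be positive")
    by_cpu = node.cpu_millicores // pod_request.cpu_millicores
    by_memory = node.memory_mb // pod_request.memory_mb
    # The scarcer resource decides how many pods fit.
    return min(by_cpu, by_memory)


def nodes_needed(replicas: int, node: Resources, pod_request: Resources) -> int:
    """Number of nodes required to place the given number of replicas."""
    per_node = replicas_per_node(node, pod_request)
    if per_node == 0:
        raise ValueError("a single pod does not fit on this node size")
    return math.ceil(replicas / per_node)


# Request from the text: a quarter of a vCPU and 96 MB of memory.
POD_REQUEST = Resources(cpu_millicores=250, memory_mb=96)
# Hypothetical node size, chosen only for this example.
NODE = Resources(cpu_millicores=2000, memory_mb=4096)

if __name__ == "__main__":
    print(replicas_per_node(NODE, POD_REQUEST))
    print(nodes_needed(9, NODE, POD_REQUEST))
```

On this hypothetical node, CPU is the limiting resource, so 8 replicas fit per node and the 9 replicas from the manual scaling example would need a second node even if the pods were mostly idle.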
On-demand fast scaling
CA and HPA work great, but it is impossible to add a new node to the cluster in just a few seconds. There is no SLA for how long it takes AKS to increase the cluster size. So what can we do in these situations to handle a load peak on our application?
Imagine that you have a website that sells food, and on Friday afternoon you have a peak in the number of clients who want to order. If your application is slow for a few minutes, the clients will use another system to order their dinner.
In a standard Kubernetes deployment, you would not be able to do much if nodes are not physically available. Inside AWS or Azure, systems like CA can increase the number of nodes automatically, but there is still a latency of a few minutes during which you can lose business.
For these situations, Microsoft gives us the ability to extend our cluster into Azure Container Instances (ACI). ACI is a serverless container service inside Azure that can host and run our microservices. By integrating ACI with AKS, we can scale out our cluster into ACI in just a few seconds.
This is possible because of the way ACI is integrated with AKS. ACI is seen as additional nodes of the AKS cluster. From the Kubernetes perspective, it is just a pool of nodes like any other nodes of the cluster.
Behind the scenes, ACI is mapped as nodes that can run replicas of the pods. The virtual nodes are added inside a subnet that is securely connected to AKS. Even though the computation runs inside ACI, there is a sandbox that guarantees that our pods are isolated from other users.
Scaling inside AKS is simple but at the same time complex. We have all the tools necessary to configure autoscaling, but the trick, as usual, is knowing how to configure it to respond to your specific needs. Even if the integration with ACI is only at the beginning, the base concept is powerful and full of potential for high loads. HPA works as it would on an on-premises cluster, allowing us to use the same configuration regardless of whether we are on-premises or inside Azure.
CA is useful when we need autoscaling, but it can be tricky to configure when the default behavior is not enough for us. The good part is that it is not very common to need custom autoscaling configuration at node level.