
Serverless and microservices confusion

Past and present

Six years ago, I started working on my first microservices projects. It was a migration from a monolithic solution to something different: containers and microservices. Fast forward to 2020, and in ALL the engagements where I am involved, we have containers and something more – serverless.

The challenges that we had to manage six years ago are, in most cases, already solved by mature tools and products that are part of the microservices ecosystem. The natural integration that we have nowadays with cloud vendors (e.g., Microsoft Azure), load balancers, IDEs, and debugging tools enables developers to build microservices solutions as easily as they would write a console application.

The flip side of all of this is the lack of in-depth knowledge of how microservices and serverless actually work. The pressure from stakeholders to deliver fast and stay agile pushes teams to skip over the implications and ignore best practices. For example, I often see production systems where the only tools used to maintain a serverless solution are a text file and a powerful text editor. That was a reasonable approach in 2015, but nowadays it creates confusion and can hide the real cause of issues.

Don’t expect to find solutions here. I want to emphasize what we all forget when we work on a serverless or microservices project. The answers are already available on the market; we just need to use them.

Human body

I always imagine a solution built using microservices and serverless as a human body. The different organs are the containers, each with one primary responsibility and maybe some secondary ones. Organs need to be able to communicate with each other. They might keep working for a while when another organ takes a break, but the dependency between them is high. The interfaces between them are well defined, and our eyes, ears, mouth, or hands are the integration points with external systems. We know how to maintain our body, communicate with it, and debug it. Yet we often ignore all of this when we build complex IT systems based on a serverless and microservices architecture.


Future

The future is here! Microservices and serverless are here to stay. There is no competition between them – the future is a blend of the two, where, in most cases, the piece of code you write can run both ways.
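
To make the "runs both ways" idea concrete, here is a minimal TypeScript sketch (all names are illustrative): the business logic is a plain function with no hosting assumptions, and a thin adapter binds it to whichever host you pick – a long-running container or a serverless trigger.

    import express from 'express';

    // Host-agnostic business logic: a plain, testable function.
    export interface GreetRequest { name: string; }
    export interface GreetResponse { message: string; }

    export function greet(req: GreetRequest): GreetResponse {
      return { message: `Hello, ${req.name}!` };
    }

    // Adapter for the container world: a long-running HTTP server
    // (e.g., deployed as a pod behind a Kubernetes service).
    const app = express();
    app.get('/greet/:name', (req, res) => {
      res.json(greet({ name: req.params.name }));
    });
    app.listen(8080);

    // A serverless adapter would be an equally thin wrapper: the
    // platform's HTTP trigger calling greet() – see the Azure Functions
    // sketch later in this post.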

Think about it: 73% of the companies planning to use one of these two approaches already see them as extremely beneficial for building applications. Almost 65% of companies are already building solutions using a microservices or serverless approach.

As you can see, the future is here. There is high demand from the market and we have the right tools, yet we ignore the basic principles we used to respect 5-10 years ago.

I found the "State of Microservices 2020" report by The Software House inspiring. Using the figures they provide, I could easily identify the things that teams forget to do, to take into account, and to tackle. In many cases, these can make the difference between the success and the failure of a project.


Technical experience

It is not hard to find people with 10 or 15 years in software development. Even so, when you look for people with microservices experience, around 50% of them have less than one year. If you ask for serverless experience, the share of inexperienced people is even higher. This is one of the root causes why serverless and microservices solutions don't work as expected, and why running, operating, and enhancing them ends up expensive and hard.

Orchestrating such solutions cannot be reduced to the simplicity of a single function. The right tools and processes are required, and the lack of experienced people (less than 7.5% have more than five years of experience) makes it hard to do things right from the first iteration.


Ways of working

Almost every time, a team that starts working with serverless and microservices is excited and happy during the first sprints. Then frustration accumulates, and before going into production, more than 58% of teams are unhappy with how debugging and maintenance are done.

This happens because cloud vendors like Azure offer scalable environments, while cloud governance is not always done as it should be. Teams are allowed to increase the number of cores or the consumption by 40% without providing too many explanations. This can hide performance or implementation issues.


Yes, serverless and microservices improve the efficiency of work and teamwork. However, performance issues can be easily hidden, and defining stable contracts between components is not as easy as we think.
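
To show what a "stable contract" can look like in practice, here is a minimal TypeScript sketch (the OrderPlaced name and its fields are hypothetical): a versioned, shared type that both producer and consumer compile against, validated at the boundary instead of trusted blindly.

    // A shared, versioned contract; breaking changes get a new,
    // parallel type (V2) rather than a silent mutation of V1.
    export interface OrderPlacedV1 {
      contractVersion: 1;
      orderId: string;
      placedAtUtc: string;   // ISO-8601; pin formats in the contract itself
      totalAmount: number;   // document the unit explicitly
      currency: string;      // e.g., "EUR"
    }

    // Consumers validate at the boundary rather than trusting the payload.
    export function isOrderPlacedV1(x: unknown): x is OrderPlacedV1 {
      const o = x as Partial<OrderPlacedV1>;
      return o?.contractVersion === 1
        && typeof o.orderId === 'string'
        && typeof o.placedAtUtc === 'string'
        && typeof o.totalAmount === 'number'
        && typeof o.currency === 'string';
    }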


Payloads

More than 65% of the solutions that use containers are already running in the cloud. Less than 35% of current solutions run on their own infrastructure. Migrating to the cloud should be straightforward as long as the technologies used in the on-premises systems can also be found in the cloud.

From the start, most hybrid and multi-cloud solutions are built around containers and serverless. Even if AWS Lambda is preferred in 62% of cloud solutions, Azure Functions have a big advantage because they can run seamlessly almost anywhere (e.g., on an on-premises Kubernetes cluster).
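
As an illustration, here is a minimal HTTP-triggered Azure Function, sketched against the v4 Node.js programming model (the function name and route are illustrative). Packaged in a container, the same code can run on an on-premises Kubernetes cluster.

    import { app, HttpRequest, HttpResponseInit, InvocationContext } from '@azure/functions';

    app.http('greet', {
      methods: ['GET'],
      authLevel: 'anonymous',
      handler: async (req: HttpRequest, _ctx: InvocationContext): Promise<HttpResponseInit> => {
        // Same shape as the host-agnostic handler sketched earlier.
        const name = req.query.get('name') ?? 'world';
        return { status: 200, jsonBody: { message: `Hello, ${name}!` } };
      },
    });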

Each cloud provider offers a specific model for building such solutions (e.g., the AWS Serverless Application Model). Still, less than 26% of projects follow the best practices and recommendations when building solutions. The rest are either not aware of them or do not consider them. This is one of the leading causes of additional costs and delays.

If we add on top of this the pressure to build multi-stack and multi-cloud solutions, together with the false expectation that teams should be faster because the 'components' (functions) are simpler, we get the perfect recipe for disaster.


System communication

More than 66% of current systems have internal communication between services, and around 57% communicate with a static frontend. Unfortunately, SSR frontends are not common (less than 27%) when we use serverless and microservices, which can create friction between the backend and frontend teams.

You might say that we should already have standard ways of communicating inside such solutions – the current numbers say something different. Even if 76% of projects use direct communication over HTTP(S) or gRPC, around 43% of solutions also use events. If we put that 43% side by side with the fact that around 63% of projects use message brokers, we realize that technical teams don't clearly distinguish between an event and a message. This can easily degenerate and create communication and performance issues when you use a message-based system to send a high number of events, or when you expect the persistence of events to be the same as for messages.
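
The distinction is easier to see in code. Below is a small TypeScript sketch (the types and field names are hypothetical, not a real broker SDK) that captures the different expectations: events are facts with fan-out semantics, while messages are commands with exactly-one-owner, persist-until-acknowledged semantics.

    // An EVENT is a fact that already happened. The producer does not know
    // or care who reacts; zero, one, or many subscribers may consume it.
    export interface DomainEvent {
      type: string;            // e.g., "order.placed"
      occurredAtUtc: string;   // when the fact happened
      payload: unknown;
    }

    // A MESSAGE (command) is addressed to one logical consumer, which is
    // expected to process it; persistence, retries, and acknowledgements
    // matter here.
    export interface CommandMessage {
      command: string;         // e.g., "charge-payment"
      targetService: string;   // exactly one owner processes it
      payload: unknown;
      deliveryCount?: number;  // brokers track redelivery until acknowledged
    }

Pushing a high-volume event stream through a persistent command queue, or publishing commands fire-and-forget, mixes these guarantees and is exactly where the communication and performance issues above come from.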


Debugging and monitoring

Even though we are in 2020, and around 85% of solutions use logs, less than 34% of them use metrics and tracing information during investigations. This is scary, because that information is crucial during investigations, and the lack of it can hide the root cause of problems. I don't want to imagine how you debug a solution with 30 functions and 20 microservices using only log files. It is simply not possible; understanding the context in which an issue appeared is very hard when you don't have metrics (e.g., I/O metrics).

The future needs to bring us fault tolerance and smart prediction of system failures. That can be done only if we collect all the information related to the application: logs, traces, metrics, and health checks. There are already plenty of tools on the market that can be used:

  • Audit events can be consumed by Falco for anomaly detection
  • Monitoring can be achieved easily using Prometheus and Grafana, which can collect and analyze health checks, metrics, counters, and logs (see the metrics sketch after this list)
  • Debugging can be done directly from the IDE (e.g., Visual Studio) or a shell, and ephemeral containers can offer an isolated sandbox where we can analyze the current status, init, logs, and container and pod health.
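
As a small example of going beyond log files, the sketch below exposes Prometheus metrics from a Node.js service using the prom-client library (the route and metric names are illustrative). Prometheus scrapes the /metrics endpoint, and a Grafana dashboard charts the result.

    import express from 'express';
    import { Histogram, collectDefaultMetrics, register } from 'prom-client';

    collectDefaultMetrics(); // process metrics: CPU, memory, event-loop lag, ...

    const httpDuration = new Histogram({
      name: 'http_request_duration_seconds',
      help: 'HTTP request latency',
      labelNames: ['route', 'status'],
    });

    const app = express();
    app.get('/orders', (_req, res) => {
      const stop = httpDuration.startTimer({ route: '/orders' });
      res.json([]); // ...real handler work...
      stop({ status: String(res.statusCode) });
    });

    // Prometheus scrapes this endpoint; Grafana dashboards read Prometheus.
    app.get('/metrics', async (_req, res) => {
      res.set('Content-Type', register.contentType);
      res.end(await register.metrics());
    });

    app.listen(8080);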

When you have issues with a container, you should always check the following (a sketch that automates some of these checks follows the list):

  • Does the service exist?
  • Did I run a service test?
  • Do I have the right binding between services and pods?
  • Do I have the right port mapping between services and pods?
  • Do I have the right labels on pods?
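
Some of these checks can be automated. Below is a rough TypeScript sketch using the @kubernetes/client-node library (assuming its pre-1.0 API; the service name and namespace are placeholders) that verifies the service exists, that its selector actually matches pods, and prints the port mapping.

    import * as k8s from '@kubernetes/client-node';

    async function checkService(name: string, namespace: string): Promise<void> {
      const kc = new k8s.KubeConfig();
      kc.loadFromDefault();
      const core = kc.makeApiClient(k8s.CoreV1Api);

      // 1. Does the service exist? (throws if not found)
      const svc = (await core.readNamespacedService(name, namespace)).body;
      const selector = svc.spec?.selector ?? {};

      // 2. Do any pods carry the labels the service selects on?
      const labelSelector = Object.entries(selector)
        .map(([key, value]) => `${key}=${value}`)
        .join(',');
      const pods = (await core.listNamespacedPod(
        namespace, undefined, undefined, undefined, undefined, labelSelector,
      )).body.items;
      if (pods.length === 0) {
        console.warn(`No pods match selector '${labelSelector}'`);
      }

      // 3. Inspect the port mapping (targetPort must match a containerPort).
      for (const port of svc.spec?.ports ?? []) {
        console.log(`service port ${port.port} -> targetPort ${String(port.targetPort)}`);
      }
    }

    checkService('orders', 'default').catch(console.error);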



Micro-frontends

Splitting a solution into multiple functions and microservices makes it easy for people to throw responsibility from one team to another. At this moment, less than 24% of teams use micro-frontends, even though, from a ways-of-working perspective, they might represent the future of building such solutions.

In general, we have a team for each domain function, managing one or two services. Besides them, we have an aggregation layer and a frontend team that, besides building the frontend, needs to align with all the backend teams and understand the data flow and how to consume the exposed functionality.

When we go with a micro-frontend approach, the team's responsibility covers all the layers, from data to backend and frontend. Communication inside the team is easier, and the full functionality can be covered, ensuring the success of the team and, in the end, of the project.


Final thoughts

The future is here. More than that, serverless and microservices will become the standard way to build not only complex systems; they will become the industry standard for backend development.

We need to understand the full landscape of serverless and microservices, including tooling, debugging methodology, and ways of working. The classical way of building software might still work with the new architectural approach, but it limits what we can achieve and the way we build, debug, and operate such solutions.
