12. April 2021 By Stephan Wies
Service mesh – just another fad to add to the collection of buzzwords?
Microservices are being used in more and more new software systems and to replace outdated architectures. This approach results in loosely coupled modules and significantly improves the speed and flexibility with which software is delivered. The flip side, however, is that the amount of communication and the operational complexity increase considerably. Cross-cutting functions such as monitoring or resilience have to be addressed in every single service. Tasks that sit with operations in monolithic architectures are migrating to the development teams, which in turn leaves those teams less time to devote to the actual business functionality.
One solution to this problem may be frameworks that take on these tasks within a microservice. A prominent example from the Java world is Spring Cloud. The framework provides functions for recurring problems such as service discovery or circuit breaking. However, as the framework itself is implemented in Java, you lose the technology independence that is one of the advantages of microservices: if another technology is a better fit for a specific problem, you are still tied to the framework or have to adopt an additional one, if a suitable one even exists.
This ongoing trend towards microservices and the drawbacks of technology-dependent frameworks have led to the emergence of what are known as service meshes, a promising approach to mitigate the aforementioned problems by introducing a dedicated infrastructure layer that handles these tasks.
What is a service mesh?
A service mesh consists of two important architectural components, a data plane and a control plane. Figure 1 below shows the structure in more detail and provides a comparison with direct communication without a service mesh.
The data plane consists of a number of service proxies, one deployed alongside each functional service. This is also known as the sidecar pattern: functions that every service needs are extracted into an additional container (the sidecar).
The service proxies are configured via the second layer, the control plane.
Any change the developer makes to the configured behaviour of the service mesh is applied to the control plane and automatically distributed to the service proxies. The control plane's other task is processing the telemetry data that the service proxies collect and forward to it.
Service mesh functions
As mentioned in the introduction, monitoring, resilience and routeing are of particular relevance in a distributed microservice architecture. The extent to which a service mesh can help here is described in more detail below.
Every microservices-based system should have a central monitoring system that collects information from all of the services and makes it available in one place. Based on the collected data, alerting can be implemented, for example, to automatically draw attention to problems when errors occur. The service proxies within the data plane of a mesh measure basic information such as latency and throughput, as well as protocol-specific data. When communicating via HTTP, for example, the status codes can be processed and analysed in order to derive error rates from them.
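To make this concrete, here is a minimal Python sketch of the kind of aggregation a proxy performs: deriving an error rate from observed HTTP status codes. The function and its inputs are purely illustrative assumptions; in a real mesh, a proxy such as Envoy records these metrics internally and exposes them to the monitoring system.

```python
from collections import Counter

def error_rate(status_codes):
    """Share of responses with a 5xx (server error) status code."""
    counts = Counter(code // 100 for code in status_codes)  # bucket by status class
    total = sum(counts.values())
    return counts[5] / total if total else 0.0

# e.g. nine successful responses and one server error
codes = [200] * 9 + [503]
print(error_rate(codes))  # → 0.1
```

An alerting rule could then simply fire whenever this rate exceeds a threshold over a time window.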
As the service mesh handles this function, the code of the functional services remains untouched and each service delivers the same data regardless of the technology chosen.
However, there are also limitations to this. For the mesh, the actual service is a black box. It can provide metrics regarding incoming and outgoing requests, but internal information, for example about threads or database connections, is only known by the service itself.
In a microservice architecture, resilience is the ability of the overall system to continue functioning even if individual services fail or become unavailable – or at least to deal with the problem in a structured way. If an architecture is designed around asynchronous communication patterns, for example exchanging data via messaging, the messaging platform handles large parts of this task: it serves as a buffer between the services and holds messages until a target service becomes available again.
If communication is synchronous, for example via REST APIs over HTTP, what are known as circuit breakers are usually used for this purpose. If a target service is currently under very high load, communication is interrupted for a configured time, after which retries are carried out in defined cycles. As communication in a service mesh always runs via the service proxy – the sidecar – it can take on this task without any changes to the functional service code. In particular, there is no need to adopt a different resilience framework for each technology in use.
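To illustrate the mechanism, here is a minimal circuit breaker sketched in Python – a simplified picture of what the sidecar proxy does on your behalf, not code you would write yourself when using a mesh. The class name and thresholds are assumptions for illustration.

```python
import time

class CircuitBreaker:
    """Opens after a number of consecutive failures and rejects
    calls until a cool-down period has passed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open - request rejected")
            self.opened_at = None  # cool-down over, allow a retry
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

In a mesh, exactly this behaviour is configured declaratively per service via the control plane instead of being implemented in each technology stack.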
In a service mesh, routeing rules – that is, definitions of which services should be addressed based, for example, on HTTP metadata – can be distributed centrally to the service proxies via the control plane. This helps with canary releases, for instance, when there are a large number of services and regular deployment cycles in the production environment, as is the case in agile process models. A newly deployed service or new version initially receives only a fraction of the requests. If the new version proves stable and provides the desired functionality, the traffic is gradually increased until only the new version is in use.
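As an illustration of weighted routeing, here is a small Python sketch of how a proxy might pick a target version per request. The data structures are made up for the example; in a real mesh you declare such weights to the control plane and the proxies enforce them, rather than coding this yourself.

```python
import random

def pick_version(weights, rng=random):
    """Choose a target version for one request according to traffic weights."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions])[0]

# send ~10% of traffic to the canary version "v2"
weights = {"v1": 90, "v2": 10}
hits = sum(pick_version(weights) == "v2" for _ in range(1000))
print(hits)  # roughly 100
```

Rolling the canary out further is then just a matter of updating the weights centrally, for example 90/10, then 50/50, then 0/100.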
Brief market overview
If we take a look at the solutions that can be used to implement a service mesh, two candidates stand out: Istio and Linkerd.
- Linkerd is a project from Buoyant – a company presumably less well known than the two founders’ previous employer, Twitter. Linkerd was the first product to implement a service mesh and thus defined the term.
- Istio was born out of a collaboration between Lyft, Google and IBM and first released in 2017. It is one of the most popular and complete solutions, suitable for companies of any size and any software landscape.
Both frameworks have their advantages and disadvantages. Istio offers a greater range of functions and has a greater number of configuration options, but it requires more resources, which affects performance. Using Linkerd means you are limited to using Kubernetes as an orchestration platform. Istio offers more options in this regard.
Ultimately, a decision must be made depending on the context of the project, the existing system landscape and the functionality that the mesh should cover.
Conclusion
Having reviewed the functions and options a service mesh offers, the question remains whether this is just a new fad or whether it really makes sense to use one.
The added value may well outweigh the initial setup effort in architectures with multiple services, possibly built with different technologies and with a high degree of synchronous communication. A service mesh can also make troubleshooting considerably easier for the team, especially with greater call depths – that is, when several services are involved in a single functional action – and when there are still unsolved problems with regard to monitoring and resilience.
On the other hand, the initial setup clearly requires considerable effort and resources, especially on the part of the staff. Never underestimate the amount of expertise that has to be acquired; a new technology is always a mental hurdle as well. Moreover, if system performance is a decisive factor, using a mesh calls for careful analysis, as the extra hop through the proxies always adds latency.
If you would like to learn more about exciting topics from the world of adesso, then check out our latest blog posts.