The socio-technical aspects of microservices: learning to run

June 1, 2021

This article is part of the series “the socio-technical aspects of microservices”.

When you switch to a microservice architecture, many aspects of your day-to-day experience will change. One of the things that will change is the way you run your software.

Ops teams in monolithic systems

It is quite common for a monolithic system to have a dedicated ops team. The ops team is responsible for keeping the system up and running. This includes not only the application, but also things like the database, load balancers and network configuration.

In such an environment, developers are responsible for only one thing: evolving the application.

In a monolithic system, this approach works reasonably well. But once you decompose your monolith into microservices, it stops working.

That’s because there will be way too many deployments for a single ops team to handle. Remember, the entire point of microservices is to allow teams to independently deploy their changes. When teams can deploy their changes without waiting for their changes to be coordinated, the frequency of deployments will increase.

Furthermore, the lack of coordination makes it more likely for teams to make different technology choices. Over time, the number of frameworks, databases and languages will grow quite a bit. The more things diverge, the more difficult it will be for a single ops team to master them all.

So, we can no longer rely on a single ops team in a microservice architecture. But who else will be responsible for keeping the system running?

You build it, you run it

The answer that got popular through the DevOps movement is: “the developers”. This sentiment is reflected in the guideline “you build it, you run it”. This particular approach comes with one huge benefit:

When developers are responsible for running their own services, they can funnel that experience back into the design of the software.

You see, running a piece of software on your own computer is quite different from running it in production. What’s missing is the actual production traffic.

A popular web service will have to handle hundreds, if not thousands, of requests per second, many of them concurrently. To cope with such a volume of requests, web services use techniques like replication, caching and weak isolation levels, just to name a few.

These techniques are powerful, but they also have some surprising properties. If you’ve ever experienced the thundering herd problem, queues that grew unbounded or API throttling, you know exactly what I’m talking about.
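To make one of these problems concrete, here is a minimal sketch of how a queue can be kept from growing unbounded. It uses Python's standard `queue.Queue` with a `maxsize`. The capacity `MAX_PENDING` and the `submit` helper are hypothetical names chosen for illustration, not something prescribed by this article.

```python
import queue

# Hypothetical capacity; a real value would come from measuring production load.
MAX_PENDING = 3

pending = queue.Queue(maxsize=MAX_PENDING)


def submit(job):
    """Try to enqueue a job; return False instead of letting the queue grow."""
    try:
        pending.put_nowait(job)
        return True
    except queue.Full:
        # Rejecting work here keeps memory use and waiting time predictable.
        return False


# Five jobs arrive, but nothing consumes them yet.
results = [submit(f"job-{i}") for i in range(5)]
print(results)  # [True, True, True, False, False]
```

Whether rejecting, delaying or dropping work is the right reaction depends on the service. The point is only that the limit is an explicit design decision rather than an accident.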

Once you’ve experienced such problems first-hand, you’ll consider them in your designs. You’ll make sure that you’re prepared in the future. As a result, your services become more robust.

This is a huge benefit!

And it’s all just because you’re increasing the flow of information: when the same group of people designs and operates a piece of software, a reinforcing feedback loop emerges.

A feedback loop between designing and running a service

First, you must learn how to run (a service)

Unfortunately, you won’t be able to benefit from this feedback loop immediately.

When you’re just starting your migration to microservices, there’s a lot of new stuff to learn. Learning how to operate an entire system is no easy feat. Your developers will have to acquire an entirely new skillset.

But don’t despair. There are ways to learn about the operational aspects of microservices. I’ve found that five techniques work particularly well.

Instrument your services: Decide what should be measured about your service. Common metrics are availability, reliability and response times.
Set up a dashboard: Make it possible to monitor the most important metrics at a single glance. Deciding what should be on that dashboard will help you to prioritize.
Configure alarms: Set up automated alerts that will inform you about important changes in your metrics. This is a great exercise to define what a healthy system looks like.
Use feature toggles: When you put most of your changes behind feature toggles, you make it really easy to recover from mistakes. This facilitates experimentation.
Create a pre-deployment checklist: Over time you’ll learn not only what constitutes a healthy system, but also what to look out for when making changes to the system. If you put all that into a checklist, you can help your developers to consider these things when they make changes.
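The first and third techniques can be sketched in a few lines. The example below records request outcomes, derives availability and a 95th-percentile response time, and checks them against thresholds. The `ServiceMetrics` class, the `firing_alarms` function and both threshold values are assumptions made up for this illustration. In practice you would most likely rely on a metrics library and an alerting system instead of hand-rolled code.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceMetrics:
    """Collects per-request outcomes for one service (hypothetical helper)."""

    latencies_ms: list = field(default_factory=list)
    failures: int = 0

    def record(self, latency_ms: float, ok: bool) -> None:
        self.latencies_ms.append(latency_ms)
        if not ok:
            self.failures += 1

    def availability(self) -> float:
        total = len(self.latencies_ms)
        return 1.0 if total == 0 else (total - self.failures) / total

    def p95_latency_ms(self) -> float:
        # Nearest-rank percentile, computed with integer arithmetic.
        ordered = sorted(self.latencies_ms)
        if not ordered:
            return 0.0
        rank = (95 * len(ordered) + 99) // 100
        return ordered[rank - 1]


# Assumed definition of "healthy"; choosing these numbers is the real exercise.
MIN_AVAILABILITY = 0.99
MAX_P95_MS = 500.0


def firing_alarms(metrics: ServiceMetrics) -> list:
    """Return the names of all alarms whose threshold is currently violated."""
    alarms = []
    if metrics.availability() < MIN_AVAILABILITY:
        alarms.append("availability")
    if metrics.p95_latency_ms() > MAX_P95_MS:
        alarms.append("latency")
    return alarms


metrics = ServiceMetrics()
for _ in range(19):
    metrics.record(latency_ms=100.0, ok=True)
metrics.record(latency_ms=900.0, ok=False)  # one slow, failed request

print(firing_alarms(metrics))  # ['availability']
```

With 19 of 20 requests succeeding, availability is 95% and the availability alarm fires. The single slow request does not move the 95th percentile, so the latency alarm stays quiet.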

When you use these techniques, you will make it easier for your developers to learn about the operational aspects of your services. As a result, your systems will become more robust.
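To make the feature-toggle technique from the list above concrete, here is a minimal sketch that reads a toggle from an environment variable. The `FEATURE_` prefix, the `new_pricing` flag and both pricing functions are hypothetical and exist only for this example. Many teams use a dedicated toggle service instead.

```python
import os


def is_enabled(flag: str, env=os.environ) -> bool:
    """Read a toggle from a variable such as FEATURE_NEW_PRICING (assumed naming)."""
    value = env.get(f"FEATURE_{flag.upper()}", "off")
    return value.strip().lower() in {"1", "true", "on"}


def legacy_price(amount_cents: int) -> int:
    return amount_cents


def new_price(amount_cents: int) -> int:
    # Hypothetical new behaviour: a 10% discount.
    return amount_cents - amount_cents // 10


def price(amount_cents: int, env=os.environ) -> int:
    # The new code path only runs while the toggle is on, so recovering
    # from a mistake means flipping the toggle, not redeploying.
    if is_enabled("new_pricing", env):
        return new_price(amount_cents)
    return legacy_price(amount_cents)


print(price(1000, {"FEATURE_NEW_PRICING": "on"}))  # 900
print(price(1000, {}))                             # 1000
```

Passing the environment as a parameter keeps the sketch easy to test. In a running service the default `os.environ` would be used.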

This leaves us with only two remaining articles about the socio-technical aspects of microservices. So check out the next article about the importance of building relationships with the teams that depend on your service.