Key Takeaways
💡 Microservices shouldn’t be the default for every project
💡 Microservices require detailed inter-service communication patterns
💡 Microservices require a change to the testing mindset
💡 Microservices require a change to environment setup
💡 Microservices require a coordinated observability infrastructure
Background
Microservices are quite often sold as a silver bullet for breaking apart a monolith application, with the expectation of a massive increase in developer output. Features like isolated scalability for high-traffic code, clear domain boundaries, and frequent independent release cycles have made microservices highly appealing. The ability to adapt to specific problems by choosing the right frameworks and programming languages has further fueled adoption. Over the past decade, these advantages have won over the software industry, making microservices a dominant architectural choice. These features are great, but they are not free. Implementing them well takes time, patience and skill. In this blog, I will talk through some of the main benefits of microservices and the technical costs you must be willing to pay to implement them well.
As you read this blog, keep asking whether each problem is one you have or foresee having. If it isn’t, ask: “Do I really need this?”
What I Mean by Microservices
When I talk about microservices, I am talking about reasonably small code bases providing one or more units of functionality. I do not define services in terms of lines of code, domain boundaries or any other hard-nosed limitations. My opinion is that if you require your microservices to be no more than 200 lines of code, you are right, and if you require your microservices to be no more than 5000 lines of code, you are also right. If you require your services to provide only a single API, you are right, and if you bundle multiple APIs in a single interface, you are right too. One person’s bookshelf is another person’s firewood. The point is, whatever makes the most sense for you is correct. The key metric you should aim to fine-tune is how long it takes a developer to understand what’s going on. I would aim for around the three-hour mark, and certainly not days. With a basic grasp of a microservice definition, we can now think about when we should move to microservices.
When to Think About Moving to Microservices
I’ll start by introducing key indicators that should prompt you to consider microservices. Note that I’ve said move to microservices, as opposed to start with microservices. It is unlikely that you will be reaching these key indicator thresholds when you begin a new project or product. I would go so far as to say you should never start a new project as totally independent microservices. By independent, I mean having their own repositories, build pipelines, and so on. If you know you will eventually split into independent microservices, a monorepo is a good place to start. As we will find out, microservices are not free, so if you’re not sure, defer the decision for now and start with a modular application.
⚡ Never start with microservices
⚡ Consider mono-repos or modular applications as a starting point
⚡ Remember that your monolith got you this far
The 4 Hidden Costs
Interservice Communication
Scalability is usually the first ingredient that gets developers and architects reaching for the microservices recipe book. It comes in two flavours: technical scalability and team scalability. Technical scalability handles the heat when the system load cranks up, while team scalability deals with the growing number of engineers stirring the pot. Because let’s face it—too many cooks can definitely spoil the broth.
High CPU or RAM usage metrics often signal technical scalability issues caused by resource-intensive code components. These “hot spots” are prime candidates for extraction from the monolith into standalone microservices. Once isolated, these services can be optimised with tailored CPU and RAM configurations or designed with scale-out capabilities to efficiently handle increased demand.
To avoid the chaos of too many engineers crowding around the same codebase, we break them into smaller teams and hand each group their own code repository—kind of like giving each set of cooks their own kitchen. The industry’s favorite rule of thumb? The “two-pizza team”: a team small enough to be fully fuelled by just two pizzas. This keeps the blast radius (aka, the code chaos zone) and cognitive load nice and small. Fewer moving parts get tangled up in code changes, and testing stays as neatly scoped as a well-sliced pizza.
⚡⚡⚡⚡⚡
But here’s the catch—splitting your microservices from your monolith creates a gap that you now have to bridge. And guess what? That’s the first big challenge you’ll face when diving into microservices. So, the real question is: How do you want these services to talk to each other? Do you go for synchronous communication, where they chat in real time, or asynchronous messaging, where they drop notes and process them later? Both options come with their own perks and pitfalls, so choose wisely—because this decision will shape how your system scales, performs, and handles failures!
🏁🏁🏁🏁🏁
Synchronous communication in microservices is like a well-organized conversation—one service speaks, the other listens, and everyone knows what’s going on. It’s simple and predictable, making it easy to implement, debug, and maintain without playing detective. When something goes wrong, the error handling is instant and obvious—no need to wonder if a message got lost in the void. Plus, it runs on battle-tested protocols like HTTP, gRPC, and GraphQL, meaning you get tons of built-in tools for logging, tracing, and monitoring. And let’s not forget strong data consistency—because when services talk one at a time, there’s no “he said, she said” with outdated data. If you need real-time responses, transactional accuracy, and fewer headaches, synchronous communication is your best friend!
🚨🚨🚨🚨🚨🚨
Synchronous communication might sound great—real-time responses, instant feedback—but it comes with some serious baggage. First off, it’s like a needy friend—every request has to wait for a response, which can slow things down if services start depending on each other too much. And let’s not forget scalability nightmares—if one service gets overwhelmed, it can choke the whole system, like a traffic jam caused by one slow driver. Then there’s the failure domino effect—if a critical service crashes, everything that depends on it goes down with it unless you build in retries, fallbacks, or circuit breakers. And oh, the open connections! Keeping all those services constantly talking can drain resources fast. So while synchronous communication might feel comfortable, it’s not always the best choice for high-scale, resilient microservices.
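To make those mitigations concrete, here is a minimal sketch of the retry-with-backoff pattern mentioned above. The flaky downstream call is a stand-in function, purely for illustration—in a real system it would be an HTTP or gRPC call with a strict timeout, so a slow service fails fast instead of holding connections open.

```python
import time

def call_with_retry(fn, retries=3, backoff=0.05):
    """Run fn(); on failure, retry with exponential backoff.

    After the final attempt, re-raise so the caller can fall back
    or trip a circuit breaker instead of hanging forever.
    """
    for attempt in range(retries):
        try:
            return fn()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # retries exhausted: fail fast
            time.sleep(backoff * (2 ** attempt))  # back off before retrying

# Hypothetical flaky downstream call: fails twice, then succeeds.
attempts = {"n": 0}

def flaky_downstream_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("downstream timed out")
    return "200 OK"
```

A caller would then use `call_with_retry(flaky_downstream_call)` and get a response on the third attempt; with a real client you would pair this with a per-request timeout and a circuit breaker library rather than retrying blindly.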
🏁🏁🏁🏁🏁
Asynchronous communication allows microservices to operate independently, enhancing scalability and resilience by avoiding blocking requests. Messages are queued and processed when resources are available, making it ideal for high-throughput workloads and fault tolerance.
🚨🚨🚨🚨🚨🚨
However, this flexibility introduces added complexity—debugging is harder, errors are delayed, and ensuring eventual consistency requires careful design. To handle failed messages, dead letter queues (DLQs) are often used to capture unprocessed messages for troubleshooting and retry logic. Building a reliable and scalable application requires additional infrastructure and robust retry mechanisms. As Werner Vogels famously puts it, “everything fails, all the time”—so designing for failure isn’t optional, it’s essential.
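The DLQ pattern can be sketched with nothing but an in-memory queue. The message shape and handler are hypothetical; in production the queues would be a broker such as SQS or RabbitMQ, but the flow is the same: retry a bounded number of times, then park the message for inspection.

```python
from queue import Queue

MAX_ATTEMPTS = 3

def process(message):
    """Hypothetical handler -- rejects malformed payloads."""
    if "order_id" not in message:
        raise ValueError("malformed message")
    return f"processed order {message['order_id']}"

def consume(main_queue, dead_letters):
    """Drain the queue; park messages that keep failing in a DLQ."""
    results = []
    while not main_queue.empty():
        message = main_queue.get()
        try:
            results.append(process(message))
        except ValueError:
            message["attempts"] = message.get("attempts", 0) + 1
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letters.put(message)   # capture for troubleshooting
            else:
                main_queue.put(message)     # requeue for another attempt
    return results
```

A good message, such as `{"order_id": 1}`, is processed normally, while a malformed one cycles through three attempts and lands in the dead letter queue, where an operator (or automated retry logic) can deal with it later.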
⚡ Scalability is usually the first driving force behind microservices
⚡ Inter-service communication is a challenge, and you should choose your approach wisely
Testing Mindset
Developers working with monoliths typically follow a straightforward development workflow—spin up the application, connect it to a local database, and test it in an environment that closely resembles production. The assumption is that the entire system is available and can be tested as a whole. However, with microservices, this process becomes far more complex. Running a single service in isolation often requires multiple dependencies—databases, APIs, message queues, and other infrastructure—just to get a working development setup. As the system grows, so does the complexity of managing and testing it. This brings us to the second major challenge of microservices: rethinking your approach to testing.
Shifting a developer’s mindset is no easy task, especially when it comes to testing. There’s often a deep-rooted fear of deploying code that hasn’t been tested locally—after all, seeing it work firsthand provides a sense of security. However, embracing a test-first approach changes the game. It forces you to understand your code more deeply, anticipate edge cases, and build with confidence. Over time, deployment becomes less of a nerve-wracking gamble and more of a routine step—a natural outcome of a well-tested system rather than a risky leap of faith.
When I talk about a test-first approach, I don’t necessarily mean Test-Driven Development (TDD). Not because I disagree with it, but because the order in which you write your code and tests isn’t as important as the end goal—a well-tested, reliable application. The ultimate benchmark? The ability to deploy to production on a Friday at 5 PM and walk away stress-free. In fact, I’ve been so confident in my testing on some projects that I’ve routinely done live deployments during tech talks without breaking a sweat.
So, how do you reach this level of confidence?
Your instinct might be to start with unit tests—and you wouldn’t be wrong—but I’d suggest taking it a step further: pair with your testers. They follow a structured playbook that likely includes edge cases you’d never even consider. Think of invalid inputs, unexpected response codes, service outages, and type mismatches in APIs. Starting with these edge cases doesn’t just improve test coverage—it also pulls testers in earlier, shortens feedback loops, and frees them up for more exploratory testing.
Once you’ve built out these edge cases, you’ll naturally start uncovering other critical scenarios you hadn’t thought of before. To keep your tests reliable and controlled, use mocks at service boundaries—this lets you simulate external dependencies like databases, APIs, and cloud services without relying on them during testing. Thankfully, plenty of libraries exist to support this, making it easier than ever to create a test-first culture that prioritizes reliability and confidence over rigid development order.
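As a small illustration of mocking at a service boundary, here is a sketch using Python’s standard `unittest.mock`. The service function and its HTTP client are hypothetical; the point is that the tests exercise a tester-style edge case (a 404 from a dependency) without any real network calls.

```python
from unittest.mock import Mock

def get_user_display_name(user_id, http_client):
    """Hypothetical service logic: look up a user via a boundary client."""
    resp = http_client.get(f"/users/{user_id}")
    if resp.status_code == 404:
        return "unknown user"  # an edge case a tester would probe first
    return resp.json()["name"]

def test_user_not_found():
    # The boundary client is a mock -- the dependency never needs to run.
    client = Mock()
    client.get.return_value = Mock(status_code=404)
    assert get_user_display_name(99, client) == "unknown user"

def test_user_found():
    ok = Mock(status_code=200)
    ok.json.return_value = {"name": "Ada"}
    client = Mock(get=Mock(return_value=ok))
    assert get_user_display_name(1, client) == "Ada"
```

Because the mock stands in for the whole dependency, the same shape of test covers outages, bad status codes, and malformed payloads just by changing the stubbed response.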
Testing goes beyond just writing tests—it extends into the deployment environments, which I’ll dive into shortly.
⚡ Testing mindset needs a reset
⚡ Get comfortable not running your application
⚡ Pair with your tester
Deployment Environments
If you’re considering microservices, chances are you already have a mature deployment pipeline, typically with environments like QA, Staging, and Production. At first glance, this setup seems straightforward. But now, imagine 10 teams constantly deploying new features to the same QA environment—suddenly, it’s a chaotic, unstable mess where test results become unpredictable and unreliable. In this scenario, would you really feel confident promoting a change to Staging or Production, knowing the environment isn’t fully controlled? Probably not. Managing deployments across multiple teams requires a smarter strategy to maintain stability and trust in your releases.
A better approach is to provide isolated lower environments to teams and only deploy the full suite of microservices in the Staging and Production environments. So what happens with the external services that a team’s services call? Well, just like in our unit tests, we mock them out. Using this approach, each team can control the expected responses from boundary services. There are many tools to implement these mocks. Open source software such as WireMock provides easy ways to create configuration-based mocks without too much setup. Cloud providers like AWS also offer services such as Elastic Load Balancers and API Gateway, which provide out-of-the-box support for mocking APIs, or alternatively, you can bring your own HTTP(S) server. You can couple this approach with contract-based testing such as Pact, which gives you confidence in cross-domain service communication.
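To give a feel for how little setup this takes, here is a sketch of a WireMock stub mapping. The endpoint and payload are hypothetical; dropped into WireMock’s `mappings/` directory, a file like this makes the mock server answer that route with a canned response, so a team’s environment no longer depends on the real payments service.

```json
{
  "request": {
    "method": "GET",
    "url": "/api/payments/123"
  },
  "response": {
    "status": 200,
    "headers": { "Content-Type": "application/json" },
    "jsonBody": { "paymentId": "123", "status": "SETTLED" }
  }
}
```

Because the stub is plain configuration, each team can version these files alongside their code and tweak boundary responses per test scenario without touching any other team’s services.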
With the additional confidence that comes from this control over testing, you may even find yourself spending more time testing in Production and less time in your pre-production Staging environment. This might even lead you to remove the Staging environment altogether, saving significant cost—though that is a trade-off you will need to weigh for yourself.
⚡ Each team should have an isolated lower environment
⚡ Lower environments should be partial environments
⚡ Mock out the boundary services
Fragmented Observability
Once your distributed microservices application is up and running, issues are inevitable—just as Werner Vogels wisely reminds us! In a distributed system, each team often implements its own approach to logging and tracing, with critical data scattered across individual service logs. This fragmentation makes it incredibly difficult to piece together the full sequence of events, turning even simple debugging into a complex puzzle. Without a unified strategy, correlating logs across services becomes a major challenge, making visibility and troubleshooting a daunting task.
Well, there’s a reason companies like Datadog exist—observability in distributed systems is hard. Buying a solution like Datadog is definitely an option, but it comes at a cost (literally). Fortunately, there are more affordable alternatives.
OpenTelemetry is one such solution, designed to unify telemetry signals (logs, traces, and metrics) into a standard format. This data can then be fed into open-source tools like Grafana, Jaeger, and Prometheus for centralized monitoring. By annotating your services and deploying an OpenTelemetry agent, teams can stream all observability data to a single destination, making it easy to correlate signals across microservices.
With this approach, instead of chasing down logs in different systems, you get a holistic, meaningful view of your entire application—making debugging, performance monitoring, and system understanding a breeze.
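To make the correlation idea concrete, here is a stdlib-only sketch of the core mechanism that OpenTelemetry automates for you: generate a trace ID at the edge, attach it to every log line, and propagate it to downstream services (in real systems, via a header such as W3C `traceparent`). The service names and flow are hypothetical.

```python
import logging
import uuid

# Every log line carries the trace ID, so a central log store can
# stitch together one request's journey across services.
logging.basicConfig(format="%(trace_id)s %(name)s %(message)s")

def handle_checkout(trace_id=None):
    """Entry-point service: create a trace ID if none arrived with the request."""
    trace_id = trace_id or uuid.uuid4().hex
    log = logging.LoggerAdapter(logging.getLogger("checkout"),
                                {"trace_id": trace_id})
    log.warning("checkout started")
    charge_card(trace_id)  # propagate the ID to the downstream call
    return trace_id

def charge_card(trace_id):
    """Downstream service: logs with the same ID, so the events correlate."""
    log = logging.LoggerAdapter(logging.getLogger("payments"),
                                {"trace_id": trace_id})
    log.warning("charging card")
```

OpenTelemetry does this (and much more—spans, timings, metrics) without you hand-threading IDs through every call, which is exactly why adopting it beats each team inventing its own scheme.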
⚡ You need a centralised telemetry system
⚡ OpenTelemetry provides an open source alternative to commercial solutions
Summary
Building and maintaining a Microservices architecture is no small feat—it requires rethinking deployment, testing, and observability while balancing scalability and complexity. From choosing between synchronous and asynchronous communication to ensuring reliable testing and tracing across services, the challenges are real, but so are the rewards. With the right tools, practices, and mindset shifts, microservices can unlock agility, resilience, and efficiency in ways monoliths simply can’t. Whether you’re just starting out or fine-tuning your architecture, one thing is certain: embracing change is the key to success.
🚀 Now, go build something great!
Need help?
Successfully implementing microservices requires the right strategy, tools, and expertise. At fourTheorem, we help businesses navigate the challenges of cloud architecture. If you’re considering migrating to microservices, optimising your existing architecture, or improving observability, reach out to us!
🔗 Useful Resources
Here are some tools and resources to help you on your Microservices journey:
- SLIC Watch – fourTheorem’s open-source Observability toolkit: https://fourtheorem.com/open-source/slic-watch/
- Datadog – Comprehensive monitoring for Microservices: https://www.datadoghq.com/
- OpenTelemetry – Standardised tracing, logs, and metrics: https://opentelemetry.io/
- Jaeger – Distributed tracing for Microservices: https://www.jaegertracing.io/
- Grafana – Visualisation and monitoring dashboards: https://grafana.com/
- WireMock – API mocking for Microservices testing: https://wiremock.org/
- Pact – Contract testing for Microservices: https://pact.io/