Microservice’s independent scaling can improve overall performance. But how, and how much ?

People say if you separate services and let them scale independently, you can will get better performance by allocating what’s needed and no computation resources will be wasted.

But how exactly? And how much gain will there be compared to a monolithic solution? This article will do a quantified research.


See conclusion


Let’s say there are 2 services, A and B. They have different RT (Response Time) . B’s RT is twice as A’s, if they are running on the same CU (compute unit)

Let’s compare the overall QPS (Queries per second) per CU if we bundle them together (monolith) , against if we distribute them (micro services) .

A and B both require at least 1 CU to run (minimum compute resource).

And to make QPS comparison fair, here we introduce a concept of “work weight” . Finishing 100 heavy queries per second is not the same as finishing 100 light queries per second. Here we define that a work unit’s weight is twice as the other if its RT is twice as the other.

A and B can be in parallel (both A and B serve requests directly and they don’t call each other), or they be serial (A calls B or B calls A to serve a request)

A and B run in parallel

In the case of running in parallel, request distribution for different services is a factor that will decide how many CU you put for A and for B. Services that have more weighted requests should need more CU.

Parallel, request distribution is A:B = 1:1

Parallel, request distribution is A:B = 1:2

Parallel, request distribution is A:B = 2:1

Parallel, request distribution is A:B = 4:1

A summary of parallel mode

Request distributionBundled or DistributedQPS/CUWeighted QPS/CU

The performance of bundled mode and that of distributed are always the same! Surprising?

Think about it this way: What difference does it make between a request going through Service A in a bundled instance, and the same request going through a dedicated Service A instance? No difference. There only thing is that in a bundled instance, there is code of Service B there , but it is not activated. A sleeping code doesn’t mean anything in terms of performance.

A and B run in series

In this mode, request distribution doesn’t make sense. Only difference is who calls who.

A calls B

B calls A

A summary of serial mode

Who calls whoBundled or DistributedQPS/CUWeighted QPS/CU
Light service calls heavy serviceBundled0.331
Heavy service calls light serviceBundled0.331

Again, the performance of bundled mode and the performance of distributed are always the same!

Why isn’t the distributed mode better? When an instance of service B is working, the instance of service A is also working. The work at the same time. Why isn’t it better in terms of performance?

The truth is, while A and B are both working, improving simultaneousness, they also consume more computation resources. This in reality offsets the performance gain brought by simultaneous working.


If services need the same minimum computing resource to run, then no matter if services work in parallel, or they work in series, the performance of bundled deployment (monolith) and the performance of distributed deployment (micro-services) are the same . Microservices is NOT doing better than monolith.

But what if services need different minimum computing resources to run? That’s what we’re going to find out in the next article.

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.