Uncovering the Database Coupling in Distributed Systems

  • 8 April 2018

In a perfect world, services in a distributed system hide their data behind an interface. Although the shared database is a well-known service integration anti-pattern, it's still one of the most widely used because of its low initial complexity and its promise of consistency guarantees.

Extracting services from a system that overuses the shared database integration style becomes a nightmare. Even if you find a sub-domain to isolate, you still need to decouple its data from the rest of the system.

My point is that we can solve this problem gradually, without massive rewrites. But we need better tools to do this.

Inspired by the Structure101 project and by lessons learned while developing a coupling analysis plugin, I'd like to share my thoughts on what we need in order to measure and deal with coupling in distributed systems.

Imagine a situation where you have a core service in your system and a recently extracted service accessing the same data store. The core interacts with the new service to fulfill its business logic objectives but still has direct access to the new service's underlying data.

[Figure: the core service and the newly extracted service accessing the same data store]

This setup may lead to unexpected behavior in the newly extracted service, which may (and usually will) rely on having full control over its data.
The idea is to identify potential pain points and deal with them on a case-by-case basis.
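
As a rough illustration of what "identifying pain points" could mean in practice, here is a minimal Python sketch that flags tables touched by more than one service. The `ACCESSES` records, the service names, and the table names are all hypothetical; in a real system they would have to come from something like SQL logs or tracing data, which this sketch does not cover.

```python
from collections import defaultdict

# Hypothetical access records: (service, table) pairs.
# Assumption: these were collected elsewhere, e.g. from SQL logs.
ACCESSES = [
    ("core", "orders"),
    ("core", "users"),
    ("subdomain", "orders"),
]


def shared_tables(accesses):
    """Return {table: sorted list of services} for tables used by 2+ services."""
    services_by_table = defaultdict(set)
    for service, table in accesses:
        services_by_table[table].add(service)
    return {
        table: sorted(services)
        for table, services in services_by_table.items()
        if len(services) > 1
    }


print(shared_tables(ACCESSES))
# prints {'orders': ['core', 'subdomain']}
```

Each table in the result is a candidate pain point to examine on its own, not necessarily a defect.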

Tools needed:

Combining this all together, I’ve created a project that consists of three parts:

Simulation analysis produces a graph stored in Neo4j. Querying the graph via the web interface uncovers the following structure.

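[Figure: graph query result showing the mobile client, the core and sub-domain services, and the database table]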

The mobile client calls the services' APIs to create some records in the database and read them back later. The core and sub-domain services access the table in the underlying data storage using SQL queries. But there is something more.

Nailing down the problem with another query, we identify the data leak.

[Figure: graph query result highlighting the data leak]

When the mobile client accesses the core API, the core gets some data from the sub-domain service and additionally retrieves some data from the database table directly.
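
The same check can be sketched without a graph database. The Python snippet below is only an approximation of the idea, not the project's actual query: it assumes each recorded interaction is a hypothetical `(trace_id, source, target, kind)` tuple, where `kind` is `"http"` for a service call and `"sql"` for a table access, and it reports every case where a caller touches the same table as a service it called within the same trace.

```python
from collections import defaultdict

# Hypothetical trace edges: (trace_id, source, target, kind).
# Assumption: "http" marks a service-to-service call, "sql" a table access.
EDGES = [
    ("t1", "mobile", "core", "http"),
    ("t1", "core", "subdomain", "http"),
    ("t1", "subdomain", "records", "sql"),
    ("t1", "core", "records", "sql"),
    ("t2", "mobile", "subdomain", "http"),
    ("t2", "subdomain", "records", "sql"),
]


def find_leaks(edges):
    """Return sorted (trace, caller, callee, table) tuples for each data leak."""
    calls = defaultdict(set)   # (trace, caller) -> services it called
    tables = defaultdict(set)  # (trace, service) -> tables it accessed
    for trace, source, target, kind in edges:
        if kind == "http":
            calls[(trace, source)].add(target)
        elif kind == "sql":
            tables[(trace, source)].add(target)

    leaks = []
    for (trace, caller), callees in calls.items():
        caller_tables = tables.get((trace, caller), set())
        for callee in callees:
            # A leak: the caller bypasses the callee and reads its table directly.
            for table in caller_tables & tables.get((trace, callee), set()):
                leaks.append((trace, caller, callee, table))
    return sorted(leaks)


print(find_leaks(EDGES))
# prints [('t1', 'core', 'subdomain', 'records')]
```

Trace `t2` produces no finding because only the sub-domain service touches the table there, which is the behavior we want after the fix.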

Problem identified. The only thing left is to put your engineering skills to work and fix it.