Database per Service: Data Isolation and Ownership in Microservices
Learn how to implement the database per service pattern, manage data ownership, handle cross-service queries, and maintain data consistency.
The database per service pattern is one of the fundamental tenets of microservices architecture. Rather than having a single, monolithic database shared across all services, each microservice owns and manages its own data store. This approach sounds simple on paper, but the implications ripple through every layer of your system design.
If you are building or migrating to microservices, understanding this pattern is non-negotiable. It shapes how teams work, how services scale, and how your data ultimately behaves under load.
The Core Principle: Each Service Owns Its Data
The database per service pattern means that a microservice is the sole authority over its data. No other service can directly query or modify that data. Instead, services communicate through well-defined APIs.
Think of it like a neighborhood where each house has its own yard and front door. Your neighbor cannot just walk into your kitchen, even though you might chat over the fence. If you want something from your neighbor’s kitchen, you knock on the door and ask.
This boundary is intentional. It prevents the kind of tight coupling that collapses a distributed system into a distributed monolith.
What This Looks Like in Practice
A user service manages user accounts and authentication data in its own database. An order service handles orders and line items separately. A product catalog service maintains its own product information. None of these services reaches into another service’s database.
When the order service needs to know who placed an order, it does not query the user database directly. Instead, it either receives user information as part of the API response when the order is created, or it calls a user service API to look up the information it needs.
```mermaid
graph TB
    UserService --> UserDB[(User Database)]
    OrderService --> OrderDB[(Order Database)]
    ProductService --> ProductDB[(Product Database)]
    InventoryService --> InventoryDB[(Inventory Database)]
    OrderService -->|REST/gRPC| UserService
    OrderService -->|REST/gRPC| ProductService
    OrderService -->|REST/gRPC| InventoryService
    class UserService,OrderService,ProductService,InventoryService service
    class UserDB,OrderDB,ProductDB,InventoryDB database
```
Why Teams Embrace This Pattern
The database per service pattern delivers several concrete advantages that matter in real-world development.
Team Autonomy
When each team owns a service and its data, work can happen independently without coordination bottlenecks. The ordering team can change their database schema, optimize queries, or even switch database technologies without getting permission from the user team or the product team.
This autonomy extends to deployment. Teams can release updates on their own schedule, roll back if something breaks, and experiment with different approaches without risking the entire system.
Independent Scaling
Different services have different resource needs. Your product catalog might need heavy read throughput during a sale event, while your order processing needs more write capacity during checkout. With separate databases, you can scale each service’s data layer independently.
You might run three replicas of your product database during a flash sale while keeping your order database on beefy write-optimized hardware. No single database becomes a bottleneck that throttles the entire system.
Technology Diversity
Not every problem needs the same tool. A product catalog with complex, hierarchical search requirements might benefit from a document database like MongoDB or Elasticsearch. User accounts with strict transactional requirements might work better in PostgreSQL. Session data that needs extreme read speeds might live in Redis.
Database per service lets you pick the right tool for each job rather than compromising everything on a single technology that has to serve all your needs.
The Challenges Nobody Talks About Upfront
The benefits are real, but so are the difficulties. Before you commit to this pattern, you need to understand what you are signing up for.
Cross-Service Queries
The hardest problem in microservices is querying data that spans multiple services. In a monolith, you could write a single SQL query joining users, orders, and products. With separate databases, that query has to be assembled from multiple API responses.
This is not impossible, but it requires different approaches. You might use an API composition pattern where a gateway aggregates responses, or you might duplicate some data across services to make local queries fast. Each approach has trade-offs in consistency, complexity, and storage cost.
Data Consistency Across Services
When you cannot use database transactions spanning multiple services, maintaining consistency becomes an architectural challenge. If placing an order should decrement inventory and create an order record atomically, you now need a mechanism to keep these actions in sync despite them touching different databases.
Patterns like the saga pattern exist specifically to handle this. Instead of ACID transactions across services, you coordinate a series of local transactions through messaging. It is more complex, but it works.
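To make the coordination concrete, here is a minimal orchestrated-saga sketch: each step pairs a forward action with a compensation, and a failure unwinds the completed steps in reverse order. This is a simplification (a production saga also needs persistence and retries), and the step names in any usage are hypothetical.

```python
class SagaStep:
    """One local transaction plus the action that undoes it."""

    def __init__(self, action, compensation):
        self.action = action              # forward operation (a local transaction)
        self.compensation = compensation  # compensating operation to undo it

def run_saga(steps):
    """Run each step in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            # Unwind: only steps that actually completed get compensated
            for done in reversed(completed):
                done.compensation()
            return False
    return True
```

In a real order flow the steps might be "reserve inventory", "authorize payment", "create order", each calling its owning service's API, with compensations like "release inventory" and "void authorization".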
I have seen teams underestimate this problem severely. They assume eventual consistency will be fine, then spend months retrofitting saga coordination after users start seeing phantom inventory or double-booked orders.
Reporting and Analytics
Business intelligence usually wants a unified view of data. The finance team wants to see revenue across all products. Marketing wants to correlate user behavior with purchase patterns. With data scattered across dozens of service-specific databases, building these reports requires additional infrastructure.
You end up needing data warehousing solutions, ETL pipelines, or event streaming to assemble a coherent analytical picture. This is solvable, but it adds operational complexity and latency between when something happens and when it appears in your reports.
Approaches for Cross-Service Queries
When you need data that lives in multiple services, several patterns can help you retrieve it.
API Composition
The simplest approach is to have a service or API gateway call multiple backend services and combine the results. A reporting service might call the user service for demographics, the order service for purchase history, and the product service for catalog data, then merge everything in memory.
This works well when the data volumes are reasonable and latency is acceptable. The downside is that you are doing join-like work at the application layer, and the coordinating service becomes a potential bottleneck.
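A sketch of that composition step, with the fetch callables standing in for HTTP clients to hypothetical user, order, and product services:

```python
def compose_order_view(order_id, get_order, get_user, get_products):
    """Assemble a cross-service view by calling each owning service's API
    and joining the responses in memory (the application-layer 'join')."""
    order = get_order(order_id)                    # order service owns this
    user = get_user(order["customer_id"])          # user service owns this
    products = get_products(order["product_ids"])  # product service owns this
    return {
        "order_id": order["id"],
        "customer_name": user["name"],
        "items": [p["name"] for p in products],
        "total": order["total"],
    }
```

In practice each callable would be an HTTP call with its own timeout and fallback, and the calls could run concurrently to cut latency.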
CQRS: Command Query Responsibility Segregation
CQRS separates read and write operations into different models. Write operations go to the service that owns the data. Read operations can be served from a specialized read store that is optimized for queries, even if it means the data is slightly stale.
You might have an order service that writes to a normalized PostgreSQL schema, but also publishes events to a message broker. A separate read service consumes those events and maintains a denormalized read model in Elasticsearch, optimized for complex queries across users, orders, and products.
This pattern pairs well with event-driven architectures and gives you excellent query flexibility, but it introduces eventual consistency between your write and read models.
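The write side of such a CQRS setup can be sketched in a few lines: commit to the local store, then publish an event for read-model consumers. Here `store` and `publish` are stand-ins for a real database session and message producer.

```python
import json

class OrderWriteService:
    """CQRS write side: persist to the service's own store, then publish
    an event that read-model consumers use to update their views."""

    def __init__(self, store, publish):
        self.store = store      # local write model (dict standing in for SQL)
        self.publish = publish  # event publisher (e.g. a Kafka producer's send)

    def place_order(self, order_id, customer_id, total):
        record = {"id": order_id, "customer_id": customer_id, "total": total}
        self.store[order_id] = record  # local transaction commits first
        self.publish("orders", json.dumps({"type": "OrderPlaced", **record}))
        return record
```

Note the ordering problem this glosses over: if the process dies between the write and the publish, the event is lost. The transactional outbox pattern closes that gap by writing the event to the same database in the same transaction.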
Event Sourcing and Projections
With event sourcing, you store not the current state but the sequence of events that led to that state. Other services can consume those events and build their own projections of the data they need.
A billing service does not need to query the order database directly. It subscribes to order events, builds its own billing record projections, and serves queries from its local store. When you need a new report, you do not change the order service schema, you create a new projection from the event stream.
This approach provides excellent audit trails and flexibility, but it requires more infrastructure and expertise to implement correctly.
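A projection is just a fold over the event stream. This sketch builds a hypothetical billing view from order events; the event shapes are illustrative, not a real schema:

```python
def project_billing(events):
    """Fold an ordered event stream into a billing read model.
    Replaying the same stream always rebuilds the same projection."""
    billing = {}
    for event in events:
        if event["type"] == "OrderPlaced":
            billing[event["order_id"]] = {"amount": event["total"], "status": "unbilled"}
        elif event["type"] == "OrderCancelled":
            if event["order_id"] in billing:
                billing[event["order_id"]]["status"] = "void"
    return billing
```

When a new report is needed, you write a new fold over the same stream rather than changing the owning service's schema.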
Data Ownership and Bounded Contexts
The database per service pattern forces you to think carefully about where data belongs. This is not just a technical decision; it is a domain modeling decision.
Defining Service Boundaries Around Data
You need to identify bounded contexts in your domain: areas where a clear and consistent model applies. A user context and an ordering context might share the concept of a “customer,” but they model it differently for their own needs.
The ordering context cares about customer IDs, shipping addresses, and contact preferences for fulfilling orders. The user context cares about login credentials, profile information, and communication preferences. These are related but distinct models of the same real-world entity.
Each service owns its bounded context completely. If two services need similar data, they each maintain their own copy, synchronized through events or API calls.
When to Duplicate Data
Data duplication is not always a problem to avoid. Sometimes it is the right trade-off.
If your order service stores a copy of the product name and price at the time of purchase, you have an accurate record of what the customer actually ordered, even if the product catalog changes later. This immutability of historical records is often more valuable than a single source of truth.
The cost is keeping the copies synchronized when things change. You have to decide which data changes slowly versus frequently, where historical accuracy matters, and which services can tolerate stale reads.
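As a sketch, snapshotting the duplicated fields at purchase time looks like this (field names are illustrative):

```python
from datetime import datetime, timezone

def snapshot_order_item(product, quantity):
    """Copy the product name and price into the order record at purchase
    time, so later catalog changes do not rewrite order history."""
    return {
        "product_id": product["id"],
        "name": product["name"],         # duplicated on purpose
        "unit_price": product["price"],  # frozen at time of purchase
        "quantity": quantity,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
```

The order service keeps this copy forever; the product service remains the source of truth for the *current* name and price.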
Choosing Database Technologies Per Service
There is no universal best database. The right choice depends on the access patterns and requirements of each service.
Relational Databases for Transactional Services
When you need strict consistency, complex joins, and ACID transactions, relational databases like PostgreSQL or MySQL remain the standard. Services that handle financial transactions, inventory with strict counts, or any data where a wrong number has real consequences benefit from these guarantees.
PostgreSQL in particular has become a popular choice for microservices because it offers JSON support for semi-structured data, powerful indexing, and a mature ecosystem.
Document Databases for Flexible Schemas
Product catalogs, content management systems, and user profiles often have varying structures that evolve over time. A document database like MongoDB lets you store these without rigid schemas while still providing secondary indexes and query capabilities.
You can iterate on your data model without migrations, which is valuable in the early stages of a service when you are still figuring out what you actually need.
Key-Value Stores for High-Speed Access
Session data, caching layers, and rate limiting often involve simple key-value lookups where speed matters more than query flexibility. Redis and Memcached excel here with sub-millisecond response times and built-in data structures that go beyond simple get-and-set.
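The access pattern is simple enough to sketch with an in-memory stand-in for Redis: set a value with a TTL and treat expired keys as missing. (Redis handles expiry server-side via `SETEX`/`EXPIRE`; this toy version checks on read.)

```python
import time

class SessionStore:
    """Key-value session store with TTL; an in-memory stand-in for Redis."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds):
        # Store the value alongside its expiry deadline
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._data[key]  # lazily evict expired keys
            return None
        return value
```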
Time-Series Databases for Metrics and Events
If your service handles monitoring data, IoT sensor readings, or any data where you primarily append and query by time ranges, a time-series database like TimescaleDB or InfluxDB offers compression and query optimizations that general-purpose databases cannot match.
Search Engines for Full-Text Search
Product search, log analysis, and any feature requiring complex text matching benefit from dedicated search engines. Elasticsearch and OpenSearch provide inverted indexes, relevance tuning, and aggregation pipelines that would be painful to implement on top of a traditional database.
Reporting Across Services
When business needs require reports that cross service boundaries, you need a strategy for assembling data from multiple sources.
Data Warehousing
The traditional approach is to funnel data from all your service databases into a centralized data warehouse. ETL pipelines read from each service’s database, transform the data into a unified schema, and load it into your warehouse on a schedule.
This gives you a consistent analytical view and powerful query capabilities, but the data is always somewhat stale, and building and maintaining the pipelines is significant work.
Streaming ETL with Apache Kafka
A more modern approach uses event streaming. Each service publishes events to a shared Apache Kafka cluster. Downstream consumers transform and load the events into analytical stores.
This gives you near-real-time data in your analytical systems and provides an excellent audit log of everything that happened. The operational complexity is higher, but the fresher data and better decoupling are often worth it for organizations with the engineering capacity to manage it.
Mirror Databases for Simple Cases
For smaller systems, you might simply replicate each service database to a read replica that analytics tools can query directly. This avoids building ETL pipelines but still gives analysts access to current data.
The trade-off is that your analytical queries run against production databases, which can impact your service’s performance if you are not careful about query patterns.
Common Pitfalls to Avoid
Looking at how teams struggle with this pattern, a few mistakes come up repeatedly.
Starting with a shared database “for now” and promising to split it later almost never happens. The tight coupling builds up immediately, and by the time you have real data and real traffic, the migration cost feels insurmountable. Make the split early, even if it means more work upfront.
Over-normalizing to match the monolith’s schema leads to chatty APIs. If you model every table as its own microservice, you will spend all your time coordinating network calls. Look for natural aggregates that can live together in a single service.
Ignoring the saga or compensation logic until production is a recipe for corrupted data. Design your cross-service workflows before you deploy, not after users start encountering inconsistencies.
Treating eventual consistency as a bug rather than a feature creates endless rework. If your business logic assumes immediate consistency, you will fight the architecture at every turn. Either change the business process to tolerate stale data, or use synchronous APIs where you need strong consistency.
When to Use / When Not to Use Database per Service
| Criteria | Database per Service | Shared Database | Notes |
|---|---|---|---|
| Team Autonomy | High (teams deploy independently) | Low (schema changes require coordination) | Shared DB creates coupling bottleneck |
| Scaling | Independent per service | Shared (single DB becomes bottleneck) | Hot services can be scaled separately |
| Technology Choice | Flexible (polyglot per service) | Limited (all services use same DB) | Choose right tool for each workload |
| Data Consistency | Eventual (cross-service) | Strong (ACID across all data) | Saga required for cross-service transactions |
| Cross-Service Queries | Complex (API composition) | Simple (SQL joins) | Additional infrastructure needed |
| Operational Overhead | High (multiple databases) | Low (single database) | Each DB needs backup, monitoring, tuning |
| Reporting/Analytics | Complex (ETL required) | Simple (direct queries) | Additional data pipeline needed |
| Time to Market | Slower (initial setup) | Faster (start immediately) | Trade-off shifts at scale |
When to Use Database per Service
Use database per service when:
- Different services have different access patterns and storage needs
- Teams need to deploy independently without coordination
- You need to scale specific services under heavy load
- Services benefit from different database technologies
- You are building a microservices architecture from scratch
- Regulatory requirements demand data isolation between domains
When to Use a Shared Database
Avoid database per service when:
- You are starting with a small team or prototype
- All services have similar, simple CRUD requirements
- Cross-service transactions with strong consistency are frequent
- You lack operational maturity for multiple database platforms
- Reporting and analytics are more important than scaling
- Team coordination costs are lower than distributed system complexity
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Database for one service goes down | Only that service fails; other services continue | Isolate service failures; implement circuit breakers; health checks |
| Cross-service query timeout | API gateway or BFF times out | Set appropriate timeouts; implement fallback; degrade gracefully |
| Data inconsistency between services | Services show different states for same entity | Implement idempotent operations; saga pattern for consistency; monitoring |
| Blocking schema migration in a shared DB | All services using the DB are blocked | Use online schema migration tools; avoid locking operations |
| Connection pool exhaustion | Service cannot connect to its DB | Configure appropriate pool sizes; monitor connections; implement connection timeouts |
| Backup failure for isolated DB | Data loss risk for that service | Regular backup verification; multi-region backup storage; test restores |
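The idempotent-operations mitigation listed above can be sketched as a consumer that records processed event IDs and ignores redeliveries. In production, the `processed` set would live in the service's own database, updated in the same transaction as the handler's writes.

```python
class IdempotentConsumer:
    """Skip events already processed, so retries and redeliveries are safe."""

    def __init__(self, handler):
        self.handler = handler
        self.processed = set()  # in production: a table in the service's own DB

    def handle(self, event):
        if event["event_id"] in self.processed:
            return False  # duplicate delivery: ignore, report as skipped
        self.handler(event)
        self.processed.add(event["event_id"])
        return True
```

This assumes every event carries a unique `event_id`, which the publisher must guarantee.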
Data Consistency Flow
```mermaid
graph TD
    Start[Order Request] --> CheckInventory{Check Inventory}
    CheckInventory -->|Sync Call| InvDB[Inventory DB]
    InvDB -->|Reserve| InvReserve[Reserved: 5 units]
    CheckInventory -->|Reserve OK| CheckPayment{Check Payment}
    CheckPayment -->|Sync Call| PayDB[Payment DB]
    PayDB -->|Authorize| PayAuth[Authorized: $100]
    CheckPayment -->|Auth OK| CreateOrder[Create Order]
    CreateOrder --> OrderDB[(Order DB)]
    CreateOrder -->|Publish Event| InvEvent[Inventory Event]
    InvEvent --> UpdateInv[Update Inventory Read Model]
    CreateOrder -->|Publish Event| PayEvent[Payment Event]
    PayEvent --> UpdatePay[Update Payment Read Model]
```
API Composition Flow
```mermaid
graph LR
    Client[Client] --> Gateway[API Gateway]
    Gateway --> UserSvc[User Service]
    Gateway --> OrderSvc[Order Service]
    Gateway --> ProductSvc[Product Service]
    UserSvc --> UserDB[(User DB)]
    OrderSvc --> OrderDB[(Order DB)]
    ProductSvc --> ProductDB[(Product DB)]
    subgraph Composition
    OrderSvc -->|Get User| UserSvc
    OrderSvc -->|Get Product| ProductSvc
    end
    UserSvc --> Aggregate[Aggregate Results]
    OrderSvc --> Aggregate
    ProductSvc --> Aggregate
    Aggregate --> Client
```
Implementation Examples
Service-Side Data Access Pattern
```python
import uuid
from datetime import datetime, timezone


class OrderService:
    """Order service accessing only its own database."""

    def __init__(self, order_db, http_client):
        self.db = order_db
        self.http = http_client

    def create_order(self, order_request: CreateOrderRequest) -> Order:
        # Validate customer exists via API call (not direct DB access)
        customer = self._get_customer(order_request.customer_id)
        if not customer:
            raise CustomerNotFoundError(order_request.customer_id)

        # Validate products via API call
        products = self._get_products(order_request.product_ids)
        if len(products) != len(order_request.product_ids):
            raise ProductNotFoundError("Some products not found")

        # Create order in the service's own database
        order = Order(
            id=str(uuid.uuid4()),
            customer_id=customer.id,
            items=self._build_order_items(products, order_request),
            total=sum(p.price * qty for p, qty in zip(products, order_request.quantities)),
            status="pending",
            created_at=datetime.now(timezone.utc),
        )
        self.db.orders.insert(order.to_dict())
        return order

    def _get_customer(self, customer_id: str) -> Customer | None:
        response = self.http.get(f"http://user-service/customers/{customer_id}")
        return Customer.from_dict(response.json()) if response.ok else None

    def _get_products(self, product_ids: list[str]) -> list[Product]:
        response = self.http.post(
            "http://product-service/products/batch",
            json={"ids": product_ids},
        )
        return [Product.from_dict(p) for p in response.json()] if response.ok else []
```
Event-Driven Data Synchronization
```python
class InventoryEventConsumer:
    """Consumes inventory events to maintain a local read model."""

    def __init__(self, kafka_consumer, read_model_db):
        self.consumer = kafka_consumer
        self.db = read_model_db

    def start(self):
        for message in self.consumer:
            event = self._deserialize(message.value)
            self._handle_event(event)

    def _handle_event(self, event):
        if isinstance(event, InventoryReserved):
            self._update_reserved_quantity(event.product_id, event.quantity)
        elif isinstance(event, InventoryReleased):
            self._update_reserved_quantity(event.product_id, -event.quantity)
        elif isinstance(event, ProductPriceChanged):
            self._update_product_price(event.product_id, event.new_price)

    def _update_reserved_quantity(self, product_id, delta):
        self.db.products.update_one(
            {"product_id": product_id},
            {"$inc": {"reserved_quantity": delta}},
        )
```
Database Connection Configuration Per Service
```python
# Example: PostgreSQL for the order service
ORDER_DB_CONFIG = {
    "host": "orders-db.internal",
    "port": 5432,
    "database": "orders",
    "pool_size": 20,
    "max_overflow": 10,
    "pool_timeout": 30,
    "pool_recycle": 3600,
}

# Example: MongoDB for the product catalog
PRODUCT_DB_CONFIG = {
    "host": "products-db.internal",
    "port": 27017,
    "database": "products",
    "max_pool_size": 50,
    "min_pool_size": 10,
    "server_selection_timeout_ms": 5000,
}

# Example: Redis for the session/caching service
SESSION_DB_CONFIG = {
    "host": "sessions-db.internal",
    "port": 6379,
    "db": 0,
    "socket_timeout": 5,
    "max_connections": 100,
}
```
Quick Recap
Key Points
- Each microservice owns its data; no direct cross-service database access
- Database per service enables team autonomy, independent scaling, and technology diversity
- Cross-service queries require API composition, CQRS, or event sourcing
- Data consistency across services requires saga pattern or eventual consistency
- Choose database technology based on each service’s access patterns
- Data duplication is often the right trade-off for read performance and independence
- Reporting across services requires ETL pipelines or data warehousing
Database Selection Guide
| Workload Type | Recommended Database | Examples |
|---|---|---|
| Transactional (financial) | PostgreSQL, MySQL | Orders, payments, inventory |
| Document/flexible schema | MongoDB, CouchDB | User profiles, product catalog |
| Key-value / caching | Redis, Memcached | Sessions, rate limits, leaderboards |
| Search | Elasticsearch, OpenSearch | Product search, full-text search |
| Time-series | TimescaleDB, InfluxDB | Metrics, IoT, logs |
| Graph | Neo4j, Amazon Neptune | Recommendations, social graphs |
Production Checklist
```markdown
# Database per Service Production Readiness
- [ ] Each service's database backed up independently
- [ ] Database-specific monitoring in place (connection pool, query latency, disk usage)
- [ ] Cross-service data access patterns documented
- [ ] Saga or compensation logic designed for cross-service workflows
- [ ] API composition latency SLAs defined
- [ ] Data warehouse/ETL pipeline for reporting
- [ ] Database migration strategy per service
- [ ] Connection string secrets managed securely
- [ ] Circuit breakers on cross-service calls
- [ ] Idempotency implemented for all cross-service operations
```