Database per Service: Data Isolation and Ownership in Microservices

Learn how to implement the database per service pattern, manage data ownership, handle cross-service queries, and maintain data consistency.

The database per service pattern is one of the fundamental tenets of microservices architecture. Rather than having a single, monolithic database shared across all services, each microservice owns and manages its own data store. This approach sounds simple on paper, but the implications ripple through every layer of your system design.

If you are building or migrating to microservices, understanding this pattern is non-negotiable. It shapes how teams work, how services scale, and how your data ultimately behaves under load.

The Core Principle: Each Service Owns Its Data

The database per service pattern means that a microservice is the sole authority over its data. No other service can directly query or modify that data. Instead, services communicate through well-defined APIs.

Think of it like a neighborhood where each house has its own yard and front door. Your neighbor cannot just walk into your kitchen, even though you might chat over the fence. If you want something from your neighbor’s kitchen, you knock on the door and ask.

This boundary is intentional. It prevents the kind of tight coupling that collapses a distributed system into a distributed monolith.

What This Looks Like in Practice

A user service manages user accounts and authentication data in its own database. An order service handles orders and line items separately. A product catalog service maintains its own product information. None of these services reaches into another service’s database.

When the order service needs to know who placed an order, it does not query the user database directly. Instead, it either receives user information as part of the API response when the order is created, or it calls a user service API to look up the information it needs.

graph TB
    UserService --> UserDB[(User Database)]
    OrderService --> OrderDB[(Order Database)]
    ProductService --> ProductDB[(Product Database)]
    InventoryService --> InventoryDB[(Inventory Database)]

    OrderService -->|REST/gRPC| UserService
    OrderService -->|REST/gRPC| ProductService
    OrderService -->|REST/gRPC| InventoryService

    class UserService,OrderService,ProductService,InventoryService service
    class UserDB,OrderDB,ProductDB,InventoryDB database

Why Teams Embrace This Pattern

The database per service pattern delivers several concrete advantages that matter in real-world development.

Team Autonomy

When each team owns a service and its data, work can happen independently without coordination bottlenecks. The ordering team can change their database schema, optimize queries, or even switch database technologies without getting permission from the user team or the product team.

This autonomy extends to deployment. Teams can release updates on their own schedule, roll back if something breaks, and experiment with different approaches without risking the entire system.

Independent Scaling

Different services have different resource needs. Your product catalog might need heavy read throughput during a sale event, while your order processing needs more write capacity during checkout. With separate databases, you can scale each service’s data layer independently.

You might run three replicas of your product database during a flash sale while keeping your order database on beefy write-optimized hardware. No single database becomes a bottleneck that throttles the entire system.

Technology Diversity

Not every problem needs the same tool. A product catalog with complex, hierarchical search requirements might benefit from a document database like MongoDB or Elasticsearch. User accounts with strict transactional requirements might work better in PostgreSQL. Session data that needs extreme read speeds might live in Redis.

Database per service lets you pick the right tool for each job rather than compromising everything on a single technology that has to serve all your needs.

The Challenges Nobody Talks About Upfront

The benefits are real, but so are the difficulties. Before you commit to this pattern, you need to understand what you are signing up for.

Cross-Service Queries

The hardest problem in microservices is querying data that spans multiple services. In a monolith, you could write a single SQL query joining users, orders, and products. With separate databases, that query has to be assembled from multiple API responses.

This is not impossible, but it requires different approaches. You might use an API composition pattern where a gateway aggregates responses, or you might duplicate some data across services to make local queries fast. Each approach has trade-offs in consistency, complexity, and storage cost.

Data Consistency Across Services

When you cannot use database transactions spanning multiple services, maintaining consistency becomes an architectural challenge. If placing an order should decrement inventory and create an order record atomically, you now need a mechanism to keep these actions in sync despite them touching different databases.

Patterns like the saga pattern exist specifically to handle this. Instead of ACID transactions across services, you coordinate a series of local transactions through messaging. It is more complex, but it works.

I have seen teams underestimate this problem severely. They assume eventual consistency will be fine, then spend months retrofitting saga coordination after users start seeing phantom inventory or double-booked orders.
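The saga coordination described above can be sketched as a minimal orchestrator: each local transaction is paired with a compensating action that undoes it if a later step fails. The class and step names here are illustrative assumptions, not a production framework.

```python
class SagaStep:
    """One local transaction plus the compensation that undoes it."""

    def __init__(self, name, action, compensation):
        self.name = name
        self.action = action              # local transaction in one service
        self.compensation = compensation  # undo if a later step fails


def run_saga(steps, context):
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action(context)
            completed.append(step)
        except Exception:
            # Roll back already-committed local transactions in reverse order
            for done in reversed(completed):
                done.compensation(context)
            return False
    return True
```

In a real system the steps would be messages to the inventory and payment services, and the orchestrator's progress would be persisted so a crash mid-saga can resume or compensate on restart.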

Reporting and Analytics

Business intelligence usually wants a unified view of data. The finance team wants to see revenue across all products. Marketing wants to correlate user behavior with purchase patterns. With data scattered across dozens of service-specific databases, building these reports requires additional infrastructure.

You end up needing data warehousing solutions, ETL pipelines, or event streaming to assemble a coherent analytical picture. This is solvable, but it adds operational complexity and latency between when something happens and when it appears in your reports.

Approaches for Cross-Service Queries

When you need data that lives in multiple services, several patterns can help you retrieve it.

API Composition

The simplest approach is to have a service or API gateway call multiple backend services and combine the results. A reporting service might call the user service for demographics, the order service for purchase history, and the product service for catalog data, then merge everything in memory.

This works well when the data volumes are reasonable and latency is acceptable. The downside is that you are doing join-like work at the application layer, and the coordinating service becomes a potential bottleneck.
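A minimal sketch of that join-like work at the application layer, assuming the `fetch_*` callables stand in for HTTP clients against the backend services (the field names are illustrative):

```python
def compose_order_view(order_id, fetch_order, fetch_user, fetch_products):
    """Fan out to backend services and merge the responses in memory."""
    order = fetch_order(order_id)
    user = fetch_user(order["customer_id"])          # call to user service
    products = fetch_products(order["product_ids"])  # call to product service
    # The "join" happens here, in the composing service
    return {
        "order_id": order["id"],
        "customer_name": user["name"],
        "items": [p["name"] for p in products],
        "total": order["total"],
    }
```

In practice you would issue the user and product calls concurrently and apply timeouts, since the composed response is only as fast as the slowest backend.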

CQRS: Command Query Responsibility Segregation

CQRS separates read and write operations into different models. Write operations go to the service that owns the data. Read operations can be served from a specialized read store that is optimized for queries, even if it means the data is slightly stale.

You might have an order service that writes to a normalized PostgreSQL schema, but also publishes events to a message broker. A separate read service consumes those events and maintains a denormalized read model in Elasticsearch, optimized for complex queries across users, orders, and products.

This pattern pairs well with event-driven architectures and gives you excellent query flexibility, but it introduces eventual consistency between your write and read models.
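The write/read split can be sketched with an in-memory event bus standing in for the message broker; the class names and event shape are illustrative assumptions:

```python
class EventBus:
    """In-memory stand-in for a message broker such as Kafka."""

    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def publish(self, event):
        for handler in self.handlers:
            handler(event)


class OrderWriteModel:
    """Write side: owns the normalized store, publishes events."""

    def __init__(self, bus):
        self.orders = {}
        self.bus = bus

    def create_order(self, order_id, customer, total):
        self.orders[order_id] = {"customer": customer, "total": total}
        self.bus.publish({"type": "OrderCreated", "order_id": order_id,
                          "customer": customer, "total": total})


class OrderReadModel:
    """Read side: maintains a denormalized, query-optimized view."""

    def __init__(self, bus):
        self.by_customer = {}
        bus.subscribe(self.apply)

    def apply(self, event):
        if event["type"] == "OrderCreated":
            self.by_customer.setdefault(event["customer"], []).append(event)
```

With a real broker the read model lags the write model slightly; that lag is the eventual consistency the pattern trades for query flexibility.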

Event Sourcing and Projections

With event sourcing, you store not the current state but the sequence of events that led to that state. Other services can consume those events and build their own projections of the data they need.

A billing service does not need to query the order database directly. It subscribes to order events, builds its own billing record projections, and serves queries from its local store. When you need a new report, you do not change the order service schema; you create a new projection from the event stream.

This approach provides excellent audit trails and flexibility, but it requires more infrastructure and expertise to implement correctly.
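A projection is just a fold over the event stream. This sketch builds the billing view described above from order events; the event shapes are illustrative assumptions:

```python
def project_billing(events):
    """Billing projection: amount owed per customer, replayed from events."""
    owed = {}
    for event in events:
        if event["type"] == "OrderPlaced":
            owed[event["customer"]] = owed.get(event["customer"], 0) + event["amount"]
        elif event["type"] == "OrderRefunded":
            owed[event["customer"]] -= event["amount"]
    return owed
```

Because the events are immutable, a brand-new projection can be built at any time by replaying the stream from the beginning, without touching the order service.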

Data Ownership and Bounded Contexts

The database per service pattern forces you to think carefully about where data belongs. This is not just a technical decision; it is a domain modeling decision.

Defining Service Boundaries Around Data

You need to identify bounded contexts in your domain: areas where a clear and consistent model applies. A user context and an ordering context might share the concept of a “customer,” but they model it differently for their own needs.

The ordering context cares about customer IDs, shipping addresses, and contact preferences for fulfilling orders. The user context cares about login credentials, profile information, and communication preferences. These are related but distinct models of the same real-world entity.

Each service owns its bounded context completely. If two services need similar data, they each maintain their own copy, synchronized through events or API calls.


When to Duplicate Data

Data duplication is not always a problem to avoid. Sometimes it is the right trade-off.

If your order service stores a copy of the product name and price at the time of purchase, you have an accurate record of what the customer actually ordered, even if the product catalog changes later. This immutability of historical records is often more valuable than a single source of truth.

The cost is keeping the copies synchronized when things change. You have to decide which data changes slowly versus frequently, where historical accuracy matters, and which services can tolerate stale reads.
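Snapshotting at order time can be as simple as copying the fields into the order's own record instead of referencing the catalog; the field names here are illustrative:

```python
def build_order_item(product, quantity):
    """Copy product data into the order record at purchase time."""
    return {
        "product_id": product["id"],
        "name_at_purchase": product["name"],    # copied, not referenced
        "price_at_purchase": product["price"],  # immutable historical record
        "quantity": quantity,
    }
```

Later catalog changes leave the order's historical record untouched, which is exactly the behavior you want for receipts and disputes.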

Choosing Database Technologies Per Service

There is no universal best database. The right choice depends on the access patterns and requirements of each service.

Relational Databases for Transactional Services

When you need strict consistency, complex joins, and ACID transactions, relational databases like PostgreSQL or MySQL remain the standard. Services that handle financial transactions, inventory with strict counts, or any data where a wrong number has real consequences benefit from these guarantees.

PostgreSQL in particular has become a popular choice for microservices because it offers JSON support for semi-structured data, powerful indexing, and a mature ecosystem.

Document Databases for Flexible Schemas

Product catalogs, content management systems, and user profiles often have varying structures that evolve over time. A document database like MongoDB lets you store these without rigid schemas while still providing secondary indexes and query capabilities.

You can iterate on your data model without migrations, which is valuable in the early stages of a service when you are still figuring out what you actually need.

Key-Value Stores for High-Speed Access

Session data, caching layers, and rate limiting often involve simple key-value lookups where speed matters more than query flexibility. Redis and Memcached excel here with sub-millisecond response times and built-in data structures that go beyond simple get-and-set.

Time-Series Databases for Metrics and Events

If your service handles monitoring data, IoT sensor readings, or any data where you primarily append and query by time ranges, a time-series database like TimescaleDB or InfluxDB offers compression and query optimizations that general-purpose databases cannot match.

Search Engines for Full-Text Queries

Product search, log analysis, and any feature requiring complex text matching benefits from dedicated search engines. Elasticsearch and OpenSearch provide inverted indexes, relevance tuning, and aggregation pipelines that would be painful to implement on top of a traditional database.

Reporting Across Services

When business needs require reports that cross service boundaries, you need a strategy for assembling data from multiple sources.

Data Warehousing

The traditional approach is to funnel data from all your service databases into a centralized data warehouse. ETL pipelines read from each service’s database, transform the data into a unified schema, and load it into your warehouse on a schedule.

This gives you a consistent analytical view and powerful query capabilities, but the data is always somewhat stale, and building and maintaining the pipelines is significant work.
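A single ETL step reduces to extract, transform into the warehouse's unified schema, and load. This sketch uses in-memory lists as stand-ins for the source database and the warehouse, and the schema fields are illustrative assumptions:

```python
def etl_orders(source_rows, warehouse):
    """Transform order-service rows into a unified warehouse fact schema."""
    for row in source_rows:
        warehouse.append({
            "fact": "order",
            "source_service": "orders",
            "entity_id": row["id"],
            # Normalize units so facts from every service are comparable
            "amount_cents": int(round(row["total"] * 100)),
        })
```

Each service gets its own transform like this one, which is where the maintenance burden mentioned above accumulates: every schema change in a source service ripples into its pipeline.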

Streaming ETL with Apache Kafka

A more modern approach uses event streaming. Each service publishes events to a shared Apache Kafka cluster. Downstream consumers transform and load the events into analytical stores.

This gives you near-real-time data in your analytical systems and provides an excellent audit log of everything that happened. The operational complexity is higher, but the fresher data and better decoupling are often worth it for organizations with the engineering capacity to manage it.

Mirror Databases for Simple Cases

For smaller systems, you might simply replicate each service database to a read replica that analytics tools can query directly. This avoids building ETL pipelines but still gives analysts access to current data.

The trade-off is that your analytical queries run against production databases, which can impact your service’s performance if you are not careful about query patterns.

Common Pitfalls to Avoid

Looking at how teams struggle with this pattern, a few mistakes come up repeatedly.

Teams that start with a shared database “for now,” promising to split it later, almost never make the split. The tight coupling builds up immediately, and by the time you have real data and real traffic, the migration cost feels insurmountable. Make the split early, even if it means more work upfront.

Over-normalizing to match the monolith’s schema leads to chatty APIs. If you model every table as its own microservice, you will spend all your time coordinating network calls. Look for natural aggregates that can live together in a single service.

Ignoring the saga or compensation logic until production is a recipe for corrupted data. Design your cross-service workflows before you deploy, not after users start encountering inconsistencies.

Treating eventual consistency as a bug rather than a feature creates endless rework. If your business logic assumes immediate consistency, you will fight the architecture at every turn. Either change the business process to tolerate stale data, or use synchronous APIs where you need strong consistency.
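Tolerating eventual consistency usually starts with idempotent handlers: event delivery may repeat, so the consumer records which events it has applied. This sketch uses an in-memory set where production code would use a durable store; the names are illustrative.

```python
class IdempotentInventoryHandler:
    """Applies each inventory event at most once, even under redelivery."""

    def __init__(self):
        self.processed = set()  # would be a durable table in production
        self.reserved = 0

    def handle(self, event):
        if event["event_id"] in self.processed:
            return  # duplicate delivery: safely ignore
        self.reserved += event["quantity"]
        self.processed.add(event["event_id"])
```

With this in place, at-least-once delivery from the broker behaves like exactly-once from the consumer's point of view.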

When to Use / When Not to Use Database per Service

| Criteria | Database per Service | Shared Database | Notes |
|---|---|---|---|
| Team Autonomy | High (teams deploy independently) | Low (schema changes require coordination) | Shared DB creates coupling bottleneck |
| Scaling | Independent per service | Shared (single DB becomes bottleneck) | Hot services can be scaled separately |
| Technology Choice | Flexible (polyglot per service) | Limited (all services use same DB) | Choose right tool for each workload |
| Data Consistency | Eventual (cross-service) | Strong (ACID across all data) | Saga required for cross-service transactions |
| Cross-Service Queries | Complex (API composition) | Simple (SQL joins) | Additional infrastructure needed |
| Operational Overhead | High (multiple databases) | Low (single database) | Each DB needs backup, monitoring, tuning |
| Reporting/Analytics | Complex (ETL required) | Simple (direct queries) | Additional data pipeline needed |
| Time to Market | Slower (initial setup) | Faster (start immediately) | Trade-off shifts at scale |

When to Use Database per Service

Use database per service when:

  • Different services have different access patterns and storage needs
  • Teams need to deploy independently without coordination
  • You need to scale specific services under heavy load
  • Services benefit from different database technologies
  • You are building a microservices architecture from scratch
  • Regulatory requirements demand data isolation between domains

When to Use a Shared Database

Avoid database per service when:

  • You are starting with a small team or prototype
  • All services have similar, simple CRUD requirements
  • Cross-service transactions with strong consistency are frequent
  • You lack operational maturity for multiple database platforms
  • Reporting and analytics are more important than scaling
  • Team coordination costs are lower than distributed system complexity

Production Failure Scenarios

| Failure | Impact | Mitigation |
|---|---|---|
| Database for one service goes down | Only that service fails; other services continue | Isolate service failures; implement circuit breakers; health checks |
| Cross-service query timeout | API gateway or BFF times out | Set appropriate timeouts; implement fallback; degrade gracefully |
| Data inconsistency between services | Services show different states for same entity | Implement idempotent operations; saga pattern for consistency; monitoring |
| Schema migration in shared DB blocks | All services using DB are blocked | Use online schema migration tools; avoid locking operations |
| Connection pool exhaustion | Service cannot connect to its DB | Configure appropriate pool sizes; monitor connections; implement connection timeouts |
| Backup failure for isolated DB | Data loss risk for that service | Regular backup verification; multi-region backup storage; test restores |
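The circuit breakers mentioned in the failure table can be sketched as a small wrapper around cross-service calls: after a run of failures the breaker opens and fails fast instead of waiting on a dead downstream service. The thresholds and clock here are illustrative, not a production-hardened implementation.

```python
import time


class CircuitBreaker:
    """Fail fast after repeated downstream failures; retry after a cooldown."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Cooldown elapsed: half-open, allow one trial call
            self.opened_at = None
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result
```

Libraries in most ecosystems provide tested implementations with half-open states and metrics; the value of the sketch is seeing how little state the pattern actually needs.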

Data Consistency Flow

graph TD
    Start[Order Request] --> CheckInventory{Check Inventory}
    CheckInventory -->|Sync Call| InvDB[Inventory DB]
    InvDB -->|Reserve| InvReserve[Reserved: 5 units]
    CheckInventory -->|Reserve OK| CheckPayment{Check Payment}
    CheckPayment -->|Sync Call| PayDB[Payment DB]
    PayDB -->|Authorize| PayAuth[Authorized: $100]
    CheckPayment -->|Auth OK| CreateOrder[Create Order]
    CreateOrder --> OrderDB[(Order DB)]
    CreateOrder -->|Publish Event| InvEvent[Inventory Event]
    InvEvent --> UpdateInv[Update Inventory Read Model]
    CreateOrder -->|Publish Event| PayEvent[Payment Event]
    PayEvent --> UpdatePay[Update Payment Read Model]

API Composition Flow

graph LR
    Client[Client] --> Gateway[API Gateway]
    Gateway --> UserSvc[User Service]
    Gateway --> OrderSvc[Order Service]
    Gateway --> ProductSvc[Product Service]

    UserSvc --> UserDB[(User DB)]
    OrderSvc --> OrderDB[(Order DB)]
    ProductSvc --> ProductDB[(Product DB)]

    subgraph Composition
        OrderSvc -->|Get User| UserSvc
        OrderSvc -->|Get Product| ProductSvc
    end

    OrderDB --> Aggregate[Aggregate Results]
    UserDB --> Aggregate
    ProductDB --> Aggregate
    Aggregate --> Client

Implementation Examples

Service-Side Data Access Pattern

import uuid
from datetime import datetime

class OrderService:
    """Order service accessing only its own database."""

    def __init__(self, order_db, http_client):
        self.db = order_db
        self.http = http_client

    def create_order(self, order_request: CreateOrderRequest) -> Order:
        # Validate customer exists via API call (not direct DB access)
        customer = self._get_customer(order_request.customer_id)
        if not customer:
            raise CustomerNotFoundError(order_request.customer_id)

        # Validate products via API call
        products = self._get_products(order_request.product_ids)
        if len(products) != len(order_request.product_ids):
            raise ProductNotFoundError("Some products not found")

        # Create order in local database
        order = Order(
            id=str(uuid.uuid4()),
            customer_id=customer.id,
            items=self._build_order_items(products, order_request),
            total=sum(p.price * qty for p, qty in zip(products, order_request.quantities)),
            status="pending",
            created_at=datetime.utcnow()
        )

        self.db.orders.insert(order.to_dict())
        return order

    def _get_customer(self, customer_id: str) -> Customer:
        response = self.http.get(f"http://user-service/customers/{customer_id}")
        return Customer.from_dict(response.json()) if response.ok else None

    def _get_products(self, product_ids: list[str]) -> list[Product]:
        response = self.http.post(
            "http://product-service/products/batch",
            json={"ids": product_ids}
        )
        return [Product.from_dict(p) for p in response.json()] if response.ok else []

Event-Driven Data Synchronization

class InventoryEventConsumer:
    """Consumes inventory events to maintain local read model."""

    def __init__(self, kafka_consumer, read_model_db):
        self.consumer = kafka_consumer
        self.db = read_model_db

    def start(self):
        for message in self.consumer:
            event = self._deserialize(message.value)
            self._handle_event(event)

    def _handle_event(self, event):
        if isinstance(event, InventoryReserved):
            self._update_reserved_quantity(event.product_id, event.quantity)
        elif isinstance(event, InventoryReleased):
            self._update_reserved_quantity(event.product_id, -event.quantity)
        elif isinstance(event, ProductPriceChanged):
            self._update_product_price(event.product_id, event.new_price)

    def _update_reserved_quantity(self, product_id, delta):
        self.db.products.update_one(
            {'product_id': product_id},
            {'$inc': {'reserved_quantity': delta}}
        )

    def _update_product_price(self, product_id, new_price):
        self.db.products.update_one(
            {'product_id': product_id},
            {'$set': {'price': new_price}}
        )

Database Connection Configuration Per Service

# Example: PostgreSQL for order service
ORDER_DB_CONFIG = {
    "host": "orders-db.internal",
    "port": 5432,
    "database": "orders",
    "pool_size": 20,
    "max_overflow": 10,
    "pool_timeout": 30,
    "pool_recycle": 3600,
}

# Example: MongoDB for product catalog
PRODUCT_DB_CONFIG = {
    "host": "products-db.internal",
    "port": 27017,
    "database": "products",
    "max_pool_size": 50,
    "min_pool_size": 10,
    "server_selection_timeout_ms": 5000,
}

# Example: Redis for session/caching service
SESSION_DB_CONFIG = {
    "host": "sessions-db.internal",
    "port": 6379,
    "db": 0,
    "socket_timeout": 5,
    "max_connections": 100,
}

Quick Recap

Key Points

  • Each microservice owns its data; no direct cross-service database access
  • Database per service enables team autonomy, independent scaling, and technology diversity
  • Cross-service queries require API composition, CQRS, or event sourcing
  • Data consistency across services requires saga pattern or eventual consistency
  • Choose database technology based on each service’s access patterns
  • Data duplication is often the right trade-off for read performance and independence
  • Reporting across services requires ETL pipelines or data warehousing

Database Selection Guide

| Workload Type | Recommended Database | Examples |
|---|---|---|
| Transactional (financial) | PostgreSQL, MySQL | Orders, payments, inventory |
| Document/flexible schema | MongoDB, CouchDB | User profiles, product catalog |
| Key-value / caching | Redis, Memcached | Sessions, rate limits, leaderboards |
| Search | Elasticsearch, OpenSearch | Product search, full-text search |
| Time-series | TimescaleDB, InfluxDB | Metrics, IoT, logs |
| Graph | Neo4j, Amazon Neptune | Recommendations, social graphs |

Production Checklist

# Database per Service Production Readiness

- [ ] Each service's database backed up independently
- [ ] Database-specific monitoring in place (connection pool, query latency, disk usage)
- [ ] Cross-service data access patterns documented
- [ ] Saga or compensation logic designed for cross-service workflows
- [ ] API composition latency SLAs defined
- [ ] Data warehouse/ETL pipeline for reporting
- [ ] Database migration strategy per service
- [ ] Connection string secrets managed securely
- [ ] Circuit breakers on cross-service calls
- [ ] Idempotency implemented for all cross-service operations
