Database per Service: Data Isolation and Ownership in Microservices
Learn how to implement the database per service pattern, manage data ownership, handle cross-service queries, and maintain data consistency.
The database per service pattern is one of the fundamental tenets of microservices architecture. Rather than having a single, monolithic database shared across all services, each microservice owns and manages its own data store. This approach sounds simple on paper, but the implications ripple through every layer of your system design.
If you are building or migrating to microservices, understanding this pattern is non-negotiable. It shapes how teams work, how services scale, and how your data ultimately behaves under load.
The Core Principle: Each Service Owns Its Data
The database per service pattern means that a microservice is the sole authority over its data. No other service can directly query or modify that data. Instead, services communicate through well-defined APIs.
Think of it like a neighborhood where each house has its own yard and front door. Your neighbor cannot just walk into your kitchen, even though you might chat over the fence. If you want something from your neighbor’s kitchen, you knock on the door and ask.
This boundary is intentional. It prevents the kind of tight coupling that collapses a distributed system into a distributed monolith.
What This Looks Like in Practice
A user service manages user accounts and authentication data in its own database. An order service handles orders and line items separately. A product catalog service maintains its own product information. None of these services reaches into another service’s database.
When the order service needs to know who placed an order, it does not query the user database directly. Instead, it either receives user information as part of the API response when the order is created, or it calls a user service API to look up the information it needs.
```mermaid
graph TB
    UserService --> UserDB[(User Database)]
    OrderService --> OrderDB[(Order Database)]
    ProductService --> ProductDB[(Product Database)]
    InventoryService --> InventoryDB[(Inventory Database)]
    OrderService -->|REST/gRPC| UserService
    OrderService -->|REST/gRPC| ProductService
    OrderService -->|REST/gRPC| InventoryService
    class UserService,OrderService,ProductService,InventoryService service
    class UserDB,OrderDB,ProductDB,InventoryDB database
```
Why Teams Embrace This Pattern
The database per service pattern delivers several concrete advantages that matter in real-world development.
Team Autonomy
When each team owns a service and its data, work can happen independently without coordination bottlenecks. The ordering team can change their database schema, optimize queries, or even switch database technologies without getting permission from the user team or the product team.
This autonomy extends to deployment. Teams can release updates on their own schedule, roll back if something breaks, and experiment with different approaches without risking the entire system.
Independent Scaling
Different services have different resource needs. Your product catalog might need heavy read throughput during a sale event, while your order processing needs more write capacity during checkout. With separate databases, you can scale each service’s data layer independently.
You might run three replicas of your product database during a flash sale while keeping your order database on beefy write-optimized hardware. No single database becomes a bottleneck that throttles the entire system.
Technology Diversity
Not every problem needs the same tool. A product catalog with complex, hierarchical search requirements might benefit from a document database like MongoDB or Elasticsearch. User accounts with strict transactional requirements might work better in PostgreSQL. Session data that needs extreme read speeds might live in Redis.
Database per service lets you pick the right tool for each job rather than compromising everything on a single technology that has to serve all your needs.
The Challenges Nobody Talks About Upfront
The benefits are real, but so are the difficulties. Before you commit to this pattern, you need to understand what you are signing up for.
Cross-Service Queries
The hardest problem in microservices is querying data that spans multiple services. In a monolith, you could write a single SQL query joining users, orders, and products. With separate databases, that query has to be assembled from multiple API responses.
This is not impossible, but it requires different approaches. You might use an API composition pattern where a gateway aggregates responses, or you might duplicate some data across services to make local queries fast. Each approach has trade-offs in consistency, complexity, and storage cost.
Data Consistency Across Services
When you cannot use database transactions spanning multiple services, maintaining consistency becomes an architectural challenge. If placing an order should decrement inventory and create an order record atomically, you now need a mechanism to keep these actions in sync despite them touching different databases.
Patterns like the saga pattern exist specifically to handle this. Instead of ACID transactions across services, you coordinate a series of local transactions through messaging. It is more complex, but it works.
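To make the coordination concrete, here is a minimal orchestrated-saga sketch: each step pairs a forward action with a compensation, and a failure unwinds the completed steps in reverse order. This is a simplification (a production saga also needs persistence and retries), and the step names in any usage are hypothetical.

```python
class SagaStep:
    """One local transaction plus the action that undoes it."""

    def __init__(self, action, compensation):
        self.action = action              # forward operation (a local transaction)
        self.compensation = compensation  # compensating operation to undo it

def run_saga(steps):
    """Run each step in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            # Unwind: only steps that actually completed get compensated
            for done in reversed(completed):
                done.compensation()
            return False
    return True
```

In a real order flow the steps might be "reserve inventory", "authorize payment", "create order", each calling its owning service's API, with compensations like "release inventory" and "void authorization".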
I have seen teams underestimate this problem severely. They assume eventual consistency will be fine, then spend months retrofitting saga coordination after users start seeing phantom inventory or double-booked orders.
Reporting and Analytics
Business intelligence usually wants a unified view of data. The finance team wants to see revenue across all products. Marketing wants to correlate user behavior with purchase patterns. With data scattered across dozens of service-specific databases, building these reports requires additional infrastructure.
You end up needing data warehousing solutions, ETL pipelines, or event streaming to assemble a coherent analytical picture. This is solvable, but it adds operational complexity and latency between when something happens and when it appears in your reports.
Approaches for Cross-Service Queries
When you need data that lives in multiple services, several patterns can help you retrieve it.
API Composition
The simplest approach is to have a service or API gateway call multiple backend services and combine the results. A reporting service might call the user service for demographics, the order service for purchase history, and the product service for catalog data, then merge everything in memory.
This works well when the data volumes are reasonable and latency is acceptable. The downside is that you are doing join-like work at the application layer, and the coordinating service becomes a potential bottleneck.
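A sketch of that composition step, with the fetch callables standing in for HTTP clients to hypothetical user, order, and product services:

```python
def compose_order_view(order_id, get_order, get_user, get_products):
    """Assemble a cross-service view by calling each owning service's API
    and joining the responses in memory (the application-layer 'join')."""
    order = get_order(order_id)                    # order service owns this
    user = get_user(order["customer_id"])          # user service owns this
    products = get_products(order["product_ids"])  # product service owns this
    return {
        "order_id": order["id"],
        "customer_name": user["name"],
        "items": [p["name"] for p in products],
        "total": order["total"],
    }
```

In practice each callable would be an HTTP call with its own timeout and fallback, and the calls could run concurrently to cut latency.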
CQRS: Command Query Responsibility Segregation
CQRS separates read and write operations into different models. Write operations go to the service that owns the data. Read operations can be served from a specialized read store that is optimized for queries, even if it means the data is slightly stale.
You might have an order service that writes to a normalized PostgreSQL schema, but also publishes events to a message broker. A separate read service consumes those events and maintains a denormalized read model in Elasticsearch, optimized for complex queries across users, orders, and products.
This pattern pairs well with event-driven architectures and gives you excellent query flexibility, but it introduces eventual consistency between your write and read models.
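The write side of such a CQRS setup can be sketched in a few lines: commit to the local store, then publish an event for read-model consumers. Here `store` and `publish` are stand-ins for a real database session and message producer.

```python
import json

class OrderWriteService:
    """CQRS write side: persist to the service's own store, then publish
    an event that read-model consumers use to update their views."""

    def __init__(self, store, publish):
        self.store = store      # local write model (dict standing in for SQL)
        self.publish = publish  # event publisher (e.g. a Kafka producer's send)

    def place_order(self, order_id, customer_id, total):
        record = {"id": order_id, "customer_id": customer_id, "total": total}
        self.store[order_id] = record  # local transaction commits first
        self.publish("orders", json.dumps({"type": "OrderPlaced", **record}))
        return record
```

Note the ordering problem this glosses over: if the process dies between the write and the publish, the event is lost. The transactional outbox pattern closes that gap by writing the event to the same database in the same transaction.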
Event Sourcing and Projections
With event sourcing, you store not the current state but the sequence of events that led to that state. Other services can consume those events and build their own projections of the data they need.
A billing service does not need to query the order database directly. It subscribes to order events, builds its own billing record projections, and serves queries from its local store. When you need a new report, you do not change the order service schema, you create a new projection from the event stream.
This approach provides excellent audit trails and flexibility, but it requires more infrastructure and expertise to implement correctly.
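A projection is just a fold over the event stream. This sketch builds a hypothetical billing view from order events; the event shapes are illustrative, not a real schema:

```python
def project_billing(events):
    """Fold an ordered event stream into a billing read model.
    Replaying the same stream always rebuilds the same projection."""
    billing = {}
    for event in events:
        if event["type"] == "OrderPlaced":
            billing[event["order_id"]] = {"amount": event["total"], "status": "unbilled"}
        elif event["type"] == "OrderCancelled":
            if event["order_id"] in billing:
                billing[event["order_id"]]["status"] = "void"
    return billing
```

When a new report is needed, you write a new fold over the same stream rather than changing the owning service's schema.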
Data Ownership and Bounded Contexts
The database per service pattern forces you to think carefully about where data belongs. This is not just a technical decision; it is a domain modeling decision.
Defining Service Boundaries Around Data
You need to identify bounded contexts in your domain: areas where a clear and consistent model applies. A user context and an ordering context might share the concept of a “customer,” but they model it differently for their own needs.
The ordering context cares about customer IDs, shipping addresses, and contact preferences for fulfilling orders. The user context cares about login credentials, profile information, and communication preferences. These are related but distinct models of the same real-world entity.
Each service owns its bounded context completely. If two services need similar data, they each maintain their own copy, synchronized through events or API calls.
When to Duplicate Data
Data duplication is not always a problem to avoid. Sometimes it is the right trade-off.
If your order service stores a copy of the product name and price at the time of purchase, you have an accurate record of what the customer actually ordered, even if the product catalog changes later. This immutability of historical records is often more valuable than a single source of truth.
The cost is keeping the copies synchronized when things change. You have to decide which data changes slowly versus frequently, where historical accuracy matters, and which services can tolerate stale reads.
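As a sketch, snapshotting the duplicated fields at purchase time looks like this (field names are illustrative):

```python
from datetime import datetime, timezone

def snapshot_order_item(product, quantity):
    """Copy the product name and price into the order record at purchase
    time, so later catalog changes do not rewrite order history."""
    return {
        "product_id": product["id"],
        "name": product["name"],         # duplicated on purpose
        "unit_price": product["price"],  # frozen at time of purchase
        "quantity": quantity,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
```

The order service keeps this copy forever; the product service remains the source of truth for the *current* name and price.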
Choosing Database Technologies Per Service
There is no universal best database. The right choice depends on the access patterns and requirements of each service.
Relational Databases for Transactional Services
When you need strict consistency, complex joins, and ACID transactions, relational databases like PostgreSQL or MySQL remain the standard. Services that handle financial transactions, inventory with strict counts, or any data where a wrong number has real consequences benefit from these guarantees.
PostgreSQL in particular has become a popular choice for microservices because it offers JSON support for semi-structured data, powerful indexing, and a mature ecosystem.
Document Databases for Flexible Schemas
Product catalogs, content management systems, and user profiles often have varying structures that evolve over time. A document database like MongoDB lets you store these without rigid schemas while still providing secondary indexes and query capabilities.
You can iterate on your data model without migrations, which is valuable in the early stages of a service when you are still figuring out what you actually need.
Key-Value Stores for High-Speed Access
Session data, caching layers, and rate limiting often involve simple key-value lookups where speed matters more than query flexibility. Redis and Memcached excel here with sub-millisecond response times and built-in data structures that go beyond simple get-and-set.
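The access pattern is simple enough to sketch with an in-memory stand-in for Redis: set a value with a TTL and treat expired keys as missing. (Redis handles expiry server-side via `SETEX`/`EXPIRE`; this toy version checks on read.)

```python
import time

class SessionStore:
    """Key-value session store with TTL; an in-memory stand-in for Redis."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds):
        # Store the value alongside its expiry deadline
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._data[key]  # lazily evict expired keys
            return None
        return value
```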
Time-Series Databases for Metrics and Events
If your service handles monitoring data, IoT sensor readings, or any data where you primarily append and query by time ranges, a time-series database like TimescaleDB or InfluxDB offers compression and query optimizations that general-purpose databases cannot match.
Search Engines for Full-Text Search
Product search, log analysis, and any feature requiring complex text matching benefit from dedicated search engines. Elasticsearch and OpenSearch provide inverted indexes, relevance tuning, and aggregation pipelines that would be painful to implement on top of a traditional database.
Reporting Across Services
When business needs require reports that cross service boundaries, you need a strategy for assembling data from multiple sources.
Data Warehousing
The traditional approach is to funnel data from all your service databases into a centralized data warehouse. ETL pipelines read from each service’s database, transform the data into a unified schema, and load it into your warehouse on a schedule.
This gives you a consistent analytical view and powerful query capabilities, but the data is always somewhat stale, and building and maintaining the pipelines is significant work.
Streaming ETL with Apache Kafka
A more modern approach uses event streaming. Each service publishes events to a shared Apache Kafka cluster. Downstream consumers transform and load the events into analytical stores.
This gives you near-real-time data in your analytical systems and provides an excellent audit log of everything that happened. The operational complexity is higher, but the fresher data and better decoupling are often worth it for organizations with the engineering capacity to manage it.
Mirror Databases for Simple Cases
For smaller systems, you might simply replicate each service database to a read replica that analytics tools can query directly. This avoids building ETL pipelines but still gives analysts access to current data.
The trade-off is that your analytical queries run against production databases, which can impact your service’s performance if you are not careful about query patterns.
Common Pitfalls to Avoid
Looking at how teams struggle with this pattern, a few mistakes come up repeatedly.
Starting with a shared database “for now” and promising to split it later almost never happens. The tight coupling builds up immediately, and by the time you have real data and real traffic, the migration cost feels insurmountable. Make the split early, even if it means more work upfront.
Over-normalizing to match the monolith’s schema leads to chatty APIs. If you model every table as its own microservice, you will spend all your time coordinating network calls. Look for natural aggregates that can live together in a single service.
Ignoring the saga or compensation logic until production is a recipe for corrupted data. Design your cross-service workflows before you deploy, not after users start encountering inconsistencies.
Treating eventual consistency as a bug rather than a feature creates endless rework. If your business logic assumes immediate consistency, you will fight the architecture at every turn. Either change the business process to tolerate stale data, or use synchronous APIs where you need strong consistency.
When to Use / When Not to Use Database per Service
| Criteria | Database per Service | Shared Database | Notes |
|---|---|---|---|
| Team Autonomy | High (teams deploy independently) | Low (schema changes require coordination) | Shared DB creates coupling bottleneck |
| Scaling | Independent per service | Shared (single DB becomes bottleneck) | Hot services can be scaled separately |
| Technology Choice | Flexible (polyglot per service) | Limited (all services use same DB) | Choose right tool for each workload |
| Data Consistency | Eventual (cross-service) | Strong (ACID across all data) | Saga required for cross-service transactions |
| Cross-Service Queries | Complex (API composition) | Simple (SQL joins) | Additional infrastructure needed |
| Operational Overhead | High (multiple databases) | Low (single database) | Each DB needs backup, monitoring, tuning |
| Reporting/Analytics | Complex (ETL required) | Simple (direct queries) | Additional data pipeline needed |
| Time to Market | Slower (initial setup) | Faster (start immediately) | Trade-off shifts at scale |
When to Use Database per Service
Use database per service when:
- Different services have different access patterns and storage needs
- Teams need to deploy independently without coordination
- You need to scale specific services under heavy load
- Services benefit from different database technologies
- You are building a microservices architecture from scratch
- Regulatory requirements demand data isolation between domains
When to Use a Shared Database
Avoid database per service when:
- You are starting with a small team or prototype
- All services have similar, simple CRUD requirements
- Cross-service transactions with strong consistency are frequent
- You lack operational maturity for multiple database platforms
- Reporting and analytics are more important than scaling
- Team coordination costs are lower than distributed system complexity
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Database for one service goes down | Only that service fails; other services continue | Isolate service failures; implement circuit breakers; health checks |
| Cross-service query timeout | API gateway or BFF times out | Set appropriate timeouts; implement fallback; degrade gracefully |
| Data inconsistency between services | Services show different states for same entity | Implement idempotent operations; saga pattern for consistency; monitoring |
| Blocking schema migration in a shared DB | All services using the DB are blocked | Use online schema migration tools; avoid locking operations |
| Connection pool exhaustion | Service cannot connect to its DB | Configure appropriate pool sizes; monitor connections; implement connection timeouts |
| Backup failure for isolated DB | Data loss risk for that service | Regular backup verification; multi-region backup storage; test restores |
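The idempotent-operations mitigation listed above can be sketched as a consumer that records processed event IDs and ignores redeliveries. In production, the `processed` set would live in the service's own database, updated in the same transaction as the handler's writes.

```python
class IdempotentConsumer:
    """Skip events already processed, so retries and redeliveries are safe."""

    def __init__(self, handler):
        self.handler = handler
        self.processed = set()  # in production: a table in the service's own DB

    def handle(self, event):
        if event["event_id"] in self.processed:
            return False  # duplicate delivery: ignore, report as skipped
        self.handler(event)
        self.processed.add(event["event_id"])
        return True
```

This assumes every event carries a unique `event_id`, which the publisher must guarantee.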
Data Consistency Flow
```mermaid
graph TD
    Start[Order Request] --> CheckInventory{Check Inventory}
    CheckInventory -->|Sync Call| InvDB[Inventory DB]
    InvDB -->|Reserve| InvReserve[Reserved: 5 units]
    CheckInventory -->|Reserve OK| CheckPayment{Check Payment}
    CheckPayment -->|Sync Call| PayDB[Payment DB]
    PayDB -->|Authorize| PayAuth[Authorized: $100]
    CheckPayment -->|Auth OK| CreateOrder[Create Order]
    CreateOrder --> OrderDB[(Order DB)]
    CreateOrder -->|Publish Event| InvEvent[Inventory Event]
    InvEvent --> UpdateInv[Update Inventory Read Model]
    CreateOrder -->|Publish Event| PayEvent[Payment Event]
    PayEvent --> UpdatePay[Update Payment Read Model]
```
API Composition Flow
```mermaid
graph LR
    Client[Client] --> Gateway[API Gateway]
    Gateway --> UserSvc[User Service]
    Gateway --> OrderSvc[Order Service]
    Gateway --> ProductSvc[Product Service]
    UserSvc --> UserDB[(User DB)]
    OrderSvc --> OrderDB[(Order DB)]
    ProductSvc --> ProductDB[(Product DB)]
    subgraph Composition
    OrderSvc -->|Get User| UserSvc
    OrderSvc -->|Get Product| ProductSvc
    end
    UserSvc --> Aggregate[Aggregate Results]
    OrderSvc --> Aggregate
    ProductSvc --> Aggregate
    Aggregate --> Client
```
Implementation Examples
Service-Side Data Access Pattern
```python
import uuid
from datetime import datetime, timezone


class OrderService:
    """Order service accessing only its own database."""

    def __init__(self, order_db, http_client):
        self.db = order_db
        self.http = http_client

    def create_order(self, order_request: CreateOrderRequest) -> Order:
        # Validate customer exists via API call (not direct DB access)
        customer = self._get_customer(order_request.customer_id)
        if not customer:
            raise CustomerNotFoundError(order_request.customer_id)

        # Validate products via API call
        products = self._get_products(order_request.product_ids)
        if len(products) != len(order_request.product_ids):
            raise ProductNotFoundError("Some products not found")

        # Create order in the service's own database
        order = Order(
            id=str(uuid.uuid4()),
            customer_id=customer.id,
            items=self._build_order_items(products, order_request),
            total=sum(p.price * qty for p, qty in zip(products, order_request.quantities)),
            status="pending",
            created_at=datetime.now(timezone.utc),
        )
        self.db.orders.insert(order.to_dict())
        return order

    def _get_customer(self, customer_id: str) -> Customer | None:
        response = self.http.get(f"http://user-service/customers/{customer_id}")
        return Customer.from_dict(response.json()) if response.ok else None

    def _get_products(self, product_ids: list[str]) -> list[Product]:
        response = self.http.post(
            "http://product-service/products/batch",
            json={"ids": product_ids},
        )
        return [Product.from_dict(p) for p in response.json()] if response.ok else []
```
Event-Driven Data Synchronization
```python
class InventoryEventConsumer:
    """Consumes inventory events to maintain a local read model."""

    def __init__(self, kafka_consumer, read_model_db):
        self.consumer = kafka_consumer
        self.db = read_model_db

    def start(self):
        for message in self.consumer:
            event = self._deserialize(message.value)
            self._handle_event(event)

    def _handle_event(self, event):
        if isinstance(event, InventoryReserved):
            self._update_reserved_quantity(event.product_id, event.quantity)
        elif isinstance(event, InventoryReleased):
            self._update_reserved_quantity(event.product_id, -event.quantity)
        elif isinstance(event, ProductPriceChanged):
            self._update_product_price(event.product_id, event.new_price)

    def _update_reserved_quantity(self, product_id, delta):
        self.db.products.update_one(
            {"product_id": product_id},
            {"$inc": {"reserved_quantity": delta}},
        )
```
Database Connection Configuration Per Service
```python
# Example: PostgreSQL for the order service
ORDER_DB_CONFIG = {
    "host": "orders-db.internal",
    "port": 5432,
    "database": "orders",
    "pool_size": 20,
    "max_overflow": 10,
    "pool_timeout": 30,
    "pool_recycle": 3600,
}

# Example: MongoDB for the product catalog
PRODUCT_DB_CONFIG = {
    "host": "products-db.internal",
    "port": 27017,
    "database": "products",
    "max_pool_size": 50,
    "min_pool_size": 10,
    "server_selection_timeout_ms": 5000,
}

# Example: Redis for the session/caching service
SESSION_DB_CONFIG = {
    "host": "sessions-db.internal",
    "port": 6379,
    "db": 0,
    "socket_timeout": 5,
    "max_connections": 100,
}
```
Quick Recap
Key Points
- Each microservice owns its data; no direct cross-service database access
- Database per service enables team autonomy, independent scaling, and technology diversity
- Cross-service queries require API composition, CQRS, or event sourcing
- Data consistency across services requires saga pattern or eventual consistency
- Choose database technology based on each service’s access patterns
- Data duplication is often the right trade-off for read performance and independence
- Reporting across services requires ETL pipelines or data warehousing
Database Selection Guide
| Workload Type | Recommended Database | Examples |
|---|---|---|
| Transactional (financial) | PostgreSQL, MySQL | Orders, payments, inventory |
| Document/flexible schema | MongoDB, CouchDB | User profiles, product catalog |
| Key-value / caching | Redis, Memcached | Sessions, rate limits, leaderboards |
| Search | Elasticsearch, OpenSearch | Product search, full-text search |
| Time-series | TimescaleDB, InfluxDB | Metrics, IoT, logs |
| Graph | Neo4j, Amazon Neptune | Recommendations, social graphs |
Production Checklist
```markdown
# Database per Service Production Readiness
- [ ] Each service's database backed up independently
- [ ] Database-specific monitoring in place (connection pool, query latency, disk usage)
- [ ] Cross-service data access patterns documented
- [ ] Saga or compensation logic designed for cross-service workflows
- [ ] API composition latency SLAs defined
- [ ] Data warehouse/ETL pipeline for reporting
- [ ] Database migration strategy per service
- [ ] Connection string secrets managed securely
- [ ] Circuit breakers on cross-service calls
- [ ] Idempotency implemented for all cross-service operations
```