Multi-Tenancy: Shared Infrastructure, Isolated Data
Multi-tenancy lets multiple customers share infrastructure while keeping data isolated. Explore schema strategies, tenant isolation patterns, and SaaS architecture.
Multi-tenancy is how most SaaS applications work. You deploy one application that serves thousands of customers, with each customer’s data kept separate. The appeal is obvious: shared infrastructure means lower costs, easier maintenance, fewer deployment pipelines.
But the complexity does not disappear. It moves. Instead of managing many isolated deployments, you manage isolation within a shared environment. Get isolation wrong and you leak data between tenants. Get performance wrong and one noisy neighbor drowns out everyone else.
What is Multi-Tenancy?
A tenant is a group of users who share access to the same data. In a multi-tenant system, one application instance serves multiple tenants. Each tenant cannot see or access other tenants’ data.
Single-tenancy is the alternative: each customer gets their own application and database. Stronger isolation, higher costs.
graph TD
subgraph MultiTenant["Multi-Tenant Architecture"]
A[Application] --> B[Shared Database]
B --> C[Tenant A Data]
B --> D[Tenant B Data]
B --> E[Tenant C Data]
end
subgraph SingleTenant["Single-Tenant Architecture"]
F[App A] --> G[DB A]
H[App B] --> I[DB B]
end
The shared database approach is the most cost-effective. One database, one application, one deployment pipeline. Compute and storage costs scale sub-linearly with tenants.
Schema Strategies
How you organize tenant data in the database affects isolation, performance, and complexity.
Shared Schema with Tenant ID
All tenants share the same tables. A tenant_id column identifies which row belongs to which tenant.
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,
    user_id UUID,
    total DECIMAL(10,2),
    created_at TIMESTAMP
);
-- tenant_id appears in every per-tenant query; index it so those lookups stay fast
CREATE INDEX orders_tenant_id_idx ON orders (tenant_id);
-- Query always filters by tenant
SELECT * FROM orders WHERE tenant_id = 'tenant-123';
Every query must include the tenant_id filter. Miss it once and you have a data leak. Use row-level security (RLS) in PostgreSQL or similar features to enforce this at the database level.
-- Enable row-level security
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- Create policy that filters by current_setting
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id::text = current_setting('app.current_tenant'));
RLS makes it much harder to accidentally query across tenants: the database enforces isolation even when your application code has bugs. Note that table owners bypass RLS by default; if your application connects as the table owner, also run ALTER TABLE orders FORCE ROW LEVEL SECURITY.
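On the application side, app.current_tenant has to be established before each tenant-scoped query runs. A minimal sketch, assuming a DB-API style driver such as psycopg; run_for_tenant and the cursor wiring are illustrative, not part of any library API. PostgreSQL's set_config() function, unlike a plain SET statement, accepts bound parameters, and its third argument scopes the setting to the current transaction:

```python
# Pin the RLS tenant for the current transaction, then run the query.
# set_config(name, value, is_local=true) limits the setting to this
# transaction, so a pooled connection cannot leak it to the next tenant.
SET_TENANT_SQL = "SELECT set_config('app.current_tenant', %s, true)"

def run_for_tenant(cursor, tenant_id, query, params=()):
    """Run a tenant-scoped query with the RLS setting established first."""
    cursor.execute(SET_TENANT_SQL, (tenant_id,))
    cursor.execute(query, params)
    return cursor.fetchall()
```

Scoping the setting to the transaction matters with connection pooling: a session-level setting would survive on the pooled connection and apply to whichever tenant borrows it next.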
Separate Schemas per Tenant
Each tenant gets their own schema within the same database.
-- Tenant A's schema
CREATE SCHEMA tenant_a;
CREATE TABLE tenant_a.orders (...);
-- Tenant B's schema
CREATE SCHEMA tenant_b;
CREATE TABLE tenant_b.orders (...);
Applications connect with a search_path that includes the tenant’s schema. Queries do not need explicit tenant_id filtering because the schema provides implicit isolation.
The trade-off: schema migrations become more complex. You must run migrations against every tenant schema. With thousands of tenants, this does not scale.
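If you do run per-schema migrations, one pattern that helps is collecting failures instead of aborting on the first one, so a single broken tenant does not block the rest. A sketch, assuming the migration itself is idempotent so failed schemas can be retried (migrate_all_tenants and run_migration are hypothetical names, not from any migration framework):

```python
def migrate_all_tenants(schemas, run_migration):
    """Apply one migration to every tenant schema, collecting failures
    instead of aborting, so one broken schema does not block the rest."""
    failed = []
    for schema in schemas:
        try:
            run_migration(schema)  # assumed idempotent, safe to retry
        except Exception as exc:
            failed.append((schema, exc))
    return failed
```

The returned list of (schema, error) pairs becomes the retry queue; with thousands of tenants you would also want batching and progress tracking on top of this.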
Separate Databases per Tenant
Maximum isolation. Each tenant gets their own database instance.
graph TD
A[Load Balancer] --> B[Application Cluster]
B --> C[Tenant A DB]
B --> D[Tenant B DB]
B --> E[Tenant N DB]
This approach suits regulatory requirements where data must be physically separated. It also simplifies per-tenant customization. But the operational overhead is brutal: thousands of databases mean thousands of backups, thousands of patches, thousands of failure points.
Most SaaS companies do not need this level of isolation until they have specific compliance requirements.
Tenant Isolation Patterns
Beyond the database, you need to think about isolation at every layer of your stack.
Application-Layer Isolation
Your application code must be tenant-aware from the start. Middleware extracts the tenant from the request and sets a context variable.
from contextvars import ContextVar
from flask import request

current_tenant: ContextVar[str] = ContextVar('tenant_id')

@app.before_request
def before_request():
    # Extract tenant from JWT or subdomain
    token = request.headers.get('Authorization')
    tenant = extract_tenant_from_token(token)
    current_tenant.set(tenant)

@app.route('/orders')
def get_orders():
    tenant_id = current_tenant.get()
    return query_orders_for_tenant(tenant_id)
Do not rely on user input to determine the tenant without validation. A user should not be able to specify their tenant_id in a request parameter unless your application explicitly maps users to tenants.
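That mapping check can be as simple as a membership lookup. A sketch of the idea; resolve_tenant and user_tenant_map are hypothetical names standing in for whatever your user store provides:

```python
def resolve_tenant(user_id, claimed_tenant_id, user_tenant_map):
    """Accept a tenant id only if the authenticated user belongs to it.

    user_tenant_map is a hypothetical lookup: user_id -> set of tenant ids
    the user is a member of (in practice, a query against your user store).
    """
    allowed = user_tenant_map.get(user_id, set())
    if claimed_tenant_id not in allowed:
        raise PermissionError(
            f"user {user_id} is not a member of tenant {claimed_tenant_id}"
        )
    return claimed_tenant_id
```

Failing loudly here is deliberate: a rejected request is a security signal worth logging, not a case to silently fall back from.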
Caching Considerations
Redis and similar caches are shared across tenants. You must namespace keys by tenant.
# Bad: cache key could collide between tenants
cache.set(f"user:{user_id}", user_data)
# Good: tenant-scoped cache key
cache.set(f"tenant:{tenant_id}:user:{user_id}", user_data)
With cache-aside, watch for cache stampedes when a popular tenant key expires. Eviction pressure is a separate risk: one tenant's traffic spike can push another tenant's frequently accessed data out of a shared cache.
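Rather than remembering to prefix every key at every call site, you can bake the prefix into a small wrapper. A sketch; TenantCache is a hypothetical helper over any client exposing get/set (a redis-py client in production, a dict-backed fake in tests):

```python
class TenantCache:
    """Prefix every cache key with the tenant id, so code using this
    wrapper cannot read or overwrite another tenant's entries."""

    def __init__(self, client, tenant_id):
        self.client = client            # any object with get(key)/set(key, value)
        self.prefix = f"tenant:{tenant_id}:"

    def _key(self, key):
        return self.prefix + key

    def get(self, key):
        return self.client.get(self._key(key))

    def set(self, key, value):
        self.client.set(self._key(key), value)
```

Handing request code a TenantCache instead of the raw client turns "remember the prefix" from a convention into something the type of the object enforces.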
Background Jobs and Queues
Worker processes handle background tasks. These must also be tenant-aware.
# Task includes tenant context
@celery.task
def generate_report(tenant_id, report_type):
    # Use tenant_id throughout
    data = fetch_data_for_tenant(tenant_id)
    report = build_report(data, report_type)
    store_report_for_tenant(tenant_id, report)
Never assume that because a task was queued by one tenant, it only affects that tenant. Cross-tenant bugs in background jobs are particularly nasty because they may not be caught until data is already corrupted.
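One way to make that validation systematic is a decorator that every job must pass through before touching data. A sketch under the assumption that you can enumerate (or look up) valid tenant ids; tenant_scoped is a hypothetical helper, not a Celery feature:

```python
def tenant_scoped(known_tenants):
    """Decorator sketch: refuse to run a job whose tenant_id is unknown,
    so a mis-routed or stale task fails loudly instead of silently
    reading or writing the wrong tenant's data."""
    def wrap(fn):
        def inner(tenant_id, *args, **kwargs):
            if tenant_id not in known_tenants:
                raise ValueError(f"unknown tenant: {tenant_id}")
            return fn(tenant_id, *args, **kwargs)
        return inner
    return wrap
```

In a real system known_tenants would be a lookup against the tenant registry rather than a static set, but the failure mode is the same: reject first, process second.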
Performance Isolation
Shared infrastructure means shared resources. Without careful design, one tenant’s workload can degrade performance for everyone.
Resource Quotas
Implement per-tenant limits on CPU, memory, database connections, and API calls. Track usage against quotas and throttle or reject requests that exceed limits.
from dataclasses import dataclass

@dataclass
class TenantQuota:
    tenant_id: str
    monthly_spend_limit: float
    api_rate_limit: int  # requests per minute
    max_db_connections: int
    storage_gb: float

def check_quota(tenant_id: str, operation: str) -> bool:
    quota = get_quota(tenant_id)
    current_usage = get_current_usage(tenant_id)
    if operation == 'api_request':
        if current_usage.api_requests >= quota.api_rate_limit:
            return False
    elif operation == 'db_connection':
        if current_usage.db_connections >= quota.max_db_connections:
            return False
    return True
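For the API rate limit specifically, a token bucket is the usual enforcement mechanism. A minimal in-process sketch; production systems typically back the bucket state with Redis so all application instances share it, and TenantRateLimiter here is an illustrative name:

```python
import time

class TenantRateLimiter:
    """Per-tenant token bucket (in-process sketch). Each tenant gets its
    own bucket that refills continuously up to the per-minute capacity."""

    def __init__(self, rate_per_minute):
        self.capacity = rate_per_minute
        self.refill_per_sec = rate_per_minute / 60.0
        self.buckets = {}  # tenant_id -> (tokens, last_timestamp)

    def allow(self, tenant_id, now=None):
        """Return True and consume a token if the tenant is under its limit."""
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(tenant_id, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1:
            self.buckets[tenant_id] = (tokens - 1, now)
            return True
        self.buckets[tenant_id] = (tokens, now)
        return False
```

The injectable `now` parameter keeps the logic deterministic for testing; callers in production just omit it.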
Compute Isolation with Namespaces
If you use Kubernetes, you can isolate tenants using namespaces and resource quotas. CPU limits prevent one tenant from consuming all available compute.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-123
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
Database Connection Pooling
Database connections are often the scarcest resource. Use PgBouncer or similar connection poolers to multiplex many application connections over fewer database connections.
; pgbouncer.ini
[databases]
app_db = host=db.example.com port=5432 dbname=production
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
With transaction-mode pooling, a server connection is returned to the pool at the end of each transaction (commit or rollback) rather than when the client disconnects. This lets you support far more tenants per database instance.
Security Considerations
Multi-tenancy amplifies security risks. A vulnerability affects not just one customer but potentially all customers.
Access Controls
Implement role-based access control that respects tenant boundaries. Users should only be able to access resources within their own tenant.
@require_permission('read:orders')
def get_orders(tenant_id):
    # Permission check happens via decorator
    # It verifies user belongs to tenant_id
    return db.orders.filter(tenant_id=tenant_id)
Audit logging becomes critical. You need to know who accessed what and when. Log tenant_id with every security-relevant event.
Network Isolation
Consider network-level isolation for sensitive tenants. Private networking (VPC peering, private links) keeps tenant traffic off shared networks.
For highly regulated industries, some tenants may require dedicated infrastructure. This moves toward single-tenancy but within a managed environment.
Data Encryption
Encrypt data at rest and in transit. With shared databases, encryption at rest protects against infrastructure-level breaches but not against application-level vulnerabilities.
Use tenant-specific encryption keys if your compliance requirements demand it. AWS KMS and similar services support per-tenant keys with envelope encryption.
Cost Optimization
Multi-tenancy’s appeal is cost efficiency. Make sure you are actually achieving it.
Shared Compute
A single application deployment handling thousands of tenants uses resources far more efficiently than thousands of single-tenant deployments.
The numbers: if single-tenancy needs 1GB RAM per tenant, 1,000 tenants require 1TB of RAM. With multi-tenancy and proper resource sharing, you might handle the same workload with 64GB RAM.
Database Cost per Tenant
Track your database cost per active tenant. As tenants grow, you need to decide whether to scale up (bigger database) or scale out (more database instances).
def calculate_cost_per_tenant():
    monthly_db_cost = get_monthly_database_bill()
    active_tenants = get_active_tenant_count()
    return monthly_db_cost / active_tenants
If cost per tenant exceeds thresholds, investigate. Perhaps some tenants are outliers with unusual workloads. Perhaps your schema design needs optimization.
Metadata Tiering
Keep tenant metadata (billing info, subscription tier, settings) in a lightweight store. Full application data stays in your main database. This separation lets you scale metadata independently.
Common Pitfalls
Query Accidents
Forgetting to filter by tenant_id is the most common bug. Use RLS or similar database-level enforcement to protect against this.
-- With RLS enabled, this returns only the current tenant's rows
-- (or errors if app.current_tenant has not been set)
SELECT * FROM orders;
Cache Invalidation
When tenant data changes, you must invalidate the correct cache keys. Use consistent key naming and always include tenant_id in cache invalidation logic.
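If cache keys follow a consistent tenant:{id}: prefix, bulk invalidation for one tenant becomes a prefix sweep. A sketch over a dict-like store; with Redis you would SCAN on the prefix (never KEYS in production) rather than iterating everything, and invalidate_tenant_keys is an illustrative name:

```python
def invalidate_tenant_keys(store, tenant_id):
    """Delete every cached entry belonging to one tenant.

    `store` is any dict-like mapping of cache keys to values. Returns the
    number of keys removed, which is worth logging per invalidation.
    """
    prefix = f"tenant:{tenant_id}:"
    stale = [k for k in list(store) if k.startswith(prefix)]
    for k in stale:
        del store[k]
    return len(stale)
```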
Migration Runbook
When you need to alter a shared schema, you must:
- Test the migration against a representative sample of tenants
- Plan for backwards compatibility (old and new code running simultaneously)
- Have a rollback plan
- Execute during low-traffic windows
Schema changes on shared tables are risky. One bad migration affects all tenants simultaneously.
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| tenant_id filter missing in query | Cross-tenant data exposure | Enable RLS at database level; audit queries in code review |
| One tenant consumes all connections | Other tenants cannot access database | Per-tenant connection limits; connection pool monitoring |
| Cache key collision between tenants | Tenant A sees Tenant B data | Use tenant-scoped cache keys; implement key prefixing |
| Background job processes wrong tenant | Data cross-contamination | Pass tenant_id explicitly; validate tenant context in every job |
| Schema migration fails on shared table | All tenants affected simultaneously | Test on sample tenants first; maintain backwards compatibility |
| Quota enforcement bug | One tenant monopolizes resources | Implement quota checks at multiple layers; monitor usage |
Observability Checklist
Metrics:
- Queries per tenant (identify noisy neighbors)
- Database connection usage per tenant
- Cache hit rate per tenant
- API latency per tenant
- Quota utilization per tenant
Logs:
- Tenant ID logged on every security-relevant event
- Cross-tenant access attempts (should be zero)
- Quota violations with tenant context
- Migration progress per tenant
Alerts:
- Any cross-tenant data access attempts
- Tenant exceeding quota thresholds
- Database connection pool saturation
- Slow queries from specific tenants
- Cache invalidation failures
Security Checklist
- Row-level security enabled on all tenant-scoped tables
- Tenant ID cannot be user-supplied without validation
- Cache keys namespaced by tenant
- Background jobs include tenant validation
- Audit logging captures tenant context on all data access
- Network isolation for sensitive tenants (VPC/private links)
- Per-tenant encryption keys for sensitive data
- Access control respects tenant boundaries at every layer
Common Anti-Patterns to Avoid
Storing tenant_id in User Input
The user should never provide their own tenant_id. Extract it from authenticated context (JWT, session, OAuth token).
Shared Cache Without Namespacing
Redis is shared across tenants. Without key namespacing, one tenant can evict another’s data.
Global State for Tenant Context
Using global variables for tenant context breaks under async/multithreaded execution. Use context variables or dependency injection.
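The failure is easy to demonstrate: under asyncio, tasks interleave, and a module-level global would hold whichever tenant set it last. contextvars.ContextVar avoids this because each task gets its own copy of the context. A minimal, self-contained sketch:

```python
import asyncio
from contextvars import ContextVar

current_tenant: ContextVar[str] = ContextVar("current_tenant")

async def handle(tenant_id: str) -> str:
    current_tenant.set(tenant_id)   # visible only inside this task's context
    await asyncio.sleep(0)          # yield, letting the other tasks run
    return current_tenant.get()     # still this task's tenant, not the last setter's

async def main():
    # Three concurrent "requests"; asyncio.gather runs each in its own
    # task, and each task copies the context at creation time.
    return await asyncio.gather(handle("a"), handle("b"), handle("c"))
```

Running `asyncio.run(main())` yields each task's own tenant id in order, where a plain global would have been clobbered by the interleaving.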
Trusting Subdomain for Tenant Identification
Subdomains can be spoofed. Always validate tenant identity from authenticated credentials.
Quick Recap
Key Bullets:
- Multi-tenancy shares infrastructure for cost efficiency but requires strict isolation
- Schema strategies range from shared tables (tenant_id) to separate databases per tenant
- RLS and similar database features enforce isolation at the data layer
- Performance isolation requires quotas, resource limits, and monitoring
- Security isolation requires defense in depth across all layers
Copy/Paste Checklist:
Multi-Tenancy Setup:
[ ] Tenant context extraction from auth token
[ ] Tenant ID validation on every request
[ ] Row-level security enabled on databases
[ ] Cache keys namespaced by tenant
[ ] Per-tenant resource quotas defined
[ ] Background jobs include tenant context
[ ] Monitoring per tenant (not just aggregate)
[ ] Quota alerts configured
[ ] Cross-tenant access monitoring enabled
[ ] Regular tenant isolation audit
When to Use Multi-Tenancy
Multi-tenancy makes sense when:
- Your tenants have similar workloads and resource needs
- Cost efficiency matters more than maximum isolation
- You can build and maintain proper isolation tooling
- Regulatory requirements allow shared infrastructure
Single-tenancy makes sense when:
- Tenants have wildly different resource requirements
- Strong regulatory isolation is required
- Per-tenant customization is extensive
- Tenant count is small (dozens, not thousands)
Most SaaS applications start multi-tenant. The efficiency gains are hard to pass up. Build isolation and observability tooling early, before you have hundreds of tenants and bugs that are hard to fix.
Trade-off Analysis
| Factor | Shared Schema | Separate Schema | Separate Database |
|---|---|---|---|
| Isolation | Low - RLS required | Medium - schema separation | High - complete isolation |
| Cost Efficiency | Highest | High | Low |
| Operational Complexity | Medium | High | Highest |
| Schema Migrations | Shared - risky | Per-tenant - complex | Per-tenant - complex |
| Query Performance | Requires tenant_id indexes | Good - implicit isolation | Best per-tenant |
| Customization | Limited | Schema-level customization | Full customization |
| Backup/Restore | All tenants together | Per schema | Per database |
| Regulatory Fit | General data | Segregated data | Strict isolation |
| Tenant Count | Thousands | Hundreds | Dozens |
| Failure Domain | All tenants share | Schema-level failures | Isolated per tenant |
Multi-Tenancy Isolation Architecture
graph TB
subgraph SharedInfra["Shared Infrastructure"]
LB[Load Balancer]
App[Application Cluster]
Cache[(Shared Cache<br/>namespaced)]
end
subgraph DataLayer["Data Layer Options"]
direction LR
SharedSchema["Shared Schema<br/>tenant_id + RLS"]
SeparateSchema["Separate Schemas<br/>per tenant"]
SeparateDB["Separate Databases<br/>per tenant"]
end
subgraph IsolationBoundaries["Isolation Boundaries"]
direction TB
Network[Network Isolation<br/>VPC/Private Links]
Compute[Compute Quotas<br/>per tenant]
Storage[Storage Limits<br/>per tenant]
end
LB --> App
App --> Cache
App --> SharedSchema
App --> SeparateSchema
App --> SeparateDB
Network -.->|applies to| App
Compute -.->|applies to| App
Storage -.->|applies to| SharedSchema
For more on related topics, see Microservices Architecture, API Gateway Patterns, and Database Scaling.