Multi-Tenancy: Shared Infrastructure, Isolated Data

Multi-tenancy lets multiple customers share infrastructure while keeping data isolated. Explore schema strategies, tenant isolation patterns, and SaaS architecture.

Multi-tenancy is how most SaaS applications work. You deploy one application that serves thousands of customers, with each customer’s data kept separate. The appeal is obvious: shared infrastructure means lower costs, easier maintenance, fewer deployment pipelines.

But the complexity does not disappear. It moves. Instead of managing many isolated deployments, you manage isolation within a shared environment. Get isolation wrong and you leak data between tenants. Get performance wrong and one noisy neighbor degrades service for everyone else.

What is Multi-Tenancy?

A tenant is a group of users who share access to the same data, typically one customer organization. In a multi-tenant system, one application instance serves multiple tenants. No tenant can see or access another tenant’s data.

Single-tenancy is the alternative: each customer gets their own application and database. Stronger isolation, higher costs.

graph TD
    subgraph MultiTenant["Multi-Tenant Architecture"]
        A[Application] --> B[Shared Database]
        B --> C[Tenant A Data]
        B --> D[Tenant B Data]
        B --> E[Tenant C Data]
    end
    subgraph SingleTenant["Single-Tenant Architecture"]
        F[App A] --> G[DB A]
        H[App B] --> I[DB B]
    end

The shared database approach is the most cost-effective. One database, one application, one deployment pipeline. Compute and storage costs scale sub-linearly with tenants.

Schema Strategies

How you organize tenant data in the database affects isolation, performance, and complexity.

Shared Schema with Tenant ID

All tenants share the same tables. A tenant_id column identifies which row belongs to which tenant.

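CREATE TABLE orders (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,  -- NOT NULL: every row must belong to a tenant
    user_id UUID,
    total DECIMAL(10,2),
    created_at TIMESTAMP
);

-- Most queries filter by tenant, so index the column
CREATE INDEX orders_tenant_id_idx ON orders (tenant_id);

-- Query always filters by tenant (example UUID)
SELECT * FROM orders WHERE tenant_id = '123e4567-e89b-12d3-a456-426614174000';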

Every query must include the tenant_id filter. Miss it once and you have a data leak. Use row-level security (RLS) in PostgreSQL or similar features to enforce this at the database level.

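-- Enable row-level security
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- Apply policies to the table owner as well (owners bypass RLS by default)
ALTER TABLE orders FORCE ROW LEVEL SECURITY;

-- Create policy that filters by current_setting
-- (missing_ok = true: an unset tenant yields NULL and matches no rows)
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id::text = current_setting('app.current_tenant', true));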

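RLS makes it much harder to accidentally query across tenants: the database applies the policy even if your application code forgets the filter. It is not absolute, though. Superusers and roles with BYPASSRLS skip policies, and so does the table owner unless FORCE ROW LEVEL SECURITY is set, so the application should connect as an ordinary, non-owner role.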
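
For a policy like the one above to do anything, each transaction has to set app.current_tenant before it touches tenant tables. A minimal sketch of that step, assuming a DB-API connection with autocommit off and %s-style placeholders (psycopg2 behaves this way); the helper name tenant_transaction is hypothetical:

```python
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Run one transaction with app.current_tenant set for RLS.

    Assumes `conn` is a DB-API connection with autocommit off and
    %s-style placeholders (an assumption, not part of the article).
    """
    cur = conn.cursor()
    try:
        # Third argument true = transaction-local: PostgreSQL discards the
        # value at commit/rollback, so it cannot leak to the next request.
        cur.execute(
            "SELECT set_config('app.current_tenant', %s, true)",
            (tenant_id,),
        )
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Queries issued through the yielded cursor then run under the tenant's policy, and the setting disappears when the transaction ends.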

Separate Schemas per Tenant

Each tenant gets their own schema within the same database.

-- Tenant A's schema
CREATE SCHEMA tenant_a;
CREATE TABLE tenant_a.orders (...);

-- Tenant B's schema
CREATE SCHEMA tenant_b;
CREATE TABLE tenant_b.orders (...);

Applications connect with a search_path that includes the tenant’s schema. Queries do not need explicit tenant_id filtering because the schema provides implicit isolation.

The trade-off: schema migrations become more complex. You must run migrations against every tenant schema. With thousands of tenants, each release means thousands of DDL runs, and any of them can fail partway and leave schemas at different versions.
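
One way to keep per-schema migrations manageable is to apply them schema by schema and record failures instead of stopping at the first one. A minimal sketch; migrate_all_schemas and the apply_migration callable are hypothetical names, and the real DDL execution is left to the caller:

```python
from typing import Callable, Dict, Iterable


def migrate_all_schemas(
    schemas: Iterable[str],
    apply_migration: Callable[[str], None],
) -> Dict[str, str]:
    """Apply one migration to every tenant schema, collecting failures.

    `apply_migration` is a hypothetical callable that runs the DDL against
    a single schema (ideally inside its own transaction).
    Returns {schema_name: error_message} for the schemas that failed.
    """
    failures: Dict[str, str] = {}
    for schema in schemas:
        try:
            apply_migration(schema)
        except Exception as exc:
            # Keep going so one bad schema does not block the rest;
            # the caller decides whether to retry or roll back.
            failures[schema] = str(exc)
    return failures
```

The returned mapping doubles as the retry list, and logging it per tenant gives you the migration progress record you will want later.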

Separate Databases per Tenant

Maximum isolation. Each tenant gets their own database instance.

graph TD
    A[Load Balancer] --> B[Application Cluster]
    B --> C[Tenant A DB]
    B --> D[Tenant B DB]
    B --> E[Tenant N DB]

This approach suits regulatory requirements where data must be physically separated. It also simplifies per-tenant customization. But the operational overhead is brutal: thousands of databases mean thousands of backups, thousands of patches, thousands of failure points.

Most SaaS companies do not need this level of isolation until they have specific compliance requirements.
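
With a database per tenant, the application needs a lookup from tenant to connection details. A minimal in-memory sketch, assuming hypothetical names (TenantDatabase, TenantDatabaseRegistry); a real system would load this mapping from a catalog or secrets store:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TenantDatabase:
    host: str
    dbname: str
    port: int = 5432


class TenantDatabaseRegistry:
    """Maps tenant IDs to their dedicated database (in-memory sketch)."""

    def __init__(self) -> None:
        self._databases: Dict[str, TenantDatabase] = {}

    def register(self, tenant_id: str, database: TenantDatabase) -> None:
        self._databases[tenant_id] = database

    def lookup(self, tenant_id: str) -> TenantDatabase:
        try:
            return self._databases[tenant_id]
        except KeyError:
            # Fail closed: never fall back to a default or shared database.
            raise LookupError(
                f"no database registered for tenant {tenant_id!r}"
            ) from None
```

Failing closed on an unknown tenant matters here: a silent fallback to some default database would reintroduce the cross-tenant risk this layout is meant to remove.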

Tenant Isolation Patterns

Beyond the database, you need to think about isolation at every layer of your stack.

Application-Layer Isolation

Your application code must be tenant-aware from the start. Middleware extracts the tenant from the request and sets a context variable.

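from contextvars import ContextVar
from flask import Flask, abort, request

app = Flask(__name__)
current_tenant: ContextVar[str] = ContextVar('tenant_id')

@app.before_request
def before_request():
    # Extract tenant from JWT or subdomain
    token = request.headers.get('Authorization')
    if token is None:
        abort(401)  # no credentials, no tenant
    # extract_tenant_from_token is application-specific: it must verify
    # the token before trusting its tenant claim
    tenant = extract_tenant_from_token(token)
    current_tenant.set(tenant)

@app.route('/orders')
def get_orders():
    tenant_id = current_tenant.get()
    # query_orders_for_tenant is an application-specific data-access helper
    return query_orders_for_tenant(tenant_id)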

Do not rely on user input to determine the tenant without validation. A user should not be able to specify their tenant_id in a request parameter unless your application explicitly maps users to tenants.
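
If a user can belong to several tenants and picks one per request, the requested tenant has to be checked against a server-side membership mapping. A minimal sketch, assuming a hypothetical resolve_tenant helper and an in-memory memberships dict standing in for a database lookup:

```python
from typing import Dict, Optional, Set


class TenantAccessError(Exception):
    """Raised when a user asks for a tenant they do not belong to."""


def resolve_tenant(
    user_id: str,
    requested_tenant: Optional[str],
    memberships: Dict[str, Set[str]],
) -> str:
    """Return the tenant to use for this request, or raise.

    `memberships` maps user_id -> tenant IDs the user belongs to
    (a stand-in for whatever store holds that mapping).
    """
    allowed = memberships.get(user_id, set())
    if requested_tenant is None:
        # Only infer the tenant when there is exactly one candidate.
        if len(allowed) == 1:
            return next(iter(allowed))
        raise TenantAccessError("tenant must be specified")
    if requested_tenant not in allowed:
        raise TenantAccessError("user is not a member of the requested tenant")
    return requested_tenant
```

The request parameter is only ever a selection among tenants the server already knows the user belongs to; it never grants access on its own.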

Caching Considerations

Redis and similar caches are shared across tenants. You must namespace keys by tenant.

# Bad: cache key could collide between tenants
cache.set(f"user:{user_id}", user_data)

# Good: tenant-scoped cache key
cache.set(f"tenant:{tenant_id}:user:{user_id}", user_data)

If you use cache-aside caching, watch for cache stampedes: when a tenant’s hot key expires, many requests miss at once and hit the database together. Shared memory is a second risk: one tenant’s traffic spike can evict another tenant’s frequently accessed data.

Background Jobs and Queues

Worker processes handle background tasks. These must also be tenant-aware.

# Task includes tenant context
@celery.task
def generate_report(tenant_id, report_type):
    # Use tenant_id throughout
    data = fetch_data_for_tenant(tenant_id)
    report = build_report(data, report_type)
    store_report_for_tenant(tenant_id, report)

Never assume that because a task was queued by one tenant, it only affects that tenant. Cross-tenant bugs in background jobs are particularly nasty because they may not be caught until data is already corrupted.
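
One way to make job code tenant-aware without threading tenant_id through every call is to bind it to a context variable for the duration of the job and restore the previous value afterwards. A minimal sketch; the decorator name with_tenant_context is hypothetical, and it assumes tenant_id is the job's first positional argument:

```python
import functools
from contextvars import ContextVar

current_tenant: ContextVar[str] = ContextVar("tenant_id")


def with_tenant_context(func):
    """Bind the job's tenant_id argument to current_tenant while it runs."""

    @functools.wraps(func)
    def wrapper(tenant_id, *args, **kwargs):
        if not tenant_id:
            raise ValueError("background job enqueued without a tenant_id")
        token = current_tenant.set(tenant_id)
        try:
            return func(tenant_id, *args, **kwargs)
        finally:
            # Restore the previous value so a reused worker does not
            # carry this tenant into the next job.
            current_tenant.reset(token)

    return wrapper


@with_tenant_context
def generate_report_label(tenant_id, report_type):
    # Any helper called from here can read current_tenant.get().
    return f"{current_tenant.get()}:{report_type}"
```

Rejecting jobs that arrive without a tenant_id turns a silent cross-tenant bug into a loud failure at the start of the job.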

Performance Isolation

Shared infrastructure means shared resources. Without careful design, one tenant’s workload can degrade performance for everyone.

Resource Quotas

Implement per-tenant limits on CPU, memory, database connections, and API calls. Track usage against quotas and throttle or reject requests that exceed limits.

from dataclasses import dataclass

@dataclass
class TenantQuota:
    tenant_id: str
    monthly_spend_limit: float
    api_rate_limit: int  # requests per minute
    max_db_connections: int
    storage_gb: float

def check_quota(tenant_id: str, operation: str) -> bool:
    # get_quota and get_current_usage are application-specific lookups
    quota = get_quota(tenant_id)
    current_usage = get_current_usage(tenant_id)

    if operation == 'api_request':
        if current_usage.api_requests >= quota.api_rate_limit:
            return False
    elif operation == 'db_connection':
        if current_usage.db_connections >= quota.max_db_connections:
            return False

    return True
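
The usage counter behind a requests-per-minute limit could be tracked as below. This is a minimal single-process sketch using a fixed one-minute window; the class name FixedWindowRateLimiter is hypothetical, and a multi-instance deployment would need a shared store instead of a local dict:

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Per-tenant request counter over a fixed window (in-memory sketch)."""

    def __init__(
        self,
        clock: Callable[[], float] = time.monotonic,
        window_seconds: float = 60.0,
    ) -> None:
        self._clock = clock
        self._window = window_seconds
        # tenant_id -> (window_start, requests_in_window)
        self._state: Dict[str, Tuple[float, int]] = {}

    def allow(self, tenant_id: str, limit: int) -> bool:
        now = self._clock()
        window_start, count = self._state.get(tenant_id, (now, 0))
        if now - window_start >= self._window:
            window_start, count = now, 0  # start a new window
        if count >= limit:
            self._state[tenant_id] = (window_start, count)
            return False
        self._state[tenant_id] = (window_start, count + 1)
        return True
```

The clock is injectable so the limiter can be tested without sleeping. A fixed window allows short bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more state.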

Compute Isolation with Namespaces

If you use Kubernetes, you can isolate tenants using namespaces and resource quotas. CPU limits prevent one tenant from consuming all available compute.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-123
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi

Database Connection Pooling

Database connections are often the scarcest resource. Use PgBouncer or similar connection poolers to multiplex many application connections over fewer database connections.

; pgbouncer.ini
[databases]
app_db = host=db.example.com port=5432 dbname=production

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50

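With transaction-mode pooling, a server connection returns to the pool as soon as a transaction commits or rolls back, so many clients can share a few database connections. This lets you support far more tenants per database instance. One caveat: session-level state does not stay with a client in this mode, so set per-tenant values such as app.current_tenant with SET LOCAL (or set_config with is_local set to true) inside each transaction, not with a session-wide SET.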
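
A shared pool is still shared by every tenant. One way to keep a single tenant from occupying most of it is to cap in-flight database work per tenant inside the application. A minimal sketch, assuming a threaded application and a hypothetical TenantConnectionLimiter class:

```python
import threading
from contextlib import contextmanager
from typing import Dict


class TenantConnectionLimiter:
    """Caps concurrent database use per tenant (threaded, in-process sketch)."""

    def __init__(self, max_per_tenant: int) -> None:
        self._max = max_per_tenant
        self._lock = threading.Lock()
        self._semaphores: Dict[str, threading.BoundedSemaphore] = {}

    def _semaphore(self, tenant_id: str) -> threading.BoundedSemaphore:
        # Create each tenant's semaphore lazily, under a lock.
        with self._lock:
            if tenant_id not in self._semaphores:
                self._semaphores[tenant_id] = threading.BoundedSemaphore(self._max)
            return self._semaphores[tenant_id]

    @contextmanager
    def slot(self, tenant_id: str, timeout: float = 1.0):
        sem = self._semaphore(tenant_id)
        if not sem.acquire(timeout=timeout):
            raise TimeoutError(f"tenant {tenant_id!r} is at its connection limit")
        try:
            yield
        finally:
            sem.release()
```

Wrapping each checkout from the pool in limiter.slot(tenant_id) means a tenant that saturates its own allowance waits or fails, while other tenants still get connections.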

Security Considerations

Multi-tenancy amplifies security risks. A vulnerability affects not just one customer but potentially all customers.

Access Controls

Implement role-based access control that respects tenant boundaries. Users should only be able to access resources within their own tenant.

@require_permission('read:orders')
def get_orders(tenant_id):
    # Permission check happens via decorator
    # It verifies user belongs to tenant_id
    return db.orders.filter(tenant_id=tenant_id)

Audit logging becomes critical. You need to know who accessed what and when. Log tenant_id with every security-relevant event.
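
A structured audit record makes "who accessed what and when" queryable per tenant. A minimal sketch using the standard logging and json modules; the field names and the audit_event helper are assumptions for illustration:

```python
import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("audit")


def audit_event(tenant_id: str, user_id: str, action: str, resource: str) -> str:
    """Emit one JSON audit line and return it (field names are illustrative)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "user_id": user_id,
        "action": action,
        "resource": resource,
    }
    line = json.dumps(record, sort_keys=True)
    audit_logger.info(line)
    return line
```

Because tenant_id is a required argument, an audit call without tenant context fails at the call site instead of producing an unattributable log line.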

Network Isolation

Consider network-level isolation for sensitive tenants. Private networking (VPC peering, private links) keeps tenant traffic off shared networks.

For highly regulated industries, some tenants may require dedicated infrastructure. This moves toward single-tenancy but within a managed environment.

Data Encryption

Encrypt data at rest and in transit. With shared databases, encryption at rest protects against infrastructure-level breaches but not against application-level vulnerabilities.

Use tenant-specific encryption keys if your compliance requirements demand it. AWS KMS and similar services support per-tenant keys with envelope encryption.

Cost Optimization

Multi-tenancy’s appeal is cost efficiency. Make sure you are actually achieving it.

Shared Compute

A single application deployment handling thousands of tenants uses resources far more efficiently than thousands of single-tenant deployments.

The numbers: if single-tenancy needs 1GB of RAM per tenant, 1000 tenants require about 1TB of RAM. With multi-tenancy and proper resource sharing, you might handle the same workload with 64GB, roughly a 16x reduction.

Database Cost per Tenant

Track your database cost per active tenant. As tenants grow, you need to decide whether to scale up (bigger database) or scale out (more database instances).

def calculate_cost_per_tenant():
    monthly_db_cost = get_monthly_database_bill()
    active_tenants = get_active_tenant_count()
    if active_tenants == 0:
        return 0.0  # avoid division by zero when no tenants are active
    return monthly_db_cost / active_tenants

If cost per tenant exceeds thresholds, investigate. Perhaps some tenants are outliers with unusual workloads. Perhaps your schema design needs optimization.
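
An average hides outliers, so it helps to compare each tenant against the typical one. A minimal sketch that flags tenants whose usage exceeds a multiple of the median; the function name, the usage metric, and the default factor of 5 are all assumptions to tune for your workload:

```python
from statistics import median
from typing import Dict, List


def find_cost_outliers(
    usage_by_tenant: Dict[str, float],
    factor: float = 5.0,
) -> List[str]:
    """Return tenant IDs whose usage exceeds `factor` times the median.

    `usage_by_tenant` can hold any per-tenant cost proxy, for example
    query time or storage bytes (which proxy to use is an assumption).
    """
    if not usage_by_tenant:
        return []
    typical = median(usage_by_tenant.values())
    if typical == 0:
        # Most tenants are idle: anyone with usage stands out.
        return sorted(t for t, u in usage_by_tenant.items() if u > 0)
    return sorted(t for t, u in usage_by_tenant.items() if u > factor * typical)
```

For usage of {"a": 10, "b": 12, "c": 11, "d": 200} the median is 11.5, so only tenant "d" exceeds the 57.5 threshold.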

Metadata Tiering

Keep tenant metadata (billing info, subscription tier, settings) in a lightweight store. Full application data stays in your main database. This separation lets you scale metadata independently.

Common Pitfalls

Query Accidents

Forgetting to filter by tenant_id is the most common bug. Use RLS or similar database-level enforcement to protect against this.

-- This will fail or return empty if RLS is properly configured
SELECT * FROM orders;

Cache Invalidation

When tenant data changes, you must invalidate the correct cache keys. Use consistent key naming and always include tenant_id in cache invalidation logic.
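
Routing every cache operation through one key builder keeps writes and invalidations consistent. A minimal sketch with a dict standing in for the cache backend; the TenantCache class is hypothetical, and it assumes tenant IDs never contain a colon:

```python
from typing import Any, Dict, Optional


class TenantCache:
    """Tenant-scoped cache wrapper (dict-backed sketch, not a Redis client)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    @staticmethod
    def _key(tenant_id: str, kind: str, item_id: str) -> str:
        # Single place that defines the key format.
        return f"tenant:{tenant_id}:{kind}:{item_id}"

    def set(self, tenant_id: str, kind: str, item_id: str, value: Any) -> None:
        self._store[self._key(tenant_id, kind, item_id)] = value

    def get(self, tenant_id: str, kind: str, item_id: str) -> Optional[Any]:
        return self._store.get(self._key(tenant_id, kind, item_id))

    def invalidate(self, tenant_id: str, kind: str, item_id: str) -> None:
        self._store.pop(self._key(tenant_id, kind, item_id), None)

    def invalidate_tenant(self, tenant_id: str) -> int:
        """Drop every key for one tenant; returns how many were removed."""
        prefix = f"tenant:{tenant_id}:"
        stale = [k for k in self._store if k.startswith(prefix)]
        for k in stale:
            del self._store[k]
        return len(stale)
```

The trailing colon in the prefix keeps tenant "1" from matching keys that belong to tenant "12".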

Migration Runbook

When you need to alter a shared schema, you must:

  1. Test the migration against a representative sample of tenants
  2. Plan for backwards compatibility (old and new code running simultaneously)
  3. Have a rollback plan
  4. Execute during low-traffic windows

Schema changes on shared tables are risky. One bad migration affects all tenants simultaneously.

Production Failure Scenarios

| Failure | Impact | Mitigation |
| --- | --- | --- |
| tenant_id filter missing in query | Cross-tenant data exposure | Enable RLS at database level; audit queries in code review |
| One tenant consumes all connections | Other tenants cannot access database | Per-tenant connection limits; connection pool monitoring |
| Cache key collision between tenants | Tenant A sees Tenant B data | Use tenant-scoped cache keys; implement key prefixing |
| Background job processes wrong tenant | Data cross-contamination | Pass tenant_id explicitly; validate tenant context in every job |
| Schema migration fails on shared table | All tenants affected simultaneously | Test on sample tenants first; maintain backwards compatibility |
| Quota enforcement bug | One tenant monopolizes resources | Implement quota checks at multiple layers; monitor usage |

Observability Checklist

  • Metrics:

    • Queries per tenant (identify noisy neighbors)
    • Database connection usage per tenant
    • Cache hit rate per tenant
    • API latency per tenant
    • Quota utilization per tenant
  • Logs:

    • Tenant ID logged on every security-relevant event
    • Cross-tenant access attempts (should be zero)
    • Quota violations with tenant context
    • Migration progress per tenant
  • Alerts:

    • Any cross-tenant data access attempts
    • Tenant exceeding quota thresholds
    • Database connection pool saturation
    • Slow queries from specific tenants
    • Cache invalidation failures

Security Checklist

  • Row-level security enabled on all tenant-scoped tables
  • Tenant ID cannot be user-supplied without validation
  • Cache keys namespaced by tenant
  • Background jobs include tenant validation
  • Audit logging captures tenant context on all data access
  • Network isolation for sensitive tenants (VPC/private links)
  • Per-tenant encryption keys for sensitive data
  • Access control respects tenant boundaries at every layer

Common Anti-Patterns to Avoid

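Accepting tenant_id from User Input

Never trust a tenant_id that arrives in a request parameter or body. Derive it from authenticated context (JWT, session, OAuth token), and if users can switch tenants, check the requested tenant against their server-side memberships.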

Shared Cache Without Namespacing

Redis is shared across tenants. Without key namespacing, keys from different tenants can collide, so one tenant can overwrite or read another’s cached data.

Global State for Tenant Context

Using global variables for tenant context breaks under async/multithreaded execution. Use context variables or dependency injection.
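
The difference shows up as soon as two requests interleave. A minimal sketch with asyncio, where each task gets its own copy of the context, so a value set in one request handler is not visible in the other; handle_request is a hypothetical stand-in for real request handling:

```python
import asyncio
from contextvars import ContextVar
from typing import List

current_tenant: ContextVar[str] = ContextVar("tenant_id")


async def handle_request(tenant_id: str) -> str:
    current_tenant.set(tenant_id)
    # Yield to the event loop so the other request runs in between.
    await asyncio.sleep(0)
    # Still this request's tenant, even though another task set its own.
    return current_tenant.get()


async def main() -> List[str]:
    # gather() wraps each coroutine in a Task with its own context copy.
    return await asyncio.gather(
        handle_request("tenant-a"),
        handle_request("tenant-b"),
    )


results = asyncio.run(main())
print(results)  # ['tenant-a', 'tenant-b']
```

With a plain module-level variable in place of the ContextVar, both handlers would return "tenant-b", because the second assignment overwrites the first before either one resumes.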

Trusting Subdomain for Tenant Identification

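The subdomain comes from the Host header, which the client controls, so it proves nothing about who is calling. Use it for routing or branding if you like, but always validate tenant identity from authenticated credentials.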

Quick Recap

Key Bullets:

  • Multi-tenancy shares infrastructure for cost efficiency but requires strict isolation
  • Schema strategies range from shared tables (tenant_id) to separate databases per tenant
  • RLS and similar database features enforce isolation at the data layer
  • Performance isolation requires quotas, resource limits, and monitoring
  • Security isolation requires defense in depth across all layers

Copy/Paste Checklist:

Multi-Tenancy Setup:
[ ] Tenant context extraction from auth token
[ ] Tenant ID validation on every request
[ ] Row-level security enabled on databases
[ ] Cache keys namespaced by tenant
[ ] Per-tenant resource quotas defined
[ ] Background jobs include tenant context
[ ] Monitoring per tenant (not just aggregate)
[ ] Quota alerts configured
[ ] Cross-tenant access monitoring enabled
[ ] Regular tenant isolation audit

When to Use Multi-Tenancy

Multi-tenancy makes sense when:

  • Your tenants have similar workloads and resource needs
  • Cost efficiency matters more than maximum isolation
  • You can build and maintain proper isolation tooling
  • Regulatory requirements allow shared infrastructure

Single-tenancy makes sense when:

  • Tenants have wildly different resource requirements
  • Strong regulatory isolation is required
  • Per-tenant customization is extensive
  • Tenant count is small (dozens, not thousands)

Most SaaS applications start multi-tenant. The efficiency gains are hard to pass up. Build isolation and observability tooling early, before you have hundreds of tenants and bugs that are hard to fix.

Trade-off Analysis

| Factor | Shared Schema | Separate Schema | Separate Database |
| --- | --- | --- | --- |
| Isolation | Low - RLS required | Medium - schema separation | High - complete isolation |
| Cost Efficiency | Highest | High | Low |
| Operational Complexity | Medium | High | Highest |
| Schema Migrations | Shared - risky | Per-tenant - complex | Per-tenant - complex |
| Query Performance | Requires tenant_id indexes | Good - implicit isolation | Best per-tenant |
| Customization | Limited | Schema-level customization | Full customization |
| Backup/Restore | All tenants together | Per schema | Per database |
| Regulatory Fit | General data | Segregated data | Strict isolation |
| Tenant Count | Thousands | Hundreds | Dozens |
| Failure Domain | All tenants share | Schema-level failures | Isolated per tenant |

Multi-Tenancy Isolation Architecture

graph TB
    subgraph SharedInfra["Shared Infrastructure"]
        LB[Load Balancer]
        App[Application Cluster]
        Cache[(Shared Cache<br/>namespaced)]
    end

    subgraph DataLayer["Data Layer Options"]
        direction LR
        SharedSchema["Shared Schema<br/>tenant_id + RLS"]
        SeparateSchema["Separate Schemas<br/>per tenant"]
        SeparateDB["Separate Databases<br/>per tenant"]
    end

    subgraph IsolationBoundaries["Isolation Boundaries"]
        direction TB
        Network[Network Isolation<br/>VPC/Private Links]
        Compute[Compute Quotas<br/>per tenant]
        Storage[Storage Limits<br/>per tenant]
    end

    LB --> App
    App --> Cache
    App --> SharedSchema
    App --> SeparateSchema
    App --> SeparateDB
    Network -.->|applies to| App
    Compute -.->|applies to| App
    Storage -.->|applies to| SharedSchema

For more on related topics, see Microservices Architecture, API Gateway Patterns, and Database Scaling.
