Zero-Downtime Database Migration Strategies
Learn how to evolve your database schema in production without downtime. This guide covers expand-contract patterns, backward-compatible migrations, rollback strategies, and tools.
Zero-Downtime Database Migration Strategies
Database migrations seem simple until you’re doing them in production with millions of rows and zero tolerance for downtime. Get it wrong and you get angry users, late-night escalations, and potentially corrupted data. Get it right and your application evolves smoothly while users don’t notice a thing.
This guide covers the strategies and tools that professional teams use to ship database changes without drama.
Introduction
Database migrations are the most dangerous deployment operation you run. A bad migration in production with millions of rows and a zero-downtime requirement can cause data corruption, service outages, or require emergency rollbacks that create their own problems. The difference between a smooth migration and an incident is almost always whether you followed a proven pattern.
This guide covers the expand-contract pattern for breaking schema changes, backward-compatible migration strategies that work during rolling deployments, batch backfill techniques that do not lock tables, and rollback approaches when things go wrong. It focuses on PostgreSQL but the patterns apply broadly.
Understanding the Problem
When you’re working with a live database, every schema change carries risk. Adding a column seems innocuous until you realize it locks the table. Renaming a column sounds safe until the old code still running on some server tries to write to the old name. The problem isn’t the change itself—it’s the multiple versions of your application that exist during a deployment window.
Modern deployments use rolling updates or blue-green deployments. During these deployments, version N and version N+1 of your application are both running simultaneously. Your database migration must be compatible with both versions. This constraint is the foundation of all zero-downtime migration strategies.
The Expand-Contract Pattern
The expand-contract pattern, sometimes called the “two-phase migration,” works well for breaking schema changes. The core idea: never make a breaking change in a single step. Break it into three phases instead.
Phase 1: Expand
In the expand phase, you add new structure without removing the old. Suppose you want to rename a user_name column to display_name. Instead of renaming directly, you add the new column alongside the old one.
ALTER TABLE users ADD COLUMN display_name VARCHAR(255);
Your application code now writes to both columns. Version N writes to user_name, and version N+1 writes to display_name. A background job backfills the new column with data from the old column.
Phase 2: Migrate
Once the expand phase is complete and all historical data has been backfilled, you update the application code to stop writing to the old column. The old column becomes read-only.
-- Application now writes to display_name only
-- Old user_name column is kept for backward compatibility
Phase 3: Contract
After you have verified that no code is writing to the old column and all reads have been migrated, you can safely remove the old structure.
ALTER TABLE users DROP COLUMN user_name;
This three-phase approach ensures that at every point during the migration, some version of your application can function correctly.
flowchart LR
subgraph Phase1["Phase 1: Expand"]
A1[Add new column<br/>display_name] --> B1[Deploy vN+1 writes<br/>to both columns]
B1 --> C1[Background job<br/>backfills new column]
end
subgraph Phase2["Phase 2: Migrate"]
C1 --> D1[Stop writing to<br/>old column]
D1 --> E1[Old column<br/>becomes read-only]
end
subgraph Phase3["Phase 3: Contract"]
E1 --> F1[Verify no writes<br/>to old column]
F1 --> G1[DROP old column<br/>user_name]
end
Phase1 --> Phase2 --> Phase3
The expand-contract pattern is a specific application of a broader principle: backward-compatible migrations. A migration is backward-compatible when it works correctly whether the application before or after the migration is running.
Adding Columns Without Locking
PostgreSQL’s MVCC (Multi-Version Concurrency Control) means that most DDL changes no longer require exclusive locks like they used to. However, some operations still cause issues.
Adding a column with a default value used to require a table rewrite in older PostgreSQL versions. Starting with PostgreSQL 11, adding a column with a volatile default only updates the table metadata, not the rows themselves. The default is stored once in the catalog and applied on reads.
-- Safe in PostgreSQL 11+
ALTER TABLE users ADD COLUMN is_verified BOOLEAN DEFAULT FALSE;
For older versions, use a separate step:
-- Step 1: Add column without default
ALTER TABLE users ADD COLUMN is_verified BOOLEAN;
-- Step 2: Add default separately
ALTER TABLE users ALTER COLUMN is_verified SET DEFAULT FALSE;
-- Step 3: Backfill existing rows
UPDATE users SET is_verified = FALSE WHERE is_verified IS NULL;
Index Creation
Creating an index locks writes to the table in versions before PostgreSQL 11. For large tables, this can cause significant downtime. Use CONCURRENTLY to build indexes without blocking writes.
-- Creates index without blocking writes
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
Note that CREATE INDEX CONCURRENTLY cannot run inside a transaction. Your migration tool must support transactional migrations that can also run non-transactional commands.
When to Use / When Not to Use
Use expand-contract migrations when you need to rename or remove columns that existing code still references, change column types requiring data transformation, split or merge tables, or make any schema change that breaks the current application version.
Use backward-compatible migrations when adding columns with safe defaults (PostgreSQL 11+), adding indexes concurrently, adding constraints as NOT VALID before validating, or working with rolling deployments where old and new app versions coexist.
Do not use expand-contract when the change is purely additive (new table, new column that nothing touches — just deploy it), when you have a maintenance window for a coordinated full migration, or when the risk of multi-version confusion outweighs the risk of a coordinated deployment.
Never do the following in production: single-step column renames or type changes on large tables; non-concurrent index creation on tables with active writes; migrations inside the same transaction as application code deployment.
Rollback Strategies
Manual migration management does not scale. As your team grows and your database schema becomes more complex, you need tools that track migration history, enforce ordering, and help with rollback.
Flyway
Flyway is a mature, open-source migration tool that works with virtually any database. Migrations are stored as versioned SQL scripts.
V1__Create_initial_schema.sql
V2__Add_users_table.sql
V3__Add_email_index.sql
Flyway tracks which migrations have been applied and applies pending ones on startup. It supports undo migrations (in paid version) and baseline migrations for existing databases.
Configuration:
flyway.url=jdbc:postgresql://localhost:5432/mydb
flyway.user=flyway
flyway.password=secret
flyway.locations=filesystem:./migrations
Liquibase
Liquibase uses a changelog file (XML, JSON, YAML, or SQL) to define migrations. This provides better cross-database compatibility since Liquibase handles the dialect translation.
<changeSet id="1" author="alex">
<addColumn tableName="users">
<column name="display_name" type="VARCHAR(255)"/>
</addColumn>
</changeSet>
Liquibase tracks changes via a changelog table, allowing it to apply only new changes. It also supports rollbacks via rollback SQL or rollback scripts.
Prisma Migrate
If you are using Prisma as your ORM, Prisma Migrate integrates directly with your schema definition. You define your schema in schema.prisma, and Prisma generates and tracks migrations.
model User {
id String @id @default(cuid())
email String @unique
displayName String?
createdAt DateTime @default(now())
}
Prisma Migrate generates SQL migrations based on your schema changes. It maintains a migration history table and supports development, staging, and production environments.
The main advantage is tight integration with your ORM layer. The downside is vendor lock-in—if you switch away from Prisma, you need to migrate your migration system too.
Adding Columns Without Locking Tables
This deserves its own section because it is such a common operation and a common source of production incidents.
The Problem with DEFAULT values
Before PostgreSQL 11, adding a column with a DEFAULT value caused a table rewrite. For a table with 100 million rows, this could take hours and lock the table the entire time.
Even in PostgreSQL 11+, certain operations still cause table locks:
- Changing a column’s type
- Changing a column’s NOT NULL constraint
- Adding a CHECK constraint
Techniques for Safe Column Addition
-
Add nullable columns first: Columns that allow NULL never require a table rewrite.
-
Add columns with non-blocking defaults: PostgreSQL 11+ handles volatile defaults efficiently for column addition.
-
Use concurrent index creation: As mentioned earlier, always use
CONCURRENTLYfor indexes on large tables. -
Avoid constraints initially: Add CHECK constraints as NOT VALID, then validate later:
-- Add constraint without scanning
ALTER TABLE orders ADD CONSTRAINT positive_total CHECK (total >= 0) NOT VALID;
-- Validate without locking
ALTER TABLE orders VALIDATE CONSTRAINT positive_total;
- Break up large alterations: For very large tables, consider a multi-step approach where you add a trigger to maintain the new column rather than backfilling all at once.
Best Practices
After working on this for years, a few practices consistently make migrations safer.
Test on production-size data: A migration that takes 2 seconds on your laptop might take 2 hours on production. Test against realistic data volumes.
Monitor table statistics: Run ANALYZE after major migrations to ensure the query planner has accurate information.
Have a rollback plan: Not for the migration itself, but for the deployment. Can you route traffic back to the old application version if something goes wrong?
Schedule during low traffic: Even with all precautions, migrations are safer during low-traffic windows.
Keep migrations small and focused: Large migrations are harder to review and harder to rollback. One change per migration makes debugging easier.
Production Failure Scenarios
| Failure | Cause | Mitigation |
|---|---|---|
| Table lock during column rename | PostgreSQL versions < 11 with DEFAULT values caused table rewrite | Use PostgreSQL 11+ or multi-step column addition without defaults |
| Dual-write window data divergence | Application bug writes to wrong column during expand phase | Dual-write validation checks in staging, read-after-write consistency tests |
| Long-running migration holding locks | Transaction timeout on large backfill UPDATE | Use smaller batch sizes with sleep intervals, never hold locks across batches |
| Rollback leaves partial schema state | Contract phase run before all app versions migrated | Gate contract phase on deployment confirmation, keep old column until fully verified |
| Concurrent REINDEX blocks writes | REINDEX without CONCURRENTLY on busy index | Always use REINDEX CONCURRENTLY in production |
Trade-Off Table: Migration Tools
| Dimension | Flyway | Liquibase | Prisma Migrate | Raw SQL |
|---|---|---|---|---|
| Learning curve | Low | Medium | Low | Low |
| Rollback support | Paid version | Yes (changelog) | Limited | Manual |
| Cross-database support | Yes (JDBC) | Yes | No (ORM-coupled) | Yes |
| Transaction handling | Partial (non-transactional commands break it) | Partial | Yes | Manual |
| State tracking | Versioned files | Changelog table | Migration history | None |
| CI/CD integration | Excellent | Good | Good | Requires custom script |
| Large team support | Good | Good | Good | Needs conventions |
Capacity Estimation: Migration Window Sizing
Migration window sizing determines how long your migration will take and how much downtime you actually need.
Batch size and migration time formula:
rows_per_batch = batch_size_bytes / avg_row_width_bytes
total_batches = total_rows / rows_per_batch
migration_time_minutes = total_batches × batch_interval_seconds / 60
For a 100M row table adding a NOT NULL column with a default value:
- Average row width: 200 bytes
- Batch size: 50MB
- Rows per batch: 50MB / 200 = 250,000 rows
- Total batches: 100M / 250K = 400 batches
- Batch interval (including index rebuild): 5 seconds
- Total migration time: 400 × 5 / 60 = ~33 minutes
Downtime window formula:
actual_downtime = sum(contract_phase_locks)
contract_phase_downtime = migration_phase_batches × batch_interval_seconds
The expand-contract pattern minimizes actual downtime to only the contract phase. A 100M row table with 5-second batch intervals and 400 batches has a contract phase of ~33 minutes. With a shadow table approach, actual lock time per batch drops to milliseconds—meaning ~5 minutes of total downtime.
Validation time formula:
validation_time = table_row_count × (1 / validation_rows_per_second)
Validating a migrated table against the original (row count, checksum) at 100K rows/second for a 100M row table: ~17 minutes. Plan for validation time equal to or greater than the migration itself.
Real-World Case Study: Stripe’s Zero-Downtime Migration Toolkit
Stripe runs one of the largest PostgreSQL deployments in the financial technology space, processing billions of dollars in transactions annually. Their migration toolkit is open-source and represents years of production experience.
The problem Stripe solved: At their scale, a naive ALTER TABLE statement blocking for even 30 seconds would result in thousands of failed payments. They needed a migration system that worked across multiple application versions simultaneously, with the ability to roll back instantly if something went wrong.
Key practices from Stripe’s approach:
- Shadow tables: Create an identical copy of the table to migrate. Applications write to both old and new tables during the expand phase.
- Backfill with minimal locks: Use batched updates with row-level locking rather than table-level locks. Stripe’s tooling uses
WHERE id > last_processed_id ORDER BY id LIMIT batch_sizepattern. - Validation before switchover: Compare row counts, checksums, and a random sample of rows between shadow and original tables before promoting the shadow.
- Backward-compatible application code: Every migration is paired with an application deployment that can read both old and new schemas. Old code never breaks—new columns are simply ignored.
- Instant rollback: If validation fails, drop the shadow table and roll back application code. No schema changes to undo.
The lesson: Stripe’s toolkit proves that “zero-downtime” is not marketing—it’s a specific set of patterns (shadow table, dual write, validation) that require upfront investment but pay off at scale.
Quick Recap Checklist
- Zero-downtime migrations require backward-compatible schema changes that work with multiple app versions
- The expand-contract pattern breaks breaking changes into three phases: add new, migrate data, remove old
- Always use CONCURRENTLY for index creation on tables with active writes
- Add columns as nullable first; add NOT NULL constraints after backfilling
- Use NOT VALID to add CHECK constraints without scanning rows
- Migration tools (Flyway, Liquibase, Prisma Migrate) track which migrations have been applied
- Test migrations on production-size data — 2 seconds on laptop can mean 2 hours in production
- Schedule migrations during low-traffic windows even with zero-downtime techniques
- Keep migrations small and focused — one change per migration is easier to debug and rollback
- Always have a rollback plan for the deployment (not the migration itself)
- Monitor table statistics after major migrations — run ANALYZE to update query planner
Related Posts
- Database Scaling Strategies - Horizontal and vertical scaling approaches
- DevOps Roadmap - Database deployment, CI/CD pipelines, and infrastructure-as-code patterns overlap heavily with zero-downtime migration tooling and practices
- Schema Design Fundamentals - Core principles of effective schema design
- Deployment Strategies - Modern deployment patterns that enable zero-downtime
Interview Questions
The key is to break the migration into zero-downtime phases using expand-contract. First, deploy application code that writes the default value to new and old columns simultaneously. Second, use a shadow table migration pattern: create a new table with the correct schema, backfill data in batches of ~10K rows using id > last_id ORDER BY id LIMIT batch_size to avoid table locks, replay live writes to both tables during the migration window. Third, validate row counts and checksums between shadow and original. Fourth, perform a quick lock switchover (typically under 30 seconds using ALTER TABLE ... RENAME). Finally, drop the old table after verifying the application works correctly.
Flyway uses versioned SQL scripts (plain SQL files) with a simple migration history table. It's straightforward, works with raw SQL expertise, and has minimal abstraction. Liquibase uses XML/YAML/JSON change logs with a declarative approach, tracking changes by change sets rather than raw SQL. Liquibase supports rollbacks natively through change set metadata; Flyway requires manually written undo scripts. For teams already using ORM tooling, Liquibase's change log approach integrates well with Prisma and Hibernate. For raw SQL shops, Flyway's simplicity wins.
Recovery depends on whether the migration was wrapped in a transaction. For expand-contract patterns without atomic transactions, the application is designed to be backward-compatible during the migration window—so the old application code continues working. Recovery steps: first, do not panic—the expand-contract pattern means the old schema still exists. Second, roll back application code to the previous version that speaks only the old schema. Third, if using a shadow table, validate data integrity between shadow and original. Fourth, after validating old schema is fully functional, clean up any partial shadow table artifacts. Finally, investigate failure root cause, adjust migration scripts, and re-run during the next maintenance window.
Multi-region migrations require either region-by-region rollouts or schemas that are backward-compatible across all regions simultaneously. Key challenges: DDL statements in PostgreSQL are not fully transactional across regions (certain DDL changes can cause brief locks). Solutions: use migrate-based expansion where each region validates its own schema version; use online DDL tools like pg_repack for table alterations instead of ALTER TABLE directly; avoid adding NOT NULL constraints without defaults in a single atomic step—instead, add nullable first, backfill, then add the NOT NULL constraint. For active-active, the safest pattern is schema changes that never take exclusive locks: add columns as nullable, add indexes CONCURRENTLY, and use trigger-based shadow tables for data migrations.
Prisma Migrate is ORM-coupled and generates database-specific SQL, so switching databases requires rewriting migrations entirely. Challenges: different SQL dialects (PostgreSQL's SERIAL vs MySQL's AUTO_INCREMENT), different constraint syntax, different index types, and data type mismatches (PostgreSQL's JSONB vs MySQL's JSON). Approach: export your current schema as SQL, translate to MySQL dialect manually or via a tool like pg2mysql, then create new migration files in MySQL format. For data migration, use a ETL tool or write a migration script that reads from PostgreSQL and writes to MySQL. Alternatively, consider using Flyway or Liquibase for database-agnostic migrations since they use pure SQL changelogs that can be dialect-translated.
Split the migration into phases: First, deploy application code that writes the computed value to both old and new columns during the expand phase. Second, run the backfill in batches using UPDATE ... WHERE id > last_id ORDER BY id LIMIT batch_size with small batches (e.g., 1000 rows) and sleep intervals (e.g., 100ms between batches) to avoid lock contention and allow the application to continue writing. Third, once backfill is complete, deploy the contract phase that drops the old column. The key is that the expand phase allows the application to continue functioning while backfill runs in the background — the application writes are the "live backfill" for new rows. Schedule the final contract phase (old column drop) during low-traffic window since it requires a brief lock.
Migrations modify schema (DDL) — adding columns, creating indexes, altering constraints. Seeds populate initial or reference data — lookup tables, default configurations, test data. Use migrations for schema evolution; use seeds for data that belongs in the database from the start. In deployment pipelines: run migrations before the application starts (schema must exist for the app to connect), run seeds after migrations complete but before traffic arrives (data must exist for the app to function correctly). Never mix seeds into migrations — if a seed fails, you should be able to re-run it independently without re-running the entire migration history.
Use the expand-contract pattern with dependent object handling: Phase 1 (Expand): Add the new column alongside the old one. Update all stored procedures, functions, and views to write to both columns using INSTEAD OF triggers or application-level dual write. Create new versions of all dependent objects that reference the new column name, keeping old versions for backward compatibility. Phase 2 (Migrate): Backfill data from old to new column. Switch all reads to use the new column. Phase 3 (Contract): Drop the old column and delete old versions of dependent objects. For 50+ dependent objects, use pg_proc and pg_depend to find all references: SELECT proname FROM pg_proc WHERE prosrc LIKE '%old_column_name%'. Automate the generation of the migration script using this query.
CREATE INDEX CONCURRENTLY cannot run inside a transaction and how this affects migration tooling.CREATE INDEX CONCURRENTLY must not run inside a transaction because it needs to commit each intermediate step separately — the index build happens in multiple phases (index creation, initial data population, index ready). If the entire operation were in one transaction and failed, rolling back would leave partial index state that PostgreSQL cannot clean up automatically. Migration tools like Flyway normally wrap migrations in transactions for atomicity, but CONCURRENTLY breaks this. Solutions: use migration tools that support non-transactional commands (Liquibase can be configured for this), separate index creation migrations from DDL migrations, or use pg_repack for non-blocking index builds instead of CONCURRENTLY. Always verify your migration tool's transaction handling before deploying index changes in production.
Blue-green deployments require backward-compatible migrations because the old (blue) and new (green) application versions both connect to the same database during the transition window. Rules: never drop or rename columns/tables that old code still references; never add NOT NULL constraints without defaults on columns that old code writes to; never create indexes without CONCURRENTLY. Migrations must be deployable before traffic switches to the new version, and they must be safe to run while both versions are active. The ideal migration is one that works with both versions — adding a nullable column, adding an index concurrently, or adding a constraint as NOT VALID. After traffic switches and old code is fully retired, you can run the contract phase to remove old structures.
Idempotent migrations check for existence before executing. Instead of ALTER TABLE orders ADD CONSTRAINT positive_total CHECK (total >= 0), use: ALTER TABLE orders ADD CONSTRAINT positive_total CHECK (total >= 0) NOT VALID combined with a pre-check: ALTER TABLE orders VALIDATE CONSTRAINT positive_total wrapped in DO $$ BEGIN ... EXCEPTION WHEN duplicate_object THEN NULL; END $$. For columns: ALTER TABLE users ADD COLUMN IF NOT EXISTS is_verified BOOLEAN DEFAULT FALSE (PostgreSQL 9.5+). For indexes: CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_email ON users(email). The pattern: always check existence or handle the exception, so re-running a migration that partially succeeded or already has the structure does not fail.
Deadlock occurs when a migration holds locks on multiple rows or tables in a different order than the application, or when multiple batches of the same migration run simultaneously. For example, a migration that updates rows in table A then table B, while an application updates B then A, creates deadlock potential. Batch processing prevents this by: acquiring locks in a consistent order (always update lower-id rows first), using smaller batch sizes to limit lock duration, adding SELECT ... FOR UPDATE SKIP LOCKED to skip rows already locked by concurrent operations, and adding sleep intervals between batches to allow other operations to proceed. Monitor pg_locks during large migrations and set alerts for blocked queries exceeding threshold.
GitOps migration safeguards: (1) Migration files must be reviewedable and versioned — no inline SQL execution; (2) Migrations must be idempotent — re-running must not fail; (3) Include rollback capability in migration history, not just the migration itself; (4) Add a dry-run or validation step before production execution; (5) Set up manual approval gates for destructive migrations (DROP COLUMN, DROP TABLE, DROP INDEX); (6) Monitor migration duration and set timeouts — a migration running longer than expected indicates a problem; (7) Have a kill-switch that stops the deployment pipeline if migration fails; (8) Test migrations on a production-data clone before running in production; (9) Maintain a migration freeze window during peak traffic or critical business periods.
Migration debt accumulates when schema changes diverge from application code expectations — old columns still referenced by code that was supposed to be removed, old tables that should have been dropped but weren't, constraints that were added but never validated. It builds up over time from rushed migrations, failed contracts, and deferred cleanup. Managing it: track pending contract phases in migration documentation or a project tracker; schedule periodic "schema cleanup" sprints; enforce that every migration includes a contract phase with a deadline; use static analysis tools like pgcluu or pg_stat_statements to detect unused columns and tables. The risk: migration debt increases operational complexity, slows queries (PostgreSQL still scans dead columns), and creates security surfaces.
Testing migrations: (1) Run on production-size data clone — use a point-in-time recovery replica or snapshot for testing; (2) Measure execution time — a migration that takes 2 seconds on laptop might take 2 hours on production with real data volumes; (3) Check lock escalation — verify the migration does not escalate to table-level locks under load; (4) Test rollback — verify that if the migration fails partway, the rollback path works correctly; (5) Run concurrent load test — while migration is running, simulate application traffic to verify no deadlock or blocking; (6) Check dependency impacts — run EXPLAIN on dependent queries to verify indexes are used correctly after the migration.
Lazy migration (where application code and database schema are allowed to be out of sync temporarily) is safe only with strong backward compatibility guarantees and short drift windows. Requirements: application must handle both old and new schema simultaneously (expand phase is complete before deploy); drift window must be minutes to hours, not days; rollback must be possible without data loss; all team members must be aware of the drift and not introduce conflicting changes during the window. The expand-contract pattern is a form of lazy migration where the expand phase intentionally creates drift. The danger is when drift becomes unbounded — multiple teams making independent changes that each assume the other has migrated, leading to incompatible schema versions running simultaneously.
Schema-first (Prisma): schema definition is the source of truth; migrations are generated from schema diffs. Pros: single source of truth, no drift between schema and migrations. Cons: ORM lock-in, less control over SQL syntax, harder to handle complex migrations. Migration-first (Flyway): SQL scripts are the source of truth; schema is built from migrations. Pros: full SQL control, database-agnostic, works with any ORM. Cons: schema and migration history can drift if not maintained carefully, manual schema management required. For teams using Prisma as ORM and wanting tight integration, schema-first makes sense. For teams that need cross-database support, complex custom SQL, or want to treat migrations as the audit log of schema changes, migration-first is better.
On PostgreSQL versions before 11, adding a column with a DEFAULT required a table rewrite (every row was updated). For a 2TB table, this could take hours and lock the table. Workaround: add the column as nullable without a default (ALTER TABLE users ADD COLUMN is_active BOOLEAN); update in batches (UPDATE users SET is_active = FALSE WHERE is_active IS NULL AND ctid IN (SELECT ctid FROM users WHERE is_active IS NULL LIMIT 10000)); add the default after all rows are backfilled (ALTER TABLE users ALTER COLUMN is_active SET DEFAULT FALSE); finally, backfill the default for any remaining NULLs if needed. This multi-step approach avoids the table rewrite while achieving the same result.
Adding a UNIQUE constraint when duplicates exist will fail. Handle it before the migration: identify duplicates: SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1. Decide on a resolution strategy: delete duplicates (keep row with highest ID, delete others), merge duplicates (combine data into one row, update references), or update duplicates to be unique (e.g., append a suffix to email). For production with duplicates, the safest approach is a multi-step expand-contract: add a nullable unique column with a temporary unique index; backfill with unique values (append a suffix to duplicates); validate uniqueness; then add the proper NOT NULL UNIQUE constraint. Always document how duplicates were resolved for audit purposes.
Cross-database migrations require an ETL (Extract-Transform-Load) approach rather than a single SQL migration. Extract: read data from the source database using its native tools (pg_dump for PostgreSQL, mysqldump for MySQL). Transform: convert data types, handle dialect differences (e.g., TEXT vs VARCHAR, different date formats), and transform schemas (column renames, type mappings). Load: use a tool like pgloader, a custom ETL script, or a data integration platform. For zero-downtime cross-database migration: set up dual-write at the application layer (write to both databases simultaneously), run parallel reads during migration window, validate data integrity between source and target, then switch read path to new database. Tools: pgloader (PostgreSQL to many targets), AWS DMS (Database Migration Service), or custom scripts with batch processing.
Further Reading
- PostgreSQL ALTER TABLE Documentation — Locking behavior for DDL operations
- Flyway Documentation — Versioned migration management
- Liquibase Documentation — Change log-based migrations
- Prisma Migrate — ORM-integrated schema migrations
- pg-migrate — Node.js database migration tool
Conclusion
Zero-downtime database migrations are not about any single technique—they are about a mindset. Assume your application code and database schema will be out of sync during deployments. Design migrations that work with multiple application versions. Use the expand-contract pattern for breaking changes. Leverage your migration tooling to track history and automate consistency.
The investment in proper migration practices pays off every deployment. No drama, no late-night incidents, no corrupted data. Just smooth, boring deployments—the kind you want.
Category
Related Posts
Deployment Strategies: Rolling, Blue-Green, and Canary Releases
Compare and implement deployment strategies—rolling updates, blue-green deployments, and canary releases—to reduce risk and enable safe production releases.
CI/CD Pipelines for Microservices
Learn how to design and implement CI/CD pipelines for microservices with automated testing, blue-green deployments, and canary releases.
Health Checks: Liveness, Readiness, and Service Availability
Master health check implementation for microservices including liveness probes, readiness probes, and graceful degradation patterns.