Vacuuming and Reindexing in PostgreSQL
PostgreSQL's MVCC requires regular maintenance. This guide explains dead tuples, VACUUM vs VACUUM FULL, autovacuum tuning, REINDEX strategies, and how to monitor bloat with pg_stat_user_tables.
PostgreSQL uses Multi-Version Concurrency Control (MVCC) to allow concurrent reads and writes without locking. This design gives PostgreSQL its excellent concurrency characteristics. But MVCC is not free—every UPDATE and DELETE leaves behind dead tuples that need to be cleaned up. Ignore this cleanup long enough, and your tables bloat, your queries slow down, and you wonder why your 100GB database is eating 500GB of disk.
This guide covers the maintenance tasks that keep PostgreSQL running smoothly: vacuuming and reindexing.
Understanding MVCC and Dead Tuples
When you UPDATE a row in PostgreSQL, it does not modify the existing row. Instead, it marks the old row as dead and inserts a new row. Similarly, DELETE marks rows as dead rather than removing them. This is how MVCC works—transactions see the version of the data that existed when they started.
The problem is that dead tuples accumulate. They take up space, and they slow down scans because the query planner has to skip over them. Eventually, if you never clean them up, you run out of disk space or your tables become so bloated that performance grinds to a halt.
VACUUM is PostgreSQL’s garbage collector. It marks the space used by dead tuples as free for future use. It does not return the space to the operating system—it just marks the pages as reusable. For that, you need VACUUM FULL.
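You can watch this happen with a small scratch table. The table and column names here are illustrative, and the statistics collector reports asynchronously, so the dead tuple count may take a moment to appear:

```sql
-- Scratch table to demonstrate dead tuple accumulation
CREATE TABLE scratch (id int, val text);
INSERT INTO scratch SELECT g, 'x' FROM generate_series(1, 1000) g;

UPDATE scratch SET val = 'y';  -- every row now has a dead predecessor version

-- Stats are reported asynchronously; expect n_dead_tup near 1000 shortly after
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'scratch';

VACUUM scratch;  -- marks the dead tuple space as reusable (file size unchanged)
```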
VACUUM vs VACUUM FULL
Let me distinguish between the two clearly.
VACUUM (the standard command) marks dead tuples as free space within the table. It does not shrink the table’s physical size on disk. The freed space is immediately available for new rows within the same table. This is the command you should run most of the time.
VACUUM FULL rewrites the entire table, compacting it and returning unused space to the operating system. It requires an exclusive lock on the table—writes and reads are blocked for the duration. On a large table, this can take hours. Avoid this in production unless you have a maintenance window and genuinely need the space back.
-- Standard vacuum (run regularly)
VACUUM users;
-- Full vacuum (rarely needed, blocks everything)
VACUUM FULL users;
Let autovacuum handle routine vacuuming. Only use VACUUM FULL when you need to reclaim disk space after deleting a large amount of data—and even then, do it during a maintenance window.
Autovacuum Tuning
PostgreSQL’s autovacuum daemon automates vacuuming, but the defaults are conservative. For a busy database, you will likely need to tune these settings.
The key parameters are:
- autovacuum_vacuum_threshold (default 50): Minimum number of dead tuples before VACUUM is triggered
- autovacuum_vacuum_scale_factor (default 0.2): Fraction of table size added to the threshold
- autovacuum_analyze_threshold (default 50): Minimum number of tuple changes before ANALYZE
- autovacuum_analyze_scale_factor (default 0.1): Fraction of table size for the ANALYZE threshold
- autovacuum_vacuum_cost_delay (default 2ms since PostgreSQL 12, 20ms before): Time the worker sleeps each time its vacuum cost budget is spent
For a table with 10 million rows, the scale factor approach means autovacuum waits until there are 2 million dead tuples. That is far too lax for busy tables—bloat piles up long before cleanup starts.
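To see where each table's trigger point sits, you can compute it from pg_class.reltuples. A sketch using the global settings (per-table reloptions overrides are not considered here):

```sql
-- Compare each table's dead tuples with its autovacuum trigger point:
-- threshold + scale_factor * estimated row count
SELECT
    s.relname,
    s.n_dead_tup,
    current_setting('autovacuum_vacuum_threshold')::int
      + current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples
      AS vacuum_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;
```

Tables where n_dead_tup sits just below a large trigger point are the ones that benefit most from a lower per-table scale factor.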
You can configure autovacuum per-table:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 1000
);
This reduces the scale factor to 1% and sets a minimum threshold of 1000 dead tuples. For large tables, this triggers vacuuming much sooner.
Watch out for tables with very high update rates. If you are updating millions of rows per hour, the default settings will let bloat accumulate faster than autovacuum can handle. You might need to manually VACUUM those tables on a schedule.
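A quick way to spot those high-churn tables is to rank them by write activity since the last statistics reset:

```sql
-- Tables generating the most churn; check whether autovacuum keeps pace
SELECT relname, n_tup_upd, n_tup_del, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_tup_upd + n_tup_del DESC
LIMIT 10;
```

A stale last_autovacuum next to a large n_dead_tup is the signature of autovacuum falling behind.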
REINDEX Strategies
Indexes in PostgreSQL can also become bloated over time. B-tree indexes, the default index type, can develop page-level fragmentation. REINDEX rebuilds the index, compacting it and improving performance.
Blocking vs Concurrent REINDEX
The standard REINDEX command blocks writes and reads while rebuilding:
REINDEX INDEX idx_users_email;
For production systems with large indexes, use CONCURRENTLY:
REINDEX INDEX CONCURRENTLY idx_users_email;
REINDEX CONCURRENTLY (available since PostgreSQL 12) builds a new index alongside the existing one, then swaps them. Writes are blocked only briefly while the swap happens. The main caveats: it cannot run inside a transaction, it requires extra disk space for the transient index, and if it fails partway it leaves behind an invalid index that you must drop manually.
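On PostgreSQL 12+, you can watch a concurrent rebuild from another session via the pg_stat_progress_create_index view (which also covers CREATE INDEX):

```sql
-- Monitor a running REINDEX CONCURRENTLY
SELECT pid, phase, blocks_done, blocks_total,
       round(100.0 * blocks_done / NULLIF(blocks_total, 0), 1) AS pct_done
FROM pg_stat_progress_create_index;
```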
When to REINDEX
Index bloat is not always the problem people think it is. B-tree indexes self-balance, and some fragmentation is normal. REINDEX is most valuable when:
- After a bulk DELETE on a heavily indexed table
- When an index has become significantly bloated (verify with pgstattuple's pgstatindex() function)
- Following a PostgreSQL major version upgrade (some versions have index format changes)
- When index corruption is suspected (REINDEX rebuilds the index from the table data)
For routine maintenance, weekly or monthly REINDEX on problematic indexes is reasonable. Do not reindex everything blindly—focus on indexes that show bloat in your monitoring.
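With pgstattuple installed, one way to rank candidates is by leaf density—lower means more bloat, and healthy B-trees usually sit near 90%. A sketch (B-tree indexes only; pgstatindex errors on other index types):

```sql
-- Requires: CREATE EXTENSION pgstattuple;
SELECT i.indexrelname,
       (pgstatindex(i.indexrelid)).avg_leaf_density AS leaf_density
FROM pg_stat_user_indexes i
WHERE i.schemaname = 'public'
ORDER BY (pgstatindex(i.indexrelid)).avg_leaf_density ASC;
```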
When to Use / When Not to Use
Use VACUUM (standard) for routine maintenance — it reclaims dead tuple space within the table without blocking reads or writes. Run it after large batch DELETE or UPDATE operations, or when n_dead_tup climbs above meaningful levels but disk space is not yet critical.
Use VACUUM FULL only when you genuinely need to return disk space to the operating system and can afford a maintenance window with exclusive table access. Reserve this for tables with severe bloat above 50% that cannot wait for normal cleanup.
Use CLUSTER when you want to physically reorder table data for I/O efficiency on a specific index — particularly useful after bulk loads where the table is accessed sequentially on an indexed column.
Use REINDEX CONCURRENTLY in production systems where index rebuilds must not block writes, or after bulk deletes that significantly fragmented specific indexes.
Never run VACUUM FULL on a large table without a maintenance window — it holds an exclusive lock for the entire duration. Never use standard REINDEX on busy production indexes — use CONCURRENTLY. Never blindly reindex everything — target bloated ones only based on monitoring data.
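For the sequential-access case above, a minimal CLUSTER sketch—table and index names are illustrative:

```sql
-- Rewrite orders in created_at order (exclusive lock for the duration)
CLUSTER orders USING idx_orders_created_at;
-- PostgreSQL remembers the chosen index, so later runs can just be:
CLUSTER orders;
ANALYZE orders;  -- refresh planner statistics after the rewrite
```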
flowchart TB
subgraph Write["UPDATE/DELETE Transaction"]
W1[Mark old row as DEAD] --> W2[Insert new row version]
end
subgraph Vacuum["VACUUM Process"]
V1[Scan table pages] --> V2[Find dead tuples]
V2 --> V3[Mark page space as FREE]
V3 --> V4[Update FSM<br/>Free Space Map]
end
subgraph Reindex["REINDEX Process"]
R1[Build new index<br/>alongside old] --> R2[Swap when complete]
R2 --> R3[Drop old index]
end
Write --> Vacuum
Write --> Reindex
When to Manually VACUUM/REINDEX
Autovacuum handles most situations. Manual intervention is warranted in specific scenarios.
After Large Batch Operations
If you have just loaded 10 million rows via COPY or a bulk UPDATE, autovacuum might not catch up immediately. Running VACUUM ANALYZE afterward ensures statistics are updated and dead tuples are cleaned up promptly.
COPY users FROM '/tmp/users.csv';
VACUUM ANALYZE users;
After Major Deletions
When you delete millions of rows (say, purging old orders), autovacuum will eventually clean up, but you might want to run VACUUM immediately if disk space is a concern. If you need the space back immediately, VACUUM FULL (during maintenance) or CLUSTER (which reorders physical storage) might be necessary.
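As a sketch—the table name and retention window are illustrative:

```sql
-- Purge old rows, then make the freed space reusable right away
DELETE FROM orders WHERE created_at < now() - interval '2 years';
VACUUM ANALYZE orders;  -- non-blocking; freed space stays inside the table file

-- Only if the disk space itself must go back to the OS (maintenance window):
VACUUM FULL orders;
```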
Before Major Queries
If you are about to run a long report that will scan millions of rows, running ANALYZE first ensures the query planner has accurate statistics. This is especially relevant after large data changes.
ANALYZE users;
EXPLAIN ANALYZE SELECT * FROM users WHERE created_at > '2025-01-01';
Monitoring Bloat with pg_stat_user_tables
PostgreSQL provides built-in statistics for monitoring table health. The pg_stat_user_tables view is your first stop for diagnosing bloat issues.
SELECT
relname,
n_live_tup,
n_dead_tup,
last_vacuum,
last_autovacuum,
vacuum_count,
autovacuum_count,
n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0) AS dead_tuple_pct
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_dead_tup DESC;
This query shows dead tuple counts and percentages for all tables. A table with 100 million live tuples and 10 million dead tuples has 9% bloat. Tables approaching 20-30% bloat benefit from manual vacuuming.
For more detailed bloat analysis, the pgstattuple extension provides actual bloat estimates:
SELECT * FROM pgstattuple('users');
This returns estimated free space, dead tuples, and bloat percentage. You need to install the extension first:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
Another useful view is pg_stat_user_indexes, which shows index usage statistics. Indexes that are never scanned (idx_scan = 0) might be unnecessary:
SELECT
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;
Table Bloat Detection Query
For a more comprehensive bloat analysis, you can use this query that calculates bloat based on page counts:
WITH constants AS (
    SELECT current_setting('block_size')::numeric AS bs, 24 AS page_header
), est AS (
    SELECT
        c.oid,
        c.relname,
        c.relpages AS actual_pages,
        s.n_live_tup,
        -- rough bytes per row: column widths from pg_stats plus ~28 bytes of tuple header
        (SELECT COALESCE(SUM(avg_width), 0) + 28
         FROM pg_stats ps
         WHERE ps.schemaname = 'public' AND ps.tablename = c.relname) AS est_row_bytes
    FROM pg_class c
    JOIN pg_stat_user_tables s ON s.relid = c.oid
    WHERE c.relkind = 'r' AND s.schemaname = 'public'
)
SELECT
    relname,
    actual_pages,
    expected_pages,
    ROUND(100 * (actual_pages - expected_pages) / NULLIF(actual_pages, 0)::numeric, 2) AS bloat_pct
FROM (
    SELECT
        relname,
        actual_pages,
        CEIL(n_live_tup * est_row_bytes / (SELECT bs - page_header FROM constants)) AS expected_pages
    FROM est
) sub
ORDER BY bloat_pct DESC NULLS LAST;
This is only an estimate: it trusts the average column widths in pg_stats and ignores fill factor, alignment padding, and TOAST. Treat large bloat_pct values as a prompt to confirm with pgstattuple, not as exact numbers.
Putting It Together: A Maintenance Routine
A reasonable PostgreSQL maintenance routine looks like this:
- Daily: Check pg_stat_user_tables for high dead tuple counts. Manually VACUUM tables that exceed thresholds.
- Weekly: Run VACUUM ANALYZE on all tables. Consider REINDEX on indexes with high bloat.
- Monthly: Review autovacuum configuration. Adjust thresholds for tables with changing patterns.
- After major data changes: Always run ANALYZE. Consider VACUUM if deleting significant data.
- Monitoring: Graph key metrics over time—dead tuple counts, table size trends, autovacuum frequency.
The specifics depend on your workload. A database with heavy UPDATE/DELETE patterns needs more frequent maintenance than a mostly-append database. Watch your data and adjust accordingly.
Production Failure Scenarios
| Failure | Cause | Mitigation |
|---|---|---|
| Autovacuum thrashing | Tables with very high update rates generate dead tuples faster than autovacuum can process | Lower per-table autovacuum thresholds, run manual VACUUM on schedule, or partition high-churn tables |
| Index bloat emergency | Bulk deletes left indexes fragmented, query performance degrades | Run REINDEX CONCURRENTLY on affected indexes, monitor with pg_stat_user_indexes |
| VACUUM FULL locks table for hours | Large table with extreme bloat requires exclusive access | Use CLUSTER instead (faster for I/O-ordered results), or do incremental cleanup via small batch DELETEs |
| Transaction ID wraparound | Long-running transactions prevent VACUUM from freezing old tuples and advancing the frozen XID horizon | Monitor pg_database.datfrozenxid, never hold open transactions for days, set idle_in_transaction_session_timeout |
| Table size explosion without dead tuples | Rapid data growth clutters buffer cache and slows scans | Partition tables by time range, use partition pruning to limit scans |
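For the wraparound row in particular, age(datfrozenxid) tells you how far each database has drifted; autovacuum forces an anti-wraparound freeze once it reaches autovacuum_freeze_max_age (default 200 million):

```sql
-- Distance to transaction ID wraparound per database
SELECT datname,
       age(datfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
FROM pg_database
ORDER BY xid_age DESC;
```

If xid_age approaches freeze_max_age despite autovacuum running, look for long-open transactions or abandoned replication slots pinning the horizon.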
Trade-Off Table: VACUUM FULL vs CLUSTER vs REINDEX
| Dimension | VACUUM FULL | CLUSTER | REINDEX CONCURRENTLY |
|---|---|---|---|
| Lock behavior | Exclusive — blocks all reads and writes | Exclusive — blocks all reads and writes | ShareUpdateExclusiveLock for most of the rebuild — reads and writes continue, with brief locks at swap time |
| Disk space needed | Extra space for the rewritten table | Extra space for new table copy | Extra space for new index |
| Physical order | Compacts but no ordering guarantee | Reorders rows by index for I/O efficiency | Preserves existing order |
| Speed | Slowest for large tables | Faster than VACUUM FULL for sequential workloads | Roughly 2-3x slower than a standard REINDEX |
| Use when | Need OS space back urgently | Table is accessed sequentially on a key after bulk load | Production index rebuild with zero downtime |
Capacity Estimation: Bloat Threshold and VACUUM Cost
Bloat percentage formula:
bloat_percentage = (dead_tuple_pages / (dead_tuple_pages + live_tuple_pages)) × 100
The pg_stat_user_tables view gives n_dead_tup and n_live_tup. Divide by the average tuples-per-page (roughly 300 for typical row widths) to estimate pages. For a table with 1M dead tuples and 9M live tuples at 300 tuples/page: 1M/300 ≈ 3,333 dead pages, 9M/300 ≈ 30,000 live pages. Bloat = 3,333 / 33,333 = 10%.
Autovacuum trigger and throttling formulas:
vacuum_trigger = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples
effective_work_rate ∝ vacuum_cost_limit / autovacuum_vacuum_cost_delay
Autovacuum accumulates cost points as it reads and dirties pages; each time the running total reaches the cost limit (vacuum_cost_limit, default 200), the worker sleeps for autovacuum_vacuum_cost_delay (20ms by default before PostgreSQL 12, 2ms since). Setting the scale factor too high (e.g., autovacuum_vacuum_scale_factor = 0.5, meaning half the table must be dead) makes autovacuum trigger too rarely, so bloat accumulates between runs. For high-churn tables, use lower scale factors (0.05-0.1) and lower cost delays (2-5ms) to keep vacuum running continuously at low intensity.
Index bloat estimation:
index_bloat_ratio = index_page_count / expected_page_count_for_live_tuples
For a B-tree index with high turnover, bloated pages contain many dead index tuples. The pgstattuple extension provides pgstatindex() to report fragmentation. An index bloat ratio above 1.5 (50% more pages than necessary) warrants REINDEX.
REINDEX lock time estimation:
reindex_time_per_gb = index_size_gb × reindex_speed_factor_seconds_per_gb
For typical B-tree indexes on spinning disk: ~30-60 seconds per GB. On NVMe: ~5-10 seconds per GB. For a 50GB index: expect 4-8 minutes on NVMe. Use CONCURRENTLY to avoid locks—REINDEX CONCURRENTLY takes 2-3x longer but maintains availability.
Real-World Case Study: E-Commerce Site Bloat Incident
A major e-commerce platform experienced a severe bloat incident during a flash sale event that provides a cautionary tale about autovacuum tuning in production.
What happened: During a 24-hour sale event, the platform processed 2M orders. Each order generated 3-5 rows in various tables, and completed orders were soft-deleted (marked as cancelled/completed with a status flag). By the end of the event, 500GB of dead tuple space had accumulated across 12 tables. Their autovacuum workers, running at default settings (autovacuum_vacuum_scale_factor = 0.2, meaning vacuum only triggers at 20% dead tuples), could not keep up with the deletion rate.
The impact: Query performance degraded by 80% on order lookup tables. Page read times went from ~1ms to ~50ms as PostgreSQL scanned through dead tuple chains. The checkout flow started timing out. By the time the on-call DBA investigated, the autovacuum workers were running but could not catch up—a 50GB table with 20% bloat threshold requires ~10GB of dead tuple accumulation before triggering.
The fix: The team temporarily lowered autovacuum_vacuum_scale_factor to 0.05 (5%) and autovacuum_vacuum_cost_delay to 2ms on the affected tables using ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05, autovacuum_vacuum_cost_delay = 2). This caused autovacuum to trigger more frequently and run at lower intensity. Within 2 hours, bloat dropped from 50GB to 8GB. A VACUUM ANALYZE on the worst offending tables provided immediate relief.
The lesson: Default autovacuum settings are calibrated for moderate workloads, not high-churn production tables. After any event that generates bulk deletions, immediately run VACUUM on affected tables rather than waiting for autovacuum. For tables with high DELETE rates, set autovacuum_vacuum_scale_factor to 0.05-0.1 and cost delay to 1-2ms. Monitor n_dead_tup per table and alert when dead tuple counts exceed 1M.
Interview Questions
Q1: What is the difference between VACUUM and VACUUM FULL, and when would you use each?
A: VACUUM (standard) reclaims dead tuples in place, marking pages as free space without changing physical file size. It runs concurrently with production traffic, acquires only ShareUpdateExclusiveLock (which does not block reads or writes), and is the default maintenance operation. VACUUM FULL rewrites the entire table to a new file, physically compacting it and returning space to the OS. It requires AccessExclusiveLock (blocks all reads and writes) and cannot run concurrently. Use VACUUM for regular maintenance (daily or weekly, depending on churn). Use VACUUM FULL only when you need to shrink a table’s disk footprint urgently AND can afford a maintenance window, or use pg_repack as a non-blocking alternative that achieves the same result with zero downtime.
Q2: A table has 80GB of disk space but only 10GB of live data. How do you diagnose and fix this?
A: First, diagnose with SELECT relname, n_dead_tup, n_live_tup, last_vacuum FROM pg_stat_user_tables WHERE relname = 'your_table'. High dead tuple count with no recent vacuum confirms bloat is from uncleared dead tuples. For a 10GB live data table with 70GB of dead space, a standard VACUUM will reclaim the free space within the table file but not return it to the OS. To actually shrink the file, use VACUUM FULL (blocks writes) or pg_repack -k -t your_table (zero-downtime compaction). Also check for bloated indexes: use pgstatindex() on each index. Often, bloated indexes contribute as much or more to storage bloat as the table itself—REINDEX each bloated index.
Q3: How does autovacuum tuning differ for a write-heavy OLTP table versus an analytical table with bulk inserts?
A: For write-heavy OLTP tables with frequent UPDATEs and DELETEs, autovacuum must run frequently with low cost to keep up with dead tuple generation. Configure: autovacuum_vacuum_scale_factor = 0.05, autovacuum_vacuum_cost_delay = 2ms, autovacuum_vacuum_cost_limit increased to allow more work per cycle (e.g., 2000). Monitor n_dead_tuples and set an alert when it exceeds thresholds specific to that table. For analytical tables with bulk loads (no updates/deletes between loads), autovacuum rarely needs to run—the dead tuple generation is bursty (during loads) rather than continuous. Configure autovacuum to be less aggressive: autovacuum_vacuum_scale_factor = 0.2 and let it trigger after large batch loads complete. Run VACUUM ANALYZE immediately after each bulk load to update statistics.
Q4: What causes index bloat, and how do you decide when to REINDEX?
A: Index bloat accumulates when many rows in an indexed table are updated or deleted. VACUUM removes the dead index entries, but B-tree pages that end up sparsely filled are never merged, so the index only shrinks when REINDEX rebuilds it. Symptoms: an index grows significantly larger than expected for its row count; query performance degrades despite table-level vacuum; pgstatindex() shows avg_leaf_density below 50%. REINDEX is needed when the bloat ratio exceeds 1.5 (50% more pages than necessary). Use pgstatindex() output to compare index_size to tree_height and leaf_pages — a tree that is too tall for its row count indicates fragmentation. Schedule REINDEX during low-write windows or use REINDEX CONCURRENTLY to avoid blocking production traffic.
Conclusion
MVCC is what makes PostgreSQL handle concurrency so well, but it requires maintenance. Vacuuming reclaims space from dead tuples. Reindexing compacts bloated indexes. Autovacuum handles the routine work, but you need to tune it for busy tables and watch for situations that overwhelm the defaults.
Build monitoring into your routine. Track dead tuple counts, vacuum timing, and table sizes. Bloat caught early is far easier to fix than bloat that has been compounding for months.
PostgreSQL’s visibility into its own internals is excellent. The tools exist—you just need to use them.
Related Posts
- Database Monitoring - Tracking key metrics for PostgreSQL health
- Database Migration Strategies - Day-to-day operational practices
- Relational Databases - Core concepts and architecture