Vacuuming and Reindexing in PostgreSQL
PostgreSQL's MVCC requires regular maintenance. This guide explains dead tuples, VACUUM vs VACUUM FULL, autovacuum tuning, REINDEX strategies, and how to monitor bloat with pg_stat_user_tables.
Vacuuming and Reindexing in PostgreSQL
PostgreSQL uses Multi-Version Concurrency Control (MVCC) to allow concurrent reads and writes without locking. This design gives PostgreSQL its excellent concurrency characteristics. But MVCC is not free—every UPDATE and DELETE leaves behind dead tuples that need to be cleaned up. Ignore this cleanup long enough, and your tables bloat, your queries slow down, and you wonder why your 100GB database is eating 500GB of disk.
This guide covers the maintenance tasks that keep PostgreSQL running smoothly: vacuuming and reindexing.
Introduction
PostgreSQL’s MVCC design allows concurrent reads and writes without locks, but it accumulates dead tuples with every UPDATE and DELETE. If you never clean these up, tables bloat, disk fills up, and query performance degrades. VACUUM reclaims dead tuple space; REINDEX rebuilds corrupted or bloated indexes. These maintenance operations are not optional — they are part of normal operation PostgreSQL.
This guide covers how PostgreSQL’s MVCC produces dead tuples, how VACUUM and VACUUM FULL differ, autovacuum tuning for high-write tables, REINDEX strategies for index bloat and corruption, and how to monitor bloat using pg_stat_user_tables and pgstattuple so you know when maintenance is needed before it becomes a problem.
Understanding MVCC and Dead Tuples
When you UPDATE a row in PostgreSQL, it does not modify the existing row. Instead, it marks the old row as dead and inserts a new row. Similarly, DELETE marks rows as dead rather than removing them. This is how MVCC works—transactions see the version of the data that existed when they started.
The problem is that dead tuples accumulate. They take up space, and they slow down scans because the query planner has to skip over them. Eventually, if you never clean them up, you run out of disk space or your tables become so bloated that performance grinds to a halt.
VACUUM is PostgreSQL’s garbage collector. It marks the space used by dead tuples as free for future use. It does not return the space to the operating system—it just marks the pages as reusable. For that, you need VACUUM FULL.
VACUUM vs VACUUM FULL
Let me distinguish between the two clearly.
VACUUM (the standard command) marks dead tuples as free space within the table. It does not shrink the table’s physical size on disk. The freed space is immediately available for new rows within the same table. This is the command you should run most of the time.
VACUUM FULL rewrites the entire table, compacting it and returning unused space to the operating system. It requires an exclusive lock on the table—writes and reads are blocked for the duration. On a large table, this can take hours. Avoid this in production unless you have a maintenance window and genuinely need the space back.
-- Standard vacuum (run regularly)
VACUUM users;
-- Full vacuum (rarely needed, blocks everything)
VACUUM FULL users;
Let autovacuum handle routine vacuuming. Only use VACUUM FULL when you need to reclaim disk space after deleting a large amount of data—and even then, do it during a maintenance window.
Autovacuum Tuning
PostgreSQL’s autovacuum daemon automates vacuuming, but the defaults are conservative. For a busy database, you will likely need to tune these settings.
The key parameters are:
autovacuum_vacuum_threshold(default 50): Minimum number of dead tuples before VACUUM is triggeredautovacuum_vacuum_scale_factor(default 0.2): Fraction of table size to add to thresholdautovacuum_analyze_threshold(default 50): Minimum number of tuple changes before ANALYZEautovacuum_analyze_scale_factor(default 0.1): Fraction of table size for ANALYZE thresholdautovacuum_vacuum_cost_delay(default 20ms): Sleep time between vacuum pages
For a table with 10 million rows, the scale factor approach means autovacuum waits until there are 2 million dead tuples. That is often too aggressive for busy tables.
You can configure autovacuum per-table:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 1000
);
This reduces the scale factor to 1% and sets a minimum threshold of 1000 dead tuples. For large tables, this triggers vacuuming much sooner.
Watch out for tables with very high update rates. If you are updating millions of rows per hour, the default settings will let bloat accumulate faster than autovacuum can handle. You might need to manually VACUUM those tables on a schedule.
REINDEX Strategies
Indexes in PostgreSQL can also become bloated over time. B-tree indexes, the default index type, can develop page-level fragmentation. REINDEX rebuilds the index, compacting it and improving performance.
Blocking vs Concurrent REINDEX
The standard REINDEX command blocks writes and reads while rebuilding:
REINDEX INDEX idx_users_email;
For production systems with large indexes, use CONCURRENTLY:
REINDEX INDEX CONCURRENTLY idx_users_email;
REINDEX CONCURRENTLY builds a new index alongside the existing one, then swaps them. Writes are never blocked. The main caveat is that REINDEX CONCURRENTLY cannot run inside a transaction, and it requires extra disk space for the temporary new index.
When to REINDEX
Index bloat is not always the problem people think it is. B-tree indexes self-balance, and some fragmentation is normal. REINDEX is most valuable when:
- After a bulk DELETE on a heavily indexed table
- When an index has become significantly bloated (verify with pgstatcache or pgstattuple)
- Following a PostgreSQL major version upgrade (some versions have index format changes)
- When index statistics have become stale despite ANALYZE
For routine maintenance, weekly or monthly REINDEX on problematic indexes is reasonable. Do not reindex everything blindly—focus on indexes that show bloat in your monitoring.
When to Use / When Not to Use
Use VACUUM (standard) for routine maintenance — it reclaims dead tuple space within the table without blocking reads or writes. Run it after large batch DELETE or UPDATE operations, or when n_dead_tup climbs above meaningful levels but disk space is not yet critical.
Use VACUUM FULL only when you genuinely need to return disk space to the operating system and can afford a maintenance window with exclusive table access. Reserve this for tables with severe bloat above 50% that cannot wait for normal cleanup.
Use CLUSTER when you want to physically reorder table data for I/O efficiency on a specific index — particularly useful after bulk loads where the table is accessed sequentially on an indexed column.
Use REINDEX CONCURRENTLY in production systems where index rebuilds must not block writes, or after bulk deletes that significantly fragmented specific indexes.
Never run VACUUM FULL on a large table without a maintenance window — it holds an exclusive lock for the entire duration. Never use standard REINDEX on busy production indexes — use CONCURRENTLY. Never blindly reindex everything — target bloated ones only based on monitoring data.
flowchart TB
subgraph Write["UPDATE/DELETE Transaction"]
W1[Mark old row as DEAD] --> W2[Insert new row version]
end
subgraph Vacuum["VACUUM Process"]
V1[Scan table pages] --> V2[Find dead tuples]
V2 --> V3[Mark page space as FREE]
V3 --> V4[Update FSM<br/>Free Space Map]
end
subgraph Reindex["REINDEX Process"]
R1[Build new index<br/>alongside old] --> R2[Swap when complete]
R2 --> R3[Drop old index]
end
Write --> Vacuum
Write --> Reindex
When to Manually VACUUM/REINDEX
Autovacuum handles most situations. Manual intervention is warranted in specific scenarios.
After Large Batch Operations
If you have just loaded 10 million rows via COPY or a bulk UPDATE, autovacuum might not catch up immediately. Running VACUUM ANALYZE afterward ensures statistics are updated and dead tuples are cleaned up promptly.
COPY users FROM '/tmp/users.csv';
VACUUM ANALYZE users;
After Major Deletions
When you delete millions of rows (say, purging old orders), autovacuum will eventually clean up, but you might want to run VACUUM immediately if disk space is a concern. If you need the space back immediately, VACUUM FULL (during maintenance) or CLUSTER (which reorders physical storage) might be necessary.
Before Major Queries
If you are about to run a long report that will scan millions of rows, running ANALYZE first ensures the query planner has accurate statistics. This is especially relevant after large data changes.
ANALYZE users;
EXPLAIN ANALYZE SELECT * FROM users WHERE created_at > '2025-01-01';
Monitoring Bloat with pg_stat_user_tables
PostgreSQL provides built-in statistics for monitoring table health. The pg_stat_user_tables view is your first stop for diagnosing bloat issues.
SELECT
relname,
n_live_tup,
n_dead_tup,
last_vacuum,
last_autovacuum,
vacuum_count,
autovacuum_count,
n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0) AS dead_tuple_pct
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_dead_tup DESC;
This query shows dead tuple counts and percentages for all tables. A table with 100 million live tuples and 10 million dead tuples has 9% bloat. Tables approaching 20-30% bloat benefit from manual vacuuming.
For more detailed bloat analysis, the pgstattuple extension provides actual bloat estimates:
SELECT * FROM pgstattuple('users');
This returns estimated free space, dead tuples, and bloat percentage. You need to install the extension first:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
Another useful view is pg_stat_user_indexes, which shows index usage statistics. Indexes that are never scanned (high idx_scan of 0) might be unnecessary:
SELECT
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;
Table Bloat Detection Query
For a more comprehensive bloat analysis, you can use this query that calculates bloat based on page counts:
WITH constants AS (
SELECT current_setting('block_size')::numeric AS bs, 24 AS page_header
), calc AS (
SELECT
ma,
attrelid,
relname,
blksizepg,
(datawidth + (blksizepg - page_header) * (1 - fillfactor::real / 100)) AS expected,
(headerlen + (blksizepg - page_header) * (1 - fillfactor::real / 100)) AS expected_header
FROM (
SELECT
0 AS ma,
c.oid AS attrelid,
c.relname,
c.reloptions,
s.starelid,
8 AS headerlen,
s.stawidth AS datawidth,
s.stanullfrac,
COALESCE(c.reloptions[1], 'block_size=8192') AS blksizepg,
COALESCE(a.attoptions[1], 'fillfactor=90') AS fillfactor
FROM pg_attribute a
JOIN pg_statistic s ON attrelid = starelid AND attnum = staattnum
JOIN pg_class c ON oid = attrelid
WHERE staattnum > 0
) AS sub
)
SELECT
relname,
ROUND(100 * (actual - expected) / actual, 2) AS bloat_pct,
actual,
expected
FROM (
SELECT
calc.relname,
c.relpages AS actual,
(c.relpages::numeric * (SELECT blksizepg FROM calc LIMIT 1) / (SELECT expected FROM calc LIMIT 1)) AS expected
FROM calc
JOIN pg_class c ON c.oid = calc.attrelid
) sub
ORDER BY bloat_pct DESC;
Putting It Together: A Maintenance Routine
A reasonable PostgreSQL maintenance routine looks like this:
-
Daily: Check
pg_stat_user_tablesfor high dead tuple counts. Manually VACUUM tables that exceed thresholds. -
Weekly: Run VACUUM ANALYZE on all tables. Consider REINDEX on indexes with high bloat.
-
Monthly: Review autovacuum configuration. Adjust thresholds for tables with changing patterns.
-
After major data changes: Always run ANALYZE. Consider VACUUM if deleting significant data.
-
Monitoring: Graph key metrics over time—dead tuple counts, table size trends, autovacuum frequency.
The specifics depend on your workload. A database with heavy UPDATE/DELETE patterns needs more frequent maintenance than a mostly-append database. Watch your data and adjust accordingly.
Production Failure Scenarios
| Failure | Cause | Mitigation |
|---|---|---|
| Autovacuum thrashing | Tables with very high update rates generate dead tuples faster than autovacuum can process | Lower per-table autovacuum thresholds, run manual VACUUM on schedule, or partition high-churn tables |
| Index bloat emergency | Bulk deletes left indexes fragmented, query performance degrades | Run REINDEX CONCURRENTLY on affected indexes, monitor with pg_stat_user_indexes |
| VACUUM FULL locks table for hours | Large table with extreme bloat requires exclusive access | Use CLUSTER instead (faster for I/O-ordered results), or do incremental cleanup via small batch DELETEs |
| Transaction ID wraparound | Long-running transactions prevent VACUUM from reclaiming XIDs | Monitor pg_database.datfrozenxid, never hold open transactions for days, set idle_in_transaction_session_timeout |
| Table size explosion without dead tuples | Rapid data growth clutters buffer cache and slows scans | Partition tables by time range, use partition pruning to limit scans |
Trade-Off Table: VACUUM FULL vs CLUSTER vs REINDEX
| Dimension | VACUUM FULL | CLUSTER | REINDEX CONCURRENTLY |
|---|---|---|---|
| Lock behavior | Exclusive — blocks all reads and writes | Exclusive — blocks all reads and writes | Share lock — allows writes |
| Disk space needed | Extra space for the rewritten table | Extra space for new table copy | Extra space for new index |
| Physical order | Compacts but no ordering guarantee | Reorders rows by index for I/O efficiency | Preserves existing order |
| Speed | Slowest for large tables | Faster than VACUUM FULL for sequential workloads | Comparable to standard REINDEX |
| Use when | Need OS space back urgently | Table is accessed sequentially on a key after bulk load | Production index rebuild with zero downtime |
Capacity Estimation: Bloat Threshold and VACUUM Cost
Bloat percentage formula:
bloat_percentage = (dead_tuple_pages / (dead_tuple_pages + live_tuple_pages)) × 100
The pg_stat_user_tables view gives n_dead_tuples and n_live_tuples. Divide by the average tuples-per-page (typically ~300 for typical row widths) to estimate pages. For a table with 1M dead tuples and 9M live tuples at 300 tuples/page: 1M/300 ≈ 3,333 dead pages, 9M/300 ≈ 30,000 live pages. Bloat = 3,333 / 33,333 = 10%.
Autovacuum cost scaling formula:
vacuum_cost_delay_ms = vacuum_cost_delay × (scale_factor / 1000)
effective_vacuum_rate = vacuum_cost_page_hit / vacuum_cost_delay
Scale factor determines how aggressively autovacuum reclaims dead tuples. At autovacuum_vacuum_cost_delay = 20ms (default), each vacuum operation sleeps 20ms per cost unit. Raising scale factor too high (e.g., autovacuum_vacuum_scale_factor = 0.5 meaning 50% of table) causes autovacuum to be too aggressive and compete with production queries. For high-churn tables, use lower scale factors (0.05-0.1) and lower cost delays (2-5ms) to keep vacuum running continuously at low intensity.
Index bloat estimation:
index_bloat_ratio = index_page_count / expected_page_count_for_live_tuples
For a B-tree index with high turnover, bloated pages contain many dead index tuples. The pgstattuple extension provides pgstatindex() to report fragmentation. An index bloat ratio above 1.5 (50% more pages than necessary) warrants REINDEX.
REINDEX lock time estimation:
reindex_time_per_gb = index_size_gb × reindex_speed_factor_seconds_per_gb
For typical B-tree indexes on spinning disk: ~30-60 seconds per GB. On NVMe: ~5-10 seconds per GB. For a 50GB index: expect 4-8 minutes on NVMe. Use CONCURRENTLY to avoid locks—REINDEX CONCURRENTLY takes 2-3x longer but maintains availability.
Real-World Case Study: E-Commerce Site Bloat Incident
A major e-commerce platform experienced a severe bloat incident during a flash sale event that provides a cautionary tale about autovacuum tuning in production.
What happened: During a 24-hour sale event, the platform processed 2M orders. Each order generated 3-5 rows in various tables, and completed orders were soft-deleted (marked as cancelled/completed with a status flag). By the end of the event, 500GB of dead tuple space had accumulated across 12 tables. Their autovacuum workers, running at default settings (autovacuum_vacuum_scale_factor = 0.2, meaning vacuum only triggers at 20% dead tuples), could not keep up with the deletion rate.
The impact: Query performance degraded by 80% on order lookup tables. Page read times went from ~1ms to ~50ms as PostgreSQL scanned through dead tuple chains. The checkout flow started timing out. By the time the on-call DBA investigated, the autovacuum workers were running but could not catch up—a 50GB table with 20% bloat threshold requires ~10GB of dead tuple accumulation before triggering.
The fix: The team temporarily lowered autovacuum_vacuum_scale_factor to 0.05 (5%) and autovacuum_vacuum_cost_delay to 2ms on the affected tables using ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05, autovacuum_vacuum_cost_delay = 2). This caused autovacuum to trigger more frequently and run at lower intensity. Within 2 hours, bloat dropped from 50GB to 8GB. A VACUUM ANALYZE on the worst offending tables provided immediate relief.
The lesson: Default autovacuum settings are calibrated for development workloads, not high-churn production tables. After any event that generates bulk deletions, immediately run VACUUM on affected tables rather than waiting for autovacuum. For tables with high DELETE rates, set autovacuum_vacuum_scale_factor to 0.05-0.1 and cost delay to 1-2ms. Monitor n_dead_tuples per table and alert when dead tuple counts exceed 1M.
Quick Recap Checklist
- VACUUM reclaims dead tuple space within the table; VACUUM FULL rewrites the entire table and returns space to OS
- Autovacuum handles routine vacuuming but defaults are conservative for high-churn tables
- Per-table autovacuum tuning: lower
autovacuum_vacuum_scale_factor(0.05-0.1) andautovacuum_vacuum_cost_delay(2ms) for busy tables - REINDEX CONCURRENTLY rebuilds indexes without blocking writes in production
- Standard REINDEX blocks reads and writes — never use on busy production indexes
- Monitor
n_dead_tupfrompg_stat_user_tables— alert when dead tuple counts exceed thresholds -
pgstattupleextension provides actual bloat estimates for detailed analysis - Index bloat ratio above 1.5 warrants REINDEX; use
pgstatindex()to verify - Run VACUUM ANALYZE after large batch DELETE or COPY operations
- Transaction ID wraparound is a real danger — never hold open transactions for days
- Never run VACUUM FULL without a maintenance window — it holds an exclusive lock
- Table bloat above 20-30% benefits from manual VACUUM; above 50% consider VACUUM FULL during maintenance
Related Posts
- Database Monitoring - Tracking key metrics for PostgreSQL health
- Database Migration Strategies - Day-to-day operational practices
- Relational Databases - Core concepts and architecture
Interview Questions
VACUUM (standard) reclaims dead tuples in place, marking pages as free space without changing physical file size. It runs concurrently with production traffic, acquires only ShareUpdateExclusiveLock (which does not block reads or writes), and is the default maintenance operation. VACUUM FULL rewrites the entire table to a new file, physically compacting it and returning space to the OS. It requires AccessExclusiveLock (blocks all reads and writes) and cannot run concurrently. Use VACUUM for regular maintenance (daily or weekly, depending on churn). Use VACUUM FULL only when you need to shrink a table's disk footprint urgently AND can afford a maintenance window, or use pg_repack as a non-blocking alternative that achieves the same result with zero downtime.
First, diagnose with SELECT relname, n_dead_tuples, n_live_tuples, last_vacuum FROM pg_stat_user_tables WHERE relname = 'your_table'. High dead tuple count with no recent vacuum confirms bloat is from uncleared dead tuples. For a 10GB live data table with 70GB of dead space, a standard VACUUM will reclaim the free space within the table file but not return it to the OS. To actually shrink the file, use VACUUM FULL (blocks writes) or pg_repack -k -t your_table (zero-downtime compaction). Also check for bloated indexes: use pgstatindex() on each index. Often, bloated indexes contribute as much or more to storage bloat as the table itself—REINDEX each bloated index.
For write-heavy OLTP tables with frequent UPDATEs and DELETEs, autovacuum must run frequently with low cost to keep up with dead tuple generation. Configure: autovacuum_vacuum_scale_factor = 0.05, autovacuum_vacuum_cost_delay = 2ms, autovacuum_vacuum_cost_limit increased to allow more work per cycle (e.g., 2000). Monitor n_dead_tuples and set an alert when it exceeds thresholds specific to that table. For analytical tables with bulk loads (no updates/deletes between loads), autovacuum rarely needs to run—the dead tuple generation is bursty (during loads) rather than continuous. Configure autovacuum to be less aggressive: autovacuum_vacuum_scale_factor = 0.2 and let it trigger after large batch loads complete. Run VACUUM ANALYZE immediately after each bulk load to update statistics.
Index bloat accumulates when many rows are updated or deleted in an index—PostgreSQL marks index tuples as dead but never removes them until REINDEX runs. Unlike table bloat which VACUUM handles continuously, index bloat only improves with REINDEX. Symptoms: an index grows significantly larger than expected for its row count; query performance degrades despite table-level vacuum; pgstatindex() shows avg_leaf_density below 50%. REINDEX is needed when bloat ratio exceeds 1.5 (50% more pages than necessary). Use pgstatindex() output to compare index_size to tree_height and leaf_pages — a tree that is too tall for its row count indicates fragmentation. Schedule REINDEX during low-write windows or use REINDEX CONCURRENTLY to avoid blocking production traffic.
PostgreSQL uses a 32-bit XID counter (about 4 billion) that wraps around from 2^32-1 to 0. When wraparound approaches, PostgreSQL refuses to allocate new XIDs until VACUUM reclaims old XIDs — this is a safety feature to prevent data corruption. If pg_database.datfrozenxid is within 10 million transactions of the wrap point, PostgreSQL starts logging warnings. If it reaches the limit, PostgreSQL shuts down to prevent corruption — this is an emergency that requires immediate manual VACUUM. Prevention: never hold open transactions for days (set idle_in_transaction_session_timeout), monitor pg_database.datfrozenxid and pg_class.relfrozenxid, ensure autovacuum runs on all tables regularly. Alert threshold: warning at 10M XIDs remaining, critical at 1M remaining.
Bulk COPY bypasses autovacuum, so statistics become stale and dead tuples may accumulate. When COPY runs, PostgreSQL marks new tuples but does not trigger autovacuum until the threshold is reached — with 10M rows, this could be significant. The query planner uses stale statistics (table appears smaller than it is, indexes may be underutilized). Fix: immediately after COPY, run VACUUM ANALYZE table_name to update statistics and reclaim any dead tuples. If indexes were also copied, run REINDEX TABLE table_name to ensure index health. For future bulk loads, consider wrapping the load in a transaction with ANALYZE immediately after, or temporarily lowering the autovacuum threshold on the target table.
CLUSTER and VACUUM FULL. When would you choose one over the other?Both rewrite the table to reclaim space, but CLUSTER reorders rows based on an index, while VACUUM FULL only compacts without reordering. CLUSTER physically orders the table to match an index, which improves range scans and sequential I/O for queries that use that index. VACUUM FULL compacts but preserves insertion order, which may be random. CLUSTER is faster for tables accessed sequentially on a particular key after bulk loads. Both require AccessExclusiveLock and block reads/writes. CLUSTER uses the existing index ordering; to reorder on a different index, you must drop the old index first or use CLUSTER on a different index. For I/O-ordered workloads (datawarehouse, analytics), CLUSTER on the primary access pattern index is the better choice.
n_dead_tup but autovacuum is not cleaning it up. What are the possible causes?Possible causes for autovacuum failure: (1) Autovacuum is disabled for the table: ALTER TABLE orders SET (autovacuum_enabled = off); (2) Autovacuum threshold not reached: with default scale factor of 20%, a table needs 20% dead tuples before triggering — for large tables this is millions of tuples; (3) Autovacuum worker is blocked by long-running transactions: check pg_stat_activity for state = 'active' queries holding locks; (4) Table has large fillfactor or unusual storage parameters affecting vacuum behavior; (5) Autovacuum is overwhelmed — too many tables with high churn simultaneously; (6) PostgreSQL version with autovacuum bugs (older 9.x versions had various issues). Diagnosis: check last_autovacuum and last_vacuum in pg_stat_user_tables, and check autovacuum worker logs for errors.
The visibility map tracks which pages contain only tuples visible to all current transactions (fully frozen). VACUUM uses it to skip pages that don't need cleaning — if a page is marked visible, VACUUM ignores it entirely. This makes VACUUM much faster on tables with mostly-old data because it only scans pages with dead or recently-updated tuples. The visibility map is updated during VACUUM. A table with low fragmentation will have most pages marked visible, making subsequent VACUUMs very fast. Conversely, a table with high churn will have many pages that VACUUM must scan. Monitor visibility map effectiveness: SELECT relname, pg_size_pretty(sum(vis_pages * 8192)) AS visible_size FROM pg_stat_user_tables GROUP BY relname. Tables with poor visibility map coverage (most pages need vacuum) are likely high-churn tables that need lower autovacuum thresholds.
pg_repack achieve zero-downtime table compaction when VACUUM FULL blocks the table?pg_repack achieves zero-downtime compaction by creating a shadow table, rebuilding data into it via sorted bulk inserts, then swapping the shadow table with the original. The process: (1) Create a new table (the repacked version) with the same schema; (2) Populate it by inserting sorted data from the original table — sorted order matches the clustered index if specified; (3) Create triggers on the original table to keep the new table in sync during the rebuild; (4) Apply any changes that occurred during the rebuild; (5) Drop the original table and rename the new one; (6) Apply indexes, constraints, and triggers. Because the trigger maintains data integrity during the rebuild, the table remains available for writes throughout. pg_repack requires extra disk space (roughly the size of the table being repacked) and is generally faster than VACUUM FULL for large tables.
REINDEX shows no improvement. What else could be causing the problem?Index performance issues not fixed by REINDEX: (1) Stale statistics — REINDEX updates the index but not table statistics; run ANALYZE after REINDEX; (2) Wrong index chosen by planner — use EXPLAIN (ANALYZE, BUFFERS) to see if the index is being used; (3) Index bloat at the heap level — REINDEX cleans the index but not the table heap; (4) Corrupted pages — use pg_checksums to verify page integrity; (5) Sequential scan being chosen instead of index scan — check enable_seqscan and random_page_cost; (6) Index is on an expression that doesn't match the query — an index on LOWER(email) won't be used for WHERE email = 'test'. If REINDEX doesn't help, the problem is likely in statistics, planner misconfiguration, or query formulation rather than index bloat.
fillfactor is the percentage of each page reserved for updates — default is 100% (all space used). With lower fillfactor (e.g., 90%), each page has 10% space free for updates without needing a new page. This reduces table bloat for UPDATE-heavy tables because updated rows can often fit in their original page. However, lower fillfactor means more pages, which can slow sequential scans. The trade-off: for tables with frequent UPDATEs, fillfactor of 80-90% reduces dead tuple accumulation. For tables that are mostly inserted (append-only), fillfactor of 100% is better. When combined with autovacuum tuning, lower fillfactor on high-churn tables allows autovacuum to reclaim space more efficiently because dead tuples are more likely to fit in their original pages.
Standard VACUUM will reclaim 450GB within the table file (mark as reusable) but not return it to the OS. For fastest space recovery: use pg_repack (zero-downtime, non-blocking) — pg_repack -k -t table_name. This rewrites the table in sorted order (if an index is specified) and returns unused space to OS. It does not block reads or writes during the operation. For a blocking approach during maintenance window: VACUUM FULL is fastest for pure compaction but blocks everything. CLUSTER is faster than VACUUM FULL if you want to reorder by an index. After the repack or VACUUM FULL, run REINDEX on all indexes to clean index bloat. Then verify with SELECT pg_size_pretty(pg_total_relation_size('table_name')) to confirm space is reclaimed.
maintenance_work_mem setting affect VACUUM and REINDEX performance?maintenance_work_mem is the memory allocated for maintenance operations (VACUUM, ANALYZE, CREATE INDEX, REINDEX). Higher values allow these operations to work with larger memory allocations, reducing the number of passes needed and improving performance. Default is 64MB, which is too small for large tables. For a 32GB server, setting maintenance_work_mem to 1-2GB significantly speeds up VACUUM and REINDEX. REINDEX benefits because sorting (for btree index rebuild) is done in-memory with maintenance_work_mem. VACUUM benefits less directly but ANALYZE uses it for statistics gathering. Best practice: set maintenance_work_mem to a higher value (e.g., 1GB) and set autovacuum_max_work_mem to the same or higher for dedicated database servers. Note: this setting is per-operation, not shared across operations, so setting it high is safe.
VACUUM and VACUUM ANALYZE? When would you use each?VACUUM reclaims dead tuple space but does not update statistics. VACUUM ANALYZE does both — it vacuums AND runs ANALYZE to update table statistics for the query planner. Use VACUUM alone when: you need to reclaim space urgently (disk pressure), the table's data distribution hasn't changed (no new rows, no significant updates), and you don't want to trigger a planner statistics refresh. Use VACUUM ANALYZE when: data has changed significantly (bulk loads, large batch updates), you want to reclaim space and refresh statistics in one operation, or you want to update statistics for the planner after maintenance. For routine maintenance, VACUUM ANALYZE is preferred because it handles both issues in one pass. Running ANALYZE separately after VACUUM is wasteful — VACUUM ANALYZE does both efficiently.
B-tree indexes maintain sorted order for fast lookups. When pages split due to insertions or updates, fragmentation occurs — pages are no longer physically contiguous in the order the index expects. This causes: (1) More I/O per lookup — the index traversal must jump to non-contiguous pages; (2) Higher tree depth — fragmented indexes are taller (more levels) because page splits leave partially empty pages; (3) Poor sequential I/O — defragmented indexes enable read-ahead optimizations. REINDEX rebuilds the index compactly, reducing tree depth and improving leaf page density. For indexes with high churn, fragmentation can cause a 2-3x slowdown in index lookups even when cache hit ratio is high. Monitoring: pgstatindex() shows avg_leaf_density — below 50% indicates significant fragmentation warranting REINDEX.
A multi-column index on (a, b, c) supports queries filtering on a, a+b, or a+b+c, but not queries filtering only on b or c. The query planner chooses the index based on statistics — if the first column has low cardinality (few unique values), the planner may skip the index and do a sequential scan. VACUUM doesn't directly affect index ordering, but vacuuming the table updates statistics that the planner uses to decide whether to use the index. For multi-column indexes, the order of columns matters: put high-cardinality (many unique values) columns first for equality filters, or put range-scan columns last. Maintenance: REINDEX on multi-column indexes after bulk updates to improve leaf node density. The index structure itself is independent of table VACUUM, but the planner's decision to use it depends on table statistics.
Leaf page density is the ratio of used space to total space in the leaf pages of a B-tree index. Low density (e.g., 30%) means each leaf page is only 30% full — the index has more pages than needed for its data, increasing tree depth and I/O. Causes: heavy UPDATE patterns that cause page splits leaving gaps, bulk insertions followed by many deletions, or PostgreSQL version with poor fillfactor handling. REINDEX improves leaf density by rebuilding the index with optimal page utilization. pgstatindex('idx_name') returns avg_leaf_density and leaf_pages. Target: above 60% for active indexes. Below 40% warrants REINDEX. High leaf density means fewer pages to read, faster lookups, and better cache utilization — a 50% density improvement translates roughly to 50% fewer I/O operations per index scan.
buffersBackend and buffersBackend_fsync in pg_stat_bgwriter. What does this indicate?High buffersBackend means the backend processes (user sessions) are directly writing buffers to disk because the background writer (bgwriter) cannot keep up. This is a sign of write pressure exceeding bgwriter capacity. High buffersBackend_fsync means those backend writes required fsync operations to confirm durability — indicating that write durability is at risk if the bgwriter is overwhelmed. If these values are high: increase bgwriter_delay to let the bgwriter do more work between cycles, increase bgwriter_lru_maxpages to allow the bgwriter to flush more buffers per cycle, or increase bgwriter_lru_multiplier to allow more aggressive flushing. Also consider: reducing effective_io_concurrency if using async I/O, adding more RAM to increase shared_buffers and reduce I/O pressure, or sharding writes across multiple tables or databases to reduce pressure on a single instance.
Maintenance-related causes of replica lag: (1) VACUUM on primary generating WAL — autovacuum on a large table on primary writes many WAL records that replica must apply; (2) Large transactions on primary — a single large transaction generates many WAL records that must be replayed atomically on replica (replica can't catch up until the transaction commits); (3) Frequent autovacuum on primary creating many small WAL records that overwhelm the replica's apply rate; (4) Missing indexes causing sequential scans on replica that slow replay; (5) Disk I/O saturation on replica (replica is applying WAL while serving reads); (6) full_page_writes enabled (every page write generates full page image in WAL, increasing WAL volume). Diagnosis: compare pg_stat_replication lag with pg_stat_bgwriter on primary — high bgwriter activity often correlates with high replica lag. Solutions: tune autovacuum on primary, batch large transactions, add more replicas for read scaling.
Further Reading
- PostgreSQL VACUUM Documentation — Official documentation on VACUUM and VACUUM FULL
- PostgreSQL REINDEX Documentation — REINDEX options including CONCURRENTLY
- Routine Vacuuming Guide — PostgreSQL’s official best practices for maintenance
- pg_stat_statements — Query performance statistics extension
- pgstattuple — Tuple-level statistics for bloat measurement
Conclusion
MVCC is what makes PostgreSQL handle concurrency so well, but it requires maintenance. Vacuuming reclaims space from dead tuples. Reindexing compacts bloated indexes. Autovacuum handles the routine work, but you need to tune it for busy tables and watch for situations that overwhelm the defaults.
Build monitoring into your routine. Track dead tuple counts, vacuum timing, and table sizes. The first sign of bloat problems is easier to fix than advanced bloat disease.
PostgreSQL’s visibility into its own internals is excellent. The tools exist—you just need to use them.
Category
Related Posts
Database Capacity Planning: A Practical Guide
Plan for growth before you hit walls. This guide covers growth forecasting, compute and storage sizing, IOPS requirements, and cloud vs on-prem decisions.
Connection Pooling: HikariCP, pgBouncer, and ProxySQL
Learn connection pool sizing, HikariCP, pgBouncer, and ProxySQL, timeout settings, idle management, and when pooling helps or hurts performance.
Database Monitoring: Metrics, Tools, and Alerting
Keep your PostgreSQL database healthy with comprehensive monitoring. This guide covers query latency, connection usage, disk I/O, cache hit ratios, and alerting with pg_stat_statements and Prometheus.