Vacuuming and Reindexing in PostgreSQL
PostgreSQL's MVCC requires regular maintenance. This guide explains dead tuples, VACUUM vs VACUUM FULL, autovacuum tuning, REINDEX strategies, and how to monitor bloat with pg_stat_user_tables.
PostgreSQL uses Multi-Version Concurrency Control (MVCC) to allow concurrent reads and writes without locking. This design gives PostgreSQL its excellent concurrency characteristics. But MVCC is not free—every UPDATE and DELETE leaves behind dead tuples that need to be cleaned up. Ignore this cleanup long enough, and your tables bloat, your queries slow down, and you wonder why your 100GB database is eating 500GB of disk.
This guide covers the maintenance tasks that keep PostgreSQL running smoothly: vacuuming and reindexing.
Understanding MVCC and Dead Tuples
When you UPDATE a row in PostgreSQL, it does not modify the existing row. Instead, it marks the old row as dead and inserts a new row. Similarly, DELETE marks rows as dead rather than removing them. This is how MVCC works—transactions see the version of the data that existed when they started.
The problem is that dead tuples accumulate. They take up space, and they slow down scans because the query planner has to skip over them. Eventually, if you never clean them up, you run out of disk space or your tables become so bloated that performance grinds to a halt.
VACUUM is PostgreSQL’s garbage collector. It marks the space used by dead tuples as free for future use. It does not return the space to the operating system—it just marks the pages as reusable. For that, you need VACUUM FULL.
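You can watch this happen with a small scratch table. The table and column names here are illustrative, and the statistics collector reports asynchronously, so the dead tuple count may take a moment to appear:

```sql
-- Scratch table to demonstrate dead tuple accumulation
CREATE TABLE scratch (id int, val text);
INSERT INTO scratch SELECT g, 'x' FROM generate_series(1, 1000) g;

UPDATE scratch SET val = 'y';  -- every row now has a dead predecessor version

-- Stats are reported asynchronously; expect n_dead_tup near 1000 shortly after
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'scratch';

VACUUM scratch;  -- marks the dead tuple space as reusable (file size unchanged)
```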
VACUUM vs VACUUM FULL
Let me distinguish between the two clearly.
VACUUM (the standard command) marks dead tuples as free space within the table. It does not shrink the table’s physical size on disk. The freed space is immediately available for new rows within the same table. This is the command you should run most of the time.
VACUUM FULL rewrites the entire table, compacting it and returning unused space to the operating system. It requires an exclusive lock on the table—writes and reads are blocked for the duration. On a large table, this can take hours. Avoid this in production unless you have a maintenance window and genuinely need the space back.
-- Standard vacuum (run regularly)
VACUUM users;
-- Full vacuum (rarely needed, blocks everything)
VACUUM FULL users;
Let autovacuum handle routine vacuuming. Only use VACUUM FULL when you need to reclaim disk space after deleting a large amount of data—and even then, do it during a maintenance window.
Autovacuum Tuning
PostgreSQL’s autovacuum daemon automates vacuuming, but the defaults are conservative. For a busy database, you will likely need to tune these settings.
The key parameters are:
- autovacuum_vacuum_threshold (default 50): Minimum number of dead tuples before VACUUM is triggered
- autovacuum_vacuum_scale_factor (default 0.2): Fraction of table size added to the threshold
- autovacuum_analyze_threshold (default 50): Minimum number of tuple changes before ANALYZE
- autovacuum_analyze_scale_factor (default 0.1): Fraction of table size for the ANALYZE threshold
- autovacuum_vacuum_cost_delay (default 2ms since PostgreSQL 12, 20ms before): Time the worker sleeps each time its vacuum cost budget is spent
For a table with 10 million rows, the scale factor approach means autovacuum waits until there are 2 million dead tuples. That is far too lax for busy tables—bloat piles up long before cleanup starts.
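To see where each table's trigger point sits, you can compute it from pg_class.reltuples. A sketch using the global settings (per-table reloptions overrides are not considered here):

```sql
-- Compare each table's dead tuples with its autovacuum trigger point:
-- threshold + scale_factor * estimated row count
SELECT
    s.relname,
    s.n_dead_tup,
    current_setting('autovacuum_vacuum_threshold')::int
      + current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples
      AS vacuum_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;
```

Tables where n_dead_tup sits just below a large trigger point are the ones that benefit most from a lower per-table scale factor.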
You can configure autovacuum per-table:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 1000
);
This reduces the scale factor to 1% and sets a minimum threshold of 1000 dead tuples. For large tables, this triggers vacuuming much sooner.
Watch out for tables with very high update rates. If you are updating millions of rows per hour, the default settings will let bloat accumulate faster than autovacuum can handle. You might need to manually VACUUM those tables on a schedule.
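A quick way to spot those high-churn tables is to rank them by write activity since the last statistics reset:

```sql
-- Tables generating the most churn; check whether autovacuum keeps pace
SELECT relname, n_tup_upd, n_tup_del, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_tup_upd + n_tup_del DESC
LIMIT 10;
```

A stale last_autovacuum next to a large n_dead_tup is the signature of autovacuum falling behind.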
REINDEX Strategies
Indexes in PostgreSQL can also become bloated over time. B-tree indexes, the default index type, can develop page-level fragmentation. REINDEX rebuilds the index, compacting it and improving performance.
Blocking vs Concurrent REINDEX
The standard REINDEX command blocks writes and reads while rebuilding:
REINDEX INDEX idx_users_email;
For production systems with large indexes, use CONCURRENTLY:
REINDEX INDEX CONCURRENTLY idx_users_email;
REINDEX CONCURRENTLY (available since PostgreSQL 12) builds a new index alongside the existing one, then swaps them. Writes are blocked only briefly while the swap happens. The main caveats: it cannot run inside a transaction, it requires extra disk space for the transient index, and if it fails partway it leaves behind an invalid index that you must drop manually.
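On PostgreSQL 12+, you can watch a concurrent rebuild from another session via the pg_stat_progress_create_index view (which also covers CREATE INDEX):

```sql
-- Monitor a running REINDEX CONCURRENTLY
SELECT pid, phase, blocks_done, blocks_total,
       round(100.0 * blocks_done / NULLIF(blocks_total, 0), 1) AS pct_done
FROM pg_stat_progress_create_index;
```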
When to REINDEX
Index bloat is not always the problem people think it is. B-tree indexes self-balance, and some fragmentation is normal. REINDEX is most valuable when:
- After a bulk DELETE on a heavily indexed table
- When an index has become significantly bloated (verify with pgstattuple's pgstatindex() function)
- Following a PostgreSQL major version upgrade (some versions have index format changes)
- When index corruption is suspected (REINDEX rebuilds the index from the table data)
For routine maintenance, weekly or monthly REINDEX on problematic indexes is reasonable. Do not reindex everything blindly—focus on indexes that show bloat in your monitoring.
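With pgstattuple installed, one way to rank candidates is by leaf density—lower means more bloat, and healthy B-trees usually sit near 90%. A sketch (B-tree indexes only; pgstatindex errors on other index types):

```sql
-- Requires: CREATE EXTENSION pgstattuple;
SELECT i.indexrelname,
       (pgstatindex(i.indexrelid)).avg_leaf_density AS leaf_density
FROM pg_stat_user_indexes i
WHERE i.schemaname = 'public'
ORDER BY (pgstatindex(i.indexrelid)).avg_leaf_density ASC;
```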
When to Use / When Not to Use
Use VACUUM (standard) for routine maintenance — it reclaims dead tuple space within the table without blocking reads or writes. Run it after large batch DELETE or UPDATE operations, or when n_dead_tup climbs above meaningful levels but disk space is not yet critical.
Use VACUUM FULL only when you genuinely need to return disk space to the operating system and can afford a maintenance window with exclusive table access. Reserve this for tables with severe bloat above 50% that cannot wait for normal cleanup.
Use CLUSTER when you want to physically reorder table data for I/O efficiency on a specific index — particularly useful after bulk loads where the table is accessed sequentially on an indexed column.
Use REINDEX CONCURRENTLY in production systems where index rebuilds must not block writes, or after bulk deletes that significantly fragmented specific indexes.
Never run VACUUM FULL on a large table without a maintenance window — it holds an exclusive lock for the entire duration. Never use standard REINDEX on busy production indexes — use CONCURRENTLY. Never blindly reindex everything — target bloated ones only based on monitoring data.
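For the sequential-access case above, a minimal CLUSTER sketch—table and index names are illustrative:

```sql
-- Rewrite orders in created_at order (exclusive lock for the duration)
CLUSTER orders USING idx_orders_created_at;
-- PostgreSQL remembers the chosen index, so later runs can just be:
CLUSTER orders;
ANALYZE orders;  -- refresh planner statistics after the rewrite
```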
flowchart TB
subgraph Write["UPDATE/DELETE Transaction"]
W1[Mark old row as DEAD] --> W2[Insert new row version]
end
subgraph Vacuum["VACUUM Process"]
V1[Scan table pages] --> V2[Find dead tuples]
V2 --> V3[Mark page space as FREE]
V3 --> V4[Update FSM<br/>Free Space Map]
end
subgraph Reindex["REINDEX Process"]
R1[Build new index<br/>alongside old] --> R2[Swap when complete]
R2 --> R3[Drop old index]
end
Write --> Vacuum
Write --> Reindex
When to Manually VACUUM/REINDEX
Autovacuum handles most situations. Manual intervention is warranted in specific scenarios.
After Large Batch Operations
If you have just loaded 10 million rows via COPY or a bulk UPDATE, autovacuum might not catch up immediately. Running VACUUM ANALYZE afterward ensures statistics are updated and dead tuples are cleaned up promptly.
COPY users FROM '/tmp/users.csv';
VACUUM ANALYZE users;
After Major Deletions
When you delete millions of rows (say, purging old orders), autovacuum will eventually clean up, but you might want to run VACUUM immediately if disk space is a concern. If you need the space back immediately, VACUUM FULL (during maintenance) or CLUSTER (which reorders physical storage) might be necessary.
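As a sketch—the table name and retention window are illustrative:

```sql
-- Purge old rows, then make the freed space reusable right away
DELETE FROM orders WHERE created_at < now() - interval '2 years';
VACUUM ANALYZE orders;  -- non-blocking; freed space stays inside the table file

-- Only if the disk space itself must go back to the OS (maintenance window):
VACUUM FULL orders;
```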
Before Major Queries
If you are about to run a long report that will scan millions of rows, running ANALYZE first ensures the query planner has accurate statistics. This is especially relevant after large data changes.
ANALYZE users;
EXPLAIN ANALYZE SELECT * FROM users WHERE created_at > '2025-01-01';
Monitoring Bloat with pg_stat_user_tables
PostgreSQL provides built-in statistics for monitoring table health. The pg_stat_user_tables view is your first stop for diagnosing bloat issues.
SELECT
relname,
n_live_tup,
n_dead_tup,
last_vacuum,
last_autovacuum,
vacuum_count,
autovacuum_count,
n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0) AS dead_tuple_pct
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_dead_tup DESC;
This query shows dead tuple counts and percentages for all tables. A table with 100 million live tuples and 10 million dead tuples has 9% bloat. Tables approaching 20-30% bloat benefit from manual vacuuming.
For more detailed bloat analysis, the pgstattuple extension provides actual bloat estimates:
SELECT * FROM pgstattuple('users');
This returns estimated free space, dead tuples, and bloat percentage. You need to install the extension first:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
Another useful view is pg_stat_user_indexes, which shows index usage statistics. Indexes that are never scanned (idx_scan = 0) might be unnecessary:
SELECT
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;
Table Bloat Detection Query
For a more comprehensive bloat analysis, you can use this query that calculates bloat based on page counts:
WITH constants AS (
    SELECT current_setting('block_size')::numeric AS bs, 24 AS page_header
), est AS (
    SELECT
        c.oid,
        c.relname,
        c.relpages AS actual_pages,
        s.n_live_tup,
        -- rough bytes per row: column widths from pg_stats plus ~28 bytes of tuple header
        (SELECT COALESCE(SUM(avg_width), 0) + 28
         FROM pg_stats ps
         WHERE ps.schemaname = 'public' AND ps.tablename = c.relname) AS est_row_bytes
    FROM pg_class c
    JOIN pg_stat_user_tables s ON s.relid = c.oid
    WHERE c.relkind = 'r' AND s.schemaname = 'public'
)
SELECT
    relname,
    actual_pages,
    expected_pages,
    ROUND(100 * (actual_pages - expected_pages) / NULLIF(actual_pages, 0)::numeric, 2) AS bloat_pct
FROM (
    SELECT
        relname,
        actual_pages,
        CEIL(n_live_tup * est_row_bytes / (SELECT bs - page_header FROM constants)) AS expected_pages
    FROM est
) sub
ORDER BY bloat_pct DESC NULLS LAST;
This is only an estimate: it trusts the average column widths in pg_stats and ignores fill factor, alignment padding, and TOAST. Treat large bloat_pct values as a prompt to confirm with pgstattuple, not as exact numbers.
Putting It Together: A Maintenance Routine
A reasonable PostgreSQL maintenance routine looks like this:
- Daily: Check pg_stat_user_tables for high dead tuple counts. Manually VACUUM tables that exceed thresholds.
- Weekly: Run VACUUM ANALYZE on all tables. Consider REINDEX on indexes with high bloat.
- Monthly: Review autovacuum configuration. Adjust thresholds for tables with changing patterns.
- After major data changes: Always run ANALYZE. Consider VACUUM if deleting significant data.
- Monitoring: Graph key metrics over time—dead tuple counts, table size trends, autovacuum frequency.
The specifics depend on your workload. A database with heavy UPDATE/DELETE patterns needs more frequent maintenance than a mostly-append database. Watch your data and adjust accordingly.
Production Failure Scenarios
| Failure | Cause | Mitigation |
|---|---|---|
| Autovacuum thrashing | Tables with very high update rates generate dead tuples faster than autovacuum can process | Lower per-table autovacuum thresholds, run manual VACUUM on schedule, or partition high-churn tables |
| Index bloat emergency | Bulk deletes left indexes fragmented, query performance degrades | Run REINDEX CONCURRENTLY on affected indexes, monitor with pg_stat_user_indexes |
| VACUUM FULL locks table for hours | Large table with extreme bloat requires exclusive access | Use CLUSTER instead (faster for I/O-ordered results), or do incremental cleanup via small batch DELETEs |
| Transaction ID wraparound | Long-running transactions prevent VACUUM from freezing old tuples and advancing the frozen XID horizon | Monitor pg_database.datfrozenxid, never hold open transactions for days, set idle_in_transaction_session_timeout |
| Table size explosion without dead tuples | Rapid data growth clutters buffer cache and slows scans | Partition tables by time range, use partition pruning to limit scans |
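For the wraparound row in particular, age(datfrozenxid) tells you how far each database has drifted; autovacuum forces an anti-wraparound freeze once it reaches autovacuum_freeze_max_age (default 200 million):

```sql
-- Distance to transaction ID wraparound per database
SELECT datname,
       age(datfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
FROM pg_database
ORDER BY xid_age DESC;
```

If xid_age approaches freeze_max_age despite autovacuum running, look for long-open transactions or abandoned replication slots pinning the horizon.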
Trade-Off Table: VACUUM FULL vs CLUSTER vs REINDEX
| Dimension | VACUUM FULL | CLUSTER | REINDEX CONCURRENTLY |
|---|---|---|---|
| Lock behavior | Exclusive — blocks all reads and writes | Exclusive — blocks all reads and writes | ShareUpdateExclusiveLock for most of the rebuild — reads and writes continue, with brief locks at swap time |
| Disk space needed | Extra space for the rewritten table | Extra space for new table copy | Extra space for new index |
| Physical order | Compacts but no ordering guarantee | Reorders rows by index for I/O efficiency | Preserves existing order |
| Speed | Slowest for large tables | Faster than VACUUM FULL for sequential workloads | Roughly 2-3x slower than a standard REINDEX |
| Use when | Need OS space back urgently | Table is accessed sequentially on a key after bulk load | Production index rebuild with zero downtime |
Capacity Estimation: Bloat Threshold and VACUUM Cost
Bloat percentage formula:
bloat_percentage = (dead_tuple_pages / (dead_tuple_pages + live_tuple_pages)) × 100
The pg_stat_user_tables view gives n_dead_tup and n_live_tup. Divide by the average tuples-per-page (roughly 300 for typical row widths) to estimate pages. For a table with 1M dead tuples and 9M live tuples at 300 tuples/page: 1M/300 ≈ 3,333 dead pages, 9M/300 ≈ 30,000 live pages. Bloat = 3,333 / 33,333 = 10%.
Autovacuum trigger and throttling formulas:
vacuum_trigger = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples
effective_work_rate ∝ vacuum_cost_limit / autovacuum_vacuum_cost_delay
Autovacuum accumulates cost points as it reads and dirties pages; each time the running total reaches the cost limit (vacuum_cost_limit, default 200), the worker sleeps for autovacuum_vacuum_cost_delay (20ms by default before PostgreSQL 12, 2ms since). Setting the scale factor too high (e.g., autovacuum_vacuum_scale_factor = 0.5, meaning half the table must be dead) makes autovacuum trigger too rarely, so bloat accumulates between runs. For high-churn tables, use lower scale factors (0.05-0.1) and lower cost delays (2-5ms) to keep vacuum running continuously at low intensity.
Index bloat estimation:
index_bloat_ratio = index_page_count / expected_page_count_for_live_tuples
For a B-tree index with high turnover, bloated pages contain many dead index tuples. The pgstattuple extension provides pgstatindex() to report fragmentation. An index bloat ratio above 1.5 (50% more pages than necessary) warrants REINDEX.
REINDEX lock time estimation:
reindex_time_per_gb = index_size_gb × reindex_speed_factor_seconds_per_gb
For typical B-tree indexes on spinning disk: ~30-60 seconds per GB. On NVMe: ~5-10 seconds per GB. For a 50GB index: expect 4-8 minutes on NVMe. Use CONCURRENTLY to avoid locks—REINDEX CONCURRENTLY takes 2-3x longer but maintains availability.
Real-World Case Study: E-Commerce Site Bloat Incident
A major e-commerce platform experienced a severe bloat incident during a flash sale event that provides a cautionary tale about autovacuum tuning in production.
What happened: During a 24-hour sale event, the platform processed 2M orders. Each order generated 3-5 rows in various tables, and completed orders were soft-deleted (marked as cancelled/completed with a status flag). By the end of the event, 500GB of dead tuple space had accumulated across 12 tables. Their autovacuum workers, running at default settings (autovacuum_vacuum_scale_factor = 0.2, meaning vacuum only triggers at 20% dead tuples), could not keep up with the deletion rate.
The impact: Query performance degraded by 80% on order lookup tables. Page read times went from ~1ms to ~50ms as PostgreSQL scanned through dead tuple chains. The checkout flow started timing out. By the time the on-call DBA investigated, the autovacuum workers were running but could not catch up—a 50GB table with 20% bloat threshold requires ~10GB of dead tuple accumulation before triggering.
The fix: The team temporarily lowered autovacuum_vacuum_scale_factor to 0.05 (5%) and autovacuum_vacuum_cost_delay to 2ms on the affected tables using ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05, autovacuum_vacuum_cost_delay = 2). This caused autovacuum to trigger more frequently and run at lower intensity. Within 2 hours, bloat dropped from 50GB to 8GB. A VACUUM ANALYZE on the worst offending tables provided immediate relief.
The lesson: Default autovacuum settings are calibrated for moderate workloads, not high-churn production tables. After any event that generates bulk deletions, immediately run VACUUM on affected tables rather than waiting for autovacuum. For tables with high DELETE rates, set autovacuum_vacuum_scale_factor to 0.05-0.1 and cost delay to 1-2ms. Monitor n_dead_tup per table and alert when dead tuple counts exceed 1M.
Interview Questions
Q1: What is the difference between VACUUM and VACUUM FULL, and when would you use each?
A: VACUUM (standard) reclaims dead tuples in place, marking pages as free space without changing physical file size. It runs concurrently with production traffic, acquires only ShareUpdateExclusiveLock (which does not block reads or writes), and is the default maintenance operation. VACUUM FULL rewrites the entire table to a new file, physically compacting it and returning space to the OS. It requires AccessExclusiveLock (blocks all reads and writes) and cannot run concurrently. Use VACUUM for regular maintenance (daily or weekly, depending on churn). Use VACUUM FULL only when you need to shrink a table’s disk footprint urgently AND can afford a maintenance window, or use pg_repack as a non-blocking alternative that achieves the same result with zero downtime.
Q2: A table has 80GB of disk space but only 10GB of live data. How do you diagnose and fix this?
A: First, diagnose with SELECT relname, n_dead_tup, n_live_tup, last_vacuum FROM pg_stat_user_tables WHERE relname = 'your_table'. High dead tuple count with no recent vacuum confirms bloat is from uncleared dead tuples. For a 10GB live data table with 70GB of dead space, a standard VACUUM will reclaim the free space within the table file but not return it to the OS. To actually shrink the file, use VACUUM FULL (blocks writes) or pg_repack -k -t your_table (zero-downtime compaction). Also check for bloated indexes: use pgstatindex() on each index. Often, bloated indexes contribute as much or more to storage bloat as the table itself—REINDEX each bloated index.
Q3: How does autovacuum tuning differ for a write-heavy OLTP table versus an analytical table with bulk inserts?
A: For write-heavy OLTP tables with frequent UPDATEs and DELETEs, autovacuum must run frequently with low cost to keep up with dead tuple generation. Configure: autovacuum_vacuum_scale_factor = 0.05, autovacuum_vacuum_cost_delay = 2ms, autovacuum_vacuum_cost_limit increased to allow more work per cycle (e.g., 2000). Monitor n_dead_tuples and set an alert when it exceeds thresholds specific to that table. For analytical tables with bulk loads (no updates/deletes between loads), autovacuum rarely needs to run—the dead tuple generation is bursty (during loads) rather than continuous. Configure autovacuum to be less aggressive: autovacuum_vacuum_scale_factor = 0.2 and let it trigger after large batch loads complete. Run VACUUM ANALYZE immediately after each bulk load to update statistics.
Q4: What causes index bloat, and how do you decide when to REINDEX?
A: Index bloat accumulates when many rows in an indexed table are updated or deleted. VACUUM removes the dead index entries, but B-tree pages that end up sparsely filled are never merged, so the index only shrinks when REINDEX rebuilds it. Symptoms: an index grows significantly larger than expected for its row count; query performance degrades despite table-level vacuum; pgstatindex() shows avg_leaf_density below 50%. REINDEX is needed when the bloat ratio exceeds 1.5 (50% more pages than necessary). Use pgstatindex() output to compare index_size to tree_height and leaf_pages — a tree that is too tall for its row count indicates fragmentation. Schedule REINDEX during low-write windows or use REINDEX CONCURRENTLY to avoid blocking production traffic.
Conclusion
MVCC is what makes PostgreSQL handle concurrency so well, but it requires maintenance. Vacuuming reclaims space from dead tuples. Reindexing compacts bloated indexes. Autovacuum handles the routine work, but you need to tune it for busy tables and watch for situations that overwhelm the defaults.
Build monitoring into your routine. Track dead tuple counts, vacuum timing, and table sizes. Bloat caught early is far easier to fix than bloat that has been compounding for months.
PostgreSQL’s visibility into its own internals is excellent. The tools exist—you just need to use them.
Related Posts
- Database Monitoring - Tracking key metrics for PostgreSQL health
- Database Migration Strategies - Day-to-day operational practices
- Relational Databases - Core concepts and architecture