Query Execution Plans: Reading EXPLAIN Output Like a Pro

Learn to decode PostgreSQL EXPLAIN output, understand sequential vs index scans, optimize join orders, and compare bitmap heap scans with index-only scans.

Reading time: 12 min

When your query runs slow, the database is trying to tell you something. EXPLAIN output is that conversation — once you learn the language.

Most developers have experienced the frustration of staring at a mysterious tree of nodes, each with cryptic cost estimates. After you get past the initial confusion, though, you’ll find that EXPLAIN gives you everything you need to understand why your query chose to do X instead of Y.

How a Query Plan Takes Shape

flowchart TD
    A[SQL Query Received] --> B[Parser creates AST]
    B --> C[Rewriter applies rules]
    C --> D[Planner generates plans]
    D --> E{Table stats available?}
    E -->|No| F[Use default estimates]
    E -->|Yes| G[Use histogram & statistics]
    F --> H{Indexes available?}
    G --> H
    H -->|Yes| I[Consider index scans]
    H -->|No| J[Sequential scan only]
    J --> K[Estimate rows per plan]
    I --> K
    K --> L[Cost each plan]
    L --> M[Pick lowest cost plan]
    M --> N[Execute plan]

The planner considers scan types, join orders, and join algorithms. It estimates row counts using statistics, costs each approach, and picks the cheapest. If statistics are stale, estimates are wrong and the plan is bad.
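The cost comparison at the heart of this process can be sketched in a few lines. This is a deliberately simplified model using PostgreSQL's default cost constants (seq_page_cost = 1.0, random_page_cost = 4.0, cpu_tuple_cost = 0.01); the real planner accounts for many more factors, such as index correlation and caching.

```python
# Simplified cost-based plan selection using PostgreSQL's default cost
# constants. A sketch only -- the real planner models far more detail.
SEQ_PAGE_COST = 1.0     # one sequential page read
RANDOM_PAGE_COST = 4.0  # one random page read
CPU_TUPLE_COST = 0.01   # processing one row

def seq_scan_cost(table_pages, table_rows):
    # Read every page in order, process every row.
    return table_pages * SEQ_PAGE_COST + table_rows * CPU_TUPLE_COST

def index_scan_cost(matching_rows):
    # Worst case: each matching row costs one random heap page fetch.
    return matching_rows * RANDOM_PAGE_COST + matching_rows * CPU_TUPLE_COST

def pick_plan(table_pages, table_rows, matching_rows):
    plans = {
        "Seq Scan": seq_scan_cost(table_pages, table_rows),
        "Index Scan": index_scan_cost(matching_rows),
    }
    return min(plans, key=plans.get)

print(pick_plan(1000, 100_000, 5))       # Index Scan (selective query)
print(pick_plan(1000, 100_000, 80_000))  # Seq Scan (most of the table)
```

Notice that the same index loses to a sequential scan once enough rows match: random page fetches quickly outgrow the cost of reading the whole table in order.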

What is EXPLAIN, Really?

EXPLAIN shows you the execution plan PostgreSQL’s query planner generates for a given SQL statement. Plain EXPLAIN does not run the query — it just shows you the plan. (EXPLAIN ANALYZE, used later in this article, does execute the query and reports actual row counts and timings alongside the estimates.)

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
                         QUERY PLAN
------------------------------------------------------------
 Index Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=1 width=89)
   Index Cond: (customer_id = 42)

The planner estimates the cost of each operation based on statistics it maintains about your data. Lower cost is better, but relative values matter more than absolute numbers.

Understanding Sequential Scans vs Index Scans

Sequential Scans

A sequential scan reads every row in the table, one after another, visiting every page of the table whether it comes from cache or disk.

Seq Scan on orders  (cost=0.00..4582.00 rows=100000 width=89)
  Filter: (customer_id = 42)

You see this when the table is small, the query returns a large percentage of the table, no index exists on the filtered column, or your statistics are stale.

Sequential scans are not always bad. If you need 80% of the table, reading the whole thing with sequential I/O is faster than bouncing around an index.

Index Scans

An index scan walks the index tree to find matching rows, then fetches the actual data from the heap.

Index Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=1 width=89)
  Index Cond: (customer_id = 42)

The planner picks this when your query is selective — few rows match — so the cost of the extra random heap fetches stays low.

Index Only Scans

If all columns in your query exist in the index, PostgreSQL can skip the heap fetch entirely.

Index Only Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=1 width=89)
  Index Cond: (customer_id = 42)

This works because PostgreSQL maintains a visibility map for each table, tracking which heap pages contain only tuples visible to all transactions. If every page you touch is marked all-visible, the heap fetch gets skipped. On tables with heavy UPDATE traffic, this optimization degrades because pages lose their all-visible status and stay that way until the next VACUUM.
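A toy model of that check, assuming page granularity like the real visibility map (the function and data here are illustrative, not PostgreSQL internals):

```python
# Toy index-only scan: values come straight from the index; the heap is
# visited only for pages not marked all-visible in the visibility map.
def index_only_scan(index_matches, all_visible_pages):
    """index_matches: (heap_page, value) pairs found in the index."""
    heap_fetches = 0
    results = []
    for page, value in index_matches:
        if page not in all_visible_pages:
            heap_fetches += 1  # shows up as "Heap Fetches" in EXPLAIN ANALYZE
        results.append(value)
    return results, heap_fetches

matches = [(1, "a"), (1, "b"), (2, "c"), (3, "d")]
# Freshly vacuumed table: every page all-visible, zero heap fetches.
print(index_only_scan(matches, all_visible_pages={1, 2, 3})[1])  # 0
# After heavy UPDATEs, pages 2 and 3 lost all-visible status.
print(index_only_scan(matches, all_visible_pages={1})[1])        # 2
```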

Bitmap Heap Scans

When an index returns many row pointers, PostgreSQL switches to bitmap mode. It collects all matching heap locations first, then sorts them and reads the heap in physical order.

Bitmap Heap Scan on orders  (cost=412.00..5218.00 rows=5000 width=89)
  Recheck Cond: (customer_id = 42)
  ->  Bitmap Index Scan on idx_orders_customer_id  (cost=0.00..412.00 rows=5000 width=0)

The advantage: it reduces random I/O by sorting heap locations and reading them sequentially. It’s usually faster than index scan when many rows match.
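The reordering step can be sketched like this (an illustration, not PostgreSQL's actual data structure): sorting the row pointers by heap page means each page is read once, in physical order, instead of being revisited at random.

```python
# Sketch of the bitmap step: collect matching (page, offset) pointers
# from the index, then visit each heap page once, in physical order.
def bitmap_page_order(tids):
    return sorted({page for page, _offset in tids})

# In index order the scan would bounce between pages 7, 1, 7, 3, 1 --
# five random page fetches. The bitmap reduces that to three in-order reads.
tids = [(7, 2), (1, 5), (7, 9), (3, 1), (1, 8)]
print(bitmap_page_order(tids))  # [1, 3, 7]
```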

Understanding Join Order Impact

The order in which tables are joined has a huge impact on performance. Consider:

SELECT *
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN products p ON o.product_id = p.id
WHERE c.region = 'NORTH';

Default Behavior

PostgreSQL considers all possible join orders and picks the cheapest one. Here’s what it might choose:

EXPLAIN SELECT *
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN products p ON o.product_id = p.id
WHERE c.region = 'NORTH';
Hash Join  (cost=5218.00..8942.00 rows=5000)
  Hash Cond: (o.product_id = p.id)
  ->  Hash Join  (cost=42.00..4682.00 rows=5000)
        Hash Cond: (o.customer_id = c.id)
        ->  Seq Scan on orders
        ->  Hash  (cost=30.00..30.00 rows=1000 width=89)
              ->  Seq Scan on customers
                    Filter: (region = 'NORTH')
  ->  Hash  (cost=4536.00..4536.00 rows=30000)
        ->  Seq Scan on products

Nested Loop Joins

For small tables or when you have good indexes on the join columns, nested loops work well:

Nested Loop  (cost=0.43..150.00 rows=50)
  ->  Index Scan on customers
        Index Cond: (region = 'NORTH')
  ->  Index Scan on orders
        Index Cond: (customer_id = c.id)
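The algorithm itself is simple. A sketch in Python, using a dict as a stand-in for the inner table's index (names here are invented for illustration):

```python
# Nested loop join sketch: for each outer row, probe an index on the
# inner table. Cheap when the outer side is small and the probe is indexed.
def nested_loop_join(outer_rows, inner_index, outer_key):
    for outer in outer_rows:
        for inner in inner_index.get(outer[outer_key], []):
            yield (outer, inner)

customers = [{"id": 1, "region": "NORTH"}]                      # small outer side
orders_by_customer = {1: [{"order_id": 10}, {"order_id": 11}]}  # the "index"
print(len(list(nested_loop_join(customers, orders_by_customer, "id"))))  # 2
```

The cost grows with (outer rows × probe cost), which is why a nested loop over a large, unindexed inner table is catastrophic.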

Hash Joins

For larger tables without useful indexes on the join columns, hash joins scale better. PostgreSQL builds a hash table on the smaller relation:

Hash Join  (cost=3000.00..8000.00 rows=50000)
  Hash Cond: (o.customer_id = c.id)
  ->  Seq Scan on orders
  ->  Hash  (cost=2000.00..2000.00 rows=50000 width=45)
        ->  Seq Scan on customers
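A minimal sketch of the algorithm — one pass to build on the smaller input, one pass to probe with the larger one:

```python
# Hash join sketch: build a hash table on the smaller relation, then
# probe it once per row of the larger relation.
def hash_join(small, large, small_key, large_key):
    table = {}
    for row in small:                                   # build phase
        table.setdefault(row[small_key], []).append(row)
    for row in large:                                   # probe phase
        for match in table.get(row[large_key], []):
            yield (match, row)

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}]
print(len(list(hash_join(customers, orders, "id", "customer_id"))))  # 3
```

Each input is read exactly once, which is why the cost stays near-linear even as both tables grow — provided the hash table fits in work_mem.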

Merge Joins

When inputs are already sorted on the join key, merge joins are efficient and avoid the hash table overhead:

Merge Join  (cost=4500.00..9000.00 rows=50000)
  Merge Cond: (o.customer_id = c.id)
  ->  Sort
        Sort Key: c.id
        ->  Seq Scan on customers
  ->  Sort
        Sort Key: o.customer_id
        ->  Seq Scan on orders
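A sketch of the merge step, assuming both inputs arrive sorted on the join key. Duplicate keys make the bookkeeping slightly fiddly: each left row must revisit the matching run on the right.

```python
# Merge join sketch: advance two sorted inputs in lockstep, emitting the
# cross product of each run of equal keys.
def merge_join(left, right, key):
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            j_start = j
            while j < len(right) and right[j][key] == lk:
                yield (left[i], right[j])
                j += 1
            i += 1
            if i < len(left) and left[i][key] == lk:
                j = j_start  # next left row re-matches the same right run

customers = [{"id": 1}, {"id": 2}]               # sorted on id
orders = [{"id": 1}, {"id": 1}, {"id": 3}]       # sorted on id
print(len(list(merge_join(customers, orders, "id"))))  # 2
```

Because each input is walked forward once (modulo equal-key runs), merge joins pay nothing beyond the sorts — and nothing at all if an index already delivers the rows in order.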

Key EXPLAIN Output Fields

Cost

The first number is the startup cost (work done before the first row can be returned). The second is the total cost (work to return all rows).

Index Scan (cost=0.43..8.45 rows=1 width=89)
            ↑ startup  ↑ total

Costs are estimated in arbitrary units where one sequential page read (seq_page_cost) is 1.0 by default; every other operation is costed relative to that.

Rows

Estimated rows returned. This is what drives most planner decisions. If the planner thinks 10 rows will be returned, it will tend to pick a nested loop. If it thinks 10 million will be returned, it will pick a hash join.

Width

Average row width in bytes. Helps estimate memory consumption and I/O requirements.

Buffers

With EXPLAIN (ANALYZE, BUFFERS), you see actual buffer usage:

Buffers: shared hit=1234 read=567

hit means pages found in cache. read means pages that had to be read from disk. High read numbers tell you the query is doing a lot of physical I/O.
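Two quick numbers worth deriving from that line (assuming PostgreSQL's default 8 kB block size):

```python
# Derive a cache hit ratio and physical I/O volume from the Buffers line.
# Assumes the default 8 kB block size.
shared_hit, shared_read = 1234, 567

hit_ratio = shared_hit / (shared_hit + shared_read)
physical_io_kb = shared_read * 8  # pages read from disk * 8 kB each

print(round(hit_ratio, 2))  # 0.69
print(physical_io_kb)       # 4536
```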

Practical Example

Here’s a reporting query that was running slow:

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.created_at, c.name, p.title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at > '2026-01-01'
  AND c.region = 'NORTH';

The problematic plan looked like:

Nested Loop  (cost=0.43..15234.00 rows=50000)
  Buffers: shared hit=2345 read=8900
  ->  Seq Scan on customers
        Filter: (region = 'NORTH')
        Rows Removed by Filter: 45000
  ->  Index Scan using idx_orders_customer_id
        Index Cond: (customer_id = c.id)

Three things jump out: sequential scan on customers, 45,000 rows thrown away by the filter, and 8,900 buffer reads. The planner thought it would return 50,000 rows when in reality it was discarding most of the table.

The fix was adding an index:

CREATE INDEX idx_customers_region ON customers(region);

After that, the same query used:

Nested Loop  (cost=0.43..1234.00 rows=5000)
  Buffers: shared hit=1234 read=45
  ->  Index Scan on customers
        Index Cond: (region = 'NORTH')
  ->  Index Scan using idx_orders_customer_id
        Index Cond: (customer_id = c.id)

Rows estimate dropped from 50,000 to 5,000, and buffer reads went from 8,900 to 45. The query got faster.

Scan Type Selection: When Each Applies

| Scan Type | Choose when | Avoid when |
|---|---|---|
| Sequential Scan | Query needs most of the table, table is small, no useful index exists, or statistics are stale | Query is highly selective (few rows match) |
| Index Scan | Highly selective query, index covers join columns, heap fetch is cheap | Query returns large % of table |
| Index Only Scan | Query columns are all in the index, table has good visibility map coverage | Heavy UPDATE traffic makes visibility map stale |
| Bitmap Heap Scan | Moderate selectivity (many rows match), reduces random I/O | Very selective queries (few rows match) |

Sequential scans get a bad reputation but are often the right choice for small tables and for queries that need most of the data anyway.

Common Production Failures

Stale statistics causing wrong plans: The planner thinks a table has 1,000 rows when it actually has 10 million. This happens after bulk loads that skip ANALYZE, or after large DELETEs. The result is seq scans on huge tables and wrong join orders. Fix: run ANALYZE (or VACUUM ANALYZE) after any bulk data change — autovacuum does not always catch up quickly enough.

Missing index causing full table scans: A query filtering on status with no index on status seq scans the table. On small tables this is fine. On 100 million rows it is not. Use EXPLAIN to find the seq scans, then add indexes.

Planner choosing nested loop on large tables: Nested loop is correct for small tables with good indexes, but catastrophic when the inner table is large. If you see nested loop on a large join and estimated rows are far off, the inner index is likely wrong or missing.

“Rows Removed by Index Recheck” accumulating: Bitmap heap scans must recheck rows when the bitmap becomes lossy — typically because work_mem is too small to track individual rows, so PostgreSQL falls back to tracking whole pages. High “Rows Removed by Index Recheck” numbers mean entire pages are being fetched and most of their rows then fail the recheck; raising work_mem keeps the bitmap exact. Separately, lowering random_page_cost on SSDs (the default of 4.0 assumes spinning disks) can nudge the planner toward plain index scans.

Join order catastrophe: The planner joins a small table to a large one first when it should do the opposite. This produces enormous intermediate result sets. Check that statistics are current on all tables in the join.

Interview Questions

Q: A query runs fast in development but slow in production. The table has 10x more rows. EXPLAIN shows a sequential scan. What do you do?

First, confirm the production plan matches what you expect. Run EXPLAIN ANALYZE with BUFFERS to see actual timing, not just estimates. A sequential scan on a 10x larger table can be correct if the query is returning 30% of the rows — index lookups would cause more random I/O than a sequential scan. But if the query is selective and should use an index, the statistics are likely stale. Run ANALYZE on the table. If that does not fix it, check whether an index was dropped or whether the query planner is choosing a different plan because of different session settings (like enable_seqscan). Compare the random_page_cost and seq_page_cost settings between environments — cloud storage often has lower random read penalties than the defaults assume.

Q: You see “Rows Removed by Index Recheck: 45000” in an EXPLAIN ANALYZE output. What does this mean and how do you fix it?

Bitmap index scans build an in-memory bitmap of matching row pointers. If the bitmap outgrows work_mem, it degrades to tracking whole pages instead of individual rows (a “lossy” bitmap), so during the heap fetch every row on those pages must be rechecked against the original condition, and the rows that fail are discarded. The “Rows Removed” number shows how much of that rechecking was wasted work. The usual fix is to raise work_mem so the bitmap stays exact, or to make the index condition more selective. If the condition cannot be made more selective, consider whether a different scan type would be more efficient.

Q: Explain the difference between cost=0.43..8.45 in EXPLAIN output.

The two numbers are startup cost and total cost. Startup cost is the work before the first row can be returned — for an index scan this includes walking the B-tree to the first matching leaf page. Total cost is the estimated work to return all rows. For a query returning one row, the difference between startup and total cost tells you how expensive it is to find the first row. For a sorted query, startup cost includes the sort. For a LIMIT query, the planner uses the startup cost to estimate whether returning the first N rows is cheap even if total cost is high.
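The LIMIT point can be made concrete with a toy calculation: a limited plan is charged its startup cost plus a pro-rated fraction of the remaining run cost. The numbers below are invented for illustration.

```python
# How LIMIT changes plan choice: charge startup cost plus only the
# fraction of run cost needed to produce the first N rows.
def limit_cost(startup, total, total_rows, limit_rows):
    run = total - startup
    return startup + run * min(limit_rows / total_rows, 1.0)

# Sort-based plan: high startup (the sort finishes before row one).
sort_plan = limit_cost(startup=4500.0, total=9000.0, total_rows=50_000, limit_rows=10)
# Ordered index scan: near-zero startup, rows stream out immediately.
index_plan = limit_cost(startup=0.43, total=8000.0, total_rows=50_000, limit_rows=10)
print(index_plan < sort_plan)  # True
```

Even though the index plan's total cost is almost as high as the sort plan's, LIMIT 10 makes it dramatically cheaper because almost none of its cost is paid up front.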

Q: A three-table join is producing a bad plan. The planner is joining a small lookup table first instead of starting with the filtered result set. Why?

PostgreSQL’s planner limits how large a join search space it will explore exhaustively. Explicit JOIN clauses are collapsed into a single search space only up to join_collapse_limit relations (default 8); beyond that, the textual order of your JOIN clauses starts to constrain the plan, and above geqo_threshold relations the planner switches to a genetic search that can miss the best order. If the planner cannot accurately estimate row counts for one of the joins, it may also pick a suboptimal starting point. Increase join_collapse_limit to let it explore more arrangements, or reorder the JOIN clauses explicitly to guide it. Running ANALYZE on all tables in the join usually fixes the root cause.

Common Problems and Fixes

| Problem | Fix |
|---|---|
| Seq Scan on large table | Add or rebuild index on filtered column |
| Bitmap Heap Scan on small result | Increase statistics target or use an index-only scan |
| Hash Join on unsorted data | Add an index on the join column so presorted input enables a merge join |
| Wrong join order | Run ANALYZE on all tables, increase join_collapse_limit |
| Stale statistics | Run ANALYZE or VACUUM ANALYZE |

Conclusion

Reading EXPLAIN output is a skill that improves with practice. Start with the cost estimates and row counts — they drive every decision the planner makes. Look for sequential scans that process too many rows, bitmap operations on small result sets, and join orders that force unnecessary work.

For further reading, explore our guides on Database Indexes and Query Rewriting techniques.
