Query Execution Plans: Reading EXPLAIN Output Like a Pro

Learn to decode PostgreSQL EXPLAIN output, understand sequential vs index scans, optimize join orders, and compare bitmap heap scans with index-only scans.

Reading time: 12 min

When your query runs slow, the database is trying to tell you something. EXPLAIN output is that conversation — once you learn the language.

Most developers have experienced the frustration of staring at a mysterious tree of nodes, each with cryptic cost estimates. After you get past the initial confusion, though, you’ll find that EXPLAIN gives you everything you need to understand why your query chose to do X instead of Y.

How a Query Plan Takes Shape

flowchart TD
    A[SQL Query Received] --> B[Parser creates AST]
    B --> C[Rewriter applies rules]
    C --> D[Planner generates plans]
    D --> E{Table stats available?}
    E -->|No| F[Use default estimates]
    E -->|Yes| G[Use histogram & statistics]
    F --> H{Indexes available?}
    G --> H
    H -->|Yes| I[Consider index scans]
    H -->|No| J[Sequential scan only]
    J --> K[Estimate rows per plan]
    I --> K
    K --> L[Cost each plan]
    L --> M[Pick lowest cost plan]
    M --> N[Execute plan]

The planner considers scan types, join orders, and join algorithms. It estimates row counts using statistics, costs each approach, and picks the cheapest. If statistics are stale, estimates are wrong and the plan is bad.
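The cost comparison at the heart of this process can be sketched in a few lines. This is a deliberately simplified model using PostgreSQL's default cost constants (seq_page_cost = 1.0, random_page_cost = 4.0, cpu_tuple_cost = 0.01); the real planner accounts for many more factors, such as index correlation and caching.

```python
# Simplified cost-based plan selection using PostgreSQL's default cost
# constants. A sketch only -- the real planner models far more detail.
SEQ_PAGE_COST = 1.0     # one sequential page read
RANDOM_PAGE_COST = 4.0  # one random page read
CPU_TUPLE_COST = 0.01   # processing one row

def seq_scan_cost(table_pages, table_rows):
    # Read every page in order, process every row.
    return table_pages * SEQ_PAGE_COST + table_rows * CPU_TUPLE_COST

def index_scan_cost(matching_rows):
    # Worst case: each matching row costs one random heap page fetch.
    return matching_rows * RANDOM_PAGE_COST + matching_rows * CPU_TUPLE_COST

def pick_plan(table_pages, table_rows, matching_rows):
    plans = {
        "Seq Scan": seq_scan_cost(table_pages, table_rows),
        "Index Scan": index_scan_cost(matching_rows),
    }
    return min(plans, key=plans.get)

print(pick_plan(1000, 100_000, 5))       # Index Scan (selective query)
print(pick_plan(1000, 100_000, 80_000))  # Seq Scan (most of the table)
```

Notice that the same index loses to a sequential scan once enough rows match: random page fetches quickly outgrow the cost of reading the whole table in order.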

What is EXPLAIN, Really?

EXPLAIN shows you the execution plan PostgreSQL’s query planner generates for a given SQL statement. Plain EXPLAIN does not run the query — it just shows you the plan. (EXPLAIN ANALYZE, used later in this article, does execute the query and reports actual row counts and timings alongside the estimates.)

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
                         QUERY PLAN
------------------------------------------------------------
 Index Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=1 width=89)
   Index Cond: (customer_id = 42)

The planner estimates the cost of each operation based on statistics it maintains about your data. Lower cost is better, but relative values matter more than absolute numbers.

Understanding Sequential Scans vs Index Scans

Sequential Scans

A sequential scan reads every row in the table, one after another, visiting every page of the table whether it comes from cache or disk.

Seq Scan on orders  (cost=0.00..4582.00 rows=100000 width=89)
  Filter: (customer_id = 42)

You see this when the table is small, the query returns a large percentage of the table, no index exists on the filtered column, or your statistics are stale.

Sequential scans are not always bad. If you need 80% of the table, reading the whole thing with sequential I/O is faster than bouncing around an index.

Index Scans

An index scan walks the index tree to find matching rows, then fetches the actual data from the heap.

Index Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=1 width=89)
  Index Cond: (customer_id = 42)

The planner picks this when your query is selective — few rows match — so the cost of the extra random heap fetches stays low.

Index Only Scans

If all columns in your query exist in the index, PostgreSQL can skip the heap fetch entirely.

Index Only Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=1 width=89)
  Index Cond: (customer_id = 42)

This works because PostgreSQL maintains a visibility map for each table, tracking which heap pages contain only tuples visible to all transactions. If every page you touch is marked all-visible, the heap fetch gets skipped. On tables with heavy UPDATE traffic, this optimization degrades because pages lose their all-visible status and stay that way until the next VACUUM.
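A toy model of that check, assuming page granularity like the real visibility map (the function and data here are illustrative, not PostgreSQL internals):

```python
# Toy index-only scan: values come straight from the index; the heap is
# visited only for pages not marked all-visible in the visibility map.
def index_only_scan(index_matches, all_visible_pages):
    """index_matches: (heap_page, value) pairs found in the index."""
    heap_fetches = 0
    results = []
    for page, value in index_matches:
        if page not in all_visible_pages:
            heap_fetches += 1  # shows up as "Heap Fetches" in EXPLAIN ANALYZE
        results.append(value)
    return results, heap_fetches

matches = [(1, "a"), (1, "b"), (2, "c"), (3, "d")]
# Freshly vacuumed table: every page all-visible, zero heap fetches.
print(index_only_scan(matches, all_visible_pages={1, 2, 3})[1])  # 0
# After heavy UPDATEs, pages 2 and 3 lost all-visible status.
print(index_only_scan(matches, all_visible_pages={1})[1])        # 2
```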

Bitmap Heap Scans

When an index returns many row pointers, PostgreSQL switches to bitmap mode. It collects all matching heap locations first, then sorts them and reads the heap in physical order.

Bitmap Heap Scan on orders  (cost=412.00..5218.00 rows=5000 width=89)
  Recheck Cond: (customer_id = 42)
  ->  Bitmap Index Scan on idx_orders_customer_id  (cost=0.00..412.00 rows=5000 width=0)

The advantage: it reduces random I/O by sorting heap locations and reading them sequentially. It’s usually faster than index scan when many rows match.
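The reordering step can be sketched like this (an illustration, not PostgreSQL's actual data structure): sorting the row pointers by heap page means each page is read once, in physical order, instead of being revisited at random.

```python
# Sketch of the bitmap step: collect matching (page, offset) pointers
# from the index, then visit each heap page once, in physical order.
def bitmap_page_order(tids):
    return sorted({page for page, _offset in tids})

# In index order the scan would bounce between pages 7, 1, 7, 3, 1 --
# five random page fetches. The bitmap reduces that to three in-order reads.
tids = [(7, 2), (1, 5), (7, 9), (3, 1), (1, 8)]
print(bitmap_page_order(tids))  # [1, 3, 7]
```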

Understanding Join Order Impact

The order in which tables are joined has a huge impact on performance. Consider:

SELECT *
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN products p ON o.product_id = p.id
WHERE c.region = 'NORTH';

Default Behavior

PostgreSQL considers all possible join orders and picks the cheapest one. Here’s what it might choose:

EXPLAIN SELECT *
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN products p ON o.product_id = p.id
WHERE c.region = 'NORTH';
Hash Join  (cost=5218.00..8942.00 rows=5000)
  Hash Cond: (o.product_id = p.id)
  ->  Hash Join  (cost=42.00..4682.00 rows=5000)
        Hash Cond: (o.customer_id = c.id)
        ->  Seq Scan on orders
        ->  Hash  (cost=30.00..30.00 rows=1000 width=89)
              ->  Seq Scan on customers
                    Filter: (region = 'NORTH')
  ->  Hash  (cost=4536.00..4536.00 rows=30000)
        ->  Seq Scan on products

Nested Loop Joins

For small tables or when you have good indexes on the join columns, nested loops work well:

Nested Loop  (cost=0.43..150.00 rows=50)
  ->  Index Scan on customers
        Index Cond: (region = 'NORTH')
  ->  Index Scan on orders
        Index Cond: (customer_id = c.id)
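The algorithm itself is simple. A sketch in Python, using a dict as a stand-in for the inner table's index (names here are invented for illustration):

```python
# Nested loop join sketch: for each outer row, probe an index on the
# inner table. Cheap when the outer side is small and the probe is indexed.
def nested_loop_join(outer_rows, inner_index, outer_key):
    for outer in outer_rows:
        for inner in inner_index.get(outer[outer_key], []):
            yield (outer, inner)

customers = [{"id": 1, "region": "NORTH"}]                      # small outer side
orders_by_customer = {1: [{"order_id": 10}, {"order_id": 11}]}  # the "index"
print(len(list(nested_loop_join(customers, orders_by_customer, "id"))))  # 2
```

The cost grows with (outer rows × probe cost), which is why a nested loop over a large, unindexed inner table is catastrophic.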

Hash Joins

For larger tables without useful indexes on the join columns, hash joins scale better. PostgreSQL builds a hash table on the smaller relation:

Hash Join  (cost=3000.00..8000.00 rows=50000)
  Hash Cond: (o.customer_id = c.id)
  ->  Seq Scan on orders
  ->  Hash  (cost=2000.00..2000.00 rows=50000 width=45)
        ->  Seq Scan on customers
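A minimal sketch of the algorithm — one pass to build on the smaller input, one pass to probe with the larger one:

```python
# Hash join sketch: build a hash table on the smaller relation, then
# probe it once per row of the larger relation.
def hash_join(small, large, small_key, large_key):
    table = {}
    for row in small:                                   # build phase
        table.setdefault(row[small_key], []).append(row)
    for row in large:                                   # probe phase
        for match in table.get(row[large_key], []):
            yield (match, row)

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}]
print(len(list(hash_join(customers, orders, "id", "customer_id"))))  # 3
```

Each input is read exactly once, which is why the cost stays near-linear even as both tables grow — provided the hash table fits in work_mem.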

Merge Joins

When inputs are already sorted on the join key, merge joins are efficient and avoid the hash table overhead:

Merge Join  (cost=4500.00..9000.00 rows=50000)
  Merge Cond: (o.customer_id = c.id)
  ->  Sort
        Sort Key: c.id
        ->  Seq Scan on customers
  ->  Sort
        Sort Key: o.customer_id
        ->  Seq Scan on orders
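A sketch of the merge step, assuming both inputs arrive sorted on the join key. Duplicate keys make the bookkeeping slightly fiddly: each left row must revisit the matching run on the right.

```python
# Merge join sketch: advance two sorted inputs in lockstep, emitting the
# cross product of each run of equal keys.
def merge_join(left, right, key):
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            j_start = j
            while j < len(right) and right[j][key] == lk:
                yield (left[i], right[j])
                j += 1
            i += 1
            if i < len(left) and left[i][key] == lk:
                j = j_start  # next left row re-matches the same right run

customers = [{"id": 1}, {"id": 2}]               # sorted on id
orders = [{"id": 1}, {"id": 1}, {"id": 3}]       # sorted on id
print(len(list(merge_join(customers, orders, "id"))))  # 2
```

Because each input is walked forward once (modulo equal-key runs), merge joins pay nothing beyond the sorts — and nothing at all if an index already delivers the rows in order.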

Key EXPLAIN Output Fields

Cost

The first number is the startup cost (work done before the first row can be returned). The second is the total cost (work to return all rows).

Index Scan (cost=0.43..8.45 rows=1 width=89)
            ↑ startup  ↑ total

Costs are estimated in arbitrary units where one sequential page read (seq_page_cost) is 1.0 by default; every other operation is costed relative to that.

Rows

Estimated rows returned. This is what drives most planner decisions. If the planner thinks 10 rows will be returned, it will tend to pick a nested loop. If it thinks 10 million will be returned, it will pick a hash join.

Width

Average row width in bytes. Helps estimate memory consumption and I/O requirements.

Buffers

With EXPLAIN (ANALYZE, BUFFERS), you see actual buffer usage:

Buffers: shared hit=1234 read=567

hit means pages found in cache. read means pages that had to be read from disk. High read numbers tell you the query is doing a lot of physical I/O.
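Two quick numbers worth deriving from that line (assuming PostgreSQL's default 8 kB block size):

```python
# Derive a cache hit ratio and physical I/O volume from the Buffers line.
# Assumes the default 8 kB block size.
shared_hit, shared_read = 1234, 567

hit_ratio = shared_hit / (shared_hit + shared_read)
physical_io_kb = shared_read * 8  # pages read from disk * 8 kB each

print(round(hit_ratio, 2))  # 0.69
print(physical_io_kb)       # 4536
```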

Practical Example

Here’s a reporting query that was running slow:

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.created_at, c.name, p.title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at > '2026-01-01'
  AND c.region = 'NORTH';

The problematic plan looked like:

Nested Loop  (cost=0.43..15234.00 rows=50000)
  Buffers: shared hit=2345 read=8900
  ->  Seq Scan on customers
        Filter: (region = 'NORTH')
        Rows Removed by Filter: 45000
  ->  Index Scan using idx_orders_customer_id
        Index Cond: (customer_id = c.id)

Three things jump out: sequential scan on customers, 45,000 rows thrown away by the filter, and 8,900 buffer reads. The planner thought it would return 50,000 rows when in reality it was discarding most of the table.

The fix was adding an index:

CREATE INDEX idx_customers_region ON customers(region);

After that, the same query used:

Nested Loop  (cost=0.43..1234.00 rows=5000)
  Buffers: shared hit=1234 read=45
  ->  Index Scan on customers
        Index Cond: (region = 'NORTH')
  ->  Index Scan using idx_orders_customer_id
        Index Cond: (customer_id = c.id)

Rows estimate dropped from 50,000 to 5,000, and buffer reads went from 8,900 to 45. The query got faster.

Scan Type Selection: When Each Applies

| Scan Type | Choose when | Avoid when |
|---|---|---|
| Sequential Scan | Query needs most of the table, table is small, no useful index exists, or statistics are stale | Query is highly selective (few rows match) |
| Index Scan | Highly selective query, index covers join columns, heap fetch is cheap | Query returns large % of table |
| Index Only Scan | Query columns are all in the index, table has good visibility map coverage | Heavy UPDATE traffic makes visibility map stale |
| Bitmap Heap Scan | Moderate selectivity (many rows match), reduces random I/O | Very selective queries (few rows match) |

Sequential scans get a bad reputation but are often the right choice for small tables and for queries that need most of the data anyway.

Common Production Failures

Stale statistics causing wrong plans: The planner thinks a table has 1,000 rows when it actually has 10 million. This happens after bulk loads that skip ANALYZE, or after large DELETEs. The result is seq scans on huge tables and wrong join orders. Fix: run ANALYZE (or VACUUM ANALYZE) after any bulk data change — autovacuum does not always catch up quickly enough.

Missing index causing full table scans: A query filtering on status with no index on status seq scans the table. On small tables this is fine. On 100 million rows it is not. Use EXPLAIN to find the seq scans, then add indexes.

Planner choosing nested loop on large tables: Nested loop is correct for small tables with good indexes, but catastrophic when the inner table is large. If you see nested loop on a large join and estimated rows are far off, the inner index is likely wrong or missing.

“Rows Removed by Index Recheck” accumulating: Bitmap heap scans must recheck rows when the bitmap becomes lossy — typically because work_mem is too small to track individual rows, so PostgreSQL falls back to tracking whole pages. High “Rows Removed by Index Recheck” numbers mean entire pages are being fetched and most of their rows then fail the recheck; raising work_mem keeps the bitmap exact. Separately, lowering random_page_cost on SSDs (the default of 4.0 assumes spinning disks) can nudge the planner toward plain index scans.

Join order catastrophe: The planner joins a small table to a large one first when it should do the opposite. This produces enormous intermediate result sets. Check that statistics are current on all tables in the join.

Interview Questions

Q: A query runs fast in development but slow in production. The table has 10x more rows. EXPLAIN shows a sequential scan. What do you do?

First, confirm the production plan matches what you expect. Run EXPLAIN ANALYZE with BUFFERS to see actual timing, not just estimates. A sequential scan on a 10x larger table can be correct if the query is returning 30% of the rows — index lookups would cause more random I/O than a sequential scan. But if the query is selective and should use an index, the statistics are likely stale. Run ANALYZE on the table. If that does not fix it, check whether an index was dropped or whether the query planner is choosing a different plan because of different session settings (like enable_seqscan). Compare the random_page_cost and seq_page_cost settings between environments — cloud storage often has lower random read penalties than the defaults assume.

Q: You see “Rows Removed by Index Recheck: 45000” in an EXPLAIN ANALYZE output. What does this mean and how do you fix it?

Bitmap index scans build an in-memory bitmap of matching row pointers. If the bitmap outgrows work_mem, it degrades to tracking whole pages instead of individual rows (a “lossy” bitmap), so during the heap fetch every row on those pages must be rechecked against the original condition, and the rows that fail are discarded. The “Rows Removed” number shows how much of that rechecking was wasted work. The usual fix is to raise work_mem so the bitmap stays exact, or to make the index condition more selective. If the condition cannot be made more selective, consider whether a different scan type would be more efficient.

Q: Explain the difference between cost=0.43..8.45 in EXPLAIN output.

The two numbers are startup cost and total cost. Startup cost is the work before the first row can be returned — for an index scan this includes walking the B-tree to the first matching leaf page. Total cost is the estimated work to return all rows. For a query returning one row, the difference between startup and total cost tells you how expensive it is to find the first row. For a sorted query, startup cost includes the sort. For a LIMIT query, the planner uses the startup cost to estimate whether returning the first N rows is cheap even if total cost is high.
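The LIMIT point can be made concrete with a toy calculation: a limited plan is charged its startup cost plus a pro-rated fraction of the remaining run cost. The numbers below are invented for illustration.

```python
# How LIMIT changes plan choice: charge startup cost plus only the
# fraction of run cost needed to produce the first N rows.
def limit_cost(startup, total, total_rows, limit_rows):
    run = total - startup
    return startup + run * min(limit_rows / total_rows, 1.0)

# Sort-based plan: high startup (the sort finishes before row one).
sort_plan = limit_cost(startup=4500.0, total=9000.0, total_rows=50_000, limit_rows=10)
# Ordered index scan: near-zero startup, rows stream out immediately.
index_plan = limit_cost(startup=0.43, total=8000.0, total_rows=50_000, limit_rows=10)
print(index_plan < sort_plan)  # True
```

Even though the index plan's total cost is almost as high as the sort plan's, LIMIT 10 makes it dramatically cheaper because almost none of its cost is paid up front.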

Q: A three-table join is producing a bad plan. The planner is joining a small lookup table first instead of starting with the filtered result set. Why?

PostgreSQL’s planner limits how large a join search space it will explore exhaustively. Explicit JOIN clauses are collapsed into a single search space only up to join_collapse_limit relations (default 8); beyond that, the textual order of your JOIN clauses starts to constrain the plan, and above geqo_threshold relations the planner switches to a genetic search that can miss the best order. If the planner cannot accurately estimate row counts for one of the joins, it may also pick a suboptimal starting point. Increase join_collapse_limit to let it explore more arrangements, or reorder the JOIN clauses explicitly to guide it. Running ANALYZE on all tables in the join usually fixes the root cause.

Common Problems and Fixes

| Problem | Fix |
|---|---|
| Seq Scan on large table | Add or rebuild index on filtered column |
| Bitmap Heap Scan on small result | Increase statistics target or use an index-only scan |
| Hash Join on unsorted data | Add an index on the join column so presorted input enables a merge join |
| Wrong join order | Run ANALYZE on all tables, increase join_collapse_limit |
| Stale statistics | Run ANALYZE or VACUUM ANALYZE |

Conclusion

Reading EXPLAIN output is a skill that improves with practice. Start with the cost estimates and row counts — they drive every decision the planner makes. Look for sequential scans that process too many rows, bitmap operations on small result sets, and join orders that force unnecessary work.

For further reading, explore our guides on Database Indexes and Query Rewriting techniques.
