Database Indexes: B-Tree, Hash, Covering, and Beyond
A practical guide to database indexes. Learn when to use B-tree, hash, composite, and partial indexes, understand index maintenance overhead, and avoid common performance traps.
Database Indexes: B-Tree, Hash, Covering, and Beyond
Without indexes, every query scans every row. With the right indexes, the database finds rows in milliseconds. The wrong indexes waste space and slow writes. This covers the index types, when to use them, and when not to.
How B-Tree Indexes Work
flowchart TD
Root["Root: [15]"]
L1A["Leaf A: [5, 10]"] --> Root
L1B["Leaf B: [15, 20]"] --> Root
L1C["Leaf C: [25, 30, 35]"] --> Root
L1A --> L1AL["5 → row_ptr"]
L1A --> L1AR["10 → row_ptr"]
L1B --> L1BL["15 → row_ptr"]
L1B --> L1BR["20 → row_ptr"]
L1C --> L1CL["25 → row_ptr"]
L1C --> L1CM["30 → row_ptr"]
L1C --> L1CR["35 → row_ptr"]
B-trees cluster data at leaf nodes, with internal nodes routing lookups. A lookup for WHERE id = 20 traverses root → Leaf B → returns the row pointer. Range queries like WHERE id > 15 AND id <= 25 follow the same path to 15, then scan forward through Leaf B and Leaf C.
Why Indexes Exist
A table scan reads every row to find matches. For 100 rows this is fine. For 100 million rows it is prohibitively slow. An index is a separate data structure the database maintains, sorted by indexed columns, allowing fast lookups.
The tradeoff is space and maintenance. Indexes consume disk space. When rows are inserted, updated, or deleted, the database must update every index on that table.
B-Tree Indexes
B-tree is the default and most common index type. Most databases use B-tree for primary indexes, foreign key indexes, and general-purpose queries.
B-trees maintain sorted order, making them effective for range queries. WHERE created_at > '2024-01-01' uses a B-tree index efficiently. The database traverses the tree to the start of the range and reads sequentially.
Equality queries also work well. WHERE email = 'user@example.com' finds the row directly through the B-tree structure.
CREATE INDEX idx_users_email ON users(email);
B-trees handle most query patterns. If you are not sure what index type to use, start with B-tree.
Hash Indexes
Hash indexes compute a hash of the indexed value and store mappings to row locations. They are faster than B-trees for equality queries because they do not traverse a tree, they compute an address and read directly.
CREATE INDEX idx_sessions_token ON sessions USING HASH (token);
Hash indexes have limitations. They do not maintain sorted order, so range queries (WHERE created_at > '2024-01-01') cannot use hash indexes efficiently. Hash indexes also cannot help with prefix matching (WHERE name LIKE 'John%').
Session stores and caches use hash indexes. Any use case where you look up by exact value and never need range operations fits hash indexes well.
Composite Indexes
A composite index spans multiple columns. The index is sorted by the first column, then the second within each first-column value.
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date DESC);
This index supports queries filtering by customer_id alone, or by customer_id and order_date together. It does not help queries that only filter by order_date.
Column order matters. Put the most selective column first, or the column used in equality conditions first. A query filtering on customer_id and order_date can use the full index. A query filtering only on order_date cannot use the index because order_date is the second column.
Covering Indexes
A covering index includes all columns needed by a query, not just the indexed ones. The database can satisfy the query entirely from the index without reading the table.
CREATE INDEX idx_orders_covering ON orders(customer_id)
INCLUDE (order_id, order_date, total_amount);
This index covers queries like:
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = 123;
The database reads only the index, not the table. For high-frequency queries that select few columns, covering indexes provide significant speedups.
Covering indexes add overhead. They consume more space and slow down writes because more data must be written to the index. Use them for frequently executed queries, not every query.
Partial Indexes
A partial index covers only rows matching a condition. This reduces size and maintenance overhead while speeding up specific queries.
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
This index only indexes pending orders. Queries filtering by status=‘pending’ and created_at benefit. Other queries do not.
Partial indexes suit tables with skewed distributions. If most rows have a common status and you frequently query the rare statuses, partial indexes focus on what matters.
When to Index
Index columns that appear in WHERE clauses. If you frequently filter by status, index status. If you join on customer_id, index customer_id.
Index columns used in ORDER BY. Databases can avoid sorting steps if an index provides the required order.
Index foreign keys. Joins perform much better when the database can look up rows by indexed values.
Index columns with high selectivity. An index on a boolean column that is 99% true provides little benefit because most rows match. An index on an account number that is unique is maximally selective.
Capacity Estimation: B-Tree Depth and Size
B-tree depth determines how many disk reads a lookup requires. Most B-trees stay shallow even with large tables.
Depth calculation: a B-tree with a branching factor of roughly 100–300 entries per internal node reaches depth 3 with 1 million to 270 million leaf entries. In practice, a B-tree index on an ID column with 100 million rows typically has depth 3 or 4. Each depth level corresponds to one disk I/O operation, so an index lookup does 3–4 disk reads, compared to a table scan which reads every data page.
Index size estimation: the index entry storage is roughly key_size + 8 bytes per row for the ctid/toast pointer plus B-tree overhead. A 4-byte INTEGER key with a 6-byte PostgreSQL ctid totals about 18 bytes per entry. For 100 million rows, that is roughly 1.8 GB for the leaf level, plus internal node overhead of maybe 5–10% more. A composite index on two 4-byte integers is 26 bytes per entry, or 2.6 GB for 100 million rows.
For covering indexes with INCLUDE columns, add the included column sizes. A covering index on customer_id (4 bytes) with INCLUDE of order_id (4 bytes), order_date (8 bytes), and total_amount (8 bytes) is effectively 32 bytes per row, not 12.
If your table is 10 GB and the indexed columns represent 20% of row size, the index will be roughly 2 GB. If you are unsure, most databases provide commands to estimate index size before creating it.
Index Type Trade-offs
| Index Type | Best for | Avoid when | Notes |
|---|---|---|---|
| B-tree (default) | Range queries, equality, sorted output | Suffix/prefix matching requires hash or GIN | Default choice — use unless you have a reason not to |
| Hash | Fast equality lookups, session stores | Range queries, sorted output, prefix matching | Memcached, Redis, DynamoDB use hash internally |
| Composite (B-tree) | Queries filtering on leading column, or both columns | Queries on second column only | Column order matters — put equality columns first |
| Covering (B-tree with INCLUDE) | High-frequency queries with small SELECT lists | Writes, queries selecting many columns | Adds storage and slows writes |
| Partial | Skewed distributions, rare status values | Common values (most rows match) | Dramatically reduces index size |
| GiST (PostgreSQL) | Geometric, full-text, range types | Simple equality | Not a general-purpose index |
| GIN (PostgreSQL) | JSONB, arrays, full-text search | Simple scalar columns | Slow to build, fast to query |
Common Production Failures
Wrong index selected by the query planner: Sometimes the planner picks a suboptimal index — a partial index with a different WHERE clause, or a composite index on the wrong column order. Query plans that looked fine on small tables behave differently at scale. Monitor slow queries and use EXPLAIN ANALYZE to verify the index actually gets used.
Index bloat from frequent updates: Heap-only tuples (HOT updates) that cannot reuse dead tuples cause index bloat over time. In PostgreSQL, monitor pg_stat_user_indexes for increasing idx_scan counts without corresponding query speedup. Heavy UPDATE workloads on columns with moderate cardinality can bloat indexes faster than expected.
Covering index column order mistakes: Adding columns to the INCLUDE clause of a covering index without considering query patterns wastes space and slows writes. INCLUDE columns should be the most frequently selected non-indexed columns, not an afterthought.
Forgetting that partial indexes are invisible to statistics: PostgreSQL’s statistics collector does not track partial indexes well. The planner may underestimate selectivity for partial indexes on rare values, leading to table scans when the index would be faster. Verify planner choices with EXPLAIN ANALYZE, not assumptions.
When NOT to Index
Do not index columns you never filter on. Indexes consume space and slow writes without helping reads.
Do not index low-selectivity columns unless you specifically need them. An index on a gender column or a status column with three values rarely helps.
Do not over-index write-heavy tables. Each index must be updated on every insert, update, and delete. Tables with heavy write loads need fewer indexes.
Index Maintenance Overhead
Indexes are not free. Every write operation must update every index on the table. A table with ten indexes costs ten times more to write to than a table with one index.
This matters for write-heavy workloads. Bulk inserts are much slower with many indexes. Some ETL processes drop indexes before loading and recreate them after.
Monitor index usage. Databases track whether indexes are used. Unused indexes waste space and add overhead. Drop them.
Real-World Case Study: PostgreSQL Index-Only Scans
PostgreSQL’s index-only scan feature lets a query return results entirely from the index, without touching the heap. For queries that select only indexed columns, this can cut response time dramatically.
Stack Overflow used this extensively. Their database serves millions of reads per day. A query like SELECT id, title, created_at FROM posts WHERE user_id = 123 can be served entirely from an index on (user_id) INCLUDE (id, title, created_at) — the database reads the index, finds matching rows, and returns without ever touching the main posts table. For their read-heavy workload, this cut latency in half on their most common query patterns.
The catch: visibility map. PostgreSQL only uses index-only scans on pages where it knows all rows are visible to the current transaction. Pages with recently modified rows have bits cleared in the visibility map, forcing heap access anyway. Running VACUUM frequently keeps the visibility map current. On tables with high write volumes, index-only scans might not trigger at all without aggressive autovacuum tuning.
The lesson: covering indexes with INCLUDE columns are not free — they consume more storage and slow writes. But for read-heavy tables where the same few columns are selected repeatedly, the read performance gain is worth it. Profile the actual query patterns before adding covering indexes speculatively.
Identifying Missing Indexes
Query execution plans reveal missing indexes. Most databases provide EXPLAIN or SHOW PLAN commands that display how queries execute. Look for table scans on large tables.
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
If the plan shows a table scan on orders and orders is large, you need an index on customer_id.
Slow query logs flag frequently executed slow queries. Add indexes based on what your application actually queries, not what you think it queries.
Interview Questions
The most common causes: statistics went stale after a large data load, causing the planner to underestimate row counts. An index was dropped or disabled. The query changed — different operator or column — and the existing index no longer matches. A different execution path is being chosen because session parameters changed (like enable_seqscan). Run EXPLAIN and compare to a known-good plan from when it was fast.
Composite indexes are better when queries consistently filter on both columns together, or when the index can serve as a covering index for common queries. Multiple single-column indexes are better when queries filter on different combinations of columns independently. The PostgreSQL query planner can combine single-column indexes with index scans, but a composite index may be faster for some query shapes. Profile the actual queries.
Partial indexes make sense when a significant portion of the table is never queried. If 95% of your orders have status='completed' and you only query status='pending' orders, a partial index on pending orders is dramatically smaller and faster than a full index on status. They also help when you need an index on an expression that only applies to a subset of rows. The tradeoff is that queries outside the partial index condition cannot use it.
For small result sets, PostgreSQL sometimes prefers sequential scan because it is faster for retrieving many rows from the same pages — index lookups require random I/O. If the query returns more than roughly 5–10% of the table, seqscan wins on modern storage. The planner also chooses seqscan when statistics are stale, or when random_page_cost is set too high relative to seq_page_cost (e.g., on SSDs where random and sequential reads have similar cost). Lower random_page_cost or increase enable_seqscan to force index use for testing.
Start with the queries the application actually runs, not theoretical access patterns. Identify the most frequent and most latency-sensitive queries. For each, find the filter columns and select columns. Add indexes to support the filters, and covering indexes if the same columns are selected repeatedly. Add foreign key indexes for joins. Then measure under realistic write load to see whether the indexes slow inserts enough to matter. On write-heavy tables, keep indexes lean — every unnecessary index delays writes.
B-tree indexes maintain sorted order, enabling efficient range queries and sorted output. Hash indexes compute a hash of the value and store direct mappings to row locations, making equality lookups faster but range queries impossible. B-trees use roughly 2-4 bytes per index entry plus overhead; hash indexes use the full hash value plus overhead. B-trees handle most use cases. Hash indexes suit session stores and caches where you only look up by exact key and never need range operations.
A covering index includes all columns needed by a query, not just the indexed ones. The database can satisfy the query entirely from the index without reading the table (index-only scan). Use covering indexes when you frequently select a small set of columns and the WHERE clause filters on a different set. The tradeoff is more storage and slower writes because more data must be written to the index. Use INCLUDE columns (non-key columns stored in the index leaf) rather than adding them as key columns when the included columns are not used in filtering.
Index bloat occurs when deleted or updated rows leave gaps in the index that are not reclaimed efficiently. In PostgreSQL, frequent HOT updates (heap-only tuple updates that reuse the same page) can cause index bloat over time because the index entries must still point to the updated rows. Bloat increases index size, reduces cache efficiency, and slows scans because more pages must be read. Monitor with pg_stat_user_indexes and reclaim with REINDEX or VACUUM. Heavy UPDATE workloads on columns with moderate cardinality bloat indexes faster than expected.
Index entry size is roughly key_size + 8 bytes for the ctid/toast pointer plus B-tree overhead. A 4-byte INTEGER key with a 6-byte PostgreSQL ctid totals about 18 bytes per entry. For 100 million rows, that is roughly 1.8 GB for the leaf level, plus internal node overhead of 5-10% more. A composite index on two 4-byte integers is 26 bytes per entry, or 2.6 GB for 100 million rows. For covering indexes, add the included column sizes. If your table is 10 GB and the indexed columns represent 20% of row size, the index will be roughly 2 GB.
Every index must be updated on every INSERT. A table with 10 indexes costs 10 times more to insert into than a table with one index. Bulk inserts are much slower with many indexes — some ETL processes drop indexes before loading and recreate them after. The write overhead is most noticeable on high-frequency write paths. Estimate: if you insert 10,000 rows per second and have 5 indexes, you are doing 50,000 index inserts per second. Profile actual write throughput to determine whether indexes are a bottleneck.
Both enforce uniqueness, but primary key indexes cannot contain NULLs while unique indexes allow NULLs (multiple NULLs depending on database). Primary key indexes create a clustered index by default in most databases except PostgreSQL. Unique indexes create non-clustered indexes by default. A table can have many unique indexes but only one primary key. Use primary keys for row identifiers. Use unique indexes for columns that should be unique but are not the row identifier (like email addresses).
In a composite index (a, b), the index is sorted by column a first, then by column b within each value of a. The index supports queries filtering on column a alone, or on columns a and b together. It does not help queries that filter only on column b. Put the most selective column first, or the column used in equality conditions first. If you query by customer_id and date, and customer_id has higher cardinality, put customer_id first. If you only query by date, the composite index does not help.
An index-only scan returns results entirely from the index without touching the heap. PostgreSQL uses it when the query selects only indexed columns and the planner estimates the visibility map shows all relevant pages are visible to the current transaction. The visibility map must be current — on tables with high write volumes, index-only scans may not trigger without frequent VACUUM. Covering indexes with INCLUDE columns are the most reliable way to enable index-only scans for specific queries.
PostgreSQL's query planner uses statistics to estimate row counts and selectivity for different access paths. When statistics are stale (after bulk loads, large deletes, or significant data changes), the planner makes poor index selection decisions. Always run ANALYZE after bulk operations. The statistics collector tracks the distribution of column values. If a partial index captures a rare value but statistics do not reflect this, the planner may underestimate the index's selectivity and choose a sequential scan instead.
The fillfactor determines how full each index page is when created. Default is 90% for B-tree indexes, leaving 10% free for future inserts. A lower fillfactor reduces page splits during inserts, which is beneficial for indexes with high insert rates. However, lower fillfactor means more pages, larger indexes, and fewer rows per page, which can slow read-only scans. Set fillfactor based on the workload: high write volume benefits from lower fillfactor, read-heavy workloads benefit from higher fillfactor.
Low selectivity means most rows have the same value — like a boolean column that is 99% true. An index on such a column returns nearly the full table regardless of the filter value. The index lookup saves little or no work, but the database still must traverse the index and fetch rows. In the worst case, the planner chooses the index, reads all matching rows from the index, then does a full table scan anyway because nearly every row matches. Low-selectivity indexes waste storage and slow inserts.
A partial index is created with a WHERE clause, indexing only rows matching the condition. An expression index indexes the result of a function or expression on columns. Partial indexes filter rows by a condition on the base columns — WHERE status = 'pending'. Expression indexes transform the indexed value — ON lower(email). Both reduce index size by excluding rows that do not match the expression. Use partial indexes when you filter on a specific condition. Use expression indexes when you query on a function of a column.
Include columns that are frequently selected but not used in the WHERE clause. The goal is to eliminate heap access entirely for specific queries. Start with the columns in your WHERE clause as key columns, then add columns from the SELECT list that are not already in the key columns as INCLUDE columns. Do not add every frequently selected column — each included column increases index size and slows writes. Only include columns where eliminating heap access provides measurable query improvement.
When a B-tree index page fills up and a new row needs to be inserted in the middle, the page splits into two pages, with roughly half the entries moving to the new page. This creates pages that are not fully utilized. Over time with random inserts (especially UUIDs), this pattern repeats, causing the index to grow larger than necessary and reducing cache efficiency. Sequential inserts (auto-increment keys) minimize splits because new values always go at the end of the index. UUIDs inserted with uuid_generate_v4() are random and cause significant page splits.
Drop an index when it has never been used (check pg_stat_user_indexes for nonzero idx_scan), when it is used so rarely that its storage cost and write overhead exceed its occasional read benefit, or when a query change means the index is no longer needed for any query pattern. Before dropping any index, verify it is not used by any query including infrequent batch jobs. Monitor for at least one full business cycle before dropping, and always verify using pg_stat_statements rather than just pg_stat_user_indexes, since some usage patterns may not trigger idx_scan.
Real-world Failure Scenarios
Scenario 1: The Covering Index That Doubled Storage Costs
What happened: A developer added a covering index to speed up a high-frequency query that selected 8 columns. The index grew to 40GB on a 60GB table, doubling the effective storage footprint and slowing down writes significantly. The query was 20% faster but storage costs were unacceptable.
Root cause: The covering index included columns that changed frequently (status, updated_at), causing index maintenance overhead on every write. The INCLUDE clause added 6 columns to a table with already 12 indexes, multiplying write amplification.
Impact: Storage costs increased by 40GB. Write throughput decreased by 30% on the indexed table. The query improvement did not justify the storage and write overhead.
Lesson learned: Covering indexes are not free. Each included column adds storage and slows writes. Only use covering indexes when the read performance gain is measured and justified, and when write throughput is not a bottleneck.
Scenario 2: The Partial Index That the Planner Ignored
What happened: A partial index on rare status values was created to speed up queries for those specific rows. Query plans showed the planner was ignoring the partial index and doing sequential scans instead. The rare status queries that should have been fast were still slow.
Root cause: PostgreSQL statistics do not track partial indexes well. The planner underestimated how selective the partial index was and chose a sequential scan because it appeared cheaper. Stale statistics after a large data load made the problem worse.
Impact: Slow queries for rare status values persisted despite the partial index existing. The team spent 2 days debugging before identifying the statistics issue.
Lesson learned: Partial indexes are invisible to PostgreSQL statistics. Always verify that partial indexes are actually being used with EXPLAIN ANALYZE, and run ANALYZE after creating them to ensure the planner has current information.
Scenario 3: The Dropped Index That Caused a Cascade of Slow Queries
What happened: A DBA dropped an unused index to reduce write overhead. Within hours, multiple queries that the application depended on became dramatically slower. The queries had never been flagged as slow because they ran infrequently, but when triggered they were now doing full table scans.
Root cause: The queries were infrequent but expensive. They relied on an index that appeared unused based on low idx_scan counts but was actually essential for those specific query patterns. The idx_scan metric had not captured the queries because they ran during off-peak hours.
Impact: Several business-critical reports timed out during the next morning’s batch run. The index was re-added immediately, but the incident took 3 hours to fully diagnose and resolve.
Lesson learned: Before dropping any index, verify it is not used by any query pattern, including infrequent or batch queries. Check pg_stat_statements for actual query patterns, not just pg_stat_user_indexes. Monitor for at least one full business cycle (including batch jobs) before dropping any index.
Further Reading
- PostgreSQL Documentation: Indexes — Official documentation on PostgreSQL index types and usage
- Use The Index, Luke — Definitive book on SQL indexing by Markus Winand
- PostgreSQL pgstattuple Extension — Tool for measuring index effectiveness and bloat
- PostgreSQL Monitoring — How to track index usage and query performance
Conclusion
Indexes are the primary tool for optimizing query performance in relational databases. B-tree indexes excel at range queries and sorted access. Hash indexes work best for simple equality lookups. Composite indexes enable efficient queries on multiple columns, but column order matters. Covering indexes can eliminate table lookups entirely for specific queries. Partial indexes reduce storage overhead for selective conditions. The right indexing strategy depends on understanding your query patterns, data distribution, and write-to-read ratio. Always measure the impact with real workloads before committing to an indexing strategy.
Category
Related Posts
Index Design Clinic: Composite Indexes, Covering Indexes, and Partial Indexes
Master composite index column ordering, covering indexes for query optimization, partial indexes for partitioned data, expression indexes, and selectivity.
Query Execution Plans: Reading EXPLAIN Output Like a Pro
Learn to decode PostgreSQL EXPLAIN output, understand sequential vs index scans, optimize join orders, and compare bitmap heap scans with index-only scans.
Denormalization
When to intentionally duplicate data for read performance. Tradeoffs with normalization, update anomalies, and application-level denormalization strategies.