Index Design Clinic: Composite Indexes, Covering Indexes, and Partial Indexes
Master composite index column ordering, covering indexes for query optimization, partial indexes for partitioned data, expression indexes, and selectivity.
Indexes are a double-edged sword. The right index makes queries fly. The wrong one wastes storage, slows down writes, and gives you nothing.
How Composite Index Ordering Affects Queries
flowchart LR
subgraph Index["Index: (customer_id, status, created_at)"]
C1["customer_id=1"]
C2["customer_id=2"]
C3["customer_id=3"]
end
C1 -->|= 1| S1["status=shipped"]
C1 -->|= 1| S2["status=pending"]
C2 -->|= 2| S3["status=shipped"]
C2 -->|= 2| S4["status=cancelled"]
C3 -->|= 3| S5["status=pending"]
S1 -->|Range >| R1["created_at > Jan"]
S3 -->|Range >| R2["created_at > Jan"]
This shows how a composite index on (customer_id, status, created_at) narrows down. Equality on customer_id first, then equality on status, then range on created_at. Skip the leftmost column and the index cannot help at all.
Composite Index Column Ordering
A composite index covers multiple columns. The order matters more than most people realize.
Consider a query:
SELECT * FROM orders
WHERE customer_id = 42
AND status = 'shipped'
ORDER BY created_at;
The intuitive approach is to lead with status because it looks like a strong filter. Usually it is not: a handful of status values shared across millions of rows filters far less than a customer_id does.
The Leftmost Prefix Rule
PostgreSQL can only use a composite B-tree index for an equality check or range scan when the query constrains the leftmost column. Given this index:
CREATE INDEX idx_orders_composite ON orders(customer_id, status, created_at);
The index supports these query patterns:
- WHERE customer_id = 42
- WHERE customer_id = 42 AND status = 'shipped'
- WHERE customer_id = 42 AND status = 'shipped' AND created_at > '2026-01-01'
But it does NOT support:
- WHERE status = 'shipped' (missing leftmost column)
- WHERE status = 'shipped' AND created_at > '2026-01-01' (missing leftmost column)
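You can confirm the rule directly with EXPLAIN, using the idx_orders_composite index defined above (plans vary with table statistics, so treat the comments as the expected shape, not guaranteed output):

```sql
-- Should use idx_orders_composite: the leftmost column is constrained
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND status = 'shipped';

-- The leftmost column is missing, so expect a sequential scan
-- (or a different index) instead of idx_orders_composite
EXPLAIN SELECT * FROM orders
WHERE status = 'shipped';
```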
Equality First, Range Last
When you have both equality and range conditions, put equality columns first:
-- Good: equality columns (customer_id, status) before range (created_at)
CREATE INDEX idx_orders_good ON orders(customer_id, status, created_at);
-- Bad: range column in the middle breaks index usage for later columns
CREATE INDEX idx_orders_bad ON orders(customer_id, created_at, status);
Which Column Goes First?
When multiple equality columns exist, put the more selective column first. You want the index to narrow down results as quickly as possible at the first level.
-- If customers have many orders but status filters significantly
-- More selective: customer_id (1 customer vs many statuses)
CREATE INDEX idx_orders ON orders(customer_id, status, created_at);
-- If statuses are roughly equally distributed
-- Consider which column has better statistics in your version of PostgreSQL
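A quick way to compare candidate leading columns is to count distinct values directly, and to check the planner's own estimates in pg_stats (kept up to date by ANALYZE). This is a rough sketch against the example orders table:

```sql
-- Per-column cardinality: a higher distinct count suggests
-- a better leading column for equality lookups
SELECT
    COUNT(*)                    AS total_rows,
    COUNT(DISTINCT customer_id) AS distinct_customers,
    COUNT(DISTINCT status)      AS distinct_statuses
FROM orders;

-- The planner's own estimate (negative n_distinct means
-- a fraction of the row count rather than an absolute number)
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'orders' AND attname IN ('customer_id', 'status');
```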
Covering Indexes for Query Optimization
A covering index includes all the columns a query needs, so PostgreSQL never has to touch the heap.
-- Without covering index, query requires heap fetch
CREATE INDEX idx_orders_customer ON orders(customer_id);
SELECT id, created_at FROM orders WHERE customer_id = 42;
-- Index scan + heap fetch for id, created_at
-- With covering index, query is index-only
CREATE INDEX idx_orders_customer_covering ON orders(customer_id) INCLUDE (id, created_at);
SELECT id, created_at FROM orders WHERE customer_id = 42;
-- Index Only Scan - no heap access needed
When Covering Indexes Help
Covering indexes shine when your queries return a small number of columns compared to the table width, and when those queries run frequently (the index cost spreads across many executions). They also help when heap access is expensive due to large rows or slow storage.
When Covering Indexes Hurt
They can backfire on wide tables where including columns duplicates significant data. Heavy UPDATE workloads suffer because every index must be updated. And if your queries are already selective enough to use the index efficiently, the covering index adds maintenance cost for little gain.
The INCLUDE clause is PostgreSQL’s way to add columns without making them part of the index key. This avoids the sorting and B-tree maintenance overhead of full index columns.
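To verify the heap is actually skipped, look for an Index Only Scan in the plan. Index-only scans also depend on the visibility map, so a freshly bulk-loaded table may need a VACUUM before the heap fetches disappear:

```sql
-- Refresh the visibility map so index-only scans can skip the heap
VACUUM orders;

EXPLAIN (ANALYZE) SELECT id, created_at FROM orders WHERE customer_id = 42;
-- Look for: Index Only Scan using idx_orders_customer_covering
-- and a "Heap Fetches: 0" line
```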
Partial Indexes for Partitioned Data
A partial index only covers rows that match a condition. This makes them smaller and faster to maintain.
Common Use Cases
Recent Data
-- Only index recent orders, assuming old ones are rarely queried
CREATE INDEX idx_orders_recent ON orders(created_at, customer_id)
WHERE created_at > '2025-01-01';
Active Status
-- Only index active users
CREATE INDEX idx_users_active ON users(email) WHERE status = 'active';
Specific Values
-- Index only failed transactions for debugging
CREATE INDEX idx_transactions_failed
ON transactions(transaction_id, created_at)
WHERE status = 'failed';
Partial Index Trade-offs
The downside: the optimizer might not choose a partial index when the WHERE clause doesn’t match your filter. PostgreSQL has to prove the partial index is relevant.
-- This query uses idx_orders_recent (good)
SELECT * FROM orders WHERE created_at > '2026-02-01';
-- This query can't use idx_orders_recent: it asks for rows older than
-- the index predicate, so the index is missing rows the query needs
SELECT * FROM orders WHERE created_at > '2024-06-01';
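The payoff is size. Comparing a full index against the partial idx_orders_recent defined earlier makes the saving concrete (the full index name here is illustrative):

```sql
-- Full index over the same columns, for comparison
CREATE INDEX idx_orders_full ON orders(created_at, customer_id);

SELECT pg_size_pretty(pg_relation_size('idx_orders_full'))   AS full_index_size,
       pg_size_pretty(pg_relation_size('idx_orders_recent')) AS partial_index_size;
```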
Expression Indexes
Sometimes you need to index a computed value. Expression indexes let you index functions and expressions.
Common Examples
Case-Insensitive Search
-- Without an expression index, a LOWER(email) comparison can't use a standard index on email
SELECT * FROM users WHERE LOWER(email) = LOWER('test@example.com');
-- With expression index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = LOWER('test@example.com');
-- Now uses the index
Date Truncation
-- If you frequently query by day
CREATE INDEX idx_orders_day ON orders(DATE_TRUNC('day', created_at));
SELECT * FROM orders
WHERE DATE_TRUNC('day', created_at) = '2026-03-26';
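One caveat: index expressions must be IMMUTABLE. DATE_TRUNC('day', …) on a plain timestamp qualifies, so the index above works as written. On timestamptz the result depends on the session time zone, so PostgreSQL rejects the CREATE INDEX. Pinning a constant zone restores immutability (UTC here is an assumption; use whatever zone your queries actually use):

```sql
-- Fails if created_at is timestamptz: the expression is not immutable
-- CREATE INDEX idx_orders_day ON orders(DATE_TRUNC('day', created_at));

-- Works: AT TIME ZONE with a constant zone is immutable
CREATE INDEX idx_orders_day_utc
    ON orders(DATE_TRUNC('day', created_at AT TIME ZONE 'UTC'));

-- Queries must use the exact same expression to hit the index
SELECT * FROM orders
WHERE DATE_TRUNC('day', created_at AT TIME ZONE 'UTC') = '2026-03-26';
```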
JSON Extraction
-- Index a specific JSON field
CREATE INDEX idx_orders_metadata ON orders((metadata->>'customer_type'));
SELECT * FROM orders WHERE metadata->>'customer_type' = 'premium';
Expression Index Considerations
- The expression is evaluated for every row on INSERT and UPDATE. Complex expressions add overhead.
- The optimizer has to match the exact expression. Small differences prevent usage.
- Expression indexes cannot be used for ORDER BY unless the expression is used in the ORDER BY clause.
Index Selectivity
Selectivity is the ratio of distinct index values to total rows. High selectivity (close to 1.0) means each value is nearly unique. Low selectivity means many rows share values.
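The ratio can be computed directly; a rough sketch against the example schema:

```sql
-- Selectivity = distinct values / total rows; close to 1.0 means near-unique
SELECT COUNT(DISTINCT status)::numeric      / COUNT(*) AS status_selectivity,
       COUNT(DISTINCT customer_id)::numeric / COUNT(*) AS customer_selectivity
FROM orders;
```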
Low Selectivity Example
-- If 95% of orders are 'pending', indexing status has poor selectivity
CREATE INDEX idx_orders_status ON orders(status);
-- Most queries still have to scan most of the index
High Selectivity Example
-- customer_id might be unique or near-unique
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Each lookup returns very few rows
When Low Selectivity Can Still Help
Low-selectivity indexes sometimes pay off in unexpected ways. If your pending orders are always recent and you mostly query recent pending orders, the index gives you locality even though many rows share the status value. Composite indexes can also rescue poor selectivity: (status, created_at) might be highly selective even if status alone is not. And for sorting, an index on status pre-sorts your results regardless of how many rows share each status value.
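The composite rescue described above, sketched against the example schema (the index name is illustrative):

```sql
-- status alone is low selectivity, but (status, created_at) narrows fast:
-- equality on status, then a tight range on created_at
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

SELECT * FROM orders
WHERE status = 'pending'
  AND created_at > NOW() - INTERVAL '1 day';
```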
Index Selectivity vs. Query Selectivity
The key distinction: index selectivity is about the index itself, but query selectivity is about your specific query. A low-selectivity index on status is useless for SELECT * FROM orders, but if your app always queries WHERE status = 'pending' plus a date range, the index becomes more useful because the date range reduces the result set.
Index Design: When to Use Each Type
| Index Type | Use when | Avoid when |
|---|---|---|
| B-tree composite (equality first) | Queries filter on leading column(s) then range, plus ORDER BY | Queries skip the leading column |
| Covering index with INCLUDE | High-frequency queries selecting few columns, table is wide | Writes dominate reads, or query selects many columns |
| Partial index | Column has skewed distribution, only a subset is queried | Most queries hit the excluded rows, or condition changes frequently |
| Expression index | Query uses function on column, cannot use regular index | Expression is expensive to compute, changes frequently |
| Partial covering | High-frequency query on subset of rows with small SELECT | Subset is not stable, or query selects many columns |
Common Indexing Mistakes
Mistake 1: Indexing Every Column
-- Don't do this
CREATE INDEX idx_orders_everything ON orders(id, customer_id, status, created_at, updated_at, total, ...);
This bloats your index size and slows down inserts. Only index columns you actually filter or sort by.
Mistake 2: Ignoring the Query Pattern
-- Query: WHERE status = 'shipped' AND customer_id = 42
CREATE INDEX idx_wrong ON orders(status, customer_id);
This order happens to work for the query above, since both predicates are equality checks. But it leads with the less selective column, and it fails any query that filters by customer_id alone. Index for your actual query mix, and put the more selective equality column first.
Mistake 3: Not Considering Write Overhead
Every index adds write overhead. A table with 5 indexes takes longer to UPDATE than a table with 2. Profile your workload before adding indexes.
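Before adding another index, it is worth checking which existing ones earn their keep. pg_stat_user_indexes tracks scans since the last statistics reset:

```sql
-- Indexes that have never been scanned are pure write overhead
SELECT relname       AS table_name,
       indexrelname  AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```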
Mistake 4: Assuming the Planner Knows Best
PostgreSQL’s planner doesn’t always pick the optimal index. Check with EXPLAIN and test queries directly.
Common Production Failures
Composite index with wrong column order: You create (status, customer_id) when many queries filter by customer_id alone. The index is useless for those queries because customer_id is not the leading column. Always index based on the columns your queries actually constrain, not a theoretical selectivity ranking.
Expression index desynchronized from queries: You create an expression index on LOWER(email), but some code paths query the raw column — email = 'User@Example.com' — or wrap it differently, such as LOWER(TRIM(email)). The planner only uses an expression index when the query contains the same expression, so those paths fall back to scans that never touch it. Audit all code paths that query the column.
Partial index not matching the query filter: You create a partial index for WHERE status = 'active' but queries use WHERE status = 'ACTIVE' (uppercase) or WHERE status != 'inactive'. The partial index is ignored. Be consistent with the values your application actually uses.
Covering index missing frequently selected columns: You include id and created_at in the covering index but the query also selects status. The heap is still touched. Verify the covering index covers the entire SELECT list before assuming it eliminates heap access.
Expression index bloating on heavy UPDATE: The expression LOWER(email) must be recomputed for every inserted row and for every update that writes new index entries. On a table with 100M rows and heavy write volume, the maintenance cost can outweigh the query benefit. Measure write overhead before deploying expression indexes at scale.
Capacity Estimation: Composite Index Sizing
Composite indexes grow faster than single-column indexes. Here is how to estimate the cost.
A composite index on (customer_id, status, created_at) where customer_id is INT (4 bytes), status is VARCHAR(20) in the index (up to 20 bytes), and created_at is TIMESTAMP (8 bytes) carries roughly 32 bytes of key data per entry, plus roughly 12 bytes of per-tuple header and item-pointer overhead. For 100 million rows, that is on the order of 4.4 GB for the leaf level alone; internal B-tree pages add a few percent more. A single-column index on just customer_id (a 4-byte key plus the same per-tuple overhead) would be roughly 1.6 GB for the same rows.
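Estimates like these are worth checking against reality once the index exists:

```sql
-- Actual on-disk sizes: one index, all indexes on the table, and the heap
SELECT pg_size_pretty(pg_relation_size('idx_orders_composite')) AS composite_index,
       pg_size_pretty(pg_indexes_size('orders'))                AS all_indexes,
       pg_size_pretty(pg_table_size('orders'))                  AS table_heap;
```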
Every INSERT to the orders table must now write one entry containing all three columns to the composite index. If you also have individual indexes on each column, a single INSERT could trigger four separate index updates. On a write-heavy workload inserting 50,000 rows per minute, the index maintenance becomes measurable CPU overhead. Use pg_stat_user_indexes to verify the index is actually used before accepting that write cost.
PostgreSQL estimates plan costs using statistics on the leading column. If the leading column has high cardinality (many distinct values), the planner estimates narrow result sets and prefers index scans. If statistics are stale on the leading column, the planner may assume a composite index is more selective than it actually is and choose it when a simpler index would be faster.
If a composite index is not reducing the result set significantly over the leading column alone, the composite is not earning its storage and write cost. Use EXPLAIN to compare plans with and without the composite index.
Interview Questions
Q: You are designing an index for a query that filters by status, customer_id, and created_at. The application frequently changes — sometimes it queries by status alone, sometimes by customer_id alone, sometimes by all three. How do you approach index design?
The hard constraint is that PostgreSQL needs the leading column to be present for the index to help. If the application queries all three columns together, a composite index on (status, customer_id, created_at) works for those cases. But if queries also filter by customer_id alone, you need a separate index on (customer_id) because a composite index with status as the leading column cannot help when status is omitted. The practical answer is to profile the actual query mix, identify the most frequent combination, and start with a composite index on that combination. Add secondary single-column indexes only when profiling shows the planner is making bad choices without them.
Q: A partial covering index sounds ideal for a query on recent pending orders. When would it backfire?
Partial covering indexes backfire when the predicate is not stable. If status = 'pending' represents 5% of rows today but grows to 40% after a product launch, the partial index no longer reflects actual query behavior and queries slow down unexpectedly. The partial index also requires the planner to prove the query matches the index predicate. If your ORM generates queries with different literal casing or slightly different date ranges, the partial index silently gets ignored. And on tables with high INSERT rates, the partial index offers no write savings — new rows are simply excluded from the index, which is correct behavior but does not reduce write amplification unless the WHERE clause matches on a stable, infrequent value.
Q: How do expression indexes behave differently from regular indexes in terms of maintenance and planner usage?
Expression indexes must be updated on every INSERT and on every UPDATE that writes new index entries, even when the expression value does not change (HOT updates can skip this when no indexed column changed and the page has free space). If you index LOWER(email) and a user updates only their address, a non-HOT update still recomputes the expression even though email never changed. On large tables with frequent writes, this adds up. For the planner, the expression must match exactly — WHERE LOWER(email) = 'test' uses the index but WHERE email ILIKE 'test' does not. This makes expression indexes fragile when application code changes how it formats queries. Regular column indexes are immune to this mismatch because the planner can use them for any operator that works on the column type.
Q: You are reviewing a schema with 12 indexes on a table that handles 30,000 INSERTs per minute. The application team says reads are slow. What do you check?
Start with pg_stat_user_indexes to see which indexes actually get scanned. Indexes whose idx_scan count stays at zero are pure write overhead. 12 indexes on 30,000 INSERTs per minute means roughly 360,000 index entries written per minute — at that write volume, index overhead is measurable. Check whether some indexes are redundant: a composite index on (a, b) makes a separate index on (a) redundant for most queries. Check whether covering indexes are in use — if queries are doing index scans plus heap fetches, a covering index might eliminate the heap access without needing a separate query optimization.
Conclusion
Good index design comes down to understanding your query patterns. Composite index order follows the leftmost prefix rule: equality columns first, range columns last. Covering indexes eliminate heap fetches for frequently-run queries. Partial indexes keep index size manageable when you only need a subset of data. Expression indexes handle computed values but come with write overhead.
For further reading on how the planner uses these indexes, see our guide on Query Execution Plans and Database Indexes.