Index Design Clinic: Composite Indexes, Covering Indexes, and Partial Indexes
Master composite index column ordering, covering indexes for query optimization, partial indexes for partitioned data, expression indexes, and selectivity.
Indexes are a double-edged sword. The right index makes queries fly. The wrong one wastes storage, slows down writes, and gives you nothing.
How Composite Index Ordering Affects Queries
flowchart LR
subgraph Index["Index: (customer_id, status, created_at)"]
C1["customer_id=1"]
C2["customer_id=2"]
C3["customer_id=3"]
end
C1 -->|= 1| S1["status=shipped"]
C1 -->|= 1| S2["status=pending"]
C2 -->|= 2| S3["status=shipped"]
C2 -->|= 2| S4["status=cancelled"]
C3 -->|= 3| S5["status=pending"]
S1 -->|Range >| R1["created_at > Jan"]
S3 -->|Range >| R2["created_at > Jan"]
This shows how a composite index on (customer_id, status, created_at) narrows down. Equality on customer_id first, then equality on status, then range on created_at. Skip the leftmost column and the index cannot help at all.
Composite Index Column Ordering
A composite index covers multiple columns. The order matters more than most people realize.
Consider a query:
SELECT * FROM orders
WHERE customer_id = 42
AND status = 'shipped'
ORDER BY created_at;
The intuitive approach is to lead with status because it looks like a strong filter. Usually it is not: a handful of status values shared across millions of rows filters far less than a customer_id does.
The Leftmost Prefix Rule
PostgreSQL can only use a composite B-tree index for an equality check or range scan when the query constrains the leftmost column. Given this index:
CREATE INDEX idx_orders_composite ON orders(customer_id, status, created_at);
The index supports these query patterns:
- WHERE customer_id = 42
- WHERE customer_id = 42 AND status = 'shipped'
- WHERE customer_id = 42 AND status = 'shipped' AND created_at > '2026-01-01'
But it does NOT support:
- WHERE status = 'shipped' (missing leftmost column)
- WHERE status = 'shipped' AND created_at > '2026-01-01' (missing leftmost column)
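You can confirm the rule directly with EXPLAIN, using the idx_orders_composite index defined above (plans vary with table statistics, so treat the comments as the expected shape, not guaranteed output):

```sql
-- Should use idx_orders_composite: the leftmost column is constrained
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND status = 'shipped';

-- The leftmost column is missing, so expect a sequential scan
-- (or a different index) instead of idx_orders_composite
EXPLAIN SELECT * FROM orders
WHERE status = 'shipped';
```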
Equality First, Range Last
When you have both equality and range conditions, put equality columns first:
-- Good: equality columns (customer_id, status) before range (created_at)
CREATE INDEX idx_orders_good ON orders(customer_id, status, created_at);
-- Bad: range column in the middle breaks index usage for later columns
CREATE INDEX idx_orders_bad ON orders(customer_id, created_at, status);
Which Column Goes First?
When multiple equality columns exist, put the more selective column first. You want the index to narrow down results as quickly as possible at the first level.
-- If customers have many orders but status filters significantly
-- More selective: customer_id (1 customer vs many statuses)
CREATE INDEX idx_orders ON orders(customer_id, status, created_at);
-- If statuses are roughly equally distributed
-- Consider which column has better statistics in your version of PostgreSQL
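A quick way to compare candidate leading columns is to count distinct values directly, and to check the planner's own estimates in pg_stats (kept up to date by ANALYZE). This is a rough sketch against the example orders table:

```sql
-- Per-column cardinality: a higher distinct count suggests
-- a better leading column for equality lookups
SELECT
    COUNT(*)                    AS total_rows,
    COUNT(DISTINCT customer_id) AS distinct_customers,
    COUNT(DISTINCT status)      AS distinct_statuses
FROM orders;

-- The planner's own estimate (negative n_distinct means
-- a fraction of the row count rather than an absolute number)
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'orders' AND attname IN ('customer_id', 'status');
```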
Covering Indexes for Query Optimization
A covering index includes all the columns a query needs, so PostgreSQL never has to touch the heap.
-- Without covering index, query requires heap fetch
CREATE INDEX idx_orders_customer ON orders(customer_id);
SELECT id, created_at FROM orders WHERE customer_id = 42;
-- Index scan + heap fetch for id, created_at
-- With covering index, query is index-only
CREATE INDEX idx_orders_customer_covering ON orders(customer_id) INCLUDE (id, created_at);
SELECT id, created_at FROM orders WHERE customer_id = 42;
-- Index Only Scan - no heap access needed
When Covering Indexes Help
Covering indexes shine when your queries return a small number of columns compared to the table width, and when those queries run frequently (the index cost spreads across many executions). They also help when heap access is expensive due to large rows or slow storage.
When Covering Indexes Hurt
They can backfire on wide tables where including columns duplicates significant data. Heavy UPDATE workloads suffer because every index must be updated. And if your queries are already selective enough to use the index efficiently, the covering index adds maintenance cost for little gain.
The INCLUDE clause is PostgreSQL’s way to add columns without making them part of the index key. This avoids the sorting and B-tree maintenance overhead of full index columns.
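To verify the heap is actually skipped, look for an Index Only Scan in the plan. Index-only scans also depend on the visibility map, so a freshly bulk-loaded table may need a VACUUM before the heap fetches disappear:

```sql
-- Refresh the visibility map so index-only scans can skip the heap
VACUUM orders;

EXPLAIN (ANALYZE) SELECT id, created_at FROM orders WHERE customer_id = 42;
-- Look for: Index Only Scan using idx_orders_customer_covering
-- and a "Heap Fetches: 0" line
```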
Partial Indexes for Partitioned Data
A partial index only covers rows that match a condition. This makes them smaller and faster to maintain.
Common Use Cases
Recent Data
-- Only index recent orders, assuming old ones are rarely queried
CREATE INDEX idx_orders_recent ON orders(created_at, customer_id)
WHERE created_at > '2025-01-01';
Active Status
-- Only index active users
CREATE INDEX idx_users_active ON users(email) WHERE status = 'active';
Specific Values
-- Index only failed transactions for debugging
CREATE INDEX idx_transactions_failed
ON transactions(transaction_id, created_at)
WHERE status = 'failed';
Partial Index Trade-offs
The downside: the optimizer might not choose a partial index when the WHERE clause doesn’t match your filter. PostgreSQL has to prove the partial index is relevant.
-- This query uses idx_orders_recent (good)
SELECT * FROM orders WHERE created_at > '2026-02-01';
-- This query can't use idx_orders_recent: it asks for rows older than
-- the index predicate, so the index is missing rows the query needs
SELECT * FROM orders WHERE created_at > '2024-06-01';
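The payoff is size. Comparing a full index against the partial idx_orders_recent defined earlier makes the saving concrete (the full index name here is illustrative):

```sql
-- Full index over the same columns, for comparison
CREATE INDEX idx_orders_full ON orders(created_at, customer_id);

SELECT pg_size_pretty(pg_relation_size('idx_orders_full'))   AS full_index_size,
       pg_size_pretty(pg_relation_size('idx_orders_recent')) AS partial_index_size;
```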
Expression Indexes
Sometimes you need to index a computed value. Expression indexes let you index functions and expressions.
Common Examples
Case-Insensitive Search
-- Without an expression index, a LOWER(email) comparison can't use a standard index on email
SELECT * FROM users WHERE LOWER(email) = LOWER('test@example.com');
-- With expression index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = LOWER('test@example.com');
-- Now uses the index
Date Truncation
-- If you frequently query by day
CREATE INDEX idx_orders_day ON orders(DATE_TRUNC('day', created_at));
SELECT * FROM orders
WHERE DATE_TRUNC('day', created_at) = '2026-03-26';
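One caveat: index expressions must be IMMUTABLE. DATE_TRUNC('day', …) on a plain timestamp qualifies, so the index above works as written. On timestamptz the result depends on the session time zone, so PostgreSQL rejects the CREATE INDEX. Pinning a constant zone restores immutability (UTC here is an assumption; use whatever zone your queries actually use):

```sql
-- Fails if created_at is timestamptz: the expression is not immutable
-- CREATE INDEX idx_orders_day ON orders(DATE_TRUNC('day', created_at));

-- Works: AT TIME ZONE with a constant zone is immutable
CREATE INDEX idx_orders_day_utc
    ON orders(DATE_TRUNC('day', created_at AT TIME ZONE 'UTC'));

-- Queries must use the exact same expression to hit the index
SELECT * FROM orders
WHERE DATE_TRUNC('day', created_at AT TIME ZONE 'UTC') = '2026-03-26';
```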
JSON Extraction
-- Index a specific JSON field
CREATE INDEX idx_orders_metadata ON orders((metadata->>'customer_type'));
SELECT * FROM orders WHERE metadata->>'customer_type' = 'premium';
Expression Index Considerations
- The expression is evaluated for every row on INSERT and UPDATE. Complex expressions add overhead.
- The optimizer has to match the exact expression. Small differences prevent usage.
- Expression indexes cannot be used for ORDER BY unless the expression is used in the ORDER BY clause.
Index Selectivity
Selectivity is the ratio of distinct index values to total rows. High selectivity (close to 1.0) means each value is nearly unique. Low selectivity means many rows share values.
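The ratio can be computed directly; a rough sketch against the example schema:

```sql
-- Selectivity = distinct values / total rows; close to 1.0 means near-unique
SELECT COUNT(DISTINCT status)::numeric      / COUNT(*) AS status_selectivity,
       COUNT(DISTINCT customer_id)::numeric / COUNT(*) AS customer_selectivity
FROM orders;
```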
Low Selectivity Example
-- If 95% of orders are 'pending', indexing status has poor selectivity
CREATE INDEX idx_orders_status ON orders(status);
-- Most queries still have to scan most of the index
High Selectivity Example
-- customer_id might be unique or near-unique
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Each lookup returns very few rows
When Low Selectivity Can Still Help
Low-selectivity indexes sometimes pay off in unexpected ways. If your pending orders are always recent and you mostly query recent pending orders, the index gives you locality even though many rows share the status value. Composite indexes can also rescue poor selectivity: (status, created_at) might be highly selective even if status alone is not. And for sorting, an index on status pre-sorts your results regardless of how many rows share each status value.
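The composite rescue described above, sketched against the example schema (the index name is illustrative):

```sql
-- status alone is low selectivity, but (status, created_at) narrows fast:
-- equality on status, then a tight range on created_at
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

SELECT * FROM orders
WHERE status = 'pending'
  AND created_at > NOW() - INTERVAL '1 day';
```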
Index Selectivity vs. Query Selectivity
The key distinction: index selectivity is about the index itself, but query selectivity is about your specific query. A low-selectivity index on status is useless for SELECT * FROM orders, but if your app always queries WHERE status = 'pending' plus a date range, the index becomes more useful because the date range reduces the result set.
Index Design: When to Use Each Type
| Index Type | Use when | Avoid when |
|---|---|---|
| B-tree composite (equality first) | Queries filter on leading column(s) then range, plus ORDER BY | Queries skip the leading column |
| Covering index with INCLUDE | High-frequency queries selecting few columns, table is wide | Writes dominate reads, or query selects many columns |
| Partial index | Column has skewed distribution, only a subset is queried | Most queries hit the excluded rows, or condition changes frequently |
| Expression index | Query uses function on column, cannot use regular index | Expression is expensive to compute, changes frequently |
| Partial covering | High-frequency query on subset of rows with small SELECT | Subset is not stable, or query selects many columns |
Common Indexing Mistakes
Mistake 1: Indexing Every Column
-- Don't do this
CREATE INDEX idx_orders_everything ON orders(id, customer_id, status, created_at, updated_at, total, ...);
This bloats your index size and slows down inserts. Only index columns you actually filter or sort by.
Mistake 2: Ignoring the Query Pattern
-- Query: WHERE status = 'shipped' AND customer_id = 42
CREATE INDEX idx_wrong ON orders(status, customer_id);
This order happens to work for the query above, since both predicates are equality checks. But it leads with the less selective column, and it fails any query that filters by customer_id alone. Index for your actual query mix, and put the more selective equality column first.
Mistake 3: Not Considering Write Overhead
Every index adds write overhead. A table with 5 indexes takes longer to UPDATE than a table with 2. Profile your workload before adding indexes.
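Before adding another index, it is worth checking which existing ones earn their keep. pg_stat_user_indexes tracks scans since the last statistics reset:

```sql
-- Indexes that have never been scanned are pure write overhead
SELECT relname       AS table_name,
       indexrelname  AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```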
Mistake 4: Assuming the Planner Knows Best
PostgreSQL’s planner doesn’t always pick the optimal index. Check with EXPLAIN and test queries directly.
Common Production Failures
Composite index with wrong column order: You create (status, customer_id) when many queries filter by customer_id alone. The index is useless for those queries because customer_id is not the leading column. Always index based on the columns your queries actually constrain, not a theoretical selectivity ranking.
Expression index desynchronized from queries: You create an expression index on LOWER(email), but some code paths query the raw column — email = 'User@Example.com' — or wrap it differently, such as LOWER(TRIM(email)). The planner only uses an expression index when the query contains the same expression, so those paths fall back to scans that never touch it. Audit all code paths that query the column.
Partial index not matching the query filter: You create a partial index for WHERE status = 'active' but queries use WHERE status = 'ACTIVE' (uppercase) or WHERE status != 'inactive'. The partial index is ignored. Be consistent with the values your application actually uses.
Covering index missing frequently selected columns: You include id and created_at in the covering index but the query also selects status. The heap is still touched. Verify the covering index covers the entire SELECT list before assuming it eliminates heap access.
Expression index bloating on heavy UPDATE: The expression LOWER(email) must be recomputed for every inserted row and for every update that writes new index entries. On a table with 100M rows and heavy write volume, the maintenance cost can outweigh the query benefit. Measure write overhead before deploying expression indexes at scale.
Capacity Estimation: Composite Index Sizing
Composite indexes grow faster than single-column indexes. Here is how to estimate the cost.
A composite index on (customer_id, status, created_at) where customer_id is INT (4 bytes), status is VARCHAR(20) in the index (up to 20 bytes), and created_at is TIMESTAMP (8 bytes) carries roughly 32 bytes of key data per entry, plus roughly 12 bytes of per-tuple header and item-pointer overhead. For 100 million rows, that is on the order of 4.4 GB for the leaf level alone; internal B-tree pages add a few percent more. A single-column index on just customer_id (a 4-byte key plus the same per-tuple overhead) would be roughly 1.6 GB for the same rows.
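Estimates like these are worth checking against reality once the index exists:

```sql
-- Actual on-disk sizes: one index, all indexes on the table, and the heap
SELECT pg_size_pretty(pg_relation_size('idx_orders_composite')) AS composite_index,
       pg_size_pretty(pg_indexes_size('orders'))                AS all_indexes,
       pg_size_pretty(pg_table_size('orders'))                  AS table_heap;
```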
Every INSERT to the orders table must now write one entry containing all three columns to the composite index. If you also have individual indexes on each column, a single INSERT could trigger four separate index updates. On a write-heavy workload inserting 50,000 rows per minute, the index maintenance becomes measurable CPU overhead. Use pg_stat_user_indexes to verify the index is actually used before accepting that write cost.
PostgreSQL estimates plan costs using statistics on the leading column. If the leading column has high cardinality (many distinct values), the planner estimates narrow result sets and prefers index scans. If statistics are stale on the leading column, the planner may assume a composite index is more selective than it actually is and choose it when a simpler index would be faster.
If a composite index is not reducing the result set significantly over the leading column alone, the composite is not earning its storage and write cost. Use EXPLAIN to compare plans with and without the composite index.
Interview Questions
Q: You are designing an index for a query that filters by status, customer_id, and created_at. The application frequently changes — sometimes it queries by status alone, sometimes by customer_id alone, sometimes by all three. How do you approach index design?
The hard constraint is that PostgreSQL needs the leading column to be present for the index to help. If the application queries all three columns together, a composite index on (status, customer_id, created_at) works for those cases. But if queries also filter by customer_id alone, you need a separate index on (customer_id) because a composite index with status as the leading column cannot help when status is omitted. The practical answer is to profile the actual query mix, identify the most frequent combination, and start with a composite index on that combination. Add secondary single-column indexes only when profiling shows the planner is making bad choices without them.
Q: A partial covering index sounds ideal for a query on recent pending orders. When would it backfire?
Partial covering indexes backfire when the predicate is not stable. If status = 'pending' represents 5% of rows today but grows to 40% after a product launch, the partial index no longer reflects actual query behavior and queries slow down unexpectedly. The partial index also requires the planner to prove the query matches the index predicate. If your ORM generates queries with different literal casing or slightly different date ranges, the partial index silently gets ignored. And on tables with high INSERT rates, the partial index offers no write savings — new rows are simply excluded from the index, which is correct behavior but does not reduce write amplification unless the WHERE clause matches on a stable, infrequent value.
Q: How do expression indexes behave differently from regular indexes in terms of maintenance and planner usage?
Expression indexes must be updated on every INSERT and on every UPDATE that writes new index entries, even when the expression value does not change (HOT updates can skip this when no indexed column changed and the page has free space). If you index LOWER(email) and a user updates only their address, a non-HOT update still recomputes the expression even though email never changed. On large tables with frequent writes, this adds up. For the planner, the expression must match exactly — WHERE LOWER(email) = 'test' uses the index but WHERE email ILIKE 'test' does not. This makes expression indexes fragile when application code changes how it formats queries. Regular column indexes are immune to this mismatch because the planner can use them for any operator that works on the column type.
Q: You are reviewing a schema with 12 indexes on a table that handles 30,000 INSERTs per minute. The application team says reads are slow. What do you check?
Start with pg_stat_user_indexes to see which indexes actually get scanned. Indexes whose idx_scan count stays at zero are pure write overhead. 12 indexes on 30,000 INSERTs per minute means roughly 360,000 index entries written per minute — at that write volume, index overhead is measurable. Check whether some indexes are redundant: a composite index on (a, b) makes a separate index on (a) redundant for most queries. Check whether covering indexes are in use — if queries are doing index scans plus heap fetches, a covering index might eliminate the heap access without needing a separate query optimization.
Conclusion
Good index design comes down to understanding your query patterns. Composite index order follows the leftmost prefix rule: equality columns first, range columns last. Covering indexes eliminate heap fetches for frequently-run queries. Partial indexes keep index size manageable when you only need a subset of data. Expression indexes handle computed values but come with write overhead.
For further reading on how the planner uses these indexes, see our guide on Query Execution Plans and Database Indexes.