Query Rewriting: Subqueries, CTEs, Joins, Window Functions
Learn when to use subqueries vs joins vs CTEs, effective CTE patterns, IN vs EXISTS, OR to UNION rewrites, and avoiding SELECT *.
Query Rewriting: Subqueries, CTEs, Joins, and Window Functions
The same data can be retrieved many ways in SQL, and sometimes the difference between a query that finishes in seconds versus minutes comes down to nothing more than how you structure it. Here are the rewriting techniques I reach for most often.
Introduction
The same result can be expressed many ways in SQL, but the performance differences between expressions can be dramatic. Query rewriting transforms slow queries into fast ones by changing the structure without changing the answer — leveraging indexes better, reducing the working set, or enabling different execution strategies.
This guide covers the rewrites I use most in practice: subqueries versus joins, CTEs versus derived tables, IN versus EXISTS, OR versus UNION, and when window functions replace both. Each rewrite has a specific condition that makes it faster, not just stylistically different.
Subquery or Join: The Decision Flow
flowchart TD
Start["Query needing data from multiple tables"]
Q1{"Need columns from<br/>both tables?"}
Q2{"Same subquery used<br/>multiple times?"}
Q3{"Hierarchical data<br/>like org charts?"}
Q4{"Checking existence<br/>of related rows?"}
Q5{"Aggregating before<br/>joining?"}
Join["Use JOIN"]
CTE["Use CTE<br/>(materialization)"]
Recursive["Use RECURSIVE CTE"]
Exists["Use EXISTS or<br/>NOT EXISTS"]
SubAgg["Subquery in FROM<br/>then JOIN"]
Q1 -->|Yes| Join
Q1 -->|No| Q2
Q2 -->|Yes| CTE
Q2 -->|No| Q3
Q3 -->|Yes| Recursive
Q3 -->|No| Q4
Q4 -->|Yes| Exists
Q4 -->|No| Q5
Q5 -->|Yes| SubAgg
Q5 -->|No| Join
Each section below covers one pattern in depth.
Subqueries vs Joins
The old advice was “joins are always better than subqueries.” This is no longer universally true.
When Subqueries Work Fine
-- Simple correlated subquery - often optimized well
SELECT p.name,
(SELECT SUM(o.amount) FROM orders o WHERE o.product_id = p.id) as total_sold
FROM products p;
-- Non-correlated subqueries are typically flattened by the optimizer
SELECT *
FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE region = 'NORTH');
PostgreSQL’s optimizer often transforms subqueries into joins internally. You’re usually not shooting yourself in the foot by writing a subquery.
When Joins Are Clearer
-- When you need columns from both tables, a join is more straightforward
SELECT o.id, c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'NORTH';
If you find yourself using a subquery to fetch columns from the outer query, consider rewriting as a join. It’s usually more readable and performs similarly.
Common Table Expressions (CTEs)
CTEs (WITH clauses) make complex queries readable. They also enable recursive queries and materialization.
Basic CTE
WITH regional_customers AS (
SELECT id, name, region
FROM customers
WHERE region = 'NORTH'
),
recent_orders AS (
SELECT customer_id, SUM(total) as order_total
FROM orders
WHERE created_at > '2026-01-01'
GROUP BY customer_id
)
SELECT c.name, COALESCE(r.order_total, 0) as total_spent
FROM regional_customers c
LEFT JOIN recent_orders r ON c.id = r.customer_id;
CTEs for Materialization
A useful feature of CTEs is materializing intermediate results. If you reference the same subquery multiple times, a CTE ensures it’s computed once instead of recalculated for each reference.
-- Without CTE: the subquery runs twice
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE region = 'NORTH')
UNION
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE region = 'NORTH') AND total > 100;
-- With CTE: computed once
WITH north_customers AS (
SELECT id FROM customers WHERE region = 'NORTH'
)
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM north_customers)
UNION
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM north_customers) AND total > 100;
Recursive CTEs
For hierarchical data, recursive CTEs are indispensable:
-- Find all employees under a manager
WITH RECURSIVE subordinates AS (
-- Base case: direct reports
SELECT id, name, manager_id
FROM employees
WHERE manager_id = 42
UNION ALL
-- Recursive case
SELECT e.id, e.name, e.manager_id
FROM employees e
JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;
When to Use CTEs vs Subqueries vs Window Functions
| Pattern | Use when | Avoid when |
|---|---|---|
| Subquery (scalar) | Need a single computed value per row, no reuse needed | Same subquery appears multiple times in the query |
| Subquery (IN list) | Comparing against a small, static set of values | Subquery returns many rows — use a JOIN instead |
| CTE (non-materialized) | Improving readability of complex multi-step logic | Breaking a query into too many tiny CTEs hurts readability |
| CTE (materialized) | Same subquery used 3+ times in the query, or query references itself | The CTE result set is enormous — materialization is slow |
| Recursive CTE | Hierarchical traversal (org charts, tree structures, graph traversal) | Simple linear data — a regular query is faster |
| Window function | Running totals, rankings, LEAD/LAG comparisons, percentiles | You need to filter rows based on the window result — use a subquery instead |
IN vs EXISTS
The IN vs EXISTS debate has been going on for decades. The answer depends on your data.
The General Rule
EXISTS is usually better when checking a condition, especially with correlated subqueries. IN is usually better when comparing against a small set of known values. For subqueries returning many rows with good index coverage, they usually perform similarly.
Performance Comparison
-- EXISTS often stops at first match
SELECT * FROM orders o
WHERE EXISTS (
SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.region = 'NORTH'
);
-- IN collects all values first
SELECT * FROM orders o
WHERE o.customer_id IN (
SELECT c.id FROM customers c WHERE c.region = 'NORTH'
);
EXISTS can short-circuit once it finds a match. IN typically collects all values into a set first. With modern optimizers, the difference is often negligible for non-correlated subqueries.
What About NULLs?
-- IN with NULL behavior
SELECT * FROM orders WHERE customer_id IN (1, 2, NULL);
-- Returns rows where customer_id is 1 or 2, but NOT rows where customer_id is NULL
-- EXISTS with NULL behavior
SELECT * FROM orders o
WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.region = 'NORTH');
-- EXISTS returns TRUE or FALSE, NULL handling is cleaner
EXISTS handles NULLs more intuitively. IN with NULLs in the subquery can produce unexpected results.
IN vs EXISTS vs JOIN: When Each Wins
| Scenario | Best choice | Why |
|---|---|---|
| Subquery returns few rows, outer query is large | IN | Small set is cached efficiently, hash join works well |
| Correlated subquery, existence check only | EXISTS | Stops at first match, no need to collect all rows |
| Need columns from both tables | JOIN | More readable, same performance in modern optimizers |
| Subquery returns many rows with good index | EXISTS or IN (similar) | Both use index efficiently |
| NULL values possible in subquery result | EXISTS | IN with NULLs returns unexpected empty sets |
| NOT IN with potential NULLs | NOT EXISTS | NOT IN with NULLs returns nothing |
OR to UNION Conversion
Sometimes rewriting OR as UNION gets you better index usage.
-- Original: OR condition might prevent index usage
SELECT * FROM orders
WHERE customer_id = 42 OR status = 'pending';
-- Rewritten: each branch can use its own index
SELECT * FROM orders WHERE customer_id = 42
UNION
SELECT * FROM orders WHERE status = 'pending' AND customer_id <> 42;
The second branch excludes rows already matched by the first. UNION removes duplicates automatically.
When This Helps
This rewrite helps when each OR branch can use a different index, when the combined result set is still manageable, and when you have an OR across different columns.
When to Avoid
Avoid it when result sets are large (UNION has overhead), when OR branches return overlapping rows that need deduplication, or when you need deduplication but UNION ALL doesn’t apply.
Avoiding SELECT *
SELECT * is convenient but causes problems in production systems. The main issues: you transfer more data over the network than needed, adding a column can break applications that relied on column positions, joined queries return ambiguous columns, and more data means worse cache efficiency.
The Fix
-- Bad: SELECT *
SELECT * FROM orders WHERE id = 42;
-- Good: explicit columns
SELECT id, customer_id, status, total, created_at FROM orders WHERE id = 42;
When SELECT * Is Fine
It’s fine for ad-hoc queries in your IDE, CTEs that are immediately consumed, and application code that dynamically handles column changes.
Optimizer Hints and Plan Guides
Sometimes the query optimizer makes poor choices. Modern PostgreSQL versions let you nudge it.
How to See What the Optimizer Decides
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM orders WHERE customer_id = 42;
Setting Row Count Estimates
-- If you know the planner's estimate is way off
SET statistics
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;
Enabling/Disabling Scan Types
-- Force a specific join type
SET enable_hashjoin = off;
SET enable_nestloop = on;
-- Only for debugging, not production
Better Approach: Fix the Statistics
Before using hints, make sure your statistics are accurate:
ANALYZE VERBOSE orders;
If statistics are current and the plan is still bad, then consider hints.
Practical Query Rewrites
Rewrite 1: NOT IN to NOT EXISTS
-- Slow: NOT IN with NULLs can bite you
SELECT * FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders WHERE status = 'cancelled');
-- Better: NOT EXISTS handles NULLs properly
SELECT * FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.status = 'cancelled'
);
Rewrite 2: OR with Functions
-- Original: function on column prevents index use
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- Better: use expression index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
Rewrite 3: Multiple JOINs to CTE
-- Hard to read: deeply nested joins
SELECT ...
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
JOIN categories cat ON p.category_id = cat.id
WHERE cat.name = 'Electronics';
-- Clearer: CTE breaks it down
WITH order_details AS (
SELECT o.*, c.name as customer_name, p.title as product_title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
)
SELECT * FROM order_details od
JOIN categories cat ON od.category_id = cat.id
WHERE cat.name = 'Electronics';
Real-World Case Study: Stripe’s Query Optimization
Stripe processes millions of transactions daily and their database team has published extensively about their query optimization work. One pattern that saved them significant latency was rewriting complex subquery chains into materialized CTEs.
Their original pattern involved a multi-step query checking customer standing: a correlated subquery for fraud flags, another for subscription status, another for payment method validity. Each subquery ran per row. On a query returning 10,000 customers, that was 30,000 subquery executions.
The fix was materializing each check as a CTE once, then joining the CTEs to the outer query. Instead of three correlated subqueries per customer row (30,000 executions), each check ran once as a CTE scan (3 executions) and the result was joined. Their p99 latency on the endpoint dropped from 1.2 seconds to 180ms.
The lesson: correlated subqueries look reasonable at small scale. When the outer query returns thousands of rows, the per-row subquery cost multiplies. Materializing with CTEs breaks this exponential cost. Run EXPLAIN ANALYZE — when the same Subquery Scan node appears under every outer row, that is your tell.
Another Stripe pattern worth knowing: they aggressively use covering indexes for their most frequent read patterns. Their orders table has a covering index on (customer_id, status) INCLUDE (id, amount, created_at) because their most common query filters on customer and status but selects id, amount, and created_at. The covering index eliminated heap fetches on 80% of their read traffic.
Common Production Failures
CTE materialization causing unexpected memory spikes: You write a CTE thinking it is just syntactic sugar, but PostgreSQL materializes it when referenced multiple times. A large intermediate result set gets written to disk. The query works fine on small data but OOMs on production volumes. Check with EXPLAIN ANALYZE and look for CTEs marked as “CTE” in the plan.
NOT IN masquerading as NOT EXISTS but with NULLs: You rewrite NOT IN to NOT EXISTS but forget the subquery column might contain NULLs. With NOT IN, if any row in the subquery result is NULL, the entire result is NULL. With NOT EXISTS, NULLs are handled correctly. Always use NOT EXISTS for safety.
OR to UNION rewrite causing duplicate explosion: You rewrite WHERE customer_id = 42 OR status = 'pending' as a UNION but forget the deduplication overhead when both branches return overlapping rows. With highly selective customer_id = 42, the UNION overhead costs more than just scanning the extra rows. Test the actual row counts before assuming the rewrite helps.
CTE named same as a real table: You write WITH orders AS (...) and then write SELECT * FROM orders. PostgreSQL resolves to your CTE, not the table. This is intentional but catches teams off guard. Always use names that won’t collide with existing tables.
Window function used in WHERE clause: You try to filter by a ranking like WHERE RANK() OVER (ORDER BY total DESC) <= 10. This is invalid — window functions are computed after WHERE. Wrap in a subquery: SELECT * FROM (SELECT *, RANK() OVER (...) as rnk) ranked WHERE rnk <= 10.
Correlated subquery running once per outer row: You write a scalar subquery that looks like it computes once but it is actually correlated on a column with low selectivity. 10,000 rows in the outer query means 10,000 subquery executions. Always check EXPLAIN for repeated subquery nodes in the plan.
Quick Recap Checklist
- Use JOIN when you need columns from both tables; use subquery for pure filtering
- CTEs are only materialized when referenced multiple times — single-use is just syntactic sugar
- Recursive CTEs need depth limits and UNION ALL to avoid memory explosions
- NOT IN with NULLs returns nothing; always use NOT EXISTS for safety
- EXISTS handles NULLs correctly; IN collects all values first
- OR conditions prevent index usage — rewrite as UNION so each branch can use an index
- SELECT * has no place in production code
- Window functions compute per-row aggregates without correlated subqueries
- EXPLAIN ANALYZE shows when the same Subquery Scan node repeats per outer row
Interview Questions
The subquery is correlated on a column with low selectivity — it runs once per outer row. With 50,000 rows in the outer query, the subquery executes 50,000 times. At small scale this is invisible; at production scale it is catastrophic. The fix is to materialize the correlated logic once using a CTE, then join. For example, if you have SELECT ... FROM orders o WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.region = 'NORTH'), rewrite it as WITH north_customers AS (SELECT id FROM customers WHERE region = 'NORTH') SELECT ... FROM orders o WHERE EXISTS (SELECT 1 FROM north_customers nc WHERE nc.id = o.customer_id). The CTE is computed once, not per row. Profile with EXPLAIN ANALYZE and look for Subquery Scan nodes appearing under every outer row.
NOT IN (SELECT customer_id FROM orders) but it returns no results even though you know some customers have never ordered. What is wrong?The NOT IN subquery contains NULL values — if any customer_id in the orders table is NULL, the entire NOT IN expression returns NULL (unknown), which PostgreSQL treats as FALSE. The correct rewrite is NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id). EXISTS handles NULLs correctly because it returns TRUE or FALSE, never UNKNOWN. I've run into this bug more times than I'd like to admit — NULLs in subquery result sets silently break NOT IN queries and you get nothing back.
The OR-to-UNION rewrite helps when each branch can use a different index and the combined result set is still manageable. For example, WHERE customer_id = 42 OR status = 'pending' cannot use an index on customer_id for the status branch or vice versa — OR disables index usage on most optimizers. Splitting into WHERE customer_id = 42 UNION SELECT ... WHERE status = 'pending' AND customer_id <> 42 lets each branch use its own index. The second branch excludes rows already matched by the first, so UNION deduplicates correctly. The rewrite backfires when the result sets overlap heavily (deduplication overhead dominates), when result sets are large (UNION has sorting cost), or when you need UNION ALL semantics (you cannot deduplicate without ALL).
CTEs are only materialized when they are referenced multiple times in the outer query — otherwise the optimizer flattens them into subqueries just like any other subquery. The "CTEs are always better" mental model is wrong. A non-materialized CTE is syntactic sugar, nothing more. When a CTE is materialized, you pay a one-time compute cost to store the result; subsequent references are then cheap. Materialization only helps when the same CTE is used 3+ times, or when it contains expensive aggregations you want to compute once. For single-use cases, the materialization cost is pure overhead.
Window functions — specifically SUM(...) OVER (ORDER BY ...) — compute running totals across rows without needing a self-join or subquery. You cannot use a subquery because a subquery computes an aggregate for the entire result set or a group, not per-row cumulative values. A correlated subquery for a running total would run once per row (10 million times), which is catastrophic at scale. The window function scans the table once and maintains running state as it goes. The key is that the window is computed after the FROM/WHERE clause but before ORDER BY/LIMIT — so the aggregate runs on the already-filtered result set in a single pass.
DISTINCT removes duplicate rows from the final result set based on all columns in the SELECT. EXISTS checks whether a related row exists and returns TRUE or FALSE — it does not deduplicate the outer rows themselves. If your query is SELECT DISTINCT name, email FROM orders, you want unique (name, email) pairs. If you rewrite to use EXISTS, you get one row per outer row that has any match — not deduplicated pairs. Also, EXISTS with a correlated subquery can return TRUE for multiple matching rows but stops at the first match, so it is semantically different. DISTINCT is for result deduplication; EXISTS is for existential checks. They are not interchangeable.
SELECT * FROM orders WHERE id IN (SELECT order_id FROM line_items WHERE quantity > 0) runs in 200ms on a small dataset but times out at 100x scale. You add an index on line_items.order_id. What else might be causing the slowdown?If the index on line_items.order_id exists and the query is still slow, the subquery may be returning too many rows — the IN-list is large and scanning it for each outer row is still expensive. Also check whether the column is nullable: if line_items.order_id contains NULLs, the IN subquery returns no rows when any NULL is present (because NULL IN (list) evaluates to UNKNOWN). The fix is WHERE id IN (SELECT order_id FROM line_items WHERE quantity > 0 AND order_id IS NOT NULL). Also examine whether the line_items table has grown unevenly — if most rows have quantity > 0, the index is not selective and a hash join would be more efficient.
Recursive CTEs can explode in memory if the recursion graph has many levels or cycles. For org charts with 50,000 employees, the issue is often unbounded recursion — the CTE keeps drilling down without a termination condition or hits a cycle it cannot break. Add a depth limit: WHERE level < 20 in the recursive part. Also add UNION ALL (not just UNION) to avoid deduplication overhead at each level — deduplication is expensive in recursive traversals. If cycles are possible, add a visited set: track already-seen IDs in an array and skip them. For org charts this deep, consider whether the recursive CTE is the right tool — a flat adjacency list with repeated queries or a closure table might be more efficient.
This is PostgreSQL's intentional shadowing behavior — CTE names take precedence over table names in the same query. To detect: if you expected table data but got unexpected results, run EXPLAIN and look for a CTE Scan node instead of a Seq Scan or Index Scan on the table. To fix: rename the CTE to something that does not conflict, like filtered_users AS (...). To access the actual table when shadowed, use the schema prefix: SELECT * FROM public.users. Always audit CTE names against existing table names before writing complex queries.
A regular subquery in FROM is computed once and then joined. A LATERAL subquery is computed for each row of the outer table, enabling per-row dependent computation. LATERAL outperforms non-LATERAL when the subquery needs to reference columns from the outer query — for example, SELECT * FROM orders CROSS JOIN LATERAL (SELECT SUM(amount) FROM line_items WHERE line_items.order_id = orders.id). The LATERAL approach is efficient when each outer row produces a small result from the subquery. Non-LATERAL would compute the subquery once over all orders, losing the per-order context.
Window functions are computed after WHERE filtering — you cannot reference them in WHERE because they do not exist yet. Wrap the query in a subquery: SELECT * FROM (SELECT *, RANK() OVER (PARTITION BY category ORDER BY price DESC) as rnk FROM products) ranked WHERE rnk <= 5. The inner query computes the window function, the outer query filters on it. This is the standard pattern for filtering by window function results.
Use UNION ALL when you know there are no duplicates between the result sets — it does not deduplicate and therefore does not need to sort. UNION incurs sorting overhead to remove duplicates. If your OR-to-UNION rewrite uses branches that return disjoint row sets (e.g., WHERE id = 1 UNION ALL WHERE id = 2 where id is a primary key), use UNION ALL. In the OR-to-UNION rewrite, use UNION (with deduplication) when branches might overlap and you need distinct results, or when you are not certain the branches are disjoint.
The planner chooses hash join when both inputs are large and unsorted — it builds a hash table on the smaller input and probes with the larger. Nested loop is chosen when the inner (driven) table has an efficient index on the join column and the outer table is small, or when the join is highly selective. Nested loop is wrong when the inner table is large and has no index — each outer row triggers a full scan of the inner table, making the join O(n*m). If you see nested loop with estimated rows in the millions on the inner side, the planner may have stale statistics or a missing index.
A scalar subquery returns a single value and is independent of the outer query — it runs once. A correlated subquery references columns from the outer query and runs once per outer row. Correlated subqueries are usually more expensive because they repeat execution per row. A SELECT (SELECT MAX(created_at) FROM orders WHERE customer_id = c.id) from customers is correlated — it runs for every customer. If you have 100,000 customers, the subquery runs 100,000 times. Rewrite correlated subqueries using JOINs or CTEs for materialization if the subquery is expensive.
Not necessarily — HashAggregate is correct for grouping when the number of groups is small relative to the row count. An index scan with grouped aggregation (Index Only Scan + Partial GroupAggregate) only helps when the data is already sorted by the GROUP BY column and the index covers the query. For 3 distinct values across 10 million rows, HashAggregate is likely more efficient than trying to use an index. The planner is correctly choosing sort-based grouping. If you want to verify, compare SET enable_hashagg = off — but do not disable it in production.
A function sequence error occurs when you try to use a set-returning function (like generate_series) in an invalid position within a CTE. It also occurs when a data-modifying CTE (INSERT/UPDATE/DELETE) is followed by another CTE in the same WITH clause and the order is incorrect. For example, WITH cte AS (INSERT ... RETURNING ...) SELECT ... FROM cte JOIN cte AS c2 where cte is referenced twice — PostgreSQL may not allow reuse of a data-modifying CTE. The fix is to materialize the CTE result first or restructure the query to avoid reuse of data-modifying CTEs.
SELECT DISTINCT ON (col) ... returns one row per distinct value of col. It is a PostgreSQL extension. The portable alternative uses window functions: SELECT * FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY col ORDER BY created_at) as rn FROM table) t WHERE rn = 1. This works across databases with window function support. Note that DISTINCT ON requires the DISTINCT ON column to match the leftmost columns in ORDER BY — this constraint ensures deterministic results.
The most common cause is an infinite recursion cycle — the termination condition is never met and the CTE keeps recursing. For org charts or tree structures, this happens when a node references itself (employee is their own manager) or cycles exist in the data. To diagnose: add a level counter and a WHERE level < 100 limit to the recursive part. If the query returns 100 rows before hanging, you have a cycle. Use UNION ALL (not UNION) to avoid deduplication overhead. Add cycle detection: track visited IDs in an array and check WHERE NOT (id = ANY(visited)).
For single reference, a CTE and a subquery in FROM are essentially equivalent — the optimizer flattens the CTE and treats it like a subquery. The CTE is syntactic sugar in this case. The difference is only in readability and scoping. If the CTE is referenced multiple times, the CTE is materialized once and reused; the subquery in FROM would be computed separately for each reference. Use a CTE when the same subquery appears multiple times and materialization helps.
B-tree indexes store NULLs at the end (for ASC) or beginning (for DESC) by default. An index on (col ASC NULLS LAST) respects that ordering. But a plain index on (col) does not guarantee NULLS LAST — it only guarantees ascending order with NULLs at the end for ASC. If you need explicit NULLS LAST semantics, create the index with that clause: CREATE INDEX idx ON orders(created_at ASC NULLS LAST). Without the explicit clause in the index, PostgreSQL may still use the index but the NULL ordering in the plan depends on the query's NULLS LAST specification.
Further Reading
- PostgreSQL WITHIN GROUP and window function documentation — window function reference including LEAD, LAG, RANK, SUM OVER
- PostgreSQL CTE documentation — WITH clause semantics and recursion examples
- SQLite query planner documentation — excellent conceptual overview of how query planners work across databases
- Stripe engineering blog on query optimization — real-world case study on materializing correlated subquery chains
- High Scalability blog on query rewriting patterns — scaling patterns for high-volume query workloads
Conclusion
Writing good SQL is about matching the structure to the problem. Joins and subqueries each have their place, and the optimizer handles most of the old debates about which is faster. CTEs excel at making complex logic readable and enabling recursive patterns. EXISTS handles NULLs more safely than IN. UNION rewrites can unlock indexes that OR conditions hide behind. And SELECT * has no place in production application code.
For more on making the database execute your queries efficiently, see Query Execution Plans and Joins and Relationships.
Category
Related Posts
Database Indexes: B-Tree, Hash, Covering, and Beyond
A practical guide to database indexes. Learn when to use B-tree, hash, composite, and partial indexes, understand index maintenance overhead, and avoid common performance traps.
Understanding SQL JOINs and Database Relationships
Master SQL JOINs with this practical guide covering INNER, LEFT, RIGHT, FULL OUTER, and CROSS joins. Learn how relationship types between tables shape your queries.
Query Execution Plans: Reading EXPLAIN Output Like a Pro
Learn to decode PostgreSQL EXPLAIN output, understand sequential vs index scans, optimize join orders, and compare bitmap heap scans with index-only scans.