Table Partitioning: Range, Hash, and List Strategies
Learn how table partitioning improves query performance and maintenance, including range partitioning, hash partitioning, and partition pruning.
Introduction
Partitioning divides a table into smaller pieces called partitions. Each partition contains a subset of rows. The application queries the parent table as if it were whole. The database automatically routes queries to relevant partitions.
-- Create partitioned table
CREATE TABLE orders (
id SERIAL,
user_id INTEGER NOT NULL,
total DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Create partitions for specific ranges
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
The parent table orders has no data. It is a shell that routes queries to partitions. The partitions hold the actual rows.
Queries that filter by the partition key benefit from partition pruning. The database only scans relevant partitions.
-- This query only scans orders_2024_q1
SELECT * FROM orders
WHERE created_at BETWEEN '2024-02-01' AND '2024-03-15'
AND user_id = 123;
When to Use / When Not to Use
When to Use Table Partitioning:
- Tables exceed tens of gigabytes and queries are slow due to full scans
- Queries commonly filter by a specific column (partition key)
- Old data is regularly deleted (archival, cleanup)
- Maintenance operations on large tables cause downtime
- You need to manage data lifecycle by time periods
- Backups of large tables take too long
When Not to Use Table Partitioning:
- Tables are small enough for simple indexes to handle efficiently
- Queries do not filter by any specific column (no clear partition key)
- Data access patterns are random across the table
- Your team lacks experience with partitioning maintenance
- The added complexity does not provide measurable benefit
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Partition key causes hot spots | Some partitions receive disproportionate writes | Choose partition key carefully, consider hash partitioning for write distribution |
| Too many partitions | Planner overhead, metadata bloat | Right-size partition count (tens to low hundreds, not thousands) |
| Queries missing partition key filter | Full table scan across all partitions | Monitor slow queries, add partition key filtering hints |
| Old partitions not dropped | Storage grows unbounded | Automate partition lifecycle with scheduled jobs |
| Partition maintenance causing locks | Application queries timeout | Use CONCURRENTLY for index builds, schedule maintenance during off-peak |
| Incorrect partition bounds | Data goes to wrong partition or fails | Validate bounds before inserting, test with boundary values |
| Index propagation too slow | Partition indexes lag behind | Build indexes on partitions individually, not on parent |
| Cascading drops affecting replicas | Replica lag during partition drops | Coordinate partition maintenance with replication monitoring |
Common Pitfalls / Anti-Patterns
-
Partitioning without partition key in queries: Partitioning only helps if queries filter on the partition key. Without the partition key in WHERE clauses, you scan all partitions.
-
Too many partitions: Each partition adds overhead. Hundreds of partitions slow down the query planner and increase memory usage.
-
Aligning partitions with application code: Embedding partition names in application code creates tight coupling. Keep partition details in the database.
-
Neglecting partition maintenance: Without automated maintenance, partitions grow unbounded and become unmanageable.
-
Using inappropriate partition granularity: Monthly partitions work until one month gets viral. Adjust granularity based on data distribution.
-
Assuming partitions are always beneficial: Partitions add complexity. Measure performance improvement before committing to partitioning.
-
Not propagating index changes to partitions: Indexes on the parent table do not always propagate correctly. Verify partition indexes after creation.
-
Partition-boundary bugs: Off-by-one errors in partition bounds cause data to fail or go to wrong partition. Always validate boundary conditions.
Trade-off Analysis
| Factor | Range Partitioning | List Partitioning | Hash Partitioning |
|---|---|---|---|
| Best for | Time-series data, chronological access patterns | Categorical data, discrete values (regions, departments) | Even distribution, write-heavy workloads |
| Query pruning | Excellent for range filters on partition key | Good for exact match filters | Limited — range scans without hash key become full scans |
| Hot spot risk | High if time-based access is skewed | High if category distribution is skewed | Low — distribution is uniform |
| Maintenance complexity | Medium — boundary management for date ranges | Low — discrete value lists | Low — no boundary management |
| Flexibility | Inflexible — requires contiguously defined bounds | Moderate — can add new list values easily | Inflexible — cannot perform range queries efficiently |
| Storage alignment | Good for tiered storage (SSD for recent, HDD for archival) | Good for location-based data isolation | No natural alignment with storage tiers |
| Index strategy | Local indexes work well for time-range queries | Local indexes work well for category filters | Local indexes required; global indexes lose partition benefits |
| Skew tolerance | Low — skewed data per partition causes imbalance | Low — skewed category distribution causes imbalance | High — hash function distributes evenly |
When Each Strategy Wins
Choose Range Partitioning when:
- Data has natural temporal ordering (logs, events, transactions)
- Queries almost always include a time range filter
- You need to archive or drop old data by time period
- Storage tiering aligns with time (recent data on fast storage)
Choose List Partitioning when:
- Data naturally groups into categories (region, status, account tier)
- You frequently filter by exact category values
- Categories are stable and infrequently added
- You want to isolate specific categories for performance or compliance
Choose Hash Partitioning when:
- Data distribution is highly skewed and range partitioning would create hot partitions
- Write throughput is the primary concern and writes must be evenly distributed
- You do not need range queries on the partition key
- You need to add partitions for scalability without respecting logical boundaries
Partition Count Trade-offs
| Partition Count | Planner Overhead | Metadata Memory | Parallelism Benefit | Recommended Scenario |
|---|---|---|---|---|
| 4–8 | Minimal | Low | Moderate | Small tables, simple queries |
| 12–24 | Low | Moderate | High | Medium tables, moderate query complexity |
| 36–64 | Moderate | Moderate-High | High | Large time-series tables |
| 100+ | High | High | Very High | Large tables with diverse query patterns, use with caution |
| 1000+ | Very High | Very High | Limited | Avoid — metadata bloat outweighs benefits |
Real-World Case Studies
Real-world case studies coming soon.
Quick Recap Checklist
- Partition by range for time-series data; list for categorical; hash for even distribution
- Queries must filter on partition key to benefit from pruning
- Keep partition count in tens to low hundreds — not thousands
- Automate partition creation before gaps cause insert failures
- Detach before dropping old partitions to minimize replication impact
- Use CONCURRENTLY for index builds on partitioned tables
- Monitor default partition size — growth indicates stale bounds
- Multi-column partition keys require leading column in query filters
- Subpartitioning adds planner overhead — limit to 2 levels
- Test partition pruning with EXPLAIN before deploying queries
Interview Questions
created_at but some queries filter by user_id only, without a time filter. What happens to those queries and how do you fix it?Queries that filter only on user_id must scan all 500GB of data across all monthly partitions — partition pruning cannot help because the time filter is absent. The query does a full table scan across all partitions. The fix is either to add a time range constraint to the query when possible, or to create a secondary index on user_id that propagates to all partitions. For very high-volume user_id queries, consider hash partitioning by user_id instead of range partitioning by time, or use a composite partition key that includes both user_id and a time component.
DROP TABLE on old partitions but you notice replication lag spikes during the drop. Why?Dropping a table acquires an ACCESS EXCLUSIVE lock that blocks all concurrent operations on the table. Even though the partition is separate from the parent table, the metadata operation still generates WAL and can block the replication slot from advancing. If you are using replication slots, the dropped partition's WAL cannot be removed until the replica acknowledges it. The fix is to use ALTER TABLE DETACH PARTITION instead of DROP TABLE — detaching is less aggressive on the replication slot, and you can drop the detached table during a low-traffic window.
Static partition pruning happens at query planning time — the planner determines which partitions match before execution based on constants in the query. If your query has WHERE created_at = '2024-06-15', the planner knows exactly which partition to scan and excludes all others before executing. Dynamic partition pruning evaluates at execution time — used when the partition key depends on a subquery, parameter, or function result that is not known at planning time. Dynamic pruning has overhead because the executor must determine which partitions to scan during execution. Queries using dynamic pruning are typically slower than those using static pruning.
Hash partitioning when you cannot predict data distribution and need even row counts across partitions. If your data has natural ordering that aligns with your query patterns (like time-series data), range partitioning is better because it groups related data together. Hash partitioning is also better when you have high-concurrency inserts on the same partition key — hash distribution spreads inserts across partitions and reduces lock contention. The tradeoff is that range queries without the hash key become full-table scans across all hash partitions.
PostgreSQL supports multi-column partition keys for both range and list partitioning, specified as a tuple in parentheses: PARTITION BY RANGE (created_at, user_id). The partition bounds are compared tuple-wise — meaning (2024-01-01, 100) to (2024-01-01, 200) covers dates of 2024-01-01 with user_ids between 100 and 200. The limitation is that the planner can only prune partitions when the query includes all leading columns of the partition key. If you partition by (created_at, user_id) and query only on user_id, no pruning occurs because the planner needs the leading column (created_at) to determine which partitions to skip. Always put the most frequently queried column first in a composite partition key.
The insert fails with a no partition of relation error. PostgreSQL will not insert the row into a default partition or fall back to the parent table — the row must match an explicit partition bound or an attached default partition. To handle out-of-range data gracefully, attach a DEFAULT PARTITION using FOR VALUES DEFAULT. Without a default partition, applications must handle the insertion error and route data appropriately. For time-based partitions, a default partition acts as a catchall for dates beyond your planned ranges, but you should monitor it closely — a growing default partition indicates your partition bounds are stale.
The safest approach uses a two-phase migration. First, create the partitioned table structure alongside the original table with no data, then use a batched INSERT INTO ... SELECT FROM to copy data in manageable chunks (e.g., 100,000 rows per transaction) to avoid locking and replication lag. Second, use ALTER TABLE ... ATTACH PARTITION to attach partitions as they fill. For very large tables, use the pg_partman extension which automates incremental data migration and partition management. The key is to migrate data while the original table continues accepting writes, then do a final cutover where you rename the old table, rename the partitioned table to the original name, and replay any writes that arrived during migration using a logical replication slot or change-data-capture pipeline.
PARTITION BY RANGE and PARTITION BY LIST, and when would you use each?PARTITION BY RANGE defines partitions as non-overlapping value ranges — FROM ('2024-01-01') TO ('2024-04-01') covers everything between those two boundaries. Ranges must be contiguous and non-overlapping, which makes them ideal for time-series data where partitions represent periods. PARTITION BY LIST defines partitions as specific discrete values — FOR VALUES IN ('warehouse_a', 'warehouse_b') covers only those exact values. List partitioning is better when data naturally groups into categories (regions, departments, account tiers) where you want queries for one category to hit only that partition. List partitions can be non-contiguous unlike range partitions. The choice depends on your access patterns: time-based queries favor range partitioning; categorical or geographic queries favor list partitioning.
Each partition runs its own autovacuum workers independently, meaning a partitioned table with 36 monthly partitions can have 36 concurrent vacuum workers. This is generally beneficial — active partitions get frequent vacuum while historical partitions rarely need it. However, index maintenance must be considered: indexes on the parent table propagate to all partitions, but building a large index on the parent blocks writes across all partitions. Use CREATE INDEX CONCURRENTLY on the parent to build indexes without blocking, but be aware this takes longer and cannot be run inside a transaction. For very large partitions, consider building indexes on individual partitions rather than the parent — this lets you stagger index builds and avoid a single massive operation. Partition-specific indexes also allow optimization based on each partition's data characteristics.
Yes, PostgreSQL allows partitioning by an expression: PARTITION BY RANGE (EXTRACT(YEAR FROM created_at)) or PARTITION BY RANGE (LOWER(category)). However, partition pruning only works when the query's filter matches the expression exactly. If you partition by EXTRACT(YEAR FROM created_at), a query with WHERE created_at >= '2024-01-01' will NOT prune — PostgreSQL cannot derive the year expression from a raw timestamp comparison. The query must use WHERE EXTRACT(YEAR FROM created_at) = 2024 to enable pruning. This creates an impedance mismatch: application queries naturally use range comparisons on timestamps, not year extractions. To avoid forcing application changes, partition by the column directly and let the planner handle range pruning naturally. Expressions in partition keys are most useful for list partitioning where you want to partition by a transformed value.
Subpartitioning creates a hierarchy where the planner must evaluate both levels to determine which partitions to scan. For a query to prune at both levels, it must include filters for both the top-level and second-level partition keys. If you have RANGE by year then RANGE by month within each year, a query filtering only on month without year cannot prune the top-level year partitions — the planner must scan all year partitions and evaluate month subpartitions within each. The planner overhead increases with more levels because it must evaluate more bounds. PostgreSQL supports exactly two levels of partitioning (parent + one level of subpartitions), which is a deliberate design choice to keep planning time reasonable. Best practice is to limit subpartitioning to two levels with clear purposes for each: year for archival boundaries, month for query granularity.
Partition maintenance operations acquire locks of varying severity: ALTER TABLE ATTACH/DETACH takes a SHARE UPDATE EXCLUSIVE lock which allows reads and updates to continue but blocks other maintenance operations; DROP TABLE takes an ACCESS EXCLUSIVE lock which blocks all queries until the drop completes. For production safety, detach partitions before dropping — ALTER TABLE orders DETACH PARTITION orders_2023_q1 is lightweight, then DROP TABLE orders_2023_q1 can run during low-traffic windows. For index creation on partitioned tables, always use CREATE INDEX CONCURRENTLY which takes a SHARE UPDATE EXCLUSIVE lock instead of blocking — but it cannot run inside a transaction, so plan accordingly. Partition splits are the most dangerous operations because they require acquiring locks on both the parent and new partition while redistributing rows.
Indexes on partitioned tables inherit the cardinality of their underlying data. If one partition contains 80% of rows (hot partition), indexes on that partition are larger and more expensive to scan than indexes on smaller partitions. When you create an index on the parent table (CREATE INDEX ON orders(user_id)), PostgreSQL creates indexes on each partition automatically — but each partition's index reflects that partition's data distribution. Partition-specific indexes let you optimize for each partition's characteristics: the hot partition might need a covering index with include columns, while cold partitions might need no index at all. The tradeoff is management overhead — you must maintain partition-specific indexes manually, while parent-propagated indexes are managed automatically. For time-range partitioned tables, keep indexes on the parent for queries that span partitions, and add partition-specific indexes only for the most active partitions where you can add optimization tricks like partial indexes or different index types.
Partition-wise joins occur when a query joins two partitioned tables on the partition key, allowing PostgreSQL to join only corresponding partitions instead of all partitions. If you have orders partitioned by customer_id and payments also partitioned by customer_id, a join on customer_id can be executed as independent joins between matching partition pairs. This reduces the volume of data moved through the join algorithm. Partition-wise aggregation works similarly — when you aggregate data grouped by the partition key, PostgreSQL can aggregate within each partition in parallel and then combine results. The benefit is proportional to the number of partitions and the data volume within each partition. For small partition counts (under 8), the overhead of coordinating partition-wise operations often exceeds the benefit. For larger partition counts with substantial data in each partition, partition-wise operations can reduce memory usage and execution time significantly.
PostgreSQL's partition estimation uses the number of partitions and partition bounds to estimate row counts, but it does not have sophisticated partition-level statistics like Oracle's partition-wise statistics. In Oracle, each partition can have its own statistics, allowing precise cardinality estimates for queries that target specific partitions. PostgreSQL relies on the parent table's statistics combined with partition bound information to estimate partition cardinalities — this estimation degrades when partitions have significantly different data distributions (skewed partitions). PostgreSQL 14+ improved partition estimation for certain query patterns, but Oracle still has an advantage in multi-partition queries where partition-level statistics allow more accurate join ordering and join method selection. The implication is that PostgreSQL partitioned queries may choose suboptimal join orders more often than Oracle when partitions have unequal sizes, and you should verify EXPLAIN output for large partitioned tables to ensure join decisions are reasonable.
PostgreSQL 11 added partition-wise joins and aggregation which dramatically improve performance for partitioned table queries. PostgreSQL 13 added ALTER TABLE ... ATTACH PARTITION CONCURRENTLY which prevents locking during partition attachment. PostgreSQL 14 improved partition statistics collection which helps the planner make better decisions for skewed partitions. PostgreSQL 15 added asynchronous partition creation. The main upgrade caveat is that some partition-related query plans may change — a query that ran acceptably slow on PG10 may now run faster but with different memory usage patterns. Test partition-heavy workloads before upgrading. Also verify that any partitioning extensions like pg_partman are compatible with the target version.
pg_partman differ from manual partition management, and what operational risks does it introduce?pg_partman automates partition creation, retention, and data movement through background workers. It pre-creates partitions ahead of time based on a configured interval (daily, monthly, etc.) and can automatically drop partitions older than a retention period. This eliminates the risk of missing a partition creation window that would cause inserts to fail. The operational risks: pg_partman runs as a background worker that consumes resources — configure its schedule carefully to avoid contention with peak traffic. It also creates a part_config table in the target schema that must be backed up. If pg_partman is misconfigured (wrong interval, wrong retention), it can create partitions too aggressively or drop partitions with live data. Always test the configuration on a non-production system first.
WHERE created_at >= $1 using a prepared statement. You add a new partition for next month but queries using the prepared statement do not see improved performance. Why?PostgreSQL caches the execution plan for prepared statements. When the prepared statement was first created, PostgreSQL compiled a plan based on the partitions that existed at that time. The cached plan does not automatically adapt when partitions are added or removed. The solution is to use EXECUTE with a parameterized query instead of PREPARE — each execution generates a fresh plan that accounts for the current partition configuration. Alternatively, call DEALLOCATE and re-prepare periodically, or use SET plan_cache_mode = force_custom_plan to always use custom plans for prepared statements. For dynamic partition scenarios, custom plans are almost always better because partition changes invalidate cached generic plans.
user_id using hash partitioning with 16 partitions. A query filtering on user_id still seems slow even though EXPLAIN shows only one partition is scanned. What else should you investigate?If pruning shows only one partition is scanned but the query is still slow, the bottleneck is likely the index on that partition. Even a single-partition scan can be slow if the partition's index is large or fragmented. Check pg_relation_size('partition_index') to see index size. Check pg_stat_user_indexes for high index scan counts versus actual row fetches — a low ratio indicates index inefficiency. Also check for missing indexes: a query with WHERE user_id = X AND status = 'active' may be scanning the partition index on user_id but still filtering on status row-by-row. Run ANALYZE partition_name to ensure statistics are up to date. Check pg_statio_user_indexes for index buffer cache hit rates — if the index does not fit in memory, disk I/O becomes the bottleneck.
A default partition acts as a catchall for rows that do not match any explicit partition bound. The risk is that the default partition can accumulate significant data, and queries that should prune to specific partitions still touch the default partition because PostgreSQL must check it for non-matching rows. Queries that join or aggregate across the partitioned table also include the default partition. Monitoring: set an alert on the default partition size — any growth above a small threshold (e.g., 1GB) indicates partition bounds are stale and need updating. Query pg_stat_user_tables for the default partition to track row count growth over time. Ideally, a healthy partitioned table has a zero-row default partition because all data falls within explicit bounds. If the default partition is growing, that is a signal to extend your partition range before the next maintenance window.
Further Reading
- Database Scaling - Broader scaling strategies
- Relational Databases - Indexing and query optimization
Official Documentation:
- PostgreSQL Partitioning — Official documentation on table partitioning
- Partition Bound Specification — Syntax for partition ranges and lists
Tools:
- pg_partman — Automated partition management extension
- Percona: PostgreSQL Partitioning Guide — Practical partitioning patterns
Conclusion
- Partitioning divides large tables into smaller, manageable pieces
- Range partitioning works well for time-based data
- List partitioning suits categorical data
- Hash partitioning ensures even distribution
- Partition pruning gives query benefits only when queries use the partition key
- Dropping partitions is faster than deleting rows
- Partitioning is complexity; only use when benefits outweigh costs
Copy/Paste Checklist:
-- Create range-partitioned table
CREATE TABLE orders (
id SERIAL,
user_id INTEGER NOT NULL,
total DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- Create partition for specific list
CREATE TABLE orders_us PARTITION OF orders
FOR VALUES IN ('US', 'CA', 'MX');
-- Detach and drop old partition
ALTER TABLE orders DETACH PARTITION orders_2023_01;
DROP TABLE orders_2023_01;
-- Check which partitions are being used
EXPLAIN SELECT * FROM orders WHERE created_at = '2024-06-15';
Best Practices Summary
Design phase:
- Start with 20-50 partitions for time-based tables; scale up only if needed
- Put the most frequently queried column first in composite partition keys
- Reserve 10-20% partition capacity headroom for unexpected growth spikes
- Use consistent naming conventions that make partition identification easy
Operations phase:
- Schedule partition maintenance during low-traffic windows
- Always use
CONCURRENTLYfor index creation on partitioned tables - Monitor partition sizes — alert when any partition exceeds 50GB
- Detach before dropping to minimize replication impact
- Run
ANALYZEafter partition attachment or major data loads
Query phase:
- Verify partition pruning with
EXPLAINon every new query pattern - Add partition key filters to high-frequency queries
- Consider composite indexes that start with the partition key
Anti-patterns to avoid:
- Do not partition tables under 100GB unless you have specific pain points
- Do not create partitions that will require daily or weekly maintenance
- Do not use expressions in partition keys — use bare columns
- Do not ignore partition maintenance — set up automation before you need it
-- Verify pruning is working (should show partition names, not parent scan)
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2024-06-01' AND created_at < '2024-07-01';
-- Check partition sizes and growth trends
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
pg_stat_get_live_tuples(schemaname||'.'||tablename::regclass) AS rows
FROM pg_tables
WHERE tablename LIKE 'orders_%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC; Category
Related Posts
Connection Pooling: HikariCP, pgBouncer, and ProxySQL
Learn connection pool sizing, HikariCP, pgBouncer, and ProxySQL, timeout settings, idle management, and when pooling helps or hurts performance.
Database Monitoring: Metrics, Tools, and Alerting
Keep your PostgreSQL database healthy with comprehensive monitoring. This guide covers query latency, connection usage, disk I/O, cache hit ratios, and alerting with pg_stat_statements and Prometheus.
Vacuuming and Reindexing in PostgreSQL
PostgreSQL's MVCC requires regular maintenance. This guide explains dead tuples, VACUUM vs VACUUM FULL, autovacuum tuning, REINDEX strategies, and how to monitor bloat with pg_stat_user_tables.