Table Partitioning: Range, Hash, and List Strategies
Large tables slow down everything. Queries scan more rows. Indexes grow unwieldy. Backups take longer. Partitioning splits large tables into smaller pieces while maintaining the illusion of a single table.
I have seen PostgreSQL tables grow to hundreds of gigabytes. Queries that once took milliseconds started taking seconds. The solution was not more memory or faster disks. Partitioning divided the table into monthly partitions, and queries touching recent data became near-instant.
What Table Partitioning Is
Partitioning divides a table into smaller pieces called partitions. Each partition contains a subset of rows. The application queries the parent table as if it were whole. The database automatically routes queries to relevant partitions.
-- Create partitioned table
CREATE TABLE orders (
id SERIAL,
user_id INTEGER NOT NULL,
total DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Create partitions for specific ranges
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
The parent table orders has no data. It is a shell that routes queries to partitions. The partitions hold the actual rows.
Queries that filter by the partition key benefit from partition pruning. The database only scans relevant partitions.
-- This query only scans orders_2024_q1
SELECT * FROM orders
WHERE created_at BETWEEN '2024-02-01' AND '2024-03-15'
AND user_id = 123;
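You can confirm how rows were routed by asking each row which partition holds it. A small sketch using PostgreSQL's `tableoid` system column, which works on any partitioned table:

```sql
-- Count rows per partition of orders; tableoid::regclass
-- resolves to the name of the partition that stores each row
SELECT tableoid::regclass AS partition, count(*)
FROM orders
GROUP BY 1
ORDER BY 1;
```

This is handy when debugging boundary values: if a row lands in an unexpected partition, the bounds are wrong.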
Range Partitioning
Range partitioning divides data by ranges of values. Time-based ranges are most common. Sales data partitions by month. Log data partitions by week.
CREATE TABLE events (
id BIGSERIAL,
event_type VARCHAR(50) NOT NULL,
payload JSONB NOT NULL,
occurred_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (occurred_at);
-- Monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
Range partitioning excels when queries commonly filter by time. Analytics dashboards querying the last 30 days hit only recent partitions; historical queries touch only older ones.
The challenge is uneven distribution. Some ranges accumulate more data than others. Monthly partitions might be fine until one month goes viral. List partitioning offers more control.
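Creating partitions by hand gets tedious. A minimal sketch that pre-creates a year of monthly partitions for the events table with dynamic SQL (assumes PostgreSQL 11 or later; rerunning it is a no-op thanks to IF NOT EXISTS):

```sql
-- Pre-create monthly partitions of events for 2024
DO $$
DECLARE
    month_start date := date '2024-01-01';
BEGIN
    FOR i IN 0..11 LOOP
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS events_%s PARTITION OF events
             FOR VALUES FROM (%L) TO (%L)',
            to_char(month_start, 'YYYY_MM'),   -- e.g. events_2024_03
            month_start,
            month_start + interval '1 month');
        month_start := month_start + interval '1 month';
    END LOOP;
END $$;
```

In production this is usually driven by a scheduled job, or delegated to an extension such as pg_partman, so new partitions always exist before data arrives.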
List Partitioning
List partitioning assigns rows to partitions based on discrete values. This works well when data naturally groups into categories.
CREATE TABLE inventory (
id SERIAL,
product_id INTEGER NOT NULL,
warehouse VARCHAR(50) NOT NULL,
quantity INTEGER NOT NULL
) PARTITION BY LIST (warehouse);
CREATE TABLE inventory_group_ab PARTITION OF inventory
FOR VALUES IN ('warehouse_a', 'warehouse_b');
CREATE TABLE inventory_group_cd PARTITION OF inventory
FOR VALUES IN ('warehouse_c', 'warehouse_d');
Queries filtering by warehouse hit only the relevant partition. List partitioning is less common than range but useful for geographic distribution or categorical data.
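List partitioning rejects rows whose key matches no partition: inserting a row for an unlisted warehouse fails with "no partition of relation found for row". A DEFAULT partition (available since PostgreSQL 11) catches those rows. A sketch against the inventory table above:

```sql
-- Rows for any warehouse not named in another partition land here
CREATE TABLE inventory_other PARTITION OF inventory DEFAULT;

-- This insert now succeeds instead of erroring out
INSERT INTO inventory (product_id, warehouse, quantity)
VALUES (42, 'warehouse_new', 10);
```

One caveat: attaching a new partition later requires scanning the default partition for rows that belong in the new one, so do not let the default become a dumping ground.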
Hash Partitioning
Hash partitioning distributes rows evenly across partitions using a hash function. This is useful when you cannot predict data distribution.
CREATE TABLE user_sessions (
id SERIAL,
user_id INTEGER NOT NULL,
token VARCHAR(100) NOT NULL,
created_at TIMESTAMP NOT NULL
) PARTITION BY HASH (user_id);
-- Create 8 hash partitions
CREATE TABLE user_sessions_0 PARTITION OF user_sessions
FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE user_sessions_1 PARTITION OF user_sessions
FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... etc
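Writing out all eight CREATE TABLE statements is repetitive. A sketch that generates them with dynamic SQL instead:

```sql
-- Create the 8 hash partitions of user_sessions in a loop
DO $$
BEGIN
    FOR i IN 0..7 LOOP
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS user_sessions_%s PARTITION OF user_sessions
             FOR VALUES WITH (MODULUS 8, REMAINDER %s)',
            i, i);
    END LOOP;
END $$;
```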
Hash partitioning guarantees roughly even distribution. Changing the number of partitions later means redistributing existing rows, which is expensive. The trade-off is that queries without the partition key must scan all partitions.
-- Efficient: only hits one partition
SELECT * FROM user_sessions WHERE user_id = 123;
-- Inefficient: must scan all partitions
SELECT * FROM user_sessions WHERE token = 'abc';
Partition Pruning
flowchart TD
Query["SELECT * FROM orders<br/>WHERE created_at >= '2024-06-01'<br/>AND created_at < '2024-07-01'"]
Query --> Planner["PostgreSQL Planner"]
Planner --> PruneCheck{"Partition Key<br/>in Query?"}
PruneCheck -->|Yes| Static["Static Pruning<br/>Determines partitions at plan time"]
PruneCheck -->|No, subquery/param| Dynamic["Dynamic Pruning<br/>Determines partitions at execution time"]
PruneCheck -->|No filter| FullScan["Full Table Scan<br/>All partitions accessed"]
Static --> Partitions["Only<br/>orders_2024_06<br/>scanned"]
Dynamic --> Partitions
FullScan --> AllParts["All partitions scanned<br/>orders_2024_01...2024_12<br/>+ historical partitions"]
AllParts --> Slow["Full scan across<br/>all data — slow"]
Partitions --> Fast["Targeted scan — fast"]
Partition pruning is the performance benefit of partitioning. The database skips partitions that cannot contain relevant data.
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2024-06-01' AND created_at < '2024-07-01';
-- EXPLAIN output shows only relevant partitions scanned
-- -> Seq Scan on orders_2024_q2 (other partitions do not appear in the plan)
Pruning happens automatically when queries use the partition key. The planner identifies which partitions overlap with the query filter.
Static pruning happens at query planning time. Dynamic pruning evaluates at execution time. Dynamic pruning handles cases where the planner cannot determine which partitions apply until runtime.
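One way to observe dynamic pruning is with a prepared statement, since parameter values are not known at planning time. A sketch (exact plan output varies by PostgreSQL version):

```sql
PREPARE recent_orders (timestamp, timestamp) AS
    SELECT * FROM orders WHERE created_at >= $1 AND created_at < $2;

-- Force a generic plan so pruning must happen at execution time (PG 12+)
SET plan_cache_mode = force_generic_plan;

EXPLAIN (ANALYZE)
EXECUTE recent_orders ('2024-06-01', '2024-07-01');
-- Look for "Subplans Removed: N" under the Append node:
-- those are partitions pruned at execution time rather than plan time
```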
Maintenance Benefits
Partitioning simplifies maintenance. Deleting old data means dropping partitions rather than deleting rows. Updating large datasets can use partition operations instead of row-by-row updates.
-- Fast: drop old partition
DROP TABLE events_2023_01;
-- Slow: delete individual rows
DELETE FROM events WHERE occurred_at < '2023-02-01';
Deleting millions of rows takes time and causes table bloat. Dropping a partition is instant regardless of size.
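Dropping a partition directly takes an ACCESS EXCLUSIVE lock that also touches the parent table. A gentler pattern, sketched below, is to detach first and drop the detached table afterwards; PostgreSQL 14+ can detach without blocking concurrent queries:

```sql
-- Detach without blocking reads/writes on the parent (PG 14+;
-- CONCURRENTLY cannot run inside a transaction block)
ALTER TABLE events DETACH PARTITION events_2023_01 CONCURRENTLY;

-- The detached table is now a plain standalone table; drop it at leisure
DROP TABLE events_2023_01;
```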
Backups can also be scoped per partition. You can back up specific partitions rather than the entire table, and skip old partitions that never change. This reduces backup time and storage.
Indexes on Partitioned Tables
Indexes on partitioned tables require thought. You can create indexes on the parent table, which propagates to all partitions. Or create indexes on individual partitions.
-- Index on parent propagates to all partitions
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Partition-specific index
CREATE INDEX idx_orders_2024_q1_user_id ON orders_2024_q1(user_id);
Propagated indexes are easier to manage. Partition-specific indexes can be tuned to that partition's data distribution.
For time-range partitions, an index on the parent for common filters, plus extra partition-specific indexes on hot partitions, often works best.
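To verify that a parent index actually exists on every partition, list the per-partition indexes from the catalog. A sketch using the standard pg_indexes view:

```sql
-- One row per index per partition; a partition missing
-- an expected index stands out immediately
SELECT tablename, indexname
FROM pg_indexes
WHERE tablename LIKE 'orders_2024%'
ORDER BY tablename, indexname;
```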
Common Pitfalls
Partitioning adds complexity. The benefits only come if queries use the partition key. Queries scanning all partitions negate the benefit.
Too many partitions causes overhead. Each partition has its own overhead: metadata, file descriptors, planner time. Hundreds of partitions can actually slow things down.
Align partitions with query patterns. If queries always filter by user_id, range partitioning by time alone helps little. Composite partition keys can support multiple query patterns.
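A two-level scheme can serve both time filters and user_id filters. A sketch (table and partition names are illustrative): range-partition by month, then hash sub-partition each month by user_id. A query filtering on both keys prunes down to a single leaf partition.

```sql
CREATE TABLE activity (
    user_id INTEGER NOT NULL,
    occurred_at TIMESTAMP NOT NULL,
    action VARCHAR(50) NOT NULL
) PARTITION BY RANGE (occurred_at);

-- Each monthly partition is itself hash-partitioned by user_id
CREATE TABLE activity_2024_06 PARTITION OF activity
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01')
    PARTITION BY HASH (user_id);

CREATE TABLE activity_2024_06_h0 PARTITION OF activity_2024_06
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE activity_2024_06_h1 PARTITION OF activity_2024_06
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE activity_2024_06_h2 PARTITION OF activity_2024_06
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE activity_2024_06_h3 PARTITION OF activity_2024_06
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```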
Most applications do not need partitioning for years, if ever. Start without it. Add partitioning only when you have clear evidence that it would help.
When to Use and When Not to Use Table Partitioning
When to Use Table Partitioning:
- Tables exceed tens of gigabytes and queries are slow due to full scans
- Queries commonly filter by a specific column (partition key)
- Old data is regularly deleted (archival, cleanup)
- Maintenance operations on large tables cause downtime
- You need to manage data lifecycle by time periods
- Backups of large tables take too long
When Not to Use Table Partitioning:
- Tables are small enough for simple indexes to handle efficiently
- Queries do not filter by any specific column (no clear partition key)
- Data access patterns are random across the table
- Your team lacks experience with partitioning maintenance
- The added complexity does not provide measurable benefit
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Partition key causes hot spots | Some partitions receive disproportionate writes | Choose partition key carefully, consider hash partitioning for write distribution |
| Too many partitions | Planner overhead, metadata bloat | Right-size partition count (tens to low hundreds, not thousands) |
| Queries missing partition key filter | Full table scan across all partitions | Monitor slow queries, add partition key filters to queries where possible |
| Old partitions not dropped | Storage grows unbounded | Automate partition lifecycle with scheduled jobs |
| Partition maintenance causing locks | Application queries timeout | Use CONCURRENTLY for index builds, schedule maintenance during off-peak |
| Incorrect partition bounds | Data goes to wrong partition or fails | Validate bounds before inserting, test with boundary values |
| Parent index build locks partitions | Writes blocked while indexing | Create the index ON ONLY the parent, build each partition's index CONCURRENTLY, then attach |
| Cascading drops affecting replicas | Replica lag during partition drops | Coordinate partition maintenance with replication monitoring |
Observability Checklist
Metrics to Monitor:
- Query execution time by partition
- Partition sizes and row counts
- Partition pruning effectiveness (check EXPLAIN output)
- Index size per partition
- Autovacuum activity per partition
- Partition maintenance job duration
- Storage usage by partition
Logs to Capture:
- Queries that do not use partition pruning (check EXPLAIN)
- Partition maintenance job results
- Autovacuum saturation on active partitions
- Lock waits during partition operations
- Partition creation and deletion events
Alerts to Set:
- Partition size approaching maximum
- Query performance degradation (partition pruning not working)
- Partition maintenance taking too long
- Autovacuum backlog on partitions
- Storage growth rate anomaly
- Missing indexes on frequently-used partitions
-- Check partition usage and pruning
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2024-06-01' AND created_at < '2024-07-01';
-- Monitor partition sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE tablename LIKE 'orders_%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Check for queries not using partitions (requires pg_stat_statements)
SELECT query, calls, mean_exec_time  -- column is mean_time before PostgreSQL 13
FROM pg_stat_statements
WHERE query LIKE '%orders%'
ORDER BY mean_exec_time DESC;
Security Checklist
- Partition tables use the same access controls as parent tables
- Implement row-level security consistent across partitions
- Audit partition maintenance operations (who can drop partitions)
- Secure partition management scripts and credentials
- Ensure partition metadata does not leak sensitive information
- Test access controls on each partition
- Back up partition data with the same encryption as other data
- Monitor for unauthorized partition creation or deletion
Common Pitfalls and Anti-Patterns
- Partitioning without the partition key in queries: Partitioning only helps if queries filter on the partition key. Without the partition key in WHERE clauses, you scan all partitions.
- Too many partitions: Each partition adds overhead. Hundreds of partitions slow down the query planner and increase memory usage.
- Hardcoding partition names in application code: Embedding partition names in application code creates tight coupling. Keep partition details in the database and query the parent table.
- Neglecting partition maintenance: Without automated maintenance, partitions grow unbounded and become unmanageable.
- Using inappropriate partition granularity: Monthly partitions work until one month goes viral. Adjust granularity based on data distribution.
- Assuming partitions are always beneficial: Partitioning adds complexity. Measure the performance improvement before committing to it.
- Not verifying indexes on partitions: Indexes created on the parent propagate to new partitions, but partitions attached later need their indexes built and attached separately. Verify partition indexes after creation or attachment.
- Partition-boundary bugs: Off-by-one errors in partition bounds send data to the wrong partition or make inserts fail. Range bounds are inclusive at the lower end and exclusive at the upper end; always test boundary values.
Capacity Estimation: Partition Count, Storage, and Pruning
Partition count planning starts with your query patterns. Each partition should be small enough to scan efficiently from memory, and frequent queries should touch only a handful of partitions, ideally one to ten. If you query orders by month and keep 3 years of data, that is 36 monthly partitions, which is reasonable. If you query by day across 3 years, that is 1,095 daily partitions, and the planner starts to struggle.
Storage per partition: estimate as total table size divided by number of partitions, then add 20% overhead for indexes. If your orders table is 300GB with 3 years of monthly partitions, each monthly partition is roughly 8.3GB with indexes pushing it toward 10GB. PostgreSQL can handle 10GB partition scans quickly if they fit in the buffer cache.
The practical partition size limit depends on your hardware. Partitions above 50GB start to feel slow for point queries. Partitions above 100GB cause noticeable latency even for range scans. If your partitions are approaching 100GB, consider splitting them — either by using a finer time granularity or by using hash partitioning to distribute data across more, smaller partitions.
For hash partitioning, the modulus determines partition count. With modulus 16, you get 16 partitions of roughly equal size. Changing the partition count later means redistributing existing data, which is expensive and requires locking, so design the count with growth in mind. If you expect to eventually need 64 partitions, start with 64 smaller ones, or use a two-level partitioning scheme (range by month, hash within each month).
Real-World Case Study: Booking.com’s PostgreSQL Partitioning
Booking.com manages one of the largest PostgreSQL deployments in the world. Their hotel reservation database handles enormous write volumes during peak booking periods; think Black Friday for travel, concentrated into a few days. Their partitioning strategy evolved through painful experience.
Early architecture used a single large table for reservations. As the table grew past 200GB, maintenance operations became painful. Vacuum took hours. Index builds required maintenance windows. Backups of the large table took longer than the backup window allowed.
Their first approach was range partitioning by reservation date. Monthly partitions kept each partition manageable. But hotel reservations are not uniformly distributed — peak seasons (summer holidays, Christmas) generate 5x more reservations than slow months. Their monthly partitions for peak months grew to 50GB while slow months sat at 10GB.
The fix was composite partitioning: range by year, then range by month within each year. This let them split a large peak month into multiple smaller partitions. They also introduced hash sub-partitioning for the most active tables, distributing hot data across multiple partitions that could be queried in parallel.
The operational lesson: start partitioning when your table reaches roughly 100GB, not when it reaches 1TB. Waiting until you have a 1TB table means your migration is painful. Partitioning a 100GB table is manageable in a maintenance window. Partitioning a 1TB table with live traffic is a multi-day migration.
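The table names below are illustrative, and the cutover details vary by workload, but a common low-drama migration pattern is to rename the existing table, create a partitioned shell in its place, and attach the old table as a historical partition:

```sql
BEGIN;
ALTER TABLE orders RENAME TO orders_legacy;

-- New partitioned shell with the same columns, defaults, and indexes
CREATE TABLE orders (LIKE orders_legacy INCLUDING ALL)
    PARTITION BY RANGE (created_at);

-- New data goes into freshly created partitions
CREATE TABLE orders_2024_07 PARTITION OF orders
    FOR VALUES FROM ('2024-07-01') TO ('2024-08-01');
COMMIT;

-- A validated CHECK constraint matching the bounds lets ATTACH skip
-- a full table scan; NOT VALID + VALIDATE avoids a long exclusive lock
ALTER TABLE orders_legacy
    ADD CONSTRAINT orders_legacy_range
    CHECK (created_at < '2024-07-01') NOT VALID;
ALTER TABLE orders_legacy VALIDATE CONSTRAINT orders_legacy_range;

ALTER TABLE orders ATTACH PARTITION orders_legacy
    FOR VALUES FROM (MINVALUE) TO ('2024-07-01');
```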
Interview Questions
Q: You have a table with 500GB of data partitioned by month. Most queries filter by created_at but some queries filter by user_id only, without a time filter. What happens to those queries and how do you fix it?
Queries that filter only on user_id must scan all 500GB of data across all monthly partitions — partition pruning cannot help because the time filter is absent. The query does a full table scan across all partitions. The fix is either to add a time range constraint to the query when possible, or to create a secondary index on user_id that propagates to all partitions. For very high-volume user_id queries, consider hash partitioning by user_id instead of range partitioning by time, or use a composite partition key that includes both user_id and a time component.
Q: Your partition maintenance job runs DROP TABLE on old partitions but you notice replication lag spikes during the drop. Why?
Dropping a table acquires an ACCESS EXCLUSIVE lock, and dropping a partition also briefly locks the parent, blocking concurrent queries. The drop generates WAL that replicas must replay, and if you use replication slots, WAL cannot be recycled until the replica acknowledges it, so a large drop during peak traffic shows up as lag. The usual fix is to run ALTER TABLE ... DETACH PARTITION first (with CONCURRENTLY on PostgreSQL 14+, which avoids blocking queries on the parent) and drop the detached table during a low-traffic window.
Q: What is the difference between static partition pruning and dynamic partition pruning?
Static partition pruning happens at query planning time — the planner determines which partitions match before execution based on constants in the query. If your query has WHERE created_at = '2024-06-15', the planner knows exactly which partition to scan and excludes all others before executing. Dynamic partition pruning evaluates at execution time — used when the partition key depends on a subquery, parameter, or function result that is not known at planning time. Dynamic pruning has overhead because the executor must determine which partitions to scan during execution. Queries using dynamic pruning are typically slower than those using static pruning.
Q: When would you choose hash partitioning over range partitioning?
Choose hash partitioning when you cannot predict data distribution and need even row counts across partitions. If your data has natural ordering that aligns with your query patterns (like time-series data), range partitioning is better because it groups related data together. Hash partitioning also helps with high-concurrency inserts clustered on the same key range: hash distribution spreads inserts across partitions and reduces contention. The tradeoff is that queries without the hash key must scan all hash partitions.
Quick Recap
Key Bullets:
- Partitioning divides large tables into smaller, manageable pieces
- Range partitioning works well for time-based data
- List partitioning suits categorical data
- Hash partitioning ensures even distribution
- Partition pruning gives query benefits only when queries use the partition key
- Dropping partitions is faster than deleting rows
- Partitioning adds complexity; only use it when the benefits outweigh the costs
Copy/Paste Checklist:
-- Create range-partitioned table
CREATE TABLE orders (
id SERIAL,
user_id INTEGER NOT NULL,
total DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- Create a list partition (only valid on a table defined with
-- PARTITION BY LIST; the range-partitioned orders table above
-- cannot mix in list partitions)
CREATE TABLE orders_by_country_na PARTITION OF orders_by_country
FOR VALUES IN ('US', 'CA', 'MX');
-- Detach and drop old partition
ALTER TABLE orders DETACH PARTITION orders_2023_01;
DROP TABLE orders_2023_01;
-- Check which partitions are being used
EXPLAIN SELECT * FROM orders WHERE created_at = '2024-06-15';
Conclusion
Partitioning divides large tables into manageable pieces. Range partitioning by time works well for event data and logs. List partitioning suits categorical data. Hash partitioning ensures even distribution.
Partition pruning gives query benefits when queries filter by the partition key. Maintenance becomes easier: dropping partitions is faster than deleting rows.
Partitioning adds complexity. Only introduce it when the benefits outweigh the costs. Most tables do not need partitioning.
For related reading, see Database Scaling for broader scaling strategies, and Relational Databases to understand indexing and query optimization.