Table Partitioning: Range, Hash, and List Strategies

Learn how table partitioning improves query performance and maintenance, including range partitioning, hash partitioning, and partition pruning.

published: reading time: 24 min read author: GeekWorkBench

Introduction

Partitioning divides a table into smaller pieces called partitions. Each partition contains a subset of rows. The application queries the parent table as if it were whole. The database automatically routes queries to relevant partitions.

-- Create partitioned table
CREATE TABLE orders (
    id SERIAL,
    user_id INTEGER NOT NULL,
    total DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);

-- Create partitions for specific ranges
CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE orders_2024_q2 PARTITION OF orders
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

The parent table orders has no data. It is a shell that routes queries to partitions. The partitions hold the actual rows.

Queries that filter by the partition key benefit from partition pruning. The database only scans relevant partitions.

-- This query only scans orders_2024_q1
SELECT * FROM orders
WHERE created_at BETWEEN '2024-02-01' AND '2024-03-15'
  AND user_id = 123;

When to Use / When Not to Use

When to Use Table Partitioning:

  • Tables exceed tens of gigabytes and queries are slow due to full scans
  • Queries commonly filter by a specific column (partition key)
  • Old data is regularly deleted (archival, cleanup)
  • Maintenance operations on large tables cause downtime
  • You need to manage data lifecycle by time periods
  • Backups of large tables take too long

When Not to Use Table Partitioning:

  • Tables are small enough for simple indexes to handle efficiently
  • Queries do not filter by any specific column (no clear partition key)
  • Data access patterns are random across the table
  • Your team lacks experience with partitioning maintenance
  • The added complexity does not provide measurable benefit

Production Failure Scenarios

FailureImpactMitigation
Partition key causes hot spotsSome partitions receive disproportionate writesChoose partition key carefully, consider hash partitioning for write distribution
Too many partitionsPlanner overhead, metadata bloatRight-size partition count (tens to low hundreds, not thousands)
Queries missing partition key filterFull table scan across all partitionsMonitor slow queries, add partition key filtering hints
Old partitions not droppedStorage grows unboundedAutomate partition lifecycle with scheduled jobs
Partition maintenance causing locksApplication queries timeoutUse CONCURRENTLY for index builds, schedule maintenance during off-peak
Incorrect partition boundsData goes to wrong partition or failsValidate bounds before inserting, test with boundary values
Index propagation too slowPartition indexes lag behindBuild indexes on partitions individually, not on parent
Cascading drops affecting replicasReplica lag during partition dropsCoordinate partition maintenance with replication monitoring

Common Pitfalls / Anti-Patterns

  1. Partitioning without partition key in queries: Partitioning only helps if queries filter on the partition key. Without the partition key in WHERE clauses, you scan all partitions.

  2. Too many partitions: Each partition adds overhead. Hundreds of partitions slow down the query planner and increase memory usage.

  3. Aligning partitions with application code: Embedding partition names in application code creates tight coupling. Keep partition details in the database.

  4. Neglecting partition maintenance: Without automated maintenance, partitions grow unbounded and become unmanageable.

  5. Using inappropriate partition granularity: Monthly partitions work until one month gets viral. Adjust granularity based on data distribution.

  6. Assuming partitions are always beneficial: Partitions add complexity. Measure performance improvement before committing to partitioning.

  7. Not propagating index changes to partitions: Indexes on the parent table do not always propagate correctly. Verify partition indexes after creation.

  8. Partition-boundary bugs: Off-by-one errors in partition bounds cause data to fail or go to wrong partition. Always validate boundary conditions.

Trade-off Analysis

FactorRange PartitioningList PartitioningHash Partitioning
Best forTime-series data, chronological access patternsCategorical data, discrete values (regions, departments)Even distribution, write-heavy workloads
Query pruningExcellent for range filters on partition keyGood for exact match filtersLimited — range scans without hash key become full scans
Hot spot riskHigh if time-based access is skewedHigh if category distribution is skewedLow — distribution is uniform
Maintenance complexityMedium — boundary management for date rangesLow — discrete value listsLow — no boundary management
FlexibilityInflexible — requires contiguously defined boundsModerate — can add new list values easilyInflexible — cannot perform range queries efficiently
Storage alignmentGood for tiered storage (SSD for recent, HDD for archival)Good for location-based data isolationNo natural alignment with storage tiers
Index strategyLocal indexes work well for time-range queriesLocal indexes work well for category filtersLocal indexes required; global indexes lose partition benefits
Skew toleranceLow — skewed data per partition causes imbalanceLow — skewed category distribution causes imbalanceHigh — hash function distributes evenly

When Each Strategy Wins

Choose Range Partitioning when:

  • Data has natural temporal ordering (logs, events, transactions)
  • Queries almost always include a time range filter
  • You need to archive or drop old data by time period
  • Storage tiering aligns with time (recent data on fast storage)

Choose List Partitioning when:

  • Data naturally groups into categories (region, status, account tier)
  • You frequently filter by exact category values
  • Categories are stable and infrequently added
  • You want to isolate specific categories for performance or compliance

Choose Hash Partitioning when:

  • Data distribution is highly skewed and range partitioning would create hot partitions
  • Write throughput is the primary concern and writes must be evenly distributed
  • You do not need range queries on the partition key
  • You need to add partitions for scalability without respecting logical boundaries

Partition Count Trade-offs

Partition CountPlanner OverheadMetadata MemoryParallelism BenefitRecommended Scenario
4–8MinimalLowModerateSmall tables, simple queries
12–24LowModerateHighMedium tables, moderate query complexity
36–64ModerateModerate-HighHighLarge time-series tables
100+HighHighVery HighLarge tables with diverse query patterns, use with caution
1000+Very HighVery HighLimitedAvoid — metadata bloat outweighs benefits

Real-World Case Studies

Real-world case studies coming soon.

Quick Recap Checklist

  • Partition by range for time-series data; list for categorical; hash for even distribution
  • Queries must filter on partition key to benefit from pruning
  • Keep partition count in tens to low hundreds — not thousands
  • Automate partition creation before gaps cause insert failures
  • Detach before dropping old partitions to minimize replication impact
  • Use CONCURRENTLY for index builds on partitioned tables
  • Monitor default partition size — growth indicates stale bounds
  • Multi-column partition keys require leading column in query filters
  • Subpartitioning adds planner overhead — limit to 2 levels
  • Test partition pruning with EXPLAIN before deploying queries

Interview Questions

1. You have a table with 500GB of data partitioned by month. Most queries filter by created_at but some queries filter by user_id only, without a time filter. What happens to those queries and how do you fix it?

Queries that filter only on user_id must scan all 500GB of data across all monthly partitions — partition pruning cannot help because the time filter is absent. The query does a full table scan across all partitions. The fix is either to add a time range constraint to the query when possible, or to create a secondary index on user_id that propagates to all partitions. For very high-volume user_id queries, consider hash partitioning by user_id instead of range partitioning by time, or use a composite partition key that includes both user_id and a time component.

2. Your partition maintenance job runs DROP TABLE on old partitions but you notice replication lag spikes during the drop. Why?

Dropping a table acquires an ACCESS EXCLUSIVE lock that blocks all concurrent operations on the table. Even though the partition is separate from the parent table, the metadata operation still generates WAL and can block the replication slot from advancing. If you are using replication slots, the dropped partition's WAL cannot be removed until the replica acknowledges it. The fix is to use ALTER TABLE DETACH PARTITION instead of DROP TABLE — detaching is less aggressive on the replication slot, and you can drop the detached table during a low-traffic window.

3. What is the difference between static partition pruning and dynamic partition pruning?

Static partition pruning happens at query planning time — the planner determines which partitions match before execution based on constants in the query. If your query has WHERE created_at = '2024-06-15', the planner knows exactly which partition to scan and excludes all others before executing. Dynamic partition pruning evaluates at execution time — used when the partition key depends on a subquery, parameter, or function result that is not known at planning time. Dynamic pruning has overhead because the executor must determine which partitions to scan during execution. Queries using dynamic pruning are typically slower than those using static pruning.

4. When would you choose hash partitioning over range partitioning?

Hash partitioning when you cannot predict data distribution and need even row counts across partitions. If your data has natural ordering that aligns with your query patterns (like time-series data), range partitioning is better because it groups related data together. Hash partitioning is also better when you have high-concurrency inserts on the same partition key — hash distribution spreads inserts across partitions and reduces lock contention. The tradeoff is that range queries without the hash key become full-table scans across all hash partitions.

5. How does PostgreSQL handle a multi-column partition key, and what are the planner limitations?

PostgreSQL supports multi-column partition keys for both range and list partitioning, specified as a tuple in parentheses: PARTITION BY RANGE (created_at, user_id). The partition bounds are compared tuple-wise — meaning (2024-01-01, 100) to (2024-01-01, 200) covers dates of 2024-01-01 with user_ids between 100 and 200. The limitation is that the planner can only prune partitions when the query includes all leading columns of the partition key. If you partition by (created_at, user_id) and query only on user_id, no pruning occurs because the planner needs the leading column (created_at) to determine which partitions to skip. Always put the most frequently queried column first in a composite partition key.

6. What happens when you try to insert a row into a partitioned table and the row does not match any partition bound?

The insert fails with a no partition of relation error. PostgreSQL will not insert the row into a default partition or fall back to the parent table — the row must match an explicit partition bound or an attached default partition. To handle out-of-range data gracefully, attach a DEFAULT PARTITION using FOR VALUES DEFAULT. Without a default partition, applications must handle the insertion error and route data appropriately. For time-based partitions, a default partition acts as a catchall for dates beyond your planned ranges, but you should monitor it closely — a growing default partition indicates your partition bounds are stale.

7. How do you migrate a large non-partitioned table to a partitioned table without downtime?

The safest approach uses a two-phase migration. First, create the partitioned table structure alongside the original table with no data, then use a batched INSERT INTO ... SELECT FROM to copy data in manageable chunks (e.g., 100,000 rows per transaction) to avoid locking and replication lag. Second, use ALTER TABLE ... ATTACH PARTITION to attach partitions as they fill. For very large tables, use the pg_partman extension which automates incremental data migration and partition management. The key is to migrate data while the original table continues accepting writes, then do a final cutover where you rename the old table, rename the partitioned table to the original name, and replay any writes that arrived during migration using a logical replication slot or change-data-capture pipeline.

8. What is the difference between PARTITION BY RANGE and PARTITION BY LIST, and when would you use each?

PARTITION BY RANGE defines partitions as non-overlapping value ranges — FROM ('2024-01-01') TO ('2024-04-01') covers everything between those two boundaries. Ranges must be contiguous and non-overlapping, which makes them ideal for time-series data where partitions represent periods. PARTITION BY LIST defines partitions as specific discrete values — FOR VALUES IN ('warehouse_a', 'warehouse_b') covers only those exact values. List partitioning is better when data naturally groups into categories (regions, departments, account tiers) where you want queries for one category to hit only that partition. List partitions can be non-contiguous unlike range partitions. The choice depends on your access patterns: time-based queries favor range partitioning; categorical or geographic queries favor list partitioning.

9. How does partitioning affect autovacuum and index maintenance, and what special considerations apply?

Each partition runs its own autovacuum workers independently, meaning a partitioned table with 36 monthly partitions can have 36 concurrent vacuum workers. This is generally beneficial — active partitions get frequent vacuum while historical partitions rarely need it. However, index maintenance must be considered: indexes on the parent table propagate to all partitions, but building a large index on the parent blocks writes across all partitions. Use CREATE INDEX CONCURRENTLY on the parent to build indexes without blocking, but be aware this takes longer and cannot be run inside a transaction. For very large partitions, consider building indexes on individual partitions rather than the parent — this lets you stagger index builds and avoid a single massive operation. Partition-specific indexes also allow optimization based on each partition's data characteristics.

10. Can you partition by an expression rather than a column directly, and what are the implications for query rewriting?

Yes, PostgreSQL allows partitioning by an expression: PARTITION BY RANGE (EXTRACT(YEAR FROM created_at)) or PARTITION BY RANGE (LOWER(category)). However, partition pruning only works when the query's filter matches the expression exactly. If you partition by EXTRACT(YEAR FROM created_at), a query with WHERE created_at >= '2024-01-01' will NOT prune — PostgreSQL cannot derive the year expression from a raw timestamp comparison. The query must use WHERE EXTRACT(YEAR FROM created_at) = 2024 to enable pruning. This creates an impedance mismatch: application queries naturally use range comparisons on timestamps, not year extractions. To avoid forcing application changes, partition by the column directly and let the planner handle range pruning naturally. Expressions in partition keys are most useful for list partitioning where you want to partition by a transformed value.

11. How does subpartitioning interact with partition pruning, and what are the planner's limitations with multi-level partitioning?

Subpartitioning creates a hierarchy where the planner must evaluate both levels to determine which partitions to scan. For a query to prune at both levels, it must include filters for both the top-level and second-level partition keys. If you have RANGE by year then RANGE by month within each year, a query filtering only on month without year cannot prune the top-level year partitions — the planner must scan all year partitions and evaluate month subpartitions within each. The planner overhead increases with more levels because it must evaluate more bounds. PostgreSQL supports exactly two levels of partitioning (parent + one level of subpartitions), which is a deliberate design choice to keep planning time reasonable. Best practice is to limit subpartitioning to two levels with clear purposes for each: year for archival boundaries, month for query granularity.

12. What locking behavior occurs during partition maintenance operations, and how do you minimize impact on production queries?

Partition maintenance operations acquire locks of varying severity: ALTER TABLE ATTACH/DETACH takes a SHARE UPDATE EXCLUSIVE lock which allows reads and updates to continue but blocks other maintenance operations; DROP TABLE takes an ACCESS EXCLUSIVE lock which blocks all queries until the drop completes. For production safety, detach partitions before dropping — ALTER TABLE orders DETACH PARTITION orders_2023_q1 is lightweight, then DROP TABLE orders_2023_q1 can run during low-traffic windows. For index creation on partitioned tables, always use CREATE INDEX CONCURRENTLY which takes a SHARE UPDATE EXCLUSIVE lock instead of blocking — but it cannot run inside a transaction, so plan accordingly. Partition splits are the most dangerous operations because they require acquiring locks on both the parent and new partition while redistributing rows.

13. How does partitioning affect the cardinality of indexes, and what strategies work best for partition-specific indexes versus propagated indexes?

Indexes on partitioned tables inherit the cardinality of their underlying data. If one partition contains 80% of rows (hot partition), indexes on that partition are larger and more expensive to scan than indexes on smaller partitions. When you create an index on the parent table (CREATE INDEX ON orders(user_id)), PostgreSQL creates indexes on each partition automatically — but each partition's index reflects that partition's data distribution. Partition-specific indexes let you optimize for each partition's characteristics: the hot partition might need a covering index with include columns, while cold partitions might need no index at all. The tradeoff is management overhead — you must maintain partition-specific indexes manually, while parent-propagated indexes are managed automatically. For time-range partitioned tables, keep indexes on the parent for queries that span partitions, and add partition-specific indexes only for the most active partitions where you can add optimization tricks like partial indexes or different index types.

14. What are the differences between partition-wise joins and partition-wise aggregation, and when do they provide performance benefits?

Partition-wise joins occur when a query joins two partitioned tables on the partition key, allowing PostgreSQL to join only corresponding partitions instead of all partitions. If you have orders partitioned by customer_id and payments also partitioned by customer_id, a join on customer_id can be executed as independent joins between matching partition pairs. This reduces the volume of data moved through the join algorithm. Partition-wise aggregation works similarly — when you aggregate data grouped by the partition key, PostgreSQL can aggregate within each partition in parallel and then combine results. The benefit is proportional to the number of partitions and the data volume within each partition. For small partition counts (under 8), the overhead of coordinating partition-wise operations often exceeds the benefit. For larger partition counts with substantial data in each partition, partition-wise operations can reduce memory usage and execution time significantly.

15. How does PostgreSQL's partition estimation differ from Oracle's, and what implications does this have for query planning?

PostgreSQL's partition estimation uses the number of partitions and partition bounds to estimate row counts, but it does not have sophisticated partition-level statistics like Oracle's partition-wise statistics. In Oracle, each partition can have its own statistics, allowing precise cardinality estimates for queries that target specific partitions. PostgreSQL relies on the parent table's statistics combined with partition bound information to estimate partition cardinalities — this estimation degrades when partitions have significantly different data distributions (skewed partitions). PostgreSQL 14+ improved partition estimation for certain query patterns, but Oracle still has an advantage in multi-partition queries where partition-level statistics allow more accurate join ordering and join method selection. The implication is that PostgreSQL partitioned queries may choose suboptimal join orders more often than Oracle when partitions have unequal sizes, and you should verify EXPLAIN output for large partitioned tables to ensure join decisions are reasonable.

16. You are running PostgreSQL 10 and considering migrating to PostgreSQL 15. What partitioning improvements should you expect, and what caveats apply during the upgrade?

PostgreSQL 11 added partition-wise joins and aggregation which dramatically improve performance for partitioned table queries. PostgreSQL 13 added ALTER TABLE ... ATTACH PARTITION CONCURRENTLY which prevents locking during partition attachment. PostgreSQL 14 improved partition statistics collection which helps the planner make better decisions for skewed partitions. PostgreSQL 15 added asynchronous partition creation. The main upgrade caveat is that some partition-related query plans may change — a query that ran acceptably slow on PG10 may now run faster but with different memory usage patterns. Test partition-heavy workloads before upgrading. Also verify that any partitioning extensions like pg_partman are compatible with the target version.

17. How does pg_partman differ from manual partition management, and what operational risks does it introduce?

pg_partman automates partition creation, retention, and data movement through background workers. It pre-creates partitions ahead of time based on a configured interval (daily, monthly, etc.) and can automatically drop partitions older than a retention period. This eliminates the risk of missing a partition creation window that would cause inserts to fail. The operational risks: pg_partman runs as a background worker that consumes resources — configure its schedule carefully to avoid contention with peak traffic. It also creates a part_config table in the target schema that must be backed up. If pg_partman is misconfigured (wrong interval, wrong retention), it can create partitions too aggressively or drop partitions with live data. Always test the configuration on a non-production system first.

18. Your application queries a partitioned table with WHERE created_at >= $1 using a prepared statement. You add a new partition for next month but queries using the prepared statement do not see improved performance. Why?

PostgreSQL caches the execution plan for prepared statements. When the prepared statement was first created, PostgreSQL compiled a plan based on the partitions that existed at that time. The cached plan does not automatically adapt when partitions are added or removed. The solution is to use EXECUTE with a parameterized query instead of PREPARE — each execution generates a fresh plan that accounts for the current partition configuration. Alternatively, call DEALLOCATE and re-prepare periodically, or use SET plan_cache_mode = force_custom_plan to always use custom plans for prepared statements. For dynamic partition scenarios, custom plans are almost always better because partition changes invalidate cached generic plans.

19. You have a table partitioned by user_id using hash partitioning with 16 partitions. A query filtering on user_id still seems slow even though EXPLAIN shows only one partition is scanned. What else should you investigate?

If pruning shows only one partition is scanned but the query is still slow, the bottleneck is likely the index on that partition. Even a single-partition scan can be slow if the partition's index is large or fragmented. Check pg_relation_size('partition_index') to see index size. Check pg_stat_user_indexes for high index scan counts versus actual row fetches — a low ratio indicates index inefficiency. Also check for missing indexes: a query with WHERE user_id = X AND status = 'active' may be scanning the partition index on user_id but still filtering on status row-by-row. Run ANALYZE partition_name to ensure statistics are up to date. Check pg_statio_user_indexes for index buffer cache hit rates — if the index does not fit in memory, disk I/O becomes the bottleneck.

20. How does using a default partition affect partition pruning, and what monitoring strategies apply to catch problems early?

A default partition acts as a catchall for rows that do not match any explicit partition bound. The risk is that the default partition can accumulate significant data, and queries that should prune to specific partitions still touch the default partition because PostgreSQL must check it for non-matching rows. Queries that join or aggregate across the partitioned table also include the default partition. Monitoring: set an alert on the default partition size — any growth above a small threshold (e.g., 1GB) indicates partition bounds are stale and need updating. Query pg_stat_user_tables for the default partition to track row count growth over time. Ideally, a healthy partitioned table has a zero-row default partition because all data falls within explicit bounds. If the default partition is growing, that is a signal to extend your partition range before the next maintenance window.

Further Reading

Official Documentation:

Tools:

Conclusion

  • Partitioning divides large tables into smaller, manageable pieces
  • Range partitioning works well for time-based data
  • List partitioning suits categorical data
  • Hash partitioning ensures even distribution
  • Partition pruning gives query benefits only when queries use the partition key
  • Dropping partitions is faster than deleting rows
  • Partitioning is complexity; only use when benefits outweigh costs

Copy/Paste Checklist:

-- Create range-partitioned table
CREATE TABLE orders (
    id SERIAL,
    user_id INTEGER NOT NULL,
    total DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Create partition for specific list
CREATE TABLE orders_us PARTITION OF orders
    FOR VALUES IN ('US', 'CA', 'MX');

-- Detach and drop old partition
ALTER TABLE orders DETACH PARTITION orders_2023_01;
DROP TABLE orders_2023_01;

-- Check which partitions are being used
EXPLAIN SELECT * FROM orders WHERE created_at = '2024-06-15';

Best Practices Summary

Design phase:

  • Start with 20-50 partitions for time-based tables; scale up only if needed
  • Put the most frequently queried column first in composite partition keys
  • Reserve 10-20% partition capacity headroom for unexpected growth spikes
  • Use consistent naming conventions that make partition identification easy

Operations phase:

  • Schedule partition maintenance during low-traffic windows
  • Always use CONCURRENTLY for index creation on partitioned tables
  • Monitor partition sizes — alert when any partition exceeds 50GB
  • Detach before dropping to minimize replication impact
  • Run ANALYZE after partition attachment or major data loads

Query phase:

  • Verify partition pruning with EXPLAIN on every new query pattern
  • Add partition key filters to high-frequency queries
  • Consider composite indexes that start with the partition key

Anti-patterns to avoid:

  • Do not partition tables under 100GB unless you have specific pain points
  • Do not create partitions that will require daily or weekly maintenance
  • Do not use expressions in partition keys — use bare columns
  • Do not ignore partition maintenance — set up automation before you need it
-- Verify pruning is working (should show partition names, not parent scan)
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2024-06-01' AND created_at < '2024-07-01';

-- Check partition sizes and growth trends
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    pg_stat_get_live_tuples(schemaname||'.'||tablename::regclass) AS rows
FROM pg_tables
WHERE tablename LIKE 'orders_%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Category

Related Posts

Connection Pooling: HikariCP, pgBouncer, and ProxySQL

Learn connection pool sizing, HikariCP, pgBouncer, and ProxySQL, timeout settings, idle management, and when pooling helps or hurts performance.

#database #connection-pooling #performance

Database Monitoring: Metrics, Tools, and Alerting

Keep your PostgreSQL database healthy with comprehensive monitoring. This guide covers query latency, connection usage, disk I/O, cache hit ratios, and alerting with pg_stat_statements and Prometheus.

#database #monitoring #observability

Vacuuming and Reindexing in PostgreSQL

PostgreSQL's MVCC requires regular maintenance. This guide explains dead tuples, VACUUM vs VACUUM FULL, autovacuum tuning, REINDEX strategies, and how to monitor bloat with pg_stat_user_tables.

#database #postgresql #vacuum