Materialized Views
Precomputed query results stored as tables. PostgreSQL refresh strategies, indexes on materialized views, and when to use them vs views or denormalization.
Materialized Views
A regular view is a stored query. When you query it, the database runs the underlying query and returns the result. No data is stored. Every query to the view re-executes the query.
A materialized view stores the actual result. The query runs once, the result is saved as a table, and you query that table directly. Fast reads, stale data.
The tradeoff—freshness vs performance—determines when materialized views help and when they don’t.
Materialized Views vs Regular Views vs Denormalization
Regular views are logical constructs. They don’t store data, just the query definition. Use them for hiding complex queries, restricting access without separate tables, or encapsulating business logic.
Materialized views store precomputed results. Use them for accelerating expensive aggregations or joins, replicating remote data locally, or building read replicas for specific query patterns.
Denormalization duplicates data manually. Your code decides what to store where and manages keeping copies in sync. Materialized views let the database handle this automatically.
PostgreSQL Materialized Views
PostgreSQL supports materialized views natively.
CREATE MATERIALIZED VIEW sales_by_month AS
SELECT
date_trunc('month', order_date) AS month,
customer_id,
SUM(total) AS total_sales
FROM orders
GROUP BY date_trunc('month', order_date), customer_id
WITH DATA;
WITH DATA populates the view immediately. Without it, you create an empty view that populates on first refresh.
Query the materialized view like a regular table:
SELECT * FROM sales_by_month WHERE month >= '2025-01-01';
Refreshing Materialized Views
Data goes stale as underlying tables change. PostgreSQL offers two refresh strategies:
Full refresh rewrites the entire materialized view:
REFRESH MATERIALIZED VIEW sales_by_month;
This blocks reads until the refresh finishes. For large views, that means noticeable downtime.
Concurrent refresh rebuilds without blocking reads:
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_by_month;
Concurrent refresh requires a unique index. Without one, PostgreSQL refuses the concurrent option.
CREATE UNIQUE INDEX ON sales_by_month (month, customer_id);
Concurrent refresh takes longer than full refresh but keeps the view available.
Automatic Refresh
PostgreSQL doesn’t have built-in automatic refresh. Options:
- Scheduled jobs via pg_cron
- Trigger-based refresh on underlying tables
- Application-level refresh after writes
Oracle’s materialized views support automatic refresh via refresh groups and schedules. PostgreSQL doesn’t have this.
Oracle Materialized Views
Oracle materialized views offer more features:
- On-commit refresh: automatically refresh when underlying tables change
- Refresh groups: refresh multiple materialized views together
- Query rewrite: the optimizer rewrites queries to use materialized views automatically
CREATE MATERIALIZED VIEW monthly_sales
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
SUM(total) AS total_sales
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date);
BUILD IMMEDIATE populates on creation. REFRESH FAST ON COMMIT uses log-based incremental refresh—only changed rows update, not the entire view.
Refresh Methods
Oracle refresh modes:
REFRESH FAST: incremental, requires materialized view logs on underlying tablesREFRESH COMPLETE: full rebuildREFRESH FORCE: fast if possible, otherwise completeNEVER REFRESH: prevents automatic refresh
ALTER MATERIALIZED VIEW monthly_sales REFRESH FAST;
Indexes on Materialized Views
Materialized views are tables. Index them like regular tables to speed up queries.
CREATE MATERIALIZED VIEW order_summary AS
SELECT
customer_id,
status,
COUNT(*) AS order_count,
SUM(total) AS total_value
FROM orders
GROUP BY customer_id, status;
CREATE INDEX idx_order_summary_customer ON order_summary (customer_id);
CREATE INDEX idx_order_summary_status ON order_summary (status);
For frequently filtered queries, proper indexes make a significant difference.
Oracle supports bitmap indexes on materialized views, which compress well for low-cardinality columns like status flags.
When Materialized Views Make Sense
Materialized views work when:
- Underlying data changes infrequently compared to query frequency
- Queries are expensive aggregations or multi-table joins
- You can tolerate stale data for a known period
- Query patterns are predictable and stable
Analytics workloads fit well. Daily sales summaries, monthly rollups, user activity aggregations—these change when new data arrives and are queried constantly.
ETL pipelines use materialized views as intermediate stores. Transform once, store the result, query it repeatedly.
When Materialized Views Don’t Help
Materialized views fail when:
- Data changes frequently and you need current values
- Query patterns are too diverse to precompute efficiently
- Storage overhead exceeds the benefit
- Refresh costs outweigh read savings
If you need to refresh every few minutes, the refresh overhead might exceed the read performance benefit. A regular view with proper indexing or query caching might work better.
Materialized View Refresh Flows
flowchart TD
subgraph FullRefresh["Full Refresh"]
F1[("Materialized<br/>View")] -->|blocks reads| F2[("Drop + Recreate")]
F2 --> F3[("New View<br/>Available")]
end
subgraph ConcurrentRefresh["Concurrent Refresh"]
C1[("Old View<br/>Available")] --> C2[("Build index on<br/>new snapshot")]
C2 --> C3[("Swap atomically<br/>when ready")]
C3 --> C4[("Old view removed<br/>Reads continue")]
end
FullRefresh -.->|concurrent is faster| ConcurrentRefresh
Full refresh blocks reads until complete. Concurrent refresh keeps the old view available while building the new one in the background.
When to Use / When Not to Use Materialized Views
Use materialized views when:
- Underlying data changes infrequently compared to query frequency
- Queries are expensive aggregations or multi-table joins
- You can tolerate stale data for a known period
- Query patterns are predictable and stable
- You need to replicate remote data locally
Do not use materialized views when:
- Data changes frequently and you need current values
- Query patterns are too diverse to precompute efficiently
- Storage overhead exceeds the benefit
- Refresh costs outweigh read savings
- You need real-time data accuracy
Materialized View vs Regular View vs Denormalization Trade-offs
| Dimension | Regular View | Materialized View | Denormalization |
|---|---|---|---|
| Storage | None — query only | Stores result as table | Manual data duplication |
| Read performance | Same as query | Fast — reads precomputed | Fast — reads precomputed |
| Freshness | Always current | Stale until refresh | Manual — application manages |
| Write impact | None | Refresh blocks or takes time | Updates more complex |
| Maintenance | None | Periodic refresh needed | Application-level sync |
| Consistency | Strong | Depends on refresh frequency | Application-dependent |
| Best for | Hiding complexity, access control | Precomputed aggregations, remote data | Read-heavy workloads, specific shapes |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Full refresh blocking reads | Queries fail during refresh window | Use CONCURRENTLY option with unique index |
| Refresh lag causing stale reports | Business decisions on outdated numbers | Monitor last_refresh time, alert on staleness |
| Refresh deadlocks under write load | Refresh and writes block each other | Schedule refresh during low-write windows |
| Unique index missing for concurrent refresh | PostgreSQL refuses CONCURRENTLY | Always create unique index before concurrent refresh |
| Materialized view growing too large | Storage bloat, slow refresh | Partition materialized views, limit aggregation scope |
| Index missing on frequently filtered columns | Slow reads from materialized view | Add indexes like any table — materialized views are tables |
Capacity Estimation: Materialized View Storage and Refresh Time
Materialized views store precomputed results as tables, so sizing is straightforward: the view’s result set plus any indexes you create on it.
Storage sizing formula:
mv_storage_bytes = sum(row_width × number_of_rows)
mv_index_storage = sum(index_width × indexed_rows)
total_mv_storage = mv_storage_bytes + mv_index_storage
For a materialized view aggregating sales by day and product category:
- Source: 10M rows in
fact_sales, 100K rows indim_products - Materialized result: 365 days × 100 categories = 36,500 rows
- Row width: date (4 bytes) + category_id (4 bytes) + product_id (4 bytes) + total_sales (8 bytes) + unit_count (8 bytes) = ~30 bytes per row
- MV storage: 36,500 × 30 = ~1.1MB (plus 30% overhead = ~1.5MB)
- With indexes: btree on (date, category) adds roughly 36,500 × 24 bytes = ~900KB
- Total: ~2.5MB for a materialized view replacing a query scanning 10M rows
Refresh time estimation: Full refresh time depends on the underlying query’s cost:
refresh_time ≈ base_query_time + (index_rebuild_time × number_of_indexes)
If the base query takes 30 seconds on the source data and you have 2 indexes requiring rebuild, full refresh takes roughly 30 + (2 × 5) = 40 seconds. For incremental refresh (REFRESH CONCURRENTLY), the time depends on the delta — typically much faster, proportional to changed rows rather than total rows.
For a 500M-row fact table with daily aggregation materialized view: full refresh might take 5-15 minutes. Incremental refresh of only changed partitions takes seconds. This is why partition-aware materialized views — where the MV is partitioned the same way as the source — enable fast incremental refreshes.
Observability Hooks: Monitoring Materialized View Health
PostgreSQL exposes materialized view metadata through system catalog queries.
-- When was each materialized view last refreshed?
SELECT
matviewname,
schemaname,
last_refresh,
ROUND(EXTRACT(EPOCH FROM (now() - last_refresh)) / 60, 1) AS minutes_since_refresh
FROM pg_matviews
WHERE schemaname = 'public'
ORDER BY last_refresh;
-- Monitor relation sizes (including materialized views)
SELECT
relname,
relkind,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_catalog.pg_statio_user_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(relid) DESC;
Key alerts to configure:
- Staleness alert:
minutes_since_refresh > your_staleness_threshold— if your business requires daily reports at 8am, alert if the MV has not refreshed since yesterday. - Size anomaly: if MV size grows unexpectedly (suddenly 10× larger), check for unbounded aggregation or missing WHERE clause in the view definition.
- Refresh duration: track refresh time over time — if REFRESH CONCURRENTLY starts taking 10× longer, the underlying data distribution has likely changed.
-- Track refresh duration history
CREATE TABLE mv_refresh_history (
mv_name TEXT,
refresh_started TIMESTAMPTZ,
refresh_duration_secs NUMERIC,
rows_affected BIGINT
);
-- After each refresh:
-- INSERT INTO mv_refresh_history VALUES ('sales_by_month', now(), extract_secs, row_count);
Real-World Case Study: Shopify’s Materialized Views for Analytics
Shopify runs analytics over billions of transactions across their merchant base. Their data warehouse team uses materialized views extensively to precompute common aggregation patterns — daily revenue by merchant, order counts by status, refund rates by product category.
The challenge: merchant-facing analytics need to feel real-time, but recomputing aggregations over billions of rows on every query is impractical. Materialized views bridge this gap by giving sub-second query response times on precomputed results.
Their implementation pattern: partition the source fact table by day, create materialized views that aggregate at the day level, and use REFRESH CONCURRENTLY with partition-level granularity. When new data loads for day D, only the partition containing day D changes — the materialized view refresh reads only affected partitions, not the entire fact table.
The lesson: materialized views work best when source data is partitioned and the aggregation granularity aligns with partition boundaries. Without partition alignment, full refreshes become prohibitively expensive and defeat the purpose of precomputation.
Interview Questions
Q: What is the difference between REFRESH MATERIALIZED VIEW and REFRESH CONCURRENTLY?
REFRESH MATERIALIZED VIEW takes an exclusive lock on the MV — reads fail while refresh runs. For large MVs, this lock duration is unacceptable. REFRESH CONCURRENTLY uses a unique index (required) and refreshes without blocking reads — it builds a new version of the MV, swaps when ready, and uses CREATE INDEX CONCURRENTLY semantics. The tradeoff: CONCURRENTLY takes slightly longer than a blocking refresh because it builds indexes one at a time and must ensure no duplicate keys. It also requires the MV to have a unique index.
Q: How do you keep a materialized view from growing unbounded?
The MV query definition determines its size. If you aggregate by day with no upper bound, 10 years of data makes the MV large and slow to refresh. Add a WHERE clause to limit the time window: WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'. Partition the MV the same way you partition source tables so old partitions can be detached and dropped. Alternatively, use a rolling window materialized view — a scheduled job drops old partitions and creates new ones daily.
Q: A materialized view refresh is causing replication lag on your read replica. What do you do?
Refresh on the primary triggers replication of the new data to replicas. A full refresh on a large MV can generate gigabytes of WAL that replicas must apply, causing lag. The fix: run REFRESH CONCURRENTLY on the replica instead of the primary if replicas serve read traffic. Alternatively, reduce refresh frequency — if the MV is refreshing every minute but the source data only changes hourly, you are paying replication cost for no benefit.
Q: When would you choose a materialized view over denormalization managed in application code?
Materialized views let the database handle refresh logic — you define the query once, the database manages when and how to update results. Application-level denormalization requires your code to know about and maintain every denormalized copy. Choose materialized views when the database can manage the refresh efficiently (incremental, partition-aligned, not too frequent). Choose application denormalization when the denormalized data structure does not map cleanly to a SQL query — for example, precomputing a specific social graph adjacency list that requires graph traversal logic rather than a standard aggregation.
Conclusion
Materialized views precompute expensive queries and store results as tables. They trade freshness for read performance. PostgreSQL and Oracle both support them—Oracle has more sophisticated refresh options.
Use materialized views for aggregations and joins queried frequently but changing slowly. Avoid them when data changes often or query patterns are too varied to precompute effectively.
Related Posts
- Denormalization — Manual vs database-managed data duplication
- Query Execution Plans — Understanding how queries run
- Indexes Databases — Indexing strategies for performance
Category
Related Posts
Denormalization
When to intentionally duplicate data for read performance. Tradeoffs with normalization, update anomalies, and application-level denormalization strategies.
Connection Pooling: HikariCP, pgBouncer, and ProxySQL
Learn connection pool sizing, HikariCP, pgBouncer, and ProxySQL, timeout settings, idle management, and when pooling helps or hurts performance.
Database Monitoring: Metrics, Tools, and Alerting
Keep your PostgreSQL database healthy with comprehensive monitoring. This guide covers query latency, connection usage, disk I/O, cache hit ratios, and alerting with pg_stat_statements and Prometheus.