Materialized Views

Precomputed query results stored as tables. PostgreSQL refresh strategies, indexes on materialized views, and when to use them vs views or denormalization.

published: reading time: 12 min read updated: January 1, 1970

Materialized Views

A regular view is a stored query. When you query it, the database runs the underlying query and returns the result. No data is stored. Every query to the view re-executes the query.

A materialized view stores the actual result. The query runs once, the result is saved as a table, and you query that table directly. Fast reads, stale data.

The tradeoff—freshness vs performance—determines when materialized views help and when they don’t.

Materialized Views vs Regular Views vs Denormalization

Regular views are logical constructs. They don’t store data, just the query definition. Use them for hiding complex queries, restricting access without separate tables, or encapsulating business logic.

Materialized views store precomputed results. Use them for accelerating expensive aggregations or joins, replicating remote data locally, or building read replicas for specific query patterns.

Denormalization duplicates data manually. Your code decides what to store where and manages keeping copies in sync. Materialized views let the database handle this automatically.

PostgreSQL Materialized Views

PostgreSQL supports materialized views natively.

CREATE MATERIALIZED VIEW sales_by_month AS
SELECT
    date_trunc('month', order_date) AS month,
    customer_id,
    SUM(total) AS total_sales
FROM orders
GROUP BY date_trunc('month', order_date), customer_id
WITH DATA;

WITH DATA populates the view immediately. Without it, you create an empty view that populates on first refresh.

Query the materialized view like a regular table:

SELECT * FROM sales_by_month WHERE month >= '2025-01-01';

Refreshing Materialized Views

Data goes stale as underlying tables change. PostgreSQL offers two refresh strategies:

Full refresh rewrites the entire materialized view:

REFRESH MATERIALIZED VIEW sales_by_month;

This blocks reads until the refresh finishes. For large views, that means noticeable downtime.

Concurrent refresh rebuilds without blocking reads:

REFRESH MATERIALIZED VIEW CONCURRENTLY sales_by_month;

Concurrent refresh requires a unique index. Without one, PostgreSQL refuses the concurrent option.

CREATE UNIQUE INDEX ON sales_by_month (month, customer_id);

Concurrent refresh takes longer than full refresh but keeps the view available.

Automatic Refresh

PostgreSQL doesn’t have built-in automatic refresh. Options:

  • Scheduled jobs via pg_cron
  • Trigger-based refresh on underlying tables
  • Application-level refresh after writes

Oracle’s materialized views support automatic refresh via refresh groups and schedules. PostgreSQL doesn’t have this.

Oracle Materialized Views

Oracle materialized views offer more features:

  • On-commit refresh: automatically refresh when underlying tables change
  • Refresh groups: refresh multiple materialized views together
  • Query rewrite: the optimizer rewrites queries to use materialized views automatically
CREATE MATERIALIZED VIEW monthly_sales
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT
    EXTRACT(YEAR FROM order_date) AS year,
    EXTRACT(MONTH FROM order_date) AS month,
    SUM(total) AS total_sales
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date);

BUILD IMMEDIATE populates on creation. REFRESH FAST ON COMMIT uses log-based incremental refresh—only changed rows update, not the entire view.

Refresh Methods

Oracle refresh modes:

  • REFRESH FAST: incremental, requires materialized view logs on underlying tables
  • REFRESH COMPLETE: full rebuild
  • REFRESH FORCE: fast if possible, otherwise complete
  • NEVER REFRESH: prevents automatic refresh
ALTER MATERIALIZED VIEW monthly_sales REFRESH FAST;

Indexes on Materialized Views

Materialized views are tables. Index them like regular tables to speed up queries.

CREATE MATERIALIZED VIEW order_summary AS
SELECT
    customer_id,
    status,
    COUNT(*) AS order_count,
    SUM(total) AS total_value
FROM orders
GROUP BY customer_id, status;

CREATE INDEX idx_order_summary_customer ON order_summary (customer_id);
CREATE INDEX idx_order_summary_status ON order_summary (status);

For frequently filtered queries, proper indexes make a significant difference.

Oracle supports bitmap indexes on materialized views, which compress well for low-cardinality columns like status flags.

When Materialized Views Make Sense

Materialized views work when:

  • Underlying data changes infrequently compared to query frequency
  • Queries are expensive aggregations or multi-table joins
  • You can tolerate stale data for a known period
  • Query patterns are predictable and stable

Analytics workloads fit well. Daily sales summaries, monthly rollups, user activity aggregations—these change when new data arrives and are queried constantly.

ETL pipelines use materialized views as intermediate stores. Transform once, store the result, query it repeatedly.

When Materialized Views Don’t Help

Materialized views fail when:

  • Data changes frequently and you need current values
  • Query patterns are too diverse to precompute efficiently
  • Storage overhead exceeds the benefit
  • Refresh costs outweigh read savings

If you need to refresh every few minutes, the refresh overhead might exceed the read performance benefit. A regular view with proper indexing or query caching might work better.

Materialized View Refresh Flows

flowchart TD
    subgraph FullRefresh["Full Refresh"]
        F1[("Materialized<br/>View")] -->|blocks reads| F2[("Drop + Recreate")]
        F2 --> F3[("New View<br/>Available")]
    end

    subgraph ConcurrentRefresh["Concurrent Refresh"]
        C1[("Old View<br/>Available")] --> C2[("Build index on<br/>new snapshot")]
        C2 --> C3[("Swap atomically<br/>when ready")]
        C3 --> C4[("Old view removed<br/>Reads continue")]
    end

    FullRefresh -.->|concurrent is faster| ConcurrentRefresh

Full refresh blocks reads until complete. Concurrent refresh keeps the old view available while building the new one in the background.

When to Use / When Not to Use Materialized Views

Use materialized views when:

  • Underlying data changes infrequently compared to query frequency
  • Queries are expensive aggregations or multi-table joins
  • You can tolerate stale data for a known period
  • Query patterns are predictable and stable
  • You need to replicate remote data locally

Do not use materialized views when:

  • Data changes frequently and you need current values
  • Query patterns are too diverse to precompute efficiently
  • Storage overhead exceeds the benefit
  • Refresh costs outweigh read savings
  • You need real-time data accuracy

Materialized View vs Regular View vs Denormalization Trade-offs

DimensionRegular ViewMaterialized ViewDenormalization
StorageNone — query onlyStores result as tableManual data duplication
Read performanceSame as queryFast — reads precomputedFast — reads precomputed
FreshnessAlways currentStale until refreshManual — application manages
Write impactNoneRefresh blocks or takes timeUpdates more complex
MaintenanceNonePeriodic refresh neededApplication-level sync
ConsistencyStrongDepends on refresh frequencyApplication-dependent
Best forHiding complexity, access controlPrecomputed aggregations, remote dataRead-heavy workloads, specific shapes

Production Failure Scenarios

FailureImpactMitigation
Full refresh blocking readsQueries fail during refresh windowUse CONCURRENTLY option with unique index
Refresh lag causing stale reportsBusiness decisions on outdated numbersMonitor last_refresh time, alert on staleness
Refresh deadlocks under write loadRefresh and writes block each otherSchedule refresh during low-write windows
Unique index missing for concurrent refreshPostgreSQL refuses CONCURRENTLYAlways create unique index before concurrent refresh
Materialized view growing too largeStorage bloat, slow refreshPartition materialized views, limit aggregation scope
Index missing on frequently filtered columnsSlow reads from materialized viewAdd indexes like any table — materialized views are tables

Capacity Estimation: Materialized View Storage and Refresh Time

Materialized views store precomputed results as tables, so sizing is straightforward: the view’s result set plus any indexes you create on it.

Storage sizing formula:

mv_storage_bytes = sum(row_width × number_of_rows)
mv_index_storage = sum(index_width × indexed_rows)
total_mv_storage = mv_storage_bytes + mv_index_storage

For a materialized view aggregating sales by day and product category:

  • Source: 10M rows in fact_sales, 100K rows in dim_products
  • Materialized result: 365 days × 100 categories = 36,500 rows
  • Row width: date (4 bytes) + category_id (4 bytes) + product_id (4 bytes) + total_sales (8 bytes) + unit_count (8 bytes) = ~30 bytes per row
  • MV storage: 36,500 × 30 = ~1.1MB (plus 30% overhead = ~1.5MB)
  • With indexes: btree on (date, category) adds roughly 36,500 × 24 bytes = ~900KB
  • Total: ~2.5MB for a materialized view replacing a query scanning 10M rows

Refresh time estimation: Full refresh time depends on the underlying query’s cost:

refresh_time ≈ base_query_time + (index_rebuild_time × number_of_indexes)

If the base query takes 30 seconds on the source data and you have 2 indexes requiring rebuild, full refresh takes roughly 30 + (2 × 5) = 40 seconds. For incremental refresh (REFRESH CONCURRENTLY), the time depends on the delta — typically much faster, proportional to changed rows rather than total rows.

For a 500M-row fact table with daily aggregation materialized view: full refresh might take 5-15 minutes. Incremental refresh of only changed partitions takes seconds. This is why partition-aware materialized views — where the MV is partitioned the same way as the source — enable fast incremental refreshes.

Observability Hooks: Monitoring Materialized View Health

PostgreSQL exposes materialized view metadata through system catalog queries.

-- When was each materialized view last refreshed?
SELECT
    matviewname,
    schemaname,
    last_refresh,
    ROUND(EXTRACT(EPOCH FROM (now() - last_refresh)) / 60, 1) AS minutes_since_refresh
FROM pg_matviews
WHERE schemaname = 'public'
ORDER BY last_refresh;

-- Monitor relation sizes (including materialized views)
SELECT
    relname,
    relkind,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_catalog.pg_statio_user_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(relid) DESC;

Key alerts to configure:

  • Staleness alert: minutes_since_refresh > your_staleness_threshold — if your business requires daily reports at 8am, alert if the MV has not refreshed since yesterday.
  • Size anomaly: if MV size grows unexpectedly (suddenly 10× larger), check for unbounded aggregation or missing WHERE clause in the view definition.
  • Refresh duration: track refresh time over time — if REFRESH CONCURRENTLY starts taking 10× longer, the underlying data distribution has likely changed.
-- Track refresh duration history
CREATE TABLE mv_refresh_history (
    mv_name TEXT,
    refresh_started TIMESTAMPTZ,
    refresh_duration_secs NUMERIC,
    rows_affected BIGINT
);

-- After each refresh:
-- INSERT INTO mv_refresh_history VALUES ('sales_by_month', now(), extract_secs, row_count);

Real-World Case Study: Shopify’s Materialized Views for Analytics

Shopify runs analytics over billions of transactions across their merchant base. Their data warehouse team uses materialized views extensively to precompute common aggregation patterns — daily revenue by merchant, order counts by status, refund rates by product category.

The challenge: merchant-facing analytics need to feel real-time, but recomputing aggregations over billions of rows on every query is impractical. Materialized views bridge this gap by giving sub-second query response times on precomputed results.

Their implementation pattern: partition the source fact table by day, create materialized views that aggregate at the day level, and use REFRESH CONCURRENTLY with partition-level granularity. When new data loads for day D, only the partition containing day D changes — the materialized view refresh reads only affected partitions, not the entire fact table.

The lesson: materialized views work best when source data is partitioned and the aggregation granularity aligns with partition boundaries. Without partition alignment, full refreshes become prohibitively expensive and defeat the purpose of precomputation.

Interview Questions

Q: What is the difference between REFRESH MATERIALIZED VIEW and REFRESH CONCURRENTLY?

REFRESH MATERIALIZED VIEW takes an exclusive lock on the MV — reads fail while refresh runs. For large MVs, this lock duration is unacceptable. REFRESH CONCURRENTLY uses a unique index (required) and refreshes without blocking reads — it builds a new version of the MV, swaps when ready, and uses CREATE INDEX CONCURRENTLY semantics. The tradeoff: CONCURRENTLY takes slightly longer than a blocking refresh because it builds indexes one at a time and must ensure no duplicate keys. It also requires the MV to have a unique index.

Q: How do you keep a materialized view from growing unbounded?

The MV query definition determines its size. If you aggregate by day with no upper bound, 10 years of data makes the MV large and slow to refresh. Add a WHERE clause to limit the time window: WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'. Partition the MV the same way you partition source tables so old partitions can be detached and dropped. Alternatively, use a rolling window materialized view — a scheduled job drops old partitions and creates new ones daily.

Q: A materialized view refresh is causing replication lag on your read replica. What do you do?

Refresh on the primary triggers replication of the new data to replicas. A full refresh on a large MV can generate gigabytes of WAL that replicas must apply, causing lag. The fix: run REFRESH CONCURRENTLY on the replica instead of the primary if replicas serve read traffic. Alternatively, reduce refresh frequency — if the MV is refreshing every minute but the source data only changes hourly, you are paying replication cost for no benefit.

Q: When would you choose a materialized view over denormalization managed in application code?

Materialized views let the database handle refresh logic — you define the query once, the database manages when and how to update results. Application-level denormalization requires your code to know about and maintain every denormalized copy. Choose materialized views when the database can manage the refresh efficiently (incremental, partition-aligned, not too frequent). Choose application denormalization when the denormalized data structure does not map cleanly to a SQL query — for example, precomputing a specific social graph adjacency list that requires graph traversal logic rather than a standard aggregation.


Conclusion

Materialized views precompute expensive queries and store results as tables. They trade freshness for read performance. PostgreSQL and Oracle both support them—Oracle has more sophisticated refresh options.

Use materialized views for aggregations and joins queried frequently but changing slowly. Avoid them when data changes often or query patterns are too varied to precompute effectively.


Category

Related Posts

Denormalization

When to intentionally duplicate data for read performance. Tradeoffs with normalization, update anomalies, and application-level denormalization strategies.

#database #denormalization #performance

Connection Pooling: HikariCP, pgBouncer, and ProxySQL

Learn connection pool sizing, HikariCP, pgBouncer, and ProxySQL, timeout settings, idle management, and when pooling helps or hurts performance.

#database #connection-pooling #performance

Database Monitoring: Metrics, Tools, and Alerting

Keep your PostgreSQL database healthy with comprehensive monitoring. This guide covers query latency, connection usage, disk I/O, cache hit ratios, and alerting with pg_stat_statements and Prometheus.

#database #monitoring #observability