One Big Table: The Denormalized Approach to Data Modeling
Learn how One Big Table architecture simplifies data pipelines by combining all attributes into single wide denormalized tables.
Every data modeling methodology has its advocates. Kimball dimensional modeling gives you stars. Data Vault gives you hubs and satellites. One Big Table gives you one table to rule them all.
OBT is exactly what it sounds like. Instead of splitting data into normalized fact and dimension tables, you combine everything into single wide tables, typically one per source entity. A customer table contains every attribute about a customer, denormalized from all source systems. An order table contains order data joined with customer attributes, product attributes, and store attributes.
This approach fell out of favor in traditional data warehousing circles, but it has made a comeback in modern data platforms, particularly those built on columnar storage and cloud data warehouses.
The Core Philosophy
The rationale behind OBT is pragmatic. Complexity has a cost. Every join requires understanding relationships, maintaining foreign keys, and debugging null values. Every schema change ripples across multiple tables.
OBT trades storage space and some data redundancy for query simplicity. When you want to analyze orders, you query the order table. All the context you need is already there. No joins required.
-- OBT style: one query, no joins
SELECT
    order_date,
    customer_name,
    customer_region,
    product_name,
    product_category,
    store_city,
    quantity,
    unit_price,
    total_amount
FROM rpt_orders
WHERE order_date >= '2026-01-01';
Compare to the equivalent star schema query joining fact_sales to dim_customer, dim_product, dim_store, and dim_date. The OBT query is simpler to write and understand.
Why OBT Makes Sense in Modern Data Warehouses
Cloud data warehouses like BigQuery, Snowflake, and Redshift are built for columnar storage and massively parallel processing. In these environments, denormalized tables are not the performance problem they once were.
Columnar storage compresses repeated values efficiently. If every row in your order table has the same customer_region values repeated, those compress well. Modern query engines can scan billions of rows quickly when the data is stored in a single location.
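The compression benefit is easy to see with a small experiment. The sketch below uses Python's zlib as a rough stand-in for warehouse compression (real columnar engines use run-length and dictionary encoding, which exploit repetition even more aggressively); the column contents are illustrative:

```python
import zlib

# A denormalized column: the same few region values repeated across many rows,
# roughly what a columnar engine sees when customer_region is copied onto every order.
repeated = ("Northeast," * 250_000 + "West," * 250_000).encode()

# A high-cardinality column of the same raw size: unique order numbers.
unique = ",".join(f"ORD-{i:07d}" for i in range(500_000)).encode()[:len(repeated)]

ratio_repeated = len(zlib.compress(repeated)) / len(repeated)
ratio_unique = len(zlib.compress(unique)) / len(unique)

print(f"repeated column compresses to {ratio_repeated:.2%} of original size")
print(f"unique column compresses to {ratio_unique:.2%} of original size")
```

The repeated column shrinks to a small fraction of a percent, while the unique column compresses far less, which is why the storage cost of denormalized repetition is smaller than row-oriented intuition suggests.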
The operational simplicity is significant. One table to manage, one schema to understand, one location to query. When a new analyst joins the team, they can understand the data model without a diagram of twenty related tables.
Structure of an OBT
An OBT report table typically combines elements that would be spread across fact and dimension tables in a star schema.
CREATE TABLE rpt_orders (
    -- Order identifiers
    order_id BIGINT PRIMARY KEY,
    order_number VARCHAR(20),
    order_date DATE,
    order_datetime TIMESTAMP,
    -- Customer attributes (denormalized)
    customer_id INT,
    customer_name VARCHAR(200),
    customer_email VARCHAR(200),
    customer_region VARCHAR(100),
    customer_segment VARCHAR(50),
    customer_tier VARCHAR(20),
    -- Product attributes (denormalized)
    product_id INT,
    product_name VARCHAR(200),
    product_category VARCHAR(100),
    product_subcategory VARCHAR(100),
    product_brand VARCHAR(100),
    -- Store attributes (denormalized)
    store_id INT,
    store_name VARCHAR(200),
    store_city VARCHAR(100),
    store_region VARCHAR(100),
    store_format VARCHAR(50),
    -- Order metrics
    quantity_ordered INT,
    unit_price DECIMAL(10,2),
    discount_percent DECIMAL(5,2),
    discount_amount DECIMAL(10,2),
    total_amount DECIMAL(12,2),
    cost_of_goods DECIMAL(12,2),
    -- Order status
    order_status VARCHAR(50),
    is_returned BOOLEAN,
    -- Processing dates
    shipped_date DATE,
    delivered_date DATE,
    days_to_ship INT,
    days_to_deliver INT,
    -- Audit columns
    load_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    source_system VARCHAR(50)
);
Every attribute you need for order analysis is in one table. The analyst does not need to know which attributes come from which source system or how to join them.
Loading OBT Tables
The load process for OBT tables requires joining source data at load time.
def load_rpt_orders(source_orders, source_customers, source_products, source_stores):
    """Denormalize order data from multiple sources."""
    query = """
        SELECT
            o.order_id,
            o.order_number,
            o.order_date,
            o.quantity,
            o.unit_price,
            o.discount,
            o.total_amount,
            c.customer_id,
            c.customer_name,
            c.customer_region,
            p.product_id,
            p.product_name,
            p.category,
            s.store_id,
            s.store_city
        FROM staging_orders o
        LEFT JOIN staging_customers c ON o.customer_id = c.customer_id
        LEFT JOIN staging_products p ON o.product_id = p.product_id
        LEFT JOIN staging_stores s ON o.store_id = s.store_id
    """
    # Execute query and load into OBT
    result = execute_query(query)
    insert_into_rpt_orders(result)
The denormalization happens in the ETL layer. The reporting layer sees only flat, joined data.
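The end-to-end mechanics can be shown in a self-contained sketch. Here sqlite3 stands in for the warehouse, and the table and column names follow the examples above but are trimmed down; `execute_query` and `insert_into_rpt_orders` in the snippet above are assumed helpers, replaced here by direct calls:

```python
import sqlite3

# Minimal end-to-end sketch of a denormalizing load, using an in-memory
# SQLite database in place of a real warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (order_id INTEGER, customer_id INTEGER,
                                 product_id INTEGER, total_amount REAL);
    CREATE TABLE staging_customers (customer_id INTEGER, customer_name TEXT,
                                    customer_region TEXT);
    CREATE TABLE staging_products (product_id INTEGER, product_name TEXT);
    CREATE TABLE rpt_orders (order_id INTEGER, customer_name TEXT,
                             customer_region TEXT, product_name TEXT,
                             total_amount REAL);

    INSERT INTO staging_orders VALUES (1, 10, 100, 25.0), (2, 10, 101, 40.0);
    INSERT INTO staging_customers VALUES (10, 'Acme Corp', 'Northeast');
    INSERT INTO staging_products VALUES (100, 'Widget'), (101, 'Gadget');
""")

# The denormalization happens at load time: one INSERT ... SELECT with LEFT JOINs.
conn.execute("""
    INSERT INTO rpt_orders
    SELECT o.order_id, c.customer_name, c.customer_region,
           p.product_name, o.total_amount
    FROM staging_orders o
    LEFT JOIN staging_customers c ON o.customer_id = c.customer_id
    LEFT JOIN staging_products p ON o.product_id = p.product_id
""")

# The reporting layer now queries flat rows with no joins; note that the
# customer's region is repeated on each of their orders.
rows = conn.execute(
    "SELECT customer_region, total_amount FROM rpt_orders ORDER BY order_id"
).fetchall()
print(rows)
```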
Handling Changes with OBT
One trade-off of OBT is handling slowly changing dimensions. In a star schema, SCD strategies are explicit. In OBT, you need a different approach.
Approach 1: Latest Value Only
For attributes where historical accuracy is not critical, you overwrite with the latest values.
UPDATE rpt_orders
SET customer_tier = 'Gold',
    customer_region = 'Northeast'
WHERE customer_id = 12345;
This works when you only care about current state and do not need to analyze historical values.
Approach 2: Effective Date Columns
For attributes that need historical tracking, include effective date columns in the table.
ALTER TABLE rpt_orders ADD COLUMN customer_tier_effective_date DATE;
ALTER TABLE rpt_orders ADD COLUMN customer_tier_expiry_date DATE;
The reporting query filters to the appropriate effective date range.
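The mechanics of that filter are easiest to see in isolation. The sketch below demonstrates the effective/expiry date range lookup on a simplified standalone table (the table and helper names are illustrative; in the OBT itself the same BETWEEN predicate applies to the columns added above). A far-future expiry date like '9999-12-31' conventionally marks the current row:

```python
import sqlite3

# A customer was Silver until 2025-06-30, then Gold. Point-in-time queries
# pick the row whose effective/expiry range covers the date of interest.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_tier_history (
        customer_id INTEGER,
        customer_tier TEXT,
        effective_date TEXT,
        expiry_date TEXT   -- '9999-12-31' marks the current row
    );
    INSERT INTO customer_tier_history VALUES
        (12345, 'Silver', '2024-01-01', '2025-06-30'),
        (12345, 'Gold',   '2025-07-01', '9999-12-31');
""")

def tier_on(customer_id, as_of_date):
    """Return the tier that was in effect for a customer on a given date."""
    row = conn.execute("""
        SELECT customer_tier FROM customer_tier_history
        WHERE customer_id = ?
          AND ? BETWEEN effective_date AND expiry_date
    """, (customer_id, as_of_date)).fetchone()
    return row[0] if row else None

print(tier_on(12345, '2025-03-15'))  # Silver: the tier at the time
print(tier_on(12345, '2026-01-10'))  # Gold: the current tier
```

ISO-formatted date strings compare correctly as text, which is what makes the BETWEEN predicate work here.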
Approach 3: Separate History Table
Keep a full history in a separate table and load the latest state into the OBT.
-- History table: complete audit trail
CREATE TABLE hist_customer_changes (
    customer_id INT,
    attribute_name VARCHAR(50),
    old_value VARCHAR(200),
    new_value VARCHAR(200),
    change_date TIMESTAMP
);

-- OBT: latest values only
CREATE TABLE rpt_customer (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(200),
    customer_tier VARCHAR(20),
    last_updated TIMESTAMP
);
This separates the concerns of historical analysis and current state reporting.
Advantages of One Big Table
The appeal of OBT is straightforward:
Query simplicity. Analysts write single-table queries without joins. This reduces errors and speeds up ad-hoc analysis.
Faster development. New reports do not require understanding a complex schema of related tables. If the data is in the table, you query it.
Easier onboarding. New team members can be productive faster when there is only one table to understand.
Debugging is simpler. When data looks wrong, you check one table instead of tracing through multiple joins.
Cloud-native performance. Columnar cloud databases handle wide denormalized tables efficiently. Compression makes storage reasonable, and MPP execution makes queries fast.
Disadvantages and Trade-offs
OBT is not without its critics, and their objections point to real costs:
Data redundancy. The same customer attributes appear on every order that customer places. Storage costs increase compared to normalized designs.
Update anomalies. When a customer moves to a new region, you must update every row for that customer. Miss one and you have inconsistent data.
Unclear ownership. In a star schema, dimension tables have clear owners. In OBT, who owns the customer attributes on the order table?
Loss of relationship information. OBT hides the relationships between entities. You cannot easily see what other orders a customer has placed without querying the orders table.
Difficulty in change management. Schema changes to a wide table are more impactful. Adding a column affects the entire table.
OBT in the Modern Data Stack
OBT has found renewed popularity with the rise of dbt and modern transformation layers.
In a dbt project, you might model your data as OBT:
-- models/reporting/rpt_orders.sql
{{ config(materialized='table') }}

SELECT
    o.order_id,
    o.order_date,
    c.customer_name,
    c.customer_region,
    p.product_name,
    p.product_category,
    s.store_city,
    o.quantity,
    o.total_amount
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.customer_id
LEFT JOIN {{ ref('stg_products') }} p ON o.product_id = p.product_id
LEFT JOIN {{ ref('stg_stores') }} s ON o.store_id = s.store_id
The staging layer maintains normalized data. The reporting layer denormalizes into OBT tables. You get the best of both worlds: clean source data and simple reporting queries.
This layered approach addresses the update anomaly problem. The staging layer is the system of record. The OBT layer is a derived reporting view.
flowchart TB
subgraph "Star Schema"
F[fact_sales<br/>sale_id, product_id, customer_id]
D1[dim_product<br/>product_id, name, category]
D2[dim_customer<br/>customer_id, name, region]
D3[dim_date<br/>date_id, year, month]
F --> D1
F --> D2
F --> D3
end
subgraph "One Big Table"
OBT[rpt_orders<br/>order_id, customer_name, customer_region, product_name, product_category, store_city, quantity, total_amount]
end
When OBT Works Well
OBT is a good fit when:
Query patterns are predictable. If users mostly query order data with customer and product context, OBT serves those queries well.
Analytics latency is acceptable. Denormalized tables are refreshed on a schedule. Real-time updates are more complex.
Storage costs are not prohibitive. Cloud storage is cheap, but denormalization multiplies storage requirements.
Team has strong data governance. Without the structural constraints of a star schema, data quality depends more on governance processes.
For organizations that prioritize query simplicity and fast development cycles over storage efficiency, OBT is a reasonable choice.
One Big Table Production Failure Scenarios
Update anomaly causing inconsistent customer region
A customer moves from “West” to “Northeast.” The ETL job runs an UPDATE on the rpt_orders table but only processes 99% of the customer’s rows—the job fails mid-way through. The customer now has “West” on older orders and “Northeast” on newer ones. Revenue reports by region are inconsistent until the next full refresh.
Mitigation: Use staging tables as the system of record. OBT is always derived from staging via a complete refresh or a MERGE operation, not incremental updates. If incremental updates are unavoidable, wrap the entire update in a transaction and monitor affected row counts.
SCD handling producing wrong historical totals
The team decides to track customer tier changes in rpt_orders using latest-value-only (Type 1). An analyst queries historical revenue by customer tier for the past year. The result shows all historical orders attributed to the customer’s current tier, not the tier they had at the time of purchase. Marketing sees inflated Gold customer revenue that never actually existed.
Mitigation: Be explicit about SCD limitations in documentation. For historical analysis, use effective date columns (Type 2 equivalent) or a separate history table. Do not let business users assume OBT supports historical SCD when it uses Type 1.
Schema change requiring full table rewrite
A new requirement adds 15 new columns to rpt_orders. The ALTER TABLE runs for 4 hours on a 500GB table, during which the table is locked or significantly degraded. BI tools time out. Dashboard users see errors.
Mitigation: Add columns as a separate operation from data population. Add the column, backfill data in batches during low-traffic windows, then make the column visible to users. For very large tables, use a shadow-table approach: create the new table, switch reads to it atomically, then drop the old table.
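The shadow-table approach can be sketched concretely. Here sqlite3 stands in for the warehouse and the names are illustrative; a real warehouse would use its own atomic rename or swap facility, and the backfill would run in batches rather than via a column default:

```python
import sqlite3

# Shadow-table schema change: build the new, wider table alongside the old
# one, copy the data, then swap the names so readers always see a complete table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rpt_orders (order_id INTEGER, total_amount REAL);
    INSERT INTO rpt_orders VALUES (1, 25.0), (2, 40.0);
""")

conn.executescript("""
    -- Shadow table with the new schema; the new column gets a placeholder
    -- default and would be backfilled in batches afterwards.
    CREATE TABLE rpt_orders_new (
        order_id INTEGER,
        total_amount REAL,
        customer_tier TEXT DEFAULT 'Unknown'
    );
    INSERT INTO rpt_orders_new (order_id, total_amount)
        SELECT order_id, total_amount FROM rpt_orders;

    -- The swap: queries against rpt_orders never see a half-built table.
    ALTER TABLE rpt_orders RENAME TO rpt_orders_old;
    ALTER TABLE rpt_orders_new RENAME TO rpt_orders;
    DROP TABLE rpt_orders_old;
""")

rows = conn.execute(
    "SELECT order_id, total_amount, customer_tier FROM rpt_orders ORDER BY order_id"
).fetchall()
print(rows)
```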
Load job duplication inflating order counts
The ETL load is not idempotent. A rerun of yesterday’s load inserts duplicate rows into rpt_orders instead of replacing them. Every subsequent aggregation is double-counted. The issue goes undetected for three days until an analyst notices revenue is exactly 2x expected.
Mitigation: Always design OBT loads to be idempotent. Use a truncate-and-reload pattern for full refresh, or use a merge (UPSERT) with proper key handling. Add row count validation checks after every load: compare count and sum(amount) against source systems.
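An idempotent merge-style load with post-load validation might look like the sketch below; sqlite3 stands in for the warehouse (its ON CONFLICT upsert plays the role of MERGE, and the `WHERE true` clause is a SQLite parser requirement for upserts fed by a SELECT). Names are illustrative:

```python
import sqlite3

# Idempotent OBT load: an upsert keyed on order_id means rerunning
# yesterday's load replaces rows instead of duplicating them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (order_id INTEGER PRIMARY KEY, total_amount REAL);
    CREATE TABLE rpt_orders (order_id INTEGER PRIMARY KEY, total_amount REAL);
    INSERT INTO staging_orders VALUES (1, 25.0), (2, 40.0);
""")

def load_rpt_orders():
    conn.execute("""
        INSERT INTO rpt_orders (order_id, total_amount)
        SELECT order_id, total_amount FROM staging_orders
        WHERE true
        ON CONFLICT(order_id) DO UPDATE SET total_amount = excluded.total_amount
    """)
    # Post-load validation: row count and amount totals must match the source.
    src = conn.execute("SELECT COUNT(*), SUM(total_amount) FROM staging_orders").fetchone()
    tgt = conn.execute("SELECT COUNT(*), SUM(total_amount) FROM rpt_orders").fetchone()
    assert src == tgt, f"load validation failed: source={src} target={tgt}"

load_rpt_orders()
load_rpt_orders()  # a rerun produces no duplicates and validation still passes

print(conn.execute("SELECT COUNT(*) FROM rpt_orders").fetchone()[0])
```

Had the load been a plain INSERT, the second call would have doubled the row count, which is exactly the 2x-revenue failure described above.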
One Big Table Quick Recap
- OBT combines fact and dimension data into single wide tables for query simplicity—no joins needed.
- Works well on columnar cloud warehouses (BigQuery, Snowflake, Redshift) where denormalization is efficient.
- Trade-offs: update anomalies require careful handling, storage redundancy is unavoidable, schema changes are expensive.
- SCD handling is limited: latest-value-only destroys history; add effective date columns for point-in-time correctness.
- Layer OBT on top of normalized staging tables—never use OBT as the system of record for source data.
- Validate row counts and sums after every load; design all loads to be idempotent.
Compare this to Kimball Dimensional Modeling for structured star schemas or Data Vault for enterprise-scale auditability.