Kimball Dimensional Modeling: Star Schema Techniques

Learn Kimball dimensional modeling techniques for building efficient star schema data warehouses with fact and dimension tables.

published: reading time: 10 min read

Kimball Dimensional Modeling: The Foundation of Data Warehousing

Dimensional modeling structures data around business processes so that analysts can actually use it. The problem with normalized schemas is that they model how data is stored, not how people query it. If you have ever written a SQL query joining five tables just to get a simple sales report, you know exactly what I mean.

Ralph Kimball developed this approach in the 1980s, and it is still the dominant methodology for data warehousing. The core idea is simple: model data for how people query it.

Fact and Dimension Tables

Every dimensional model centers on two types of tables: facts and dimensions.

A fact table stores the measurable, quantitative metrics of a business process. Sales amounts, order counts, page views, sensor readings. Facts are typically numeric and additive. A single row represents an event at a specific point in space and time.

A dimension table stores the context around those facts. Who placed the order? What product was purchased? When did the transaction occur? Dimensions are the “who, what, when, where, why” that give facts their meaning.

flowchart LR
    subgraph Fact[Fact Table<br/>fact_sales]
        F1[sale_id]
        F2[product_id]
        F3[customer_id]
        F4[date_id]
        F5[quantity_sold]
        F6[total_amount]
    end
    subgraph DimProduct[dim_product]
        D1[product_id]
        D2[product_name]
        D3[category]
        D4[brand]
    end
    subgraph DimCustomer[dim_customer]
        C1[customer_id]
        C2[customer_name]
        C3[region]
        C4[segment]
    end
    subgraph DimDate[dim_date]
        T1[date_id]
        T2[year]
        T3[month]
        T4[day_of_week]
    end
    Fact -->|product_id| DimProduct
    Fact -->|customer_id| DimCustomer
    Fact -->|date_id| DimDate

This structure is called a star schema because the fact table sits at the center with dimension tables radiating outward like points of a star.

-- Fact table: one row per sales transaction
CREATE TABLE fact_sales (
    sale_id BIGINT PRIMARY KEY,
    product_id INT,
    customer_id INT,
    store_id INT,
    date_id INT,
    quantity_sold INT,
    unit_price DECIMAL(10,2),
    total_amount DECIMAL(12,2),
    discount_amount DECIMAL(10,2)
);

-- Dimension table: product information
CREATE TABLE dim_product (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(200),
    category VARCHAR(100),
    subcategory VARCHAR(100),
    brand VARCHAR(100),
    unit_cost DECIMAL(10,2),
    is_active BOOLEAN
);

The Star Schema

The star schema is the simplest dimensional structure. A single fact table at the center, joined directly to each dimension table. No Snowflake normalization, no multiple layers of joins.

The query simplicity is the main advantage. A business analyst can write a query joining sales to products, customers, and time without understanding complex relational theory.

SELECT
    dp.category,
    dd.year,
    dd.month,
    SUM(fs.total_amount) AS total_revenue,
    COUNT(DISTINCT fs.customer_id) AS unique_customers
FROM fact_sales fs
JOIN dim_product dp ON fs.product_id = dp.product_id
JOIN dim_date dd ON fs.date_id = dd.date_id
WHERE dd.year = 2025
GROUP BY dp.category, dd.year, dd.month
ORDER BY total_revenue DESC;

Single join paths mean predictable performance. The query optimizer can traverse the star efficiently because there is only one path between any two tables.

Snowflake Schema

The Snowflake schema normalizes dimension tables into hierarchical structures. A product dimension might split into product, category, and subcategory tables.

-- Snowflake: category separated into its own table
CREATE TABLE dim_category (
    category_id INT PRIMARY KEY,
    category_name VARCHAR(100),
    department VARCHAR(100)
);

CREATE TABLE dim_product (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(200),
    category_id INT REFERENCES dim_category(category_id),
    brand VARCHAR(100)
);

Snowflake reduces data redundancy and can save storage space. The cost is additional join complexity. Query authors now navigate more tables, and the optimizer works harder.

Most data warehouses stick with star schemas. The minor storage cost of denormalized dimensions buys meaningful query simplicity.

Star Schema Trade-Offs

AspectStar SchemaSnowflake SchemaData VaultOne Big Table
Query simplicityHigh (single join per dimension)Medium (multi-level joins)Low (hub/link structure)High (no joins)
Storage efficiencyLow (redundant dimension data)High (normalized)Medium (hash-based)Low (maximum redundancy)
Load complexityLow (simple inserts)Medium (referential integrity)High (hash keys, satellites)Low (flat inserts)
Change managementHard (SCD handling in denorm)Easier (normalized updates)Easy (satellite versioning)Hardest (update anomalies)
Analytic flexibilityHighMediumMediumLow
Team size fitSmall teamsSmall teamsLarge enterprise teamsAny (simplest)
Governance overheadLowLowHigh (automation needed)Very low

A conformed dimension is a dimension used by multiple fact tables in a consistent way. The classic example is a date dimension. Every fact table in your warehouse uses the same date dimension with the same primary keys and attributes.

This consistency enables drill-across reporting. You can write a query that combines sales data with inventory data because both use the same customer and product dimensions. They speak the same language.

Building conformed dimensions requires upfront coordination. Different teams must agree on what a “customer” means, what the primary key will be, and which attributes to include. This governance work pays dividends across the organization.

Slowly Changing Dimensions

Real-world data changes. Customer addresses change, products get rebranded, employees transfer departments. Dimension tables must track these changes in a way that preserves analytical value.

The Slowly Changing Dimension (SCD) concept categorizes how you handle these changes.

Type 1: Overwrite

The old value disappears and the new value takes its place. Historical facts still point to the dimension row, but the attribute value is now current.

This approach is simple but destroys history. If you overwrite a customer’s region, you can no longer analyze historical sales by the old region.

-- Type 1: Simple update, no history
UPDATE dim_customer
SET region = 'Northeast'
WHERE customer_id = 12345;

Type 2: Add New Row

When an attribute changes, insert a new row with the new values and a new effective date range. The old row remains with its historical dates.

INSERT INTO dim_customer (
    customer_id, customer_name, region, effective_date, expiry_date, is_current
) VALUES (
    12345, 'John Doe', 'Northeast', '2026-03-27', '9999-12-31', TRUE
);

UPDATE dim_customer
SET expiry_date = '2026-03-26', is_current = FALSE
WHERE customer_id = 12345 AND is_current = TRUE;

Type 2 preserves complete history. You can analyze what a customer’s region was at any point in time by finding the row where the fact date falls between effective_date and expiry_date.

Type 3: Add New Attribute

Keep both old and new values as separate columns in the same row.

ALTER TABLE dim_customer ADD COLUMN previous_region VARCHAR(100);
UPDATE dim_customer SET previous_region = 'Southeast' WHERE customer_id = 12345;
UPDATE dim_customer SET region = 'Northeast' WHERE customer_id = 12345;

Type 3 lets you analyze both historical and current values, but only for the most recent change. Older history requires Type 2.

Most analytical use cases need Type 2 behavior. The storage cost of row versioning is usually acceptable given the analytical value.

Degenerate Dimensions

Sometimes the fact table contains attributes that are not foreign keys to dimension tables. An order number, invoice number, or transaction ID might live directly in the fact table.

These degenerate dimensions are typically low-cardinality values that are useful for filtering or grouping but do not join to a dimension.

CREATE TABLE fact_sales (
    sale_id BIGINT PRIMARY KEY,
    order_number VARCHAR(20),  -- Degenerate dimension
    product_id INT,
    date_id INT,
    quantity_sold INT,
    total_amount DECIMAL(12,2)
);

Order numbers are useful for tracking all items in a specific order without requiring a separate dimension table.

Fact Table Types

Different business processes produce different fact table types.

Transaction Fact Tables

One row per discrete transaction event. A sale, a click, a payment. The grain is the individual event.

Transaction fact tables are the most common type. They are straightforward to understand and load.

Periodic Snapshot Fact Tables

One row per dimension combination per time period. Monthly sales by product, daily inventory levels.

Snapshot tables make it easy to see the state at a point in time without scanning all historical transactions.

Accumulating Snapshot Fact Tables

One row per lifecycle instance (like an order) that gets updated as the lifecycle progresses.

The same row tracks an order from creation through fulfillment to delivery. Each milestone updates the same row.

CREATE TABLE fact_order_progress (
    order_id BIGINT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    order_date_id INT,
    ship_date_id INT,
    delivery_date_id INT,
    days_to_ship INT,
    days_to_deliver INT
);

Accumulating snapshots are powerful for pipeline analysis but require careful handling of late-arriving data.

Junk Dimensions

When you have many small, unrelated flags and indicators, you can consolidate them into a junk dimension.

Instead of separate columns for promo_ind, gift_wrap_ind, and loyalty_ind, you create one dimension with all combinations.

CREATE TABLE dim_flags (
    flag_key INT PRIMARY KEY,
    promo_ind CHAR(1),
    gift_wrap_ind CHAR(1),
    loyalty_ind CHAR(1)
);

The fact table stores a single flag_key. The junk dimension maps that key to the combination of flags. This keeps the fact table narrow while preserving the flag information.

Bridge Tables for Hierarchies

When a dimension has a parent-child hierarchy (org chart, product category tree), a bridge table handles the many-to-many relationships.

CREATE TABLE dim_product_category (
    category_id INT PRIMARY KEY,
    category_name VARCHAR(100),
    parent_category_id INT REFERENCES dim_product_category(category_id)
);

-- Bridge table for products in multiple categories
CREATE TABLE bridge_product_category (
    product_id INT,
    category_id INT,
    PRIMARY KEY (product_id, category_id)
);

This allows queries to find products by any level of the category hierarchy.

Kimball Dimensional Modeling Anti-Patterns

Premature dimension normalization (snowflaking without reason). Adding category and subcategory tables because “normalization is good” without considering query patterns. The additional joins slow down queries and complicate the model for analysts who now need to understand multi-level joins. Only snowflake when the storage savings genuinely matter and query patterns justify the complexity.

Conformed dimension drift. Two teams build their own date dimensions with different primary keys. The “drill across” report requires a messy UNION instead of a simple JOIN. This happens when dimension governance is an afterthought. Establish conformed dimensions before fact table proliferation.

Type 2 explosion on high-change attributes. Applying SCD Type 2 to an attribute like “order_status” that changes frequently creates dimension table bloat. A customer order that goes through 12 status transitions produces 12 dimension rows. Use Type 1 for high-frequency attributes and reserve Type 2 for slowly changing attributes like address or segment.

Bridge table overuse. Adding bridge tables for every many-to-many relationship, including ones that rarely get queried. Bridge tables complicate joins for all analysts even when only a few use cases actually need hierarchy traversal. Only add bridges when the hierarchy query is frequent and meaningful.

Kimball Dimensional Modeling Quick Recap

  • Fact tables store measurable events; dimension tables store the context (who, what, when, where, why).
  • Star schema: single join path per dimension gives predictable performance and analyst accessibility.
  • Conformed dimensions enable drill-across reporting across multiple fact tables.
  • SCD Type 1 overwrites history; Type 2 adds rows preserving full history; Type 3 stores both old and new as columns.
  • Only snowflake when storage is a genuine constraint—query simplicity usually wins.
  • Bridge tables handle hierarchies; junk dimensions consolidate low-cardinality flags.
  • Type 2 works for slowly changing attributes; applying it to high-frequency attributes causes dimension explosion.

For related reading on data modeling approaches, see the Data Vault and One Big Table patterns that offer alternative approaches to the same problem space.

Category

Related Posts

Data Vault: Scalable Enterprise Data Modeling

Learn Data Vault modeling methodology for building auditable, scalable enterprise data warehouses with hash keys and satellite tables.

#data-engineering #data-modeling #data-vault

One Big Table: The Denormalized Approach to Data Modeling

Learn how One Big Table architecture simplifies data pipelines by combining all attributes into single wide denormalized tables.

#data-engineering #data-modeling #one-big-table

Slowly Changing Dimensions: History in Data Warehouses

Master Slowly Changing Dimension techniques including Type 1, Type 2, and Type 3 for maintaining historical accuracy in data warehouses.

#data-engineering #data-modeling #slowly-changing-dimensions