Schema Design: Building the Foundation of Your Database

Learn how to design effective database schemas with proper data types, constraints, and relationships that scale with your application.

published: March 26, 2026 reading time: 12 min read

Schema Design: Building the Foundation of Your Database

A well-designed schema makes everything easier. Queries are fast, your application grows without constant reworking, and you don’t spend weekends fixing data that got into a weird state because nobody enforced the rules at the database level. The payoff for thinking carefully upfront is real.

This guide covers the core concepts: tables, columns, data types, constraints, and how relationships between tables work. I’ll assume you know what a database is and have written basic SQL before.

Tables: The Core Building Blocks

Tables are where your data lives. Each one represents a type of thing—users, orders, products—and every row is one actual instance of that thing. Columns hold attributes.

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    full_name VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

When you’re deciding what goes in a table, think about real entities and what describes them. Orders don’t belong in a users table. They go in their own table, linked by a relationship.

Naming Conventions

Consistent names matter more than clever ones:

Table names should be singular (user not users) — the table describes a type
Use snake_case (first_name not firstName)
Be specific (subscription_status not status)
Foreign keys should make the reference obvious (orders.customer_id)

Choosing Data Types

The type you pick affects storage, speed, and what errors you can catch. It’s worth thinking about this.

Common Data Types

Type	Use Case	Considerations
INTEGER / BIGINT	Whole numbers	BIGINT for values over 2 billion
DECIMAL / NUMERIC	Exact decimals (money)	Specify precision and scale
VARCHAR(n)	Variable text	Set a max length
TEXT	Long-form text	Use when length varies significantly
BOOLEAN	True/false	More storage-efficient than INT
DATE / TIMESTAMP	Dates and times	TIMESTAMP includes time; DATE is just the day
JSONB	Semi-structured data	PostgreSQL’s binary JSON, supports indexing
UUID	Unique IDs	Works across distributed systems

VARCHAR Without Length: Specify It

Always set a length for VARCHAR unless you have a reason not to. VARCHAR(255) works for most names. VARCHAR(1000) suits longer descriptions. Without a length, some databases allocate maximum space per row, which wastes storage.

-- Good: length is specified
email VARCHAR(255)

-- Avoid: no length constraint unless you specifically want TEXT
description TEXT  -- just use TEXT directly instead

Constraints: Rules That Enforce Themselves

Constraints are how you make the database enforce your rules. Without them, bad data gets in and causes problems later—sometimes silently, which is worse.

Primary Keys

Every table needs one. A primary key uniquely identifies each row. It’s how you reliably reference a specific record from other tables.

CREATE TABLE products (
    id BIGINT PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(200) NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);

NOT NULL

If a column must have a value, say so with NOT NULL. If you forget, NULLs sneak in, and your application has to deal with missing data everywhere.

-- Required
email VARCHAR(255) NOT NULL,

-- Optional
middle_name VARCHAR(100) NULL

UNIQUE

Unique constraints prevent duplicates. A table can have multiple UNIQUE constraints, unlike primary keys.

-- One email per user
email VARCHAR(255) NOT NULL UNIQUE,

-- One badge per employee
badge_number VARCHAR(20) UNIQUE

CHECK

CHECK constraints validate conditions before allowing an insert. This catches business logic errors at the database gate.

price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),

status VARCHAR(20) NOT NULL CHECK (status IN ('active', 'inactive', 'suspended')),

percentage DECIMAL(5, 2) NOT NULL CHECK (percentage >= 0 AND percentage <= 100)

DEFAULT Values

Defaults handle cases where no value is provided. They work well for timestamps, booleans, and status fields.

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

is_active BOOLEAN DEFAULT true,

status VARCHAR(20) DEFAULT 'pending'

Relationships Between Tables

Relationships let you connect data across tables. The three types you’ll encounter most are one-to-one, one-to-many, and many-to-many.

erDiagram
    CUSTOMERS ||--o{ ORDERS : places
    ORDERS ||--|{ ORDER_ITEMS : contains
    PRODUCTS ||--o{ ORDER_ITEMS : "ordered in"
    CUSTOMERS {
        int customer_id PK
        varchar email
        varchar name
    }
    ORDERS {
        int order_id PK
        int customer_id FK
        decimal total
        timestamp created_at
    }
    ORDER_ITEMS {
        int order_id FK
        int product_id FK
        int quantity
        decimal unit_price
    }
    PRODUCTS {
        int product_id PK
        varchar sku
        varchar name
        decimal price
    }

When to Use One-to-One

One-to-one suits optional data that only some rows need. User accounts with optional billing addresses, employees with optional parking assignments. If the optional data is large (long text, many columns) and most rows do not need it, keeping it separate avoids wasting space. If you always query the data together, merging into one table is simpler.

When Not to Use One-to-One

If you find yourself joining the two tables in almost every query, merge them. One-to-one adds query complexity without benefit when the data is always needed together. If the relationship is actually “zero or one” on one side, consider whether that side just needs a nullable foreign key instead.

When to Use One-to-Many

One-to-many fits any parent-child relationship where children belong to exactly one parent. Customers and their orders. Categories and products. Authors and blog posts. The pattern is correct when a child can only have one parent and you frequently query children by parent or parent with children.

When Not to Use One-to-Many

If you need to query children from multiple parents together (like tags on posts where posts can have many tags and tags apply to many posts), you have a many-to-many, not one-to-many. Forgetting this distinction leads to data modeling bugs.

When to Use Many-to-Many

Many-to-many fits when entities associate in arbitrary combinations. Posts and tags. Students and courses. Products and categories. The junction table is not a failure of design — it accurately models the reality that these relationships exist independently of the entities themselves.

When Not to Use Many-to-Many

If the “many” side is actually bounded and small (an order always has exactly 3 line items), consider whether a fixed-array column or separate columns serve you better. Many-to-many queries through junction tables add join overhead. When query performance matters and the combination space is small, denormalizing into arrays or fixed columns eliminates joins.

One-to-One

One row corresponds to exactly one row in another table. A foreign key with a UNIQUE constraint makes this work. Sometimes it makes more sense to just merge the tables—it depends on whether you always use the data together.

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL
);

CREATE TABLE user_profiles (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL UNIQUE REFERENCES users(id),
    bio TEXT,
    avatar_url VARCHAR(500),
    preferred_language VARCHAR(10) DEFAULT 'en'
);

The UNIQUE on user_id prevents a user from having more than one profile.

One-to-Many

The most common relationship. One user can have many orders, but each order has exactly one user.

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    name VARCHAR(200) NOT NULL
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    order_number VARCHAR(50) NOT NULL UNIQUE,
    total DECIMAL(12, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Many-to-Many

Requires a junction table. An order contains many products, and a product can appear in many orders.

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    sku VARCHAR(50) NOT NULL UNIQUE,
    name VARCHAR(200) NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);

CREATE TABLE order_items (
    order_id INTEGER NOT NULL REFERENCES orders(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity INTEGER NOT NULL DEFAULT 1,
    unit_price DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

The composite primary key (order_id, product_id) means each product shows up once per order—prevents duplicates too.

Practical Scenarios

Addresses

Embed all fields in one table if you’ll always query them together. Separate them out if different parts of your app need different addresses (shipping vs billing, for example).

-- Simple but inflexible
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    shipping_street VARCHAR(255),
    shipping_city VARCHAR(100),
    shipping_state VARCHAR(100),
    shipping_postal_code VARCHAR(20),
    shipping_country VARCHAR(100)
);

-- Reusable and flexible
CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    address_type VARCHAR(20) CHECK (address_type IN ('shipping', 'billing')),
    street VARCHAR(255) NOT NULL,
    city VARCHAR(100) NOT NULL,
    state VARCHAR(100),
    postal_code VARCHAR(20) NOT NULL,
    country VARCHAR(100) NOT NULL
);

Categories and Tags

Single category? A simple reference works. Product in multiple categories? You need a junction table.

CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES categories(id)
);

CREATE TABLE product_categories (
    product_id INTEGER REFERENCES products(id),
    category_id INTEGER REFERENCES categories(id),
    PRIMARY KEY (product_id, category_id)
);

Audit Fields

Most tables should track who created and modified records.

CREATE TABLE sensitive_data (
    id SERIAL PRIMARY KEY,
    data_value VARCHAR(500),

    -- Audit trail
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by INTEGER REFERENCES users(id),
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_by INTEGER REFERENCES users(id),

    -- Soft delete
    deleted_at TIMESTAMP,
    deleted_by INTEGER REFERENCES users(id)
);

Normalization Trade-offs at a Glance

Aspect	Normalized (3NF+)	Denormalized
Write complexity	Higher (more joins on insert/update)	Lower (single table writes)
Read performance	Slower (multi-table joins)	Faster (fewer joins)
Storage efficiency	Better (no redundant data)	Wasted space from duplication
Update anomaly risk	Lower (data lives in one place)	Higher (same data in multiple rows)
Query flexibility	Higher (flexible filtering)	Lower (schema baked into table)
Scalability	Better horizontal scaling	harder with very large tables

Common Production Failures

Referential integrity breaks: Teams sometimes disable foreign key constraints to speed up bulk loads, then forget to re-enable them. Or they re-enable but skip the validation step. Orphaned rows pile up quietly. After any bulk load operation, run a query that checks for orphaned foreign keys before calling it done.

Constraint violation cascades: CASCADE DELETE on a table with millions of child rows will lock the database long enough to cause an incident. I have seen this happen. Test cascade behavior with production-scale data before it reaches production. Batched deletes or soft deletes are safer for large child tables.

Index bloat on foreign keys: Foreign key columns that get frequent updates but lack indexes accumulate dead tuples fast. This is especially painful on high-throughput tables. Keep an eye on index size and bloat on FK columns.

Wrong data type choices: VARCHAR(255) for a country code is lazy. Every row wastes space. VARCHAR(2) for country_code stores the same two characters correctly in a tiny fraction of the space. Guess the actual maximum length, do not just copy-paste 255.

Quick Recap Checklist

Table names are singular (user not users)
VARCHAR columns have explicit length limits
NOT NULL where the column must have a value
Every table has a primary key
Foreign key columns are indexed
Relationships use correct cardinality (one-to-one, one-to-many, many-to-many)
Optional one-to-one data uses UNIQUE foreign key, not just foreign key
Many-to-many uses a junction table, not comma-separated IDs
Audit fields (created_at, updated_at, deleted_at) on frequently modified tables
Test cascade rules with production-scale data before deploying

Mistakes I See Often

Overusing NULLs: NULL means “unknown” or “not applicable.” If most rows have a value, maybe a default makes more sense.

Wrong-size VARCHAR: VARCHAR(255) for a two-letter country code wastes space. Country codes should be VARCHAR(2) or use a countries table.

Premature denormalization: Start normalized. Denormalize only when you have real performance data showing it’s necessary. You can always denormalize later—you can’t easily go back.

Forgetting indexes on foreign keys: If customer_id isn’t indexed, joining orders to customers gets slow fast as the tables grow.

Security Checklist

Apply NOT NULL constraints on columns that must contain data to prevent silent null propagation in joins and aggregations
Use CHECK constraints to enforce domain-specific data integrity rules at the database level (e.g., CHECK (age >= 0), CHECK (status IN ('active', 'inactive')))
Encrypt columns containing sensitive PII using database-level encryption (e.g., PostgreSQL pgcrypto) rather than relying solely on application-layer encryption
Restrict access to ALTER TABLE and DROP TABLE operations using role-based permissions to prevent schema manipulation attacks
Use column-level grants to limit which roles can SELECT sensitive columns while allowing broader access to the rest of the table

Common Pitfalls and Anti-Patterns

Using VARCHAR without a length specification: VARCHAR without a length limit behaves like TEXT and may bypass application-level length validation. Fix: always specify VARCHAR(n) with an explicit maximum based on the actual data domain.

Naming tables and columns inconsistently: Mixing snake_case, camelCase, and PascalCase across tables makes queries fragile and ORM mapping error-prone. Fix: adopt snake_case for PostgreSQL and enforce it through a schema linting tool.

Ignoring the created_at / updated_at audit fields: Tables without these fields make it impossible to audit when data was created or changed. Fix: add created_at TIMESTAMP DEFAULT NOW() and updated_at TIMESTAMP to every table that tracks persistent data.

Using TEXT for everything to avoid type decisions: TEXT columns waste storage and prevent the database from enforcing meaningful length constraints. Fix: use VARCHAR(n) for short strings, TEXT only for truly unbounded content, and appropriate numeric types (SMALLINT, INTEGER, BIGINT) for numbers.

Dropping tables without verifying foreign key dependencies: A DROP TABLE on a parent table without CASCADE silently fails if child tables have FK references, leaving the schema in an inconsistent state.

Next Steps

Schema design matters most at the start, but it evolves. As you learn more about your domain and your query patterns, you’ll revisit decisions. Understanding these fundamentals makes that refactoring less painful.

The next thing to learn is normalization—organizing data to eliminate redundancy and prevent update anomalies. These two topics work together.

For related reading, see relational databases for ACID properties and transactions, or jump to primary and foreign keys if you want to go deeper on relationships specifically.

Schema Design: Building the Foundation of Your Database

Schema Design: Building the Foundation of Your Database

Tables: The Core Building Blocks

Naming Conventions

Choosing Data Types

Common Data Types

VARCHAR Without Length: Specify It

Constraints: Rules That Enforce Themselves

Primary Keys

NOT NULL

UNIQUE

CHECK

DEFAULT Values

Relationships Between Tables

When to Use One-to-One

When Not to Use One-to-One

When to Use One-to-Many

When Not to Use One-to-Many

When to Use Many-to-Many

When Not to Use Many-to-Many

One-to-One

One-to-Many

Many-to-Many

Practical Scenarios

Addresses

Categories and Tags

Audit Fields

Normalization Trade-offs at a Glance

Common Production Failures

Quick Recap Checklist

Mistakes I See Often

Security Checklist

Common Pitfalls and Anti-Patterns

Next Steps

Category

Tags

Related Posts

Database Normalization: From 1NF to BCNF

Constraint Enforcement: Database vs Application Level

Understanding SQL JOINs and Database Relationships