Schema Design: Building the Foundation of Your Database
Learn how to design effective database schemas with proper data types, constraints, and relationships that scale with your application.
Schema Design: Building the Foundation of Your Database
A well-designed schema makes everything easier. Queries are fast, your application grows without constant reworking, and you don’t spend weekends fixing data that got into a weird state because nobody enforced the rules at the database level. The payoff for thinking carefully upfront is real.
This guide covers the core concepts: tables, columns, data types, constraints, and how relationships between tables work. I’ll assume you know what a database is and have written basic SQL before.
Tables: The Core Building Blocks
Tables are where your data lives. Each one represents a type of thing—users, orders, products—and every row is one actual instance of that thing. Columns hold attributes.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
full_name VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
When you’re deciding what goes in a table, think about real entities and what describes them. Orders don’t belong in a users table. They go in their own table, linked by a relationship.
Naming Conventions
Consistent names matter more than clever ones:
- Table names should be singular (
usernotusers) — the table describes a type - Use snake_case (
first_namenotfirstName) - Be specific (
subscription_statusnotstatus) - Foreign keys should make the reference obvious (
orders.customer_id)
Choosing Data Types
The type you pick affects storage, speed, and what errors you can catch. It’s worth thinking about this.
Common Data Types
| Type | Use Case | Considerations |
|---|---|---|
| INTEGER / BIGINT | Whole numbers | BIGINT for values over 2 billion |
| DECIMAL / NUMERIC | Exact decimals (money) | Specify precision and scale |
| VARCHAR(n) | Variable text | Set a max length |
| TEXT | Long-form text | Use when length varies significantly |
| BOOLEAN | True/false | More storage-efficient than INT |
| DATE / TIMESTAMP | Dates and times | TIMESTAMP includes time; DATE is just the day |
| JSONB | Semi-structured data | PostgreSQL’s binary JSON, supports indexing |
| UUID | Unique IDs | Works across distributed systems |
VARCHAR Without Length: Specify It
Always set a length for VARCHAR unless you have a reason not to. VARCHAR(255) works for most names. VARCHAR(1000) suits longer descriptions. Without a length, some databases allocate maximum space per row, which wastes storage.
-- Good: length is specified
email VARCHAR(255)
-- Avoid: no length constraint unless you specifically want TEXT
description TEXT -- just use TEXT directly instead
Constraints: Rules That Enforce Themselves
Constraints are how you make the database enforce your rules. Without them, bad data gets in and causes problems later—sometimes silently, which is worse.
Primary Keys
Every table needs one. A primary key uniquely identifies each row. It’s how you reliably reference a specific record from other tables.
CREATE TABLE products (
id BIGINT PRIMARY KEY,
sku VARCHAR(50) UNIQUE NOT NULL,
name VARCHAR(200) NOT NULL,
price DECIMAL(10, 2) NOT NULL
);
NOT NULL
If a column must have a value, say so with NOT NULL. If you forget, NULLs sneak in, and your application has to deal with missing data everywhere.
-- Required
email VARCHAR(255) NOT NULL,
-- Optional
middle_name VARCHAR(100) NULL
UNIQUE
Unique constraints prevent duplicates. A table can have multiple UNIQUE constraints, unlike primary keys.
-- One email per user
email VARCHAR(255) NOT NULL UNIQUE,
-- One badge per employee
badge_number VARCHAR(20) UNIQUE
CHECK
CHECK constraints validate conditions before allowing an insert. This catches business logic errors at the database gate.
price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
status VARCHAR(20) NOT NULL CHECK (status IN ('active', 'inactive', 'suspended')),
percentage DECIMAL(5, 2) NOT NULL CHECK (percentage >= 0 AND percentage <= 100)
DEFAULT Values
Defaults handle cases where no value is provided. They work well for timestamps, booleans, and status fields.
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
is_active BOOLEAN DEFAULT true,
status VARCHAR(20) DEFAULT 'pending'
Relationships Between Tables
Relationships let you connect data across tables. The three types you’ll encounter most are one-to-one, one-to-many, and many-to-many.
erDiagram
CUSTOMERS ||--o{ ORDERS : places
ORDERS ||--|{ ORDER_ITEMS : contains
PRODUCTS ||--o{ ORDER_ITEMS : "ordered in"
CUSTOMERS {
int customer_id PK
varchar email
varchar name
}
ORDERS {
int order_id PK
int customer_id FK
decimal total
timestamp created_at
}
ORDER_ITEMS {
int order_id FK
int product_id FK
int quantity
decimal unit_price
}
PRODUCTS {
int product_id PK
varchar sku
varchar name
decimal price
}
When to Use One-to-One
One-to-one suits optional data that only some rows need. User accounts with optional billing addresses, employees with optional parking assignments. If the optional data is large (long text, many columns) and most rows do not need it, keeping it separate avoids wasting space. If you always query the data together, merging into one table is simpler.
When Not to Use One-to-One
If you find yourself joining the two tables in almost every query, merge them. One-to-one adds query complexity without benefit when the data is always needed together. If the relationship is actually “zero or one” on one side, consider whether that side just needs a nullable foreign key instead.
When to Use One-to-Many
One-to-many fits any parent-child relationship where children belong to exactly one parent. Customers and their orders. Categories and products. Authors and blog posts. The pattern is correct when a child can only have one parent and you frequently query children by parent or parent with children.
When Not to Use One-to-Many
If you need to query children from multiple parents together (like tags on posts where posts can have many tags and tags apply to many posts), you have a many-to-many, not one-to-many. Forgetting this distinction leads to data modeling bugs.
When to Use Many-to-Many
Many-to-many fits when entities associate in arbitrary combinations. Posts and tags. Students and courses. Products and categories. The junction table is not a failure of design — it accurately models the reality that these relationships exist independently of the entities themselves.
When Not to Use Many-to-Many
If the “many” side is actually bounded and small (an order always has exactly 3 line items), consider whether a fixed-array column or separate columns serve you better. Many-to-many queries through junction tables add join overhead. When query performance matters and the combination space is small, denormalizing into arrays or fixed columns eliminates joins.
One-to-One
One row corresponds to exactly one row in another table. A foreign key with a UNIQUE constraint makes this work. Sometimes it makes more sense to just merge the tables—it depends on whether you always use the data together.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
password_hash VARCHAR(255) NOT NULL
);
CREATE TABLE user_profiles (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL UNIQUE REFERENCES users(id),
bio TEXT,
avatar_url VARCHAR(500),
preferred_language VARCHAR(10) DEFAULT 'en'
);
The UNIQUE on user_id prevents a user from having more than one profile.
One-to-Many
The most common relationship. One user can have many orders, but each order has exactly one user.
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
name VARCHAR(200) NOT NULL
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL REFERENCES customers(id),
order_number VARCHAR(50) NOT NULL UNIQUE,
total DECIMAL(12, 2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Many-to-Many
Requires a junction table. An order contains many products, and a product can appear in many orders.
CREATE TABLE products (
id SERIAL PRIMARY KEY,
sku VARCHAR(50) NOT NULL UNIQUE,
name VARCHAR(200) NOT NULL,
price DECIMAL(10, 2) NOT NULL
);
CREATE TABLE order_items (
order_id INTEGER NOT NULL REFERENCES orders(id),
product_id INTEGER NOT NULL REFERENCES products(id),
quantity INTEGER NOT NULL DEFAULT 1,
unit_price DECIMAL(10, 2) NOT NULL,
PRIMARY KEY (order_id, product_id)
);
The composite primary key (order_id, product_id) means each product shows up once per order—prevents duplicates too.
Practical Scenarios
Addresses
Embed all fields in one table if you’ll always query them together. Separate them out if different parts of your app need different addresses (shipping vs billing, for example).
-- Simple but inflexible
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
shipping_street VARCHAR(255),
shipping_city VARCHAR(100),
shipping_state VARCHAR(100),
shipping_postal_code VARCHAR(20),
shipping_country VARCHAR(100)
);
-- Reusable and flexible
CREATE TABLE addresses (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
address_type VARCHAR(20) CHECK (address_type IN ('shipping', 'billing')),
street VARCHAR(255) NOT NULL,
city VARCHAR(100) NOT NULL,
state VARCHAR(100),
postal_code VARCHAR(20) NOT NULL,
country VARCHAR(100) NOT NULL
);
Categories and Tags
Single category? A simple reference works. Product in multiple categories? You need a junction table.
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE,
parent_id INTEGER REFERENCES categories(id)
);
CREATE TABLE product_categories (
product_id INTEGER REFERENCES products(id),
category_id INTEGER REFERENCES categories(id),
PRIMARY KEY (product_id, category_id)
);
Audit Fields
Most tables should track who created and modified records.
CREATE TABLE sensitive_data (
id SERIAL PRIMARY KEY,
data_value VARCHAR(500),
-- Audit trail
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by INTEGER REFERENCES users(id),
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_by INTEGER REFERENCES users(id),
-- Soft delete
deleted_at TIMESTAMP,
deleted_by INTEGER REFERENCES users(id)
);
Normalization Trade-offs at a Glance
| Aspect | Normalized (3NF+) | Denormalized |
|---|---|---|
| Write complexity | Higher (more joins on insert/update) | Lower (single table writes) |
| Read performance | Slower (multi-table joins) | Faster (fewer joins) |
| Storage efficiency | Better (no redundant data) | Wasted space from duplication |
| Update anomaly risk | Lower (data lives in one place) | Higher (same data in multiple rows) |
| Query flexibility | Higher (flexible filtering) | Lower (schema baked into table) |
| Scalability | Better horizontal scaling | harder with very large tables |
Common Production Failures
Referential integrity breaks: Teams sometimes disable foreign key constraints to speed up bulk loads, then forget to re-enable them. Or they re-enable but skip the validation step. Orphaned rows pile up quietly. After any bulk load operation, run a query that checks for orphaned foreign keys before calling it done.
Constraint violation cascades: CASCADE DELETE on a table with millions of child rows will lock the database long enough to cause an incident. I have seen this happen. Test cascade behavior with production-scale data before it reaches production. Batched deletes or soft deletes are safer for large child tables.
Index bloat on foreign keys: Foreign key columns that get frequent updates but lack indexes accumulate dead tuples fast. This is especially painful on high-throughput tables. Keep an eye on index size and bloat on FK columns.
Wrong data type choices: VARCHAR(255) for a country code is lazy. Every row wastes space. VARCHAR(2) for country_code stores the same two characters correctly in a tiny fraction of the space. Guess the actual maximum length, do not just copy-paste 255.
Quick Recap Checklist
- Table names are singular (
usernotusers) - VARCHAR columns have explicit length limits
- NOT NULL where the column must have a value
- Every table has a primary key
- Foreign key columns are indexed
- Relationships use correct cardinality (one-to-one, one-to-many, many-to-many)
- Optional one-to-one data uses UNIQUE foreign key, not just foreign key
- Many-to-many uses a junction table, not comma-separated IDs
- Audit fields (
created_at,updated_at,deleted_at) on frequently modified tables - Test cascade rules with production-scale data before deploying
Mistakes I See Often
Overusing NULLs: NULL means “unknown” or “not applicable.” If most rows have a value, maybe a default makes more sense.
Wrong-size VARCHAR: VARCHAR(255) for a two-letter country code wastes space. Country codes should be VARCHAR(2) or use a countries table.
Premature denormalization: Start normalized. Denormalize only when you have real performance data showing it’s necessary. You can always denormalize later—you can’t easily go back.
Forgetting indexes on foreign keys: If customer_id isn’t indexed, joining orders to customers gets slow fast as the tables grow.
Security Checklist
- Apply
NOT NULLconstraints on columns that must contain data to prevent silent null propagation in joins and aggregations - Use
CHECKconstraints to enforce domain-specific data integrity rules at the database level (e.g.,CHECK (age >= 0),CHECK (status IN ('active', 'inactive'))) - Encrypt columns containing sensitive PII using database-level encryption (e.g., PostgreSQL
pgcrypto) rather than relying solely on application-layer encryption - Restrict access to
ALTER TABLEandDROP TABLEoperations using role-based permissions to prevent schema manipulation attacks - Use column-level grants to limit which roles can
SELECTsensitive columns while allowing broader access to the rest of the table
Common Pitfalls and Anti-Patterns
Using VARCHAR without a length specification: VARCHAR without a length limit behaves like TEXT and may bypass application-level length validation. Fix: always specify VARCHAR(n) with an explicit maximum based on the actual data domain.
Naming tables and columns inconsistently: Mixing snake_case, camelCase, and PascalCase across tables makes queries fragile and ORM mapping error-prone. Fix: adopt snake_case for PostgreSQL and enforce it through a schema linting tool.
Ignoring the created_at / updated_at audit fields: Tables without these fields make it impossible to audit when data was created or changed. Fix: add created_at TIMESTAMP DEFAULT NOW() and updated_at TIMESTAMP to every table that tracks persistent data.
Using TEXT for everything to avoid type decisions: TEXT columns waste storage and prevent the database from enforcing meaningful length constraints. Fix: use VARCHAR(n) for short strings, TEXT only for truly unbounded content, and appropriate numeric types (SMALLINT, INTEGER, BIGINT) for numbers.
Dropping tables without verifying foreign key dependencies: A DROP TABLE on a parent table without CASCADE silently fails if child tables have FK references, leaving the schema in an inconsistent state.
Next Steps
Schema design matters most at the start, but it evolves. As you learn more about your domain and your query patterns, you’ll revisit decisions. Understanding these fundamentals makes that refactoring less painful.
The next thing to learn is normalization—organizing data to eliminate redundancy and prevent update anomalies. These two topics work together.
For related reading, see relational databases for ACID properties and transactions, or jump to primary and foreign keys if you want to go deeper on relationships specifically.
Category
Related Posts
Database Normalization: From 1NF to BCNF
Learn database normalization from 1NF through BCNF. Understand how normalization eliminates redundancy, prevents update anomalies, and when denormalization makes sense for performance.
Constraint Enforcement: Database vs Application Level
A guide to CHECK, UNIQUE, NOT NULL, and exclusion constraints. Learn database vs application-level enforcement and performance implications.
Understanding SQL JOINs and Database Relationships
Master SQL JOINs with this practical guide covering INNER, LEFT, RIGHT, FULL OUTER, and CROSS joins. Learn how relationship types between tables shape your queries.