How We Model Data

Principles for designing database schemas, TypeScript types, and data structures

Good data modeling is about representing your business domain in a way that's correct, queryable, and evolvable. This guide covers how to think about data design.

The Core Question

When designing data structures, ask:

"What questions will I need to answer with this data?"

Your schema should make the common queries easy and the rare queries possible.

Principles

1. Model the Domain, Not the UI

Your database should reflect business concepts, not screen layouts.

The mistake: Creating a DashboardData table because that's what the dashboard needs. Six months later, you have a second dashboard and the table name makes no sense.

The principle: Model real things — Users, Orders, Products, Sessions. The UI is a view into the data, not the source of truth for its structure.

How to check: Can you explain each table to a non-technical stakeholder using business terms? If you need to reference UI elements to explain it, reconsider.
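
As a minimal sketch (PostgreSQL syntax; all table and column names here are hypothetical), compare a screen-shaped table with a domain model that the dashboard simply queries:

    -- Anti-pattern: a table shaped like one screen
    -- dashboard_data(user_name, order_count, last_login)

    -- Domain model: real business concepts
    CREATE TABLE users (
        id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        name       text NOT NULL,
        last_login timestamptz
    );

    CREATE TABLE orders (
        id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        user_id    bigint NOT NULL REFERENCES users (id),
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- The dashboard is just a query over the domain
    SELECT u.name, u.last_login, count(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id, u.name, u.last_login;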

2. Capture History When It Matters

The hardest question in data modeling:

"Should I update this record, or create a new one?"

Update in place when:

  • Only the current state matters
  • History has no business value
  • Storage and complexity matter more than audit trails

Create new records (snapshots) when:

  • You need to answer "what was the value on date X?"
  • Compliance or auditing requires history
  • Users need to compare changes over time
  • Undo/rollback is a feature

The session pattern: Group related data into timestamped sessions. Each check/run/import creates a new session. This gives you automatic history without complex versioning.
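
A minimal sketch of the session pattern, assuming PostgreSQL and hypothetical import tables:

    -- One row per run; history falls out of the timestamps
    CREATE TABLE import_sessions (
        id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        source     text NOT NULL,
        started_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE TABLE import_results (
        id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        session_id bigint NOT NULL REFERENCES import_sessions (id),
        record_key text NOT NULL,
        status     text NOT NULL
    );

    -- "What did the most recent run produce?"
    SELECT r.*
    FROM import_results r
    WHERE r.session_id = (
        SELECT id FROM import_sessions ORDER BY started_at DESC LIMIT 1
    );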

3. Normalize First, Denormalize Later

Start normalized: Each fact lives in one place. No data duplication.

Why: Normalized data is easier to update correctly. You change one row, not twenty.

When to denormalize: Only when you have measured performance problems. Denormalization trades update complexity for read speed.

Common denormalizations:

  • Storing a count instead of counting every time
  • Copying a user's name into an order (so it survives user deletion)
  • Pre-computing aggregates for dashboards

The trap: Denormalizing too early. You end up with data that can drift out of sync, and no performance benefit, because you never had a performance problem in the first place.
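
As a sketch, here are the first two denormalizations from the list above (PostgreSQL syntax; users and orders are the hypothetical tables from earlier):

    -- Copy the buyer's name so the order survives user deletion
    ALTER TABLE orders ADD COLUMN buyer_name text;

    -- Store a count instead of counting on every read
    ALTER TABLE users ADD COLUMN order_count integer NOT NULL DEFAULT 0;

    -- The cost: every write path must now keep the copy in sync
    UPDATE users SET order_count = order_count + 1 WHERE id = 42;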

4. Design for Your Queries

The schema should make common operations simple.

Before creating a table, list:

  1. The 5 most common queries you'll run
  2. The filters you'll need (WHERE clauses)
  3. The sorts you'll need (ORDER BY)
  4. The joins you'll need

Then design a schema that makes those queries straightforward.

Index strategy: Any column that appears in the WHERE, JOIN, or ORDER BY clauses of a frequently run query is a candidate for an index. But don't index everything; every index slows down writes.
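
For example (PostgreSQL syntax, using the hypothetical orders table), a composite index can cover both the filter and the sort of a frequent query, and EXPLAIN confirms the planner actually uses it:

    -- Frequent query: a user's most recent orders
    CREATE INDEX idx_orders_user_created
        ON orders (user_id, created_at DESC);

    EXPLAIN
    SELECT * FROM orders
    WHERE user_id = 42
    ORDER BY created_at DESC
    LIMIT 20;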

Decision Framework

"Should this be a new table or a column?"

New table when:

  • The data has its own identity (its own ID)
  • There can be many of them per parent (one-to-many)
  • It might be referenced by other tables
  • It has its own lifecycle (created/deleted independently)

New column when:

  • It's an attribute of the existing thing
  • There's exactly one per row
  • It is created and updated together with the parent row
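
A small sketch of both outcomes, assuming PostgreSQL and the hypothetical orders table:

    -- One per row, no identity of its own: a column
    ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'pending';

    -- Many per parent, own identity and lifecycle: a table
    CREATE TABLE order_items (
        id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        order_id bigint NOT NULL REFERENCES orders (id),
        sku      text NOT NULL,
        quantity integer NOT NULL CHECK (quantity > 0)
    );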

"Should I store this or compute it?"

Store it when:

  • Computing is expensive and frequent
  • The inputs change rarely
  • Historical accuracy matters (store what the value was, not what it would be now)

Compute it when:

  • The value changes often
  • Keeping a stored copy consistent would be hard
  • The computation is cheap

Hybrid: Compute it, but cache the result. Invalidate on relevant changes.
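
For cheap row-local computations, one way to get the hybrid is a generated column, which PostgreSQL (12+) computes and stores for you, so invalidation is automatic. A sketch with hypothetical names:

    CREATE TABLE line_items (
        id               bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        quantity         integer NOT NULL,
        unit_price_cents integer NOT NULL,
        -- recomputed by the database whenever the inputs change
        total_cents      integer GENERATED ALWAYS AS (quantity * unit_price_cents) STORED
    );

For expensive cross-row aggregates, the usual equivalents are a materialized view or an application-level cache with explicit invalidation.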

"How should I handle deletions?"

Hard delete (actually remove the row) when:

  • Data has no long-term value
  • Privacy regulations require it
  • Simplicity matters more than recovery

Soft delete (set a deleted_at timestamp) when:

  • Users might want to recover data
  • Audit trails are required
  • References from other tables would break

Cascade delete (delete children automatically) when:

  • Children have no meaning without the parent
  • You want atomic cleanup

Restrict delete (prevent deletion if children exist) when:

  • Orphaned children would be a data integrity problem
  • Users should explicitly clean up first
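
The four strategies above map directly onto SQL. A sketch assuming PostgreSQL and hypothetical documents, comments, and users tables (documents has an id primary key):

    -- Soft delete: keep the row, filter it out of normal queries
    ALTER TABLE documents ADD COLUMN deleted_at timestamptz;
    SELECT * FROM documents WHERE deleted_at IS NULL;

    -- Cascade: a comment means nothing without its document
    CREATE TABLE comments (
        id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        document_id bigint NOT NULL REFERENCES documents (id) ON DELETE CASCADE,
        body        text NOT NULL
    );

    -- Restrict: block deleting a user who still owns documents
    ALTER TABLE documents
        ADD COLUMN owner_id bigint REFERENCES users (id) ON DELETE RESTRICT;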

"What type should this field be?"

Text vs. Enum:

  • Use an enum (or check constraint) when there's a fixed set of valid values
  • Use text when values are user-defined or frequently changing

Integer vs. UUID for IDs:

  • Integers: Smaller, faster, and naturally ordered (good for range scans and pagination)
  • UUIDs: No central coordination needed, safe to generate anywhere (including clients), and they don't leak row counts or creation order

Timestamp storage:

  • Always store in UTC
  • Store as proper timestamp/datetime type, not strings
  • Include timezone if it has business meaning

JSON columns:

  • Use for truly dynamic/schemaless data
  • Avoid for data you need to query frequently
  • Good for storing raw API responses or flexible metadata
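
A sketch that pulls these type choices together, assuming PostgreSQL and a hypothetical events table:

    CREATE TABLE events (
        id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        -- fixed set of valid values: a CHECK constraint acts as an enum
        kind        text NOT NULL CHECK (kind IN ('signup', 'login', 'purchase')),
        -- proper timestamp type; timestamptz stores an absolute UTC instant
        occurred_at timestamptz NOT NULL DEFAULT now(),
        -- flexible metadata you rarely filter on
        payload     jsonb
    );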

Common Mistakes

The "One Big Table" Problem

Putting everything in one table with nullable columns for different "types."

Signs: Half your columns are NULL for any given row. You have a type column that changes which other columns are valid.

Fix: Split into separate tables. Use inheritance/polymorphism if your database supports it, or just separate tables with a shared ID.
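
A sketch of the split using separate tables with a shared ID (PostgreSQL syntax, hypothetical names):

    -- Before: items(id, type, title, body, url, duration_seconds, ...)
    -- with half the columns NULL on any given row

    -- After: a shared parent plus one table per concrete type
    CREATE TABLE items (
        id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        title text NOT NULL
    );

    CREATE TABLE articles (
        item_id bigint PRIMARY KEY REFERENCES items (id),
        body    text NOT NULL
    );

    CREATE TABLE videos (
        item_id          bigint PRIMARY KEY REFERENCES items (id),
        url              text NOT NULL,
        duration_seconds integer NOT NULL
    );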

The Missing Index

Queries are slow, but you haven't looked at query patterns.

Signs: Simple queries take seconds. The database scans entire tables.

Fix: Add indexes for columns in WHERE, JOIN, and ORDER BY. Use EXPLAIN to verify queries use indexes.

The Premature Optimization

Adding caching, denormalization, and materialized views before you have a performance problem.

Signs: Complex sync logic, but the app has 100 users. Data inconsistencies from failed cache invalidation.

Fix: Start simple. Measure. Optimize only what's actually slow.

The Stringly-Typed Data

Storing structured data as strings.

Signs: Columns like metadata: "key1=value1;key2=value2". Parsing logic scattered everywhere.

Fix: Use proper JSON columns or separate fields. Let the database do type validation.

The Missing Constraint

Relying on application code to enforce data rules.

Signs: Invalid data in the database. Bugs let bad data through.

Fix: Add constraints: NOT NULL, UNIQUE, FOREIGN KEY, CHECK. The database is your last line of defense.
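
A sketch of a table that encodes its rules as constraints rather than trusting application code (PostgreSQL syntax, hypothetical names):

    CREATE TABLE subscriptions (
        id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        user_id   bigint NOT NULL REFERENCES users (id),          -- no orphans
        plan      text NOT NULL CHECK (plan IN ('free', 'pro')),  -- valid values only
        starts_at timestamptz NOT NULL,
        ends_at   timestamptz,
        CHECK (ends_at IS NULL OR ends_at > starts_at),           -- sane date range
        UNIQUE (user_id, plan)                                    -- one per user per plan
    );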

How to Evaluate Your Schema

Your schema is working if:

  • Common queries are simple (1-2 joins max)
  • You can explain each table in business terms
  • Constraints prevent invalid data
  • Updates change one row, not many
  • You can answer historical questions when needed

Your schema needs work if:

  • Queries require 5+ joins
  • The same fact is stored in multiple places
  • "Type" columns determine which other columns are valid
  • Application code works around schema limitations
  • You've lost data you later needed

Evolving Schema Over Time

Additive changes are safe: Adding a new nullable column or a new table rarely breaks things.

Destructive changes are dangerous: Removing or renaming columns breaks existing code. Do these in phases:

  1. Deploy code that no longer reads the old column
  2. Stop writing to the old column
  3. Remove the column

Migrations should be reversible: Write both "up" and "down" migrations. Test the down migration.
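
A minimal sketch of a reversible pair, assuming PostgreSQL and a hypothetical display_name column:

    -- Up
    ALTER TABLE users ADD COLUMN display_name text;

    -- Down: exactly undoes the up
    ALTER TABLE users DROP COLUMN display_name;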

Backfills are tricky: Filling new columns with data from old columns can be slow and might need to run in batches.
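
One common batching pattern, assuming PostgreSQL and the hypothetical display_name column from the previous sketch, is to update a bounded slice per statement and repeat until nothing is left:

    -- Run repeatedly (e.g., from a script) until it reports UPDATE 0
    UPDATE users
    SET display_name = name
    WHERE id IN (
        SELECT id FROM users
        WHERE display_name IS NULL
        LIMIT 1000
    );

Small batches keep locks short and let the backfill run alongside normal traffic.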