
Dimensional modeling tutorial for analytics teams

A practical dimensional modeling tutorial covering fact grain, the fact vs. dimension distinction, and how to keep review and documentation in sync.

Define grain before columns

Dimensional modeling gets easier when the team forces itself to define grain first. Grain answers the most important question in the model: what does one row represent?

Without a grain statement, teams blur events and entities together, overpack facts with descriptive attributes, and end up rebuilding marts when reporting questions expand.

Mental Model

Answer this question in one sentence before creating any columns: 'One row in this table represents one ___.' If you need more than one sentence, the model has multiple grains and will cause problems downstream.
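A minimal sketch of that discipline, written in the same SchemaStruct DSL style used later in this post (table and column names here are hypothetical): state the grain as a comment on the first line, then only add columns that make sense at that grain.

```
// Grain: one row = one product on one order (an order line)
table fct_order_line {
  order_id     int        // degenerate dimension from the source system
  product_id   int        // FK to a product dimension
  quantity     int        // additive measure at this grain
  unit_price   decimal    // measure valid for a single line, not the whole order
}
```

If a proposed column (say, order-level shipping cost) cannot be described at "one order line," that is the grain statement doing its job: the column belongs in a different table.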

Facts measure events, dimensions describe them

A useful dimensional model keeps a clear line between measurable activity and reusable descriptive context. Fact tables hold the measurable event. Dimensions provide consistent business context for slicing and grouping.

That sounds simple, but most modeling mistakes come from crossing those responsibilities: dimensions that carry metrics, facts that absorb too much descriptive text, or marts that hide multiple grains in one table.

Fact tables record events and hold numeric measures. Dimension tables describe entities and hold descriptive attributes used for filtering and grouping.
  • Put additive and semi-additive metrics in facts.
  • Keep dimension attributes descriptive and stable.
  • Model many-to-many edge cases intentionally, not implicitly.
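As a sketch of that separation in the same DSL style (the payment tables here are hypothetical examples, not part of the schema above): metrics live in the fact, descriptive context lives in the dimension, and the two meet only through a key.

```
// Fact: one row per payment event; numeric, additive measures only
table fct_payment {
  payment_id         int     [pk]
  customer_id        int     // FK to dim_customer
  payment_method_id  int     // FK to dim_payment_method
  amount             decimal // additive measure
  fee                decimal // additive measure
}

// Dimension: stable descriptive attributes, no metrics
table dim_payment_method {
  payment_method_id  int     [pk]
  method_name        varchar // e.g. card, ACH, wire
  provider           varchar
}
```

Putting `method_name` on the fact would bloat every row with repeated text; putting `amount` on the dimension would make it impossible to aggregate per event.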

The four dimensions almost every model needs

Regardless of domain, most analytical fact tables need the same four dimensional anchors: date, customer (or user), product (or item), and location. These four dimensions cover the when, who, what, and where of nearly every business event.

Designing them as conformed dimensions — shared across marts — means analysts can combine facts from different domains without writing joins that break on mismatched keys.

The four core dimensions in SchemaStruct DSL
table dim_date {
  date_id      int     [pk]
  full_date    date
  year         int
  quarter      int
  month        int
  month_name   varchar
  day_of_week  varchar
  is_weekend   boolean
  is_holiday   boolean
}

table dim_customer {
  customer_id  int     [pk]
  full_name    varchar
  email        varchar [unique]
  segment      varchar    // SMB, Mid-Market, Enterprise
  region       varchar
  signup_date  date
}

table dim_product {
  product_id   int     [pk]
  product_name varchar
  category     varchar
  subcategory  varchar
  unit_cost    decimal
  is_active    boolean
}

table dim_location {
  location_id  int     [pk]
  city         varchar
  state        varchar
  country      varchar
  region       varchar
}
  • Conformed date dimensions unlock time-series analysis across every fact.
  • Never embed date attributes directly in a fact — always join to dim_date.
  • Reuse the same surrogate keys across fact tables for clean cross-domain joins.
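A fact table that anchors to all four conformed dimensions might look like this, assuming the DSL supports inline reference syntax similar to DBML's `ref:` (the `fct_orders` table is a hypothetical example):

```
table fct_orders {
  order_id     int     [pk]
  date_id      int     [ref: > dim_date.date_id]          // when
  customer_id  int     [ref: > dim_customer.customer_id]  // who
  product_id   int     [ref: > dim_product.product_id]    // what
  location_id  int     [ref: > dim_location.location_id]  // where
  quantity     int     // additive measure
  revenue      decimal // additive measure
}
```

Because every fact table in the warehouse points at the same four dimension keys, an analyst can join `fct_orders` to any other conformed fact through shared dimensions without key remapping.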

Review joins as a product decision

Join paths are not just implementation details. They define what downstream teams can safely ask from the warehouse. If joins are ambiguous, analytics work becomes fragile no matter how polished the SQL looks.

A schema review should therefore validate join structure explicitly: which dimensions connect to which facts, whether role-playing dimensions are clearly named, and whether path choices match the business story the mart is meant to tell.
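Role-playing dimensions are the most common place where join ambiguity hides. A sketch in the same DSL style (the shipments table is a hypothetical example, and the inline `ref:` syntax is assumed): the same `dim_date` plays two roles, and the column names make each role explicit so reviewers and analysts cannot pick the wrong join path by accident.

```
table fct_shipments {
  shipment_id    int  [pk]
  // Two joins to the same dimension, disambiguated by name
  order_date_id  int  [ref: > dim_date.date_id]  // when the order was placed
  ship_date_id   int  [ref: > dim_date.date_id]  // when it left the warehouse
  customer_id    int  [ref: > dim_customer.customer_id]
  units_shipped  int  // additive measure
}
```

A review that only checks "does it join to dim_date?" would pass an ambiguous single `date_id` here; a review that validates the business story catches that shipments need both dates.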

Keep documentation generated from the same model

Dimensional models age poorly when docs are maintained separately. The quickest route to drift is to describe the warehouse in one place and change it in another.

The better pattern is to keep review, documentation, and handoff outputs tied to the same schema source. When a fact grain changes or a dimension expands, the change propagates through the workflow instead of creating parallel truths.

Pro Tip

Docs that live in a separate system from the schema are docs that lie. Generate your data dictionary from the same DSL or model definition that produces your DDL. When the schema changes, the docs change with it.
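One way to sketch this, assuming the DSL supports column-level notes the way DBML does: attach the documentation to the column definitions themselves, so whatever generates the DDL can also emit the data dictionary from the same source.

```
table dim_customer {
  customer_id  int     [pk, note: 'Surrogate key; stable across source-system migrations']
  segment      varchar [note: 'SMB, Mid-Market, Enterprise; maintained by sales ops']
  region       varchar [note: 'Sales region, conformed with dim_location.region']
}
```

When `segment` gains a new value or `region` changes its conformance rule, the note is edited in the same commit as the schema, and the regenerated dictionary cannot drift.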