Dimensional Modeling

Dimensional modeling is a design approach for data warehouses that organizes data into facts (measurable events) and dimensions (descriptive attributes).

Fact tables store numerical data (sales amount, number of clicks, order count)
Dimension tables store context (customer info, product details, time periods) Dimensional modeling strives to simplify queries by denormalizing data into a structure that's easy for analytics and machine learning to consume.

Why?

Feature engineering simplification: ML models need structured, contextual features. Dimensions (like product categories, user demographics, or time ranges) help generate meaningful features quickly.
Performance at scale: Aggregations (e.g., average order value, churn rate) run faster on fact-dimension schemas optimized for analytical queries.
Business alignment: analysts, data engineers, and ML engineers can speak the same language when metrics are tied to business processes (orders, views, clicks)
Consistency across models: dimensions act as a single source of truth for attributes like "customer region" or "product category", ensuring ML models train. on consistent inputs

When?

When your ML models rely on historical business events
When you need to generate aggregated features
When you data warehouses is the main hub for feature engineering instead of operational databases

How?

Fact table example: fact_interactons
- Contains records of user-product interactions
- Each row = 1 interaction, with measures like interaction_type, amount, timestamp
Dimension table examples:
- dim_user: demographics, subscription tier.
- dim_product: category., price range
- dim_time: day, week, month hierarchies
Result would be an ML pipeline that can query: total clics per product category in the last 7 days per user which can be used for a direct feature for recommendation model

Dimensional_Data_Modeling

Dimensional Modeling

Why?

When?

How?

Linked to this note