Logo
Brain/Dimensional_Data_Modeling

Dimensional_Data_Modeling

#data-modeling#warehousing#star-schema

Dimensional Modeling

Dimensional modeling is a design approach for data warehouses that organizes data into facts (measurable events) and dimensions (descriptive attributes).

  • Fact tables store numerical data (sales amount, number of clicks, order count)
  • Dimension tables store context (customer info, product details, time periods) Dimensional modeling strives to simplify queries by denormalizing data into a structure that's easy for analytics and machine learning to consume.

Why?

  • Feature engineering simplification: ML models need structured, contextual features. Dimensions (like product categories, user demographics, or time ranges) help generate meaningful features quickly.
  • Performance at scale: Aggregations (e.g., average order value, churn rate) run faster on fact-dimension schemas optimized for analytical queries.
  • Business alignment: analysts, data engineers, and ML engineers can speak the same language when metrics are tied to business processes (orders, views, clicks)
  • Consistency across models: dimensions act as a single source of truth for attributes like "customer region" or "product category", ensuring ML models train. on consistent inputs

When?

  • When your ML models rely on historical business events
  • When you need to generate aggregated features
  • When you data warehouses is the main hub for feature engineering instead of operational databases

How?

  • Fact table example: fact_interactons
    • Contains records of user-product interactions
    • Each row = 1 interaction, with measures like interaction_type, amount, timestamp
  • Dimension table examples:
    • dim_user: demographics, subscription tier.
    • dim_product: category., price range
    • dim_time: day, week, month hierarchies
  • Result would be an ML pipeline that can query: total clics per product category in the last 7 days per user which can be used for a direct feature for recommendation model

Linked to this note