Dimensional modeling remains the most effective way to make analytics fast, understandable, and resilient. The star schema sits at the center of that approach: a simple, denormalized structure where fact tables record measurable events and dimension tables provide descriptive context. In this post, we’ll ground the core ideas, clarify the often‑confused concept of snowflaking (and when it’s worth it), and show how to scale from a single star to a galaxy schema (a.k.a. fact constellation) without losing your footing.
Bottom Line Up Front
- Star schema = 1 fact table + several denormalized dimensions, joined by surrogate keys.
- Snowflaking = normalizing a dimension into multiple tables. Do it sparingly—only when it significantly reduces maintenance, enforces shared hierarchies across stars, or meets specific governance/performance constraints.
- Galaxy schema = multiple fact tables sharing conformed dimensions. Use a bus matrix to plan it, lock in grain and conformance rules, and use “drill‑across” patterns to combine metrics correctly.
Star Schema 101
Facts are your measures and foreign keys: e.g., sales_amount, units_sold, plus keys to date, product, customer, store, etc. Facts come in three common types:
- Transactional (each row = an event; e.g., an order line)
- Periodic snapshot (each row = periodic state; e.g., daily inventory level)
- Accumulating snapshot (each row = lifecycle with dated milestones; e.g., order → pick → ship → deliver)
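The three fact types can be sketched as table shapes. This is an illustrative sketch using SQLite; the table and column names (`FactSales`, `FactInventorySnapshot`, `FactOrderFulfillment`) are hypothetical, not from a specific warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Transactional: one row per event (here, an order line)
CREATE TABLE FactSales (
    order_line_id INTEGER PRIMARY KEY,
    date_key INTEGER, product_key INTEGER, customer_key INTEGER,
    sales_amount REAL, units_sold INTEGER
);
-- Periodic snapshot: one row per store/product/day (a recurring state)
CREATE TABLE FactInventorySnapshot (
    date_key INTEGER, store_key INTEGER, product_key INTEGER,
    on_hand_qty INTEGER,
    PRIMARY KEY (date_key, store_key, product_key)
);
-- Accumulating snapshot: one row per lifecycle, milestone dates filled in over time
CREATE TABLE FactOrderFulfillment (
    order_key INTEGER PRIMARY KEY,
    order_date_key INTEGER, pick_date_key INTEGER,
    ship_date_key INTEGER, delivery_date_key INTEGER
);
""")

# An accumulating snapshot row is UPDATEd as milestones occur, not re-inserted.
conn.execute("INSERT INTO FactOrderFulfillment (order_key, order_date_key) VALUES (1, 20250701)")
conn.execute("UPDATE FactOrderFulfillment SET ship_date_key = 20250703 WHERE order_key = 1")
row = conn.execute("SELECT order_date_key, ship_date_key FROM FactOrderFulfillment").fetchone()
```

Note the design difference: transactional and periodic snapshot facts are append-only, while the accumulating snapshot is one of the few fact tables you update in place.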
Dimensions are your descriptive lookup tables used for slicing and dicing (e.g., Product, Customer, Date, Store). Good dimensions are:
- Denormalized (one table per real‑world thing)
- Wide enough to be useful (rich attributes that analysts can filter and group by)
- Managed with surrogate keys (stable, warehouse‑assigned identifiers that decouple analytics from source system churn)
Non-negotiables
- Declare the grain first for every fact table. “One row per order line” is unambiguous. So is “one row per store, product, calendar day.” Grain drives the rest.
- Use surrogate keys in facts that point to dimensions. Natural keys can be attributes in the dimension but should not be foreign keys in facts.
- Handle history deliberately with Slowly Changing Dimensions (SCDs):
- Type 1: overwrite in place (corrects data, no history)
- Type 2: add a new row per change (records history)
- Type 3: keep a limited set of prior values (rare)
- Keep measures atomic (lowest level at which they are generated). Aggregate later.
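A minimal sketch of the Type 2 pattern, assuming a hypothetical `DimProduct` with a surrogate key, effective-date range, and current-row flag (SQLite; the helper `scd2_upsert` is illustrative, not a standard API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE DimProduct (
    product_key INTEGER PRIMARY KEY,   -- surrogate key, warehouse-assigned
    product_id TEXT,                   -- natural key, kept as an attribute
    category TEXT,
    effective_from TEXT, effective_to TEXT,
    is_current INTEGER
)""")

def scd2_upsert(conn, product_id, category, as_of):
    """Type 2: close the current row if the attribute changed, then add a new row."""
    cur = conn.execute(
        "SELECT product_key, category FROM DimProduct "
        "WHERE product_id = ? AND is_current = 1", (product_id,)).fetchone()
    if cur and cur[1] == category:
        return cur[0]                  # no change: keep the existing surrogate key
    if cur:
        conn.execute("UPDATE DimProduct SET effective_to = ?, is_current = 0 "
                     "WHERE product_key = ?", (as_of, cur[0]))
    conn.execute("INSERT INTO DimProduct (product_id, category, effective_from, "
                 "effective_to, is_current) VALUES (?, ?, ?, '9999-12-31', 1)",
                 (product_id, category, as_of))
    return conn.execute("SELECT last_insert_rowid()").fetchone()[0]

k1 = scd2_upsert(conn, "SKU-1", "Bikes", "2025-01-01")
k2 = scd2_upsert(conn, "SKU-1", "E-Bikes", "2025-07-01")  # category change -> new row
history = conn.execute(
    "SELECT COUNT(*) FROM DimProduct WHERE product_id = 'SKU-1'").fetchone()[0]
```

The natural key `SKU-1` now maps to two surrogate keys, one per historical version; facts loaded before July point at the first, facts loaded after point at the second.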
Snowflaking: What It Is and When to Do It
Definition: Snowflaking is normalizing a dimension so that some of its attributes live in separate, related tables. Example: splitting `DimProduct` into `DimProduct` + `DimBrand` + `DimCategory`.
This is not about the Snowflake cloud platform; it’s a modeling choice that trades simplicity for normalization.
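The trade can be made concrete with a sketch (SQLite; table names are illustrative). The denormalized and snowflaked shapes hold the same information, but the snowflaked form pays two extra joins to answer the same question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized (star): everything lives on the dimension
CREATE TABLE DimProduct_Star (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT, brand_name TEXT, category_name TEXT
);
-- Snowflaked: brand and category normalized into their own tables
CREATE TABLE DimCategory (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE DimBrand (brand_key INTEGER PRIMARY KEY, brand_name TEXT,
                       category_key INTEGER REFERENCES DimCategory);
CREATE TABLE DimProduct_Snow (product_key INTEGER PRIMARY KEY, product_name TEXT,
                              brand_key INTEGER REFERENCES DimBrand);
""")
conn.execute("INSERT INTO DimCategory VALUES (1, 'Bikes')")
conn.execute("INSERT INTO DimBrand VALUES (1, 'Acme', 1)")
conn.execute("INSERT INTO DimProduct_Snow VALUES (1, 'Trail 29er', 1)")

# "What category is this product in?" takes a join chain in the snowflaked form.
row = conn.execute("""
    SELECT p.product_name, c.category_name
    FROM DimProduct_Snow p
    JOIN DimBrand b ON b.brand_key = p.brand_key
    JOIN DimCategory c ON c.category_key = b.category_key
""").fetchone()
```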
Why people snowflake (the good reasons)
- Authoritative shared hierarchies. If multiple dimensions need to conform to the same official `Category → Subcategory → Segment` hierarchy (governed centrally), housing that hierarchy as its own small dimension ensures one version of the truth that multiple stars can reference.
- Very large, very volatile subdomains. Suppose you have a 300M‑row `DimCustomer`. A small, frequently changing `DimHousehold` shared by several customer‑like dimensions might deserve its own table to reduce Type 2 explosion and update friction.
- Security and ownership boundaries. Teams with separate stewardship responsibilities may need to maintain certain attributes in an independently controlled table, audited and governed separately.
- Extreme deduplication needs in a specific warehouse where storage costs, write amplification, or CDC mechanics make denormalization painful.
Why snowflaking is usually avoided
- Performance and complexity. More joins mean more room for mistakes and slower queries, especially in BI tools optimized for stars.
- Usability. Analysts prefer “one place to find product info.” Multiple joins obscure the mental model.
- Testing surface area. Each extra join means more integration points to validate.
Practical guidance
- Default to a denormalized dimension.
- Snowflake only when:
1. The shared hierarchy or subdomain is reused by multiple dimensions or stars and is governed separately.
2. The dimension’s physical characteristics (size, change velocity) make denormalization impractical.
- Prefer “outriggers” sparingly. An outrigger is a small snowflaked table referenced by a single dimension (e.g., `DimCustomer → DimGeography`). Use only for tiny and highly stable lookups, or where the attribute set is reused across many dimensions.
- For many‑to‑many attributes, don’t snowflake; use a bridge (e.g., a `CustomerGroupBridge` with weights for allocation).
Making Galaxy Schemas Work (a.k.a. Fact Constellations)
A galaxy schema is a set of related star schemas that share conformed dimensions—the same DimDate, DimProduct, DimCustomer, etc.—so you can analyze across business processes.
Step 1: Build a Bus Matrix
A bus matrix maps business processes (facts) to conformed dimensions. Start with this grid and keep it under version control.
| Process (Fact) | Grain | Date | Product | Customer | Store | Supplier | Promo | Employee |
|---|---|---|---|---|---|---|---|---|
| Sales (Txn) | Order line | ✅ | ✅ | ✅ | ✅ | | ✅ | |
| Inventory (Snapshot) | Store, product, calendar day | ✅ | ✅ | | ✅ | ✅ | | |
| Shipments (Txn) | Shipment line | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ |
| Returns (Txn) | Return line | ✅ | ✅ | ✅ | ✅ | | | |
- The columns are conformed dimensions (shared keys, shared attributes).
- The rows are facts with a declared grain and the dimensions they use.
Step 2: Conform Your Dimensions
“Conformed” means:
- Same business meaning (e.g., “Product” means the same thing across Sales and Inventory).
- Same surrogate key domain (a `product_key = 123` refers to the same product across all stars).
- Consistent SCD policy for the shared attributes:
  - If `DimProduct` is Type 2 for category changes, then all facts joining to `DimProduct` must either:
    - Record transactions with the effective dimension key at the time of the event, or
    - Join via a transaction date to a view that resolves the correct SCD row.
If two teams genuinely need different definitions (e.g., marketing product taxonomy vs. finance product taxonomy), create two named role dimensions (DimProduct_Marketing, DimProduct_Finance) and do not pretend they’re conformed.
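The second option above — resolving the correct SCD row via the transaction date — looks like this as a sketch (SQLite; names are illustrative). In practice you would usually prefer the first option and stamp the resolved key into the fact at load time, but the as-of join is useful when facts carry only the natural key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (
    product_key INTEGER PRIMARY KEY, product_id TEXT, category TEXT,
    effective_from TEXT, effective_to TEXT
);
CREATE TABLE FactSales (sale_date TEXT, product_id TEXT, sales_amount REAL);
-- SKU-1 changed category on 2025-07-01, producing a second Type 2 row.
INSERT INTO DimProduct VALUES
    (1, 'SKU-1', 'Bikes',   '2025-01-01', '2025-06-30'),
    (2, 'SKU-1', 'E-Bikes', '2025-07-01', '9999-12-31');
INSERT INTO FactSales VALUES ('2025-03-15', 'SKU-1', 100.0),
                             ('2025-07-15', 'SKU-1', 200.0);
""")
# Resolve whichever SCD2 row was in effect on each transaction date.
rows = conn.execute("""
    SELECT f.sale_date, d.product_key, d.category
    FROM FactSales f
    JOIN DimProduct d
      ON d.product_id = f.product_id
     AND f.sale_date BETWEEN d.effective_from AND d.effective_to
    ORDER BY f.sale_date
""").fetchall()
```

Each sale lands on the dimension version that was current when it happened: the March sale sees `Bikes`, the July sale sees `E-Bikes`.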
Step 3: Handle Role-Playing Dimensions Cleanly
The same dimension can appear multiple times in a single fact with different roles (e.g., Order Date, Ship Date, Delivery Date). Implement as separate foreign keys to the same DimDate. In BI tools/semantic models, expose distinct role names.
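A sketch of the role-playing pattern (SQLite; names are illustrative): one physical `DimDate`, two foreign keys in the fact, and one alias per role in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE FactOrders (
    order_key INTEGER PRIMARY KEY,
    order_date_key INTEGER REFERENCES DimDate,   -- role: Order Date
    ship_date_key  INTEGER REFERENCES DimDate    -- role: Ship Date
);
INSERT INTO DimDate VALUES (20250701, '2025-07-01'), (20250703, '2025-07-03');
INSERT INTO FactOrders VALUES (1, 20250701, 20250703);
""")
# The same DimDate table is joined twice, once per role, under distinct aliases.
row = conn.execute("""
    SELECT od.calendar_date AS order_date, sd.calendar_date AS ship_date
    FROM FactOrders f
    JOIN DimDate od ON od.date_key = f.order_date_key
    JOIN DimDate sd ON sd.date_key = f.ship_date_key
""").fetchone()
```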
Step 4: Plan for “Drill‑Across”
When analysts combine measures from different facts (e.g., Sales vs. Inventory), the safe pattern is drill‑across:
- Filter using attributes from conformed dimensions (e.g., `Product Category = 'Bikes'`, `Calendar Month = '2025-07'`).
- Aggregate each fact separately at the same dimensional level (e.g., by `product_key`, `month_key`).
- Join the aggregates on the conformed keys (not on the raw facts).
This avoids mismatched grains (e.g., Sales by order line vs. Inventory by daily store/product) and produces correct comparisons.
SQL sketch:

```sql
WITH sales AS (
  SELECT product_key, month_key, SUM(sales_amount) AS sales_amt
  FROM FactSales
  JOIN DimDate USING (date_key)
  WHERE DimDate.calendar_month = '2025-07'
  GROUP BY product_key, month_key
),
inv AS (
  SELECT product_key, month_key, AVG(on_hand_qty) AS avg_on_hand
  FROM FactInventorySnapshot
  WHERE month_key = 202507
  GROUP BY product_key, month_key
)
SELECT p.product_name, s.sales_amt, i.avg_on_hand
FROM sales s
JOIN inv i USING (product_key, month_key)
JOIN DimProduct p USING (product_key);
```
Step 5: Bridge the Tricky Relationships
- Many‑to‑many between facts and dimensions (e.g., an order line tied to multiple promotions): use a bridge table (`OrderLinePromotionBridge`) with optional allocation weights so measures can be split proportionally.
- Ragged hierarchies (e.g., store → region → area with missing levels): use a hierarchy bridge, not a snowflake, to support flexible rollups.
- Degenerate dimensions (order number, invoice number) live in the fact table itself and are fine.
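The weighted bridge pattern can be sketched as follows (SQLite; table names are illustrative). The key invariant is that the allocation weights for each order line sum to 1.0, so allocating the measure preserves the grand total:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactSales (order_line_key INTEGER PRIMARY KEY, sales_amount REAL);
CREATE TABLE DimPromotion (promo_key INTEGER PRIMARY KEY, promo_name TEXT);
-- Bridge: one order line can carry several promotions; weights sum to 1.0 per line.
CREATE TABLE OrderLinePromotionBridge (
    order_line_key INTEGER, promo_key INTEGER, allocation_weight REAL
);
INSERT INTO FactSales VALUES (1, 100.0);
INSERT INTO DimPromotion VALUES (10, 'Summer Sale'), (20, 'Loyalty');
INSERT INTO OrderLinePromotionBridge VALUES (1, 10, 0.6), (1, 20, 0.4);
""")
# Allocate the measure across promotions; the 100.00 total is preserved (60 + 40).
rows = conn.execute("""
    SELECT p.promo_name, SUM(f.sales_amount * b.allocation_weight) AS allocated
    FROM FactSales f
    JOIN OrderLinePromotionBridge b USING (order_line_key)
    JOIN DimPromotion p USING (promo_key)
    GROUP BY p.promo_name
    ORDER BY p.promo_name
""").fetchall()
```

Without the weights, joining through the bridge would double-count the order line once per promotion — the classic bridge-table trap.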
Step 6: Enforce Semantic Consistency
- Standardize units (currencies, time zones, UOM). If you must store multiple currencies, add a `Currency` dimension and plan as‑of conversion logic.
- Name measures precisely (`net_sales_amount_local`, `gross_sales_amount_usd`) and document how each is computed.
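As-of currency conversion can be sketched like this (SQLite; the `DimCurrencyRate` table and column names are hypothetical). Each rate is treated as valid from its `rate_date` until the next published rate, so the query picks the latest rate on or before the sale date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactSales (
    sale_date TEXT, currency_code TEXT, gross_sales_amount_local REAL
);
-- As-of rates: each rate applies from its rate_date until the next one.
CREATE TABLE DimCurrencyRate (currency_code TEXT, rate_date TEXT, usd_per_unit REAL);
INSERT INTO FactSales VALUES ('2025-07-10', 'EUR', 100.0);
INSERT INTO DimCurrencyRate VALUES ('EUR', '2025-07-01', 1.10),
                                   ('EUR', '2025-08-01', 1.05);
""")
# As-of join: latest rate on or before the transaction date.
row = conn.execute("""
    SELECT f.gross_sales_amount_local * r.usd_per_unit AS gross_sales_amount_usd
    FROM FactSales f
    JOIN DimCurrencyRate r
      ON r.currency_code = f.currency_code
     AND r.rate_date = (SELECT MAX(rate_date) FROM DimCurrencyRate r2
                        WHERE r2.currency_code = f.currency_code
                          AND r2.rate_date <= f.sale_date)
""").fetchone()
```

The July sale converts at the July rate (1.10), not the August one, even though both exist in the rate table.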
Common Anti‑Patterns (and Remedies)
- Mixed grains in one fact. Remedy: split into separate facts with clear grains (e.g., orders vs. inventory snapshots).
- Over‑snowflaking dimensions. Remedy: collapse unless there’s a concrete governance/scale reason.
- Using natural keys as foreign keys. Remedy: assign surrogate keys; keep naturals as attributes for traceability.
- Combining incompatible facts directly. Remedy: use drill‑across with aggregated result sets on conformed keys.
- Unmanaged SCD2 impact on facts. Remedy: store the correct SCD key at transaction time, or join by effective dates.
- Hidden role‑playing. Remedy: explicitly model role‑playing dimensions with separate FKs and names.
Performance in Modern Warehouses
- Columnar MPP engines (BigQuery, Snowflake, Redshift, Synapse, DuckDB, etc.) love wide scans with selective filters—stars still shine.
- Partition/cluster facts by event date or the primary slicing key.
- Keep dimensions compact and cache‑friendly; deduplicate only when measurable.
- Pre‑aggregate where it’s repeatedly needed at the same grain (materialized views or summary facts) but don’t bake every dashboard into a table.
A Note on Data Vault & Stars (if you’re staging in DV 2.0)
If your core EDW uses Data Vault 2.0, stars (or galaxies) are excellent marts downstream:
- Build dimensions from Hubs + Satellites (business keys, descriptive history).
- Build facts from Links and Satellites (events and relationships).
- Assign surrogate keys in marts, keeping the DV lineage (hash keys, load dates) for traceability.
- Decide SCD policy per business use; DV gives you the raw history—your star formalizes the analytic view.
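A heavily simplified sketch of the hub-plus-satellite-to-dimension step (SQLite; the shapes are reduced to the minimum and the names are illustrative — a real DV 2.0 model carries hash diffs, record sources, and more):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified DV 2.0 shapes: a hub holds the business key,
-- a satellite holds descriptive history keyed by hub hash key + load date.
CREATE TABLE HubCustomer (customer_hk TEXT PRIMARY KEY, customer_bk TEXT);
CREATE TABLE SatCustomer (customer_hk TEXT, load_date TEXT, customer_name TEXT);
INSERT INTO HubCustomer VALUES ('hk1', 'CUST-001');
INSERT INTO SatCustomer VALUES ('hk1', '2025-01-01', 'Acme Inc'),
                               ('hk1', '2025-06-01', 'Acme Incorporated');
""")
# A Type 1 DimCustomer mart: latest satellite row per hub, with DV lineage kept.
rows = conn.execute("""
    SELECT h.customer_bk, s.customer_name, h.customer_hk AS dv_lineage_hk
    FROM HubCustomer h
    JOIN SatCustomer s ON s.customer_hk = h.customer_hk
    WHERE s.load_date = (SELECT MAX(load_date) FROM SatCustomer s2
                         WHERE s2.customer_hk = h.customer_hk)
""").fetchall()
```

A Type 2 mart would instead keep every satellite row and derive effective-date ranges from consecutive load dates; the vault holds the raw history either way.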
When You Should Snowflake (Concrete Scenarios)
- Corporate Geography used everywhere. A single `DimGeography` (Country/Region/State/City) governed by HR/Finance, referenced as an outrigger by `DimCustomer`, `DimStore`, and `DimSupplier`. Small, stable, and updated centrally → ✅ Snowflake.
- Product Taxonomy governed by Merchandising. Multiple stars need the same changing hierarchy; the taxonomy team publishes updates. Keep `DimProduct` denormalized for most attributes, but reference a tiny `DimProductCategory` by key. → ✅ Snowflake (targeted).
- Identity/Privacy domains. PII attributes held in a restricted `DimIdentity` table, with non‑PII `DimCustomer` referencing it via a tokenized key. → ✅ Snowflake for governance.
- Avoid snowflaking when the only benefit is shaving a few megabytes or “it looks tidy” in an ERD tool. Query simplicity beats micro‑normalization.
Implementation Checklist
- Define facts and their grains. (Write it down.)
- Design conformed dimensions with surrogate keys and SCD policies.
- Populate a versioned bus matrix. Share it with stakeholders.
- Keep dimensions denormalized by default; snowflake only with a written justification.
- Model role‑playing dimensions explicitly.
- Provide drill‑across guidance (semantic model or SQL patterns).
- Add bridges for many‑to‑many and ragged hierarchies.
- Bake in unit/currency standards and document measures.
- Test with realistic workloads (filter, group, join) and profile performance.
- Govern changes (taxonomy updates, SCD rules) through data contracts or change tickets.
Final Thoughts
Star schemas keep analytics strong because they align with how people ask questions: “What happened?” (facts) “By what categories?” (dimensions). Snowflaking can be the right tool for certain shared hierarchies, governance boundaries, or extreme volumes—but it’s the exception, not the rule. When your organization grows beyond a single star, a well‑designed galaxy schema with conformed dimensions and a living bus matrix lets teams scale, compare, and combine metrics confidently.
One of the best summaries you can find on this is in the DAMA DMBOK. In particular, the data architecture, data modeling, and data warehousing sections are amazing and should be considered required reading for anyone interested in this type of data architecture.