Periodic snapshots are a gift and a curse.
They’re a gift because they’re easy to land: each load is a complete “as-of” picture, and ingestion rarely needs fancy orchestration. They’re a curse because the moment you want history with meaning—a clean versioned change feed, a Type 2 dimension, a Data Vault satellite—you’re suddenly writing heavy window logic, MERGEs, and stateful pipelines that are harder to reason about than the business problem you were trying to solve.
This post describes a Fabric Materialized Lake View (MLV) pattern that “squashes” a rolling set of snapshot tables down into a bounded, versioned change feed by pairing a chain of MLVs with a periodically refreshed frozen table. We’ll walk the pattern end-to-end, call out where it shines (and where it doesn’t), and then show how the resulting change feed can be used to support both #SlowlyChangingDimensions and #DataVault processes in an MLV-forward #MicrosoftFabric lakehouse architecture.
Before we go too far: the gold standard is still getting a change feed directly from the source system (CDC logs, transactional events, source-managed “effective dating,” or an authoritative change table). When you can get that, take it. Everything else—including this pattern—is a disciplined way of making the best of snapshots.
Why this pattern exists in a world with “traditional” approaches
In a traditional warehouse-first build, snapshot-to-history usually lands in one of these buckets:
- A procedural pipeline that compares the latest snapshot to the current dimension and issues a
MERGE/upsert, closing out old versions and inserting new ones. - A batch process that replays all snapshots each run, then uses window functions to compute
validfrom/validto(accurate, but expensive as snapshots pile up). - A CDC-native approach (best case) where the source emits changes and the target simply applies them.
Materialized Lake Views shift the trade-off space. MLVs are precomputed, stored results of SQL queries that Fabric can refresh on demand or on a schedule, with refresh behavior that can be incremental, full, or skipped depending on changes in dependent sources. They also come with lineage/dependency management that executes MLVs in the correct order.
But MLVs are not a magic wand. They have real constraints: DML isn’t supported inside MLV definitions, and certain constructs (like temporary views and Delta time travel) are not supported. Updating an MLV’s definition is effectively “drop and recreate.” And while Fabric supports optimal refresh modes, window functions can trigger a full refresh strategy.
So the problem becomes: How do we keep the pipeline declarative and MLV-forward, while still producing a correct, versioned history from snapshots—without reprocessing the entire world every run?
That’s where freezing and squashing comes in.
Materialized Lake Views as the engine (and the guardrails)
This pattern assumes you’re leaning into an MLV-forward architecture: transformations live as a managed chain of MLVs, Fabric refreshes them as a dependency graph, and downstream consumers read the materialized results as if they were tables.
A few Fabric realities matter for the design:
- Refresh modes matter. Fabric supports an “optimal refresh” mode that can do incremental refresh, full refresh, or no refresh depending on what changed in the dependent sources.
- Incremental refresh has prerequisites and boundaries. Incremental refresh requires Delta Change Data Feed enabled on dependent sources, and it is best aligned with append-only behavior; updates/deletes in the sources can force a full refresh.
- Window functions are powerful, but expensive in refresh terms. Window functions can push an MLV into full refresh behavior.
This pattern uses those facts rather than fighting them: it isolates the “window-heavy” squashing step onto a deliberately bounded dataset so full refresh stays cheap.
The pattern in plain English: freeze the past, squash the present
You have periodic snapshots arriving in your ingestion layer (call it Bronze, if you’re medallion-minded). Each snapshot represents the full state of an entity at a point in time: customers as of yesterday, assets as of end-of-month, subscriptions as of the nightly run.
You want a versioned change feed: only the rows that represent a new version of an entity when something actually changed, with validfrom, validto, and isactive.
The pattern creates that by maintaining two things:
- A Source Table of periodic snapshots.
- A Frozen Table that stores a periodically “compacted” version of history so your squashing logic never has to look back forever.
Then it runs a chain:
- MLV 1: Pull only snapshots after the freeze point and compute a
hashdiff. - MLV 2: Union those new snapshots with the currently active baseline from the frozen history.
- MLV 3: “Squash” the union down to only versions where the
hashdiffchanged. - MLV 4: Close versions (
validto,isactive) and union back the already-inactive frozen history to produce the full feed.
Periodically, you overwrite the frozen table with the output of MLV 4, resetting the baseline.
That’s it. The rest is details—and the details are where this becomes useful.
The two required objects
Source table: periodic snapshots in ingestion
This is the table being processed. The assumption is:
- It is append-only.
- It includes a business key (natural key) for the entity.
- It includes a snapshot “as-of” date/time column (the snapshot version key).
If your source doesn’t have a clean as-of timestamp, use the ingestion timestamp. Just be consistent: the entire pattern is built on the idea that each snapshot row can be ordered into a timeline.
Frozen table: a bounded “history anchor”
The frozen table is a copy of the existing versioned change feed, taken periodically. It exists for one reason: to keep processing from becoming heavier as snapshots accumulate.
It includes:
- All source columns except the snapshot date (you’ll carry it as
validfrominstead). validfrom,validto,isactivehashdifffreeze_date(the date the table was frozen)
This frozen table is the practical compromise between “recompute everything always” and “build a complex stateful pipeline.”
MLV 1: Pull new snapshots and compute hashdiff
MLV 1’s job is to take only the snapshots that happened after your last freeze, compute a hashdiff, and hand a smaller, enriched dataset forward.
The key idea is that MLV 1 finds the cutoff dynamically:
cutoff = max(freeze_date) from FrozenTable- filter Source snapshots where
snapshot_date > cutoff
Then compute hashdiff from the attributes that define a “meaningful” change.
A typical approach is a deterministic hash over a canonicalized concatenation of columns (trim strings, standardize nulls, stable ordering). Determinism matters because Fabric’s incremental refresh behavior depends on deterministic functions for the best outcomes.
If you’ve built Data Vault satellites before, this should feel familiar. If you haven’t, here’s the mental model: hashdiff is your “did the descriptive content change?” detector.
MLV 2: Union in the active frozen baseline
MLV 2 does something that looks odd until you’ve lived through a freeze boundary bug.
It unions:
- All rows from MLV 1 (new snapshots after the freeze)
- The active records from the frozen table
…but with a twist: for the active frozen records, you populate only what you need to bridge the timeline. “New” columns coming from MLV 1 can be left NULL, except for validfrom, which becomes the “version key” aligning with MLV 1’s snapshot date.
Why?
Because when you freeze history, you cut off the raw snapshot stream. Without bringing the last known active state forward, you can’t detect whether the first post-freeze snapshot represents a change or continuity. You need the “previous version” present in the dataset you’re squashing.
This step is also where the pattern stays sane over time. MLV 2 should never grow without bound because freezing constrains how many post-freeze snapshots are in scope.
MLV 3: Squash MLV 2 into “only the changes”
MLV 3 is the heart of the pattern: it takes the bounded dataset and produces a versioned feed containing:
- The active frozen baseline rows (as the “prior state”)
- Any new versions where
hashdiffchanged compared to the previous row for that business key
Conceptually:
- Partition by business key
- Order by version key (
validfrom) - Keep the first row
- Keep any row where
hashdiff != lag(hashdiff)
This is where window functions typically show up (e.g., LAG). Window functions are also one of the reasons you freeze: they can push an MLV toward full refresh strategy. The win is that MLV 3 is operating on a bounded subset, so full refresh is acceptable and predictable.
At the end of MLV 3, you have a version list but not a closed history yet.
MLV 4: Close versions and rebuild the complete feed
MLV 4 takes MLV 3 and computes the mechanics of history:
validtois the next version’svalidfrom(often modeled as a half-open interval:[validfrom, validto)).isactiveis true when there is no next version.
Again, this commonly uses a window function like LEAD(validfrom). Same note: this is fine because the dataset is bounded.
Then MLV 4 unions:
- The newly closed versions from MLV 3
- The already inactive rows from the frozen table
That union gives you a full, consistent change feed: old closed history plus newly computed versions.
At this point, you have what most teams want but rarely name clearly: a reusable, canonical versioned change feed.
The freeze step: the one intentionally “imperative” move
Periodically, copy the output of MLV 4 into the frozen table, overwriting it.
This is the “escape hatch” that keeps the rest of the pipeline declarative. MLVs don’t support DML in their definitions, and you can’t “update history in place” inside an MLV. So freezing is done as a separate operation (a notebook write, a pipeline step, or an automated job).
If you’re leaning into automation, Fabric also exposes REST APIs that support scheduling and running MLV lineage refresh jobs. Even if you don’t use the APIs directly, the broader point is that MLV pipelines are meant to be managed and repeatable.
Why this works so well in an MLV-forward Fabric architecture
This pattern isn’t just “how to do SCD2 with snapshots.” It’s a way of making snapshot-based history feel native in a Fabric lakehouse where MLVs are the primary transformation mechanism.
The advantages are architectural, not just tactical:
- You keep the pipeline declarative. Most of the logic lives in
SELECTstatements, materialized and refreshed by Fabric. MLVs are literally designed for this: define once, refresh on schedule, and query like a table. - You get managed dependency execution. The lineage view exists to run dependent MLVs in sequence when new data arrives.
- You isolate “full refresh” pressure to a bounded slice. Optimal refresh can do incremental/no refresh/full refresh, but window functions can force full refresh. Freezing gives you a practical safety valve: even if MLV 3/4 must refresh fully, they refresh fully over a small dataset.
- You produce a reusable artifact. Downstream models—dimensional, Data Vault, even semantic-layer friendly tables—can all start from the same change feed.
If you’re trying to build a lakehouse that doesn’t devolve into “a pile of notebooks and hope,” this pattern is one of the cleaner bridges between snapshot reality and modeled history.
From versioned change feed to Slowly Changing Dimensions
Once MLV 4 produces:
- business key
- descriptive attributes
validfromvalidtoisactivehashdiff
…you’re one small step away from an SCD Type 2 dimension.
In many cases, the change feed already is the dimension table, minus a surrogate key. If you need a surrogate key, you can derive one deterministically (for example, hash of business key + validfrom), and you can add the usual dimension hygiene (unknown member row, default dates, conformed attribute naming).
The bigger win is conceptual: your SCD logic is no longer the “hard part.” The hard part—detecting changes across time—has already been done once, centrally, and materialized.
That’s what makes this pattern feel #MaterializedLakeViews-native: the dimension becomes a thin layer over a canonical feed rather than a bespoke pipeline.
From versioned change feed to Data Vault hubs, links, and satellites
Data Vault practitioners will recognize the frozen-table columns immediately: hashdiff, effective dating, active flags.
This change feed supports Data Vault builds in a straightforward way:
- Hub: Take distinct business keys from the feed. Compute a hub hash key (HK) from the business key(s). Load once per key.
- Satellite: The feed rows are essentially satellite rows already: a key, a
hashdiff, a set of descriptive attributes, and a timeline. If you want to track both load date and effective date separately, keepvalidfromas effective date and add an ingestion/load timestamp from the snapshot landing. - Link: If your snapshots represent relationships (many-to-many associations, hierarchies, memberships), the same squashing logic can generate a relationship change feed keyed by the participating business keys, which then becomes a link + satellite pattern.
The larger architectural benefit is that you can support both worlds—SCD and Data Vault—without duplicating “change detection logic” across two modeling teams. One canonical feed, many downstream shapes.
Advantages and trade-offs
This pattern is useful precisely because it’s not pretending to be perfect.
Where it shines
It shines when you’re living in snapshot land and want history without building a procedural monster:
- Bounded cost over time. Freezing prevents “scan all snapshots forever” behavior.
- MLV-first pipeline design. You get managed refresh, lineage, and consistent transformed datasets.
- A reusable output. The versioned feed can drive #SlowlyChangingDimensions, #DataVault satellites, or any downstream conformed layer.
Where it doesn’t
There are real trade-offs:
- It’s only as granular as your snapshots. If an attribute changed three times between snapshots, you’ll see at most one change.
- It requires discipline in hashdiff design. Poor canonicalization leads to false positives (or worse, missed changes).
- You are managing a freeze cycle. That’s operational overhead and an extra copy of history (by design).
- Window-heavy steps may not be incrementally refreshed. Fabric’s optimal refresh strategy can fall back to full refresh when using window functions; the pattern absorbs that by bounding the dataset.
And again: if your source can provide a trustworthy change feed directly, you should prefer that. Delta Change Data Feed, for example, exists specifically to record row-level change events between versions of a Delta table (including inserts, deletes, and updates). That’s closer to truth than reconstructing change from snapshots.
Closing thoughts: make snapshots behave like a change feed
Let’s recap what we set out to do, what we did, and what you can do next.
We started with a common constraint: periodic snapshots that are easy to ingest but expensive to use for history. We then used a Fabric Materialized Lake View chain to transform those snapshots into a compact, versioned change feed—without letting the cost grow unbounded—by introducing a frozen table as a deliberate boundary. Along the way, we stayed aligned with Fabric’s strengths (declarative MLVs, lineage-driven refresh) and designed around its constraints (no DML in MLVs, refresh behavior that can fall back to full refresh for window functions).
The result is a practical asset: a versioned feed that can be shaped into either SCD2 dimensions or Data Vault structures with minimal additional work.
If you’re building in Microsoft Fabric and you’re still getting snapshots instead of CDC, the question isn’t “can we do history?” It’s “can we do it in a way that stays maintainable a year from now?” This pattern is one of the cleaner answers I’ve found.