The Microsoft Fabric Delta Change Data Feed (CDF)

In Microsoft Fabric you’re sitting on top of Delta Lake tables in OneLake. If you flip on Delta Change Data Feed (CDF) for those tables, Delta records row‑level inserts, deletes, and updates (including pre‑/post‑images for updates) and lets you read just the changes between versions. That makes incremental processing for SCDs (Type 1/2) and Data Vault satellites dramatically simpler and cheaper, because you aren’t rescanning entire tables; you’re just consuming the “diff.” Fabric’s Lakehouse fully supports this because it’s natively Delta. Mirrored databases land in OneLake as Delta too, but (as of September 2025) Microsoft hasn’t documented a supported way to enable Delta CDF on the mirrored tables themselves. You can still analyze mirrored data with Spark via Lakehouse shortcuts, or capture CDC upstream (Real‑Time hub) and write it to your own Delta tables with CDF enabled.

This feature is still underutilized, but once Mirrored Databases support CDF, it’s going to be a must-have in every data engineer’s toolkit.

What Delta Change Data Feed is (in Fabric terms)

Delta CDF is a table capability that tracks row‑level change events and exposes them with metadata columns such as _change_type, _commit_version, and _commit_timestamp. For updates, you get both update_preimage and update_postimage rows. Reads can be batch (e.g., “give me changes from version 120→latest”) or streaming (continuous consumption). You enable it by setting the table property delta.enableChangeDataFeed = true, and you read it by specifying option("readChangeFeed", "true"). Retention is finite: change files obey the table’s time‑travel/VACUUM policies, so you should process (or archive) the feed within that window.

Fabric Lakehouse uses Delta as the unified table format across compute engines, so these behaviors apply directly in Fabric notebooks/Spark jobs.

Sketch:

-- enable CDF when creating (SQL in Spark) 
CREATE TABLE bronze.orders
USING delta
TBLPROPERTIES (delta.enableChangeDataFeed = true)
AS SELECT * FROM source_snapshot;

-- or on an existing table
ALTER TABLE bronze.orders SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

# batch consume the change feed in Fabric notebooks (PySpark)
changes = (spark.read
  .format("delta")
  .option("readChangeFeed","true")
  .option("startingVersion", 120)      # or startingTimestamp
  .table("bronze.orders"))
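The feed can also be consumed as a stream. A minimal sketch, wrapped as a helper so the session is supplied by the caller; the checkpoint path and table names are illustrative:

```python
def stream_changes(spark, source_table, checkpoint_path, target_table):
    """Continuously copy CDF rows from source_table into target_table.

    Assumes delta.enableChangeDataFeed = true on source_table; each
    micro-batch contains only the change rows committed since the last one.
    """
    changes = (spark.readStream
        .format("delta")
        .option("readChangeFeed", "true")
        .table(source_table))
    return (changes.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)  # illustrative path
        .toTable(target_table))
```

The checkpoint is what lets the stream resume from the correct version after a restart, so give each consumer its own checkpoint location.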

How this interacts with mirrored data and other sources

Mirrored databases → OneLake Delta
Mirroring continuously replicates Azure SQL, SQL Server, Cosmos DB, Snowflake, etc., into Delta tables in OneLake. You can reach those mirrored tables from Spark by creating Lakehouse shortcuts to the mirrored database, and you can even open the underlying Delta files directly. This is deliberate: mirroring aims to give you analytics‑ready Delta without burdening the source.

But can you turn on Delta CDF on the mirrored tables?
Microsoft’s docs don’t advertise a supported way to enable CDF on mirrored tables today. Community threads repeatedly note that CDF isn’t available on mirrored databases, and feature requests are tracked externally (status was “Planned” late 2024). Practically, teams treat mirrored tables as read‑optimized replicas and keep CDF on their own Lakehouse tables downstream.

A separate—but commonly confused—point: source‑system change feeds (for example, Cosmos DB’s native change feed) are not exposed “as a feed” by Fabric Mirroring; mirroring writes changes into Delta snapshots in OneLake.

Alternatives when you start from a mirror:

  1. Consume mirrored tables via Spark and manage your own CDF downstream.
    Create a Lakehouse shortcut to the mirrored tables, then write them into your Bronze Delta tables with MERGE semantics and delta.enableChangeDataFeed = true. From there, CDF is yours to use.
  2. Ingest upstream CDC directly (Real‑Time hub) instead of—or alongside—mirroring.
    Fabric’s Real‑Time hub has first‑party sources for Azure SQL DB/MI and SQL Server CDC. You can land the CDC stream into a Lakehouse/Delta table that you own with CDF enabled, giving you a clean, durable change log ideal for SCD/DV processing.
  3. Replicate out of a Lakehouse using CDF (preview).
    If you need to push changes to other systems, Fabric now exposes “Data replication from Lakehouse with Delta Change Feed” (preview), which uses CDF to drive downstream connectors.
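Alternative 1 can be sketched as a notebook helper. The table names and key column are illustrative, and the Bronze target is assumed to already exist with delta.enableChangeDataFeed = true:

```python
def refresh_bronze(spark, mirrored_table, bronze_table, key_col):
    """Upsert the current mirrored snapshot into a CDF-enabled Bronze table.

    mirrored_table is reached via a Lakehouse shortcut; because the write is
    a MERGE, the Bronze table's own change feed emits proper insert/update
    rows rather than a full rewrite.
    """
    spark.read.table(mirrored_table).createOrReplaceTempView("mirror_snapshot")
    spark.sql(f"""
        MERGE INTO {bronze_table} AS t
        USING mirror_snapshot AS s
        ON t.{key_col} = s.{key_col}
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)
```

If the source hard-deletes rows, you would also need a delete-detection step (for example, a WHEN NOT MATCHED BY SOURCE clause) so the Bronze feed carries delete events too.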

A side note on Warehouses: Fabric Warehouses also store data in Delta and even publish Delta logs for interoperability, but the Warehouse docs don’t expose a productized knob for enabling Delta CDF on Warehouse tables themselves. The standard, low‑friction place to leverage CDF in Fabric remains the Lakehouse.


Why CDF is useful for SCDs and Data Vault

SCD Type 1/2 (dimensions).
With CDF, your source of truth table emits exactly the rows that changed since version v. The _change_type tells you whether to treat a row as an insert, delete, or an update’s pre/post image. You can route those events into SCD logic without rescanning the whole dimension every run.

Sketch: consume bronze.customers changes and maintain a Type‑2 dimension:

from pyspark.sql.functions import col, current_timestamp, lit

cdf = (spark.read
  .format("delta")
  .option("readChangeFeed","true")
  .option("startingVersion", last_processed_version)
  .table("bronze.customers")
  .filter(col("_change_type").isin("insert","update_postimage","delete")))

# Two-step SCD2: the MERGE closes the current row for updated/deleted keys,
# then a second statement appends the fresh versions. Only keys present in
# 'cdf' are touched, so the work scales with the delta, not the full table.
cdf.createOrReplaceTempView("src_changes")

spark.sql("""
MERGE INTO dim.customer_scd2 AS d
USING (
  SELECT customer_id, name, city, state, postal_code, _change_type
  FROM src_changes
) AS s
ON d.customer_id = s.customer_id AND d.current_flag = 1
WHEN MATCHED AND s._change_type IN ('update_postimage','delete') THEN
  UPDATE SET d.current_flag = 0, d.end_date = current_timestamp()
WHEN NOT MATCHED AND s._change_type <> 'delete' THEN
  INSERT (customer_id, name, city, state, postal_code, start_date, end_date, current_flag)
  VALUES (s.customer_id, s.name, s.city, s.state, s.postal_code, current_timestamp(), NULL, 1)
""")

spark.sql("""
-- a single MERGE cannot both close and re-insert the same key, so append the
-- new current versions for updated keys in a second pass
INSERT INTO dim.customer_scd2
SELECT customer_id, name, city, state, postal_code, current_timestamp(), NULL, 1
FROM src_changes
WHERE _change_type = 'update_postimage'
""")

Because CDF gives you update_preimage/update_postimage, you can compute deltas precisely (e.g., only end‑date when attributes actually changed) and keep the dimension slim. The semantics come straight from Delta CDF; retention/processing considerations apply.
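That “only when attributes actually changed” check can be illustrated outside Spark with plain Python, using dicts to stand in for CDF rows; the key name and tracked attribute list are assumptions for the example:

```python
# Pair update_preimage/update_postimage rows by (key, _commit_version) and
# keep only updates where a tracked attribute actually changed.
# Simplification: assumes at most one update per key per commit.
TRACKED = ("name", "city")

def changed_updates(cdf_rows):
    pre = {(r["customer_id"], r["_commit_version"]): r
           for r in cdf_rows if r["_change_type"] == "update_preimage"}
    out = []
    for r in cdf_rows:
        if r["_change_type"] != "update_postimage":
            continue
        before = pre.get((r["customer_id"], r["_commit_version"]))
        if before and any(before[a] != r[a] for a in TRACKED):
            out.append(r)  # a real change; a no-op update is dropped
    return out
```

In Spark you would express the same thing as a self-join of the feed on the business key and _commit_version, comparing the pre- and post-image columns.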

Data Vault 2.0 (raw vault & PIT/bridge).
Traditional DV raw vault is insert‑only: hubs/links are driven by business keys and relationships; satellites capture attribute history with hashdiffs. CDF slots in as the change driver:

  • Land source into a Bronze Delta table with a stable business key; enable CDF.
  • For hubs/links, consume only insert and the “new relationship” events; compute hash keys and append.
  • For satellites, read insert/update_postimage rows from CDF, recompute the hashdiff for the satellite’s attribute group, and append a new satellite row only when the hashdiff changes. Deletes from source (hard or logical) can be carried into a delete satellite or a status attribute—again driven by CDF events rather than full scans.
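The satellite step can be sketched in plain Python; hashlib’s MD5 stands in for whatever hash your DV standard mandates, and the key name and attribute group are illustrative:

```python
import hashlib

SAT_ATTRS = ("name", "city", "state")  # the satellite's attribute group

def hashdiff(row):
    """Hash over the concatenated, delimited attribute values (nulls as '')."""
    payload = "||".join(str(row.get(a) or "") for a in SAT_ATTRS)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def new_sat_rows(cdf_rows, last_hash_by_key):
    """Append a satellite row only when the hashdiff changed for that key."""
    out = []
    for r in cdf_rows:
        if r["_change_type"] not in ("insert", "update_postimage"):
            continue
        h = hashdiff(r)
        if last_hash_by_key.get(r["customer_id"]) != h:
            out.append({**r, "hashdiff": h})
            last_hash_by_key[r["customer_id"]] = h
    return out
```

In the real pipeline `last_hash_by_key` is the latest hashdiff per key from the satellite itself, and the append is a Delta write; the logic is the same.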

This approach eliminates wide, expensive “compare everything to yesterday” diffs and makes PIT refreshes faster because the sat churn is minimal and version‑stamped by _commit_version/_commit_timestamp. The DV mechanics are model‑specific; the point is that CDF gives you the minimal change set to operate on, which is exactly what DV wants. The underlying CDF mechanics and metadata are defined by Delta Lake; Fabric’s Lakehouse is a first‑class Delta environment.


Practical guidance in Fabric

If you already mirror an operational store into Fabric, treat the mirrored tables as your landing/snapshot and immediately persist them into your own Lakehouse Bronze tables with deterministic keys and MERGE semantics; enable CDF there. That gives you a durable, queryable feed for SCD/DV work independent of mirroring internals. If you need true event‑time fidelity or per‑row lineage from the source, wire up Real‑Time hub CDC from Azure SQL/MI/SQL Server and write directly into CDF‑enabled Delta tables—you’ll have an end‑to‑end change pipeline that remains reusable for both SCD and Vault patterns.

Finally, remember that CDF is not a permanent ledger; it’s a change convenience. If you need long‑term replay/audit, land the feed into your own “changelog” Delta table before VACUUM evicts older versions. These retention behaviors are part of the spec and worth designing for up front.
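A minimal sketch of that archival step, assuming illustrative table names:

```python
def archive_change_feed(spark, source_table, changelog_table, start_version):
    """Append CDF rows to a durable changelog table before VACUUM evicts them.

    The metadata columns (_change_type, _commit_version, _commit_timestamp)
    ride along, so the changelog doubles as a replayable audit trail.
    """
    (spark.read
        .format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", str(start_version))
        .table(source_table)
        .write.format("delta")
        .mode("append")
        .saveAsTable(changelog_table))
```

Run it on a schedule tighter than the table’s retention window, and persist the last archived _commit_version so the next run picks up where this one stopped.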


Author: Jason Miles

A solution-focused developer, engineer, and data specialist working across diverse industries, he has led data products and citizen data initiatives for almost twenty years and is an expert in enabling organizations to turn data into insight, and then into action. He holds an MS in Analytics from Texas A&M, along with DAMA CDMP Master and INFORMS CAP-Expert credentials.

Discover more from EduDataSci - Educating the world about data and leadership
