The Microsoft Fabric Delta Change Data Feed (CDF)

In Microsoft Fabric you’re sitting on top of Delta Lake tables in OneLake. If you flip on Delta Change Data Feed (CDF) for those tables, Delta will record row‑level inserts, deletes, and updates (including pre‑/post‑images for updates) and let you read just the changes between versions. That makes incremental processing for SCDs (Type 1/2) and Data Vault satellites dramatically simpler and cheaper because you aren’t rescanning entire tables—just consuming the “diff.” Fabric’s Lakehouse fully supports this because it’s natively Delta. Mirrored databases land in OneLake as Delta too, but (as of September 2025) Microsoft hasn’t documented a supported way to enable CDF on the mirrored tables themselves. In the meantime, you can analyze mirrored data with Spark via Lakehouse shortcuts, or capture CDC upstream (Real‑Time hub) and write to your own Delta tables with CDF enabled.
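
To make the pattern concrete, here’s a minimal PySpark sketch of what that looks like in a Fabric notebook. The sales_orders table and the version numbers are illustrative assumptions, not from any real workload:

```python
# Minimal sketch in a Fabric notebook; table name and version numbers are
# illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created in Fabric notebooks

# Flip on CDF for an existing Lakehouse Delta table (it can also be set
# as a table property at CREATE TABLE time).
spark.sql("""
    ALTER TABLE sales_orders
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read only the row-level changes between two commits. Updates arrive as
# update_preimage / update_postimage row pairs; _change_type says which.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 5)  # assumed window start
    .option("endingVersion", 8)    # assumed window end
    .table("sales_orders")
)

changes.select("order_id", "_change_type", "_commit_version").show()
```

From there, feeding an SCD Type 2 MERGE is a matter of filtering on `_change_type` instead of diffing full snapshots.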

This feature is underutilized today, but once Mirrored Databases support CDF, it’s going to be a must‑have in every data engineer’s toolkit.

Continue reading “The Microsoft Fabric Delta Change Data Feed (CDF)”

Analytics Governance: the Missing Middle of the Information Governance Stack

Most organizations have matured data governance (quality, ownership, catalogs) and are racing to formalize AI governance (risk, bias, safety, model monitoring). Application governance (SDLC, access, change control) keeps production systems stable.

But the layer where business decisions actually touch numbers—analytics—often sits in a gray zone. KPI definitions live in wikis, dashboards implement subtle variations of the “same” metric, and spreadsheets quietly fork the math. Analytics governance fills that gap: it is the set of controls, roles, artifacts, and workflows that make calculations consistent, auditable, and reusable across the enterprise.

Continue reading “Analytics Governance: the Missing Middle of the Information Governance Stack”

Why Star Schemas Make Analysts Faster (and Happier)

If you live in spreadsheets or SQL all day, the “one big table” (OBT) feels like home. Everything you need is right there: one row per thing, a column for every attribute, and no joins to worry about. It’s a great way to explore data fast—until it isn’t. This post explains, in plain language, why the star schema pays you back every day you analyze data, and how it keeps the speed you love without the headaches you’ve learned to live with.
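
To see the contrast, here’s a hypothetical star‑schema query (every table and column name is made up for illustration) of the kind the post has in mind:

```python
# Hypothetical star-schema query in a Fabric notebook: a narrow fact table
# joined to small, readable dimensions. All names here are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created in Fabric notebooks

revenue_by_region = spark.sql("""
    SELECT d.year,
           r.region_name,
           SUM(f.sales_amount) AS revenue
    FROM   fact_sales AS f
    JOIN   dim_date   AS d ON f.date_key   = d.date_key
    JOIN   dim_region AS r ON f.region_key = r.region_key
    WHERE  d.year = 2024
    GROUP  BY d.year, r.region_name
""")

revenue_by_region.show()
```

Each join names the business concept it represents, which is exactly what the one big table makes you reconstruct from column‑name archaeology.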

Continue reading “Why Star Schemas Make Analysts Faster (and Happier)”

Power Platform, Citizen Developers, and Citizen Data: More Than a One‑Trick Platform

I’m often asked whether Power Platform is “just” a sandbox for non-developers. It isn’t. Power Platform is connective tissue across data, process, and people—equally at home enabling a teacher to automate feedback on assignments, a business analyst to ship a line‑of‑business app, and an engineering team to surface enterprise APIs safely to the front lines. It integrates naturally with citizen developer initiatives and citizen data initiatives, but it also gives professional developers a fast, governed way to deliver solutions without reinventing the plumbing.

Continue reading “Power Platform, Citizen Developers, and Citizen Data: More Than a One‑Trick Platform”

A New Paradigm For Data Teams: The Changing Role of the Data Visualization Engineer

When teams build warehouses the old way—source → bronze → silver → gold → semantic—visualization and semantic specialists are invited in at the end. Their job looks reactive: wire up a few visuals, name some measures, make it load fast enough. They inherit whatever the pipeline produced, then try to make meaning out of it. The failure mode is predictable: pixel‑perfect charts sitting on semantic quicksand, with definitions that shift underfoot and performance that depends on structures no one designed for the questions at hand.

Flip the sequence to Gold → Silver → Bronze → Ingestion, and the center of gravity moves. The product—expressed as a semantic contract—is defined first. In Fabric, that contract is not a veneer; it’s the spine. Direct Lake brings OneLake Delta tables straight into the model; Materialized Lake Views make silver transformations declarative in the lake; Eventhouse (as part of Real‑Time Intelligence) lands and analyzes streams while also publishing them to OneLake for the same model to consume. In that world, the people who shape the semantic layer stop being “report writers” or “data visualization engineers.” They become data product engineers who lead the build toward a specific, testable outcome.
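
As an example of what “declarative silver” looks like in practice, here’s a sketch assuming the Materialized Lake View syntax Fabric documents for Spark SQL; the schema and column names are hypothetical:

```python
# Sketch of a declarative silver-layer transformation, assuming Fabric's
# Materialized Lake View syntax for Spark SQL. All names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created in Fabric notebooks

spark.sql("""
    CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.orders_clean
    AS
    SELECT order_id,
           customer_id,
           CAST(order_ts AS DATE) AS order_date,
           amount
    FROM   bronze.orders_raw
    WHERE  amount IS NOT NULL
""")
```

Because the view is declared in the lake rather than buried in pipeline code, its definition stays visible and testable—exactly what a semantic contract needs to sit on.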

Continue reading “A New Paradigm For Data Teams: The Changing Role of the Data Visualization Engineer”

FabCon Feature: Fabric Real‑Time Intelligence

Real‑Time Intelligence (RTI) is the part of Fabric that treats events and logs as first‑class citizens: you connect live streams, shape them, persist them, query them with KQL or SQL, visualize them, and trigger actions—all without leaving the SaaS surface. Concretely, RTI centers on Eventstream (ingest/transform/route), Eventhouse (KQL databases), Real‑Time Dashboards / Map, and Activator (detect patterns and act). That tight loop—capture → analyze → visualize/act—now covers everything from IoT telemetry to operational logs and clickstream analytics.
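
Because an Eventhouse KQL database speaks the same query protocol as Azure Data Explorer, you can also reach it from outside the Fabric UI. Here’s a hedged sketch using the azure-kusto-data package; the cluster URI, database, table, and column names are all assumptions:

```python
# Hedged sketch: querying a Fabric Eventhouse KQL database with the
# azure-kusto-data package (Eventhouse exposes an Azure Data Explorer-
# compatible query endpoint). URI, database, table, and columns are assumed.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

CLUSTER_URI = "https://<your-eventhouse>.kusto.fabric.microsoft.com"  # assumed
DATABASE = "WebTelemetry"  # assumed KQL database name

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(CLUSTER_URI)
client = KustoClient(kcsb)

# KQL: event counts per page over the last five minutes.
query = """
clickstream
| where timestamp > ago(5m)
| summarize events = count() by page
| top 10 by events
"""

for row in client.execute(DATABASE, query).primary_results[0]:
    print(row["page"], row["events"])
```

The same query could just as easily back a Real‑Time Dashboard tile or an Activator condition without leaving the SaaS surface.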

Continue reading “FabCon Feature: Fabric Real‑Time Intelligence”

Information Governance: The Backbone That Unifies Data, AI, Applications, and Analytics

Information governance (IG) is the strategy, accountability, and control system for how an organization collects, classifies, uses, protects, shares, retains, and disposes of information across its entire lifecycle. It is:

  • Scope‑wide: Covers structured data, unstructured content, model artifacts, code, dashboards, and records (including legal/records management and privacy).
  • Lifecycle‑aware: From intake and creation → active use → archival → retention/disposition and legal holds.
  • Outcome‑driven: Balances value (insights, automation, personalization) with risk (security, privacy, ethics, legal/regulatory).

Where data governance focuses on data as an asset, information governance focuses on information as a liability and an asset—linking value creation with lawful, ethical, and secure handling.

Continue reading “Information Governance: The Backbone That Unifies Data, AI, Applications, and Analytics”

Baselines Over Buzzwords: From Warehouse to Lakehouse

If you’ve built data systems long enough, you’ve lived through at least three architectural moods: the tidy certainty of Kimball and Inmon, the anarchic freedom of “throw everything in the data lake to ingest quickly,” and today’s lakehouse, which tries to keep our speed without losing our sanity. I’ve always cared less about labels and more about baselines—clear, durable expectations that make change safe. This piece traces how those baselines shifted, what we gained and lost, and how to rebuild them for modern work, including real‑time, very large, and unstructured data.

Continue reading “Baselines Over Buzzwords: From Warehouse to Lakehouse”