Data Quality as Code in Fabric: Declarative Checks on Materialized Lake Views

If you’ve ever shipped a “clean” silver or gold table only to discover (later) that it quietly included null keys, impossible dates, or negative quantities… you already know the real pain of data quality.

The frustration isn’t that bad data exists. The frustration is that quality rules often live somewhere else: in a notebook cell, in a pipeline activity, in a dashboard someone checks (sometimes), or in tribal knowledge that never quite becomes a contract.

Microsoft Fabric’s Materialized Lake Views (MLVs) give you a more disciplined option: you can define declarative data quality checks inside the MLV definition using constraints, and then use Fabric’s built-in monitoring, lineage, and embedded Power BI Data Quality reports to understand how quality is trending across your lakehouse and your data products.
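To make that concrete, here’s a minimal sketch of what a constraint-bearing MLV definition can look like. The schema, table, and column names are hypothetical, and the exact constraint syntax should be checked against the current Fabric documentation:

```sql
-- Hypothetical: a silver MLV that declares its quality rules inline.
-- ON MISMATCH DROP excludes failing rows; ON MISMATCH FAIL aborts the refresh.
CREATE MATERIALIZED LAKE VIEW silver.orders_clean
(
    CONSTRAINT order_id_not_null CHECK (order_id IS NOT NULL) ON MISMATCH DROP,
    CONSTRAINT quantity_positive CHECK (quantity > 0)         ON MISMATCH DROP,
    CONSTRAINT plausible_date    CHECK (order_date >= DATE '2000-01-01') ON MISMATCH FAIL
)
AS
SELECT order_id, order_date, quantity
FROM bronze.orders;
```

Because the rules live in the view definition itself, they travel with the code and surface in Fabric’s monitoring and Data Quality reporting rather than hiding in a notebook cell somewhere.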

In this post, I’ll cover what these checks look like, how to add them, and—most importantly—how to turn them into quality signals you can operationalize for a Microsoft Fabric lakehouse and the Data Engineering teams who depend on it.

It’s important to note that we’re looking at structural data quality here. Data integrity – making sure that your data follows your business logic, makes sense, and isn’t drifting – is a separate discipline, and while these techniques can be adapted for it, there are more efficient ways to implement it.

Continue reading “Data Quality as Code in Fabric: Declarative Checks on Materialized Lake Views”

The Advanced Lakehouse Data Product: Shortcuts In, Materialized Views Through, Versioned Schemas Out

There’s a familiar tension in modern analytics: teams want data products that are easy to discover and safe to consume, but they also want to move fast—often faster than the governance model can tolerate.

In Microsoft Fabric, that tension frequently shows up as a perception of workspace sprawl. A “single product per workspace” model is clean on paper—strong boundaries, tidy ownership, straightforward promotion—but it can quickly turn into dozens (or hundreds) of workspaces to curate, secure, and operate.

This post proposes a different pattern—an advanced lakehouse approach that treats the lakehouse itself like a product factory:

  • Shortcuts or schema shortcuts become the input layer (a clean, contract-aware “ingest without copying” boundary).
  • A small-step transformation layer is implemented as a multi-step DAG using Materialized Lake Views (MLVs).
  • A versioned, schema-based surface area becomes the data product contract you expose to consumers.

Then we connect that to OneLake security and Fabric domains, showing how you can expose left-shifted data products (usable earlier in the lifecycle) without letting workspaces multiply endlessly.

Continue reading “The Advanced Lakehouse Data Product: Shortcuts In, Materialized Views Through, Versioned Schemas Out”

Freeze-and-Squash: Turning Snapshot Tables into a Versioned Change Feed with Fabric Materialized Lake Views

Periodic snapshots are a gift and a curse.

They’re a gift because they’re easy to land: each load is a complete “as-of” picture, and ingestion rarely needs fancy orchestration. They’re a curse because the moment you want history with meaning—a clean versioned change feed, a Type 2 dimension, a Data Vault satellite—you’re suddenly writing heavy window logic, MERGEs, and stateful pipelines that are harder to reason about than the business problem you were trying to solve.

This post describes a Fabric Materialized Lake View (MLV) pattern that “squashes” a rolling set of snapshot tables down into a bounded, versioned change feed by pairing a chain of MLVs with a periodically refreshed frozen table. We’ll walk the pattern end-to-end, call out where it shines (and where it doesn’t), and then show how the resulting change feed can be used to support both #SlowlyChangingDimensions and #DataVault processes in an MLV-forward #MicrosoftFabric lakehouse architecture.

Before we go too far: the gold standard is still getting a change feed directly from the source system (CDC logs, transactional events, source-managed “effective dating,” or an authoritative change table). When you can get that, take it. Everything else—including this pattern—is a disciplined way of making the best of snapshots.
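The core “squash” step can be sketched with window logic in an MLV: compare each snapshot row to the previous snapshot for the same key, and keep only the rows where something actually changed. This is a simplified illustration — the table and column names are hypothetical, and the frozen-table pairing that bounds how far back the chain scans is omitted here:

```sql
-- Hypothetical: collapse a stack of daily snapshots into a versioned change feed.
CREATE MATERIALIZED LAKE VIEW silver.customer_changes
AS
WITH ordered AS (
    SELECT
        customer_id,
        snapshot_date,
        status,
        credit_limit,
        hash(status, credit_limit) AS row_hash,
        LAG(hash(status, credit_limit)) OVER (
            PARTITION BY customer_id
            ORDER BY snapshot_date
        ) AS prev_hash
    FROM bronze.customer_snapshots
)
SELECT
    customer_id,
    snapshot_date AS valid_from,
    status,
    credit_limit
FROM ordered
WHERE prev_hash IS NULL          -- first appearance of the key
   OR prev_hash <> row_hash;     -- attributes changed since the prior snapshot
```

Each surviving row is a version boundary, which is exactly the shape a Type 2 dimension or Data Vault satellite wants to consume.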

Continue reading “Freeze-and-Squash: Turning Snapshot Tables into a Versioned Change Feed with Fabric Materialized Lake Views”

Ship Your Lakehouse Like Code: Deploying MLVs with a SQL-Only Configuration Notebook

If you’re building with Materialized Lake Views (MLVs), you’ve probably felt the tension: the definitions live in code, but the Lakehouse itself is an environment-specific artifact. That gap is where deployments get messy—schemas drift, tables don’t exist yet, and MLV refresh behavior looks “random” when it’s really just reacting to configuration.

This post lays out a pattern that closes that gap cleanly: a lakehouse configuration notebook that you promote through your deployment pipeline and run in every environment to create schemas, tables, and MLVs idempotently—using SQL cells only. The key is that MLVs are treated as “definition-driven assets” that can be iterated in dev and re-stamped into test/prod with the same notebook.

And we’ll end with the detail you want to institutionalize: the final cell sets Delta Change Data Feed (CDF) the way you want it—because it directly affects whether Fabric uses incremental refresh and whether some “static-source” MLVs appear to not run.
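As a flavor of what those SQL cells look like, here’s a hedged sketch with hypothetical schema and table names. The `IF NOT EXISTS` guards make the notebook safe to re-run in every environment, and the final statement pins the CDF behavior explicitly (`delta.enableChangeDataFeed` is a standard Delta table property):

```sql
-- Cell 1: idempotent structure — safe to re-run on every deployment.
CREATE SCHEMA IF NOT EXISTS silver;

CREATE TABLE IF NOT EXISTS silver.orders (
    order_id   BIGINT,
    order_date DATE,
    quantity   INT
) USING DELTA;

-- Final cell: set Change Data Feed deliberately, so incremental refresh
-- behavior is a configured fact rather than an environment-specific surprise.
ALTER TABLE silver.orders
SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
```

Running the same notebook in dev, test, and prod means the only thing that differs between environments is data, not definitions.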

Continue reading “Ship Your Lakehouse Like Code: Deploying MLVs with a SQL-Only Configuration Notebook”

Delta First: Building Efficient Bitemporal Tables in Microsoft Fabric

In financial services, the questions that matter most are rarely answered by “the latest record.”

Regulators, auditors, model validators, and operations teams want something more specific: what was true for the business at the time, and what did we know at the time? That’s bitemporal thinking—and it’s exactly the kind of problem where Microsoft Fabric’s Lakehouse on Delta becomes more than storage. It becomes a practical design advantage.

In this post, I’m going to walk through what bitemporal tables actually require, why intervals matter (ValidFrom/ValidTo), and how to implement bitemporal efficiently in Fabric by leaning into #DeltaLake in the Lakehouse. We’ll ground it with two #FSI examples (low velocity KYC and high velocity trades/payments), and we’ll add a derived-layer option using materialized lake views to calculate closure dates. Finally, we’ll cover when Azure SQL Database (including Hyperscale) is the right operational complement to Fabric.
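To preview the interval idea, here’s a minimal bitemporal point-in-time query, assuming a hypothetical table with business-time (`ValidFrom`/`ValidTo`) and system-time (`RecordedFrom`/`RecordedTo`) columns, with open intervals closed by a far-future sentinel:

```sql
-- "What was true for the business on 2024-03-31,
--  as we knew it on 2024-04-15?"
SELECT customer_id, risk_rating
FROM gold.kyc_bitemporal
WHERE DATE '2024-03-31' >= ValidFrom
  AND DATE '2024-03-31' <  COALESCE(ValidTo,    DATE '9999-12-31')
  AND DATE '2024-04-15' >= RecordedFrom
  AND DATE '2024-04-15' <  COALESCE(RecordedTo, DATE '9999-12-31');
```

The two predicates are independent on purpose: pinning only business time answers “what was true,” pinning only system time answers “what did we know,” and pinning both reconstructs the exact view an auditor cares about.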

Continue reading “Delta First: Building Efficient Bitemporal Tables in Microsoft Fabric”

DirectLake on OneLake CI/CD: A Practical Two-Step Deployment Pattern with Sempy Labs + Variable Libraries

DirectLake on OneLake is one of those “this is what we’ve been waiting for” features in Microsoft Fabric—until you try to deploy it cleanly across Dev → Test → Prod and realize you’ve re-entered the world of post-deployment manual fixes.

In this how-to, I’m going to do three things:

  • Contrast DirectLake on SQL endpoints (the “classic” flavor) with DirectLake on OneLake (the newer flavor), and explain why OneLake is worth the trouble.
  • Walk through the normal deployment pipeline approach that works well for DirectLake on SQL.
  • Show a two-step, semi-automated approach for DirectLake on OneLake using:
    • sempy_labs.directlake.update_direct_lake_model_connection(), and
    • a Fabric Variable Library + a “run-after-deployment” notebook.

Along the way, I’ll call out the current challenges (because yes, they’re real right now), and why this pattern matters for serious Microsoft Fabric Power BI CI/CD work.

Continue reading “DirectLake on OneLake CI/CD: A Practical Two-Step Deployment Pattern with Sempy Labs + Variable Libraries”

Two Flavors of DirectLake: Over SQL vs. Over OneLake (and How to Switch Without Surprises)

DirectLake has a way of sounding wonderfully simple: “Power BI, but it reads the lake directly.” Then you build two semantic models that both say DirectLake, and they behave… differently. One falls back to DirectQuery when you least expect it. Another refuses to touch your SQL views. Security works for you, but not for your report consumers. Suddenly, “DirectLake” feels less like a feature and more like a riddle.

The good news: this is explainable. And once you understand the two flavors—DirectLake over SQL and DirectLake over OneLake—you can choose deliberately, design around the trade-offs, and even switch between them when you have to.

In this post, I’ll demystify what each option really means, lay out the positives and negatives, explain when you’d use each (and why), and show how to switch using Semantic Link Labs—including what can break when you flip the switch.

Continue reading “Two Flavors of DirectLake: Over SQL vs. Over OneLake (and How to Switch Without Surprises)”

From Tables to Networks: A Deep Dive into Graph in Microsoft Fabric for Financial Services Insights

Most financial services data is already “connected.” It just isn’t modeled that way.

Fraud rings don’t show up as a single row. Money laundering doesn’t announce itself in one transaction. Counterparty exposure isn’t obvious from one booking. The meaningful signal lives in relationships: who shares an address, which accounts route funds through the same nodes, where devices and identities overlap, and how risk propagates through a network.

Graph in Microsoft Fabric is designed for exactly that: turning your OneLake data into a connected model you can explore visually, query with GQL, and enrich with built-in graph algorithms—without standing up a separate graph stack and duplicating data.

In financial services, this is the difference between “we have the data” and “we can reason over the connections.”

Continue reading “From Tables to Networks: A Deep Dive into Graph in Microsoft Fabric for Financial Services Insights”

Gold as the Contract: Schema Evolution, Data Products, and Governance in Microsoft Fabric

A schema change is rarely “just a schema change.”

It’s the moment an upstream team’s perfectly reasonable adjustment becomes a downstream team’s broken report, confusing metric, or silent misinterpretation. And that’s why schema evolution has always been a source of anxiety: a schema isn’t simply structure—it’s an interface.

In this post, I’ll do three things. First, I’ll ground why schema evolution has historically been such a persistent concern. Next, I’ll reframe Medallion with Gold as the published surface area of the data product, and Silver as an optional workshop layer where data is supplemented and transformed. Finally, I’ll connect that design to how Microsoft Fabric supports it today—especially the post‑Ignite direction around governance, security, and semantics.

Continue reading “Gold as the Contract: Schema Evolution, Data Products, and Governance in Microsoft Fabric”

Power BI Copilot Has Multiple Modes. Here’s What Each One Does—and How Fabric Data Agents Change the Game.

Copilot in Power BI isn’t “one feature.” It’s a growing set of experiences that show up in different places, behave differently, and—most importantly—solve different problems.

That’s why two people can both say “Copilot didn’t work for me,” and both be right. One might be trying to generate a report page in Desktop. Another might be trying to chat across any model in their tenant. A third might be expecting an agent-like experience that stays grounded in a curated subject area.

In this post, we’ll map the major modes of Power BI Copilot (where it shows up, what it’s best at, and what it’s not), then contrast that with Fabric Data Agents—because Data Agents aren’t a “better Copilot.” They’re a different building block, meant for a different kind of outcome.

Continue reading “Power BI Copilot Has Multiple Modes. Here’s What Each One Does—and How Fabric Data Agents Change the Game.”