The Ideal Microsoft Fabric CI/CD Approach: Git for Change, Deployment Pipelines for Promotion, and a Code-First Escape Hatch

Microsoft Fabric CI/CD has a reputation for being confusing—usually because people look at Git integration and Deployment Pipelines as competing ideas rather than two halves of a single delivery story.

The good news is that the “ideal” approach is not exotic. It’s a handoff:

  • Use Git integration to support real developer workflows (including branching that maps cleanly to isolated workspaces).
  • Use Deployment Pipelines to promote approved changes across environments.
  • When you need richer approvals, tests, and release controls, let traditional tooling—especially GitHub Actions or Azure DevOps Pipelines—orchestrate promotions via the Fabric REST APIs.

In this post, I’ll lay out that end-to-end pattern step-by-step, show where the seams belong, and call out the cost you can’t ignore: workspace sprawl—and the operational discipline required to manage aged workspaces intentionally.
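
To make the escape hatch concrete, here's roughly what a code-first promotion looks like—a minimal sketch assuming the deployment pipelines "deploy" endpoint of the Fabric REST API, with every ID and token a placeholder your CI runner would supply:

```python
# Minimal sketch: promote content between deployment pipeline stages from CI.
# Assumes an Entra access token with Fabric API scope (e.g., acquired by a
# service principal login step earlier in the workflow). IDs are placeholders.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
PIPELINE_ID = "<deployment-pipeline-id>"
TOKEN = "<entra-access-token>"

resp = requests.post(
    f"{FABRIC_API}/deploymentPipelines/{PIPELINE_ID}/deploy",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "sourceStageId": "<dev-stage-id>",
        "targetStageId": "<test-stage-id>",
        "note": "Promoted by CI after tests passed",
    },
)
resp.raise_for_status()  # deployment is long-running; a 202 means "poll the operation"
```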

Continue reading “The Ideal Microsoft Fabric CI/CD Approach: Git for Change, Deployment Pipelines for Promotion, and a Code-First Escape Hatch”

The NotebookUtils Gems I Wish More Fabric Notebooks Used

Most Fabric notebook code I review has the same telltale shape: a little Spark, a hardcoded path (or three), and just enough glue logic to “get it to run.” And then, a month later, someone copies it into another workspace and everything breaks.

NotebookUtils is one of the easiest ways to avoid that fate. It’s built into Fabric notebooks, it’s designed for the common “day two” problems (orchestration, configuration, identities, file movement), and it’s still surprisingly underused. NotebookUtils is also the successor to mssparkutils—backward compatible today, but clearly where Microsoft is investing going forward.

In this post, I’m going to do two things:

  • Give you a quick, practical orientation to NotebookUtils in Fabric.
  • Walk through the functions I reach for most often—especially the ones I don’t see enough in real projects: runtime.context, runMultiple()/validateDAG(), variableLibrary.getLibrary(), fs.fastcp(), fs.getMountPath(), credentials.getToken(), and lakehouse.loadTable().

Along the way, I’ll call out a few patterns that make notebooks feel less like “scripts you run” and more like reusable components in Microsoft Fabric data engineering work.
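
To give you a feel for it, here’s a compressed sketch of several of those calls in one notebook cell—library, path, and notebook names are all placeholders:

```python
import notebookutils  # built into Fabric notebook runtimes

# Where am I running? Handy for environment-aware behavior.
ctx = notebookutils.runtime.context
print(ctx.get("currentWorkspaceName"), ctx.get("defaultLakehouseName"))

# Pull settings from a Variable Library instead of hardcoding them.
cfg = notebookutils.variableLibrary.getLibrary("EnvConfig")  # "EnvConfig" is hypothetical

# Acquire a token for a downstream call as the running identity.
token = notebookutils.credentials.getToken("storage")

# Parallel file copy—much faster than fs.cp for lots of small files.
notebookutils.fs.fastcp("Files/raw/", "Files/staged/", True)

# Fan out child notebooks as a DAG instead of serial %run calls.
dag = {
    "activities": [
        {"name": "load_orders", "path": "LoadOrders", "timeoutPerCellInSeconds": 600},
        {"name": "load_items", "path": "LoadItems", "dependencies": ["load_orders"]},
    ]
}
notebookutils.notebook.validateDAG(dag)  # catch wiring mistakes before running
notebookutils.notebook.runMultiple(dag)
```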

Continue reading “The NotebookUtils Gems I Wish More Fabric Notebooks Used”

DirectLake Without OneLake Access: A Fixed-Identity Pattern That Keeps the Lakehouse Off-Limits

There’s a moment that catches a lot of Fabric teams off guard.

You publish a beautiful report on a DirectLake semantic model. Users can slice, filter, and explore exactly the way you intended. Then someone asks, “Why can I open the lakehouse and browse the tables?” Or worse: “Why can I query the SQL analytics endpoint directly?”

If your objective is semantic model consumption without lake access, the default DirectLake behavior can feel like it’s working against you. By default, DirectLake uses Microsoft Entra ID single sign-on (SSO)—meaning the viewer’s identity must be authorized on the underlying Fabric data source.

This post walks through a clean, operationally heavier—but very effective—pattern:

Bind the DirectLake semantic model to a shareable cloud connection with a fixed identity, and keep SSO disabled. Then do not grant end users any permissions on the lakehouse/warehouse item. Users can query the semantic model, but they can’t browse OneLake or query the data item directly.
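
If you script your environment setup, creating that fixed-identity connection can live in code too. A rough sketch against the Fabric Connections REST API—the connectionDetails shape varies by source type, and every ID, name, and secret below is a placeholder:

```python
# Rough sketch: create a shareable cloud connection with a fixed identity
# and SSO disabled. The exact connectionDetails for your source (SQL
# analytics endpoint, OneLake, etc.) will differ from this assumption.
import requests

TOKEN = "<entra-access-token>"  # an identity allowed to create connections

body = {
    "connectivityType": "ShareableCloud",
    "displayName": "directlake-fixed-identity",
    "connectionDetails": {
        "type": "SQL",  # assumption: binding via the SQL analytics endpoint
        "creationMethod": "SQL",
        "parameters": [
            {"dataType": "Text", "name": "server", "value": "<sql-endpoint>"},
            {"dataType": "Text", "name": "database", "value": "<lakehouse-db>"},
        ],
    },
    "privacyLevel": "Organizational",
    "credentialDetails": {
        "singleSignOnType": "None",  # the whole point: no viewer SSO
        "connectionEncryption": "Encrypted",
        "credentials": {
            "credentialType": "ServicePrincipal",
            "tenantId": "<tenant-id>",
            "servicePrincipalClientId": "<client-id>",
            "servicePrincipalSecret": "<secret>",
        },
    },
}

resp = requests.post(
    "https://api.fabric.microsoft.com/v1/connections",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=body,
)
resp.raise_for_status()
print(resp.json()["id"])  # bind this connection to the semantic model next
```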

Along the way, we’ll also cover the “gotchas” that trip teams up (especially around permissions and “SSO is still on somewhere”), plus a few guardrails that matter for real-world data governance in Microsoft Fabric.

Continue reading “DirectLake Without OneLake Access: A Fixed-Identity Pattern That Keeps the Lakehouse Off-Limits”

Workspace Sprawl Isn’t Your Fabric Problem—Stale Workspaces Are

“Do we really need another workspace?”

If you’ve built anything meaningful in Microsoft Fabric, you’ve heard some version of that question. It usually comes wrapped in a familiar anxiety: workspace sprawl. Too many containers. Too much to govern. Too hard to manage.

Here’s the reframing that matters: workspace count is rarely the risk. The real risk is stale workspaces and stale data—the forgotten corners of your tenant where ownership is unclear, permissions linger, and the platform quietly accumulates operational and compliance debt.

In this post I’ll walk through why “workspace sprawl” is a false fear, why workspaces naturally form clusters (and why good development multiplies them), and how intentional permissioning in Microsoft Entra and Fabric keeps management from becoming a linear slog—especially once you introduce automation and tooling. Along the way, I’ll ground the point in the real mechanics of Microsoft Fabric rather than vibes.

Continue reading “Workspace Sprawl Isn’t Your Fabric Problem—Stale Workspaces Are”

Fabric Environments Feel Like a Turbo Button—Until Private Link Gets Involved

If you’ve spent any real time in notebooks, you’ve felt it: the “why am I doing this again?” moment. You start a session, install the same libraries, chase a version mismatch, restart a kernel, and finally get back to what you actually came to Fabric to do.

Microsoft Fabric Environments are a strong answer to that pain. They pull your Spark runtime choice, compute settings, and library dependencies into a reusable, shareable artifact you can attach to notebooks and Spark Job Definitions. And with the latest previews—Azure Artifact Feed support inside Environments and Fabric Runtime 2.0 Experimental—it’s clear Microsoft is investing in making Spark development in Microsoft Fabric more repeatable and more “team ready.”

There’s a catch, though: once you introduce Private Link (and the networking controls that tend to come with it), some of the most convenient paths close off. So the story becomes less about “click to go faster” and more about “choose your trade-offs intentionally.”

In this post, I’ll cover what Fabric Environments are, what’s new in the previews (Artifact Feeds + Runtime 2.0), why Environments speed up real work, and where Private Link limits your options—and what you can do about it.

Continue reading “Fabric Environments Feel Like a Turbo Button—Until Private Link Gets Involved”

The Hidden Permission Chain Behind Cross-Workspace Lakehouse Shortcuts (for Semantic Models)

One of the cleanest patterns in Microsoft Fabric is splitting your world in two: a “data product” workspace that owns curated lakehouses, and an “analytics” workspace that owns semantic models and reports. You connect the two with a OneLake shortcut, and suddenly you’ve avoided copies, reduced refresh complexity, and kept your architecture tidy.

Then the first DirectLake semantic model hits that shortcut and… the tables don’t load.

This post walks through what’s really happening in that moment in Microsoft Fabric, what permissions you actually need (and where), and how to tighten the whole pattern with OneLake Security instead of simply widening access. We’ll also cover the easy-to-miss caveat: if your shortcut ultimately lands on a Fabric SQL Database, you still have to do SQL permissions, too.
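
For reference, the shortcut itself is one call to the OneLake Shortcuts API—IDs below are placeholders. The interesting part (and the subject of this post) is that the identity creating it, and the viewers behind the semantic model, need the right permissions on both sides of it:

```python
# Sketch: create a cross-workspace OneLake shortcut via the Shortcuts API.
# All IDs are placeholders; the caller needs rights in BOTH workspaces.
import requests

TOKEN = "<entra-access-token>"
analytics_ws = "<analytics-workspace-id>"
analytics_lh = "<analytics-lakehouse-id>"

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{analytics_ws}"
    f"/items/{analytics_lh}/shortcuts",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "path": "Tables",        # where the shortcut appears locally
        "name": "dim_customer",
        "target": {
            "oneLake": {
                "workspaceId": "<data-product-workspace-id>",
                "itemId": "<data-product-lakehouse-id>",
                "path": "Tables/dim_customer",  # source table in the data product
            }
        },
    },
)
resp.raise_for_status()
```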

Continue reading “The Hidden Permission Chain Behind Cross-Workspace Lakehouse Shortcuts (for Semantic Models)”

Stop Paying Hot-Tier Prices for Cold Data: Using ADLS Gen2 to Tame Fabric Ingestion Storage Costs

If you’ve been living in Microsoft Fabric for a few months, you’ve probably felt it: the platform makes it incredibly easy to ingest data… and surprisingly easy to rack up storage spend while you’re doing it (especially considering how much storage is included).

The pattern is common. A team starts with a Lakehouse, adds Pipelines or Dataflows Gen2 for ingestion, follows a sensible medallion approach, and before long they’re keeping “just in case” raw files, repeated snapshots, and long-running history inside OneLake—often at the same performance tier as yesterday’s data. The storage bill grows quietly. Capacity pressure shows up in places you didn’t expect. And suddenly “simple ingestion” is a FinOps conversation.

Here’s the good news: you don’t have to choose between Fabric and sensible archival strategy. Azure Data Lake Storage Gen2 (ADLS Gen2) can be your pressure relief valve—your durable landing zone and archive—while Fabric stays the place you compute, curate, model, and serve.

What follows is a deep dive into how to use ADLS Gen2 accounts to solve the archival and storage-cost traps that show up during Fabric ingestion: where the costs come from, what architectural patterns work well, and the practical implementation details (shortcuts, security, and billing mechanics) that make it real for Microsoft Fabric teams.
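
As a taste of the mechanics, here's a small sketch that demotes aged raw files to the Archive tier with the azure-storage-blob SDK—account, container, and paths are placeholders, and in production you'd usually let a lifecycle management policy on the storage account do this automatically:

```python
# Sketch: demote aged raw files to Archive in ADLS Gen2 so OneLake keeps
# only the hot, curated layers. Account, container, and prefix are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import ContainerClient, StandardBlobTier

container = ContainerClient(
    account_url="https://<archiveaccount>.blob.core.windows.net",
    container_name="landing",
    credential=DefaultAzureCredential(),
)

for blob in container.list_blobs(name_starts_with="raw/2023/"):
    # Anything older than the working set drops out of hot storage.
    container.get_blob_client(blob.name).set_standard_blob_tier(
        StandardBlobTier.ARCHIVE
    )
```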

Continue reading “Stop Paying Hot-Tier Prices for Cold Data: Using ADLS Gen2 to Tame Fabric Ingestion Storage Costs”

Data Quality as Code in Fabric: Declarative Checks on Materialized Lake Views

If you’ve ever shipped a “clean” silver or gold table only to discover (later) that it quietly included null keys, impossible dates, or negative quantities… you already know the real pain of data quality.

The frustration isn’t that bad data exists. The frustration is that quality rules often live somewhere else: in a notebook cell, in a pipeline activity, in a dashboard someone checks (sometimes), or in tribal knowledge that never quite becomes a contract.

Microsoft Fabric’s Materialized Lake Views (MLVs) give you a more disciplined option: you can define declarative data quality checks inside the MLV definition using constraints, and then use Fabric’s built-in monitoring, lineage, and embedded Power BI Data Quality reports to understand how quality is trending across your lakehouse and your data products.

In this post, I’ll cover what these checks look like, how to add them, and—most importantly—how to turn them into quality signals you can operationalize for a Microsoft Fabric lakehouse and the Data Engineering teams who depend on it.
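
To set expectations, here's roughly what a declarative check looks like—a sketch with illustrative table and constraint names, wrapped in spark.sql for a Fabric notebook:

```python
# Sketch: declarative data quality constraints on a Materialized Lake View.
# ON MISMATCH DROP quarantines failing rows out of the view; FAIL would
# stop the refresh instead. Table and constraint names are illustrative.
spark.sql("""
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.orders_clean
(
    CONSTRAINT order_id_present CHECK (order_id IS NOT NULL) ON MISMATCH DROP,
    CONSTRAINT qty_positive     CHECK (quantity > 0)         ON MISMATCH DROP
)
AS
SELECT order_id, customer_id, order_date, quantity
FROM bronze.orders_raw
""")
```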

One note before we dive in: this post is about structural data quality. Data integrity—making sure your data follows your business logic, makes sense, and isn’t drifting—is a separate discipline. These techniques can be adapted for it, but there are more efficient ways to implement it.

Continue reading “Data Quality as Code in Fabric: Declarative Checks on Materialized Lake Views”

The Advanced Lakehouse Data Product: Shortcuts In, Materialized Views Through, Versioned Schemas Out

There’s a familiar tension in modern analytics: teams want data products that are easy to discover and safe to consume, but they also want to move fast—often faster than the governance model can tolerate.

In Microsoft Fabric, that tension frequently shows up as a perception of workspace sprawl. A “single product per workspace” model is clean on paper—strong boundaries, tidy ownership, straightforward promotion—but it can quickly turn into dozens (or hundreds) of workspaces to curate, secure, and operate.

This post proposes a different pattern—an advanced lakehouse approach that treats the lakehouse itself like a product factory:

  • Shortcuts or schema shortcuts become the input layer (a clean, contract-aware “ingest without copying” boundary).
  • A small-step transformation layer is implemented as a multi-step DAG using Materialized Lake Views (MLVs).
  • A versioned, schema-based surface area becomes the data product contract you expose to consumers.

Then we connect that to OneLake security and Fabric domains, showing how you can expose left-shifted data products (usable earlier in the lifecycle) without letting workspaces multiply endlessly.
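
As a tiny sketch of the “versioned schemas out” half—names are illustrative—consumers bind to a v1 schema, and breaking changes ship as v2 alongside it rather than in place:

```python
# Sketch: the data product contract is a versioned schema. Consumers bind
# to sales_v1; a breaking change ships as sales_v2 next to it. Names and
# the upstream MLV are illustrative.
spark.sql("CREATE SCHEMA IF NOT EXISTS sales_v1")

spark.sql("""
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS sales_v1.daily_revenue
AS
SELECT order_date, SUM(quantity * unit_price) AS revenue
FROM transform.orders_enriched   -- last step of the internal MLV DAG
GROUP BY order_date
""")
```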

Continue reading “The Advanced Lakehouse Data Product: Shortcuts In, Materialized Views Through, Versioned Schemas Out”

Freeze-and-Squash: Turning Snapshot Tables into a Versioned Change Feed with Fabric Materialized Lake Views

Periodic snapshots are a gift and a curse.

They’re a gift because they’re easy to land: each load is a complete “as-of” picture, and ingestion rarely needs fancy orchestration. They’re a curse because the moment you want history with meaning—a clean versioned change feed, a Type 2 dimension, a Data Vault satellite—you’re suddenly writing heavy window logic, MERGEs, and stateful pipelines that are harder to reason about than the business problem you were trying to solve.

This post describes a Fabric Materialized Lake View (MLV) pattern that “squashes” a rolling set of snapshot tables down into a bounded, versioned change feed by pairing a chain of MLVs with a periodically refreshed frozen table. We’ll walk the pattern end-to-end, call out where it shines (and where it doesn’t), and then show how the resulting change feed can support both Slowly Changing Dimensions and Data Vault processes in an MLV-forward Microsoft Fabric lakehouse architecture.

Before we go too far: the gold standard is still getting a change feed directly from the source system (CDC logs, transactional events, source-managed “effective dating,” or an authoritative change table). When you can get that, take it. Everything else—including this pattern—is a disciplined way of making the best of snapshots.
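
For a flavor of the squash step itself, here's a sketch—illustrative names throughout—that collapses snapshots into change rows by comparing each row's hash to its previous snapshot; the full pattern in the post unions this with the frozen table so the snapshot window stays bounded:

```python
# Sketch: squash daily snapshots into change rows. A row is a "change" when
# its hash differs from the same key's previous snapshot (or it's first seen).
# Table and column names are illustrative.
spark.sql("""
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.customer_change_feed
AS
WITH ordered AS (
    SELECT customer_id, snapshot_date, row_hash, name, tier,
           LAG(row_hash) OVER (
               PARTITION BY customer_id ORDER BY snapshot_date
           ) AS prev_hash
    FROM bronze.customer_snapshots
)
SELECT customer_id, snapshot_date AS effective_date, name, tier
FROM ordered
WHERE prev_hash IS NULL OR prev_hash <> row_hash
""")
```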

Continue reading “Freeze-and-Squash: Turning Snapshot Tables into a Versioned Change Feed with Fabric Materialized Lake Views”