Two Lenses on Purview in MS Fabric

Microsoft uses one brand—Purview—for two scopes. After FabCon Europe (Vienna, Sep 15–18, 2025), the split is even clearer:

  • Enterprise Purview (in the Purview portal) is your estate‑wide governance, security, and compliance plane: unified catalog, lineage, data quality, sensitivity labels/DLP/audit, DSPM for AI, and (now) Fabric‑aware risk insights.
  • Purview inside Fabric (OneLake catalog + Purview hub) is the in‑product lens for people building and consuming inside Fabric. The Govern tab is now GA, with domain‑scoped insights; domains now have public APIs (see the sketch just below this list).
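
As an example of what those domain APIs open up, here is a minimal sketch of listing a tenant's domains. The endpoint path and response shape are my assumptions based on the public Fabric REST documentation, so verify against the current docs before relying on them:

```python
# Minimal sketch: list Fabric domains via the admin REST API.
# Endpoint path and response field are assumptions from the public
# Fabric REST docs; verify against current documentation before use.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def list_domains(token: str) -> list[dict]:
    """Return the tenant's domains, given an Entra ID bearer token."""
    resp = requests.get(
        f"{FABRIC_API}/admin/domains",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("domains", [])

for d in list_domains(token="<entra-id-token>"):
    print(d.get("id"), d.get("displayName"))
```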

Think of it this way: Fabric’s lens governs work where it’s made; Enterprise Purview governs the whole estate.

Continue reading “Two Lenses on Purview in MS Fabric”

A New Paradigm For Data Teams: The real bottleneck isn’t data, it’s definition

Most data teams still run a tidy assembly line: ingest sources into bronze, standardize into silver, curate into gold, and only then wire up a semantic model for BI. That sounds rigorous—but it puts the business contract (grain, conformed dimensions, measure logic, security scope, and SLOs) at the very end. By the time the organization finally argues about what “AUM” or a compliant “time‑weighted return” really means, we’ve already paid for pipelines, copies, and storage layouts that might not fit the answer we need.

Symptoms you’ll recognize: months of “inventory building” without shipping a trustworthy product; duplicate stacks for streaming vs. batch; sprawling “bronze” zones that age into operational risk; and endless rework because definitions arrived too late.

Modern Microsoft Fabric tools let you flip the incentives. With Direct Lake placing the semantic model directly over Delta in OneLake—and with shortcuts, mirroring, materialized lake views, and eventhouses spanning real‑time and lake—there’s finally a platform that rewards designing from the output backward. In other words: Gold → Silver → Bronze → Ingestion.
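
To make "Gold first" concrete, here is a minimal sketch of contract-first design: pin down the gold table's grain, keys, and measure definitions as an explicit Delta definition before any ingestion code exists. The schema below (gold.fact_aum_daily and its columns) is purely illustrative, not from the post:

```python
# Illustrative contract-first gold table. All names are hypothetical;
# the point is that grain, keys, and measure definitions are agreed
# before bronze/silver pipelines are built.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
CREATE TABLE IF NOT EXISTS gold.fact_aum_daily (
    as_of_date   DATE          NOT NULL,  -- grain: one row per portfolio per day
    portfolio_id STRING        NOT NULL,  -- conformed dimension key
    aum_usd      DECIMAL(18,2),           -- the org's agreed definition of "AUM"
    twr_daily    DECIMAL(12,8)            -- compliant time-weighted return
)
USING DELTA
COMMENT 'Contract agreed with the business before ingestion was designed'
""")
```

With the contract fixed, silver and bronze only need to carry what this table requires, and Direct Lake can serve the semantic model straight off the Delta files.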

Continue reading “A New Paradigm For Data Teams: The real bottleneck isn’t data, it’s definition”

FabCon Feature: Purview

On edudatasci.net, I keep data mesh grounded in four behaviors: domains own data; data as a product; a small self‑serve platform; and federated governance (policies expressed as code and applied consistently). I also use foundational vs. derived data products as a practical way to think about scope and ownership, and I recommend publishing products in Purview’s Unified Catalog so ownership, access, and SLOs are discoverable to the whole organization, not just the team that built them.

FabCon Europe 2025 dropped a number of big announcements related to Purview in Fabric, and I think they, and their connection to data mesh, are worth talking about.

Continue reading “FabCon Feature: Purview”

Certifications in IT

I hold a lot of certifications. That’s a personal choice, not a creed. I like challenging myself against a test; it lets me prove to myself that I’ve actually learned something. The “certs or no certs?” debate is as eternal (and as spicy) as vi vs. Emacs (or “eMacs,” if you’re trolling your coworkers). Different corners of computing answer that question differently, for good reasons. A little history sets the stage:

  • CCIE’s origin (1993): Cisco launched the CCIE with a famously grueling, hands‑on lab. It quickly became the canonical example of a vendor cert that feels like a license to practice, not just a multiple‑choice quiz.
  • MCSA’s era (retired 2020–2021): Microsoft’s MCSA/MCSE/MCSD defined the classic exam‑centric, product‑specific credentialing model before Microsoft pivoted to role‑based, cloud‑first certifications.
  • Software Engineering PE (ended 2019): In the U.S., the dedicated PE exam for software engineers was discontinued after the April 2019 sitting—closing a formal licensure lane some hoped would tie software to traditional engineering standards.

Continue reading “Certifications in IT”

Zero Unmanaged Copy + SCDs: Keep All the History, Lose the Bloat

Data teams often face a false choice: either keep rich Slowly Changing Dimensions (SCDs) and accept a sprawl of duplicate tables, or keep the warehouse lean and give up on audit‑ready history. You don’t have to choose. With a zero unmanaged copy approach, you can keep full history and maintain predictable performance—without littering the lakehouse with ad‑hoc exports and orphaned datasets.

This post explains the idea in plain English, shows how mirrored, CDC, historical (SCD2), and snapshot tables fit together, and lays out four deployment options. We’ll also cover the performance guardrails—partitioning, clustering, idempotent merges, micro‑batches, workload isolation, and observability—so your SLAs stay green as data scales.
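
As a preview of the idempotent-merge guardrail, here is a minimal sketch of the SCD2 pattern on Delta. Every table and column name (dim_customer_scd2, customer_changes, row_hash, effective_ts) is hypothetical; the pattern, not the schema, is the point:

```python
# Sketch of an idempotent SCD2 merge on Delta. Assumes the change feed
# carries the business key, a row_hash of tracked attributes, and an
# effective_ts. All names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

changes = spark.table("silver.customer_changes")          # one CDC micro-batch
dim = DeltaTable.forName(spark, "gold.dim_customer_scd2")

# Step 1: close out current rows whose attributes actually changed.
# The row_hash guard makes replays of the same batch a no-op (idempotent).
(dim.alias("d")
    .merge(changes.alias("c"),
           "d.customer_id = c.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.row_hash <> c.row_hash",
        set={"is_current": "false", "valid_to": "c.effective_ts"})
    .execute())

# Step 2: append the new current versions. The anti-join drops rows that
# already exist as current with the same hash, so replays stay idempotent.
current = spark.table("gold.dim_customer_scd2").where("is_current").alias("d")
new_rows = (changes.alias("c")
    .join(current,
          (F.col("c.customer_id") == F.col("d.customer_id"))
          & (F.col("c.row_hash") == F.col("d.row_hash")),
          "left_anti")
    .withColumn("is_current", F.lit(True))
    .withColumn("valid_from", F.col("effective_ts"))
    .withColumn("valid_to", F.lit(None).cast("timestamp")))

new_rows.write.format("delta").mode("append").saveAsTable("gold.dim_customer_scd2")
```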

Continue reading “Zero Unmanaged Copy + SCDs: Keep All the History, Lose the Bloat”

FabCon Feature: OneLake Security

Fabric’s second European conference didn’t just showcase new toys; it tightened the platform’s governance spine. Microsoft moved OneLake Security into full preview and added a Secure tab to the OneLake catalog—a single place to see and manage data‑level permissions across items. That elevates lake‑native RBAC from a feature to a first‑class control surface, so product teams can set access once, at the path where the bytes live, and have it enforced consistently.

Continue reading “FabCon Feature: OneLake Security”

You Can Own a Data Product Without Writing a Line of Code

If you’re on an operations or business team and someone just asked you to “own the data product,” you might be thinking: I don’t code—how could I own it?
Good news: owning a data product is a leadership role, not a coding job.

Think of it like being a product owner in other fields:

  • Software product manager: sets the roadmap and defines success while engineers write the code.
  • Consumer hardware lead: chooses features and quality thresholds while factories assemble devices.
  • Marketing campaign owner: decides audience and outcomes while creative and media teams execute.
  • Publishing editor: shapes the issue and deadlines while writers and designers produce.
  • Construction owner’s rep: defines scope, budget, and acceptance criteria while contractors build.

In every case, the owner doesn’t run the machinery. They own the outcomes: what gets built, why it matters, who it serves, how “good” it needs to be, and how it changes over time. Data product ownership is the same. You set direction, make trade‑offs, and keep everyone informed. Others handle the pipes, queries, and platforms.

What’s different with data is simply the medium: it behaves like a service. Your “product” is a reliable, permission‑aware way to answer recurring questions. Your promise is about freshness, accuracy, and clarity, not lines of code.

Continue reading “You Can Own a Data Product Without Writing a Line of Code”

Fabric data ingestion: what to use when

Data platforms usually fail in two predictable ways: they drown in shadow copies nobody owns, or they calcify around a single ingestion pattern that does not fit every source. Microsoft Fabric offers a broader palette. You can read data where it already lives, replicate operational systems into a governed lake, and run high‑throughput batch and low‑latency streams without wiring a dozen services together. The work isn’t picking a tool; it’s choosing deliberately so your estate stays fast, testable, and governable as it grows.

This guide treats Zero Unmanaged Copies (ZUC) as a strong—but not exclusive—operating model. ZUC constrains where bytes land and keeps lineage simple: if data persists, it is inside OneLake under policy and catalog; if it does not need to persist, you read it in place. Many teams will also continue to run a traditional lakehouse with raw/bronze landings, curated silver, and published gold. Fabric supports both because everything converges on OneLake (the boundary) and Delta (the table format). We evaluate each option with consistent criteria: performance (bulk throughput and end‑to‑end latency), operational surface (how much you must run and monitor), governance posture (where data persists and how it is secured), team ergonomics (SQL, Spark, or low‑code), and table health (file sizes, partitioning, Delta logs).

For clarity: zero‑copy means reading in place. Managed copy means materializing inside OneLake with lineage. Unmanaged copy is anything persisted outside governance—temporary blobs, stray CSV drops, buckets with unclear ownership. ZUC eliminates that last category; a traditional lakehouse allows governed staging and raw landings as part of the pipeline.
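
A short sketch to pin that vocabulary down in PySpark terms; the workspace, lakehouse, path, and table names below are invented for illustration:

```python
# Sketch: the three copy categories in practice. All names hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Zero-copy: read through a OneLake shortcut, in place. The bytes stay
# in the source system; Fabric just resolves the path.
orders = spark.read.format("delta").load(
    "abfss://Sales@onelake.dfs.fabric.microsoft.com/"
    "SalesLakehouse.Lakehouse/Tables/orders_shortcut")

# Managed copy: materialize inside OneLake as a governed Delta table,
# visible to lineage, policy, and the catalog.
(orders.where("order_status = 'CLOSED'")
       .write.format("delta").mode("overwrite")
       .saveAsTable("silver.orders_closed"))

# Unmanaged copy (what ZUC rules out): persisting outside governance,
# e.g. orders.write.csv("wasbs://scratch@somestorage/dump/") with no
# owner, no policy, and no catalog entry.
```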

Continue reading “Fabric data ingestion: what to use when”

Digital Workers, the White Space, and How to “Hire” One (with the Right Partner)

Every organization has white space: important work that lives between teams and across systems, is almost always evidence‑bearing, and—despite its value—rarely reaches the top of the backlog. In software engineering, that’s the unglamorous backbone of quality: keeping documentation and runbooks current, sustaining full test coverage (beyond unit tests), and validating against standards (security, accessibility, SBOM/licensing). In manufacturing, it shows up as traceability and shipment evidence (SPC, PPAP/FAI, calibration certificates) and keeping control plans/PFMEA in sync with engineering changes. In education, it appears as standards alignment of curricula, accessibility/privacy checks across LMS content, and intervention follow‑through after assessments. These jobs cross many systems, require judgment, must leave an audit trail, and are perpetually “important but not urgent”—perfect territory for delegating to digital workers: software teammates that live in the seams, move work to done, and attach the receipts as they go.

“To effectively delegate these tasks they need knowledge, access, and some intangibles.” (Nathan Lasnoski)

A digital worker earns real delegation only when three things are in place: knowledge (trusted sources, rubrics, examples), access (the right tools and permissions under guardrails), and the intangibles of a good teammate (when to act vs. ask, tone, and norms). With that foundation, a coaching worker can also serve as worker‑as‑judge—applying explicit rubrics, pulling evidence across systems, returning a pass/fail or “needs work” with a brief rationale, and providing an easy appeal to a human. The payoff is fast, fair, actionable feedback that feels like a senior reviewer on call 24/7—something frontline teams welcome.
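
As a sketch of the worker‑as‑judge shape described above, the contract is small: explicit criteria with evidence, a verdict, a short rationale, and a built‑in appeal path. Every name and field here is invented for illustration:

```python
# Illustrative shape for a worker-as-judge: explicit rubric in,
# verdict plus rationale plus appeal path out. All names are invented.
from dataclasses import dataclass

@dataclass
class RubricCriterion:
    name: str        # e.g. "runbook current with release"
    evidence: str    # link or excerpt the worker pulled from a system
    passed: bool

@dataclass
class JudgeResult:
    criteria: list[RubricCriterion]
    rationale: str                     # brief, human-readable explanation
    appeal_to: str = "human-reviewer"  # the appeal path is always present

    @property
    def verdict(self) -> str:
        return "pass" if all(c.passed for c in self.criteria) else "needs work"

result = JudgeResult(
    criteria=[
        RubricCriterion("Docs current", "wiki rev 42 matches release 1.7", True),
        RubricCriterion("Tests beyond unit", "no integration suite found", False),
    ],
    rationale="Docs are in sync; integration coverage is missing.",
)
print(result.verdict)  # needs work
```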

Continue reading “Digital Workers, the White Space, and How to “Hire” One (with the Right Partner)”

Data Mesh Isn’t Just for Tech Companies

If you’ve skimmed headlines, it’s easy to conclude that data mesh is a Silicon Valley thing—something streaming apps and fintechs use to wrangle petabytes. That mental model sells a lot of tools, but it misses the point. Data mesh is first an operating model—a way to organize people, responsibilities, and guardrails so data can be produced and used where the knowledge lives. That matters just as much (and often more) in organizations whose mission is not building software: manufacturers, hospitals, universities, public-sector agencies, retailers, utilities, and nonprofits.

Continue reading “Data Mesh Isn’t Just for Tech Companies”