Releases Imply Requirements

In a recent post, I argued that a real release is a declaration—a line in the sand that says, this is the version we stand behind. A declaration begs a follow‑up: what exactly are we declaring? The honest answer is: requirements. A release without requirements is just a pile of diffs; a release grounded in requirements is a promise we can audit, test, and keep.

This is where classic software requirements work—yes, the unglamorous kind—earns its keep in data and analytics. If releases create accountability, requirements make that accountability usable.

Continue reading “Releases Imply Requirements”

Releases and CI/CD in Microsoft Fabric — with Variable Libraries That Keep Meaning Stable

I keep saying the quiet part out loud: a modern warehouse ships meaning and trust, not just tables. If meaning changes invisibly, trust evaporates. Releases, Release Flow, and CI/CD in Microsoft Fabric are how you move quickly and keep confidence—by making change observable, reversible, and governed. Fabric’s Variable Library and a deliberate, database‑level metadata library are the glue that make this work day to day.


A release in data: shipping meaning deliberately

A release in data engineering is a versioned bundle—models, DDL, pipelines, notebooks, semantic definitions, and the permissions posture—promoted through environments with intent and traceability. In Fabric, Deployment Pipelines formalize that path (Dev → Test → Prod), including stage‑specific rules that swap connections and parameters so the same artifact behaves correctly in each stage. This keeps tests real but safe and turns promotion into a controlled, reversible act.

Staging should mirror production closely enough that behavior is predictable. Use OneLake Shortcuts to expose prod‑shaped data without copying petabytes, so performance and edge cases surface before users do.


CI in Fabric: prevent “looks fine locally” from reaching people

CI earns its keep the moment it blocks a bad deploy. In Fabric, keep the spine simple:

  • Git integration ties workspaces to branches, making every change reviewable and reproducible. (Mind the “supported items” list as it evolves.)
  • Validate invariants early: compile, lint, and assert keys, referential links, distribution bounds, and metric semantics in your pipelines/notebooks. When CI fails, the business doesn’t. (A minimal check is sketched after this list.)
  • Keep shape realistic: test with shortcuts and stage‑correct connections so volume, permissions, and latency aren’t surprises later.
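
To make that concrete, here is a minimal sketch of the kind of invariant check a CI run can execute before anything promotes. It assumes the ambient Spark session of a Fabric notebook; the table and column names (silver.orders, order_id, order_total) and the bounds are illustrative, not prescriptive.

```python
# Minimal CI-style invariant checks (PySpark). Assumes the ambient `spark`
# session of a Fabric notebook; table/column names and bounds are illustrative.
from pyspark.sql import functions as F

df = spark.read.table("silver.orders")

# Key invariant: order_id must be unique and non-null.
dupes = df.groupBy("order_id").count().filter(F.col("count") > 1).count()
nulls = df.filter(F.col("order_id").isNull()).count()
assert dupes == 0, f"{dupes} duplicate order_id values"
assert nulls == 0, f"{nulls} null order_id values"

# Distribution bound: totals should stay inside an agreed range.
out_of_range = df.filter(
    (F.col("order_total") < 0) | (F.col("order_total") > 1_000_000)
).count()
assert out_of_range == 0, f"{out_of_range} rows outside the expected order_total range"
```

When any assertion fails, the notebook run fails, and the deploy stops before users ever see the problem.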

CD in Fabric: promote with intent, cut over without drama

Continuous Delivery is less about auto‑pushing and more about predictable promotion:

  • Promote via Deployment Pipelines and stage rules; treat backfills as first‑class release artifacts you observe in the Monitoring hub.
  • Use Power BI App audiences to canary new semantic models and reports to a small internal group; widen only when drift and performance are acceptable.
  • When you outgrow clicking, automate promotion with the fabric‑cicd library in GitHub Actions or Azure DevOps, using service principals for least privilege. (A minimal sketch follows this list.)
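
As a sketch of what that automation can look like, the snippet below follows the general shape of fabric‑cicd’s publish flow. Treat the parameter names and item types as assumptions to verify against the library’s current docs, and the workspace ID and repository path as placeholders.

```python
# Sketch of automated promotion with fabric-cicd, run inside a CI job.
# Parameter names follow the library's quick-start as best I recall; verify
# against the current docs. Authentication is expected to come from the ambient
# Azure credential in the runner (e.g., a least-privilege service principal).
from fabric_cicd import FabricWorkspace, publish_all_items, unpublish_all_orphan_items

target = FabricWorkspace(
    workspace_id="<target-workspace-guid>",         # placeholder
    environment="PROD",                             # matches your stage/parameter file
    repository_directory="<path-to-git-checkout>",  # placeholder
    item_type_in_scope=["Notebook", "DataPipeline", "Environment"],
)

publish_all_items(target)           # create/update items in the target workspace
unpublish_all_orphan_items(target)  # remove items no longer present in the repo
```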

Where Release Flow fits (and why it works for data)

Release Flow (sometimes shortened to “reflow”) is Microsoft’s trunk‑based model with sprint‑scoped release branches and cherry‑picked hotfixes. Keep main moving; cut a release branch to stabilize; merge fixes to main first, then cherry‑pick to the release. Map Dev to main, Test/Prod to the release branch, and promote through your pipeline. It’s fast, auditable, and avoids “fixed in prod, broken next release.”


Variable Library: stage‑aware configuration without hard‑coding

Fabric’s Variable Library is a workspace item that holds named variables and their values per pipeline stage. Items like Data Pipelines and Dataflow Gen2 can consume these variables directly, so the same artifact resolves the right connection, path, or toggle in Dev/Test/Prod—no string‑surgery, no accidental “Test reading Prod.” This is application lifecycle management (ALM) for configuration, not a bag of ad‑hoc parameters.

In practice, Variable Library becomes your single source for things like:

  • connection aliases (e.g., sales_wh_conn, bronze_lake_path),
  • time windows and data slices for CI runs (e.g., “last 3 days”),
  • feature toggles (e.g., enable a new scoring routine only in Test),
  • stage‑specific destinations (schemas, lake folders) used by pipelines and dataflows.

Because values are bound by stage, a promotion flips behavior without editing code—exactly what you want when reliability and auditability matter.
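
If it helps to see the idea in miniature, here is a conceptual sketch of stage‑bound resolution. This is plain Python standing in for the concept, not the Fabric Variable Library API; the variable names echo the illustrative list above.

```python
# Conceptual sketch only -- NOT the Fabric Variable Library API.
# It shows the idea: the artifact asks for a name, the stage supplies the value,
# and promotion flips behavior without touching code.
STAGE_VALUES = {
    "Dev":  {"sales_wh_conn": "wh-dev",  "bronze_lake_path": "Files/bronze-dev",  "ci_window_days": 3},
    "Test": {"sales_wh_conn": "wh-test", "bronze_lake_path": "Files/bronze-test", "ci_window_days": 3},
    "Prod": {"sales_wh_conn": "wh-prod", "bronze_lake_path": "Files/bronze",      "ci_window_days": 30},
}

def resolve(stage: str, name: str):
    """Return the value bound to `name` for the given pipeline stage."""
    return STAGE_VALUES[stage][name]

# The same pipeline definition runs in every stage; only the binding changes.
connection = resolve("Test", "sales_wh_conn")   # "wh-test"
window     = resolve("Prod", "ci_window_days")  # 30
```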


Safe development and effective testing, Fabric‑style

Develop in isolated workspaces tied to branches. Use Variable Library values to bind stage‑correct connections and “slice” windows; validate contracts from your metadata schema before any model rebuild or backfill runs. Promote with Deployment Pipelines; canary via App audiences; observe in Monitoring; and roll back quickly because promotion was a metadata change, not a long‑running fix‑by‑hand.


Reliability and governance as properties of the system

Define freshness, completeness, and correctness SLOs; then let your CD gates enforce them. Sensitivity labels and Purview’s Unified Catalog close the loop on governance and lineage so your release record isn’t just technical—it’s compliant. When auditors ask, you don’t reconstruct history; you point to it.
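
As one concrete shape for such a gate, the sketch below checks a freshness SLO before a promotion widens an audience. It assumes a Fabric notebook’s ambient Spark session; the gold.daily_enrollment table, load_ts column, and six‑hour threshold are illustrative.

```python
# Minimal freshness-SLO gate (PySpark). Assumes the ambient `spark` session of a
# Fabric notebook; the table, column, and 6-hour threshold are illustrative.
from pyspark.sql import functions as F

FRESHNESS_SLO_HOURS = 6

hours_since_load = (
    spark.read.table("gold.daily_enrollment")
    .agg(
        ((F.unix_timestamp(F.current_timestamp()) - F.unix_timestamp(F.max("load_ts"))) / 3600)
        .alias("hours_since_load")
    )
    .collect()[0]["hours_since_load"]
)

# Fail the CD run (and block promotion) when the SLO is violated.
assert hours_since_load is not None, "gold.daily_enrollment has no load_ts values"
assert hours_since_load <= FRESHNESS_SLO_HOURS, (
    f"Freshness SLO violated: last load was {hours_since_load:.1f} hours ago"
)
```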


The payoff

With Release Flow, CI/CD, Variable Libraries, and a database‑level metadata library, your warehouse stops being fragile plumbing and becomes a platform. Teams ship more often with less drama. Stakeholders trust numbers because the path to those numbers is visible, repeatable, and reversible.

That’s the bar we set: move fast, keep meaning stable, and let your pipeline tell the story of how you did it.

Why We Still Need Real Releases in Data and Analytics

In an era where everything markets itself as “continuous”—continuous integration, continuous delivery, continuous retraining—it can feel quaint to talk about releases. But if we care about reliability and governance, we should talk about them more, not less. A true software‑style release is not nostalgia; it’s a commitment device. It’s the point where we say: this is the version we stand behind, with a clear boundary of what changed, what didn’t, and how long we intend to support it.

At edudatasci.net we work at the seam where data, software, and institutional decision‑making meet. At that seam, releases are how we translate rapid iteration into dependable outcomes—for educators, researchers, and the operational teams who carry real responsibility for real people. Without the concept of a release, our systems may move quickly, but the trust we need from stakeholders never catches up.

Continue reading “Why We Still Need Real Releases in Data and Analytics”

Data Vault, Practically: Why It Exists, How It’s Built, and What 2.1 Changes

Modern data platforms live in tension:

  • Source systems evolve faster than dimensional models can absorb.
  • Audit and lineage are mandatory, but teams still need velocity.
  • Cloud lakehouses, streaming, and domain ownership do not slot neatly into yesterday’s warehouse playbooks.

Data Vault is a response to those pressures. It is both a modeling approach and a delivery method designed to (1) absorb change, (2) preserve complete, immutable history, and (3) decouple integration from consumption. The core building blocks—Hubs, Links, and Satellites—organize into a Raw Vault (source truth, append‑only) and a Business Vault (governed derivations and query assistance). Think of it as a fault‑tolerant integration substrate with a clean seam to marts, semantic models, and data products.
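
For a feel of what those building blocks look like on a lakehouse, here is a minimal Spark SQL sketch of a Hub and one of its Satellites, run from a notebook’s ambient Spark session. The schema, table, and column names are illustrative; a real DV 2.x implementation adds Links between Hubs and the loading standards around them.

```python
# Minimal sketch of Data Vault building blocks as Spark SQL DDL in a lakehouse.
# Names are illustrative; Links (relating two or more Hubs by hash key) and the
# DV 2.x loading standards are omitted for brevity.
spark.sql("CREATE SCHEMA IF NOT EXISTS raw_vault")

spark.sql("""
CREATE TABLE IF NOT EXISTS raw_vault.hub_student (
    student_hk  STRING,     -- hash key derived from the business key
    student_id  STRING,     -- business key as received from the source
    load_dts    TIMESTAMP,  -- when the row entered the vault (append-only)
    record_src  STRING      -- originating source system
)
""")

spark.sql("""
CREATE TABLE IF NOT EXISTS raw_vault.sat_student_details (
    student_hk  STRING,     -- parent Hub's hash key
    load_dts    TIMESTAMP,  -- one row per detected change, never updated
    hash_diff   STRING,     -- change-detection hash over the payload columns
    first_name  STRING,
    last_name   STRING,
    record_src  STRING
)
""")
```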

Continue reading “Data Vault, Practically: Why It Exists, How It’s Built, and What 2.1 Changes”

Testing Like We Mean It: Bringing Software‑Grade Discipline to Data Engineering

I like to say that the first product of a data team isn’t a table or a dashboard—it’s trust. Trust is built the same way in data as it is in software: through tests that catch regressions, encode intent, and make change safe. If pipelines are code, then they deserve the same rigor as code. That means unit tests you can run in seconds, integration tests that respect the messy edges of reality, comprehensive tests that exercise the platform end‑to‑end, and user acceptance testing that proves the system answers the questions people actually have. Done well, this isn’t busywork; it’s the backbone of reliability and a pillar of governance.
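
To give a flavor of the fast end of that spectrum, here is a minimal unit test for a pure transformation function. The metric and the pytest framing are illustrative, not tied to any particular pipeline.

```python
# Minimal unit test for a pure transformation (pytest). The metric is illustrative:
# completion rate as completed / enrolled, guarded against divide-by-zero.
import pytest

def completion_rate(completed: int, enrolled: int) -> float:
    """Return the share of enrolled students who completed, or 0.0 if nobody enrolled."""
    if enrolled <= 0:
        return 0.0
    return completed / enrolled

def test_completion_rate_happy_path():
    assert completion_rate(30, 120) == pytest.approx(0.25)

def test_completion_rate_handles_empty_cohort():
    assert completion_rate(0, 0) == 0.0
```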

Continue reading “Testing Like We Mean It: Bringing Software‑Grade Discipline to Data Engineering”

Bronze Is Live Now: what Mirroring + Shortcuts really change about cost, archives, and getting to Silver

For years, “Bronze” quietly became a parking lot for periodic snapshots: copy a slice from the source every hour/day, write new files, repeat. It worked, but it was noisy and expensive—lots of hot storage, lots of ingest compute, and a tendency to let “temporary” landing data turn into de‑facto history.

Fabric upends that with two primitives that encourage Zero Unmanaged Copies:

  • Mirroring: a service‑managed, near–real‑time replica of your database/tables into OneLake, with replication compute included and a capacity‑based allowance of free mirrored storage (1 TB per CU; e.g., an F64 includes 64 TB just for mirrored replicas). You still pay for downstream query/transform compute, but not for the continuous ingest job itself. Retention for mirrored data is explicitly managed and—by default for new mirrors since mid‑June 2025—kept lean (1 day) unless you raise it.
  • Shortcuts: pointers that let Fabric read in place from ADLS/S3/other OneLake locations (and even across tenants via External Data Sharing, which creates a shortcut in the consumer’s tenant rather than duplicating data). That means zero OneLake bytes for the data itself; you pay storage where the data already lives, and Fabric charges only for the compute you use to read/transform it.

Add Real‑Time Intelligence/Eventhouse or Eventstreams, and “Bronze” becomes the live edge: the freshest, governed view of your sources—either replicated (Mirroring) or virtualized (Shortcuts)—instead of a pile of periodic copies.

Continue reading “Bronze Is Live Now: what Mirroring + Shortcuts really change about cost, archives, and getting to Silver”

Governed Innovation: Turning Learning Loops into Enterprise Strategy

Governance, done well, accelerates innovation. That sounds counterintuitive because “governance” often conjures gatekeeping and delay. But in complex systems, enabling constraints—clear aims, decision rights, evidence standards, and risk guardrails—reduce thrash. They let teams move faster with less politics, less ambiguity, and fewer expensive reworks.

Put simply:

Governed innovation = purposeful exploration + disciplined decisions + explicit guardrails.

  • Purposeful exploration means we start from outcomes the organization actually cares about (growth, safety, quality, equity, cost-to-serve) and frame hypotheses against those aims.
  • Disciplined decisions means we pre‑commit to how we’ll read the evidence and when we’ll stop, scale, or adapt.
  • Explicit guardrails means privacy, security, ethics, accessibility, and brand risk are design inputs, not last‑minute vetoes.

Improvement science provides the learning loop (PDSA, practical measurement, driver diagrams). Governed innovation provides the direction (what we test and why), the portfolio (how many bets across time horizons), and the legitimacy (we are learning fast and being good stewards).

Continue reading “Governed Innovation: Turning Learning Loops into Enterprise Strategy”

No Governance, No Mesh: Why Compatibility Is the Currency of Data Products

I love the promise of data mesh: push data ownership to the edges, let domain teams ship data as products, and watch the organization move faster. But here’s the unglamorous truth we keep repeating in classrooms and boardrooms: a mesh without strong, distributed data and analytics governance is just a tangle. Autonomy without agreed‑upon rules yields incompatible data products, brittle integrations, and an ever‑growing integration tax. Governance is not a bolt‑on—it’s the substrate that makes a mesh possible.

Continue reading “No Governance, No Mesh: Why Compatibility Is the Currency of Data Products”

Materialized Lake Views (MLVs) in Microsoft Fabric

A Materialized Lake View (MLV) is a table in your Fabric lakehouse that’s defined by a SQL query and kept up‑to‑date by the service. You write one CREATE MATERIALIZED LAKE VIEW … AS SELECT … statement; Fabric figures out dependencies, materializes the result into your lakehouse, and refreshes it on a schedule. Today, MLVs are in preview, SQL‑first (Spark SQL), and designed to turn Medallion layers (Bronze → Silver → Gold) from hand‑assembled pipelines into declarative definitions.
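
For a sense of the shape, here is a minimal sketch of an MLV definition run as Spark SQL from a Fabric lakehouse notebook. The schema, table, and column names are illustrative, and since the feature is in preview the exact clauses are worth checking against the current docs.

```python
# Minimal MLV sketch (preview feature; verify clauses against current docs).
# Assumes the ambient `spark` session of a Fabric lakehouse notebook; schema,
# table, and column names are illustrative.
spark.sql("""
CREATE MATERIALIZED LAKE VIEW silver.orders_clean
AS
SELECT
    order_id,
    customer_id,
    CAST(order_ts AS DATE) AS order_date,
    order_total
FROM bronze.orders
WHERE order_id IS NOT NULL
""")
```

From there, the service owns the refresh; the definition, not a pipeline, is what you version and promote.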

Continue reading “Materialized Lake Views (MLVs) in Microsoft Fabric”

A New Paradigm for Data Teams: Data Mesh, Data Warehousing and the Upside‑Down Data Product

If you change the sequence, you change the system. Designing Gold → Silver → Bronze → Ingestion—with the semantic model as the contract and the lake/warehouse as implementation—doesn’t just alter build tasks. It reshapes how data mesh and traditional architectures behave. The same platform primitives are available to both, but the incentives shift: domains and central teams stop arguing about “which pipeline stage we’re in” and align on “which product contract we’re honoring.”

Below is how the flip lands in each world—what truly changes, what stubbornly stays the same, and what actually gets better.

Continue reading “A New Paradigm for Data Teams: Data Mesh, Data Warehousing and the Upside‑Down Data Product”