If you’ve ever shipped a “clean” silver or gold table only to discover (later) that it quietly included null keys, impossible dates, or negative quantities… you already know the real pain of data quality.
The frustration isn’t that bad data exists. The frustration is that quality rules often live somewhere else: in a notebook cell, in a pipeline activity, in a dashboard someone checks (sometimes), or in tribal knowledge that never quite becomes a contract.
Microsoft Fabric’s Materialized Lake Views (MLVs) give you a more disciplined option: you can define declarative data quality checks inside the MLV definition using constraints, and then use Fabric’s built-in monitoring, lineage, and embedded Power BI Data Quality reports to understand how quality is trending across your lakehouse and your data products.
In this post, I’ll cover what these checks look like, how to add them, and—most importantly—how to turn them into quality signals you can operationalize for a Microsoft Fabric lakehouse and the Data Engineering teams who depend on it.
It’s important to note that we’re focusing on structural data quality here. Data integrity, making sure your data follows your business logic, makes sense, and isn’t drifting, is a separate discipline; these techniques can be adapted for it, but there are more efficient ways to implement that kind of validation.
Why declarative checks change the game
In a medallion-style lakehouse, we often talk about bronze → silver → gold as layers of refinement. But refinement is really two things:
- Transformations (joins, projections, aggregations, standardization)
- Assertions (what must be true for the dataset to be trusted)
Materialized lake views are already designed to simplify transformation pipelines with a SQL-first, declarative approach, and the feature is explicitly positioned for medallion-style workflows.
The “quality as code” shift happens when you treat those assertions as first-class parts of the definition too, not as scattered logic in five places. In MLVs, that’s exactly what constraints enable. This brings data engineering far closer to the ideals of test-driven development than ever before, but in a way that recognizes the data environment changes with every ingestion, not with every deployment.
What Fabric supports today: constraints on Materialized Lake Views
In Fabric, you can define data quality constraints directly in the CREATE MATERIALIZED LAKE VIEW statement using:
CONSTRAINT <name> CHECK (<condition>), plus an action for violations:
ON MISMATCH DROP or ON MISMATCH FAIL
The action matters:
- FAIL stops the refresh when a constraint is violated (and it’s the default behavior even if you don’t specify it).
- DROP keeps the refresh going but removes rows that violate the constraint, and Fabric surfaces the count of removed records in the lineage view.
If you define both behaviors across constraints, FAIL takes precedence.
Step-by-step: add declarative data quality checks to an MLV
This is a practical, minimal path—enough to be useful without turning into a full framework.
1) Confirm the prerequisites
MLVs require:
- A Fabric capacity-enabled workspace
- A lakehouse with schemas enabled
- Fabric Runtime 1.3
2) Decide where the check belongs
A good rule of thumb:
- Silver or transformation layer: validate structure and “obvious wrong” values (null keys, negative quantities, impossible ranges).
- Gold or surface layer: validate metric integrity and business rules that downstream reporting assumes.
This is where you start treating the dataset like a data quality contract for the data product; a gold-layer sketch follows the silver example in step 3.
3) Declare constraints in the MLV definition
Here’s the core syntax pattern (straight from the Spark SQL reference, with room for multiple constraints):
CREATE [OR REPLACE] MATERIALIZED LAKE VIEW [IF NOT EXISTS] silver.some_view
(
CONSTRAINT constraint_name1 CHECK (condition_expression1) ON MISMATCH DROP,
CONSTRAINT constraint_name2 CHECK (condition_expression2) ON MISMATCH FAIL
)
AS
SELECT ...
;
And here’s a more “real” example you can adapt for a silver view. It uses checks that are easy to reason about and avoids pattern searches like LIKE/regex, which are currently restricted in constraint conditions.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.cleaned_order_data
(
CONSTRAINT order_id_not_null CHECK (order_id IS NOT NULL) ON MISMATCH FAIL,
CONSTRAINT quantity_positive CHECK (quantity > 0) ON MISMATCH DROP,
CONSTRAINT revenue_non_negative CHECK (revenue >= 0) ON MISMATCH DROP
)
AS
SELECT
o.order_id,
o.order_date,
o.product_id,
p.product_name,
o.quantity,
p.price,
o.quantity * p.price AS revenue
FROM bronze.orders o
JOIN bronze.products p
ON o.product_id = p.product_id
;
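The silver example above covers the structural checks from step 2. For the gold layer, the same pattern extends to business rules; here’s a hedged sketch (the gold.daily_revenue view, its columns, and its constraints are illustrative assumptions, not taken from the Fabric docs) that aggregates the silver view and asserts metric integrity on the result:
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.daily_revenue
(
CONSTRAINT order_date_not_null CHECK (order_date IS NOT NULL) ON MISMATCH FAIL,
CONSTRAINT daily_revenue_non_negative CHECK (total_revenue >= 0) ON MISMATCH FAIL
)
AS
SELECT
order_date,
COUNT(DISTINCT order_id) AS order_count,
SUM(revenue) AS total_revenue
FROM silver.cleaned_order_data
GROUP BY order_date
;
Using FAIL on both constraints is deliberate: at the gold layer it’s usually better to halt the refresh than to publish a metric that downstream reports will silently trust.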
4) Refresh and monitor via lineage and Monitor Hub
Fabric gives you multiple ways to observe run health:
- The lineage view is autogenerated based on dependencies, and you can drill into node execution details and logs during runs.
- The Monitor hub provides a centralized place to browse MLV runs, filter, and open execution details (and even cancel in-progress runs).
If you use ON MISMATCH DROP, Fabric can surface dropped-row counts in the lineage experience—turning “dirty data happened” into something you can actually measure.
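You can also sanity-check the effect of a DROP constraint yourself by querying the materialized output directly, since the view’s output is materialized in the lakehouse. A minimal sketch against the silver view defined above (the assumption being simply that rows violating the DROP constraints don’t survive the refresh):
-- After a successful refresh, rows that violated quantity_positive or revenue_non_negative
-- should have been dropped, so this count is expected to be 0.
SELECT COUNT(*) AS bad_rows
FROM silver.cleaned_order_data
WHERE quantity <= 0 OR revenue < 0;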
5) Generate the built-in Data Quality report (and use it like a scorecard)
From the Managed materialized lake views page, you can select Data quality report to generate an embedded Power BI report. If it’s your first time, Fabric creates a semantic model and report in the background and stores them in your workspace.
The report is designed to answer questions you should be asking about every data product:
- Where are violations happening?
- Are violations trending up or down?
- Which constraints are causing the most drops?
- Which MLVs are the most fragile?
The report has two pages:
- Overview: trends for the last week, plus “top violators” across MLVs and constraints.
- MLV Detail: deeper drill-down with filters (schema, view name, relative date) and historical trends across several weeks.
One subtle but important definition Fabric makes explicit:
- A violation means a row failed one or more constraint conditions.
- A row is dropped only once, even if it violated multiple constraints, so drops ≤ violations.
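For a concrete example using the silver view from earlier: an order row with quantity = -2 and price = 5.00 produces revenue = -10.00, which fails both quantity_positive and revenue_non_negative. That row contributes two violations to the report, but only one dropped row.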
That’s exactly the kind of nuance you want captured in platform telemetry—not re-discovered in incident reviews.
6) Add alerts so quality regressions aren’t a surprise
Fabric’s guidance for the default data quality report includes setting alerts via Data Activator directly from the report UI (“Set Alert”). You can alert on measures like total violations, rows dropped, and more, then route notifications to Teams or Outlook.
This is where data quality starts behaving like observability: you’re not “checking data quality,” you’re monitoring it.
Using these checks to understand quality across your lakehouse (and your data products)
The value isn’t the CHECK clause. The value is the loop it creates:
- Constraints define what “good” means where the transformation lives.
- Refresh runs produce violations/drops as measurable events.
- Lineage, Monitor Hub, and the Data Quality report turn those events into trends.
- Alerts turn trends into action.
That loop is the foundation for treating curated tables as products: the consumer doesn’t just get data—they get a signal about how trustworthy that data is today.
If you’re using optimal refresh, Fabric can choose incremental/full/no refresh modes automatically, and it enforces data quality constraints during updates (including incremental refresh when applicable).
Practical gotchas and current limitations to plan for
MLVs and their quality features are powerful, but they’re not “anything goes.”
A few constraints to design around:
- You can’t update constraints in-place. To change them, you must recreate the MLV (see the recreate sketch below).
- Constraint conditions are limited. Use of functions and pattern searches (LIKE/regex) is restricted in constraint conditions, so keep checks simple and deterministic.
- Data quality report naming constraints are real. Special characters or spaces in workspace or lakehouse names can cause report failures.
- The embedded report uses DirectQuery and has a default 1M row per-query limit (Premium capacities can exceed it).
None of these are blockers. They’re just the shape of the current feature set—useful to know before you build a dependency on it.
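For the first limitation, the workaround is to restate the whole definition rather than alter it. A minimal sketch, reusing the silver view from earlier and the OR REPLACE form from the syntax pattern above (the loosened quantity_non_negative constraint is purely illustrative):
-- The entire definition, query and constraints, has to be restated; only the constraint list changes here.
CREATE OR REPLACE MATERIALIZED LAKE VIEW silver.cleaned_order_data
(
CONSTRAINT order_id_not_null CHECK (order_id IS NOT NULL) ON MISMATCH FAIL,
CONSTRAINT quantity_non_negative CHECK (quantity >= 0) ON MISMATCH DROP,
CONSTRAINT revenue_non_negative CHECK (revenue >= 0) ON MISMATCH DROP
)
AS
SELECT
o.order_id,
o.order_date,
o.product_id,
p.product_name,
o.quantity,
p.price,
o.quantity * p.price AS revenue
FROM bronze.orders o
JOIN bronze.products p
ON o.product_id = p.product_id
;
Because the canonical definition lives in that one statement, it’s worth keeping it in source control so constraint changes are reviewed like any other code change.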
Closing: turn “data quality” into a measurable contract
Here’s the punchline: declarative checks in Materialized Lake Views move data quality from best-effort to intentional.
You define expectations where your transformation logic lives. Fabric enforces them at refresh time. And then Fabric gives you multiple lenses—lineage, Monitor Hub, and the embedded report—to understand whether quality is stable, improving, or quietly slipping.
If you’re building lakehouse layers that others rely on, pick one high-value silver or gold MLV and add two or three constraints this week. Generate the Data Quality report. Set an alert on violations or dropped rows. Start treating your tables like a product with an observable contract.
That’s how data quality becomes real.