Materialized Lake Views (MLVs) in Microsoft Fabric

A Materialized Lake View (MLV) is a table in your Fabric lakehouse that’s defined by a SQL query and kept up‑to‑date by the service. You write one CREATE MATERIALIZED LAKE VIEW … AS SELECT … statement; Fabric figures out dependencies, materializes the result into your lakehouse, and refreshes it on a schedule. Today, MLVs are in preview, SQL‑first (Spark SQL), and designed to make Medallion layers (Bronze → Silver → Gold) declarative instead of hand‑assembled pipelines.

What happens when an MLV “runs”?

  1. Scheduling and DAG. Fabric reads dependencies between your MLVs (e.g., a Gold MLV that selects from a Silver MLV) and builds a lineage graph. When the schedule fires, Fabric executes that graph in dependency order (see the sketch after this list). You can open Manage materialized lake views to see that lineage and its execution status.
  2. Refresh behavior. When a refresh is triggered, Fabric decides whether to run or skip based on whether sources changed. If it runs, the refresh is currently a full refresh; incremental refresh is coming soon. If your MLV reads non‑Delta sources, refresh is always full.
  3. Run timing and limits. You schedule in the UI (minute/hour/day/week/month cadence). An MLV run that exceeds 24 hours fails. Use the Monitor hub to see run history, duration, and to drill into failures.
  4. Where is the data? The result is stored inside your lakehouse (as Delta data in OneLake), so any Fabric workload that reads Delta (including Direct Lake semantic models) can consume it without another copy.
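
To make the dependency order concrete, here’s a minimal sketch of a two‑level flow (the table and view names are hypothetical). Fabric infers from the FROM clauses that silver.trips must refresh before gold.tripcounts:

-- Silver MLV reads a Bronze table
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.trips
AS
SELECT TripId, CAST(pickup_ts AS DATE) AS TripDate
FROM bronze.trips
WHERE TripId IS NOT NULL;

-- Gold MLV reads the Silver MLV, so Fabric runs it second in the DAG
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.tripcounts
AS
SELECT TripDate, COUNT(*) AS trips
FROM silver.trips
GROUP BY TripDate;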

What an MLV is not (and why that matters)

  • Not a classic SQL view. A normal view re‑runs its SELECT at query time. An MLV materializes data ahead of time on a schedule, so downstream reads are fast and predictable (see the side‑by‑side sketch after this list).
  • Not just a CTAS table you rebuild manually. CTAS gives you a table—but not lineage, scheduling, data‑quality rules, or run monitoring. MLVs add a managed refresh pipeline, a dependency graph, and integrated monitoring.
  • Not a cross‑lakehouse orchestrator. Today’s lineage and execution stay inside one lakehouse. Keep each MLV flow within a single lakehouse boundary (you can still read external data, but the orchestrated lineage doesn’t span lakehouses).
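
The first bullet is easy to see side by side. A hedged sketch (hypothetical names): the plain view re‑runs its SELECT on every read, while the MLV runs it on refresh and serves the stored Delta result at query time:

-- Classic view: the SELECT executes every time someone queries it
CREATE VIEW silver.orders_v
AS SELECT * FROM bronze.orders WHERE TotalAmount >= 0;

-- MLV: the SELECT executes on refresh; queries read the materialized result
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.orders_mlv
AS SELECT * FROM bronze.orders WHERE TotalAmount >= 0;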

Environment prerequisite: Use a schema‑enabled lakehouse (and Fabric Runtime 1.3+).
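
With that in place, you can lay out the Medallion schemas the examples below assume (a small setup sketch; the three schema names are conventions, not requirements):

CREATE SCHEMA IF NOT EXISTS bronze;
CREATE SCHEMA IF NOT EXISTS silver;
CREATE SCHEMA IF NOT EXISTS gold;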


The Spark SQL you write—explained line‑by‑line

Here’s a Silver‑layer MLV that cleans orders and enforces two quality rules:

CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.ordersummary
(
  CONSTRAINT nonneg_amount CHECK (TotalAmount >= 0) ON MISMATCH DROP,
  CONSTRAINT order_id_not_null CHECK (OrderNo IS NOT NULL) ON MISMATCH FAIL
)
PARTITIONED BY (OrderDate)
COMMENT "Clean order summaries joined to customers; basic quality rules"
TBLPROPERTIES ("domain"="sales","quality"="silver")
AS
SELECT
  o.OrderNo,
  o.CustomerNo,
  CAST(o.order_ts AS DATE) AS OrderDate,
  o.TotalAmount,
  c.CustomerSurrogateKey
FROM bronze.orders o
LEFT JOIN bronze.customers c
  ON o.CustomerNo = c.CustomerNo
  AND o.order_ts >= c.MetaEffectiveFrom
  AND o.order_ts < c.MetaEffectiveTo;

Why these clauses exist (a minimal variant without them follows this list):

  • CONSTRAINT … CHECK (…) ON MISMATCH lets you express data‑quality rules inside the view definition. DROP excludes bad rows but keeps the run moving (and you’ll see counts in lineage). FAIL stops the run on the first violation, which is good for hard contracts. If you define both, FAIL wins.
  • PARTITIONED BY (OrderDate) organizes the MLV’s storage layout for downstream performance.
  • COMMENT and TBLPROPERTIES help with discoverability and tagging (for you—Fabric doesn’t “use” your properties beyond storing them).
  • IF NOT EXISTS makes repeated deployments idempotent.
  • Syntax surface. All of this is Spark SQL; PySpark has no declarative MLV builder yet.
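
For contrast, a bare‑bones definition, assuming you need none of the optional clauses above, is just CREATE … AS SELECT:

CREATE MATERIALIZED LAKE VIEW silver.orders_minimal
AS
SELECT OrderNo, CustomerNo, TotalAmount
FROM bronze.orders;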

Now compose a Gold MLV on top of that Silver:

CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.RevenueByDay
AS
SELECT
  OrderDate,
  COUNT(*)         AS orders,
  SUM(TotalAmount) AS revenue
FROM silver.ordersummary
GROUP BY OrderDate;

Operating with SQL (typical day‑to‑day):

-- Manually force a run when needed (runs are full in preview)
REFRESH MATERIALIZED LAKE VIEW gold.RevenueByDay FULL;

-- Inventory and introspection
SHOW MATERIALIZED LAKE VIEWS IN silver;
SHOW CREATE MATERIALIZED LAKE VIEW gold.RevenueByDay;

-- ALTER can only rename; other changes require DROP + CREATE
ALTER MATERIALIZED LAKE VIEW gold.RevenueByDay RENAME TO gold.RevenueDaily;
DROP MATERIALIZED LAKE VIEW gold.RevenueDaily;

See the Spark SQL reference for the full grammar, supported commands, and preview‑era limits (e.g., no time‑travel in CREATE, no temp views in definitions, no UDFs in CTAS; CTEs can create refresh issues).
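
As a concrete workaround for the CTE caveat, you can often factor the WITH clause into its own MLV, which also gives the intermediate step its own node in lineage. A sketch with hypothetical names, assuming the logic splits cleanly:

-- Instead of a CTE inside one CREATE body, materialize the intermediate step
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.recentorders
AS
SELECT OrderNo, CAST(order_ts AS DATE) AS OrderDate, TotalAmount
FROM bronze.orders
WHERE order_ts >= date_sub(current_date(), 30);

-- The downstream MLV reads it like any other source
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.recentrevenue
AS
SELECT OrderDate, SUM(TotalAmount) AS revenue
FROM silver.recentorders
GROUP BY OrderDate;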


Data quality rules—what actually happens

  • FAIL stops the refresh at the first violation. Use this to protect downstream consumers when your contract must be honored.
  • DROP removes violating rows and records the dropped‑row count in lineage, so you can track trends without breaking the pipeline.
  • If you need to change constraints after the fact, you recreate the MLV (updating constraints in place isn’t supported yet); see the sketch after this list. There are also restrictions on constraint expressions and a known issue with FAIL in some scenarios.
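
Recreating looks like this in practice (a sketch reusing the earlier Silver example, with the amount rule tightened from DROP to FAIL):

-- Constraints can't be altered in place today, so drop and recreate
DROP MATERIALIZED LAKE VIEW silver.ordersummary;

CREATE MATERIALIZED LAKE VIEW silver.ordersummary
(
  CONSTRAINT nonneg_amount CHECK (TotalAmount >= 0) ON MISMATCH FAIL  -- was DROP
)
AS
SELECT
  o.OrderNo,
  o.CustomerNo,
  CAST(o.order_ts AS DATE) AS OrderDate,
  o.TotalAmount
FROM bronze.orders o;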

Lineage, scheduling, and monitoring (the “operational” side)

  • Lineage view. Shows each node (sources, MLVs) and the edges between them. It isn’t just a picture: it mirrors the dependency order Fabric uses to execute your flow. Table Shortcuts aren’t shown in MLV lineage today.
  • Scheduling. You turn on a schedule from the lineage/management page, choose frequency and timezone, and Fabric triggers the DAG accordingly. Long‑running MLVs time‑out at 24 hours.
  • Monitor hub. A single place to see run history, durations, and status, and to cancel in‑flight runs or drill into failure details.

Why MLVs pair naturally with Direct Lake models

Because the materialized output is just Delta data in OneLake, you can point a Direct Lake semantic model at your Gold MLV tables. That keeps transformation logic in the lake (not duplicated in Power Query/DAX), while Direct Lake loads only the data it needs into memory for interactive BI. It’s a clean separation: MLVs curate → Direct Lake serves.
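
Before wiring a semantic model to the Gold layer, a quick sanity check is an ordinary query against the MLV’s output, since it’s just a Delta table:

SELECT OrderDate, orders, revenue
FROM gold.RevenueByDay
ORDER BY OrderDate DESC
LIMIT 10;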


Preview‑era “gotchas” to be aware of

  • SQL only. MLV CREATE/REFRESH is Spark SQL; there’s no declarative PySpark DSL yet.
  • Full refresh when it runs. The service may skip a run if nothing changed; otherwise it’s a full recomputation (incremental refresh isn’t available yet). Non‑Delta sources always trigger a full refresh.
  • Syntax and feature edges. All‑caps schema names aren’t supported; no time travel in CREATE; no temp views or UDFs in CTAS; and CTEs in the CREATE body can cause later refresh failures (illustrated in the sketch after this list).
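
A few of these limits are easier to remember as code. Hedged illustrations of patterns to avoid (all names hypothetical):

-- All‑caps schema names aren't supported:
--   CREATE MATERIALIZED LAKE VIEW SALES.daily ...              -- avoid; use sales.daily

-- No time travel in the defining query:
--   ... FROM bronze.orders VERSION AS OF 3 ...                 -- not allowed in CREATE

-- CTEs in the CREATE body can break later refreshes:
--   ... AS WITH recent AS (SELECT ...) SELECT * FROM recent;   -- factor into its own MLV instead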

Why teams should adopt MLVs

If your team already thinks in SQL, MLVs let you move transformation logic into the lake, express quality rules right where the data is defined, and let Fabric handle execution, dependency order, and monitoring. Your BI layer (especially with Direct Lake) can then read those curated Delta tables directly—cleanly separating curation from consumption.

Author: Jason Miles

A solution‑focused developer, engineer, and data specialist working across diverse industries. He has led data products and citizen‑data initiatives for almost twenty years and is an expert in enabling organizations to turn data into insight, and then into action. He holds an MS in Analytics from Texas A&M, along with DAMA CDMP Master and INFORMS CAP‑Expert credentials.