Ship Your Lakehouse Like Code: Deploying MLVs with a SQL-Only Configuration Notebook

If you’re building with Materialized Lake Views (MLVs), you’ve probably felt the tension: the definitions live in code, but the Lakehouse itself is an environment-specific artifact. That gap is where deployments get messy—schemas drift, tables don’t exist yet, and MLV refresh behavior looks “random” when it’s really just reacting to configuration.

This post lays out a pattern that closes that gap cleanly: a lakehouse configuration notebook that you promote through your deployment pipeline and run in every environment to create schemas, tables, and MLVs idempotently—using SQL cells only. The key is that MLVs are treated as “definition-driven assets” that can be iterated in dev and re-stamped into test/prod with the same notebook.

And we’ll end with the detail you want to institutionalize: the final cell sets Delta Change Data Feed (CDF) the way you want it—because it directly affects whether Fabric uses incremental refresh and whether some “static-source” MLVs appear to not run.

What you’ll build

You’re going to create one notebook whose job is not to transform data day-to-day, but to configure the Lakehouse consistently in each environment:

  • Schemas and tables are created with IF NOT EXISTS.
  • MLVs are deployed with a drop-if-exists / create-if-not-exists pattern (because in practice, MLV definition changes are handled by dropping and re-creating).
  • The notebook is promoted through a Fabric deployment pipeline, and the pipeline uses a Default Lakehouse deployment rule so the notebook always runs against the correct Lakehouse—no parameters required.

This is the “Lakehouse-as-Code” move that makes Microsoft Fabric feel like a modern data engineering platform instead of a collection of manually configured workspaces.

Why SQL-only is the right default for MLV deployment

Fabric positions MLVs as a SQL-first capability. In fact, one of the documented limitations is that declarative syntax in PySpark isn’t available, and the recommended approach is to use Spark SQL syntax to create and refresh MLVs.

That maps perfectly to a deployment notebook made entirely of SQL cells:

  • It’s readable.
  • It’s reviewable.
  • It’s portable across environments.
  • And it avoids hidden runtime state that can creep in when notebooks mix languages.

Eliminate parameters by using a Default Lakehouse deployment rule

This is one of those “simple but powerful” operational decisions.

In Fabric deployment pipelines, you can set a Default Lakehouse rule for notebooks so that when the notebook is deployed into the target stage, it’s connected to (and runs against) the correct Lakehouse there.

When you do this well, your configuration notebook doesn’t need environment parameters at all. It can simply assume: “the default Lakehouse is correct here,” and proceed to build schemas, tables, and MLVs inside that Lakehouse.

That’s the most maintainable version of this pattern.

The idempotent notebook pattern that works in real pipelines

A configuration notebook should be safe to run:

  • on an empty Lakehouse
  • after partial failures
  • after prior deployments
  • after definition changes in dev

The trick is that MLVs are special: to update an MLV definition, you drop it and re-create it (ALTER is only supported for renaming).

So your notebook needs to embrace that reality and make the drop/recreate workflow predictable.

A practical SQL notebook skeleton

Below is a structure you can paste into a Fabric notebook and keep as SQL-only. Treat each section as its own SQL cell.

-- =========================
-- CELL: Schemas (idempotent)
-- =========================
CREATE SCHEMA IF NOT EXISTS bronze;
CREATE SCHEMA IF NOT EXISTS silver;
CREATE SCHEMA IF NOT EXISTS gold;
-- =========================
-- CELL: Base tables (idempotent)
-- =========================
CREATE TABLE IF NOT EXISTS bronze.orders (
  order_id        STRING,
  customer_id     STRING,
  order_ts        TIMESTAMP,
  order_amount    DECIMAL(18,2)
) USING DELTA;

CREATE TABLE IF NOT EXISTS bronze.customers (
  customer_id     STRING,
  customer_name   STRING,
  region          STRING
) USING DELTA;

We don’t want to use a “CREATE OR REPLACE” pattern here because that might drop a table with data. You should be either using ALTER statements below this, or being extremely explicit if you want a drop and replace.

-- =====================================
-- CELL: MLVs (drop/recreate for updates)
-- =====================================

-- If your runtime doesn't accept IF EXISTS here, guard with SHOW MATERIALIZED LAKE VIEWS and conditionally drop.

DROP MATERIALIZED LAKE VIEW IF EXISTS silver.orders_clean;
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.orders_clean AS
SELECT
  o.order_id,
  o.customer_id,
  CAST(o.order_ts AS DATE) AS order_date,
  o.order_amount,
  c.region
FROM bronze.orders o
LEFT JOIN bronze.customers c
  ON o.customer_id = c.customer_id
WHERE o.order_amount IS NOT NULL;

DROP MATERIALIZED LAKE VIEW IF EXISTS gold.daily_sales;
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.daily_sales AS
SELECT
  order_date,
  region,
  SUM(order_amount) AS total_sales
FROM silver.orders_clean
GROUP BY order_date, region;

Make refresh behavior intentional (and don’t confuse “skip refresh” with failure)

Fabric’s optimal refresh policy for MLVs includes an explicit “No refresh” behavior:

If the source remains unchanged—meaning no delta commit change is detected—Fabric can skip the refresh.

That’s a feature, not a bug. But in deployments, it becomes a source of confusion because it can look like “the MLV didn’t run.”

If your deployment depends on the MLV being physically materialized right now (for validation, smoke testing, or downstream objects), consider adding an explicit refresh step after creation:

-- =========================
-- CELL: Deployment refresh
-- =========================
REFRESH MATERIALIZED LAKE VIEW gold.daily_sales FULL;

That keeps your release deterministic, which is exactly what you want in a pipeline.

The final cell rule: set Delta Change Data Feed exactly how you want it

Now to the thing you called out—and it’s worth repeating because it prevents hard-to-diagnose behavior later.

Incremental refresh requires Delta change data feed enabled (delta.enableChangeDataFeed=true) on all dependent sources referenced in the MLV definition.

So your notebook should end with a single “policy cell” that makes CDF posture explicit.

-- ==========================================
-- CELL (FINAL): Delta Change Data Feed policy
-- ==========================================

-- Enable on core sources feeding MLVs where you want incremental refresh capability
ALTER TABLE bronze.orders SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
ALTER TABLE bronze.customers SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- Example: disable on truly static/seed tables (if applicable)
-- ALTER TABLE bronze.date_dim SET TBLPROPERTIES (delta.enableChangeDataFeed = false);

This is also the safest way to configure CDF because it’s a table property, not a session-level setting—and Fabric calls out that Spark properties set at the session level aren’t applied during scheduled lineage refresh.

The static-table gotcha: why your date-table MLV “may not run” with CDF-driven behavior

This is the one that catches teams.

If you build an MLV off a static table (classic example: a date dimension that only changes once a year), Fabric’s refresh engine can legitimately decide there’s nothing to do.

In optimal refresh terms: if no delta commit change is detected, Fabric can skip refresh.

When you enable CDF broadly (which you often do to support incremental refresh across an MLV chain), static-source MLVs can end up in a “nothing changed” posture most of the time. Operationally, that can look like:

  • schedules firing but doing nothing
  • lineage showing skipped refreshes
  • engineers wondering whether the view is broken

The fix is not “turn CDF off everywhere.” The fix is to be deliberate:

  • Materialize static-source MLVs once during deployment (full refresh), then don’t expect frequent refresh.
  • Or separate truly static dimensions from CDF-driven chains.
  • Or disable CDF on those static seed tables if your refresh expectations don’t align with change-driven behavior.

This is where #DeltaLake mechanics meet real-world operations.

Where this fits in a deployment pipeline

Here’s the flow that stays simple and repeatable:

  1. Develop and adjust the notebook in dev until it produces the exact Lakehouse shape you want.
  2. Promote the notebook through the deployment pipeline.
  3. Use Default Lakehouse deployment rules so the notebook binds to the correct Lakehouse in test and prod—no parameters required.
  4. Run the notebook in each environment to stamp the schemas, tables, MLVs, and CDF settings into place.

One more operational reminder: dropping or renaming MLVs affects lineage and scheduled refresh, so treat the notebook as the source of truth for definitions and be prepared to re-establish schedules as needed.

Conclusion: make your Lakehouse predictable on purpose

We covered the core moves:

  • Use a SQL-only lakehouse configuration notebook because MLVs are designed to be driven by Spark SQL.
  • Eliminate environment parameters by using a Default Lakehouse deployment rule in your pipeline.
  • Deploy MLVs with a drop-if-exists / create-if-not-exists approach because definition updates are handled by drop and re-create.
  • End with a clear final cell that sets Delta Change Data Feed intentionally—because it governs incremental refresh and can make static-source MLVs look like they “don’t run.”

If you take one action after reading this, make it this: create the configuration notebook, promote it like code, and run it in every environment. That’s how you keep your Lakehouse stable while still moving fast.

Unknown's avatar

Author: Jason Miles

A solution-focused developer, engineer, and data specialist focusing on diverse industries. He has led data products and citizen data initiatives for almost twenty years and is an expert in enabling organizations to turn data into insight, and then into action. He holds MS in Analytics from Texas A&M, DAMA CDMP Master, and INFORMS CAP-Expert credentials.

Discover more from EduDataSci - Educating the world about data and leadership

Subscribe now to keep reading and get access to the full archive.

Continue reading