Materialized Lake Views—A Real‑World Demo with F1 Data

This post demonstrates Materialized Lake Views (MLVs) in a realistic Microsoft Fabric setup. We pair MLVs with Real‑Time Intelligence (RTI)—specifically an Eventhouse (KQL database)—to show how event‑scale data (lap times and race results) can feed a medallion‑style model with clean governance and no bespoke pipelines.

We’ll use the Kaggle Formula 1 World Championship dataset (covering 1950–2024) and build this end‑to‑end:

  • Eventhouse (RTI) to own lap_times and results, surfaced to OneLake as Delta via OneLake Availability.
  • Lakehouse to declare MLVs (Silver/Gold) over those Delta tables.
  • Warehouse to query Lakehouse tables with three‑part names (no replication).
  • Power BI Semantic Model in Direct Lake over the materialized Delta outputs.

1) Concept & dataset

Concept: Make MLVs the “engine” of your medallion layers. Keep high‑volume facts in Eventhouse, expose them to OneLake as Delta, and declare your Silver/Gold logic as MLVs in the Lakehouse. Query the results from a Warehouse and publish a Direct Lake model—without wiring separate pipelines or scattering logic across tools.

Dataset: Kaggle — Formula 1 World Championship (1950–2024). You’ll use:
drivers.csv, constructors.csv, circuits.csv, races.csv, results.csv, and lap_times.csv.


2) Reference architecture (MLV‑first)

Kaggle CSVs (drivers, constructors, circuits, races, results, lap_times)
        └──► Eventhouse (KQL DB): ingest CSVs as KQL tables
               └──► OneLake Availability: exposes Eventhouse tables as Delta

Lakehouse
  └──► Table Shortcuts to the Eventhouse-backed Delta tables
  └──► Materialized Lake Views (Silver/Gold) declared in Spark SQL

Warehouse (SQL)
  └──► Cross-database queries (database.schema.table) against Lakehouse Delta

Power BI
  └──► Direct Lake model bound to MLV Delta outputs (avoid SQL views to prevent fallback)

Why this pattern? Eventhouse is purpose‑built for event‑scale ingestion and ad‑hoc KQL. OneLake Availability turns those tables into Delta so every other engine can read them without copy steps. MLVs keep transformation logic declarative, governable, and monitorable.


3) Create Fabric items

  • Eventhouse (KQL DB): f1_event
  • Lakehouse: f1_lake (enable schemas)
  • Warehouse (SQL): f1_wh (optional but useful for SQL demos)

Note that when you create your lakehouse, you won’t see a default semantic model. Default semantic models have been retired, so you now create a semantic model manually. That’s the recommended practice anyway, and it avoids a number of errors that default models were prone to.


4) Load the dataset into Eventhouse (KQL)

Use the Eventhouse ingestion wizard or KQL commands. Below is a concise pattern you can adapt.

4.1 Create tables

// Event tables
.create table lap_times (
  raceId:int, driverId:int, lap:int, position:int, ['time']:string, milliseconds:long
);

.create table results (
  resultId:int, raceId:int, driverId:int, constructorId:int,
  number:int, grid:int, position:string, positionText:string, positionOrder:int,
  points:real, laps:int, ['time']:string, milliseconds:long,
  fastestLap:int, rank:int, fastestLapTime:string, fastestLapSpeed:real,
  statusId:int
);

// Reference tables
.create table drivers (
  driverId:int, driverRef:string, number:int, code:string,
  forename:string, surname:string, dob:string, nationality:string, url:string
);

.create table constructors (
  constructorId:int, constructorRef:string, name:string, nationality:string, url:string
);

.create table circuits (
  circuitId:int, circuitRef:string, name:string, location:string,
  country:string, lat:real, lng:real, alt:int, url:string
);

.create table races (
  raceId:int, year:int, round:int, circuitId:int,
  name:string, ['date']:string, ['time']:string, url:string
);

4.2 Define CSV mappings (example for one table; repeat as needed or use the wizard)

.create table results ingestion csv mapping "ResultsCsv"
'['
 '{"column":"resultId","datatype":"int","Ordinal":"0"},'
 '{"column":"raceId","datatype":"int","Ordinal":"1"},'
 '{"column":"driverId","datatype":"int","Ordinal":"2"},'
 '{"column":"constructorId","datatype":"int","Ordinal":"3"},'
 '{"column":"number","datatype":"int","Ordinal":"4"},'
 '{"column":"grid","datatype":"int","Ordinal":"5"},'
 '{"column":"position","datatype":"string","Ordinal":"6"},'
 '{"column":"positionText","datatype":"string","Ordinal":"7"},'
 '{"column":"positionOrder","datatype":"int","Ordinal":"8"},'
 '{"column":"points","datatype":"real","Ordinal":"9"},'
 '{"column":"laps","datatype":"int","Ordinal":"10"},'
 '{"column":"time","datatype":"string","Ordinal":"11"},'
 '{"column":"milliseconds","datatype":"long","Ordinal":"12"},'
 '{"column":"fastestLap","datatype":"int","Ordinal":"13"},'
 '{"column":"rank","datatype":"int","Ordinal":"14"},'
 '{"column":"fastestLapTime","datatype":"string","Ordinal":"15"},'
 '{"column":"fastestLapSpeed","datatype":"real","Ordinal":"16"},'
 '{"column":"statusId","datatype":"int","Ordinal":"17"}'
']';
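Hand‑typing eighteen ordinal entries is an easy place to introduce off‑by‑one mistakes. If you prefer to script it, a short helper can generate the mapping JSON from an ordered column list (a convenience sketch; the column names and types below are copied from the table definition above):

```python
import json

# Ordered (column, datatype) pairs matching the results table definition.
RESULTS_COLUMNS = [
    ("resultId", "int"), ("raceId", "int"), ("driverId", "int"),
    ("constructorId", "int"), ("number", "int"), ("grid", "int"),
    ("position", "string"), ("positionText", "string"), ("positionOrder", "int"),
    ("points", "real"), ("laps", "int"), ("time", "string"),
    ("milliseconds", "long"), ("fastestLap", "int"), ("rank", "int"),
    ("fastestLapTime", "string"), ("fastestLapSpeed", "real"), ("statusId", "int"),
]

def csv_mapping(columns):
    """Build the CSV ingestion mapping JSON, assigning ordinals by list position."""
    return json.dumps(
        [{"column": name, "datatype": dtype, "Ordinal": str(i)}
         for i, (name, dtype) in enumerate(columns)]
    )

print(csv_mapping(RESULTS_COLUMNS))
```

Paste the emitted JSON into the `.create table ... ingestion csv mapping` command; reordering the list reorders the ordinals automatically.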

4.3 Ingest from storage

My recommendation here is to use the wizard to ingest the data you need via a shortcut. (I pulled the “changeable” items, such as the results and standings, from an Azure Blob Storage data source after truncating them to remove the 2024 data, so that we can later demonstrate ingesting new data through the Eventhouse.)

  1. Click on the ellipsis next to the desired table (…)
  2. Click “Get Data”
  3. Choose your storage method
  4. Fill out your connection details (making sure to properly filter your data to only a single file format)
  5. Validate your schema mappings
  6. Complete!

For completeness, I also created the remaining reference tables, seasons and status, which the same interface supports.
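If you’d rather script the load than click through the wizard, a one‑shot KQL ingestion command can do the same job (a sketch; the storage URL and SAS token are placeholders for your own account):

```kusto
// One-shot pull ingestion from Azure Blob Storage.
// ignoreFirstRecord skips the CSV header row.
.ingest into table results (
  h'https://<account>.blob.core.windows.net/<container>/results.csv;<SAS-token>'
) with (
  format = 'csv',
  ingestionMappingReference = 'ResultsCsv',
  ignoreFirstRecord = true
)
```

This references the "ResultsCsv" mapping defined in 4.2; repeat per table with the matching mapping name.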


5) Enable OneLake Availability on Eventhouse

Turn it on at the database or table level. Eventhouse will publish the KQL tables to OneLake as Delta with a governed latency window. While enabled, some operations (e.g., rename) are restricted. This makes your data instantly consumable by Lakehouse, Warehouse, and Power BI without extra copy steps.


6) In the Lakehouse, create Table Shortcuts to Eventhouse tables

Create three new schemas in the lakehouse: bronze, silver, and gold.

From f1_lake → Tables → bronze ▸ New Shortcut → Source: KQL database → select drivers, constructors, circuits, races, results, and lap_times.
They appear as Delta tables in the Lakehouse—queryable by Spark, the SQL analytics endpoint, and Direct Lake.


7) Declare Materialized Lake Views (MLVs)

Prereqs: In the Lakehouse, schemas must be enabled. Create MLVs from a notebook using Spark SQL. You’ll monitor, schedule, and see lineage from the Manage materialized lake views pane.

7.1 Silver layer (standardize/enrich)

-- silver.dim_driver
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.dim_driver AS
SELECT CAST(driverId AS INT)          AS driver_id,
       CONCAT(forename, ' ', surname) AS driver_name,
       code,
       nationality,
       CAST(dob AS DATE)              AS dob
FROM bronze.drivers;

-- silver.dim_constructor
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.dim_constructor AS
SELECT CAST(constructorId AS INT) AS constructor_id,
       name AS constructor_name,
       nationality
FROM bronze.constructors;

-- silver.dim_circuit
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.dim_circuit AS
SELECT CAST(circuitId AS INT) AS circuit_id,
       name AS circuit_name,
       location,
       country
FROM bronze.circuits;

-- silver.dim_race (join circuits)
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.dim_race AS
SELECT r.raceId             AS race_id,
       r.year               AS season,
       r.round,
       r.name               AS race_name,
       CAST(r.date AS DATE) AS race_date,
       r.circuitId          AS circuit_id,
       c.circuit_name,
       c.location,
       c.country
FROM bronze.races r
LEFT JOIN silver.dim_circuit c
  ON r.circuitId = c.circuit_id;

-- silver.results_clean with data quality
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.results_clean
(
  CONSTRAINT pos_order_valid CHECK (positionOrder >= 1) ON MISMATCH DROP,
  CONSTRAINT points_nonneg   CHECK (points >= 0)        ON MISMATCH FAIL
)
AS
SELECT resultId, raceId, driverId, constructorId,
       grid, positionOrder, points, laps,
       fastestLap, fastestLapTime, fastestLapSpeed
FROM bronze.results;

-- (Optional) silver.laps_clean
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.laps_clean
(
  CONSTRAINT lap_ms_nonneg CHECK (milliseconds >= 0) ON MISMATCH DROP
)
AS
SELECT raceId, driverId, lap, position, milliseconds
FROM bronze.lap_times;
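To build intuition for ON MISMATCH DROP versus ON MISMATCH FAIL, here is the same logic in plain Python (an illustration of the semantics only, not how Fabric implements it): a DROP check silently filters offending rows, while a FAIL check aborts the whole refresh.

```python
class ConstraintViolation(Exception):
    """Raised when an ON MISMATCH FAIL check is violated."""

def apply_constraints(rows, drop_checks, fail_checks):
    """Mimic MLV constraint semantics: filter rows failing DROP checks,
    abort entirely if any row fails a FAIL check."""
    kept = []
    for row in rows:
        for name, check in fail_checks:
            if not check(row):
                raise ConstraintViolation(f"{name} violated by {row}")
        if all(check(row) for _, check in drop_checks):
            kept.append(row)
    return kept

results = [
    {"resultId": 1, "positionOrder": 1, "points": 25.0},
    {"resultId": 2, "positionOrder": 0, "points": 18.0},  # dropped: invalid order
]
clean = apply_constraints(
    results,
    drop_checks=[("pos_order_valid", lambda r: r["positionOrder"] >= 1)],
    fail_checks=[("points_nonneg", lambda r: r["points"] >= 0)],
)
print(clean)  # only resultId 1 survives
```

In the MLV itself, dropped-row counts surface in the data-quality results of the Manage materialized lake views pane, so DROP constraints stay observable rather than silently eating data.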

7.2 Gold layer (analytics‑ready facts)

-- gold.fact_results
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.fact_results AS
SELECT d.driver_id,
       co.constructor_id,
       r.season,
       r.race_id,
       rc.grid,
       rc.positionOrder AS finish_pos,
       rc.points,
       rc.laps,
       rc.fastestLap,
       rc.fastestLapTime,
       rc.fastestLapSpeed
FROM silver.results_clean rc
JOIN silver.dim_race       r  ON rc.raceId       = r.race_id
JOIN silver.dim_driver     d  ON rc.driverId     = d.driver_id
JOIN silver.dim_constructor co ON rc.constructorId = co.constructor_id;

-- gold.fact_laps
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.fact_laps AS
SELECT r.season,
       l.raceId       AS race_id,
       l.driverId     AS driver_id,
       l.lap,
       l.position     AS lap_position,
       l.milliseconds AS lap_time_ms
FROM silver.laps_clean l
JOIN silver.dim_race  r ON l.raceId = r.race_id;
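The dataset carries both a display string (`time`) and raw `milliseconds` for each lap; keeping `lap_time_ms` numeric in the fact table makes averages and deltas trivial, and formatting can happen at the presentation edge. A hypothetical helper (not part of the model) shows the conversion:

```python
def format_lap_time(ms: int) -> str:
    """Render raw lap milliseconds as the familiar m:ss.mmm display string."""
    minutes, rem = divmod(ms, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{minutes}:{seconds:02d}.{millis:03d}"

print(format_lap_time(93_420))  # → 1:33.420
```

The same formatting is easy to reproduce in DAX or Power Query if you prefer to keep it inside the semantic model.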

Operate the MLVs: From Manage materialized lake views, set a schedule, trigger a manual refresh, review lineage, and inspect data‑quality results (dropped rows vs. failures). Current behavior is full refresh on change; runs are skipped when no inputs changed.


8) Query from the Warehouse (no copies)

From f1_wh, use three‑part names (database.schema.table) to query Lakehouse tables directly:

-- Inspect gold fact
SELECT TOP 10 *
FROM f1_lake.gold.fact_results
ORDER BY season DESC, race_id DESC;

-- Top winners
SELECT d.driver_name, COUNT(*) AS wins
FROM f1_lake.gold.fact_results fr
JOIN f1_lake.silver.dim_driver d
  ON fr.driver_id = d.driver_id
WHERE fr.finish_pos = 1
GROUP BY d.driver_name
ORDER BY wins DESC;


9) Build the Semantic Model (Power BI) in Direct Lake

Bind model tables directly to the Delta outputs (gold.*, selected silver.*). Avoid layering SQL views between Power BI and your Delta tables; views can cause Direct Lake → DirectQuery fallback. Use physical Delta tables (your MLV outputs) for the best performance.

Starter DAX

Wins := COUNTROWS( FILTER( 'fact_results', 'fact_results'[finish_pos] = 1 ) )
Podiums := COUNTROWS( FILTER( 'fact_results', 'fact_results'[finish_pos] <= 3 ) )
Avg Lap (ms) := AVERAGE('fact_laps'[lap_time_ms])
Points := SUM('fact_results'[points])


10) Notes & gotchas

  • MLVs today: created via Spark SQL; refresh is full on change with skipped runs when unchanged.
  • Eventhouse availability: while enabled, certain DDL operations (e.g., rename) are restricted; configure latency to balance freshness vs. cost.
  • Direct Lake: binding to tables (not SQL views) avoids fallback to DirectQuery.
  • Alternate path for reference data: If you prefer not to store dims in Eventhouse, you can import the CSVs into the Lakehouse as Delta (e.g., “Create table from files” or a short Spark notebook). This does create a managed copy, but keeps your MLV story intact—still no separate pipeline tool required.
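For that alternate path, a few lines in a Fabric notebook are enough (a sketch, assuming an active Spark session and the CSVs uploaded under a Files/f1 folder — both paths and table names here are placeholders):

```python
# Load reference CSVs from the Lakehouse Files area and persist them
# as managed Delta tables in the bronze schema.
for name in ["drivers", "constructors", "circuits", "races"]:
    df = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv(f"Files/f1/{name}.csv"))
    df.write.format("delta").mode("overwrite").saveAsTable(f"bronze.{name}")
```

The downstream MLVs are unchanged either way, since they only care that the bronze tables exist as Delta.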

11) Quick validation

KQL (Eventhouse)

results
| summarize total_points = sum(points) by driverId
| top 10 by total_points desc

Spark SQL (Lakehouse)

SELECT season, COUNT(DISTINCT race_id) AS races
FROM gold.fact_results
GROUP BY season
ORDER BY season DESC;

T‑SQL (Warehouse)

SELECT TOP 20 r.season, r.race_name, d.driver_name, fr.finish_pos, fr.points
FROM f1_lake.gold.fact_results AS fr
JOIN f1_lake.silver.dim_race   AS r ON fr.race_id   = r.race_id
JOIN f1_lake.silver.dim_driver AS d ON fr.driver_id = d.driver_id
ORDER BY r.season DESC, r.race_id DESC, fr.finish_pos ASC;


Why Materialized Lake Views (MLVs) matter

  • Declarative pipelines: one Spark SQL statement defines each transform. Fabric handles orchestration, lineage, and monitoring—no separate ETL/pipeline asset to wire up.
  • Governed & observable: MLVs show source dependencies, run history, and data‑quality outcomes in a single, first‑class UI.
  • Built‑in data quality: add CHECK constraints with ON MISMATCH DROP | FAIL to reject or fail on bad data—without custom code.
  • Predictable refresh: full refresh when sources change, skipped runs when they don’t—simple behavior that’s easy to explain to stakeholders.
  • Open downstream: MLV outputs are Delta tables in OneLake—query them from Warehouse (SQL), notebooks, and Power BI in Direct Lake for fast BI.

Closing

MLVs turn your medallion layers into governed assets: one statement per transform, first‑class lineage, schedulable refresh, and built‑in data quality. Pair them with Eventhouse and OneLake Availability to keep ingestion fast and analytics open—then light up Warehouse and Direct Lake without extra copy steps or pipeline sprawl.


Author: Jason Miles

A solution-focused developer, engineer, and data specialist working across diverse industries. He has led data products and citizen-data initiatives for almost twenty years and is an expert in helping organizations turn data into insight, and insight into action. He holds an MS in Analytics from Texas A&M, along with DAMA CDMP Master and INFORMS CAP-Expert credentials.