Modern Fabric estates don’t need a forest of bespoke pipelines, but they do need metadata-driven tools to reduce time to insight. You can land data quickly in Bronze, promote it reliably to Silver and Gold with a metadata‑driven Spark Structured Streaming engine, and treat Gold as the foundation for your data products—semantic models, AI endpoints, and any other served formats.
Bronze: three simple ways to land data
Shortcuts (by‑reference): Create a pointer in OneLake to data that already lives in ADLS Gen2, Amazon S3 (and S3‑compatible), Google Cloud Storage, Dataverse, or other OneLake items—so all Fabric engines can read it through the unified OneLake namespace. Credentials/permissions are centrally managed, and you can even reach on‑prem or network‑restricted sources via the Fabric gateway. (S3/GCS shortcuts are read‑only.)
Mirroring (replicated): Continuously replicate operational databases into OneLake as Delta tables—low latency, no ETL to build. Today you can mirror Azure SQL Database, Snowflake, Azure Cosmos DB (preview), Azure SQL MI (preview), SQL Server (preview), Azure Database for PostgreSQL flexible server (preview), plus metadata and open mirroring options. Mirroring storage and the background replication compute are included up to capacity‑based allowances.
Dataflows Gen2 (guided ingestion): Use Power Query to connect to many SaaS, file, and database sources and write directly to a Lakehouse, Warehouse, KQL DB, and more. Lakehouse writes include automatic metadata sync; on‑premises sources are supported via the gateway.
Keep Bronze light. Land raw files/tables as‑is. Save standardization, joins, and quality checks for the promotion steps.
Moving data with a metadata‑driven Spark Structured Streaming engine
Rather than hand‑crafting pipelines, define what to move and how in metadata tables (what source to read, keys, schema mapping, quality rules, write mode, schedule). A single Spark job reads that configuration, processes only new/changed data, and writes to Silver (standardized/validated) and then to Gold (curated/serving). In Fabric, run this as a Spark Job Definition with scheduling and retry policy for resilience.
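A minimal sketch of what that configuration and driver loop might look like, in plain Python so the shape is visible without a Spark runtime; the `PromotionRule` fields, table names, and rule strings are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass, field

# Illustrative shape of one row in a hypothetical metadata/control table.
# Field names are assumptions, not a Fabric or Spark convention.
@dataclass
class PromotionRule:
    source_table: str                       # Bronze table to read
    target_table: str                       # Silver/Gold table to write
    keys: list                              # business keys for merge/upsert
    write_mode: str = "merge"               # "merge" | "append" | "overwrite"
    watermark_column: str = "ingested_at"   # fallback when CDF is unavailable
    quality_rules: list = field(default_factory=list)  # e.g. ["id IS NOT NULL"]

def load_rules():
    """In the real engine this would be a Spark read of the metadata table;
    hard-coded here so the sketch is self-contained."""
    return [
        PromotionRule("bronze.orders", "silver.orders",
                      keys=["order_id"],
                      quality_rules=["order_id IS NOT NULL", "amount >= 0"]),
        PromotionRule("bronze.customers", "silver.customers",
                      keys=["customer_id"]),
    ]

def run_engine(process_one):
    """Single driver loop: one Spark job iterates the configuration and
    dispatches each promotion, instead of one hand-built pipeline per table."""
    for rule in load_rules():
        process_one(rule)
```

In the real engine, `process_one` would start the Structured Streaming query for each rule; scheduling and retries then live at the Spark Job Definition level, not in the code.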
Why Structured Streaming? It natively handles incremental processing and exactly‑once semantics with checkpoints. The foreachBatch pattern lets you apply batch‑style logic (e.g., merge/upsert, SCD handling) to each micro‑batch.
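A hedged sketch of that foreachBatch upsert, assuming PySpark with the delta-spark package available in the Fabric Spark runtime; the table names, key columns, and checkpoint path are placeholders pulled from metadata in the real engine:

```python
def upsert_batch(target_table, keys):
    """Returns a foreachBatch callback that merges each micro-batch
    into the target Delta table on the given business keys."""
    def _merge(batch_df, batch_id):
        from delta.tables import DeltaTable  # deferred: needs a Spark runtime
        target = DeltaTable.forName(batch_df.sparkSession, target_table)
        cond = " AND ".join(f"t.{k} = s.{k}" for k in keys)
        (target.alias("t")
               .merge(batch_df.alias("s"), cond)
               .whenMatchedUpdateAll()      # upsert: update existing keys,
               .whenNotMatchedInsertAll()   # insert new ones
               .execute())
    return _merge

def start_promotion(spark, source_table, target_table, keys, checkpoint):
    """Wires one Bronze-to-Silver stream; the checkpoint gives
    restartability and exactly-once writes into Delta."""
    return (spark.readStream.table(source_table)
                 .writeStream
                 .foreachBatch(upsert_batch(target_table, keys))
                 .option("checkpointLocation", checkpoint)
                 .trigger(availableNow=True)   # drain new data, then stop
                 .start())
```

The `availableNow` trigger lets the same streaming code run on a schedule like a batch job while keeping incremental, checkpointed semantics.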
Change capture options: When Delta Change Data Feed (CDF) is enabled on Bronze tables, stream row‑level inserts/updates/deletes rather than rescanning; otherwise, use a timestamp/sequence watermark.
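The two paths might be selected like this; whether CDF is enabled, plus the stored version and timestamp watermarks, would come from the metadata table, and `ingested_at` is an assumed column name:

```python
def read_changes(spark, bronze_table, cdf_enabled,
                 last_version=0, last_ts=None, ts_col="ingested_at"):
    """CDF path: stream row-level inserts/updates/deletes from the Delta
    change feed, no rescans. Fallback: incremental batch read of rows
    above a stored timestamp watermark."""
    if cdf_enabled:
        return (spark.readStream.format("delta")
                     .option("readChangeFeed", "true")
                     .option("startingVersion", last_version)
                     .table(bronze_table))
    # Watermark fallback for tables without CDF enabled
    return spark.read.table(bronze_table).where(f"{ts_col} > '{last_ts}'")
```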
Silver → Gold curation: Use the engine to conform datatypes and names, deduplicate, enforce basic DQ rules in Silver, then build Gold tables that reflect business-ready facts/dimensions or domain slices. Optimize Gold for query performance (e.g., V‑Order where appropriate) so Direct Lake models are fast and predictable.
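One way to sketch the Silver curation step, again assuming PySpark; the keys, ordering column, and rule strings would come from the metadata table rather than being hard-coded:

```python
def curate_to_silver(df, keys, order_col, rules):
    """Deduplicate on business keys (keep the latest row by order_col),
    then split off rows that fail basic quality rules into a quarantine
    frame for inspection instead of silently dropping them."""
    from pyspark.sql import functions as F       # deferred: needs Spark
    from pyspark.sql.window import Window
    # Keep only the most recent row per business key
    w = Window.partitionBy(*keys).orderBy(F.col(order_col).desc())
    deduped = (df.withColumn("_rn", F.row_number().over(w))
                 .where("_rn = 1").drop("_rn"))
    passed = deduped
    for rule in rules:                           # e.g. "order_id IS NOT NULL"
        passed = passed.where(rule)
    failed = deduped.subtract(passed)            # quarantine
    return passed, failed
```

For the Gold write, V‑Order can be enabled at the session level (at the time of writing, Fabric documents `spark.sql.parquet.vorder.enabled` for this) so Direct Lake reads stay fast.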
Operate & observe: Schedule and restart the streaming jobs with retry policies; wire Fabric Job events into Real‑Time Hub if you want alerting on failures/completions.
Gold is the foundation for your data products
When Gold is clean and curated, you can safely build:
- Semantic models (Power BI, Direct Lake) for high‑performance BI directly on Delta in OneLake—no import overhead, shared definitions.
- AI endpoints that serve real‑time predictions from managed models—fed by Gold feature tables for consistency. (Preview.)
- Other formats & interfaces (governed views, APIs, extracts) that all source from Gold to keep one version of the truth.
Publish what you expose as Data Products in Microsoft Purview Unified Catalog so ownership, access policies, SLOs, and health are visible to consumers. Use a common vocabulary:
- Foundational products are narrow, stable, domain‑anchored.
- Derived products compose, restrict, or enrich those foundations for specific outcomes.
Roles & responsibilities (keeping it lightweight)
- Platform team: Stand up workspaces; create Shortcuts/Mirroring/Dataflows to land Bronze; operate the metadata‑driven streaming engine; integrate with Purview.
- Domain teams: Define Foundational products (contracts, keys, freshness, access); create Derived products (composition, metrics, ML) on Gold; own product SLOs.
A Vision of the Future: API‑Driven Materialized Lake Views, Managed by Metadata
It’s easy to imagine the metadata‑driven approach managing Materialized Lake Views (MLVs) directly: your framework would generate MLV SQL from metadata, register dependencies, set refresh policies, run on‑demand refreshes, and reconcile drift—all as code. Today, MLVs are a preview feature with SQL‑based creation/management and Monitor‑hub visibility; there’s an on‑demand refresh API (preview) but not a fully documented, end‑to‑end REST surface for lifecycle management. In other words, MLVs can’t do this yet—but the building blocks are emerging.
What this future could look like:
- Declarative plans: Metadata defines medallion promotions; the framework compiles to CREATE MATERIALIZED LAKE VIEW … with constraints/partitions and auto‑registers dependencies.
- API‑first orchestration: The framework uses official endpoints to create/update/drop/refresh MLVs, attach schedules, and roll out changes safely across environments (dev→test→prod). (Of this surface, only the on‑demand refresh API exists today, in preview.)
- Unified operations: MLV runs emit standard job events to Real‑Time Hub; policies enforce SLOs and alert on late/missing refresh.
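As a thought experiment, the "declarative plans" step could be as small as a metadata‑to‑SQL compiler. Only the CREATE MATERIALIZED LAKE VIEW statement below mirrors Fabric's preview syntax; the function name and metadata shape are hypothetical:

```python
def compile_mlv(name, source, select_cols, predicate=None):
    """Hypothetical compiler from a metadata entry to an MLV definition.
    The framework would run the generated SQL, register the dependency,
    and attach a refresh policy."""
    cols = ", ".join(select_cols)
    sql = (f"CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS {name}\n"
           f"AS SELECT {cols} FROM {source}")
    if predicate:
        sql += f" WHERE {predicate}"
    return sql
```

For example, `compile_mlv("gold.daily_sales", "silver.orders", ["order_id", "order_date", "amount"], "amount >= 0")` would emit the CREATE statement for a Gold slice, keeping the view definition itself in metadata rather than in hand-maintained scripts.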
Until that arrives, the metadata‑driven Spark engine gives you consistent, code‑as‑contract promotion from Bronze→Silver→Gold—and positions you to adopt API‑driven MLVs when they mature.