When Facts Don’t Live in a Domain: Why Data Products Beat Pure Domain-Driven Data Engineering

There’s a pattern I see in mature analytics organizations: as soon as the data platform gets big enough to feel “enterprise,” someone reaches for domain-driven design (DDD) as the organizing principle for data engineering and governance.

It’s an understandable move. DDD gives us language for ownership, boundaries, and “this team is responsible for that thing.” And when you’re trying to untangle a spaghetti warehouse, that sounds like oxygen.

But here’s the catch: the most valuable data in a warehouse—the fact tables in traditional star schemas—often doesn’t live neatly inside a single domain. It lives at the intersections.

In this post, I’m going to make three points:

  • why domain boundaries tend to break down precisely where the business value shows up (facts)
  • how a data product model handles “intersection data” without inventing reference domains or forcing mega-joins
  • why Microsoft Fabric’s workspace + lakehouse + OneLake tooling makes the product approach easier to execute in practice (especially with shortcuts, security, and materialized lake views)

And yes—domains still matter. Just not as the unit of delivery.

What domain-driven data engineering gets right

DDD is compelling because it’s fundamentally about cohesion and ownership. In software, bounded contexts keep teams from sharing a single god-object model that nobody can safely change. In data engineering, we borrow that idea because we want the same outcome:

  • fewer cross-team dependencies
  • clearer accountability
  • models that reflect real business concepts, not just system extracts

This is where data engineering and data governance often converge: ownership isn’t just a technical decision. It’s how you keep meaning stable while everything else changes.

So far, so good.

The problem isn’t that domain thinking is wrong. The problem is that domains are often too broad, and the data that matters most isn’t domain-pure.

Why facts break the domain map

In a classic star schema, your facts are the measures of reality: transactions, events, interactions, observations.

And those are almost never “owned” by a single domain in the way we want them to be.

A few examples:

  • A purchase is “Sales”… but it’s also Product, Pricing, Customer, Fulfillment, and Finance.
  • A shipment event is “Supply Chain”… but it’s also Customer Experience and Revenue recognition timing.
  • In education analytics, an enrollment record is “Registrar”… but it intersects Student, Course, Term, Financial Aid, and Instructional delivery.

The grain of the fact table—what it means, what keys define it, what attributes are required for interpretation—is often an intersection of multiple bounded contexts.

When you force that reality into a strict domain structure, two things tend to happen:

You invent “reference domains.”
You create a shared “Customer domain” or “Product domain” that every other domain has to reference, because you don’t want duplication… but now you’ve created a central dependency that every team must negotiate.

Or you accept mega-joins.
You keep each domain “pure,” but every meaningful query requires crossing multiple domains with complex joins, brittle semantics, and subtle grain mismatches. The domain model stays clean on paper, while the consumption layer becomes a tangle of logic that nobody feels responsible for.

That’s the moment where DDD-as-the-primary-structure starts to feel like it’s working… right up until it doesn’t.
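To make the “subtle grain mismatch” risk concrete, here is a minimal Python sketch using made-up data. The table names, columns, and values are all hypothetical; the point is only that joining an order-grain fact to a package-grain fact silently changes the grain, and any sum over the order amount is inflated.

```python
# Hypothetical data: orders are one row per order,
# shipments are one row per package.
orders = [
    {"order_id": 1, "amount": 100.0},
    {"order_id": 2, "amount": 50.0},
]
shipments = [
    {"order_id": 1, "package": "A"},
    {"order_id": 1, "package": "B"},  # order 1 shipped in two packages
    {"order_id": 2, "package": "C"},
]


def naive_join(orders, shipments):
    """Inner join on order_id; the result silently takes on shipment grain."""
    return [
        {**o, **s}
        for o in orders
        for s in shipments
        if o["order_id"] == s["order_id"]
    ]


true_revenue = sum(o["amount"] for o in orders)
joined_revenue = sum(row["amount"] for row in naive_join(orders, shipments))

print(true_revenue)    # 150.0
print(joined_revenue)  # 250.0, because order 1 is counted once per package
```

Nothing in the join fails or warns. The consumer just gets a bigger number, which is why leaving this logic to every downstream query is risky.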

Data products handle intersection data without pretending it isn’t intersection data

A data product model makes a different trade:

Instead of trying to place every important dataset within a domain boundary, you define smaller, purpose-built products with explicit contracts, and you allow products to depend on other products.

Think of “foundational” products like:

  • a security master (in finance)
  • an HR staff roster
  • a student identity registry
  • a product catalog

Then you build “derived” products that consume those foundations, but remain responsible for their own completeness.

This is the key shift:

A product doesn’t outsource completeness to the consumer.
If your “Course Enrollment” data product depends on Student and Course products, your product is still the one that ensures an enrollment record is interpretable and usable. Consumers shouldn’t have to reconstruct meaning by stitching together five domains and hoping they got the grain right.

That’s what I mean when I say “inheritance” works better in products: not inheritance in the object-oriented sense, but inheritance as composition and dependency. A derived product can reference upstream products without turning the downstream consumer into an integration engineer.
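As a rough illustration of “the product owns completeness,” here is a small Python sketch. Everything in it is hypothetical (the upstream dictionaries, the field names, the `build_course_enrollment` helper); it is not a Fabric API, just one way to picture a derived product resolving its upstream references before publishing.

```python
from dataclasses import dataclass

# Hypothetical upstream (foundational) products, keyed by stable IDs.
students = {"S1": {"name": "Ada"}, "S2": {"name": "Grace"}}
courses = {"C1": {"title": "Statistics"}}

# Raw enrollment events as they might arrive from a source system.
raw_enrollments = [
    {"student_id": "S1", "course_id": "C1", "term": "2025FA"},
    {"student_id": "S2", "course_id": "C9", "term": "2025FA"},  # unknown course
]


@dataclass(frozen=True)
class Enrollment:
    """One row per student, course, and term; interpretable on its own."""
    student_id: str
    student_name: str
    course_id: str
    course_title: str
    term: str


def build_course_enrollment(raw, students, courses):
    """Resolve upstream references inside the product.

    Rows that cannot be resolved are set aside instead of being
    published for consumers to puzzle over.
    """
    published, rejected = [], []
    for row in raw:
        student = students.get(row["student_id"])
        course = courses.get(row["course_id"])
        if student is None or course is None:
            rejected.append(row)
            continue
        published.append(
            Enrollment(
                student_id=row["student_id"],
                student_name=student["name"],
                course_id=row["course_id"],
                course_title=course["title"],
                term=row["term"],
            )
        )
    return published, rejected


published, rejected = build_course_enrollment(raw_enrollments, students, courses)
print(len(published), len(rejected))  # 1 1
```

The design choice worth noticing is that the unresolved row becomes the product team’s problem (the `rejected` list), not the consumer’s.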

This is also where “data product” becomes more than a buzzword. It’s a governance mechanism disguised as an engineering pattern.

Contracts are what keep products from becoming mini-warehouses

Without contracts, “data product” becomes a rebranding of a mart. With contracts, it becomes a disciplined interface.

A useful product contract doesn’t need to be pages long. It needs to answer a few non-negotiables:

  • What is the grain?
  • What keys are stable, and what keys are derived?
  • What does “complete” mean for this product?
  • What is the update and availability expectation?
  • What are the semantic guarantees (and what is explicitly out of scope)?

Notice what isn’t in that list: how the consumer should join it to ten other things.

When contracts are clear, the product is allowed to be smaller and sharper. That’s exactly what domains struggle to enforce, because domains tend to expand to include “everything related to X.”
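One way to keep a contract short and still enforceable is to write it down as data. The sketch below is a hypothetical shape, not a standard: the `ProductContract` fields simply mirror the questions above, and `grain_violations` is an illustrative check that the declared grain actually holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductContract:
    """Hypothetical contract shape; fields mirror the questions above."""
    name: str
    grain: tuple                # columns that identify exactly one row
    stable_keys: tuple          # keys consumers may rely on long term
    derived_keys: tuple         # keys computed by the product
    completeness_rule: str      # what "complete" means, in plain words
    refresh_expectation: str    # update and availability expectation
    out_of_scope: tuple = ()    # what the product explicitly does not promise


def grain_violations(rows, contract):
    """Return the grain keys that appear more than once in rows."""
    seen, duplicates = set(), set()
    for row in rows:
        key = tuple(row[column] for column in contract.grain)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return sorted(duplicates)


contract = ProductContract(
    name="course_enrollment",
    grain=("student_id", "course_id", "term"),
    stable_keys=("student_id", "course_id"),
    derived_keys=("enrollment_key",),
    completeness_rule="every row resolves to a known student and course",
    refresh_expectation="daily, available by 06:00",
    out_of_scope=("grades", "financial aid status"),
)

rows = [
    {"student_id": "S1", "course_id": "C1", "term": "2025FA"},
    {"student_id": "S1", "course_id": "C1", "term": "2025FA"},  # duplicate
    {"student_id": "S2", "course_id": "C1", "term": "2025FA"},
]

print(grain_violations(rows, contract))  # [('S1', 'C1', '2025FA')]
```

A check like this can run before every publish, so a grain break is caught by the owning team instead of being discovered downstream.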

Why Microsoft Fabric makes the product approach easier to operationalize

That leaves a practical question: how do you implement the product model without copying data everywhere, breaking security, or creating a maze of cross-workspace wiring?

This is where Microsoft Fabric (and specifically OneLake-centric patterns) gives you leverage.

Workspaces and lakehouses map well to product boundaries

Fabric workspaces naturally support separation of responsibilities. Teams can own and operate the assets inside their workspace with the appropriate role assignments (Admin/Member/Contributor/Viewer), which gives you a straightforward place to draw operational boundaries.

A product can have:

  • its own lakehouse (storage + SQL analytics endpoint)
  • its own pipelines/notebooks
  • its own release rhythm
  • its own access model

That’s a much cleaner story than “this domain owns a universe.”

OneLake shortcuts reduce the pressure to copy (and the pressure to “merge everything”)

OneLake shortcuts are designed to connect to existing data without copying it into a new physical location. In other words: you can reference upstream product data where it already lives, and still treat OneLake as the unified namespace.

That matters because the “domain-driven warehouse” often fails operationally when teams start duplicating shared dimensions, then arguing about which one is authoritative. Shortcuts give you a path to reuse without duplication-by-default.
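For a sense of what “reference, don’t copy” could look like programmatically, here is a hedged Python sketch that only builds the request for creating a OneLake shortcut. The endpoint path and body field names reflect my reading of the Fabric REST API’s create-shortcut operation and should be verified against the current documentation; the helper name and every ID below are placeholders. The function sends nothing, so authentication and error handling are deliberately left out.

```python
import json

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_shortcut_request(
    consumer_workspace_id,
    consumer_lakehouse_id,
    shortcut_name,
    source_workspace_id,
    source_item_id,
    source_path,
    shortcut_path="Tables",
):
    """Build (url, body) for a shortcut that points at an upstream product.

    Assumption: the create-shortcut call is a POST to
    /workspaces/{workspaceId}/items/{itemId}/shortcuts with a
    'oneLake' target. Check the field names before relying on this.
    """
    url = (
        f"{FABRIC_API}/workspaces/{consumer_workspace_id}"
        f"/items/{consumer_lakehouse_id}/shortcuts"
    )
    body = {
        "path": shortcut_path,   # where the shortcut appears in the consumer
        "name": shortcut_name,
        "target": {
            "oneLake": {
                "workspaceId": source_workspace_id,
                "itemId": source_item_id,
                "path": source_path,  # where the data already lives
            }
        },
    }
    return url, body


url, body = build_shortcut_request(
    consumer_workspace_id="<enrollment-workspace-id>",
    consumer_lakehouse_id="<enrollment-lakehouse-id>",
    shortcut_name="student",
    source_workspace_id="<student-registry-workspace-id>",
    source_item_id="<student-registry-lakehouse-id>",
    source_path="Tables/student",
)
print(url)
print(json.dumps(body, indent=2))
```

The derived product sees a `student` table in its own lakehouse, while the student registry team remains the only owner of the physical data.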

OneLake security and lakehouse sharing make product-level access more realistic

Fabric’s permission model includes workspace roles and OneLake security options for data access inside OneLake. OneLake security (currently described as preview in the docs) supports creating roles within a lakehouse and granting granular read access to specific tables or folders; it’s explicitly positioned as a way to control read access, while write permissions still flow through workspace roles.

That’s important for data products, because “ownership” without a practical way to enforce access boundaries is just org-chart theater.

Materialized lake views reduce the need for consumer-side mega-joins

Materialized lake views in Fabric are presented as precomputed, stored results of SQL queries that can be refreshed on demand or on a schedule (also documented as preview). They’re essentially a mechanism to publish “smart tables” that encapsulate complex transformations and joins behind a managed refresh.

From a product perspective, this is powerful: the derived product can do the cross-product join once, publish the result as its interface, and stop. Consumers don’t need to rebuild the join logic (or rediscover it incorrectly).

This is exactly how you avoid the “domain purity” trap that leads to massive downstream joins.

Domains still matter—just not as the unit of delivery

Here’s the nuance I don’t want to lose:

Domains are still useful as a governance and organizational lens. They are where stewardship lives. They’re where shared language is curated. They’re where you anchor OKRs, health controls, and policy decisions.

And this is where Microsoft Purview’s direction is telling.

In Microsoft Purview Unified Catalog, governance domains provide context for your data assets, and data products are organized so they can be browsed and discovered by governance domain. A governance domain can contain many data products, while each data product is managed by a single governance domain (even if it can be discovered across domains).

That’s the model I’ve found to be most workable in practice:

  • Domains are the organizing and stewardship layer.
  • Data products are the delivery and accountability layer.

When you try to use domains as both, you almost inevitably create products that are too large, too coupled, and too tempting to “just join everything.”

Closing: shrink the unit of ownership, not the ambition of governance

If you’re trying to apply DDD principles to data engineering and you keep running into complexity, it’s probably not because your team “did DDD wrong.”

It’s because facts don’t respect your org chart.

The path forward is to keep domains where they help—language, stewardship, governance context—and shift delivery to data products with clear contracts and explicit dependencies. That’s how you avoid reference domains, reduce mega-joins, and keep accountability crisp.

If you’re building on Fabric, the combination of workspaces, lakehouses, OneLake shortcuts, OneLake security, and materialized lake views gives you a practical toolkit to make this real—not just aspirational.

The call to action is simple: pick one cross-domain fact that everyone argues about, and redesign it as a product with a contract. Let the product own completeness. Let consumers stop doing archaeology.

Author: Jason Miles

A solution-focused developer, engineer, and data specialist focusing on diverse industries. He has led data products and citizen data initiatives for almost twenty years and is an expert in enabling organizations to turn data into insight, and then into action. He holds an MS in Analytics from Texas A&M, along with DAMA CDMP Master and INFORMS CAP-Expert credentials.