Foundational and Derived Data Products: Practical Guidance for Architects and Data Leaders

As we discussed previously, a data product is a reusable, self‑contained package that bundles data, metadata, access methods, and governance to deliver a clear outcome to users or other systems. Treating data as a product implies product management disciplines (contracts, SLOs, versioning, observability) and an emphasis on discoverability, interoperability, and security. 

Within modern mesh-aligned architectures, data products must be interoperable and composable so they join predictably and can be assembled into higher‑order solutions. This is a first‑principles characteristic, not a nice‑to‑have. 

Foundational vs. Derived Data Products

  • Foundational (source/master‑based) products: Narrow, stable, domain‑anchored surfaces (e.g., Student, Well, Portfolio Holdings). They serve as authoritative building blocks and change slowly.  
  • Derived products: Built from other products/sources via transformation, aggregation, inference, or enrichment to address specific use cases (e.g., Attendance Insights, Well Downtime Risk, Exposure Attribution). They evolve faster and live closer to decision workflows.

Think of foundations as contracts for what the thing is, and derivatives as contracts for what the business or organization needs.

Four common patterns of derived data products

  1. Restricted (subsetted) products
    Curate safe, purpose‑specific slices of a foundation using row/column policies and authorized interfaces—not copies. In cloud platforms this often means governed views/APIs with row‑level and column‑level controls.  
  2. Composite products
    Join two or more foundational products on conformed keys to create a denormalized, consumption‑ready surface (e.g., Customer × Orders × Entitlements). Composability and shared semantics are essential to keep joins predictable.  
  3. Semantic/metric products
    Encapsulate business definitions and metrics in a semantic layer so downstream tools and services consume the same logic everywhere (for example, “Active Student,” “Gross Revenue,” “Reportable Production”).  
  4. Inferred/feature products
    Package model outputs or reusable ML features for training and serving (scores, embeddings, rolling features) via a feature store with lineage and freshness guarantees.  
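
Taking the first pattern as an example, the sketch below shows roughly how a restricted product can apply row and column rules at read time instead of persisting a copy per audience. It is a minimal illustration in plain Python, not a platform feature: the `Policy` class, the `restricted_view` function, the sample rows, and the `"***"` mask are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical foundational rows; in practice these sit behind a governed view or API.
STUDENTS = [
    {"student_id": "S1", "school": "North", "grade": 9, "ssn": "111-22-3333"},
    {"student_id": "S2", "school": "South", "grade": 10, "ssn": "444-55-6666"},
    {"student_id": "S3", "school": "North", "grade": 11, "ssn": "777-88-9999"},
]


@dataclass(frozen=True)
class Policy:
    """Row and column rules for one audience (illustrative only)."""
    allowed_schools: frozenset
    masked_columns: frozenset


def restricted_view(rows, policy):
    """Yield only permitted rows, with sensitive columns masked, at read time."""
    for row in rows:
        if row["school"] not in policy.allowed_schools:
            continue  # row-level filter
        # column-level mask: replace the value, keep the column
        yield {
            col: ("***" if col in policy.masked_columns else value)
            for col, value in row.items()
        }


north_principal = Policy(frozenset({"North"}), frozenset({"ssn"}))
visible = list(restricted_view(STUDENTS, north_principal))
```

Here `visible` holds the two North rows with `ssn` masked, while `STUDENTS` itself is unchanged. On a real platform the same idea would typically be expressed as a governed view with row-level and column-level policies rather than application code.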

Keep the core small: restricting foundational products

Foundational products should be minimal, stable, and safe. Concretely:

  • Publish interfaces, not raw tables: expose versioned views/APIs and enforce least‑privilege access at the interface (row filters, column masking). Avoid copy‑by‑policy, that is, duplicating the dataset for each audience.
  • Use data contracts for schema, semantics, quality and availability. Treat contract changes as product changes with release notes and deprecation windows.  
  • Version deliberately: evolve interfaces (v1 → v2) without breaking consumers; keep canonical IDs and time bases stable to preserve joinability across products. (Mesh literature frames this as balancing autonomous product evolution with organization‑wide interoperability.)  
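
One way to make "treat contract changes as product changes" concrete is to compare two contract versions before release. The following is a rough sketch under simplifying assumptions: a contract is reduced to a column-to-type mapping plus a canonical key, and the `Contract` class and `breaking_changes` function are hypothetical, not part of any contract tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """A deliberately tiny stand-in for a data contract."""
    name: str
    version: int
    columns: dict  # column name -> type name
    key: str       # canonical identifier used for joins


def breaking_changes(old, new):
    """List changes in `new` that would break consumers of `old`."""
    problems = []
    if old.key != new.key:
        problems.append(f"canonical key changed: {old.key} -> {new.key}")
    for col, typ in old.columns.items():
        if col not in new.columns:
            problems.append(f"column removed: {col}")
        elif new.columns[col] != typ:
            problems.append(f"type changed: {col} {typ} -> {new.columns[col]}")
    return problems  # added columns are treated as non-breaking


v1 = Contract("student_master", 1,
              {"student_id": "string", "enrolled_on": "date"}, "student_id")
v2_additive = Contract("student_master", 2,
                       {"student_id": "string", "enrolled_on": "date",
                        "cohort": "string"}, "student_id")
v2_breaking = Contract("student_master", 2,
                       {"student_id": "int", "cohort": "string"}, "student_id")

additive_report = breaking_changes(v1, v2_additive)
breaking_report = breaking_changes(v1, v2_breaking)
```

The additive version yields an empty report, so it could ship as a minor change. The second version changes the type of the canonical ID and drops a column, which is the kind of change that would call for a new interface version and a deprecation window.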

Combining products without creating chaos

Composition fails without standard join keys and shared semantics. Practical guardrails:

  • Conformed identifiers & dimensions across domains (e.g., canonical student_id, asset_id, security_id).
  • A semantic layer as the single locus for metric definitions and filters; push reporting logic there, not into every derivative.  
  • Federated governance: domain teams own their products, but platform guardrails (catalog, policy as code, lineage) enforce cross‑team interoperability.
  • Remember that derived data products need to be composable, too. They should have the same guardrails as any other data product.
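
The role of conformed identifiers can be illustrated with a small join helper that refuses ambiguous keys and reports rows that fail to match. This is a sketch in plain Python over lists of dictionaries; `join_on_key` and the sample data are hypothetical, and a real composite product would normally do this in the warehouse or query engine.

```python
def join_on_key(left, right, key):
    """Inner-join `left` rows to `right` rows on a conformed key.

    `right` must be unique on the key (for example, a master product).
    Returns the joined rows and the left-side key values with no match.
    """
    index = {}
    for row in right:
        if row[key] in index:
            raise ValueError(f"duplicate {key}={row[key]!r} on the right side")
        index[row[key]] = row

    joined, unmatched = [], []
    for row in left:
        match = index.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            joined.append({**match, **row})
    return joined, unmatched


# Hypothetical products sharing the canonical student_id.
student_master = [
    {"student_id": "S1", "school": "North"},
    {"student_id": "S2", "school": "South"},
]
attendance_events = [
    {"student_id": "S1", "date": "2024-09-03", "present": False},
    {"student_id": "S9", "date": "2024-09-03", "present": True},
]

joined, unmatched = join_on_key(attendance_events, student_master, "student_id")
```

Surfacing `unmatched` (here, the event for `S9`) rather than silently dropping it is a design choice: key mismatches between domains are usually the first sign that an identifier is not actually conformed.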

Extending products with business context

Derived products are where context lives:

  • Business semantics: encode definitions, time grains, and eligibility rules in the semantic layer for consistent BI, apps, and AI assistants.  
  • External & reference data: enrich with benchmarks, regulatory calendars, or market classifications—joined via governed keys and documented lineage.
  • ML features & predictions: register reusable features (e.g., 30‑day utilization, absenteeism velocity) and publish in online/offline stores to ensure training/serving parity and reuse.  
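
As a small illustration of a reusable rolling feature, the sketch below computes a trailing absence rate. The function name `rolling_absence_rate` and the three-day window are assumptions made for the example; a feature store would add registration, lineage, and online/offline serving on top of logic like this.

```python
from collections import deque


def rolling_absence_rate(present_flags, window):
    """Trailing absence rate over the last `window` school days.

    Returns one value per day; early days use however many days exist so far.
    """
    recent = deque(maxlen=window)  # oldest day drops off automatically
    rates = []
    for present in present_flags:
        recent.append(0 if present else 1)
        rates.append(sum(recent) / len(recent))
    return rates


# Hypothetical attendance for one student over five school days.
flags = [True, False, False, True, True]
rates = rolling_absence_rate(flags, window=3)
```

Defining the feature once and calling the same function from both the training pipeline and the serving path is one simple way to keep the two in agreement.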

Sector examples

K‑12 / SLED

  • Foundational: Student Master (enrollment, demographics), Attendance Events, Assessment Results. (Narrow surfaces with clear ownership and access controls.)
  • Derived – Restricted: Student Directory (Masked) for principals, counselors, and district office personnel; authorized views mask PII while enabling cohort‑level analysis.
  • Derived – Composite: Attendance Insights joining Student Master × Attendance to surface chronic absenteeism metrics with district/state calendar conformance in the semantic layer.  
  • Derived – Inferred: Early‑Support Signals scoring risk of disengagement using rolling features (e.g., 10‑day absence rate, late submissions) managed through a feature store for consistent training/inference.  

Oil & Gas

  • Foundational: Well/Asset Master, Production Telemetry, Work Orders.  
  • Derived – Restricted: Regulatory Share Set—subset of well attributes with RLS by basin/operator for partner access.  
  • Derived – Composite: Operations 360 combining production, downtime, and maintenance backlogs using conformed asset keys; metrics like MTBF/MTTR formalized in the semantic layer.  
  • Derived – Inferred: Downtime Risk predicting failure windows from vibration and pressure signals; features (variance, kurtosis, lag deltas) cataloged and served via the feature store.  

Investment Management

  • Foundational: Security Master, Positions & Trades, Reference Calendars/FX.
  • Derived – Restricted: Client Reporting Entitlements—authorized views expose only the accounts, attributes, and look‑through levels a team is permitted to see.  
  • Derived – Composite: Exposure & Performance Attribution joining positions to benchmark classifications and corporate actions; calculations centralized in the semantic layer for consistent attribution across desks and tools.  
  • Derived – Inferred: Liquidity & Slippage Risk with model outputs delivered as a product (scores + explanations) and governed SLOs for intraday freshness. Feature reuse accelerates model iteration.  

Governance and operating model

  • Contracts & SLOs: Every product—foundational or derived—publishes schema, meaning, DQ rules, and availability targets; consumers onboard via the contract.  
  • Access by interface: Prefer views/APIs with RLS/CLS over proliferating physical copies. This reduces drift and audit scope while enabling targeted entitlements.  
  • Lifecycle management: Version interfaces, deprecate safely, and monitor usage/lineage to know what can retire. Mesh guidance emphasizes balancing autonomy with platform guardrails.  
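
An availability or freshness target only helps if it is checked. The sketch below shows one possible shape for such a check; `check_freshness`, the 15-minute target, and the timestamps are hypothetical values for illustration, not a recommendation.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(last_loaded_at, max_staleness, now=None):
    """Return (is_fresh, age) for a product whose contract caps staleness."""
    now = now or datetime.now(timezone.utc)
    age = now - last_loaded_at
    return age <= max_staleness, age


# Hypothetical contract term: data no older than 15 minutes.
slo = timedelta(minutes=15)
now = datetime(2024, 9, 3, 14, 30, tzinfo=timezone.utc)

recent_ok, recent_age = check_freshness(
    datetime(2024, 9, 3, 14, 20, tzinfo=timezone.utc), slo, now)
stale_ok, stale_age = check_freshness(
    datetime(2024, 9, 3, 13, 0, tzinfo=timezone.utc), slo, now)
```

The first load is 10 minutes old and passes; the second is 90 minutes old and fails. Passing `now` explicitly keeps the check deterministic in tests, while production monitoring would let it default to the current time.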

Anti‑patterns to avoid

  • Foundations that do “too much”: when a foundational product looks like a report or contains volatile logic, promote that logic into a derived layer instead.
  • Copy‑by‑policy: duplicating datasets for each audience. Use authorized interfaces and policies at query time.  
  • Metric logic in every tool: dispersed definitions create “multiple truths.” Centralize in a semantic layer.
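
To make the last point concrete, here is a toy registry that holds each metric definition exactly once and rejects a second definition under the same name. It is only a sketch of the idea behind a semantic layer; `METRICS`, `metric`, `evaluate`, and the "enrolled" rule for an active student are all assumptions made for this example.

```python
METRICS = {}


def metric(name):
    """Decorator that registers a metric definition under a unique name."""
    def register(fn):
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already defined")
        METRICS[name] = fn
        return fn
    return register


@metric("active_student_count")
def active_student_count(rows):
    # Hypothetical business rule: "active" means currently enrolled.
    return sum(1 for r in rows if r["status"] == "enrolled")


def evaluate(name, rows):
    """Every consumer calls this instead of re-implementing the rule."""
    return METRICS[name](rows)


rows = [{"status": "enrolled"}, {"status": "withdrawn"}, {"status": "enrolled"}]
count = evaluate("active_student_count", rows)
```

Dashboards, applications, and services that all go through `evaluate` get the same answer for the same rows, and an attempt to redefine the metric elsewhere fails loudly instead of creating a second truth.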

Closing

Foundational products are your stable, narrow cores. Derived products are where you compose, restrict, and enrich for impact. Keep foundations clean and contracts explicit, then compose through shared keys, a semantic layer, and (when needed) a feature store, and you get speed and trust without entangling your platform.

Author: Jason Miles

A solution-focused developer, engineer, and data specialist working across diverse industries. He has led data products and citizen data initiatives for almost twenty years and is an expert in enabling organizations to turn data into insight, and then into action. He holds an MS in Analytics from Texas A&M, along with DAMA CDMP Master and INFORMS CAP-Expert credentials.