Something I’ve been thinking about a lot lately is how to formalize the kind of project management that data platforms need, and what you have to do differently compared to software development projects. I brought in a collaborator, one of the best customer success managers I know, to talk about how to do this correctly.
Agile absolutely works for data platform projects, but you need a lightweight way to lock in critical choices without slowing teams down. Architectural Decision Records (ADRs) provide that spine: they capture why you chose a direction, what you rejected, and the consequences—so you can move fast and keep delivery predictable. Combine ADRs with vertical slices, data contracts, quality gates, and observable pipelines, and you can ship in short cycles while meeting real dates.
1) Agile in one page (the theory you’ll actually use)
Agile’s core is empiricism: deliver a small increment, inspect outcomes, adapt. Whether you use Scrum, Kanban, or Scrumban, you’re optimizing for:
- Small, valuable increments (working data in production each iteration)
- Tight feedback loops (reviews, retros, and automated checks)
- Transparency (boards, metrics, ADRs)
- Adaptation (re‑prioritize as reality teaches you)
Scrum gives you timeboxes and roles; Kanban gives you flow and WIP limits. Both work—if you encode data‑specific realities and decisions into your process.
2) Why data platform work is not the same as application software
Data engineering lives with constraints that “pure” software teams feel less acutely:
- External dependencies: source access, security approvals, legal/governance sign‑off, and upstream release schedules.
- Non‑deterministic inputs: late arrivals, schema drift, and quality issues even when your code hasn’t changed.
- Test realism is hard: correctness depends on distributions and volumes, often entangled with PII.
- Infra is product: storage formats, catalogs, lineage, orchestration, and cost controls are first‑class work.
- Many consumers, conflicting SLAs: BI, analytics, ML, ops APIs—each with different freshness, latency, and cost expectations.
Implication: You need a way to make and communicate architectural choices quickly, visibly, and reversibly. That’s exactly what ADRs do.
3) ADRs: the lightweight governance that makes agility stick
What is an ADR?
A short, versioned note in your repo that records a significant technical choice: the context, options, decision, and consequences. It’s not a 20‑page design doc; it’s a shared memory for high‑impact decisions.
Why ADRs matter for data platforms
- Speed with accountability: Teams don’t stall waiting for tribal knowledge; they implement the current decision and move.
- Auditability & compliance: You can show why PII masking, retention, or lineage choices satisfy policy.
- Risk management: By listing alternatives and consequences, you surface trade‑offs early—before they explode near a deadline.
- Change friendliness: ADRs can be superseded by new ADRs—so today’s reversible bet doesn’t block tomorrow’s learning.
- Cross‑team coherence: Platform, governance, and domain teams share a single source of truth for choices that affect everyone.
Rule of thumb: If a choice affects multiple teams, costs real money to change, or touches data governance/security, write an ADR.
4) Ten agile data practices—each anchored by the right ADR
1. Slice vertically, not by component
Ship value source → ingest → transform → serve → observe for one domain use case.
ADR: Domain boundary & slice definition (what’s in scope, consumers, SLAs, data products).
2. Treat data as code (every artifact in Git + CI/CD)
Pipelines, SQL models, IaC, policies, and tests live together.
ADR: CI/CD strategy (environments, promotion rules, rollback, artifact registry).
3. Adopt data contracts and schema versioning
Agree on fields, semantics, timeliness, and change rules; automate contract tests.
ADR: Contract format & evolution policy (compatibility rules, deprecation windows, validation).
4. Build quality gates into the Definition of Done
Completeness, uniqueness, referential integrity, freshness, and drift checks must pass on realistic data.
ADR: Quality SLOs & gating thresholds (what blocks deploys, who can waive and when).
5. Engineer for observability
Freshness, row counts, null rates, late data, cost per run—all with alerts and runbooks.
ADR: Observability & SLOs (golden signals, dashboards, paging policy).
6. Provision realistic, safe test data
Mask/tokenize; maintain golden datasets that match production distributions.
ADR: Test data strategy (masking method, synthesis, re‑ID risk mitigation).
7. Platform as a product
A paved road (templates, SDKs, guardrails) that makes domain teams fast and safe.
ADR: Platform standards (table format, orchestration, catalog, lineage, access patterns).
8. Governance by default, not by exception
Automate cataloging, lineage, tag propagation, and policy enforcement in CI/CD.
ADR: Governance automation (required metadata, tagging, policy checks).
9. Dual‑run and canary for data changes
Run new vs old, compare outputs, flip via feature flags.
ADR: Migration & cutover strategy (diff thresholds, rollback conditions).
10. Team topology that fits flow
Stream‑aligned domain teams supported by platform and enabling teams.
ADR: Team responsibilities & RACI (who owns contracts, SLOs, backfills, cost).
Data Vault note: Use ADRs for business key strategy, hash vs natural keys, link key composition, satellite split policy, and effectivity modeling. You’ll avoid “rewire the world” rework later.
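Practice 3 above (data contracts with automated contract tests) can be sketched in a few lines. This is a minimal illustration, not any particular contract framework; the field names and the `CONTRACT` dict shape are assumptions for the example:

```python
# Illustrative contract for a sales_orders feed: required fields and types.
# (Field names are hypothetical; a real contract would also cover semantics,
# timeliness, and deprecation windows.)
CONTRACT = {
    "order_id": str,
    "customer_id": str,
    "amount": float,
    "updated_at": str,  # ISO-8601 timestamp
}

def validate_record(record: dict) -> list[str]:
    """Return a list of contract violations for one record (empty list = pass)."""
    errors = []
    for field, expected_type in CONTRACT.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return errors

def is_breaking_change(old_fields: set[str], new_fields: set[str]) -> bool:
    """Removing a field breaks consumers; adding one does not."""
    return bool(old_fields - new_fields)

good = {"order_id": "o-1", "customer_id": "c-9", "amount": 12.5,
        "updated_at": "2025-08-21T10:00:00+00:00"}
bad = {"order_id": "o-2", "amount": "12.5"}

assert validate_record(good) == []
assert "missing field: customer_id" in validate_record(bad)
assert is_breaking_change({"a", "b"}, {"a"})      # field removed -> breaking
assert not is_breaking_change({"a"}, {"a", "b"})  # field added -> compatible
```

Run in CI, checks like these turn the contract ADR's compatibility rules into an automated gate rather than a document nobody re-reads.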
5) Hit delivery dates without going waterfall—use decision flow as a planning primitive
Dates matter. Use ADRs to make dates credible:
A. Milestones are capabilities with decision checkpoints
Define milestones customers feel (e.g., “Sales Orders available with <2‑hour freshness”). Add ADRs required for the milestone (contract policy, SLOs, table format, cutover plan). You burn up both capability and decision completion.
B. Forecast with empirical flow and ADR status
Track throughput (slices/week), cycle time, and open ADRs per milestone. Forecast with P50/P90 ranges and explicitly show which decisions are still open.
C. Pull risky decisions forward
Create timeboxed spikes to explore options; they must end in an ADR, even if the decision is “Delay—insufficient data, revisit by a named date.”
D. Make trade‑offs explicit
Use a milestone trade‑off slider and review weekly. ADRs document the chosen position (e.g., “accept eventual consistency to hit date; reevaluate in M+1”).
| Slider | Default | Can flex? | ADR hook |
|---|---|---|---|
| Schedule | Fixed date | Scope flex | ADR notes deferred scope |
| Scope | Prioritized | Yes | ADRs for MVP vs later |
| Quality | Non‑negotiable | No | ADR sets SLOs/gates |
| Cost | Guardrailed | Some | ADR for compute policy |
| Risk | Reduced early | No | ADR for dual‑run/cutover |
6) A pragmatic 12‑week plan (greenfield + first domain) with ADR gates
Milestone A (Weeks 1–4): Platform “walking speed”
- IaC, repo scaffolds, CI checks (lint, unit, dq runner)
- Catalog + lineage auto‑registration
- Ingestion pattern #1 (batch CDC) templatized
- Observability MVP (freshness, row counts, null rates)
ADRs closed: platform standards; table/format; CI/CD strategy; observability SLOs.
Milestone B (Weeks 3–8): First domain live (Sales Orders)
- Contract agreed; PII classification decided
- Ingest → transform (e.g., DV hubs/links/sats) → serve consumer table
- DQ thresholds enforced; runbooks defined
- Dual‑run vs legacy export; automated diffs
ADRs closed: contract policy; SCD/effectivity approach; backfill strategy; cutover plan.
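The “dual‑run vs legacy export; automated diffs” step above can be sketched as a keyed comparison with a mismatch threshold. The threshold values are assumptions an actual cutover ADR would pin down:

```python
def dual_run_diff(old_rows: dict[str, float], new_rows: dict[str, float],
                  tolerance: float = 0.01, max_mismatch_rate: float = 0.001) -> bool:
    """Compare keyed outputs of the legacy and candidate pipelines.
    True -> candidate is within the ADR's diff threshold, safe to flip the flag;
    False -> keep serving from legacy and investigate."""
    all_keys = old_rows.keys() | new_rows.keys()
    mismatches = 0
    for key in all_keys:
        a, b = old_rows.get(key), new_rows.get(key)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches += 1  # missing on either side counts as a mismatch
    return mismatches / len(all_keys) <= max_mismatch_rate

legacy = {"o-1": 10.0, "o-2": 20.0, "o-3": 30.0}
candidate = {"o-1": 10.0, "o-2": 20.005, "o-3": 30.0}
assert dual_run_diff(legacy, candidate)          # within tolerance: flip the flag
assert not dual_run_diff(legacy, {"o-1": 99.0})  # missing keys + drift: roll back
```

In practice the comparison runs as a scheduled job against aggregates or row hashes, but the decision logic is the same: the ADR defines the threshold, the diff job enforces it.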
Milestone C (Weeks 7–12): Scale & self‑service
- Ingestion pattern #2 (stream/micro‑batch)
- Masking & access patterns; audit integration
- Enablement: templates/docs/internal training
ADRs closed: masking strategy; access & role model; paved‑road template guarantees.
7) Definitions that save projects (ADR‑aware)
Definition of Ready (for a data story)
- ☑️ Draft ADR identifies the decision(s) this story depends on or will produce
- ☑️ Data contract (or delta) captured
- ☑️ Access path/approvals initiated
- ☑️ PII classification known; policy selected
- ☑️ Golden dataset available
- ☑️ Quality metrics defined (freshness, completeness, drift)
- ☑️ Named consumer & acceptance criteria
Definition of Done
- ☑️ Code + IaC merged; CI green
- ☑️ DQ & contract tests pass in CI and environment
- ☑️ Lineage captured; catalog entry complete (owner, SLA, tags)
- ☑️ Observability dashboards live; runbook linked
- ☑️ Access controls verified
- ☑️ ADR merged (or superseded status updated) and linked to the change
- ☑️ Demoed with realistic data to the consumer
8) Metrics that matter (include decision flow)
Delivery
- Throughput (# vertical slices/week)
- Cycle time (start → production)
- Milestone burn‑up (% acceptance met)
- ADR lead time (first draft → merged), open ADRs blocking milestones
Reliability
- Freshness SLO attainment
- Change failure rate & MTTR
- DQ breach rate (by type: completeness, drift, referential)
Cost
- Cost per successful run / per 1k rows
- Idle resource cost trend
Adoption & Governance
- Active users/queries per served dataset
- Catalog completeness (owner, SLA, tags)
- PII policy adherence (automated checks)
9) ADR gotchas (and how to avoid them)
- Writing ADRs too late: If the code is merged before the decision is documented, you’ve lost the why. Tie ADR merge to the PR.
- Mega‑ADR syndrome: Don’t bundle unrelated choices. Keep ADRs atomic; supersede when learning changes one part.
- No consequences listed: Call out risks and costs. Future you will thank present you.
- No revisit date: Add a sunset/review date for reversible decisions.
- Not linking to artifacts: Every ADR should link to PRs, pipelines, dashboards, and tickets.
- Treating ADRs as approvals: They’re records, not gates. Use them to clarify, not to create bureaucracy.
10) A minimal, field‑tested ADR template (copy/paste)
# ADR-014: Late-Arriving Facts Strategy for Sales Orders
Status: Accepted
Date: 2025-08-21
Decision Type: Reversible (review by 2025-12-01)
Owners: Data Platform TL (@name), Sales Domain TL (@name)
Reviewers: Governance, Security, Finance Analytics
## Context
Sales Orders arrive from ERP batch CDC hourly, but downstream analytics require <2h freshness.
Late records (up to T+48h) and corrections are common; PII fields (email, phone) present.
## Options
1) Reject late data after T+2h; backfill via manual job on request.
2) Accept late data; maintain effectivity tables; downstream views are time-as-of consistent.
3) Hybrid: accept late data but surface "data recency" flags; nightly reconciliation job.
## Decision
Choose **Option 2**. Model with Data Vault satellites for changes; publish serving views with as-of joins.
Expose `is_late` and `as_of_timestamp`. Set freshness SLO 95% < 2h, 99% < 6h.
## Consequences
+ Accurate history and simpler consumer logic.
+ Aligns with future streaming ingestion.
- Higher storage and compute for reconciliation.
- More complex lineage and testing.
## Guardrails & SLOs
- DQ gates: completeness ≥ 99.5%, duplication ≤ 0.1%.
- Alert if late-arrival rate > 5% rolling 24h.
- Backfill job timeboxed to 60 min, cost alert at $X.
## Links
- Contract: /contracts/sales_orders_v1.yaml
- Pipelines: /pipelines/sales_orders/*
- Dashboards: /observability/freshness/sales_orders
- PRs: #412, #427
- Supersedes: ADR-006 (initial batch-only stance)
Keep ADRs in a consistent location, number them, and maintain an index with status. Each ADR merges in the same PR as the code, or immediately before it.
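That status index can be generated rather than hand-maintained. A sketch, assuming ADRs live as markdown files with a `# Title` heading and a `Status:` line as in the template above (the directory layout is an assumption):

```python
import re
from pathlib import Path

def build_adr_index(adr_dir: str) -> list[dict]:
    """Scan a directory of ADR markdown files (e.g. docs/adr/*.md) and pull
    the title and Status line out of each to build a sortable index."""
    index = []
    for path in sorted(Path(adr_dir).glob("*.md")):
        text = path.read_text()
        title = re.search(r"^#\s+(.+)$", text, re.MULTILINE)
        status = re.search(r"^Status:\s*(.+)$", text, re.MULTILINE)
        index.append({
            "file": path.name,
            "title": title.group(1) if title else "(untitled)",
            "status": status.group(1) if status else "(unknown)",
        })
    return index

# Example against an in-memory fixture (a real run would point at your repo):
import tempfile
with tempfile.TemporaryDirectory() as d:
    Path(d, "014-late-facts.md").write_text(
        "# ADR-014: Late-Arriving Facts Strategy\nStatus: Accepted\n")
    idx = build_adr_index(d)
    assert idx[0]["status"] == "Accepted"
    assert "ADR-014" in idx[0]["title"]
```

Running this in CI and committing the resulting index keeps the “shared memory” browsable without adding manual bookkeeping.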
11) Decision taxonomy for data platforms (starter checklist)
- Storage & format: open table format, partitioning, clustering, compaction, retention
- Ingestion: CDC vs batch vs stream, ordering guarantees, idempotency, deduplication
- Modeling: Data Vault vs dimensional patterns, SCD/effectivity, business key strategy
- Contracts: schema evolution, breaking vs non‑breaking changes, deprecation windows
- Quality: expectations, thresholds, waivers, ownership of fixes
- Observability: signals, dashboards, SLOs, paging policy
- Security & privacy: masking, tokenization, key management, access model, data residency
- Cost: optimization policy, quotas, budget alerts
- Release & migration: dual‑run, canary, rollback rules, backfill strategy
- Team topology: ownership, RACI, paved‑road guarantees
If a decision touches any item on that list, it likely deserves an ADR.
12) Putting it all together
Agile isn’t permission to be vague about dates; it’s a way to earn dates by shipping small, verifiable increments. For data platforms, ADRs make that agility durable:
- They anchor vertical slices in clear, shared choices.
- They shorten onboarding and audits by preserving context.
- They make risk visible early, enabling credible milestone forecasts.
- They let you evolve—supersede, don’t thrash.
Keep the ceremonies light, the slices thin, the gates automated—and the ADRs flowing. That combination is how you deliver a trustworthy data platform on time, repeatedly, without reinventing the reasoning behind every hard decision.