If you’ve been living in Microsoft Fabric for a few months, you’ve probably felt it: the platform makes it incredibly easy to ingest data… and surprisingly easy to rack up storage spend while you’re doing it (especially considering how much storage is included).
The pattern is common. A team starts with a Lakehouse, adds Pipelines or Dataflows Gen2 for ingestion, follows a sensible medallion approach, and before long they’re keeping “just in case” raw files, repeated snapshots, and long-running history inside OneLake—often at the same performance tier as yesterday’s data. The storage bill grows quietly. Capacity pressure shows up in places you didn’t expect. And suddenly “simple ingestion” is a FinOps conversation.
Here’s the good news: you don’t have to choose between Fabric and sensible archival strategy. Azure Data Lake Storage Gen2 (ADLS Gen2) can be your pressure relief valve—your durable landing zone and archive—while Fabric stays the place you compute, curate, model, and serve.
What follows is a deep dive into how to use ADLS Gen2 accounts to solve the archival and storage-cost traps that show up during Fabric ingestion: where the costs come from, what architectural patterns work well, and the practical implementation details (shortcuts, security, and billing mechanics) that make it real for Microsoft Fabric teams.
Why Fabric ingestion gets expensive faster than people expect
Fabric makes OneLake feel like the obvious landing zone. But the cost model rewards intentionality.
A few mechanics matter more than most teams realize:
OneLake storage is billed separately from your Fabric compute capacity. Storage is pay-as-you-go per GB stored, and soft-deleted data is billed the same as active data.
On the compute side, OneLake transactions (reads/writes) consume Fabric Capacity Units (CUs). The OneLake consumption documentation even calls out that reads/writes are charged in blocks (for example, larger files are counted as one transaction per 4 MB block of data).
And then there’s the very human problem: ingestion frameworks tend to create copies.
A typical Fabric ingestion “happy path” creates cost in a few predictable ways:
- Multiple copies by design: raw files, curated tables, derived tables, and staging outputs often all live side-by-side.
- Long retention by default: teams keep raw history indefinitely because storage “feels cheap”… until it’s not.
- Soft deletes and workspace retention surprise: deleted workspaces (and their OneLake storage) can remain billable during the retention period (configurable 7–90 days).
- Transaction-heavy ingestion: lots of small files or frequent incremental writes can create disproportionate CU pressure because reads/writes are metered.
None of this is “wrong.” It’s just what happens when a modern platform makes it easy to keep everything.
So the question becomes: where should your system-of-record data live, and where should your system-of-analysis data live?
ADLS Gen2 is built for archival economics (and Fabric can meet it where it is)
ADLS Gen2 exists for exactly the part of the data lifecycle that hurts most in Fabric ingestion: durable, scalable storage with tiering and lifecycle management.
A few ADLS Gen2 capabilities are especially relevant to “ingestion + archive”:
ADLS Gen2 (via GPv2 storage) provides access to multiple storage tiers including Cool and Archive, and it supports automated lifecycle policy management.
It also supports reserved capacity commitments that can reduce storage cost when you have predictable large footprints.
And, importantly for ingestion design, larger files tend to be more cost-effective because transactions are billed in 4 MB blocks; files smaller than 4 MB still incur a full transaction.
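To make the 4 MB block effect concrete, here is a minimal sketch that counts metered blocks for the same volume of data written as many small files versus a few large ones. It is a simplification: the function names are hypothetical, the block size is taken from the statement above, and real metering also depends on operation type and pricing tier.

```python
import math

# 4 MB metering block, per the billing behaviour described above (assumed to be 4 MiB here).
BLOCK_BYTES = 4 * 1024 * 1024


def billable_blocks(file_size_bytes: int) -> int:
    """Blocks counted for one full read or write of a single file.

    A file smaller than one block still counts as one full transaction.
    """
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    return max(1, math.ceil(file_size_bytes / BLOCK_BYTES))


def total_blocks(file_sizes) -> int:
    """Sum the metered blocks over a collection of file sizes (in bytes)."""
    return sum(billable_blocks(size) for size in file_sizes)


# The same 1 GiB of data, laid out two ways.
small_files = [1 * 1024 * 1024] * 1024    # 1,024 files of 1 MiB
large_files = [128 * 1024 * 1024] * 8     # 8 files of 128 MiB

print(total_blocks(small_files))  # 1024
print(total_blocks(large_files))  # 256
```

In this illustration the small-file layout is metered at four times the transaction count for identical data, which is the intuition behind compacting before you archive.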
That’s the archival side.
Now the Fabric side:
OneLake is built on top of ADLS Gen2, and Fabric stores tabular data in Delta Parquet format in OneLake.
Even more critical: OneLake shortcuts let Fabric virtualize access to external storage locations—including ADLS Gen2—without copying the data into OneLake. Shortcuts behave like symbolic links in the OneLake namespace.
This is the key architectural unlock: ADLS Gen2 can hold your cold and archival data, while Fabric can still access it as if it were “in the lake.”
That’s the foundation of an ADLS-first ingestion strategy for Fabric data engineering teams.
The core pattern: move “land and keep” to ADLS, keep OneLake for “curate and serve”
If you only take one idea from this: use OneLake for what you need Fabric to optimize, and use ADLS for what you need Azure Storage to optimize.
In practice, that means:
- ADLS Gen2 becomes your landing zone and archive zone (raw, immutable history, regulatory retention, “maybe someday” datasets).
- OneLake becomes your curated analytics zone (cleaned Delta tables, optimized structures, star schemas, semantic model-ready assets).
Fabric still does the ingestion orchestration and transformation work—but the “data gravity” for archives shifts to ADLS, where tiering and lifecycle policies can do their job.
Three ways ADLS Gen2 reduces Fabric ingestion cost (without breaking your workflow)
Virtualize instead of duplicate with OneLake shortcuts
Shortcuts exist to eliminate “edge copies.” They unify data across domains and clouds, and OneLake manages the permissions and credentials so each workload doesn’t need bespoke connection setup.
For lakehouses, Fabric also draws a clean boundary:
- In the Tables folder, shortcuts must be top-level and typically point to Delta-formatted data; when the shortcut target is Delta Parquet, Fabric can discover it as a table.
- In the Files folder, shortcuts can be created anywhere and can point to data in any format (no table discovery there).
This matters because it lets you choose your tradeoff:
- Use Files shortcuts for raw landing (JSON, CSV, Parquet, “whatever the source gives you”).
- Use Tables shortcuts when you want external Delta tables to show up in the SQL analytics endpoint and be available to downstream consumption patterns like Direct Lake (assuming the data conforms).
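The Files/Tables choice above can be sketched as a request builder for creating an ADLS Gen2 shortcut programmatically. This is a hedged sketch, not a definitive implementation: the endpoint path and body field names reflect my understanding of the OneLake shortcuts REST API and should be checked against the current reference, and the IDs, account name, and function name are hypothetical placeholders. Nothing is sent over the network.

```python
import json


def build_adls_shortcut_request(workspace_id, lakehouse_id, shortcut_name,
                                parent_path, storage_account, subpath,
                                connection_id):
    """Return (url, body) for a create-shortcut call; does not send it."""
    # Per the rules above: Tables shortcuts must be top-level,
    # Files shortcuts can be created at any depth.
    in_files = parent_path == "Files" or parent_path.startswith("Files/")
    if not (in_files or parent_path == "Tables"):
        raise ValueError("parent_path must be 'Tables' or inside 'Files'")

    # Assumed endpoint shape for the Fabric shortcuts API.
    url = (
        "https://api.fabric.microsoft.com/v1/"
        f"workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    )
    body = {
        "path": parent_path,
        "name": shortcut_name,
        "target": {
            "adlsGen2": {
                # DFS endpoint of the storage account
                "location": f"https://{storage_account}.dfs.core.windows.net",
                # Container and folder inside the account
                "subpath": subpath,
                # ID of an existing Fabric connection (placeholder here)
                "connectionId": connection_id,
            }
        },
    }
    return url, body


url, body = build_adls_shortcut_request(
    workspace_id="00000000-0000-0000-0000-000000000001",
    lakehouse_id="00000000-0000-0000-0000-000000000002",
    shortcut_name="raw_orders",
    parent_path="Files/landing",
    storage_account="contosoarchive",
    subpath="/landing/sales/orders",
    connection_id="00000000-0000-0000-0000-000000000003",
)
print(url)
print(json.dumps(body, indent=2))
```

You would POST the body to the URL with a bearer token for a principal that has access to the workspace; that part is omitted because it depends on how your team handles authentication.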
Offload OneLake transaction pressure when reading external data
This is one of the least-discussed cost levers in Fabric.
When you access data via shortcuts, the consumption is attributed to the capacity tied to the workspace where the shortcut is created. But when the shortcut points to an external source (like ADLS Gen2), OneLake does not count CU usage for that external request—those transactions are charged by the external service (ADLS).
In other words: for certain workloads, reading “cold history” via ADLS shortcuts can reduce OneLake transaction-driven CU pressure, which can be a real contributor to capacity sizing.
Keep your archive tier where it belongs (and keep Fabric fast for today’s work)
OneLake is optimized for a unified analytics experience. ADLS is optimized for storage economics across tiers.
An ADLS-first archive strategy lets you apply lifecycle policies (hot → cool → archive) to older partitions, while keeping Fabric’s OneLake storage focused on what you actually query and serve regularly.
The operational benefit is as important as the cost benefit: your OneLake footprint stays “hot and relevant,” which usually improves governance clarity and keeps your lakehouse from becoming a dumping ground.
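As an illustration of the hot → cool → archive progression, the sketch below builds an Azure Storage lifecycle management policy as a Python dictionary and prints it as JSON. The prefix, rule name, and day thresholds are assumptions chosen for the example, not recommendations; pick values that match how often your history is actually read.

```python
import json


def build_lifecycle_policy(prefix, cool_after_days, archive_after_days,
                           rule_name="tier-raw-landing"):
    """Build a lifecycle policy that tiers block blobs under `prefix`.

    `prefix` starts with the container name, for example "landing/raw/".
    """
    if not 0 <= cool_after_days < archive_after_days:
        raise ValueError("expected 0 <= cool_after_days < archive_after_days")
    return {
        "rules": [
            {
                "enabled": True,
                "name": rule_name,
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after_days
                            },
                            "tierToArchive": {
                                "daysAfterModificationGreaterThan": archive_after_days
                            },
                        }
                    },
                },
            }
        ]
    }


# Hypothetical thresholds: cool after 30 days, archive after 180 days.
policy = build_lifecycle_policy("landing/raw/", 30, 180)
print(json.dumps(policy, indent=2))
```

The resulting JSON can be applied through the portal, an ARM/Bicep deployment, or the Azure CLI. Keep in mind that archived blobs need rehydration before they can be read, so scope the archive rule to prefixes that no shortcut-driven workload reads routinely.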
The security and connectivity piece most teams get stuck on
If your ADLS account is wide open, shortcuts are easy. If your ADLS account is protected (firewall, public access disabled), teams often assume Fabric won’t be able to reach it.
That assumption is outdated.
Trusted workspace access: secure shortcuts to firewall-enabled ADLS
Microsoft Fabric supports accessing firewall-enabled ADLS Gen2 accounts in a controlled way using trusted workspace access. Fabric workspaces with a workspace identity can securely access ADLS Gen2 even when public network access is disabled, and access can be limited to specific workspaces.
A few practical realities from the documentation:
- Trusted workspace access is generally available, but only for F SKU capacities (not Trial).
- You configure access using resource instance rules to allow specific Fabric workspaces.
- You must use the DFS endpoint for the storage account in the shortcut configuration (for example, https://<account>.dfs.core.windows.net).
- It is not compatible with cross-tenant requests, which matters for multi-tenant designs.
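The DFS-endpoint requirement in the list above is easy to trip over when someone pastes a Blob endpoint copied from the portal. A small, hypothetical helper like the one below can normalize the URL before it reaches a shortcut definition. It assumes the Azure public cloud suffix (core.windows.net); sovereign clouds use different suffixes and would need their own handling.

```python
from urllib.parse import urlparse


def to_dfs_endpoint(url: str) -> str:
    """Normalize a storage account URL to its DFS endpoint.

    Accepts https://<account>.blob.core.windows.net[/...] or the .dfs form
    and returns https://<account>.dfs.core.windows.net.
    """
    parsed = urlparse(url)
    host_parts = (parsed.hostname or "").split(".")

    # Expected host shape: <account>.<service>.core.windows.net
    if (parsed.scheme != "https"
            or len(host_parts) != 5
            or host_parts[2:] != ["core", "windows", "net"]):
        raise ValueError(f"not an Azure public cloud storage URL: {url!r}")

    account, service = host_parts[0], host_parts[1]
    if service not in ("blob", "dfs"):
        raise ValueError(f"unsupported storage service endpoint: {service!r}")

    return f"https://{account}.dfs.core.windows.net"


print(to_dfs_endpoint("https://contosoarchive.blob.core.windows.net/landing"))
# https://contosoarchive.dfs.core.windows.net
```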
Trusted workspace access doesn’t just apply to shortcuts. Microsoft also documents using it for pipelines, T-SQL COPY into a warehouse, semantic models (import mode), and AzCopy into OneLake.
Connections and authentication: what actually works today
For ADLS Gen2 connectivity inside Fabric’s Data Factory experiences, the connector supports multiple authentication types (including service principal and workspace identity).
And the same documentation clarifies an important point for teams adopting “ADLS-first” ingestion: a Fabric workspace identity is an automatically managed service principal that can be associated with a workspace, and it can securely read or write to ADLS Gen2 through shortcuts and pipelines.
This is the difference between “we could do it” and “we can run it as a platform.”
“Can Fabric write back to ADLS via shortcuts?” Yes—know the boundaries
There’s been confusion in the community because different shortcut types have different capabilities, and older guidance often gets repeated.
Microsoft’s end-to-end security scenario documentation is explicit:
- ADLS Gen2 shortcuts support writes, so you can write data back out to the storage service through that shortcut type.
- By contrast, write operations via shortcuts are not supported for some other external providers (for example, AWS S3 and Google Cloud Storage).
That said, “it supports write” isn’t the same as “you should treat it like a multi-writer Delta table.”
If you plan to use ADLS as a write target, set expectations early:
- For raw landing (append-only files), writing to ADLS is straightforward.
- For Delta tables stored externally, you need discipline around who writes, how often, and with what compute engine. External Delta tables can absolutely work—but the operational complexity goes up when multiple writers are involved.
In short: use write-back intentionally, not accidentally.
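For the append-only raw landing case, one way to keep writers from colliding is to give every file a unique, partitioned path so nothing is ever overwritten. The sketch below shows one such convention; the folder layout (domain / source system / dataset / date partitions) and the function name are assumptions for illustration, not a Fabric or ADLS requirement.

```python
import uuid
from datetime import datetime, timezone


def landing_path(domain, source_system, dataset, event_time,
                 extension="parquet", file_id=None):
    """Build a unique, date-partitioned path for an append-only landing file."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")

    ts = event_time.astimezone(timezone.utc)
    # A random suffix keeps concurrent writers from producing the same name.
    file_id = file_id or uuid.uuid4().hex

    return (
        f"{domain}/{source_system}/{dataset}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{ts:%Y%m%dT%H%M%SZ}_{file_id}.{extension}"
    )


example = landing_path(
    "sales", "erp", "orders",
    datetime(2024, 3, 5, 14, 30, 0, tzinfo=timezone.utc),
    file_id="abc123",
)
print(example)
# sales/erp/orders/year=2024/month=03/day=05/20240305T143000Z_abc123.parquet
```

Date-based prefixes like these also line up well with lifecycle rules that filter by prefix, since older partitions can be tiered without touching recent ones.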
What a cost-optimized ADLS + Fabric ingestion architecture looks like
Here’s a practical blueprint that’s worked well across multiple Fabric adoption patterns.
Step-by-step implementation blueprint
- Create (or designate) an ADLS Gen2 account as your landing + archive store
Make sure hierarchical namespace is enabled. Plan containers and folder conventions around partitioning (date, source system, domain).
- Define lifecycle policies for cold data
The goal is simple: data that hasn’t been queried in months shouldn’t cost the same as today’s data. ADLS supports Cool and Archive tiers and automated lifecycle policy management.
Be realistic: Archive tier is for “rarely accessed” data and typically involves rehydration before use. Design for that operationally.
- Set up secure access for Fabric
If your storage is firewall-enabled or public access is disabled, configure trusted workspace access and bind access to the specific Fabric workspace identities that need it.
This is where platform teams earn their keep: get the resource instance rules and identity model right once, and you stop revisiting it for every new dataset.
- Land raw data into ADLS first
Use Fabric pipelines, Dataflows, or external tools to write into ADLS. The Fabric ADLS connector supports multiple authentication patterns, including workspace identity and service principal.
- Create OneLake shortcuts to your ADLS landing zones
In your Fabric Lakehouse:
- Use Files shortcuts for broad format support and raw zones.
- Use Tables shortcuts when the target is Delta Parquet and you want table discovery and SQL endpoint exposure.
- Transform and publish curated data into OneLake
This is where Fabric shines. Keep Silver/Gold (or your curated equivalents) inside OneLake so they’re optimized for Fabric engines and downstream consumption patterns. OneLake stores tabular data in Delta Parquet format.
- Use caching intentionally for performance (without paying for duplicates)
Shortcut caching (the workspace cache) exists primarily to reduce egress costs for cross-cloud shortcuts and is not currently supported for ADLS shortcuts.
But for Spark workloads, Fabric’s intelligent cache can cache reads from OneLake or ADLS via shortcuts to speed repeated access and reduce redundant remote reads.
That gives you performance relief without turning your archive into a permanent OneLake copy.
Cost levers that matter most in the real world
If your goal is to reduce Fabric ingestion storage cost, the levers are not exotic. They’re operational.
Keep OneLake for “hot analytics value,” not “long retention comfort”
OneLake storage is billed per GB and soft-deleted data is also billed.
So treat OneLake like a curated analytics product layer, not a cold vault.
Avoid small-file ingestion where possible
This applies to both OneLake and ADLS: each meters transactions in blocks, and Azure is explicit that larger files are often more cost-effective.
If you’re ingesting IoT, logs, or CDC feeds, plan compaction early.
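Planning compaction can start with something as simple as grouping small files into batches that approach a target output size. The sketch below does only the planning step, with no I/O; the 128 MB target and the function name are assumptions for illustration, and the actual rewrite would be done by whichever engine you use (Spark, a pipeline, or an external tool).

```python
def plan_compaction(file_sizes, target_bytes=128 * 1024 * 1024):
    """Group small files into batches to be rewritten as one larger file each.

    file_sizes: mapping of file name -> size in bytes.
    Returns a list of batches (lists of file names). Files already at or
    above the target, and leftovers that would form a batch of one, are
    left alone.
    """
    batches = []
    current, current_size = [], 0

    for name, size in sorted(file_sizes.items()):
        if size >= target_bytes:
            continue  # already large enough, nothing to gain
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size

    if current:
        batches.append(current)

    # Rewriting a single file on its own does not reduce file count.
    return [batch for batch in batches if len(batch) > 1]


MB = 1024 * 1024
files = {"a.json": 40 * MB, "b.json": 40 * MB, "c.json": 40 * MB,
         "d.json": 40 * MB, "e.json": 200 * MB}
print(plan_compaction(files))
# [['a.json', 'b.json', 'c.json']]
```

Running the plan per partition (for example, per day folder) keeps each compaction job small and makes it safe to retry.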
Use ADLS reserved capacity when the archive footprint is predictable
If you’re retaining tens or hundreds of TB for years, reserved capacity can matter. Azure Storage reserved capacity is designed specifically to lower storage cost via 1-year or 3-year commitments.
Understand where the CU meter doesn’t run
When a shortcut points to external ADLS Gen2, OneLake does not count CU usage for the external request; the transactions are charged by ADLS.
This is a powerful design point when your “history reads” are driving capacity sizing.
The tradeoffs (because there are always tradeoffs)
An ADLS-first archive strategy is not free of complexity.
You’re choosing:
- More explicit storage governance in exchange for better lifecycle economics.
- Two storage billing streams (OneLake + ADLS) in exchange for the ability to put cold data on cold tiers.
- A clearer data lifecycle (landing → curated → served) in exchange for a bit more architectural intention.
The biggest mistake is trying to make everything symmetrical—treating raw and curated data the same way. Raw is archival. Curated is product. They should live differently.
Closing: Make Fabric your analytics engine, not your attic
Fabric is at its best when OneLake is curated, discoverable, and optimized for analysis—not when it’s filled with years of raw history “just in case.”
ADLS Gen2 gives you the missing half of the story: a landing-and-archive layer with tiering, lifecycle automation, and storage purchase options designed for long retention. Fabric gives you the compute, orchestration, and cross-engine consumption experiences that make that data valuable.
Put them together and you get a clean division of labor:
ADLS is where data can live cheaply for a long time.
OneLake is where data can deliver value quickly and repeatedly.
If you’re feeling the ingestion storage squeeze, the best next move is not another cleanup script. It’s a lifecycle architecture, one that lets Microsoft Fabric stay fast while ADLS Gen2 carries the archive weight.