The NotebookUtils Gems I Wish More Fabric Notebooks Used

Most Fabric notebook code I review has the same telltale shape: a little Spark, a hardcoded path (or three), and just enough glue logic to “get it to run.” And then, a month later, someone copies it into another workspace and everything breaks.

NotebookUtils is one of the easiest ways to avoid that fate. It’s built into Fabric notebooks, it’s designed for the common “day two” problems (orchestration, configuration, identities, file movement), and it’s still surprisingly underused. NotebookUtils is also the successor to mssparkutils—backward compatible today, but clearly where Microsoft is investing going forward.

In this post, I’m going to do two things:

  • Give you a quick, practical orientation to NotebookUtils in Fabric.
  • Walk through the functions I reach for most often—especially the ones I don’t see enough in real projects: runtime.context, runMultiple()/validateDAG(), variableLibrary.getLibrary(), fs.fastcp(), fs.getMountPath(), credentials.getToken(), and lakehouse.loadTable().

Along the way, I’ll call out a few patterns that make notebooks feel less like “scripts you run” and more like reusable components in Microsoft Fabric data engineering work.

NotebookUtils in Fabric: the quick orientation

NotebookUtils is a built-in package for Fabric notebooks that helps with “platform-y” tasks: file system operations, chaining notebooks, working with secrets, and pulling runtime context. It’s available across Spark notebooks (PySpark/Scala/SparkR) and can also be used in Fabric pipelines.

Two details matter right away:

  • MsSparkUtils has been renamed to NotebookUtils. Existing code is backward compatible, but Microsoft recommends moving to notebookutils, and the mssparkutils namespace is expected to be retired in the future.
  • Runtime matters. NotebookUtils is designed for Spark 3.4 (Fabric runtime v1.2) and above, and new features are supported under the notebookutils namespace going forward.

A small habit that pays off: use the built-in help when you’re in the notebook.

notebookutils.fs.help()
notebookutils.notebook.help()
notebookutils.credentials.help()
notebookutils.variableLibrary.help()

Those help pages are often the fastest way to see what’s available in your runtime.

The underused functions I keep coming back to

notebookutils.runtime.context for “self-aware” notebooks

If you want a notebook to be reusable, it needs to know where it’s running—or at least be able to log it. notebookutils.runtime.context gives you the session context: notebook name/id, workspace info, default lakehouse, whether this run is part of a pipeline, and more.

That unlocks two practical patterns:

  • Better logging: every run can stamp itself with workspace, run id, and identity.
  • Safer branching: when you genuinely need different behavior in interactive exploration vs pipeline runs.

Example:

ctx = notebookutils.runtime.context

print({
  "notebook": ctx.currentNotebookName,
  "workspace": ctx.currentWorkspaceName,
  "workspaceId": ctx.currentWorkspaceId,
  "defaultLakehouse": ctx.defaultLakehouseName,
  "isPipeline": ctx.isForPipeline,
  "isReferenceRun": ctx.isReferenceRun,
  "runId": ctx.currentRunId,
  "user": ctx.userName
})

Why this is underused: many teams improvise this with hardcoded strings (“dev workspace”, “prod workspace”) or pipeline parameters. The context object is simpler—and it’s already there.
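
For the branching pattern, here is a minimal sketch that reuses the same context fields as above (df is a hypothetical DataFrame from earlier cells):

ctx = notebookutils.runtime.context

if ctx.isForPipeline:
    # Pipeline run: emit a compact, greppable log line instead of rich output.
    print(f"[{ctx.currentWorkspaceName}/{ctx.currentNotebookName}] run {ctx.currentRunId}")
else:
    # Interactive exploration: show a sample for eyeballing.
    display(df.limit(10))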

notebookutils.notebook.runMultiple() and validateDAG() for code-first orchestration

If you only ever use notebooks one at a time, you can ignore this section. But if you’re building a small workflow—bronze load, silver transform, gold publish—runMultiple() is the fastest way to orchestrate notebooks inside a notebook.

runMultiple() can run notebooks in parallel and can also respect dependencies (a DAG). It uses a multithreaded mechanism inside the Spark session, so the child notebook runs share compute resources with the parent.

And here’s the underused companion: validateDAG(). It checks whether your DAG structure is correctly defined before you kick off a run.

Minimal parallel example:

results = notebookutils.notebook.runMultiple(["Ingest_Customers", "Ingest_Orders"])
print(results)

Structured DAG example:

DAG = {
  "activities": [
    {"name": "Bronze", "path": "01_Bronze", "args": {"mode": "incremental"}},
    {"name": "Silver", "path": "02_Silver", "dependencies": ["Bronze"]},
    {"name": "Gold", "path": "03_Gold", "dependencies": ["Silver"]}
  ],
  "concurrency": 10
}

if not notebookutils.notebook.validateDAG(DAG):
  raise ValueError("DAG is not valid")

results = notebookutils.notebook.runMultiple(DAG)

A few reality checks that keep this pattern healthy:

  • The default max concurrency differs by notebook type (Spark notebooks default higher than Python notebooks), and too much parallelism can create stability issues due to shared resources.
  • If you reference notebooks across workspaces, that’s supported on runtime v1.2+ (and you should be explicit about workspace IDs).
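
On that last point, a hedged sketch of calling a notebook that lives in another workspace. The workspace ID below is a placeholder, and passing it as the fourth argument to notebook.run() is my reading of the current docs:

result = notebookutils.notebook.run(
    "02_Silver",                             # notebook name in the other workspace
    90,                                      # timeout in seconds
    {"mode": "incremental"},                 # parameters
    "00000000-0000-0000-0000-000000000000"   # placeholder workspace ID
)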

This is where Apache Spark meets notebook ergonomics in a way I think more people should exploit.

notebookutils.notebook.exit() for explicit notebook “return values”

A reusable notebook needs an interface. In Fabric, a clean interface often looks like:

  • parameters in
  • exit value out

notebookutils.notebook.exit() ends a notebook with a value, and notebookutils.notebook.run() returns that value to the caller.

In practice, I use this to pass small metadata back to the orchestrator notebook: row counts, output paths, status markers, ids, and timestamps.

Child notebook:

# ... do work ...
notebookutils.notebook.exit("rows_written=128734")

Caller notebook:

val = notebookutils.notebook.run("02_Silver", 90, {"mode": "incremental"})
print(val)

One important “gotcha” that’s easy to miss: exit() overwrites the current cell output, so it’s best called from its own cell.

Also, exit() behaves differently depending on how the notebook is executed (interactive vs pipeline activity vs reference run). That nuance is documented and worth understanding if you build orchestration logic around it.
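
If you want more than a single value back, one pattern is to serialize a small dict. This is just a sketch of that idea, not a dedicated NotebookUtils feature:

Child notebook:

import json

# Return structured metadata as a JSON string.
notebookutils.notebook.exit(json.dumps({
    "status": "ok",
    "rows_written": 128734,
    "output_path": "Files/curated/customers/"
}))

Caller notebook:

import json

raw = notebookutils.notebook.run("02_Silver", 90, {"mode": "incremental"})
result = json.loads(raw)
print(result["rows_written"])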

notebookutils.nbResPath when you reference notebooks that use resources

This is a small one, but it saves hours.

If you’re using notebook resources (built-in files like small datasets, images, modules) and you’re calling a notebook via notebookutils.notebook.run(), use notebookutils.nbResPath inside the referenced notebook to resolve the correct resource path. Fabric notes that builtin/ always points to the root notebook’s built-in folder in reference runs, which can surprise you if you assume it’s “local.”

It’s not flashy, but it’s one of those “works in interactive, breaks in pipeline” traps that NotebookUtils is quietly designed to help you avoid.
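
A hedged sketch of the habit: confirm where the resources actually live before you depend on a relative builtin/ path.

import os

# nbResPath resolves the current notebook's resource folder; listing it is a
# quick way to confirm what a reference run can actually see.
print(notebookutils.nbResPath)
print(os.listdir(notebookutils.nbResPath))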

notebookutils.variableLibrary.getLibrary() for configuration that survives copy/paste

Hardcoding paths is the fastest way to make a notebook non-portable. Variable libraries are Fabric’s answer to “configuration belongs outside code,” and notebooks are now supported consumers (through NotebookUtils and %%configure).

In NotebookUtils, the core calls are:

  • notebookutils.variableLibrary.getLibrary("Name")
  • notebookutils.variableLibrary.get("$(/**/.../variable)")

Example:

cfg = notebookutils.variableLibrary.getLibrary("ProjectConfig")

lakehouse = cfg.Lakehouse_name
workspace = cfg.Workspace_name

path = f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{lakehouse}.Lakehouse/Files/input/customers.csv"
df = spark.read.option("header", "true").csv(path)

Two constraints to plan for:

  • Variable library access is scoped to the same workspace (and has limitations in child notebooks during reference runs).
  • Values come from the active value set in that workspace—exactly what you want for dev/test/prod patterns, but something you should make explicit in your deployment practices.

If you’re trying to build workspace-to-workspace portability in OneLake projects, this is one of the cleanest patterns available right now.

notebookutils.fs.fastcp() for moving lots of files without drama

Most people learn notebookutils.fs.ls() and stop there. But once you start moving real volumes of data (or many small files), the “copy/move” story becomes a performance story.

NotebookUtils includes fs.fastcp() as a more efficient option for copying/moving files, and the docs explicitly recommend it for enhanced performance on Fabric versus the traditional cp.

Example:

src = "Files/staging/customers/"
dst = "Files/curated/customers/"

notebookutils.fs.fastcp(src, dst, recurse=True)

Know the limitations:

  • fastcp doesn’t support copying files across OneLake regions (use cp there).
  • There are special considerations with OneLake shortcuts (particularly S3/GCS shortcut scenarios), where mounted paths are recommended.

Also, one subtle but important point: relative paths behave differently depending on notebook type. In Spark notebooks, relative paths are relative to the default Lakehouse ABFSS path; in Python notebooks, they’re relative to the local working directory unless you use full lakehouse paths.

That difference explains a lot of “it worked for me, why not for the pipeline?” conversations.
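
One way to sidestep that ambiguity entirely is to spell out the ABFSS path. The workspace and lakehouse names below are placeholders:

# Fully qualified paths behave the same in Spark and Python notebooks.
base = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse"

notebookutils.fs.fastcp(f"{base}/Files/staging/customers/",
                        f"{base}/Files/curated/customers/",
                        recurse=True)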

notebookutils.fs.getMountPath() for bridging mount points to normal file I/O

When you mount storage (ADLS Gen2, or even a lakehouse mount), you often want to use ordinary open() or standard file APIs. fs.getMountPath() is the helper that makes that clean.

The NotebookUtils documentation shows how getMountPath() returns the local file system path for a mount point, which you can then use with file:// paths or directly with Python file I/O.

Example:

local_root = notebookutils.fs.getMountPath("/landing")

with open(local_root + "/manifest.json", "r") as f:
    manifest = f.read()

This is underused because many teams either stay entirely in ABFSS paths or do awkward string manipulation on /synfs/notebook/{sessionId}/... paths. getMountPath() removes the guesswork.
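
For completeness, a hedged sketch of the full round trip: mounting a lakehouse (the ABFSS path is a placeholder) and then resolving its local path.

# Mount a lakehouse under /landing, then resolve the local path for plain file I/O.
notebookutils.fs.mount(
    "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse",
    "/landing"
)

local_root = notebookutils.fs.getMountPath("/landing")
print(local_root)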

notebookutils.credentials.getToken() for quick platform calls (with eyes open)

NotebookUtils has a credentials helper that can issue Microsoft Entra tokens for specific audiences (including "storage", "pbi", "keyvault", and "kusto").

Example:

token = notebookutils.credentials.getToken("storage")
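
As a hedged illustration of what you can do with such a token, here's a call to the Power BI REST API using the "pbi" audience. Whether it succeeds depends on the identity and scopes discussed below:

import requests

token = notebookutils.credentials.getToken("pbi")

# List workspaces the running identity can see via the Power BI REST API.
resp = requests.get(
    "https://api.powerbi.com/v1.0/myorg/groups",
    headers={"Authorization": f"Bearer {token}"}
)
print(resp.status_code)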

This is a powerful convenience function, but it’s also one where you need to understand who is running your notebook.

Fabric notebook execution can happen:

  • interactively (your identity),
  • as a pipeline activity (identity of the pipeline’s last modified user),
  • or via schedule (identity of the schedule’s creator/last updater).

That identity context changes what the token can do—and what it should be allowed to do.

Also, Microsoft explicitly notes that token scopes for "pbi" may change over time, and that under service principal runs the "pbi" token is limited (and recommends MSAL if you need full Fabric service scope).

So: use getToken() as a pragmatic tool, but treat it as part of a broader authentication design, not the design itself.

notebookutils.lakehouse.loadTable() for simple ingestion without rewriting boilerplate

The notebookutils.lakehouse module is easy to miss because it feels “admin-y.” But it has one function I wish more people tried: loadTable().

It can start a load operation into a lakehouse table from a file path (with format options like CSV header and delimiter).

Example (conceptually):

notebookutils.lakehouse.loadTable(
    {
        "relativePath": "Files/incoming/customers.csv",
        "pathType": "File",
        "mode": "Overwrite",
        "recursive": False,
        "formatOptions": {"format": "Csv", "header": True, "delimiter": ","}
    },
    "customers",
    "MyLakehouse"
)

I like this for two cases:

  • quick, standard ingestion patterns where you don’t need custom parsing logic,
  • building “platform primitives” notebooks that can be called by others (for example, “load any CSV from Files/ to a table”).

Bonus: listTables() and getWithProperties() are also handy when you want to validate that the lakehouse you think you’re targeting is actually the one attached to the notebook.
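
A hedged sanity-check sketch along those lines (the lakehouse name is a placeholder, and the exact shape of the returned objects may differ by runtime):

# Confirm which lakehouse is actually attached and what tables it already has.
lh = notebookutils.lakehouse.getWithProperties("MyLakehouse")
print(lh)

tables = notebookutils.lakehouse.listTables("MyLakehouse")
print(tables)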

Pulling the patterns together

If you squint, all of these “favorite functions” reinforce the same handful of notebook design principles:

  • Treat notebooks like components. Parameters in, exit values out. Use run() and exit() intentionally.
  • Make configuration first-class. Put environment-specific values in a variable library and pull them in through NotebookUtils.
  • Make runs observable and explainable. Use runtime.context so logs and outputs carry enough context to debug later.
  • Move data with the right tool. fastcp() and getMountPath() are there to reduce friction and surprise when you cross file systems.
  • Respect identity. Token helpers are convenient, but they’re only safe when you’re clear about execution context (interactive vs pipeline vs schedule).

These aren’t exotic ideas. They’re “basic software engineering,” applied to notebooks—exactly the shift Fabric notebooks are ready to support.

Conclusion: try one new NotebookUtils habit this week

NotebookUtils is easy to underestimate because the first functions you learn are small: fs.ls(), notebook.run(). But the “not used enough” functions are the ones that change your notebook practice: orchestration with runMultiple(), safer configuration with variable libraries, self-awareness with runtime.context, and cleaner file movement with fastcp() and getMountPath().

If you want a simple next step: open a Fabric notebook, run notebookutils.notebook.help() and notebookutils.fs.help(), and pick one function you haven’t used before. Then refactor one notebook to stop hardcoding what the platform already knows.

That’s how notebooks move from “personal scratchpad” to “team asset.”

Author: Jason Miles

A solution-focused developer, engineer, and data specialist focusing on diverse industries. He has led data products and citizen data initiatives for almost twenty years and is an expert in enabling organizations to turn data into insight, and then into action. He holds an MS in Analytics from Texas A&M, along with DAMA CDMP Master and INFORMS CAP-Expert credentials.
