feat(fabric): support interactive notebook-user auth for Fabric Warehouse + OneLake staging #3872
Open
mattiasthalen wants to merge 16 commits into dlt-hub:devel from
Conversation
OneLake (Microsoft Fabric) responds with 403 ClientAuthenticationError when BlobClient.exists targets a blob name ending in /. That kills FilesystemClient.initialize_storage at the very first fs.isdir call on self.dataset_path. Non-OneLake backends silently treat it as False and hit the same latent defect, just non-fatally. Strip the empty segment from the pathlib.join so dataset_path never ends in /. Refs dlt-hub#3866
Black wants the multi-line assert in test_dataset_path_has_no_trailing_separator reformatted into a single-line assert. Apply the formatter's output so `make format-check` passes in CI. Refs dlt-hub#3866
Same OneLake 403 root cause as the previous dataset_path commit, one level deeper. FilesystemClient.truncate_tables calls fs.exists(table_dir) for each entry from get_table_dirs(...); once dataset_path is fixed, every one of those calls 403s on OneLake. Drop the trailing pathlib.sep so get_table_dir returns a path shape that BlobClient.exists accepts. Refs dlt-hub#3866
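A minimal sketch of the invariant these two trailing-slash commits establish, assuming illustrative names; the real properties live on dlt's `FilesystemClient` and go through fsspec rather than `posixpath`:

```python
import posixpath

# Illustrative reconstruction of the invariant, not the PR's literal code.
def dataset_path(bucket_path: str, dataset_name: str) -> str:
    # joining a trailing empty segment used to yield ".../<dataset>/",
    # which OneLake's BlobClient.exists rejects with a 403
    return posixpath.join(bucket_path, dataset_name).rstrip("/")

def get_table_dir(dataset_path_: str, table_name: str) -> str:
    # same invariant one level deeper: no trailing separator
    return posixpath.join(dataset_path_, table_name).rstrip("/")

assert not dataset_path("abfss://ws@onelake.dfs.fabric.microsoft.com/lh/Files", "my_dataset").endswith("/")
```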
…nt paths Tasks 2 and 3 of this PR (dlt-hub#3867) stripped the trailing separator from `FilesystemClient.dataset_path` and `FilesystemClient.get_table_dir`. The pre-existing `test_trailing_separators` hardcoded the old shape (trailing /) in its parameterized assertions. Flip those seven assertions to the corrected shape. Also drop the stale "ending with separator" phrase from `get_table_dir`'s docstring — same invariant flip, land together. `get_table_prefix` is untouched and still preserves its trailing separator for folder-style layouts; the two assertions on that method stay as-is. Refs dlt-hub#3866
Task 2 of this PR (dlt-hub#3867) stripped the trailing separator from `FilesystemClient.dataset_path`. The pre-existing `test_destination_config_in_name` assertion at line 218 was `endswith(dataset_name + pathlib.sep)`, which encoded the old shape. Replace with `endswith(dataset_name)` and drop the now-unused `pathlib` local variable (and its `type: ignore` comment). Caught by `make test-common-p`, not surfaced by the filesystem test module run in Task 4 because this test lives under `tests/destinations/`. Refs dlt-hub#3866
Introduces an optional `access_token` field on `FabricCredentials` that holds a pre-fetched AAD bearer token, and a `get_access_token()` helper that returns it as a raw string or `None`. This is the first piece of notebook-user identity support — a subsequent commit will add an injectable `TokenCredential` path, and the DSN builder and `FabricSqlClient.open_connection` will start branching on `get_access_token()` later in the PR. Refs dlt-hub#3865
Adds an optional `azure_credential: TokenCredential` field on
`FabricCredentials` and teaches `get_access_token()` to call
`get_token("https://database.windows.net/.default")` on it when the
raw `access_token` is not set. This gives long-running notebook
sessions a refreshing credential path while keeping the one-shot
`access_token` string path for simple cases.
The field uses `Optional[Any]` at runtime because dlt's `configspec`
decorator does not support forward-referenced types; the docstring
documents the expected `TokenCredential` protocol.
Refs dlt-hub#3865
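Put together, the two fields and the helper sketch out roughly as follows; this is a simplified stand-in, since the real class is a dlt `configspec` and may differ in detail:

```python
from typing import Any, Optional

class FabricCredentials:  # simplified stand-in for the real configspec class
    access_token: Optional[str] = None      # pre-fetched AAD bearer token (one-shot)
    azure_credential: Optional[Any] = None  # duck-typed azure.core TokenCredential (refreshing)

    def get_access_token(self) -> Optional[str]:
        # the raw one-shot token wins; otherwise ask the injected credential
        if self.access_token:
            return self.access_token
        if self.azure_credential is not None:
            # TokenCredential protocol: get_token(scope) -> AccessToken(token, expires_on)
            return self.azure_credential.get_token(
                "https://database.windows.net/.default"
            ).token
        return None
```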
…tive `get_odbc_dsn_dict` now checks `get_access_token()` and skips `AUTHENTICATION`/`UID`/`PWD` when a bearer token is available. SP path is unchanged — the existing regression tests for `ActiveDirectoryServicePrincipal` and SP credential derivation remain green. Refs dlt-hub#3865
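Roughly, the branching described here looks like the sketch below; it is not the PR's literal code, the DSN keys follow ODBC Driver 18 conventions, and the `azure_client_id` field name is an assumption (only `azure_client_secret` is named elsewhere in this PR):

```python
from typing import Any, Dict

def get_odbc_dsn_dict(credentials: Any) -> Dict[str, str]:
    """Sketch of the DSN branching; in the PR this is a FabricCredentials method."""
    dsn = {
        "DRIVER": "{ODBC Driver 18 for SQL Server}",
        "SERVER": credentials.host,
        "DATABASE": credentials.database,
    }
    if credentials.get_access_token():
        # the bearer token travels via pyodbc attrs_before instead,
        # so AUTHENTICATION / UID / PWD are omitted entirely
        return dsn
    # unchanged service-principal path
    dsn["AUTHENTICATION"] = "ActiveDirectoryServicePrincipal"
    dsn["UID"] = credentials.azure_client_id
    dsn["PWD"] = credentials.azure_client_secret
    return dsn
```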
`FabricCredentials.on_partial` previously attempted to fall back to `DefaultAzureCredential` when explicit SP credentials were missing. That fallback is not supported inside Fabric notebooks. Skip it when either `access_token` or `azure_credential` is set. SP path unchanged. Refs dlt-hub#3865
`FabricSqlClient.open_connection` now branches on
`credentials.get_access_token()`. When a bearer token is available, it
packs the token into the little-endian UTF-16 struct ODBC Driver 18
expects for `SQL_COPT_SS_ACCESS_TOKEN` (1256) and passes it via
`pyodbc.connect(..., attrs_before={1256: ...})`. The datetimeoffset
output converter and autocommit-on behavior are preserved. When no
token is available, the call falls through to the parent path.
Six mocked tests cover the struct layout, attrs_before kwarg,
fall-through, autocommit, output converter, and _conn caching.
Refs dlt-hub#3865
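The packing described here is the standard pyodbc access-token pattern for the Microsoft ODBC drivers; a self-contained sketch, where the function name and connection string are placeholders rather than the PR's code:

```python
import struct
import pyodbc

SQL_COPT_SS_ACCESS_TOKEN = 1256  # pre-connect attribute understood by msodbcsql 17/18

def connect_with_token(odbc_conn_str: str, token: str) -> pyodbc.Connection:
    # the driver expects a 4-byte little-endian length prefix followed by the
    # token encoded as UTF-16-LE
    token_bytes = token.encode("utf-16-le")
    token_struct = struct.pack("<i", len(token_bytes)) + token_bytes
    return pyodbc.connect(
        odbc_conn_str,
        attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct},
        autocommit=True,  # the commit keeps autocommit-on behavior
    )
```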
New credential class that returns adlfs kwargs with `account_name` and `account_host` only — no `credential` key. Lets Fabric's registered `OnelakeFileSystem.__init__` fall through to its built-in `make_credential()` helper for notebook-user identity. Only usable inside a Fabric notebook kernel. Pairs with `FabricCredentials.access_token` on the warehouse side. Refs dlt-hub#3865
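Reduced to its essentials, the class could look like the sketch below; the `to_adlfs_credentials` hook name mirrors dlt's existing Azure credentials and is an assumption, and the default host value is taken from the verification config later in this thread:

```python
from typing import Dict, Optional

class OneLakeNotebookIdentityCredentials:  # simplified stand-in for the PR's class
    account_name: str = "onelake"
    account_host: Optional[str] = "onelake.blob.fabric.microsoft.com"

    def to_adlfs_credentials(self) -> Dict[str, Optional[str]]:
        # deliberately no "credential" key: Fabric's registered
        # OnelakeFileSystem.__init__ then falls through to its built-in
        # make_credential(), i.e. the notebook user's identity
        return {"account_name": self.account_name, "account_host": self.account_host}
```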
…secret is empty The Fabric API token warmup builds a `ClientSecretCredential` from `credentials.azure_client_secret` and hits `https://api.fabric.microsoft.com/.default` before every OneLake load. When the SP secret is empty or None, this fails with `ClientAuthenticationError`. Return early when the secret is falsy. The real-SP happy path is unchanged. Refs dlt-hub#3865
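The gate itself is tiny; roughly the following, written as a free function for readability, with attribute names taken from the commit text:

```python
from typing import Any

def _ensure_fabric_token_initialized(job: Any) -> None:
    """Sketch of the gate; in the PR this is a FabricCopyFileLoadJob method."""
    if not job.credentials.azure_client_secret:
        # notebook-identity runs carry no SP secret: skip the
        # ClientSecretCredential warmup against api.fabric.microsoft.com
        return
    ...  # existing real-SP warmup path, unchanged
```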
Adds a "Notebook user identity" section under the Fabric destination
docs with raw `access_token` (one-shot) and injectable
`TokenCredential` (refreshing) patterns. Includes copy-pasteable
examples using `notebookutils.credentials.getToken("pbi")`.
Cross-links to the filesystem staging OneLake section.
Refs dlt-hub#3865
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
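For the refreshing pattern this commit documents, a duck-typed credential wrapping `notebookutils` would look roughly like this; the `NotebookUserCredential` name, the audience choice, and the expiry handling are illustrative, not lifted from the PR:

```python
import time
import notebookutils  # Fabric notebook built-in
from azure.core.credentials import AccessToken

class NotebookUserCredential:
    """Duck-typed TokenCredential backed by the notebook user's identity (sketch)."""

    def get_token(self, *scopes: str, **kwargs) -> AccessToken:
        # reuse the audience the one-shot example uses; requested scopes are
        # ignored, mirroring the raw access_token path
        token = notebookutils.credentials.getToken("pbi")
        # notebookutils does not expose expiry metadata; assume roughly an hour
        return AccessToken(token, int(time.time()) + 3600)
```

An instance of this would be assigned to `FabricCredentials.azure_credential`; the exact wiring shown in the docs (explicit credentials object vs. config provider) is whatever that section lands on.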
…ric notebooks Adds an "OneLake under notebook identity" subsection with TOML and Python config examples, a caution that the class is Fabric-notebook-only, and a cross-link back to the Fabric destination notebook identity section. Refs dlt-hub#3865
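A hedged Python-side sketch of that staging configuration; the import path for the new class is a guess and flagged as such, and the bucket URL placeholders mirror the verification run later in this thread:

```python
import dlt
# NOTE: hypothetical import path; only the class name comes from this PR
from dlt.destinations.impl.filesystem.configuration import OneLakeNotebookIdentityCredentials

staging = dlt.destinations.filesystem(
    bucket_url="abfss://<ws-guid>@onelake.dfs.fabric.microsoft.com/<lh-guid>/Files/_dlt_stage",
    credentials=OneLakeNotebookIdentityCredentials(),
)

pipeline = dlt.pipeline(destination="fabric", staging=staging)
```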
…type Add `# type: ignore[arg-type]` on SimpleNamespace->typed-class calls in test_fabric_sql_client.py and test_fabric_warmup_gate.py (standard dlt pattern for test mocks). Add `# type: ignore[no-any-return]` on the azure_credential.get_token().token return in configuration.py. Drop void-function return-value captures in warmup gate tests. Refs dlt-hub#3865 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s active `on_partial` returned early when `access_token` was set but did not call `self.resolve()`, leaving the credentials in a partial state. The pipeline then received `None` for credentials and crashed with `AttributeError: 'NoneType' object has no attribute 'database'`. Mirror the existing SP fallback pattern: check `self.host and self.database` and call `self.resolve()` before returning. Caught during live Fabric tenant validation. Refs dlt-hub#3865 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
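Combining this with the earlier `on_partial` commit, the resulting flow is roughly the fragment below; only the notebook-identity branch is shown, and the SP fallback details are elided:

```python
from typing import Any, Optional

class FabricCredentialsSketch:
    """Illustrative fragment: only the on_partial flow from these two commits."""
    access_token: Optional[str] = None
    azure_credential: Optional[Any] = None
    host: Optional[str] = None
    database: Optional[str] = None

    def resolve(self) -> None:  # stands in for dlt's real resolution step
        ...

    def on_partial(self) -> None:
        # notebook-identity runs never fall back to DefaultAzureCredential,
        # but must still resolve once the connection essentials are present
        if self.access_token or self.azure_credential is not None:
            if self.host and self.database:
                self.resolve()
            return
        ...  # existing SP fallback path, unchanged
```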
Live verification against Fabric tenant

Ran the real PR branch inside a Microsoft Fabric Python notebook against a live tenant under interactive notebook-user identity. Two-run `scd2` smoke test.

Branch:

Config (no monkey patch, no dummy SP fields):

```python
import os
import notebookutils  # Fabric notebook built-in

os.environ["DESTINATION__FABRIC__CREDENTIALS__HOST"] = "<warehouse>.datawarehouse.fabric.microsoft.com"
os.environ["DESTINATION__FABRIC__CREDENTIALS__DATABASE"] = "destination"
os.environ["DESTINATION__FABRIC__CREDENTIALS__ACCESS_TOKEN"] = notebookutils.credentials.getToken("pbi")
os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "abfss://<ws-guid>@onelake.dfs.fabric.microsoft.com/<lh-guid>/Files/_dlt_stage"
os.environ["DESTINATION__FILESYSTEM__CREDENTIALS__AZURE_STORAGE_ACCOUNT_NAME"] = "onelake"
os.environ["DESTINATION__FILESYSTEM__CREDENTIALS__AZURE_ACCOUNT_HOST"] = "onelake.blob.fabric.microsoft.com"
```

Verification query result (Cell 6 — three-row scd2 history): all three rows expected.
Notes:
Verification summary
Ready for review.
Description
Makes `dlt.pipeline(destination="fabric", staging="filesystem")` usable from inside a Microsoft Fabric Python notebook under interactive user identity. Adds:

- `access_token` and `azure_credential` fields on `FabricCredentials`. When either is set, `FabricSqlClient.open_connection` passes the bearer token to `pyodbc.connect` via `attrs_before={1256: ...}` (SQL_COPT_SS_ACCESS_TOKEN) and the DSN omits `AUTHENTICATION`/`UID`/`PWD`. SP path unchanged.
- `OneLakeNotebookIdentityCredentials` class for filesystem staging. Returns adlfs kwargs with `account_name`/`account_host` only, letting Fabric's registered `OnelakeFileSystem.make_credential()` provide notebook-user identity. Only works inside a Fabric notebook kernel.
- An early return in `FabricCopyFileLoadJob._ensure_fabric_token_initialized` when the staging SP secret is empty; previously this built a `ClientSecretCredential` from dummy fields and raised `ClientAuthenticationError` before any data moved.

Related Issues

- `fabric` destination (Fabric Warehouse + OneLake staging) #3865
- Stacked on the #3867 branch (`fix/3866-filesystem-trailing-slash-onelake`) for end-to-end OneLake flows. Will rebase on `devel` after "fix: strip trailing slash from FilesystemClient.dataset_path and get_table_dir" #3867 merges.

Additional Context

- Unit tests run with `pyodbc`, `azure-identity`, and `requests` stubbed in `sys.modules`.
- Pre-submission validation will run the real PR branch inside a live Fabric tenant with a two-run `scd2` smoke test; verification output will be attached before flipping out of draft.
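The stubbing mentioned above is the usual `sys.modules` trick; a minimal sketch of the idea, not the PR's conftest:

```python
import sys
import types

# Register lightweight stand-ins before the code under test is imported, so
# unit tests run without the real ODBC / Azure / HTTP dependencies installed.
# Real test code would also attach the specific attributes it touches
# (e.g. a fake pyodbc.connect) onto these stub modules.
for name in ("pyodbc", "azure", "azure.identity", "requests"):
    sys.modules.setdefault(name, types.ModuleType(name))
```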