Write Client API config files with owner-only permissions #4855
EP-1 (Wave 0) of the Client API Execution Modes program (see docs/design/client_api_execution_modes_plan.md on PR NVIDIA#4853). client_api_config.json embeds live AUTH_TOKEN/AUTH_TOKEN_SIGNATURE and was written with the default umask, which makes it world-readable on most systems. ClientConfig.to_json (the single choke point for every writer: write_config_to_file, ClientAPILauncherExecutor.prepare_config_for_launch, ExternalConfigurator) now creates config files with mode 0600 via os.open and tightens pre-existing files with chmod before rewriting them. chmod failures degrade to a warning so that platforms with limited permission support (Windows) never crash.
Behavior note: deployments that relied on a different OS user reading client_api_config.json will need explicit permission management. This is by design, per the bootstrap-config protection contract (design doc Appendix B).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
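The umask point can be checked directly. The sketch below assumes a POSIX system and sets the common default umask of 022 explicitly. The helper name creation_modes_under_umask_022 is hypothetical and not part of NVFlare. A plain open() creates the file 0644, so every local user can read it. os.open() with an explicit 0o600 mode produces an owner-only file, because the umask can only clear bits.

```python
import os
import stat
import tempfile


def creation_modes_under_umask_022() -> tuple:
    """Return (mode via open(), mode via os.open(..., 0o600)) for freshly created files."""
    old_umask = os.umask(0o022)  # process-wide; acceptable in a demo, not in threaded code
    try:
        with tempfile.TemporaryDirectory() as d:
            default_path = os.path.join(d, "default.json")
            with open(default_path, "w") as f:  # requests 0o666; the umask clears 0o022
                f.write("{}")

            private_path = os.path.join(d, "private.json")
            fd = os.open(private_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            with os.fdopen(fd, "w") as f:  # the file object now owns fd and closes it
                f.write("{}")

            return (
                stat.S_IMODE(os.stat(default_path).st_mode),
                stat.S_IMODE(os.stat(private_path).st_mode),
            )
    finally:
        os.umask(old_umask)


if os.name == "posix":
    default_mode, private_mode = creation_modes_under_umask_022()
    print(oct(default_mode), oct(private_mode))  # 0o644 0o600
```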
Pull request overview
This PR hardens writing of the Client API config (client_api_config.json) to better protect embedded live auth material (e.g., AUTH_TOKEN/AUTH_TOKEN_SIGNATURE) by enforcing owner-only file permissions on POSIX, adding symlink protection, and documenting operational implications for 3rd-party integrations.
Changes:
- Enforce POSIX owner-only permissions (0600) when persisting the Client API config via ClientConfig.to_json, and reject symlink paths using O_NOFOLLOW when available.
- Add unit tests covering fresh writes, tightening of pre-existing permissions, fail-closed behavior when permissions cannot be set, and symlink rejection.
- Document the same-OS-user requirement (or explicit re-permissioning) for externally started trainers on POSIX.
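Here is a sketch of the O_NOFOLLOW-plus-fchmod open described in the list above. open_owner_only and demo_tighten_and_reject_symlink are hypothetical helpers, not the PR's code. The ftruncate-after-fchmod ordering is this sketch's own choice, so a failed fchmod never empties the existing file. The sketch shows the two behaviors the list relies on:
- The 0o600 mode passed with O_CREAT leaves a pre-existing 0644 file unchanged, so an explicit fchmod is needed.
- O_NOFOLLOW makes the open fail when the final path component is a symlink.

```python
import os
import stat
import tempfile


def open_owner_only(path: str) -> int:
    """Hypothetical helper: open `path` for writing as an owner-only file.

    Refuses a symlink at the final path component (where O_NOFOLLOW exists) and
    forces mode 0o600 even when the file already existed.
    """
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags, 0o600)  # this 0o600 only applies if the file is created here
    try:
        if os.name == "posix":
            os.fchmod(fd, 0o600)  # also tightens a pre-existing 0o644 file
        os.ftruncate(fd, 0)  # drop old content only after the mode is secured
    except BaseException:
        os.close(fd)
        raise
    return fd


def demo_tighten_and_reject_symlink() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "client_api_config.json")
        with open(target, "w") as f:
            f.write("stale content from an older writer")
        os.chmod(target, 0o644)  # simulate a world-readable leftover

        with os.fdopen(open_owner_only(target), "w") as f:
            f.write('{"AUTH_TOKEN": "t"}')
        mode = stat.S_IMODE(os.stat(target).st_mode)

        planted = os.path.join(d, "planted.json")
        os.symlink(target, planted)
        try:
            os.close(open_owner_only(planted))
            symlink_rejected = False
        except OSError:  # typically ELOOP on Linux
            symlink_rejected = True
        return mode, symlink_rejected


if os.name == "posix":
    mode, rejected = demo_tighten_and_reject_symlink()
    print(oct(mode), rejected)  # 0o600 True
```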
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| nvflare/client/config.py | Implements secure config writing (owner-only on POSIX + O_NOFOLLOW) as the central write path via ClientConfig.to_json. |
| tests/unit_test/client/config_test.py | Adds focused unit tests validating permission behavior, fail-closed semantics, and symlink rejection. |
| docs/programming_guide/execution_api_type/3rd_party_integration.rst | Documents the operational impact of owner-only config permissions for external trainers. |
Greptile Summary
This PR fixes a live credential exposure by rewriting
Confidence Score: 5/5
Safe to merge: the atomic write and permission-hardening logic is correct, the fail-closed contract holds, and the test suite thoroughly exercises the key paths.
The core change is a well-scoped security fix:
No files require special attention: all three changed files are straightforward and correctly implemented.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[to_json called] --> B["mkstemp(dir=config_dir)\ncreates temp file, returns fd"]
B --> C{os.name == posix?}
C -- Yes --> D["fchmod(fd, 0o600)"]
C -- No --> E["skip fchmod\n(Windows: rely on dir ACLs)"]
D -->|fchmod raises| F[except BaseException]
D --> G
E --> G["fdopen(fd, 'w')\nfd_owned = False"]
G --> H["json.dump(config, f)"]
H -->|write raises| F
H --> I["os.replace(tmp_path, config_file)\natomic rename"]
I -->|replace raises| F
I --> J[Done — config_file is\nowner-only regular file]
F --> K["close fd if still owned\n(fd_owned == True)"]
K --> L["os.remove(tmp_path)\nbest-effort cleanup"]
L --> M["raise — original\nconfig_file untouched"]
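The flowchart maps onto a short Python function. This is an illustrative sketch, not the code merged in nvflare/client/config.py. The name write_owner_only_json and the temp-file prefix are assumptions. tempfile.mkstemp already creates its file readable and writable only by the creating user ID. The explicit fchmod states that guarantee outright instead of relying on a library detail.

```python
import json
import os
import tempfile
from typing import Any, Dict


def write_owner_only_json(config: Dict[str, Any], config_file: str) -> None:
    """Hypothetical helper following the flowchart: temp file, fchmod, dump, os.replace."""
    config_dir = os.path.dirname(os.path.abspath(config_file))
    # Temp file in the same directory, so os.replace() is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=config_dir, prefix=".client_api_config.", suffix=".tmp")
    fd_owned = True
    try:
        if os.name == "posix":
            os.fchmod(fd, 0o600)  # secure the file before any content is written
        f = os.fdopen(fd, "w")
        fd_owned = False  # the file object now owns fd and closes it
        with f:
            json.dump(config, f)
        # Atomic rename: replaces a symlink entry itself, never its target.
        os.replace(tmp_path, config_file)
    except BaseException:
        if fd_owned:
            os.close(fd)
        try:
            os.remove(tmp_path)  # best-effort cleanup; config_file was never touched
        except OSError:
            pass
        raise
```

os.replace() swaps the directory entry, so a symlink planted at config_file is replaced by the new regular file instead of being followed. A failure at any earlier step leaves the original config_file exactly as it was.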
Reviews (4): Last reviewed commit: "Merge branch 'main' into yuantingh/clien..."
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4855 +/- ##
=======================================
Coverage 56.97% 56.97%
=======================================
Files 969 969
Lines 92262 92281 +19
=======================================
+ Hits 52563 52577 +14
- Misses 39699 39704 +5
Flags with carried forward coverage won't be shown.
Context for reviewers: this PR is defense-in-depth, not a complete fix for AUTH_TOKEN exposure. The same credential is also leaked via the CJ command line (
…lure
Review fix (PR NVIDIA#4855): the previous approach opened the target with O_TRUNC before fchmod. On the fail-closed path (fchmod denied), the original file had therefore already been truncated to empty, which is data loss even though no secret was written. The writer now creates a sibling temp file, secures it with fchmod(0600) before writing any content, and then os.replace()s it into place. On failure the original file is never touched (no truncation). A symlink planted at the config path is replaced by a regular owner-only file rather than written through to its target.
Tests are updated to assert that the original content survives a failed write and that a symlink target is never written through.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
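The data loss described in this commit comes down to ordering. The sketch below simulates a denied fchmod and compares what survives in the original file under each ordering. SimulatedChmodDenied and both writer functions are hypothetical stand-ins, not NVFlare code. For simplicity they raise at the point where a real os.fchmod call would fail, instead of calling it.

```python
import os
import tempfile


class SimulatedChmodDenied(OSError):
    """Stand-in for fchmod() failing, e.g. on a filesystem that rejects chmod."""


def truncate_first_write(path: str, text: str, fail_chmod: bool) -> None:
    # Previous ordering: O_TRUNC empties the file before permissions are secured.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        if fail_chmod:
            raise SimulatedChmodDenied("fchmod denied")  # where os.fchmod(fd, 0o600) would fail
        os.write(fd, text.encode())
    finally:
        os.close(fd)


def temp_then_replace_write(path: str, text: str, fail_chmod: bool) -> None:
    # Revised ordering: every step that can fail happens on a sibling temp file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        try:
            if fail_chmod:
                raise SimulatedChmodDenied("fchmod denied")
            os.write(fd, text.encode())
        finally:
            os.close(fd)
        os.replace(tmp_path, path)  # only reached when everything above succeeded
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def content_after_denied_chmod(writer) -> str:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "client_api_config.json")
        with open(path, "w") as f:
            f.write('{"old": true}')
        try:
            writer(path, '{"AUTH_TOKEN": "secret"}', fail_chmod=True)
        except SimulatedChmodDenied:
            pass
        with open(path) as f:
            return f.read()


print(repr(content_after_denied_chmod(truncate_first_write)))     # ''
print(repr(content_after_denied_chmod(temp_then_replace_write)))  # '{"old": true}'
```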
What
Write the Client API config (client_api_config.json) owner-only on POSIX. It embeds live AUTH_TOKEN/AUTH_TOKEN_SIGNATURE and was written with the default umask, so it was world-readable on most systems. ClientConfig.to_json (the single choke point for every writer: write_config_to_file, ClientAPILauncherExecutor.prepare_config_for_launch, ExternalConfigurator) now:
- opens the file with O_NOFOLLOW where available, rejecting a planted symlink at the config path;
- fchmods the open descriptor to 0600, which applies whether the file is newly created or pre-existing (an O_CREAT mode only takes effect on creation);
- on Windows, skips fchmod, and this is documented honestly: protection there relies on directory ACLs.
Program context
Client API Execution Modes (2.9) — EP-1, Wave 0 of the plan.
Design: docs/design/client_api_execution_modes.md § "Appendix B — Bootstrap config protection"
Plan: docs/design/client_api_execution_modes_plan.md (PR #4853)
Depends on: none · Unblocks: TE-2 (bootstrap config writer reuses this). Fixes a live exposure on today's external_process path independent of the rest of the program.
Behavior note (release-relevant)
Deployments where an externally started trainer runs as a different OS user than the FL client must now either run the trainer as the same user or explicitly re-permission the config. This is by design, per the bootstrap-config protection contract. The 3rd-party integration doc is updated with this note.
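For the different-user case, one way to re-permission is a shared group. This sketch assumes POSIX, that the caller owns the config file and belongs to the target group, and that the trainer's account is in that group. grant_group_read, others_can_read and demo_group_read are hypothetical names.

Keep the following in mind:
- This deliberately widens access to the live token, so the group should contain only the trainer's account.
- The trainer also needs search (execute) permission on the containing directory.
- The step has to be repeated after every rewrite, because the writer replaces the file with a fresh owner-only copy.

Running the trainer as the same OS user avoids all of this.

```python
import os
import stat
import tempfile


def grant_group_read(config_file: str, gid: int) -> None:
    """Hypothetical operator step: let members of group `gid` read the config.

    Assumes POSIX, that the caller owns the file, and that the caller belongs to `gid`.
    Must be re-applied after every rewrite of the config.
    """
    os.chown(config_file, -1, gid)  # -1 keeps the owning user unchanged
    os.chmod(config_file, 0o640)  # owner rw, group r, others nothing


def others_can_read(config_file: str) -> bool:
    return bool(os.stat(config_file).st_mode & stat.S_IROTH)


def demo_group_read() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "client_api_config.json")
        # Create an owner-only file, as the writer leaves it.
        os.close(os.open(path, os.O_WRONLY | os.O_CREAT, 0o600))
        grant_group_read(path, os.getgid())  # own primary group, just for the demo
        return stat.S_IMODE(os.stat(path).st_mode), others_can_read(path)


if os.name == "posix":
    mode, world_readable = demo_group_read()
    print(oct(mode), world_readable)  # 0o640 False
```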
Testing
New config_test.py cases: fresh write is 0600; pre-existing 0644 file is tightened to 0600; fail-closed when fchmod is denied (token not written); symlink target rejected and left untouched. POSIX-only assertions are guarded with skipif. 7 new and 377 regression tests pass; style clean.
🤖 Generated with Claude Code
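The fresh-write and fail-closed checks above can be written with the standard library alone. Here unittest.skipUnless plays the role of pytest's skipif. write_secret is a minimal hypothetical stand-in for the writer under test. Patching os.fchmod to raise PermissionError is one assumed way to simulate a denied chmod; config_test.py may do it differently.

```python
import os
import stat
import tempfile
import unittest
from unittest import mock


def write_secret(path: str, text: str) -> None:
    """Hypothetical stand-in for the writer under test (POSIX only: calls os.fchmod)."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        try:
            os.fchmod(fd, 0o600)
            os.write(fd, text.encode())
        finally:
            os.close(fd)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


@unittest.skipUnless(os.name == "posix", "permission bits are only asserted on POSIX")
class OwnerOnlyWriteTest(unittest.TestCase):
    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.dir = tmp.name
        self.path = os.path.join(self.dir, "client_api_config.json")

    def test_fresh_write_is_0600(self):
        write_secret(self.path, '{"AUTH_TOKEN": "t"}')
        self.assertEqual(stat.S_IMODE(os.stat(self.path).st_mode), 0o600)

    def test_fail_closed_when_fchmod_denied(self):
        with open(self.path, "w") as f:
            f.write("original")
        with mock.patch.object(os, "fchmod", side_effect=PermissionError("denied")):
            with self.assertRaises(PermissionError):
                write_secret(self.path, '{"AUTH_TOKEN": "t"}')
        with open(self.path) as f:
            self.assertEqual(f.read(), "original")  # token never reached the config path
        # The temp file was cleaned up, so only the config remains.
        self.assertEqual(os.listdir(self.dir), ["client_api_config.json"])
```

Both python -m unittest and pytest collect a TestCase like this one.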