fix(isolation): close rootfs automount reload race (#83) by andypost · Pull Request #84 · freeunitorg/freeunit

andypost · 2026-06-22T14:28:27Z

Summary

Fixes the rootfs automount reload race tracked in #83 — the root cause behind the flaky test_go_isolation_rootfs_automount_tmpfs[_regression] failure on go-1.25 (green on go-1.26 purely by timing; see PR #78 run 27793786107).

Stacking note: the substantive change is a single commit — 9b3f26cc fix(isolation): close rootfs automount reload race (40 lines in src/nxt_isolation.c). Because the pre-1.35.6 base does not yet carry the in-flight 1.35.6 hardening commits (incl. 03e6b0ee, the openat2(RESOLVE_BENEATH) work this fix sits on top of), the diff currently also shows those commits. Review the last commit. Once the hardening lands on pre-1.35.6, this collapses to the one-commit fix.

Root cause

For a rootfs app with namespaces.mount: true, the child sets up as:
unshare(CLONE_NEWNS) → nxt_isolation_prepare_rootfs() mounts <rootfs>/proc → only then pivot_root() severs propagation.

unshare(CLONE_NEWNS) copies the host mount tree and its propagation type; on a systemd host / (and /tmp, where the test rootfs lives) is MS_SHARED, so the new namespace starts as a peer of the host. During the mount loop the new <rootfs>/proc is still propagation-linked to the host — exactly while the previous worker generation's host-side teardown does umount2(<rootfs>/proc, MNT_DETACH) on the same path. They collide → mount(...proc...) ENOENT → prototype exits 1 → respawn storm → never converges (status=None) → ~53 leaked router fds.

The fix — two additive, `CLONE_NEWNS`-gated changes in `src/nxt_isolation.c`

nxt_isolation_prepare_rootfs() — make the new namespace root recursively private (mount("", "/", MS_REC|MS_PRIVATE)) before the mount loop (above the openat2 check). MS_PRIVATE, not MS_SLAVE: a slave still receives propagation inbound, so a host umount would propagate back in. Blocks both directions.
nxt_isolation_unmount_all() — when the worker had its own mount namespace, return early instead of host-side umount2. Those mounts are reaped with the namespace on exit; the host umount2 was the (22: Invalid argument) no-op that raced the next generation.

The non-namespaced (chroot) path is unchanged and still unmounts in the host namespace. The openat2 RESOLVE_BENEATH check is untouched.

Verification

Builds clean (make, no warnings; unitd links).
Authoritative check is the go-1.25 matrix job (the failing toolchain) — local repro isn't possible on go-1.26.
sudo pytest-3 test/test_go_isolation.py::test_go_isolation_rootfs_automount_tmpfs_regression
The transient mount(...proc...) (2: No such file or directory) alert should no longer appear, and router fds should stay bounded across the 100× reload loop.

Follow-up (separate commit, after this is green on go-1.25)

Remove skip_alert(_TMPFS_RELOAD_ALERT) in test/test_go_isolation.py (:413/:422) so the alert becomes a hard regression guard rather than being skipped.

Closes #83

🤖 Generated with Claude Code

gemini-code-assist

Code Review

This pull request introduces a comprehensive set of security hardenings and bug fixes across FreeUnit, focusing on bounds-checking, resource isolation, and memory safety. Key improvements include validating TLS private keys, tightening file-descriptor lifetimes, and adding robust bounds checks for WebSockets, shared memory, and language bindings. The review feedback identifies a prefix-matching bug in the rootfs validation within src/nxt_isolation.c that could lead to false positives, a critical integer underflow vulnerability in src/nxt_unit.c when checking sptr bounds with small buffer sizes, and a recommendation to use NULL instead of empty strings in mount calls for better compatibility.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b3f26cc97

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

An app with rootfs + namespaces.mount + automount under a rapid config reload loop hits a race: the new prototype mounts <rootfs>/proc while the previous worker generation's host-side teardown does umount2(<rootfs>/proc, MNT_DETACH) on the same path, with no serialization. The loser gets mount(...proc...) ENOENT, the prototype exits 1, a respawn storm follows, the generation never converges and dangling app<->router fds trip the fd-leak teardown check. The Go toolchain only shifts the timing (surfaces on go-1.25, hidden on go-1.26). Root cause: unshare(CLONE_NEWNS) copies the host mount tree and its propagation type; on a systemd host "/" (and "/tmp") is MS_SHARED, so the new namespace starts as a peer of the host. pivot_root() only severs propagation after nxt_isolation_prepare_rootfs() has already mounted proc, so during the mount loop the new proc mount is still propagation-linked to the host exactly while the old generation umount2()s the same path. Two additive, CLONE_NEWNS-gated changes: * nxt_isolation_prepare_rootfs(): make the new namespace root recursively private (mount("", "/", MS_REC|MS_PRIVATE)) before the mount loop. MS_PRIVATE (not MS_SLAVE) is required: a slave still receives propagation inbound, so a host umount would propagate back in. * nxt_isolation_unmount_all(): when the worker had its own mount namespace, return early instead of host-side umount2 — those mounts are reaped with the namespace on exit; the host umount2 was the EINVAL no-op that raced the next generation. The non-namespaced (chroot) path is unchanged and still unmounts in the host namespace. Refs nginx#83 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

andypost · 2026-06-22T15:31:01Z

@claude review once

claude · 2026-06-22T15:31:18Z

Claude finished @andypost's task in 2m 11s —— View job

Code Review

Gather context and read the diff
Analyze isolation/mount changes
Review against existing review comments
Post final review

The two-part fix is logically sound and correctly targets the race. Here's a detailed look at each change.

Change 1 — `nxt_isolation_unmount_all()`: early return on NEWNS (`src/nxt_isolation.c:760-774`)

Correct. When a worker exits with CLONE_NEWNS, the kernel tears down its private mount namespace — including every mount inside it — atomically on process exit. Calling umount2(<rootfs>/proc, MNT_DETACH) from the host namespace after that point returns EINVAL because those paths are simply not mounted in the host (hence the (22: Invalid argument) log lines mentioned in the PR). Worse, doing so races the next-generation prototype that is simultaneously mounting the same path. The early return is the right fix.

One thing worth verifying: nxt_isolation_unmount_all is only ever invoked in the parent/router context (via process->isolation.cleanup, set at :602), never from within the child's namespace. From reading the code that appears to be the case, so the guard is placed correctly.

Change 2 — `nxt_isolation_prepare_rootfs()`: `MS_REC|MS_PRIVATE` before mount loop (`src/nxt_isolation.c:810-832`)

Correct and well-timed. The race window is: unshare(CLONE_NEWNS) inherits MS_SHARED propagation from the host → mount loop begins → pivot_root severs propagation. The fix closes this window by making the namespace private before the loop, so no mount event can propagate out to the host peer group or be received from it.

One notable interaction with nxt_isolation_pivot_root (:918): that function opens with:

mount("", "/", "", MS_SLAVE|MS_REC, "")

Once the namespace is already MS_PRIVATE, this call has no master peer to slave to, making it a no-op for propagation purposes (Linux silently accepts it; the mount stays PRIVATE). This is harmless — PRIVATE is strictly stronger than SLAVE — but the MS_SLAVE call in pivot_root is now dead code in the NEWNS path. That function is outside this PR's scope, so just noting it as a follow-up cleanup opportunity.

Similarly, the nxt_isolation_make_private_mount() call inside pivot_root (which reads /proc/self/mounts and privates specific ancestor mounts) is now fully redundant — everything is already PRIVATE. Again harmless, just extra work.

On `""` vs `NULL` in the `mount()` call (`:827`)

mount("", "/", "", MS_REC | MS_PRIVATE, "")

The kernel ignores source, filesystemtype, and data for propagation-only remounts, so both "" and NULL work. The author's response to the Gemini comment is correct: this matches the pre-existing pattern at line 918 (MS_SLAVE|MS_REC). The minor inconsistency with line 946 (MS_SLAVE | MS_REC, NULL) already existed before this PR. Not a blocker.

Comments / documentation

The block comments are unusually thorough and accurate, explaining the MS_SHARED systemd host scenario, the PRIVATE-vs-SLAVE distinction, and the race mechanism. Good for future readers.

Summary

Both changes are correct, complementary, and minimal. No issues that would block merge. The main follow-up items (removing skip_alert(_TMPFS_RELOAD_ALERT) in the test, and eventual cleanup of the now-redundant MS_SLAVE in nxt_isolation_pivot_root) are already tracked in the PR description. The authoritative signal is the go-1.25 CI matrix job as called out in the PR.

andypost · 2026-06-22T15:56:12Z

Folded into the 1.35.6 hardening PR #85 as 75e3b77f — the rootfs automount reload-race fix now ships as part of that stack so it lands together (it was green on go-1.25 standalone here). Closing this PR as subsumed; the fix itself is unchanged.

andypost · 2026-06-22T16:05:22Z

Correction: this PR is not subsumed — disregard the earlier "folded into #85" note. Since #84 is approved (thanks @TuzelKO) and focused/green, it stays the vehicle for the #83 fix. I reverted the fold: #85 was force-pushed back to the pure 14-commit hardening stack (the 75e3b77f cherry-pick is dropped), so this PR is again the standalone fix for #83 and can merge on its own. #85's go-1.25 job will stay red until this lands on pre-1.35.6; I'll rebase #85 afterward.

Add a 1.35.6 Bugfix entry for the rootfs automount reload race fixed in this PR (freeunitorg#83). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

andypost · 2026-06-23T17:17:57Z

/gemini review

gemini-code-assist

Code Review

This pull request addresses a rootfs automount reload race condition. It modifies nxt_isolation_unmount_all to skip unmounting when CLONE_NEWNS is active, since the kernel automatically reaps those mounts upon namespace termination. It also updates nxt_isolation_prepare_rootfs to detach the namespace from the host peer group using MS_REC | MS_PRIVATE before mounting, preventing propagation races. There are no review comments, and I have no additional feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread src/nxt_isolation.c Outdated

Comment thread src/nxt_unit.c Outdated

Comment thread src/nxt_isolation.c

chatgpt-codex-connector Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread src/nxt_file.c Outdated

andypost force-pushed the fix/83-rootfs-automount-reload-race branch from 9b3f26c to 129a60a Compare June 22, 2026 14:42

andypost mentioned this pull request Jun 22, 2026

docs(security): add 14-vector audit + remediation tracker andypost/unit#10

Open

2 tasks

andypost requested a review from TuzelKO June 22, 2026 15:30

andypost mentioned this pull request Jun 22, 2026

fix(security): 1.35.6 hardening stack (14-vector audit remediation) #85

Draft

TuzelKO approved these changes Jun 22, 2026

View reviewed changes

andypost closed this Jun 22, 2026

andypost deleted the fix/83-rootfs-automount-reload-race branch June 22, 2026 16:00

andypost restored the fix/83-rootfs-automount-reload-race branch June 22, 2026 16:00

andypost reopened this Jun 22, 2026

andypost marked this pull request as draft June 22, 2026 16:02

andypost marked this pull request as ready for review June 22, 2026 16:02

docs(changes): record rootfs automount reload-race fix

7bd4f6f

Add a 1.35.6 Bugfix entry for the rootfs automount reload race fixed in this PR (freeunitorg#83). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

TuzelKO mentioned this pull request Jun 22, 2026

Rootfs automount reload race: umount2(<rootfs>/proc) in host ns vs new prototype's mount() (flaky test_go_isolation_rootfs_automount_tmpfs) #83

Closed

gemini-code-assist Bot reviewed Jun 23, 2026

View reviewed changes

andypost merged commit eb0801f into freeunitorg:pre-1.35.6 Jun 24, 2026
21 checks passed

andypost deleted the fix/83-rootfs-automount-reload-race branch June 24, 2026 04:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(isolation): close rootfs automount reload race (#83)#84

fix(isolation): close rootfs automount reload race (#83)#84
andypost merged 2 commits into
freeunitorg:pre-1.35.6from
andypost:fix/83-rootfs-automount-reload-race

andypost commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

andypost commented Jun 22, 2026

Uh oh!

claude Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

andypost commented Jun 22, 2026

Uh oh!

andypost commented Jun 22, 2026

Uh oh!

andypost commented Jun 23, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

andypost commented Jun 22, 2026

Summary

Root cause

The fix — two additive, CLONE_NEWNS-gated changes in src/nxt_isolation.c

Verification

Follow-up (separate commit, after this is green on go-1.25)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

andypost commented Jun 22, 2026

Uh oh!

claude Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Change 1 — nxt_isolation_unmount_all(): early return on NEWNS (src/nxt_isolation.c:760-774)

Change 2 — nxt_isolation_prepare_rootfs(): MS_REC|MS_PRIVATE before mount loop (src/nxt_isolation.c:810-832)

On "" vs NULL in the mount() call (:827)

Comments / documentation

Summary

Uh oh!

andypost commented Jun 22, 2026

Uh oh!

andypost commented Jun 22, 2026

Uh oh!

andypost commented Jun 23, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

The fix — two additive, `CLONE_NEWNS`-gated changes in `src/nxt_isolation.c`

claude Bot commented Jun 22, 2026 •

edited

Loading

Change 1 — `nxt_isolation_unmount_all()`: early return on NEWNS (`src/nxt_isolation.c:760-774`)

Change 2 — `nxt_isolation_prepare_rootfs()`: `MS_REC|MS_PRIVATE` before mount loop (`src/nxt_isolation.c:810-832`)

On `""` vs `NULL` in the `mount()` call (`:827`)