fix(port): plug mp-pool retain and fd/buffer leaks in IPC reply paths#56
Conversation
There was a problem hiding this comment.
Code Review
This pull request fixes memory pool leaks and file descriptor/buffer leaks in the certificate and script store IPC paths, as well as in various reply paths within the main process. These leaks occurred when port message allocation or socket writes failed. The review feedback highlights a need to adjust the cleanup logic in src/nxt_main_process.c to ensure the correct order of resource release and to use the work queue for buffer completion handlers to maintain consistency with the rest of the codebase.
5a9f37d to
c88ede1
Compare
|
Full audit could be found andypost#10 and andypost#9 |
|
Consider adding regression / fault-injection coverage for these paths before merge (or as a follow-up PR). This class of bugs is difficult to catch with normal integration testing since it only manifests under allocator failure or RPC stream exhaustion conditions. Suggested coverage:
Implementation-wise, this could likely start with targeted Not a merge blocker, but without targeted fault-injection coverage future port-layer refactors could accidentally reintroduce these leaks. |
|
Follow-up coverage plan for the allocator/RPC exhaustion paths mentioned in #56 (comment):
I’m treating this as follow-up coverage rather than a merge blocker for 1.35.5, and I’ll start it as a dedicated PR. |
Closes the last gap in the phpclub coverage request on freeunitorg#56: > verification that mp refcounts return to baseline after error paths The cert/script-store sender fix moved nxt_mp_retain(mp) to AFTER the successful nxt_port_socket_write() call so a pre-queue failure does not leave the temp config pool with an unbalanced retain. This commit pins that pattern down so a future refactor that re-introduces "retain-before-send" in a new caller (or accidentally undoes the fix) breaks the suite. * Adds nxt_mp_test_retain_count(mp) in src/nxt_mp.c — a tiny NXT_TESTS-gated accessor for the internal retain counter (private to nxt_mp.c otherwise). * Adds nxt_port_fail_test_mp_baseline sub-test that: - creates a transient mp (retain == 1 baseline); - invokes nxt_port_fail_test_sender_pattern() — a deliberate mirror of the audit-fixed nxt_cert_store_get / nxt_script_store_get shape: allocate buf in mp, attempt write, retain only on success; - forces nxt_port_msg_alloc() to fail so the write returns NXT_ERROR before the message reaches the port queue; - asserts retain count == baseline (no retain was bumped); - asserts the completion handler did not run (buf never entered the queue). * Sanity-checked the regression-detection direction: temporarily moving the retain to before nxt_port_socket_write() inside nxt_port_fail_test_sender_pattern() makes the new sub-test fail with the expected message ("retain before 1, after 2"). phpclub's 5-bullet list is now covered: 1. forced nxt_port_msg_alloc() failure paths — commit 1 2. simulated stream-id pool exhaustion — commit 1 3. mp refcounts return to baseline after error — THIS commit 4. fd leak checks via /proc/self/fd — commit 1 5. completion handlers execute on enqueue failure — commit 2 Validation: ./configure --tests && make -j2 tests && build/tests — `port failure test passed` line still emitted; all four sub-tests green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Coverage map for your review checklist (#56 (comment)), now landed in stacked PR #57:
All four sub-tests pass locally ( |
|
@claude review |
|
@claude review once |
Some upstreams (Gitea, strict HTTP/1.1 backends) reject
Transfer-Encoding: chunked and require Content-Length on
forwarded requests. Buffer the chunked body, compute its
length, and emit Content-Length while skipping the original
TE field via field->skip.
Also fix a buffer-stall in nxt_h1p_conn_request_body_read:
nxt_http_chunk_parse leaves b->mem.pos unchanged on the
CHUNK_MIDDLE path, so the next nxt_conn_read had zero space
and the connection hung on bodies larger than body_buffer_size.
Reset/compact the buffer between continuations.
Test infrastructure:
- fake_upstream: Rust HTTP mock with requires-cl, no-te,
strict, echo modes for deterministic backend-behavior tests
- run-local-temp.sh: fast dev runner (direct mount, no rsync)
- run-local.sh: auto-enable clang-ast on C changes,
build fake_upstream into image
Closes nginx#58
Refs nginx#445, nginx#1088, nginx#1278
Idiomatic upper bound from header buffer end instead of pre-padded p + NXT_OFF_T_LEN. Same safety, less coupling. Refs nginx#58 test(fake_upstream): handle RFC 9112 chunk extensions Strip chunk-ext (5;ext=val) before hex parse so a future client emitting extensions does not hit unwrap_or(0). build(test): split clang-ast into run-local-full.sh run-local.sh --clang-ast was broken: Xclang plugin loaded during ./configure feature-detection trips "no atomic operations found" before make even starts. Move clang-ast to a dedicated runner with its own image (freeunit-test-full:local, debian:testing for clang 21). run-local.sh now focuses on the test fast-path only and auto-prefixes bare pytest node-ids with test/. run-local-temp.sh dropped — same fast-path is in run-local.sh. TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD.
The 4 fake_upstream-dependent tests in test_proxy_chunked.py failed in CI with FileNotFoundError: '/usr/local/bin/fake_upstream' — the Rust mock binary was never built. Install rust + cargo build for the unit and python-3.x matrix jobs that actually run these tests. Also add a module-level skipif marker so local runs without a Rust toolchain skip the 4 affected tests instead of failing hard. Idiomatic upper bound from header buffer end instead of pre-padded p + NXT_OFF_T_LEN. Same safety, less coupling. Refs nginx#58 test(fake_upstream): handle RFC 9112 chunk extensions Strip chunk-ext (5;ext=val) before hex parse so a future client emitting extensions does not hit unwrap_or(0). build(test): split clang-ast into run-local-full.sh run-local.sh --clang-ast was broken: Xclang plugin loaded during ./configure feature-detection trips "no atomic operations found" before make even starts. Move clang-ast to a dedicated runner with its own image (freeunit-test-full:local, debian:testing for clang 21). run-local.sh now focuses on the test fast-path only and auto-prefixes bare pytest node-ids with test/. run-local-temp.sh dropped — same fast-path is in run-local.sh. TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD.
|
This is a well-structured reliability fix focused on IPC failure-path correctness and resource ownership semantics. What I particularly like in this PR:
The patch is narrowly scoped and does not alter the normal success-path behavior, which keeps regression risk relatively low. One thing worth double-checking before merge: nxt_work_queue_add(&task->thread->engine->fast_work_queue,
out->completion_handler, task, out,
out->parent);This assumes nxt_assert(out->completion_handler != NULL);I would also consider a follow-up audit of other
Overall this looks like a solid maintainer-grade cleanup of several subtle leak paths. |
Large chunked requests (>16KB) can span multiple buffers. Previous code only counted first buffer in r->body chain → upstream received truncated body with wrong Content-Length. Iterate over full chain matching pattern in nxt_router.c:5709. Refs nginx#58
Reply to phpclub on #56 (comment 4472837908)Paste body into the PR review thread via the GitHub web UI. Thanks for the review. On On the audit of
|
|
squashin commits |
The cert/script-store IPC pattern and several main-process reply paths have latent leaks reachable when nxt_port_msg_alloc() (or the RPC stream-id pool) hits malloc failure inside the port machinery. Sender side (nxt_cert_store_get, nxt_script_store_get): nxt_mp_retain(mp) was issued before nxt_port_socket_write(), so any failure path between the retain and a successful send left the pool with a refcount that the buffer's completion handler (which is what invokes nxt_mp_release) could never run. Moved the retain to after the buffer is handed off to the port machinery so failure paths above no longer pin the pool. Reply / main-process side: cert_store_get_handler, script_store_get_ handler, main_port_socket_handler, main_port_access_log_handler, and nxt_controller_conf_store all called nxt_port_socket_write() with an owned fd (or a buffer in the engine memory pool) and either ignored the return value with a (void) cast or skipped the failure branch entirely. On non-OK return the port layer never takes ownership, so the fd and the buffer's completion handler were both leaked in the privileged main process. Each site now closes the fd explicitly and (where applicable) queues the buffer completion onto the engine fast work queue so the engine pool reclaims memory. Use nxt_fd_close() rather than nxt_file_close() on the cert/script error paths: file.name has already been freed and the latter would dereference it through "%FN" on a close-failure log path. Also documents the ownership contract over nxt_port_socket_write2() in src/nxt_port.h: on NXT_OK, ownership of fd, fd2, and b transfers to the port layer; on any other return, the caller retains ownership and is responsible for closing fd/fd2 and dispatching b's completion handler. Addresses phpclub's review ask on freeunitorg#56. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
df6b00e to
71fcd89
Compare
Adds a fault-injection harness under src/test/nxt_port_fail_test.c covering the audit-fixed IPC paths from freeunitorg#56 and addressing phpclub's 5-bullet coverage request. Sub-tests: * nxt_port_fail_test_socket_write — forces nxt_port_msg_alloc() failure with fd != -1 and a non-sync buf; asserts NXT_ERROR, fd stays open, completion is not invoked, and /proc/self/fd balanced (Linux only; skipped elsewhere). * nxt_port_fail_test_rpc_register — forces both nxt_port_rpc_register_handler_ex alloc failure and the lvlhsh-insert failure path; asserts port->use_count == 1 and rpc_streams empty. * nxt_port_fail_test_error_handler — manually queues a send_msg with a closeable fd and a non-sync buffer, runs the static nxt_port_error_handler() via an NXT_TESTS-gated trampoline, drains the fast work queue, and asserts the buffer completion was enqueued (not called synchronously) and runs exactly once. Closes the queued-then-failed shape that PR nginx#56's audit fix reordered (close fd first, queue buffer completion via the fast work queue second — matching nxt_port_error_handler's existing discipline at src/nxt_port_socket.c:1378-1396). * nxt_port_fail_test_mp_baseline — mirrors the audit-fixed sender shape (allocate buf in mp, send, retain only on success), forces a pre-queue failure, and asserts mp->retain is unchanged. Sanity-checked: moving the retain back to before the write makes the test report "retain before 1, after 2", so the regression signal is intact. Public NXT_TESTS-gated trampolines added for the harness: * nxt_port_test_msg_alloc_failures() + nxt_port_test_run_error_ handler() in src/nxt_port_socket.c. * nxt_port_test_rpc_register_failures() in src/nxt_port_rpc.c. * nxt_mp_test_retain_count() in src/nxt_mp.c (exposes the otherwise-private retain counter for the baseline assertion). Per Gemini's PR nginx#57 review at src/test/nxt_port_fail_test.c:79, test buffers are heap-owned from a transient mp rather than stack-allocated, matching how production callers wire them (nxt_cert_store_get allocates from the temp_conf mp — the very pattern this suite exercises). Each sub-test's failure ladder is anchored on ownership-tracking variables so any goto-fail releases everything in reverse order. phpclub's 5-bullet list is now covered: 1. forced nxt_port_msg_alloc() failure paths — socket_write 2. simulated stream-id pool exhaustion — rpc_register 3. mp refcounts return to baseline after error — mp_baseline 4. fd leak checks via /proc/self/fd — socket_write 5. completion handlers execute on enqueue failure — error_handler Validation: ./configure --tests && make tests -j && build/tests — all four sub-tests green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a fault-injection harness under src/test/nxt_port_fail_test.c covering the audit-fixed IPC paths from freeunitorg#56 and addressing phpclub's 5-bullet coverage request. Sub-tests: * nxt_port_fail_test_socket_write — forces nxt_port_msg_alloc() failure with fd != -1 and a non-sync buf; asserts NXT_ERROR, fd stays open, completion is not invoked, and /proc/self/fd balanced (Linux only; skipped elsewhere). * nxt_port_fail_test_rpc_register — forces both nxt_port_rpc_register_handler_ex alloc failure and the lvlhsh-insert failure path; asserts port->use_count == 1 and rpc_streams empty. * nxt_port_fail_test_error_handler — manually queues a send_msg with a closeable fd and a non-sync buffer, runs the static nxt_port_error_handler() via an NXT_TESTS-gated trampoline, drains the fast work queue, and asserts the buffer completion was enqueued (not called synchronously) and runs exactly once. Closes the queued-then-failed shape that PR nginx#56's audit fix reordered (close fd first, queue buffer completion via the fast work queue second — matching nxt_port_error_handler's existing discipline at src/nxt_port_socket.c:1378-1396). * nxt_port_fail_test_mp_baseline — mirrors the audit-fixed sender shape (allocate buf in mp, send, retain only on success), forces a pre-queue failure, and asserts mp->retain is unchanged. Sanity-checked: moving the retain back to before the write makes the test report "retain before 1, after 2", so the regression signal is intact. Public NXT_TESTS-gated trampolines added for the harness: * nxt_port_test_msg_alloc_failures() + nxt_port_test_run_error_ handler() in src/nxt_port_socket.c. * nxt_port_rpc_test_alloc_failures() / nxt_port_rpc_test_insert_failures() in src/nxt_port_rpc.c. * nxt_mp_test_retain_count() in src/nxt_mp.c (exposes the otherwise-private retain counter for the baseline assertion). Per Gemini's PR nginx#57 review at src/test/nxt_port_fail_test.c:79, test buffers are heap-owned from a transient mp rather than stack-allocated, matching how production callers wire them (nxt_cert_store_get allocates from the temp_conf mp — the very pattern this suite exercises). Each sub-test's failure ladder is anchored on ownership-tracking variables so any goto-fail releases everything in reverse order. phpclub's 5-bullet list is now covered: 1. forced nxt_port_msg_alloc() failure paths — socket_write 2. simulated stream-id pool exhaustion — rpc_register 3. mp refcounts return to baseline after error — mp_baseline 4. fd leak checks via /proc/self/fd — socket_write 5. completion handlers execute on enqueue failure — error_handler Validation: ./configure --tests && make tests -j && build/tests — all four sub-tests green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(tls): stop SSL_write busy-loop on peer-initiated close
SSL_ERROR_SYSCALL(errno=0) and SSL_ERROR_ZERO_RETURN both indicate
the peer closed the connection. On the read path this is a clean EOF;
on the write path it means no further data can be sent and the loop
must terminate.
Before this fix the write path fell through to the read-side handler,
setting socket.closed=1 and returning success, causing the router to
retry SSL_write indefinitely until the event engine timed out.
- SSL_ERROR_SYSCALL + WRITE: use sys_err if non-zero, else ECONNRESET
- SSL_ERROR_ZERO_RETURN + WRITE: always ECONNRESET
Also bumps OpenSSL in CI from 3.6.0 to 3.6.2.
Closes #28 test: fix process filter for single-digit PIDs in containers
Substring match `main_pid in l` false-positives when unit gets a
low PID (e.g. 9) in Docker: "9" matches "2189" in zombie entries.
Word-boundary regex \b<pid>\b fixes the check
- Remove TIPC (domain 40) deny rule — profile replaces Docker's
default entirely when --security-opt is used, so only the
targeted AF_ALG block belongs here (single-purpose profile)
- README: clarify mitigation ≠ fix, add explicit "not applied
automatically" warning, fix verify examples to use
latest-python3.13-slim (latest-minimal has no python3)
- CI: run seccomp tests against both python:3.13-slim-trixie and
ghcr.io/freeunitorg/freeunit:latest-python3.13-slim
- test script: drop TIPC test (rule removed), update comment
* review: tighten SSL_write skip_alerts, harden write-abort test
Address review feedback on the issue #28 fix:
* test_tls.py: replace the broad `SSL_write.+failed` skip with the
specific syscall/zero-return signatures this test produces, so
unrelated SSL_write regressions are not silently masked.
* test_tls.py: bump the response body from 1 MB to 16 MB so the
server is reliably mid-write when the client tears the
connection down on hosts with large autotuned SO_SNDBUF.
* conftest.py: pre-compile the main-PID match and use re.escape()
for safety; minor cleanup of the per-line search.
* nxt_openssl.c: drop the redundant `!= 0` in the ternary to match
surrounding style.
No functional change to the TLS fix itself.
* Update version
* feat(http): convert chunked request body to Content-Length
Some upstreams (Gitea, strict HTTP/1.1 backends) reject
Transfer-Encoding: chunked and require Content-Length on
forwarded requests. Buffer the chunked body, compute its
length, and emit Content-Length while skipping the original
TE field via field->skip.
Also fix a buffer-stall in nxt_h1p_conn_request_body_read:
nxt_http_chunk_parse leaves b->mem.pos unchanged on the
CHUNK_MIDDLE path, so the next nxt_conn_read had zero space
and the connection hung on bodies larger than body_buffer_size.
Reset/compact the buffer between continuations.
Test infrastructure:
- fake_upstream: Rust HTTP mock with requires-cl, no-te,
strict, echo modes for deterministic backend-behavior tests
- run-local-temp.sh: fast dev runner (direct mount, no rsync)
- run-local.sh: auto-enable clang-ast on C changes,
build fake_upstream into image
Closes #58
Refs nginx#445, nginx#1088, nginx#1278
* fix commit
* fix(http): use header->mem.end as nxt_sprintf bound
Idiomatic upper bound from header buffer end instead of
pre-padded p + NXT_OFF_T_LEN. Same safety, less coupling.
Refs #58
test(fake_upstream): handle RFC 9112 chunk extensions
Strip chunk-ext (5;ext=val) before hex parse so a future
client emitting extensions does not hit unwrap_or(0).
build(test): split clang-ast into run-local-full.sh
run-local.sh --clang-ast was broken: Xclang plugin loaded
during ./configure feature-detection trips "no atomic
operations found" before make even starts.
Move clang-ast to a dedicated runner with its own image
(freeunit-test-full:local, debian:testing for clang 21).
run-local.sh now focuses on the test fast-path only and
auto-prefixes bare pytest node-ids with test/.
run-local-temp.sh dropped — same fast-path is in run-local.sh.
TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD.
* ci: build fake_upstream for proxy chunked tests
The 4 fake_upstream-dependent tests in test_proxy_chunked.py failed
in CI with FileNotFoundError: '/usr/local/bin/fake_upstream' — the
Rust mock binary was never built. Install rust + cargo build for the
unit and python-3.x matrix jobs that actually run these tests.
Also add a module-level skipif marker so local runs without a Rust
toolchain skip the 4 affected tests instead of failing hard.
Idiomatic upper bound from header buffer end instead of
pre-padded p + NXT_OFF_T_LEN. Same safety, less coupling.
Refs #58
test(fake_upstream): handle RFC 9112 chunk extensions
Strip chunk-ext (5;ext=val) before hex parse so a future
client emitting extensions does not hit unwrap_or(0).
build(test): split clang-ast into run-local-full.sh
run-local.sh --clang-ast was broken: Xclang plugin loaded
during ./configure feature-detection trips "no atomic
operations found" before make even starts.
Move clang-ast to a dedicated runner with its own image
(freeunit-test-full:local, debian:testing for clang 21).
run-local.sh now focuses on the test fast-path only and
auto-prefixes bare pytest node-ids with test/.
run-local-temp.sh dropped — same fast-path is in run-local.sh.
TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD.
* fix(http): count full buffer chain for Content-Length after chunked
Large chunked requests (>16KB) can span multiple buffers. Previous code
only counted first buffer in r->body chain → upstream received truncated
body with wrong Content-Length.
Iterate over full chain matching pattern in nxt_router.c:5709.
Refs #58
* fix(port): plug mp-pool retain and fd/buffer leaks in IPC reply paths
The cert/script-store IPC pattern and several main-process reply paths
have latent leaks reachable when nxt_port_msg_alloc() (or the RPC
stream-id pool) hits malloc failure inside the port machinery.
Sender side (nxt_cert_store_get, nxt_script_store_get): nxt_mp_retain(mp)
was issued before nxt_port_socket_write(), so any failure path between
the retain and a successful send left the pool with a refcount that the
buffer's completion handler (which is what invokes nxt_mp_release) could
never run. Moved the retain to after the buffer is handed off to the
port machinery so failure paths above no longer pin the pool.
Reply / main-process side: cert_store_get_handler, script_store_get_
handler, main_port_socket_handler, main_port_access_log_handler, and
nxt_controller_conf_store all called nxt_port_socket_write() with an
owned fd (or a buffer in the engine memory pool) and either ignored the
return value with a (void) cast or skipped the failure branch entirely.
On non-OK return the port layer never takes ownership, so the fd and
the buffer's completion handler were both leaked in the privileged main
process. Each site now closes the fd explicitly and (where applicable)
queues the buffer completion onto the engine fast work queue so the
engine pool reclaims memory.
Use nxt_fd_close() rather than nxt_file_close() on the cert/script
error paths: file.name has already been freed and the latter would
dereference it through "%FN" on a close-failure log path.
Also documents the ownership contract over nxt_port_socket_write2() in
src/nxt_port.h: on NXT_OK, ownership of fd, fd2, and b transfers to the
port layer; on any other return, the caller retains ownership and is
responsible for closing fd/fd2 and dispatching b's completion handler.
Addresses phpclub's review ask on #56.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(isolation): correct mount comparator and add regression coverage
Replace the broken qsort comparator in nxt_isolation_mount_compare()
which returned only 0 or 1. ISO C requires a comparator to return
negative / zero / positive, so the previous form produced undefined
behavior and the resulting mount order varied between runs of the
same input — surfacing as flaky failures of
test_go_isolation_rootfs_automount_tmpfs (#60).
The new comparator sorts on mnt->dst length and uses memcmp() as a
tie-breaker for deterministic total ordering. Switching the sort
key from mnt->src (no safety meaning) to mnt->dst means parent
paths sort before children when they share a prefix, which is the
ordering nxt_isolation_prepare_rootfs() actually needs.
Test:
* Extract the toggle pattern from
test_go_isolation_rootfs_automount_tmpfs into the helper
_assert_rootfs_tmpfs_toggle_stable so multiple iteration counts
share one body.
* The stable test runs 20 iterations — sufficient for deterministic
regression detection against #60 (the original failure surfaced on
the second iteration).
* A separate test_go_isolation_rootfs_automount_tmpfs_regression
runs 100 iterations as belt-and-suspenders coverage.
Fixes #60
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(docker): bump image version to 1.35.5
* feat(eol): add unit-eol-check CLI; sync EOL matrix with endoflife.date
New Rust CLI (pkg/eol/) validates pkg/eol.json against the
endoflife.date API: detects date mismatches, missed EOL flags,
new upstream versions absent from the matrix.
Stale dates corrected:
- alpine 3.20 2026-11→2026-04, 3.21 2027-11→2026-11
- debian 10 2024-06→2022-09, 11 2026-08→2024-08,
12 2028-06→2026-06, 13 2030-06→2028-08
- ubuntu 26.04 eol 2031-05→2031-04
- perl 5.40 eol 2028-07→2027-06
EOL flags added: go 1.24, node 20, fedora 42, alpine 3.20
New entries: node 26, perl 5.42, ruby 4.0, fedora 44, alpine 3.22/3.23
EOL.md synced to matrix.
* ● fix(eol): address review findings in unit-eol-check
- Cargo.toml: edition "2024" → "2021" (2024 not yet stable)
- api_eol_date: add missing category mappings (node→nodejs,
amazonlinux→amazon-linux, centos_stream→centos-stream);
without these, API lookups silently returned None for all
non-jsc runtimes/OSes
- now_yyyy_mm: hard-fail on date cmd error instead of using
a hardcoded fallback date that silently corrupts results
- --fix mode: remove misleading "output corrected eol.json"
comments; output was never JSON, just debug lines
- node 26: correct EOL 2028-04→2029-04, drop 2029-04→2030-04,
add lts:true (was copy-paste of node 24 dates)
- centos_stream 10: correct EOL 2035-05→2030-01 (API mismatch
caught by the tool itself after mapping fix)
- EOL.md: sync node 26 and centos_stream 10 rows to match
* ● docs(todo): add proxy request buffering design notes (issue #58)
Refs #58
* fix(isolation): order procfs/tmpfs/bind in mount comparator
The previous comparator change (sort by mnt->dst length, memcmp
tie-break) was correct C but inadvertently reversed the relative
order of procfs and tmpfs in nxt_isolation_set_lang_mounts:
Old (broken comparator): procfs, tmpfs, lang_deps...
New (dst length asc): tmpfs, procfs, lang_deps...
The new order made the test_go_isolation_rootfs_automount_tmpfs
flake reliably reproducible on CI's ubuntu-latest runners. Mounting
tmpfs at <rootfs>/tmp before mounting procfs at sibling <rootfs>/proc
triggers a kernel-side ENOENT on the procfs mount that the procfs-
first order avoids. The exact kernel root cause is not yet diagnosed
(a local docker reproducer surfaces different errnos than CI), but
the order sensitivity is empirically clear.
Add a primary sort key on mnt->type so PROC (3) comes before TMP
(2) before BIND (1), restoring the previously-working order. The
secondary length/memcmp keys are preserved for lang-deps ordering
within the BIND group.
Tracking the deeper kernel investigation under #60.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(http): add chunk extension support in chunked body parser (RFC 9112 §7.1)
Add sw_chunk_ext state to nxt_http_chunk_parse() that skips characters
after ';' on the chunk-size line until CR. Allows clients to send
chunk extensions (e.g. "4;ext=val\r\ntest\r\n") without causing
chunk_error. Extensions are silently discarded per RFC 9112 §7.1.
Refs nginx#1278
* test(http): add 7 chunked proxy tests — multi-chunk, empty body, extensions, POST/PUT/DELETE, concurrent
New tests in test_proxy_chunked.py:
- test_proxy_chunked_multi_chunk: 5 chunks of varying size, exercises
memmove compact path in nxt_h1proto.c
- test_proxy_chunked_empty_body: 0\r\n\r\n yields Content-Length: 0
- test_proxy_chunked_extensions: chunk extensions (;ext=val) stripped
- test_proxy_chunked_post/put/delete: HTTP methods with chunked body
- test_proxy_chunked_concurrent: 10 simultaneous requests via threads
All tests use fake_upstream strict mode for Content-Length validation.
Consolidated _fake_get_raw + _fake_request into single _chunked_request.
Refs #58, nginx#445, nginx#1278
* fix: review findings — HTTP parsing timeout, CI cache, comments
- test_proxy_chunked: _recvall (100ms race) → _read_http_request with
Content-Length parsing; socket timeout 5s
- run-local.sh: validate -t/-m arguments
- run-local-full.sh: preserve tmp dir on non-zero exit (clang-ast debug)
- ci.yml: use module output for condition; add cargo cache; drop +stable
- rust-toolchain.toml: pin stable channel
- nxt_cert.c, nxt_h1proto.c, nxt_script.c: trim verbose comments
- .gitignore: remove Cargo.lock (already tracked)
* fix(test): reject flag-like arg for -m/-t in run-local.sh
`-t -n` previously consumed `-n` as the test argument instead of
erroring. Guard now also fails when next token starts with `-`.
Refs PR #63 review
* fix(eol): harden unit-eol-check error handling and correct JSC dates
Address PR #62 review: API errors silently swallowed, category
4 mapping duplicated, curl ignored HTTP errors, date normalization
5 rejected valid YYYY-MM, approaching-EOL threshold used grace
6 period (36 months) instead of 12.
Extract shared api_category(), eliminate duplicate mapping
api_eol_date returns Result — callers distinguish network
failure from "version not found"
curl -f flag: HTTP 4xx/5xx no longer exit 0
Date normalization: len() >= 7 instead of >= 10
Mismatch.fetch_error field distinguishes fetch errors from
not-found; exit(2) only on all-fetch-fail
detect_new_versions uses api_category() instead of
separate hardcoded vec
generate_fix warns on Ok(None) and Err, no silent skip
as_object().unwrap() → let Some(obj) = ... else { continue }
months_between None → error entry instead of unwrap_or(0)
Approaching-EOL threshold: 12 months (was grace_months = 36)
Mismatch messages include direction: "behind" / "ahead"
JSC mapping: jdk → eclipse-temurin (old endpoint 404s)
JSC 17: EOL 2029-09 → 2027-10, JSC 21: 2031-09 → 2029-12
(dates were Oracle JDK, not Eclipse Temurin)
run.sh: fix empty array under set -u, add EOF newline
* rust: upgrade cc crate 1.2.30 -> 1.2.62 (#67)
Versions before 1.2.42 fail to compile with rust 1.90 on ppc64.
Link: rust-lang/cc-rs#1581
Co-authored-by: sertonix <sertonix@users.noreply.github.com>
* Docker: add builder mode, parallel default, reference build stats (#66)
* chore(docker): fix make clean, add apt-cacher-ng auto-detection
- Makefile: rm -f build-* → rm -f $(addprefix build-, $(MODVERSIONS))
to avoid deleting build-local.sh
- build-local.sh: auto-detect apt-cacher-ng on port 3142,
pass http_proxy build-arg to skip re-downloading ~100 MB of .deb
packages on repeated builds
- README.md: document apt-cacher-ng setup
* chore(docker): add builder mode, parallel default, reference build stats
- -b flag: pre-built builder images skip apt+Rust download for
minimal/wasm/php8.5 variants (~170 MB saved per build)
- default parallelism: nproc/2 instead of 1
- local/Dockerfile.{minimal,wasm,php8.5}: multi-stage variants for -b
- Dockerfile.builder-{trixie,php8.5}: builder image sources
- README: document builder mode, add reference build timings
(Ryzen 7 5700X, 32 GiB, NVMe — 23 variants in ~58 min)
- TODO: OpenTelemetry 0.24→0.32 upgrade plan with fake_otlp design
* Fix ignore
* TODO.md — expanded the OTEL crate upgrade section (issue #65, milestone
1.35.6):
- Added crate version table (0.24→0.32 gap per crate)
- Documented Ava Hahn's authorship and new contact (@ava-affine,
ava@sunnypup.io)
- Replaced flat task list with a phased plan:
- Phase 0a: audit existing config vs OpenAPI spec
- Phase 0b: new config fields (service_name, headers, root_certificate,
resource_attributes, max_queue_size, export_timeout) —
backward-compatible on current crates
- Phase 1: fake_otlp test infrastructure (mirrors fake_upstream pattern)
- Phase 2: crate rewrite (0.24→0.32)
- Phase 3: housekeeping (rename, cleanup)
- Added current 4-field config JSON for reference
- Listed hardcoded values that must become configurable with file/line
references
* chore(docker): rebrand Makefile library target, add release roadmap
Replaced stale nginx/unit references in `make library` (Maintainers,
GitRepo, GitFetch) with FreeUnit equivalents. Commented out the target with TODO for milestone 1.35.8 (Docker Hub Official Images). Added release roadmap and action plan to TODO.md covering milestones 1.35.5 through 1.35.8 (2026-08-28).
* fix(docker): ldd glob guard, proxy probe, builder race; add CentOS/AlmaLinux guide
Docker fixes:
- Guard ldd loop with `[ -f "$f" ] || continue` — bare glob expands
to literal string when no .so files exist, aborting requirements.apt
- Quote `"$f"` in ldd call (template + all Dockerfile.* + local/)
- Replace `nc -z` apt-proxy probe with bash /dev/tcp (nc absent in
minimal envs); move detection after arg parsing so -n skips probe
- Propagate ensure_builder failure: add `|| return 1` in build_variant
- Serialize builder pre-build before parallel loop — eliminates race
where minimal+wasm simultaneously pull/build the shared trixie image
Community docs:
- New pkg/community/CENTOS.ALMALINUX.md: CentOS Stream 9 / AlmaLinux 9
build guide (three PHP options, systemd unit, migration from nginx/unit)
- Prerequisites: add `dnf install -y epel-release` — brotli-devel is
EPEL-only on EL9
- Option B: add `dnf config-manager --enable remi` before unit-php —
dnf module enable does not activate the non-modular remi repo
* fix(docker): fix ldconfig
---------
Co-authored-by: Alexandr Smirnov <a.smirnov@dwteam.ru>
* chore(release): prep 1.35.5 docs and unitctl metadata
- docs/changes.xml: add 1.35.5 entries (chunked→CL, TLS busy-loop,
IPC leaks, mount comparator, njs 0.9.8, unfreeze-sync.sh)
- docs/unit-openapi.yaml: title 1.35.2 → 1.35.5
- unitctl/*/Cargo.toml: version 1.35.0 → 1.35.5
- unit-openapi: authors/description/edition/docs URL → FreeUnit
- unitctl: deb/rpm metadata reference FreeUnit
- README: socket path nginx-unit.control.sock → control.unit.sock
---------
Co-authored-by: Alexandr Smirnov <a.smirnov@dwteam.ru>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Andy Postnikov <apostnikov@gmail.com>
Co-authored-by: Sertonix <sertonix@posteo.net>
Co-authored-by: sertonix <sertonix@users.noreply.github.com>
Adds a fault-injection harness under src/test/nxt_port_fail_test.c covering the audit-fixed IPC paths from freeunitorg#56 and addressing phpclub's 5-bullet coverage request. Sub-tests: * nxt_port_fail_test_socket_write — forces nxt_port_msg_alloc() failure with fd != -1 and a non-sync buf; asserts NXT_ERROR, fd stays open, completion is not invoked, and /proc/self/fd balanced (Linux only; skipped elsewhere). * nxt_port_fail_test_rpc_register — forces both nxt_port_rpc_register_handler_ex alloc failure and the lvlhsh-insert failure path; asserts port->use_count == 1 and rpc_streams empty. * nxt_port_fail_test_error_handler — manually queues a send_msg with a closeable fd and a non-sync buffer, runs the static nxt_port_error_handler() via an NXT_TESTS-gated trampoline, drains the fast work queue, and asserts the buffer completion was enqueued (not called synchronously) and runs exactly once. Closes the queued-then-failed shape that PR nginx#56's audit fix reordered (close fd first, queue buffer completion via the fast work queue second — matching nxt_port_error_handler's existing discipline at src/nxt_port_socket.c:1378-1396). * nxt_port_fail_test_mp_baseline — mirrors the audit-fixed sender shape (allocate buf in mp, send, retain only on success), forces a pre-queue failure, and asserts mp->retain is unchanged. Sanity-checked: moving the retain back to before the write makes the test report "retain before 1, after 2", so the regression signal is intact. Public NXT_TESTS-gated trampolines added for the harness: * nxt_port_test_msg_alloc_failures() + nxt_port_test_run_error_ handler() in src/nxt_port_socket.c. * nxt_port_rpc_test_alloc_failures() / nxt_port_rpc_test_insert_failures() in src/nxt_port_rpc.c. * nxt_mp_test_retain_count() in src/nxt_mp.c (exposes the otherwise-private retain counter for the baseline assertion). Per Gemini's PR nginx#57 review at src/test/nxt_port_fail_test.c:79, test buffers are heap-owned from a transient mp rather than stack-allocated, matching how production callers wire them (nxt_cert_store_get allocates from the temp_conf mp — the very pattern this suite exercises). Each sub-test's failure ladder is anchored on ownership-tracking variables so any goto-fail releases everything in reverse order. phpclub's 5-bullet list is now covered: 1. forced nxt_port_msg_alloc() failure paths — socket_write 2. simulated stream-id pool exhaustion — rpc_register 3. mp refcounts return to baseline after error — mp_baseline 4. fd leak checks via /proc/self/fd — socket_write 5. completion handlers execute on enqueue failure — error_handler Validation: ./configure --tests && make tests -j && build/tests — all four sub-tests green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a fault-injection harness under src/test/nxt_port_fail_test.c covering the audit-fixed IPC paths from freeunitorg#56. Sub-tests: * socket_write — forces nxt_port_msg_alloc() failure with fd != -1 and a non-sync buf; asserts NXT_ERROR, fd stays open, completion not invoked, /proc/self/fd balanced (Linux only). * rpc_register — forces nxt_port_rpc_register_handler_ex alloc and lvlhsh-insert failures; asserts use_count == 1, rpc_streams empty. * error_handler — queues a send_msg with a closeable fd + non-sync buffer, runs nxt_port_error_handler() via an NXT_TESTS trampoline, drains the fast work queue, asserts completion enqueued (not called synchronously) and runs exactly once. * mp_baseline — mirrors the audit-fixed sender shape, forces a pre-queue failure, asserts mp->retain is unchanged. NXT_TESTS-gated trampolines for the harness: nxt_port_test_msg_alloc_ failures()/nxt_port_test_run_error_handler() (nxt_port_socket.c), nxt_port_rpc_test_alloc_failures()/_insert_failures() (nxt_port_rpc.c), nxt_mp_test_retain_count() (nxt_mp.c). Test buffers are heap-owned from a transient mp, matching production callers (nxt_cert_store_get allocates from temp_conf mp). The injected stack engine documents its invariant: the exercised error path must touch nothing beyond fast_work_queue. Also resets file.fd to -1 after nxt_fd_close on the send-fail path in nxt_cert.c and nxt_script.c — defense in depth against an accidental double-close if code is later added below the close. No behavior change. Validation: ./configure --tests && make tests -j && build/tests — all four sub-tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tests Stacked follow-up for #56 and the coverage plan in #56 (comment). Adds test-only fault injection for IPC message allocation and RPC registration failures, then covers core ownership invariants: failed queued nxt_port_socket_write() leaves the caller-owned fd open before ownership transfer; the unsent buffer completion is not run by the port layer; /proc/self/fd counts stay balanced where available; failed RPC registration allocation and insert leave use_count unchanged and no stream registration behind. The branch is based on andypost:claude/port-ipc-completion-leaks / PR #56, so it should be retargeted or rebased once #56 lands. Validation: make -j2; ./configure --tests; make tests -j2; build/tests
The cert/script-store IPC pattern has two latent leaks reachable when nxt_port_msg_alloc() (or the RPC stream-id pool) hits malloc failure inside the port machinery:
Sender side (nxt_cert_store_get, nxt_script_store_get): nxt_mp_retain(mp) was issued before nxt_port_socket_write(), so any failure path between the retain and a successful send left the pool with a refcount that the buffer's completion handler (which is what invokes nxt_mp_release) could never run. The buf was allocated from the temp_conf mp; the unbalanced retain pinned the entire pending config until process exit.
Move the retain to after the successful socket_write call, matching the existing correct pattern in nxt_router_access_log_reopen(). The buffer's completion is scheduled via nxt_work_queue_add and only runs after the current handler returns, so retaining synchronously after the send is race-free.
Receiver side (nxt_cert_store_get_handler, nxt_script_store_get_handler, nxt_main_port_socket_handler, nxt_main_port_access_log_handler): nxt_port_socket_write() with NXT_PORT_MSG_CLOSE_FD relies on the port layer to close the fd once the message is sent or its error_handler fires. When nxt_port_msg_alloc() inside chk_insert returns NXT_ERROR the message never enters port->messages, the error_handler never sees it, and the fd was leaked in the privileged main process.
The socket-listen reply path additionally leaked the diagnostic 'out' buffer allocated from the engine mem_pool when the send failed.
Capture the return value, close the fd explicitly, and queue the buffer's completion handler when the port layer did not take ownership. Use nxt_socket_close for the listening socket and nxt_file_close for the access-log file; keep nxt_fd_close in the cert/script handlers since file.name is freed before the close path and nxt_file_close's "%FN" alert format would dereference it on a close-failure log.
Both leaks are pre-existing in upstream Unit; reachable only under memory pressure / RPC-stream exhaustion. No protocol or config-surface changes.
Proposed changes
Describe the use case and detail of the change. If this PR addresses
a GitHub issue, include a link to it.
Checklist