fix(port): plug mp-pool retain and fd/buffer leaks in IPC reply paths by andypost · Pull Request #56 · freeunitorg/freeunit

andypost · 2026-05-07T23:44:08Z

The cert/script-store IPC pattern has two latent leaks reachable when nxt_port_msg_alloc() (or the RPC stream-id pool) hits malloc failure inside the port machinery:

Sender side (nxt_cert_store_get, nxt_script_store_get): nxt_mp_retain(mp) was issued before nxt_port_socket_write(), so any failure path between the retain and a successful send left the pool with a refcount that the buffer's completion handler (which is what invokes nxt_mp_release) could never run. The buf was allocated from the temp_conf mp; the unbalanced retain pinned the entire pending config until process exit.

Move the retain to after the successful socket_write call, matching the existing correct pattern in nxt_router_access_log_reopen(). The buffer's completion is scheduled via nxt_work_queue_add and only runs after the current handler returns, so retaining synchronously after the send is race-free.
Receiver side (nxt_cert_store_get_handler, nxt_script_store_get_handler, nxt_main_port_socket_handler, nxt_main_port_access_log_handler): nxt_port_socket_write() with NXT_PORT_MSG_CLOSE_FD relies on the port layer to close the fd once the message is sent or its error_handler fires. When nxt_port_msg_alloc() inside chk_insert returns NXT_ERROR the message never enters port->messages, the error_handler never sees it, and the fd was leaked in the privileged main process.

The socket-listen reply path additionally leaked the diagnostic 'out' buffer allocated from the engine mem_pool when the send failed.

Capture the return value, close the fd explicitly, and queue the buffer's completion handler when the port layer did not take ownership. Use nxt_socket_close for the listening socket and nxt_file_close for the access-log file; keep nxt_fd_close in the cert/script handlers since file.name is freed before the close path and nxt_file_close's "%FN" alert format would dereference it on a close-failure log.

Both leaks are pre-existing in upstream Unit; reachable only under memory pressure / RPC-stream exhaustion. No protocol or config-surface changes.

Proposed changes

Describe the use case and detail of the change. If this PR addresses
a GitHub issue, include a link to it.

Checklist

I have read CONTRIBUTING.md
If applicable, I have added tests
If applicable, I have updated documentation

gemini-code-assist

Code Review

This pull request fixes memory pool leaks and file descriptor/buffer leaks in the certificate and script store IPC paths, as well as in various reply paths within the main process. These leaks occurred when port message allocation or socket writes failed. The review feedback highlights a need to adjust the cleanup logic in src/nxt_main_process.c to ensure the correct order of resource release and to use the work queue for buffer completion handlers to maintain consistency with the rest of the codebase.

andypost · 2026-05-08T01:03:46Z

Full audit could be found andypost#10 and andypost#9

phpclub · 2026-05-13T20:46:20Z

Consider adding regression / fault-injection coverage for these paths before merge (or as a follow-up PR). This class of bugs is difficult to catch with normal integration testing since it only manifests under allocator failure or RPC stream exhaustion conditions.

Suggested coverage:

forced nxt_port_msg_alloc() failure paths
simulated stream-id pool exhaustion
verification that mp refcounts return to baseline after error paths
fd leak checks via /proc/self/fd or equivalent harness
validation that completion handlers still execute correctly on enqueue failure

Implementation-wise, this could likely start with targeted nxt_test_ unit coverage plus fault injection via allocation/stream reservation stubs.

Not a merge blocker, but without targeted fault-injection coverage future port-layer refactors could accidentally reintroduce these leaks.

andypost · 2026-05-13T21:01:44Z

Follow-up coverage plan for the allocator/RPC exhaustion paths mentioned in #56 (comment):

Add test-only fault injection around nxt_port_msg_alloc() and RPC stream registration/allocation failures.
Cover cert/script-store sender paths to verify temp config pools are not retained when send setup fails.
Cover cert/script/socket/access-log reply paths to verify failed NXT_PORT_MSG_CLOSE_FD sends close owned fds and still run buffer completions.
Add fd-count checks around the failing paths, using /proc/self/fd where available and skipping where the platform lacks it.
Keep the tests isolated under nxt_test_/test-build plumbing so production code has no runtime behavior change.

I’m treating this as follow-up coverage rather than a merge blocker for 1.35.5, and I’ll start it as a dedicated PR.

andypost · 2026-05-13T21:11:20Z

Started the follow-up coverage work as draft PR #57

It is stacked on PR #56 for now, so the diff includes this branch until #56 lands or the follow-up is retargeted.

Closes the last gap in the phpclub coverage request on freeunitorg#56: > verification that mp refcounts return to baseline after error paths The cert/script-store sender fix moved nxt_mp_retain(mp) to AFTER the successful nxt_port_socket_write() call so a pre-queue failure does not leave the temp config pool with an unbalanced retain. This commit pins that pattern down so a future refactor that re-introduces "retain-before-send" in a new caller (or accidentally undoes the fix) breaks the suite. * Adds nxt_mp_test_retain_count(mp) in src/nxt_mp.c — a tiny NXT_TESTS-gated accessor for the internal retain counter (private to nxt_mp.c otherwise). * Adds nxt_port_fail_test_mp_baseline sub-test that: - creates a transient mp (retain == 1 baseline); - invokes nxt_port_fail_test_sender_pattern() — a deliberate mirror of the audit-fixed nxt_cert_store_get / nxt_script_store_get shape: allocate buf in mp, attempt write, retain only on success; - forces nxt_port_msg_alloc() to fail so the write returns NXT_ERROR before the message reaches the port queue; - asserts retain count == baseline (no retain was bumped); - asserts the completion handler did not run (buf never entered the queue). * Sanity-checked the regression-detection direction: temporarily moving the retain to before nxt_port_socket_write() inside nxt_port_fail_test_sender_pattern() makes the new sub-test fail with the expected message ("retain before 1, after 2"). phpclub's 5-bullet list is now covered: 1. forced nxt_port_msg_alloc() failure paths — commit 1 2. simulated stream-id pool exhaustion — commit 1 3. mp refcounts return to baseline after error — THIS commit 4. fd leak checks via /proc/self/fd — commit 1 5. completion handlers execute on enqueue failure — commit 2 Validation: ./configure --tests && make -j2 tests && build/tests — `port failure test passed` line still emitted; all four sub-tests green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

andypost · 2026-05-13T21:50:19Z

Coverage map for your review checklist (#56 (comment)), now landed in stacked PR #57:

Your bullet	Where it's tested
forced `nxt_port_msg_alloc()` failure paths	`nxt_port_fail_test_socket_write` — `nxt_port_test_msg_alloc_failures(1)` injection, asserts NXT_ERROR + fd-stays-open + completion-not-invoked + `/proc/self/fd` balanced
simulated stream-id pool exhaustion	`nxt_port_fail_test_rpc_register` — forces both `nxt_port_rpc_register_handler_ex` alloc and lvlhsh-insert failures, asserts `port->use_count == 1` and `rpc_streams` empty
mp refcounts return to baseline after error paths	`nxt_port_fail_test_mp_baseline` (commit 3) — mirrors the audit-fixed sender shape (allocate buf in mp, send, retain only on success), forces a pre-queue failure, asserts `mp->retain` is unchanged. Sanity-checked: moving the retain back to before the write makes the test report "retain before 1, after 2"
fd leak checks via `/proc/self/fd` or equivalent	`nxt_port_fail_test_fd_count()` in the socket_write sub-test — counts FDs before/during/after, asserts balance; gracefully skips on non-Linux
completion handlers execute correctly on enqueue failure	`nxt_port_fail_test_error_handler` (commit 2) — manually queues a send_msg, runs the static `nxt_port_error_handler` via an NXT_TESTS trampoline, drains the fast work queue, asserts the completion was enqueued not called synchronously, and runs exactly once

All four sub-tests pass locally (./configure --tests && make tests -j && build/tests). All five bullets actionable as unit tests; nothing left in the original list. Re-flagging here in case it changes your merge posture on #56.

andypost · 2026-05-15T01:51:35Z

@claude review

andypost · 2026-05-15T14:33:49Z

@claude review once

Some upstreams (Gitea, strict HTTP/1.1 backends) reject Transfer-Encoding: chunked and require Content-Length on forwarded requests. Buffer the chunked body, compute its length, and emit Content-Length while skipping the original TE field via field->skip. Also fix a buffer-stall in nxt_h1p_conn_request_body_read: nxt_http_chunk_parse leaves b->mem.pos unchanged on the CHUNK_MIDDLE path, so the next nxt_conn_read had zero space and the connection hung on bodies larger than body_buffer_size. Reset/compact the buffer between continuations. Test infrastructure: - fake_upstream: Rust HTTP mock with requires-cl, no-te, strict, echo modes for deterministic backend-behavior tests - run-local-temp.sh: fast dev runner (direct mount, no rsync) - run-local.sh: auto-enable clang-ast on C changes, build fake_upstream into image Closes nginx#58 Refs nginx#445, nginx#1088, nginx#1278

Idiomatic upper bound from header buffer end instead of pre-padded p + NXT_OFF_T_LEN. Same safety, less coupling. Refs nginx#58 test(fake_upstream): handle RFC 9112 chunk extensions Strip chunk-ext (5;ext=val) before hex parse so a future client emitting extensions does not hit unwrap_or(0). build(test): split clang-ast into run-local-full.sh run-local.sh --clang-ast was broken: Xclang plugin loaded during ./configure feature-detection trips "no atomic operations found" before make even starts. Move clang-ast to a dedicated runner with its own image (freeunit-test-full:local, debian:testing for clang 21). run-local.sh now focuses on the test fast-path only and auto-prefixes bare pytest node-ids with test/. run-local-temp.sh dropped — same fast-path is in run-local.sh. TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD.

The 4 fake_upstream-dependent tests in test_proxy_chunked.py failed in CI with FileNotFoundError: '/usr/local/bin/fake_upstream' — the Rust mock binary was never built. Install rust + cargo build for the unit and python-3.x matrix jobs that actually run these tests. Also add a module-level skipif marker so local runs without a Rust toolchain skip the 4 affected tests instead of failing hard. Idiomatic upper bound from header buffer end instead of pre-padded p + NXT_OFF_T_LEN. Same safety, less coupling. Refs nginx#58 test(fake_upstream): handle RFC 9112 chunk extensions Strip chunk-ext (5;ext=val) before hex parse so a future client emitting extensions does not hit unwrap_or(0). build(test): split clang-ast into run-local-full.sh run-local.sh --clang-ast was broken: Xclang plugin loaded during ./configure feature-detection trips "no atomic operations found" before make even starts. Move clang-ast to a dedicated runner with its own image (freeunit-test-full:local, debian:testing for clang 21). run-local.sh now focuses on the test fast-path only and auto-prefixes bare pytest node-ids with test/. run-local-temp.sh dropped — same fast-path is in run-local.sh. TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD.

phpclub · 2026-05-17T23:26:48Z

This is a well-structured reliability fix focused on IPC failure-path correctness and resource ownership semantics.

What I particularly like in this PR:

nxt_mp_retain() is now performed only after successful handoff preparation to the port layer
fd ownership semantics around nxt_port_socket_write() failures are now explicit and consistent
error paths restore normal completion-handler cleanup behavior
comments clearly document why nxt_fd_close() is required instead of nxt_file_close() after file.name destruction

The patch is narrowly scoped and does not alter the normal success-path behavior, which keeps regression risk relatively low.

One thing worth double-checking before merge:

nxt_work_queue_add(&task->thread->engine->fast_work_queue,
                   out->completion_handler, task, out,
                   out->parent);

This assumes out->completion_handler is always initialized whenever out != NULL. That appears correct in the current flow, but a defensive assertion could make future maintenance safer:

nxt_assert(out->completion_handler != NULL);

I would also consider a follow-up audit of other nxt_port_socket_write() call sites to ensure the same ownership rule is consistently respected across the tree:

ownership transfers to the port layer only on NXT_OK

Overall this looks like a solid maintainer-grade cleanup of several subtle leak paths.

Large chunked requests (>16KB) can span multiple buffers. Previous code only counted first buffer in r->body chain → upstream received truncated body with wrong Content-Length. Iterate over full chain matching pattern in nxt_router.c:5709. Refs nginx#58

andypost · 2026-05-18T16:33:35Z

Reply to phpclub on #56 (comment 4472837908)

Paste body into the PR review thread via the GitHub web UI.

Thanks for the review.

On nxt_assert(out->completion_handler != NULL).
I prefer not to add the assert here. All buffer allocators in
src/nxt_buf.c (lines 43, 72, 108, 131, 151) set
b->completion_handler before returning, and nxt_port_socket.c
already relies on this without a check (lines 217, 670, 1290, 1371).
An assert only in the new error branch would document the invariant in
the safest spot and leave the four hotter sites unchecked. If we want
the assert, the right place is nxt_buf.c at allocation time, not the
call sites. Happy to do that in a separate commit if useful.

On the audit of nxt_port_socket_write() callers.
Agreed. Two updates:

Doc comment over nxt_port_socket_write2 in src/nxt_port.h
(commit df6b00e7) states the rule: on NXT_OK the port layer
takes ownership of fd, fd2, and b; on any other return the
caller still owns them and must close fd/fd2 and run
b->completion_handler.

The audit found one more site with the same shape, also fixed in
df6b00e7: nxt_controller_conf_store at
src/nxt_controller.c:2482. It used (void) around the port write
and leaked the shm fd (with NXT_PORT_MSG_CLOSE_FD) and the
engine-pool buffer in the main process when nxt_port_msg_alloc
fails.

Status of the remaining nxt_port_socket_write() call sites (61
total; most pass fd == -1, b == NULL and carry no ownership):

File:line	Status
`src/nxt_cert.c` (get / get_handler / error reply)	fixed in this PR
`src/nxt_script.c` (get / get_handler)	fixed in this PR
`src/nxt_main_process.c:1173` (socket reply)	fixed in this PR
`src/nxt_main_process.c:1731` (access-log reply)	fixed in this PR
`src/nxt_controller.c:2482` (conf_store)	fixed in this PR
`src/nxt_router_access_log.c:423,571`	ok — tmcf teardown / retain-after-success already in place
`src/nxt_main_process.c:612,653,663,1443`	ok — `fd == -1, b == NULL`
`src/nxt_main_process.c:765` (whoami)	minor: buffer not freed on alloc failure; follow-up
`src/nxt_port_memory.c:883` (SHM_ACK)	ok — `fd == -1, b == NULL`
`src/nxt_controller.c:642,1598,2362`	ok — return value checked, caller pool reclaims

I will open a follow-up issue for the whoami case after merge.

andypost · 2026-05-18T16:38:25Z

squashin commits

The cert/script-store IPC pattern and several main-process reply paths have latent leaks reachable when nxt_port_msg_alloc() (or the RPC stream-id pool) hits malloc failure inside the port machinery. Sender side (nxt_cert_store_get, nxt_script_store_get): nxt_mp_retain(mp) was issued before nxt_port_socket_write(), so any failure path between the retain and a successful send left the pool with a refcount that the buffer's completion handler (which is what invokes nxt_mp_release) could never run. Moved the retain to after the buffer is handed off to the port machinery so failure paths above no longer pin the pool. Reply / main-process side: cert_store_get_handler, script_store_get_ handler, main_port_socket_handler, main_port_access_log_handler, and nxt_controller_conf_store all called nxt_port_socket_write() with an owned fd (or a buffer in the engine memory pool) and either ignored the return value with a (void) cast or skipped the failure branch entirely. On non-OK return the port layer never takes ownership, so the fd and the buffer's completion handler were both leaked in the privileged main process. Each site now closes the fd explicitly and (where applicable) queues the buffer completion onto the engine fast work queue so the engine pool reclaims memory. Use nxt_fd_close() rather than nxt_file_close() on the cert/script error paths: file.name has already been freed and the latter would dereference it through "%FN" on a close-failure log path. Also documents the ownership contract over nxt_port_socket_write2() in src/nxt_port.h: on NXT_OK, ownership of fd, fd2, and b transfers to the port layer; on any other return, the caller retains ownership and is responsible for closing fd/fd2 and dispatching b's completion handler. Addresses phpclub's review ask on freeunitorg#56. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds a fault-injection harness under src/test/nxt_port_fail_test.c covering the audit-fixed IPC paths from freeunitorg#56 and addressing phpclub's 5-bullet coverage request. Sub-tests: * nxt_port_fail_test_socket_write — forces nxt_port_msg_alloc() failure with fd != -1 and a non-sync buf; asserts NXT_ERROR, fd stays open, completion is not invoked, and /proc/self/fd balanced (Linux only; skipped elsewhere). * nxt_port_fail_test_rpc_register — forces both nxt_port_rpc_register_handler_ex alloc failure and the lvlhsh-insert failure path; asserts port->use_count == 1 and rpc_streams empty. * nxt_port_fail_test_error_handler — manually queues a send_msg with a closeable fd and a non-sync buffer, runs the static nxt_port_error_handler() via an NXT_TESTS-gated trampoline, drains the fast work queue, and asserts the buffer completion was enqueued (not called synchronously) and runs exactly once. Closes the queued-then-failed shape that PR nginx#56's audit fix reordered (close fd first, queue buffer completion via the fast work queue second — matching nxt_port_error_handler's existing discipline at src/nxt_port_socket.c:1378-1396). * nxt_port_fail_test_mp_baseline — mirrors the audit-fixed sender shape (allocate buf in mp, send, retain only on success), forces a pre-queue failure, and asserts mp->retain is unchanged. Sanity-checked: moving the retain back to before the write makes the test report "retain before 1, after 2", so the regression signal is intact. Public NXT_TESTS-gated trampolines added for the harness: * nxt_port_test_msg_alloc_failures() + nxt_port_test_run_error_ handler() in src/nxt_port_socket.c. * nxt_port_test_rpc_register_failures() in src/nxt_port_rpc.c. * nxt_mp_test_retain_count() in src/nxt_mp.c (exposes the otherwise-private retain counter for the baseline assertion). Per Gemini's PR nginx#57 review at src/test/nxt_port_fail_test.c:79, test buffers are heap-owned from a transient mp rather than stack-allocated, matching how production callers wire them (nxt_cert_store_get allocates from the temp_conf mp — the very pattern this suite exercises). Each sub-test's failure ladder is anchored on ownership-tracking variables so any goto-fail releases everything in reverse order. phpclub's 5-bullet list is now covered: 1. forced nxt_port_msg_alloc() failure paths — socket_write 2. simulated stream-id pool exhaustion — rpc_register 3. mp refcounts return to baseline after error — mp_baseline 4. fd leak checks via /proc/self/fd — socket_write 5. completion handlers execute on enqueue failure — error_handler Validation: ./configure --tests && make tests -j && build/tests — all four sub-tests green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds a fault-injection harness under src/test/nxt_port_fail_test.c covering the audit-fixed IPC paths from freeunitorg#56 and addressing phpclub's 5-bullet coverage request. Sub-tests: * nxt_port_fail_test_socket_write — forces nxt_port_msg_alloc() failure with fd != -1 and a non-sync buf; asserts NXT_ERROR, fd stays open, completion is not invoked, and /proc/self/fd balanced (Linux only; skipped elsewhere). * nxt_port_fail_test_rpc_register — forces both nxt_port_rpc_register_handler_ex alloc failure and the lvlhsh-insert failure path; asserts port->use_count == 1 and rpc_streams empty. * nxt_port_fail_test_error_handler — manually queues a send_msg with a closeable fd and a non-sync buffer, runs the static nxt_port_error_handler() via an NXT_TESTS-gated trampoline, drains the fast work queue, and asserts the buffer completion was enqueued (not called synchronously) and runs exactly once. Closes the queued-then-failed shape that PR nginx#56's audit fix reordered (close fd first, queue buffer completion via the fast work queue second — matching nxt_port_error_handler's existing discipline at src/nxt_port_socket.c:1378-1396). * nxt_port_fail_test_mp_baseline — mirrors the audit-fixed sender shape (allocate buf in mp, send, retain only on success), forces a pre-queue failure, and asserts mp->retain is unchanged. Sanity-checked: moving the retain back to before the write makes the test report "retain before 1, after 2", so the regression signal is intact. Public NXT_TESTS-gated trampolines added for the harness: * nxt_port_test_msg_alloc_failures() + nxt_port_test_run_error_ handler() in src/nxt_port_socket.c. * nxt_port_rpc_test_alloc_failures() / nxt_port_rpc_test_insert_failures() in src/nxt_port_rpc.c. * nxt_mp_test_retain_count() in src/nxt_mp.c (exposes the otherwise-private retain counter for the baseline assertion). Per Gemini's PR nginx#57 review at src/test/nxt_port_fail_test.c:79, test buffers are heap-owned from a transient mp rather than stack-allocated, matching how production callers wire them (nxt_cert_store_get allocates from the temp_conf mp — the very pattern this suite exercises). Each sub-test's failure ladder is anchored on ownership-tracking variables so any goto-fail releases everything in reverse order. phpclub's 5-bullet list is now covered: 1. forced nxt_port_msg_alloc() failure paths — socket_write 2. simulated stream-id pool exhaustion — rpc_register 3. mp refcounts return to baseline after error — mp_baseline 4. fd leak checks via /proc/self/fd — socket_write 5. completion handlers execute on enqueue failure — error_handler Validation: ./configure --tests && make tests -j && build/tests — all four sub-tests green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@ava-affine

* fix(tls): stop SSL_write busy-loop on peer-initiated close SSL_ERROR_SYSCALL(errno=0) and SSL_ERROR_ZERO_RETURN both indicate the peer closed the connection. On the read path this is a clean EOF; on the write path it means no further data can be sent and the loop must terminate. Before this fix the write path fell through to the read-side handler, setting socket.closed=1 and returning success, causing the router to retry SSL_write indefinitely until the event engine timed out. - SSL_ERROR_SYSCALL + WRITE: use sys_err if non-zero, else ECONNRESET - SSL_ERROR_ZERO_RETURN + WRITE: always ECONNRESET Also bumps OpenSSL in CI from 3.6.0 to 3.6.2. Closes #28 test: fix process filter for single-digit PIDs in containers Substring match `main_pid in l` false-positives when unit gets a low PID (e.g. 9) in Docker: "9" matches "2189" in zombie entries. Word-boundary regex \b<pid>\b fixes the check - Remove TIPC (domain 40) deny rule — profile replaces Docker's default entirely when --security-opt is used, so only the targeted AF_ALG block belongs here (single-purpose profile) - README: clarify mitigation ≠ fix, add explicit "not applied automatically" warning, fix verify examples to use latest-python3.13-slim (latest-minimal has no python3) - CI: run seccomp tests against both python:3.13-slim-trixie and ghcr.io/freeunitorg/freeunit:latest-python3.13-slim - test script: drop TIPC test (rule removed), update comment * review: tighten SSL_write skip_alerts, harden write-abort test Address review feedback on the issue #28 fix: * test_tls.py: replace the broad `SSL_write.+failed` skip with the specific syscall/zero-return signatures this test produces, so unrelated SSL_write regressions are not silently masked. * test_tls.py: bump the response body from 1 MB to 16 MB so the server is reliably mid-write when the client tears the connection down on hosts with large autotuned SO_SNDBUF. * conftest.py: pre-compile the main-PID match and use re.escape() for safety; minor cleanup of the per-line search. * nxt_openssl.c: drop the redundant `!= 0` in the ternary to match surrounding style. No functional change to the TLS fix itself. * Update version * feat(http): convert chunked request body to Content-Length Some upstreams (Gitea, strict HTTP/1.1 backends) reject Transfer-Encoding: chunked and require Content-Length on forwarded requests. Buffer the chunked body, compute its length, and emit Content-Length while skipping the original TE field via field->skip. Also fix a buffer-stall in nxt_h1p_conn_request_body_read: nxt_http_chunk_parse leaves b->mem.pos unchanged on the CHUNK_MIDDLE path, so the next nxt_conn_read had zero space and the connection hung on bodies larger than body_buffer_size. Reset/compact the buffer between continuations. Test infrastructure: - fake_upstream: Rust HTTP mock with requires-cl, no-te, strict, echo modes for deterministic backend-behavior tests - run-local-temp.sh: fast dev runner (direct mount, no rsync) - run-local.sh: auto-enable clang-ast on C changes, build fake_upstream into image Closes #58 Refs nginx#445, nginx#1088, nginx#1278 * fix commit * fix(http): use header->mem.end as nxt_sprintf bound Idiomatic upper bound from header buffer end instead of pre-padded p + NXT_OFF_T_LEN. Same safety, less coupling. Refs #58 test(fake_upstream): handle RFC 9112 chunk extensions Strip chunk-ext (5;ext=val) before hex parse so a future client emitting extensions does not hit unwrap_or(0). build(test): split clang-ast into run-local-full.sh run-local.sh --clang-ast was broken: Xclang plugin loaded during ./configure feature-detection trips "no atomic operations found" before make even starts. Move clang-ast to a dedicated runner with its own image (freeunit-test-full:local, debian:testing for clang 21). run-local.sh now focuses on the test fast-path only and auto-prefixes bare pytest node-ids with test/. run-local-temp.sh dropped — same fast-path is in run-local.sh. TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD. * ci: build fake_upstream for proxy chunked tests The 4 fake_upstream-dependent tests in test_proxy_chunked.py failed in CI with FileNotFoundError: '/usr/local/bin/fake_upstream' — the Rust mock binary was never built. Install rust + cargo build for the unit and python-3.x matrix jobs that actually run these tests. Also add a module-level skipif marker so local runs without a Rust toolchain skip the 4 affected tests instead of failing hard. Idiomatic upper bound from header buffer end instead of pre-padded p + NXT_OFF_T_LEN. Same safety, less coupling. Refs #58 test(fake_upstream): handle RFC 9112 chunk extensions Strip chunk-ext (5;ext=val) before hex parse so a future client emitting extensions does not hit unwrap_or(0). build(test): split clang-ast into run-local-full.sh run-local.sh --clang-ast was broken: Xclang plugin loaded during ./configure feature-detection trips "no atomic operations found" before make even starts. Move clang-ast to a dedicated runner with its own image (freeunit-test-full:local, debian:testing for clang 21). run-local.sh now focuses on the test fast-path only and auto-prefixes bare pytest node-ids with test/. run-local-temp.sh dropped — same fast-path is in run-local.sh. TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD. * fix(http): count full buffer chain for Content-Length after chunked Large chunked requests (>16KB) can span multiple buffers. Previous code only counted first buffer in r->body chain → upstream received truncated body with wrong Content-Length. Iterate over full chain matching pattern in nxt_router.c:5709. Refs #58 * fix(port): plug mp-pool retain and fd/buffer leaks in IPC reply paths The cert/script-store IPC pattern and several main-process reply paths have latent leaks reachable when nxt_port_msg_alloc() (or the RPC stream-id pool) hits malloc failure inside the port machinery. Sender side (nxt_cert_store_get, nxt_script_store_get): nxt_mp_retain(mp) was issued before nxt_port_socket_write(), so any failure path between the retain and a successful send left the pool with a refcount that the buffer's completion handler (which is what invokes nxt_mp_release) could never run. Moved the retain to after the buffer is handed off to the port machinery so failure paths above no longer pin the pool. Reply / main-process side: cert_store_get_handler, script_store_get_ handler, main_port_socket_handler, main_port_access_log_handler, and nxt_controller_conf_store all called nxt_port_socket_write() with an owned fd (or a buffer in the engine memory pool) and either ignored the return value with a (void) cast or skipped the failure branch entirely. On non-OK return the port layer never takes ownership, so the fd and the buffer's completion handler were both leaked in the privileged main process. Each site now closes the fd explicitly and (where applicable) queues the buffer completion onto the engine fast work queue so the engine pool reclaims memory. Use nxt_fd_close() rather than nxt_file_close() on the cert/script error paths: file.name has already been freed and the latter would dereference it through "%FN" on a close-failure log path. Also documents the ownership contract over nxt_port_socket_write2() in src/nxt_port.h: on NXT_OK, ownership of fd, fd2, and b transfers to the port layer; on any other return, the caller retains ownership and is responsible for closing fd/fd2 and dispatching b's completion handler. Addresses phpclub's review ask on #56. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(isolation): correct mount comparator and add regression coverage Replace the broken qsort comparator in nxt_isolation_mount_compare() which returned only 0 or 1. ISO C requires a comparator to return negative / zero / positive, so the previous form produced undefined behavior and the resulting mount order varied between runs of the same input — surfacing as flaky failures of test_go_isolation_rootfs_automount_tmpfs (#60). The new comparator sorts on mnt->dst length and uses memcmp() as a tie-breaker for deterministic total ordering. Switching the sort key from mnt->src (no safety meaning) to mnt->dst means parent paths sort before children when they share a prefix, which is the ordering nxt_isolation_prepare_rootfs() actually needs. Test: * Extract the toggle pattern from test_go_isolation_rootfs_automount_tmpfs into the helper _assert_rootfs_tmpfs_toggle_stable so multiple iteration counts share one body. * The stable test runs 20 iterations — sufficient for deterministic regression detection against #60 (the original failure surfaced on the second iteration). * A separate test_go_isolation_rootfs_automount_tmpfs_regression runs 100 iterations as belt-and-suspenders coverage. Fixes #60 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(docker): bump image version to 1.35.5 * feat(eol): add unit-eol-check CLI; sync EOL matrix with endoflife.date New Rust CLI (pkg/eol/) validates pkg/eol.json against the endoflife.date API: detects date mismatches, missed EOL flags, new upstream versions absent from the matrix. Stale dates corrected: - alpine 3.20 2026-11→2026-04, 3.21 2027-11→2026-11 - debian 10 2024-06→2022-09, 11 2026-08→2024-08, 12 2028-06→2026-06, 13 2030-06→2028-08 - ubuntu 26.04 eol 2031-05→2031-04 - perl 5.40 eol 2028-07→2027-06 EOL flags added: go 1.24, node 20, fedora 42, alpine 3.20 New entries: node 26, perl 5.42, ruby 4.0, fedora 44, alpine 3.22/3.23 EOL.md synced to matrix. * ● fix(eol): address review findings in unit-eol-check - Cargo.toml: edition "2024" → "2021" (2024 not yet stable) - api_eol_date: add missing category mappings (node→nodejs, amazonlinux→amazon-linux, centos_stream→centos-stream); without these, API lookups silently returned None for all non-jsc runtimes/OSes - now_yyyy_mm: hard-fail on date cmd error instead of using a hardcoded fallback date that silently corrupts results - --fix mode: remove misleading "output corrected eol.json" comments; output was never JSON, just debug lines - node 26: correct EOL 2028-04→2029-04, drop 2029-04→2030-04, add lts:true (was copy-paste of node 24 dates) - centos_stream 10: correct EOL 2035-05→2030-01 (API mismatch caught by the tool itself after mapping fix) - EOL.md: sync node 26 and centos_stream 10 rows to match * ● docs(todo): add proxy request buffering design notes (issue #58) Refs #58 * fix(isolation): order procfs/tmpfs/bind in mount comparator The previous comparator change (sort by mnt->dst length, memcmp tie-break) was correct C but inadvertently reversed the relative order of procfs and tmpfs in nxt_isolation_set_lang_mounts: Old (broken comparator): procfs, tmpfs, lang_deps... New (dst length asc): tmpfs, procfs, lang_deps... The new order made the test_go_isolation_rootfs_automount_tmpfs flake reliably reproducible on CI's ubuntu-latest runners. Mounting tmpfs at <rootfs>/tmp before mounting procfs at sibling <rootfs>/proc triggers a kernel-side ENOENT on the procfs mount that the procfs- first order avoids. The exact kernel root cause is not yet diagnosed (a local docker reproducer surfaces different errnos than CI), but the order sensitivity is empirically clear. Add a primary sort key on mnt->type so PROC (3) comes before TMP (2) before BIND (1), restoring the previously-working order. The secondary length/memcmp keys are preserved for lang-deps ordering within the BIND group. Tracking the deeper kernel investigation under #60. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(http): add chunk extension support in chunked body parser (RFC 9112 §7.1) Add sw_chunk_ext state to nxt_http_chunk_parse() that skips characters after ';' on the chunk-size line until CR. Allows clients to send chunk extensions (e.g. "4;ext=val\r\ntest\r\n") without causing chunk_error. Extensions are silently discarded per RFC 9112 §7.1. Refs nginx#1278 * test(http): add 7 chunked proxy tests — multi-chunk, empty body, extensions, POST/PUT/DELETE, concurrent New tests in test_proxy_chunked.py: - test_proxy_chunked_multi_chunk: 5 chunks of varying size, exercises memmove compact path in nxt_h1proto.c - test_proxy_chunked_empty_body: 0\r\n\r\n yields Content-Length: 0 - test_proxy_chunked_extensions: chunk extensions (;ext=val) stripped - test_proxy_chunked_post/put/delete: HTTP methods with chunked body - test_proxy_chunked_concurrent: 10 simultaneous requests via threads All tests use fake_upstream strict mode for Content-Length validation. Consolidated _fake_get_raw + _fake_request into single _chunked_request. Refs #58, nginx#445, nginx#1278 * fix: review findings — HTTP parsing timeout, CI cache, comments - test_proxy_chunked: _recvall (100ms race) → _read_http_request with Content-Length parsing; socket timeout 5s - run-local.sh: validate -t/-m arguments - run-local-full.sh: preserve tmp dir on non-zero exit (clang-ast debug) - ci.yml: use module output for condition; add cargo cache; drop +stable - rust-toolchain.toml: pin stable channel - nxt_cert.c, nxt_h1proto.c, nxt_script.c: trim verbose comments - .gitignore: remove Cargo.lock (already tracked) * fix(test): reject flag-like arg for -m/-t in run-local.sh `-t -n` previously consumed `-n` as the test argument instead of erroring. Guard now also fails when next token starts with `-`. Refs PR #63 review * fix(eol): harden unit-eol-check error handling and correct JSC dates Address PR #62 review: API errors silently swallowed, category 4 mapping duplicated, curl ignored HTTP errors, date normalization 5 rejected valid YYYY-MM, approaching-EOL threshold used grace 6 period (36 months) instead of 12. Extract shared api_category(), eliminate duplicate mapping api_eol_date returns Result — callers distinguish network failure from "version not found" curl -f flag: HTTP 4xx/5xx no longer exit 0 Date normalization: len() >= 7 instead of >= 10 Mismatch.fetch_error field distinguishes fetch errors from not-found; exit(2) only on all-fetch-fail detect_new_versions uses api_category() instead of separate hardcoded vec generate_fix warns on Ok(None) and Err, no silent skip as_object().unwrap() → let Some(obj) = ... else { continue } months_between None → error entry instead of unwrap_or(0) Approaching-EOL threshold: 12 months (was grace_months = 36) Mismatch messages include direction: "behind" / "ahead" JSC mapping: jdk → eclipse-temurin (old endpoint 404s) JSC 17: EOL 2029-09 → 2027-10, JSC 21: 2031-09 → 2029-12 (dates were Oracle JDK, not Eclipse Temurin) run.sh: fix empty array under set -u, add EOF newline * rust: upgrade cc crate 1.2.30 -> 1.2.62 (#67) Versions before 1.2.42 fail to compile with rust 1.90 on ppc64. Link: rust-lang/cc-rs#1581 Co-authored-by: sertonix <sertonix@users.noreply.github.com> * Docker: add builder mode, parallel default, reference build stats (#66) * chore(docker): fix make clean, add apt-cacher-ng auto-detection - Makefile: rm -f build-* → rm -f $(addprefix build-, $(MODVERSIONS)) to avoid deleting build-local.sh - build-local.sh: auto-detect apt-cacher-ng on port 3142, pass http_proxy build-arg to skip re-downloading ~100 MB of .deb packages on repeated builds - README.md: document apt-cacher-ng setup * chore(docker): add builder mode, parallel default, reference build stats - -b flag: pre-built builder images skip apt+Rust download for minimal/wasm/php8.5 variants (~170 MB saved per build) - default parallelism: nproc/2 instead of 1 - local/Dockerfile.{minimal,wasm,php8.5}: multi-stage variants for -b - Dockerfile.builder-{trixie,php8.5}: builder image sources - README: document builder mode, add reference build timings (Ryzen 7 5700X, 32 GiB, NVMe — 23 variants in ~58 min) - TODO: OpenTelemetry 0.24→0.32 upgrade plan with fake_otlp design * Fix ignore * TODO.md — expanded the OTEL crate upgrade section (issue #65, milestone 1.35.6): - Added crate version table (0.24→0.32 gap per crate) - Documented Ava Hahn's authorship and new contact (@ava-affine, ava@sunnypup.io) - Replaced flat task list with a phased plan: - Phase 0a: audit existing config vs OpenAPI spec - Phase 0b: new config fields (service_name, headers, root_certificate, resource_attributes, max_queue_size, export_timeout) — backward-compatible on current crates - Phase 1: fake_otlp test infrastructure (mirrors fake_upstream pattern) - Phase 2: crate rewrite (0.24→0.32) - Phase 3: housekeeping (rename, cleanup) - Added current 4-field config JSON for reference - Listed hardcoded values that must become configurable with file/line references * chore(docker): rebrand Makefile library target, add release roadmap Replaced stale nginx/unit references in `make library` (Maintainers, GitRepo, GitFetch) with FreeUnit equivalents. Commented out the target with TODO for milestone 1.35.8 (Docker Hub Official Images). Added release roadmap and action plan to TODO.md covering milestones 1.35.5 through 1.35.8 (2026-08-28). * fix(docker): ldd glob guard, proxy probe, builder race; add CentOS/AlmaLinux guide Docker fixes: - Guard ldd loop with `[ -f "$f" ] || continue` — bare glob expands to literal string when no .so files exist, aborting requirements.apt - Quote `"$f"` in ldd call (template + all Dockerfile.* + local/) - Replace `nc -z` apt-proxy probe with bash /dev/tcp (nc absent in minimal envs); move detection after arg parsing so -n skips probe - Propagate ensure_builder failure: add `|| return 1` in build_variant - Serialize builder pre-build before parallel loop — eliminates race where minimal+wasm simultaneously pull/build the shared trixie image Community docs: - New pkg/community/CENTOS.ALMALINUX.md: CentOS Stream 9 / AlmaLinux 9 build guide (three PHP options, systemd unit, migration from nginx/unit) - Prerequisites: add `dnf install -y epel-release` — brotli-devel is EPEL-only on EL9 - Option B: add `dnf config-manager --enable remi` before unit-php — dnf module enable does not activate the non-modular remi repo * fix(docker): fix ldconfig --------- Co-authored-by: Alexandr Smirnov <a.smirnov@dwteam.ru> * chore(release): prep 1.35.5 docs and unitctl metadata - docs/changes.xml: add 1.35.5 entries (chunked→CL, TLS busy-loop, IPC leaks, mount comparator, njs 0.9.8, unfreeze-sync.sh) - docs/unit-openapi.yaml: title 1.35.2 → 1.35.5 - unitctl/*/Cargo.toml: version 1.35.0 → 1.35.5 - unit-openapi: authors/description/edition/docs URL → FreeUnit - unitctl: deb/rpm metadata reference FreeUnit - README: socket path nginx-unit.control.sock → control.unit.sock --------- Co-authored-by: Alexandr Smirnov <a.smirnov@dwteam.ru> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Andy Postnikov <apostnikov@gmail.com> Co-authored-by: Sertonix <sertonix@posteo.net> Co-authored-by: sertonix <sertonix@users.noreply.github.com>

Adds a fault-injection harness under src/test/nxt_port_fail_test.c covering the audit-fixed IPC paths from freeunitorg#56 and addressing phpclub's 5-bullet coverage request. Sub-tests: * nxt_port_fail_test_socket_write — forces nxt_port_msg_alloc() failure with fd != -1 and a non-sync buf; asserts NXT_ERROR, fd stays open, completion is not invoked, and /proc/self/fd balanced (Linux only; skipped elsewhere). * nxt_port_fail_test_rpc_register — forces both nxt_port_rpc_register_handler_ex alloc failure and the lvlhsh-insert failure path; asserts port->use_count == 1 and rpc_streams empty. * nxt_port_fail_test_error_handler — manually queues a send_msg with a closeable fd and a non-sync buffer, runs the static nxt_port_error_handler() via an NXT_TESTS-gated trampoline, drains the fast work queue, and asserts the buffer completion was enqueued (not called synchronously) and runs exactly once. Closes the queued-then-failed shape that PR nginx#56's audit fix reordered (close fd first, queue buffer completion via the fast work queue second — matching nxt_port_error_handler's existing discipline at src/nxt_port_socket.c:1378-1396). * nxt_port_fail_test_mp_baseline — mirrors the audit-fixed sender shape (allocate buf in mp, send, retain only on success), forces a pre-queue failure, and asserts mp->retain is unchanged. Sanity-checked: moving the retain back to before the write makes the test report "retain before 1, after 2", so the regression signal is intact. Public NXT_TESTS-gated trampolines added for the harness: * nxt_port_test_msg_alloc_failures() + nxt_port_test_run_error_ handler() in src/nxt_port_socket.c. * nxt_port_rpc_test_alloc_failures() / nxt_port_rpc_test_insert_failures() in src/nxt_port_rpc.c. * nxt_mp_test_retain_count() in src/nxt_mp.c (exposes the otherwise-private retain counter for the baseline assertion). Per Gemini's PR nginx#57 review at src/test/nxt_port_fail_test.c:79, test buffers are heap-owned from a transient mp rather than stack-allocated, matching how production callers wire them (nxt_cert_store_get allocates from the temp_conf mp — the very pattern this suite exercises). Each sub-test's failure ladder is anchored on ownership-tracking variables so any goto-fail releases everything in reverse order. phpclub's 5-bullet list is now covered: 1. forced nxt_port_msg_alloc() failure paths — socket_write 2. simulated stream-id pool exhaustion — rpc_register 3. mp refcounts return to baseline after error — mp_baseline 4. fd leak checks via /proc/self/fd — socket_write 5. completion handlers execute on enqueue failure — error_handler Validation: ./configure --tests && make tests -j && build/tests — all four sub-tests green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds a fault-injection harness under src/test/nxt_port_fail_test.c covering the audit-fixed IPC paths from freeunitorg#56. Sub-tests: * socket_write — forces nxt_port_msg_alloc() failure with fd != -1 and a non-sync buf; asserts NXT_ERROR, fd stays open, completion not invoked, /proc/self/fd balanced (Linux only). * rpc_register — forces nxt_port_rpc_register_handler_ex alloc and lvlhsh-insert failures; asserts use_count == 1, rpc_streams empty. * error_handler — queues a send_msg with a closeable fd + non-sync buffer, runs nxt_port_error_handler() via an NXT_TESTS trampoline, drains the fast work queue, asserts completion enqueued (not called synchronously) and runs exactly once. * mp_baseline — mirrors the audit-fixed sender shape, forces a pre-queue failure, asserts mp->retain is unchanged. NXT_TESTS-gated trampolines for the harness: nxt_port_test_msg_alloc_ failures()/nxt_port_test_run_error_handler() (nxt_port_socket.c), nxt_port_rpc_test_alloc_failures()/_insert_failures() (nxt_port_rpc.c), nxt_mp_test_retain_count() (nxt_mp.c). Test buffers are heap-owned from a transient mp, matching production callers (nxt_cert_store_get allocates from temp_conf mp). The injected stack engine documents its invariant: the exercised error path must touch nothing beyond fast_work_queue. Also resets file.fd to -1 after nxt_fd_close on the send-fail path in nxt_cert.c and nxt_script.c — defense in depth against an accidental double-close if code is later added below the close. No behavior change. Validation: ./configure --tests && make tests -j && build/tests — all four sub-tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tests Stacked follow-up for #56 and the coverage plan in #56 (comment). Adds test-only fault injection for IPC message allocation and RPC registration failures, then covers core ownership invariants: failed queued nxt_port_socket_write() leaves the caller-owned fd open before ownership transfer; the unsent buffer completion is not run by the port layer; /proc/self/fd counts stay balanced where available; failed RPC registration allocation and insert leave use_count unchanged and no stream registration behind. The branch is based on andypost:claude/port-ipc-completion-leaks / PR #56, so it should be retargeted or rebased once #56 lands. Validation: make -j2; ./configure --tests; make tests -j2; build/tests

andypost changed the base branch from master to pre-1.35.5 May 7, 2026 23:44

andypost requested a review from phpclub May 7, 2026 23:45

gemini-code-assist Bot reviewed May 7, 2026

View reviewed changes

Comment thread src/nxt_main_process.c

andypost force-pushed the claude/port-ipc-completion-leaks branch from 5a9f37d to c88ede1 Compare May 7, 2026 23:51

andypost mentioned this pull request May 7, 2026

fix(port): plug mp-pool retain and fd/buffer leaks in IPC reply paths andypost/unit#8

Draft

4 tasks

phpclub reviewed May 13, 2026

View reviewed changes

Comment thread CHANGES

andypost mentioned this pull request May 13, 2026

test(port): add IPC failure injection coverage #57

Merged

andypost mentioned this pull request May 14, 2026

docs(security): add 14-vector audit + remediation tracker andypost/unit#10

Open

2 tasks

a.smirnov added 4 commits May 18, 2026 00:39

fix commit

57f87ed

phpclub added this to the 1.35.5 milestone May 17, 2026

phpclub added the z-bug 🐞 Bug report label May 17, 2026

andypost force-pushed the claude/port-ipc-completion-leaks branch from df6b00e to 71fcd89 Compare May 18, 2026 16:40

phpclub merged commit ba8edcc into freeunitorg:pre-1.35.5 May 19, 2026
21 checks passed

andypost mentioned this pull request May 19, 2026

Flaky test: test_go_isolation_rootfs_automount_tmpfs — procfs mount fails with ENOENT after tmpfs automount enabled #60

Closed

freeunitx mentioned this pull request May 24, 2026

Router process CPU spin (100%) causing complete service hang #28

Closed

Uh oh!

Conversation

andypost commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed changes

Checklist

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

andypost commented May 8, 2026

Uh oh!

Uh oh!

phpclub commented May 13, 2026

Uh oh!

andypost commented May 13, 2026

Uh oh!

andypost commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andypost commented May 13, 2026

Uh oh!

andypost commented May 15, 2026

Uh oh!

andypost commented May 15, 2026

Uh oh!

phpclub commented May 17, 2026

Uh oh!

andypost commented May 18, 2026

Reply to phpclub on #56 (comment 4472837908)

Uh oh!

andypost commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

andypost commented May 7, 2026 •

edited

Loading

andypost commented May 13, 2026 •

edited

Loading