Pr57 conflict check#74
Conversation
SSL_ERROR_SYSCALL(errno=0) and SSL_ERROR_ZERO_RETURN both indicate
the peer closed the connection. On the read path this is a clean EOF;
on the write path it means no further data can be sent and the loop
must terminate.
Before this fix the write path fell through to the read-side handler,
setting socket.closed=1 and returning success, causing the router to
retry SSL_write indefinitely until the event engine timed out.
- SSL_ERROR_SYSCALL + WRITE: use sys_err if non-zero, else ECONNRESET
- SSL_ERROR_ZERO_RETURN + WRITE: always ECONNRESET
Also bumps OpenSSL in CI from 3.6.0 to 3.6.2.
Closes #28 test: fix process filter for single-digit PIDs in containers
Substring match `main_pid in l` false-positives when unit gets a
low PID (e.g. 9) in Docker: "9" matches "2189" in zombie entries.
Word-boundary regex \b<pid>\b fixes the check
- Remove TIPC (domain 40) deny rule — profile replaces Docker's
default entirely when --security-opt is used, so only the
targeted AF_ALG block belongs here (single-purpose profile)
- README: clarify mitigation ≠ fix, add explicit "not applied
automatically" warning, fix verify examples to use
latest-python3.13-slim (latest-minimal has no python3)
- CI: run seccomp tests against both python:3.13-slim-trixie and
ghcr.io/freeunitorg/freeunit:latest-python3.13-slim
- test script: drop TIPC test (rule removed), update comment
Address review feedback on the issue #28 fix: * test_tls.py: replace the broad `SSL_write.+failed` skip with the specific syscall/zero-return signatures this test produces, so unrelated SSL_write regressions are not silently masked. * test_tls.py: bump the response body from 1 MB to 16 MB so the server is reliably mid-write when the client tears the connection down on hosts with large autotuned SO_SNDBUF. * conftest.py: pre-compile the main-PID match and use re.escape() for safety; minor cleanup of the per-line search. * nxt_openssl.c: drop the redundant `!= 0` in the ternary to match surrounding style. No functional change to the TLS fix itself.
Address review feedback on the issue #28 fix: test_tls.py: replace the broad SSL_write.+failed skip with the specific syscall/zero-return signatures this test produces, so unrelated SSL_write regressions are not silently masked. test_tls.py: bump the response body from 1 MB to 16 MB so the server is reliably mid-write when the client tears the connection down on hosts with large autotuned SO_SNDBUF. conftest.py: pre-compile the main-PID match and use re.escape() for safety; minor cleanup of the per-line search. nxt_openssl.c: drop the redundant != 0 in the ternary to match surrounding style. No functional change to the TLS fix itself.
fix(tls): stop SSL_write busy-loop on peer-initiated close
Some upstreams (Gitea, strict HTTP/1.1 backends) reject
Transfer-Encoding: chunked and require Content-Length on
forwarded requests. Buffer the chunked body, compute its
length, and emit Content-Length while skipping the original
TE field via field->skip.
Also fix a buffer-stall in nxt_h1p_conn_request_body_read:
nxt_http_chunk_parse leaves b->mem.pos unchanged on the
CHUNK_MIDDLE path, so the next nxt_conn_read had zero space
and the connection hung on bodies larger than body_buffer_size.
Reset/compact the buffer between continuations.
Test infrastructure:
- fake_upstream: Rust HTTP mock with requires-cl, no-te,
strict, echo modes for deterministic backend-behavior tests
- run-local-temp.sh: fast dev runner (direct mount, no rsync)
- run-local.sh: auto-enable clang-ast on C changes,
build fake_upstream into image
Closes #58
Refs nginx#445, nginx#1088, nginx#1278
Idiomatic upper bound from header buffer end instead of pre-padded p + NXT_OFF_T_LEN. Same safety, less coupling. Refs #58 test(fake_upstream): handle RFC 9112 chunk extensions Strip chunk-ext (5;ext=val) before hex parse so a future client emitting extensions does not hit unwrap_or(0). build(test): split clang-ast into run-local-full.sh run-local.sh --clang-ast was broken: Xclang plugin loaded during ./configure feature-detection trips "no atomic operations found" before make even starts. Move clang-ast to a dedicated runner with its own image (freeunit-test-full:local, debian:testing for clang 21). run-local.sh now focuses on the test fast-path only and auto-prefixes bare pytest node-ids with test/. run-local-temp.sh dropped — same fast-path is in run-local.sh. TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD.
The 4 fake_upstream-dependent tests in test_proxy_chunked.py failed in CI with FileNotFoundError: '/usr/local/bin/fake_upstream' — the Rust mock binary was never built. Install rust + cargo build for the unit and python-3.x matrix jobs that actually run these tests. Also add a module-level skipif marker so local runs without a Rust toolchain skip the 4 affected tests instead of failing hard. Idiomatic upper bound from header buffer end instead of pre-padded p + NXT_OFF_T_LEN. Same safety, less coupling. Refs #58 test(fake_upstream): handle RFC 9112 chunk extensions Strip chunk-ext (5;ext=val) before hex parse so a future client emitting extensions does not hit unwrap_or(0). build(test): split clang-ast into run-local-full.sh run-local.sh --clang-ast was broken: Xclang plugin loaded during ./configure feature-detection trips "no atomic operations found" before make even starts. Move clang-ast to a dedicated runner with its own image (freeunit-test-full:local, debian:testing for clang 21). run-local.sh now focuses on the test fast-path only and auto-prefixes bare pytest node-ids with test/. run-local-temp.sh dropped — same fast-path is in run-local.sh. TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD.
Large chunked requests (>16KB) can span multiple buffers. Previous code only counted first buffer in r->body chain → upstream received truncated body with wrong Content-Length. Iterate over full chain matching pattern in nxt_router.c:5709. Refs #58
The cert/script-store IPC pattern and several main-process reply paths have latent leaks reachable when nxt_port_msg_alloc() (or the RPC stream-id pool) hits malloc failure inside the port machinery. Sender side (nxt_cert_store_get, nxt_script_store_get): nxt_mp_retain(mp) was issued before nxt_port_socket_write(), so any failure path between the retain and a successful send left the pool with a refcount that the buffer's completion handler (which is what invokes nxt_mp_release) could never run. Moved the retain to after the buffer is handed off to the port machinery so failure paths above no longer pin the pool. Reply / main-process side: cert_store_get_handler, script_store_get_ handler, main_port_socket_handler, main_port_access_log_handler, and nxt_controller_conf_store all called nxt_port_socket_write() with an owned fd (or a buffer in the engine memory pool) and either ignored the return value with a (void) cast or skipped the failure branch entirely. On non-OK return the port layer never takes ownership, so the fd and the buffer's completion handler were both leaked in the privileged main process. Each site now closes the fd explicitly and (where applicable) queues the buffer completion onto the engine fast work queue so the engine pool reclaims memory. Use nxt_fd_close() rather than nxt_file_close() on the cert/script error paths: file.name has already been freed and the latter would dereference it through "%FN" on a close-failure log path. Also documents the ownership contract over nxt_port_socket_write2() in src/nxt_port.h: on NXT_OK, ownership of fd, fd2, and b transfers to the port layer; on any other return, the caller retains ownership and is responsible for closing fd/fd2 and dispatching b's completion handler. Addresses phpclub's review ask on #56. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(port): plug mp-pool retain and fd/buffer leaks in IPC reply paths
Adds a fault-injection harness under src/test/nxt_port_fail_test.c covering the audit-fixed IPC paths from #56 and addressing phpclub's 5-bullet coverage request. Sub-tests: * nxt_port_fail_test_socket_write — forces nxt_port_msg_alloc() failure with fd != -1 and a non-sync buf; asserts NXT_ERROR, fd stays open, completion is not invoked, and /proc/self/fd balanced (Linux only; skipped elsewhere). * nxt_port_fail_test_rpc_register — forces both nxt_port_rpc_register_handler_ex alloc failure and the lvlhsh-insert failure path; asserts port->use_count == 1 and rpc_streams empty. * nxt_port_fail_test_error_handler — manually queues a send_msg with a closeable fd and a non-sync buffer, runs the static nxt_port_error_handler() via an NXT_TESTS-gated trampoline, drains the fast work queue, and asserts the buffer completion was enqueued (not called synchronously) and runs exactly once. Closes the queued-then-failed shape that PR #56's audit fix reordered (close fd first, queue buffer completion via the fast work queue second — matching nxt_port_error_handler's existing discipline at src/nxt_port_socket.c:1378-1396). * nxt_port_fail_test_mp_baseline — mirrors the audit-fixed sender shape (allocate buf in mp, send, retain only on success), forces a pre-queue failure, and asserts mp->retain is unchanged. Sanity-checked: moving the retain back to before the write makes the test report "retain before 1, after 2", so the regression signal is intact. Public NXT_TESTS-gated trampolines added for the harness: * nxt_port_test_msg_alloc_failures() + nxt_port_test_run_error_ handler() in src/nxt_port_socket.c. * nxt_port_rpc_test_alloc_failures() / nxt_port_rpc_test_insert_failures() in src/nxt_port_rpc.c. * nxt_mp_test_retain_count() in src/nxt_mp.c (exposes the otherwise-private retain counter for the baseline assertion). Per Gemini's PR #57 review at src/test/nxt_port_fail_test.c:79, test buffers are heap-owned from a transient mp rather than stack-allocated, matching how production callers wire them (nxt_cert_store_get allocates from the temp_conf mp — the very pattern this suite exercises). Each sub-test's failure ladder is anchored on ownership-tracking variables so any goto-fail releases everything in reverse order. phpclub's 5-bullet list is now covered: 1. forced nxt_port_msg_alloc() failure paths — socket_write 2. simulated stream-id pool exhaustion — rpc_register 3. mp refcounts return to baseline after error — mp_baseline 4. fd leak checks via /proc/self/fd — socket_write 5. completion handlers execute on enqueue failure — error_handler Validation: ./configure --tests && make tests -j && build/tests — all four sub-tests green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses Gemini's PR #57 review note at src/nxt_cert.c:1246: even though the function returns immediately after the close, resetting file.fd to -1 is defense in depth against an accidental re-introduction of code below the close path that would otherwise double-close. Applies symmetrically in src/nxt_script.c — the script-store handler has the same shape as the cert-store handler. No behavior change today. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
All conflicts resolved taking pre-1.35.6 (HEAD): - version/CHANGES/TODO.md: keep 1.35.6 content (PR branched from 1.35.5) - ci.yml: keep HEAD's cached Cargo + fake_otlp build (PR had curl|sh, no cache) - fake_upstream/src/main.rs: keep HEAD (more modes, proper chunk-size error handling) - run-local-full.sh: keep HEAD (otel-enabled, remote image pull, proper cleanup) - run-local.sh: keep HEAD (arg validation, fake_otlp build step) - test_proxy_chunked.py: keep HEAD (_read_http_request, threading, 12 tests vs 6) - .gitignore: keep HEAD (drop PR's redundant /.qwen/ entries + Cargo.lock that is already tracked in the repo) - nxt_cert.c / nxt_script.c: keep HEAD comment style; take PR's file.fd=-1 sentinel - nxt_h1proto.c: keep HEAD's shorter comment
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive port failure test suite (nxt_port_fail_test.c) to verify error handling and memory pool retention invariants when socket writes or RPC registrations fail. It adds several test-only hooks and failure injection helpers across nxt_port, nxt_port_rpc, and nxt_mp. Additionally, it fixes potential double-close or stale file descriptor issues in nxt_cert.c and nxt_script.c by explicitly setting file.fd = -1 after closing. No review comments were provided, so there is no further feedback to address.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Proposed changes
merge(test): merge PR #57 IPC failure injection with pre-1.35.6 fixes
Resolves all 12 conflicts; pre-1.35.6 wins on every contested file:
Closes #57 (pending)
Checklist