Skip to content

Pr57 conflict check#74

Closed
phpclub wants to merge 15 commits into
pre-1.35.6from
pr57-conflict-check
Closed

Pr57 conflict check#74
phpclub wants to merge 15 commits into
pre-1.35.6from
pr57-conflict-check

Conversation

@phpclub

@phpclub phpclub commented Jun 3, 2026

Copy link
Copy Markdown

Proposed changes

merge(test): merge PR #57 IPC failure injection with pre-1.35.6 fixes

Resolves all 12 conflicts; pre-1.35.6 wins on every contested file:

  • version/CHANGES/TODO.md: keep 1.35.6 content
  • ci.yml: keep Cargo cache + fake_otlp build (drop curl|sh)
  • fake_upstream/src/main.rs: keep extended modes + chunk-size error fix
  • run-local-full.sh/-local.sh: keep otel-enabled, proper cleanup, arg validation
  • test_proxy_chunked.py: keep _read_http_request + 12 tests vs PR's 6
  • .gitignore: drop PR's redundant /.qwen/ entries and tracked Cargo.lock
  • nxt_cert.c/nxt_script.c: HEAD comment + PR's file.fd=-1 sentinel

Closes #57 (pending)

Checklist

  • I have read CONTRIBUTING.md
  • If applicable, I have added tests
  • If applicable, I have updated documentation

Alexandr Smirnov and others added 15 commits May 7, 2026 22:00
                                                                                   SSL_ERROR_SYSCALL(errno=0) and SSL_ERROR_ZERO_RETURN both indicate
                                                                                   the peer closed the connection. On the read path this is a clean EOF;
                                                                                   on the write path it means no further data can be sent and the loop
                                                                                   must terminate.

                                                                                   Before this fix the write path fell through to the read-side handler,
                                                                                   setting socket.closed=1 and returning success, causing the router to
                                                                                   retry SSL_write indefinitely until the event engine timed out.

                                                                                   - SSL_ERROR_SYSCALL + WRITE: use sys_err if non-zero, else ECONNRESET
                                                                                   - SSL_ERROR_ZERO_RETURN + WRITE: always ECONNRESET

                                                                                   Also bumps OpenSSL in CI from 3.6.0 to 3.6.2.

                                                                                   Closes #28                                        test: fix process filter for single-digit PIDs in containers

                                                                                                                                     Substring match `main_pid in l` false-positives when unit gets a
                                                                                                                                     low PID (e.g. 9) in Docker: "9" matches "2189" in zombie entries.
                                                                                                                                     Word-boundary regex \b<pid>\b fixes the check
  - Remove TIPC (domain 40) deny rule — profile replaces Docker's
    default entirely when --security-opt is used, so only the
    targeted AF_ALG block belongs here (single-purpose profile)
  - README: clarify mitigation ≠ fix, add explicit "not applied
    automatically" warning, fix verify examples to use
    latest-python3.13-slim (latest-minimal has no python3)
  - CI: run seccomp tests against both python:3.13-slim-trixie and
    ghcr.io/freeunitorg/freeunit:latest-python3.13-slim
  - test script: drop TIPC test (rule removed), update comment
Address review feedback on the issue #28 fix:

* test_tls.py: replace the broad `SSL_write.+failed` skip with the
  specific syscall/zero-return signatures this test produces, so
  unrelated SSL_write regressions are not silently masked.
* test_tls.py: bump the response body from 1 MB to 16 MB so the
  server is reliably mid-write when the client tears the
  connection down on hosts with large autotuned SO_SNDBUF.
* conftest.py: pre-compile the main-PID match and use re.escape()
  for safety; minor cleanup of the per-line search.
* nxt_openssl.c: drop the redundant `!= 0` in the ternary to match
  surrounding style.

No functional change to the TLS fix itself.
Address review feedback on the issue #28 fix:

test_tls.py: replace the broad SSL_write.+failed skip with the specific syscall/zero-return signatures this test produces, so unrelated SSL_write regressions are not silently masked.
test_tls.py: bump the response body from 1 MB to 16 MB so the server is reliably mid-write when the client tears the connection down on hosts with large autotuned SO_SNDBUF.
conftest.py: pre-compile the main-PID match and use re.escape() for safety; minor cleanup of the per-line search.
nxt_openssl.c: drop the redundant != 0 in the ternary to match surrounding style.
No functional change to the TLS fix itself.
fix(tls): stop SSL_write busy-loop on peer-initiated close
  Some upstreams (Gitea, strict HTTP/1.1 backends) reject
  Transfer-Encoding: chunked and require Content-Length on
  forwarded requests. Buffer the chunked body, compute its
  length, and emit Content-Length while skipping the original
  TE field via field->skip.

  Also fix a buffer-stall in nxt_h1p_conn_request_body_read:
  nxt_http_chunk_parse leaves b->mem.pos unchanged on the
  CHUNK_MIDDLE path, so the next nxt_conn_read had zero space
  and the connection hung on bodies larger than body_buffer_size.
  Reset/compact the buffer between continuations.

  Test infrastructure:
  - fake_upstream: Rust HTTP mock with requires-cl, no-te,
    strict, echo modes for deterministic backend-behavior tests
  - run-local-temp.sh: fast dev runner (direct mount, no rsync)
  - run-local.sh: auto-enable clang-ast on C changes,
    build fake_upstream into image

  Closes #58
  Refs nginx#445, nginx#1088, nginx#1278
  Idiomatic upper bound from header buffer end instead of
  pre-padded p + NXT_OFF_T_LEN.  Same safety, less coupling.

  Refs #58

  test(fake_upstream): handle RFC 9112 chunk extensions

  Strip chunk-ext (5;ext=val) before hex parse so a future
  client emitting extensions does not hit unwrap_or(0).

  build(test): split clang-ast into run-local-full.sh

  run-local.sh --clang-ast was broken: Xclang plugin loaded
  during ./configure feature-detection trips "no atomic
  operations found" before make even starts.

  Move clang-ast to a dedicated runner with its own image
  (freeunit-test-full:local, debian:testing for clang 21).
  run-local.sh now focuses on the test fast-path only and
  auto-prefixes bare pytest node-ids with test/.

  run-local-temp.sh dropped — same fast-path is in run-local.sh.
  TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD.
  The 4 fake_upstream-dependent tests in test_proxy_chunked.py failed
  in CI with FileNotFoundError: '/usr/local/bin/fake_upstream' — the
  Rust mock binary was never built. Install rust + cargo build for the
  unit and python-3.x matrix jobs that actually run these tests.

  Also add a module-level skipif marker so local runs without a Rust
  toolchain skip the 4 affected tests instead of failing hard.

  Idiomatic upper bound from header buffer end instead of
  pre-padded p + NXT_OFF_T_LEN.  Same safety, less coupling.

  Refs #58

  test(fake_upstream): handle RFC 9112 chunk extensions

  Strip chunk-ext (5;ext=val) before hex parse so a future
  client emitting extensions does not hit unwrap_or(0).

  build(test): split clang-ast into run-local-full.sh

  run-local.sh --clang-ast was broken: Xclang plugin loaded
  during ./configure feature-detection trips "no atomic
  operations found" before make even starts.

  Move clang-ast to a dedicated runner with its own image
  (freeunit-test-full:local, debian:testing for clang 21).
  run-local.sh now focuses on the test fast-path only and
  auto-prefixes bare pytest node-ids with test/.

  run-local-temp.sh dropped — same fast-path is in run-local.sh.
  TODO.md split clang-ast status: OpenSSL 1.1 PASSED, 3.6 TBD.
Large chunked requests (>16KB) can span multiple buffers. Previous code
only counted first buffer in r->body chain → upstream received truncated
body with wrong Content-Length.
Iterate over full chain matching pattern in nxt_router.c:5709.
Refs #58
The cert/script-store IPC pattern and several main-process reply paths
have latent leaks reachable when nxt_port_msg_alloc() (or the RPC
stream-id pool) hits malloc failure inside the port machinery.

Sender side (nxt_cert_store_get, nxt_script_store_get): nxt_mp_retain(mp)
was issued before nxt_port_socket_write(), so any failure path between
the retain and a successful send left the pool with a refcount that the
buffer's completion handler (which is what invokes nxt_mp_release) could
never run.  Moved the retain to after the buffer is handed off to the
port machinery so failure paths above no longer pin the pool.

Reply / main-process side: cert_store_get_handler, script_store_get_
handler, main_port_socket_handler, main_port_access_log_handler, and
nxt_controller_conf_store all called nxt_port_socket_write() with an
owned fd (or a buffer in the engine memory pool) and either ignored the
return value with a (void) cast or skipped the failure branch entirely.
On non-OK return the port layer never takes ownership, so the fd and
the buffer's completion handler were both leaked in the privileged main
process.  Each site now closes the fd explicitly and (where applicable)
queues the buffer completion onto the engine fast work queue so the
engine pool reclaims memory.

Use nxt_fd_close() rather than nxt_file_close() on the cert/script
error paths: file.name has already been freed and the latter would
dereference it through "%FN" on a close-failure log path.

Also documents the ownership contract over nxt_port_socket_write2() in
src/nxt_port.h: on NXT_OK, ownership of fd, fd2, and b transfers to the
port layer; on any other return, the caller retains ownership and is
responsible for closing fd/fd2 and dispatching b's completion handler.
Addresses phpclub's review ask on #56.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(port): plug mp-pool retain and fd/buffer leaks in IPC reply paths
Adds a fault-injection harness under src/test/nxt_port_fail_test.c
covering the audit-fixed IPC paths from #56 and
addressing phpclub's 5-bullet coverage request.

Sub-tests:

* nxt_port_fail_test_socket_write — forces nxt_port_msg_alloc()
  failure with fd != -1 and a non-sync buf; asserts NXT_ERROR,
  fd stays open, completion is not invoked, and /proc/self/fd
  balanced (Linux only; skipped elsewhere).

* nxt_port_fail_test_rpc_register — forces both
  nxt_port_rpc_register_handler_ex alloc failure and the
  lvlhsh-insert failure path; asserts port->use_count == 1 and
  rpc_streams empty.

* nxt_port_fail_test_error_handler — manually queues a send_msg
  with a closeable fd and a non-sync buffer, runs the static
  nxt_port_error_handler() via an NXT_TESTS-gated trampoline,
  drains the fast work queue, and asserts the buffer completion
  was enqueued (not called synchronously) and runs exactly once.
  Closes the queued-then-failed shape that PR #56's audit fix
  reordered (close fd first, queue buffer completion via the fast
  work queue second — matching nxt_port_error_handler's existing
  discipline at src/nxt_port_socket.c:1378-1396).

* nxt_port_fail_test_mp_baseline — mirrors the audit-fixed sender
  shape (allocate buf in mp, send, retain only on success), forces
  a pre-queue failure, and asserts mp->retain is unchanged.
  Sanity-checked: moving the retain back to before the write makes
  the test report "retain before 1, after 2", so the regression
  signal is intact.

Public NXT_TESTS-gated trampolines added for the harness:

* nxt_port_test_msg_alloc_failures() + nxt_port_test_run_error_
  handler() in src/nxt_port_socket.c.
* nxt_port_rpc_test_alloc_failures() / nxt_port_rpc_test_insert_failures() in src/nxt_port_rpc.c.
* nxt_mp_test_retain_count() in src/nxt_mp.c (exposes the
  otherwise-private retain counter for the baseline assertion).

Per Gemini's PR #57 review at src/test/nxt_port_fail_test.c:79,
test buffers are heap-owned from a transient mp rather than
stack-allocated, matching how production callers wire them
(nxt_cert_store_get allocates from the temp_conf mp — the very
pattern this suite exercises).  Each sub-test's failure ladder is
anchored on ownership-tracking variables so any goto-fail releases
everything in reverse order.

phpclub's 5-bullet list is now covered:

  1. forced nxt_port_msg_alloc() failure paths       — socket_write
  2. simulated stream-id pool exhaustion             — rpc_register
  3. mp refcounts return to baseline after error     — mp_baseline
  4. fd leak checks via /proc/self/fd                — socket_write
  5. completion handlers execute on enqueue failure  — error_handler

Validation: ./configure --tests && make tests -j && build/tests —
all four sub-tests green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses Gemini's PR #57 review note at src/nxt_cert.c:1246: even
though the function returns immediately after the close, resetting
file.fd to -1 is defense in depth against an accidental
re-introduction of code below the close path that would otherwise
double-close.

Applies symmetrically in src/nxt_script.c — the script-store
handler has the same shape as the cert-store handler.

No behavior change today.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
All conflicts resolved taking pre-1.35.6 (HEAD):
- version/CHANGES/TODO.md: keep 1.35.6 content (PR branched from 1.35.5)
- ci.yml: keep HEAD's cached Cargo + fake_otlp build (PR had curl|sh, no cache)
- fake_upstream/src/main.rs: keep HEAD (more modes, proper chunk-size error handling)
- run-local-full.sh: keep HEAD (otel-enabled, remote image pull, proper cleanup)
- run-local.sh: keep HEAD (arg validation, fake_otlp build step)
- test_proxy_chunked.py: keep HEAD (_read_http_request, threading, 12 tests vs 6)
- .gitignore: keep HEAD (drop PR's redundant /.qwen/ entries + Cargo.lock that
  is already tracked in the repo)
- nxt_cert.c / nxt_script.c: keep HEAD comment style; take PR's file.fd=-1 sentinel
- nxt_h1proto.c: keep HEAD's shorter comment

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive port failure test suite (nxt_port_fail_test.c) to verify error handling and memory pool retention invariants when socket writes or RPC registrations fail. It adds several test-only hooks and failure injection helpers across nxt_port, nxt_port_rpc, and nxt_mp. Additionally, it fixes potential double-close or stale file descriptor issues in nxt_cert.c and nxt_script.c by explicitly setting file.fd = -1 after closing. No review comments were provided, so there is no further feedback to address.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@phpclub phpclub closed this Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants