test(port): add IPC failure injection coverage#57
Conversation
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request addresses memory pool retain leaks and file descriptor/buffer leaks that occur when port message allocation or socket writing fails. Key changes include deferring memory pool retention until after successful handoff to the port machinery and ensuring that file descriptors and sockets are explicitly closed if a write operation fails in the certificate, script, and main process handlers. Additionally, a new test suite and failure injection mechanisms have been introduced to verify these error paths. I have no feedback to provide as there were no review comments to evaluate.
There was a problem hiding this comment.
Code Review
This pull request addresses memory pool and file descriptor leaks that occur when port message allocation fails during IPC operations. Key changes include moving memory pool retention to after successful handoff and ensuring that file descriptors and buffers are explicitly closed or completed upon port write failures. A new test suite was added to verify these error paths. Feedback suggests improving defensive coding by resetting file descriptors after closure and using heap-allocated buffers in tests to ensure robustness.
|
Added a second commit
Shape:
Local validation: $ ./configure --tests && make -j2 tests && build/tests
…
tests: [notice] port failure test started
tests: [notice] port failure test passed
tests: [notice] clone creds test passedAll three sub-tests green (socket_write / rpc_register / error_handler). Ready for re-review. |
|
Added commit Mirrors the audit-fixed Sanity-checked the regression detection by temporarily moving the retain to before Local Also noting Gemini's two comments on the earlier commits:
|
|
Two more commits addressing Gemini's review nits:
All four sub-tests still green ( Stack of PR #57 now (top → bottom): |
|
Excellent follow-up to #56. This PR does much more than patch individual leak paths — it formalizes IPC/port ownership semantics through deterministic fault-injection testing. Strong points: fault injection is compile-gated under #if (NXT_TESTS) and does not pollute production fast paths, The new nxt_port_fail_test.c coverage is especially strong because it validates invariants, not only API results: fd ownership preservation, This significantly improves long-term maintainability of the port subsystem. Comment 1 — retain ordering The nxt_mp_retain(mp) move is absolutely correct. Before: retain() After: send() This properly matches the ownership model where the completion handler only becomes guaranteed after successful queue handoff. Good fix. Comment 2 — fd ownership semantics Very good clarification of ownership boundaries on: nxt_port_socket_write() != NXT_OK The explicit close paths correctly assume: the port layer never took ownership, This was previously implicit behavior; now it is documented and enforced consistently. Comment 3 — nxt_fd_close() usage Good catch using nxt_fd_close() instead of nxt_file_close() after file.name lifetime ended. The comment explaining the %FN dereference risk is useful future-maintainer documentation. Comment 4 — fault injection design The deterministic failure injection approach is very strong. Testing allocator-failure behavior via: nxt_port_test_msg_alloc_failures() is significantly better than relying on artificial OOM environments or probabilistic failures. This makes regression testing reproducible and CI-friendly. Comment 5 — async cleanup verification The nxt_port_fail_test_error_handler() coverage is particularly valuable. It validates: queued message cleanup, Testing the deferred cleanup path directly through: nxt_port_test_run_error_handler() is a very solid approach. Minor Non-Blocking Suggestion The injected failure counters are currently global statics: static nxt_uint_t nxt_port_test_msg_alloc_failure_count; This is perfectly fine for the current single-threaded test model, but if parallel test execution ever appears in the future, thread-local or engine-local injection state may become preferable. Not a blocker. |
efd0d2d to
9c8a1ea
Compare
|
sqwuashed and rebased |
9c8a1ea to
c0817e1
Compare
|
@codex review once |
|
filed follow-up #60 |
|
@chatgpt-codex-connector review once |
|
To use Codex here, create an environment for this repo. |
|
@codex review once |
|
Codex Review: Didn't find any major issues. Hooray! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Summary: |
|
@claude review once |
|
PR #57 review — test(port): IPC failure injection. Solid coverage, clean teardown. Findings: Production changes (cert/script):
src/test/nxt_port_fail_test.c:
src/nxt_port_rpc.h:
❓ q: nxt_port_rpc_register_handler_ex alloc-fail injection returns NULL before any peer use_count bump — test asserts Verdict: ready after addressing the two 🟡 risks (or documenting them). Nits optional. |
freeunitx
left a comment
There was a problem hiding this comment.
Solid test coverage. Cleanup paths checked against real impl: nxt_port_release_send_msg only frees when allocated, error_handler use_delta math nets correct (use_count 2->1->0), msg->port=NULL safe (never deref'd), close_fd-first/completion-second ordering verified. No bugs found.
Two nits below. Neither blocks.
nxt_cert.c/nxt_script.cfile.fd = -1after close: good defensive guard, matches audit pattern.- fd-count
/proc/self/fdinvariant + NXT_LINUX guard +before >= 0skip on non-Linux: correct.
Additional review — findings not yet raisedAlready covered in prior reviews (Gemini / phpclub / freeunitx): defensive Below — only findings not yet discussed. CRITICAL1. The test does NOT cover the production fast path — regression protection is weaker than it appears
if (nxt_fast_path(port->socket.write_ready
&& nxt_queue_is_empty(&port->messages)))
{
res = NXT_DECLINED; // fast path
}On In production, The synthesised port from Concretely: if someone reverts Suggested fix: add a sub-test with 2.
|
c0817e1 to
187ef84
Compare
|
rebased |
There was a problem hiding this comment.
💡 Codex Review
Lines 342 to 344 in 187ef84
When port->queue is non-NULL and the payload enqueue succeeds with notify != 0, nxt_port_socket_write2() has already consumed b and queued its completion before it can still return NXT_ERROR/NXT_AGAIN from the follow-up socket notification path (msg.buf is then cleared). This new contract says the caller owns and must complete b on any non-NXT_OK return, so callers following it can double-complete/free the buffer in that queue-notification failure case; either the implementation needs to defer ownership transfer until the final return is known, or the contract must exclude the already-enqueued queue path.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Adds a fault-injection harness under src/test/nxt_port_fail_test.c covering the audit-fixed IPC paths from freeunitorg#56. Sub-tests: * socket_write — forces nxt_port_msg_alloc() failure with fd != -1 and a non-sync buf; asserts NXT_ERROR, fd stays open, completion not invoked, /proc/self/fd balanced (Linux only). * rpc_register — forces nxt_port_rpc_register_handler_ex alloc and lvlhsh-insert failures; asserts use_count == 1, rpc_streams empty. * error_handler — queues a send_msg with a closeable fd + non-sync buffer, runs nxt_port_error_handler() via an NXT_TESTS trampoline, drains the fast work queue, asserts completion enqueued (not called synchronously) and runs exactly once. * mp_baseline — mirrors the audit-fixed sender shape, forces a pre-queue failure, asserts mp->retain is unchanged. NXT_TESTS-gated trampolines for the harness: nxt_port_test_msg_alloc_ failures()/nxt_port_test_run_error_handler() (nxt_port_socket.c), nxt_port_rpc_test_alloc_failures()/_insert_failures() (nxt_port_rpc.c), nxt_mp_test_retain_count() (nxt_mp.c). Test buffers are heap-owned from a transient mp, matching production callers (nxt_cert_store_get allocates from temp_conf mp). The injected stack engine documents its invariant: the exercised error path must touch nothing beyond fast_work_queue. Also resets file.fd to -1 after nxt_fd_close on the send-fail path in nxt_cert.c and nxt_script.c — defense in depth against an accidental double-close if code is later added below the close. No behavior change. Validation: ./configure --tests && make tests -j && build/tests — all four sub-tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
187ef84 to
14c18ef
Compare
Stacked follow-up for #56 and the coverage plan in #56 (comment).
Adds test-only fault injection for IPC message allocation and RPC registration failures, then covers core ownership invariants: failed queued nxt_port_socket_write() leaves the caller-owned fd open before ownership transfer; the unsent buffer completion is not run by the port layer; /proc/self/fd counts stay balanced where available; failed RPC registration allocation and insert leave use_count unchanged and no stream registration behind.
The branch is based on andypost:claude/port-ipc-completion-leaks / PR #56, so it should be retargeted or rebased once #56 lands.
Validation: make -j2; ./configure --tests; make tests -j2; build/tests