fix(appkit): backpressure inline-Arrow stash before EXTERNAL_LINKS fallback#390
Open
jamesbroadhead wants to merge 1 commit into
Open
fix(appkit): backpressure inline-Arrow stash before EXTERNAL_LINKS fallback#390jamesbroadhead wants to merge 1 commit into
jamesbroadhead wants to merge 1 commit into
Conversation
The ARROW_STREAM path in the analytics plugin stashes inline Arrow IPC bytes server-side and emits a synthetic `inline-<uuid>` id over SSE; the client GETs `/arrow-result/<id>` to drain the stash. When the stash is full, the plugin falls back to `EXTERNAL_LINKS + ARROW_STREAM`. That fallback target is not universal. Reyden refuses `EXTERNAL_LINKS + ARROW_STREAM` outright (HTTP 501 NOT_IMPLEMENTED, "ExternalLinks disposition is not yet implemented."). On a Reyden-backed app, any stash overflow would propagate that 501 to the caller with no further recovery, even though every relevant inline slot frees within one HTTP round-trip (the stash is drain-on-read). Add brief FIFO backpressure inside the stash. `putBlocking()` first tries the existing synchronous `put()` and, on overflow, waits up to `putWaitMs` for an in-flight `take()` (or TTL gc) to free a slot, then retries. The analytics plugin constructs the stash with `putWaitMs: 500` and `await`s the new method at the existing call site. Honors `AbortSignal` so a cancelled request gives up the wait immediately. The EXTERNAL_LINKS fallback path is preserved for true sustained overload on warehouses that accept EXTERNAL_LINKS. The wait queue is strictly FIFO. After a take() / clear() / TTL gc frees capacity, wakeWaiters() walks the queue from the head and shifts each fitting waiter off before invoking it, so a re-entrant gc() inside the waker cannot pick the same waiter up twice. Tests cover: immediate-success, wait-then-succeed via take(), timeout to null, FIFO order across waiters, head-consumes-capacity rejecting the tail, mid-wait abort, pre-aborted signal, and the putWaitMs=0 default behaving like the synchronous put(). The existing "stash full -> EXTERNAL_LINKS" integration test now stubs putBlocking directly so it does not bake the 500ms wait into the suite. Verified live against e2-dogfood "Test Reyden" (warehouse_type=REYDEN): ARROW_STREAM caller returns valid Arrow IPC bytes via the inline path unchanged; the new code only activates under capacity pressure. This pull request was AI-assisted by Isaac. Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to #329. On Reyden-backed warehouses, the analytics plugin's stash-full fallback target (
EXTERNAL_LINKS + ARROW_STREAM) does not exist — Reyden returns HTTP 501NOT_IMPLEMENTED: "ExternalLinks disposition is not yet implemented."— so any stash overflow on a Reyden app would surface that 501 to the caller with no further recovery.This PR adds brief FIFO backpressure inside the stash so transient capacity pressure is absorbed by waiting for the next
/arrow-resultdrain (typically tens of ms — the stash is drain-on-read) rather than immediately escalating to EXTERNAL_LINKS. The existing EXTERNAL_LINKS fallback is preserved for true sustained overload on warehouses that accept it.What's the fix
InlineArrowStash: newputWaitMsoption andasync putBlocking(userId, bytes, signal?). On overflow it waits FIFO up toputWaitMsfor atake()/ TTL-gc to free a slot, then retries. HonorsAbortSignal(aborted request drops out of the queue immediately). When the wait elapses without capacity, returnsnulland the caller falls back as before.AnalyticsPlugin: stash now constructed withputWaitMs: 500; the call siteawaitsputBlockingand passes the request's abort signal.wakeWaiters()walks from the head and shifts each fitting waiter off before invoking it, so a re-entrantgc()inside the waker can't pick up the same waiter twice.Surface area is small: the synchronous
put()is unchanged, the EXTERNAL_LINKS fallback path is unchanged, andputWaitMs = 0(the default) reducesputBlockingto the synchronous behavior.Why not the alternatives
Tests
inline-arrow-stash.test.ts(+8 cases):putBlockingsucceeds immediately when capacity is availabletake()to free a slot, then succeedsnullwhenputWaitMselapses without a slot freeingnullwhenAbortSignalaborts mid-waitputWaitMs=0(default) behaves like the synchronousput()analytics.test.ts: existing "stash full → EXTERNAL_LINKS" integration test now stubsputBlockingdirectly instead ofput, so the suite doesn't bake the 500ms wait into the run time.Full suite: 2,682 tests, all green (was 2,674 on the base).
Stack
Targets
stack/arrow-3-inline-arrow-fix; will rebase ontomainafter #329 merges.Test plan
Verified live against e2-dogfood
Test Reyden(warehouse_type=REYDEN, id0000000002e475d4):JSON_ARRAYcaller: single SSEresultevent with row data, no fallback (unchanged)ARROW_STREAMcaller: single SSEarrowevent →GET /arrow-result/inline-<uuid>returns 776 bytes of valid Arrow IPC,application/vnd.apache.arrow.stream(unchanged — new code only activates under capacity pressure)The new code path (backpressure under contention) is covered by the unit tests; reproducing it live requires saturating the stash, which is outside the scope of this PR.
This pull request was AI-assisted by Isaac.