fix: harden agy-acp session binding with state-file persistence#928
Conversation
Replace global latest-conversation heuristic with:
- Snapshot-based conversation binding (pre/post dir diff)
- Persistent session→conversation mapping in ~/.openab/agy-acp/sessions.json
- File locking (fs2) for concurrent access safety
- Atomic write (tmp+rename) to prevent corruption
- Prefix-checked delta extraction with trim_start_matches('\n')
- stderr capture and exit status checking for agy subprocess
- Fail-soft single-turn mode when binding is ambiguous
The state file allows session bindings to survive adapter restarts
and supports concurrent sessions with exclusive file locking.
Addresses the same stale-output bug as #927 but with persistence
for production Discord deployments.
|
All PRs must reference a prior Discord discussion to ensure community alignment before implementation. Please edit the PR description to include a link like: This PR will be automatically closed in 3 days if the link is not added. |
- F1: add #[ignore] to test_new_conversation_id_returns_none_when_multiple_files - F2: use dedicated lock file (sessions.lock) for proper read-write mutual exclusion; persist_session does read-modify-write under single lock
- F3: return JSON-RPC error when agy exits non-zero with empty stdout - F4: cap in-memory sessions at 64, evict oldest when full
When a session is evicted from the in-memory HashMap due to the 64-session cap, restore it from sessions.json on the next prompt request to maintain conversation continuity.
- F2: remove dead restore_session call in handle_session_new (new UUIDs can never exist in state file) - F3: persist_session only on first successful bind, not every turn
- Remove unused save_store() (persist_session does its own lock+write) - Fix comment: HashMap eviction is arbitrary, not oldest
OpenAB PR ScreeningThis is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Screening reportblocked by the local tool sandbox before the write step.i confirmed with
the first write attempt failed before after that, even trivial commands like Screening Report IntentPR #928 fixes stale or cross-session output in FeatFix work. It adds snapshot-diff binding, persistent session state, lock-file guarded writes, safer delta extraction, subprocess error handling, and bounded session restore. Who It ServesAgent runtime operators and maintainers first. End users benefit indirectly through fewer stale or misbound ACP replies. Rewritten PromptHarden Merge PitchThis should move forward because it addresses the stale-output class more durably than #927. Risk is moderate: filesystem state, locking, subprocess behavior, and eviction all need careful review. Best-Practice ComparisonOpenClaw is relevant for durable routing and isolated execution. This PR moves closer by persisting explicit session bindings, though it does not cover broader job persistence, retry/backoff, delivery routing, or run logs. Hermes Agent is relevant for file locking and atomic persisted daemon state. The fresh-session scheduled-run model does not directly apply. Implementation Options
Comparison Table
RecommendationTake the balanced path. Advance #928 for detailed review, focusing on lock behavior, atomic write failure modes, corrupted JSON recovery, and whether the 64-session eviction policy can evict active sessions. Close #927 as superseded if #928 lands. GitHub comment: not created due sandbox failure. |
|
CHANGES REQUESTED What This PR DoesIntroduces persistent, file-based session-to-conversation mapping for the How It WorksCreates a state file at Findings
Finding Details🔴 F1: Random Eviction Vulnerabilitywhile self.sessions.len() >= MAX_SESSIONS {
if let Some(key) = self.sessions.keys().next().cloned() {
self.sessions.remove(&key);
}
}
Suggested fix: Track 🟡 F2: Silent I/O FailuresWhen Suggested fix: Log warnings to stderr on lock/parse failures. Consider returning 🟡 F3: Concurrency Race ConditionWhen multiple unbound sessions receive prompts simultaneously, each takes a filesystem snapshot, then diffs after the agent responds. If multiple new conversation files appear between snapshots, the safety check ( Suggested fix: Use the session-specific expected path pattern or agent PID to disambiguate, rather than relying on a global diff. 🟡 F4: Missing fsyncThe atomic write pattern (write temp → rename) does not call Suggested fix: Add Baseline Check
What's Good (🟢)
Reviewed by: 覺渡法師 · Coordinated by: 超渡法師 |
Move persist_session() call outside the mutable borrow scope of self.sessions.get_mut() to satisfy the borrow checker.
|
just built an image for this PR |
Summary
Alternative implementation to #927 for the same stale-output bug. This PR supersedes #927 by adding persistent state-file based session binding and addressing all review findings identified during the #927 review.
Why This PR Over #927
PR #927 introduced snapshot-based conversation binding which is better than main, but our team review identified 7 findings (1 🔴 + 6 🟡). Rather than patching #927 incrementally, this PR implements the recommended architecture from the review discussion:
~/.openab/agy-acp/sessions.jsonsessions.lock) withfs2trim_start()— strips meaningful whitespacetrim_start_matches(\x27\\n\x27)— preserves indentation#[ignore]#[ignore]test_<scenario>_<expected_outcome>per ADRprev_output.clone()every turn, unbounded sessionsprev_outputby ref, 64-session cap with eviction + restoreThe ideal solution would be for
agyCLI to natively return the conversation ID, eliminating all filesystem guessing. That requires an upstream change. This PR is the best achievable without upstream modifications.Changes
agy-acp/Cargo.tomlfs2 = "0.4.3"for file lockingagy-acp/src/main.rsagy-acp/src/main.rstrim_start_matches(\x27\\n\x27)agy-acp/src/main.rsagy-acp/src/main.rsagy-acp/src/main.rsagy-acp/src/main.rsTesting
Unit tests (always run):
test_extract_delta_returns_full_text_when_unboundtest_extract_delta_strips_prefix_when_boundtest_extract_delta_returns_full_when_not_append_onlytest_extract_delta_preserves_leading_spacesIntegration tests (
#[ignore], run withCHI_INTEG=1):test_new_conversation_id_returns_none_when_multiple_filestest_snapshot_diff_binds_single_new_conversationtest_persist_and_restore_session_bindingCompatibility
~/.openab/agy-acp/sessions.json(auto-created)fs2 = "0.4.3"(well-maintained, minimal)