NIP-AE: agent engrams (kind:30174) — core memory injection + sprout mem CLI#593
Open
tlongwell-block wants to merge 9 commits into
tlongwell-block pushed a commit that referenced this pull request on May 15, 2026
Codex P2 findings on PR #593:

1. Non-NIP44 content slipped past the relay envelope check. A signed
   kind:30174 with valid d/p tags but content like 'x' won NIP-33
   replacement against a valid head and was then silently discarded by
   readers, silently erasing memory.
2. Uppercase-hex p tags were accepted. Readers query #p with the
   lowercase hex of the owner pubkey (byte-exact tag match), so an
   uppercase-tagged event that won replacement became invisible to
   readers: the same bricking pattern.

Tighten validate_engram_envelope to:

- require lowercase hex for the p tag (consistent with the existing
  rule on d)
- validate that content is a syntactically plausible NIP-44 v2
  payload: standard base64 alphabet, length multiple of 4, decoded
  length >= 99 bytes, first decoded byte = 0x02 (version prefix).

Relay-side sanity check only; the MAC and decryption still happen at
the reader. The point is to refuse obvious junk before it can
supersede a valid head.

+6 regression tests; canonical-accepts fixture updated to use a
real-shape NIP-44 v2 sample.
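The content rule above (standard base64 alphabet, length a multiple of 4, decoded length >= 99 bytes, first decoded byte 0x02) can be sketched with the standard library alone. This is an illustration of the stated rule, not the relay's code: `decode_base64_std` and `looks_like_nip44_v2` are hypothetical names, and the hand-rolled decoder is an assumption made only to keep the example dependency-free.

```rust
/// Decode standard-alphabet base64 with '=' padding.
/// Returns None on any malformed input (hypothetical helper).
fn decode_base64_std(s: &str) -> Option<Vec<u8>> {
    let bytes = s.as_bytes();
    if bytes.is_empty() || bytes.len() % 4 != 0 {
        return None;
    }
    // Only trailing '=' is allowed, and at most two of them.
    let pad = bytes.iter().rev().take_while(|&&b| b == b'=').count();
    if pad > 2 {
        return None;
    }
    let data_len = bytes.len() - pad;
    let mut out = Vec::with_capacity(bytes.len() / 4 * 3);
    let mut acc: u32 = 0;
    let mut bits = 0u32;
    for &b in &bytes[..data_len] {
        let v = match b {
            b'A'..=b'Z' => b - b'A',
            b'a'..=b'z' => b - b'a' + 26,
            b'0'..=b'9' => b - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            _ => return None, // URL-safe chars, whitespace, mid-string '='
        };
        acc = (acc << 6) | v as u32;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Syntactic plausibility only: no MAC check and no decryption.
fn looks_like_nip44_v2(content: &str) -> bool {
    match decode_base64_std(content) {
        Some(raw) => raw.len() >= 99 && raw[0] == 0x02,
        None => false,
    }
}
```

With this sketch, `looks_like_nip44_v2("x")` is rejected on length alone, which is the junk-content case from finding 1. A payload that passes can still fail MAC verification at the reader.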
tlongwell-block pushed a commit that referenced this pull request on May 15, 2026
Second codex review pass on PR #593 flagged two more P2s:

1. (engram_fetch.rs) When the relay returns kind:30174 events
   addressed to the agent but none decrypt (wrong key, MAC failure,
   body schema mismatch, or an event injected by another party that
   happened to be p-tagged at this agent), the previous code returned
   Ok(None), which the harness then renders as the onboarding nudge,
   inviting the agent to overwrite a real-but-unreadable core.

   Now distinguish three outcomes:

   - empty array             → Ok(None) (confirmed absence; nudge)
   - >=1 event decrypts      → use winning head
   - non-empty, none decrypt → Err (fail closed; no section)

   Extracted the post-query decode logic into a pure
   decode_core_body() helper so it's unit-testable without mocking
   RestClient. Added 5 tests: empty-array-absent,
   valid-core-returns-profile, undecryptable-is-err-not-absent (the
   regression), non-core-body-is-absent, unparseable-candidates-is-err.

2. (commands/mem.rs) The module doc comment claimed `sprout mem` and
   `sprout mem ls` were equivalent, but the clap wiring requires a
   subcommand, so bare `sprout mem` exits with a usage error. Drop the
   false claim: bare-group-shows-help is the convention across the
   other 12 subcommand groups; adding a default action just for mem
   would be inconsistent.
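The three outcomes listed above map naturally onto `Result<Option<_>, _>`. The following is a minimal sketch under stated assumptions, not the actual `decode_core_body()`: `Candidate` and `classify_core` are hypothetical, the `decrypt` closure stands in for NIP-44 decryption plus body parsing, and head selection is simplified to "first candidate that decrypts".

```rust
/// Hypothetical stand-in for a fetched kind:30174 event.
struct Candidate {
    content: String,
}

/// Classify the query result into the three outcomes:
/// - empty input             -> Ok(None)   (confirmed absence)
/// - at least one decrypts   -> Ok(Some(body))
/// - non-empty, none decrypt -> Err        (fail closed)
fn classify_core<F>(candidates: &[Candidate], decrypt: F) -> Result<Option<String>, String>
where
    F: Fn(&Candidate) -> Option<String>,
{
    if candidates.is_empty() {
        return Ok(None);
    }
    // Simplification: the real code picks the winning head among
    // decryptable events; here the first success is taken.
    match candidates.iter().find_map(|c| decrypt(c)) {
        Some(body) => Ok(Some(body)),
        None => Err(format!(
            "{} engram event(s) present but none decrypted",
            candidates.len()
        )),
    }
}
```

Keeping the decryption step behind a closure is one way to get the same property the commit describes: the classification logic can be unit-tested without a relay client.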
tlongwell-block pushed a commit that referenced this pull request on May 15, 2026
Third codex pass on PR #593 flagged that engram events were stored as
global, so any authenticated relay member could REQ
`{"kinds":[30174]}` and harvest:

- the encrypted ciphertext (no plaintext but still a fingerprint)
- the public `#p` (owner pubkey)
- the public `#d` (HMAC-derived per-slug fingerprint)
- timestamps (write-activity patterns)

Together that leaks who-pairs-with-which-agent + when they're active.
Strictly speaking the NIP-AE design encrypts content for
confidentiality but assumes the relay enforces read gating on the
event metadata.

Add a new `engram_filters_authorized` predicate alongside the existing
`p_gated_filters_authorized`. A filter that can match
KIND_AGENT_ENGRAM must satisfy at least one of:

- `authors` non-empty AND every entry == authed (agent reading own), or
- `#p` non-empty AND every entry == authed (owner reading
  addressed-to-self).

Specific-event-ids lookups (`ids: [...]`) are exempt: knowing the id
implies prior authorization.

Hook into all four read paths:

- WS REQ (historical + live subscription registration)
- WS COUNT
- HTTP /query
- HTTP /count

+9 unit tests: agent_querying_own, owner_querying, owner_no_authors,
ids_lookup, skips_non_engram_kinds (positive); unrelated_reader,
bare_kind_filter, wildcard_kind_filter, mixed_authors_with_unauthed
(negative).
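The authorization rule above can be sketched as a pure predicate over a simplified filter. The `Filter` struct here is hypothetical (the relay's real filter type is not shown in this text), and treating an empty `kinds` list as a wildcard is an assumption based on the `wildcard_kind_filter` negative test.

```rust
const KIND_AGENT_ENGRAM: u32 = 30174;

/// Hypothetical, simplified filter shape for illustration only.
#[derive(Default)]
struct Filter {
    ids: Vec<String>,
    kinds: Vec<u32>,     // assumed: empty means "any kind"
    authors: Vec<String>,
    p_tags: Vec<String>, // the `#p` constraint
}

/// True if the filter could return kind:30174 events.
fn can_match_engram(f: &Filter) -> bool {
    f.kinds.is_empty() || f.kinds.contains(&KIND_AGENT_ENGRAM)
}

/// Non-empty AND every entry equals the authenticated pubkey.
fn all_equal_authed(values: &[String], authed: &str) -> bool {
    !values.is_empty() && values.iter().all(|v| v == authed)
}

/// Every filter must either not touch engrams, be an ids lookup,
/// or be scoped to the authenticated pubkey via authors or #p.
fn engram_filters_authorized(filters: &[Filter], authed: &str) -> bool {
    filters.iter().all(|f| {
        !can_match_engram(f)
            || !f.ids.is_empty()
            || all_equal_authed(&f.authors, authed)
            || all_equal_authed(&f.p_tags, authed)
    })
}
```

Requiring every entry to equal the authenticated pubkey, rather than any entry, is what rejects the mixed-authors case: a filter listing the caller plus another agent would otherwise return the other agent's envelopes.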
Implements NIP-AE (Agent Engrams, kind:30174) as the smallest viable
surface across relay, CLI, and ACP harness.

* sprout-core::engram: pure crypto + parsing primitives shared by CLI
  and ACP harness. Conversation key, d-tag HMAC, body parse/serialize
  with duplicate-key rejection, envelope build, head selection. Pinned
  spec vectors (K_c, three d-tags) verified byte-for-byte.
* Relay: kind 30174 added to ALL_KINDS, the per-kind scope allowlist
  (UsersWrite, same group as KIND_READ_STATE), and
  is_global_only_kind. NIP-33 plumbing (replace_parameterized_event)
  handles the rest.
* sprout mem CLI: ls/get/set/rm. Slug shorthand normalises 'foo' to
  'mem/foo'; 'core' is reserved. set reads stdin with '-'. Monotonic
  created_at + tombstone semantics per spec. Symmetric decrypt: either
  party (owner or agent) reads with their own seckey.
* ACP harness: at new-session creation, fire one synchronous fetch +
  decrypt of the core engram and cache the rendered prompt section per
  channel. If no core exists or any error occurs, inject the
  onboarding nudge so the agent learns to bootstrap itself.
  format_prompt() emits the section after [System] and before
  [Context]. No mid-session refresh; only re-fetched when a session is
  invalidated.
* Tests: 13 engram unit tests including spec vectors, 2 format_prompt
  injection tests, scope-allowlist coverage extended.

Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
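The slug shorthand described for the `sprout mem` CLI could look roughly like the sketch below. `normalize_slug` is a hypothetical name, and the rejection rules (empty input, bare `mem/`) are assumptions; the text only states that 'foo' becomes 'mem/foo' and that 'core' is reserved.

```rust
/// Hypothetical slug normaliser: 'foo' -> 'mem/foo', 'core' stays 'core'.
fn normalize_slug(input: &str) -> Result<String, String> {
    let s = input.trim();
    if s.is_empty() {
        return Err("empty slug".to_string()); // assumed validation rule
    }
    if s == "core" {
        return Ok(s.to_string()); // reserved: never rewritten to mem/core
    }
    match s.strip_prefix("mem/") {
        Some("") => Err("missing name after 'mem/'".to_string()), // assumed rule
        Some(_) => Ok(s.to_string()),  // already fully qualified
        None => Ok(format!("mem/{s}")), // shorthand form
    }
}
```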
The round-4 fix added p_gated/engram_filters_authorized gates to the
WS REQ historical-delivery branch, WS COUNT, HTTP /query, and HTTP
/count — but missed the WS REQ NIP-50 search branch, which intercepts
before reaching the gate. Since kind:30174 envelopes are indexed in
Typesense (only NIP-17 gift wraps are skipped), an authenticated
relay member could send
{"search":"*","kinds":[30174]}
and harvest every engram ciphertext + owner #p + slug #d fingerprint
on the relay, leaking the metadata the round-4 gate was specifically
written to protect.
Fix: move the two filter-auth checks above the search early-return.
The same reordering also closes the equivalent search-bypass for
the pre-existing P_GATED_KINDS (observer frames, member notifications)
which are likewise globally stored, indexed, and were previously
only gated on the non-search path.
+4 regression tests asserting the gate rejects search-shaped attack
filters and still allows authored search.
Found by codex review round 5.
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
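The ordering fix in this commit can be illustrated with a toy dispatcher: the authorization decision is taken before the search branch can return early. `Req`, `Route`, and `route` are hypothetical names, and the boolean `authed_for_engrams` stands in for the result of the real filter-auth checks.

```rust
#[derive(Debug, PartialEq)]
enum Route {
    Rejected,
    Search,
    Historical,
}

/// Hypothetical, simplified request shape.
struct Req<'a> {
    search: Option<&'a str>,
    kinds: &'a [u32], // assumed: empty means "any kind"
}

fn route(req: &Req, authed_for_engrams: bool) -> Route {
    let touches_engrams = req.kinds.is_empty() || req.kinds.contains(&30174);
    // Gate first: this check must sit above the search early-return.
    if touches_engrams && !authed_for_engrams {
        return Route::Rejected;
    }
    if req.search.is_some() {
        return Route::Search;
    }
    Route::Historical
}
```

Swapping the two `if` blocks reproduces the bug described above: a search-shaped request would reach the search branch without ever being checked.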
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>

* origin/main:
  dev-mcp: add view_image tool (#602)
  fix(relay,desktop): only advertise NIP-43 when enforced; probe pairing by supported_nips (#601)
  fix(desktop): derive unread state from NIP-RS + relay catch-up only (#599)
  docs(testing): rewrite TESTING.md for current API and CLI-first workflow (#597)
  fix(agent): fix OpenAI-compat request body serialization and max_tokens (#595)
  feat(desktop): per-persona and per-agent env var overrides (#594)
ca57552 to 860ec66
The HTTP /query NIP-50 search path in handle_bridge_search pushed only
kind/authors/time/channel into Typesense and applied a channel-access
post-filter, but did not enforce the rest of the requesting filter
against the fetched events. The WS NIP-50 path does (handlers/req.rs).
For NIP-AE this meant the engram read gate (which authorizes the
*filter*: kind=30174 with author=self or #p=self) was bypassed for
/query specifically: an authorized search like
{"search":"foo","kinds":[30174],"#p":[owner_self]}
could return text-matching engram envelopes whose #p belongs to a
different owner (or an authors=[agent_self] search could return
events authored by other agents), because Typesense doesn't see #p
and the post-filter wasn't running.
Fix: extract the per-hit acceptance logic into search_hit_accepted()
and call sprout_core::filter::filters_match against the current
filter before channel-access and dedup. This mirrors the WS post-
filter at handlers/req.rs and locks the bridge to the same NIP-01
semantics.
Tests: three unit tests covering the leak — mismatched #p tag,
mismatched author, and channel scope — exercising the helper that
owns the fix. Full suites Mari named also green: engram (17),
engram_envelope (12), engram_gate (12), engram_fetch (5).
Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
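The per-hit post-filter described in this commit can be sketched as follows. The `Event` and `Filter` shapes, `filter_matches`, and `accept_hits` are hypothetical simplifications; the real code calls `sprout_core::filter::filters_match` inside `search_hit_accepted()`, whose signatures are not shown here. The order follows the description: filter match, then channel access, then dedup.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified event and filter shapes.
struct Event {
    id: String,
    kind: u32,
    author: String,
    p_tags: Vec<String>,
}

struct Filter {
    kinds: Vec<u32>,
    authors: Vec<String>,
    p_tags: Vec<String>,
}

/// NIP-01 style matching: an empty constraint matches anything;
/// a #p constraint needs at least one matching p tag on the event.
fn filter_matches(f: &Filter, ev: &Event) -> bool {
    (f.kinds.is_empty() || f.kinds.contains(&ev.kind))
        && (f.authors.is_empty() || f.authors.contains(&ev.author))
        && (f.p_tags.is_empty() || ev.p_tags.iter().any(|p| f.p_tags.contains(p)))
}

/// Re-check every search hit against the requesting filter, because
/// the search index only saw part of it (it never sees #p).
fn accept_hits<'a>(
    filter: &Filter,
    hits: &'a [Event],
    can_access: impl Fn(&Event) -> bool,
) -> Vec<&'a Event> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for ev in hits {
        if filter_matches(filter, ev) && can_access(ev) && seen.insert(ev.id.as_str()) {
            out.push(ev);
        }
    }
    out
}
```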
Bring the PR up to date with origin/main. Two conflicts resolved:

* crates/sprout-acp/src/main.rs: main extracted the binary body into
  sprout-acp/src/lib.rs (commit 70cb53e, "Add Sprig all-in-one agent
  binary"). Took main's 3-line shim verbatim and replayed Sami's three
  hunks against lib.rs instead:
  - declare `mod engram_fetch;`
  - clone `startup_owner` for `OwnerCache::new` so we can also use it
    for the PromptContext below
  - thread `agent_keys` + `agent_owner_pubkey` into PromptContext
    construction (needed by the NIP-AE core fetch in pool.rs)
* Cargo.lock + desktop/src-tauri/Cargo.lock: took theirs; cargo check
  --workspace produced no further changes (all engram deps were
  already present on main).

Verified:

* cargo check --workspace: clean
* cargo clippy --workspace --all-targets -- -D warnings: clean
* cargo test -p sprout-core -p sprout-acp -p sprout-cli -p sprout-relay:
  all green, including the 17 engram tests, 5 engram_fetch tests
  (covering the d7842a0 fail-closed regression), and 13 format_prompt
  tests (the two new agent_core injection cases included).

Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
Summary

Implements NIP-AE (Agent Engrams, kind `30174`) as the smallest viable surface across relay, CLI, and ACP harness. The goal: let an owner write a small, durable "core memory" that their agent reads once per new session, with no new daemons, no new crates, and minimal moving parts.

What this gives you

- `sprout mem set core "I am Sami. Be terse."`: writes a NIP-44-encrypted, parameterized-replaceable note addressed to your agent.
- [...] `[System]` and `[Context]`. One fetch per session, fail-open on every error path.

How it's structured

- `sprout-core::engram`: pure primitives, no I/O. Shared by CLI and ACP.
  - d-tag = `lower_hex(HMAC-SHA256(K_c, "agent-memory/v1/d-tag\0" || slug))`; spec vectors pinned byte-for-byte.
  - [...] `Visitor`, not a hand-rolled scanner).
  - [...] `created_at` desc, then event-id desc; tombstones honoured).
- Relay: kind `30174` added to `ALL_KINDS`, the per-kind scope allowlist (`UsersWrite`, same group as `KIND_READ_STATE`), and `is_global_only_kind`. A new `validate_engram_envelope` rejects malformed events (≠1 `d`, ≠1 `p`, non-lowercase-hex `d`, empty content) before they reach NIP-33 replacement, so a bad event can't poison the storage head and become invisible to `#p` readers.
- `sprout mem` CLI: `ls | get | set | rm`. Slug shorthand normalises `foo` → `mem/foo`; `core` is reserved and `rm core` is refused. `set` reads from `-` (stdin). `submit_engram` parses the relay's `{accepted, message}` so a `duplicate:` response (same-second NIP-33 dominated write) surfaces as a `Conflict` instead of a silent "wrote".
- ACP harness: at new-session creation, one synchronous fetch + decrypt of the `core` engram, cached per channel in the rendered prompt section. Re-fetched only when the session is invalidated. On transport errors we return `None` (no section) rather than the onboarding nudge; a flaky relay shouldn't gaslight the agent into thinking its memory is empty.

Test plan

- [...] `sprout-core`, including the spec's `K_c` vector and three `d`-tag vectors verified byte-for-byte.
- [...] `format_prompt` injection tests in the ACP queue (core present → injected; absent → nudge).
- [...] `TESTING.md`: `ls` (empty) → `set core` → `get core` → `set foo` (stdin) → `set mem/bar` → `ls` (two entries) → `rm foo` → `get foo` exits non-zero with `tombstoned:` → `ls` shows only `mem/bar` → `rm core` refused → invalid slug rejected.
- [...] `core` set: harness logs `injected NIP-AE core section ... section_len=60`.
- [...] `core`: harness logs onboarding nudge.

Notes for review

- [...] `NotFound`/`Conflict` exit codes.
- [...] `sprout-core/src/engram.rs` (~835 lines, mostly tests + spec vectors).
- [...] `SproutClient` as everything else; the harness uses the existing `RestClient::query`.

Out of scope (intentionally)

- [...] `sprout-core::engram` API).
- [...] `core`/`mem/*` (the d-tag derivation is slug-agnostic; future kinds slot in without protocol changes).

Closes the NIP-AE implementation thread.
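The head-selection rule mentioned under "How it's structured" (`created_at` descending, then event id descending, tombstones honoured) can be sketched as below. `Version`, `select_head`, and `current_body` are hypothetical names; representing a tombstone as a boolean flag, and reading "honoured" as "a winning tombstone means the slug has no current body", are assumptions for illustration.

```rust
/// Hypothetical decrypted version of one slug.
#[derive(Debug, Clone, PartialEq)]
struct Version {
    created_at: u64,
    id: String,      // lowercase hex event id
    tombstone: bool, // assumed encoding of a deletion marker
    body: String,
}

/// The head is the maximum under (created_at, id), i.e. the first
/// element when sorted by created_at desc, then event id desc.
fn select_head(versions: &[Version]) -> Option<&Version> {
    versions.iter().max_by(|a, b| {
        a.created_at
            .cmp(&b.created_at)
            .then_with(|| a.id.cmp(&b.id))
    })
}

/// A tombstone that wins head selection hides all older versions.
fn current_body(versions: &[Version]) -> Option<&str> {
    select_head(versions)
        .filter(|v| !v.tombstone)
        .map(|v| v.body.as_str())
}
```

Selecting the head before checking the tombstone flag matters: filtering tombstones out first would resurrect an older value after an `rm`.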