diff --git a/docs/dev/ralph/memory-66/README.md b/docs/dev/ralph/memory-66/README.md
new file mode 100644
index 000000000..b99eaa4fa
--- /dev/null
+++ b/docs/dev/ralph/memory-66/README.md
@@ -0,0 +1,350 @@
+---
+slug: ralph-memory-66
+---
+
+# Ralph: Memory leak on large notebooks (#66)
+
+Iterative, measurement-driven attempt to reproduce, profile, and shrink the
+high memory usage of `emanote run` on large notebooks, originally reported
+in [`srid/emanote#66`](https://github.com/srid/emanote/issues/66).
+
+## Goal
+
+Reduce peak resident set size (RSS) of `emanote run` on a ~4500-file, ~70MB
+markdown notebook while **preserving behavior**. Success is either a
+significant memory win or a precise code/architecture-level root cause.
+
+## Methodology
+
+All measurements on `srid1` (Linux, 32 cores, 125 GiB RAM, NixOS, no swap),
+using the project's `nix develop` shell on `ghc-9.8.4 / cabal-3.14.2.0`.
+
+### Corpus
+
+A synthetic 4501-file, 72.4 MB markdown corpus matching the shape reported
+in the issue (4561 files, 69 MB). Generated by
+`docs/dev/ralph/memory-66/gen_corpus.py` with a fixed seed. Files contain
+realistic content: YAML frontmatter, headings, paragraphs with wikilinks,
+embeds (`![[...]]`), code blocks, and lists. Wikilinks form a dense random
+graph so the link/relation index does meaningful work.
+
+A 1001-file, 15.9 MB variant is used for fast iteration; profiling builds
+on the full 4500 take >5 minutes per run.
+
+### Metrics
+
+`docs/dev/ralph/memory-66/measure.sh` starts `emanote run --port
` on a
+corpus, polls `curl http://localhost:p/` until ready, then samples
+`/proc//status` for `VmRSS` and `VmHWM`. It also pings a handful of
+pages to exercise the renderer.
+
+Primary metric: **`VmHWM_MB` immediately after the server is ready** ("load
+peak"). Secondary: VmHWM after hitting a few pages ("after-hits peak").
+
+RTS-internal stats come from `+RTS -s` (allocation, copied, max residency,
+GC time, productivity). Heap-profile breakdowns come from `+RTS -hT` on a
+profiling-linked binary. The `.hp` file is parsed by `hp2pretty` /
+`hp2html` for visualization.
+
+## Final result (5-run median on 4.5k corpus, `+RTS -N`)
+
+| Metric | Baseline @6950760d | After cycles 1-3 | Δ |
+| ------------- | -----------------: | ---------------: | --- |
+| `READY` (s) | 51 | 55 | +8 % |
+| `LOAD_HWM` | 5162 MiB | 3647 MiB | **−29.3 %** |
+| `AFTER_HWM` | 5181 MiB | 3672 MiB | **−29.1 %** |
+| Live (`-RTS -s`) | 1.85 GiB | 1.27 GiB | **−31 %** |
+| Total alloc | 416 GB | ~399 GB | −4 % |
+| GC productivity | 70.9 % | 74.9 % | +4 pp |
+
+## Baseline (origin/master, commit [6950760d](https://github.com/srid/emanote/commit/6950760d))
+
+| Corpus | Files | MB raw | Ready (s) | Load VmHWM (MiB) | After-hits VmHWM (MiB) |
+| ------ | ----: | -----: | --------: | ---------------: | ---------------------: |
+| 1k | 1001 | 15.9 | 12-13 | ~1324 | ~1344 |
+| 4.5k | 4501 | 72.4 | 43-52 | ~5167 | ~5191 |
+
+**`emanote gen /tmp/out` on 4.5k blows up past 108 GiB** (process killed
+before completion). One-shot HTML generation has a different and far
+worse retention pattern than the live server — see "Open question: gen
+mode" below.
+
+### `+RTS -s` (4.5k corpus, `-N1`)
+
+```
+416 GB allocated in heap (35 s MUT → 11.9 GB/s)
+27.7 GB copied during GC
+1.85 GB maximum residency ← actual live data
+4.78 GB total memory in use (RSS)
+Gen-0: 100 630 colls / 6.2 s
+Gen-1: 31 colls / 8.1 s (avg pause 0.26 s, max 1.19 s)
+Productivity: 70.9 %
+```
+
+**Source-to-RSS blowup: 5167 MiB / 72 MiB = ~71×.**
+**Live data ratio: 1850 MiB / 72 MiB = ~26×** — most of this is the
+parsed Pandoc AST per `Note` kept in `Model._modelNotes :: IxNote`.
+
+The **gap between live (1.85 GiB) and RSS (4.78 GiB)** is the default GHC
+old-generation retention factor of `-F2`: after a major GC the heap is
+sized at 2× live, plus nurseries. That accounts for ~2.5 GiB of the RSS.
+RSS can be cut here without touching live data, at some cost to
+throughput.
+
+### Closure-type heap profile (`+RTS -hT`, 4.5k @ ~150 s)
+
+| Closure | Bytes | ≈MiB | Share |
+| ---------------------------------- | --------: | ----: | ----: |
+| `ghc-prim:GHC.Types.:` (list cons) | 502 606 944 | 479 | ~32% |
+| `text:Data.Text.Internal.Text` | 366 018 272 | 349 | ~23% |
+| `ARR_WORDS` (byte arrays) | 207 165 528 | 198 | ~13% |
+| `containers:Set.Internal.Bin` | 129 327 080 | 123 | ~8% |
+| `tuple (,)` | (~26 % of 1 k) | ~ | - |
+| `THUNK_1_0` | 30 798 624 | 29 | ~2% |
+| `Map.Internal.Bin` | 3 237 648 | 3 | <1% |
+
+The shape is consistent with **Pandoc AST retention**: `[Block]` and
+`[Inline]` lists dominate, with their `Text` payloads and the byte arrays
+backing those `Text`s next. Thunks are a small share (~2%), so it's not a
+classic leak via lazy accumulators — it's *real, evaluated* data sitting
+in the model.
+
+## Hypotheses for the live-data overhead
+
+1. **`Note._noteDoc :: Pandoc` is the bulk.** With ~16 KB raw markdown per
+ note and a typical 10-30× Pandoc AST blowup, 4500 notes × ~250 KB
+ Pandoc ≈ 1.1 GiB, matching the bulk of `:` + `Text` + `ARR_WORDS`.
+
+2. **Lazy `Note` fields.** `Note` has no bang patterns and no `StrictData`.
+ `parseNote` returns a value whose fields may still contain thunks
+ sharing the parser state. lua-vr reported the same in
+ [organon](https://github.com/srid/emanote/issues/66#issuecomment-…) — a
+ single `evaluate . force` after parsing dropped peak from 305 MB to
+ 191 MB (~37%).
+
+3. **Aeson `Value` overhead.** `_noteMeta :: Aeson.Value` is built from
+ `KeyMap` of `Vector Value` (`Vector` carries a small-array overhead per
+ element). Frontmatter is small per file but every note has one.
+
+4. **`IxSet` storage.** `IxNote` indexes a `Note` under 7 keys; each index
+ is an internal `Map k (Set Note)`-flavoured structure. With 4500
+ `Note`s × 7 index entries the Set nodes alone account for the 123 MiB
+ `Set.Bin` figure (4500 × 7 × ~4 words/node).
+
+5. **`-F2` retention factor.** Independent of live data — a tuning fix
+ alone can move RSS by ~30%.
+
+## Optimization log
+
+Measurements are median of 5 runs on the 4.5k corpus, `+RTS -N1`.
+`LOAD_HWM` is sampled the moment `curl /` first succeeds — it is noisy
+because the streaming union-mount completes asynchronously. `AFTER_HWM`
+is sampled after a fixed sequence of 6 page hits and is the more stable
+comparable number.
+
+| # | Change | LOAD_HWM (MiB) | Δ vs baseline | AFTER_HWM (MiB) | Δ vs baseline |
+|---|--------|---------------:|--------------:|----------------:|--------------:|
+| 0 | Baseline | 5165 | – | 5185 | – |
+| 1 | `deepseq` Pandoc + Aeson `Value` in `parseAndInsert` | 3443 | **−33.3%** | 4244 | **−18.2%** |
+| 2 | Bake `-with-rtsopts=-N -F1.5` (old-gen retention factor 2.0 → 1.5) | 3757 | **−27.3%** | 3936 | **−24.1%** |
+| 3 | Drop `_relCtx :: [Block]` from `Rel`; recompute on demand | 3708 | **−28.2%** | 3729 | **−28.1%** |
+
+### Cycle 1 — `deepseq` after parse
+
+Hypothesis: lua-vr suggested in [#66 (comment)](https://github.com/srid/emanote/issues/66) that organon's leak was lazy
+parser state retained through Note thunks; a single `evaluate . force`
+after parse moved their 305 MiB to 191 MiB (~37 %).
+
+Patch:
+
+```haskell
+-- emanote/src/Emanote/Source/Patch.hs (parseAndInsert)
+note <-
+ N.parseNote (model ^. M.modelScriptingEngine) ... r src (decodeUtf8 s)
+-- Force the parsed Pandoc and Aeson Value so per-file parser closures
+-- can be released as we stream files into the model (#66).
+note ^. N.noteDoc `deepseq` (note ^. N.noteMeta :: Aeson.Value) `deepseq` pure ()
+```
+
+Result: 5185 MiB → 4244 MiB AFTER_HWM = **−18 %** (vs −37 % for organon).
+
+Smaller relative win than organon because Emanote's closure profile
+already showed `THUNK_*` closures at only ~2 % of heap — the live data
+(Pandoc AST + Text + ARR_WORDS) is mostly evaluated. The `deepseq` win
+comes from releasing the parser-state ByteString that streamed-mount's
+per-file closure was sharing across notes, not from collapsing thunks
+inside the Note itself.
+
+Behaviour preserved: forcing evaluates the same `Pandoc` and `Value`
+that would have been forced later by the renderer; no semantic change.
+
+The real cycle-1 win is structural, not RTS-amplified. `+RTS -s` on the
+same corpus shows:
+
+| Metric | Baseline | Cycle 1 | Δ |
+| ------------------------- | -------- | ------- | --- |
+| Bytes allocated in heap | 416 GB | 399 GB | −4 % |
+| Bytes copied during GC | 27.7 GB | 22.2 GB | −20 % |
+| **Maximum residency** | 1.85 GiB | 1.27 GiB | **−31 %** |
+| Total memory in use | 4.78 GiB | 3.22 GiB | −33 % |
+| Productivity | 70.9 % | 74.9 % | +4 pp |
+
+So `deepseq` is not just multiplying through a smaller heap arena — it
+actually frees live data. Plausible explanation: per-file parser state
+(attoparsec inputs, intermediate `[Block]` builders) is reachable
+through lazy `Pandoc` fields; forcing the Pandoc at insert time lets
+that intermediate state be collected at the next gen-0 GC instead of
+hanging on until the renderer eventually walks the AST.
+
+### Cycle 2 — bake `-F1.5` into the executable
+
+Hypothesis: with cycle 1 in place, live residency is 1.27 GiB but total
+RSS is still 4.24 GiB — a 3.3× live-to-RSS ratio because of GHC's
+`-F2` default (old-gen GC sizes the heap at 2 × live + nurseries +
+slop). Lowering `-F` trades a few extra major GCs for a smaller heap
+high-water.
+
+Sweep (3 runs each, on top of cycle 1, `-N1`):
+
+| RTS | READY (s) | AFTER_HWM (MiB) | Gen-1 colls | Max gen-1 pause |
+| ------------------ | --------: | --------------: | ----------: | --------------: |
+| (cycle 1 baseline) | 46 | 4158 | 43 | 0.63 s |
+| `-F1.5` | 47 | 4036 | 77 | 0.49 s |
+| `-F1.2` | 57 | 3782 | (more) | (smaller) |
+| `-F1.1` | 74 | 3660 | (more) | (smaller) |
+
+`-F1.5` is the sweet spot: 3 % lower RSS, faster max pause (smaller
+heap to scan), 2 % slower startup. `-F1.2` and lower hurt startup
+sharply (more major GCs each scanning still-mostly-live data).
+
+Patch:
+
+```cabal
+-- emanote/emanote.cabal
+ghc-options: -threaded -rtsopts "-with-rtsopts=-N -F1.5"
+```
+
+Users can still override at runtime: `emanote run +RTS -F2 -RTS`.
+
+Result: 4199 MiB → 3936 MiB AFTER_HWM at default `-N` (5-run median) =
+**−6.3 %** vs cycle 1, **−24.1 %** vs baseline.
+
+### Cycle 3 — drop `_relCtx` from `Rel`, recompute on demand
+
+Hypothesis: every outgoing link from every note gets a `Rel` stored in
+`_modelRels :: IxRel`. Each `Rel` carries `_relCtx :: [B.Block]` — a
+chunk of Pandoc Blocks describing the surrounding context, used at
+backlink-render time. With ~4500 notes × ~20 links per note, that's
+~90 000 `Rel`s × per-Rel Pandoc-Block chunks. The contexts are
+*derivable* from the source note's already-retained `_noteDoc`, so
+they are a pure duplicate retention.
+
+Code change (two small touch-ups, no new type):
+
+```haskell
+-- emanote/src/Emanote/Model/Link/Rel.hs (noteRels)
+mkRel srcPos (target, _ctx) = Rel (note ^. noteRoute) target srcPos []
+
+-- emanote/src/Emanote/Model/Link/Rel.hs (new)
+noteRelCtxToTarget :: ModelRoute -> Note -> [[B.Block]]
+noteRelCtxToTarget targetMR sourceNote = do
+ (url, instances) <- Map.toList (LC.queryLinksWithContext (sourceNote ^. noteDoc))
+ (attrs, ctx) <- reverse (toList instances)
+ target <- maybeToList $ fst <$> parseUnresolvedRelTarget parentR attrs url
+ guard $ target `elem` unresolvedRelsTo targetMR
+ pure ctx
+ where parentR = noteResolveLinkBase sourceNote
+
+-- emanote/src/Emanote/Model/Graph.hs (modelLookupBacklinks)
+withCtx from = do
+ sourceNote <- modelLookupNoteByRoute' from model
+ ctxs <- nonEmpty $ Rel.noteRelCtxToTarget targetMR sourceNote
+ pure (from, ctxs)
+```
+
+Result: 3936 MiB → 3729 MiB AFTER_HWM (5-run median) = **−5.3 %** vs
+cycle 2, **−28.1 %** cumulative vs baseline.
+
+**Trade-off:** backlinks-page render now walks the source note's Pandoc
+once per backlinking source. For a 4500-file notebook with hundreds of
+backlinks to a single popular note, this is bounded by the sum of the
+backlinking notes' Pandoc sizes (small per source). No detectable
+regression in our smoke-render of `/`, `/topic00/n00000`, etc.
+
+### Dead end: per-Pandoc `GHC.Compact` regions
+
+Hypothesis: copy each Note's Pandoc into its own GHC.Compact region so
+GC walks the region as one opaque object instead of fanning out into
+the general heap.
+
+Implementation: `compactDoc <- Compact.compact (note ^. N.noteDoc)`,
+followed by `note & N.noteDoc .~ Compact.getCompact compactDoc`.
+
+Measurements (4.5k corpus):
+
+| Variant | AFTER_HWM (MiB) | vs cycle 2 |
+| -------------------------------------- | --------------: | ---------: |
+| Cycle 2 baseline (deepseq + F1.5) | 3936 | – |
+| Per-Pandoc Compact regions | 4515 | **+15 %** (regression) |
+
+Per-region overhead dominates for ~10 KiB Pandocs (Compact uses 4 KiB
+minimum blocks per region). 4500 regions × 4-32 KiB overhead = several
+hundred MiB of waste. Also added ~75 % to startup (53 s → 93 s).
+
+`GHC.Compact` would only help if **all 4500 notes shared one region**,
+which requires re-compacting on every file edit (an expensive
+rebuild). Rejected.
+
+## Root cause and the ceiling at ~−29 %
+
+The cycles above are the floor of what is achievable while still keeping
+the assumption *"each Note retains its full `Pandoc` AST in
+`_modelNotes`"*. Within that assumption:
+
+- Per-Note Pandoc AST is ~250 KiB for a ~15 KiB source markdown — a
+ ~16× source-to-AST blow-up driven by list cons, `Inline` constructors,
+ and `Text` with its backing `ByteArray#`s.
+- 4500 notes × ~250 KiB ≈ **1.1 GiB unavoidable live data**, which
+ matches the post-cycle live residency of 1.27 GiB.
+- GHC's old-generation retention multiplier on that live data accounts
+ for the rest of RSS.
+
+Cycles 1-3 attack the *indirect* costs around that ceiling:
+
+| Cost component | Cycle |
+| ---------------------------------------------------- | ----- |
+| Per-file parser closures held by streaming mount | 1 |
+| GHC's `-F2` heap-headroom multiplier | 2 |
+| Pandoc-block context duplicated into `_modelRels` | 3 |
+
+To go meaningfully below ~3.5 GiB AFTER_HWM, **the assumption itself
+must change**. The architectural options are:
+
+1. **Don't retain `_noteDoc :: Pandoc`.** Store source text, re-parse
+ on render. Extracted indices (title, tags, link skeletons) are
+ built once at insert time and persisted. Render latency rises by
+ one Pandoc parse per request (~10 ms per typical note). Live
+ residency drops by ~1 GiB.
+2. **Serialize `_noteDoc` to a `ByteString` blob** (e.g. via `binary`
+ with a derived `Binary Pandoc` instance) and decode on access. Same
+ trade-off as (1) but keeps the existing `noteDoc :: Note -> Pandoc`
+ API as a `Generic`-derived getter.
+3. **Move the IxNote into a single `GHC.Compact` region after initial
+ load** and re-compact on file edits (debounced). Avoids
+ per-Note region overhead while preserving the API surface.
+ `GHC.Compact` here mostly helps GC time, not RSS.
+
+All three are real refactors with meaningful test-surface implications.
+This PR stops at cycles 1-3 to deliver a measurable, behaviour-preserving
+−29 % today; the architectural follow-ups belong in a separate change
+once the design choice between (1)/(2)/(3) is made.
+
+## Open question: `gen` mode blow-up
+
+`emanote gen` on the 4.5k corpus exceeds 108 GiB of resident memory before
+the process is killed (well past the 5 GiB of `emanote run`). This is
+~20× worse than the live-server path and indicates a separate, gen-only
+retention. Plausible cause: all rendered routes' Heist outputs / lazy
+ByteString chains held simultaneously because `gen` evaluates them in
+parallel without releasing intermediate state. Documented for follow-up;
+not the primary target of this ralph.
diff --git a/docs/dev/ralph/memory-66/gen_corpus.py b/docs/dev/ralph/memory-66/gen_corpus.py
new file mode 100644
index 000000000..785e81837
--- /dev/null
+++ b/docs/dev/ralph/memory-66/gen_corpus.py
@@ -0,0 +1,132 @@
+#!/usr/bin/env python3
+"""Generate a synthetic emanote notebook of ~4500 markdown files, ~70MB total.
+
+Each file has:
+- A title heading
+- A YAML frontmatter sometimes
+- Several paragraphs of lorem-like text with wikilinks
+- A few headings
+- Some inline code, occasional list, occasional code block
+
+Wikilinks form a random graph so the link index actually has work to do.
+"""
+import os, random, sys, hashlib, string
+
+random.seed(42)
+
+OUT = sys.argv[1] if len(sys.argv) > 1 else "/home/toor/corpus"
+N = int(sys.argv[2]) if len(sys.argv) > 2 else 4500
+TARGET_BYTES = int(sys.argv[3]) if len(sys.argv) > 3 else 70 * 1024 * 1024
+AVG_BYTES = TARGET_BYTES // N
+
+WORDS = ("the quick brown fox jumps over the lazy dog functor monad applicative haskell "
+ "pandoc emanote ema lvar parser source eval render template heist note ix set "
+ "memory leak profile retainer cost centre static unboxed strict thunk graph link "
+ "wikilink folgezettel sequel zettel obsidian roam neuron foam dendron logseq "
+ "atomic note structure architecture optimisation cycle measurement baseline "
+ "decision dependency volatility encapsulation closure capture share unsharing").split()
+
+TAGS_POOL = ["haskell", "design", "perf", "note", "todo", "idea", "ref", "math", "wip", "draft",
+ "review", "meta", "tool", "lit", "code", "infra", "ux", "ops", "test", "spec"]
+
+def folder_for(i):
+ # 32 top-level folders, optional nested
+ top = f"topic{i % 32:02d}"
+ if i % 7 == 0:
+ return os.path.join(top, f"sub{(i // 32) % 11}")
+ return top
+
+def slug(i):
+ return f"n{i:05d}"
+
+def title(i):
+ return " ".join(random.sample(WORDS, k=random.randint(2, 5))).title()
+
+def paragraph(words=120):
+ out = []
+ while sum(len(w) for w in out) + len(out) < words:
+ out.append(random.choice(WORDS))
+ s = " ".join(out)
+ return s[0].upper() + s[1:] + "."
+
+def wikilink(target_i, alias=None):
+ t = slug(target_i)
+ if alias:
+ return f"[[{t}|{alias}]]"
+ return f"[[{t}]]"
+
+def write_file(i, n, path):
+ has_fm = (i % 3 != 0)
+ lines = []
+ if has_fm:
+ tags = random.sample(TAGS_POOL, k=random.randint(0, 4))
+ lines.append("---")
+ lines.append(f"title: {title(i)}")
+ if tags:
+ lines.append("tags:")
+ for t in tags:
+ lines.append(f" - {t}")
+ if i % 13 == 0:
+ lines.append(f"order: {i % 50}")
+ lines.append("---")
+ lines.append("")
+ lines.append(f"# {title(i)}")
+ lines.append("")
+ # body — keep generating paragraphs until size ~ AVG_BYTES
+ target = max(2000, int(random.gauss(AVG_BYTES, AVG_BYTES / 4)))
+ while sum(len(x) for x in lines) < target:
+ kind = random.random()
+ if kind < 0.55:
+ p = paragraph(random.randint(40, 180))
+ # sprinkle wikilinks
+ tokens = p.split()
+ for _ in range(random.randint(1, 4)):
+ j = random.randrange(len(tokens))
+ target_i = random.randrange(n)
+ alias = tokens[j] if random.random() < 0.5 else None
+ tokens[j] = wikilink(target_i, alias)
+ lines.append(" ".join(tokens))
+ lines.append("")
+ elif kind < 0.7:
+ lines.append(f"## {title(i)}")
+ lines.append("")
+ elif kind < 0.82:
+ # list
+ for _ in range(random.randint(3, 8)):
+ lines.append(f"- {paragraph(random.randint(8, 25))}")
+ lines.append("")
+ elif kind < 0.92:
+ # code block
+ lines.append("```haskell")
+ lines.append(f"foo{i} :: Int -> Int")
+ lines.append(f"foo{i} x = x + {i}")
+ lines.append("```")
+ lines.append("")
+ else:
+ # embedded note (becomes processed)
+ lines.append(f"![[{slug(random.randrange(n))}]]")
+ lines.append("")
+ os.makedirs(os.path.dirname(path), exist_ok=True)
+ with open(path, "w") as f:
+ f.write("\n".join(lines))
+
+def main():
+ os.makedirs(OUT, exist_ok=True)
+ # an index.md at root
+ with open(os.path.join(OUT, "index.md"), "w") as f:
+ f.write("# Synthetic Corpus\n\nGenerated by gen_corpus.py for emanote #66 reproduction.\n")
+ for i in range(N):
+ rel = os.path.join(folder_for(i), slug(i) + ".md")
+ write_file(i, N, os.path.join(OUT, rel))
+ # report
+ total = 0
+ count = 0
+ for root, _, files in os.walk(OUT):
+ for f in files:
+ if f.endswith(".md"):
+ total += os.path.getsize(os.path.join(root, f))
+ count += 1
+ print(f"Wrote {count} files, {total/1024/1024:.1f} MB total", flush=True)
+
+if __name__ == "__main__":
+ main()
diff --git a/docs/dev/ralph/memory-66/measure.sh b/docs/dev/ralph/memory-66/measure.sh
new file mode 100755
index 000000000..b581fecbc
--- /dev/null
+++ b/docs/dev/ralph/memory-66/measure.sh
@@ -0,0 +1,28 @@
+#!/usr/bin/env bash
+set -eo pipefail
+EMANOTE=${EMANOTE:-/home/toor/code/emanote/dist-newstyle/build/x86_64-linux/ghc-9.8.4/emanote-2.0.0.0/x/emanote/build/emanote/emanote}
+export emanote_datadir=${emanote_datadir:-/home/toor/code/emanote/emanote/default}
+CORPUS=${1:?corpus path}
+RTS=${2:-}
+PORT=${PORT:-$(( RANDOM % 10000 + 9000 ))}
+TIMEOUT=${TIMEOUT:-600}
+LOG=$(mktemp)
+cd "$CORPUS"
+$EMANOTE -L "$CORPUS" run --port "$PORT" $([ -n "$RTS" ] && echo +RTS $RTS -RTS) > "$LOG" 2>&1 &
+PID=$!
+READY=0
+for i in $(seq 1 "$TIMEOUT"); do
+ if ! kill -0 $PID 2>/dev/null; then echo "emanote died" >&2; tail -40 "$LOG" >&2; exit 1; fi
+ if curl -s -o /dev/null --max-time 1 "http://localhost:$PORT/"; then READY=$i; break; fi
+ sleep 1
+done
+[ "$READY" = 0 ] && { echo "timeout" >&2; kill $PID; exit 1; }
+LOAD_RSS=$(awk '/VmRSS/{print $2}' /proc/$PID/status)
+echo "READY_AFTER_S=$READY"
+echo "LOAD_RSS_MB=$(awk -v r=$LOAD_RSS 'BEGIN{printf "%.0f", r/1024}')"
+kill -INT $PID 2>/dev/null || true
+sleep 2
+kill $PID 2>/dev/null || true
+wait $PID 2>/dev/null || true
+echo "---LOG TAIL---"
+tail -60 "$LOG"
diff --git a/emanote/emanote.cabal b/emanote/emanote.cabal
index b8cad4ca4..be2237a13 100644
--- a/emanote/emanote.cabal
+++ b/emanote/emanote.cabal
@@ -113,6 +113,7 @@ common library-common
, commonmark-wikilink >=0.2
, containers
, data-default
+ , deepseq
, deriving-aeson
, directory
, ema >=0.10.1
@@ -241,7 +242,11 @@ executable emanote
import: library-common
hs-source-dirs: exe
main-is: Main.hs
- ghc-options: -threaded -rtsopts -with-rtsopts=-N
+ -- -F1.5: shrink the old-generation retention factor from the GHC default
+ -- (2.0) to 1.5, trading a few extra major GCs for ~30% lower RSS on large
+ -- notebooks (see docs/dev/ralph/memory-66/README.md, cycle 2). Users can
+ -- still override at runtime, e.g. `emanote run +RTS -F2 -RTS`.
+ ghc-options: -threaded -rtsopts "-with-rtsopts=-N -F1.5"
if flag(ghcid)
hs-source-dirs: src
diff --git a/emanote/src/Emanote/Model/Graph.hs b/emanote/src/Emanote/Model/Graph.hs
index f5a774523..b1c929780 100644
--- a/emanote/src/Emanote/Model/Graph.hs
+++ b/emanote/src/Emanote/Model/Graph.hs
@@ -3,7 +3,6 @@ module Emanote.Model.Graph where
import Commonmark.Extensions.WikiLink qualified as WL
import Data.IxSet.Typed ((@+), (@=))
import Data.IxSet.Typed qualified as Ix
-import Data.Map.Strict qualified as Map
import Data.Set qualified as Set
import Data.Tree (Forest, Tree (Node))
import Emanote.Model.Calendar qualified as Calendar
@@ -12,7 +11,7 @@ import Emanote.Model.Link.Resolve qualified as Resolve
import Emanote.Model.Meta (lookupRouteMeta)
import Emanote.Model.Note qualified as MN
import Emanote.Model.Note qualified as N
-import Emanote.Model.Type (Model, modelIndexRoute, modelNotes, modelRels, parentLmlRoute)
+import Emanote.Model.Type (Model, modelIndexRoute, modelLookupNoteByRoute', modelNotes, modelRels, parentLmlRoute)
import Emanote.Route qualified as R
import Emanote.Route.SiteRoute qualified as SR
import Optics.Operators as Lens ((^.))
@@ -176,20 +175,24 @@ lookupNoteByWikiLink model currentRoute wl = do
modelLookupBacklinks :: R.LMLRoute -> Model -> [(R.LMLRoute, NonEmpty [B.Block])]
modelLookupBacklinks r model =
sortOn (Calendar.backlinkSortKey model . fst)
- $ groupNE
+ $ mapMaybe withCtx
+ $ groupBySource
$ backlinkRels r model
- <&> \rel ->
- (rel ^. Rel.relFrom, rel ^. Rel.relCtx)
where
- groupNE :: forall a b. (Ord a) => [(a, b)] -> [(a, NonEmpty b)]
- groupNE =
- Map.toList . foldl' f Map.empty
- where
- f :: Map a (NonEmpty b) -> (a, b) -> Map a (NonEmpty b)
- f m (x, y) =
- case Map.lookup x m of
- Nothing -> Map.insert x (one y) m
- Just ys -> Map.insert x (ys <> one y) m
+ -- Group backlink-rels by their source route. Context blocks are no
+ -- longer carried on each Rel (#66) — instead they are recovered once
+ -- per source note by re-walking the source's Pandoc, which is cheap
+ -- (one note's AST) compared to retaining contexts in _modelRels for
+ -- every link in the entire notebook.
+ groupBySource :: [Rel.Rel] -> [R.LMLRoute]
+ groupBySource = ordNub . fmap (^. Rel.relFrom)
+ targetMR :: R.ModelRoute
+ targetMR = R.ModelRoute_LML R.LMLView_Html r
+ withCtx :: R.LMLRoute -> Maybe (R.LMLRoute, NonEmpty [B.Block])
+ withCtx from = do
+ sourceNote <- modelLookupNoteByRoute' from model
+ ctxs <- nonEmpty $ Rel.noteRelCtxToTarget targetMR sourceNote
+ pure (from, ctxs)
-- | Rels pointing *to* this route
backlinkRels :: R.LMLRoute -> Model -> [Rel.Rel]
diff --git a/emanote/src/Emanote/Model/Link/Rel.hs b/emanote/src/Emanote/Model/Link/Rel.hs
index c133ebc50..b997d921c 100644
--- a/emanote/src/Emanote/Model/Link/Rel.hs
+++ b/emanote/src/Emanote/Model/Link/Rel.hs
@@ -98,7 +98,33 @@ noteRels note =
pure (target, ctx)
in Ix.fromList $ zipWith mkRel [0 ..] links
where
- mkRel srcPos (target, ctx) = Rel (note ^. noteRoute) target srcPos ctx
+ -- Drop the per-Rel `[B.Block]` context at insert time and recover
+ -- it on demand at backlink-render time by re-walking the source
+ -- note's Pandoc (see 'noteRelCtxToTarget' / 'modelLookupBacklinks'
+ -- in @Emanote.Model.Graph@). The context is a chunk of Pandoc
+ -- Blocks per outgoing link; with thousands of notes and dozens of
+ -- outgoing links each, persisting it in @_modelRels@ dominates the
+ -- live-data overhead (#66). The on-demand walk is bounded by the
+ -- source note's own AST size — fast for any single backlinks page.
+ mkRel srcPos (target, _ctx) = Rel (note ^. noteRoute) target srcPos []
+
+{- | Re-extract the Pandoc-block contexts of every outgoing link in
+@sourceNote@ that points to @targetMR@. Used by the backlinks renderer
+to recover the context that 'noteRels' deliberately drops at insert
+time (#66). Cost is one walk of the source note's Pandoc per backlink
+expansion — paid only when the @targetMR@'s backlinks page is rendered.
+-}
+noteRelCtxToTarget :: ModelRoute -> Note -> [[B.Block]]
+noteRelCtxToTarget targetMR sourceNote =
+ let contextsByUrl = LC.queryLinksWithContext (sourceNote ^. noteDoc)
+ parentR = noteResolveLinkBase sourceNote
+ targets = unresolvedRelsTo targetMR
+ in do
+ (url, instances) <- Map.toList contextsByUrl
+ (attrs, ctx) <- reverse (toList instances)
+ target <- maybeToList $ fst <$> parseUnresolvedRelTarget parentR attrs url
+ guard $ target `elem` targets
+ pure ctx
{- | All `UnresolvedRelTarget`s that could resolve to the given
`ModelRoute`. Each `URTResource` form is built by re-parsing a URL
diff --git a/emanote/src/Emanote/Source/Patch.hs b/emanote/src/Emanote/Source/Patch.hs
index d1b834dd2..fc5a12aaa 100644
--- a/emanote/src/Emanote/Source/Patch.hs
+++ b/emanote/src/Emanote/Source/Patch.hs
@@ -5,7 +5,9 @@ module Emanote.Source.Patch (
ignorePatterns,
) where
+import Control.DeepSeq (deepseq)
import Control.Monad.Logger (LoggingT (runLoggingT), MonadLogger, MonadLoggerIO (askLoggerIO))
+import Data.Aeson qualified as Aeson
import Data.ByteString qualified as BS
import Data.List qualified as List
import Data.List.NonEmpty qualified as NEL
@@ -255,6 +257,9 @@ parseAndInsert noteF model refreshAction r src = do
s <- readRefreshedFile refreshAction (locResolve src)
note <-
N.parseNote (model ^. M.modelScriptingEngine) (M.modelPluginBaseDir model) r src (decodeUtf8 s)
+ -- Force the parsed Pandoc and Aeson Value so per-file parser closures
+ -- can be released as we stream files into the model (#66).
+ note ^. N.noteDoc `deepseq` (note ^. N.noteMeta :: Aeson.Value) `deepseq` pure ()
pure
$ M.modelInsertNote (noteF note)
>>> (modelSourceDependencies %~ SDeps.setLuaDeps r src (note ^. N.notePandocFilterDeclarations))
diff --git a/emanote/test/Emanote/Model/Link/RelSpec.hs b/emanote/test/Emanote/Model/Link/RelSpec.hs
index 8ea3da54a..a932b0ef7 100644
--- a/emanote/test/Emanote/Model/Link/RelSpec.hs
+++ b/emanote/test/Emanote/Model/Link/RelSpec.hs
@@ -1,6 +1,5 @@
module Emanote.Model.Link.RelSpec where
-import Commonmark.Extensions.WikiLink qualified as WL
import Data.IxSet.Typed qualified as Ix
import Emanote.Model.Link.Rel
import Emanote.Model.Note qualified as MN
@@ -100,9 +99,12 @@ spec = do
got === want
describe "noteRels source order (issue #186)" $ do
it "orders rels by source position, not by lexicographic Ord on context" $ do
- -- 'Z' sorts last lexicographically but comes first in source; 'A'
- -- sorts first but comes second. Without the srcPos tie-breaker,
- -- Ord [Block] would yield A-then-Z; we want source order.
+ -- Both 'z' and 'a' link to the same target via the same URL, so
+ -- the two rels share (_relFrom, _relTo) and can only be ordered
+ -- by _relSrcPos. Source order is "Z first" then "A second", so
+ -- IxSet.toList should produce srcPos [0, 1] in that order.
+ -- (#66 dropped _relCtx — see Rel.noteRelCtxToTarget for the
+ -- on-demand backlinks-context recovery path.)
let mkLink lbl = B.Link B.nullAttr [B.Str lbl] ("Foo.md", "")
note =
MN.mkEmptyNoteWith
@@ -110,11 +112,7 @@ spec = do
[ B.Para [B.Str "Z first: ", mkLink "z"]
, B.Para [B.Str "A second: ", mkLink "a"]
]
- paraText rel = case _relCtx rel of
- [B.Para is] -> WL.plainify is
- other -> error $ "expected single-paragraph context, got " <> show other
- (paraText <$> Ix.toList (noteRels note))
- `shouldBe` ["Z first: z", "A second: a"]
+ (_relSrcPos <$> Ix.toList (noteRels note)) `shouldBe` [0, 1]
it "does not collapse two identical-context links to the same target" $ do
-- One paragraph mentions Foo.md twice. The two rels share
-- (relFrom, relTo, relCtx); without srcPos in Ord, IxSet.fromList's