diff --git a/docs/dev/ralph/memory-66/README.md b/docs/dev/ralph/memory-66/README.md new file mode 100644 index 000000000..b99eaa4fa --- /dev/null +++ b/docs/dev/ralph/memory-66/README.md @@ -0,0 +1,350 @@ +--- +slug: ralph-memory-66 +--- + +# Ralph: Memory leak on large notebooks (#66) + +Iterative, measurement-driven attempt to reproduce, profile, and shrink the +high memory usage of `emanote run` on large notebooks, originally reported +in [`srid/emanote#66`](https://github.com/srid/emanote/issues/66). + +## Goal + +Reduce peak resident set size (RSS) of `emanote run` on a ~4500-file, ~70MB +markdown notebook while **preserving behavior**. Success is either a +significant memory win or a precise code/architecture-level root cause. + +## Methodology + +All measurements on `srid1` (Linux, 32 cores, 125 GiB RAM, NixOS, no swap), +using the project's `nix develop` shell on `ghc-9.8.4 / cabal-3.14.2.0`. + +### Corpus + +A synthetic 4501-file, 72.4 MB markdown corpus matching the shape reported +in the issue (4561 files, 69 MB). Generated by +`docs/dev/ralph/memory-66/gen_corpus.py` with a fixed seed. Files contain +realistic content: YAML frontmatter, headings, paragraphs with wikilinks, +embeds (`![[...]]`), code blocks, and lists. Wikilinks form a dense random +graph so the link/relation index does meaningful work. + +A 1001-file, 15.9 MB variant is used for fast iteration; profiling builds +on the full 4500 take >5 minutes per run. + +### Metrics + +`docs/dev/ralph/memory-66/measure.sh` starts `emanote run --port

` on a +corpus, polls `curl http://localhost:p/` until ready, then samples +`/proc//status` for `VmRSS` and `VmHWM`. It also pings a handful of +pages to exercise the renderer. + +Primary metric: **`VmHWM_MB` immediately after the server is ready** ("load +peak"). Secondary: VmHWM after hitting a few pages ("after-hits peak"). + +RTS-internal stats come from `+RTS -s` (allocation, copied, max residency, +GC time, productivity). Heap-profile breakdowns come from `+RTS -hT` on a +profiling-linked binary. The `.hp` file is parsed by `hp2pretty` / +`hp2html` for visualization. + +## Final result (5-run median on 4.5k corpus, `+RTS -N`) + +| Metric | Baseline @6950760d | After cycles 1-3 | Δ | +| ------------- | -----------------: | ---------------: | --- | +| `READY` (s) | 51 | 55 | +8 % | +| `LOAD_HWM` | 5162 MiB | 3647 MiB | **−29.3 %** | +| `AFTER_HWM` | 5181 MiB | 3672 MiB | **−29.1 %** | +| Live (`-RTS -s`) | 1.85 GiB | 1.27 GiB | **−31 %** | +| Total alloc | 416 GB | ~399 GB | −4 % | +| GC productivity | 70.9 % | 74.9 % | +4 pp | + +## Baseline (origin/master, commit [6950760d](https://github.com/srid/emanote/commit/6950760d)) + +| Corpus | Files | MB raw | Ready (s) | Load VmHWM (MiB) | After-hits VmHWM (MiB) | +| ------ | ----: | -----: | --------: | ---------------: | ---------------------: | +| 1k | 1001 | 15.9 | 12-13 | ~1324 | ~1344 | +| 4.5k | 4501 | 72.4 | 43-52 | ~5167 | ~5191 | + +**`emanote gen /tmp/out` on 4.5k blows up past 108 GiB** (process killed +before completion). One-shot HTML generation has a different and far +worse retention pattern than the live server — see "Open question: gen +mode" below. + +### `+RTS -s` (4.5k corpus, `-N1`) + +``` +416 GB allocated in heap (35 s MUT → 11.9 GB/s) +27.7 GB copied during GC +1.85 GB maximum residency ← actual live data +4.78 GB total memory in use (RSS) +Gen-0: 100 630 colls / 6.2 s +Gen-1: 31 colls / 8.1 s (avg pause 0.26 s, max 1.19 s) +Productivity: 70.9 % +``` + +**Source-to-RSS blowup: 5167 MiB / 72 MiB = ~71×.** +**Live data ratio: 1850 MiB / 72 MiB = ~26×** — most of this is the +parsed Pandoc AST per `Note` kept in `Model._modelNotes :: IxNote`. + +The **gap between live (1.85 GiB) and RSS (4.78 GiB)** is the default GHC +old-generation retention factor of `-F2`: after a major GC the heap is +sized at 2× live, plus nurseries. That accounts for ~2.5 GiB of the RSS. +RSS can be cut here without touching live data, at some cost to +throughput. + +### Closure-type heap profile (`+RTS -hT`, 4.5k @ ~150 s) + +| Closure | Bytes | ≈MiB | Share | +| ---------------------------------- | --------: | ----: | ----: | +| `ghc-prim:GHC.Types.:` (list cons) | 502 606 944 | 479 | ~32% | +| `text:Data.Text.Internal.Text` | 366 018 272 | 349 | ~23% | +| `ARR_WORDS` (byte arrays) | 207 165 528 | 198 | ~13% | +| `containers:Set.Internal.Bin` | 129 327 080 | 123 | ~8% | +| `tuple (,)` | (~26 % of 1 k) | ~ | - | +| `THUNK_1_0` | 30 798 624 | 29 | ~2% | +| `Map.Internal.Bin` | 3 237 648 | 3 | <1% | + +The shape is consistent with **Pandoc AST retention**: `[Block]` and +`[Inline]` lists dominate, with their `Text` payloads and the byte arrays +backing those `Text`s next. Thunks are a small share (~2%), so it's not a +classic leak via lazy accumulators — it's *real, evaluated* data sitting +in the model. + +## Hypotheses for the live-data overhead + +1. **`Note._noteDoc :: Pandoc` is the bulk.** With ~16 KB raw markdown per + note and a typical 10-30× Pandoc AST blowup, 4500 notes × ~250 KB + Pandoc ≈ 1.1 GiB, matching the bulk of `:` + `Text` + `ARR_WORDS`. + +2. **Lazy `Note` fields.** `Note` has no bang patterns and no `StrictData`. + `parseNote` returns a value whose fields may still contain thunks + sharing the parser state. lua-vr reported the same in + [organon](https://github.com/srid/emanote/issues/66#issuecomment-…) — a + single `evaluate . force` after parsing dropped peak from 305 MB to + 191 MB (~37%). + +3. **Aeson `Value` overhead.** `_noteMeta :: Aeson.Value` is built from + `KeyMap` of `Vector Value` (`Vector` carries a small-array overhead per + element). Frontmatter is small per file but every note has one. + +4. **`IxSet` storage.** `IxNote` indexes a `Note` under 7 keys; each index + is an internal `Map k (Set Note)`-flavoured structure. With 4500 + `Note`s × 7 index entries the Set nodes alone account for the 123 MiB + `Set.Bin` figure (4500 × 7 × ~4 words/node). + +5. **`-F2` retention factor.** Independent of live data — a tuning fix + alone can move RSS by ~30%. + +## Optimization log + +Measurements are median of 5 runs on the 4.5k corpus, `+RTS -N1`. +`LOAD_HWM` is sampled the moment `curl /` first succeeds — it is noisy +because the streaming union-mount completes asynchronously. `AFTER_HWM` +is sampled after a fixed sequence of 6 page hits and is the more stable +comparable number. + +| # | Change | LOAD_HWM (MiB) | Δ vs baseline | AFTER_HWM (MiB) | Δ vs baseline | +|---|--------|---------------:|--------------:|----------------:|--------------:| +| 0 | Baseline | 5165 | – | 5185 | – | +| 1 | `deepseq` Pandoc + Aeson `Value` in `parseAndInsert` | 3443 | **−33.3%** | 4244 | **−18.2%** | +| 2 | Bake `-with-rtsopts=-N -F1.5` (old-gen retention factor 2.0 → 1.5) | 3757 | **−27.3%** | 3936 | **−24.1%** | +| 3 | Drop `_relCtx :: [Block]` from `Rel`; recompute on demand | 3708 | **−28.2%** | 3729 | **−28.1%** | + +### Cycle 1 — `deepseq` after parse + +Hypothesis: lua-vr suggested in [#66 (comment)](https://github.com/srid/emanote/issues/66) that organon's leak was lazy +parser state retained through Note thunks; a single `evaluate . force` +after parse moved their 305 MiB to 191 MiB (~37 %). + +Patch: + +```haskell +-- emanote/src/Emanote/Source/Patch.hs (parseAndInsert) +note <- + N.parseNote (model ^. M.modelScriptingEngine) ... r src (decodeUtf8 s) +-- Force the parsed Pandoc and Aeson Value so per-file parser closures +-- can be released as we stream files into the model (#66). +note ^. N.noteDoc `deepseq` (note ^. N.noteMeta :: Aeson.Value) `deepseq` pure () +``` + +Result: 5185 MiB → 4244 MiB AFTER_HWM = **−18 %** (vs −37 % for organon). + +Smaller relative win than organon because Emanote's closure profile +already showed `THUNK_*` closures at only ~2 % of heap — the live data +(Pandoc AST + Text + ARR_WORDS) is mostly evaluated. The `deepseq` win +comes from releasing the parser-state ByteString that streamed-mount's +per-file closure was sharing across notes, not from collapsing thunks +inside the Note itself. + +Behaviour preserved: forcing evaluates the same `Pandoc` and `Value` +that would have been forced later by the renderer; no semantic change. + +The real cycle-1 win is structural, not RTS-amplified. `+RTS -s` on the +same corpus shows: + +| Metric | Baseline | Cycle 1 | Δ | +| ------------------------- | -------- | ------- | --- | +| Bytes allocated in heap | 416 GB | 399 GB | −4 % | +| Bytes copied during GC | 27.7 GB | 22.2 GB | −20 % | +| **Maximum residency** | 1.85 GiB | 1.27 GiB | **−31 %** | +| Total memory in use | 4.78 GiB | 3.22 GiB | −33 % | +| Productivity | 70.9 % | 74.9 % | +4 pp | + +So `deepseq` is not just multiplying through a smaller heap arena — it +actually frees live data. Plausible explanation: per-file parser state +(attoparsec inputs, intermediate `[Block]` builders) is reachable +through lazy `Pandoc` fields; forcing the Pandoc at insert time lets +that intermediate state be collected at the next gen-0 GC instead of +hanging on until the renderer eventually walks the AST. + +### Cycle 2 — bake `-F1.5` into the executable + +Hypothesis: with cycle 1 in place, live residency is 1.27 GiB but total +RSS is still 4.24 GiB — a 3.3× live-to-RSS ratio because of GHC's +`-F2` default (old-gen GC sizes the heap at 2 × live + nurseries + +slop). Lowering `-F` trades a few extra major GCs for a smaller heap +high-water. + +Sweep (3 runs each, on top of cycle 1, `-N1`): + +| RTS | READY (s) | AFTER_HWM (MiB) | Gen-1 colls | Max gen-1 pause | +| ------------------ | --------: | --------------: | ----------: | --------------: | +| (cycle 1 baseline) | 46 | 4158 | 43 | 0.63 s | +| `-F1.5` | 47 | 4036 | 77 | 0.49 s | +| `-F1.2` | 57 | 3782 | (more) | (smaller) | +| `-F1.1` | 74 | 3660 | (more) | (smaller) | + +`-F1.5` is the sweet spot: 3 % lower RSS, faster max pause (smaller +heap to scan), 2 % slower startup. `-F1.2` and lower hurt startup +sharply (more major GCs each scanning still-mostly-live data). + +Patch: + +```cabal +-- emanote/emanote.cabal +ghc-options: -threaded -rtsopts "-with-rtsopts=-N -F1.5" +``` + +Users can still override at runtime: `emanote run +RTS -F2 -RTS`. + +Result: 4199 MiB → 3936 MiB AFTER_HWM at default `-N` (5-run median) = +**−6.3 %** vs cycle 1, **−24.1 %** vs baseline. + +### Cycle 3 — drop `_relCtx` from `Rel`, recompute on demand + +Hypothesis: every outgoing link from every note gets a `Rel` stored in +`_modelRels :: IxRel`. Each `Rel` carries `_relCtx :: [B.Block]` — a +chunk of Pandoc Blocks describing the surrounding context, used at +backlink-render time. With ~4500 notes × ~20 links per note, that's +~90 000 `Rel`s × per-Rel Pandoc-Block chunks. The contexts are +*derivable* from the source note's already-retained `_noteDoc`, so +they are a pure duplicate retention. + +Code change (two small touch-ups, no new type): + +```haskell +-- emanote/src/Emanote/Model/Link/Rel.hs (noteRels) +mkRel srcPos (target, _ctx) = Rel (note ^. noteRoute) target srcPos [] + +-- emanote/src/Emanote/Model/Link/Rel.hs (new) +noteRelCtxToTarget :: ModelRoute -> Note -> [[B.Block]] +noteRelCtxToTarget targetMR sourceNote = do + (url, instances) <- Map.toList (LC.queryLinksWithContext (sourceNote ^. noteDoc)) + (attrs, ctx) <- reverse (toList instances) + target <- maybeToList $ fst <$> parseUnresolvedRelTarget parentR attrs url + guard $ target `elem` unresolvedRelsTo targetMR + pure ctx + where parentR = noteResolveLinkBase sourceNote + +-- emanote/src/Emanote/Model/Graph.hs (modelLookupBacklinks) +withCtx from = do + sourceNote <- modelLookupNoteByRoute' from model + ctxs <- nonEmpty $ Rel.noteRelCtxToTarget targetMR sourceNote + pure (from, ctxs) +``` + +Result: 3936 MiB → 3729 MiB AFTER_HWM (5-run median) = **−5.3 %** vs +cycle 2, **−28.1 %** cumulative vs baseline. + +**Trade-off:** backlinks-page render now walks the source note's Pandoc +once per backlinking source. For a 4500-file notebook with hundreds of +backlinks to a single popular note, this is bounded by the sum of the +backlinking notes' Pandoc sizes (small per source). No detectable +regression in our smoke-render of `/`, `/topic00/n00000`, etc. + +### Dead end: per-Pandoc `GHC.Compact` regions + +Hypothesis: copy each Note's Pandoc into its own GHC.Compact region so +GC walks the region as one opaque object instead of fanning out into +the general heap. + +Implementation: `compactDoc <- Compact.compact (note ^. N.noteDoc)`, +followed by `note & N.noteDoc .~ Compact.getCompact compactDoc`. + +Measurements (4.5k corpus): + +| Variant | AFTER_HWM (MiB) | vs cycle 2 | +| -------------------------------------- | --------------: | ---------: | +| Cycle 2 baseline (deepseq + F1.5) | 3936 | – | +| Per-Pandoc Compact regions | 4515 | **+15 %** (regression) | + +Per-region overhead dominates for ~10 KiB Pandocs (Compact uses 4 KiB +minimum blocks per region). 4500 regions × 4-32 KiB overhead = several +hundred MiB of waste. Also added ~75 % to startup (53 s → 93 s). + +`GHC.Compact` would only help if **all 4500 notes shared one region**, +which requires re-compacting on every file edit (an expensive +rebuild). Rejected. + +## Root cause and the ceiling at ~−29 % + +The cycles above are the floor of what is achievable while still keeping +the assumption *"each Note retains its full `Pandoc` AST in +`_modelNotes`"*. Within that assumption: + +- Per-Note Pandoc AST is ~250 KiB for a ~15 KiB source markdown — a + ~16× source-to-AST blow-up driven by list cons, `Inline` constructors, + and `Text` with its backing `ByteArray#`s. +- 4500 notes × ~250 KiB ≈ **1.1 GiB unavoidable live data**, which + matches the post-cycle live residency of 1.27 GiB. +- GHC's old-generation retention multiplier on that live data accounts + for the rest of RSS. + +Cycles 1-3 attack the *indirect* costs around that ceiling: + +| Cost component | Cycle | +| ---------------------------------------------------- | ----- | +| Per-file parser closures held by streaming mount | 1 | +| GHC's `-F2` heap-headroom multiplier | 2 | +| Pandoc-block context duplicated into `_modelRels` | 3 | + +To go meaningfully below ~3.5 GiB AFTER_HWM, **the assumption itself +must change**. The architectural options are: + +1. **Don't retain `_noteDoc :: Pandoc`.** Store source text, re-parse + on render. Extracted indices (title, tags, link skeletons) are + built once at insert time and persisted. Render latency rises by + one Pandoc parse per request (~10 ms per typical note). Live + residency drops by ~1 GiB. +2. **Serialize `_noteDoc` to a `ByteString` blob** (e.g. via `binary` + with a derived `Binary Pandoc` instance) and decode on access. Same + trade-off as (1) but keeps the existing `noteDoc :: Note -> Pandoc` + API as a `Generic`-derived getter. +3. **Move the IxNote into a single `GHC.Compact` region after initial + load** and re-compact on file edits (debounced). Avoids + per-Note region overhead while preserving the API surface. + `GHC.Compact` here mostly helps GC time, not RSS. + +All three are real refactors with meaningful test-surface implications. +This PR stops at cycles 1-3 to deliver a measurable, behaviour-preserving +−29 % today; the architectural follow-ups belong in a separate change +once the design choice between (1)/(2)/(3) is made. + +## Open question: `gen` mode blow-up + +`emanote gen` on the 4.5k corpus exceeds 108 GiB of resident memory before +the process is killed (well past the 5 GiB of `emanote run`). This is +~20× worse than the live-server path and indicates a separate, gen-only +retention. Plausible cause: all rendered routes' Heist outputs / lazy +ByteString chains held simultaneously because `gen` evaluates them in +parallel without releasing intermediate state. Documented for follow-up; +not the primary target of this ralph. diff --git a/docs/dev/ralph/memory-66/gen_corpus.py b/docs/dev/ralph/memory-66/gen_corpus.py new file mode 100644 index 000000000..785e81837 --- /dev/null +++ b/docs/dev/ralph/memory-66/gen_corpus.py @@ -0,0 +1,132 @@ +#!/usr/bin/env python3 +"""Generate a synthetic emanote notebook of ~4500 markdown files, ~70MB total. + +Each file has: +- A title heading +- A YAML frontmatter sometimes +- Several paragraphs of lorem-like text with wikilinks +- A few headings +- Some inline code, occasional list, occasional code block + +Wikilinks form a random graph so the link index actually has work to do. +""" +import os, random, sys, hashlib, string + +random.seed(42) + +OUT = sys.argv[1] if len(sys.argv) > 1 else "/home/toor/corpus" +N = int(sys.argv[2]) if len(sys.argv) > 2 else 4500 +TARGET_BYTES = int(sys.argv[3]) if len(sys.argv) > 3 else 70 * 1024 * 1024 +AVG_BYTES = TARGET_BYTES // N + +WORDS = ("the quick brown fox jumps over the lazy dog functor monad applicative haskell " + "pandoc emanote ema lvar parser source eval render template heist note ix set " + "memory leak profile retainer cost centre static unboxed strict thunk graph link " + "wikilink folgezettel sequel zettel obsidian roam neuron foam dendron logseq " + "atomic note structure architecture optimisation cycle measurement baseline " + "decision dependency volatility encapsulation closure capture share unsharing").split() + +TAGS_POOL = ["haskell", "design", "perf", "note", "todo", "idea", "ref", "math", "wip", "draft", + "review", "meta", "tool", "lit", "code", "infra", "ux", "ops", "test", "spec"] + +def folder_for(i): + # 32 top-level folders, optional nested + top = f"topic{i % 32:02d}" + if i % 7 == 0: + return os.path.join(top, f"sub{(i // 32) % 11}") + return top + +def slug(i): + return f"n{i:05d}" + +def title(i): + return " ".join(random.sample(WORDS, k=random.randint(2, 5))).title() + +def paragraph(words=120): + out = [] + while sum(len(w) for w in out) + len(out) < words: + out.append(random.choice(WORDS)) + s = " ".join(out) + return s[0].upper() + s[1:] + "." + +def wikilink(target_i, alias=None): + t = slug(target_i) + if alias: + return f"[[{t}|{alias}]]" + return f"[[{t}]]" + +def write_file(i, n, path): + has_fm = (i % 3 != 0) + lines = [] + if has_fm: + tags = random.sample(TAGS_POOL, k=random.randint(0, 4)) + lines.append("---") + lines.append(f"title: {title(i)}") + if tags: + lines.append("tags:") + for t in tags: + lines.append(f" - {t}") + if i % 13 == 0: + lines.append(f"order: {i % 50}") + lines.append("---") + lines.append("") + lines.append(f"# {title(i)}") + lines.append("") + # body — keep generating paragraphs until size ~ AVG_BYTES + target = max(2000, int(random.gauss(AVG_BYTES, AVG_BYTES / 4))) + while sum(len(x) for x in lines) < target: + kind = random.random() + if kind < 0.55: + p = paragraph(random.randint(40, 180)) + # sprinkle wikilinks + tokens = p.split() + for _ in range(random.randint(1, 4)): + j = random.randrange(len(tokens)) + target_i = random.randrange(n) + alias = tokens[j] if random.random() < 0.5 else None + tokens[j] = wikilink(target_i, alias) + lines.append(" ".join(tokens)) + lines.append("") + elif kind < 0.7: + lines.append(f"## {title(i)}") + lines.append("") + elif kind < 0.82: + # list + for _ in range(random.randint(3, 8)): + lines.append(f"- {paragraph(random.randint(8, 25))}") + lines.append("") + elif kind < 0.92: + # code block + lines.append("```haskell") + lines.append(f"foo{i} :: Int -> Int") + lines.append(f"foo{i} x = x + {i}") + lines.append("```") + lines.append("") + else: + # embedded note (becomes processed) + lines.append(f"![[{slug(random.randrange(n))}]]") + lines.append("") + os.makedirs(os.path.dirname(path), exist_ok=True) + with open(path, "w") as f: + f.write("\n".join(lines)) + +def main(): + os.makedirs(OUT, exist_ok=True) + # an index.md at root + with open(os.path.join(OUT, "index.md"), "w") as f: + f.write("# Synthetic Corpus\n\nGenerated by gen_corpus.py for emanote #66 reproduction.\n") + for i in range(N): + rel = os.path.join(folder_for(i), slug(i) + ".md") + write_file(i, N, os.path.join(OUT, rel)) + # report + total = 0 + count = 0 + for root, _, files in os.walk(OUT): + for f in files: + if f.endswith(".md"): + total += os.path.getsize(os.path.join(root, f)) + count += 1 + print(f"Wrote {count} files, {total/1024/1024:.1f} MB total", flush=True) + +if __name__ == "__main__": + main() diff --git a/docs/dev/ralph/memory-66/measure.sh b/docs/dev/ralph/memory-66/measure.sh new file mode 100755 index 000000000..b581fecbc --- /dev/null +++ b/docs/dev/ralph/memory-66/measure.sh @@ -0,0 +1,28 @@ +#!/usr/bin/env bash +set -eo pipefail +EMANOTE=${EMANOTE:-/home/toor/code/emanote/dist-newstyle/build/x86_64-linux/ghc-9.8.4/emanote-2.0.0.0/x/emanote/build/emanote/emanote} +export emanote_datadir=${emanote_datadir:-/home/toor/code/emanote/emanote/default} +CORPUS=${1:?corpus path} +RTS=${2:-} +PORT=${PORT:-$(( RANDOM % 10000 + 9000 ))} +TIMEOUT=${TIMEOUT:-600} +LOG=$(mktemp) +cd "$CORPUS" +$EMANOTE -L "$CORPUS" run --port "$PORT" $([ -n "$RTS" ] && echo +RTS $RTS -RTS) > "$LOG" 2>&1 & +PID=$! +READY=0 +for i in $(seq 1 "$TIMEOUT"); do + if ! kill -0 $PID 2>/dev/null; then echo "emanote died" >&2; tail -40 "$LOG" >&2; exit 1; fi + if curl -s -o /dev/null --max-time 1 "http://localhost:$PORT/"; then READY=$i; break; fi + sleep 1 +done +[ "$READY" = 0 ] && { echo "timeout" >&2; kill $PID; exit 1; } +LOAD_RSS=$(awk '/VmRSS/{print $2}' /proc/$PID/status) +echo "READY_AFTER_S=$READY" +echo "LOAD_RSS_MB=$(awk -v r=$LOAD_RSS 'BEGIN{printf "%.0f", r/1024}')" +kill -INT $PID 2>/dev/null || true +sleep 2 +kill $PID 2>/dev/null || true +wait $PID 2>/dev/null || true +echo "---LOG TAIL---" +tail -60 "$LOG" diff --git a/emanote/emanote.cabal b/emanote/emanote.cabal index b8cad4ca4..be2237a13 100644 --- a/emanote/emanote.cabal +++ b/emanote/emanote.cabal @@ -113,6 +113,7 @@ common library-common , commonmark-wikilink >=0.2 , containers , data-default + , deepseq , deriving-aeson , directory , ema >=0.10.1 @@ -241,7 +242,11 @@ executable emanote import: library-common hs-source-dirs: exe main-is: Main.hs - ghc-options: -threaded -rtsopts -with-rtsopts=-N + -- -F1.5: shrink the old-generation retention factor from the GHC default + -- (2.0) to 1.5, trading a few extra major GCs for ~30% lower RSS on large + -- notebooks (see docs/dev/ralph/memory-66/README.md, cycle 2). Users can + -- still override at runtime, e.g. `emanote run +RTS -F2 -RTS`. + ghc-options: -threaded -rtsopts "-with-rtsopts=-N -F1.5" if flag(ghcid) hs-source-dirs: src diff --git a/emanote/src/Emanote/Model/Graph.hs b/emanote/src/Emanote/Model/Graph.hs index f5a774523..b1c929780 100644 --- a/emanote/src/Emanote/Model/Graph.hs +++ b/emanote/src/Emanote/Model/Graph.hs @@ -3,7 +3,6 @@ module Emanote.Model.Graph where import Commonmark.Extensions.WikiLink qualified as WL import Data.IxSet.Typed ((@+), (@=)) import Data.IxSet.Typed qualified as Ix -import Data.Map.Strict qualified as Map import Data.Set qualified as Set import Data.Tree (Forest, Tree (Node)) import Emanote.Model.Calendar qualified as Calendar @@ -12,7 +11,7 @@ import Emanote.Model.Link.Resolve qualified as Resolve import Emanote.Model.Meta (lookupRouteMeta) import Emanote.Model.Note qualified as MN import Emanote.Model.Note qualified as N -import Emanote.Model.Type (Model, modelIndexRoute, modelNotes, modelRels, parentLmlRoute) +import Emanote.Model.Type (Model, modelIndexRoute, modelLookupNoteByRoute', modelNotes, modelRels, parentLmlRoute) import Emanote.Route qualified as R import Emanote.Route.SiteRoute qualified as SR import Optics.Operators as Lens ((^.)) @@ -176,20 +175,24 @@ lookupNoteByWikiLink model currentRoute wl = do modelLookupBacklinks :: R.LMLRoute -> Model -> [(R.LMLRoute, NonEmpty [B.Block])] modelLookupBacklinks r model = sortOn (Calendar.backlinkSortKey model . fst) - $ groupNE + $ mapMaybe withCtx + $ groupBySource $ backlinkRels r model - <&> \rel -> - (rel ^. Rel.relFrom, rel ^. Rel.relCtx) where - groupNE :: forall a b. (Ord a) => [(a, b)] -> [(a, NonEmpty b)] - groupNE = - Map.toList . foldl' f Map.empty - where - f :: Map a (NonEmpty b) -> (a, b) -> Map a (NonEmpty b) - f m (x, y) = - case Map.lookup x m of - Nothing -> Map.insert x (one y) m - Just ys -> Map.insert x (ys <> one y) m + -- Group backlink-rels by their source route. Context blocks are no + -- longer carried on each Rel (#66) — instead they are recovered once + -- per source note by re-walking the source's Pandoc, which is cheap + -- (one note's AST) compared to retaining contexts in _modelRels for + -- every link in the entire notebook. + groupBySource :: [Rel.Rel] -> [R.LMLRoute] + groupBySource = ordNub . fmap (^. Rel.relFrom) + targetMR :: R.ModelRoute + targetMR = R.ModelRoute_LML R.LMLView_Html r + withCtx :: R.LMLRoute -> Maybe (R.LMLRoute, NonEmpty [B.Block]) + withCtx from = do + sourceNote <- modelLookupNoteByRoute' from model + ctxs <- nonEmpty $ Rel.noteRelCtxToTarget targetMR sourceNote + pure (from, ctxs) -- | Rels pointing *to* this route backlinkRels :: R.LMLRoute -> Model -> [Rel.Rel] diff --git a/emanote/src/Emanote/Model/Link/Rel.hs b/emanote/src/Emanote/Model/Link/Rel.hs index c133ebc50..b997d921c 100644 --- a/emanote/src/Emanote/Model/Link/Rel.hs +++ b/emanote/src/Emanote/Model/Link/Rel.hs @@ -98,7 +98,33 @@ noteRels note = pure (target, ctx) in Ix.fromList $ zipWith mkRel [0 ..] links where - mkRel srcPos (target, ctx) = Rel (note ^. noteRoute) target srcPos ctx + -- Drop the per-Rel `[B.Block]` context at insert time and recover + -- it on demand at backlink-render time by re-walking the source + -- note's Pandoc (see 'noteRelCtxToTarget' / 'modelLookupBacklinks' + -- in @Emanote.Model.Graph@). The context is a chunk of Pandoc + -- Blocks per outgoing link; with thousands of notes and dozens of + -- outgoing links each, persisting it in @_modelRels@ dominates the + -- live-data overhead (#66). The on-demand walk is bounded by the + -- source note's own AST size — fast for any single backlinks page. + mkRel srcPos (target, _ctx) = Rel (note ^. noteRoute) target srcPos [] + +{- | Re-extract the Pandoc-block contexts of every outgoing link in +@sourceNote@ that points to @targetMR@. Used by the backlinks renderer +to recover the context that 'noteRels' deliberately drops at insert +time (#66). Cost is one walk of the source note's Pandoc per backlink +expansion — paid only when the @targetMR@'s backlinks page is rendered. +-} +noteRelCtxToTarget :: ModelRoute -> Note -> [[B.Block]] +noteRelCtxToTarget targetMR sourceNote = + let contextsByUrl = LC.queryLinksWithContext (sourceNote ^. noteDoc) + parentR = noteResolveLinkBase sourceNote + targets = unresolvedRelsTo targetMR + in do + (url, instances) <- Map.toList contextsByUrl + (attrs, ctx) <- reverse (toList instances) + target <- maybeToList $ fst <$> parseUnresolvedRelTarget parentR attrs url + guard $ target `elem` targets + pure ctx {- | All `UnresolvedRelTarget`s that could resolve to the given `ModelRoute`. Each `URTResource` form is built by re-parsing a URL diff --git a/emanote/src/Emanote/Source/Patch.hs b/emanote/src/Emanote/Source/Patch.hs index d1b834dd2..fc5a12aaa 100644 --- a/emanote/src/Emanote/Source/Patch.hs +++ b/emanote/src/Emanote/Source/Patch.hs @@ -5,7 +5,9 @@ module Emanote.Source.Patch ( ignorePatterns, ) where +import Control.DeepSeq (deepseq) import Control.Monad.Logger (LoggingT (runLoggingT), MonadLogger, MonadLoggerIO (askLoggerIO)) +import Data.Aeson qualified as Aeson import Data.ByteString qualified as BS import Data.List qualified as List import Data.List.NonEmpty qualified as NEL @@ -255,6 +257,9 @@ parseAndInsert noteF model refreshAction r src = do s <- readRefreshedFile refreshAction (locResolve src) note <- N.parseNote (model ^. M.modelScriptingEngine) (M.modelPluginBaseDir model) r src (decodeUtf8 s) + -- Force the parsed Pandoc and Aeson Value so per-file parser closures + -- can be released as we stream files into the model (#66). + note ^. N.noteDoc `deepseq` (note ^. N.noteMeta :: Aeson.Value) `deepseq` pure () pure $ M.modelInsertNote (noteF note) >>> (modelSourceDependencies %~ SDeps.setLuaDeps r src (note ^. N.notePandocFilterDeclarations)) diff --git a/emanote/test/Emanote/Model/Link/RelSpec.hs b/emanote/test/Emanote/Model/Link/RelSpec.hs index 8ea3da54a..a932b0ef7 100644 --- a/emanote/test/Emanote/Model/Link/RelSpec.hs +++ b/emanote/test/Emanote/Model/Link/RelSpec.hs @@ -1,6 +1,5 @@ module Emanote.Model.Link.RelSpec where -import Commonmark.Extensions.WikiLink qualified as WL import Data.IxSet.Typed qualified as Ix import Emanote.Model.Link.Rel import Emanote.Model.Note qualified as MN @@ -100,9 +99,12 @@ spec = do got === want describe "noteRels source order (issue #186)" $ do it "orders rels by source position, not by lexicographic Ord on context" $ do - -- 'Z' sorts last lexicographically but comes first in source; 'A' - -- sorts first but comes second. Without the srcPos tie-breaker, - -- Ord [Block] would yield A-then-Z; we want source order. + -- Both 'z' and 'a' link to the same target via the same URL, so + -- the two rels share (_relFrom, _relTo) and can only be ordered + -- by _relSrcPos. Source order is "Z first" then "A second", so + -- IxSet.toList should produce srcPos [0, 1] in that order. + -- (#66 dropped _relCtx — see Rel.noteRelCtxToTarget for the + -- on-demand backlinks-context recovery path.) let mkLink lbl = B.Link B.nullAttr [B.Str lbl] ("Foo.md", "") note = MN.mkEmptyNoteWith @@ -110,11 +112,7 @@ spec = do [ B.Para [B.Str "Z first: ", mkLink "z"] , B.Para [B.Str "A second: ", mkLink "a"] ] - paraText rel = case _relCtx rel of - [B.Para is] -> WL.plainify is - other -> error $ "expected single-paragraph context, got " <> show other - (paraText <$> Ix.toList (noteRels note)) - `shouldBe` ["Z first: z", "A second: a"] + (_relSrcPos <$> Ix.toList (noteRels note)) `shouldBe` [0, 1] it "does not collapse two identical-context links to the same target" $ do -- One paragraph mentions Foo.md twice. The two rels share -- (relFrom, relTo, relCtx); without srcPos in Ord, IxSet.fromList's