ralph: shrink #66 memory with measurement-driven cycles #740
Draft
srid wants to merge 5 commits into
Synthetic 4501-file / 72 MB corpus reproduces issue #66 — `emanote run` peaks at ~5.0 GiB RSS (vs ~4.7 GiB reported). `+RTS -s` shows live residency of ~1.85 GiB and 2.5x RSS/live ratio from default -F2. Closure-type heap profile points at Pandoc AST retention (list cons + Text + ARR_WORDS = ~68% of heap). Also reproduces the (separate, worse) `emanote gen` blow-up past 108 GiB on the same corpus.
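The live-residency and RSS/live figures above come from `+RTS -s`. As a rough sketch, the same two quantities can also be read in-process through `GHC.Stats` from `base`, provided the program runs with `+RTS -T` (or `-s`). The helper names `toMiB`, `heapToLiveRatio` and `reportResidency` are illustrative and not part of emanote.

```haskell
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Convert a byte count to MiB.
toMiB :: Word64 -> Double
toMiB b = fromIntegral b / (1024 * 1024)

-- Ratio of memory held by the RTS to peak live data.
-- Returns Nothing when no live data has been recorded yet.
heapToLiveRatio :: Word64 -> Word64 -> Maybe Double
heapToLiveRatio memInUse live
  | live == 0 = Nothing
  | otherwise = Just (fromIntegral memInUse / fromIntegral live)

-- Peak live data and peak memory in use, both in MiB.
-- Returns Nothing unless RTS statistics were enabled (+RTS -T or -s).
reportResidency :: IO (Maybe (Double, Double))
reportResidency = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then pure Nothing
    else do
      s <- getRTSStats
      pure (Just (toMiB (max_live_bytes s), toMiB (max_mem_in_use_bytes s)))
```

Note that `max_mem_in_use_bytes` is the RTS's own accounting and will not match OS-level RSS exactly, so this is a complement to external measurement rather than a replacement.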
In parseAndInsert, force the parsed Pandoc and Aeson Value so the per-file parser closure (held by UnionMount.unionMountStreaming) can be released as we stream files into the model. Median peak RSS on the 4.5k synthetic corpus drops from 5185 MiB → 4244 MiB (-18%). Add `deepseq` to emanote's build-depends explicitly so the import is honest about its package (it is otherwise pulled in transitively by text/pandoc).
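A minimal sketch of the forcing idiom this commit describes, using only `base` and `deepseq`. `ParsedNote`, `parseNote` and `parseStrict` are hypothetical stand-ins for the Pandoc/Aeson values and for `parseAndInsert`; this is not emanote's actual code.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}

import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import GHC.Generics (Generic)

-- Hypothetical stand-in for a parsed document.
data ParsedNote = ParsedNote
  { noteTitle :: String
  , noteWords :: [String]
  }
  deriving (Show, Eq, Generic, NFData)

-- A lazy "parser": the result is a thunk that still references the
-- whole source string until something forces it.
parseNote :: String -> ParsedNote
parseNote src = case lines src of
  [] -> ParsedNote "" []
  (t : rest) -> ParsedNote t (concatMap words rest)

-- Evaluate the parse result to normal form before returning it, so the
-- value handed onward holds no unevaluated thunks pointing back at the
-- parser's inputs.
parseStrict :: String -> IO ParsedNote
parseStrict = evaluate . force . parseNote
```

`evaluate . force` ties the deep evaluation to the point in `IO` where the file is processed, rather than to whenever a later consumer happens to inspect the value.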
GHC's default old-generation retention factor (-F2) sizes the post-major-GC heap at 2x live data. With cycle 1 in place, live data is 1.27 GiB but AFTER_HWM is still 4.24 GiB. -F1.5 trades a few extra major GCs for a smaller heap high-water — and as a bonus, max gen-1 pause shrinks from 0.63s to 0.49s because each major GC has less to scan. 5-run median AFTER_HWM on the 4.5k synthetic corpus, default -N: 4199 -> 3936 MiB (-6.3%), cumulative -24.1% vs baseline. Users can still override with `emanote run +RTS -F2 -RTS`.
`Rel._relCtx :: [B.Block]` carried the Pandoc-block context for every outgoing link in every note. With ~4500 notes x ~20 links each, that's ~90k duplicated context chunks in `_modelRels` — derivable from each source note's already-retained `_noteDoc`. Strip ctx at `noteRels` insert time and recompute it on demand in `modelLookupBacklinks` via a new `noteRelCtxToTarget` helper that re-walks the source note's Pandoc once per backlinking source. Bounded by source-note AST size; paid only when the backlinks-page is rendered. 5-run median AFTER_HWM on the 4.5k synthetic corpus: 3936 -> 3729 MiB (-5.3%), cumulative -28.1% vs baseline (5185 MiB).
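The shape of this change (store slim relations, re-derive context from the source note when backlinks are requested) can be sketched with simplified stand-in types. `Block`, `Note`, `Rel`, `noteRelsSlim`, `relCtxToTarget` and `backlinks` below are hypothetical and far simpler than emanote's real `Rel`, `noteRels` and `modelLookupBacklinks`.

```haskell
import Data.List (nub)

type NoteId = String

-- Simplified block: its text plus the notes it links to.
data Block = Block
  { blockText :: String
  , blockLinks :: [NoteId]
  }
  deriving (Eq, Show)

data Note = Note
  { noteId :: NoteId
  , noteBlocks :: [Block]
  }
  deriving (Eq, Show)

-- Slim relation: source and target only, no copied context blocks.
data Rel = Rel
  { relFrom :: NoteId
  , relTo :: NoteId
  }
  deriving (Eq, Show)

-- One relation per distinct outgoing link target.
noteRelsSlim :: Note -> [Rel]
noteRelsSlim n =
  [Rel (noteId n) t | t <- nub (concatMap blockLinks (noteBlocks n))]

-- Recompute the context on demand by re-walking the source note.
relCtxToTarget :: NoteId -> Note -> [Block]
relCtxToTarget target = filter (elem target . blockLinks) . noteBlocks

-- Backlinks for a target, with context derived at lookup time.
backlinks :: [Note] -> NoteId -> [(NoteId, [Block])]
backlinks notes target =
  [ (noteId n, relCtxToTarget target n)
  | n <- notes
  , Rel _ t <- noteRelsSlim n
  , t == target
  ]
```

The trade-off is the one the commit message states: the walk costs time proportional to the source note's size, but only when a backlinks panel is actually rendered.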
Three behaviour-preserving cycles cut peak RSS on a 4.5k-file / 72 MB
synthetic notebook by 29 %. The full methodology, dead ends, and
root-cause analysis live in
`docs/dev/ralph/memory-66/README.md`; this PR description is a thin pointer to it.
What lands
1. `deepseq` parsed `Pandoc` + `Aeson.Value` in `parseAndInsert`
2. Bake `-with-rtsopts=-N -F1.5` into the executable
3. Drop `_relCtx :: [Block]` from `Rel`; recompute on demand

`+RTS -s` shows the real live-data win is structural, not RTS-amplified:
maximum residency drops from 1.85 GiB → 1.27 GiB (−31 % live data)
between baseline and cycle 1+3, with GC productivity rising from 70.9 %
to 74.9 %.
What does not land
Per-Pandoc `GHC.Compact` regions were measured and rejected: region
overhead dominates on ~10 KiB Pandocs (+15 % regression on the 4.5k
corpus). Documented in the report under "Dead end".
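For readers unfamiliar with the rejected approach, here is a small sketch of what placing one value in its own compact region looks like, using `compact`, `getCompact` and `compactSize` from the `ghc-compact` package. `SmallDoc` and `compactRoundTrip` are hypothetical; the experiment in the report operated on real Pandoc values and may have been structured differently.

```haskell
import GHC.Compact (compact, compactSize, getCompact)

-- Hypothetical stand-in for a small parsed document.
type SmallDoc = [(String, [Int])]

-- Copy a value into its own compact region, then return the value as
-- read back from the region together with the region's size in bytes.
compactRoundTrip :: SmallDoc -> IO (SmallDoc, Word)
compactRoundTrip doc = do
  region <- compact doc -- fully evaluates and copies the value
  size <- compactSize region -- bytes reserved for the region
  pure (getCompact region, size)
```

Comparing the reported region size against the size of the value it holds is one way to see the fixed per-region cost that the report blames for the regression on small documents.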
Where the next 30 % lives
Beyond cycle 3, each Note still retains its full Pandoc AST in
`_modelNotes`. 4500 × ~250 KiB ≈ 1.1 GiB unavoidable live data. To
go below ~3.5 GiB AFTER_HWM, the assumption itself has to change;
options are laid out in the report under "Root cause and the ceiling at
~−29 %". This PR intentionally stops at the behaviour-preserving local
fixes.
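The ~1.1 GiB estimate can be checked with a unit conversion, assuming binary units (1 GiB = 1024 MiB, 1 MiB = 1024 KiB) and taking the ~250 KiB per-note figure at face value:

```latex
4500 \times 250\ \text{KiB}
  = 1\,125\,000\ \text{KiB}
  \approx 1098.6\ \text{MiB}
  \approx 1.07\ \text{GiB}
```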
Test plan
- `cabal test emanote` passes (one expected-context assertion in `RelSpec` was tightened to assert source-position ordering directly, since `_relCtx` is no longer carried on each `Rel`)
- `docs/` notebook renders `/`, `/guide`, `/yaml-config`, `/wikilinks`, `/uses` with non-empty backlinks panels (29-247 occurrences of "backlink" in HTML)
- `--allow-broken-internal-links` paths cleanly
- on CI
Closes #66 for the local-fix tier. The architectural cycles 4+ are
follow-up work.