Warn on unbound Heist template splices (closes #81) #715
Heist's interpreted runNode leaves unknown <ema:foo> elements verbatim and
getAttributeSplice falls back to the literal "${name}" text when 'name' is
unbound. Both paths failed silently.
Add Emanote.View.LintTemplate, scan each rendered HTML asset post-render
inside emanoteSiteOutput, and log one warning per fresh (route, splice)
pair via logW. A process-wide IORef de-dupes across live-server reloads.
Move warnUnboundSplices, the dedup IORef, and the logging side-effect out of Emanote.View.LintTemplate and into Emanote.View.Template (the rendering orchestration layer). LintTemplate is now a pure scanner — inputs are bytes, outputs are [UnboundSplice]. Whether to log, batch, push to a UI banner, or dedupe across renders is the caller's call.
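A minimal sketch of the "one warning per fresh (route, splice) pair" idea, kept outside the scanner as this commit describes. The names `Route`, `SpliceName`, `newWarningCache`, and `freshWarnings` are hypothetical, and plain `String` stands in for Emanote's real route and splice types, which are not shown here.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef)
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical stand-ins for the real route and splice types.
type Route = String
type SpliceName = String

-- A cache of every (route, splice) pair that has already been reported.
newWarningCache :: IO (IORef (Set (Route, SpliceName)))
newWarningCache = newIORef Set.empty

-- Return only the splices not yet reported for this route, and record them
-- as seen. The update is atomic, so concurrent renders cannot double-report.
freshWarnings :: IORef (Set (Route, SpliceName)) -> Route -> [SpliceName] -> IO [SpliceName]
freshWarnings cache route names =
  atomicModifyIORef' cache $ \seen ->
    let candidates = Set.fromList [(route, n) | n <- names]
        fresh = candidates `Set.difference` seen
     in (seen `Set.union` fresh, map snd (Set.toAscList fresh))
```

The caller decides what to do with the returned list (log it, batch it, show a banner), which is the split between detection and delivery this commit is after.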
Before: an attribute value like "\${incomplete \${valid}" reported a single
mangled name "incomplete \${valid"; "\${valid} \${trailing" silently dropped
the trailing token. Mirrors Heist's "unbalanced \${ is literal text"
substitution behavior so a real \${name} that follows a malformed token is
still surfaced on its own.
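The recovery rule can be sketched as follows. This is an illustration over `String`, not the scanner's actual code, and `spliceNames` and `closeOf` are hypothetical names. It assumes an opening token with no closing brace (or one interrupted by another opening token) is treated as literal text and scanning resumes right after it.

```haskell
-- Collect the names of well-formed splice tokens in an attribute value.
spliceNames :: String -> [String]
spliceNames [] = []
spliceNames ('$' : '{' : rest) = case closeOf rest of
  Just (name, after) -> name : spliceNames after
  -- Unbalanced opener: treat it as literal text and keep scanning.
  Nothing -> spliceNames rest
spliceNames (_ : rest) = spliceNames rest

-- Read up to the closing brace. Give up if the input ends, or if a new
-- opening token starts before the current one is closed.
closeOf :: String -> Maybe (String, String)
closeOf = go []
  where
    go acc ('}' : after) = Just (reverse acc, after)
    go _ ('$' : '{' : _) = Nothing
    go _ [] = Nothing
    go acc (c : cs) = go (c : acc) cs
```

Under these assumptions both inputs quoted in the commit message yield the single name `valid`: the malformed `incomplete` and `trailing` tokens are skipped as literal text.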
scanRenderedHtml used to swallow XmlHtml.parseHTML errors and return [], giving callers a false-clean lint result on unparseable output. Return Either Text [UnboundSplice] so the caller logs the parse error as a distinct diagnostic.
Make the trade-off explicit: HTML5 tags are never colon-bearing and inline SVG/MathML uses unprefixed forms, so any colon in a rendered tag name is unambiguously a Heist splice survivor. If a future legitimate use of a colon tag arises, an allow-list belongs at this function.
…s skipped Also extend the lintWarningCache haddock with the practical growth bound, so a future reader knows the unbounded-by-type Set has a small bound in practice (routes × distinct typos) and the lack of eviction is intentional.
Replace the per-node Set construction (one (SpliceElement …), Set.fromList, foldMap-via-Set.union) with a [UnboundSplice] accumulator deduped once at the document root via Relude's sortNub. Same semantics, fewer intermediate Sets — the empty-attribute / empty-children case no longer pays for tree allocation it discards.
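A small sketch of the accumulate-then-dedupe shape, combined with the colon heuristic from the earlier commit. The `Node` type is a hypothetical miniature of a parsed HTML tree (not the xmlhtml types), and `Set.toAscList . Set.fromList` stands in for a `sortNub`-style helper.

```haskell
import qualified Data.Set as Set

-- Hypothetical miniature of a parsed HTML tree.
data Node
  = Element String [Node]
  | TextNode String

-- Every colon-bearing element name in document order, duplicates included.
-- No Set is built per node; children just contribute to one flat list.
colonElements :: Node -> [String]
colonElements (TextNode _) = []
colonElements (Element name children) =
  [name | ':' `elem` name] ++ concatMap colonElements children

-- Sort and de-duplicate once, at the document root.
scanTree :: [Node] -> [String]
scanTree = Set.toAscList . Set.fromList . concatMap colonElements
```

A leaf element with no children contributes at most a one-element list, so nothing is allocated and then thrown away for the common clean case.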
Both warnUnboundSplices code paths emit "Template lint on '<route>': <…>". Extract a local 'warn' helper so the route locator phrasing matches between parse-failure and unbound-splice messages, and grep-by-route stays deterministic. Also drop the warnUnboundSplices haddock that restated its name in prose and replace it with a why-pointer at LintTemplate (where the scanning rationale already lives).
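The shared-prefix idea might look roughly like this. Only the `Template lint on '<route>': ` prefix comes from the commit message; `lintMessages`, the `String` types, and the detail wording after the prefix are assumptions made for illustration.

```haskell
-- Turn a scan result into log lines that share one route-locator prefix.
lintMessages :: String -> Either String [String] -> [String]
lintMessages route result = case result of
  -- Parse failure: one distinct diagnostic instead of a false-clean result.
  Left parseErr -> [warn ("could not parse rendered HTML: " ++ parseErr)]
  -- One line per unbound splice.
  Right splices -> [warn ("unbound splice " ++ s) | s <- splices]
  where
    warn detail = "Template lint on '" ++ route ++ "': " ++ detail
```

Because both branches go through `warn`, searching the logs for a route finds parse failures and unbound splices with the same pattern.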
Hickey/Lowy Analysis
Hickey rationale
Detection (the AST walk) and delivery (…
Lowy rationale
The module placement is correct: the heuristic encapsulates a Heist-namespace volatility axis that belongs at Emanote's layer, not in the heist-extra dependency. Lowy's open seams (delivery channel, parallel-rendering safety, allow-list extensibility, pure-boundary stability) all fall out of Hickey's #1 split.
The lint runs on every rendered HTML route, and X.parseHTML + AST walk
add ~1.8s on the docs/ build (57 routes, 2.13s → 3.92s). Add a
conservative byte scan: if the bytes contain neither a literal "\${"
token nor a colon inside any tag opener, there cannot be an unbound
splice survivor and we return Right [] without parsing.
Brings the docs/ overhead from +84% to ~+9% (2.13s → 2.32s). Pages with
typo'd splices still pay the full parse, which is the rare case the lint
exists to catch.
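The gating itself is a one-liner. A generic sketch, with the hypothetical name `gatedScan` and the marker check and full scanner passed in as parameters so the shape is visible without the parser:

```haskell
-- Run the expensive scan only when a cheap check says a marker might exist.
-- The check may return True too often, but must never return False for an
-- input the full scan would flag.
gatedScan :: (input -> Bool) -> (input -> Either err [a]) -> input -> Either err [a]
gatedScan mayContainMarker fullScan bytes
  | mayContainMarker bytes = fullScan bytes
  | otherwise = Right []
```

A false positive from the cheap check only costs the parse that would have run anyway, which is why the check can stay conservative.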
Evidence + Performance
Functional verification
The 11-case unit suite in …
Performance impact (…

| Variant | Real time (3 runs) | Median | vs master |
|---|---|---|---|
| master (no lint) | 2.135 / 2.131 / 2.227 s | 2.135 s | - |
| Lint, no pre-check (commit bddaea8e) | 3.892 / 3.926 / 3.925 s | 3.925 s | +84% |
| Lint, byte-level pre-check (commit 972a4cdd) | 2.300 / 2.318 / 2.395 s | 2.318 s | +9% |

The new commit short-circuits XmlHtml.parseHTML when the rendered bytes contain neither a literal ${ token nor a colon inside any tag opener — those are the only two ways an unbound splice can survive. The common case (clean output) skips the parse and AST walk entirely.
How the pre-check works
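
    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Char8 as BSC
    import Data.Char (isSpace)

    -- Cheap pre-check: True if the bytes could contain an unbound splice.
    hasSpliceMarker :: BS.ByteString -> Bool
    hasSpliceMarker bs = "${" `BS.isInfixOf` bs || hasColonTagName bs

    -- True if any opening or closing tag has a colon in its name.
    hasColonTagName :: BS.ByteString -> Bool
    hasColonTagName = go
      where
        go bs = case BSC.elemIndex '<' bs of
          Nothing -> False
          Just i ->
            let after = BS.drop (i + 1) bs
                tagStart = case BSC.uncons after of
                  Just ('/', rest) -> rest
                  _ -> after
                (name, _) = BSC.break isNameEnd tagStart
             in BSC.elem ':' name
                  || maybe False (\j -> go (BS.drop (j + 1) after)) (BSC.elemIndex '>' after)
        -- Assumed from the description below; the original snippet omits isNameEnd.
        isNameEnd c = isSpace c || c == '/' || c == '>'

It's a linear pass over the bytes: `BS.isInfixOf` is a tight substring search over the strict buffer, and the colon-tag scan only inspects bytes between `<` and the first whitespace, `/`, or `>`. False positives are fine: they trigger the full parse, which then correctly reports zero warnings. Only false negatives would be a bug, and the heuristic mirrors the exact two surface patterns Heist's interpreted renderer leaks.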
Impact on a page with a real typo
When a typo exists, the page goes through the full parse + walk path, exactly as before. The perf cost there is dominated by XmlHtml.parseHTML, which is what produces the diagnostic. The optimization is a no-op for the cases the lint exists to catch — and a near-free path for everything else.
| Step | Status | Duration | Verification |
|---|---|---|---|
| sync | ✓ | 0s | git fetch ok; forge=github; noGit=false |
| research | ✓ | 24m 3s | File:line map of heist 1.1.1.2 interpreted runNode + getAttributeSplice + Emanote integration points |
| branch | ✓ | 5s | Feature branch from origin/master |
| implement | ✓ | 2m 8s | Pure scanner module + wire-up + tests + CHANGELOG |
| check | ✓ | 1m 0s | cabal build all clean (63 lib modules, 14 test modules) |
| docs | ✓ | 34s | CHANGELOG bug-fix entry + new "Diagnosing typos" subsection in docs/guide/html-template.md |
| fmt | ✓ | 14s | cabal-fmt + fourmolu + hlint + nixpkgs-fmt all green |
| commit | ✓ | 1m 9s | Primary feature commit pushed |
| hickey+lowy | ✓ | 12m 11s | A1+A2 decomplect, A3 unbalanced ${ recovery, A4 surface parse failures, A5 heuristic doc — 4 follow-up commits |
| police | ✓ | 19m 14s | Rules + fact-check + elegance (/simplify); silent-skip comment, sortNub collapse, shared log helper |
| test | ✓ | 40s | 102 hspec examples (11 new for LintTemplate), 0 failures |
| create-pr | ✓ | 1m 2s | Draft PR with hickey/lowy analysis comment |
| ci | ✓ | 2m 57s | vira ci aarch64-darwin + x86_64-linux signoff; 3 e2e suites green |
| evidence | ✓ | 13m 52s | Test output + perf benchmark; led to commit 972a4cdd |
| Total | | 79m 25s | |
Slowest step: research (24m 3s).
Optimization suggestions
- `research` dominated at 30% of total time. The workflow spent that long because Heist's interpreted-render contract (silent-failure semantics, `_spliceMap` lookup, `_errorNotBound` only firing under non-empty namespace) had to be verified against `/tmp/heist`'s source. For future PRs touching Heist internals, pre-cloning `srid/emanote#issues` or `snapframework/heist` would let `research` start with the source already at hand. Consider caching `/tmp/heist` and `/tmp/xmlhtml` between runs.
- `evidence` ran 14m because perf data was discovered late: the user asked "what's the performance impact?" only after the lint had landed, which forced a build-master, build-branch, optimize, build-optimized, re-CI loop. For future user-facing features touching the render hot path, treat perf measurement as a research-step deliverable so the optimization (if any) lands in the primary commit instead of as a follow-up.
- `police` ran 19m largely on `/simplify`'s three-sub-agent fan-out: reuse, quality, and efficiency lenses each spawned an agent. The reuse and quality reviews mostly confirmed structure already validated by `hickey+lowy`. For diffs that have already gone through the structural-review skills, `/simplify`'s rule + fact-check passes (no fan-out) plus a single elegance lens would cover the same ground in ~5m.
- `ci` ran twice (3m + 3m): once on `bddaea8e`, once on `972a4cdd` after the perf optimization landed. The second run was unavoidable (CI must cover HEAD), but the first could have been deferred until after the perf measurement so a single CI sweep covers everything.
Workflow completed at 2026-05-06T21:56Z.
Generated by /do on Claude Code (model claude-opus-4-7).
Emanote now logs a warning whenever a Heist template references a splice that has no binding: a typo like `<ema:tite>` for `<ema:title>`, or `${value:sitURL}` for `${value:siteUrl}`. Previously these failed silently: Heist's interpreted `runNode` leaves an unknown element verbatim, and `getAttributeSplice` falls back to a literal `${name}` text token, so the malformed marker leaked straight into the rendered page. Closes #81.
How it works
The scanner is a pure module (`Emanote.View.LintTemplate`): it takes bytes, returns warnings. Logging and dedup live one layer up in `Emanote.View.Template` so the lint module stays testable and the dedup state is visible at the orchestration layer.
What gets caught
`<ema:tite>` | `<ema:tite></ema:tite>` survives | `<ema:tite/>`
`${value:sitURL}` | `${value:sitURL}` in attr | `${value:sitURL}`
The element check is the empirical colon heuristic: any element name with a `:` in it. HTML5 tags never contain a colon and inline SVG/MathML uses unprefixed forms (`<svg>`, `<math>`, `<mfrac>`), so a colon is unambiguously a splice survivor. The attribute check ignores `${...}` inside text nodes: only attribute values are scanned, mirroring Heist's substitution scope.
Where you see warnings
Both `emanote run` and `emanote gen` route this through the existing `MonadLoggerIO` boundary. A new "Diagnosing typos" subsection in `docs/guide/html-template` tells site authors what to grep for.
Try it locally