Warn on unbound Heist template splices (closes #81) by srid · Pull Request #715 · srid/emanote

srid · 2026-05-07T01:36:38Z

Emanote now logs a warning whenever a Heist template references a splice that has no binding — a typo like <ema:tite> for <ema:title>, or ${value:sitURL} for ${value:siteUrl}. Previously these failed silently: Heist's interpreted runNode leaves an unknown element verbatim, and getAttributeSplice falls back to a literal ${name} text token, so the malformed marker leaked straight into the rendered page. Closes #81.

How it works

Heist render ─▶ rendered HTML bytes ─▶ XmlHtml.parseHTML ─▶ AST walk ─▶ [UnboundSplice] ─▶ logW
                                                                        ▲
                                                              dedup via process-wide IORef
                                                              (one log per (route, splice))

The scanner is a pure module (Emanote.View.LintTemplate) — it takes bytes, returns warnings. Logging and dedup live one layer up in Emanote.View.Template so the lint module stays testable and the dedup state is visible at the orchestration layer.

What gets caught

Pattern	Surface in HTML	Reported as
Element typo: `<ema:tite>`	`<ema:tite></ema:tite>` survives	`<ema:tite/>`
Attribute typo: `${value:sitURL}`	literal `${value:sitURL}` in attr	`${value:sitURL}`

The element check is the empirical colon heuristic: any element name with a : in it. HTML5 tags never contain a colon and inline SVG/MathML uses unprefixed forms (<svg>, <math>, <mfrac>), so a colon is unambiguously a splice survivor. The attribute check ignores ${...} inside text nodes — only attribute values are scanned, mirroring Heist's substitution scope.

Where you see warnings

Both emanote run and emanote gen route this through the existing MonadLoggerIO boundary. A new Diagnosing typos subsection in docs/guide/html-template tells site authors what to grep for.

Live-server reloads do not redo the noise — the dedup cache is keyed by (route, splice) for the lifetime of the process. Bounded in practice by routes × distinct typos, so no eviction needed.

Try it locally

nix run github:srid/emanote/warn-unbound-splices

Generated by /do on Claude Code (model claude-opus-4-7).

Heist's interpreted runNode leaves unknown <ema:foo> elements verbatim and getAttributeSplice falls back to the literal "${name}" text when 'name' is unbound. Both paths failed silently. Add Emanote.View.LintTemplate, scan each rendered HTML asset post-render inside emanoteSiteOutput, and log one warning per fresh (route, splice) pair via logW. A process-wide IORef de-dupes across live-server reloads.

Move warnUnboundSplices, the dedup IORef, and the logging side-effect out of Emanote.View.LintTemplate and into Emanote.View.Template (the rendering orchestration layer). LintTemplate is now a pure scanner — inputs are bytes, outputs are [UnboundSplice]. Whether to log, batch, push to a UI banner, or dedupe across renders is the caller's call.

Before: an attribute value like "\${incomplete \${valid}" reported a single mangled name "incomplete \${valid"; "\${valid} \${trailing" silently dropped the trailing token. Mirrors Heist's "unbalanced \${ is literal text" substitution behavior so a real \${name} that follows a malformed token is still surfaced on its own.

scanRenderedHtml used to swallow XmlHtml.parseHTML errors and return [], giving callers a false-clean lint result on unparseable output. Return Either Text [UnboundSplice] so the caller logs the parse error as a distinct diagnostic.

Make the trade-off explicit: HTML5 tags are never colon-bearing and inline SVG/MathML uses unprefixed forms, so any colon in a rendered tag name is unambiguously a Heist splice survivor. If a future legitimate use of a colon tag arises, an allow-list belongs at this function.

…s skipped Also extend the lintWarningCache haddock with the practical growth bound, so a future reader knows the unbounded-by-type Set has a small bound in practice (routes × distinct typos) and the lack of eviction is intentional.

Replace the per-node Set construction (one (SpliceElement …), Set.fromList, foldMap-via-Set.union) with a [UnboundSplice] accumulator deduped once at the document root via Relude's sortNub. Same semantics, fewer intermediate Sets — the empty-attribute / empty-children case no longer pays for tree allocation it discards.

Both warnUnboundSplices code paths emit "Template lint on '<route>': <…>". Extract a local 'warn' helper so the route locator phrasing matches between parse-failure and unbound-splice messages, and grep-by-route stays deterministic. Also drop the warnUnboundSplices haddock that restated its name in prose and replace it with a why-pointer at LintTemplate (where the scanning rationale already lives).

srid · 2026-05-07T01:37:01Z

Hickey/Lowy Analysis

#	Lens	Finding	Disposition
1	Hickey	Lint scanner complected delivery + dedup with detection	Fixed in this PR (commit `0f7608d6`)
2	Hickey	`attrSpliceRefs` halted on unbalanced `${`, dropping later valid `${name}` tokens	Fixed in this PR (commit `c41aff78`)
3	Hickey	`scanRenderedHtml` returned `[]` on parse failure → false-clean lint	Fixed in this PR (commit `896850b5`)
4	Hickey	Colon heuristic in `elementSplice` was undocumented as deliberate	Fixed in this PR (commit `68c8c77a`)
5	Lowy	Interface leaks on warning-output channel (logging baked in)	No-op — Hickey #1 made the scanner pure, leaving `warnUnboundSplices` as the thin wrapper Lowy was asking for
6	Lowy	`IORef`-with-`unsafePerformIO` dedup is brittle for parallel rendering	No-op — `atomicModifyIORef'` is sequentially consistent; Hickey #1 also moved it to the orchestration layer where it's visible
7	Lowy	Heuristic hardcoded with no user-extensible allow-list	No-op — addressed in #4's haddock as a deliberate trade-off
8	Lowy	Pure detection function not exposed as the stable boundary	No-op — Hickey #1 made `scanRenderedHtml` exactly that

Hickey rationale

Detection (the AST walk) and delivery (logW + dedup IORef) were originally in the same module, so calling warnUnboundSplices carried hidden state and the dedup path had no testable surface. The pure split makes scanRenderedHtml :: FilePath -> ByteString -> Either Text [UnboundSplice] the abstract contract; the IORef lives at the rendering orchestration layer in Emanote.View.Template where dedup is one of several render-time concerns.

The attrSpliceRefs and scanRenderedHtml-parse-failure findings are honest correctness fixes — the first preserves valid ${name} tokens that follow a malformed ${; the second turns an Either parse error into a logW instead of silent [] (no-silent-error-swallowing).

Lowy rationale

The module placement is correct — the heuristic encapsulates a Heist-namespace volatility axis that belongs at Emanote's layer, not in the heist-extra dependency. Lowy's open seams (delivery channel, parallel-rendering safety, allow-list extensibility, pure-boundary stability) all fall out of Hickey's #1 split: scanRenderedHtml is now the stable pure interface, the delivery side is one inline wrapper, and atomicModifyIORef' already covers the concurrency concern. None of Lowy's defers required new code beyond what Hickey already drove.

The lint runs on every rendered HTML route, and X.parseHTML + AST walk add ~1.85s on the docs/ build (57 routes, 2.13s → 3.92s). Add a conservative byte scan: if the bytes contain neither a literal "\${" token nor a colon inside any tag opener, there cannot be an unbound splice survivor and we return Right [] without parsing. Brings the docs/ overhead from +85% to ~+9% (2.13s → 2.32s). Pages with typo'd splices still pay the full parse, which is the rare case the lint exists to catch.

srid · 2026-05-07T01:53:31Z

Evidence + Performance

Functional verification

The 11-case unit suite in emanote/test/Emanote/View/LintTemplateSpec.hs exercises each detection path end-to-end on raw HTML bytes — element-name colon survivors, attribute ${...} survivors, dedup, the unbalanced-${ recovery branch, and the new pre-check skip path. All 102 examples pass (nix develop -c cabal test all).

Emanote.View.LintTemplate
  scanRenderedHtml
    ignores well-rendered HTML with no splice survivors [✔]
    reports an element whose tag name still has a Heist-style colon [✔]
    reports nested unbound element splices [✔]
    reports a literal ${name} that survived as an attribute value [✔]
    reports multiple ${...} references in one attribute [✔]
    deduplicates repeated occurrences across the document [✔]
    leaves text nodes alone (e.g. ${...} appearing inside a <pre>) [✔]
    still reports a valid ${...} that follows an unbalanced ${ [✔]
    ignores a trailing ${ with no closing brace [✔]
    skips the parse entirely when the bytes have no splice marker [✔]
  formatWarning
    formats element warnings as a tag [✔]
    formats attribute warnings as a dollar-brace token [✔]

Performance impact (`emanote gen` on `docs/`, 57 routes, fresh out-dir each run)

Variant	Real time (3 runs)	Median	vs master
`master` (no lint)	2.135 / 2.131 / 2.227 s	2.131 s	—
Lint, no pre-check (commit `bddaea8e`)	3.892 / 3.926 / 3.925 s	3.925 s	+84%
Lint, byte-level pre-check (commit `972a4cdd`)	2.300 / 2.318 / 2.395 s	2.318 s	+9%

The new commit short-circuits XmlHtml.parseHTML when the rendered bytes contain neither a literal ${ token nor a colon inside any tag opener — those are the only two ways an unbound splice can survive. The common case (clean output) skips the parse and AST walk entirely.

How the pre-check works

hasSpliceMarker bs = "${" `BS.isInfixOf` bs || hasColonTagName bs

hasColonTagName = go
  where
    go bs = case BSC.elemIndex '<' bs of
      Nothing -> False
      Just i ->
        let after    = BS.drop (i + 1) bs
            tagStart = case BSC.uncons after of
                         Just ('/', rest) -> rest
                         _                -> after
            (name, _) = BSC.break isNameEnd tagStart
         in BSC.elem ':' name
              || maybe False (\j -> go (BS.drop (j + 1) after)) (BSC.elemIndex '>' after)

It's a single linear pass over the bytes — BS.isInfixOf is C-implemented, and the colon-tag scan only inspects bytes between < and the first whitespace///>. False positives are fine: they trigger the full parse, which then correctly reports zero warnings; only false negatives would be a bug, and the heuristic mirrors the exact two surface patterns Heist's interpreted renderer leaks.

Impact on a page with a real typo

When a typo exists, the page goes through the full parse + walk path, exactly as before. The perf cost there is dominated by XmlHtml.parseHTML, which is what produces the diagnostic. The optimization is a no-op for the cases the lint exists to catch — and a near-free path for everything else.

srid · 2026-05-07T01:56:20Z

`/do` results

Step	Status	Duration	Verification
sync	✓	0s	git fetch ok; forge=github; noGit=false
research	✓	24m 3s	File:line map of heist 1.1.1.2 interpreted runNode + getAttributeSplice + Emanote integration points
branch	✓	5s	Feature branch from origin/master
implement	✓	2m 8s	Pure scanner module + wire-up + tests + CHANGELOG
check	✓	1m 0s	`cabal build all` clean (63 lib modules, 14 test modules)
docs	✓	34s	CHANGELOG bug-fix entry + new "Diagnosing typos" subsection in docs/guide/html-template.md
fmt	✓	14s	cabal-fmt + fourmolu + hlint + nixpkgs-fmt all green
commit	✓	1m 9s	Primary feature commit pushed
hickey+lowy	✓	12m 11s	A1+A2 decomplect, A3 unbalanced ${ recovery, A4 surface parse failures, A5 heuristic doc — 4 follow-up commits
police	✓	19m 14s	Rules + fact-check + elegance (`/simplify`); silent-skip comment, sortNub collapse, shared log helper
test	✓	40s	102 hspec examples (11 new for LintTemplate), 0 failures
create-pr	✓	1m 2s	Draft PR with hickey/lowy analysis comment
ci	✓	2m 57s	`vira ci` aarch64-darwin + x86_64-linux signoff; 3 e2e suites green
evidence	✓	13m 52s	Test output + perf benchmark; led to commit `972a4cdd`
Total		79m 25s

Slowest step: research (24m 3s).

Optimization suggestions

research dominated at 30% of total time — the workflow spent that long because Heist's interpreted-render contract (silent-failure semantics, _spliceMap lookup, _errorNotBound only firing under non-empty namespace) had to be verified against /tmp/heist's source. For future PRs touching Heist internals, pre-cloning srid/emanote#issues or snapframework/heist would let research start with the source already at hand. Consider caching /tmp/heist and /tmp/xmlhtml between runs.
evidence ran 14m because perf data was discovered late — the user asked "what's the performance impact?" only after the lint had landed, which forced a build-master, build-branch, optimize, build-optimized, re-CI loop. For future user-facing features touching the render hot path, treat perf measurement as a research-step deliverable so the optimization (if any) lands in the primary commit instead of as a follow-up.
police ran 19m largely on /simplify's three-sub-agent fan-out — reuse, quality, and efficiency lenses each spawned an agent. The reuse and quality reviews mostly confirmed structure already validated by hickey+lowy. For diffs that have already gone through the structural-review skills, /simplify's rule + fact-check passes (no fan-out) plus a single elegance lens would cover the same ground in ~5m.
ci ran twice (3m + 3m) — once on bddaea8e, once on 972a4cdd after the perf optimization landed. The second run was unavoidable (CI must cover HEAD), but the first could have been deferred until after the perf measurement so a single CI sweep covers everything.

Workflow completed at 2026-05-06T21:56Z.

Generated by /do on Claude Code (model claude-opus-4-7).

srid added 8 commits May 6, 2026 21:03

srid mentioned this pull request May 7, 2026

/do: Add a performance impact step srid/agency#156

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Warn on unbound Heist template splices (closes #81)#715

Warn on unbound Heist template splices (closes #81)#715
srid wants to merge 9 commits into
masterfrom
warn-unbound-splices

srid commented May 7, 2026

Uh oh!

srid commented May 7, 2026

Uh oh!

srid commented May 7, 2026

Uh oh!

srid commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

srid commented May 7, 2026

How it works

What gets caught

Where you see warnings

Try it locally

Uh oh!

srid commented May 7, 2026

Hickey/Lowy Analysis

Hickey rationale

Lowy rationale

Uh oh!

srid commented May 7, 2026

Evidence + Performance

Functional verification

Performance impact (emanote gen on docs/, 57 routes, fresh out-dir each run)

How the pre-check works

Impact on a page with a real typo

Uh oh!

srid commented May 7, 2026

/do results

Optimization suggestions

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Performance impact (`emanote gen` on `docs/`, 57 routes, fresh out-dir each run)

`/do` results