Skip to content

Warn on unbound Heist template splices (closes #81)#715

Draft
srid wants to merge 9 commits into
masterfrom
warn-unbound-splices
Draft

Warn on unbound Heist template splices (closes #81)#715
srid wants to merge 9 commits into
masterfrom
warn-unbound-splices

Conversation

@srid

@srid srid commented May 7, 2026

Copy link
Copy Markdown
Owner

Emanote now logs a warning whenever a Heist template references a splice that has no binding — a typo like <ema:tite> for <ema:title>, or ${value:sitURL} for ${value:siteUrl}. Previously these failed silently: Heist's interpreted runNode leaves an unknown element verbatim, and getAttributeSplice falls back to a literal ${name} text token, so the malformed marker leaked straight into the rendered page. Closes #81.

How it works

Heist render ─▶ rendered HTML bytes ─▶ XmlHtml.parseHTML ─▶ AST walk ─▶ [UnboundSplice] ─▶ logW
                                                                        ▲
                                                              dedup via process-wide IORef
                                                              (one log per (route, splice))

The scanner is a pure module (Emanote.View.LintTemplate) — it takes bytes, returns warnings. Logging and dedup live one layer up in Emanote.View.Template so the lint module stays testable and the dedup state is visible at the orchestration layer.

What gets caught

Pattern Surface in HTML Reported as
Element typo: <ema:tite> <ema:tite></ema:tite> survives <ema:tite/>
Attribute typo: ${value:sitURL} literal ${value:sitURL} in attr ${value:sitURL}

The element check is the empirical colon heuristic: any element name with a : in it. HTML5 tags never contain a colon and inline SVG/MathML uses unprefixed forms (<svg>, <math>, <mfrac>), so a colon is unambiguously a splice survivor. The attribute check ignores ${...} inside text nodes — only attribute values are scanned, mirroring Heist's substitution scope.

Where you see warnings

Both emanote run and emanote gen route this through the existing MonadLoggerIO boundary. A new Diagnosing typos subsection in docs/guide/html-template tells site authors what to grep for.

Live-server reloads do not redo the noise — the dedup cache is keyed by (route, splice) for the lifetime of the process. Bounded in practice by routes × distinct typos, so no eviction needed.

Try it locally

nix run github:srid/emanote/warn-unbound-splices

Generated by /do on Claude Code (model claude-opus-4-7).

srid added 8 commits May 6, 2026 21:03
Heist's interpreted runNode leaves unknown <ema:foo> elements verbatim and
getAttributeSplice falls back to the literal "${name}" text when 'name' is
unbound. Both paths failed silently.

Add Emanote.View.LintTemplate, scan each rendered HTML asset post-render
inside emanoteSiteOutput, and log one warning per fresh (route, splice)
pair via logW. A process-wide IORef de-dupes across live-server reloads.
Move warnUnboundSplices, the dedup IORef, and the logging side-effect out
of Emanote.View.LintTemplate and into Emanote.View.Template (the
rendering orchestration layer). LintTemplate is now a pure scanner —
inputs are bytes, outputs are [UnboundSplice]. Whether to log, batch,
push to a UI banner, or dedupe across renders is the caller's call.
Before: an attribute value like "\${incomplete \${valid}" reported a single
mangled name "incomplete \${valid"; "\${valid} \${trailing" silently dropped
the trailing token. Mirrors Heist's "unbalanced \${ is literal text"
substitution behavior so a real \${name} that follows a malformed token is
still surfaced on its own.
scanRenderedHtml used to swallow XmlHtml.parseHTML errors and return [],
giving callers a false-clean lint result on unparseable output. Return
Either Text [UnboundSplice] so the caller logs the parse error as a
distinct diagnostic.
Make the trade-off explicit: HTML5 tags are never colon-bearing and inline
SVG/MathML uses unprefixed forms, so any colon in a rendered tag name is
unambiguously a Heist splice survivor. If a future legitimate use of a
colon tag arises, an allow-list belongs at this function.
…s skipped

Also extend the lintWarningCache haddock with the practical growth bound,
so a future reader knows the unbounded-by-type Set has a small bound in
practice (routes × distinct typos) and the lack of eviction is intentional.
Replace the per-node Set construction (one (SpliceElement …),
Set.fromList, foldMap-via-Set.union) with a [UnboundSplice]
accumulator deduped once at the document root via Relude's sortNub.
Same semantics, fewer intermediate Sets — the empty-attribute /
empty-children case no longer pays for tree allocation it discards.
Both warnUnboundSplices code paths emit "Template lint on '<route>': <…>".
Extract a local 'warn' helper so the route locator phrasing matches between
parse-failure and unbound-splice messages, and grep-by-route stays
deterministic.

Also drop the warnUnboundSplices haddock that restated its name in prose
and replace it with a why-pointer at LintTemplate (where the scanning
rationale already lives).
@srid

srid commented May 7, 2026

Copy link
Copy Markdown
Owner Author

Hickey/Lowy Analysis

# Lens Finding Disposition
1 Hickey Lint scanner complected delivery + dedup with detection Fixed in this PR (commit 0f7608d6)
2 Hickey attrSpliceRefs halted on unbalanced ${, dropping later valid ${name} tokens Fixed in this PR (commit c41aff78)
3 Hickey scanRenderedHtml returned [] on parse failure → false-clean lint Fixed in this PR (commit 896850b5)
4 Hickey Colon heuristic in elementSplice was undocumented as deliberate Fixed in this PR (commit 68c8c77a)
5 Lowy Interface leaks on warning-output channel (logging baked in) No-op — Hickey #1 made the scanner pure, leaving warnUnboundSplices as the thin wrapper Lowy was asking for
6 Lowy IORef-with-unsafePerformIO dedup is brittle for parallel rendering No-op — atomicModifyIORef' is sequentially consistent; Hickey #1 also moved it to the orchestration layer where it's visible
7 Lowy Heuristic hardcoded with no user-extensible allow-list No-op — addressed in #4's haddock as a deliberate trade-off
8 Lowy Pure detection function not exposed as the stable boundary No-op — Hickey #1 made scanRenderedHtml exactly that

Hickey rationale

Detection (the AST walk) and delivery (logW + dedup IORef) were originally in the same module, so calling warnUnboundSplices carried hidden state and the dedup path had no testable surface. The pure split makes scanRenderedHtml :: FilePath -> ByteString -> Either Text [UnboundSplice] the abstract contract; the IORef lives at the rendering orchestration layer in Emanote.View.Template where dedup is one of several render-time concerns.

The attrSpliceRefs and scanRenderedHtml-parse-failure findings are honest correctness fixes — the first preserves valid ${name} tokens that follow a malformed ${; the second turns an Either parse error into a logW instead of silent [] (no-silent-error-swallowing).

Lowy rationale

The module placement is correct — the heuristic encapsulates a Heist-namespace volatility axis that belongs at Emanote's layer, not in the heist-extra dependency. Lowy's open seams (delivery channel, parallel-rendering safety, allow-list extensibility, pure-boundary stability) all fall out of Hickey's #1 split: scanRenderedHtml is now the stable pure interface, the delivery side is one inline wrapper, and atomicModifyIORef' already covers the concurrency concern. None of Lowy's defers required new code beyond what Hickey already drove.

The lint runs on every rendered HTML route, and X.parseHTML + AST walk
add ~1.85s on the docs/ build (57 routes, 2.13s → 3.92s). Add a
conservative byte scan: if the bytes contain neither a literal "\${"
token nor a colon inside any tag opener, there cannot be an unbound
splice survivor and we return Right [] without parsing.

Brings the docs/ overhead from +85% to ~+9% (2.13s → 2.32s). Pages with
typo'd splices still pay the full parse, which is the rare case the lint
exists to catch.
@srid

srid commented May 7, 2026

Copy link
Copy Markdown
Owner Author

Evidence + Performance

Functional verification

The 11-case unit suite in emanote/test/Emanote/View/LintTemplateSpec.hs exercises each detection path end-to-end on raw HTML bytes — element-name colon survivors, attribute ${...} survivors, dedup, the unbalanced-${ recovery branch, and the new pre-check skip path. All 102 examples pass (nix develop -c cabal test all).

Emanote.View.LintTemplate
  scanRenderedHtml
    ignores well-rendered HTML with no splice survivors [✔]
    reports an element whose tag name still has a Heist-style colon [✔]
    reports nested unbound element splices [✔]
    reports a literal ${name} that survived as an attribute value [✔]
    reports multiple ${...} references in one attribute [✔]
    deduplicates repeated occurrences across the document [✔]
    leaves text nodes alone (e.g. ${...} appearing inside a <pre>) [✔]
    still reports a valid ${...} that follows an unbalanced ${ [✔]
    ignores a trailing ${ with no closing brace [✔]
    skips the parse entirely when the bytes have no splice marker [✔]
  formatWarning
    formats element warnings as a tag [✔]
    formats attribute warnings as a dollar-brace token [✔]

Performance impact (emanote gen on docs/, 57 routes, fresh out-dir each run)

Variant Real time (3 runs) Median vs master
master (no lint) 2.135 / 2.131 / 2.227 s 2.131 s
Lint, no pre-check (commit bddaea8e) 3.892 / 3.926 / 3.925 s 3.925 s +84%
Lint, byte-level pre-check (commit 972a4cdd) 2.300 / 2.318 / 2.395 s 2.318 s +9%

The new commit short-circuits XmlHtml.parseHTML when the rendered bytes contain neither a literal ${ token nor a colon inside any tag opener — those are the only two ways an unbound splice can survive. The common case (clean output) skips the parse and AST walk entirely.

How the pre-check works

hasSpliceMarker bs = "${" `BS.isInfixOf` bs || hasColonTagName bs

hasColonTagName = go
  where
    go bs = case BSC.elemIndex '<' bs of
      Nothing -> False
      Just i ->
        let after    = BS.drop (i + 1) bs
            tagStart = case BSC.uncons after of
                         Just ('/', rest) -> rest
                         _                -> after
            (name, _) = BSC.break isNameEnd tagStart
         in BSC.elem ':' name
              || maybe False (\j -> go (BS.drop (j + 1) after)) (BSC.elemIndex '>' after)

It's a single linear pass over the bytes — BS.isInfixOf is C-implemented, and the colon-tag scan only inspects bytes between < and the first whitespace///>. False positives are fine: they trigger the full parse, which then correctly reports zero warnings; only false negatives would be a bug, and the heuristic mirrors the exact two surface patterns Heist's interpreted renderer leaks.

Impact on a page with a real typo

When a typo exists, the page goes through the full parse + walk path, exactly as before. The perf cost there is dominated by XmlHtml.parseHTML, which is what produces the diagnostic. The optimization is a no-op for the cases the lint exists to catch — and a near-free path for everything else.

@srid

srid commented May 7, 2026

Copy link
Copy Markdown
Owner Author

/do results

Step Status Duration Verification
sync 0s git fetch ok; forge=github; noGit=false
research 24m 3s File:line map of heist 1.1.1.2 interpreted runNode + getAttributeSplice + Emanote integration points
branch 5s Feature branch from origin/master
implement 2m 8s Pure scanner module + wire-up + tests + CHANGELOG
check 1m 0s cabal build all clean (63 lib modules, 14 test modules)
docs 34s CHANGELOG bug-fix entry + new "Diagnosing typos" subsection in docs/guide/html-template.md
fmt 14s cabal-fmt + fourmolu + hlint + nixpkgs-fmt all green
commit 1m 9s Primary feature commit pushed
hickey+lowy 12m 11s A1+A2 decomplect, A3 unbalanced ${ recovery, A4 surface parse failures, A5 heuristic doc — 4 follow-up commits
police 19m 14s Rules + fact-check + elegance (/simplify); silent-skip comment, sortNub collapse, shared log helper
test 40s 102 hspec examples (11 new for LintTemplate), 0 failures
create-pr 1m 2s Draft PR with hickey/lowy analysis comment
ci 2m 57s vira ci aarch64-darwin + x86_64-linux signoff; 3 e2e suites green
evidence 13m 52s Test output + perf benchmark; led to commit 972a4cdd
Total 79m 25s

Slowest step: research (24m 3s).

Optimization suggestions

  • research dominated at 30% of total time — the workflow spent that long because Heist's interpreted-render contract (silent-failure semantics, _spliceMap lookup, _errorNotBound only firing under non-empty namespace) had to be verified against /tmp/heist's source. For future PRs touching Heist internals, pre-cloning srid/emanote#issues or snapframework/heist would let research start with the source already at hand. Consider caching /tmp/heist and /tmp/xmlhtml between runs.
  • evidence ran 14m because perf data was discovered late — the user asked "what's the performance impact?" only after the lint had landed, which forced a build-master, build-branch, optimize, build-optimized, re-CI loop. For future user-facing features touching the render hot path, treat perf measurement as a research-step deliverable so the optimization (if any) lands in the primary commit instead of as a follow-up.
  • police ran 19m largely on /simplify's three-sub-agent fan-out — reuse, quality, and efficiency lenses each spawned an agent. The reuse and quality reviews mostly confirmed structure already validated by hickey+lowy. For diffs that have already gone through the structural-review skills, /simplify's rule + fact-check passes (no fan-out) plus a single elegance lens would cover the same ground in ~5m.
  • ci ran twice (3m + 3m) — once on bddaea8e, once on 972a4cdd after the perf optimization landed. The second run was unavoidable (CI must cover HEAD), but the first could have been deferred until after the perf measurement so a single CI sweep covers everything.

Workflow completed at 2026-05-06T21:56Z.

Generated by /do on Claude Code (model claude-opus-4-7).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Display error when using invalid template variables

1 participant