
cranelift(x64): lower bare ctz/clz boolean tests via test+CC #13334

Open
ggreif wants to merge 1 commit into bytecodealliance:main from ggreif:gabor/ctz-clz-brif-lowering
@ggreif
Contributor

@ggreif ggreif commented May 11, 2026

Summary

Follow-up to #13332. That PR added egraph rules collapsing (eq (ctz X) 0) / (ne (ctz X) 0) / (eq (clz X) 0) / (ne (clz X) 0) to direct LSB / sign-bit tests — but only when the comparison is mediated by an explicit icmp. The wasm front-end translates wasm if (ctz X) to brif (ireduce.i32 (ctz.i64 X)) directly (no icmp), so the egraph rules don't fire on the wasm-natural shape.

This PR closes the gap by specialising is_nonzero in the x64 backend, the helper that all brif/select/trapif lowerings funnel through. It adds four rules: ctz and clz, each in a bare and an ireduce-wrapped form.

Rules

In cranelift/codegen/src/isa/x64/inst.isle:

;; ctz(val) != 0 exactly when bit 0 of val is clear: test val, 1 and branch on Z.
(rule 3 (is_nonzero (ctz (ty_32_or_64 ty) val))
      (CondResult.CC (x64_test ty val (RegMemImm.Imm 1)) (CC.Z)))
(rule 3 (is_nonzero (ireduce _ (ctz (ty_32_or_64 ty) val)))
      (CondResult.CC (x64_test ty val (RegMemImm.Imm 1)) (CC.Z)))
;; clz(val) != 0 exactly when the sign bit of val is clear: test val, val and branch on NS.
(rule 3 (is_nonzero (clz (ty_32_or_64 ty) val))
      (let ((gpr Gpr val)) (CondResult.CC (x64_test ty gpr gpr) (CC.NS))))
(rule 3 (is_nonzero (ireduce _ (clz (ty_32_or_64 ty) val)))
      (let ((gpr Gpr val)) (CondResult.CC (x64_test ty gpr gpr) (CC.NS))))

The ireduce variant catches the wasm front-end's i32.wrap_i64 over a 64-bit ctz/clz — a no-op on values in [0, bitwidth].
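
The equivalences the four rules rely on can be checked outside Cranelift. The following is a minimal Rust sketch, not code from this PR: the function names are hypothetical, and Rust's `trailing_zeros` / `leading_zeros` stand in for CLIF `ctz` / `clz` (both return the bit width for a zero input). It compares the count-then-test predicate against the single bit test that the rules emit, and models the `ireduce` wrapper as a 64-to-32-bit truncation of the count.

```rust
/// `ctz(x) != 0` computed the long way: count first, then compare with zero.
pub fn ctz_nonzero_reference(x: u64) -> bool {
    x.trailing_zeros() != 0
}

/// The same predicate as one bit test: true exactly when bit 0 is clear.
/// Corresponds to `test x, 1` followed by a branch on ZF set (CC.Z).
pub fn ctz_nonzero_bit_test(x: u64) -> bool {
    x & 1 == 0
}

/// `clz(x) != 0` computed the long way.
pub fn clz_nonzero_reference(x: u64) -> bool {
    x.leading_zeros() != 0
}

/// The same predicate as a sign test: true exactly when the top bit is clear.
/// Corresponds to `test x, x` followed by a branch on SF clear (CC.NS).
pub fn clz_nonzero_bit_test(x: u64) -> bool {
    (x as i64) >= 0
}

/// Model of the `ireduce`-wrapped shape: the 64-bit count is truncated to
/// 32 bits before the zero test. The count lies in 0..=64, so truncation
/// cannot turn a nonzero count into zero.
pub fn ctz_nonzero_after_wrap(x: u64) -> bool {
    let count_i64 = x.trailing_zeros() as u64;
    (count_i64 as u32) != 0
}
```

Calling the reference and bit-test versions on edge cases such as `0`, `1`, `1 << 63`, and `u64::MAX` is a quick way to confirm they agree, including the zero input where both counts equal the bit width.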

Test deltas (tests/disas/ctz-clz-bool-condition.wat)

| consumer | before | after |
|---|---|---|
| if_ctz_bare_i32 | 5 insns (bsfl + cmovel + test + jne) | 2 (testl $1, %edx; je) |
| if_ctz_bare_i64 | 5 insns (bsfq + cmovq + test + jne) | 2 (testq $1, %rdx; je) |
| if_clz_bare_i32 | 7 insns (bsr + cmov + sub + test + jne) | 2 (testl %edx, %edx; jns) |

The icmp-mediated cases (collapsed by #13332's egraph rules) are unchanged. The numeric-comparison negative test ((ctz X) == 4) stays untouched.

Motivation

Motoko's moc codegen emits i64.ctz X; i32.wrap_i64; if for compactness/sign tests in the EOP backend (see caffeinelabs/motoko#6103). Before this PR, that lowers to 5 native instructions per dispatch; after, 2.

A concrete idiomatic example: in Motoko, the let-else pattern over Result

let #ok payload = queryProp(...) else return defaultValue;

desugars to a 2-arm refutable variant match (#ok vs #err). The variant-tag hashes are hash("ok") = 0x611C (LSB 0) and hash("err") = 0x4D0765 (LSB 1), so the LSB alone distinguishes the two arms. The planned variant-switch BitTest dispatch (caffeinelabs/motoko's gabor/variant-switch) recognizes this and emits a single LSB test for the dispatch; combined with this PR, the entire let-else lowers to load hash; testq $1, ...; jcc on x64: three instructions for a pattern match. Every Result-returning API and every let-else-style early return collapses to this shape.
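
To make the tag arithmetic concrete, here is a small Rust sketch of the two-arm dispatch. It is an illustration under assumptions, not Motoko's actual codegen: only the two hash constants come from the description above, while the helper names and the `let_else` model are hypothetical.

```rust
/// Variant-tag hashes quoted in the PR description.
pub const HASH_OK: u64 = 0x611C;
pub const HASH_ERR: u64 = 0x4D0765;

/// Hypothetical helper: the branch condition in the
/// `i64.ctz; i32.wrap_i64; if` shape (count, truncate, test for nonzero).
pub fn takes_ok_arm_via_ctz(tag: u64) -> bool {
    (tag.trailing_zeros() as u64) as u32 != 0
}

/// Hypothetical helper: the same decision as a single LSB test,
/// which is what `testq $1, ...; jcc` computes.
pub fn takes_ok_arm_via_lsb(tag: u64) -> bool {
    tag & 1 == 0
}

/// Hypothetical model of `let #ok payload = ... else return default`:
/// take the payload on the #ok arm, otherwise fall back to the default.
pub fn let_else(tag: u64, payload: i64, default: i64) -> i64 {
    if takes_ok_arm_via_lsb(tag) {
        payload
    } else {
        default
    }
}
```

Both helpers select the #ok arm for `HASH_OK` and the else arm for `HASH_ERR`, which is the property the single-test lowering depends on.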

Aggregated across hot paths (variant-switch dispatch, GC compact/heap discriminator, sign tests, …), saving 3 to 5 instructions per dispatch adds up.

Follow-ups (not in this PR)

  • aarch64, riscv64, s390x analogues — separate PRs once x64 reviewer feedback lands.
  • select-consumer variant: select already routes through is_nonzero, so this PR's rules cover it too without extra work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ggreif ggreif changed the title cranelift(x64): lower bare ctz/clz boolean tests via test+CC cranelift(x64): lower bare ctz/clz boolean tests via test+CC May 11, 2026
@ggreif ggreif marked this pull request as ready for review May 11, 2026 16:17
@ggreif ggreif requested review from a team as code owners May 11, 2026 16:17
@ggreif ggreif requested review from pchickey and uweigand and removed request for a team May 11, 2026 16:17
@github-actions github-actions Bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels May 11, 2026
@alexcrichton alexcrichton removed the request for review from uweigand May 12, 2026 01:49
@alexcrichton
Member

@cfallin or @fitzgen do y'all have any ideas about how to sort of deduplicate this with the optimization rules landed in #13332? It feels a bit unfortunate that we need basically the same rules twice, once for general expressions and once because cond != 0 is implicit in all conditional-y locations (brif, select, trapnz, etc). Without redefining how those instructions work I'm not sure how to avoid the duplication myself, but figured I'd ask if y'all had ideas. Another option might be to only have these rules in backends as opposed to also the mid-end, but that also doesn't feel great; the duplication probably isn't so bad.
