
cranelift: fold ctz/clz directly into brif cond via simplify_skeleton#13343

Merged
cfallin merged 3 commits into bytecodealliance:main from ggreif:gabor/brif-cond-simplify on May 12, 2026

@ggreif (Contributor) commented May 12, 2026

Motivation

PR #13332 landed mid-end rules that fold (eq/ne (ctz/clz X) 0) icmp shapes into direct bit tests on X. Those rules hinge on an icmp interposed between the bit-counter and its consumer — i.e. the wasm 3-op pattern i32.ctz; i32.eqz; br_if.

Frontends that emit the 2-op form i32.ctz; br_if (with no i32.eqz between them, e.g. Motoko's moc, after its byte-size peephole that turns and 1; eqz; br_if into ctz; br_if) feed (brif (ctz X)) into cranelift with no icmp for the existing rules to match. #13334 (x64) and #13336 (aarch64) added backend lowering rules to cover that gap. As @cfallin pointed out in #13336, the backend is the wrong place, both for SWE reasons (rule duplication per ISA) and because we want these simplifications to compose with other mid-end opts.
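Motoko's peephole is sound because both sequences take the branch on exactly the same inputs. A minimal sketch that models each wasm sequence as a Rust predicate over the i32 operand (the function names are hypothetical and not part of moc or Cranelift): br_if branches when its operand is nonzero, and Rust's u32::trailing_zeros returns 32 for zero, matching i32.ctz.

```rust
/// `i32.const 1; i32.and; i32.eqz; br_if`: branch when the LSB is clear.
fn branch_and1_eqz(x: u32) -> bool {
    let masked = x & 1; // i32.and with 1
    let is_zero = masked == 0; // i32.eqz
    is_zero // br_if branches on a nonzero operand, i.e. when is_zero is 1
}

/// `i32.ctz; br_if`: branch when the trailing-zero count is nonzero.
fn branch_ctz(x: u32) -> bool {
    x.trailing_zeros() != 0 // u32::trailing_zeros(0) == 32, like i32.ctz
}

fn main() {
    // Exhaustive over the low 16 bits plus a few high-bit edge cases.
    let edges = [0x8000_0000u32, 0xFFFF_FFFE, u32::MAX];
    for x in (0u32..=0xFFFF).chain(edges) {
        assert_eq!(branch_and1_eqz(x), branch_ctz(x), "mismatch at {x:#x}");
    }
    println!("both forms branch identically on all sampled inputs");
}
```

Comparing observable branch decisions, rather than the intermediate values, is the right check here: the two sequences compute different numbers (a count versus a 0/1 flag) but only their zero-ness reaches br_if.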

Approach

This PR extends simplify_skeleton to rewrite the condition operand of an existing brif in place. The CFG is preserved by construction: the opcode and successor blocks stay; only argument 0 changes.
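To make the "only argument 0 changes" invariant concrete, here is a toy model in Rust. Every type and function in it is hypothetical and far simpler than Cranelift's real data-flow graph; only the name ReplaceBranchCond is taken from this PR. Applying the variant overwrites the condition slot and leaves the opcode and both successor blocks alone, so the CFG cannot change.

```rust
// Toy IR, hypothetical: not Cranelift's actual data structures.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Value(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Block(u32);

#[derive(Clone, Debug, PartialEq, Eq)]
enum Inst {
    /// Conditional branch; `args[0]` is the condition.
    Brif { args: Vec<Value>, then_dest: Block, else_dest: Block },
    /// Unconditional jump.
    Jump { dest: Block },
}

/// Modeled on the PR's `ReplaceBranchCond(Value)` variant.
enum SkeletonSimplification {
    ReplaceBranchCond(Value),
}

/// Swap the condition of a `brif` in place; reject anything else.
fn apply(inst: &mut Inst, simp: SkeletonSimplification) -> Result<(), &'static str> {
    match (inst, simp) {
        (Inst::Brif { args, .. }, SkeletonSimplification::ReplaceBranchCond(new_cond)) => {
            // Only the condition slot is written; successors are untouched.
            let cond = args.first_mut().ok_or("brif without a condition")?;
            *cond = new_cond;
            Ok(())
        }
        _ => Err("ReplaceBranchCond only applies to brif"),
    }
}

fn main() {
    let (ctz_x, bit_test) = (Value(1), Value(2));
    let mut br = Inst::Brif { args: vec![ctz_x], then_dest: Block(10), else_dest: Block(20) };
    apply(&mut br, SkeletonSimplification::ReplaceBranchCond(bit_test)).unwrap();
    assert_eq!(br, Inst::Brif { args: vec![bit_test], then_dest: Block(10), else_dest: Block(20) });

    let mut jmp = Inst::Jump { dest: Block(30) };
    assert!(apply(&mut jmp, SkeletonSimplification::ReplaceBranchCond(bit_test)).is_err());
}
```

Carrying just a Value, rather than a whole replacement instruction, is what makes the invariant hold by construction: the variant has no way to name a different opcode or different successor blocks.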

Concretely:

  1. New SkeletonInstSimplification::ReplaceBranchCond(Value) variant in prelude_opt.isle — a narrow rewrite that carries just the new cond value.

  2. Driver patch in cranelift/codegen/src/egraph/mod.rs: handle the new variant. In the cost loop, accept it eagerly (no cost ranking against opcode-preserving rewrites); at the apply site, swap argument 0 in place via inst_args_mut. This composes with #13267's existing branch-simplification machinery (which rewrites conditional branches with constant conditions into unconditional jumps); no guard relaxation is needed.

  3. replace_branch_cond constructor in prelude_opt.isle.

  4. Two ISLE rules in opts/icmp.isle:

    (rule (simplify_skeleton (brif (ctz x_ty X) _ _))
          (replace_branch_cond
            (eq $I8 (band x_ty X (iconst_u x_ty 1)) (iconst_u x_ty 0))))
    (rule (simplify_skeleton (brif (clz x_ty X) _ _))
          (replace_branch_cond (sge $I8 X (iconst_u x_ty 0))))
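Both rules rest on the same observation: brif takes its then-edge when the condition is nonzero, ctz X is nonzero exactly when bit 0 of X is clear, and clz X is nonzero exactly when the sign bit of X is clear. A small Rust sketch that checks those equivalences for the i32 and i64 cases the filetest covers, using edge values plus a deterministic xorshift64 sweep (all helper names are hypothetical):

```rust
/// Deterministic test inputs: edge cases plus an xorshift64 sweep.
fn samples() -> Vec<u64> {
    let mut v = vec![0, 1, 2, 3, 0x7FFF_FFFF, 0x8000_0000, 0xFFFF_FFFF, 1 << 63, u64::MAX];
    let mut s: u64 = 0x9E37_79B9_7F4A_7C15;
    for _ in 0..10_000 {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        v.push(s);
    }
    v
}

// Original conditions: `brif (ctz X)` / `brif (clz X)` branch on a nonzero count.
fn orig_ctz_32(x: u32) -> bool { x.trailing_zeros() != 0 }
fn orig_clz_32(x: u32) -> bool { x.leading_zeros() != 0 }
fn orig_ctz_64(x: u64) -> bool { x.trailing_zeros() != 0 }
fn orig_clz_64(x: u64) -> bool { x.leading_zeros() != 0 }

// Rewritten conditions: `eq (band X 1) 0` and `sge X 0`.
fn new_ctz_32(x: u32) -> bool { (x & 1) == 0 }
fn new_clz_32(x: u32) -> bool { (x as i32) >= 0 }
fn new_ctz_64(x: u64) -> bool { (x & 1) == 0 }
fn new_clz_64(x: u64) -> bool { (x as i64) >= 0 }

fn main() {
    for x in samples() {
        let lo = x as u32; // i32 case: truncate to the low 32 bits
        assert_eq!(orig_ctz_32(lo), new_ctz_32(lo), "ctz i32 at {lo:#x}");
        assert_eq!(orig_clz_32(lo), new_clz_32(lo), "clz i32 at {lo:#x}");
        assert_eq!(orig_ctz_64(x), new_ctz_64(x), "ctz i64 at {x:#x}");
        assert_eq!(orig_clz_64(x), new_clz_64(x), "clz i64 at {x:#x}");
    }
    println!("rewritten brif conditions agree on all samples");
}
```

Note that X = 0 is covered consistently: ctz 0 and clz 0 both return the bit width (nonzero), and the rewritten forms (0 & 1) == 0 and 0 >= 0 are both true.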
    

Effect

On the 2-op brif (ctz X) / brif (clz X) patterns:

| platform | input | mid-end-alone lowering |
|---|---|---|
| x86_64 | brif (ctz X) | testl $1, %edi; je ✓ (matches #13334's x64 backend rules) |
| x86_64 | brif (clz X) | testl %edi, %edi; jge ✓ (matches #13336's intent) |
| aarch64 | brif (ctz X) | tbz w0, #0 (single-instruction test-and-branch, tighter than #13336's tst+cmp+b.cc) |

Test

New filetest cranelift/filetests/filetests/egraph/brif-cnt-cond.clif covers ctz/clz over i32/i64 in the 2-op brif-direct form. All cranelift egraph filetests pass; tests/disas/ctz-clz-bool-condition.wat re-blessed (the bare-form cases now collapse to the optimal 2-instruction shape).

Supersedes

Future work

The ReplaceBranchCond variant covers the in-place cond swap; #13267 covers full brif-to-jump rewrites for constant conditions. A natural follow-up is extending the same cond-only rewrite shape to trapnz / trapz so e.g. (trapnz (ctz x) code) collapses to the bit-test form.

@ggreif requested a review from a team as a code owner May 12, 2026 21:38
@ggreif requested review from cfallin and removed request for a team May 12, 2026 21:38
@ggreif (Contributor, Author) commented May 12, 2026

Cross-backend confirmation that this mid-end change is a strict win on every target wasmtime supports — no per-backend rules needed:

| backend | brif (ctz v0) | brif (clz v0) |
|---|---|---|
| x86_64 | testl $1, %edi; je | testl %edi, %edi; jge |
| aarch64 | tbz w0, #0 (single op) | cmp w0, #0; b.ge |
| riscv64 | andi a0, a0, 1; sext.w a0, a0; beqz a0, … | sext.w a0, a0; bgez a0, … |
| s390x | nilf %r2, 1; clfi %r2, 0; jge | chi %r2, 0; jghe |

Notes:

  • aarch64 ctz gets a single-instruction tbz (test-bit-and-branch), tighter than #13336's tst+cmp+b.cc form.
  • s390x (and the other two) fall in via the existing per-backend icmp+brif fusion path — the mid-end rewrite hands them a shape they already lower optimally, so no new backend rules anywhere.
  • The sext.w overhead on riscv64 is a separate i32→i64 register-width peephole gap, independent of this PR; the bit-counter is still elided.

So the cost-vs-benefit picture for landing this PR vs the two backend PRs (#13334 / #13336): one ~50-line mid-end change covers 4 ISAs simultaneously, with the aarch64 case getting strictly better code than dedicated backend rules would produce.

@cfallin (Member) left a comment

Looks fine -- thanks!

@cfallin (Member) commented May 12, 2026

@ggreif it looks like you'll need to re-bless a test; and also run cargo fmt to ensure all source is properly formatted. Happy to merge once that's done. Thanks!

@cfallin (Member) commented May 12, 2026

(I saw your push to fix the formatting; the test-blessing failure is here and should be fixable with WASMTIME_TEST_BLESS=1 cargo test --test disas)

@ggreif requested a review from a team as a code owner May 12, 2026 22:14
@ggreif requested review from pchickey and removed request for a team May 12, 2026 22:14
@cfallin (Member) commented May 12, 2026

Ah, and now there's a merge conflict -- sorry for the merging troubles, @ggreif! If you could fix that, I'll be happy to merge.

ggreif and others added 3 commits May 13, 2026 00:19
…keleton`

The mid-end rules added in bytecodealliance#13332 hinge on an `icmp eq/ne (ctz/clz X) 0`
shape — i.e. the wasm 3-op pattern `i32.ctz; i32.eqz; br_if`. Frontends
that emit the 2-op form `i32.ctz; br_if` (e.g. Motoko's `moc` after its
`and 1; eqz; br_if` → `ctz; br_if` byte-size peephole) feed `(brif (ctz X))`
into cranelift with no `icmp` for the existing rules to match.

This commit extends `simplify_skeleton` to rewrite the *condition operand*
of an existing `brif` in place, without touching its opcode or successor
blocks (CFG-preserving by construction). A new `SkeletonInstSimplification`
variant `ReplaceBranchCond(Value)` carries the new condition; the egraph
driver applies it by writing through `inst_args_mut`. Two ISLE rules in
`opts/icmp.isle` rewrite `(brif (ctz X) bt be)` and `(brif (clz X) bt be)`
to brifs over the equivalent bit-extract form:

  brif (ctz X) bt be   →   brif (eq (band X 1) 0) bt be
  brif (clz X) bt be   →   brif (sge X 0)         bt be

End-to-end lowering on the resulting brif then composes with existing
backend `icmp+brif` fusion to produce:

  x86_64  brif (ctz X):   `testl $1, %edi; je`
  x86_64  brif (clz X):   `testl %edi, %edi; jge`
  aarch64 brif (ctz X):   `tbz w0, #0` — single-instruction test-and-branch

This subsumes the backend-side x64 rules added in bytecodealliance#13334 and the aarch64
rules in bytecodealliance#13336 (and yields tighter aarch64 code than bytecodealliance#13336 did).

The driver still rejects non-`brif` branches and rejects non-`ReplaceBranchCond`
simplification variants on `brif` (a `Replace inst` of a brif would risk
changing successor block IDs and is left to a future, broader extension).
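The acceptance rule stated above is small enough to write down as a decision function. A hypothetical Rust sketch (the enums and driver_accepts_on_branch are illustrative names, not the real code in cranelift/codegen/src/egraph/mod.rs), covering only branch instructions, as the commit message does:

```rust
/// Branch opcodes the skeleton pass may encounter (illustrative subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BranchOpcode {
    Brif,
    BrTable,
    Jump,
}

/// Simplification kinds, modeled loosely on `SkeletonInstSimplification`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SimplKind {
    /// Replace the whole instruction: could change successor block IDs.
    Replace,
    /// Swap only the condition operand of a `brif`.
    ReplaceBranchCond,
}

/// Only a cond swap on a `brif` is accepted; everything else on a branch is rejected.
fn driver_accepts_on_branch(op: BranchOpcode, kind: SimplKind) -> bool {
    matches!((op, kind), (BranchOpcode::Brif, SimplKind::ReplaceBranchCond))
}

fn main() {
    use BranchOpcode::*;
    use SimplKind::*;
    for op in [Brif, BrTable, Jump] {
        for kind in [Replace, ReplaceBranchCond] {
            println!("{op:?} + {kind:?} -> {}", driver_accepts_on_branch(op, kind));
        }
    }
}
```

Keeping the guard as a single positive match (rather than a list of rejections) means any new variant or branch opcode is rejected by default until someone deliberately opts it in.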

Filetest `egraph/brif-cnt-cond.clif` covers ctz/clz over i32/i64 in the
2-op form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The new `simplify_skeleton`-on-`brif` rule rewrites the 2-op
`if (ctz/clz x)` cases that bytecodealliance#13332's commentary noted were the
non-icmp-mediated holdouts. Bare-form lowering shrinks from
~9 instructions (bsf/bsr + cmov + test + jne + …) to
`testl $1, %edx; je` (ctz) and `testl %edx, %edx; jge` (clz).

Offsets on the subsequent non-bare functions shift down to match.
@ggreif force-pushed the gabor/brif-cond-simplify branch from cd73adf to d5ce29b May 12, 2026 22:20
@cfallin enabled auto-merge May 12, 2026 22:21
@cfallin added this pull request to the merge queue May 12, 2026
@ggreif (Contributor, Author) commented May 12, 2026

Sidebar / future work for riscv64: per the cross-backend table above, the mid-end rewrite leaves a sext.w in the riscv64 lowering:

brif (ctz v0):  andi a0, a0, 1; sext.w a0, a0; beqz a0, ...
brif (clz v0):  sext.w a0, a0; bgez a0, ...

For the ctz form the sext.w is unconditionally redundant: andi sign-extends its 12-bit immediate, so a non-negative immediate (here 1) clears every bit from bit 11 upward, including bit 31 and the upper 32 bits. sext.w is therefore an identity on the andi result, and the subsequent beqz (which tests the full 64-bit register against x0) reads the same value with or without it. So andi a0, a0, 1; beqz a0, ... is the optimal 2-op form.
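The argument can be checked with a small Rust model of the three RV64 instructions involved (the helper names are hypothetical; andi sign-extends its 12-bit immediate, and sext.w, an alias of addiw rd, rs, 0, sign-extends the low 32 bits). For every non-negative 12-bit immediate, sext.w leaves the andi result unchanged; a negative immediate such as -1 keeps the upper bits of the source and breaks that, which is why the non-negativity condition matters.

```rust
/// RV64 `andi rd, rs, imm`: the 12-bit immediate is sign-extended to 64 bits.
fn andi(rs: u64, imm12: i16) -> u64 {
    assert!((-2048..=2047).contains(&imm12), "immediate must fit in 12 signed bits");
    rs & (imm12 as i64 as u64)
}

/// RV64 `sext.w rd, rs` (`addiw rd, rs, 0`): sign-extend the low 32 bits.
fn sext_w(rs: u64) -> u64 {
    rs as u32 as i32 as i64 as u64
}

/// `beqz rs, target` compares the full 64-bit register with x0.
fn beqz_taken(rs: u64) -> bool {
    rs == 0
}

fn main() {
    let inputs = [0u64, 1, 2, 0x8000_0000, 0x1_0000_0000, 0xFFFF_FFFF_FFFF_FFFE, u64::MAX];
    for &x in &inputs {
        // Non-negative immediates: sext.w is an identity on the andi result.
        for imm in [0i16, 1, 0x7F, 2047] {
            let v = andi(x, imm);
            assert_eq!(sext_w(v), v);
        }
        // The case in question: andi ..., 1; beqz branches the same with or without sext.w.
        assert_eq!(beqz_taken(sext_w(andi(x, 1))), beqz_taken(andi(x, 1)));
    }
    // Counterexample with a negative immediate: upper bits survive andi,
    // and sext.w changes the value (here, to zero).
    let v = andi(0x1_0000_0000, -1);
    assert_eq!(v, 0x1_0000_0000);
    assert_eq!(sext_w(v), 0);
    println!("sext.w after andi with a non-negative immediate is redundant");
}
```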

For the clz form, bgez does depend on the 64-bit sign bit, so the sext.w is only redundant when X is already known to be a canonical (sign-extended) i32 in its register, e.g. when it comes from a known-extending producer such as lw, sext.w, or an icmp result. That is a narrower peephole.
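A companion Rust sketch for the clz case (again with hypothetical helper names): bgez reads bit 63, while the i32 meaning of sge X 0 depends on bit 31. Dropping the sext.w is only safe when the register already equals its own sign extension; a negative i32 held zero-extended in the register is the counterexample.

```rust
/// RV64 `sext.w`: sign-extend the low 32 bits.
fn sext_w(rs: u64) -> u64 {
    rs as u32 as i32 as i64 as u64
}

/// `bgez rs, target` tests the 64-bit sign bit.
fn bgez_taken(rs: u64) -> bool {
    (rs as i64) >= 0
}

/// The i32 meaning of the rewritten condition `sge X 0`: bit 31 of the low word.
fn i32_sge_zero(rs: u64) -> bool {
    (rs as u32 as i32) >= 0
}

fn main() {
    let regs = [0u64, 1, 0x7FFF_FFFF, 0x8000_0000, 0xFFFF_FFFF_8000_0000, 0x1_0000_0001, u64::MAX];
    for &r in &regs {
        // With sext.w the branch always matches the i32 semantics.
        assert_eq!(bgez_taken(sext_w(r)), i32_sge_zero(r));
        // Without it, the branch is correct only when r is already canonical.
        if sext_w(r) == r {
            assert_eq!(bgez_taken(r), i32_sge_zero(r));
        }
    }
    // Non-canonical counterexample: the i32 value -2^31 held zero-extended.
    let r = 0x0000_0000_8000_0000u64;
    assert!(bgez_taken(r) && !i32_sge_zero(r));
    println!("sext.w can be dropped only for canonical i32 registers");
}
```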

@alexcrichton — flagging since you're touching riscv64 currently; the patterns above are stable shapes the mid-end now emits unconditionally for any brif (ctz x) / brif (clz x) consumer.

Merged via the queue into bytecodealliance:main with commit f6a7288 May 12, 2026
48 checks passed