Skip to content

feat: integrate fork-choice compliance spec test suite#9314

Draft
GrapeBaBa wants to merge 3 commits into
ChainSafe:unstablefrom
GrapeBaBa:gr/feat-fork-choice-compliance
Draft

feat: integrate fork-choice compliance spec test suite#9314
GrapeBaBa wants to merge 3 commits into
ChainSafe:unstablefrom
GrapeBaBa:gr/feat-fork-choice-compliance

Conversation

@GrapeBaBa
Copy link
Copy Markdown
Contributor

Summary

  • Wire the consensus-specs Fork Choice Compliance suite (ethereum/consensus-specs#3831) into the existing forkChoiceTest runner so it runs alongside fork_choice and sync runners under test:spec:* once fixtures are downloaded.
  • Add getViableHeads() to fork-choice and verify the compliance suite's viable_for_head_roots_and_weights invariant against it (the new check that motivates the suite).
  • Add scripts/download-compliance-fc-tests.sh for fetching the consensus-specs Compliance Tests workflow artifact (the fixtures are NOT bundled with the standard release tarball).

Marked draft because there are real fork-choice gaps (notably consensus-specs#4807) that are NOT addressed here — see Known follow-ups below.

Why a separate download path

release.yml on consensus-specs only invokes make test ... reftests=true, which builds the standard reftests bundle. Compliance tests are produced by make comptests and published only via the daily comptests.yml workflow as a transient GH Actions artifact (<config>.tar.gz). They are not in the releases/download/<tag>/<preset>.tar.gz URL space the existing download-spec-tests script reads.

The new script:

  • supports --tarball <path> / --run-id <id> / auto-fetch latest successful run
  • handles both raw tar.gz (current workflow output, archive: false) and zip-wrapped responses from actions/artifacts/<id>/zip
  • extracts into packages/beacon-node/spec-tests/tests/..., the same root the release tarball uses — paths don't collide because the compliance generator only emits fork_choice_compliance/ subtrees

Cache lives at \$LODESTAR_COMPLIANCE_FC_CACHE (default /var/tmp/lodestar_compliance_fc_cache).

Test-runner accommodations

  1. bls_setting: 2 — every compliance fixture uses placeholder signatures. Pass validSignatures: testcase.meta?.bls_setting !== BigInt(1) so verification short-circuits. Standard fork_choice fixtures use bls_setting: 1 so behavior there is unchanged.
  2. BLOCK_ERROR_ALREADY_KNOWN — compliance fixtures intentionally re-import the same block (dup_shift mutations in their meta.yaml). Spec semantics for on_block(store, known_block) is a no-op success. Production block import correctly rejects with ALREADY_KNOWN; the runner now treats that as success only when the step is valid: true. Production block import path is untouched.
  3. Cross-epoch attestation shufflingon_attestation decodes aggregation_bits using the state at the attestation's target checkpoint, not the head state. The runner now resolves the right shuffling via ShufflingCache + regen (mirroring the production validation path) instead of headState.epochCtx.getIndexedAttestation, which only worked when the attestation's epoch happened to be in the head's epoch cache (±1 epoch).

Two new check fields supported:

  • viable_for_head_roots_and_weights (consensus-specs#3831) — compared against getViableHeads(). Both sides are sorted by root since spec doesn't fix order.
  • head_payload_status (gloas) — mapped between our internal enum ordering (PENDING=0, EMPTY=1, FULL=2) and spec ordering (EMPTY=0, FULL=1, PENDING=2).

getViableHeads() design

A node is a filter_block_tree leaf iff (a) it is viable AND (b) no descendant is viable. maybeUpdateBestChildAndDescendant only sets bestChild to a child that nodeLeadsToViableHead, so bestChild === undefined ⟺ no viable descendant. Combined with nodeIsViableForHead for (a), this matches the spec exactly. Same approach as Teku's ForkChoiceStrategy.getChainHeads.

Weights are returned in Gwei (compliance fixtures use Gwei). Internal storage is in EFFECTIVE_BALANCE_INCREMENT units, multiplied at read time. Note the proposer-boost score contribution is also stored in increments and may round down on minimal preset (102 vs 102.4 ETH); attestation weight is exact, the rounding only surfaces while boost is active. Mainnet boost values are integer ETH so this is a non-issue there.

Current pass rate (fulu, small config, 1472 cases)

Tests: 1219 failed | 253 passed (17.2%)

Per-handler:

Handler Pass Fail Pass%
block_cover_test 180 12 94%
shuffling_test 22 234 9%
block_tree_test 38 474 7%
block_weight_test 8 248 3%
attester_slashing_test 4 124 3%
invalid_message_test 1 127 1%

Failure breakdown:

Count Class
~973 Invalid proposer boost root — see follow-up #1
~14 Invalid viable heads — proposer-boost score rounding on minimal preset
~1 FORKCHOICE_ERROR_INVALID_ATTESTATION

Known follow-ups (not in this PR)

  1. consensus-specs#4807 (update_proposer_boost_root proposer-index check) — the dominant class of failures. Spec now requires block.proposer_index == get_beacon_proposer_index(head_state) before granting boost; this applies to all forks, not just gloas. PR feat: implement should_apply_proposer_boost for gloas #9233 implements the gloas-equivocation half of docs: update docs/usage/local.md #4807 (should_apply_proposer_boost) but explicitly leaves pre-gloas boost behavior unconditional. Closing the boost-root gap is a separate, higher-risk change that should be reviewed independently.
  2. viable_for_head_roots_and_weights weight tolerance — minimal-preset proposer-boost score lossy increment rounding causes false-positive weight mismatches. Could relax to set-equality on roots only, or add a ±1 increment tolerance.
  3. Gloas SSZ deserialization — running gloas compliance against the same artifact triggers SSZ container shape mismatches. The artifact is built from consensus-specs master; our @lodestar/types and spec-tests-version.json (v1.7.0-alpha.5) lag. Belongs in a separate types-bump PR.

Relation to PR #9290

#9290 takes a different infra path for the same goal: extract the runner into forkChoiceRunner.ts, register a separate compliance_fork_choice test file, extract fixtures into a sibling spec-tests-compliance/<config>/ directory. It explicitly skips the viable_for_head_roots_and_weights check, which is the main differentiator from this PR.

Both approaches surface the same underlying fork-choice gaps; merging strategy is up to maintainers — happy to rebase on top of #9290 if its infra is preferred.

CI

Intentionally not wired. The runner cleanly skips the compliance suite when fixtures are absent, so existing test:spec:* jobs are unaffected. Devs and reviewers run on demand:

pnpm download-compliance-fc-tests              # auto-fetch small via gh
pnpm download-compliance-fc-tests --tarball ~/Downloads/small.tar.gz
pnpm vitest run --project spec-minimal -t fork_choice_compliance test/spec/presets/fork_choice.test.ts

Test plan

  • pnpm check-types clean (beacon-node, fork-choice, state-transition)
  • pnpm lint clean
  • Existing fork_choice and sync runners unchanged in behavior — pure additive runner registration + factory annotations
  • fulu compliance against latest comptests small.tar.gz artifact: 253/1472 pass, no infra-side failures, all failures attributable to spec gaps in Known follow-ups

AI Assistance Disclosure

Used Claude Code.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for the fork-choice compliance test suite by adding a fixture download script and updating the test runner to handle cross-epoch attestations and duplicate block imports. It also implements a getViableHeads method in the fork-choice logic to allow verification of the internal filtered block tree state. Feedback was provided to use bigint instead of number for weight calculations in getViableHeads to prevent potential precision loss on Mainnet.

Comment on lines +1404 to +1412
getViableHeads(currentSlot: Slot): {root: RootHex; weight: number}[] {
const result: {root: RootHex; weight: number}[] = [];
for (const node of this.nodes) {
if (node.bestChild === undefined && this.nodeIsViableForHead(node, currentSlot)) {
result.push({root: node.blockRoot, weight: node.weight * EFFECTIVE_BALANCE_INCREMENT});
}
}
return result;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of getViableHeads uses number for weights, which leads to precision loss on Mainnet.

In Lodestar, node.weight is stored in EFFECTIVE_BALANCE_INCREMENT units (1 increment = 1 ETH = 10^9 Gwei). On Mainnet, the total active balance is currently ~33M ETH. Multiplying 33,000,000 * 1,000,000,000 results in 3.3 * 10^16, which exceeds Number.MAX_SAFE_INTEGER (9,007,199,254,740,9919 * 10^15).

To maintain accuracy for production use cases (even if compliance tests currently use smaller weights), this should return bigint.

Suggested change
getViableHeads(currentSlot: Slot): {root: RootHex; weight: number}[] {
const result: {root: RootHex; weight: number}[] = [];
for (const node of this.nodes) {
if (node.bestChild === undefined && this.nodeIsViableForHead(node, currentSlot)) {
result.push({root: node.blockRoot, weight: node.weight * EFFECTIVE_BALANCE_INCREMENT});
}
}
return result;
}
getViableHeads(currentSlot: Slot): {root: RootHex; weight: bigint}[] {
const result: {root: RootHex; weight: bigint}[] = [];
for (const node of this.nodes) {
if (node.bestChild === undefined && this.nodeIsViableForHead(node, currentSlot)) {
result.push({root: node.blockRoot, weight: BigInt(node.weight) * BigInt(EFFECTIVE_BALANCE_INCREMENT)});
}
}
return result;
}

Comment on lines +565 to +567
getViableHeads(): {root: RootHex; weight: number}[] {
return this.protoArray.getViableHeads(this.fcStore.currentSlot);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the return type to bigint to match the fix in ProtoArray and avoid precision loss.

Suggested change
getViableHeads(): {root: RootHex; weight: number}[] {
return this.protoArray.getViableHeads(this.fcStore.currentSlot);
}
getViableHeads(): {root: RootHex; weight: bigint}[] {
return this.protoArray.getViableHeads(this.fcStore.currentSlot);
}

* Retrieves all viable-for-head leaves of the filtered_block_tree along with their weights.
* Used by the fork-choice compliance test suite (`viable_for_head_roots_and_weights`).
*/
getViableHeads(): {root: RootHex; weight: number}[];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the interface to return bigint for weights.

Suggested change
getViableHeads(): {root: RootHex; weight: number}[];
getViableHeads(): {root: RootHex; weight: bigint}[];

Comment on lines +521 to +528
const expected = step.checks.viable_for_head_roots_and_weights
.map((entry) => ({root: entry.root, weight: bnToNum(entry.weight)}))
.sort((a, b) => a.root.localeCompare(b.root));
const actual = (chain.forkChoice as ForkChoice)
.getViableHeads()
.map(({root, weight}) => ({root, weight}))
.sort((a, b) => a.root.localeCompare(b.root));
expect(actual).toEqualWithMessage(expected, `Invalid viable heads at step ${i}`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since getViableHeads should return bigint to avoid precision loss, the test comparison should also operate on bigint values directly instead of converting them to number via bnToNum.

Suggested change
const expected = step.checks.viable_for_head_roots_and_weights
.map((entry) => ({root: entry.root, weight: bnToNum(entry.weight)}))
.sort((a, b) => a.root.localeCompare(b.root));
const actual = (chain.forkChoice as ForkChoice)
.getViableHeads()
.map(({root, weight}) => ({root, weight}))
.sort((a, b) => a.root.localeCompare(b.root));
expect(actual).toEqualWithMessage(expected, `Invalid viable heads at step ${i}`);
const expected = step.checks.viable_for_head_roots_and_weights
.map((entry) => ({root: entry.root, weight: entry.weight}))
.sort((a, b) => a.root.localeCompare(b.root));
const actual = (chain.forkChoice as ForkChoice)
.getViableHeads()
.sort((a, b) => a.root.localeCompare(b.root));
expect(actual).toEqualWithMessage(expected, `Invalid viable heads at step ${i}`);

@GrapeBaBa GrapeBaBa force-pushed the gr/feat-fork-choice-compliance branch 2 times, most recently from 348726d to ae8be0c Compare May 1, 2026 07:21
GrapeBaBa added 3 commits May 1, 2026 15:43
The fork-choice compliance suite (consensus-specs#3831) introduces a
`viable_for_head_roots_and_weights` check that asserts the leaves and
weights of the internal `filter_block_tree(store, justified.root)`.
Head equivalence alone is too weak to catch filtered-tree regressions.

Add `IForkChoice.getViableHeads()` that returns the viable-for-head
leaves of the filtered tree paired with their weights:

- A node is a filter_block_tree leaf iff it is viable AND has no viable
  descendants. `maybeUpdateBestChildAndDescendant` only sets `bestChild`
  to a child that `nodeLeadsToViableHead`, so `bestChild === undefined`
  iff no viable descendant. Combined with `nodeIsViableForHead`, this
  matches the spec exactly. Same approach as Teku's
  `ForkChoiceStrategy.getChainHeads`.
- Weight is converted from internal increment units to Gwei on read
  (compliance fixtures expect Gwei). Note the proposer-boost score
  contribution is also stored in increments and may round down on
  minimal preset (102 vs 102.4 ETH); attestation weight is exact, the
  rounding only surfaces while boost is active. Mainnet boost values
  are integer ETH so this is a non-issue there.
Wire the consensus-specs Fork Choice Compliance suite (ChainSafe#3831) into
the existing `forkChoiceTest` runner. The on-disk layout matches the
standard spec-test layout
(`tests/<preset>/<fork>/fork_choice_compliance/<handler>/<suite>/<case>/`),
so it slots in alongside `fork_choice` and `sync` runners.

Three test-only accommodations the compliance fixtures require:

1. `bls_setting: 2` — every compliance fixture uses placeholder
   signatures. Pass `validSignatures: testcase.meta?.bls_setting !==
   BigInt(1)` to `chain.processBlock` so verification short-circuits.
   Standard `fork_choice` fixtures use `bls_setting: 1` so behavior
   there is unchanged.

2. `BLOCK_ERROR_ALREADY_KNOWN` — compliance fixtures intentionally
   re-import the same block (`dup_shift` mutations in their
   `meta.yaml`). Spec semantics for `on_block(store, known_block)` is
   a no-op success. Production block import correctly rejects with
   ALREADY_KNOWN; this runner treats that case as success only when
   the step is `valid: true`.

3. Cross-epoch attestation shuffling — `on_attestation` decodes
   aggregation_bits using the state at the attestation's target
   checkpoint, not the head state. The runner now resolves the right
   shuffling via ShufflingCache + regen (mirroring the production
   validation path) instead of `headState.epochCtx.getIndexedAttestation`,
   which only worked when the attestation's epoch happened to be in
   the head's epoch cache (±1 epoch) and broke on cross-epoch fork
   attestations surfaced by the compliance suite.

Adds support for two compliance-only check fields:

- `viable_for_head_roots_and_weights` (consensus-specs#3831): compared
  via `getViableHeads()`. Both sides are sorted by root before
  comparison since the spec doesn't fix order.
- `head_payload_status` (gloas): mapped between our internal enum
  ordering (PENDING=0, EMPTY=1, FULL=2) and spec ordering (EMPTY=0,
  FULL=1, PENDING=2).

Pass rate against the latest comptests workflow `small.tar.gz` artifact:

  fulu/fork_choice_compliance: 253/1472 cases pass (17.2%)

Top remaining failures:
- ~80% `Invalid proposer boost root` — consensus-specs#4807 introduced
  a `block.proposer_index == get_beacon_proposer_index(head_state)`
  guard in `update_proposer_boost_root` that we do not yet implement;
  affects all forks (not just gloas equivocation handling). Tracked
  for follow-up alongside ChainSafe#9233.
- ~1% `Invalid viable heads` — proposer-boost rounding on minimal
  preset (see `getViableHeads()` weight note).
Add `scripts/download-compliance-fc-tests.sh` plus an npm script wrapper
(`pnpm download-compliance-fc-tests`) for fetching consensus-specs Fork
Choice Compliance test fixtures.

These fixtures are NOT in the standard `download-spec-tests` bundle —
the consensus-specs `release.yml` workflow does not invoke
`make comptests`. Instead they are produced by the scheduled
"Compliance Tests" workflow (`comptests.yml`) and published only as a
transient GitHub Actions artifact (`<config>.tar.gz`). The script
mirrors Prysm's `hack/compliance-fc-report.sh` resolution order and
handles three modes:

  --tarball <path>   Use a local .tar.gz file
  --run-id <id>      Download artifact from a specific GH Actions run
  (default)          Auto-fetch the latest successful run via `gh`

The artifact is extracted into `packages/beacon-node/spec-tests/`
alongside the release-tarball output. Top-level paths in the artifact
are `tests/<preset>/<fork>/fork_choice_compliance/...`, which do not
collide with anything emitted by the standard release tarball, so
extraction is purely additive — no path remapping needed in the runner.

The GH artifact API may return either raw `.tar.gz` (when
`upload-artifact` used `archive: false`, current workflow behavior) or
a zip-wrapped one. The script sniffs the first two magic bytes and
unwraps the zip case if needed.

Default config is `small` (~89 MB compressed, ~950 MB extracted, ~2900
cases). Cache lives at `$LODESTAR_COMPLIANCE_FC_CACHE` (default
`/var/tmp/lodestar_compliance_fc_cache`).

CI is intentionally not wired in this PR — the runner cleanly skips
the compliance suite when the fixtures are absent, so existing
`test:spec:*` jobs are unaffected. Devs and reviewers run on demand:

  pnpm download-compliance-fc-tests
  pnpm test:spec:minimal -t fork_choice_compliance
@GrapeBaBa GrapeBaBa force-pushed the gr/feat-fork-choice-compliance branch from ae8be0c to a31929d Compare May 1, 2026 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant