feat: integrate fork-choice compliance spec test suite#9314
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces support for the fork-choice compliance test suite by adding a fixture download script and updating the test runner to handle cross-epoch attestations and duplicate block imports. It also implements a getViableHeads method in the fork-choice logic to allow verification of the internal filtered block tree state. Feedback was provided to use bigint instead of number for weight calculations in getViableHeads to prevent potential precision loss on Mainnet.
| getViableHeads(currentSlot: Slot): {root: RootHex; weight: number}[] { | ||
| const result: {root: RootHex; weight: number}[] = []; | ||
| for (const node of this.nodes) { | ||
| if (node.bestChild === undefined && this.nodeIsViableForHead(node, currentSlot)) { | ||
| result.push({root: node.blockRoot, weight: node.weight * EFFECTIVE_BALANCE_INCREMENT}); | ||
| } | ||
| } | ||
| return result; | ||
| } |
There was a problem hiding this comment.
The current implementation of getViableHeads uses number for weights, which leads to precision loss on Mainnet.
In Lodestar, node.weight is stored in EFFECTIVE_BALANCE_INCREMENT units (1 increment = 1 ETH = 10^9 Gwei). On Mainnet, the total active balance is currently ~33M ETH. Multiplying 33,000,000 * 1,000,000,000 results in 3.3 * 10^16, which exceeds Number.MAX_SAFE_INTEGER (9,007,199,254,740,991 ≈ 9 * 10^15).
To maintain accuracy for production use cases (even if compliance tests currently use smaller weights), this should return bigint.
| getViableHeads(currentSlot: Slot): {root: RootHex; weight: number}[] { | |
| const result: {root: RootHex; weight: number}[] = []; | |
| for (const node of this.nodes) { | |
| if (node.bestChild === undefined && this.nodeIsViableForHead(node, currentSlot)) { | |
| result.push({root: node.blockRoot, weight: node.weight * EFFECTIVE_BALANCE_INCREMENT}); | |
| } | |
| } | |
| return result; | |
| } | |
| getViableHeads(currentSlot: Slot): {root: RootHex; weight: bigint}[] { | |
| const result: {root: RootHex; weight: bigint}[] = []; | |
| for (const node of this.nodes) { | |
| if (node.bestChild === undefined && this.nodeIsViableForHead(node, currentSlot)) { | |
| result.push({root: node.blockRoot, weight: BigInt(node.weight) * BigInt(EFFECTIVE_BALANCE_INCREMENT)}); | |
| } | |
| } | |
| return result; | |
| } |
| getViableHeads(): {root: RootHex; weight: number}[] { | ||
| return this.protoArray.getViableHeads(this.fcStore.currentSlot); | ||
| } |
There was a problem hiding this comment.
Update the return type to bigint to match the fix in ProtoArray and avoid precision loss.
| getViableHeads(): {root: RootHex; weight: number}[] { | |
| return this.protoArray.getViableHeads(this.fcStore.currentSlot); | |
| } | |
| getViableHeads(): {root: RootHex; weight: bigint}[] { | |
| return this.protoArray.getViableHeads(this.fcStore.currentSlot); | |
| } |
| * Retrieves all viable-for-head leaves of the filtered_block_tree along with their weights. | ||
| * Used by the fork-choice compliance test suite (`viable_for_head_roots_and_weights`). | ||
| */ | ||
| getViableHeads(): {root: RootHex; weight: number}[]; |
| const expected = step.checks.viable_for_head_roots_and_weights | ||
| .map((entry) => ({root: entry.root, weight: bnToNum(entry.weight)})) | ||
| .sort((a, b) => a.root.localeCompare(b.root)); | ||
| const actual = (chain.forkChoice as ForkChoice) | ||
| .getViableHeads() | ||
| .map(({root, weight}) => ({root, weight})) | ||
| .sort((a, b) => a.root.localeCompare(b.root)); | ||
| expect(actual).toEqualWithMessage(expected, `Invalid viable heads at step ${i}`); |
There was a problem hiding this comment.
Since getViableHeads should return bigint to avoid precision loss, the test comparison should also operate on bigint values directly instead of converting them to number via bnToNum.
| const expected = step.checks.viable_for_head_roots_and_weights | |
| .map((entry) => ({root: entry.root, weight: bnToNum(entry.weight)})) | |
| .sort((a, b) => a.root.localeCompare(b.root)); | |
| const actual = (chain.forkChoice as ForkChoice) | |
| .getViableHeads() | |
| .map(({root, weight}) => ({root, weight})) | |
| .sort((a, b) => a.root.localeCompare(b.root)); | |
| expect(actual).toEqualWithMessage(expected, `Invalid viable heads at step ${i}`); | |
| const expected = step.checks.viable_for_head_roots_and_weights | |
| .map((entry) => ({root: entry.root, weight: entry.weight})) | |
| .sort((a, b) => a.root.localeCompare(b.root)); | |
| const actual = (chain.forkChoice as ForkChoice) | |
| .getViableHeads() | |
| .sort((a, b) => a.root.localeCompare(b.root)); | |
| expect(actual).toEqualWithMessage(expected, `Invalid viable heads at step ${i}`); |
348726d to
ae8be0c
Compare
The fork-choice compliance suite (consensus-specs#3831) introduces a `viable_for_head_roots_and_weights` check that asserts the leaves and weights of the internal `filter_block_tree(store, justified.root)`. Head equivalence alone is too weak to catch filtered-tree regressions. Add `IForkChoice.getViableHeads()` that returns the viable-for-head leaves of the filtered tree paired with their weights: - A node is a filter_block_tree leaf iff it is viable AND has no viable descendants. `maybeUpdateBestChildAndDescendant` only sets `bestChild` to a child that `nodeLeadsToViableHead`, so `bestChild === undefined` iff no viable descendant. Combined with `nodeIsViableForHead`, this matches the spec exactly. Same approach as Teku's `ForkChoiceStrategy.getChainHeads`. - Weight is converted from internal increment units to Gwei on read (compliance fixtures expect Gwei). Note the proposer-boost score contribution is also stored in increments and may round down on minimal preset (102 vs 102.4 ETH); attestation weight is exact, the rounding only surfaces while boost is active. Mainnet boost values are integer ETH so this is a non-issue there.
Wire the consensus-specs Fork Choice Compliance suite (ChainSafe#3831) into the existing `forkChoiceTest` runner. The on-disk layout matches the standard spec-test layout (`tests/<preset>/<fork>/fork_choice_compliance/<handler>/<suite>/<case>/`), so it slots in alongside `fork_choice` and `sync` runners. Three test-only accommodations the compliance fixtures require: 1. `bls_setting: 2` — every compliance fixture uses placeholder signatures. Pass `validSignatures: testcase.meta?.bls_setting !== BigInt(1)` to `chain.processBlock` so verification short-circuits. Standard `fork_choice` fixtures use `bls_setting: 1` so behavior there is unchanged. 2. `BLOCK_ERROR_ALREADY_KNOWN` — compliance fixtures intentionally re-import the same block (`dup_shift` mutations in their `meta.yaml`). Spec semantics for `on_block(store, known_block)` is a no-op success. Production block import correctly rejects with ALREADY_KNOWN; this runner treats that case as success only when the step is `valid: true`. 3. Cross-epoch attestation shuffling — `on_attestation` decodes aggregation_bits using the state at the attestation's target checkpoint, not the head state. The runner now resolves the right shuffling via ShufflingCache + regen (mirroring the production validation path) instead of `headState.epochCtx.getIndexedAttestation`, which only worked when the attestation's epoch happened to be in the head's epoch cache (±1 epoch) and broke on cross-epoch fork attestations surfaced by the compliance suite. Adds support for two compliance-only check fields: - `viable_for_head_roots_and_weights` (consensus-specs#3831): compared via `getViableHeads()`. Both sides are sorted by root before comparison since the spec doesn't fix order. - `head_payload_status` (gloas): mapped between our internal enum ordering (PENDING=0, EMPTY=1, FULL=2) and spec ordering (EMPTY=0, FULL=1, PENDING=2). Pass rate against the latest comptests workflow `small.tar.gz` artifact: fulu/fork_choice_compliance: 253/1472 cases pass (17.2%) Top remaining failures: - ~80% `Invalid proposer boost root` — consensus-specs#4807 introduced a `block.proposer_index == get_beacon_proposer_index(head_state)` guard in `update_proposer_boost_root` that we do not yet implement; affects all forks (not just gloas equivocation handling). Tracked for follow-up alongside ChainSafe#9233. - ~1% `Invalid viable heads` — proposer-boost rounding on minimal preset (see `getViableHeads()` weight note).
Add `scripts/download-compliance-fc-tests.sh` plus an npm script wrapper (`pnpm download-compliance-fc-tests`) for fetching consensus-specs Fork Choice Compliance test fixtures. These fixtures are NOT in the standard `download-spec-tests` bundle — the consensus-specs `release.yml` workflow does not invoke `make comptests`. Instead they are produced by the scheduled "Compliance Tests" workflow (`comptests.yml`) and published only as a transient GitHub Actions artifact (`<config>.tar.gz`). The script mirrors Prysm's `hack/compliance-fc-report.sh` resolution order and handles three modes: --tarball <path> Use a local .tar.gz file --run-id <id> Download artifact from a specific GH Actions run (default) Auto-fetch the latest successful run via `gh` The artifact is extracted into `packages/beacon-node/spec-tests/` alongside the release-tarball output. Top-level paths in the artifact are `tests/<preset>/<fork>/fork_choice_compliance/...`, which do not collide with anything emitted by the standard release tarball, so extraction is purely additive — no path remapping needed in the runner. The GH artifact API may return either raw `.tar.gz` (when `upload-artifact` used `archive: false`, current workflow behavior) or a zip-wrapped one. The script sniffs the first two magic bytes and unwraps the zip case if needed. Default config is `small` (~89 MB compressed, ~950 MB extracted, ~2900 cases). Cache lives at `$LODESTAR_COMPLIANCE_FC_CACHE` (default `/var/tmp/lodestar_compliance_fc_cache`). CI is intentionally not wired in this PR — the runner cleanly skips the compliance suite when the fixtures are absent, so existing `test:spec:*` jobs are unaffected. Devs and reviewers run on demand: pnpm download-compliance-fc-tests pnpm test:spec:minimal -t fork_choice_compliance
ae8be0c to
a31929d
Compare
Summary
forkChoiceTestrunner so it runs alongsidefork_choiceandsyncrunners undertest:spec:*once fixtures are downloaded.getViableHeads()to fork-choice and verify the compliance suite'sviable_for_head_roots_and_weightsinvariant against it (the new check that motivates the suite).scripts/download-compliance-fc-tests.shfor fetching the consensus-specs Compliance Tests workflow artifact (the fixtures are NOT bundled with the standard release tarball).Marked draft because there are real fork-choice gaps (notably consensus-specs#4807) that are NOT addressed here — see Known follow-ups below.
Why a separate download path
release.ymlon consensus-specs only invokesmake test ... reftests=true, which builds the standard reftests bundle. Compliance tests are produced bymake comptestsand published only via the dailycomptests.ymlworkflow as a transient GH Actions artifact (<config>.tar.gz). They are not in thereleases/download/<tag>/<preset>.tar.gzURL space the existingdownload-spec-testsscript reads.The new script:
--tarball <path>/--run-id <id>/ auto-fetch latest successful runtar.gz(current workflow output,archive: false) and zip-wrapped responses fromactions/artifacts/<id>/zippackages/beacon-node/spec-tests/tests/..., the same root the release tarball uses — paths don't collide because the compliance generator only emitsfork_choice_compliance/subtreesCache lives at
\$LODESTAR_COMPLIANCE_FC_CACHE(default/var/tmp/lodestar_compliance_fc_cache).Test-runner accommodations
bls_setting: 2— every compliance fixture uses placeholder signatures. PassvalidSignatures: testcase.meta?.bls_setting !== BigInt(1)so verification short-circuits. Standardfork_choicefixtures usebls_setting: 1so behavior there is unchanged.BLOCK_ERROR_ALREADY_KNOWN— compliance fixtures intentionally re-import the same block (dup_shiftmutations in theirmeta.yaml). Spec semantics foron_block(store, known_block)is a no-op success. Production block import correctly rejects with ALREADY_KNOWN; the runner now treats that as success only when the step isvalid: true. Production block import path is untouched.on_attestationdecodes aggregation_bits using the state at the attestation's target checkpoint, not the head state. The runner now resolves the right shuffling via ShufflingCache + regen (mirroring the production validation path) instead ofheadState.epochCtx.getIndexedAttestation, which only worked when the attestation's epoch happened to be in the head's epoch cache (±1 epoch).Two new check fields supported:
viable_for_head_roots_and_weights(consensus-specs#3831) — compared againstgetViableHeads(). Both sides are sorted by root since spec doesn't fix order.head_payload_status(gloas) — mapped between our internal enum ordering (PENDING=0, EMPTY=1, FULL=2) and spec ordering (EMPTY=0, FULL=1, PENDING=2).getViableHeads()designA node is a
filter_block_treeleaf iff (a) it is viable AND (b) no descendant is viable.maybeUpdateBestChildAndDescendantonly setsbestChildto a child thatnodeLeadsToViableHead, sobestChild === undefined⟺ no viable descendant. Combined withnodeIsViableForHeadfor (a), this matches the spec exactly. Same approach as Teku'sForkChoiceStrategy.getChainHeads.Weights are returned in Gwei (compliance fixtures use Gwei). Internal storage is in
EFFECTIVE_BALANCE_INCREMENTunits, multiplied at read time. Note the proposer-boost score contribution is also stored in increments and may round down on minimal preset (102 vs 102.4 ETH); attestation weight is exact, the rounding only surfaces while boost is active. Mainnet boost values are integer ETH so this is a non-issue there.Current pass rate (fulu, small config, 1472 cases)
Per-handler:
Failure breakdown:
Invalid proposer boost root— see follow-up #1Invalid viable heads— proposer-boost score rounding on minimal presetFORKCHOICE_ERROR_INVALID_ATTESTATIONKnown follow-ups (not in this PR)
update_proposer_boost_rootproposer-index check) — the dominant class of failures. Spec now requiresblock.proposer_index == get_beacon_proposer_index(head_state)before granting boost; this applies to all forks, not just gloas. PR feat: implement should_apply_proposer_boost for gloas #9233 implements the gloas-equivocation half of docs: update docs/usage/local.md #4807 (should_apply_proposer_boost) but explicitly leaves pre-gloas boost behavior unconditional. Closing the boost-root gap is a separate, higher-risk change that should be reviewed independently.viable_for_head_roots_and_weightsweight tolerance — minimal-preset proposer-boost score lossy increment rounding causes false-positive weight mismatches. Could relax to set-equality on roots only, or add a ±1 increment tolerance.master; our@lodestar/typesandspec-tests-version.json(v1.7.0-alpha.5) lag. Belongs in a separate types-bump PR.Relation to PR #9290
#9290 takes a different infra path for the same goal: extract the runner into
forkChoiceRunner.ts, register a separatecompliance_fork_choicetest file, extract fixtures into a siblingspec-tests-compliance/<config>/directory. It explicitly skips theviable_for_head_roots_and_weightscheck, which is the main differentiator from this PR.Both approaches surface the same underlying fork-choice gaps; merging strategy is up to maintainers — happy to rebase on top of #9290 if its infra is preferred.
CI
Intentionally not wired. The runner cleanly skips the compliance suite when fixtures are absent, so existing
test:spec:*jobs are unaffected. Devs and reviewers run on demand:Test plan
pnpm check-typesclean (beacon-node, fork-choice, state-transition)pnpm lintcleanfork_choiceandsyncrunners unchanged in behavior — pure additive runner registration + factory annotationssmall.tar.gzartifact: 253/1472 pass, no infra-side failures, all failures attributable to spec gaps in Known follow-upsAI Assistance Disclosure
Used Claude Code.