fix: builder payment quorum integer overflow at mainnet-scale stake#9350

Open
parithosh wants to merge 1 commit into
ChainSafe:glamsterdam-devnet-3 from
parithosh:fix/builder-payment-quorum-overflow

Conversation

@parithosh

Summary

getBuilderPaymentQuorumThreshold (in packages/state-transition/src/util/gloas.ts) computes totalActiveBalanceIncrements * EFFECTIVE_BALANCE_INCREMENT as a JS number. The intermediate gwei product crosses Number.MAX_SAFE_INTEGER (2^53 - 1 ≈ 9.007 × 10^15) once total active stake passes ~9 M ETH, silently losing precision. Other CL clients (Prysm, Lighthouse, Teku, Nimbus, Grandine) compute the spec-exact uint64 result, so a Gloas-enabled mainnet would see Lodestar diverge on the post-state root at the first epoch transition that promotes any builder payment near the quorum boundary, forking Lodestar nodes off the network.

The same overflow class also affects the per-slot BuilderPendingPayment.weight accumulator in processAttestationsAltair.ts: cumulative attesting weight against a single slot can exceed 2^53 gwei at mainnet stake.

Why now

Bug is dormant on glamsterdam-devnet-3 (~50 k ETH stake — two orders of magnitude below the 2^53 boundary, so f64 and uint64 agree to the bit) and on every other current network because Gloas isn't activated anywhere yet. It would activate on day one of Gloas mainnet with no attacker required — the network's stake itself is the trigger. Landing the fix before any Gloas activation costs nothing; landing it after costs a fork.

Arithmetic

| Scenario | totalActiveBalanceIncrements (ETH) | × 1e9 (gwei) | > 2^53? |
|---|---|---|---|
| glamsterdam-devnet-3 (~50 k ETH) | 5 × 10^4 | 5 × 10^13 | No (180× under) |
| Mainnet today (~35 M ETH) | 3.5 × 10^7 | 3.5 × 10^16 | Yes (3.88× over) |
| MaxEB worst case (~64 M ETH) | 6.4 × 10^7 | 6.4 × 10^16 | Yes (7.1× over) |

Changes

| File | Change |
|---|---|
| `packages/types/src/gloas/sszTypes.ts` | `BuilderPendingPayment.weight`: `UintNum64` → `UintBn64`. On-wire encoding identical (uint64 LE); local TS type becomes `bigint`, matching the spec's domain. |
| `packages/state-transition/src/util/gloas.ts` | `getBuilderPaymentQuorumThreshold` rewritten to use bigint intermediates; return type `number` → `bigint`. |
| `packages/state-transition/src/block/processAttestationsAltair.ts` | Per-slot builder weight accumulator: `Map<number, number>` → `Map<number, bigint>`; arithmetic in bigint. |
| `packages/state-transition/src/block/processExecutionPayloadBid.ts` | Initial weight: `0` literal → `0n`. |
| `packages/state-transition/test/unit/util/gloas.test.ts` (new) | Precision regression test: parameterised across 50 k / 9 M / 35 M / 64 M ETH stake, asserts equivalence with a bigint reference. Brackets the f64 boundary. |

The comparison payment.weight >= quorum in processBuilderPendingPayments.ts:14 needed no code change — both sides are now bigint, comparison flows through.
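
For readers following the change list, a minimal sketch of a bigint-domain threshold computation is given here. It mirrors the spec formula quoted later in this thread and is not the PR's actual diff: the function name `quorumThresholdBigint` and its plain-number parameter are hypothetical, and the constants are inlined with the values cited in the conversation rather than imported from `@lodestar/params`.

```typescript
// Values as quoted in this thread; assumed here, not imported from @lodestar/params.
const EFFECTIVE_BALANCE_INCREMENT = 1_000_000_000; // gwei per increment (1 ETH)
const SLOTS_PER_EPOCH = 32;
const BUILDER_PAYMENT_THRESHOLD_NUMERATOR = 6;
const BUILDER_PAYMENT_THRESHOLD_DENOMINATOR = 10;

// Hypothetical stand-in for getBuilderPaymentQuorumThreshold. It takes the
// increment count directly instead of reading it from a cached beacon state.
function quorumThresholdBigint(totalActiveBalanceIncrements: number): bigint {
  // Convert to bigint before multiplying so the gwei product never exists as a number.
  const totalActiveBalance =
    BigInt(totalActiveBalanceIncrements) * BigInt(EFFECTIVE_BALANCE_INCREMENT);
  // bigint division truncates toward zero, which matches Python's // for non-negative values.
  const perSlot = totalActiveBalance / BigInt(SLOTS_PER_EPOCH);
  return (
    (perSlot * BigInt(BUILDER_PAYMENT_THRESHOLD_NUMERATOR)) /
    BigInt(BUILDER_PAYMENT_THRESHOLD_DENOMINATOR)
  );
}

// 35 M ETH of active stake gives a per-slot quorum of 656_250_000_000_000 gwei.
console.log(quorumThresholdBigint(35_000_000));
```

With both `payment.weight` and the threshold typed as `bigint`, a `>=` comparison between them needs no conversion on either side.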

Scope

Fully Gloas-only:

  • BuilderPendingPayment is a Gloas-only SSZ container; not embedded pre-Gloas.
  • processBuilderPendingPayments is registered only in the Gloas branch (epoch/index.ts:166-168).
  • The accumulator block in processAttestationsAltair.ts:146 is gated behind if (fork >= ForkSeq.gloas).
  • getBuilderPaymentQuorumThreshold and processExecutionPayloadBid are Gloas-only.
  • Wire compatibility: UintBn64 and UintNum64 produce bit-identical SSZ bytes — state roots / block roots / gossip serialisation unchanged.
  • Behavioural compatibility: at sub-2^53 stake the bigint result equals the prior f64 result to the bit, so existing devnet-3 nodes with and without this fix will produce identical state roots.
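
The wire-compatibility point can be illustrated with a small sketch that serialises the same value as a little-endian uint64 from a `number` and from a `bigint` and compares the bytes. This uses only the standard `DataView` API; the helper names are hypothetical and the code does not reproduce how `@chainsafe/ssz` implements `UintNum64` or `UintBn64` internally.

```typescript
// Encode a safe-integer number as 8 little-endian bytes (two 32-bit halves).
// Assumes 0 <= value <= Number.MAX_SAFE_INTEGER.
function encodeUint64LEFromNumber(value: number): Uint8Array {
  const buffer = new ArrayBuffer(8);
  const view = new DataView(buffer);
  view.setUint32(0, value % 2 ** 32, true); // low 32 bits
  view.setUint32(4, Math.floor(value / 2 ** 32), true); // high 32 bits
  return new Uint8Array(buffer);
}

// Encode a bigint as 8 little-endian bytes.
function encodeUint64LEFromBigint(value: bigint): Uint8Array {
  const buffer = new ArrayBuffer(8);
  new DataView(buffer).setBigUint64(0, value, true);
  return new Uint8Array(buffer);
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  return a.length === b.length && a.every((byte, i) => byte === b[i]);
}

// The same weight in both domains serialises to identical bytes.
const fromNumber = encodeUint64LEFromNumber(656_250_000_000_000);
const fromBigint = encodeUint64LEFromBigint(656_250_000_000_000n);
console.log(sameBytes(fromNumber, fromBigint)); // true
```

Only the in-memory TypeScript representation differs between the two; the serialised bytes, and therefore anything hashed from them, are the same.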

Test plan

  • pnpm --filter @lodestar/types build — clean
  • pnpm --filter @lodestar/state-transition exec vitest run --project unit - 244/244 passed, includes new gloas.test.ts
  • pnpm lint on @lodestar/types and @lodestar/state-transition — clean
  • Spec tests (pnpm download-spec-tests && pnpm test:spec) — not run locally; CI will cover
  • Devnet smoke run on glamsterdam-devnet-3 once merged — expected behaviour unchanged at this stake
  • pnpm check-types repo-wide — not run; note that packages/state-transition/src/util/validator.ts has 4 pre-existing build errors on glamsterdam-devnet-3 (and unstable) introduced by 87cbe69c66 (EIP-8061 churn), unrelated to this PR

AI disclosure

Fix drafted with assistance from Claude (Opus 4.7, 1 M context). Bug originally surfaced from an audit-style code-read of the Gloas state-transition path; arithmetic, fix design, and PR text reviewed and authored interactively with parithosh.

🤖 Generated with Claude Code

`getBuilderPaymentQuorumThreshold` computed
`totalActiveBalanceIncrements * EFFECTIVE_BALANCE_INCREMENT` as a JS
`number`. The intermediate gwei product crosses `Number.MAX_SAFE_INTEGER`
(2^53 - 1) once total active stake passes ~9M ETH, silently losing
precision. Other clients compute the spec-exact uint64 result, so a
Gloas-enabled mainnet would see Lodestar diverge on the post-state root
at the first epoch transition that promotes builder payments through the
quorum check, forking Lodestar nodes off the network.

Two sites overflow:
- `getBuilderPaymentQuorumThreshold` (gloas.ts) - the threshold itself.
- `BuilderPendingPayment.weight` accumulator in
  `processAttestationsAltair.ts` - per-slot gwei weight is also in the
  10^16 range at mainnet-scale stake.

Fix:
- Switch `BuilderPendingPayment.weight` SSZ field from `UintNum64` to
  `UintBn64`. On-wire encoding is identical (uint64 LE); local TS type
  becomes `bigint`, matching the spec's domain.
- Use bigint arithmetic in `getBuilderPaymentQuorumThreshold` and in the
  per-slot weight accumulator. The threshold function now returns
  `bigint`; the comparison in `processBuilderPendingPayments` flows
  through unchanged.

Scope: Gloas-only. `BuilderPendingPayment`, the threshold function, and
the gloas branch in `processAttestationsAltair` are all unreachable
pre-Gloas. Bug is dormant on `glamsterdam-devnet-3` (~50k ETH stake,
two orders of magnitude below the precision boundary) and on any
network without Gloas activated, so this can land before any divergence
risk materialises.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@parithosh parithosh requested a review from a team as a code owner May 9, 2026 17:40
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request migrates builder weight and quorum threshold calculations from numbers to bigints to prevent precision loss when the total active stake exceeds approximately 9 million ETH. Key changes include updating the builderWeightMap in attestation processing, modifying getBuilderPaymentQuorumThreshold to use bigint arithmetic, and updating the SSZ type for BuilderPendingPayment. New unit tests were added to verify the quorum threshold calculations across various stake levels. I have no feedback to provide.

Contributor

@lodekeeper lodekeeper left a comment

LGTM. Tightly scoped fix for the f64 precision boundary at mainnet-scale stake.

Spec match: arithmetic order in getBuilderPaymentQuorumThreshold mirrors the spec ((total_active_balance // SLOTS_PER_EPOCH * NUMERATOR) // DENOMINATOR); bigint floor-division agrees with Python //.

SSZ retype is the right tool. Pushing bigint into the type system means weight reads/writes stay correct by construction across all consumers. SSZ wire format is bit-identical (UintNum64 and UintBn64 both encode as uint64 LE), so state/block/gossip roots are unchanged and devnet-3 nodes at ~50k ETH stake produce identical state roots with and without this patch.

Consumer audit (all covered):

  • getBuilderPaymentQuorumThreshold has one call site (processBuilderPendingPayments.ts:10) — payment.weight >= quorum comparison flows through unchanged with both sides bigint.
  • payment.weight production reads/writes: processAttestationsAltair.ts:153,162 and processBuilderPendingPayments.ts:14 — all handled.
  • Default construction via BuilderPendingPayment.defaultViewDU() in processProposerSlashing.ts:50 and processParentExecutionPayload.ts:109 continues to work (SSZ default for UintBn64 is 0n).

Per-attestation paymentWeightToAdd stays as number safely (bounded by committee_size × max-increment, well under 2^53) and is converted to bigint before the multiply by EFFECTIVE_BALANCE_INCREMENT — correct.

Non-blocking suggestion (test coverage gap): the new gloas.test.ts covers the threshold path well, but the processAttestationsAltair accumulator path has no direct unit test at mainnet-scale stake. Harder to set up (needs mock state + attestations) so reasonable to leave for a follow-up; spec tests will exercise it indirectly.

builderWeightMap.get(builderPendingPaymentIndex) ??
(state as CachedBeaconStateGloas).builderPendingPayments.get(builderPendingPaymentIndex).weight;
- const updatedWeight = existingWeight + paymentWeightToAdd * EFFECTIVE_BALANCE_INCREMENT;
+ const updatedWeight = existingWeight + BigInt(paymentWeightToAdd) * BigInt(EFFECTIVE_BALANCE_INCREMENT);
Contributor

🟢 Minor: BigInt(EFFECTIVE_BALANCE_INCREMENT) is re-evaluated every iteration of the attestation loop. Consider hoisting to a module-level constant for consistency with how other bigint constants are handled, e.g.:

Suggested change
- const updatedWeight = existingWeight + BigInt(paymentWeightToAdd) * BigInt(EFFECTIVE_BALANCE_INCREMENT);
+ const updatedWeight = existingWeight + BigInt(paymentWeightToAdd) * EFFECTIVE_BALANCE_INCREMENT_BN;

with const EFFECTIVE_BALANCE_INCREMENT_BN = BigInt(EFFECTIVE_BALANCE_INCREMENT); at module scope. Probably negligible at runtime (V8 caches small bigint constants), but cheap and idiomatic.
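
A self-contained sketch of the hoisting suggestion above, under stated assumptions: `addBuilderWeight` and its `storedWeight` parameter are hypothetical stand-ins for the loop body in `processAttestationsAltair.ts` and the weight read from state, and the constant is inlined with the value cited in this thread.

```typescript
// Assumed value; in Lodestar this would come from @lodestar/params.
const EFFECTIVE_BALANCE_INCREMENT = 1_000_000_000;

// Converted once at module scope instead of on every loop iteration.
const EFFECTIVE_BALANCE_INCREMENT_BN = BigInt(EFFECTIVE_BALANCE_INCREMENT);

// Per-slot accumulator keyed by builder pending payment index.
const builderWeightMap = new Map<number, bigint>();

// Hypothetical helper standing in for the accumulator update in the attestation loop.
// storedWeight plays the role of the weight already recorded in state.
function addBuilderWeight(
  builderPendingPaymentIndex: number,
  paymentWeightToAdd: number, // in increments, small enough to stay a number
  storedWeight: bigint
): bigint {
  const existingWeight = builderWeightMap.get(builderPendingPaymentIndex) ?? storedWeight;
  const updatedWeight =
    existingWeight + BigInt(paymentWeightToAdd) * EFFECTIVE_BALANCE_INCREMENT_BN;
  builderWeightMap.set(builderPendingPaymentIndex, updatedWeight);
  return updatedWeight;
}

console.log(addBuilderWeight(3, 2048, 0n)); // 2048 increments expressed in gwei
```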

@pk910

pk910 commented May 10, 2026

@lodekeeper
I think the report is not right for this particular case:

The float would lose precision in the calculation only if EFFECTIVE_BALANCE_INCREMENT (1_000_000_000) had a slightly different value, such as a prime (999_999_937), or effectively anything that is not divisible by 2 multiple times, so that its binary representation does not end with 9 zero bits as 1_000_000_000's does.
This comes down to how floating-point numbers are encoded (a significand plus a power-of-2 exponent).
The reported arithmetic expression is in the unsafe value range, but in this particular case (multiplied by the static 1_000_000_000), it is still ensured to be precise up to about 1.537 billion ETH staked, which is far from reachable on any network.
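
The trailing-zero argument can be checked directly. The sketch below multiplies an odd stake value by 1_000_000_000 and by the odd multiplier 999_999_937 mentioned above, then compares each f64 product against an exact bigint product. The stake value 35_000_017 is chosen only because it is odd; an even stake would contribute trailing zero bits of its own and could mask the effect.

```typescript
const stakeEth = 35_000_017; // odd, so it adds no trailing zero bits of its own

// Exact products in the bigint domain.
const exactWith1e9 = BigInt(stakeEth) * 1_000_000_000n;
const exactWithOdd = BigInt(stakeEth) * 999_999_937n;

// The same products computed as IEEE-754 doubles.
const floatWith1e9 = stakeEth * 1_000_000_000;
const floatWithOdd = stakeEth * 999_999_937;

// Both products are beyond Number.MAX_SAFE_INTEGER...
console.log(floatWith1e9 > Number.MAX_SAFE_INTEGER); // true
console.log(floatWithOdd > Number.MAX_SAFE_INTEGER); // true

// ...but only the one with nine trailing zero bits is still exact.
console.log(BigInt(floatWith1e9) === exactWith1e9); // true
console.log(BigInt(floatWithOdd) === exactWithOdd); // false
```

The second product is an odd integer above 2^53, which a double cannot hold, so it is rounded; the first fits because its significand after removing the factor 2^9 is well under 53 bits.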

claude review:

Verdict: The reported bug does NOT exist. The report rests on a false premise.                                                                                                                                                                                      
                                                                                          
  Setup: Checked out glamsterdam-devnet-3 of ChainSafe/lodestar (HEAD 05a33e512f). Every file path, line number, and code snippet in the report matches verbatim.                                                                                                     
                                                                                                                                                                                                                                                                      
  What's right in the report                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                      
  - The vulnerable function exists exactly as quoted.                                                                                                                                                                                                                 
  - The constants are right (EFFECTIVE_BALANCE_INCREMENT = 1e9, SLOTS_PER_EPOCH = 32, BUILDER_PAYMENT_THRESHOLD_NUMERATOR/DENOMINATOR = 6/10).
  - Lodestar genuinely stores totalActiveBalanceIncrements in ETH-units to dodge Number.MAX_SAFE_INTEGER.                                                                                                                                                             
  - BuilderPendingPayment.weight is indeed typed UintNum64 (number-domain), not UintBn64 (bigint).                                                                                                                                                                    
  - The function does feed a state-tree mutation each epoch transition.                                                                                                                                                                                               
                                                                                                                                                                                                                                                                      
  Where the report is wrong (the load-bearing claim)                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                      
  The report equates "value > 2^53" with "loses precision in IEEE-754". That's incorrect.                                                                                                                                                                             
   
  EFFECTIVE_BALANCE_INCREMENT = 1,000,000,000 = 2^9 × 5^9 has nine trailing zero bits in binary. So T × 10^9 is exactly representable as a double whenever T × 5^9 < 2^53, i.e. T < ~4.61 billion ETH. The full pipeline's tightest step is the × 6, giving a real
  precision boundary at ~1.54 billion ETH, not 9 million.
                                                                                                                                                                                                                                                                      
  The report's worked example (3.5e16 × 6 = 2.1e17, ~23× past 2^53) also has an arithmetic-of-the-code mistake: the function divides by 32 before multiplying by 6, so step 3 is ~6.56e15, well below 2^53.
   
  The "second compounding site" (processAttestationsAltair.ts:153) is also fine: payment.weight is per-slot, not per-epoch, capped at total_active_balance / 32, and the same trailing-zero argument keeps the sum exact. 

  Empirical proof

  Two independent verifications:                                                                                                                                                                                                                                      
  1. Brute-force JS reproduction over every integer T ∈ [1M, 1B] ETH: zero divergences. First diff at T = 1,537,243,057 ETH.
  2. A vitest test added inside the lodestar repo (packages/state-transition/test/unit/util/gloasQuorumPrecision.test.ts) imports the real @lodestar/params constants and checks against a BigInt reference for 50 k, 1 M, 9 M, 9.01 M, 32 M, 35 M, 35,000,017, 64 M,
  122 M, and 1 B ETH: 12/12 pass, including an explicit assertion that T × 1e9 > Number.MAX_SAFE_INTEGER at 35 M ETH but the result is still bit-exact.
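
  A standalone sketch of this kind of check, for anyone who wants to reproduce it without the repo. It is not the test file named above: the constants are inlined rather than imported, and `quorumF64` is an assumed shape for the number-domain computation (divide by 32 before multiplying by 6, as described in this review), not a copy of Lodestar's function.

```typescript
const EFFECTIVE_BALANCE_INCREMENT = 1_000_000_000;
const SLOTS_PER_EPOCH = 32;
const NUMERATOR = 6;
const DENOMINATOR = 10;

// Assumed shape of the number-domain (f64) computation.
function quorumF64(totalIncrements: number): number {
  const perSlot = Math.floor((totalIncrements * EFFECTIVE_BALANCE_INCREMENT) / SLOTS_PER_EPOCH);
  return Math.floor((perSlot * NUMERATOR) / DENOMINATOR);
}

// Exact reference in the bigint domain.
function quorumReference(totalIncrements: number): bigint {
  const perSlot =
    (BigInt(totalIncrements) * BigInt(EFFECTIVE_BALANCE_INCREMENT)) / BigInt(SLOTS_PER_EPOCH);
  return (perSlot * BigInt(NUMERATOR)) / BigInt(DENOMINATOR);
}

// Stake levels (in ETH) listed in the review above.
const stakesEth = [
  50_000, 1_000_000, 9_000_000, 9_010_000, 32_000_000,
  35_000_000, 35_000_017, 64_000_000, 122_000_000, 1_000_000_000,
];

// Collect every stake level where the f64 result differs from the reference.
const mismatches = stakesEth.filter((t) => BigInt(quorumF64(t)) !== quorumReference(t));
console.log(mismatches.length); // 0

// The intermediate gwei product at 35 M ETH is above MAX_SAFE_INTEGER all the same.
console.log(35_000_000 * EFFECTIVE_BALANCE_INCREMENT > Number.MAX_SAFE_INTEGER); // true
```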

  Recommendation                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                      
  No fix is needed for glamsterdam-devnet-3 or for Gloas mainnet activation. As defence-in-depth, the existing guard at epochCache.ts:392 could be tightened to ~1.54 B ETH (the actual precision boundary) instead of ~9 quadrillion ETH, but it's not               
  consensus-blocking.

@lodekeeper
Contributor

@pk910 you're right — thanks for the correction.

The math: EFFECTIVE_BALANCE_INCREMENT = 10^9 = 2^9 × 5^9 has nine trailing zero bits, so T × 10^9 is exact in f64 as long as T × 5^9 < 2^53, giving T < ~4.61B ETH for the multiplication step. The /SLOTS_PER_EPOCH is just a 2^-5 exponent shift, and the × NUMERATOR / DENOMINATOR (× 6 / 10 = × 3 / 5) introduces a factor of 3 that pushes the real precision boundary to T × 3 × 5^9 < 2^53, i.e. T < ~1.537B ETH. Far past any plausible network stake.
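
The two boundaries in the paragraph above follow from integer division alone, so they can be computed exactly with bigint. This is a worked check of the arithmetic, with variable names chosen for illustration.

```typescript
const TWO_POW_53 = 2n ** 53n;
const FIVE_POW_9 = 5n ** 9n;

// 10^9 factors as 2^9 × 5^9, which is where the nine trailing zero bits come from.
console.log(10n ** 9n === 2n ** 9n * FIVE_POW_9); // true

// T × 10^9 is exact in f64 while T × 5^9 < 2^53.
const productBoundaryEth = TWO_POW_53 / FIVE_POW_9;
console.log(productBoundaryEth); // 4611686018n, about 4.61 B ETH

// The × 6 step contributes an odd factor of 3, so the pipeline needs T × 3 × 5^9 < 2^53.
const pipelineBoundaryEth = TWO_POW_53 / (3n * FIVE_POW_9);
console.log(pipelineBoundaryEth); // 1537228672n, about 1.537 B ETH
```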

Numerically confirmed (BigInt vs f64): T = 35M ETH, T = 1.5B ETH, T = 1.6B ETH all match exactly; first divergence is around T ≈ 1.537B ETH.

So the "overflows at ~9M ETH" framing rests on the naive value > 2^53 ⇒ imprecise premise, which doesn't hold for products with constants whose binary representation has many trailing zeros. My LGTM review repeated that framing — that was wrong, apologies.

That said, I still think the PR is worth keeping as defence-in-depth:

  • The trailing-zero-bits invariant is non-obvious and silently bound to EFFECTIVE_BALANCE_INCREMENT, SLOTS_PER_EPOCH, BUILDER_PAYMENT_THRESHOLD_NUMERATOR/DENOMINATOR, and the arithmetic order. Anyone touching those constants or the order of operations has to re-prove it.
  • Pushing bigint into the SSZ type makes the payment.weight >= quorum comparison in processBuilderPendingPayments.ts correct by construction across all current and future consumers.
  • SSZ wire format is unchanged (UintNum64 and UintBn64 both encode as uint64 LE), so it's a no-op on state-root / network level.

@parithosh — worth amending the PR description / commit message to drop the "9M ETH overflow" claim and reframe as something like "remove dependence on f64 trailing-zero precision invariant" or "make builder payment quorum consensus-safe by construction", otherwise the misleading framing will live on in the merged history.

3 participants