Skip to content

fix: fix qwen3-235b deepseek-v3 h100 perf tests#2703

Open
yuki-97 wants to merge 5 commits into
mainfrom
yukih/fix-reference-logp
Open

fix: fix qwen3-235b deepseek-v3 h100 perf tests#2703
yuki-97 wants to merge 5 commits into
mainfrom
yukih/fix-reference-logp

Conversation

@yuki-97

@yuki-97 yuki-97 commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Issue

related to #2579.

Summary

fix several h100 perf tests list below, gb200 related perf tests still have other problems.

  • grpo-qwen3-235b-16n8g, grpo-qwen3-235b-32n8g, grpo-qwen3-235b-32n8g-async-1off
  • grpo-deepseek-v3-32n8g, grpo-deepseek-v3-64n8g, grpo-deepseek-v3-64n8g-async-1off

Changes

  1. fix prepare_for_lp_inference, only skip when both skip_prev_logprobs + skip_reference_logprobs . previously only when skip_prev_logprobs.
  2. use vllm_kwargs.moe_backend=triton to keep same behavior with vLLM==0.17.
  3. remove duplicated to_compute_kl since skip_reference_policy_logprobs_calculation will be set to True in setup when reference_policy_kl_penalty==0.
  4. remove flaky check in grpo-qwen3-235b-16n8g.

@copy-pr-bot

copy-pr-bot Bot commented Jun 5, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yuki-97 yuki-97 added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Jun 5, 2026
@yuki-97

yuki-97 commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test d5b2fa8

yuki-97 added 3 commits June 5, 2026 20:25
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 force-pushed the yukih/fix-reference-logp branch from a1d2059 to 46336be Compare June 6, 2026 03:25
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 changed the title fix: fix skip_reference_logprobs fix: fix qwen3-235b deepseek-v3 h100 perf tests Jun 6, 2026
@yuki-97 yuki-97 marked this pull request as ready for review June 6, 2026 12:44
@yuki-97 yuki-97 requested review from a team as code owners June 6, 2026 12:44
@yuki-97

yuki-97 commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test cd840e7

@yuki-97

yuki-97 commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 810268f

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 force-pushed the yukih/fix-reference-logp branch from 810268f to 68b68fa Compare June 6, 2026 16:59
@yuki-97

yuki-97 commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 68b68fa

@terrykong terrykong left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed by an agent team (5 agents). No production bugs found — the core change is sound and behavior-preserving.

The prepare_for_lp_inference() restructure fixes a real bug: previously prepare_for_lp_inference() ran only when not skip_prev_logprobs, yet get_reference_policy_logprobs could still execute afterward (when force_on_policy_ratio=True and a reference forward was needed), forwarding on a model that hadn't been onloaded. Decoupling the two is correct. The to_compute_kl removal is equivalent given setup() auto-enables skip_reference_policy_logprobs_calculation when reference_policy_kl_penalty==0 and asserts the flag requires kl==0. All six perf tests named in the description inherit moe_backend: triton through config defaults:.

Comments below are all non-blocking suggestions/questions. Thanks for the fix!

Generated by Claude Code

def test_grpo_train_skips_reference_policy_logprobs_when_kl_disabled(
mock_grpo_components, train_func, skip_reference_policy_logprobs_calculation
):
def test_grpo_train_skips_reference_policy_logprobs(mock_grpo_components, train_func):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_grpo.py:1814

Dropping the to_compute_kl guard from grpo_train/async_grpo_train makes setup()'s auto-enable the sole enforcer of "reference_policy_kl_penalty==0 => skip reference logprobs". Removing the implicit_kl_zero parametrization was correct (this test calls grpo_train directly, bypassing setup()), but no test now asserts that auto-enable. Consider locking it in — the existing test_setup_sglang_sets_model_path_and_parallel_flag already runs setup() with reference_policy_kl_penalty=0.0, so one assertion (verified passing locally) covers it:

# after tests/unit/algorithms/test_grpo.py:1583
assert master_config.grpo["skip_reference_policy_logprobs_calculation"] is True

uv run tests/check_metrics.py $JSON_METRICS \
'median(data["train/token_mult_prob_error"]) < 1.1' \
'data["train/token_mult_prob_error"]["10"] < 1.1'
'median(data["train/token_mult_prob_error"]) < 1.1'

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpo-qwen3-235b-16n8g.sh:39

The per-step check data["train/token_mult_prob_error"]["10"] < 1.1 was dropped here, but it still exists in the sibling perf tests this PR lists as fixed — e.g. grpo-deepseek-v3-32n8g.sh:48 (whose YAML this PR also edits) and grpo-qwen3-235b-32n8g.sh:40. Was the step-10 flakiness observed only on 16n8g (with the moe_backend change stabilizing the rest), or should the removal extend to those configs too?

tensor_parallel_size: 32
async_engine: true
vllm_kwargs:
moe_backend: triton

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpo-deepseek-v3-32n8g.yaml:52

Nice — the GRPO qwen3-235b and deepseek-v3 perf configs all inherit this moe_backend: triton through their defaults: chains, so no per-file additions are needed. One gap for a possible follow-up: dapo-deepseek-v3-64n8g.v2.yaml inherits from grpo_math_1B.yaml rather than this base, so it won't pick up the setting despite being the same DeepSeek-V3 MoE model. It's outside #2579's enumerated tests, so not blocking — just flagging.

tensor_parallel_size: 16
async_engine: true
vllm_kwargs:
moe_backend: triton

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpo-qwen3-235b-16n8g.yaml:54

Minor: consider mirroring the explanatory comment from grpo-moonlight-16ba3b-4n8g-megatron.yaml:18-20 here and in the deepseek config (vLLM 0.20's FlashInfer TRTLLM MoE backend isn't refit-compatible => TRITON keeps the refit-compatible 3D MoE layout). Helps future readers understand why triton is pinned.

@terrykong terrykong left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo those comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants