fix: fix qwen3-235b deepseek-v3 h100 perf tests#2703
Conversation
|
/ok to test d5b2fa8 |
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
a1d2059 to
46336be
Compare
Signed-off-by: Yuki Huang <yukih@nvidia.com>
|
/ok to test cd840e7 |
|
/ok to test 810268f |
Signed-off-by: Yuki Huang <yukih@nvidia.com>
810268f to
68b68fa
Compare
|
/ok to test 68b68fa |
terrykong
left a comment
There was a problem hiding this comment.
Reviewed by an agent team (5 agents). No production bugs found — the core change is sound and behavior-preserving.
The prepare_for_lp_inference() restructure fixes a real bug: previously prepare_for_lp_inference() ran only when not skip_prev_logprobs, yet get_reference_policy_logprobs could still execute afterward (when force_on_policy_ratio=True and a reference forward was needed), forwarding on a model that hadn't been onloaded. Decoupling the two is correct. The to_compute_kl removal is equivalent given setup() auto-enables skip_reference_policy_logprobs_calculation when reference_policy_kl_penalty==0 and asserts the flag requires kl==0. All six perf tests named in the description inherit moe_backend: triton through config defaults:.
Comments below are all non-blocking suggestions/questions. Thanks for the fix!
Generated by Claude Code
| def test_grpo_train_skips_reference_policy_logprobs_when_kl_disabled( | ||
| mock_grpo_components, train_func, skip_reference_policy_logprobs_calculation | ||
| ): | ||
| def test_grpo_train_skips_reference_policy_logprobs(mock_grpo_components, train_func): |
There was a problem hiding this comment.
Dropping the to_compute_kl guard from grpo_train/async_grpo_train makes setup()'s auto-enable the sole enforcer of "reference_policy_kl_penalty==0 => skip reference logprobs". Removing the implicit_kl_zero parametrization was correct (this test calls grpo_train directly, bypassing setup()), but no test now asserts that auto-enable. Consider locking it in — the existing test_setup_sglang_sets_model_path_and_parallel_flag already runs setup() with reference_policy_kl_penalty=0.0, so one assertion (verified passing locally) covers it:
# after tests/unit/algorithms/test_grpo.py:1583
assert master_config.grpo["skip_reference_policy_logprobs_calculation"] is True| uv run tests/check_metrics.py $JSON_METRICS \ | ||
| 'median(data["train/token_mult_prob_error"]) < 1.1' \ | ||
| 'data["train/token_mult_prob_error"]["10"] < 1.1' | ||
| 'median(data["train/token_mult_prob_error"]) < 1.1' |
There was a problem hiding this comment.
The per-step check data["train/token_mult_prob_error"]["10"] < 1.1 was dropped here, but it still exists in the sibling perf tests this PR lists as fixed — e.g. grpo-deepseek-v3-32n8g.sh:48 (whose YAML this PR also edits) and grpo-qwen3-235b-32n8g.sh:40. Was the step-10 flakiness observed only on 16n8g (with the moe_backend change stabilizing the rest), or should the removal extend to those configs too?
| tensor_parallel_size: 32 | ||
| async_engine: true | ||
| vllm_kwargs: | ||
| moe_backend: triton |
There was a problem hiding this comment.
grpo-deepseek-v3-32n8g.yaml:52
Nice — the GRPO qwen3-235b and deepseek-v3 perf configs all inherit this moe_backend: triton through their defaults: chains, so no per-file additions are needed. One gap for a possible follow-up: dapo-deepseek-v3-64n8g.v2.yaml inherits from grpo_math_1B.yaml rather than this base, so it won't pick up the setting despite being the same DeepSeek-V3 MoE model. It's outside #2579's enumerated tests, so not blocking — just flagging.
| tensor_parallel_size: 16 | ||
| async_engine: true | ||
| vllm_kwargs: | ||
| moe_backend: triton |
There was a problem hiding this comment.
Minor: consider mirroring the explanatory comment from grpo-moonlight-16ba3b-4n8g-megatron.yaml:18-20 here and in the deepseek config (vLLM 0.20's FlashInfer TRTLLM MoE backend isn't refit-compatible => TRITON keeps the refit-compatible 3D MoE layout). Helps future readers understand why triton is pinned.
terrykong
left a comment
There was a problem hiding this comment.
lgtm modulo those comments
Issue
related to #2579.
Summary
fix several h100 perf tests list below, gb200 related perf tests still have other problems.
grpo-qwen3-235b-16n8g,grpo-qwen3-235b-32n8g,grpo-qwen3-235b-32n8g-async-1offgrpo-deepseek-v3-32n8g,grpo-deepseek-v3-64n8g,grpo-deepseek-v3-64n8g-async-1offChanges
prepare_for_lp_inference, only skip when bothskip_prev_logprobs+skip_reference_logprobs. previously only whenskip_prev_logprobs.vllm_kwargs.moe_backend=tritonto keep same behavior with vLLM==0.17.to_compute_klsinceskip_reference_policy_logprobs_calculationwill be set toTrueinsetupwhenreference_policy_kl_penalty==0.grpo-qwen3-235b-16n8g.