fix: fix qwen3-235b deepseek-v3 h100 perf tests by yuki-97 · Pull Request #2703 · NVIDIA-NeMo/RL

yuki-97 · 2026-06-05T15:21:21Z

Issue

related to #2579.

Summary

fix several h100 perf tests list below, gb200 related perf tests still have other problems.

grpo-qwen3-235b-16n8g, grpo-qwen3-235b-32n8g, grpo-qwen3-235b-32n8g-async-1off
grpo-deepseek-v3-32n8g, grpo-deepseek-v3-64n8g, grpo-deepseek-v3-64n8g-async-1off

Changes

fix prepare_for_lp_inference, only skip when both skip_prev_logprobs + skip_reference_logprobs . previously only when skip_prev_logprobs.
use vllm_kwargs.moe_backend=triton to keep same behavior with vLLM==0.17.
remove duplicated to_compute_kl since skip_reference_policy_logprobs_calculation will be set to True in setup when reference_policy_kl_penalty==0.
remove flaky check in grpo-qwen3-235b-16n8g.

copy-pr-bot · 2026-06-05T15:21:24Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yuki-97 · 2026-06-05T15:22:17Z

/ok to test d5b2fa8

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 · 2026-06-06T12:44:31Z

/ok to test cd840e7

yuki-97 · 2026-06-06T16:29:11Z

/ok to test 810268f

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 · 2026-06-06T17:00:34Z

/ok to test 68b68fa

terrykong

Reviewed by an agent team (5 agents). No production bugs found — the core change is sound and behavior-preserving.

The prepare_for_lp_inference() restructure fixes a real bug: previously prepare_for_lp_inference() ran only when not skip_prev_logprobs, yet get_reference_policy_logprobs could still execute afterward (when force_on_policy_ratio=True and a reference forward was needed), forwarding on a model that hadn't been onloaded. Decoupling the two is correct. The to_compute_kl removal is equivalent given setup() auto-enables skip_reference_policy_logprobs_calculation when reference_policy_kl_penalty==0 and asserts the flag requires kl==0. All six perf tests named in the description inherit moe_backend: triton through config defaults:.

Comments below are all non-blocking suggestions/questions. Thanks for the fix!

Generated by Claude Code

terrykong · 2026-06-08T21:45:50Z

-def test_grpo_train_skips_reference_policy_logprobs_when_kl_disabled(
-    mock_grpo_components, train_func, skip_reference_policy_logprobs_calculation
-):
+def test_grpo_train_skips_reference_policy_logprobs(mock_grpo_components, train_func):


test_grpo.py:1814

Dropping the to_compute_kl guard from grpo_train/async_grpo_train makes setup()'s auto-enable the sole enforcer of "reference_policy_kl_penalty==0 => skip reference logprobs". Removing the implicit_kl_zero parametrization was correct (this test calls grpo_train directly, bypassing setup()), but no test now asserts that auto-enable. Consider locking it in — the existing test_setup_sglang_sets_model_path_and_parallel_flag already runs setup() with reference_policy_kl_penalty=0.0, so one assertion (verified passing locally) covers it:

# after tests/unit/algorithms/test_grpo.py:1583 assert master_config.grpo["skip_reference_policy_logprobs_calculation"] is True

terrykong · 2026-06-08T21:45:50Z

    uv run tests/check_metrics.py $JSON_METRICS \
-        'median(data["train/token_mult_prob_error"]) < 1.1' \
-        'data["train/token_mult_prob_error"]["10"] < 1.1'
+        'median(data["train/token_mult_prob_error"]) < 1.1'


grpo-qwen3-235b-16n8g.sh:39

The per-step check data["train/token_mult_prob_error"]["10"] < 1.1 was dropped here, but it still exists in the sibling perf tests this PR lists as fixed — e.g. grpo-deepseek-v3-32n8g.sh:48 (whose YAML this PR also edits) and grpo-qwen3-235b-32n8g.sh:40. Was the step-10 flakiness observed only on 16n8g (with the moe_backend change stabilizing the rest), or should the removal extend to those configs too?

terrykong · 2026-06-08T21:45:50Z

      tensor_parallel_size: 32
      async_engine: true
+    vllm_kwargs:
+      moe_backend: triton


grpo-deepseek-v3-32n8g.yaml:52

Nice — the GRPO qwen3-235b and deepseek-v3 perf configs all inherit this moe_backend: triton through their defaults: chains, so no per-file additions are needed. One gap for a possible follow-up: dapo-deepseek-v3-64n8g.v2.yaml inherits from grpo_math_1B.yaml rather than this base, so it won't pick up the setting despite being the same DeepSeek-V3 MoE model. It's outside #2579's enumerated tests, so not blocking — just flagging.

terrykong · 2026-06-08T21:45:50Z

      tensor_parallel_size: 16
      async_engine: true
+    vllm_kwargs:
+      moe_backend: triton


grpo-qwen3-235b-16n8g.yaml:54

Minor: consider mirroring the explanatory comment from grpo-moonlight-16ba3b-4n8g-megatron.yaml:18-20 here and in the deepseek config (vLLM 0.20's FlashInfer TRTLLM MoE backend isn't refit-compatible => TRITON keeps the refit-compatible 3D MoE layout). Helps future readers understand why triton is pinned.

terrykong

lgtm modulo those comments

yuki-97 added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Jun 5, 2026

copy-pr-bot Bot temporarily deployed to public June 5, 2026 15:22 Inactive

copy-pr-bot Bot temporarily deployed to public June 5, 2026 15:23 Inactive

copy-pr-bot Bot temporarily deployed to test June 5, 2026 15:25 Inactive

copy-pr-bot Bot temporarily deployed to public June 5, 2026 15:26 Inactive

yuki-97 added 3 commits June 5, 2026 20:25

fix skip_reference_logprobs

51daa66

Signed-off-by: Yuki Huang <yukih@nvidia.com>

use triton moe_backend

6929534

Signed-off-by: Yuki Huang <yukih@nvidia.com>

remove flaky check

46336be

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 force-pushed the yukih/fix-reference-logp branch from a1d2059 to 46336be Compare June 6, 2026 03:25

fix rebase

cd840e7

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 changed the title ~~fix: fix skip_reference_logprobs~~ fix: fix qwen3-235b deepseek-v3 h100 perf tests Jun 6, 2026

yuki-97 marked this pull request as ready for review June 6, 2026 12:44

yuki-97 requested review from a team as code owners June 6, 2026 12:44

copy-pr-bot Bot temporarily deployed to public June 6, 2026 12:44 Inactive

yuki-97 requested a review from terrykong June 6, 2026 12:45

copy-pr-bot Bot temporarily deployed to public June 6, 2026 12:45 Inactive

copy-pr-bot Bot temporarily deployed to test June 6, 2026 12:47 Inactive

copy-pr-bot Bot temporarily deployed to public June 6, 2026 12:48 Inactive

copy-pr-bot Bot temporarily deployed to public June 6, 2026 16:29 Inactive

copy-pr-bot Bot temporarily deployed to test June 6, 2026 16:32 Inactive

copy-pr-bot Bot temporarily deployed to public June 6, 2026 16:33 Inactive

update unit test

68b68fa

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 force-pushed the yukih/fix-reference-logp branch from 810268f to 68b68fa Compare June 6, 2026 16:59

copy-pr-bot Bot temporarily deployed to public June 6, 2026 17:00 Inactive

copy-pr-bot Bot temporarily deployed to public June 6, 2026 17:01 Inactive

copy-pr-bot Bot temporarily deployed to test June 6, 2026 17:03 Inactive

copy-pr-bot Bot temporarily deployed to public June 6, 2026 17:04 Inactive

terrykong reviewed Jun 8, 2026

View reviewed changes

terrykong approved these changes Jun 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: fix qwen3-235b deepseek-v3 h100 perf tests#2703

fix: fix qwen3-235b deepseek-v3 h100 perf tests#2703
yuki-97 wants to merge 5 commits into
mainfrom
yukih/fix-reference-logp

yuki-97 commented Jun 5, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented Jun 5, 2026

Uh oh!

yuki-97 commented Jun 5, 2026

Uh oh!

yuki-97 commented Jun 6, 2026

Uh oh!

yuki-97 commented Jun 6, 2026

Uh oh!

yuki-97 commented Jun 6, 2026

Uh oh!

terrykong left a comment

Uh oh!

terrykong Jun 8, 2026

Uh oh!

terrykong Jun 8, 2026

Uh oh!

terrykong Jun 8, 2026

Uh oh!

terrykong Jun 8, 2026

Uh oh!

terrykong left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yuki-97 commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Summary

Changes

Uh oh!

copy-pr-bot Bot commented Jun 5, 2026

Uh oh!

yuki-97 commented Jun 5, 2026

Uh oh!

yuki-97 commented Jun 6, 2026

Uh oh!

yuki-97 commented Jun 6, 2026

Uh oh!

yuki-97 commented Jun 6, 2026

Uh oh!

terrykong left a comment

Choose a reason for hiding this comment

Uh oh!

terrykong Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

terrykong Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

terrykong Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

terrykong Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

terrykong left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yuki-97 commented Jun 5, 2026 •

edited

Loading