Allow and handle None rewards (miles side) by flukeskywalker · Pull Request #52 · LLM360/miles

flukeskywalker · 2026-06-24T08:01:24Z

Treat Sample.remove_sample=True as a training-semantic removal, not only a loss-mask change.
Exclude removed samples from default GRPO/GSPO/reinforce++-baseline reward normalization and keep processed rewards/advantages/returns neutral.
Preserve shape in rollout/train artifacts with explicit remove_samples metadata and allow zero loss masks only for explicitly removed samples.
Update --rollout-sample-filter-path help/docs and add focused tests for normalization and zero-mask validation.
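
As a rough illustration of the behavior described above, here is a minimal sketch of group reward normalization that skips removed samples while keeping the output the same length as the input, plus a zero-mask check. It is not the code in this PR: the `Sample` dataclass is a hypothetical stand-in with only three assumed fields, the function names are invented for illustration, and the population standard deviation and `eps` value are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    # Hypothetical stand-in for the real Sample class; only these fields are assumed.
    reward: Optional[float]
    remove_sample: bool = False
    loss_mask: Optional[List[int]] = None


def normalize_group_rewards(samples: List[Sample], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group, ignoring removed samples."""
    kept = [s.reward for s in samples if not s.remove_sample]
    if any(r is None for r in kept):
        raise ValueError("reward=None is only allowed when remove_sample=True")
    # Baseline statistics come from the kept samples only.
    mean = sum(kept) / len(kept) if kept else 0.0
    var = sum((r - mean) ** 2 for r in kept) / len(kept) if kept else 0.0
    std = math.sqrt(var)
    # One output per input sample: removed samples get a neutral 0.0.
    return [
        0.0 if s.remove_sample else (s.reward - mean) / (std + eps)
        for s in samples
    ]


def validate_loss_mask(sample: Sample) -> None:
    """Reject an all-zero loss mask unless the sample was explicitly removed."""
    if sample.loss_mask is None:
        return
    if not any(sample.loss_mask) and not sample.remove_sample:
        raise ValueError("all-zero loss mask on a sample not marked remove_sample")
```

With a group of rewards `1.0`, `None` (removed), and `3.0`, the baseline is computed from `1.0` and `3.0` alone, and the removed sample contributes a zero in its original position rather than shifting the mean or variance.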

Why

With https://github.com/LLM360/RL360/pull/427, RL360 will mark no-verifier/no-reward Harbor samples with remove_sample=True. These samples will have reward=None once https://github.com/LLM360/RL360/pull/415 is merged. Miles needs that marker to mean the sample is excluded from both the loss and the training math, rather than still influencing group reward baselines or advantage normalization.

TODO: make removal of such samples from reward normalization optional, defaulting to False to preserve current behavior.

Implement removed sample training semantics

d065e4a

flukeskywalker changed the base branch from main to prod June 24, 2026 08:01

flukeskywalker changed the title ~~[codex] Implement removed sample training semantics~~ Allow and handle None rewards (miles side) Jun 24, 2026

flukeskywalker wants to merge 1 commit into prod from training-semantics
