[TRAIN-8] Add end-to-end custom reward functions tutorial by mkcash · Pull Request #2733 · NVIDIA-NeMo/RL

mkcash · 2026-06-07T05:13:28Z

Summary

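Adds a new tutorial page, Custom Reward Functions, to the docs/guides section. It provides an end-to-end example of implementing custom reward functions, registering them via custom environment classes, and wiring them into training configs.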

What's included

Implementing reward functions: two complete examples
- conciseness_reward(): scores responses by length
- keyword_coverage_reward(): scores responses by required keyword presence

Registering via custom environment: a ConcisenessEnvironment class that
- Subclasses BaseEnvironment
- Uses the @register_environment decorator
- Supports multi-reward (GDPO) and single-reward (GRPO) patterns
- Accepts configurable parameters via env_config

Training config integration: a complete YAML example with
- trainer.env: "ConcisenessEnv" wiring
- GDPO multi-reward weights
- CLI override examples

Standalone runnable example: FormatAdherenceEnvironment as a minimal starting point
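
The PR description names the two reward functions but does not show their bodies. As a rough illustration of what "scores responses by length" and "scores by required keyword presence" could look like, here is a minimal sketch. The signatures, the `target_words` parameter, and the linear decay are assumptions made for illustration, not the tutorial's actual code, and nothing here uses the NeMo RL API.

```python
import re


def conciseness_reward(response: str, target_words: int = 50) -> float:
    """Hypothetical length-based reward (not the tutorial's implementation).

    Returns 1.0 when the response has at most `target_words` words, then
    decays linearly to 0.0 at twice that length.
    """
    if target_words <= 0:
        raise ValueError("target_words must be positive")
    n_words = len(response.split())
    if n_words <= target_words:
        return 1.0
    overshoot = (n_words - target_words) / target_words
    return max(0.0, 1.0 - overshoot)


def keyword_coverage_reward(response: str, required_keywords: list[str]) -> float:
    """Hypothetical coverage reward: fraction of required keywords present.

    Matching is case-insensitive and on whole words. An empty keyword list
    is treated as fully covered.
    """
    if not required_keywords:
        return 1.0
    text = response.lower()
    hits = sum(
        1
        for kw in required_keywords
        if re.search(rf"\b{re.escape(kw.lower())}\b", text)
    )
    return hits / len(required_keywords)


if __name__ == "__main__":
    sample = "The reward shapes the policy."
    print(conciseness_reward(sample, target_words=10))
    print(keyword_coverage_reward(sample, ["reward", "policy", "gradient"]))
```

Both functions return a float in [0, 1], so an environment could report them as separate reward signals or combine them into one scalar. How the tutorial's ConcisenessEnvironment actually does this is not stated in this description.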

Related docs

See docs/guides/environments.md for built-in environment reference
See docs/guides/grpo.md for training configuration details


copy-pr-bot · 2026-06-07T05:13:31Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

[TRAIN-8] Add Custom Reward Functions tutorial

eadcb54

Adds a comprehensive end-to-end tutorial covering:
- Implementing custom reward functions
- Registering them via custom environment classes
- Wiring into training configs with working examples

mkcash requested a review from a team as a code owner June 7, 2026 05:13

github-actions Bot added the Documentation (Improvements or additions to documentation) and community-request labels Jun 7, 2026

svcnvidia-nemo-ci added the waiting-on-maintainers (Waiting on maintainers to respond) label Jun 9, 2026

mkcash wants to merge 1 commit into NVIDIA-NeMo:main from mkcash:docs/custom-reward-tutorial
