[TRAIN-8] Add end-to-end custom reward functions tutorial#2733

Open
mkcash wants to merge 1 commit into NVIDIA-NeMo:main from mkcash:docs/custom-reward-tutorial

mkcash commented Jun 7, 2026

Summary

Closes #2724

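Adds a new tutorial page, Custom Reward Functions, to the docs/guides section. It provides an end-to-end example of implementing custom reward functions, registering them via a custom environment class, and wiring them into a training config.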

What's included

  1. Implementing reward functions (two complete examples):
    • conciseness_reward(): scores responses by length
    • keyword_coverage_reward(): scores responses by the presence of required keywords
  2. Registering via a custom environment. The ConcisenessEnvironment class:
    • Subclasses BaseEnvironment
    • Uses the @register_environment decorator
    • Supports multi-reward (GDPO) and single-reward (GRPO) patterns
    • Accepts configurable parameters via env_config
  3. Training config integration. A complete YAML example with:
    • trainer.env: "ConcisenessEnv" wiring
    • GDPO multi-reward weights
    • CLI override examples
  4. Standalone runnable example: FormatAdherenceEnvironment as a minimal starting point

Related docs

  • See docs/guides/environments.md for built-in environment reference
  • See docs/guides/grpo.md for training configuration details

Adds a comprehensive end-to-end tutorial covering:
- Implementing custom reward functions
- Registering them via custom environment classes
- Wiring into training configs with working examples

mkcash requested a review from a team as a code owner June 7, 2026 05:13

copy-pr-bot Bot commented Jun 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions Bot added the Documentation and community-request labels Jun 7, 2026
svcnvidia-nemo-ci added the waiting-on-maintainers label Jun 9, 2026

Labels

  • community-request
  • Documentation: Improvements or additions to documentation
  • waiting-on-maintainers: Waiting on maintainers to respond

Development

Successfully merging this pull request may close these issues.

[TRAIN-8] No end-to-end tutorial for writing and registering custom reward functions

2 participants