Quick Start

# Create and activate a conda environment
conda create -n agent python=3.10 -y
conda activate agent

# Install dependencies
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
pip3 install -r requirements.txt

# Add verl as a submodule (skip the "git submodule add" line if verl is already listed in .gitmodules)
git submodule add https://github.com/volcengine/verl.git verl
git submodule update --init --recursive --remote
cd verl
pip3 install -e .
cd ..
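
After the steps above, a quick sanity check can confirm that the main packages are visible to the interpreter. This is a minimal sketch, not part of the repository: the helper name check_packages is hypothetical, and the list of package names (torch, flash_attn, verl) is an assumption based on the install commands above.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str] = ("torch", "flash_attn", "verl")) -> Dict[str, bool]:
    """Report which top-level packages can be found, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Print one status line per package.
for name, found in check_packages().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

A MISSING entry usually means the corresponding pip3 command failed or ran in a different conda environment.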

Supported Environments

Sokoban
Search
Zebra Puzzle
Math
Code

Features

Clear support for custom environments
Easy integration with verl via submodule
Rephrased environment feedback
Combined SFT and RL loss
Monte Carlo Tree Search (MCTS)

How to Add a New Environment

Create a new folder under src/env/ (e.g., myenv/) and implement env.py, using the existing environments as examples.
Your environment class should implement:
- run(responses_str, batch, chat_template): the main agent-environment interaction (required).
- get_reward_allocation(reward_tensors): custom reward allocation (optional).
Register your environment in src/env/__init__.py and add a branch for it in the get_env function.
Select your environment in training scripts with +env.name=your_env.
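
The steps above can be sketched roughly as follows. Only the method names and their parameters come from this README; everything else is an assumption made for illustration: the class name MyEnv, the toy reward rule, the return format of run, the use of plain lists in place of reward tensors, and the body of get_env. Check an existing env.py for what the trainer actually expects.

```python
from typing import Any, Dict, List


class MyEnv:
    """Hypothetical environment: rewards responses that contain a target word."""

    def __init__(self, target: str = "done") -> None:
        self.target = target

    def run(self, responses_str: List[str], batch: Any, chat_template: Any) -> List[Dict[str, Any]]:
        """Main agent-environment interaction: score each response string.

        The returned dict layout is an assumption, not the repository's format.
        """
        results = []
        for response in responses_str:
            solved = self.target in response.lower()
            results.append({
                "feedback": "Task complete." if solved else "Not solved yet, try again.",
                "reward": 1.0 if solved else 0.0,
                "done": solved,
            })
        return results

    def get_reward_allocation(self, reward_tensors: List[List[float]]) -> List[float]:
        """Optional hook: collapse per-turn rewards into one value per sample (a sum here)."""
        return [sum(turn_rewards) for turn_rewards in reward_tensors]


def get_env(name: str, **kwargs: Any) -> MyEnv:
    """Stand-in for the get_env dispatcher in src/env/__init__.py."""
    if name == "myenv":  # the new branch for the custom environment
        return MyEnv(**kwargs)
    raise ValueError(f"Unknown environment: {name}")
```

With a branch like this in place, +env.name=myenv would select the new class, assuming get_env dispatches on that name.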

Dataset Generation

Each environment provides a create_dataset.py script for generating training and test datasets. For example:

cd src/env/sokoban
python create_dataset.py --output dataset/sokoban --train_size 10000 --test_size 100

Other environments (math, zebra, code) are similar. See each script for available arguments.
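
For a new environment, a create_dataset.py script could follow the same command-line shape. The sketch below is an assumption-heavy illustration, not the repository's script: only the --output, --train_size, and --test_size flags come from the example above, while the --seed flag, the toy addition task, the file names, and the JSON Lines format are hypothetical.

```python
import argparse
import json
import random
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generate train/test datasets (sketch).")
    parser.add_argument("--output", type=str, required=True, help="Output directory")
    parser.add_argument("--train_size", type=int, default=10000)
    parser.add_argument("--test_size", type=int, default=100)
    parser.add_argument("--seed", type=int, default=0)  # hypothetical flag for reproducibility
    return parser.parse_args(argv)


def make_example(rng: random.Random) -> dict:
    """Placeholder task: a two-number addition problem."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    return {"prompt": f"What is {a} + {b}?", "answer": str(a + b)}


def write_split(path: Path, size: int, rng: random.Random) -> None:
    """Write `size` examples to `path`, one JSON object per line."""
    with path.open("w", encoding="utf-8") as f:
        for _ in range(size):
            f.write(json.dumps(make_example(rng)) + "\n")


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    out_dir = Path(args.output)
    out_dir.mkdir(parents=True, exist_ok=True)
    rng = random.Random(args.seed)
    write_split(out_dir / "train.jsonl", args.train_size, rng)
    write_split(out_dir / "test.jsonl", args.test_size, rng)
```

Calling main(["--output", "dataset/myenv", "--train_size", "100", "--test_size", "10"]) writes both splits; a real script would call main() under an if __name__ == "__main__" guard and emit whatever file format the trainer reads.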

Training & Validation

For example, to train on the math environment (see train_math_ppo.sh for details):

bash train_math_ppo.sh

Or run the main program directly:

python -m src.core.main_ppo +env.name=math ...

See the training scripts (train_code_ppo.sh, train_math_ppo.sh, train_search_ppo.sh, train_soko_ppo.sh, train_zebra_ppo.sh) for all configurable parameters.
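
The +env.name=math argument looks like a Hydra-style override, where a leading + adds a key that is not yet in the config; that reading is an assumption based on verl's use of Hydra. The sketch below illustrates the idea on a plain nested dict. The function apply_overrides is hypothetical, is not Hydra's implementation, keeps all values as strings, and handles only the single + prefix.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'a.b=value' overrides to a nested dict; a leading '+' adds a new key."""
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got: {item}")
        is_append = key.startswith("+")
        parts = (key[1:] if is_append else key).split(".")
        # Walk (and create, if needed) the intermediate dicts.
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        leaf = parts[-1]
        if is_append and leaf in node:
            raise KeyError(f"'{key}' already exists; drop the '+' to override it")
        if not is_append and leaf not in node:
            raise KeyError(f"'{key}' is not in the config; prefix it with '+' to add it")
        node[leaf] = value
    return config


config = apply_overrides({"trainer": {"epochs": "1"}}, ["+env.name=math", "trainer.epochs=3"])
print(config)
```

Under this reading, the + is needed because env.name is not part of the base config, whereas existing keys are overridden without it.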

References

TinyZero: https://github.com/Jiayi-Pan/TinyZero
RAGEN: https://github.com/RAGEN-AI/RAGEN/
Search-R1: https://github.com/PeterGriffinJin/Search-R1
rllm: https://github.com/agentica-project/rllm
