
[Feature] OfflineToOnlineTrainer + sota script for offline→online RL #3904

Merged
vmoens merged 4 commits into pytorch:main from theap06:feat/offline-to-online-trainer on Jun 23, 2026

@theap06 commented Jun 23, 2026 (Contributor)

Summary

Adds an offline-to-online SAC trainer and a runnable SOTA example for the offline-pretrain to online-finetune workflow.

This builds on the OfflineToOnlineReplayBuffer and dataset-loading helpers from #3900.
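To make "mixed offline/online optimization batches" concrete, here is a minimal plain-Python sketch that draws a fixed fraction of each batch from a static offline dataset and the rest from the growing online buffer. It is not the TorchRL OfflineToOnlineReplayBuffer API: `sample_mixed_batch`, its arguments, sampling with replacement, and the all-offline fallback while the online buffer is empty are all illustrative assumptions.

```python
import random
from typing import Any, List, Sequence


def sample_mixed_batch(
    offline: Sequence[Any],
    online: Sequence[Any],
    batch_size: int,
    offline_fraction: float,
    rng: random.Random,
) -> List[Any]:
    """Hypothetical helper: round(batch_size * offline_fraction) items come
    from the offline dataset, the remainder from the online buffer."""
    if not 0.0 <= offline_fraction <= 1.0:
        raise ValueError("offline_fraction must be in [0, 1]")
    n_offline = round(batch_size * offline_fraction)
    if not online:
        # Assumption: before any online data exists, sample offline only.
        n_offline = batch_size
    n_online = batch_size - n_offline
    # Uniform sampling with replacement from each source (an assumption).
    batch = [rng.choice(offline) for _ in range(n_offline)]
    batch += [rng.choice(online) for _ in range(n_online)]
    return batch


rng = random.Random(0)
offline_data = [("offline", i) for i in range(100)]
online_data = [("online", i) for i in range(10)]
batch = sample_mixed_batch(offline_data, online_data, 8, 0.75, rng)
print(sum(src == "offline" for src, _ in batch))  # 6
```

In this framing, annealing simply means lowering `offline_fraction` toward zero as online data accumulates.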

What's added

  • OfflineToOnlineTrainer: a SACTrainer subclass that uses OfflineToOnlineReplayBuffer for mixed offline/online optimization batches.
  • Trainer hooks:
    • OfflineToOnlineReplayBufferHook stores collected experience in the online buffer and samples mixed optimization batches.
    • OfflineToOnlineAnnealHook decays the offline sampling fraction over collected frames.
  • Hydra config support through OfflineToOnlineTrainerConfig, including parity with the trainer constructor and registration in the config store.
  • Checkpoint support for the online buffer and current/base offline sampling fractions.
  • sota-implementations/offline_to_online/train.py: standalone SAC offline-to-online training script using registered dataset strings such as d4rl: and minari:.
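The annealing hook and the checkpointed fractions can be pictured as a small stateful schedule. The PR states that the offline fraction decays over collected frames (to zero over anneal_frames, per the commit message) and that the current and base fractions are checkpointed. The linear shape f(n) = f_base * max(0, 1 - n / anneal_frames), the class name `LinearOfflineFractionAnneal`, and the `state_dict` keys (including the extra `collected_frames` counter) are assumptions for illustration, not the hook's actual implementation.

```python
from typing import Dict


class LinearOfflineFractionAnneal:
    """Hypothetical linear decay of the offline sampling fraction."""

    def __init__(self, base_fraction: float, anneal_frames: int) -> None:
        if anneal_frames <= 0:
            raise ValueError("anneal_frames must be positive")
        self.base_fraction = base_fraction
        self.anneal_frames = anneal_frames
        self.collected_frames = 0
        self.current_fraction = base_fraction

    def step(self, new_frames: int) -> float:
        # Call once per collection step with the number of newly collected frames.
        self.collected_frames += new_frames
        progress = min(1.0, self.collected_frames / self.anneal_frames)
        self.current_fraction = self.base_fraction * (1.0 - progress)
        return self.current_fraction

    def state_dict(self) -> Dict[str, float]:
        # Saving both fractions (plus progress) lets a resumed run continue the schedule.
        return {
            "base_fraction": self.base_fraction,
            "current_fraction": self.current_fraction,
            "collected_frames": self.collected_frames,
        }

    def load_state_dict(self, state: Dict[str, float]) -> None:
        self.base_fraction = state["base_fraction"]
        self.current_fraction = state["current_fraction"]
        self.collected_frames = int(state["collected_frames"])


anneal = LinearOfflineFractionAnneal(base_fraction=0.5, anneal_frames=1000)
print(anneal.step(250))  # 0.375
```

Keeping the base fraction alongside the current one means a restored run does not need to re-derive the schedule's starting point.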

Docs and tests

  • Adds reference entries for OfflineToOnlineTrainer and OfflineToOnlineTrainerConfig.
  • Extends test/test_offline_to_online.py with hook, trainer wiring, config, and checkpoint coverage.
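One way to guard the config/constructor parity mentioned above is a test that compares dataclass fields with the constructor's parameters via `inspect.signature` and `dataclasses.fields`. In this sketch `ToyTrainer`, `ToyTrainerConfig`, and their parameters are hypothetical stand-ins for OfflineToOnlineTrainer and OfflineToOnlineTrainerConfig, whose real fields are not listed in this PR description.

```python
import dataclasses
import inspect
from typing import Set


class ToyTrainer:
    """Hypothetical stand-in for a trainer constructor."""

    def __init__(
        self,
        batch_size: int,
        offline_fraction: float = 0.5,
        anneal_frames: int = 100_000,
    ) -> None:
        self.batch_size = batch_size
        self.offline_fraction = offline_fraction
        self.anneal_frames = anneal_frames


@dataclasses.dataclass
class ToyTrainerConfig:
    """Hypothetical Hydra-style config meant to mirror ToyTrainer.__init__."""

    batch_size: int = 256
    offline_fraction: float = 0.5
    anneal_frames: int = 100_000


def missing_config_fields(cls: type, config_cls: type) -> Set[str]:
    # Constructor parameters (minus self) that have no matching config field.
    params = set(inspect.signature(cls.__init__).parameters) - {"self"}
    fields = {f.name for f in dataclasses.fields(config_cls)}
    return params - fields


def test_config_parity() -> None:
    assert missing_config_fields(ToyTrainer, ToyTrainerConfig) == set()
    # Building the trainer from the config also fails (TypeError) if the
    # config has fields the constructor does not accept.
    trainer = ToyTrainer(**dataclasses.asdict(ToyTrainerConfig()))
    assert trainer.batch_size == 256


test_config_parity()
```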

@pytorch-bot commented Jun 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3904

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

⏳ No Failures, 18 Pending

As of commit 6651edb with merge base f7ba109:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Jun 23, 2026
@github-actions github-actions Bot added Data Data-related PR, will launch data-related jobs sota-implementations/ ReplayBuffers Trainers Feature New feature labels Jun 23, 2026
@theap06 marked this pull request as draft June 23, 2026 04:49
@theap06 marked this pull request as ready for review June 23, 2026 05:24
@theap06 force-pushed the feat/offline-to-online-trainer branch from b9ba04f to 2893f5b June 23, 2026 05:58
@theap06 changed the title from "[Feature] OfflineToOnlineTrainer + sota-implementation for offline→online RL" to "[Feature] OfflineToOnlineTrainer + sota script for offline→online RL" Jun 23, 2026
@github-actions bot added the Documentation label Jun 23, 2026
theap06 and others added 2 commits June 23, 2026 13:36
Follow-up to the OfflineToOnlineReplayBuffer PR: a SAC trainer that drives the
offline-pretrain -> online-finetune transition, plus a standalone
sota-implementations script.

- OfflineToOnlineTrainer (subclasses SACTrainer): routes collected experience
  to the online buffer (pre_epoch), samples a mixed offline/online batch
  (process_optim_batch), and anneals the offline fraction to zero over
  anneal_frames (post_steps). Backed by two reusable hooks:
  OfflineToOnlineReplayBufferHook (projects online transitions onto the offline
  dataset schema so the mixed-batch concat stays valid) and
  OfflineToOnlineAnnealHook.
- sota-implementations/offline_to_online/train.py: a self-contained SAC
  offline->online script (offline dataset via d4rl:/minari: string).
- Tests: hook + flow tests and a gated functional train() run on Pendulum.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
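The schema projection mentioned in the commit message (online transitions restricted to the offline dataset's keys so the mixed-batch concatenation stays valid) can be sketched with flat dicts. The real hook works on TensorDicts; `project_to_schema`, `concat_batches`, the flat key names, and raising on missing keys are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List


def project_to_schema(
    transition: Dict[str, Any], offline_keys: Iterable[str]
) -> Dict[str, Any]:
    """Hypothetical helper: restrict an online transition to the offline keys."""
    keys = list(offline_keys)
    missing = [k for k in keys if k not in transition]
    if missing:
        raise KeyError(f"online transition lacks offline keys: {missing}")
    # Online-only entries (e.g. collector bookkeeping) are dropped.
    return {k: transition[k] for k in keys}


def concat_batches(*batches: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Concatenation is only valid if every transition has the same keys.
    out = [t for batch in batches for t in batch]
    if len({frozenset(t) for t in out}) > 1:
        raise ValueError("transitions have mismatched keys")
    return out


offline_keys = ["observation", "action", "reward", "done"]
online_step = {"observation": 0.1, "action": 1, "reward": 0.5, "done": False,
               "extra_info": "collector-only"}
offline_batch = [{"observation": 0.3, "action": 0, "reward": 1.0, "done": True}]
mixed = concat_batches(offline_batch, [project_to_schema(online_step, offline_keys)])
print(len(mixed))  # 2
```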
@vmoens force-pushed the feat/offline-to-online-trainer branch from 07dbcac to c50a082 June 23, 2026 20:37

@github-actions commented Jun 23, 2026 (Contributor)

Benchmark Results: PR 6651edb6 vs main f7ba1092

Benchmark run: https://github.com/pytorch/rl/actions/runs/28063125784

Higher ops/sec is better. Tables are sorted by largest absolute change.
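For readers checking the numbers, the Change column is consistent with the relative difference (PR - main) / main * 100; for example, 193.56 to 37.80 ops/sec gives -80.47%. The sketch below recomputes that and the over-5% counts from (name, main, PR) rows. The formula is inferred from the table rows rather than taken from the benchmark bot, recomputing from the rounded values can differ in the last digit (60.74% vs the reported 60.73% for test_sac_speed[False-backward]), and `pct_change`/`summarize` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Tuple


def pct_change(main_ops: float, pr_ops: float) -> float:
    # Relative change of the PR vs main; higher ops/sec is better,
    # so a positive value is an improvement.
    return (pr_ops - main_ops) / main_ops * 100.0


def summarize(
    rows: Iterable[Tuple[str, float, float]], threshold: float = 5.0
) -> Dict[str, object]:
    """Hypothetical helper mirroring the table summary lines."""
    changes: List[Tuple[str, float]] = [
        (name, pct_change(main, pr)) for name, main, pr in rows
    ]
    # Sort by largest absolute change, as the tables do.
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return {
        "sorted": changes,
        "regressions": sum(c < -threshold for _, c in changes),
        "improvements": sum(c > threshold for _, c in changes),
    }


rows = [
    ("test_sac_speed[False-backward]", 53.77, 86.43),
    ("test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]", 193.56, 37.80),
    ("test_simple", 1.7089, 1.8017),
]
summary = summarize(rows)
print(summary["regressions"], summary["improvements"])  # 1 2
```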

CPU

Compared 216 benchmarks. Regressions over 5%: 13. Improvements over 5%: 14.

Columns: Benchmark, main (ops/sec), PR (ops/sec), Change
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 61.09 525.62 +760.38%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 193.56 37.80 -80.47%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 185.49 55.01 -70.34%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] 53.77 86.43 +60.73%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] 203.47 250.19 +22.96%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 893.02 744.79 -16.60%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-same] 22.66 19.28 -14.93%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-same] 24.67 27.64 +12.06%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-backward] 128.26 143.71 +12.04%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-constant] 4,370 3,939 -9.85%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2,096 1,909 -8.94%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] 259.91 282.19 +8.57%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 2,800 3,010 +7.50%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] 133.50 143.50 +7.49%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1,867 2,005 +7.37%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,171 2,013 -7.27%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] 5,201 4,836 -7.02%
benchmarks/test_objectives_benchmarks.py::test_values[td0_return_estimate-False-False] 7,746 7,286 -5.94%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3,120 2,939 -5.82%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] 34,649 32,662 -5.74%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 719.33 759.66 +5.61%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 2,767 2,919 +5.50%
benchmarks/test_envs_benchmark.py::test_simple 1.7089 1.8017 +5.43%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape1-atari] 265.30 279.52 +5.36%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-1] 284.71 269.96 -5.18%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-True] 1.3657 1.2962 -5.09%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape1-atari] 637.86 670.19 +5.07%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-True] 29,565 28,096 -4.97%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 766.84 804.47 +4.91%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] 108.61 113.93 +4.90%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-False] 32,341 30,761 -4.88%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[untyped_storage] 8.2561 7.8825 -4.53%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] 193.21 184.79 -4.36%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-False] 63,039 65,780 +4.35%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,682 2,571 -4.14%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 2,239 2,148 -4.10%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3,166 3,042 -3.93%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-1] 469.83 488.22 +3.91%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] 284.99 296.05 +3.88%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] 273.73 284.10 +3.79%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[200-img_shape3-large_batch] 776.32 747.67 -3.69%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[True-None] 219.26 226.95 +3.50%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-4] 165.04 159.36 -3.44%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape2-large_img] 398.11 411.76 +3.43%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-False] 50,422 48,697 -3.42%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] 176.70 170.74 -3.37%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-True] 18,886 18,253 -3.35%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[reduce-overhead-None] 279.08 288.42 +3.35%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[200-img_shape3-large_batch] 308.13 318.19 +3.26%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,557 2,476 -3.18%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,369 3,265 -3.08%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] 401.64 413.98 +3.07%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[reduce-overhead-None] 467.04 481.35 +3.06%
benchmarks/test_envs_benchmark.py::test_transformed 0.8881 0.9148 +3.00%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[False-backward] 53.84 55.46 +3.00%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] 4,328 4,455 +2.93%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-True] 30,080 29,203 -2.92%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 535.11 550.68 +2.91%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-True-0-gru] 1.4478 1.4059 -2.90%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[numpy] 376,455 365,562 -2.89%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] 260.15 267.58 +2.86%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 48.48 47.11 -2.82%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] 75,973 78,106 +2.81%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] 950.75 977.42 +2.81%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] 57,171 55,592 -2.76%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] 1,732 1,779 +2.71%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 484.41 497.46 +2.69%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] 3,487 3,574 +2.51%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] 21,497 22,021 +2.44%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-constant] 2,648 2,584 -2.39%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[reduce-overhead-None] 115.17 117.79 +2.27%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[pickle] 12,290 12,018 -2.21%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[reduce-overhead-None] 286.99 293.13 +2.14%
benchmarks/test_envs_benchmark.py::test_serial 0.5732 0.5852 +2.09%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-None] 84.76 86.53 +2.08%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-False] 44,046 43,132 -2.07%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] 59.22 60.45 +2.07%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] 30,471 29,840 -2.07%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-16] 43.53 44.43 +2.06%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[reduce-overhead-None] 1,771 1,807 +2.02%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] 702.11 716.27 +2.02%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-backward] 116.56 118.91 +2.02%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-None] 87.37 89.11 +1.99%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 52.17 51.14 -1.98%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[200-img_shape3-large_batch] 330.15 336.64 +1.96%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[reduce-overhead-None] 84.54 86.20 +1.96%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[reduce-overhead-None] 223.95 228.26 +1.92%
benchmarks/test_envs_benchmark.py::test_parallel 0.9669 0.9484 -1.91%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[reduce-overhead-None] 704.69 691.33 -1.90%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-backward] 130.37 132.63 +1.73%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-None] 37.79 38.41 +1.66%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-True] 19,649 19,323 -1.66%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-1] 517.20 525.66 +1.64%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] 23,609 23,992 +1.62%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] 37,863 37,258 -1.60%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[reduce-overhead-None] 562.97 571.88 +1.58%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 29.04 28.58 -1.56%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[50-img_shape0-small] 866.20 879.45 +1.53%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[generalized_advantage_estimate-False-1-512] 106.15 104.53 -1.52%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[False-None] 49.31 50.05 +1.49%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[False-None] 94.21 95.60 +1.48%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] 41,426 42,025 +1.45%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] 19,393 19,115 -1.43%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[False-None] 176.12 178.63 +1.42%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-backward] 62.07 62.96 +1.42%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-64] 10.99 10.84 -1.41%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-None] 121.72 120.00 -1.41%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[50-img_shape0-small] 7,329 7,226 -1.40%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] 23,498 23,177 -1.37%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 0.5923 0.6003 +1.35%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] 15.10 15.30 +1.33%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1,066 1,052 -1.33%
benchmarks/test_objectives_benchmarks.py::test_values[td1_return_estimate-False-False] 34.89 35.35 +1.32%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-lstm] 2.0454 2.0191 -1.28%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-False-0-gru] 1.3456 1.3626 +1.27%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-16] 36.77 37.23 +1.25%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-None] 210.33 212.95 +1.24%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[reduce-overhead-None] 332.65 336.69 +1.22%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1,080 1,093 +1.19%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[False-None] 700.28 692.06 -1.17%
... (showing 120 of 216 comparisons, sorted by absolute change)

GPU

Compared 226 benchmarks. Regressions over 5%: 17. Improvements over 5%: 10.

Columns: Benchmark, main (ops/sec), PR (ops/sec), Change
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 195.69 39.70 -79.71%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] 3,275 4,215 +28.72%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,859 3,457 +20.94%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,096 3,710 +19.85%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[reduce-overhead-None] 105.95 84.92 -19.85%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3,270 2,695 -17.59%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3,335 2,805 -15.88%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,027 2,339 +15.40%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,971 2,587 -12.91%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-same] 5.6075 4.9273 -12.13%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-64] 6.5121 7.2753 +11.72%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 2,539 2,824 +11.23%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 791.96 703.64 -11.15%
benchmarks/test_collectors_benchmark.py::test_single_with_rb_pixels 5.3563 4.8042 -10.31%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2,051 1,878 -8.47%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 953.64 883.53 -7.35%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] 448.23 480.07 +7.10%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[False-None] 107.86 115.06 +6.68%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,835 2,654 -6.38%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] 24,123 22,594 -6.34%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 539.62 505.73 -6.28%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-64] 10.36 10.99 +6.04%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 860.45 809.90 -5.88%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] 354.83 336.26 -5.23%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-False] 50,999 48,340 -5.21%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 52.46 55.16 +5.13%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-backward] 363.28 345.01 -5.03%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] 316.83 332.54 +4.96%
benchmarks/test_envs_benchmark.py::test_simple 1.2534 1.1923 -4.87%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] 39,069 37,239 -4.68%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape2-large_img] 570.02 543.32 -4.68%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] 30,025 28,646 -4.59%
benchmarks/test_collectors_benchmark.py::test_sync_pixels 9.8631 10.31 +4.54%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[50-img_shape0-small] 6,277 6,003 -4.38%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] 392.34 375.24 -4.36%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] 141.79 135.62 -4.35%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[False-backward] 156.63 149.95 -4.27%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[200-img_shape1-large_batch] 8.9206 8.5409 -4.26%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-backward] 273.29 261.70 -4.24%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb_cuda[200-img_shape1-large_batch] 8.5706 8.2130 -4.17%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-constant] 4,911 4,707 -4.16%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb_cuda[100-img_shape0-atari] 17.08 16.37 -4.12%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 953.78 992.12 +4.02%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-4] 142.64 148.32 +3.98%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-backward] 135.45 130.14 -3.92%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[generalized_advantage_estimate-False-1-512] 49.50 47.59 -3.86%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] 898.52 863.86 -3.86%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-False] 44,864 43,139 -3.85%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] 50,531 48,606 -3.81%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] 177.98 171.23 -3.79%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-False] 32,001 30,802 -3.75%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-1] 286.54 275.82 -3.74%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] 1,868 1,936 +3.63%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-True] 33,156 31,969 -3.58%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] 717.53 692.53 -3.48%
benchmarks/test_objectives_benchmarks.py::test_values[vec_generalized_advantage_estimate-True-True] 311.66 301.03 -3.41%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 2,930 2,836 -3.21%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-64] 12.35 12.74 +3.21%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[1000000-cuda] 2,233 2,302 +3.11%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 52.09 53.69 +3.06%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-False] 32,439 31,472 -2.98%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-16] 42.43 43.68 +2.96%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 47.86 49.26 +2.93%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-False] 35,421 34,396 -2.90%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] 721.65 742.35 +2.87%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-backward] 344.36 354.07 +2.82%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-None] 596.98 613.80 +2.82%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-False] 64,442 62,648 -2.78%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 517.64 503.30 -2.77%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] 56,376 54,822 -2.76%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sample_mixed_devices[1000000-cuda_storage_cuda_samp... 1,481 1,521 +2.72%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] 193.71 188.48 -2.70%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] 77,070 75,011 -2.67%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-True] 19,988 19,458 -2.65%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-None] 808.73 830.12 +2.65%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 0.5896 0.6047 +2.56%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-False] 64,885 63,239 -2.54%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sample_mixed_devices[1000000-memmap_cpu_storage_cud... 978.21 1,001 +2.36%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-gru] 22.23 22.75 +2.35%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 0.5253 0.5130 -2.33%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-None] 774.60 756.81 -2.30%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] 80.08 81.92 +2.30%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2,092 2,140 +2.27%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] 30,291 29,610 -2.25%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-4] 48.87 47.80 -2.18%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] 33,654 34,383 +2.17%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[200-img_shape3-large_batch] 332.72 325.55 -2.16%
benchmarks/test_envs_benchmark.py::test_parallel 0.5326 0.5439 +2.12%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] 41,174 42,043 +2.11%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-backward] 238.26 243.20 +2.07%
benchmarks/test_envs_benchmark.py::test_transformed 0.7167 0.7022 -2.02%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-backward] 278.96 273.33 -2.02%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] 3,573 3,503 -1.98%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] 223.96 219.61 -1.94%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] 20,454 20,849 +1.93%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape2-large_img] 393.11 400.67 +1.92%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-None] 349.34 342.89 -1.85%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-1] 633.07 621.82 -1.78%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-True] 18,696 18,376 -1.71%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] 414.36 421.39 +1.70%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[reduce-overhead-None] 820.62 834.55 +1.70%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-None] 98.29 99.93 +1.67%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 500.67 508.99 +1.66%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-False] 42,800 42,103 -1.63%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 159.52 162.11 +1.62%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] 4,482 4,410 -1.60%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-4] 189.53 186.51 -1.59%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] 800.45 812.44 +1.50%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 187.76 190.55 +1.49%
benchmarks/test_objectives_benchmarks.py::test_values[td_lambda_return_estimate-True-False] 12.32 12.49 +1.43%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 166.44 168.81 +1.42%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape1-atari] 277.17 273.29 -1.40%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-4] 72.49 71.49 -1.39%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[100-img_shape0-atari] 17.82 17.57 -1.38%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 21.62 21.92 +1.37%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 193.27 195.89 +1.36%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[100-img_shape0-atari] 30.60 30.19 -1.34%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 0.7086 0.6992 -1.33%
benchmarks/test_envs_benchmark.py::test_serial 0.4230 0.4287 +1.33%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-16] 36.37 36.84 +1.30%
... (showing 120 of 226 comparisons, sorted by absolute change)

@github-actions bot added the CI label Jun 23, 2026

@vmoens left a comment (Collaborator)

LGTM thanks!

@vmoens merged commit b660f05 into pytorch:main Jun 23, 2026
107 of 109 checks passed
@theap06 commented Jun 24, 2026 (Contributor, Author)

@vmoens lmk if there is anything I can do to help maintain the library.

@vmoens commented Jun 24, 2026 (Collaborator)

@theap06 wanna work with me on async collectors + async envs?

I vibe coded these:
#3897
#3896
#3895
#3894
#3893

but I'd need some quick reviews. What I want to achieve is max throughput for envs/policies that are not trivial and can benefit from high asynchronicity (e.g. VLA + envs that have a lot of variability in the time spent in step/reset).
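The throughput argument can be made concrete with a toy timing model: in lock-step batched stepping every round waits for the slowest env, while fully asynchronous envs each advance at their own pace. This is a back-of-the-envelope sketch, not a model of the collectors in the linked PRs: it assumes instant policy inference, the same number of steps per env, and made-up uniform latencies; `lockstep_time` and `async_time` are hypothetical names.

```python
import random
from typing import List


def lockstep_time(latencies: List[List[float]]) -> float:
    # latencies[env][step]: each synchronous round costs the slowest env's step.
    n_steps = len(latencies[0])
    return sum(max(env[t] for env in latencies) for t in range(n_steps))


def async_time(latencies: List[List[float]]) -> float:
    # Each env runs independently; wall time is set by the slowest env overall.
    return max(sum(env) for env in latencies)


rng = random.Random(0)
# 8 envs x 100 steps with step times drawn from [1, 10] ms (high variability).
lat = [[rng.uniform(1.0, 10.0) for _ in range(100)] for _ in range(8)]
print(f"lock-step: {lockstep_time(lat):.0f} ms, async: {async_time(lat):.0f} ms")
```

Both schedules collect the same 800 frames, so the ratio of the two times is the throughput gain; it shrinks to 1 when step times are constant and grows with variability.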

@theap06 commented Jun 24, 2026 (Contributor, Author)

@vmoens I can review them tonight when I get home from work

@theap06 commented Jun 25, 2026 (Contributor, Author)

@vmoens had some work I had to finish up tonight. I can review the PRs tomorrow + design the CI changes.


Labels

CI, CLA Signed, Data, Documentation, Feature, ReplayBuffers, sota-implementations/, Trainers
