[Feature] OfflineToOnlineTrainer + sota script for offline→online RL #3904
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3904
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
⏳ No Failures, 18 Pending as of commit 6651edb (merge base f7ba109).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Branch updated from `b9ba04f` to `2893f5b`.
Follow-up to the `OfflineToOnlineReplayBuffer` PR: a SAC trainer that drives the offline-pretrain -> online-finetune transition, plus a standalone sota-implementations script.

- `OfflineToOnlineTrainer` (subclasses `SACTrainer`): routes collected experience to the online buffer (`pre_epoch`), samples a mixed offline/online batch (`process_optim_batch`), and anneals the offline fraction to zero over `anneal_frames` (`post_steps`). It is backed by two reusable hooks: `OfflineToOnlineReplayBufferHook` (projects online transitions onto the offline dataset schema so the mixed-batch concat stays valid) and `OfflineToOnlineAnnealHook`.
- `sota-implementations/offline_to_online/train.py`: a self-contained SAC offline->online script (offline dataset selected via a `d4rl:` / `minari:` string).
- Tests: hook and flow tests, plus a gated functional `train()` run on Pendulum.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
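The three-stage flow described above (route collected data to the online buffer, sample a mixed batch, anneal the offline fraction) can be sketched without any TorchRL dependency. This is a minimal illustration under stated assumptions, not the trainer's code: `ToyOfflineToOnlineFlow` is a hypothetical class whose method names only mirror the `pre_epoch` / `process_optim_batch` / `post_steps` stages, buffers are plain lists of dicts, "projecting onto the offline schema" is modelled as keeping exactly the offline dataset's keys, and the anneal is assumed to be linear (the description only says the fraction reaches zero over `anneal_frames`).

```python
import random
from typing import Dict, List

Transition = Dict[str, float]


class ToyOfflineToOnlineFlow:
    """Hypothetical stand-in for the three trainer stages (not TorchRL's API)."""

    def __init__(
        self,
        offline_data: List[Transition],
        batch_size: int,
        initial_offline_fraction: float,
        anneal_frames: int,
        seed: int = 0,
    ) -> None:
        self.offline = offline_data
        self.online: List[Transition] = []
        # The offline dataset's keys define the schema every sample must follow.
        self.schema = set(offline_data[0])
        self.batch_size = batch_size
        self.initial_fraction = initial_offline_fraction
        self.offline_fraction = initial_offline_fraction
        self.anneal_frames = anneal_frames
        self.frames_seen = 0
        self.rng = random.Random(seed)

    def pre_epoch(self, collected: List[Transition]) -> None:
        """Route collected transitions to the online buffer, projected onto
        the offline schema (extra keys dropped; a missing key raises KeyError)."""
        for t in collected:
            self.online.append({k: t[k] for k in self.schema})
        self.frames_seen += len(collected)

    def process_optim_batch(self) -> List[Transition]:
        """Sample a mixed batch: offline_fraction of it from the offline data."""
        n_offline = round(self.batch_size * self.offline_fraction)
        if not self.online:
            n_offline = self.batch_size  # nothing collected yet: offline only
        n_online = self.batch_size - n_offline
        batch = [self.rng.choice(self.offline) for _ in range(n_offline)]
        batch += [self.rng.choice(self.online) for _ in range(n_online)]
        return batch

    def post_steps(self) -> None:
        """Anneal the offline fraction (a linear schedule is an assumption)."""
        progress = min(self.frames_seen / self.anneal_frames, 1.0)
        self.offline_fraction = self.initial_fraction * (1.0 - progress)


# Usage: 100 offline transitions, then one round of online collection.
offline = [{"obs": float(i), "action": 0.0, "reward": 1.0} for i in range(100)]
flow = ToyOfflineToOnlineFlow(offline, batch_size=8, initial_offline_fraction=0.5, anneal_frames=20)
# "step_count" stands in for an extra collector key the offline data lacks.
flow.pre_epoch([{"obs": 1.0, "action": 0.1, "reward": 0.5, "step_count": 3.0}] * 10)
batch = flow.process_optim_batch()  # 4 offline + 4 online samples
flow.post_steps()
print(flow.offline_fraction)  # 0.25
```

Because every online sample carries exactly the offline keys, both halves of the batch share one schema, which is the property the description credits with keeping the mixed-batch concatenation valid.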
Branch updated from `07dbcac` to `c50a082`.
Benchmark Results: PR
| Benchmark | main ops | PR ops | Change |
|---|---|---|---|
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 61.09 | 525.62 | +760.38% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 193.56 | 37.80 | -80.47% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 185.49 | 55.01 | -70.34% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] | 53.77 | 86.43 | +60.73% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] | 203.47 | 250.19 | +22.96% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 893.02 | 744.79 | -16.60% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-same] | 22.66 | 19.28 | -14.93% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-same] | 24.67 | 27.64 | +12.06% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-backward] | 128.26 | 143.71 | +12.04% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-constant] | 4,370 | 3,939 | -9.85% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2,096 | 1,909 | -8.94% |
| benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] | 259.91 | 282.19 | +8.57% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 2,800 | 3,010 | +7.50% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] | 133.50 | 143.50 | +7.49% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1,867 | 2,005 | +7.37% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 2,171 | 2,013 | -7.27% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] | 5,201 | 4,836 | -7.02% |
| benchmarks/test_objectives_benchmarks.py::test_values[td0_return_estimate-False-False] | 7,746 | 7,286 | -5.94% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 3,120 | 2,939 | -5.82% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] | 34,649 | 32,662 | -5.74% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 719.33 | 759.66 | +5.61% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 2,767 | 2,919 | +5.50% |
| benchmarks/test_envs_benchmark.py::test_simple | 1.7089 | 1.8017 | +5.43% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape1-atari] | 265.30 | 279.52 | +5.36% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-1] | 284.71 | 269.96 | -5.18% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-True] | 1.3657 | 1.2962 | -5.09% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape1-atari] | 637.86 | 670.19 | +5.07% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-True] | 29,565 | 28,096 | -4.97% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 766.84 | 804.47 | +4.91% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] | 108.61 | 113.93 | +4.90% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-False] | 32,341 | 30,761 | -4.88% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[untyped_storage] | 8.2561 | 7.8825 | -4.53% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] | 193.21 | 184.79 | -4.36% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-False] | 63,039 | 65,780 | +4.35% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2,682 | 2,571 | -4.14% |
| benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 2,239 | 2,148 | -4.10% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3,166 | 3,042 | -3.93% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-1] | 469.83 | 488.22 | +3.91% |
| benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] | 284.99 | 296.05 | +3.88% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] | 273.73 | 284.10 | +3.79% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[200-img_shape3-large_batch] | 776.32 | 747.67 | -3.69% |
| benchmarks/test_objectives_benchmarks.py::test_redq_speed[True-None] | 219.26 | 226.95 | +3.50% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-4] | 165.04 | 159.36 | -3.44% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape2-large_img] | 398.11 | 411.76 | +3.43% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-False] | 50,422 | 48,697 | -3.42% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] | 176.70 | 170.74 | -3.37% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-True] | 18,886 | 18,253 | -3.35% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[reduce-overhead-None] | 279.08 | 288.42 | +3.35% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[200-img_shape3-large_batch] | 308.13 | 318.19 | +3.26% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2,557 | 2,476 | -3.18% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3,369 | 3,265 | -3.08% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] | 401.64 | 413.98 | +3.07% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[reduce-overhead-None] | 467.04 | 481.35 | +3.06% |
| benchmarks/test_envs_benchmark.py::test_transformed | 0.8881 | 0.9148 | +3.00% |
| benchmarks/test_objectives_benchmarks.py::test_redq_speed[False-backward] | 53.84 | 55.46 | +3.00% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] | 4,328 | 4,455 | +2.93% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-True] | 30,080 | 29,203 | -2.92% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 535.11 | 550.68 | +2.91% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-True-0-gru] | 1.4478 | 1.4059 | -2.90% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[numpy] | 376,455 | 365,562 | -2.89% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] | 260.15 | 267.58 | +2.86% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 48.48 | 47.11 | -2.82% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] | 75,973 | 78,106 | +2.81% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] | 950.75 | 977.42 | +2.81% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] | 57,171 | 55,592 | -2.76% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] | 1,732 | 1,779 | +2.71% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 484.41 | 497.46 | +2.69% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] | 3,487 | 3,574 | +2.51% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] | 21,497 | 22,021 | +2.44% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-constant] | 2,648 | 2,584 | -2.39% |
| benchmarks/test_objectives_benchmarks.py::test_iql_speed[reduce-overhead-None] | 115.17 | 117.79 | +2.27% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[pickle] | 12,290 | 12,018 | -2.21% |
| benchmarks/test_objectives_benchmarks.py::test_a2c_speed[reduce-overhead-None] | 286.99 | 293.13 | +2.14% |
| benchmarks/test_envs_benchmark.py::test_serial | 0.5732 | 0.5852 | +2.09% |
| benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-None] | 84.76 | 86.53 | +2.08% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-False] | 44,046 | 43,132 | -2.07% |
| benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] | 59.22 | 60.45 | +2.07% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] | 30,471 | 29,840 | -2.07% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-16] | 43.53 | 44.43 | +2.06% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[reduce-overhead-None] | 1,771 | 1,807 | +2.02% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] | 702.11 | 716.27 | +2.02% |
| benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-backward] | 116.56 | 118.91 | +2.02% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-None] | 87.37 | 89.11 | +1.99% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 52.17 | 51.14 | -1.98% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[200-img_shape3-large_batch] | 330.15 | 336.64 | +1.96% |
| benchmarks/test_objectives_benchmarks.py::test_cql_speed[reduce-overhead-None] | 84.54 | 86.20 | +1.96% |
| benchmarks/test_objectives_benchmarks.py::test_redq_speed[reduce-overhead-None] | 223.95 | 228.26 | +1.92% |
| benchmarks/test_envs_benchmark.py::test_parallel | 0.9669 | 0.9484 | -1.91% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[reduce-overhead-None] | 704.69 | 691.33 | -1.90% |
| benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-backward] | 130.37 | 132.63 | +1.73% |
| benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-None] | 37.79 | 38.41 | +1.66% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-True] | 19,649 | 19,323 | -1.66% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-1] | 517.20 | 525.66 | +1.64% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] | 23,609 | 23,992 | +1.62% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] | 37,863 | 37,258 | -1.60% |
| benchmarks/test_objectives_benchmarks.py::test_td3_speed[reduce-overhead-None] | 562.97 | 571.88 | +1.58% |
| benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 29.04 | 28.58 | -1.56% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[50-img_shape0-small] | 866.20 | 879.45 | +1.53% |
| benchmarks/test_objectives_benchmarks.py::test_gae_speed[generalized_advantage_estimate-False-1-512] | 106.15 | 104.53 | -1.52% |
| benchmarks/test_objectives_benchmarks.py::test_iql_speed[False-None] | 49.31 | 50.05 | +1.49% |
| benchmarks/test_objectives_benchmarks.py::test_redq_speed[False-None] | 94.21 | 95.60 | +1.48% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] | 41,426 | 42,025 | +1.45% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] | 19,393 | 19,115 | -1.43% |
| benchmarks/test_objectives_benchmarks.py::test_a2c_speed[False-None] | 176.12 | 178.63 | +1.42% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-backward] | 62.07 | 62.96 | +1.42% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-64] | 10.99 | 10.84 | -1.41% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-None] | 121.72 | 120.00 | -1.41% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[50-img_shape0-small] | 7,329 | 7,226 | -1.40% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] | 23,498 | 23,177 | -1.37% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 0.5923 | 0.6003 | +1.35% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] | 15.10 | 15.30 | +1.33% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1,066 | 1,052 | -1.33% |
| benchmarks/test_objectives_benchmarks.py::test_values[td1_return_estimate-False-False] | 34.89 | 35.35 | +1.32% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-lstm] | 2.0454 | 2.0191 | -1.28% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-False-0-gru] | 1.3456 | 1.3626 | +1.27% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-16] | 36.77 | 37.23 | +1.25% |
| benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-None] | 210.33 | 212.95 | +1.24% |
| benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[reduce-overhead-None] | 332.65 | 336.69 | +1.22% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1,080 | 1,093 | +1.19% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[False-None] | 700.28 | 692.06 | -1.17% |
| ... | ... | ... | ... |

Showing 120 of 216 comparisons, sorted by absolute change.
GPU
Compared 226 benchmarks. Regressions over 5%: 17. Improvements over 5%: 10.
| Benchmark | main ops | PR ops | Change |
|---|---|---|---|
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 195.69 | 39.70 | -79.71% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] | 3,275 | 4,215 | +28.72% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2,859 | 3,457 | +20.94% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3,096 | 3,710 | +19.85% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[reduce-overhead-None] | 105.95 | 84.92 | -19.85% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 3,270 | 2,695 | -17.59% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3,335 | 2,805 | -15.88% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 2,027 | 2,339 | +15.40% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2,971 | 2,587 | -12.91% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-same] | 5.6075 | 4.9273 | -12.13% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-64] | 6.5121 | 7.2753 | +11.72% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 2,539 | 2,824 | +11.23% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 791.96 | 703.64 | -11.15% |
| benchmarks/test_collectors_benchmark.py::test_single_with_rb_pixels | 5.3563 | 4.8042 | -10.31% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2,051 | 1,878 | -8.47% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 953.64 | 883.53 | -7.35% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] | 448.23 | 480.07 | +7.10% |
| benchmarks/test_objectives_benchmarks.py::test_td3_speed[False-None] | 107.86 | 115.06 | +6.68% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2,835 | 2,654 | -6.38% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] | 24,123 | 22,594 | -6.34% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 539.62 | 505.73 | -6.28% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-64] | 10.36 | 10.99 | +6.04% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 860.45 | 809.90 | -5.88% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] | 354.83 | 336.26 | -5.23% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-False] | 50,999 | 48,340 | -5.21% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 52.46 | 55.16 | +5.13% |
| benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-backward] | 363.28 | 345.01 | -5.03% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] | 316.83 | 332.54 | +4.96% |
| benchmarks/test_envs_benchmark.py::test_simple | 1.2534 | 1.1923 | -4.87% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] | 39,069 | 37,239 | -4.68% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape2-large_img] | 570.02 | 543.32 | -4.68% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] | 30,025 | 28,646 | -4.59% |
| benchmarks/test_collectors_benchmark.py::test_sync_pixels | 9.8631 | 10.31 | +4.54% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[50-img_shape0-small] | 6,277 | 6,003 | -4.38% |
| benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] | 392.34 | 375.24 | -4.36% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] | 141.79 | 135.62 | -4.35% |
| benchmarks/test_objectives_benchmarks.py::test_a2c_speed[False-backward] | 156.63 | 149.95 | -4.27% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[200-img_shape1-large_batch] | 8.9206 | 8.5409 | -4.26% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-backward] | 273.29 | 261.70 | -4.24% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb_cuda[200-img_shape1-large_batch] | 8.5706 | 8.2130 | -4.17% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-constant] | 4,911 | 4,707 | -4.16% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb_cuda[100-img_shape0-atari] | 17.08 | 16.37 | -4.12% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 953.78 | 992.12 | +4.02% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-4] | 142.64 | 148.32 | +3.98% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-backward] | 135.45 | 130.14 | -3.92% |
| benchmarks/test_objectives_benchmarks.py::test_gae_speed[generalized_advantage_estimate-False-1-512] | 49.50 | 47.59 | -3.86% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] | 898.52 | 863.86 | -3.86% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-False] | 44,864 | 43,139 | -3.85% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] | 50,531 | 48,606 | -3.81% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] | 177.98 | 171.23 | -3.79% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-False] | 32,001 | 30,802 | -3.75% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-1] | 286.54 | 275.82 | -3.74% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] | 1,868 | 1,936 | +3.63% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-True] | 33,156 | 31,969 | -3.58% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] | 717.53 | 692.53 | -3.48% |
| benchmarks/test_objectives_benchmarks.py::test_values[vec_generalized_advantage_estimate-True-True] | 311.66 | 301.03 | -3.41% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 2,930 | 2,836 | -3.21% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-64] | 12.35 | 12.74 | +3.21% |
| benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[1000000-cuda] | 2,233 | 2,302 | +3.11% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 52.09 | 53.69 | +3.06% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-False] | 32,439 | 31,472 | -2.98% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-16] | 42.43 | 43.68 | +2.96% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 47.86 | 49.26 | +2.93% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-False] | 35,421 | 34,396 | -2.90% |
| benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] | 721.65 | 742.35 | +2.87% |
| benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-backward] | 344.36 | 354.07 | +2.82% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-None] | 596.98 | 613.80 | +2.82% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-False] | 64,442 | 62,648 | -2.78% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 517.64 | 503.30 | -2.77% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] | 56,376 | 54,822 | -2.76% |
| benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sample_mixed_devices[1000000-cuda_storage_cuda_samp... | 1,481 | 1,521 | +2.72% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] | 193.71 | 188.48 | -2.70% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] | 77,070 | 75,011 | -2.67% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-True] | 19,988 | 19,458 | -2.65% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-None] | 808.73 | 830.12 | +2.65% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 0.5896 | 0.6047 | +2.56% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-False] | 64,885 | 63,239 | -2.54% |
| benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sample_mixed_devices[1000000-memmap_cpu_storage_cud... | 978.21 | 1,001 | +2.36% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-gru] | 22.23 | 22.75 | +2.35% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 0.5253 | 0.5130 | -2.33% |
| benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-None] | 774.60 | 756.81 | -2.30% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] | 80.08 | 81.92 | +2.30% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2,092 | 2,140 | +2.27% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] | 30,291 | 29,610 | -2.25% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-4] | 48.87 | 47.80 | -2.18% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] | 33,654 | 34,383 | +2.17% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[200-img_shape3-large_batch] | 332.72 | 325.55 | -2.16% |
| benchmarks/test_envs_benchmark.py::test_parallel | 0.5326 | 0.5439 | +2.12% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] | 41,174 | 42,043 | +2.11% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-backward] | 238.26 | 243.20 | +2.07% |
| benchmarks/test_envs_benchmark.py::test_transformed | 0.7167 | 0.7022 | -2.02% |
| benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-backward] | 278.96 | 273.33 | -2.02% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] | 3,573 | 3,503 | -1.98% |
| benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] | 223.96 | 219.61 | -1.94% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] | 20,454 | 20,849 | +1.93% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape2-large_img] | 393.11 | 400.67 | +1.92% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-None] | 349.34 | 342.89 | -1.85% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-1] | 633.07 | 621.82 | -1.78% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-True] | 18,696 | 18,376 | -1.71% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] | 414.36 | 421.39 | +1.70% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[reduce-overhead-None] | 820.62 | 834.55 | +1.70% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-None] | 98.29 | 99.93 | +1.67% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 500.67 | 508.99 | +1.66% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-False] | 42,800 | 42,103 | -1.63% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 159.52 | 162.11 | +1.62% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] | 4,482 | 4,410 | -1.60% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-4] | 189.53 | 186.51 | -1.59% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] | 800.45 | 812.44 | +1.50% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 187.76 | 190.55 | +1.49% |
| benchmarks/test_objectives_benchmarks.py::test_values[td_lambda_return_estimate-True-False] | 12.32 | 12.49 | +1.43% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 166.44 | 168.81 | +1.42% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape1-atari] | 277.17 | 273.29 | -1.40% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-4] | 72.49 | 71.49 | -1.39% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[100-img_shape0-atari] | 17.82 | 17.57 | -1.38% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 21.62 | 21.92 | +1.37% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 193.27 | 195.89 | +1.36% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[100-img_shape0-atari] | 30.60 | 30.19 | -1.34% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 0.7086 | 0.6992 | -1.33% |
| benchmarks/test_envs_benchmark.py::test_serial | 0.4230 | 0.4287 | +1.33% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-16] | 36.37 | 36.84 | +1.30% |
| ... | ... | ... | ... |

Showing 120 of 226 comparisons, sorted by absolute change.
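The Change column is consistent with the relative difference (PR − main) / main, and the GPU summary counts match a ±5% threshold on that value when higher ops is treated as better. For example, the `test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]` GPU row gives (39.70 − 195.69) / 195.69 ≈ −79.71%. The sketch below reproduces that arithmetic; it assumes this is how the bot computes it (its implementation is not shown here), and `percent_change` / `classify` are hypothetical helper names.

```python
from typing import Dict, Tuple


def percent_change(main_ops: float, pr_ops: float) -> float:
    """Relative change of the PR measurement vs main, in percent."""
    return (pr_ops - main_ops) / main_ops * 100.0


def classify(rows: Dict[str, Tuple[float, float]], threshold: float = 5.0) -> Tuple[int, int]:
    """Count (regressions, improvements) beyond +/- threshold percent.

    Assumes ops is a throughput (higher is better), so a negative
    change counts as a regression.
    """
    regressions = improvements = 0
    for main_ops, pr_ops in rows.values():
        change = percent_change(main_ops, pr_ops)
        if change < -threshold:
            regressions += 1
        elif change > threshold:
            improvements += 1
    return regressions, improvements


# A few GPU rows copied from the table above: (main ops, PR ops).
gpu_rows = {
    "test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]": (195.69, 39.70),
    "test_storage_write_contiguous[100-img_shape1-atari]": (3275, 4215),
    "test_sac_speed[True-backward]": (316.83, 332.54),  # +4.96%: under the threshold
}
print(round(percent_change(195.69, 39.70), 2))  # -79.71
print(classify(gpu_rows))  # (1, 1)
```

Recomputing from the displayed values can differ in the last digit from the table (e.g. 3,275 → 4,215 gives about +28.70% rather than +28.72%), since the table shows rounded ops.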
@vmoens lmk if there is anything I can do to help maintain the library.
@theap06 wanna work with me on async collectors + async envs? I vibe coded these: but I'd need some quick reviews. What I want to achieve is max throughput for non-trivial envs/policies that can benefit from a high degree of asynchrony (e.g. VLAs, or envs with a lot of variability in the time spent in step/reset).
@vmoens I can review them tonight when I get home from work |
@vmoens I had some work to finish up tonight. I can review the PRs tomorrow and design the CI changes.
Summary
Adds an offline-to-online SAC trainer and a runnable SOTA example for the offline-pretrain to online-finetune workflow.
This builds on the `OfflineToOnlineReplayBuffer` and dataset-loading helpers from #3900.

What's added
- `OfflineToOnlineTrainer`: a `SACTrainer` subclass that uses `OfflineToOnlineReplayBuffer` for mixed offline/online optimization batches.
- `OfflineToOnlineReplayBufferHook` stores collected experience in the online buffer and samples mixed optimization batches.
- `OfflineToOnlineAnnealHook` decays the offline sampling fraction over collected frames.
- `OfflineToOnlineTrainerConfig`, including parity with the trainer constructor and registration in the config store.
- `sota-implementations/offline_to_online/train.py`: standalone SAC offline-to-online training script using registered dataset strings such as `d4rl:` and `minari:`.

Docs and tests
- Documentation for `OfflineToOnlineTrainer` and `OfflineToOnlineTrainerConfig`.
- `test/test_offline_to_online.py` with hook, trainer wiring, config, and checkpoint coverage.
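Parity between a config class and the trainer constructor can be checked mechanically by comparing the config's fields with the constructor's keyword parameters. The sketch below assumes a dataclass-based config; `ToyTrainer`, `ToyTrainerConfig`, and `parity_mismatches` are hypothetical stand-ins (not the classes added in this PR, nor TorchRL's config-store API), and apart from `anneal_frames`, which appears in the description, the parameter names are illustrative.

```python
import dataclasses
import inspect
from typing import Set, Tuple


class ToyTrainer:
    """Hypothetical trainer; only its constructor signature matters here."""

    def __init__(self, *, batch_size: int, anneal_frames: int,
                 initial_offline_fraction: float = 0.5) -> None:
        self.batch_size = batch_size
        self.anneal_frames = anneal_frames
        self.initial_offline_fraction = initial_offline_fraction


@dataclasses.dataclass
class ToyTrainerConfig:
    """Hypothetical config meant to mirror ToyTrainer.__init__ one-to-one."""

    batch_size: int = 256
    anneal_frames: int = 100_000
    initial_offline_fraction: float = 0.5


def parity_mismatches(config_cls: type, target_cls: type) -> Tuple[Set[str], Set[str]]:
    """Return (constructor params missing from the config, config fields the
    constructor does not accept). Both empty means the two are in parity."""
    params = inspect.signature(target_cls.__init__).parameters.values()
    ctor_names = {
        p.name
        for p in params
        # Skip `self` and catch-all *args / **kwargs, which have no config field.
        if p.name != "self" and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    }
    field_names = {f.name for f in dataclasses.fields(config_cls)}
    return ctor_names - field_names, field_names - ctor_names


missing, extra = parity_mismatches(ToyTrainerConfig, ToyTrainer)
print(missing, extra)  # set() set()

# With parity, building the trainer from a config is a keyword expansion.
trainer = ToyTrainer(**dataclasses.asdict(ToyTrainerConfig(anneal_frames=50_000)))
```

A check like this fails as soon as a constructor argument is added without a matching config field (or vice versa), which is the kind of drift a config-parity test is meant to catch.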