GeoT optimization 2/4: Datapipes producer/consumer refactor + stream overlap#1742
Draft
coreyjadams wants to merge 6 commits into
Refactor the datapipe prefetch path into a thread-safe host producer (_load_host) and a main-thread consumer (_consume), connected by a FIFO submit/consume primitive (io_pump.IOPump), so that any device or Warp kernels launch on the consuming thread. Deferred-sync stream overlap is built on top of this: _consume records the preprocessing CUDA event into _events_pending, and the DataLoader does a one-batch lookahead, inserting compute_stream.wait_event just before each yield so that preprocessing of batch N+1 overlaps compute on batch N.

- New io_pump.py (FIFO pump); producer/consumer protocols; _rng fork_generator; core.function_spec.warp_stream_from_torch; refactored readers (base/numpy/zarr/tensorstore_zarr) and datapipes __init__.
- MeshDataset: parallel disk read + pin (serialize_load_consume=False). DomainMeshReader: drop_interior_cells / drop_in_file_boundaries. The volume configs enable both.
- radius_search: pinned, non_blocking H2D copies. Recipe train loop: pinned, async loss D2H.
- Opt-in timing and torch.profiler labels; streaming and reader tests, docs, and the iterable-dataset tutorial.

No Warp keepalive machinery is needed: with the Warp-free SDF (parent branch), the datapipe no longer launches Warp kernels in _consume.
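The producer/consumer split can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the PR's io_pump.IOPump: FifoPump, prefetch_iter, load_host and consume_on_caller are hypothetical names, and a ThreadPoolExecutor stands in for whatever threading the real pump uses. Host loads may finish out of order on worker threads, but consume() blocks only on the oldest submission, so samples come back in FIFO order and the consume step always runs on the iterating thread.

```python
import threading
import time
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Iterable, Iterator, TypeVar

H = TypeVar("H")  # host-side payload (e.g. pinned CPU arrays)
D = TypeVar("D")  # result of the consume step (e.g. device tensors)


class FifoPump:
    """Hypothetical FIFO submit/consume pump; NOT the PR's io_pump.IOPump."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: Deque[Future] = deque()

    def submit(self, fn: Callable[..., H], *args) -> None:
        # Host work runs on a worker thread and may finish out of order.
        self._pending.append(self._pool.submit(fn, *args))

    def consume(self):
        # Block only on the oldest submission, so results return in FIFO order.
        if not self._pending:
            raise IndexError("consume() called with nothing submitted")
        return self._pending.popleft().result()

    def __len__(self) -> int:
        return len(self._pending)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def prefetch_iter(
    pump: FifoPump,
    produce: Callable[[int], H],
    consume: Callable[[H], D],
    indices: Iterable[int],
    depth: int = 2,
) -> Iterator[D]:
    """Keep up to `depth` host loads in flight; run `consume` on the caller's thread."""
    todo = deque(indices)
    # Prime the pump with the first `depth` host loads.
    while todo and len(pump) < depth:
        pump.submit(produce, todo.popleft())
    while len(pump):
        host = pump.consume()
        # Refill first so the next read overlaps processing of this sample.
        if todo:
            pump.submit(produce, todo.popleft())
        yield consume(host)  # device work would launch here, on this thread


# Demo: later indices finish their simulated "disk read" first,
# yet results still arrive in submission order.
consumer_threads = []


def load_host(idx: int) -> int:
    time.sleep(0.005 * (4 - idx))  # hypothetical stand-in for a thread-safe _load_host
    return idx


def consume_on_caller(x: int) -> int:
    consumer_threads.append(threading.get_ident())  # hypothetical stand-in for _consume
    return x * 10


pump = FifoPump(max_workers=3)
results = list(prefetch_iter(pump, load_host, consume_on_caller, range(5), depth=3))
pump.close()
print(results)  # [0, 10, 20, 30, 40]
```

Refilling the pump before calling consume keeps the next host read in flight while the current sample is processed, and the depth argument bounds how many host buffers (for example, pinned memory) are alive at once.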
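The deferred-sync lookahead can be illustrated without a GPU by logging the order in which work would be enqueued. In this hedged sketch, RecordingEvent and RecordingStream are hypothetical stand-ins for torch.cuda.Event and the compute torch.cuda.Stream (whose real methods are Event.record(stream) and Stream.wait_event(event)), and lookahead_loader is an illustrative name, not the PR's DataLoader. It shows the ordering: preprocessing for batch N+1 is enqueued, and its event recorded, before batch N is yielded, and the compute stream waits on batch N's event only immediately before that yield.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Tuple


class RecordingEvent:
    """Hypothetical stand-in for torch.cuda.Event; logs instead of syncing."""

    def __init__(self, tag: int, log: List[str]) -> None:
        self.tag, self._log = tag, log

    def record(self) -> None:
        self._log.append(f"record{self.tag}")


class RecordingStream:
    """Hypothetical stand-in for the compute torch.cuda.Stream."""

    def __init__(self, log: List[str]) -> None:
        self._log = log

    def wait_event(self, event: RecordingEvent) -> None:
        # On a real GPU this enqueues a device-side wait; the host does not block.
        self._log.append(f"wait{event.tag}")


def lookahead_loader(
    samples: Iterable[int],
    consume: Callable[[int], Tuple[int, RecordingEvent]],
    compute_stream: RecordingStream,
) -> Iterator[int]:
    """One-batch lookahead: enqueue prep of batch N+1 before yielding batch N."""
    events_pending: deque = deque()
    it = iter(samples)

    def prep(sample: int) -> int:
        batch, event = consume(sample)  # enqueues preprocessing, records its event
        events_pending.append(event)
        return batch

    try:
        current = prep(next(it))
    except StopIteration:
        return
    for sample in it:
        nxt = prep(sample)  # batch N+1 preprocessing is queued first...
        compute_stream.wait_event(events_pending.popleft())  # ...then gate batch N
        yield current
        current = nxt
    compute_stream.wait_event(events_pending.popleft())
    yield current


# Demo: record the enqueue order for three batches.
log: List[str] = []


def consume(i: int) -> Tuple[int, RecordingEvent]:
    log.append(f"prep{i}")
    event = RecordingEvent(i, log)
    event.record()
    return i, event


stream = RecordingStream(log)
for batch in lookahead_loader(range(3), consume, stream):
    log.append(f"compute{batch}")
print(log)
```

Because wait_event only enqueues a dependency on the device, the host thread never blocks in this loop; on a real GPU, the preprocessing kernels for batch N+1 can then run concurrently with compute on batch N.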
PhysicsNeMo Pull Request
This PR is stacked 🥞 on #1741 and requires the torch SDF implementation from that PR to land first.
This PR aggressively rebuilds the datapipe's IO path to rely more heavily on prefetching, and I have updated the docs to match. It is still a work in progress, but getting there.
Description
Checklist
Dependencies
Review Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.