Add new radius search cuda kernels for geotransolver #1763
Draft
shrek wants to merge 11 commits into
PhysicsNeMo Pull Request
Description
This PR implements two new radius search kernels for geotransolver. These kernels outperform the current warp-based radius search implementation. The kernels are implemented in CUDA C++ and wrapped with CuPy.
Compact Cell Points
In this algorithm, points are first hashed into cells whose edge length equals the search radius. The per-cell counts are then prefix-summed, and the points of each cell are scattered into contiguous locations in an array. Finally, the radius search runs with one warp per query point: for each query, distances are computed to the points in the 27 cells of the 3x3x3 neighborhood around the query's cell (the cell itself plus its adjacent cells).
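The steps above can be sketched on the CPU as follows. This is a minimal pure-Python reference under stated assumptions, not the PR's CUDA kernel: the function names `build_compact_cells` and `radius_search` are hypothetical, a Python dict stands in for whatever cell hashing the kernel uses, a plain loop replaces the one-warp-per-query parallelism, and the distance test is assumed to be inclusive (`<=`).

```python
import math
from itertools import accumulate, product


def build_compact_cells(points, radius):
    """Bin points into radius-sized cells and pack them contiguously per cell."""
    # Integer cell coordinates of each point (cell edge length == radius).
    cells = [tuple(math.floor(c / radius) for c in p) for p in points]
    # Count the points in each occupied cell.
    counts = {}
    for c in cells:
        counts[c] = counts.get(c, 0) + 1
    # Exclusive prefix sum over the counts gives each cell's start offset.
    keys = sorted(counts)
    starts = [0] + list(accumulate(counts[k] for k in keys))
    cell_start = {k: starts[i] for i, k in enumerate(keys)}
    # Scatter point indices so that each cell's points are contiguous.
    cursor = dict(cell_start)
    packed = [0] * len(points)
    for idx, c in enumerate(cells):
        packed[cursor[c]] = idx
        cursor[c] += 1
    return packed, cell_start, counts


def radius_search(query, points, radius, packed, cell_start, counts):
    """Return sorted indices of points within `radius` of `query`."""
    qc = tuple(math.floor(c / radius) for c in query)
    r2 = radius * radius
    hits = []
    # Visit the 3x3x3 block of cells centred on the query's cell (27 cells).
    for d in product((-1, 0, 1), repeat=3):
        cell = (qc[0] + d[0], qc[1] + d[1], qc[2] + d[2])
        if cell not in cell_start:
            continue  # empty cell
        start = cell_start[cell]
        for idx in packed[start:start + counts[cell]]:
            # Squared-distance test (assumed inclusive of the boundary).
            if sum((a - b) ** 2 for a, b in zip(query, points[idx])) <= r2:
                hits.append(idx)
    return sorted(hits)
```

Because the cell edge equals the radius, any point within the radius of a query differs from it by at most one cell along each axis, which is why visiting 27 cells is sufficient.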
Morton Cell Points
In this approach, the radius-sized cell IDs of the points are Morton-sorted. A cell directory is then constructed that holds, for each cell, its point count and the offset of its points in a contiguous array. One warp then processes each query: the 27 cells around the query's cell are located by binary search in the Morton order, and distances are computed to the points in those cells.
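A CPU sketch of this approach might look like the following. It is an illustration under assumptions rather than the PR's kernel: the names `morton3`, `cell_of`, `build_morton_directory`, and `morton_radius_search` are hypothetical, the 10-bits-per-axis limit and the x-in-lowest-bit interleaving order are arbitrary choices, and `origin` is assumed to be the minimum corner of the point cloud so that cell coordinates are non-negative.

```python
import math
from bisect import bisect_left
from itertools import product

BITS = 10  # assumed bits per axis; point cells must lie in [0, 2**BITS)


def morton3(x, y, z):
    """Interleave the low BITS bits of x, y, z (x in the lowest bit)."""
    code = 0
    for b in range(BITS):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b + 2)
    return code


def cell_of(p, radius, origin):
    """Integer cell coordinates of p relative to the grid origin."""
    return tuple(math.floor((c - o) / radius) for c, o in zip(p, origin))


def build_morton_directory(points, radius, origin):
    """Sort points by Morton cell code and build (codes, offsets, counts)."""
    codes = [morton3(*cell_of(p, radius, origin)) for p in points]
    order = sorted(range(len(points)), key=codes.__getitem__)
    dir_codes, offsets, counts = [], [], []
    for pos, idx in enumerate(order):
        if dir_codes and dir_codes[-1] == codes[idx]:
            counts[-1] += 1  # same cell as the previous sorted point
        else:
            dir_codes.append(codes[idx])  # first point of a new cell
            offsets.append(pos)
            counts.append(1)
    return order, dir_codes, offsets, counts


def morton_radius_search(query, points, radius, origin, directory):
    """Return sorted indices of points within `radius` of `query`."""
    order, dir_codes, offsets, counts = directory
    qc = cell_of(query, radius, origin)
    r2 = radius * radius
    hits = []
    for d in product((-1, 0, 1), repeat=3):
        cell = tuple(q + s for q, s in zip(qc, d))
        if any(v < 0 or v >= (1 << BITS) for v in cell):
            continue  # neighbour lies outside the encodable grid
        code = morton3(*cell)
        # Binary search for this cell in the sorted directory.
        slot = bisect_left(dir_codes, code)
        if slot == len(dir_codes) or dir_codes[slot] != code:
            continue  # empty cell
        start = offsets[slot]
        for idx in order[start:start + counts[slot]]:
            if sum((a - b) ** 2 for a, b in zip(query, points[idx])) <= r2:
                hits.append(idx)
    return sorted(hits)
```

Compared with a hash-based grid, the directory only stores occupied cells in sorted order, so each of the 27 lookups costs a binary search over the number of occupied cells.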
Performance
Hopper
Configuration
For radius search alone:
For the entire training step:
Blackwell
TBD
Queries can also be assigned to cells and Morton-sorted, so that queries in the same cell can be searched together in one block to improve memory locality.
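As a rough illustration of the grouping step only, the sketch below sorts query indices by cell and collects the queries that share a cell. The function name `group_queries_by_cell` is hypothetical, and for brevity a lexicographic sort on the cell tuple stands in for the Morton key; each resulting group is what one block would process together.

```python
import math
from itertools import groupby


def group_queries_by_cell(queries, radius):
    """Return [(cell, [query indices])], with queries of one cell adjacent."""

    def cell_of(i):
        # Integer cell coordinates of query i (cell edge length == radius).
        return tuple(math.floor(c / radius) for c in queries[i])

    # Stable sort keeps the original order of queries within a cell.
    order = sorted(range(len(queries)), key=cell_of)
    return [(cell, list(group)) for cell, group in groupby(order, key=cell_of)]
```

All queries in a group read the same 27 neighbouring cells, so their point data could be loaded once and reused across the group.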
Checklist
Dependencies
Review Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is it an indication that the PR will be accepted / rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.