
[DRAFT] Add AOT JIT+LTO capability #22390

Open

divyegala wants to merge 12 commits into rapidsai:main from divyegala:feat/nvjitlink_kernels


Conversation

@divyegala
Member

@divyegala divyegala commented May 6, 2026

Description

This PR is a POC that adds AOT JIT+LTO capability using the cuVS architecture. The initial POC targets murmurhash3_x86_32, which uses a device-side type dispatcher. The idea behind applying LTO here is to generate a hashing device-function fragment for each cudf type and then link only the fragments for the types present in the input table. For all other types, a no-op fragment is linked instead.
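A minimal host-side sketch of the fragment-selection idea described above, assuming a small fixed set of dispatchable types. The `type_id` subset, `select_hash_fragments`, and the fragment names are all hypothetical and chosen for illustration; the PR's actual naming scheme and the nvJitLink calls that consume these fragments are not shown here.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical subset of cudf::type_id, for illustration only.
enum class type_id { INT32, INT64, FLOAT64, STRING, BOOL8 };

// Every type the device-side dispatcher can branch on needs *some* definition
// at link time, so the plan always covers this full list.
constexpr std::array<type_id, 5> dispatchable_types = {
  type_id::INT32, type_id::INT64, type_id::FLOAT64, type_id::STRING, type_id::BOOL8};

// Hypothetical fragment suffixes; the PR's real names are not shown.
std::string type_suffix(type_id t)
{
  switch (t) {
    case type_id::INT32: return "int32";
    case type_id::INT64: return "int64";
    case type_id::FLOAT64: return "float64";
    case type_id::STRING: return "string";
    case type_id::BOOL8: return "bool8";
  }
  return "unknown";
}

// Builds the list of LTO fragments to hand to the linker: the real hashing
// fragment for each type present in the input table, a no-op fragment otherwise.
std::vector<std::string> select_hash_fragments(std::vector<type_id> const& column_types)
{
  std::set<type_id> const present(column_types.begin(), column_types.end());
  std::vector<std::string> plan;
  for (type_id t : dispatchable_types) {
    bool const used = present.count(t) > 0;
    plan.push_back((used ? "murmurhash3_x86_32_" : "noop_") + type_suffix(t));
  }
  // Exactly one fragment per dispatchable type, so no branch is left unresolved.
  assert(plan.size() == dispatchable_types.size());
  return plan;
}
```

For a table with INT32 and STRING columns, this yields the two hashing fragments plus no-op fragments for INT64, FLOAT64, and BOOL8. Linking a no-op rather than omitting the fragment presumably keeps every branch of the dispatcher resolvable at link time, while the unused types contribute almost no code to the final kernel.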

Benchmarks:

## [0] NVIDIA GB10

|  num_rows  |  nulls  |     hash_name      |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |        Diff |   %Diff |  Status  |
|------------|---------|--------------------|------------|-------------|------------|-------------|-------------|---------|----------|
|   65536    |    0    | murmurhash3_x86_32 | 211.529 us |      91.43% | 224.138 us |      94.74% |   12.609 us |   5.96% |   SAME   |
|  16777216  |    0    | murmurhash3_x86_32 |   3.602 ms |       7.98% |   2.890 ms |       8.87% | -711.999 us | -19.77% |   FAST   |
|   65536    |   0.1   | murmurhash3_x86_32 | 457.605 us |      90.61% | 262.888 us |     120.01% | -194.717 us | -42.55% |   SAME   |
|  16777216  |   0.1   | murmurhash3_x86_32 |   3.990 ms |      11.70% |   3.243 ms |      11.95% | -746.530 us | -18.71% |   FAST   |
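Assuming Diff and %Diff are measured relative to Ref Time (every row of the table is consistent with this), the 16777216-row, no-null case works out as:

$$
\%\text{Diff} = \frac{T_\text{Cmp} - T_\text{Ref}}{T_\text{Ref}} \times 100
= \frac{2.890\,\text{ms} - 3.602\,\text{ms}}{3.602\,\text{ms}} \times 100 \approx -19.77\%
$$

A negative %Diff therefore means the Cmp run was faster than the Ref run.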

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@divyegala divyegala requested review from a team as code owners May 6, 2026 02:09
@divyegala divyegala requested review from devavret and vuule May 6, 2026 02:09
@github-actions github-actions Bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels May 6, 2026
@devavret
Contributor

devavret commented May 6, 2026

Does this overlap with #22209, or is it meant as an alternative approach?

@divyegala
Member Author

@devavret It is meant as an alternative approach. This is the approach we currently use in production in cuVS, designed by @robertmaynard, @KyleFromNVIDIA, and myself.

@divyegala divyegala requested a review from a team as a code owner May 7, 2026 00:49
@divyegala divyegala requested a review from jameslamb May 7, 2026 00:49
@divyegala divyegala changed the title Add AOT JIT+LTO capability [DRAFT] Add AOT JIT+LTO capability May 7, 2026

3 participants