Skip to content

Use GitHub tarball downloads for CPM dependencies#1005

Open
bdice wants to merge 1 commit intorapidsai:mainfrom
bdice:cpm-tarball-downloads
Open

Use GitHub tarball downloads for CPM dependencies#1005
bdice wants to merge 1 commit intorapidsai:mainfrom
bdice:cpm-tarball-downloads

Conversation

@bdice
Copy link
Copy Markdown
Contributor

@bdice bdice commented Apr 14, 2026

Description

Convert 7 CPM packages from git clone to GitHub tarball archive downloads (url + url_hash) in versions.json, reducing download sizes significantly.

Depends on #1011 (patch-application support for tarball-mode packages).

Converted packages: benchmark, bs_thread_pool, cuco, rapids_logger, GTest, nvbench, nvtx3

Intentionally excluded:

  • rmm — uses dynamic version placeholders (${rapids-cmake-version})
  • nvcomp — has proprietary_binary entries requiring special handling

Size Comparison

Package Tarball Download Tarball Extracted Git Clone (full) Git .git Dir Download Savings
benchmark 200.0 KB 878.4 KB 4.6 MB 3.7 MB 95%
bs_thread_pool 80.7 KB 348.1 KB 951.3 KB 603.1 KB 87%
cuco 1.0 MB 3.9 MB 12.3 MB 8.4 MB 88%
rapids_logger 30.4 KB 109.9 KB 319.8 KB 209.9 KB 86%
GTest 856.9 KB 3.9 MB 18.5 MB 14.7 MB 94%
nvbench 513.3 KB 2.4 MB 4.6 MB 2.3 MB 78%
nvtx3 3.8 MB 11.9 MB 18.4 MB 6.6 MB 42%

Total tarball download: 6.5 MB
Total git download (approx .git dir): 36.4 MB
Total savings: 82%

Notes:

  • "Git .git Dir" is the approximate network transfer size (the .git directory after a full clone)
  • All packages previously used git_shallow: false (full history clones), so the savings are real
  • nvtx3 has the smallest relative savings because its repo contains large binary assets in the working tree
  • CCCL (already converted in Add URL option to fetch packages from GitHub tarballs #968) and rmm/nvcomp (excluded) are not shown
  • SHA256 hashes were computed from the GitHub archive tarballs and spot-checked for correctness

Closes #968

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The cmake-format.json is up to date with these changes.

@bdice bdice requested a review from a team as a code owner April 14, 2026 17:10
@bdice bdice added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Apr 14, 2026
@bdice bdice self-assigned this Apr 14, 2026
@bdice bdice force-pushed the cpm-tarball-downloads branch from db42265 to a40f5d7 Compare April 14, 2026 19:12
Copy link
Copy Markdown
Contributor

@robertmaynard robertmaynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The work around to get tests to pass is unacceptable. If we want to properly support tarballs we also need to have inline pathces work with tarballs, you can't just bypass that entirely in your tests

@bdice
Copy link
Copy Markdown
Contributor Author

bdice commented Apr 15, 2026

The work around to get tests to pass is unacceptable. If we want to properly support tarballs we also need to have inline pathces work with tarballs, you can't just bypass that entirely in your tests

Good catch. I'll correct this.

@bdice
Copy link
Copy Markdown
Contributor Author

bdice commented Apr 27, 2026

Test PR: rapidsai/cudf#22313

@bdice
Copy link
Copy Markdown
Contributor Author

bdice commented Apr 27, 2026

The work around to get tests to pass is unacceptable. If we want to properly support tarballs we also need to have inline pathces work with tarballs, you can't just bypass that entirely in your tests

Good catch. I'll correct this.

I broke out the expansion of tests into #1011. This PR now depends on #1011, and its scope is limited to updating versions.json to use tarballs.

@bdice bdice force-pushed the cpm-tarball-downloads branch from df1572b to b41923d Compare April 29, 2026 01:14
rapids-bot Bot pushed a commit that referenced this pull request May 5, 2026
Add a new patch application path that uses `patch -p1`, so CPM packages fetched as tarballs can have patches applied to them. Tarball-extracted directories have no `.git` repository and so cannot use `git apply` or `git am`. The existing git-mode path is retained unchanged for packages fetched as git clones. The fetch mode is detected from `versions.json` (`git_url`/`git_tag` vs `url`/`url_hash`) and the patch script dispatches accordingly.

Also includes a few adjacent fixes uncovered while adding the above:

- Fix `pinning_write_file.cmake` to respect mode switching when an override provides git fields for a package whose default is URL-mode. Previously the fallback to default JSON could incorrectly detect URL-mode even when the override explicitly switched to git-mode.
- Update `verify_generated_pins` to handle URL-mode packages in pin verification, checking `url`/`url_hash` fields instead of `git_tag`/`git_shallow`.
- Update `cpm_package_override-multiple` tests to provide `git_url` alongside `git_tag` in overrides so that partial git overrides remain valid regardless of whether the default entry is git- or url-mode.

### Test coverage

Splits the existing patch-command tests into explicit `-git` and `-tarball` variants for each scenario (regular, embedded, required, required-fails) and adds `-diff-git`/`-diff-tarball` pairs so both `.patch` and `.diff` files are exercised in both modes. Each test specifies its fetch mode explicitly in `override.json` (`git_url`/`git_tag` for `-git` tests, `url`/`url_hash` for `-tarball` tests) rather than relying on the default, so the tests remain correct regardless of how `versions.json` evolves. The override is applied before `rapids_cpm_package_details_internal` so the explicitly declared mode is used for the test's pre-populate step.

Coverage matrix:

|              | git mode | tarball mode |
|--------------|----------|--------------|
| .diff        | new      | new          |
| .patch       | existing | new          |
| inline patch | existing | new          |
| required     | existing | new          |
| required-fails | existing | new        |

This PR is a prerequisite for #1005 (converting `versions.json` entries to tarballs), since several of the packages being converted have patches applied via overrides in downstream RAPIDS projects.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)

URL: #1011
Convert 7 packages (benchmark, bs_thread_pool, cuco, rapids_logger,
GTest, nvbench, nvtx3) from git clone to GitHub tarball archive
downloads in versions.json, reducing download sizes significantly.
@bdice bdice force-pushed the cpm-tarball-downloads branch from b41923d to c0e0bd9 Compare May 5, 2026 16:18
@bdice bdice requested a review from robertmaynard May 5, 2026 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants