FFMPEG 7 + cuda 12.9 by johnnynunez · Pull Request #346 · dmlc/decord

johnnynunez · 2025-06-16T10:12:14Z

This PR generates x86 and aarch64 wheels, including both architectures on macOS.
Tests are passing.

I also added pyav-ffmpeg 7.1.1 as the FFmpeg backend, along with CI for CUDA wheels.

CUDA wheels are generated for x86 and aarch64.


Add Windows x86_64 wheel generation to the PyPI workflow:
- CPU wheels via the build_wheels matrix (win_amd64, cp310-cp314), bundling the vendored FFmpeg DLLs with delvewheel.
- CUDA (+cu132) wheels via a new build_cuda_wheels_windows job that installs the CUDA Toolkit on the host using NVIDIA's local installer (download .exe, extract with 7-Zip, run setup.exe -s), then keeps the CUDA runtime external while bundling FFmpeg.

Supporting fixes so the Windows build works:
- FindFFmpeg.cmake: use the MSVC import-lib names shipped by pyav-ffmpeg (avcodec.lib, not libavcodec.lib).
- setup.py: mark Windows as a binary distribution so wheels get the win_amd64 platform tag.

CI fixes for the failing wheel matrix:
- Drop deprecated cp313t (3.13 free-threading was removed from manylinux_2_28; keep cp314t).
- Exclude macOS-only videotoolbox_device_api.cc from the core source glob on non-Apple platforms (MSVC has no C11 aligned_alloc), fixing the Windows build.
- Remove no-op 'pp*' cibuildwheel skip selectors (PyPy not enabled).

Version upgrades:
- FFmpeg vendor pyav-ffmpeg 8.0.1-5 -> 8.1.1-1 (ffmpeg-8.1.json), plus the FFmpeg source build scripts (8.0 -> 8.1.1).
- CUDA 13.2 -> 13.3 across Linux + Windows CUDA jobs (+cu133), gpu.Dockerfile base image, and README.

Minor:
- GetVideoCodecString: use explicit casts for the bounds check to silence the signed/unsigned comparison warning.

- delvewheel: add --analyze-existing so it walks decord.dll's imports and vendors the FFmpeg DLLs. decord loads its native lib as a plain decord.dll via ctypes (no .pyd extension), so delvewheel otherwise reported "no external dependencies are needed" and shipped ~1 MB wheels missing avcodec/avformat/etc.
- Windows CUDA: point CMake's Visual Studio generator at the toolkit via -T "cuda=%CUDA_PATH%" and export the versioned CUDA_PATH_V<MAJ>_<MIN> env var. The MSBuild CUDA integration resolves CudaToolkitDir from those, not from -DCUDAToolkit_ROOT, so enable_language(CUDA) was failing with "The CUDA Toolkit directory '' does not exist".

pyav-ffmpeg ships its Windows DLLs with a 1970 mtime. Now that delvewheel actually bundles them (via --analyze-existing), re-zipping the wheel fails with "ZIP does not support timestamps before 1980". Touch the vendored DLLs (os.utime) after fetch-vendor so their mtime is current, on both the Windows CPU and CUDA jobs.
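The timestamp fix described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the workflow: the function name `touch_stale_files`, the `*.dll` glob, and the choice to touch only files older than 1980 (rather than every vendored DLL) are assumptions made for the example.

```python
import os
import time
from pathlib import Path

# Earliest timestamp the ZIP format can store (1980-01-01, local time).
ZIP_EPOCH = time.mktime((1980, 1, 1, 0, 0, 0, 0, 0, -1))


def touch_stale_files(directory, pattern="*.dll"):
    """Reset the mtime of files dated before 1980 to the current time.

    Returns the list of paths that were touched. The name and the
    default pattern are hypothetical, chosen for this sketch only.
    """
    touched = []
    for path in sorted(Path(directory).rglob(pattern)):
        if path.is_file() and path.stat().st_mtime < ZIP_EPOCH:
            os.utime(path, None)  # None sets atime and mtime to "now"
            touched.append(path)
    return touched
```

Running something like this on the vendor directory after fetch-vendor and before delvewheel repacks the wheel would avoid the "ZIP does not support timestamps before 1980" error.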

The Windows CUDA wheel link failed with LNK1120 (27 unresolved externals): the CUDA Driver API (cuInit/cuCtx*/cuModule*/cuLaunchKernel) and NVDEC (cuvid*) symbols. Only cudart/nvrtc/cublas/nvml were linked, and the namespace guard (CUDAToolkit:: vs CUDA::) always fell through to a legacy path that omitted them too. Link CUDA::cuda_driver and nvcuvid.lib on Windows only. ELF shared objects on Linux leave these undefined and resolve them from the driver at load time, so Linux is left unchanged (linking the libcuda stub there would make auditwheel bundle it and break the runtime).

nvcuvid.lib is not shipped in the CUDA Toolkit on Windows (only in the license-gated Video Codec SDK), so linking decord's NVDEC code failed with unresolved cuvid* externals. Generate an import library from a module-definition file (scripts/nvcuvid.def) with lib.exe at build time and point CMake at it via -DCUDA_NVCUVID_LIBRARY; the matching nvcuvid.dll is provided by the user's driver at runtime.
- Add scripts/nvcuvid.def listing the cuvid* exports decord references.
- Add ilammy/msvc-dev-cmd step so lib.exe is on PATH for BEFORE_BUILD.
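To make the import-library step concrete, here is a rough Python sketch of what a module-definition file looks like and how lib.exe would be invoked on it. The helper names `render_def` and `lib_command` are hypothetical, and the export list is only an illustrative subset of NVDEC entry points; the actual contents of scripts/nvcuvid.def are not shown in this PR text.

```python
# Illustrative subset of NVDEC exports; the real scripts/nvcuvid.def
# lists whichever cuvid* symbols decord references (an assumption here).
EXAMPLE_EXPORTS = [
    "cuvidCreateVideoParser",
    "cuvidParseVideoData",
    "cuvidDestroyVideoParser",
    "cuvidCreateDecoder",
    "cuvidDecodePicture",
    "cuvidDestroyDecoder",
]


def render_def(library, exports):
    """Render the text of a module-definition (.def) file."""
    lines = [f"LIBRARY {library}", "EXPORTS"]
    lines += [f"    {name}" for name in exports]
    return "\n".join(lines) + "\n"


def lib_command(def_path, out_path, machine="X64"):
    """Build the lib.exe argument list that turns a .def into an import lib."""
    return ["lib.exe", f"/DEF:{def_path}", f"/OUT:{out_path}", f"/MACHINE:{machine}"]


if __name__ == "__main__":
    print(render_def("nvcuvid", EXAMPLE_EXPORTS))
    print(" ".join(lib_command("scripts/nvcuvid.def", "nvcuvid.lib")))
```

The resulting nvcuvid.lib only records that the symbols live in nvcuvid.dll; no NVIDIA code is bundled, which is why the DLL must come from the installed driver at runtime.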

Make the CUDA wheels compact on every platform while keeping GPU acceleration working:
- Linux CUDA: auditwheel --exclude the CUDA runtime + driver libs (libcudart/libnvrtc/libcublas(Lt)/cuDNN/libcuda/libnvcuvid/libnvidia-ml) so wheels drop from ~500 MB to ~30 MB, mirroring the Windows policy. FFmpeg is still bundled.
- Add a decord2[cu13] extra pulling the NVIDIA pip packages that ship the CUDA 13 runtime for Linux x86_64/aarch64 + Windows (nvidia-cuda-runtime / nvidia-cuda-nvrtc / nvidia-cublas).
- Preload those libs at import time (_ffi/base._preload_cuda_libs): dlopen RTLD_GLOBAL from site-packages/nvidia/*/lib on POSIX, and os.add_dll_directory on site-packages/nvidia/*/bin on Windows. It is best-effort and silent, so CPU builds and system-CUDA setups are unaffected.

nvcuvid and the CUDA driver API come from the GPU driver.
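The import-time preload described above might look roughly like the following. This is a sketch under stated assumptions, not the code in `_ffi/base._preload_cuda_libs`: the function names, the `site_packages` parameter, and the use of a `*.so*` glob to pick libraries are all choices made for illustration.

```python
import ctypes
import glob
import os
import sys
import sysconfig


def find_nvidia_lib_dirs(site_packages=None):
    """Return site-packages/nvidia/*/lib (POSIX) or .../bin (Windows) dirs."""
    root = site_packages or sysconfig.get_paths()["purelib"]
    sub = "bin" if sys.platform == "win32" else "lib"
    candidates = glob.glob(os.path.join(root, "nvidia", "*", sub))
    return sorted(d for d in candidates if os.path.isdir(d))


def preload_cuda_libs(site_packages=None):
    """Best-effort preload of pip-installed CUDA runtime libraries.

    Every failure is swallowed, so CPU-only installs and system-CUDA
    setups behave exactly as they would without this function.
    """
    loaded = []
    for lib_dir in find_nvidia_lib_dirs(site_packages):
        if sys.platform == "win32":
            try:
                os.add_dll_directory(lib_dir)  # extends the DLL search path
                loaded.append(lib_dir)
            except OSError:
                pass
            continue
        for so_path in sorted(glob.glob(os.path.join(lib_dir, "*.so*"))):
            try:
                # RTLD_GLOBAL exposes the symbols to libraries loaded later.
                ctypes.CDLL(so_path, mode=ctypes.RTLD_GLOBAL)
                loaded.append(so_path)
            except OSError:
                pass
    return loaded
```

Calling `preload_cuda_libs()` before the native decord library is loaded lets the dynamic loader find the runtime libraries without them being bundled in the wheel; when no `nvidia` packages are installed it simply returns an empty list.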

…the-box Linux CUDA wheels crashed on `import decord` with "undefined symbol: cuModuleLoadData": libdecord.so had the CUDA Driver API (cu*) and NVDEC (cuvid*) symbols undefined with no DT_NEEDED provider, because the driver/NVDEC linking was scoped to Windows only.

Build-time (CMakeLists.txt):
- Link CUDA::cuda_driver on all platforms (toolkit stub) so DT_NEEDED records libcuda.so.1 / imports nvcuda.dll, without bundling the driver.
- Link NVDEC: Windows keeps the synthesized nvcuvid.lib; Linux links by soname (-l:libnvcuvid.so.1) so DT_NEEDED records libnvcuvid.so.1.

Runtime (decord/_ffi/base.py):
- Before loading the native lib, best-effort dlopen libcuda.so.1 and libnvcuvid.so.1 (RTLD_GLOBAL) on Linux, plus the nvidia-* pip runtime. Guarantees resolution even if the soname isn't in DT_NEEDED. No-op on CPU-only / macOS / Windows.

Packaging:
- delvewheel now also excludes nvcuda.dll (auditwheel already excludes libcuda/libnvcuvid). The driver libs are never bundled; they come from the host GPU driver at runtime.
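The runtime half of this fix, preloading the driver libraries by soname, can be sketched as below. This is an illustration only and not the contents of decord/_ffi/base.py: the function name `preload_driver_libs`, its `platform` parameter (added so the behavior can be exercised off Linux), and the returned status dict are assumptions.

```python
import ctypes
import sys

# Sonames named in the commit message; both are provided by the GPU driver.
DRIVER_SONAMES = ("libcuda.so.1", "libnvcuvid.so.1")


def preload_driver_libs(sonames=DRIVER_SONAMES, platform=sys.platform):
    """Best-effort dlopen of the driver libraries with RTLD_GLOBAL.

    Returns a dict mapping each soname to True (loaded) or False
    (missing). On anything other than Linux it does nothing.
    """
    if not platform.startswith("linux"):
        return {}
    status = {}
    for soname in sonames:
        try:
            # Global visibility lets libdecord.so resolve cu*/cuvid* symbols
            # even when the soname is absent from its DT_NEEDED entries.
            ctypes.CDLL(soname, mode=ctypes.RTLD_GLOBAL)
            status[soname] = True
        except OSError:
            status[soname] = False  # no driver installed: stay silent
    return status
```

On a machine without an NVIDIA driver every entry comes back False and nothing is raised, which matches the "no-op on CPU-only" behavior described above.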

johnnynunez and others added 30 commits June 14, 2025 21:36

Add IDE configuration directories to .gitignore

b3c0620

Update version to 1.0.0 and refactor audio/video reader implementations

2510b2d

new CI

a5741f8

bump version 2.0.0

5f2d677

Rename CUDA wheels and upload artifacts in CI workflow

9502ea7

Update CI configuration to include additional Python versions and rename CUDA wheels

e4ee54e

Update README.md

d61fd6a

Update README.md

45c7000

Update update_version.py

90a699b

Update libinfo.py

8a6e2fd

Update c_runtime_api.h

81968ad

Update version.py

cf514b5

Remove commented-out test_no_audio_stream and test_video_corrupted_get_batch functions from test files, nose is deprecated

bf4bf97

upgrade ffmpeg 8.0

d737afe

upgrade cuda 13.0

ba6ab82

upgrade cibuildwheel

07a2621

fix rename

3ad096b

fix rename

b3eaa33

fix cuda 13

d228409

fix cuda 13

2725eb7

Adapt rotation extraction for FFmpeg 8

61aa42b

Merge branch 'master' into codex/fix-build-errors-in-audio_reader.cc

0e2c4c6

Merge pull request #2 from johnnynunez/codex/fix-build-errors-in-audio_reader.cc

182a4b9

Fix display matrix retrieval without deprecated API

fix cuda 13

b558f94

Merge remote-tracking branch 'origin/master'

de2ee49

Use AVBufferSinkParams and scan stream side data

f77fcb4

Merge branch 'master' into codex/fix-build-errors-in-audio_reader.cc-pmg29p

4db7f7a

Merge pull request #3 from johnnynunez/codex/fix-build-errors-in-audio_reader.cc-pmg29p

3b67bea

Fix display matrix retrieval without deprecated API

ffmpeg 8

1a62cf0

ffmpeg 8

6a26fb1

johnnynunez added 30 commits October 26, 2025 12:28

remove macos-13. deprecated

1cab26f

Upgrade cibuildwheel to version 3.2.1

7c58c6d

bump cuda version, cibuildwheel and github actions

0a9fc1c

Merge remote-tracking branch 'origin/master'

fa3e283

# Conflicts: # .github/workflows/pypi.yml

Change pts type to int64_t and add retry logic for frame skipping in video_reader

64da3b9

CUDA 13.1

00c7d50

Add disk space cleanup step for Linux runners

0982892

Added a step to free up disk space on Linux runners.

bump 3.0.0

587236b

CUDA 13.2

d040a99

AUTOTAG

bfab37e

fix

279bd5a

fix

d2c99d9

upgrade to 3.1.0

c1f75f5

generate autotag when pass all CI

0a1ae50

generate autotag when pass all CI

7e67477

generate autotag when pass all CI

cbd0980

bump to 3.2.0

408350c

fix

4728995

bump to 3.2.0

b6aaf88

refactor: improve audio decoding and resampling logic

dae7817

bump 3.3.0

050ee2a

bump to 3.4.0

1b4cfbd

johnnynunez wants to merge 88 commits into dmlc:master from johnnynunez:master
