FFMPEG 7 + cuda 12.9 #346

Open
johnnynunez wants to merge 88 commits into dmlc:master from johnnynunez:master

johnnynunez commented Jun 16, 2025

It generates x86 and aarch64 wheels, including both architectures for macOS.
Tests are passing.

I also added pyav-ffmpeg 7.1.1 as the FFmpeg backend, plus CI for the CUDA wheels.

CUDA wheels are generated for x86 and aarch64.

johnnynunez and others added 30 commits June 14, 2025 21:36
…t_batch functions from test files, nose is deprecated
…o_reader.cc

Fix display matrix retrieval without deprecated API
…o_reader.cc-pmg29p

Fix display matrix retrieval without deprecated API
# Conflicts:
#	.github/workflows/pypi.yml
Added a step to free up disk space on Linux runners.
Add Windows x86_64 wheel generation to the PyPI workflow:
- CPU wheels via the build_wheels matrix (win_amd64, cp310-cp314),
  bundling the vendored FFmpeg DLLs with delvewheel.
- CUDA (+cu132) wheels via a new build_cuda_wheels_windows job that
  installs the CUDA Toolkit on the host using NVIDIA's local installer
  (download .exe, extract with 7-Zip, run setup.exe -s), then keeps the
  CUDA runtime external while bundling FFmpeg.
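A rough Python sketch of the host-side CUDA install steps listed above (extract the local installer with 7-Zip, then run `setup.exe -s`). It assumes 7-Zip is on PATH as `7z` and that `setup.exe` sits at the root of the extracted installer; the installer file name and extraction directory are hypothetical, and the actual workflow may script these steps differently.

```python
from pathlib import PureWindowsPath


def cuda_install_commands(installer: str, extract_dir: str) -> list[list[str]]:
    """Return the command lines for a silent CUDA Toolkit install (sketch).

    installer and extract_dir are hypothetical placeholders.
    """
    # 7-Zip: "x" extracts with full paths, "-o<dir>" (no space) sets the
    # output directory, "-y" answers yes to every prompt.
    extract = ["7z", "x", installer, f"-o{extract_dir}", "-y"]
    # Assumption: setup.exe is at the root of the extracted installer;
    # "-s" runs it silently.
    setup = [str(PureWindowsPath(extract_dir) / "setup.exe"), "-s"]
    return [extract, setup]
```

Each inner list can be passed to `subprocess.run(..., check=True)` in order.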

Supporting fixes so the Windows build works:
- FindFFmpeg.cmake: use the MSVC import-lib names shipped by pyav-ffmpeg
  (avcodec.lib, not libavcodec.lib).
- setup.py: mark Windows as a binary distribution so wheels get the
  win_amd64 platform tag.
CI fixes for the failing wheel matrix:
- Drop deprecated cp313t (3.13 free-threading was removed from
  manylinux_2_28; keep cp314t).
- Exclude macOS-only videotoolbox_device_api.cc from the core source
  glob on non-Apple platforms (MSVC has no C11 aligned_alloc), fixing
  the Windows build.
- Remove no-op 'pp*' cibuildwheel skip selectors (PyPy not enabled).

Version upgrades:
- FFmpeg vendor pyav-ffmpeg 8.0.1-5 -> 8.1.1-1 (ffmpeg-8.1.json),
  plus the FFmpeg source build scripts (8.0 -> 8.1.1).
- CUDA 13.2 -> 13.3 across Linux + Windows CUDA jobs (+cu133),
  gpu.Dockerfile base image, and README.

Minor:
- GetVideoCodecString: use explicit casts for the bounds check to
  silence the signed/unsigned comparison warning.
- delvewheel: add --analyze-existing so it walks decord.dll's imports
  and vendors the FFmpeg DLLs. decord loads its native lib as a plain
  decord.dll via ctypes (no .pyd extension), so delvewheel otherwise
  reported "no external dependencies are needed" and shipped ~1 MB
  wheels missing avcodec/avformat/etc.
- Windows CUDA: point CMake's Visual Studio generator at the toolkit via
  -T "cuda=%CUDA_PATH%" and export the versioned CUDA_PATH_V<MAJ>_<MIN>
  env var. The MSBuild CUDA integration resolves CudaToolkitDir from
  those, not from -DCUDAToolkit_ROOT, so enable_language(CUDA) was
  failing with "The CUDA Toolkit directory '' does not exist".
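The Windows CUDA fix above boils down to two inputs for CMake: a toolset argument and a versioned environment variable. A minimal Python sketch, assuming the toolkit path and a `"<major>.<minor>"` version string are already known; the helper name is hypothetical.

```python
import os


def cuda_msbuild_setup(cuda_path: str, version: str) -> tuple[dict[str, str], list[str]]:
    """Return (env, cmake_args) so MSBuild's CUDA integration finds the toolkit.

    version is "<major>.<minor>", e.g. "13.3". Hypothetical helper.
    """
    major, minor = version.split(".")[:2]
    env = dict(os.environ)
    # MSBuild resolves CudaToolkitDir from CUDA_PATH and the versioned
    # CUDA_PATH_V<MAJ>_<MIN> variable, not from -DCUDAToolkit_ROOT.
    env["CUDA_PATH"] = cuda_path
    env[f"CUDA_PATH_V{major}_{minor}"] = cuda_path
    # -T sets the Visual Studio generator toolset; "cuda=<dir>" selects the toolkit.
    cmake_args = ["-T", f"cuda={cuda_path}"]
    return env, cmake_args
```

Usage: `subprocess.run(["cmake", "-S", ".", "-B", "build", *args], env=env, check=True)`.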
pyav-ffmpeg ships its Windows DLLs with a 1970 mtime. Now that
delvewheel actually bundles them (via --analyze-existing), re-zipping
the wheel fails with "ZIP does not support timestamps before 1980".
Touch the vendored DLLs (os.utime) after fetch-vendor so their mtime is
current, on both the Windows CPU and CUDA jobs.
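A minimal sketch of the `os.utime` step, assuming the vendored DLLs live somewhere under a single directory (the directory argument is hypothetical). Python's `zipfile` raises the same "ZIP does not support timestamps before 1980" error for a file with a 1970 mtime, which is why resetting it to the current time fixes the re-zip.

```python
import os
import time
from pathlib import Path


def touch_vendored_dlls(vendor_dir: str) -> int:
    """Set the mtime of every *.dll under vendor_dir to "now".

    Returns the number of files touched. vendor_dir is a hypothetical path.
    """
    now = time.time()
    count = 0
    for dll in Path(vendor_dir).rglob("*.dll"):
        os.utime(dll, (now, now))  # (atime, mtime)
        count += 1
    return count
```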
The Windows CUDA wheel link failed with LNK1120 (27 unresolved
externals): the CUDA Driver API (cuInit/cuCtx*/cuModule*/cuLaunchKernel)
and NVDEC (cuvid*) symbols. Only cudart/nvrtc/cublas/nvml were linked,
and the namespace guard (CUDAToolkit:: vs CUDA::) always fell through to
a legacy path that omitted them too.

Link CUDA::cuda_driver and nvcuvid.lib on Windows only. ELF shared
objects on Linux leave these undefined and resolve them from the driver
at load time, so Linux is left unchanged (linking the libcuda stub there
would make auditwheel bundle it and break the runtime).
nvcuvid.lib is not shipped in the CUDA Toolkit on Windows (only in the
license-gated Video Codec SDK), so linking decord's NVDEC code failed
with unresolved cuvid* externals. Generate an import library from a
module-definition file (scripts/nvcuvid.def) with lib.exe at build time
and point CMake at it via -DCUDA_NVCUVID_LIBRARY; the matching
nvcuvid.dll is provided by the user's driver at runtime.

- Add scripts/nvcuvid.def listing the cuvid* exports decord references.
- Add ilammy/msvc-dev-cmd step so lib.exe is on PATH for BEFORE_BUILD.
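A module-definition file only needs a `LIBRARY` line naming the DLL and an `EXPORTS` list. The Python sketch below writes one and builds the matching `lib.exe` command. The export list is a hypothetical subset of the NVDEC API (the PR's actual list is in `scripts/nvcuvid.def`), and `/MACHINE:X64` assumes a 64-bit build.

```python
from pathlib import Path

# Hypothetical subset of NVDEC entry points, for illustration only.
CUVID_EXPORTS = [
    "cuvidCreateDecoder",
    "cuvidDestroyDecoder",
    "cuvidDecodePicture",
    "cuvidCreateVideoParser",
    "cuvidParseVideoData",
    "cuvidDestroyVideoParser",
]


def write_def_file(path: Path, exports: list[str], library: str = "nvcuvid.dll") -> str:
    """Write a .def file naming the DLL and the symbols it exports."""
    text = f"LIBRARY {library}\nEXPORTS\n" + "".join(f"    {name}\n" for name in exports)
    path.write_text(text)
    return text


def lib_exe_command(def_file: str, out_lib: str) -> list[str]:
    """lib.exe invocation that turns the .def file into an import library."""
    return ["lib.exe", f"/DEF:{def_file}", f"/OUT:{out_lib}", "/MACHINE:X64"]
```

The resulting `.lib` is then handed to CMake as `-DCUDA_NVCUVID_LIBRARY=<path>`, while `nvcuvid.dll` itself comes from the driver at runtime.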
Make the CUDA wheels compact on every platform while keeping GPU
acceleration working:

- Linux CUDA: auditwheel --exclude the CUDA runtime + driver libs
  (libcudart/libnvrtc/libcublas(Lt)/cuDNN/libcuda/libnvcuvid/libnvidia-ml)
  so wheels drop from ~500 MB to ~30 MB, mirroring the Windows policy.
  FFmpeg is still bundled.
- Add a decord2[cu13] extra pulling the NVIDIA pip packages that ship
  the CUDA 13 runtime for Linux x86_64/aarch64 + Windows
  (nvidia-cuda-runtime / nvidia-cuda-nvrtc / nvidia-cublas).
- Preload those libs at import time (_ffi/base._preload_cuda_libs):
  dlopen RTLD_GLOBAL from site-packages/nvidia/*/lib on POSIX, and
  os.add_dll_directory on site-packages/nvidia/*/bin on Windows. It is
  best-effort and silent, so CPU builds and system-CUDA setups are
  unaffected. nvcuvid and the CUDA driver API come from the GPU driver.
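A sketch of the best-effort preload described above, not decord's actual `_preload_cuda_libs`; the function name is hypothetical, and it only scans the global `site.getsitepackages()` directories (not the user site). Every failure is swallowed, so it is a no-op when no `nvidia-*` packages are installed.

```python
import ctypes
import glob
import os
import site
import sys

# Keep add_dll_directory handles alive for the life of the process.
_DLL_DIR_HANDLES = []


def preload_nvidia_pip_libs() -> list[str]:
    """Best-effort preload of CUDA runtime libs from nvidia-* pip packages.

    Never raises; returns the library files or DLL directories it registered.
    """
    handled = []
    site_dirs = getattr(site, "getsitepackages", lambda: [])()
    for sp in site_dirs:
        nvidia_root = os.path.join(sp, "nvidia")
        if sys.platform == "win32":
            # Windows: add site-packages/nvidia/*/bin to the DLL search path.
            for bin_dir in glob.glob(os.path.join(nvidia_root, "*", "bin")):
                try:
                    _DLL_DIR_HANDLES.append(os.add_dll_directory(bin_dir))
                    handled.append(bin_dir)
                except OSError:
                    pass
        else:
            # POSIX: dlopen site-packages/nvidia/*/lib/*.so* with RTLD_GLOBAL so
            # the symbols are available when the native library loads afterwards.
            for lib in sorted(glob.glob(os.path.join(nvidia_root, "*", "lib", "*.so*"))):
                try:
                    ctypes.CDLL(lib, mode=ctypes.RTLD_GLOBAL)
                    handled.append(lib)
                except OSError:
                    pass
    return handled
```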
…the-box

Linux CUDA wheels crashed on `import decord` with
"undefined symbol: cuModuleLoadData": libdecord.so had the CUDA Driver
API (cu*) and NVDEC (cuvid*) symbols undefined with no DT_NEEDED
provider, because the driver/NVDEC linking was scoped to Windows only.

Build-time (CMakeLists.txt):
- Link CUDA::cuda_driver on all platforms (toolkit stub) so DT_NEEDED
  records libcuda.so.1 / imports nvcuda.dll, without bundling the driver.
- Link NVDEC: Windows keeps the synthesized nvcuvid.lib; Linux links by
  soname (-l:libnvcuvid.so.1) so DT_NEEDED records libnvcuvid.so.1.
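One way to confirm the build-time change is to read the dynamic section of the built `libdecord.so` and check that both driver sonames appear as `(NEEDED)` entries. A small Python sketch, assuming binutils' `readelf` is on PATH; the helper names are hypothetical.

```python
import re
import subprocess

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libcuda.so.1]
_NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libs(readelf_output: str) -> list[str]:
    """Extract DT_NEEDED sonames from `readelf -d` output."""
    return _NEEDED_RE.findall(readelf_output)


def missing_driver_needed(so_path: str) -> list[str]:
    """Return the driver sonames that are NOT recorded in so_path's DT_NEEDED."""
    out = subprocess.run(
        ["readelf", "-d", so_path], capture_output=True, text=True, check=True
    ).stdout
    needed = set(needed_libs(out))
    return [s for s in ("libcuda.so.1", "libnvcuvid.so.1") if s not in needed]
```

An empty list from `missing_driver_needed("libdecord.so")` means both sonames were recorded.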

Runtime (decord/_ffi/base.py):
- Before loading the native lib, best-effort dlopen libcuda.so.1 and
  libnvcuvid.so.1 (RTLD_GLOBAL) on Linux, plus the nvidia-* pip runtime.
  Guarantees resolution even if the soname isn't in DT_NEEDED. No-op on
  CPU-only / macOS / Windows.
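The Linux runtime fallback fits in a few lines of `ctypes`. This is a minimal sketch with a hypothetical function name. Passing a bare soname lets the dynamic linker search its usual paths (ld.so cache, `LD_LIBRARY_PATH`), and any `OSError` is ignored so CPU-only hosts stay silent.

```python
import ctypes
import sys

DRIVER_SONAMES = ("libcuda.so.1", "libnvcuvid.so.1")


def preload_driver_libs() -> list[str]:
    """Best-effort dlopen of the CUDA driver and NVDEC libraries on Linux.

    Returns the sonames that loaded; returns [] on any other platform.
    """
    if not sys.platform.startswith("linux"):
        return []
    loaded = []
    for soname in DRIVER_SONAMES:
        try:
            # RTLD_GLOBAL makes the driver symbols available to libdecord.so.
            ctypes.CDLL(soname, mode=ctypes.RTLD_GLOBAL)
            loaded.append(soname)
        except OSError:
            pass  # no NVIDIA driver installed: stay silent
    return loaded
```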

Packaging:
- delvewheel now also excludes nvcuda.dll (auditwheel already excludes
  libcuda/libnvcuvid). The driver libs are never bundled; they come from
  the host GPU driver at runtime.
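The two repair steps can be expressed as command builders. The Python sketch below uses auditwheel's repeatable `--exclude` and delvewheel's `--analyze-existing` / `--no-dll` options. The soname and DLL lists are illustrative assumptions covering only part of what the PR excludes (cuDNN, for example, is omitted), and the helper names are hypothetical.

```python
# Illustrative subsets; the workflow's real exclusion lists may differ.
LINUX_EXCLUDES = [
    "libcudart.so.13", "libnvrtc.so.13", "libcublas.so.13", "libcublasLt.so.13",
    "libcuda.so.1", "libnvcuvid.so.1", "libnvidia-ml.so.1",
]
WINDOWS_NO_DLL = ["nvcuda.dll", "nvcuvid.dll"]


def auditwheel_repair_cmd(wheel: str, out_dir: str, exclude: list[str]) -> list[str]:
    """auditwheel repair that grafts every dependency except the excluded sonames."""
    cmd = ["auditwheel", "repair", "-w", out_dir]
    for soname in exclude:
        cmd += ["--exclude", soname]  # --exclude can be given multiple times
    cmd.append(wheel)
    return cmd


def delvewheel_repair_cmd(wheel: str, out_dir: str, no_dll: list[str]) -> list[str]:
    """delvewheel repair that vendors FFmpeg DLLs but never the driver DLLs."""
    return [
        "delvewheel", "repair",
        "--analyze-existing",          # also analyze DLLs already in the wheel (decord.dll)
        "--no-dll", ";".join(no_dll),  # ';' is the path separator on Windows
        "-w", out_dir,
        wheel,
    ]
```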