FFMPEG 7 + cuda 12.9 #346
Open
johnnynunez wants to merge 88 commits into
…t_batch functions from test files; nose is deprecated
…o_reader.cc Fix display matrix retrieval without deprecated API
…o_reader.cc-pmg29p Fix display matrix retrieval without deprecated API
# Conflicts:
#	.github/workflows/pypi.yml
Added a step to free up disk space on Linux runners.
Add Windows x86_64 wheel generation to the PyPI workflow:
- CPU wheels via the build_wheels matrix (win_amd64, cp310-cp314), bundling the vendored FFmpeg DLLs with delvewheel.
- CUDA (+cu132) wheels via a new build_cuda_wheels_windows job that installs the CUDA Toolkit on the host using NVIDIA's local installer (download the .exe, extract it with 7-Zip, run setup.exe -s), then keeps the CUDA runtime external while bundling FFmpeg.
Supporting fixes so the Windows build works:
- FindFFmpeg.cmake: use the MSVC import-library names shipped by pyav-ffmpeg (avcodec.lib, not libavcodec.lib).
- setup.py: mark Windows as a binary distribution so wheels get the win_amd64 platform tag.
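A common setuptools pattern for "mark as a binary distribution" is a `Distribution` subclass whose `has_ext_modules()` returns True, which makes `bdist_wheel` emit a platform-tagged wheel instead of `py3-none-any`. The sketch below assumes that pattern; the helper `select_distclass` is hypothetical and not taken from the PR's setup.py.

```python
import sys

from setuptools.dist import Distribution


class BinaryDistribution(Distribution):
    """Distribution that always reports native code.

    decord loads its native library through ctypes rather than as a Python
    extension module, so setuptools cannot detect it on its own. Returning
    True here makes bdist_wheel treat the wheel as platform-specific
    (e.g. win_amd64) instead of pure Python.
    """

    def has_ext_modules(self):
        return True


def select_distclass(platform=sys.platform):
    # Hypothetical helper: force the binary distribution only on Windows,
    # matching the commit message; other platforms keep the default class.
    return BinaryDistribution if platform == "win32" else Distribution


# Usage in setup.py (sketch):
# setup(name="decord2", ..., distclass=select_distclass())
```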
CI fixes for the failing wheel matrix:
- Drop deprecated cp313t (3.13 free-threading was removed from manylinux_2_28; keep cp314t).
- Exclude macOS-only videotoolbox_device_api.cc from the core source glob on non-Apple platforms (MSVC has no C11 aligned_alloc), fixing the Windows build.
- Remove no-op 'pp*' cibuildwheel skip selectors (PyPy is not enabled).
Version upgrades:
- FFmpeg: vendored pyav-ffmpeg 8.0.1-5 -> 8.1.1-1 (ffmpeg-8.1.json), plus the FFmpeg source build scripts (8.0 -> 8.1.1).
- CUDA: 13.2 -> 13.3 across the Linux and Windows CUDA jobs (+cu133), the gpu.Dockerfile base image, and the README.
Minor:
- GetVideoCodecString: use explicit casts in the bounds check to silence the signed/unsigned comparison warning.
- delvewheel: add --analyze-existing so it walks decord.dll's imports and vendors the FFmpeg DLLs. decord loads its native lib as a plain decord.dll via ctypes (no .pyd extension), so delvewheel otherwise reported "no external dependencies are needed" and shipped ~1 MB wheels missing avcodec/avformat/etc.
- Windows CUDA: point CMake's Visual Studio generator at the toolkit via -T "cuda=%CUDA_PATH%" and export the versioned CUDA_PATH_V<MAJ>_<MIN> env var. The MSBuild CUDA integration resolves CudaToolkitDir from those, not from -DCUDAToolkit_ROOT, so enable_language(CUDA) was failing with "The CUDA Toolkit directory '' does not exist".
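The versioned variable name follows the CUDA_PATH_V<MAJ>_<MIN> pattern described above. A minimal sketch of deriving it, together with the generator toolset argument, might look like this; both helper names are hypothetical and the PR itself sets these values in the workflow rather than in Python.

```python
def cuda_env_for_msbuild(cuda_path, version):
    """Return env vars that let MSBuild's CUDA integration find the toolkit.

    cuda_path: toolkit root directory (install layout is an assumption).
    version:   "MAJOR.MINOR" string, e.g. "13.3".
    """
    major, minor = version.split(".")[:2]
    return {
        "CUDA_PATH": cuda_path,
        # Versioned alias, e.g. CUDA_PATH_V13_3 for a 13.3 toolkit.
        f"CUDA_PATH_V{major}_{minor}": cuda_path,
    }


def cmake_toolset_arg(cuda_path):
    # Visual Studio generators accept a "cuda=<path>" field in -T.
    return ["-T", f"cuda={cuda_path}"]


# Usage (sketch):
# env = dict(os.environ, **cuda_env_for_msbuild(cuda_root, "13.3"))
# cmd = ["cmake", "-S", ".", "-B", "build", *cmake_toolset_arg(cuda_root)]
```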
pyav-ffmpeg ships its Windows DLLs with a 1970 mtime. Now that delvewheel actually bundles them (via --analyze-existing), re-zipping the wheel fails with "ZIP does not support timestamps before 1980". Touch the vendored DLLs (os.utime) after fetch-vendor so their mtime is current, on both the Windows CPU and CUDA jobs.
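ZIP archives store timestamps in the DOS date format, which cannot represent anything before 1980-01-01. A sketch of the "touch the vendored DLLs" step, assuming the DLLs live somewhere under a single vendor directory (the layout is an assumption, not taken from the PR):

```python
import os
from pathlib import Path


def touch_vendored_dlls(vendor_dir):
    """Reset the mtime of every *.dll under vendor_dir to the current time.

    Returns the list of touched paths so the caller can log them.
    """
    touched = []
    for dll in Path(vendor_dir).rglob("*.dll"):
        os.utime(dll, None)  # None sets both atime and mtime to "now"
        touched.append(dll)
    return touched


# Usage (sketch): run after fetch-vendor and before delvewheel repairs the wheel.
# touch_vendored_dlls("path/to/vendored/ffmpeg")
```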
The Windows CUDA wheel link failed with LNK1120 (27 unresolved externals): the CUDA Driver API (cuInit/cuCtx*/cuModule*/cuLaunchKernel) and NVDEC (cuvid*) symbols. Only cudart/nvrtc/cublas/nvml were linked, and the namespace guard (CUDAToolkit:: vs CUDA::) always fell through to a legacy path that omitted them too. Link CUDA::cuda_driver and nvcuvid.lib on Windows only. ELF shared objects on Linux leave these undefined and resolve them from the driver at load time, so Linux is left unchanged (linking the libcuda stub there would make auditwheel bundle it and break the runtime).
nvcuvid.lib is not shipped in the CUDA Toolkit on Windows (only in the license-gated Video Codec SDK), so linking decord's NVDEC code failed with unresolved cuvid* externals. Generate an import library from a module-definition file (scripts/nvcuvid.def) with lib.exe at build time and point CMake at it via -DCUDA_NVCUVID_LIBRARY; the matching nvcuvid.dll is provided by the user's driver at runtime.
- Add scripts/nvcuvid.def listing the cuvid* exports decord references.
- Add an ilammy/msvc-dev-cmd step so lib.exe is on PATH for BEFORE_BUILD.
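A module-definition file only needs a LIBRARY line naming the DLL and an EXPORTS section listing the symbols. The sketch below renders one and builds the lib.exe command line; the export list is a hypothetical subset of NVDEC entry points, not the contents of the PR's scripts/nvcuvid.def.

```python
# Hypothetical subset of NVDEC exports; the real .def lists whichever
# cuvid* symbols decord actually references.
CUVID_EXPORTS = [
    "cuvidCreateDecoder",
    "cuvidDecodePicture",
    "cuvidDestroyDecoder",
    "cuvidCreateVideoParser",
    "cuvidParseVideoData",
    "cuvidDestroyVideoParser",
    "cuvidMapVideoFrame64",
    "cuvidUnmapVideoFrame64",
]


def render_def(library, exports):
    """Render a .def file: the DLL name followed by its exported symbols."""
    lines = [f"LIBRARY {library}", "EXPORTS"]
    lines += [f"    {name}" for name in exports]
    return "\n".join(lines) + "\n"


def lib_exe_command(def_path, out_path, machine="x64"):
    # lib.exe turns the .def into an import library (.lib) for the linker;
    # nvcuvid.dll itself is supplied by the NVIDIA driver at runtime.
    return ["lib.exe", f"/def:{def_path}", f"/out:{out_path}", f"/machine:{machine}"]
```

The resulting nvcuvid.lib is then passed to CMake as -DCUDA_NVCUVID_LIBRARY=<path>.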
Make the CUDA wheels compact on every platform while keeping GPU acceleration working:
- Linux CUDA: auditwheel --exclude the CUDA runtime + driver libs (libcudart/libnvrtc/libcublas(Lt)/cuDNN/libcuda/libnvcuvid/libnvidia-ml) so wheels drop from ~500 MB to ~30 MB, mirroring the Windows policy. FFmpeg is still bundled.
- Add a decord2[cu13] extra that pulls in the NVIDIA pip packages shipping the CUDA 13 runtime for Linux x86_64/aarch64 + Windows (nvidia-cuda-runtime / nvidia-cuda-nvrtc / nvidia-cublas).
- Preload those libs at import time (_ffi/base._preload_cuda_libs): dlopen with RTLD_GLOBAL from site-packages/nvidia/*/lib on POSIX, and os.add_dll_directory on site-packages/nvidia/*/bin on Windows. It is best-effort and silent, so CPU builds and system-CUDA setups are unaffected. nvcuvid and the CUDA driver API come from the GPU driver.
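The preload step described above can be sketched roughly as follows. This is an illustration of the technique under stated assumptions (site-packages located via sysconfig, every shared object under nvidia/*/lib loaded in sorted order), not decord's actual `_preload_cuda_libs`.

```python
import ctypes
import glob
import os
import sys
import sysconfig


def preload_cuda_libs(site_dirs=None):
    """Best-effort preload of CUDA runtime libs from nvidia-* pip packages.

    Never raises; returns the libraries loaded (POSIX) or the directories
    registered (Windows).
    """
    if site_dirs is None:
        site_dirs = [sysconfig.get_paths()["purelib"]]
    done = []
    for site in site_dirs:
        if sys.platform == "win32":
            # Windows: add nvidia/*/bin to the DLL search path.
            for bindir in glob.glob(os.path.join(site, "nvidia", "*", "bin")):
                try:
                    os.add_dll_directory(bindir)
                    done.append(bindir)
                except OSError:
                    pass
        else:
            # POSIX: dlopen with RTLD_GLOBAL so the symbols are visible to
            # the native library that is loaded afterwards.
            pattern = os.path.join(site, "nvidia", "*", "lib", "*.so*")
            for lib in sorted(glob.glob(pattern)):
                try:
                    ctypes.CDLL(lib, mode=ctypes.RTLD_GLOBAL)
                    done.append(lib)
                except OSError:
                    pass  # silent: CPU-only or system-CUDA setups
    return done
```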
…the-box Linux CUDA wheels crashed on `import decord` with "undefined symbol: cuModuleLoadData": libdecord.so had the CUDA Driver API (cu*) and NVDEC (cuvid*) symbols undefined with no DT_NEEDED provider, because the driver/NVDEC linking was scoped to Windows only.
Build-time (CMakeLists.txt):
- Link CUDA::cuda_driver on all platforms (toolkit stub) so DT_NEEDED records libcuda.so.1 on Linux and the DLL imports nvcuda.dll on Windows, without bundling the driver.
- Link NVDEC: Windows keeps the synthesized nvcuvid.lib; Linux links by soname (-l:libnvcuvid.so.1) so DT_NEEDED records libnvcuvid.so.1.
Runtime (decord/_ffi/base.py):
- Before loading the native lib, best-effort dlopen libcuda.so.1 and libnvcuvid.so.1 (RTLD_GLOBAL) on Linux, plus the nvidia-* pip runtime. This guarantees resolution even if the soname isn't in DT_NEEDED. No-op on CPU-only / macOS / Windows.
Packaging:
- delvewheel now also excludes nvcuda.dll (auditwheel already excludes libcuda/libnvcuvid). The driver libs are never bundled; they come from the host GPU driver at runtime.
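The Linux runtime safeguard amounts to opening the driver sonames globally before the native library is loaded. A minimal sketch, with the function name and return value as assumptions rather than the PR's code:

```python
import ctypes
import sys

# Provided by the host GPU driver; never bundled into the wheel.
DRIVER_SONAMES = ("libcuda.so.1", "libnvcuvid.so.1")


def preload_driver_libs(platform=sys.platform):
    """Best-effort RTLD_GLOBAL dlopen of the CUDA driver and NVDEC on Linux.

    No-op on other platforms; libraries that are absent (e.g. on CPU-only
    machines) are skipped silently. Returns the sonames that loaded.
    """
    if not platform.startswith("linux"):
        return []
    loaded = []
    for soname in DRIVER_SONAMES:
        try:
            ctypes.CDLL(soname, mode=ctypes.RTLD_GLOBAL)
            loaded.append(soname)
        except OSError:
            pass
    return loaded
```

Loading by soname (rather than by absolute path) lets the dynamic linker pick whichever driver version the host has installed.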
It generates x86 and aarch64 wheels, for macOS as well.
Tests are passing.
I also added pyav-ffmpeg 7.1.1 as the FFmpeg backend, plus CI for CUDA wheels.
CUDA wheels are generated for x86 and aarch64.