cuDNN static linking
November 6, 2025, 4:35am 1
I'm building my shared library, libVideo_ai.so, with OpenCV and CUDA linked in statically, but when I run the code that tests the library I get the error: libVideo_ai.so: undefined symbol: _ZN5cudnn6fusion23RuntimeFusionEngineBase23load_kernels_from_cacheESt10shared_ptrINS_7backend11KernelCacheEERKS2_INS3_14KernelCacheKeyEERb. I compiled everything on Ubuntu 22.04 with CUDA 12.1 and an RTX 3050. Thanks in advance for your reply.
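For reference, the symbol demangles to cudnn::fusion::RuntimeFusionEngineBase::load_kernels_from_cache(...), which belongs to cuDNN's runtime fusion engine, so I suspect my link step isn't pulling in the archive that provides it. Below is a minimal sanity check I can build with the same link flags; the archive names and link order in the comment are assumptions based on the cuDNN 8.x static packages typically used with CUDA 12.1, and may differ on other installs.

// cudnn_link_check.cpp -- minimal check that the statically linked cuDNN
// actually resolves and initializes.
//
// Hypothetical link line (cuDNN 8.x static archive names; paths, versions,
// and order are assumptions -- adjust for your install):
//   g++ cudnn_link_check.cpp -o cudnn_link_check \
//       -I/usr/local/cuda/include -L/usr/local/cuda/lib64 \
//       libcudnn_ops_infer_static.a libcudnn_ops_train_static.a \
//       libcudnn_cnn_infer_static.a libcudnn_cnn_train_static.a \
//       libcudnn_adv_infer_static.a libcudnn_adv_train_static.a \
//       -lcudart_static -lculibos -lnvrtc -lz -lpthread -ldl
// Note: the runtime fusion engine JIT-compiles kernels via NVRTC, so NVRTC
// must be available to the final binary as well.

#include <cudnn.h>
#include <cstdio>

int main() {
    // A header/library version mismatch is a common source of
    // undefined-symbol errors, so print both.
    std::printf("compiled against cuDNN %d, linked cuDNN %zu\n",
                (int)CUDNN_VERSION, cudnnGetVersion());

    cudnnHandle_t handle;
    cudnnStatus_t st = cudnnCreate(&handle);  // forces full initialization
    if (st != CUDNN_STATUS_SUCCESS) {
        std::printf("cudnnCreate failed: %s\n", cudnnGetErrorString(st));
        return 1;
    }
    cudnnDestroy(handle);
    std::printf("cuDNN initialized OK\n");
    return 0;
}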
carolyuu November 6, 2025, 5:05am 2
*** Please note that this reply was generated automatically by an LLM ***
cuDNN (the CUDA Deep Neural Network library) is a GPU-accelerated library of highly optimized deep learning primitives, enabling frameworks like PyTorch, TensorFlow, and JAX to deliver state-of-the-art performance.
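To make "primitives" concrete, here is a minimal sketch of calling one such primitive, a single 2D convolution, through cuDNN's legacy descriptor API (cuDNN 8.x; the graph API supersedes it in newer releases). The shapes and the hard-coded algorithm choice are illustrative assumptions, not recommendations.

#include <cudnn.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CHECK_CUDNN(call) do {                                        \
    cudnnStatus_t s_ = (call);                                        \
    if (s_ != CUDNN_STATUS_SUCCESS) {                                 \
        std::printf("cuDNN error %s at line %d\n",                    \
                    cudnnGetErrorString(s_), __LINE__);               \
        std::exit(1);                                                 \
    }                                                                 \
} while (0)

int main() {
    cudnnHandle_t handle;
    CHECK_CUDNN(cudnnCreate(&handle));

    // Toy problem: 1x3x32x32 input convolved with 8 filters of 3x3x3.
    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&xDesc));
    CHECK_CUDNN(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW,
                                           CUDNN_DATA_FLOAT, 1, 3, 32, 32));
    CHECK_CUDNN(cudnnCreateFilterDescriptor(&wDesc));
    CHECK_CUDNN(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT,
                                           CUDNN_TENSOR_NCHW, 8, 3, 3, 3));
    CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&convDesc));
    CHECK_CUDNN(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                                CUDNN_CROSS_CORRELATION,
                                                CUDNN_DATA_FLOAT));

    int n, c, outH, outW;  // cuDNN computes the output shape for us
    CHECK_CUDNN(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc,
                                                      &n, &c, &outH, &outW));
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&yDesc));
    CHECK_CUDNN(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW,
                                           CUDNN_DATA_FLOAT, n, c, outH, outW));

    float *x, *w, *y;
    cudaMalloc(&x, 1 * 3 * 32 * 32 * sizeof(float));
    cudaMalloc(&w, 8 * 3 * 3 * 3 * sizeof(float));
    cudaMalloc(&y, (size_t)n * c * outH * outW * sizeof(float));

    // Illustrative fixed algorithm; real code should query the heuristics.
    cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
    size_t wsBytes = 0;
    CHECK_CUDNN(cudnnGetConvolutionForwardWorkspaceSize(handle, xDesc, wDesc,
                                                        convDesc, yDesc,
                                                        algo, &wsBytes));
    void *ws = nullptr;
    if (wsBytes) cudaMalloc(&ws, wsBytes);

    float alpha = 1.0f, beta = 0.0f;
    CHECK_CUDNN(cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w,
                                        convDesc, algo, ws, wsBytes,
                                        &beta, yDesc, y));
    std::printf("convolution ran: output %dx%dx%dx%d\n", n, c, outH, outW);
    // Cleanup omitted for brevity.
    return 0;
}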
Key Features of cuDNN:
- Optimized Performance: cuDNN kernels are tuned for each NVIDIA GPU architecture, so the same application code runs efficiently across hardware generations.
- General Matrix Multiply (GEMM) APIs: cuDNN offers optimized GEMM APIs that exploit advanced GPU capabilities such as block-scaled FP8 and FP4 operations.
- Fusion Support: cuDNN can fuse surrounding operations into its GEMMs and convolutions, cutting memory traffic and delivering further performance gains for deep learning workloads.
- FP8 Flash Attention Operations: cuDNN delivers substantial improvements for FP8 Flash Attention, achieving up to 50% speedups on forward propagation and 84% on backward propagation with FP8 kernels.
cuDNN 9.7 Release:
cuDNN 9.7 extends support to the NVIDIA Blackwell architecture across both the Data Center and GeForce product lines, and brings significant performance improvements when cuDNN operations are migrated to Blackwell Tensor Cores.
Comparison with Other Libraries:
While cuDNN is a specialized library of deep learning primitives, other parts of the NVIDIA stack provide complementary functionality: CUDA is a general-purpose parallel computing platform and programming model, and TensorRT is a software development kit for optimizing and deploying deep learning models.
cuDNN vs. CUDA:
- cuDNN is a specialized library of deep learning primitives; CUDA is the general-purpose parallel computing platform it is built on.
- With CUDA you write and launch your own kernels for arbitrary computations, whereas cuDNN ships pre-tuned kernels for a fixed set of deep learning operations (see the sketch after this list).
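To make the distinction concrete, here is a minimal sketch of raw CUDA: a hand-written SAXPY kernel. Everything here (the kernel body, the launch configuration) is written by the programmer; cuDNN offers no such general-purpose primitive, which is precisely the point. The example is purely illustrative and self-contained.

#include <cuda_runtime.h>
#include <cstdio>

// A hand-written general-purpose kernel: y = a * x + y.
// Arbitrary element-wise computations like this are what raw CUDA is for;
// cuDNN instead ships fixed, pre-tuned deep learning operations.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    // (initialization omitted for brevity)

    // The programmer chooses the grid/block launch configuration.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    std::printf("kernel done: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(x);
    cudaFree(y);
    return 0;
}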
cuDNN vs. TensorRT:
- cuDNN is a library of primitives that an application or framework calls one operation at a time; TensorRT is a software development kit that ingests a whole trained model and optimizes it for deployment.
- TensorRT applies graph-level optimizations such as layer fusion and precision selection ahead of time, then runs the optimized engine on NVIDIA GPUs.
Conclusion:
cuDNN is a library of optimized deep neural network primitives. Its 9.7 release extends support to the NVIDIA Blackwell architecture with significant performance improvements. CUDA and TensorRT play complementary roles, general-purpose computing and model deployment respectively, but cuDNN remains a fundamental building block of GPU deep learning workloads.
*** Please give a thumbs up if this answered your question, or provide feedback to help us improve the answer. ***