cuSOLVERMp: A High-Performance CUDA Library for Distributed Dense Linear Algebra
NVIDIA cuSOLVERMp is a high-performance, distributed-memory, GPU-accelerated library that provides tools for the solution of dense linear systems and eigenvalue problems.
cuSOLVERMp works with matrices distributed in the 2D block-cyclic data layout and provides ScaLAPACK-like C APIs.
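The 2D block-cyclic layout follows the ScaLAPACK convention. The global M x N matrix is cut into MB x NB blocks, and on a P x Q process grid block (i, j) is stored by process ((i + RSRC) mod P, (j + CSRC) mod Q), where (RSRC, CSRC) is the process that owns the first block. Rows and columns are distributed independently, so a single 1D mapping, applied once per dimension, describes the whole layout. The C sketch below is illustrative only: `local_count`, `owner_of` and `global_to_local` are hypothetical helpers mirroring ScaLAPACK's NUMROC, INDXG2P and INDXG2L with 0-based indices. They are not cuSOLVERMp functions.

```c
#include <assert.h>

/* Hypothetical helpers (not part of the cuSOLVERMp API) mirroring the
 * ScaLAPACK tool routines NUMROC, INDXG2P and INDXG2L, 0-based. */

/* How many of the n global rows (or columns), split into blocks of nb,
 * are stored by process coordinate p out of np, when process src owns
 * the first block. */
int local_count(int n, int nb, int p, int src, int np) {
    int dist = (p - src + np) % np;   /* position of p relative to src */
    int nblocks = n / nb;             /* number of complete blocks */
    int count = (nblocks / np) * nb;  /* complete rounds of the cycle */
    int extra = nblocks % np;         /* complete blocks left over */
    if (dist < extra)
        count += nb;                  /* p gets one more complete block */
    else if (dist == extra)
        count += n % nb;              /* p gets the trailing partial block */
    return count;
}

/* Process coordinate that owns global index g. */
int owner_of(int g, int nb, int src, int np) {
    return (g / nb + src) % np;
}

/* Position of global index g inside its owner's local array. */
int global_to_local(int g, int nb, int np) {
    return (g / (nb * np)) * nb + g % nb;
}

/* Worked example: 10 rows, block size 3, 2 process rows, first block on 0.
 * Process 0 stores rows 0-2 and 6-8; process 1 stores rows 3-5 and 9. */
void check_layout_example(void) {
    assert(local_count(10, 3, 0, 0, 2) == 6);
    assert(local_count(10, 3, 1, 0, 2) == 4);
    assert(owner_of(7, 3, 0, 2) == 0);
    assert(global_to_local(7, 3, 2) == 4);  /* 7 is the 5th row on process 0 */
    assert(owner_of(9, 3, 0, 2) == 1);
    assert(global_to_local(9, 3, 2) == 3);  /* 9 is the 4th row on process 1 */
}
```

Under these assumptions, process (p, q) holds a local piece of `local_count(M, MB, p, RSRC, P)` rows by `local_count(N, NB, q, CSRC, Q)` columns, stored column-major. In ScaLAPACK, the local leading dimension must be at least the larger of 1 and that local row count.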
A companion library, CAL, contains utilities to manage communicators and to synchronize processes in a safe way.
Download: the cuSOLVERMp library is available through the NVIDIA Developer Zone and the NVIDIA HPC SDK.
Key Features
- Multi-process, multi-GPU.
- One process per GPU.
- ScaLAPACK-like C functionality and interfaces to ease porting.
- Configurable communication backends (UCC, NCCL, UCX, etc.).
- Logging and tracing.
- Tensor Core accelerated.
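Running one process per GPU means an application has to decide two things for each process: which GPU it drives and which cell of the P x Q process grid it occupies. The C sketch below shows one common way to do this. All names in it (`grid_coord`, `rank_to_grid_col_major`, `rank_to_grid_row_major`, `device_for_local_rank`) are hypothetical and are not cuSOLVERMp APIs. Both rank orderings are shown because either convention is common; which one applies depends on how the grid is created. In a real MPI program, the node-local rank is typically obtained with `MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, ...)` and the device is selected with `cudaSetDevice`. Those calls are omitted here so the sketch compiles without MPI or CUDA.

```c
#include <assert.h>

/* Hypothetical helpers, not cuSOLVERMp API: placing processes when each
 * process drives one GPU and occupies one cell of a P x Q grid. */
typedef struct {
    int prow;  /* process-grid row coordinate */
    int pcol;  /* process-grid column coordinate */
} grid_coord;

/* Column-major placement: ranks 0..P-1 fill grid column 0, the next P
 * ranks fill column 1, and so on. Assumes 0 <= rank < P * Q. */
grid_coord rank_to_grid_col_major(int rank, int nprow) {
    grid_coord c;
    c.prow = rank % nprow;
    c.pcol = rank / nprow;
    return c;
}

/* Row-major placement: ranks 0..Q-1 fill grid row 0, and so on. */
grid_coord rank_to_grid_row_major(int rank, int npcol) {
    grid_coord c;
    c.prow = rank / npcol;
    c.pcol = rank % npcol;
    return c;
}

/* One process per GPU: a process drives the device whose index matches
 * its rank among the processes on the same node. */
int device_for_local_rank(int local_rank, int gpus_per_node) {
    return local_rank % gpus_per_node;
}

/* Worked example: 6 processes on a 2 x 3 grid. */
void check_grid_example(void) {
    grid_coord c = rank_to_grid_col_major(4, 2);  /* nprow = 2 */
    assert(c.prow == 0 && c.pcol == 2);
    grid_coord r = rank_to_grid_row_major(4, 3);  /* npcol = 3 */
    assert(r.prow == 1 && r.pcol == 1);
    assert(device_for_local_rank(2, 4) == 2);     /* node with 4 GPUs */
}
```

The example shows why the ordering convention matters. The same rank (4) lands in a different grid cell under each convention, and that cell determines which blocks of the distributed matrix the process owns.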