Explaining cuSparse behavior on A100
August 27, 2024, 4:40pm
Hi Everyone,
I ran sparse matrix-vector multiplication (SpMV) on an A100 40GB for varying matrix sizes and sparsity levels, using the COO format. Below is a plot of the results:
I am dealing with structured sparsity involving diagonals, i.e., non-zeros are present only on diagonals (the main diagonal plus off-diagonals).
What I find strange is the roughly 10x performance improvement I get for the matrix of size 65536 at sparsity levels >= 0.5; for sparsity < 0.5, I don't see that benefit. Unfortunately, I cannot run matrices larger than 65536 at sparsity 0.5 or lower, as they do not fit on my single A100 GPU.
Is there an explanation for this behavior?
This is a snippet of my code:
clock_gettime(CLOCK_MONOTONIC, &startNew);
cudaEventRecord(start);
cusparseSpMV(cusparseHandle, CUSPARSE_OPERATION_NON_TRANSPOSE,
             &alpha, matDescr, vecX, &beta, vecY,
             CUDA_R_32F, CUSPARSE_MV_ALG_DEFAULT, d_buffer);
cudaEventRecord(stop);
cudaEventSynchronize(stop);  // block the host until the SpMV has finished,
                             // so the wall-clock stop time below is valid
clock_gettime(CLOCK_MONOTONIC, &stopNew);
qanhpham August 27, 2024, 9:57pm
Hi @atyagi2. I assume "sparsity" here means the percentage of zeros in the matrix. If so, it's expected that cuSparse performs well on matrices with high sparsity (very few non-zeros) and poorly on matrices with many non-zeros (close to dense).
Would you mind sharing your use case for SpMV? Why is your matrix diagonal? We'll consider supporting a diagonal format in cuSparse.