Explaining cuSparse behavior on A100
August 27, 2024, 4:40pm
Hi Everyone,
I ran sparse matrix-vector multiplication (SpMV) on an A100 40GB for varying matrix sizes and sparsity levels, using the COO format. Below is a plot of the results:
I am dealing with structured sparsity involving diagonals, i.e., non-zeros are present only on diagonals (the main diagonal plus off-diagonals).
What I find strange is the roughly 10x performance improvement I get for the matrix of size 65536 at sparsity levels >= 0.5; for sparsity < 0.5, I don't see that benefit. Unfortunately, I cannot run matrices larger than 65536 at sparsity 0.5 or lower, as they do not fit on my single A100 GPU.
Is there an explanation for this behavior?
This is a snippet of my code:
clock_gettime(CLOCK_MONOTONIC, &startNew);
cudaEventRecord(start);
cusparseSpMV(cusparseHandle, CUSPARSE_OPERATION_NON_TRANSPOSE,
             &alpha, matDescr, vecX, &beta, vecY,
             CUDA_R_32F, CUSPARSE_MV_ALG_DEFAULT, d_buffer);
cudaEventRecord(stop);
cudaEventSynchronize(stop);  // block the host until the SpMV has finished,
                             // so the wall-clock stop time below is valid
clock_gettime(CLOCK_MONOTONIC, &stopNew);
qanhpham August 27, 2024, 9:57pm
Hi @atyagi2. I assume "sparsity" here means the percentage of zeros in the matrix. If so, it's expected that cuSparse performs well on matrices with high sparsity (very few non-zeros) and poorly on matrices with many non-zeros (close to dense).
Would you mind sharing your use case for SpMV? Why is your matrix diagonal? We'll consider supporting a diagonal format in cuSparse.