How to Create a Sparse Matrix with SciPy (original) (raw)

A sparse matrix is a matrix in which most elements are zeros. Sparse matrices are widely used in machine learning, natural language processing (NLP), and large-scale data processing, where storing all zero values is inefficient.

**Example of a sparse matrix:

0 0 3 0 4
0 0 5 7 0
0 0 0 0 0
0 2 6 0 0

Storing such a matrix as a normal 2D array wastes memory, as most elements are zeros. Instead, we store only non-zero elements along with their row and column indices (triplets format).

**Benefits of using sparse matrices:

Sparse Matrix Formats in SciPy

The scipy.sparse module provides several formats for storing sparse matrices, each optimized for different operations:

Format Best For Description
csr_matrix Fast row slicing, math operations Compressed Sparse Row good for arithmetic and row access.
csc_matrix Fast column slicing Compressed Sparse Column efficient for column-based ops.
coo_matrix Easy matrix building Coordinate format using (row, col, value) triples.
lil_matrix Incremental row-wise construction List of Lists, modify rows easily before converting.
dia_matrix Diagonal-dominant matrices Stores only diagonals, saves space.
dok_matrix Fast item assignment Dictionary-like, ideal for random updates.

Example 1:csr_matrix (Compressed Sparse Row)

CSR format stores non-zero values row-wise, enabling fast row slicing and efficient matrix operations.

Python `

import numpy as np from scipy.sparse import csr_matrix

d = np.array([3, 4, 5, 7, 2, 6]) # data r = np.array([0, 0, 1, 1, 3, 3]) # rows c = np.array([2, 4, 2, 3, 1, 2]) # cols

csr = csr_matrix((d, (r, c)), shape=(4, 5)) print(csr.toarray())

`

**Output

[[0 0 3 0 4]
[0 0 5 7 0]
[0 0 0 0 0]
[0 2 6 0 0]]

**Explanation: csr_matrix stores only non-zero values with their coordinates and reconstructs full matrix using toarray().

Example 2: csc_matrix (Compressed Sparse Column)

CSC format stores data column-wise, making column-based operations faster.

Python `

import numpy as np from scipy.sparse import csc_matrix

d = np.array([3, 4, 5, 7, 2, 6])
r = np.array([0, 0, 1, 1, 3, 3])
c = np.array([2, 4, 2, 3, 1, 2])

csc = csc_matrix((d, (r, c)), shape=(4, 5)) print(csc.toarray())

`

**Output

[[0 0 3 0 4]
[0 0 5 7 0]
[0 0 0 0 0]
[0 2 6 0 0]]

**Explanation: Stores non-zero values in column-compressed format, efficient for column operations.

Example 3: coo_matrix (Coordinate Format)

COO format represents the matrix using (row, col, value) triplets. Useful when constructing matrices dynamically before converting to CSR/CSC.

Python `

import numpy as np from scipy.sparse import coo_matrix

d = np.array([3, 4, 5, 7, 2, 6]) r = np.array([0, 0, 1, 1, 3, 3]) c = np.array([2, 4, 2, 3, 1, 2])

coo = coo_matrix((d, (r, c)), shape=(4, 5)) print(coo.toarray())

`

**Output

[[0 0 3 0 4]
[0 0 5 7 0]
[0 0 0 0 0]
[0 2 6 0 0]]

**Explanation: Stores elements as (row, col, value) tuples.

Example 4:lil_matrix (List of Lists)

LIL (List of Lists) format allows efficient row-wise construction. You can easily insert or modify values before converting the matrix to CSR or CSC for faster computation.

Python `

import numpy as np from scipy.sparse import lil_matrix

lil = lil_matrix((4, 5)) lil[0, 2] = 3 lil[0, 4] = 4 lil[1, 2] = 5 lil[1, 3] = 7 lil[3, 1] = 2 lil[3, 2] = 6

print(lil.toarray())

`

**Output

[[0. 0. 3. 0. 4.]
[0. 0. 5. 7. 0.]
[0. 0. 0. 0. 0.]
[0. 2. 6. 0. 0.]]

**Explanation: Creates a List of Lists (LIL) matrix and assigns values directly by row and column.

Example 5:dok_matrix (Dictionary of Keys)

DOK (Dictionary of Keys) format is ideal for random assignments. You can assign elements at any position efficiently, making it perfect for incremental matrix construction.

Python `

import numpy as np from scipy.sparse import dok_matrix

dok = dok_matrix((4, 5)) dok[0, 2] = 3 dok[0, 4] = 4 dok[1, 2] = 5 dok[1, 3] = 7 dok[3, 1] = 2 dok[3, 2] = 6

print(dok.toarray())

`

**Output

[[0. 0. 3. 0. 4.]
[0. 0. 5. 7. 0.]
[0. 0. 0. 0. 0.]
[0. 2. 6. 0. 0.]]

**Explanation: Internally stored as dictionary {(row, col): value}.

Example 6: dia_matrix (Diagonal Matrix)

DIA (Diagonal) format stores only the diagonals of the matrix. It is very memory-efficient for diagonal-dominant matrices, where most non-zero elements lie along certain diagonals.

Python `

import numpy as np from scipy.sparse import dia_matrix

data = np.array([[3, 5, 6, 7]])
offsets = np.array([0])

dia = dia_matrix((data, offsets), shape=(4, 5)) print(dia.toarray())

`