csr_array - SciPy v1.16.2 Manual
class scipy.sparse.csr_array(arg1, shape=None, dtype=None, copy=False, *, maxprint=None)
Compressed Sparse Row array.
This can be instantiated in several ways:
csr_array(D)
where D is a 2-D ndarray
csr_array(S)
with another sparse array or matrix S (equivalent to S.tocsr())
csr_array((M, N), [dtype])
to construct an empty array with shape (M, N). dtype is optional, defaulting to dtype='d'.
csr_array((data, (row_ind, col_ind)), [shape=(M, N)])
where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].
csr_array((data, indices, indptr), [shape=(M, N)])
is the standard CSR representation where the column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]. If the shape parameter is not supplied, the array dimensions are inferred from the index arrays.
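The relationship between data, indices and indptr can be checked by hand. The following is a minimal sketch that reuses small example arrays; row_entries is a hypothetical helper written for illustration and is not part of SciPy.

```python
import numpy as np
from scipy.sparse import csr_array

# Standard CSR triple for the 3x3 array [[1, 0, 2], [0, 0, 3], [4, 5, 6]].
data = np.array([1, 2, 3, 4, 5, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
indptr = np.array([0, 2, 3, 6])

A = csr_array((data, indices, indptr), shape=(3, 3))


def row_entries(a, i):
    """Hypothetical helper: return (column indices, values) stored for row i."""
    start, stop = a.indptr[i], a.indptr[i + 1]
    return a.indices[start:stop], a.data[start:stop]


for i in range(A.shape[0]):
    cols, vals = row_entries(A, i)
    print(i, cols.tolist(), vals.tolist())
```

Row i owns the half-open slice indptr[i]:indptr[i+1], so indptr always has one more entry than the array has rows, and its last entry equals the number of stored values.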
Attributes:
dtype : dtype
Data type of the array
shape : 2-tuple
Shape of the array
ndim : int
Number of dimensions (this is always 2)
nnz
Number of stored values, including explicit zeros.
size
Number of stored values.
data
CSR format data array of the array
indices
CSR format index array of the array
indptr
CSR format index pointer array of the array
has_sorted_indices
Whether the indices are sorted
has_canonical_format
Whether the array/matrix has sorted indices and no duplicates
T
Transpose.
Notes
Sparse arrays can be used in arithmetic operations: they support addition, subtraction, multiplication, division, and matrix power.
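A short sketch of these operations on two small arrays follows. The variable names are illustrative only. Note that for the sparse array classes the @ operator is the matrix product, while * between two arrays is element-wise, so the sketch uses @ for the product.

```python
import numpy as np
from scipy.sparse import csr_array

A = csr_array(np.array([[1, 0, 2], [0, 0, 3], [4, 5, 6]]))
B = csr_array(np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]]))

total = A + B      # sparse + sparse
diff = A - B       # sparse - sparse
scaled = A * 2     # multiplication by a scalar
product = A @ B    # matrix product
halved = A / 2     # division by a scalar (result is floating point)

print(total.toarray())
print(product.toarray())
```

Each result is itself a sparse array, so toarray() is only needed for display.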
Advantages of the CSR format
- efficient arithmetic operations (CSR + CSR, CSR * CSR, etc.)
- efficient row slicing
- fast matrix-vector products
Disadvantages of the CSR format
- slow column slicing operations (consider CSC)
- changes to the sparsity structure are expensive (consider LIL or DOK)
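One way to work around both disadvantages is to pick the format per task and convert. The sketch below assumes the array is small enough that conversion cost does not matter; it fills a LIL array element by element, converts it to CSR once the structure is final, and converts to CSC before slicing columns.

```python
import numpy as np
from scipy.sparse import lil_array

# Build incrementally in LIL, where changing the sparsity structure is cheap.
L = lil_array((3, 3), dtype=np.int64)
entries = [(0, 0, 1), (0, 2, 2), (1, 2, 3), (2, 0, 4), (2, 1, 5), (2, 2, 6)]
for i, j, value in entries:
    L[i, j] = value

A = L.tocsr()              # convert once the structure is final

row = A[1:2, :]            # row slicing is efficient in CSR
col = A.tocsc()[:, 2:3]    # convert to CSC before slicing columns

print(row.toarray())
print(col.toarray())
```

If many column slices are needed, converting to CSC once and reusing the result avoids repeating the conversion.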
Canonical Format
- Within each row, indices are sorted by column.
- There are no duplicate entries.
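As an illustration, the sketch below builds an array from a (data, indices, indptr) triple that violates both conditions, then brings it into canonical format with sum_duplicates(), which works in place. The input arrays are made up for this example.

```python
import numpy as np
from scipy.sparse import csr_array

# Row 0 lists column 2 before column 0 and stores column 0 twice.
data = np.array([2, 1, 8, 3])
indices = np.array([2, 0, 0, 1])
indptr = np.array([0, 3, 4])
A = csr_array((data, indices, indptr), shape=(2, 3))

before = A.has_canonical_format
A.sum_duplicates()         # in place: sorts indices and sums duplicates
after = A.has_canonical_format

print(before, after)
print(A.indices.tolist(), A.data.tolist())
print(A.toarray())
```

After the call, row 0 holds one entry for column 0 with value 9 (1 + 8) followed by the entry for column 2. When only the ordering matters, sort_indices() sorts the indices in place without merging duplicates.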
Examples
>>> import numpy as np
>>> from scipy.sparse import csr_array
>>> csr_array((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_array((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_array((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])
Duplicate entries are summed together:
>>> row = np.array([0, 1, 2, 0])
>>> col = np.array([0, 1, 1, 0])
>>> data = np.array([1, 2, 4, 8])
>>> csr_array((data, (row, col)), shape=(3, 3)).toarray()
array([[9, 0, 0],
       [0, 2, 0],
       [0, 4, 0]])
As an example of how to construct a CSR array incrementally, the following snippet builds a term-document array from texts:
>>> docs = [["hello", "world", "hello"], ["goodbye", "cruel", "world"]]
>>> indptr = [0]
>>> indices = []
>>> data = []
>>> vocabulary = {}
>>> for d in docs:
...     for term in d:
...         index = vocabulary.setdefault(term, len(vocabulary))
...         indices.append(index)
...         data.append(1)
...     indptr.append(len(indices))
...
>>> csr_array((data, indices, indptr), dtype=int).toarray()
array([[2, 1, 0, 0],
       [0, 1, 1, 1]])