incr_mean_variance_axis

sklearn.utils.sparsefuncs.incr_mean_variance_axis(X, *, axis, last_mean, last_var, last_n, weights=None)

Compute incremental mean and variance along an axis on a CSR or CSC matrix.

last_mean and last_var are the statistics computed at the last step by this function. Both must be initialized to 0-arrays of the proper size, i.e. the number of features in X if axis=0 or the number of samples in X if axis=1. last_n is the number of samples encountered until now.
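The chaining described above can be sketched as follows. This is a minimal illustration, not part of the original page: the data, the batch size of 2, and the variable names are assumptions chosen for the example. Each call receives the statistics returned by the previous call, and the final result is compared against the mean and population variance of the dense array.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.sparsefuncs import incr_mean_variance_axis

# Illustrative data (an assumption for this sketch): 6 samples, 3 features.
dense = np.array(
    [
        [8.0, 1.0, 2.0],
        [0.0, 0.0, 5.0],
        [0.0, 3.0, 0.0],
        [4.0, 0.0, 0.0],
        [0.0, 0.0, 1.0],
        [2.0, 6.0, 0.0],
    ]
)
X = sparse.csr_matrix(dense)

# Start from zero statistics, one entry per feature (axis=0).
n_features = X.shape[1]
mean = np.zeros(n_features)
var = np.zeros(n_features)
n = np.zeros(n_features)

# Feed the rows in batches, passing the previous outputs back in.
batch_size = 2
for start in range(0, X.shape[0], batch_size):
    batch = X[start:start + batch_size]
    mean, var, n = incr_mean_variance_axis(
        batch, axis=0, last_mean=mean, last_var=var, last_n=n
    )

# The accumulated statistics should agree with a single pass over all rows.
matches_dense = np.allclose(mean, dense.mean(axis=0)) and np.allclose(
    var, dense.var(axis=0)
)
```

The outputs are reassigned on every iteration rather than reused, so the loop does not rely on whether the function modifies its input arrays.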

Parameters:

X : CSR or CSC sparse matrix of shape (n_samples, n_features)

Input data.

axis : {0, 1}

Axis along which the mean and variance should be computed.

last_mean : ndarray of shape (n_features,) or (n_samples,), dtype=floating

Array of means to update with the new data X. Should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1.

last_var : ndarray of shape (n_features,) or (n_samples,), dtype=floating

Array of variances to update with the new data X. Should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1.

last_n : float or ndarray of shape (n_features,) or (n_samples,), dtype=floating

Sum of the weights seen so far, excluding the current weights. If not a float, it should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1. If a float, it corresponds to having the same weights for all samples (or features).

weights : ndarray of shape (n_samples,) or (n_features,), default=None

Should be of shape (n_samples,) if axis=0 or (n_features,) if axis=1. If None, samples are equally weighted.

Added in version 0.24.
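As a hedged illustration of the weights parameter, the sketch below passes per-sample weights with axis=0 and compares the result with a weighted mean and weighted population variance computed by NumPy. The data and weight values are assumptions made up for this example.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.sparsefuncs import incr_mean_variance_axis

# Illustrative data (an assumption for this sketch): 3 samples, 2 features.
dense = np.array([[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]])
X = sparse.csr_matrix(dense)

# One weight per sample because axis=0.
weights = np.array([1.0, 2.0, 1.0])

mean, var, n = incr_mean_variance_axis(
    X,
    axis=0,
    last_mean=np.zeros(2),
    last_var=np.zeros(2),
    last_n=0.0,  # nothing seen before this call
    weights=weights,
)

# Reference values computed on the dense array with NumPy.
expected_mean = np.average(dense, axis=0, weights=weights)
expected_var = np.average((dense - expected_mean) ** 2, axis=0, weights=weights)
```

With weights given, the returned n holds the sum of the weights per feature rather than a sample count.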

Returns:

means : ndarray of shape (n_features,) or (n_samples,), dtype=floating

Updated feature-wise means if axis = 0 or sample-wise means if axis = 1.

variances : ndarray of shape (n_features,) or (n_samples,), dtype=floating

Updated feature-wise variances if axis = 0 or sample-wise variances if axis = 1.

n : ndarray of shape (n_features,) or (n_samples,), dtype=integral

Updated number of seen samples per feature if axis=0 or number of seen features per sample if axis=1.

If weights is not None, n is a sum of the weights of the seen samples or features instead of the actual number of seen samples or features.
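The shapes of the returned arrays for axis=1 can be checked with a small sketch. The 2 x 3 matrix below is an assumption chosen for illustration: the statistics are computed per sample, so the initial arrays and the outputs have one entry per row of X.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.sparsefuncs import incr_mean_variance_axis

# Illustrative data (an assumption for this sketch): 2 samples, 3 features.
dense = np.array([[8.0, 1.0, 2.0], [0.0, 0.0, 5.0]])
X = sparse.csr_matrix(dense)

# With axis=1 the statistics are per sample, so the arrays have n_samples entries.
n_samples = X.shape[0]
means, variances, n = incr_mean_variance_axis(
    X,
    axis=1,
    last_mean=np.zeros(n_samples),
    last_var=np.zeros(n_samples),
    last_n=np.zeros(n_samples),
)

# Reference values: row-wise mean and population variance of the dense array.
row_means = dense.mean(axis=1)
row_vars = dense.var(axis=1)
```

Here n counts the features seen for each sample, which is 3 for both rows.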

Notes

NaNs are ignored in the algorithm.
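A small sketch of the NaN handling, using made-up data: explicitly stored NaN entries are skipped, so the results are expected to match np.nanmean and np.nanvar on the dense array, and n counts only the non-NaN entries of each feature.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.sparsefuncs import incr_mean_variance_axis

# Illustrative data (an assumption for this sketch): one NaN in each feature.
dense = np.array(
    [
        [1.0, np.nan],
        [3.0, 2.0],
        [np.nan, 4.0],
        [5.0, 0.0],
    ]
)
# NaN is stored as an explicit entry when converting from a dense array.
X = sparse.csr_matrix(dense)

mean, var, n = incr_mean_variance_axis(
    X,
    axis=0,
    last_mean=np.zeros(2),
    last_var=np.zeros(2),
    last_n=np.zeros(2),
)

# Reference values that skip NaNs on the dense array.
expected_mean = np.nanmean(dense, axis=0)
expected_var = np.nanvar(dense, axis=0)
expected_n = (~np.isnan(dense)).sum(axis=0)
```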

Examples

>>> from sklearn.utils import sparsefuncs
>>> from scipy import sparse
>>> import numpy as np
>>> indptr = np.array([0, 3, 4, 4, 4])
>>> indices = np.array([0, 1, 2, 2])
>>> data = np.array([8, 1, 2, 5])
>>> scale = np.array([2, 3, 2])
>>> csr = sparse.csr_matrix((data, indices, indptr))
>>> csr.todense()
matrix([[8, 1, 2],
        [0, 0, 5],
        [0, 0, 0],
        [0, 0, 0]])
>>> sparsefuncs.incr_mean_variance_axis(
...     csr, axis=0, last_mean=np.zeros(3), last_var=np.zeros(3), last_n=2
... )
(array([1.33, 0.167, 1.17]), array([8.88, 0.139, 3.47]), array([6., 6., 6.]))