incr_mean_variance_axis (original) (raw)
sklearn.utils.sparsefuncs.incr_mean_variance_axis(X, *, axis, last_mean, last_var, last_n, weights=None)[source]#
Compute incremental mean and variance along an axis on a CSR or CSC matrix.
last_mean, last_var are the statistics computed at the last step by this function. Both must be initialized to 0-arrays of the proper size, i.e. the number of features in X. last_n is the number of samples encountered until now.
Parameters:
XCSR or CSC sparse matrix of shape (n_samples, n_features)
Input data.
axis{0, 1}
Axis along which the axis should be computed.
last_meanndarray of shape (n_features,) or (n_samples,), dtype=floating
Array of means to update with the new data X. Should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1.
last_varndarray of shape (n_features,) or (n_samples,), dtype=floating
Array of variances to update with the new data X. Should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1.
last_nfloat or ndarray of shape (n_features,) or (n_samples,), dtype=floating
Sum of the weights seen so far, excluding the current weights If not float, it should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1. If float it corresponds to having same weights for all samples (or features).
weightsndarray of shape (n_samples,) or (n_features,), default=None
If axis is set to 0 shape is (n_samples,) or if axis is set to 1 shape is (n_features,). If it is set to None, then samples are equally weighted.
Added in version 0.24.
Returns:
meansndarray of shape (n_features,) or (n_samples,), dtype=floating
Updated feature-wise means if axis = 0 or sample-wise means if axis = 1.
variancesndarray of shape (n_features,) or (n_samples,), dtype=floating
Updated feature-wise variances if axis = 0 or sample-wise variances if axis = 1.
nndarray of shape (n_features,) or (n_samples,), dtype=integral
Updated number of seen samples per feature if axis=0 or number of seen features per sample if axis=1.
If weights is not None, n is a sum of the weights of the seen samples or features instead of the actual number of seen samples or features.
Notes
NaNs are ignored in the algorithm.
Examples
from sklearn.utils import sparsefuncs from scipy import sparse import numpy as np indptr = np.array([0, 3, 4, 4, 4]) indices = np.array([0, 1, 2, 2]) data = np.array([8, 1, 2, 5]) scale = np.array([2, 3, 2]) csr = sparse.csr_matrix((data, indices, indptr)) csr.todense() matrix([[8, 1, 2], [0, 0, 5], [0, 0, 0], [0, 0, 0]]) sparsefuncs.incr_mean_variance_axis( ... csr, axis=0, last_mean=np.zeros(3), last_var=np.zeros(3), last_n=2 ... ) (array([1.33, 0.167, 1.17]), array([8.88, 0.139, 3.47]), array([6., 6., 6.]))