dict_learning
sklearn.decomposition.dict_learning(X, n_components, *, alpha, max_iter=100, tol=1e-08, method='lars', n_jobs=None, dict_init=None, code_init=None, callback=None, verbose=False, random_state=None, return_n_iter=False, positive_dict=False, positive_code=False, method_max_iter=1000)
Solve a dictionary learning matrix factorization problem.
Finds the best dictionary and the corresponding sparse code for approximating the data matrix X by solving:
(U^*, V^*) = argmin_{(U, V)} 0.5 * || X - U V ||_Fro^2 + alpha * || U ||_1,1
             with || V_k ||_2 = 1 for all 0 <= k < n_components
where V is the dictionary and U is the sparse code. ||.||_Fro stands for the Frobenius norm and ||.||_1,1 stands for the entry-wise matrix norm which is the sum of the absolute values of all the entries in the matrix.
Read more in the User Guide.
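To make the objective concrete, it can be evaluated directly with NumPy for any candidate pair (U, V). This is a minimal sketch, not part of the scikit-learn API; the matrices below are random stand-ins rather than outputs of this function:

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((30, 20))  # data, shape (n_samples, n_features)
>>> U = rng.standard_normal((30, 15))  # code, shape (n_samples, n_components)
>>> V = rng.standard_normal((15, 20))  # dictionary, shape (n_components, n_features)
>>> V /= np.linalg.norm(V, axis=1, keepdims=True)  # enforce ||V_k||_2 = 1
>>> alpha = 0.1
>>> # 0.5 * ||X - U V||_Fro^2 + alpha * ||U||_1,1
>>> objective = 0.5 * np.linalg.norm(X - U @ V, "fro") ** 2 + alpha * np.abs(U).sum()

dict_learning searches for the pair (U, V) that minimizes this quantity subject to the unit-norm constraint on the dictionary rows.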
Parameters:
X : array-like of shape (n_samples, n_features)
Data matrix.
n_components : int
Number of dictionary atoms to extract.
alpha : int or float
Sparsity controlling parameter.
max_iter : int, default=100
Maximum number of iterations to perform.
tol : float, default=1e-8
Tolerance for the stopping condition.
method : {'lars', 'cd'}, default='lars'
The method used (see the usage sketch after this parameter list):
'lars': uses the least angle regression method to solve the lasso problem (linear_model.lars_path);
'cd': uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
n_jobs : int, default=None
Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
dict_init : ndarray of shape (n_components, n_features), default=None
Initial value for the dictionary for warm restart scenarios. Only used if code_init and dict_init are not None.
code_init : ndarray of shape (n_samples, n_components), default=None
Initial value for the sparse code for warm restart scenarios. Only used if code_init and dict_init are not None.
callback : callable, default=None
Callable that gets invoked every five iterations.
verbose : bool, default=False
Controls the verbosity of the procedure.
random_state : int, RandomState instance or None, default=None
Used for randomly initializing the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.
return_n_iter : bool, default=False
Whether or not to return the number of iterations.
positive_dict : bool, default=False
Whether to enforce positivity when finding the dictionary.
Added in version 0.20.
positive_code : bool, default=False
Whether to enforce positivity when finding the code.
Added in version 0.20.
method_max_iter : int, default=1000
Maximum number of iterations to perform when solving the lasso problem with the chosen method.
Added in version 0.22.
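As a usage sketch of the solver-related options above (illustrative only, not reference output), the coordinate-descent method can be combined with a positivity constraint on the code and a cap on the inner solver's iterations:

>>> from sklearn.datasets import make_sparse_coded_signal
>>> from sklearn.decomposition import dict_learning
>>> X, _, _ = make_sparse_coded_signal(
...     n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
...     random_state=0,
... )
>>> code, dictionary, errors = dict_learning(
...     X, n_components=15, alpha=0.1, method="cd",
...     positive_code=True, method_max_iter=500, random_state=0,
... )
>>> bool((code >= 0).all())  # positive_code=True forces a nonnegative code
True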
Returns:
code : ndarray of shape (n_samples, n_components)
The sparse code factor in the matrix factorization.
dictionary : ndarray of shape (n_components, n_features)
The dictionary factor in the matrix factorization.
errors : array
Vector of errors at each iteration.
n_iter : int
Number of iterations run. Returned only if return_n_iter is set to True.
Examples
>>> import numpy as np
>>> from sklearn.datasets import make_sparse_coded_signal
>>> from sklearn.decomposition import dict_learning
>>> X, _, _ = make_sparse_coded_signal(
...     n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
...     random_state=42,
... )
>>> U, V, errors = dict_learning(X, n_components=15, alpha=0.1, random_state=42)
We can check the level of sparsity of U:

>>> np.mean(U == 0)
np.float64(0.6...)
We can compare the average squared Euclidean norm of the reconstruction error of the sparse coded signal relative to the squared Euclidean norm of the original signal:

>>> X_hat = U @ V
>>> np.mean(np.sum((X_hat - X) ** 2, axis=1) / np.sum(X ** 2, axis=1))
np.float64(0.01...)
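When the iteration count matters, for example to check whether the run stopped on the tolerance or on the iteration cap, return_n_iter=True appends it to the returned tuple (a sketch; the exact count varies with the data):

>>> U, V, errors, n_iter = dict_learning(
...     X, n_components=15, alpha=0.1, random_state=42, return_n_iter=True,
... )
>>> bool(n_iter <= 100)  # cannot exceed the default max_iter
True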