SparseCoder (original) (raw)

class sklearn.decomposition.SparseCoder(dictionary, *, transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, split_sign=False, n_jobs=None, positive_code=False, transform_max_iter=1000)[source]#

Sparse coding.

Finds a sparse representation of data against a fixed, precomputed dictionary.

Each row of the result is the solution to a sparse coding problem. The goal is to find a sparse array code such that:

Read more in the User Guide.

Parameters:

dictionaryndarray of shape (n_components, n_features)

The dictionary atoms used for sparse coding. Lines are assumed to be normalized to unit norm.

transform_algorithm{‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}, default=’omp’

Algorithm used to transform the data:

transform_n_nonzero_coefsint, default=None

Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm='lars' and algorithm='omp'and is overridden by alpha in the omp case. If None, thentransform_n_nonzero_coefs=int(n_features / 10).

transform_alphafloat, default=None

If algorithm='lasso_lars' or algorithm='lasso_cd', alpha is the penalty applied to the L1 norm. If algorithm='threshold', alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm='omp', alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overridesn_nonzero_coefs. If None, default to 1.

split_signbool, default=False

Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.

n_jobsint, default=None

Number of parallel jobs to run.None means 1 unless in a joblib.parallel_backend context.-1 means using all processors. See Glossaryfor more details.

positive_codebool, default=False

Whether to enforce positivity when finding the code.

Added in version 0.20.

transform_max_iterint, default=1000

Maximum number of iterations to perform if algorithm='lasso_cd' orlasso_lars.

Added in version 0.22.

Attributes:

n_components_int

Number of atoms.

n_features_in_int

Number of features seen during fit.

**feature_names_in_**ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when Xhas feature names that are all strings.

Added in version 1.0.

Examples

import numpy as np from sklearn.decomposition import SparseCoder X = np.array([[-1, -1, -1], [0, 0, 3]]) dictionary = np.array( ... [[0, 1, 0], ... [-1, -1, 2], ... [1, 1, 1], ... [0, 1, 1], ... [0, 2, 1]], ... dtype=np.float64 ... ) coder = SparseCoder( ... dictionary=dictionary, transform_algorithm='lasso_lars', ... transform_alpha=1e-10, ... ) coder.transform(X) array([[ 0., 0., -1., 0., 0.], [ 0., 1., 1., 0., 0.]])

fit(X, y=None)[source]#

Do nothing and return the estimator unchanged.

This method is just there to implement the usual API and hence work in pipelines.

Parameters:

XIgnored

Not used, present for API consistency by convention.

yIgnored

Not used, present for API consistency by convention.

Returns:

selfobject

Returns the instance itself.

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_paramsand returns a transformed version of X.

Parameters:

Xarray-like of shape (n_samples, n_features)

Input samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_paramsdict

Additional fit parameters.

Returns:

X_newndarray array of shape (n_samples, n_features_new)

Transformed array.

get_feature_names_out(input_features=None)[source]#

Get output feature names for transformation.

The feature names out will prefixed by the lowercased class name. For example, if the transformer outputs 3 features, then the feature names out are: ["class_name0", "class_name1", "class_name2"].

Parameters:

input_featuresarray-like of str or None, default=None

Only used to validate feature names with the names seen in fit.

Returns:

feature_names_outndarray of str objects

Transformed feature names.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict

Parameter names mapped to their values.

property n_components_#

Number of atoms.

property n_features_in_#

Number of features seen during fit.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output APIfor an example on how to use the API.

Parameters:

transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

Added in version 1.4: "polars" option was added.

Returns:

selfestimator instance

Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict

Estimator parameters.

Returns:

selfestimator instance

Estimator instance.

transform(X, y=None)[source]#

Encode the data as a sparse combination of the dictionary atoms.

Coding method is determined by the object parametertransform_algorithm.

Parameters:

Xndarray of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

yIgnored

Not used, present for API consistency by convention.

Returns:

X_newndarray of shape (n_samples, n_components)

Transformed data.