API Reference - scikit-learn 0.20.4 documentation

This is the class and function reference of scikit-learn. Please refer to the full user guide for further details, as the class and function raw specifications may not be enough to give full guidelines on their uses. For reference on concepts repeated across the API, see Glossary of Common Terms and API Elements.

sklearn.base: Base classes and utility functions

Base classes for all estimators.

Functions

base.clone(estimator[, safe]) Constructs a new estimator with the same parameters.
base.is_classifier(estimator) Returns True if the given estimator is (probably) a classifier.
base.is_regressor(estimator) Returns True if the given estimator is (probably) a regressor.
config_context(**new_config) Context manager for global scikit-learn configuration
get_config() Retrieve current values for configuration set by set_config
set_config([assume_finite, working_memory]) Set global scikit-learn configuration
show_versions() Print useful debugging information
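
The helpers listed above can be tried out with a short sketch. The choice of LogisticRegression and Ridge as sample estimators is an assumption made for illustration only; any estimator would serve.

```python
from sklearn import config_context, get_config
from sklearn.base import clone, is_classifier, is_regressor
from sklearn.linear_model import LogisticRegression, Ridge

# clone() builds a new, unfitted estimator with the same constructor parameters.
original = LogisticRegression(C=0.5)
copy = clone(original)
same_params = copy.get_params() == original.get_params()
is_distinct_object = copy is not original

# The type checks rely on what the estimator declares, hence "(probably)".
clf_flag = is_classifier(original)
reg_flag = is_regressor(Ridge())

# config_context() changes the global configuration only inside the with-block.
before = get_config()["assume_finite"]
with config_context(assume_finite=True):
    inside = get_config()["assume_finite"]
after = get_config()["assume_finite"]
```

After the with-block exits, the previous configuration value is restored.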

sklearn.cluster: Clustering

The sklearn.cluster module gathers popular unsupervised clustering algorithms.

User guide: See the Clustering section for further details.

Classes

cluster.AffinityPropagation([damping, …]) Perform Affinity Propagation Clustering of data.
cluster.AgglomerativeClustering([…]) Agglomerative Clustering
cluster.Birch([threshold, branching_factor, …]) Implements the Birch clustering algorithm.
cluster.DBSCAN([eps, min_samples, metric, …]) Perform DBSCAN clustering from vector array or distance matrix.
cluster.FeatureAgglomeration([n_clusters, …]) Agglomerate features.
cluster.KMeans([n_clusters, init, n_init, …]) K-Means clustering
cluster.MiniBatchKMeans([n_clusters, init, …]) Mini-Batch K-Means clustering
cluster.MeanShift([bandwidth, seeds, …]) Mean shift clustering using a flat kernel.
cluster.SpectralClustering([n_clusters, …]) Apply clustering to a projection to the normalized laplacian.

Functions

cluster.affinity_propagation(S[, …]) Perform Affinity Propagation Clustering of data
cluster.dbscan(X[, eps, min_samples, …]) Perform DBSCAN clustering from vector array or distance matrix.
cluster.estimate_bandwidth(X[, quantile, …]) Estimate the bandwidth to use with the mean-shift algorithm.
cluster.k_means(X, n_clusters[, …]) K-means clustering algorithm.
cluster.mean_shift(X[, bandwidth, seeds, …]) Perform mean shift clustering of data using a flat kernel.
cluster.spectral_clustering(affinity[, …]) Apply clustering to a projection to the normalized laplacian.
cluster.ward_tree(X[, connectivity, …]) Ward clustering based on a Feature matrix.
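
As a minimal sketch of how a clustering class from this module is typically used, the following fits KMeans on a tiny hand-made dataset. The data, n_init and random_state values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of 2-D points (illustrative data).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# n_init and random_state are set explicitly so the run is reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_              # cluster index assigned to each sample
centers = kmeans.cluster_centers_    # one row per cluster
```

The numeric value of each label is arbitrary; only the grouping of samples is meaningful.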

sklearn.cluster.bicluster: Biclustering

Spectral biclustering algorithms.

Authors: Kemal Eren
License: BSD 3 clause

User guide: See the Biclustering section for further details.

sklearn.covariance: Covariance Estimators

The sklearn.covariance module includes methods and algorithms to robustly estimate the covariance of features given a set of points. The precision matrix defined as the inverse of the covariance is also estimated. Covariance estimation is closely related to the theory of Gaussian Graphical Models.

User guide: See the Covariance estimation section for further details.

covariance.EmpiricalCovariance([…]) Maximum likelihood covariance estimator
covariance.EllipticEnvelope([…]) An object for detecting outliers in a Gaussian distributed dataset.
covariance.GraphicalLasso([alpha, mode, …]) Sparse inverse covariance estimation with an l1-penalized estimator.
covariance.GraphicalLassoCV([alphas, …]) Sparse inverse covariance w/ cross-validated choice of the l1 penalty.
covariance.LedoitWolf([store_precision, …]) LedoitWolf Estimator
covariance.MinCovDet([store_precision, …]) Minimum Covariance Determinant (MCD): robust estimator of covariance.
covariance.OAS([store_precision, …]) Oracle Approximating Shrinkage Estimator
covariance.ShrunkCovariance([…]) Covariance estimator with shrinkage
covariance.empirical_covariance(X[, …]) Computes the Maximum likelihood covariance estimator
covariance.graphical_lasso(emp_cov, alpha[, …]) l1-penalized covariance estimator
covariance.ledoit_wolf(X[, assume_centered, …]) Estimates the shrunk Ledoit-Wolf covariance matrix.
covariance.oas(X[, assume_centered]) Estimate covariance with the Oracle Approximating Shrinkage algorithm.
covariance.shrunk_covariance(emp_cov[, …]) Calculates a covariance matrix shrunk on the diagonal
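
The relationship between the covariance and the precision matrix mentioned above can be checked numerically. This is a small sketch on random data; the sample size and seed are arbitrary assumptions.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))

# With store_precision left at its default, the inverse covariance is kept too.
estimator = EmpiricalCovariance().fit(X)
covariance = estimator.covariance_
precision = estimator.precision_

# Multiplying the two should give a matrix close to the identity.
product = covariance @ precision
```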

sklearn.datasets: Datasets

The sklearn.datasets module includes utilities to load datasets, including methods to load and fetch popular reference datasets. It also features some artificial data generators.

User guide: See the Dataset loading utilities section for further details.

Loaders

datasets.clear_data_home([data_home]) Delete all the content of the data home cache.
datasets.dump_svmlight_file(X, y, f[, …]) Dump the dataset in svmlight / libsvm file format.
datasets.fetch_20newsgroups([data_home, …]) Load the filenames and data from the 20 newsgroups dataset (classification).
datasets.fetch_20newsgroups_vectorized([…]) Load the 20 newsgroups dataset and vectorize it into token counts (classification).
datasets.fetch_california_housing([…]) Load the California housing dataset (regression).
datasets.fetch_covtype([data_home, …]) Load the covertype dataset (classification).
datasets.fetch_kddcup99([subset, data_home, …]) Load the kddcup99 dataset (classification).
datasets.fetch_lfw_pairs([subset, …]) Load the Labeled Faces in the Wild (LFW) pairs dataset (classification).
datasets.fetch_lfw_people([data_home, …]) Load the Labeled Faces in the Wild (LFW) people dataset (classification).
datasets.fetch_olivetti_faces([data_home, …]) Load the Olivetti faces data-set from AT&T (classification).
datasets.fetch_openml([name, version, …]) Fetch dataset from openml by name or dataset id.
datasets.fetch_rcv1([data_home, subset, …]) Load the RCV1 multilabel dataset (classification).
datasets.fetch_species_distributions([…]) Loader for species distribution dataset from Phillips et al. (2006).
datasets.get_data_home([data_home]) Return the path of the scikit-learn data dir.
datasets.load_boston([return_X_y]) Load and return the boston house-prices dataset (regression).
datasets.load_breast_cancer([return_X_y]) Load and return the breast cancer wisconsin dataset (classification).
datasets.load_diabetes([return_X_y]) Load and return the diabetes dataset (regression).
datasets.load_digits([n_class, return_X_y]) Load and return the digits dataset (classification).
datasets.load_files(container_path[, …]) Load text files with categories as subfolder names.
datasets.load_iris([return_X_y]) Load and return the iris dataset (classification).
datasets.load_linnerud([return_X_y]) Load and return the linnerud dataset (multivariate regression).
datasets.load_sample_image(image_name) Load the numpy array of a single sample image
datasets.load_sample_images() Load sample images for image manipulation.
datasets.load_svmlight_file(f[, n_features, …]) Load datasets in the svmlight / libsvm format into sparse CSR matrix
datasets.load_svmlight_files(files[, …]) Load dataset from multiple files in SVMlight format
datasets.load_wine([return_X_y]) Load and return the wine dataset (classification).

Samples generator

datasets.make_biclusters(shape, n_clusters) Generate an array with constant block diagonal structure for biclustering.
datasets.make_blobs([n_samples, n_features, …]) Generate isotropic Gaussian blobs for clustering.
datasets.make_checkerboard(shape, n_clusters) Generate an array with block checkerboard structure for biclustering.
datasets.make_circles([n_samples, shuffle, …]) Make a large circle containing a smaller circle in 2d.
datasets.make_classification([n_samples, …]) Generate a random n-class classification problem.
datasets.make_friedman1([n_samples, …]) Generate the “Friedman #1” regression problem
datasets.make_friedman2([n_samples, noise, …]) Generate the “Friedman #2” regression problem
datasets.make_friedman3([n_samples, noise, …]) Generate the “Friedman #3” regression problem
datasets.make_gaussian_quantiles([mean, …]) Generate isotropic Gaussian and label samples by quantile
datasets.make_hastie_10_2([n_samples, …]) Generates data for binary classification used in Hastie et al. 2009, Example 10.2.
datasets.make_low_rank_matrix([n_samples, …]) Generate a mostly low rank matrix with bell-shaped singular values
datasets.make_moons([n_samples, shuffle, …]) Make two interleaving half circles
datasets.make_multilabel_classification([…]) Generate a random multilabel classification problem.
datasets.make_regression([n_samples, …]) Generate a random regression problem.
datasets.make_s_curve([n_samples, noise, …]) Generate an S curve dataset.
datasets.make_sparse_coded_signal(n_samples, …) Generate a signal as a sparse combination of dictionary elements.
datasets.make_sparse_spd_matrix([dim, …]) Generate a sparse symmetric definite positive matrix.
datasets.make_sparse_uncorrelated([…]) Generate a random regression problem with sparse uncorrelated design
datasets.make_spd_matrix(n_dim[, random_state]) Generate a random symmetric, positive-definite matrix.
datasets.make_swiss_roll([n_samples, noise, …]) Generate a swiss roll dataset.
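
A brief sketch of the two kinds of utilities described in this section: a bundled loader and a sample generator. The generator arguments are illustrative assumptions.

```python
from sklearn.datasets import load_iris, make_blobs

# Bundled dataset: return_X_y=True returns (data, target) instead of a Bunch.
X_iris, y_iris = load_iris(return_X_y=True)

# Synthetic dataset: three isotropic Gaussian blobs in two dimensions.
X_blobs, y_blobs = make_blobs(n_samples=90, n_features=2, centers=3,
                              random_state=0)
```

The fetch_* loaders differ from the load_* ones in that they may download data into the data home directory on first use.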

sklearn.decomposition: Matrix Decomposition

The sklearn.decomposition module includes matrix decomposition algorithms, including among others PCA, NMF or ICA. Most of the algorithms of this module can be regarded as dimensionality reduction techniques.

User guide: See the Decomposing signals in components (matrix factorization problems) section for further details.

decomposition.DictionaryLearning([…]) Dictionary learning
decomposition.FactorAnalysis([n_components, …]) Factor Analysis (FA)
decomposition.FastICA([n_components, …]) FastICA: a fast algorithm for Independent Component Analysis.
decomposition.IncrementalPCA([n_components, …]) Incremental principal components analysis (IPCA).
decomposition.KernelPCA([n_components, …]) Kernel Principal component analysis (KPCA)
decomposition.LatentDirichletAllocation([…]) Latent Dirichlet Allocation with online variational Bayes algorithm
decomposition.MiniBatchDictionaryLearning([…]) Mini-batch dictionary learning
decomposition.MiniBatchSparsePCA([…]) Mini-batch Sparse Principal Components Analysis
decomposition.NMF([n_components, init, …]) Non-Negative Matrix Factorization (NMF)
decomposition.PCA([n_components, copy, …]) Principal component analysis (PCA)
decomposition.SparsePCA([n_components, …]) Sparse Principal Components Analysis (SparsePCA)
decomposition.SparseCoder(dictionary[, …]) Sparse coding
decomposition.TruncatedSVD([n_components, …]) Dimensionality reduction using truncated SVD (aka LSA).
decomposition.dict_learning(X, n_components, …) Solves a dictionary learning matrix factorization problem.
decomposition.dict_learning_online(X[, …]) Solves a dictionary learning matrix factorization problem online.
decomposition.fastica(X[, n_components, …]) Perform Fast Independent Component Analysis.
decomposition.sparse_encode(X, dictionary[, …]) Sparse coding
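
To illustrate the dimensionality-reduction use mentioned above, here is a minimal sketch with PCA. The synthetic data (two underlying directions plus small noise) is an assumption chosen so that two components suffice.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# 100 samples with 5 features generated from 2 latent directions plus noise.
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 5))

pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)                      # shape (100, 2)
explained = pca.explained_variance_ratio_.sum()   # fraction of variance kept
```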

sklearn.feature_selection: Feature Selection

The sklearn.feature_selection module implements feature selection algorithms. It currently includes univariate filter selection methods and the recursive feature elimination algorithm.

User guide: See the Feature selection section for further details.

feature_selection.GenericUnivariateSelect([…]) Univariate feature selector with configurable strategy.
feature_selection.SelectPercentile([…]) Select features according to a percentile of the highest scores.
feature_selection.SelectKBest([score_func, k]) Select features according to the k highest scores.
feature_selection.SelectFpr([score_func, alpha]) Filter: Select the p-values below alpha based on a FPR test.
feature_selection.SelectFdr([score_func, alpha]) Filter: Select the p-values for an estimated false discovery rate
feature_selection.SelectFromModel(estimator) Meta-transformer for selecting features based on importance weights.
feature_selection.SelectFwe([score_func, alpha]) Filter: Select the p-values corresponding to Family-wise error rate
feature_selection.RFE(estimator[, …]) Feature ranking with recursive feature elimination.
feature_selection.RFECV(estimator[, step, …]) Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.
feature_selection.VarianceThreshold([threshold]) Feature selector that removes all low-variance features.
feature_selection.chi2(X, y) Compute chi-squared stats between each non-negative feature and class.
feature_selection.f_classif(X, y) Compute the ANOVA F-value for the provided sample.
feature_selection.f_regression(X, y[, center]) Univariate linear regression tests.
feature_selection.mutual_info_classif(X, y) Estimate mutual information for a discrete target variable.
feature_selection.mutual_info_regression(X, y) Estimate mutual information for a continuous target variable.
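
A univariate filter from this module might be used as sketched below. The iris dataset and k=2 are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score each feature with the ANOVA F-value and keep the two best.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_selected = selector.transform(X)
kept = selector.get_support(indices=True)   # column indices that were kept
```

The per-feature scores remain available in selector.scores_ for inspection.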

sklearn.linear_model: Generalized Linear Models

The sklearn.linear_model module implements generalized linear models. It includes Ridge regression, Bayesian Regression, Lasso and Elastic Net estimators computed with Least Angle Regression and coordinate descent. It also implements Stochastic Gradient Descent related algorithms.

User guide: See the Generalized Linear Models section for further details.

linear_model.ARDRegression([n_iter, tol, …]) Bayesian ARD regression.
linear_model.BayesianRidge([n_iter, tol, …]) Bayesian ridge regression
linear_model.ElasticNet([alpha, l1_ratio, …]) Linear regression with combined L1 and L2 priors as regularizer.
linear_model.ElasticNetCV([l1_ratio, eps, …]) Elastic Net model with iterative fitting along a regularization path.
linear_model.HuberRegressor([epsilon, …]) Linear regression model that is robust to outliers.
linear_model.Lars([fit_intercept, verbose, …]) Least Angle Regression model a.k.a. LAR
linear_model.LarsCV([fit_intercept, …]) Cross-validated Least Angle Regression model.
linear_model.Lasso([alpha, fit_intercept, …]) Linear Model trained with L1 prior as regularizer (aka the Lasso)
linear_model.LassoCV([eps, n_alphas, …]) Lasso linear model with iterative fitting along a regularization path.
linear_model.LassoLars([alpha, …]) Lasso model fit with Least Angle Regression a.k.a. Lars
linear_model.LassoLarsCV([fit_intercept, …]) Cross-validated Lasso, using the LARS algorithm.
linear_model.LassoLarsIC([criterion, …]) Lasso model fit with Lars using BIC or AIC for model selection
linear_model.LinearRegression([…]) Ordinary least squares Linear Regression.
linear_model.LogisticRegression([penalty, …]) Logistic Regression (aka logit, MaxEnt) classifier.
linear_model.LogisticRegressionCV([Cs, …]) Logistic Regression CV (aka logit, MaxEnt) classifier.
linear_model.MultiTaskLasso([alpha, …]) Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer.
linear_model.MultiTaskElasticNet([alpha, …]) Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer
linear_model.MultiTaskLassoCV([eps, …]) Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer.
linear_model.MultiTaskElasticNetCV([…]) Multi-task L1/L2 ElasticNet with built-in cross-validation.
linear_model.OrthogonalMatchingPursuit([…]) Orthogonal Matching Pursuit model (OMP)
linear_model.OrthogonalMatchingPursuitCV([…]) Cross-validated Orthogonal Matching Pursuit model (OMP).
linear_model.PassiveAggressiveClassifier([…]) Passive Aggressive Classifier
linear_model.PassiveAggressiveRegressor([C, …]) Passive Aggressive Regressor
linear_model.Perceptron([penalty, alpha, …]) Linear perceptron classifier. Read more in the User Guide.
linear_model.RANSACRegressor([…]) RANSAC (RANdom SAmple Consensus) algorithm.
linear_model.Ridge([alpha, fit_intercept, …]) Linear least squares with l2 regularization.
linear_model.RidgeClassifier([alpha, …]) Classifier using Ridge regression.
linear_model.RidgeClassifierCV([alphas, …]) Ridge classifier with built-in cross-validation.
linear_model.RidgeCV([alphas, …]) Ridge regression with built-in cross-validation.
linear_model.SGDClassifier([loss, penalty, …]) Linear classifiers (SVM, logistic regression, a.o.) with SGD training.
linear_model.SGDRegressor([loss, penalty, …]) Linear model fitted by minimizing a regularized empirical loss with SGD
linear_model.TheilSenRegressor([…]) Theil-Sen Estimator: robust multivariate regression model.
linear_model.enet_path(X, y[, l1_ratio, …]) Compute elastic net path with coordinate descent
linear_model.lars_path(X, y[, Xy, Gram, …]) Compute Least Angle Regression or Lasso path using LARS algorithm [1]
linear_model.lasso_path(X, y[, eps, …]) Compute Lasso path with coordinate descent
linear_model.logistic_regression_path(X, y) Compute a Logistic Regression model for a list of regularization parameters.
linear_model.orthogonal_mp(X, y[, …]) Orthogonal Matching Pursuit (OMP)
linear_model.orthogonal_mp_gram(Gram, Xy[, …]) Gram Orthogonal Matching Pursuit (OMP)
linear_model.ridge_regression(X, y, alpha[, …]) Solve the ridge equation by the method of normal equations.
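
The effect of L1 regularization compared with ordinary least squares can be sketched as follows. The noise-free synthetic target and alpha=0.1 are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
# Only the first two features influence the target; the third is irrelevant.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

ols = LinearRegression().fit(X, y)    # recovers the generating coefficients
lasso = Lasso(alpha=0.1).fit(X, y)    # shrinks coefficients toward zero
```

On data like this, the Lasso coefficients are smaller in magnitude than the least-squares ones, and the coefficient of the irrelevant feature is typically driven to zero.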

sklearn.manifold: Manifold Learning

The sklearn.manifold module implements data embedding techniques.

User guide: See the Manifold learning section for further details.

manifold.Isomap([n_neighbors, n_components, …]) Isomap Embedding
manifold.LocallyLinearEmbedding([…]) Locally Linear Embedding
manifold.MDS([n_components, metric, n_init, …]) Multidimensional scaling
manifold.SpectralEmbedding([n_components, …]) Spectral embedding for non-linear dimensionality reduction.
manifold.TSNE([n_components, perplexity, …]) t-distributed Stochastic Neighbor Embedding.
manifold.locally_linear_embedding(X, …[, …]) Perform a Locally Linear Embedding analysis on the data.
manifold.smacof(dissimilarities[, metric, …]) Computes multidimensional scaling using the SMACOF algorithm.
manifold.spectral_embedding(adjacency[, …]) Project the sample on the first eigenvectors of the graph Laplacian.

sklearn.metrics: Metrics

See the Model evaluation: quantifying the quality of predictions section and the Pairwise metrics, Affinities and Kernels section of the user guide for further details.

The sklearn.metrics module includes score functions, performance metrics and pairwise metrics and distance computations.

Classification metrics

See the Classification metrics section of the user guide for further details.

metrics.accuracy_score(y_true, y_pred[, …]) Accuracy classification score.
metrics.auc(x, y[, reorder]) Compute Area Under the Curve (AUC) using the trapezoidal rule
metrics.average_precision_score(y_true, y_score) Compute average precision (AP) from prediction scores
metrics.balanced_accuracy_score(y_true, y_pred) Compute the balanced accuracy
metrics.brier_score_loss(y_true, y_prob[, …]) Compute the Brier score.
metrics.classification_report(y_true, y_pred) Build a text report showing the main classification metrics
metrics.cohen_kappa_score(y1, y2[, labels, …]) Cohen’s kappa: a statistic that measures inter-annotator agreement.
metrics.confusion_matrix(y_true, y_pred[, …]) Compute confusion matrix to evaluate the accuracy of a classification
metrics.f1_score(y_true, y_pred[, labels, …]) Compute the F1 score, also known as balanced F-score or F-measure
metrics.fbeta_score(y_true, y_pred, beta[, …]) Compute the F-beta score
metrics.hamming_loss(y_true, y_pred[, …]) Compute the average Hamming loss.
metrics.hinge_loss(y_true, pred_decision[, …]) Average hinge loss (non-regularized)
metrics.jaccard_similarity_score(y_true, y_pred) Jaccard similarity coefficient score
metrics.log_loss(y_true, y_pred[, eps, …]) Log loss, aka logistic loss or cross-entropy loss.
metrics.matthews_corrcoef(y_true, y_pred[, …]) Compute the Matthews correlation coefficient (MCC)
metrics.precision_recall_curve(y_true, …) Compute precision-recall pairs for different probability thresholds
metrics.precision_recall_fscore_support(…) Compute precision, recall, F-measure and support for each class
metrics.precision_score(y_true, y_pred[, …]) Compute the precision
metrics.recall_score(y_true, y_pred[, …]) Compute the recall
metrics.roc_auc_score(y_true, y_score[, …]) Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
metrics.roc_curve(y_true, y_score[, …]) Compute Receiver operating characteristic (ROC)
metrics.zero_one_loss(y_true, y_pred[, …]) Zero-one classification loss.
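
A few of the classification metrics above, applied to a small hand-made example. The label vectors are illustrative.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)    # 4 of 6 predictions are correct
matrix = confusion_matrix(y_true, y_pred)    # rows: true class, columns: predicted
f1 = f1_score(y_true, y_pred)                # computed for the positive class (1)
```

Here precision and recall for class 1 are both 2/3, so the F1 score is 2/3 as well.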

Clustering metrics

See the Clustering performance evaluation section of the user guide for further details.

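The sklearn.metrics.cluster submodule contains evaluation metrics for cluster analysis results. There are two forms of evaluation: supervised, which uses ground truth class values for each sample, and unsupervised, which does not and instead measures the 'quality' of the model itself.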

metrics.adjusted_mutual_info_score(…[, …]) Adjusted Mutual Information between two clusterings.
metrics.adjusted_rand_score(labels_true, …) Rand index adjusted for chance.
metrics.calinski_harabaz_score(X, labels) Compute the Calinski and Harabaz score.
metrics.davies_bouldin_score(X, labels) Computes the Davies-Bouldin score.
metrics.completeness_score(labels_true, …) Completeness metric of a cluster labeling given a ground truth.
metrics.cluster.contingency_matrix(…[, …]) Build a contingency matrix describing the relationship between labels.
metrics.fowlkes_mallows_score(labels_true, …) Measure the similarity of two clusterings of a set of points.
metrics.homogeneity_completeness_v_measure(…) Compute the homogeneity and completeness and V-Measure scores at once.
metrics.homogeneity_score(labels_true, …) Homogeneity metric of a cluster labeling given a ground truth.
metrics.mutual_info_score(labels_true, …) Mutual Information between two clusterings.
metrics.normalized_mutual_info_score(…[, …]) Normalized Mutual Information between two clusterings.
metrics.silhouette_score(X, labels[, …]) Compute the mean Silhouette Coefficient of all samples.
metrics.silhouette_samples(X, labels[, metric]) Compute the Silhouette Coefficient for each sample.
metrics.v_measure_score(labels_true, labels_pred) V-measure cluster labeling given a ground truth.
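
The sketch below contrasts a metric that needs ground truth labels (adjusted Rand index) with one that uses only the data and the predicted labels (silhouette). The four points and the labelings are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, silhouette_score

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels_true = [0, 0, 1, 1]
labels_pred = [1, 1, 0, 0]   # same partition, different label names

# Needs ground truth; insensitive to how the clusters are named.
ari = adjusted_rand_score(labels_true, labels_pred)

# Needs no ground truth; compares within-cluster and between-cluster distances.
silhouette = silhouette_score(X, labels_pred)
```

Because the two labelings describe the same partition, the adjusted Rand index is 1.0 even though no label value matches.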

Pairwise metrics

See the Pairwise metrics, Affinities and Kernels section of the user guide for further details.

metrics.pairwise.additive_chi2_kernel(X[, Y]) Computes the additive chi-squared kernel between observations in X and Y
metrics.pairwise.chi2_kernel(X[, Y, gamma]) Computes the exponential chi-squared kernel between X and Y.
metrics.pairwise.cosine_similarity(X[, Y, …]) Compute cosine similarity between samples in X and Y.
metrics.pairwise.cosine_distances(X[, Y]) Compute cosine distance between samples in X and Y.
metrics.pairwise.distance_metrics() Valid metrics for pairwise_distances.
metrics.pairwise.euclidean_distances(X[, Y, …]) Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors.
metrics.pairwise.kernel_metrics() Valid metrics for pairwise_kernels
metrics.pairwise.laplacian_kernel(X[, Y, gamma]) Compute the laplacian kernel between X and Y.
metrics.pairwise.linear_kernel(X[, Y, …]) Compute the linear kernel between X and Y.
metrics.pairwise.manhattan_distances(X[, Y, …]) Compute the L1 distances between the vectors in X and Y.
metrics.pairwise.pairwise_kernels(X[, Y, …]) Compute the kernel between arrays X and optional array Y.
metrics.pairwise.polynomial_kernel(X[, Y, …]) Compute the polynomial kernel between X and Y.
metrics.pairwise.rbf_kernel(X[, Y, gamma]) Compute the rbf (gaussian) kernel between X and Y.
metrics.pairwise.sigmoid_kernel(X[, Y, …]) Compute the sigmoid kernel between X and Y.
metrics.pairwise.paired_euclidean_distances(X, Y) Computes the paired euclidean distances between X and Y
metrics.pairwise.paired_manhattan_distances(X, Y) Compute the L1 distances between the vectors in X and Y.
metrics.pairwise.paired_cosine_distances(X, Y) Computes the paired cosine distances between X and Y
metrics.pairwise.paired_distances(X, Y[, metric]) Computes the paired distances between X and Y.
metrics.pairwise_distances(X[, Y, metric, …]) Compute the distance matrix from a vector array X and optional Y.
metrics.pairwise_distances_argmin(X, Y[, …]) Compute minimum distances between one point and a set of points.
metrics.pairwise_distances_argmin_min(X, Y) Compute minimum distances between one point and a set of points.
metrics.pairwise_distances_chunked(X[, Y, …]) Generate a distance matrix chunk by chunk with optional reduction
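
The difference between the all-pairs functions and the paired_* functions listed above can be seen in a small sketch. The input points are illustrative.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import cosine_similarity, paired_distances

X = np.array([[0.0, 0.0], [3.0, 4.0]])
Y = np.array([[0.0, 0.0], [6.0, 8.0]])

# All pairs: entry (i, j) is the distance between X[i] and Y[j].
all_pairs = pairwise_distances(X, Y, metric="euclidean")

# Paired: entry i is the distance between X[i] and Y[i] only.
paired = paired_distances(X, Y, metric="euclidean")

# Cosine similarity of two parallel vectors.
cosine = cosine_similarity(X[1:], Y[1:])
```

The all-pairs result has shape (n_samples_X, n_samples_Y), while the paired result has one entry per row.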

sklearn.multiclass: Multiclass and multilabel classification

Multiclass and multilabel classification strategies

This module implements multiclass learning algorithms: one-vs-the-rest (one-vs-all), one-vs-one, and error correcting output codes.

The estimators provided in this module are meta-estimators: they require a base estimator to be provided in their constructor. For example, it is possible to use these estimators to turn a binary classifier or a regressor into a multiclass classifier. It is also possible to use these estimators with multiclass estimators in the hope that their accuracy or runtime performance improves.

All classifiers in scikit-learn implement multiclass classification; you only need to use this module if you want to experiment with custom multiclass strategies.

The one-vs-the-rest meta-classifier also implements a predict_proba method, so long as such a method is implemented by the base classifier. This method returns probabilities of class membership in both the single label and multilabel case. Note that in the multilabel case, probabilities are the marginal probability that a given sample falls in the given class. As such, in the multilabel case these probabilities, summed over all possible labels for a given sample, will not necessarily equal one, as they do in the single label case.

User guide: See the Multiclass and multilabel algorithms section for further details.

multiclass.OneVsRestClassifier(estimator[, …]) One-vs-the-rest (OvR) multiclass/multilabel strategy
multiclass.OneVsOneClassifier(estimator[, …]) One-vs-one multiclass strategy
multiclass.OutputCodeClassifier(estimator[, …]) (Error-Correcting) Output-Code multiclass strategy
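
The following sketch shows how the meta-estimators wrap a base estimator, and how predict_proba of the one-vs-the-rest classifier behaves in the single label and multilabel cases. LogisticRegression as base estimator and the synthetic datasets are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Single label problem with four classes (synthetic data).
X, y = make_blobs(n_samples=120, centers=4, random_state=0)
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression()).fit(X, y)
n_ovr = len(ovr.estimators_)   # one binary classifier per class
n_ovo = len(ovo.estimators_)   # one binary classifier per pair of classes
single_label_sums = ovr.predict_proba(X).sum(axis=1)

# Multilabel problem: Y is a binary indicator matrix, one column per label.
X_ml, Y_ml = make_multilabel_classification(
    n_samples=100, n_features=10, n_classes=3, random_state=0)
ovr_ml = OneVsRestClassifier(LogisticRegression()).fit(X_ml, Y_ml)
multilabel_sums = ovr_ml.predict_proba(X_ml).sum(axis=1)
```

With four classes, one-vs-the-rest fits 4 binary classifiers and one-vs-one fits 6. The single label probabilities sum to one per sample, while the multilabel row sums generally do not.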

sklearn.naive_bayes: Naive Bayes

The sklearn.naive_bayes module implements Naive Bayes algorithms. These are supervised learning methods based on applying Bayes’ theorem with strong (naive) feature independence assumptions.

User guide: See the Naive Bayes section for further details.

naive_bayes.BernoulliNB([alpha, binarize, …]) Naive Bayes classifier for multivariate Bernoulli models.
naive_bayes.GaussianNB([priors, var_smoothing]) Gaussian Naive Bayes (GaussianNB)
naive_bayes.MultinomialNB([alpha, …]) Naive Bayes classifier for multinomial models
naive_bayes.ComplementNB([alpha, fit_prior, …]) The Complement Naive Bayes classifier described in Rennie et al. (2003).

sklearn.pipeline: Pipeline

The sklearn.pipeline module implements utilities to build a composite estimator, as a chain of transforms and estimators.

pipeline.FeatureUnion(transformer_list[, …]) Concatenates results of multiple transformer objects.
pipeline.Pipeline(steps[, memory]) Pipeline of transforms with a final estimator.
pipeline.make_pipeline(*steps, **kwargs) Construct a Pipeline from the given estimators.
pipeline.make_union(*transformers, **kwargs) Construct a FeatureUnion from the given transformers.
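
A chain of a transformer and a final estimator could be built as in this sketch. The step names "scale" and "clf", the choice of StandardScaler and LogisticRegression, and the iris data are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Explicit step names.
named = Pipeline(steps=[("scale", StandardScaler()),
                        ("clf", LogisticRegression())])

# make_pipeline derives step names from the lowercased class names.
auto = make_pipeline(StandardScaler(), LogisticRegression())
auto_names = [name for name, _ in auto.steps]

# fit() fits and applies each transformer in turn, then fits the final estimator.
named.fit(X, y)
training_accuracy = named.score(X, y)
```

Step names matter mainly when setting nested parameters, for example named.set_params(clf__C=10.0).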

sklearn.preprocessing: Preprocessing and Normalization

The sklearn.preprocessing module includes scaling, centering, normalization, binarization and imputation methods.

User guide: See the Preprocessing data section for further details.

preprocessing.Binarizer([threshold, copy]) Binarize data (set feature values to 0 or 1) according to a threshold
preprocessing.FunctionTransformer([func, …]) Constructs a transformer from an arbitrary callable.
preprocessing.KBinsDiscretizer([n_bins, …]) Bin continuous data into intervals.
preprocessing.KernelCenterer() Center a kernel matrix
preprocessing.LabelBinarizer([neg_label, …]) Binarize labels in a one-vs-all fashion
preprocessing.LabelEncoder Encode labels with value between 0 and n_classes-1.
preprocessing.MultiLabelBinarizer([classes, …]) Transform between iterable of iterables and a multilabel format
preprocessing.MaxAbsScaler([copy]) Scale each feature by its maximum absolute value.
preprocessing.MinMaxScaler([feature_range, copy]) Transforms features by scaling each feature to a given range.
preprocessing.Normalizer([norm, copy]) Normalize samples individually to unit norm.
preprocessing.OneHotEncoder([n_values, …]) Encode categorical integer features as a one-hot numeric array.
preprocessing.OrdinalEncoder([categories, dtype]) Encode categorical features as an integer array.
preprocessing.PolynomialFeatures([degree, …]) Generate polynomial and interaction features.
preprocessing.PowerTransformer([method, …]) Apply a power transform featurewise to make data more Gaussian-like.
preprocessing.QuantileTransformer([…]) Transform features using quantiles information.
preprocessing.RobustScaler([with_centering, …]) Scale features using statistics that are robust to outliers.
preprocessing.StandardScaler([copy, …]) Standardize features by removing the mean and scaling to unit variance
preprocessing.add_dummy_feature(X[, value]) Augment dataset with an additional dummy feature.
preprocessing.binarize(X[, threshold, copy]) Boolean thresholding of array-like or scipy.sparse matrix
preprocessing.label_binarize(y, classes[, …]) Binarize labels in a one-vs-all fashion
preprocessing.maxabs_scale(X[, axis, copy]) Scale each feature to the [-1, 1] range without breaking the sparsity.
preprocessing.minmax_scale(X[, …]) Transforms features by scaling each feature to a given range.
preprocessing.normalize(X[, norm, axis, …]) Scale input vectors individually to unit norm (vector length).
preprocessing.quantile_transform(X[, axis, …]) Transform features using quantiles information.
preprocessing.robust_scale(X[, axis, …]) Standardize a dataset along any axis
preprocessing.scale(X[, axis, with_mean, …]) Standardize a dataset along any axis
preprocessing.power_transform(X[, method, …]) Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like.
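
Two of the scalers listed above, applied to a tiny matrix whose columns have very different ranges. The input values are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Standardize each column to zero mean and unit variance.
standardized = StandardScaler().fit_transform(X)

# Rescale each column to the default feature_range of (0, 1).
rescaled = MinMaxScaler().fit_transform(X)
```

Both scalers work column by column, so the two columns end up identical after scaling despite their different original ranges.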

sklearn.random_projection: Random projection

Random Projection transformers

Random Projections are a simple and computationally efficient way to reduce the dimensionality of the data by trading a controlled amount of accuracy (as additional variance) for faster processing times and smaller model sizes.

The dimensions and distribution of Random Projections matrices are controlled so as to preserve the pairwise distances between any two samples of the dataset.

The main theoretical result behind the efficiency of random projection is the Johnson-Lindenstrauss lemma (quoting Wikipedia):

In mathematics, the Johnson-Lindenstrauss lemma is a result concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz, and can even be taken to be an orthogonal projection.

User guide: See the Random Projection section for further details.

random_projection.GaussianRandomProjection([…]) Reduce dimensionality through Gaussian random projection
random_projection.SparseRandomProjection([…]) Reduce dimensionality through sparse random projection
random_projection.johnson_lindenstrauss_min_dim(…) Find a ‘safe’ number of components to randomly project to
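
As a rough sketch of what a 'safe' number of components means, the call below is compared with the bound n_components >= 4 log(n_samples) / (eps^2 / 2 - eps^3 / 3) evaluated by hand. The values of n_samples and eps are arbitrary assumptions.

```python
import math
from sklearn.random_projection import johnson_lindenstrauss_min_dim

n_samples = 1000
eps = 0.1   # maximum tolerated relative distortion of pairwise distances

min_dim = int(johnson_lindenstrauss_min_dim(n_samples, eps=eps))

# The same bound evaluated by hand, for comparison.
by_hand = int(4 * math.log(n_samples) / (eps ** 2 / 2 - eps ** 3 / 3))
```

The bound depends only on the number of samples and the tolerated distortion, not on the original number of features.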

sklearn.semi_supervised: Semi-Supervised Learning

The sklearn.semi_supervised module implements semi-supervised learning algorithms. These algorithms utilize small amounts of labeled data and large amounts of unlabeled data for classification tasks. This module includes Label Propagation.

User guide: See the Semi-Supervised section for further details.

semi_supervised.LabelPropagation([kernel, …]) Label Propagation classifier
semi_supervised.LabelSpreading([kernel, …]) LabelSpreading model for semi-supervised learning
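
A minimal sketch of learning from mostly unlabeled data, assuming the usual convention that unlabeled samples are marked with -1. The one-dimensional data and the rbf kernel with default settings are illustrative choices.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Two clusters on a line; only one point per cluster carries a label.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, -1, -1, 1, -1, -1])   # -1 marks an unlabeled sample

model = LabelSpreading(kernel="rbf").fit(X, y)
inferred = model.transduction_   # labels inferred for every training sample
```

Each unlabeled point picks up the label of the labeled point in its own cluster.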

sklearn.svm: Support Vector Machines

The sklearn.svm module includes Support Vector Machine algorithms.

User guide: See the Support Vector Machines section for further details.

Estimators

svm.LinearSVC([penalty, loss, dual, tol, C, …]) Linear Support Vector Classification.
svm.LinearSVR([epsilon, tol, C, loss, …]) Linear Support Vector Regression.
svm.NuSVC([nu, kernel, degree, gamma, …]) Nu-Support Vector Classification.
svm.NuSVR([nu, C, kernel, degree, gamma, …]) Nu Support Vector Regression.
svm.OneClassSVM([kernel, degree, gamma, …]) Unsupervised Outlier Detection.
svm.SVC([C, kernel, degree, gamma, coef0, …]) C-Support Vector Classification.
svm.SVR([kernel, degree, gamma, coef0, tol, …]) Epsilon-Support Vector Regression.
svm.l1_min_c(X, y[, loss, fit_intercept, …]) Return the lowest bound for C such that for C in (l1_min_C, infinity) the model is guaranteed not to be empty.
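
One possible use of svm.l1_min_c is to anchor a grid of C values for an l1-penalized model just above the point where all coefficients are zero. The following is a sketch under assumptions: the binary iris subset, the grid, and the tolerance are illustrative choices, not recommendations from this reference.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC, l1_min_c

# Binary subset of iris (first two classes), used only for illustration.
X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]

# Lower bound for C for an l1-penalized model with squared hinge loss.
c_min = l1_min_c(X, y, loss="squared_hinge")

# Fit an l1-penalized LinearSVC on a grid of C values above the bound and
# count the non-zero coefficients for each value.
for C in c_min * np.logspace(0.5, 3, 4):
    clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=C, tol=1e-6)
    clf.fit(X, y)
    print("C=%.5f  non-zero coefficients: %d" % (C, np.count_nonzero(clf.coef_)))
```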

sklearn.utils: Utilities

The sklearn.utils module includes various utilities.

Developer guide: See the Utilities for Developers page for further details.

utils.testing.mock_mldata_urlopen(*args, …) Object that mocks the urlopen function to fake requests to mldata.
utils.arrayfuncs.cholesky_delete
utils.arrayfuncs.min_pos Find the minimum value of an array over positive values
utils.as_float_array(X[, copy, force_all_finite]) Converts an array-like to an array of floats.
utils.assert_all_finite(X[, allow_nan]) Throw a ValueError if X contains NaN or infinity.
utils.bench.total_seconds(delta) helper function to emulate function total_seconds,
utils.check_X_y(X, y[, accept_sparse, …]) Input validation for standard estimators.
utils.check_array(array[, accept_sparse, …]) Input validation on an array, list, sparse matrix or similar.
utils.check_consistent_length(*arrays) Check that all arrays have consistent first dimensions.
utils.check_random_state(seed) Turn seed into a np.random.RandomState instance
utils.class_weight.compute_class_weight(…) Estimate class weights for unbalanced datasets.
utils.class_weight.compute_sample_weight(…) Estimate sample weights by class for unbalanced datasets.
utils.deprecated([extra]) Decorator to mark a function or class as deprecated.
utils.estimator_checks.check_estimator(Estimator) Check if estimator adheres to scikit-learn conventions.
utils.extmath.safe_sparse_dot(a, b[, …]) Dot product that handles the sparse matrix case correctly
utils.extmath.randomized_range_finder(A, …) Computes an orthonormal matrix whose range approximates the range of A.
utils.extmath.randomized_svd(M, n_components) Computes a truncated randomized SVD
utils.extmath.fast_logdet(A) Compute log(det(A)) for A symmetric
utils.extmath.density(w, **kwargs) Compute density of a sparse vector
utils.extmath.weighted_mode(a, w[, axis]) Returns an array of the weighted modal (most common) value in a
utils.gen_even_slices(n, n_packs[, n_samples]) Generator to create n_packs slices going up to n.
utils.graph.single_source_shortest_path_length(…) Return the shortest path length from source to all reachable nodes.
utils.graph_shortest_path.graph_shortest_path Perform a shortest-path graph search on a positive directed or undirected graph.
utils.indexable(*iterables) Make arrays indexable for cross-validation.
utils.multiclass.type_of_target(y) Determine the type of data indicated by the target.
utils.multiclass.is_multilabel(y) Check if y is in a multilabel format.
utils.multiclass.unique_labels(*ys) Extract an ordered array of unique labels
utils.murmurhash3_32 Compute the 32bit murmurhash3 of key at seed.
utils.resample(*arrays, **options) Resample arrays or sparse matrices in a consistent way
utils.safe_indexing(X, indices) Return items or rows from X using indices.
utils.safe_mask(X, mask) Return a mask which is safe to use on X.
utils.safe_sqr(X[, copy]) Element wise squaring of array-likes and sparse matrices.
utils.shuffle(*arrays, **options) Shuffle arrays or sparse matrices in a consistent way
utils.sparsefuncs.incr_mean_variance_axis(X, …) Compute incremental mean and variance along an axis on a CSR or CSC matrix.
utils.sparsefuncs.inplace_column_scale(X, scale) Inplace column scaling of a CSC/CSR matrix.
utils.sparsefuncs.inplace_row_scale(X, scale) Inplace row scaling of a CSR or CSC matrix.
utils.sparsefuncs.inplace_swap_row(X, m, n) Swaps two rows of a CSC/CSR matrix in-place.
utils.sparsefuncs.inplace_swap_column(X, m, n) Swaps two columns of a CSC/CSR matrix in-place.
utils.sparsefuncs.mean_variance_axis(X, axis) Compute mean and variance along an axis on a CSR or CSC matrix
utils.sparsefuncs.inplace_csr_column_scale(X, …) Inplace column scaling of a CSR matrix.
utils.sparsefuncs_fast.inplace_csr_row_normalize_l1 Inplace row normalize using the l1 norm
utils.sparsefuncs_fast.inplace_csr_row_normalize_l2 Inplace row normalize using the l2 norm
utils.random.sample_without_replacement Sample integers without replacement.
utils.validation.check_is_fitted(estimator, …) Perform is_fitted validation for estimator.
utils.validation.check_memory(memory) Check that memory is joblib.Memory-like.
utils.validation.check_symmetric(array[, …]) Make sure that array is 2D, square and symmetric.
utils.validation.column_or_1d(y[, warn]) Ravel column or 1d numpy array, else raises an error
utils.validation.has_fit_parameter(…) Checks whether the estimator’s fit method supports the given parameter.
utils.testing.assert_in Just like self.assertTrue(a in b), but with a nicer default message.
utils.testing.assert_not_in Just like self.assertTrue(a not in b), but with a nicer default message.
utils.testing.assert_raise_message(…) Helper function to test the message raised in an exception.
utils.testing.all_estimators([…]) Get a list of all estimators from sklearn.
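
A few of the helpers listed above are commonly combined when preparing data. The sketch below shows check_random_state, shuffle, resample, and compute_class_weight on a toy array; the data and variable names are hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.utils import check_random_state, resample, shuffle
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced dataset: 8 samples of class 0 and 2 of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

# Turn an integer seed into a np.random.RandomState instance.
rng = check_random_state(0)

# Shuffle X and y together so that rows and labels stay aligned.
X_shuf, y_shuf = shuffle(X, y, random_state=rng)

# Draw 6 samples with replacement (the default), again keeping alignment.
X_boot, y_boot = resample(X, y, n_samples=6, random_state=rng)

# "balanced" weights are n_samples / (n_classes * count_per_class).
weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
print(weights)
```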

Utilities from joblib:

utils.parallel_backend(backend[, n_jobs]) Change the default backend used by Parallel inside a with block.
utils.register_parallel_backend(name, factory) Register a new Parallel backend factory.

Recently deprecated

To be removed in 0.23

utils.Memory(*args, **kwargs)
utils.Parallel(*args, **kwargs)
utils.cpu_count() DEPRECATED: deprecated in version 0.20.1 to be removed in version 0.23.
utils.delayed(function[, check_pickle]) DEPRECATED: deprecated in version 0.20.1 to be removed in version 0.23.

To be removed in 0.22

covariance.GraphLasso(*args, **kwargs) Sparse inverse covariance estimation with an l1-penalized estimator.
covariance.GraphLassoCV(*args, **kwargs) Sparse inverse covariance w/ cross-validated choice of the l1 penalty.
preprocessing.Imputer(*args, **kwargs) Imputation transformer for completing missing values.
covariance.graph_lasso(emp_cov, alpha[, …]) DEPRECATED: The ‘graph_lasso’ was renamed to ‘graphical_lasso’ in version 0.20 and will be removed in 0.22.
datasets.fetch_mldata(dataname[, …]) DEPRECATED: fetch_mldata was deprecated in version 0.20 and will be removed in version 0.22
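
As a hedged migration sketch for some of the deprecated entries above: it assumes that the joblib helpers are imported directly from the standalone joblib package and that sklearn.impute.SimpleImputer is used in place of preprocessing.Imputer. Neither replacement is named in this listing, so both should be checked against the release notes; the data are illustrative.

```python
import math

import numpy as np
from joblib import Parallel, delayed  # assumed replacement for utils.Parallel / utils.delayed
from sklearn.impute import SimpleImputer  # assumed replacement for preprocessing.Imputer

# Fill missing values with the column mean.
X = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 6.0]])
X_filled = SimpleImputer(strategy="mean").fit_transform(X)
print(X_filled)

# Run a function over several inputs through joblib's Parallel API.
roots = Parallel(n_jobs=1)(delayed(math.sqrt)(v) for v in [1, 4, 9])
print(roots)
```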