SelfTrainingClassifier
class sklearn.semi_supervised.SelfTrainingClassifier(estimator=None, base_estimator='deprecated', threshold=0.75, criterion='threshold', k_best=10, max_iter=10, verbose=False)
Self-training classifier.
This metaestimator allows a given supervised classifier to function as a semi-supervised classifier, allowing it to learn from unlabeled data. It does this by iteratively predicting pseudo-labels for the unlabeled data and adding them to the training set.
The classifier will continue iterating until either max_iter is reached, or no pseudo-labels were added to the training set in the previous iteration.
Read more in the User Guide.
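The loop described above can be sketched by hand. This is a simplified illustration under stated assumptions, not the library's implementation: the helper name self_train is hypothetical, it supports only a probability-threshold rule, and it assumes X and y are NumPy arrays with at least one labeled sample per class.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def self_train(estimator, X, y, threshold=0.75, max_iter=10):
    """Hypothetical, simplified self-training loop; -1 marks unlabeled samples."""
    y = np.array(y, copy=True)
    model = clone(estimator)
    for _ in range(max_iter):
        unlabeled = np.flatnonzero(y == -1)
        if unlabeled.size == 0:
            break  # every sample already has a label
        # Fit on the samples that currently carry a label.
        model.fit(X[y != -1], y[y != -1])
        proba = model.predict_proba(X[unlabeled])
        confident = proba.max(axis=1) > threshold
        if not confident.any():
            break  # no pseudo-labels were added in this iteration
        # Promote confident predictions to pseudo-labels.
        predicted = model.classes_[proba.argmax(axis=1)]
        y[unlabeled[confident]] = predicted[confident]
    # Final fit on the original labels plus all pseudo-labels.
    model.fit(X[y != -1], y[y != -1])
    return model, y


X, y_true = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y = y_true.copy()
y[rng.rand(len(y)) < 0.5] = -1  # hide roughly half of the labels

model, y_filled = self_train(LogisticRegression(max_iter=1000), X, y)
print("labeled before:", int((y != -1).sum()), "after:", int((y_filled != -1).sum()))
```

Samples whose predicted probability never exceeds the threshold keep the label -1 in the returned array.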
Parameters:
estimator : estimator object
An estimator object implementing fit and predict_proba. Invoking the fit method will fit a clone of the passed estimator, which will be stored in the estimator_ attribute.
Added in version 1.6: estimator was added to replace base_estimator.
base_estimator : estimator object
An estimator object implementing fit and predict_proba. Invoking the fit method will fit a clone of the passed estimator, which will be stored in the estimator_ attribute.
Deprecated since version 1.6: base_estimator was deprecated in 1.6 and will be removed in 1.8. Use estimator instead.
threshold : float, default=0.75
The decision threshold for use with criterion='threshold'. Should be in [0, 1). When using the 'threshold' criterion, a well-calibrated classifier should be used.
criterion : {'threshold', 'k_best'}, default='threshold'
The selection criterion used to select which labels to add to the training set. If 'threshold', pseudo-labels with prediction probabilities above threshold are added to the dataset. If 'k_best', the k_best pseudo-labels with the highest prediction probabilities are added to the dataset. When using the 'threshold' criterion, a well-calibrated classifier should be used.
k_best : int, default=10
The number of samples to add in each iteration. Only used when criterion='k_best'.
max_iter : int or None, default=10
Maximum number of iterations allowed. Should be greater than or equal to 0. If it is None, the classifier will continue to predict labels until no new pseudo-labels are added, or all unlabeled samples have been labeled.
verbose : bool, default=False
Enable verbose output.
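As an illustration of the two selection criteria, the sketch below fits the same data once with criterion='threshold' and once with criterion='k_best'. The dataset, the LogisticRegression base classifier, and the specific parameter values are assumptions chosen for the example; the wrapped classifier is passed positionally.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y_true = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y = y_true.copy()
y[rng.rand(len(y)) < 0.5] = -1  # -1 marks unlabeled samples

# Add every pseudo-label whose predicted probability exceeds 0.9.
by_threshold = SelfTrainingClassifier(
    LogisticRegression(max_iter=1000), criterion="threshold", threshold=0.9
).fit(X, y)

# Add at most the 5 most confident pseudo-labels per iteration, for 3 iterations.
by_k_best = SelfTrainingClassifier(
    LogisticRegression(max_iter=1000), criterion="k_best", k_best=5, max_iter=3
).fit(X, y)

added_threshold = int((by_threshold.labeled_iter_ > 0).sum())
added_k_best = int((by_k_best.labeled_iter_ > 0).sum())
print("threshold:", added_threshold, by_threshold.termination_condition_)
print("k_best:", added_k_best, by_k_best.termination_condition_)
```

With 'k_best', the number of pseudo-labels is bounded by k_best times the number of iterations, which makes the growth of the training set easier to control than with a probability cutoff.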
Attributes:
estimator_ : estimator object
The fitted estimator.
classes_ : ndarray or list of ndarray of shape (n_classes,)
Class labels for each output. (Taken from the trained estimator_).
transduction_ : ndarray of shape (n_samples,)
The labels used for the final fit of the classifier, including pseudo-labels added during fit.
labeled_iter_ : ndarray of shape (n_samples,)
The iteration in which each sample was labeled. When a sample has iteration 0, the sample was already labeled in the original dataset. When a sample has iteration -1, the sample was not labeled in any iteration.
n_features_in_ : int
Number of features seen during fit.
Added in version 0.24.
feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
Added in version 1.0.
n_iter_ : int
The number of rounds of self-training, that is, the number of times the base estimator is fitted on relabeled variants of the training set.
termination_condition_ : {'max_iter', 'no_change', 'all_labeled'}
The reason that fitting was stopped.
'max_iter': n_iter_ reached max_iter.
'no_change': no new labels were predicted.
'all_labeled': all unlabeled samples were labeled before max_iter was reached.
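The fitted attributes can be combined to see which samples were pseudo-labeled and when. The following is a small sketch; the dataset, the base classifier, and the threshold value are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y_true = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y = y_true.copy()
y[rng.rand(len(y)) < 0.5] = -1  # -1 marks unlabeled samples

clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.8)
clf.fit(X, y)

originally_labeled = clf.labeled_iter_ == 0   # labeled in the input y
pseudo_labeled = clf.labeled_iter_ > 0        # labeled during self-training
never_labeled = clf.labeled_iter_ == -1       # still unlabeled after fit

print("original:", int(originally_labeled.sum()))
print("pseudo-labeled:", int(pseudo_labeled.sum()))
print("never labeled:", int(never_labeled.sum()))
print("rounds:", clf.n_iter_, "stopped because:", clf.termination_condition_)
```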
Examples
>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import SelfTrainingClassifier
>>> from sklearn.svm import SVC
>>> rng = np.random.RandomState(42)
>>> iris = datasets.load_iris()
>>> random_unlabeled_points = rng.rand(iris.target.shape[0]) < 0.3
>>> iris.target[random_unlabeled_points] = -1
>>> svc = SVC(probability=True, gamma="auto")
>>> self_training_model = SelfTrainingClassifier(svc)
>>> self_training_model.fit(iris.data, iris.target)
SelfTrainingClassifier(...)
decision_function(X, **params)
Call decision function of the estimator.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Array representing the data.
**params : dict of str -> object
Parameters to pass to the underlying estimator's decision_function method.
Added in version 1.6: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
Returns:
y : ndarray of shape (n_samples, n_features)
Result of the decision function of the estimator.
fit(X, y, **params)
Fit self-training classifier using X, y as training data.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Array representing the data.
y : {array-like, sparse matrix} of shape (n_samples,)
Array representing the labels. Unlabeled samples should have the label -1.
**params : dict
Parameters to pass to the underlying estimators.
Added in version 1.6: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
Returns:
self : object
Fitted estimator.
get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
Added in version 1.6.
Returns:
routing : MetadataRouter
A MetadataRouter encapsulating routing information.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : dict
Parameter names mapped to their values.
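The effect of the deep flag can be seen by comparing the two dictionaries. This is a brief sketch; the LogisticRegression base classifier and its C value are assumptions chosen for illustration, and the wrapped classifier is passed positionally.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

clf = SelfTrainingClassifier(LogisticRegression(C=0.5), threshold=0.9)

shallow = clf.get_params(deep=False)  # only the meta-estimator's own parameters
deep = clf.get_params(deep=True)      # also the wrapped classifier's parameters

# Nested parameters use the <component>__<parameter> naming scheme.
nested_keys = sorted(key for key in deep if "__" in key)
print(sorted(shallow))
print(nested_keys[:3])
```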
predict(X, **params)
Predict the classes of X.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Array representing the data.
**params : dict of str -> object
Parameters to pass to the underlying estimator's predict method.
Added in version 1.6: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
Returns:
y : ndarray of shape (n_samples,)
Array with predicted labels.
predict_log_proba(X, **params)
Predict log probability for each possible outcome.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Array representing the data.
**params : dict of str -> object
Parameters to pass to the underlying estimator's predict_log_proba method.
Added in version 1.6: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
Returns:
y : ndarray of shape (n_samples, n_classes)
Array with log prediction probabilities.
predict_proba(X, **params)
Predict probability for each possible outcome.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Array representing the data.
**params : dict of str -> object
Parameters to pass to the underlying estimator's predict_proba method.
Added in version 1.6: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
Returns:
y : ndarray of shape (n_samples, n_classes)
Array with prediction probabilities.
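A short sketch of how predict, predict_proba, and predict_log_proba relate on a fitted model. The dataset and the LogisticRegression base classifier are assumptions chosen for illustration; for the three-class iris data used here, the probability array has one column per entry of classes_.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y_true = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y = y_true.copy()
y[rng.rand(len(y)) < 0.5] = -1  # -1 marks unlabeled samples

clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

proba = clf.predict_proba(X[:5])          # one row per sample, one column per class
log_proba = clf.predict_log_proba(X[:5])  # elementwise log of the probabilities
labels = clf.predict(X[:5])

print(clf.classes_)
print(proba.shape)
print(labels)
```

For this base classifier, the predicted label of each row is the entry of classes_ at the column with the highest probability.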
score(X, y, **params)
Call score on the estimator.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Array representing the data.
y : array-like of shape (n_samples,)
Array representing the labels.
**params : dict of str -> object
Parameters to pass to the underlying estimator's score method.
Added in version 1.6: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
Returns:
score : float
Result of calling score on the estimator.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters:
**params : dict
Estimator parameters.
Returns:
self : estimator instance
Estimator instance.