sklearn.linear_model.RANSACRegressor - scikit-learn 0.20.4 documentation
Parameters:
base_estimator : object, optional
Base estimator object which implements the following methods:
- fit(X, y): Fit model to given training data and target values.
- score(X, y): Returns a score for the given test data (for regression estimators this is typically the coefficient of determination R^2 rather than an accuracy), which is used for the stop criterion defined by stop_score. Additionally, the score is used to decide which of two equally large consensus sets is chosen as the better one.
- predict(X): Returns predicted values using the linear model, which are used to compute the residual error with the loss function.
If base_estimator is None, then base_estimator=sklearn.linear_model.LinearRegression() is used for target values of dtype float.
Note that the current implementation only supports regression estimators.
min_samples : int (>= 1) or float ([0, 1]), optional
Minimum number of samples chosen randomly from the original data. Treated as an absolute number of samples for min_samples >= 1, and as a relative number ceil(min_samples * X.shape[0]) for min_samples < 1. This is typically chosen as the minimal number of samples necessary to estimate the given base_estimator. By default a sklearn.linear_model.LinearRegression() estimator is assumed and min_samples is chosen as X.shape[1] + 1.
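The rule above can be sketched as a small helper. This is only an illustration of the description, not the library's internal code; the function name resolve_min_samples and its arguments are hypothetical.

```python
import math


def resolve_min_samples(min_samples, n_samples, n_features):
    """Hypothetical helper: turn min_samples into an absolute sample count."""
    if min_samples is None:
        # Default described above: assumes a LinearRegression base estimator.
        return n_features + 1
    if 0 < min_samples < 1:
        # Relative number: a fraction of the rows in X, rounded up.
        return math.ceil(min_samples * n_samples)
    if min_samples >= 1:
        if min_samples != int(min_samples):
            raise ValueError("an absolute min_samples must be an integer")
        return int(min_samples)
    raise ValueError("min_samples must be positive")


print(resolve_min_samples(None, n_samples=100, n_features=3))
print(resolve_min_samples(0.25, n_samples=10, n_features=3))
print(resolve_min_samples(5, n_samples=100, n_features=3))
```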
residual_threshold : float, optional
Maximum residual for a data sample to be classified as an inlier. By default the threshold is chosen as the MAD (median absolute deviation) of the target values y.
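As a rough illustration of the default threshold, the median absolute deviation of y can be computed as below. This sketch assumes the plain, unscaled MAD (no consistency constant); the function name is hypothetical.

```python
import statistics


def median_absolute_deviation(y):
    """Unscaled MAD: median of absolute deviations from the median of y."""
    center = statistics.median(y)
    return statistics.median(abs(value - center) for value in y)


# A single large value barely moves the MAD, so the threshold stays robust.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
print(median_absolute_deviation(y))
```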
is_data_valid : callable, optional
This function is called with the randomly selected data before the model is fitted to it: is_data_valid(X, y). If its return value is False the current randomly chosen sub-sample is skipped.
is_model_valid : callable, optional
This function is called with the estimated model and the randomly selected data: is_model_valid(model, X, y). If its return value is False the current randomly chosen sub-sample is skipped. Rejecting samples with this function is computationally costlier than with is_data_valid. is_model_valid should therefore only be used if the estimated model is needed for making the rejection decision.
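A minimal sketch of the two callables follows. The rejection rules themselves (a minimum spread in X, a non-negative slope) are invented for illustration, and the stand-in model object only mimics an estimator that exposes a coef_ attribute.

```python
from types import SimpleNamespace

import numpy as np


def is_data_valid(X, y):
    """Cheap check on the sub-sample alone (hypothetical rule):
    skip it if any feature has almost no spread."""
    X = np.asarray(X, dtype=float)
    spread = X.max(axis=0) - X.min(axis=0)
    return bool(spread.min() > 1e-3)


def is_model_valid(model, X, y):
    """Costlier check that needs the fitted model (hypothetical rule):
    skip it if any coefficient is negative."""
    return bool(np.all(np.asarray(model.coef_) >= 0))


# Stand-in for a fitted linear model; only coef_ is needed here.
fake_model = SimpleNamespace(coef_=np.array([-1.0]))
print(is_data_valid([[0.0], [1.0]], [0.0, 1.0]))
print(is_data_valid([[1.0], [1.0]], [0.0, 1.0]))
print(is_model_valid(fake_model, None, None))
```

Callables of this shape would be passed as the is_data_valid and is_model_valid arguments of RANSACRegressor.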
max_trials : int, optional
Maximum number of iterations for random sample selection.
max_skips : int, optional
Maximum number of iterations that can be skipped due to finding zero inliers, invalid data as defined by is_data_valid, or invalid models as defined by is_model_valid.
New in version 0.19.
stop_n_inliers : int, optional
Stop iteration if at least this number of inliers are found.
stop_score : float, optional
Stop iteration if the score is greater than or equal to this threshold.
stop_probability : float in range [0, 1], optional
RANSAC iteration stops once at least one outlier-free set of the training data has been sampled with this probability. This requires generating at least N samples (iterations):
N >= log(1 - probability) / log(1 - e**m)
where the probability (confidence) is typically set to a high value such as 0.99 (the default), e is the current fraction of inliers w.r.t. the total number of samples, and m is the number of samples drawn per iteration (min_samples).
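The bound can be evaluated directly. The sketch below assumes m is the number of samples drawn per iteration (min_samples); the function name and the handling of edge cases are illustrative choices, not the library's.

```python
import math


def required_trials(inlier_fraction, min_samples, probability=0.99):
    """Smallest integer N with N >= log(1 - probability) / log(1 - e**m)."""
    # Chance that one random sub-sample consists of inliers only.
    p_clean = inlier_fraction ** min_samples
    if p_clean <= 0 or probability >= 1:
        return math.inf  # no finite number of trials gives the guarantee
    if p_clean >= 1:
        return 1  # every sub-sample is outlier-free
    return math.ceil(math.log(1 - probability) / math.log(1 - p_clean))


# With half the data being inliers and 2 samples per draw:
print(required_trials(0.5, 2))
# A higher inlier fraction needs fewer trials:
print(required_trials(0.8, 2))
```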
loss : string, callable, optional, default “absolute_loss”
The string inputs “absolute_loss” and “squared_loss” are supported; they compute the absolute loss and the squared loss per sample, respectively.
If loss is a callable, then it should be a function that takes two arrays as inputs, the true and predicted values, and returns a 1-D array with the i-th value of the array corresponding to the loss on X[i].
If the loss on a sample is greater than the residual_threshold, then this sample is classified as an outlier.
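A callable loss with the shape described above might look like the following sketch. The function name, the sample values, and the threshold are made up for illustration; summing over columns for 2-D targets is one possible convention, not a requirement stated here.

```python
import numpy as np


def per_sample_abs_loss(y_true, y_pred):
    """Return a 1-D array whose i-th entry is the loss on sample i."""
    diff = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    # For 2-D (multi-output) targets, collapse the columns to one value per sample.
    return diff if diff.ndim == 1 else diff.sum(axis=1)


y_true = np.array([1.0, 2.0, 3.0, 10.0])
y_pred = np.array([1.1, 1.9, 3.2, 4.0])
residual_threshold = 0.5  # illustrative value

losses = per_sample_abs_loss(y_true, y_pred)
# Samples whose loss exceeds the threshold are outliers; the rest are inliers.
inlier_mask = losses <= residual_threshold
print(inlier_mask)
```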
random_state : int, RandomState instance or None, optional, default None
The generator used to draw the random sub-samples. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.