Estimator (Spark 3.5.5 JavaDoc)
Object
- org.apache.spark.ml.PipelineStage
  - org.apache.spark.ml.Estimator<M>
All Implemented Interfaces:
java.io.Serializable, org.apache.spark.internal.Logging, Params, Identifiable
Direct Known Subclasses:
ALS, BisectingKMeans, BucketedRandomProjectionLSH, ChiSqSelector, CountVectorizer, CrossValidator, FPGrowth, GaussianMixture, IDF, Imputer, IsotonicRegression, KMeans, LDA, MaxAbsScaler, MinHashLSH, MinMaxScaler, OneHotEncoder, OneVsRest, PCA, Pipeline, Predictor, QuantileDiscretizer, RFormula, RobustScaler, StandardScaler, StringIndexer, TrainValidationSplit, UnivariateFeatureSelector, VarianceThresholdSelector, VectorIndexer, Word2Vec
public abstract class Estimator<M extends Model<M>>
extends PipelineStage
Abstract class for estimators that fit models to data.
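To make the estimator/model relationship concrete, here is a minimal, self-contained sketch in plain Java. Every type in it (`ToyModel`, `ToyEstimator`, `CenteringEstimator`, `CenteringModel`) is a hypothetical stand-in invented for illustration; none of them is a Spark class, and the input is a plain `List<Double>` rather than a `Dataset`. It only shows the general shape: an estimator learns state from data in `fit` and returns a model that carries that state.

```java
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the Spark classes.
class EstimatorContractSketch {

    /** Hypothetical counterpart of a fitted model: applies what was learned. */
    interface ToyModel {
        double transform(double value);
    }

    /** Hypothetical counterpart of an estimator: learns a model of type M from data. */
    abstract static class ToyEstimator<M extends ToyModel> {
        abstract M fit(List<Double> dataset);
    }

    /** Fitted state: the mean learned from the training data. */
    static final class CenteringModel implements ToyModel {
        final double mean;

        CenteringModel(double mean) {
            this.mean = mean;
        }

        @Override
        public double transform(double value) {
            return value - mean;
        }
    }

    /** Estimator that learns the mean of its input and returns a CenteringModel. */
    static final class CenteringEstimator extends ToyEstimator<CenteringModel> {
        @Override
        CenteringModel fit(List<Double> dataset) {
            double sum = 0.0;
            for (double v : dataset) {
                sum += v;
            }
            // An empty input falls back to a mean of 0.0 (an arbitrary choice for this sketch).
            double mean = dataset.isEmpty() ? 0.0 : sum / dataset.size();
            return new CenteringModel(mean);
        }
    }

    public static void main(String[] args) {
        CenteringModel model = new CenteringEstimator().fit(List.of(1.0, 2.0, 3.0));
        System.out.println("learned mean = " + model.mean);
        System.out.println("transform(5.0) = " + model.transform(5.0));
    }
}
```

The generic bound on the toy estimator ties each estimator to the model type it produces, so callers get a `CenteringModel` back from `fit` without casting.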
See Also:
Serialized Form
Nested Class Summary
* ### Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
  `org.apache.spark.internal.Logging.SparkShellLoggingFilter`
Constructor Summary
Constructors
| Constructor and Description |
| --- |
| `Estimator()` |

Method Summary
| Modifier and Type | Method and Description |
| --- | --- |
| `abstract Estimator<M>` | `copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params. |
| `abstract M` | `fit(Dataset<?> dataset)` Fits a model to the input data. |
| `M` | `fit(Dataset<?> dataset, ParamMap paramMap)` Fits a single model to the input data with provided parameter map. |
| `M` | `fit(Dataset<?> dataset, ParamPair<?> firstParamPair, ParamPair<?>... otherParamPairs)` Fits a single model to the input data with optional parameters. |
| `M` | `fit(Dataset<?> dataset, ParamPair<?> firstParamPair, scala.collection.Seq<ParamPair<?>> otherParamPairs)` Fits a single model to the input data with optional parameters. |
| `scala.collection.Seq<M>` | `fit(Dataset<?> dataset, scala.collection.Seq<ParamMap> paramMaps)` Fits multiple models to the input data with multiple sets of parameters. |

* ### Methods inherited from class org.apache.spark.ml.[PipelineStage](../../../../org/apache/spark/ml/PipelineStage.html "class in org.apache.spark.ml")
  [params](../../../../org/apache/spark/ml/PipelineStage.html#params--), [transformSchema](../../../../org/apache/spark/ml/PipelineStage.html#transformSchema-org.apache.spark.sql.types.StructType-)
* ### Methods inherited from class Object
  `equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
* ### Methods inherited from interface org.apache.spark.ml.param.[Params](../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param")
  [clear](../../../../org/apache/spark/ml/param/Params.html#clear-org.apache.spark.ml.param.Param-), [copyValues](../../../../org/apache/spark/ml/param/Params.html#copyValues-T-org.apache.spark.ml.param.ParamMap-), [defaultCopy](../../../../org/apache/spark/ml/param/Params.html#defaultCopy-org.apache.spark.ml.param.ParamMap-), [explainParam](../../../../org/apache/spark/ml/param/Params.html#explainParam-org.apache.spark.ml.param.Param-), [explainParams](../../../../org/apache/spark/ml/param/Params.html#explainParams--), [extractParamMap](../../../../org/apache/spark/ml/param/Params.html#extractParamMap--), [extractParamMap](../../../../org/apache/spark/ml/param/Params.html#extractParamMap-org.apache.spark.ml.param.ParamMap-), [get](../../../../org/apache/spark/ml/param/Params.html#get-org.apache.spark.ml.param.Param-), [getDefault](../../../../org/apache/spark/ml/param/Params.html#getDefault-org.apache.spark.ml.param.Param-), [getOrDefault](../../../../org/apache/spark/ml/param/Params.html#getOrDefault-org.apache.spark.ml.param.Param-), [getParam](../../../../org/apache/spark/ml/param/Params.html#getParam-java.lang.String-), [hasDefault](../../../../org/apache/spark/ml/param/Params.html#hasDefault-org.apache.spark.ml.param.Param-), [hasParam](../../../../org/apache/spark/ml/param/Params.html#hasParam-java.lang.String-), [isDefined](../../../../org/apache/spark/ml/param/Params.html#isDefined-org.apache.spark.ml.param.Param-), [isSet](../../../../org/apache/spark/ml/param/Params.html#isSet-org.apache.spark.ml.param.Param-), [onParamChange](../../../../org/apache/spark/ml/param/Params.html#onParamChange-org.apache.spark.ml.param.Param-), [set](../../../../org/apache/spark/ml/param/Params.html#set-org.apache.spark.ml.param.Param-T-), [set](../../../../org/apache/spark/ml/param/Params.html#set-org.apache.spark.ml.param.ParamPair-), [set](../../../../org/apache/spark/ml/param/Params.html#set-java.lang.String-java.lang.Object-), [setDefault](../../../../org/apache/spark/ml/param/Params.html#setDefault-org.apache.spark.ml.param.Param-T-), [setDefault](../../../../org/apache/spark/ml/param/Params.html#setDefault-scala.collection.Seq-), [shouldOwn](../../../../org/apache/spark/ml/param/Params.html#shouldOwn-org.apache.spark.ml.param.Param-)
* ### Methods inherited from interface org.apache.spark.ml.util.[Identifiable](../../../../org/apache/spark/ml/util/Identifiable.html "interface in org.apache.spark.ml.util")
  [toString](../../../../org/apache/spark/ml/util/Identifiable.html#toString--), [uid](../../../../org/apache/spark/ml/util/Identifiable.html#uid--)
* ### Methods inherited from interface org.apache.spark.internal.Logging
  `$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize`
Constructor Detail
* #### Estimator
  `public Estimator()`
Method Detail
* #### copy
  public abstract [Estimator](../../../../org/apache/spark/ml/Estimator.html "class in org.apache.spark.ml")<[M](../../../../org/apache/spark/ml/Estimator.html "type parameter in Estimator")> copy([ParamMap](../../../../org/apache/spark/ml/param/ParamMap.html "class in org.apache.spark.ml.param") extra)

  Description copied from interface: [Params](../../../../org/apache/spark/ml/param/Params.html#copy-org.apache.spark.ml.param.ParamMap-)

  Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See `defaultCopy()`.

  Specified by: [copy](../../../../org/apache/spark/ml/param/Params.html#copy-org.apache.spark.ml.param.ParamMap-) in interface [Params](../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param")

  Specified by: [copy](../../../../org/apache/spark/ml/PipelineStage.html#copy-org.apache.spark.ml.param.ParamMap-) in class [PipelineStage](../../../../org/apache/spark/ml/PipelineStage.html "class in org.apache.spark.ml")

  Parameters:
  - `extra` - (undocumented)

  Returns: (undocumented)

* #### fit
  public [M](../../../../org/apache/spark/ml/Estimator.html "type parameter in Estimator") fit([Dataset](../../../../org/apache/spark/sql/Dataset.html "class in org.apache.spark.sql")<?> dataset, [ParamPair](../../../../org/apache/spark/ml/param/ParamPair.html "class in org.apache.spark.ml.param")<?> firstParamPair, [ParamPair](../../../../org/apache/spark/ml/param/ParamPair.html "class in org.apache.spark.ml.param")<?>... otherParamPairs)

  Fits a single model to the input data with optional parameters.

  Parameters:
  - `dataset` - input dataset
  - `firstParamPair` - the first param pair, overrides embedded params
  - `otherParamPairs` - other param pairs. These values override any specified in this Estimator's embedded ParamMap.

  Returns: fitted model

* #### fit
  public [M](../../../../org/apache/spark/ml/Estimator.html "type parameter in Estimator") fit([Dataset](../../../../org/apache/spark/sql/Dataset.html "class in org.apache.spark.sql")<?> dataset, [ParamPair](../../../../org/apache/spark/ml/param/ParamPair.html "class in org.apache.spark.ml.param")<?> firstParamPair, scala.collection.Seq<[ParamPair](../../../../org/apache/spark/ml/param/ParamPair.html "class in org.apache.spark.ml.param")<?>> otherParamPairs)

  Fits a single model to the input data with optional parameters.

  Parameters:
  - `dataset` - input dataset
  - `firstParamPair` - the first param pair, overrides embedded params
  - `otherParamPairs` - other param pairs. These values override any specified in this Estimator's embedded ParamMap.

  Returns: fitted model

* #### fit
  public [M](../../../../org/apache/spark/ml/Estimator.html "type parameter in Estimator") fit([Dataset](../../../../org/apache/spark/sql/Dataset.html "class in org.apache.spark.sql")<?> dataset, [ParamMap](../../../../org/apache/spark/ml/param/ParamMap.html "class in org.apache.spark.ml.param") paramMap)

  Fits a single model to the input data with provided parameter map.

  Parameters:
  - `dataset` - input dataset
  - `paramMap` - Parameter map. These values override any specified in this Estimator's embedded ParamMap.

  Returns: fitted model

* #### fit
  public abstract [M](../../../../org/apache/spark/ml/Estimator.html "type parameter in Estimator") fit([Dataset](../../../../org/apache/spark/sql/Dataset.html "class in org.apache.spark.sql")<?> dataset)

  Fits a model to the input data.

  Parameters:
  - `dataset` - (undocumented)

  Returns: (undocumented)

* #### fit
  public scala.collection.Seq<[M](../../../../org/apache/spark/ml/Estimator.html "type parameter in Estimator")> fit([Dataset](../../../../org/apache/spark/sql/Dataset.html "class in org.apache.spark.sql")<?> dataset, scala.collection.Seq<[ParamMap](../../../../org/apache/spark/ml/param/ParamMap.html "class in org.apache.spark.ml.param")> paramMaps)

  Fits multiple models to the input data with multiple sets of parameters. The default implementation uses a for loop on each parameter map. Subclasses could override this to optimize multi-model training.

  Parameters:
  - `dataset` - input dataset
  - `paramMaps` - An array of parameter maps. These values override any specified in this Estimator's embedded ParamMap.

  Returns: fitted models, matching the input parameter maps
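The override and multi-model behavior described for the `fit` overloads can be sketched in plain Java. This is an illustrative model only: `ToyEstimator`, `ToyModel`, and the `String`-keyed maps are hypothetical stand-ins (not Spark's `Estimator`, `Model`, or `ParamMap`), and routing the single-map overload through `copy` is an assumption made for this sketch. It shows supplied values taking precedence over embedded ones, the copy keeping the same UID, and the multi-map overload looping over the maps in order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT the Spark classes.
class FitOverloadSketch {

    /** Hypothetical fitted model: records the params that were in effect during fit. */
    static final class ToyModel {
        final String parentUid;
        final Map<String, Object> params;

        ToyModel(String parentUid, Map<String, Object> params) {
            this.parentUid = parentUid;
            this.params = params;
        }
    }

    /** Hypothetical estimator with an embedded param map. */
    static final class ToyEstimator {
        final String uid;
        private final Map<String, Object> embedded = new HashMap<>();

        ToyEstimator(String uid) {
            this.uid = uid;
        }

        ToyEstimator set(String name, Object value) {
            embedded.put(name, value);
            return this;
        }

        // Analogue of copy(extra): same UID, embedded params plus the extras.
        ToyEstimator copy(Map<String, Object> extra) {
            ToyEstimator copied = new ToyEstimator(uid);
            copied.embedded.putAll(embedded);
            copied.embedded.putAll(extra); // extras win over embedded values
            return copied;
        }

        // Analogue of fit(dataset): uses the embedded params only.
        ToyModel fit(List<Double> dataset) {
            return new ToyModel(uid, new HashMap<>(embedded));
        }

        // Analogue of fit(dataset, paramMap): fit a copy so this instance is left unchanged.
        ToyModel fit(List<Double> dataset, Map<String, Object> paramMap) {
            return copy(paramMap).fit(dataset);
        }

        // Analogue of fit(dataset, paramMaps): a plain loop, one model per map, in input order.
        List<ToyModel> fit(List<Double> dataset, List<Map<String, Object>> paramMaps) {
            List<ToyModel> models = new ArrayList<>();
            for (Map<String, Object> paramMap : paramMaps) {
                models.add(fit(dataset, paramMap));
            }
            return models;
        }
    }

    public static void main(String[] args) {
        ToyEstimator estimator = new ToyEstimator("toy_0001").set("maxIter", 10).set("regParam", 0.1);
        List<Double> data = List.of(1.0, 2.0, 3.0);

        Map<String, Object> small = Map.of("maxIter", 5);
        Map<String, Object> large = Map.of("maxIter", 20);
        List<Map<String, Object>> grid = List.of(small, large);

        for (ToyModel model : estimator.fit(data, grid)) {
            System.out.println(model.parentUid + " maxIter=" + model.params.get("maxIter")
                    + " regParam=" + model.params.get("regParam"));
        }
    }
}
```

Because each call fits a copy, the estimator's own embedded values are untouched afterwards, which is why the same instance can be reused across a parameter grid in this sketch.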