GaussianMixtureModel (Spark 3.5.5 JavaDoc)
Object
- org.apache.spark.ml.PipelineStage
  - org.apache.spark.ml.Transformer
    - org.apache.spark.ml.Model<GaussianMixtureModel>
      - org.apache.spark.ml.clustering.GaussianMixtureModel
All Implemented Interfaces:
java.io.Serializable, org.apache.spark.internal.Logging, GaussianMixtureParams, Params, HasAggregationDepth, HasFeaturesCol, HasMaxIter, HasPredictionCol, HasProbabilityCol, HasSeed, HasTol, HasWeightCol, HasTrainingSummary<GaussianMixtureSummary>, Identifiable, MLWritable
public class GaussianMixtureModel
extends Model<GaussianMixtureModel>
implements GaussianMixtureParams, MLWritable, HasTrainingSummary<GaussianMixtureSummary>
Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i with probability weights(i).
param: weights Weight for each Gaussian distribution in the mixture. This is a multinomial probability distribution over the k Gaussians, where weights(i) is the weight for Gaussian i, and the weights sum to 1.
param: gaussians Array of MultivariateGaussian, where gaussians(i) represents the multivariate Gaussian (normal) distribution for Gaussian i.
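The description above can be written as a density. This is a sketch in notation introduced here, not taken from the API: $w_i$ stands for `weights(i)`, and $\mu_i$, $\Sigma_i$ stand for the mean and covariance of `gaussians(i)`.

```latex
p(x) = \sum_{i=1}^{k} w_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i),
\qquad w_i \ge 0, \quad \sum_{i=1}^{k} w_i = 1
```

By Bayes' rule, the probability that a point $x$ was drawn from Gaussian $i$ is then:

```latex
P(i \mid x) = \frac{w_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)}{\sum_{j=1}^{k} w_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
```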
See Also:
Serialized Form
Nested Class Summary
* ### Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging `org.apache.spark.internal.Logging.SparkShellLoggingFilter`
Method Summary
| Modifier and Type | Method and Description |
| --- | --- |
| `IntParam` | `aggregationDepth()` Param for suggested depth for treeAggregate (>= 2). |
| `GaussianMixtureModel` | `copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params. |
| `Param<String>` | `featuresCol()` Param for features column name. |
| `MultivariateGaussian[]` | `gaussians()` |
| `Dataset<Row>` | `gaussiansDF()` Retrieve Gaussian distributions as a DataFrame. |
| `IntParam` | `k()` Number of independent Gaussians in the mixture model. |
| `static GaussianMixtureModel` | `load(String path)` |
| `IntParam` | `maxIter()` Param for maximum number of iterations (>= 0). |
| `int` | `numFeatures()` |
| `int` | `predict(Vector features)` |
| `Param<String>` | `predictionCol()` Param for prediction column name. |
| `Vector` | `predictProbability(Vector features)` |
| `Param<String>` | `probabilityCol()` Param for Column name for predicted class conditional probabilities. |
| `static MLReader<GaussianMixtureModel>` | `read()` |
| `LongParam` | `seed()` Param for random seed. |
| `GaussianMixtureModel` | `setFeaturesCol(String value)` |
| `GaussianMixtureModel` | `setPredictionCol(String value)` |
| `GaussianMixtureModel` | `setProbabilityCol(String value)` |
| `GaussianMixtureSummary` | `summary()` Gets summary of model on training set. |
| `DoubleParam` | `tol()` Param for the convergence tolerance for iterative algorithms (>= 0). |
| `String` | `toString()` |
| `Dataset<Row>` | `transform(Dataset<?> dataset)` Transforms the input dataset. |
| `StructType` | `transformSchema(StructType schema)` Check transform validity and derive the output schema from the input schema. |
| `String` | `uid()` An immutable unique ID for the object and its derivatives. |
| `Param<String>` | `weightCol()` Param for weight column name. |
| `double[]` | `weights()` |
| `MLWriter` | `write()` Returns a MLWriter instance for this ML instance. |
* ### Methods inherited from class org.apache.spark.ml.Model `hasParent, parent, setParent`
* ### Methods inherited from class org.apache.spark.ml.Transformer `transform, transform, transform`
* ### Methods inherited from class org.apache.spark.ml.PipelineStage `params`
* ### Methods inherited from class Object `equals, getClass, hashCode, notify, notifyAll, wait, wait, wait`
* ### Methods inherited from interface org.apache.spark.ml.clustering.GaussianMixtureParams `getK, validateAndTransformSchema`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter `getMaxIter`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol `getFeaturesCol`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.HasSeed `getSeed`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol `getPredictionCol`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol `getWeightCol`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.HasProbabilityCol `getProbabilityCol`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.HasTol `getTol`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.HasAggregationDepth `getAggregationDepth`
* ### Methods inherited from interface org.apache.spark.ml.param.Params `clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn`
* ### Methods inherited from interface org.apache.spark.ml.util.MLWritable `save`
* ### Methods inherited from interface org.apache.spark.ml.util.HasTrainingSummary `hasSummary, setSummary`
* ### Methods inherited from interface org.apache.spark.internal.Logging `$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize`
Method Detail
* #### read
  `public static MLReader<GaussianMixtureModel> read()`
* #### load
  `public static GaussianMixtureModel load(String path)`
* #### k
  `public final IntParam k()`
  Number of independent Gaussians in the mixture model. Must be greater than 1. Default: 2.
  Specified by: `k` in interface `GaussianMixtureParams`
* #### aggregationDepth
  `public final IntParam aggregationDepth()`
  Param for suggested depth for treeAggregate (>= 2).
  Specified by: `aggregationDepth` in interface `HasAggregationDepth`
* #### tol
  `public final DoubleParam tol()`
  Description copied from interface: `HasTol`
  Param for the convergence tolerance for iterative algorithms (>= 0).
  Specified by: `tol` in interface `HasTol`
* #### probabilityCol
  `public final Param<String> probabilityCol()`
  Param for Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.
  Specified by: `probabilityCol` in interface `HasProbabilityCol`
* #### weightCol
  `public final Param<String> weightCol()`
  Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
  Specified by: `weightCol` in interface `HasWeightCol`
* #### predictionCol
  `public final Param<String> predictionCol()`
  Param for prediction column name.
  Specified by: `predictionCol` in interface `HasPredictionCol`
* #### seed
  `public final LongParam seed()`
  Description copied from interface: `HasSeed`
  Param for random seed.
  Specified by: `seed` in interface `HasSeed`
* #### featuresCol
  `public final Param<String> featuresCol()`
  Param for features column name.
  Specified by: `featuresCol` in interface `HasFeaturesCol`
* #### maxIter
  `public final IntParam maxIter()`
  Description copied from interface: `HasMaxIter`
  Param for maximum number of iterations (>= 0).
  Specified by: `maxIter` in interface `HasMaxIter`
* #### uid
  `public String uid()`
  An immutable unique ID for the object and its derivatives.
  Specified by: `uid` in interface `Identifiable`
* #### weights
  `public double[] weights()`
* #### gaussians
  `public MultivariateGaussian[] gaussians()`
* #### numFeatures
  `public int numFeatures()`
* #### setFeaturesCol
  `public GaussianMixtureModel setFeaturesCol(String value)`
* #### setPredictionCol
  `public GaussianMixtureModel setPredictionCol(String value)`
* #### setProbabilityCol
  `public GaussianMixtureModel setProbabilityCol(String value)`
* #### copy
  `public GaussianMixtureModel copy(ParamMap extra)`
  Description copied from interface: `Params`
  Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See `defaultCopy()`.
  Specified by: `copy` in interface `Params`
  Specified by: `copy` in class `Model<GaussianMixtureModel>`
* #### transform
  `public Dataset<Row> transform(Dataset<?> dataset)`
  Transforms the input dataset.
  Specified by: `transform` in class `Transformer`
* #### transformSchema
  `public StructType transformSchema(StructType schema)`
  Check transform validity and derive the output schema from the input schema. We check validity for interactions between parameters during `transformSchema` and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by `Param.validate()`. Typical implementation should first conduct verification on schema change and parameter validity, including complex parameter interaction checks.
  Specified by: `transformSchema` in class `PipelineStage`
* #### predict
  `public int predict(Vector features)`
* #### predictProbability
  `public Vector predictProbability(Vector features)`
* #### gaussiansDF
  `public Dataset<Row> gaussiansDF()`
  Retrieve Gaussian distributions as a DataFrame. Each row represents a Gaussian Distribution. Two columns are defined: mean and cov. Schema:
  ```
  root
   |-- mean: vector (nullable = true)
   |-- cov: matrix (nullable = true)
  ```
* #### write
  `public MLWriter write()`
  Returns a MLWriter instance for this ML instance. For GaussianMixtureModel, this does NOT currently save the training `summary`. An option to save `summary` may be added in the future.
  Specified by: `write` in interface `MLWritable`
* #### toString
  `public String toString()`
  Specified by: `toString` in interface `Identifiable`
  Overrides: `toString` in class `Object`
* #### summary
  `public GaussianMixtureSummary summary()`
  Gets summary of model on training set. An exception is thrown if `hasSummary` is false.
  Specified by: `summary` in interface `HasTrainingSummary<GaussianMixtureSummary>`
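
The `predict` and `predictProbability` entries above carry no description. As a rough illustration of how a soft assignment and a hard assignment can relate in a Gaussian mixture, here is a minimal, Spark-free sketch for a one-dimensional mixture. The class `MixtureSketch` and its methods `density`, `posteriors`, and `argmax` are hypothetical names introduced for this example and are not part of the Spark API. The sketch assumes the probability vector is the normalized product of each weight and its component density, and that the predicted cluster is the index of the largest entry; this page does not confirm either assumption.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of org.apache.spark.ml.clustering.
final class MixtureSketch {

    // Density of a univariate normal distribution N(mean, variance) at x.
    static double density(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-d * d / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    // Posterior probability of each component given x:
    // weights[i] * density_i(x), normalized so the entries sum to 1.
    // Assumption: no guard against underflow when x is far from every component.
    static double[] posteriors(double x, double[] weights, double[] means, double[] variances) {
        double[] p = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < p.length; i++) {
            p[i] = weights[i] * density(x, means[i], variances[i]);
            total += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= total;
        }
        return p;
    }

    // Index of the largest entry (first one wins on ties).
    static int argmax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] weights = {0.3, 0.7};   // mixing weights, sum to 1
        double[] means = {0.0, 5.0};
        double[] variances = {1.0, 1.0};
        double[] p = posteriors(0.5, weights, means, variances);
        System.out.println(Arrays.toString(p) + " -> component " + argmax(p));
    }
}
```

In this sketch a point near 0 is assigned to the first component even though the second has the larger weight, because the density term dominates. A multivariate version would replace `density` with a multivariate normal density built from a mean vector and covariance matrix, matching the `mean` and `cov` columns listed for `gaussiansDF`.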