Multivariate Gaussian Mixture Model (GMM) — spark.gaussianMixture

Fits a multivariate Gaussian mixture model against a SparkDataFrame, similarly to R's mvnormalmixEM(). Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.

Usage

spark.gaussianMixture(data, formula, ...)

# S4 method for class 'SparkDataFrame,formula'
spark.gaussianMixture(data, formula, k = 2, maxIter = 100, tol = 0.01)

# S4 method for class 'GaussianMixtureModel'
summary(object)

# S4 method for class 'GaussianMixtureModel'
predict(object, newData)

# S4 method for class 'GaussianMixtureModel,character'
write.ml(object, path, overwrite = FALSE)

Arguments

data

a SparkDataFrame for training.

formula

a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the response variable of formula is empty in spark.gaussianMixture.

...

additional arguments passed to the method.

k

number of independent Gaussians in the mixture model.

maxIter

maximum number of iterations.

tol

the convergence tolerance.

object

a fitted gaussian mixture model.

newData

a SparkDataFrame for testing.

path

the directory where the model is saved.

overwrite

whether to overwrite the output path if it already exists. Default is FALSE, which means an exception is thrown if the output path exists.

Value

spark.gaussianMixture returns a fitted multivariate gaussian mixture model.

summary returns a summary of the fitted model, which is a list. The list includes the model's lambda (lambda), mu (mu), sigma (sigma), loglik (loglik), and posterior (posterior).

predict returns a SparkDataFrame containing predicted labels in a column named "prediction".

Note

spark.gaussianMixture since 2.1.0

summary(GaussianMixtureModel) since 2.1.0

predict(GaussianMixtureModel) since 2.1.0

write.ml(GaussianMixtureModel, character) since 2.1.0

Examples
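
The following is a minimal sketch of the workflow described above, not a definitive recipe. It assumes the SparkR package and a local Spark installation are available. The data is synthetic (two clusters drawn with base R's rnorm), and the column names x1 and x2 and the temporary model path are illustrative choices, not part of the API.

```r
library(SparkR)
sparkR.session()

# Synthetic two-cluster data; x1 and x2 are illustrative column names.
set.seed(100)
local_df <- data.frame(
  x1 = c(rnorm(20, mean = 0), rnorm(20, mean = 5)),
  x2 = c(rnorm(20, mean = 0), rnorm(20, mean = 5))
)
df <- createDataFrame(local_df)

# The formula has an empty response: only the right-hand side is given.
model <- spark.gaussianMixture(df, ~ x1 + x2, k = 2, maxIter = 100, tol = 0.01)

# summary() returns a list that includes lambda, mu, sigma, loglik, and posterior.
s <- summary(model)
s$lambda
s$mu

# predict() returns a SparkDataFrame with a "prediction" column.
fitted <- predict(model, df)
head(select(fitted, "x1", "x2", "prediction"))

# Save the model to a temporary directory (illustrative path) and load it back.
path <- tempfile(pattern = "gmm-model")
write.ml(model, path)
saved_model <- read.ml(path)
summary(saved_model)$lambda
```

Because write.ml defaults to overwrite = FALSE, calling it a second time with the same path is expected to throw an exception unless overwrite = TRUE is passed.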