Multivariate Gaussian Mixture Model (GMM) - spark.gaussianMixture
Fits a multivariate Gaussian mixture model against a SparkDataFrame, similarly to R's mvnormalmixEM(). Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.
Usage
spark.gaussianMixture(data, formula, ...)
# S4 method for class 'SparkDataFrame,formula'
spark.gaussianMixture(data, formula, k = 2, maxIter = 100, tol = 0.01)
# S4 method for class 'GaussianMixtureModel'
summary(object)
# S4 method for class 'GaussianMixtureModel'
predict(object, newData)
# S4 method for class 'GaussianMixtureModel,character'
write.ml(object, path, overwrite = FALSE)
Arguments
data
a SparkDataFrame for training.
formula
a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the response variable of formula is empty in spark.gaussianMixture.
...
additional arguments passed to the method.
k
number of independent Gaussians in the mixture model.
maxIter
maximum number of iterations.
tol
the convergence tolerance.
object
a fitted Gaussian mixture model.
newData
a SparkDataFrame for testing.
path
the directory where the model is saved.
overwrite
whether to overwrite the output path if it already exists. Default is FALSE, which means an exception is thrown if the output path exists.
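To make the roles of k, maxIter and tol concrete, the following is a minimal base R sketch of an EM loop for a one-dimensional mixture. It is not Spark's implementation: em_gmm_1d is a hypothetical helper, the data are simulated, and the stopping rule (stop when the log-likelihood changes by less than tol, or after maxIter iterations) is an assumption about how such arguments are typically used.

```r
# Simplified one-dimensional EM loop; NOT Spark's implementation.
# em_gmm_1d is a hypothetical helper used only for illustration.
em_gmm_1d <- function(x, k = 2, maxIter = 100, tol = 0.01) {
  n <- length(x)
  # Deterministic start: spread the initial means over the data quantiles.
  mu <- as.numeric(quantile(x, probs = (seq_len(k) - 0.5) / k))
  sigma <- rep(sd(x), k)
  lambda <- rep(1 / k, k)
  loglik <- -Inf
  iterations <- 0
  for (iter in seq_len(maxIter)) {
    # E-step: weighted component densities, one column per component.
    dens <- matrix(
      sapply(seq_len(k), function(j) lambda[j] * dnorm(x, mu[j], sigma[j])),
      nrow = n
    )
    total <- rowSums(dens)
    posterior <- dens / total
    newLoglik <- sum(log(total))
    iterations <- iter
    # Assumed criterion: stop once the log-likelihood changes by less than tol.
    converged <- abs(newLoglik - loglik) < tol
    loglik <- newLoglik
    if (converged) break
    # M-step: re-estimate weights, means and standard deviations.
    nk <- colSums(posterior)
    lambda <- nk / n
    mu <- colSums(posterior * x) / nk
    sigma <- sqrt(sapply(seq_len(k), function(j) {
      sum(posterior[, j] * (x - mu[j])^2) / nk[j]
    }))
  }
  list(lambda = lambda, mu = mu, sigma = sigma, loglik = loglik,
       posterior = posterior, iterations = iterations)
}

# Simulated data: two well-separated groups.
set.seed(100)
x <- c(rnorm(40, mean = 0), rnorm(60, mean = 5))

loose  <- em_gmm_1d(x, k = 2, maxIter = 100, tol = 0.01)  # default-like tolerance
tight  <- em_gmm_1d(x, k = 2, maxIter = 100, tol = 1e-8)  # stricter tolerance
capped <- em_gmm_1d(x, k = 2, maxIter = 3,   tol = 1e-8)  # iteration cap applies first

c(loose = loose$iterations, tight = tight$iterations, capped = capped$iterations)
```

In this sketch a smaller tol can only keep or increase the number of iterations, while maxIter bounds it regardless of tol.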
Value
spark.gaussianMixture
returns a fitted multivariate gaussian mixture model.
summary
returns a summary of the fitted model, which is a list. The list includes the model's lambda (lambda), mu (mu), sigma (sigma), loglik (loglik), and posterior (posterior).
predict
returns a SparkDataFrame containing predicted labels in a column named "prediction".
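As a rough illustration of how the summary components relate to the predicted labels, the base R sketch below computes posterior probabilities from mixing weights, means and covariances, then picks the most probable component. The parameter values, the helper dmvn, and the zero-based labels are assumptions made for illustration; they are not taken from a fitted Spark model.

```r
# Multivariate normal density for each row of x (base R only).
# dmvn is a hypothetical helper, not part of SparkR.
dmvn <- function(x, mu, sigma) {
  d <- length(mu)
  centered <- sweep(x, 2, mu)
  # Squared Mahalanobis distance of every row from mu.
  maha <- rowSums((centered %*% solve(sigma)) * centered)
  exp(-0.5 * maha) / sqrt((2 * pi)^d * det(sigma))
}

# Hypothetical parameters, shaped loosely like lambda, mu and sigma in summary().
lambda <- c(0.4, 0.6)
mu <- list(c(0, 0), c(3, 4))
sigma <- list(diag(2), diag(2))

# Three points to score; the third is equidistant from both means.
x <- rbind(c(0.1, -0.2), c(2.9, 4.3), c(1.5, 2.0))

# Weighted densities, one column per component, normalised row-wise.
weighted <- sapply(seq_along(lambda), function(j) {
  lambda[j] * dmvn(x, mu[[j]], sigma[[j]])
})
posterior <- weighted / rowSums(weighted)

# Most probable component per row; zero-based labels are an assumption here.
prediction <- max.col(posterior, ties.method = "first") - 1
```

Because the third point is equally far from both means and the covariances are identical, its posterior equals the mixing weights, so the heavier component wins.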
Note
spark.gaussianMixture since 2.1.0
summary(GaussianMixtureModel) since 2.1.0
predict(GaussianMixtureModel) since 2.1.0
write.ml(GaussianMixtureModel, character) since 2.1.0
Examples
if (FALSE) { # \dontrun{
sparkR.session()
library(mvtnorm)
set.seed(100)
a <- rmvnorm(4, c(0, 0))
b <- rmvnorm(6, c(3, 4))
data <- rbind(a, b)
df <- createDataFrame(as.data.frame(data))
model <- spark.gaussianMixture(df, ~ V1 + V2, k = 2)
summary(model)
# fitted values on training data
fitted <- predict(model, df)
head(select(fitted, "V1", "prediction"))
# save the fitted model to the given path
path <- "path/to/model"
write.ml(model, path)
# read the saved model back and print its summary
savedModel <- read.ml(path)
summary(savedModel)
} # }