DistributedLDAModel (Spark 3.5.5 JavaDoc)
Object
- org.apache.spark.ml.PipelineStage
  - org.apache.spark.ml.Transformer
    - org.apache.spark.ml.Model<LDAModel>
      - org.apache.spark.ml.clustering.LDAModel
        - org.apache.spark.ml.clustering.DistributedLDAModel
All Implemented Interfaces:
java.io.Serializable, org.apache.spark.internal.Logging, LDAParams, Params, HasCheckpointInterval, HasFeaturesCol, HasMaxIter, HasSeed, Identifiable, MLWritable
public class DistributedLDAModel
extends LDAModel
Distributed model fitted by LDA. This type of model is currently only produced by Expectation-Maximization (EM).
This model stores the inferred topics, the full training dataset, and the topic distribution for each training document.
param: oldLocalModelOption Used to implement `oldLocalModel` as a lazy val while keeping `copy()` cheap.
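The description above can be illustrated with a minimal sketch of fitting LDA with the EM optimizer and then narrowing the result to `DistributedLDAModel`. The input path, the class name `DistributedLdaExample`, and the values chosen for `k` and `maxIter` are assumptions made for illustration; the input is assumed to be a libsvm file of term-count vectors that loads into a `features` column.

```java
import org.apache.spark.ml.clustering.DistributedLDAModel;
import org.apache.spark.ml.clustering.LDA;
import org.apache.spark.ml.clustering.LDAModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DistributedLdaExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("DistributedLdaExample")
        .master("local[*]")
        .getOrCreate();

    // Hypothetical input path; replace with a real dataset of term-count vectors.
    Dataset<Row> corpus = spark.read()
        .format("libsvm")
        .load("data/mllib/sample_lda_libsvm_data.txt");

    // Illustrative settings; the "em" optimizer is the one that yields a distributed model.
    LDA lda = new LDA()
        .setK(10)
        .setMaxIter(20)
        .setOptimizer("em");

    LDAModel model = lda.fit(corpus);

    // Check before casting: other optimizers produce a local model instead.
    if (model.isDistributed()) {
      DistributedLDAModel distributed = (DistributedLDAModel) model;
      System.out.println("Training log likelihood: " + distributed.trainingLogLikelihood());
      System.out.println("Log prior: " + distributed.logPrior());
    }

    spark.stop();
  }
}
```

Guarding the cast with `isDistributed()` keeps the code safe if the optimizer setting is later changed.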
See Also:
Serialized Form
Nested Class Summary
* ### Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging `org.apache.spark.internal.Logging.SparkShellLoggingFilter`
Method Summary
| Modifier and Type | Method and Description |
| --- | --- |
| `DistributedLDAModel` | `copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params. |
| `void` | `deleteCheckpointFiles()` Remove any remaining checkpoint files from training. |
| `String[]` | `getCheckpointFiles()` If using checkpointing and `LDA.keepLastCheckpoint` is set to true, then there may be saved checkpoint files. |
| `boolean` | `isDistributed()` Indicates whether this instance is of type `DistributedLDAModel`. |
| `static DistributedLDAModel` | `load(String path)` |
| `double` | `logPrior()` |
| `static MLReader<DistributedLDAModel>` | `read()` |
| `LocalLDAModel` | `toLocal()` Convert this distributed model to a local representation. |
| `String` | `toString()` |
| `double` | `trainingLogLikelihood()` |
| `MLWriter` | `write()` Returns an `MLWriter` instance for this ML instance. |

* ### Methods inherited from class org.apache.spark.ml.clustering.[LDAModel](../../../../../org/apache/spark/ml/clustering/LDAModel.html "class in org.apache.spark.ml.clustering") `[checkpointInterval](../../../../../org/apache/spark/ml/clustering/LDAModel.html#checkpointInterval--), [describeTopics](../../../../../org/apache/spark/ml/clustering/LDAModel.html#describeTopics--), [describeTopics](../../../../../org/apache/spark/ml/clustering/LDAModel.html#describeTopics-int-), [docConcentration](../../../../../org/apache/spark/ml/clustering/LDAModel.html#docConcentration--), [estimatedDocConcentration](../../../../../org/apache/spark/ml/clustering/LDAModel.html#estimatedDocConcentration--), [featuresCol](../../../../../org/apache/spark/ml/clustering/LDAModel.html#featuresCol--), [k](../../../../../org/apache/spark/ml/clustering/LDAModel.html#k--), [keepLastCheckpoint](../../../../../org/apache/spark/ml/clustering/LDAModel.html#keepLastCheckpoint--), [learningDecay](../../../../../org/apache/spark/ml/clustering/LDAModel.html#learningDecay--), [learningOffset](../../../../../org/apache/spark/ml/clustering/LDAModel.html#learningOffset--), 
[logLikelihood](../../../../../org/apache/spark/ml/clustering/LDAModel.html#logLikelihood-org.apache.spark.sql.Dataset-), [logPerplexity](../../../../../org/apache/spark/ml/clustering/LDAModel.html#logPerplexity-org.apache.spark.sql.Dataset-), [maxIter](../../../../../org/apache/spark/ml/clustering/LDAModel.html#maxIter--), [optimizeDocConcentration](../../../../../org/apache/spark/ml/clustering/LDAModel.html#optimizeDocConcentration--), [optimizer](../../../../../org/apache/spark/ml/clustering/LDAModel.html#optimizer--), [seed](../../../../../org/apache/spark/ml/clustering/LDAModel.html#seed--), [setFeaturesCol](../../../../../org/apache/spark/ml/clustering/LDAModel.html#setFeaturesCol-java.lang.String-), [setSeed](../../../../../org/apache/spark/ml/clustering/LDAModel.html#setSeed-long-), [setTopicDistributionCol](../../../../../org/apache/spark/ml/clustering/LDAModel.html#setTopicDistributionCol-java.lang.String-), [subsamplingRate](../../../../../org/apache/spark/ml/clustering/LDAModel.html#subsamplingRate--), [supportedOptimizers](../../../../../org/apache/spark/ml/clustering/LDAModel.html#supportedOptimizers--), [topicConcentration](../../../../../org/apache/spark/ml/clustering/LDAModel.html#topicConcentration--), [topicDistributionCol](../../../../../org/apache/spark/ml/clustering/LDAModel.html#topicDistributionCol--), [topicsMatrix](../../../../../org/apache/spark/ml/clustering/LDAModel.html#topicsMatrix--), [transform](../../../../../org/apache/spark/ml/clustering/LDAModel.html#transform-org.apache.spark.sql.Dataset-), [transformSchema](../../../../../org/apache/spark/ml/clustering/LDAModel.html#transformSchema-org.apache.spark.sql.types.StructType-), [uid](../../../../../org/apache/spark/ml/clustering/LDAModel.html#uid--), [vocabSize](../../../../../org/apache/spark/ml/clustering/LDAModel.html#vocabSize--)` * ### Methods inherited from class org.apache.spark.ml.[Model](../../../../../org/apache/spark/ml/Model.html "class in org.apache.spark.ml") 
`[hasParent](../../../../../org/apache/spark/ml/Model.html#hasParent--), [parent](../../../../../org/apache/spark/ml/Model.html#parent--), [setParent](../../../../../org/apache/spark/ml/Model.html#setParent-org.apache.spark.ml.Estimator-)` * ### Methods inherited from class org.apache.spark.ml.[Transformer](../../../../../org/apache/spark/ml/Transformer.html "class in org.apache.spark.ml") `[transform](../../../../../org/apache/spark/ml/Transformer.html#transform-org.apache.spark.sql.Dataset-org.apache.spark.ml.param.ParamMap-), [transform](../../../../../org/apache/spark/ml/Transformer.html#transform-org.apache.spark.sql.Dataset-org.apache.spark.ml.param.ParamPair-org.apache.spark.ml.param.ParamPair...-), [transform](../../../../../org/apache/spark/ml/Transformer.html#transform-org.apache.spark.sql.Dataset-org.apache.spark.ml.param.ParamPair-scala.collection.Seq-)` * ### Methods inherited from class org.apache.spark.ml.[PipelineStage](../../../../../org/apache/spark/ml/PipelineStage.html "class in org.apache.spark.ml") `[params](../../../../../org/apache/spark/ml/PipelineStage.html#params--)` * ### Methods inherited from class Object `equals, getClass, hashCode, notify, notifyAll, wait, wait, wait` * ### Methods inherited from interface org.apache.spark.ml.clustering.[LDAParams](../../../../../org/apache/spark/ml/clustering/LDAParams.html "interface in org.apache.spark.ml.clustering") `[getDocConcentration](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getDocConcentration--), [getK](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getK--), [getKeepLastCheckpoint](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getKeepLastCheckpoint--), [getLearningDecay](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getLearningDecay--), [getLearningOffset](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getLearningOffset--), 
[getOldDocConcentration](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getOldDocConcentration--), [getOldOptimizer](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getOldOptimizer--), [getOldTopicConcentration](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getOldTopicConcentration--), [getOptimizeDocConcentration](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getOptimizeDocConcentration--), [getOptimizer](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getOptimizer--), [getSubsamplingRate](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getSubsamplingRate--), [getTopicConcentration](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getTopicConcentration--), [getTopicDistributionCol](../../../../../org/apache/spark/ml/clustering/LDAParams.html#getTopicDistributionCol--), [validateAndTransformSchema](../../../../../org/apache/spark/ml/clustering/LDAParams.html#validateAndTransformSchema-org.apache.spark.sql.types.StructType-)` * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasFeaturesCol](../../../../../org/apache/spark/ml/param/shared/HasFeaturesCol.html "interface in org.apache.spark.ml.param.shared") `[getFeaturesCol](../../../../../org/apache/spark/ml/param/shared/HasFeaturesCol.html#getFeaturesCol--)` * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasMaxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html "interface in org.apache.spark.ml.param.shared") `[getMaxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html#getMaxIter--)` * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasSeed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html "interface in org.apache.spark.ml.param.shared") `[getSeed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html#getSeed--)` * ### Methods inherited from interface 
org.apache.spark.ml.param.shared.[HasCheckpointInterval](../../../../../org/apache/spark/ml/param/shared/HasCheckpointInterval.html "interface in org.apache.spark.ml.param.shared") `[getCheckpointInterval](../../../../../org/apache/spark/ml/param/shared/HasCheckpointInterval.html#getCheckpointInterval--)` * ### Methods inherited from interface org.apache.spark.ml.param.[Params](../../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param") `[clear](../../../../../org/apache/spark/ml/param/Params.html#clear-org.apache.spark.ml.param.Param-), [copyValues](../../../../../org/apache/spark/ml/param/Params.html#copyValues-T-org.apache.spark.ml.param.ParamMap-), [defaultCopy](../../../../../org/apache/spark/ml/param/Params.html#defaultCopy-org.apache.spark.ml.param.ParamMap-), [defaultParamMap](../../../../../org/apache/spark/ml/param/Params.html#defaultParamMap--), [explainParam](../../../../../org/apache/spark/ml/param/Params.html#explainParam-org.apache.spark.ml.param.Param-), [explainParams](../../../../../org/apache/spark/ml/param/Params.html#explainParams--), [extractParamMap](../../../../../org/apache/spark/ml/param/Params.html#extractParamMap--), [extractParamMap](../../../../../org/apache/spark/ml/param/Params.html#extractParamMap-org.apache.spark.ml.param.ParamMap-), [get](../../../../../org/apache/spark/ml/param/Params.html#get-org.apache.spark.ml.param.Param-), [getDefault](../../../../../org/apache/spark/ml/param/Params.html#getDefault-org.apache.spark.ml.param.Param-), [getOrDefault](../../../../../org/apache/spark/ml/param/Params.html#getOrDefault-org.apache.spark.ml.param.Param-), [getParam](../../../../../org/apache/spark/ml/param/Params.html#getParam-java.lang.String-), [hasDefault](../../../../../org/apache/spark/ml/param/Params.html#hasDefault-org.apache.spark.ml.param.Param-), [hasParam](../../../../../org/apache/spark/ml/param/Params.html#hasParam-java.lang.String-), 
[isDefined](../../../../../org/apache/spark/ml/param/Params.html#isDefined-org.apache.spark.ml.param.Param-), [isSet](../../../../../org/apache/spark/ml/param/Params.html#isSet-org.apache.spark.ml.param.Param-), [onParamChange](../../../../../org/apache/spark/ml/param/Params.html#onParamChange-org.apache.spark.ml.param.Param-), [paramMap](../../../../../org/apache/spark/ml/param/Params.html#paramMap--), [params](../../../../../org/apache/spark/ml/param/Params.html#params--), [set](../../../../../org/apache/spark/ml/param/Params.html#set-org.apache.spark.ml.param.Param-T-), [set](../../../../../org/apache/spark/ml/param/Params.html#set-org.apache.spark.ml.param.ParamPair-), [set](../../../../../org/apache/spark/ml/param/Params.html#set-java.lang.String-java.lang.Object-), [setDefault](../../../../../org/apache/spark/ml/param/Params.html#setDefault-org.apache.spark.ml.param.Param-T-), [setDefault](../../../../../org/apache/spark/ml/param/Params.html#setDefault-scala.collection.Seq-), [shouldOwn](../../../../../org/apache/spark/ml/param/Params.html#shouldOwn-org.apache.spark.ml.param.Param-)` * ### Methods inherited from interface org.apache.spark.internal.Logging `$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize` * ### Methods inherited from interface org.apache.spark.ml.util.[MLWritable](../../../../../org/apache/spark/ml/util/MLWritable.html "interface in org.apache.spark.ml.util") `[save](../../../../../org/apache/spark/ml/util/MLWritable.html#save-java.lang.String-)`
Method Detail
* #### read
  public static [MLReader](../../../../../org/apache/spark/ml/util/MLReader.html "class in org.apache.spark.ml.util")<[DistributedLDAModel](../../../../../org/apache/spark/ml/clustering/DistributedLDAModel.html "class in org.apache.spark.ml.clustering")> read()
* #### load
  public static [DistributedLDAModel](../../../../../org/apache/spark/ml/clustering/DistributedLDAModel.html "class in org.apache.spark.ml.clustering") load(String path)
* #### toLocal
  public [LocalLDAModel](../../../../../org/apache/spark/ml/clustering/LocalLDAModel.html "class in org.apache.spark.ml.clustering") toLocal()
  Convert this distributed model to a local representation. This discards info about the training dataset. WARNING: This involves collecting a large `topicsMatrix` to the driver.
  Returns: (undocumented)
* #### copy
  public [DistributedLDAModel](../../../../../org/apache/spark/ml/clustering/DistributedLDAModel.html "class in org.apache.spark.ml.clustering") copy([ParamMap](../../../../../org/apache/spark/ml/param/ParamMap.html "class in org.apache.spark.ml.param") extra)
  Description copied from interface: `[Params](../../../../../org/apache/spark/ml/param/Params.html#copy-org.apache.spark.ml.param.ParamMap-)`
  Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See `defaultCopy()`.
  Specified by: `[copy](../../../../../org/apache/spark/ml/param/Params.html#copy-org.apache.spark.ml.param.ParamMap-)` in interface `[Params](../../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param")`
  Specified by: `[copy](../../../../../org/apache/spark/ml/Model.html#copy-org.apache.spark.ml.param.ParamMap-)` in class `[Model](../../../../../org/apache/spark/ml/Model.html "class in org.apache.spark.ml")<[LDAModel](../../../../../org/apache/spark/ml/clustering/LDAModel.html "class in org.apache.spark.ml.clustering")>`
  Parameters: `extra` - (undocumented)
  Returns: (undocumented)
* #### isDistributed
  public boolean isDistributed()
  Description copied from class: `[LDAModel](../../../../../org/apache/spark/ml/clustering/LDAModel.html#isDistributed--)`
  Indicates whether this instance is of type `DistributedLDAModel`.
  Specified by: `[isDistributed](../../../../../org/apache/spark/ml/clustering/LDAModel.html#isDistributed--)` in class `[LDAModel](../../../../../org/apache/spark/ml/clustering/LDAModel.html "class in org.apache.spark.ml.clustering")`
* #### trainingLogLikelihood
  public double trainingLogLikelihood()
* #### logPrior
  public double logPrior()
* #### getCheckpointFiles
  public String[] getCheckpointFiles()
  If using checkpointing and `LDA.keepLastCheckpoint` is set to true, then there may be saved checkpoint files. This method is provided so that users can manage those files.
  Note that removing the checkpoints can cause failures if a partition is lost and is needed by certain [DistributedLDAModel](../../../../../org/apache/spark/ml/clustering/DistributedLDAModel.html "class in org.apache.spark.ml.clustering") methods. Reference counting will clean up the checkpoints when this model and derivative data go out of scope.
  Returns: Checkpoint files from training
* #### deleteCheckpointFiles
  public void deleteCheckpointFiles()
  Remove any remaining checkpoint files from training.
  See Also: `getCheckpointFiles`
* #### write
  public [MLWriter](../../../../../org/apache/spark/ml/util/MLWriter.html "class in org.apache.spark.ml.util") write()
  Description copied from interface: `[MLWritable](../../../../../org/apache/spark/ml/util/MLWritable.html#write--)`
  Returns an `MLWriter` instance for this ML instance.
  Returns: (undocumented)
* #### toString
  public String toString()
  Specified by: `[toString](../../../../../org/apache/spark/ml/util/Identifiable.html#toString--)` in interface `[Identifiable](../../../../../org/apache/spark/ml/util/Identifiable.html "interface in org.apache.spark.ml.util")`
  Overrides: `toString` in class `Object`
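As a rough sketch of how the methods documented above might be combined, the helper below lists the checkpoint files, saves the model, reloads it, converts it to a local model, and only then deletes the checkpoints. The class name `DistributedLdaLifecycle`, the method name `persistAndShrink`, and the `path` argument are hypothetical. Deleting the checkpoints last is a design choice made here, based on the warning that removing them can cause failures if a lost partition is needed later.

```java
import java.io.IOException;

import org.apache.spark.ml.clustering.DistributedLDAModel;
import org.apache.spark.ml.clustering.LocalLDAModel;

public final class DistributedLdaLifecycle {

  private DistributedLdaLifecycle() {}

  // `model` is assumed to have been fitted by LDA with the "em" optimizer;
  // `path` is a hypothetical output directory on a filesystem Spark can reach.
  static LocalLDAModel persistAndShrink(DistributedLDAModel model, String path)
      throws IOException {
    // Inspect any checkpoint files left over from training (may be empty).
    for (String file : model.getCheckpointFiles()) {
      System.out.println("Checkpoint file: " + file);
    }

    // Persist the distributed model, replacing any earlier save at the same path.
    model.write().overwrite().save(path);

    // Reload to confirm the saved model can be read back.
    DistributedLDAModel reloaded = DistributedLDAModel.load(path);
    System.out.println("Reloaded model: " + reloaded);

    // Collects topicsMatrix to the driver and drops training-set information.
    LocalLDAModel local = model.toLocal();

    // Remove checkpoints only after the steps that may still depend on them.
    model.deleteCheckpointFiles();

    return local;
  }
}
```

Because `toLocal()` collects `topicsMatrix` to the driver, this flow is best suited to vocabularies and topic counts that fit comfortably in driver memory.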