KMeans (Spark 3.5.5 JavaDoc)

Object
- org.apache.spark.ml.PipelineStage
  - org.apache.spark.ml.Estimator<KMeansModel>
    - org.apache.spark.ml.clustering.KMeans
All Implemented Interfaces:
java.io.Serializable, org.apache.spark.internal.Logging, KMeansParams, Params, HasDistanceMeasure, HasFeaturesCol, HasMaxBlockSizeInMB, HasMaxIter, HasPredictionCol, HasSeed, HasSolver, HasTol, HasWeightCol, DefaultParamsWritable, Identifiable, MLWritable

public class KMeans
extends Estimator<KMeansModel>
implements KMeansParams, DefaultParamsWritable
K-means clustering with support for k-means|| initialization proposed by Bahmani et al.
See Also:
Bahmani et al., Scalable K-Means++ (VLDB 2012); Serialized Form
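For orientation, the core of k-means clustering is Lloyd's iteration: assign each point to its nearest center, then move each center to the mean of the points assigned to it. The sketch below is plain Java and is not Spark's implementation; the class `LloydSketch` and its methods are hypothetical. It assumes squared Euclidean distance, starts from caller-supplied centers instead of k-means|| initialization, and simply runs `maxIter` rounds without a tolerance check.

```java
/** Hypothetical sketch of Lloyd's k-means iteration (not Spark's implementation). */
class LloydSketch {
    /** Squared Euclidean distance between two vectors of equal length. */
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Index of the center nearest to point p. */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centers.length; j++) {
            double d = sqDist(p, centers[j]);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    /** Runs maxIter assignment/update rounds starting from a copy of initCenters. */
    static double[][] run(double[][] points, double[][] initCenters, int maxIter) {
        int k = initCenters.length;
        int dim = points[0].length;
        double[][] centers = new double[k][];
        for (int j = 0; j < k; j++) centers[j] = initCenters[j].clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: add each point to the running sum of its nearest center.
            for (double[] p : points) {
                int j = nearest(p, centers);
                counts[j]++;
                for (int d = 0; d < dim; d++) sums[j][d] += p[d];
            }
            // Update step: move each non-empty cluster's center to the mean of its points.
            for (int j = 0; j < k; j++) {
                if (counts[j] == 0) continue; // an empty cluster keeps its previous center
                for (int d = 0; d < dim; d++) centers[j][d] = sums[j][d] / counts[j];
            }
        }
        return centers;
    }
}
```

With two well-separated groups such as {0,0},{0,1} and {10,10},{10,11}, the centers settle at {0, 0.5} and {10, 10.5} after the first round.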

Nested Class Summary

 * ### Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging  
 `org.apache.spark.internal.Logging.SparkShellLoggingFilter`

Constructor Summary

Constructors

| Constructor and Description |
| --- |
| `KMeans()` |
| `KMeans(String uid)` |

Method Summary


| Modifier and Type | Method and Description |
| --- | --- |
| `KMeans` | `copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params. |
| `Param<String>` | `distanceMeasure()` Param for the distance measure. |
| `Param<String>` | `featuresCol()` Param for features column name. |
| `KMeansModel` | `fit(Dataset<?> dataset)` Fits a model to the input data. |
| `Param<String>` | `initMode()` Param for the initialization algorithm. |
| `IntParam` | `initSteps()` Param for the number of steps for the k-means\|\| initialization mode. |
| `IntParam` | `k()` The number of clusters to create (k). |
| `static KMeans` | `load(String path)` |
| `DoubleParam` | `maxBlockSizeInMB()` Param for maximum memory in MB for stacking input data into blocks. |
| `IntParam` | `maxIter()` Param for maximum number of iterations (>= 0). |
| `Param<String>` | `predictionCol()` Param for prediction column name. |
| `static MLReader<T>` | `read()` |
| `LongParam` | `seed()` Param for random seed. |
| `KMeans` | `setDistanceMeasure(String value)` |
| `KMeans` | `setFeaturesCol(String value)` |
| `KMeans` | `setInitMode(String value)` |
| `KMeans` | `setInitSteps(int value)` |
| `KMeans` | `setK(int value)` |
| `KMeans` | `setMaxBlockSizeInMB(double value)` Sets the value of param `maxBlockSizeInMB`. |
| `KMeans` | `setMaxIter(int value)` |
| `KMeans` | `setPredictionCol(String value)` |
| `KMeans` | `setSeed(long value)` |
| `KMeans` | `setSolver(String value)` Sets the value of param `solver`. |
| `KMeans` | `setTol(double value)` |
| `KMeans` | `setWeightCol(String value)` Sets the value of param `weightCol`. |
| `Param<String>` | `solver()` Param for the name of the optimization method used in KMeans. |
| `DoubleParam` | `tol()` Param for the convergence tolerance for iterative algorithms (>= 0). |
| `StructType` | `transformSchema(StructType schema)` Check transform validity and derive the output schema from the input schema. |
| `String` | `uid()` An immutable unique ID for the object and its derivatives. |
| `Param<String>` | `weightCol()` Param for weight column name. |

   * ### Methods inherited from class org.apache.spark.ml.[Estimator](../../../../../org/apache/spark/ml/Estimator.html "class in org.apache.spark.ml")  
   `[fit](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-org.apache.spark.ml.param.ParamMap-), [fit](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-org.apache.spark.ml.param.ParamPair-org.apache.spark.ml.param.ParamPair...-), [fit](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-org.apache.spark.ml.param.ParamPair-scala.collection.Seq-), [fit](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-scala.collection.Seq-)`  
   * ### Methods inherited from class org.apache.spark.ml.[PipelineStage](../../../../../org/apache/spark/ml/PipelineStage.html "class in org.apache.spark.ml")  
   `[params](../../../../../org/apache/spark/ml/PipelineStage.html#params--)`  
   * ### Methods inherited from class Object  
   `equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`  
   * ### Methods inherited from interface org.apache.spark.ml.clustering.[KMeansParams](../../../../../org/apache/spark/ml/clustering/KMeansParams.html "interface in org.apache.spark.ml.clustering")  
   `[getInitMode](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#getInitMode--), [getInitSteps](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#getInitSteps--), [getK](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#getK--), [validateAndTransformSchema](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#validateAndTransformSchema-org.apache.spark.sql.types.StructType-)`  
   * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasMaxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html "interface in org.apache.spark.ml.param.shared")  
   `[getMaxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html#getMaxIter--)`  
   * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasFeaturesCol](../../../../../org/apache/spark/ml/param/shared/HasFeaturesCol.html "interface in org.apache.spark.ml.param.shared")  
   `[getFeaturesCol](../../../../../org/apache/spark/ml/param/shared/HasFeaturesCol.html#getFeaturesCol--)`  
   * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasSeed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html "interface in org.apache.spark.ml.param.shared")  
   `[getSeed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html#getSeed--)`  
   * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasPredictionCol](../../../../../org/apache/spark/ml/param/shared/HasPredictionCol.html "interface in org.apache.spark.ml.param.shared")  
   `[getPredictionCol](../../../../../org/apache/spark/ml/param/shared/HasPredictionCol.html#getPredictionCol--)`  
   * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasTol](../../../../../org/apache/spark/ml/param/shared/HasTol.html "interface in org.apache.spark.ml.param.shared")  
   `[getTol](../../../../../org/apache/spark/ml/param/shared/HasTol.html#getTol--)`  
   * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasDistanceMeasure](../../../../../org/apache/spark/ml/param/shared/HasDistanceMeasure.html "interface in org.apache.spark.ml.param.shared")  
   `[getDistanceMeasure](../../../../../org/apache/spark/ml/param/shared/HasDistanceMeasure.html#getDistanceMeasure--)`  
   * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasWeightCol](../../../../../org/apache/spark/ml/param/shared/HasWeightCol.html "interface in org.apache.spark.ml.param.shared")  
   `[getWeightCol](../../../../../org/apache/spark/ml/param/shared/HasWeightCol.html#getWeightCol--)`  
   * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasSolver](../../../../../org/apache/spark/ml/param/shared/HasSolver.html "interface in org.apache.spark.ml.param.shared")  
   `[getSolver](../../../../../org/apache/spark/ml/param/shared/HasSolver.html#getSolver--)`  
   * ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasMaxBlockSizeInMB](../../../../../org/apache/spark/ml/param/shared/HasMaxBlockSizeInMB.html "interface in org.apache.spark.ml.param.shared")  
   `[getMaxBlockSizeInMB](../../../../../org/apache/spark/ml/param/shared/HasMaxBlockSizeInMB.html#getMaxBlockSizeInMB--)`  
   * ### Methods inherited from interface org.apache.spark.ml.param.[Params](../../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param")  
   `[clear](../../../../../org/apache/spark/ml/param/Params.html#clear-org.apache.spark.ml.param.Param-), [copyValues](../../../../../org/apache/spark/ml/param/Params.html#copyValues-T-org.apache.spark.ml.param.ParamMap-), [defaultCopy](../../../../../org/apache/spark/ml/param/Params.html#defaultCopy-org.apache.spark.ml.param.ParamMap-), [defaultParamMap](../../../../../org/apache/spark/ml/param/Params.html#defaultParamMap--), [explainParam](../../../../../org/apache/spark/ml/param/Params.html#explainParam-org.apache.spark.ml.param.Param-), [explainParams](../../../../../org/apache/spark/ml/param/Params.html#explainParams--), [extractParamMap](../../../../../org/apache/spark/ml/param/Params.html#extractParamMap--), [extractParamMap](../../../../../org/apache/spark/ml/param/Params.html#extractParamMap-org.apache.spark.ml.param.ParamMap-), [get](../../../../../org/apache/spark/ml/param/Params.html#get-org.apache.spark.ml.param.Param-), [getDefault](../../../../../org/apache/spark/ml/param/Params.html#getDefault-org.apache.spark.ml.param.Param-), [getOrDefault](../../../../../org/apache/spark/ml/param/Params.html#getOrDefault-org.apache.spark.ml.param.Param-), [getParam](../../../../../org/apache/spark/ml/param/Params.html#getParam-java.lang.String-), [hasDefault](../../../../../org/apache/spark/ml/param/Params.html#hasDefault-org.apache.spark.ml.param.Param-), [hasParam](../../../../../org/apache/spark/ml/param/Params.html#hasParam-java.lang.String-), [isDefined](../../../../../org/apache/spark/ml/param/Params.html#isDefined-org.apache.spark.ml.param.Param-), [isSet](../../../../../org/apache/spark/ml/param/Params.html#isSet-org.apache.spark.ml.param.Param-), [onParamChange](../../../../../org/apache/spark/ml/param/Params.html#onParamChange-org.apache.spark.ml.param.Param-), [paramMap](../../../../../org/apache/spark/ml/param/Params.html#paramMap--), [params](../../../../../org/apache/spark/ml/param/Params.html#params--), 
[set](../../../../../org/apache/spark/ml/param/Params.html#set-org.apache.spark.ml.param.Param-T-), [set](../../../../../org/apache/spark/ml/param/Params.html#set-org.apache.spark.ml.param.ParamPair-), [set](../../../../../org/apache/spark/ml/param/Params.html#set-java.lang.String-java.lang.Object-), [setDefault](../../../../../org/apache/spark/ml/param/Params.html#setDefault-org.apache.spark.ml.param.Param-T-), [setDefault](../../../../../org/apache/spark/ml/param/Params.html#setDefault-scala.collection.Seq-), [shouldOwn](../../../../../org/apache/spark/ml/param/Params.html#shouldOwn-org.apache.spark.ml.param.Param-)`  
   * ### Methods inherited from interface org.apache.spark.ml.util.[Identifiable](../../../../../org/apache/spark/ml/util/Identifiable.html "interface in org.apache.spark.ml.util")  
   `[toString](../../../../../org/apache/spark/ml/util/Identifiable.html#toString--)`  
   * ### Methods inherited from interface org.apache.spark.ml.util.[DefaultParamsWritable](../../../../../org/apache/spark/ml/util/DefaultParamsWritable.html "interface in org.apache.spark.ml.util")  
   `[write](../../../../../org/apache/spark/ml/util/DefaultParamsWritable.html#write--)`  
   * ### Methods inherited from interface org.apache.spark.ml.util.[MLWritable](../../../../../org/apache/spark/ml/util/MLWritable.html "interface in org.apache.spark.ml.util")  
   `[save](../../../../../org/apache/spark/ml/util/MLWritable.html#save-java.lang.String-)`  
   * ### Methods inherited from interface org.apache.spark.internal.Logging  
   `$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize`

Constructor Detail

 * #### KMeans  
 public KMeans(String uid)  
 * #### KMeans  
 public KMeans()

Method Detail

* #### load  
public static [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") load(String path)  
* #### read  
public static [MLReader](../../../../../org/apache/spark/ml/util/MLReader.html "class in org.apache.spark.ml.util")<T> read()  
* #### k  
public final [IntParam](../../../../../org/apache/spark/ml/param/IntParam.html "class in org.apache.spark.ml.param") k()  
The number of clusters to create (k). Must be > 1. Note that it is possible for fewer than k clusters to be returned, for example, if there are fewer than k distinct points to cluster. Default: 2.  
Specified by:  
`[k](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#k--)` in interface `[KMeansParams](../../../../../org/apache/spark/ml/clustering/KMeansParams.html "interface in org.apache.spark.ml.clustering")`  
Returns:  
(undocumented)  
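The note about receiving fewer than k clusters can be illustrated with a deliberately simplified, hypothetical rule: if candidate centers are deduplicated, data with fewer than k distinct points cannot yield k distinct centers. `DistinctCenters.pickDistinct` is only meant to show why the count can drop below k; Spark's actual initialization (random or k-means||) is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration: deduplicated candidates can number fewer than k. */
class DistinctCenters {
    /** Returns up to k distinct points, in order of first appearance, as candidate centers. */
    static List<double[]> pickDistinct(double[][] points, int k) {
        List<double[]> centers = new ArrayList<>();
        for (double[] p : points) {
            if (centers.size() == k) break; // already have k distinct candidates
            boolean seen = false;
            for (double[] c : centers) {
                if (Arrays.equals(c, p)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) centers.add(p.clone());
        }
        return centers;
    }
}
```

For the input {1,1}, {1,1}, {2,2} with k = 3, only two distinct candidates exist, so only two centers come back.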
* #### initMode  
public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> initMode()  
Param for the initialization algorithm. This can be either "random" to choose random points as initial cluster centers, or "k-means||" to use a parallel variant of k-means++ (Bahmani et al., Scalable K-Means++, VLDB 2012). Default: k-means||.  
Specified by:  
`[initMode](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#initMode--)` in interface `[KMeansParams](../../../../../org/apache/spark/ml/clustering/KMeansParams.html "interface in org.apache.spark.ml.clustering")`  
Returns:  
(undocumented)  
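Since k-means|| is a parallel variant of k-means++, it may help to see the sequential k-means++ seeding it builds on (D^2 sampling): the first center is drawn uniformly, and each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. This is a minimal sketch of the sequential algorithm, not Spark's k-means|| code; the class `KMeansPlusPlus` is hypothetical and uses `java.util.Random` for seeding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of sequential k-means++ seeding (D^2 sampling). */
class KMeansPlusPlus {
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Chooses up to k initial centers; returns fewer if every point coincides with a chosen center. */
    static List<double[]> chooseCenters(double[][] points, int k, long seed) {
        Random rnd = new Random(seed);
        List<double[]> centers = new ArrayList<>();
        centers.add(points[rnd.nextInt(points.length)].clone()); // first center: uniform choice
        double[] d2 = new double[points.length];
        while (centers.size() < k) {
            // Squared distance from each point to its nearest chosen center.
            double total = 0.0;
            for (int i = 0; i < points.length; i++) {
                double best = Double.POSITIVE_INFINITY;
                for (double[] c : centers) best = Math.min(best, sqDist(points[i], c));
                d2[i] = best;
                total += best;
            }
            if (total == 0.0) break; // no point lies away from the current centers
            // Sample index i with probability d2[i] / total.
            double r = rnd.nextDouble() * total;
            int chosen = -1;
            double acc = 0.0;
            for (int i = 0; i < points.length; i++) {
                acc += d2[i];
                if (r < acc) {
                    chosen = i;
                    break;
                }
            }
            if (chosen < 0) { // rounding guard: fall back to the last point with positive weight
                for (int i = points.length - 1; i >= 0; i--) {
                    if (d2[i] > 0.0) {
                        chosen = i;
                        break;
                    }
                }
            }
            centers.add(points[chosen].clone());
        }
        return centers;
    }
}
```

Points that already coincide with a chosen center have weight 0 and are never drawn again, so every returned center is distinct.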
* #### initSteps  
public final [IntParam](../../../../../org/apache/spark/ml/param/IntParam.html "class in org.apache.spark.ml.param") initSteps()  
Param for the number of steps for the k-means|| initialization mode. This is an advanced setting; the default of 2 is almost always enough. Must be > 0. Default: 2.  
Specified by:  
`[initSteps](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#initSteps--)` in interface `[KMeansParams](../../../../../org/apache/spark/ml/clustering/KMeansParams.html "interface in org.apache.spark.ml.clustering")`  
Returns:  
(undocumented)  
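To make the role of `initSteps` concrete, here is a simplified sketch of the k-means|| oversampling phase as described by Bahmani et al.: in each of `initSteps` rounds, every point is sampled independently with probability min(1, l * d2(x) / phi), where d2(x) is the point's squared distance to its nearest candidate and phi is the total cost. The oversampling factor l = 2k is an assumption of this sketch, the class name is hypothetical, and the final step (weighting the candidates and reducing them to k centers, for example with k-means++) is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the k-means|| candidate oversampling rounds. */
class KMeansParallelSketch {
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Squared distance from p to its nearest candidate. */
    static double cost(double[] p, List<double[]> cands) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] c : cands) best = Math.min(best, sqDist(p, c));
        return best;
    }

    /** Runs initSteps oversampling rounds and returns the (unweighted) candidate set. */
    static List<double[]> candidates(double[][] points, int k, int initSteps, long seed) {
        Random rnd = new Random(seed);
        double l = 2.0 * k; // assumed oversampling factor
        List<double[]> cands = new ArrayList<>();
        cands.add(points[rnd.nextInt(points.length)].clone()); // one uniformly chosen starting candidate
        for (int step = 0; step < initSteps; step++) {
            double[] costs = new double[points.length];
            double total = 0.0;
            for (int i = 0; i < points.length; i++) {
                costs[i] = cost(points[i], cands);
                total += costs[i];
            }
            if (total == 0.0) break; // every point already coincides with a candidate
            List<double[]> picked = new ArrayList<>();
            for (int i = 0; i < points.length; i++) {
                // Independent Bernoulli draw per point, weighted by its current cost.
                if (rnd.nextDouble() < Math.min(1.0, l * costs[i] / total)) picked.add(points[i].clone());
            }
            cands.addAll(picked); // add after the pass so all costs in a round use the same candidate set
        }
        return cands;
    }
}
```

Each round can add many candidates at once, which is what lets the procedure finish in a small, fixed number of passes over the data instead of k sequential passes.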
* #### solver  
public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> solver()  
Param for the name of the optimization method used in KMeans. Supported options: "auto" automatically selects the solver based on the input schema and sparsity (if input instances are arrays or input vectors are dense, "block" is used; otherwise, "row"); "row" processes input instances row by row and applies the triangle inequality to accelerate training; "block" stacks input instances into blocks and applies GEMM to compute the distances. Default is "auto".  
Specified by:  
`[solver](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#solver--)` in interface `[KMeansParams](../../../../../org/apache/spark/ml/clustering/KMeansParams.html "interface in org.apache.spark.ml.clustering")`  
Specified by:  
`[solver](../../../../../org/apache/spark/ml/param/shared/HasSolver.html#solver--)` in interface `[HasSolver](../../../../../org/apache/spark/ml/param/shared/HasSolver.html "interface in org.apache.spark.ml.param.shared")`  
Returns:  
(undocumented)  
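Two ideas from the solver description can be sketched in plain Java: how "auto" resolves, following the rule stated above, and the triangle-inequality shortcut behind the "row" solver. The pruning rule used here (skip center j when d(best, j) >= 2 * d(x, best), because then d(x, j) >= d(best, j) - d(x, best) >= d(x, best)) is the textbook form of the idea; Spark's internal bookkeeping may differ, and `SolverSketch` is a hypothetical name.

```java
/** Hypothetical sketch of "auto" solver resolution and triangle-inequality pruning. */
class SolverSketch {
    /** Applies the documented "auto" rule; explicit "row" or "block" values pass through. */
    static String resolve(String solver, boolean arrayOrDenseInput) {
        if (!solver.equals("auto")) return solver;
        return arrayOrDenseInput ? "block" : "row";
    }

    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Precomputes Euclidean distances between every pair of centers. */
    static double[][] pairwise(double[][] centers) {
        int k = centers.length;
        double[][] out = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) out[i][j] = dist(centers[i], centers[j]);
        }
        return out;
    }

    /** Returns {index of nearest center, number of point-to-center distances computed}. */
    static int[] nearest(double[] x, double[][] centers, double[][] centerDist) {
        int best = 0;
        double bestDist = dist(x, centers[0]);
        int evaluated = 1;
        for (int j = 1; j < centers.length; j++) {
            // Center j cannot be strictly closer than the current best, so skip it.
            if (centerDist[best][j] >= 2.0 * bestDist) continue;
            double d = dist(x, centers[j]);
            evaluated++;
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return new int[] {best, evaluated};
    }
}
```

With centers at x = 0, 10 and 20 on a line, a point at x = 1 needs only one distance computation: both other centers are at least twice as far from center 0 as the point is.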
* #### maxBlockSizeInMB  
public final [DoubleParam](../../../../../org/apache/spark/ml/param/DoubleParam.html "class in org.apache.spark.ml.param") maxBlockSizeInMB()  
Param for maximum memory in MB for stacking input data into blocks. Data is stacked within partitions. If the value exceeds the remaining data size in a partition, it is adjusted to that data size. The default, 0.0, means an optimal value is chosen, depending on the specific algorithm. Must be >= 0.  
Specified by:  
`[maxBlockSizeInMB](../../../../../org/apache/spark/ml/param/shared/HasMaxBlockSizeInMB.html#maxBlockSizeInMB--)` in interface `[HasMaxBlockSizeInMB](../../../../../org/apache/spark/ml/param/shared/HasMaxBlockSizeInMB.html "interface in org.apache.spark.ml.param.shared")`  
Returns:  
(undocumented)  
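The "block" solver and `maxBlockSizeInMB` can be illustrated together. The sketch below makes two simplifying assumptions: a block holds dense rows of 8-byte doubles, and a value of 0.0 resolves to 1.0 MB (as documented for `setMaxBlockSizeInMB`). Within a block, squared distances are computed through the expansion ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, whose cross terms form the matrix product that a GEMM routine would compute; here it is a plain loop. `BlockSketch` and its sizing rule are hypothetical, not Spark's code.

```java
/** Hypothetical sketch of block sizing and block-wise squared distances. */
class BlockSketch {
    /** Rows of numFeatures doubles that fit in the budget; 0.0 means the 1.0 MB default. At least 1. */
    static int rowsPerBlock(double maxBlockSizeInMB, int numFeatures) {
        double mb = (maxBlockSizeInMB == 0.0) ? 1.0 : maxBlockSizeInMB;
        long budgetBytes = (long) (mb * 1024 * 1024);
        long rowBytes = 8L * numFeatures; // assumed dense storage of 8-byte doubles
        return (int) Math.max(1L, budgetBytes / rowBytes);
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** Squared distances from every row of the block to every center via the norm expansion. */
    static double[][] sqDistances(double[][] block, double[][] centers) {
        double[] cNorm = new double[centers.length];
        for (int j = 0; j < centers.length; j++) cNorm[j] = dot(centers[j], centers[j]);
        double[][] out = new double[block.length][centers.length];
        for (int i = 0; i < block.length; i++) {
            double xNorm = dot(block[i], block[i]);
            for (int j = 0; j < centers.length; j++) {
                // The dot products over all (i, j) form the matrix product a GEMM call would produce.
                double v = xNorm - 2.0 * dot(block[i], centers[j]) + cNorm[j];
                out[i][j] = Math.max(0.0, v); // clamp small negative values caused by rounding
            }
        }
        return out;
    }
}
```

Under these assumptions, a 1,024-feature dataset with the default setting gets 128 rows per block (1 MB / 8 KB per row).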
* #### weightCol  
public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> weightCol()  
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.  
Specified by:  
`[weightCol](../../../../../org/apache/spark/ml/param/shared/HasWeightCol.html#weightCol--)` in interface `[HasWeightCol](../../../../../org/apache/spark/ml/param/shared/HasWeightCol.html "interface in org.apache.spark.ml.param.shared")`  
Returns:  
(undocumented)  
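Instance weights matter in the center update: with weights, a center becomes the weighted mean of its assigned points. The sketch below passes `null` to model an unset or empty `weightCol`, in which case every weight is 1.0 as documented. `WeightedCentroid` is a hypothetical helper, not part of Spark.

```java
/** Hypothetical sketch of a weighted centroid (cluster-center) update. */
class WeightedCentroid {
    /** Weighted mean of the points; weights == null means every weight is 1.0. */
    static double[] centroid(double[][] points, double[] weights) {
        int dim = points[0].length;
        double[] sum = new double[dim];
        double wSum = 0.0;
        for (int i = 0; i < points.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            wSum += w;
            for (int d = 0; d < dim; d++) sum[d] += w * points[i][d];
        }
        for (int d = 0; d < dim; d++) sum[d] /= wSum; // divide by total weight, not by point count
        return sum;
    }
}
```

For the points 0 and 10, equal weights give 5.0, while weights 1 and 3 pull the center to 7.5.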
* #### distanceMeasure  
public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> distanceMeasure()  
Param for the distance measure. Supported options: 'euclidean' and 'cosine'.  
Specified by:  
`[distanceMeasure](../../../../../org/apache/spark/ml/param/shared/HasDistanceMeasure.html#distanceMeasure--)` in interface `[HasDistanceMeasure](../../../../../org/apache/spark/ml/param/shared/HasDistanceMeasure.html "interface in org.apache.spark.ml.param.shared")`  
Returns:  
(undocumented)  
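For reference, the two supported measures can be written as follows. The cosine variant here is 1 minus cosine similarity, a common definition assumed for this sketch; it is undefined for all-zero vectors, so the sketch rejects them. `DistanceMeasures` is a hypothetical class, not Spark's implementation.

```java
/** Hypothetical sketch of the two distance measures. */
class DistanceMeasures {
    /** Straight-line (L2) distance. */
    static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** 1 - cos(angle between a and b); throws for zero-length vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0.0 || nb == 0.0) {
            throw new IllegalArgumentException("cosine distance is undefined for zero vectors");
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

Cosine distance ignores vector length: {1, 1} and {2, 2} are at (numerically) zero distance, whereas their Euclidean distance is about 1.414.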
* #### tol  
public final [DoubleParam](../../../../../org/apache/spark/ml/param/DoubleParam.html "class in org.apache.spark.ml.param") tol()  
Description copied from interface: `[HasTol](../../../../../org/apache/spark/ml/param/shared/HasTol.html#tol--)`  
Param for the convergence tolerance for iterative algorithms (>= 0).  
Specified by:  
`[tol](../../../../../org/apache/spark/ml/param/shared/HasTol.html#tol--)` in interface `[HasTol](../../../../../org/apache/spark/ml/param/shared/HasTol.html "interface in org.apache.spark.ml.param.shared")`  
Returns:  
(undocumented)  
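A common way to use a tolerance in k-means is to stop once no center moves farther than `tol` between iterations. The check below assumes that rule with Euclidean movement; the helper name is hypothetical, and Spark's exact criterion may differ per distance measure.

```java
/** Hypothetical convergence check for iterative center updates. */
class ConvergenceSketch {
    /** True when every center moved by at most tol (Euclidean distance). */
    static boolean converged(double[][] oldCenters, double[][] newCenters, double tol) {
        for (int j = 0; j < oldCenters.length; j++) {
            double s = 0.0;
            for (int d = 0; d < oldCenters[j].length; d++) {
                double diff = newCenters[j][d] - oldCenters[j][d];
                s += diff * diff;
            }
            if (Math.sqrt(s) > tol) return false; // this center still moved too far
        }
        return true;
    }
}
```

In a typical training loop, iteration stops at whichever comes first: this check succeeding or `maxIter` rounds being reached.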
* #### predictionCol  
public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> predictionCol()  
Param for prediction column name.  
Specified by:  
`[predictionCol](../../../../../org/apache/spark/ml/param/shared/HasPredictionCol.html#predictionCol--)` in interface `[HasPredictionCol](../../../../../org/apache/spark/ml/param/shared/HasPredictionCol.html "interface in org.apache.spark.ml.param.shared")`  
Returns:  
(undocumented)  
* #### seed  
public final [LongParam](../../../../../org/apache/spark/ml/param/LongParam.html "class in org.apache.spark.ml.param") seed()  
Description copied from interface: `[HasSeed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html#seed--)`  
Param for random seed.  
Specified by:  
`[seed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html#seed--)` in interface `[HasSeed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html "interface in org.apache.spark.ml.param.shared")`  
Returns:  
(undocumented)  
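A fixed seed makes random choices reproducible. As an illustration of what a "random" initialization amounts to, this sketch picks k distinct row indices with a partial Fisher-Yates shuffle driven by `java.util.Random`; the same seed yields the same picks. It is not Spark's sampling code, and `RandomInitSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: seeded selection of k distinct row indices. */
class RandomInitSketch {
    static int[] pickIndices(int n, int k, long seed) {
        Random rnd = new Random(seed);
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        int m = Math.min(k, n); // cannot pick more distinct rows than exist
        // Partial Fisher-Yates: position i receives a uniform pick from the not-yet-chosen suffix.
        for (int i = 0; i < m; i++) {
            int j = i + rnd.nextInt(n - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return Arrays.copyOf(idx, m);
    }
}
```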
* #### featuresCol  
public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> featuresCol()  
Param for features column name.  
Specified by:  
`[featuresCol](../../../../../org/apache/spark/ml/param/shared/HasFeaturesCol.html#featuresCol--)` in interface `[HasFeaturesCol](../../../../../org/apache/spark/ml/param/shared/HasFeaturesCol.html "interface in org.apache.spark.ml.param.shared")`  
Returns:  
(undocumented)  
* #### maxIter  
public final [IntParam](../../../../../org/apache/spark/ml/param/IntParam.html "class in org.apache.spark.ml.param") maxIter()  
Description copied from interface: `[HasMaxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html#maxIter--)`  
Param for maximum number of iterations (>= 0).  
Specified by:  
`[maxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html#maxIter--)` in interface `[HasMaxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html "interface in org.apache.spark.ml.param.shared")`  
Returns:  
(undocumented)  
* #### uid  
public String uid()  
An immutable unique ID for the object and its derivatives.  
Specified by:  
`[uid](../../../../../org/apache/spark/ml/util/Identifiable.html#uid--)` in interface `[Identifiable](../../../../../org/apache/spark/ml/util/Identifiable.html "interface in org.apache.spark.ml.util")`  
Returns:  
(undocumented)  
* #### copy  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") copy([ParamMap](../../../../../org/apache/spark/ml/param/ParamMap.html "class in org.apache.spark.ml.param") extra)  
Description copied from interface: `[Params](../../../../../org/apache/spark/ml/param/Params.html#copy-org.apache.spark.ml.param.ParamMap-)`  
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See `defaultCopy()`.  
Specified by:  
`[copy](../../../../../org/apache/spark/ml/param/Params.html#copy-org.apache.spark.ml.param.ParamMap-)` in interface `[Params](../../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param")`  
Specified by:  
`[copy](../../../../../org/apache/spark/ml/Estimator.html#copy-org.apache.spark.ml.param.ParamMap-)` in class `[Estimator](../../../../../org/apache/spark/ml/Estimator.html "class in org.apache.spark.ml")<[KMeansModel](../../../../../org/apache/spark/ml/clustering/KMeansModel.html "class in org.apache.spark.ml.clustering")>`  
Parameters:  
`extra` \- (undocumented)  
Returns:  
(undocumented)  
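The contract of `copy(ParamMap extra)` (same UID, extra values applied on top of the current ones) can be modeled with a plain map. `MiniParams` is a hypothetical stand-in for a Params holder, not Spark's `ParamMap`; it only shows that the copy keeps the UID, that extra values take precedence, and that the original instance is left unchanged.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Params holder, illustrating copy(extra) semantics. */
final class MiniParams {
    final String uid;
    final Map<String, Object> values;

    MiniParams(String uid, Map<String, ?> values) {
        this.uid = uid;
        this.values = new HashMap<>(values); // defensive copy so callers cannot mutate our state
    }

    /** Returns a new instance with the same uid; entries in extra override existing values. */
    MiniParams copy(Map<String, ?> extra) {
        Map<String, Object> merged = new HashMap<>(values);
        merged.putAll(extra);
        return new MiniParams(uid, merged);
    }
}
```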
* #### setFeaturesCol  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setFeaturesCol(String value)  
* #### setPredictionCol  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setPredictionCol(String value)  
* #### setK  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setK(int value)  
* #### setInitMode  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setInitMode(String value)  
* #### setDistanceMeasure  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setDistanceMeasure(String value)  
* #### setInitSteps  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setInitSteps(int value)  
* #### setMaxIter  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setMaxIter(int value)  
* #### setTol  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setTol(double value)  
* #### setSeed  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setSeed(long value)  
* #### setWeightCol  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setWeightCol(String value)  
Sets the value of param `weightCol`. If this is not set or empty, all instance weights are treated as 1.0. Default is not set, so all instances have weight 1.0.  
Parameters:  
`value` \- (undocumented)  
Returns:  
(undocumented)  
* #### setSolver  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setSolver(String value)  
Sets the value of param `solver`. Default is "auto".  
Parameters:  
`value` \- (undocumented)  
Returns:  
(undocumented)  
* #### setMaxBlockSizeInMB  
public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setMaxBlockSizeInMB(double value)  
Sets the value of param `maxBlockSizeInMB`. The default is 0.0, in which case 1.0 MB is chosen.  
Parameters:  
`value` \- (undocumented)  
Returns:  
(undocumented)  
* #### fit  
public [KMeansModel](../../../../../org/apache/spark/ml/clustering/KMeansModel.html "class in org.apache.spark.ml.clustering") fit([Dataset](../../../../../org/apache/spark/sql/Dataset.html "class in org.apache.spark.sql")<?> dataset)  
Description copied from class: `[Estimator](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-)`  
Fits a model to the input data.  
Specified by:  
`[fit](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-)` in class `[Estimator](../../../../../org/apache/spark/ml/Estimator.html "class in org.apache.spark.ml")<[KMeansModel](../../../../../org/apache/spark/ml/clustering/KMeansModel.html "class in org.apache.spark.ml.clustering")>`  
Parameters:  
`dataset` \- (undocumented)  
Returns:  
(undocumented)  
* #### transformSchema  
public [StructType](../../../../../org/apache/spark/sql/types/StructType.html "class in org.apache.spark.sql.types") transformSchema([StructType](../../../../../org/apache/spark/sql/types/StructType.html "class in org.apache.spark.sql.types") schema)  
Check transform validity and derive the output schema from the input schema.  
 We check validity for interactions between parameters during `transformSchema` and raise an exception if any parameter value is invalid. Parameter value checks that do not depend on other parameters are handled by `Param.validate()`.  
 A typical implementation should first verify schema changes and parameter validity, including complex parameter interaction checks.  
Specified by:  
`[transformSchema](../../../../../org/apache/spark/ml/PipelineStage.html#transformSchema-org.apache.spark.sql.types.StructType-)` in class `[PipelineStage](../../../../../org/apache/spark/ml/PipelineStage.html "class in org.apache.spark.ml")`  
Parameters:  
`schema` \- (undocumented)  
Returns:  
(undocumented)
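The validation described for `transformSchema` can be sketched with a simplified schema modeled as an ordered map from column name to type name. The accepted feature types ("vector", "array<double>", "array<float>") and the integer prediction column are assumptions of this sketch; `SchemaSketch` is hypothetical and does not use Spark's `StructType`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of schema validation plus output-column derivation. */
class SchemaSketch {
    static final Set<String> FEATURE_TYPES = Set.of("vector", "array<double>", "array<float>");

    static Map<String, String> transformSchema(Map<String, String> schema, String featuresCol, String predictionCol) {
        String type = schema.get(featuresCol);
        if (type == null) {
            throw new IllegalArgumentException("Column " + featuresCol + " does not exist.");
        }
        if (!FEATURE_TYPES.contains(type)) {
            throw new IllegalArgumentException("Column " + featuresCol + " has unsupported type " + type + ".");
        }
        if (schema.containsKey(predictionCol)) {
            throw new IllegalArgumentException("Column " + predictionCol + " already exists.");
        }
        Map<String, String> out = new LinkedHashMap<>(schema); // keep input columns in their original order
        out.put(predictionCol, "int"); // append the cluster-index column
        return out;
    }
}
```

Validation happens before any data is touched, so a misnamed features column fails fast instead of partway through training.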
