KMeans (Spark 3.5.5 JavaDoc)
Object
- org.apache.spark.ml.PipelineStage
- org.apache.spark.ml.Estimator<KMeansModel>
- org.apache.spark.ml.clustering.KMeans
All Implemented Interfaces:
java.io.Serializable, org.apache.spark.internal.Logging, KMeansParams, Params, HasDistanceMeasure, HasFeaturesCol, HasMaxBlockSizeInMB, HasMaxIter, HasPredictionCol, HasSeed, HasSolver, HasTol, HasWeightCol, DefaultParamsWritable, Identifiable, MLWritable
public class KMeans
extends Estimator<KMeansModel>
implements KMeansParams, DefaultParamsWritable
K-means clustering with support for k-means|| initialization proposed by Bahmani et al.
See Also:
Bahmani et al., Scalable K-Means++ (VLDB 2012), Serialized Form
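To make the k-means|| idea concrete, here is a minimal plain-Java sketch of the oversampling rounds described by Bahmani et al. It is not Spark's implementation. In each round, every point is kept independently with probability proportional to its squared distance from the nearest center chosen so far, scaled by an oversampling factor. The class name, the method `sampleCandidates`, and the `oversample` argument are hypothetical names chosen for illustration; `steps` plays a role loosely analogous to `initSteps`. The final reduction of the candidates to exactly k centers (a weighted clustering of the candidates) is deliberately omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; names are hypothetical and not part of the Spark API.
class KMeansParallelInitSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Cost of a point: squared distance to its nearest current center. */
    static double cost(double[] point, List<double[]> centers) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] c : centers) {
            best = Math.min(best, squaredDistance(point, c));
        }
        return best;
    }

    /**
     * Starts from one random point, then runs up to `steps` oversampling rounds.
     * In each round a point is kept with probability
     * min(1, oversample * cost(point) / totalCost).
     */
    static List<double[]> sampleCandidates(double[][] points, int steps,
                                           double oversample, long seed) {
        Random rng = new Random(seed);
        List<double[]> centers = new ArrayList<>();
        centers.add(points[rng.nextInt(points.length)]);
        for (int step = 0; step < steps; step++) {
            double[] costs = new double[points.length];
            double total = 0.0;
            for (int i = 0; i < points.length; i++) {
                costs[i] = cost(points[i], centers);
                total += costs[i];
            }
            if (total == 0.0) {
                break; // every point already coincides with a center
            }
            List<double[]> chosen = new ArrayList<>();
            for (int i = 0; i < points.length; i++) {
                double p = Math.min(1.0, oversample * costs[i] / total);
                if (rng.nextDouble() < p) {
                    chosen.add(points[i]);
                }
            }
            centers.addAll(chosen);
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = { {0.0, 0.0}, {0.1, 0.0}, {10.0, 10.0}, {10.1, 10.0}, {-10.0, 5.0} };
        List<double[]> candidates = sampleCandidates(points, 2, 4.0, 7L);
        System.out.println("candidate centers: " + candidates.size());
    }
}
```

Because a point that already serves as a center has cost zero, it cannot be sampled again, so the candidate set only grows with points that are still far from every center.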
Nested Class Summary
* ### Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging `org.apache.spark.internal.Logging.SparkShellLoggingFilter`
Constructor Summary
Constructors
Constructor and Description
* `KMeans()`
* `KMeans(String uid)`

Method Summary
Modifier and Type, Method and Description
* `KMeans copy(ParamMap extra)`: Creates a copy of this instance with the same UID and some extra params.
* `Param distanceMeasure()`: Param for the distance measure.
* `Param featuresCol()`: Param for features column name.
* `KMeansModel fit(Dataset<?> dataset)`: Fits a model to the input data.
* `Param initMode()`: Param for the initialization algorithm.
* `IntParam initSteps()`: Param for the number of steps for the k-means|| initialization mode.
* `IntParam k()`: The number of clusters to create (k).
* `static KMeans load(String path)`
* `DoubleParam maxBlockSizeInMB()`: Param for maximum memory in MB for stacking input data into blocks.
* `IntParam maxIter()`: Param for maximum number of iterations (>= 0).
* `Param predictionCol()`: Param for prediction column name.
* `static MLReader read()`
* `LongParam seed()`: Param for random seed.
* `KMeans setDistanceMeasure(String value)`
* `KMeans setFeaturesCol(String value)`
* `KMeans setInitMode(String value)`
* `KMeans setInitSteps(int value)`
* `KMeans setK(int value)`
* `KMeans setMaxBlockSizeInMB(double value)`: Sets the value of param maxBlockSizeInMB.
* `KMeans setMaxIter(int value)`
* `KMeans setPredictionCol(String value)`
* `KMeans setSeed(long value)`
* `KMeans setSolver(String value)`: Sets the value of param solver.
* `KMeans setTol(double value)`
* `KMeans setWeightCol(String value)`: Sets the value of param weightCol.
* `Param solver()`: Param for the name of optimization method used in KMeans.
* `DoubleParam tol()`: Param for the convergence tolerance for iterative algorithms (>= 0).
* `StructType transformSchema(StructType schema)`: Check transform validity and derive the output schema from the input schema.
* `String uid()`: An immutable unique ID for the object and its derivatives.
* `Param weightCol()`: Param for weight column name.
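The `maxIter`, `tol`, and `weightCol` params listed above interact during the iterative refinement. The following plain-Java sketch illustrates one plausible reading of that interaction; it is not Spark's code. It assumes that each center is updated to the weighted mean of its assigned points and that training stops once no center moves by more than `tol` (Euclidean distance) or after `maxIter` iterations. That stopping rule is an assumption made for the sketch and is not stated on this page. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical and not part of the Spark API.
class WeightedLloydSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center closest to the given point. */
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centers.length; j++) {
            double d = squaredDistance(point, centers[j]);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    /**
     * Refines `centers` in place and returns the number of iterations performed.
     * Assumed stopping rule: stop when every center moved by at most `tol`.
     */
    static int fit(double[][] points, double[] weights, double[][] centers,
                   int maxIter, double tol) {
        int dim = points[0].length;
        int iterations = 0;
        boolean converged = false;
        while (iterations < maxIter && !converged) {
            double[][] sums = new double[centers.length][dim];
            double[] weightSums = new double[centers.length];
            for (int i = 0; i < points.length; i++) {
                int j = nearest(points[i], centers);
                weightSums[j] += weights[i];
                for (int d = 0; d < dim; d++) {
                    sums[j][d] += weights[i] * points[i][d];
                }
            }
            converged = true;
            for (int j = 0; j < centers.length; j++) {
                if (weightSums[j] == 0.0) {
                    continue; // an empty cluster keeps its previous center
                }
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) {
                    updated[d] = sums[j][d] / weightSums[j];
                }
                if (squaredDistance(updated, centers[j]) > tol * tol) {
                    converged = false;
                }
                centers[j] = updated;
            }
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        double[][] points = { {0.0}, {2.0}, {10.0}, {12.0} };
        double[] weights = { 1.0, 3.0, 1.0, 1.0 }; // all 1.0 would mimic an unset weightCol
        double[][] centers = { {0.0}, {12.0} };
        int iterations = fit(points, weights, centers, 20, 1e-4);
        System.out.println(iterations + " iterations, centers = " + Arrays.deepToString(centers));
    }
}
```

In the sample data the point at 2.0 carries weight 3.0, so the first center is pulled toward it rather than settling at the unweighted midpoint of 0.0 and 2.0.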
* ### Methods inherited from class org.apache.spark.ml.[Estimator](../../../../../org/apache/spark/ml/Estimator.html "class in org.apache.spark.ml")
  `[fit](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-org.apache.spark.ml.param.ParamMap-), [fit](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-org.apache.spark.ml.param.ParamPair-org.apache.spark.ml.param.ParamPair...-), [fit](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-org.apache.spark.ml.param.ParamPair-scala.collection.Seq-), [fit](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-scala.collection.Seq-)`
* ### Methods inherited from class org.apache.spark.ml.[PipelineStage](../../../../../org/apache/spark/ml/PipelineStage.html "class in org.apache.spark.ml")
  `[params](../../../../../org/apache/spark/ml/PipelineStage.html#params--)`
* ### Methods inherited from class Object
  `equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
* ### Methods inherited from interface org.apache.spark.ml.clustering.[KMeansParams](../../../../../org/apache/spark/ml/clustering/KMeansParams.html "interface in org.apache.spark.ml.clustering")
  `[getInitMode](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#getInitMode--), [getInitSteps](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#getInitSteps--), [getK](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#getK--), [validateAndTransformSchema](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#validateAndTransformSchema-org.apache.spark.sql.types.StructType-)`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasMaxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html "interface in org.apache.spark.ml.param.shared")
  `[getMaxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html#getMaxIter--)`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasFeaturesCol](../../../../../org/apache/spark/ml/param/shared/HasFeaturesCol.html "interface in org.apache.spark.ml.param.shared")
  `[getFeaturesCol](../../../../../org/apache/spark/ml/param/shared/HasFeaturesCol.html#getFeaturesCol--)`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasSeed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html "interface in org.apache.spark.ml.param.shared")
  `[getSeed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html#getSeed--)`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasPredictionCol](../../../../../org/apache/spark/ml/param/shared/HasPredictionCol.html "interface in org.apache.spark.ml.param.shared")
  `[getPredictionCol](../../../../../org/apache/spark/ml/param/shared/HasPredictionCol.html#getPredictionCol--)`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasTol](../../../../../org/apache/spark/ml/param/shared/HasTol.html "interface in org.apache.spark.ml.param.shared")
  `[getTol](../../../../../org/apache/spark/ml/param/shared/HasTol.html#getTol--)`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasDistanceMeasure](../../../../../org/apache/spark/ml/param/shared/HasDistanceMeasure.html "interface in org.apache.spark.ml.param.shared")
  `[getDistanceMeasure](../../../../../org/apache/spark/ml/param/shared/HasDistanceMeasure.html#getDistanceMeasure--)`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasWeightCol](../../../../../org/apache/spark/ml/param/shared/HasWeightCol.html "interface in org.apache.spark.ml.param.shared")
  `[getWeightCol](../../../../../org/apache/spark/ml/param/shared/HasWeightCol.html#getWeightCol--)`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasSolver](../../../../../org/apache/spark/ml/param/shared/HasSolver.html "interface in org.apache.spark.ml.param.shared")
  `[getSolver](../../../../../org/apache/spark/ml/param/shared/HasSolver.html#getSolver--)`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.[HasMaxBlockSizeInMB](../../../../../org/apache/spark/ml/param/shared/HasMaxBlockSizeInMB.html "interface in org.apache.spark.ml.param.shared")
  `[getMaxBlockSizeInMB](../../../../../org/apache/spark/ml/param/shared/HasMaxBlockSizeInMB.html#getMaxBlockSizeInMB--)`
* ### Methods inherited from interface org.apache.spark.ml.param.[Params](../../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param")
  `[clear](../../../../../org/apache/spark/ml/param/Params.html#clear-org.apache.spark.ml.param.Param-), [copyValues](../../../../../org/apache/spark/ml/param/Params.html#copyValues-T-org.apache.spark.ml.param.ParamMap-), [defaultCopy](../../../../../org/apache/spark/ml/param/Params.html#defaultCopy-org.apache.spark.ml.param.ParamMap-), [defaultParamMap](../../../../../org/apache/spark/ml/param/Params.html#defaultParamMap--), [explainParam](../../../../../org/apache/spark/ml/param/Params.html#explainParam-org.apache.spark.ml.param.Param-), [explainParams](../../../../../org/apache/spark/ml/param/Params.html#explainParams--), [extractParamMap](../../../../../org/apache/spark/ml/param/Params.html#extractParamMap--), [extractParamMap](../../../../../org/apache/spark/ml/param/Params.html#extractParamMap-org.apache.spark.ml.param.ParamMap-), [get](../../../../../org/apache/spark/ml/param/Params.html#get-org.apache.spark.ml.param.Param-), [getDefault](../../../../../org/apache/spark/ml/param/Params.html#getDefault-org.apache.spark.ml.param.Param-), [getOrDefault](../../../../../org/apache/spark/ml/param/Params.html#getOrDefault-org.apache.spark.ml.param.Param-), [getParam](../../../../../org/apache/spark/ml/param/Params.html#getParam-java.lang.String-), [hasDefault](../../../../../org/apache/spark/ml/param/Params.html#hasDefault-org.apache.spark.ml.param.Param-), [hasParam](../../../../../org/apache/spark/ml/param/Params.html#hasParam-java.lang.String-), [isDefined](../../../../../org/apache/spark/ml/param/Params.html#isDefined-org.apache.spark.ml.param.Param-), [isSet](../../../../../org/apache/spark/ml/param/Params.html#isSet-org.apache.spark.ml.param.Param-), [onParamChange](../../../../../org/apache/spark/ml/param/Params.html#onParamChange-org.apache.spark.ml.param.Param-), [paramMap](../../../../../org/apache/spark/ml/param/Params.html#paramMap--), [params](../../../../../org/apache/spark/ml/param/Params.html#params--), [set](../../../../../org/apache/spark/ml/param/Params.html#set-org.apache.spark.ml.param.Param-T-), [set](../../../../../org/apache/spark/ml/param/Params.html#set-org.apache.spark.ml.param.ParamPair-), [set](../../../../../org/apache/spark/ml/param/Params.html#set-java.lang.String-java.lang.Object-), [setDefault](../../../../../org/apache/spark/ml/param/Params.html#setDefault-org.apache.spark.ml.param.Param-T-), [setDefault](../../../../../org/apache/spark/ml/param/Params.html#setDefault-scala.collection.Seq-), [shouldOwn](../../../../../org/apache/spark/ml/param/Params.html#shouldOwn-org.apache.spark.ml.param.Param-)`
* ### Methods inherited from interface org.apache.spark.ml.util.[Identifiable](../../../../../org/apache/spark/ml/util/Identifiable.html "interface in org.apache.spark.ml.util")
  `[toString](../../../../../org/apache/spark/ml/util/Identifiable.html#toString--)`
* ### Methods inherited from interface org.apache.spark.ml.util.[DefaultParamsWritable](../../../../../org/apache/spark/ml/util/DefaultParamsWritable.html "interface in org.apache.spark.ml.util")
  `[write](../../../../../org/apache/spark/ml/util/DefaultParamsWritable.html#write--)`
* ### Methods inherited from interface org.apache.spark.ml.util.[MLWritable](../../../../../org/apache/spark/ml/util/MLWritable.html "interface in org.apache.spark.ml.util")
  `[save](../../../../../org/apache/spark/ml/util/MLWritable.html#save-java.lang.String-)`
* ### Methods inherited from interface org.apache.spark.internal.Logging
  `$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize`
Constructor Detail
* #### KMeans
  public KMeans(String uid)
* #### KMeans
  public KMeans()
Method Detail
* #### load
  public static [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") load(String path)
* #### read
  public static [MLReader](../../../../../org/apache/spark/ml/util/MLReader.html "class in org.apache.spark.ml.util")<T> read()
* #### k
  public final [IntParam](../../../../../org/apache/spark/ml/param/IntParam.html "class in org.apache.spark.ml.param") k()
  The number of clusters to create (k). Must be > 1. Note that it is possible for fewer than k clusters to be returned, for example, if there are fewer than k distinct points to cluster. Default: 2.
  Specified by: `[k](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#k--)` in interface `[KMeansParams](../../../../../org/apache/spark/ml/clustering/KMeansParams.html "interface in org.apache.spark.ml.clustering")`
  Returns: (undocumented)
* #### initMode
  public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> initMode()
  Param for the initialization algorithm. This can be either "random" to choose random points as initial cluster centers, or "k-means||" to use a parallel variant of k-means++ (Bahmani et al., Scalable K-Means++, VLDB 2012). Default: k-means||.
  Specified by: `[initMode](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#initMode--)` in interface `[KMeansParams](../../../../../org/apache/spark/ml/clustering/KMeansParams.html "interface in org.apache.spark.ml.clustering")`
  Returns: (undocumented)
* #### initSteps
  public final [IntParam](../../../../../org/apache/spark/ml/param/IntParam.html "class in org.apache.spark.ml.param") initSteps()
  Param for the number of steps for the k-means|| initialization mode. This is an advanced setting; the default of 2 is almost always enough. Must be > 0. Default: 2.
  Specified by: `[initSteps](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#initSteps--)` in interface `[KMeansParams](../../../../../org/apache/spark/ml/clustering/KMeansParams.html "interface in org.apache.spark.ml.clustering")`
  Returns: (undocumented)
* #### solver
  public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> solver()
  Param for the name of optimization method used in KMeans. Supported options:
  - "auto": Automatically select the solver based on the input schema and sparsity: if input instances are arrays or input vectors are dense, set to "block"; otherwise, set to "row".
  - "row": Input instances are processed row by row, and the triangle inequality is applied to accelerate the training.
  - "block": Input instances are stacked into blocks, and GEMM is applied to compute the distances.

  Default is "auto".
  Specified by: `[solver](../../../../../org/apache/spark/ml/clustering/KMeansParams.html#solver--)` in interface `[KMeansParams](../../../../../org/apache/spark/ml/clustering/KMeansParams.html "interface in org.apache.spark.ml.clustering")`
  Specified by: `[solver](../../../../../org/apache/spark/ml/param/shared/HasSolver.html#solver--)` in interface `[HasSolver](../../../../../org/apache/spark/ml/param/shared/HasSolver.html "interface in org.apache.spark.ml.param.shared")`
  Returns: (undocumented)
* #### maxBlockSizeInMB
  public final [DoubleParam](../../../../../org/apache/spark/ml/param/DoubleParam.html "class in org.apache.spark.ml.param") maxBlockSizeInMB()
  Param for maximum memory in MB for stacking input data into blocks. Data is stacked within partitions. If the value exceeds the remaining data size in a partition, it is adjusted to the data size. The default of 0.0 means an optimal value is chosen, which depends on the specific algorithm. Must be >= 0.
  Specified by: `[maxBlockSizeInMB](../../../../../org/apache/spark/ml/param/shared/HasMaxBlockSizeInMB.html#maxBlockSizeInMB--)` in interface `[HasMaxBlockSizeInMB](../../../../../org/apache/spark/ml/param/shared/HasMaxBlockSizeInMB.html "interface in org.apache.spark.ml.param.shared")`
  Returns: (undocumented)
* #### weightCol
  public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> weightCol()
  Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
  Specified by: `[weightCol](../../../../../org/apache/spark/ml/param/shared/HasWeightCol.html#weightCol--)` in interface `[HasWeightCol](../../../../../org/apache/spark/ml/param/shared/HasWeightCol.html "interface in org.apache.spark.ml.param.shared")`
  Returns: (undocumented)
* #### distanceMeasure
  public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> distanceMeasure()
  Param for the distance measure. Supported options: 'euclidean' and 'cosine'.
  Specified by: `[distanceMeasure](../../../../../org/apache/spark/ml/param/shared/HasDistanceMeasure.html#distanceMeasure--)` in interface `[HasDistanceMeasure](../../../../../org/apache/spark/ml/param/shared/HasDistanceMeasure.html "interface in org.apache.spark.ml.param.shared")`
  Returns: (undocumented)
* #### tol
  public final [DoubleParam](../../../../../org/apache/spark/ml/param/DoubleParam.html "class in org.apache.spark.ml.param") tol()
  Description copied from interface: `[HasTol](../../../../../org/apache/spark/ml/param/shared/HasTol.html#tol--)`
  Param for the convergence tolerance for iterative algorithms (>= 0).
  Specified by: `[tol](../../../../../org/apache/spark/ml/param/shared/HasTol.html#tol--)` in interface `[HasTol](../../../../../org/apache/spark/ml/param/shared/HasTol.html "interface in org.apache.spark.ml.param.shared")`
  Returns: (undocumented)
* #### predictionCol
  public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> predictionCol()
  Param for prediction column name.
  Specified by: `[predictionCol](../../../../../org/apache/spark/ml/param/shared/HasPredictionCol.html#predictionCol--)` in interface `[HasPredictionCol](../../../../../org/apache/spark/ml/param/shared/HasPredictionCol.html "interface in org.apache.spark.ml.param.shared")`
  Returns: (undocumented)
* #### seed
  public final [LongParam](../../../../../org/apache/spark/ml/param/LongParam.html "class in org.apache.spark.ml.param") seed()
  Description copied from interface: `[HasSeed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html#seed--)`
  Param for random seed.
  Specified by: `[seed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html#seed--)` in interface `[HasSeed](../../../../../org/apache/spark/ml/param/shared/HasSeed.html "interface in org.apache.spark.ml.param.shared")`
  Returns: (undocumented)
* #### featuresCol
  public final [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> featuresCol()
  Param for features column name.
  Specified by: `[featuresCol](../../../../../org/apache/spark/ml/param/shared/HasFeaturesCol.html#featuresCol--)` in interface `[HasFeaturesCol](../../../../../org/apache/spark/ml/param/shared/HasFeaturesCol.html "interface in org.apache.spark.ml.param.shared")`
  Returns: (undocumented)
* #### maxIter
  public final [IntParam](../../../../../org/apache/spark/ml/param/IntParam.html "class in org.apache.spark.ml.param") maxIter()
  Description copied from interface: `[HasMaxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html#maxIter--)`
  Param for maximum number of iterations (>= 0).
  Specified by: `[maxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html#maxIter--)` in interface `[HasMaxIter](../../../../../org/apache/spark/ml/param/shared/HasMaxIter.html "interface in org.apache.spark.ml.param.shared")`
  Returns: (undocumented)
* #### uid
  public String uid()
  An immutable unique ID for the object and its derivatives.
  Specified by: `[uid](../../../../../org/apache/spark/ml/util/Identifiable.html#uid--)` in interface `[Identifiable](../../../../../org/apache/spark/ml/util/Identifiable.html "interface in org.apache.spark.ml.util")`
  Returns: (undocumented)
* #### copy
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") copy([ParamMap](../../../../../org/apache/spark/ml/param/ParamMap.html "class in org.apache.spark.ml.param") extra)
  Description copied from interface: `[Params](../../../../../org/apache/spark/ml/param/Params.html#copy-org.apache.spark.ml.param.ParamMap-)`
  Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See `defaultCopy()`.
  Specified by: `[copy](../../../../../org/apache/spark/ml/param/Params.html#copy-org.apache.spark.ml.param.ParamMap-)` in interface `[Params](../../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param")`
  Specified by: `[copy](../../../../../org/apache/spark/ml/Estimator.html#copy-org.apache.spark.ml.param.ParamMap-)` in class `[Estimator](../../../../../org/apache/spark/ml/Estimator.html "class in org.apache.spark.ml")<[KMeansModel](../../../../../org/apache/spark/ml/clustering/KMeansModel.html "class in org.apache.spark.ml.clustering")>`
  Parameters: `extra` - (undocumented)
  Returns: (undocumented)
* #### setFeaturesCol
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setFeaturesCol(String value)
* #### setPredictionCol
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setPredictionCol(String value)
* #### setK
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setK(int value)
* #### setInitMode
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setInitMode(String value)
* #### setDistanceMeasure
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setDistanceMeasure(String value)
* #### setInitSteps
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setInitSteps(int value)
* #### setMaxIter
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setMaxIter(int value)
* #### setTol
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setTol(double value)
* #### setSeed
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setSeed(long value)
* #### setWeightCol
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setWeightCol(String value)
  Sets the value of param `weightCol`. If this is not set or empty, we treat all instance weights as 1.0. Default is not set, so all instances have weight one.
  Parameters: `value` - (undocumented)
  Returns: (undocumented)
* #### setSolver
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setSolver(String value)
  Sets the value of param `solver`. Default is "auto".
  Parameters: `value` - (undocumented)
  Returns: (undocumented)
* #### setMaxBlockSizeInMB
  public [KMeans](../../../../../org/apache/spark/ml/clustering/KMeans.html "class in org.apache.spark.ml.clustering") setMaxBlockSizeInMB(double value)
  Sets the value of param `maxBlockSizeInMB`. Default is 0.0, in which case 1.0 MB will be chosen.
  Parameters: `value` - (undocumented)
  Returns: (undocumented)
* #### fit
  public [KMeansModel](../../../../../org/apache/spark/ml/clustering/KMeansModel.html "class in org.apache.spark.ml.clustering") fit([Dataset](../../../../../org/apache/spark/sql/Dataset.html "class in org.apache.spark.sql")<?> dataset)
  Description copied from class: `[Estimator](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-)`
  Fits a model to the input data.
  Specified by: `[fit](../../../../../org/apache/spark/ml/Estimator.html#fit-org.apache.spark.sql.Dataset-)` in class `[Estimator](../../../../../org/apache/spark/ml/Estimator.html "class in org.apache.spark.ml")<[KMeansModel](../../../../../org/apache/spark/ml/clustering/KMeansModel.html "class in org.apache.spark.ml.clustering")>`
  Parameters: `dataset` - (undocumented)
  Returns: (undocumented)
* #### transformSchema
  public [StructType](../../../../../org/apache/spark/sql/types/StructType.html "class in org.apache.spark.sql.types") transformSchema([StructType](../../../../../org/apache/spark/sql/types/StructType.html "class in org.apache.spark.sql.types") schema)
  Check transform validity and derive the output schema from the input schema. We check validity for interactions between parameters during `transformSchema` and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by `Param.validate()`. A typical implementation should first conduct verification on schema change and parameter validity, including complex parameter interaction checks.
  Specified by: `[transformSchema](../../../../../org/apache/spark/ml/PipelineStage.html#transformSchema-org.apache.spark.sql.types.StructType-)` in class `[PipelineStage](../../../../../org/apache/spark/ml/PipelineStage.html "class in org.apache.spark.ml")`
  Parameters: `schema` - (undocumented)
  Returns: (undocumented)
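
The `distanceMeasure` param documented above accepts 'euclidean' and 'cosine'. The plain-Java sketch below shows how the two measures can assign the same point to different centers. It is an illustration with hypothetical names, not Spark's implementation, and it assumes that cosine distance is defined as one minus the cosine similarity of two non-zero vectors.

```java
// Illustrative sketch only; names are hypothetical and not part of the Spark API.
class DistanceMeasureSketch {

    /** Euclidean distance between two vectors of equal dimension. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Cosine distance, assumed here to be 1 - cosine similarity; inputs must be non-zero. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Index of the center closest to `point` under the named measure. */
    static int nearest(double[] point, double[][] centers, String measure) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centers.length; j++) {
            double d;
            if ("euclidean".equals(measure)) {
                d = euclidean(point, centers[j]);
            } else if ("cosine".equals(measure)) {
                d = cosine(point, centers[j]);
            } else {
                throw new IllegalArgumentException("Unsupported distance measure: " + measure);
            }
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] point = { 10.0, 1.0 };
        double[][] centers = { {1.0, 0.0}, {8.0, 8.0} };
        System.out.println("euclidean -> center " + nearest(point, centers, "euclidean"));
        System.out.println("cosine    -> center " + nearest(point, centers, "cosine"));
    }
}
```

In this example the point lies closer in space to the second center but points in nearly the same direction as the first, so the two measures disagree. That difference is why the choice of measure matters when vector magnitude is not meaningful for the data.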