PowerIterationClustering (Spark 3.5.5 JavaDoc)
Object
- org.apache.spark.ml.clustering.PowerIterationClustering
All Implemented Interfaces:
java.io.Serializable, PowerIterationClusteringParams, Params, HasMaxIter, HasWeightCol, DefaultParamsWritable, Identifiable, MLWritable
public class PowerIterationClustering
extends Object
implements PowerIterationClusteringParams, DefaultParamsWritable
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data.
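A minimal, single-machine sketch of the truncated power iteration described above follows. It is an illustration under stated assumptions, not Spark's distributed implementation: the affinity matrix $A$ is small, dense, symmetric, and nonnegative; it is row-normalized into $W = D^{-1}A$, where $D$ is the diagonal matrix of row sums (degrees); the start vector is each vertex's degree divided by the total degree (similar in spirit to the "degree" initMode); and each step computes $v \leftarrow Wv / \lVert Wv \rVert_1$. Stopping after a fixed number of steps is the truncation: run long enough, $v$ flattens toward a constant, so stopping early keeps the cluster structure visible. The class `PicEmbeddingSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;

/**
 * Hypothetical single-machine sketch of PIC's truncated power iteration.
 * Assumes a small, dense, symmetric, nonnegative affinity matrix.
 */
final class PicEmbeddingSketch {

    private PicEmbeddingSketch() {}

    /** Row-normalizes A into W = D^-1 A, where D holds the row sums (degrees). */
    static double[][] normalize(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = Arrays.stream(a[i]).sum();
            for (int j = 0; j < n; j++) {
                // Isolated vertices (degree 0) keep an all-zero row.
                w[i][j] = degree == 0.0 ? 0.0 : a[i][j] / degree;
            }
        }
        return w;
    }

    /** Degree-based start vector: each vertex's degree divided by the total degree. */
    static double[] degreeInit(double[][] a) {
        double[] v = new double[a.length];
        double total = 0.0;
        for (int i = 0; i < a.length; i++) {
            v[i] = Arrays.stream(a[i]).sum();
            total += v[i];
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("affinity matrix has no nonzero entries");
        }
        for (int i = 0; i < v.length; i++) {
            v[i] /= total;
        }
        return v;
    }

    /** Applies maxIter steps of v <- W v / ||W v||_1 and returns the 1-D embedding. */
    static double[] embed(double[][] a, int maxIter) {
        double[][] w = normalize(a);
        double[] v = degreeInit(a);
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[v.length];
            double l1 = 0.0;
            for (int i = 0; i < v.length; i++) {
                for (int j = 0; j < v.length; j++) {
                    next[i] += w[i][j] * v[j];
                }
                l1 += Math.abs(next[i]);
            }
            // Rescale so the entries sum to 1 in absolute value.
            for (int i = 0; i < next.length; i++) {
                next[i] /= l1;
            }
            v = next;
        }
        return v;
    }
}
```

On two dense groups joined by a single weak edge, ten iterations leave the values nearly equal inside each group while the two groups stay clearly apart, which is the property a later clustering step relies on.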
This class is not yet an Estimator/Transformer; use the assignClusters
method to run the PowerIterationClustering algorithm.
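`assignClusters` returns one cluster per vertex. In the PIC paper, that final step runs k-means on the one-dimensional embedding produced by power iteration. The sketch below is a hypothetical, single-machine illustration of that step, not Spark's implementation: the class `PicAssignSketch`, its `assign` method, and the deterministic center initialization (centers spread over the sorted values) are assumptions. It returns an id-to-cluster map shaped like the documented output columns `id: Long` and `cluster: Int`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of PIC's final step: k-means on the 1-D embedding,
 * producing an (id -> cluster) map shaped like assignClusters' output.
 */
final class PicAssignSketch {

    private PicAssignSketch() {}

    static Map<Long, Integer> assign(long[] ids, double[] embedding, int k, int maxIter) {
        if (ids.length != embedding.length) {
            throw new IllegalArgumentException("ids and embedding must have the same length");
        }
        if (k < 2 || k > embedding.length) {
            // The documented k param must be > 1; k > n would leave empty clusters.
            throw new IllegalArgumentException("k must be in [2, number of vertices]");
        }
        if (maxIter < 1) {
            throw new IllegalArgumentException("this sketch needs maxIter >= 1");
        }
        // Deterministic initialization: spread k centers over the sorted values.
        double[] sorted = embedding.clone();
        Arrays.sort(sorted);
        double[] centers = new double[k];
        for (int c = 0; c < k; c++) {
            centers[c] = sorted[(int) ((long) c * (sorted.length - 1) / (k - 1))];
        }
        int[] labels = new int[embedding.length];
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: attach each value to its nearest center.
            for (int i = 0; i < embedding.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(embedding[i] - centers[c]) < Math.abs(embedding[i] - centers[best])) {
                        best = c;
                    }
                }
                labels[i] = best;
            }
            // Update step: move each non-empty cluster's center to its mean.
            double[] sums = new double[k];
            int[] counts = new int[k];
            for (int i = 0; i < embedding.length; i++) {
                sums[labels[i]] += embedding[i];
                counts[labels[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    centers[c] = sums[c] / counts[c];
                }
            }
        }
        Map<Long, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < ids.length; i++) {
            result.put(ids[i], labels[i]);
        }
        return result;
    }
}
```

Because the embedding is one-dimensional, nearest-center assignment reduces to comparing absolute differences, which keeps the sketch short.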
See Also:
Spectral clustering (Wikipedia), Serialized Form
Constructor Summary
Constructors
| Constructor and Description |
|---|
| `PowerIterationClustering()` |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| `Dataset<Row>` | `assignClusters(Dataset<?> dataset)` | Runs the PIC algorithm and returns a cluster assignment for each input vertex. |
| `PowerIterationClustering` | `copy(ParamMap extra)` | Creates a copy of this instance with the same UID and some extra params. |
| `Param<String>` | `dstCol()` | Name of the input column for destination vertex IDs. |
| `Param<String>` | `initMode()` | Param for the initialization algorithm. |
| `IntParam` | `k()` | The number of clusters to create (k). |
| `static PowerIterationClustering` | `load(String path)` | |
| `IntParam` | `maxIter()` | Param for maximum number of iterations (>= 0). |
| `Param<?>[]` | `params()` | Returns all params sorted by their names. |
| `static MLReader<T>` | `read()` | |
| `PowerIterationClustering` | `setDstCol(String value)` | |
| `PowerIterationClustering` | `setInitMode(String value)` | |
| `PowerIterationClustering` | `setK(int value)` | |
| `PowerIterationClustering` | `setMaxIter(int value)` | |
| `PowerIterationClustering` | `setSrcCol(String value)` | |
| `PowerIterationClustering` | `setWeightCol(String value)` | |
| `Param<String>` | `srcCol()` | Param for the name of the input column for source vertex IDs. |
| `String` | `uid()` | An immutable unique ID for the object and its derivatives. |
| `Param<String>` | `weightCol()` | Param for weight column name. |
* ### Methods inherited from class Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
* ### Methods inherited from interface org.apache.spark.ml.clustering.PowerIterationClusteringParams
`getDstCol, getInitMode, getK, getSrcCol`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
`getMaxIter`
* ### Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
`getWeightCol`
* ### Methods inherited from interface org.apache.spark.ml.param.Params
`clear, copyValues, defaultCopy, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn`
* ### Methods inherited from interface org.apache.spark.ml.util.Identifiable
`toString`
* ### Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable
`write`
* ### Methods inherited from interface org.apache.spark.ml.util.MLWritable
`save`
Constructor Detail
* #### PowerIterationClustering
`public PowerIterationClustering()`
Method Detail
* #### load
`public static PowerIterationClustering load(String path)`

* #### read
`public static MLReader<T> read()`

* #### k
`public final IntParam k()`
The number of clusters to create (k). Must be > 1. Default: 2.
Specified by: `k` in interface `PowerIterationClusteringParams`
Returns: (undocumented)

* #### initMode
`public final Param<String> initMode()`
Param for the initialization algorithm. This can be either "random", to use a random vector as vertex properties, or "degree", to use a normalized sum of similarities with other vertices. Default: random.
Specified by: `initMode` in interface `PowerIterationClusteringParams`
Returns: (undocumented)

* #### srcCol
`public Param<String> srcCol()`
Param for the name of the input column for source vertex IDs.
Default: "src"
Specified by: `srcCol` in interface `PowerIterationClusteringParams`
Returns: (undocumented)

* #### dstCol
`public Param<String> dstCol()`
Name of the input column for destination vertex IDs. Default: "dst"
Specified by: `dstCol` in interface `PowerIterationClusteringParams`
Returns: (undocumented)

* #### weightCol
`public final Param<String> weightCol()`
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
Specified by: `weightCol` in interface `HasWeightCol`
Returns: (undocumented)

* #### maxIter
`public final IntParam maxIter()`
Description copied from interface: `HasMaxIter`
Param for maximum number of iterations (>= 0).
Specified by: `maxIter` in interface `HasMaxIter`
Returns: (undocumented)

* #### params
`public Param<?>[] params()`
Description copied from interface: `Params`
Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return `Param`.
Specified by: `params` in interface `Params`
Returns: (undocumented)

* #### uid
`public String uid()`
An immutable unique ID for the object and its derivatives.
Specified by: `uid` in interface `Identifiable`
Returns: (undocumented)

* #### setK
`public PowerIterationClustering setK(int value)`

* #### setInitMode
`public PowerIterationClustering setInitMode(String value)`

* #### setMaxIter
`public PowerIterationClustering setMaxIter(int value)`

* #### setSrcCol
`public PowerIterationClustering setSrcCol(String value)`

* #### setDstCol
`public PowerIterationClustering setDstCol(String value)`

* #### setWeightCol
`public PowerIterationClustering setWeightCol(String value)`

* #### assignClusters
`public Dataset<Row> assignClusters(Dataset<?> dataset)`
Runs the PIC algorithm and returns a cluster assignment for each input vertex.
Parameters: `dataset` - A dataset with columns src, dst, and weight representing the affinity matrix, which is the matrix A in the PIC paper.
Suppose the src column value is $i$, the dst column value is $j$, and the weight column value is the similarity $s_{ij}$, which must be nonnegative. This is a symmetric matrix, hence $s_{ij} = s_{ji}$. For any $(i, j)$ with nonzero similarity, there should be either $(i, j, s_{ij})$ or $(j, i, s_{ji})$ in the input. Rows with $i = j$ are ignored, because we assume $s_{ii} = 0.0$.
Returns: A dataset that contains columns of vertex id and the corresponding cluster for the id. Its schema is:
- id: Long
- cluster: Int

* #### copy
`public PowerIterationClustering copy(ParamMap extra)`
Description copied from interface: `Params`
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See `defaultCopy()`.
Specified by: `copy` in interface `Params`
Parameters: `extra` - (undocumented)
Returns: (undocumented)
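The input contract of `assignClusters` can be illustrated with a small, hypothetical Java sketch that uses plain collections instead of a Spark `Dataset`. The names `AffinityInputSketch`, `Edge`, `symmetrize`, and `degree` are illustrative, not Spark API. The sketch rejects negative similarities, drops rows with $i = j$, stores each pair in both directions because the matrix is symmetric, and defaults the weight to 1.0 when none is given, mirroring the `weightCol` description. Summing the weights of a pair that appears more than once is an assumption of the sketch; the documented contract expects each pair only once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the assignClusters input contract: turns
 * (src, dst, weight) rows into a symmetric adjacency map.
 */
final class AffinityInputSketch {

    private AffinityInputSketch() {}

    /** One input row; a stand-in for a Row with src, dst and weight columns. */
    static final class Edge {
        final long src;
        final long dst;
        final double weight;

        Edge(long src, long dst, double weight) {
            this.src = src;
            this.dst = dst;
            this.weight = weight;
        }

        /** Mirrors the documented rule that an unset weightCol means weight 1.0. */
        Edge(long src, long dst) {
            this(src, dst, 1.0);
        }
    }

    static Map<Long, Map<Long, Double>> symmetrize(List<Edge> edges) {
        Map<Long, Map<Long, Double>> adj = new HashMap<>();
        for (Edge e : edges) {
            if (e.weight < 0.0) {
                throw new IllegalArgumentException("similarity must be nonnegative: " + e.weight);
            }
            if (e.src == e.dst) {
                continue; // rows with i = j are ignored (s_ii is assumed to be 0)
            }
            // Store both directions, since s_ij = s_ji; one input row per pair suffices.
            // Assumption of this sketch: a repeated pair has its weights summed.
            adj.computeIfAbsent(e.src, x -> new HashMap<>()).merge(e.dst, e.weight, Double::sum);
            adj.computeIfAbsent(e.dst, x -> new HashMap<>()).merge(e.src, e.weight, Double::sum);
        }
        return adj;
    }

    /** Degree of a vertex: the sum of its similarities to all other vertices. */
    static double degree(Map<Long, Map<Long, Double>> adj, long vertex) {
        return adj.getOrDefault(vertex, Map.of()).values().stream()
                .mapToDouble(Double::doubleValue)
                .sum();
    }
}
```

The `degree` helper computes the per-vertex sum of similarities, which is the quantity the "degree" initMode normalizes.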