PrefixSpan (Spark 3.5.5 JavaDoc)
Object
- org.apache.spark.ml.fpm.PrefixSpan
All Implemented Interfaces:
java.io.Serializable, Params, Identifiable
public final class PrefixSpan
extends Object
implements Params
A parallel PrefixSpan algorithm to mine frequent sequential patterns. The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. This class is not yet an Estimator/Transformer; use the findFrequentSequentialPatterns method to run the PrefixSpan algorithm.
See Also:
Sequential Pattern Mining (Wikipedia), Serialized Form
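To make the notion of a sequential pattern over sequences of itemsets concrete, the following is a minimal plain-Java sketch of support counting. It does not use Spark and is not how this class is implemented; the class name `SupportSketch`, its helper methods, and the toy data are hypothetical and exist only for illustration. It assumes a pattern occurs in a sequence when each pattern itemset is a subset of a distinct sequence itemset, in the same order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Spark: counts how many sequences contain a pattern.
final class SupportSketch {

    /** Toy input: four sequences, each a list of itemsets (illustrative data only). */
    static List<List<List<Integer>>> sampleSequences() {
        return Arrays.asList(
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
            Arrays.asList(Arrays.asList(1), Arrays.asList(3, 2), Arrays.asList(1, 2)),
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(5)),
            Arrays.asList(Arrays.asList(6)));
    }

    /**
     * True if every itemset of the pattern is a subset of some itemset of the
     * sequence, with the matched itemsets appearing in the same order.
     */
    static boolean contains(List<List<Integer>> sequence, List<List<Integer>> pattern) {
        int next = 0; // index of the next pattern itemset still to be matched
        for (List<Integer> itemset : sequence) {
            if (next == pattern.size()) {
                break;
            }
            Set<Integer> items = new HashSet<>(itemset);
            if (items.containsAll(pattern.get(next))) {
                next++; // greedy earliest match is sufficient for containment
            }
        }
        return next == pattern.size();
    }

    /** Number of sequences that contain the pattern. */
    static long support(List<List<List<Integer>>> sequences, List<List<Integer>> pattern) {
        return sequences.stream().filter(s -> contains(s, pattern)).count();
    }

    public static void main(String[] args) {
        List<List<List<Integer>>> db = sampleSequences();
        System.out.println(support(db, Arrays.asList(Arrays.asList(1, 2))));                // 3
        System.out.println(support(db, Arrays.asList(Arrays.asList(1), Arrays.asList(3)))); // 2
        double minSupport = 0.5;
        // The product the minSupport parameter refers to: minSupport * size-of-the-dataset.
        System.out.println(minSupport * db.size()); // 2.0
    }
}
```

The nested `List<List<Integer>>` shape mirrors the `ArrayType(ArrayType(T))` sequence column that findFrequentSequentialPatterns expects, with `T` taken to be an integer type here.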
Constructor Summary
Constructors
Constructor and Description
PrefixSpan()
PrefixSpan(String uid)
Method Summary
| Modifier and Type | Method and Description |
| --- | --- |
| `PrefixSpan` | `copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params. |
| `Dataset<Row>` | `findFrequentSequentialPatterns(Dataset<?> dataset)` Finds the complete set of frequent sequential patterns in the input sequences of itemsets. |
| `long` | `getMaxLocalProjDBSize()` |
| `int` | `getMaxPatternLength()` |
| `double` | `getMinSupport()` |
| `String` | `getSequenceCol()` |
| `LongParam` | `maxLocalProjDBSize()` Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000). |
| `IntParam` | `maxPatternLength()` Param for the maximal pattern length (default: 10). |
| `DoubleParam` | `minSupport()` Param for the minimal support level (default: 0.1). |
| `Param<?>[]` | `params()` Returns all params sorted by their names. |
| `Param<String>` | `sequenceCol()` Param for the name of the sequence column in dataset (default "sequence"); rows with nulls in this column are ignored. |
| `PrefixSpan` | `setMaxLocalProjDBSize(long value)` |
| `PrefixSpan` | `setMaxPatternLength(int value)` |
| `PrefixSpan` | `setMinSupport(double value)` |
| `PrefixSpan` | `setSequenceCol(String value)` |
| `String` | `uid()` An immutable unique ID for the object and its derivatives. |
* ### Methods inherited from class Object
  `equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
* ### Methods inherited from interface org.apache.spark.ml.param.[Params](../../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param")
  `[clear](../../../../../org/apache/spark/ml/param/Params.html#clear-org.apache.spark.ml.param.Param-), [copyValues](../../../../../org/apache/spark/ml/param/Params.html#copyValues-T-org.apache.spark.ml.param.ParamMap-), [defaultCopy](../../../../../org/apache/spark/ml/param/Params.html#defaultCopy-org.apache.spark.ml.param.ParamMap-), [explainParam](../../../../../org/apache/spark/ml/param/Params.html#explainParam-org.apache.spark.ml.param.Param-), [explainParams](../../../../../org/apache/spark/ml/param/Params.html#explainParams--), [extractParamMap](../../../../../org/apache/spark/ml/param/Params.html#extractParamMap--), [extractParamMap](../../../../../org/apache/spark/ml/param/Params.html#extractParamMap-org.apache.spark.ml.param.ParamMap-), [get](../../../../../org/apache/spark/ml/param/Params.html#get-org.apache.spark.ml.param.Param-), [getDefault](../../../../../org/apache/spark/ml/param/Params.html#getDefault-org.apache.spark.ml.param.Param-), [getOrDefault](../../../../../org/apache/spark/ml/param/Params.html#getOrDefault-org.apache.spark.ml.param.Param-), [getParam](../../../../../org/apache/spark/ml/param/Params.html#getParam-java.lang.String-), [hasDefault](../../../../../org/apache/spark/ml/param/Params.html#hasDefault-org.apache.spark.ml.param.Param-), [hasParam](../../../../../org/apache/spark/ml/param/Params.html#hasParam-java.lang.String-), [isDefined](../../../../../org/apache/spark/ml/param/Params.html#isDefined-org.apache.spark.ml.param.Param-), [isSet](../../../../../org/apache/spark/ml/param/Params.html#isSet-org.apache.spark.ml.param.Param-), [onParamChange](../../../../../org/apache/spark/ml/param/Params.html#onParamChange-org.apache.spark.ml.param.Param-), [set](../../../../../org/apache/spark/ml/param/Params.html#set-org.apache.spark.ml.param.Param-T-), [set](../../../../../org/apache/spark/ml/param/Params.html#set-org.apache.spark.ml.param.ParamPair-), [set](../../../../../org/apache/spark/ml/param/Params.html#set-java.lang.String-java.lang.Object-), [setDefault](../../../../../org/apache/spark/ml/param/Params.html#setDefault-org.apache.spark.ml.param.Param-T-), [setDefault](../../../../../org/apache/spark/ml/param/Params.html#setDefault-scala.collection.Seq-), [shouldOwn](../../../../../org/apache/spark/ml/param/Params.html#shouldOwn-org.apache.spark.ml.param.Param-)`
* ### Methods inherited from interface org.apache.spark.ml.util.[Identifiable](../../../../../org/apache/spark/ml/util/Identifiable.html "interface in org.apache.spark.ml.util")
  `[toString](../../../../../org/apache/spark/ml/util/Identifiable.html#toString--)`
Constructor Detail
* #### PrefixSpan
  public PrefixSpan(String uid)
* #### PrefixSpan
  public PrefixSpan()
Method Detail
* #### copy
  public [PrefixSpan](../../../../../org/apache/spark/ml/fpm/PrefixSpan.html "class in org.apache.spark.ml.fpm") copy([ParamMap](../../../../../org/apache/spark/ml/param/ParamMap.html "class in org.apache.spark.ml.param") extra)
  Description copied from interface: `[Params](../../../../../org/apache/spark/ml/param/Params.html#copy-org.apache.spark.ml.param.ParamMap-)`
  Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See `defaultCopy()`.
  Specified by: `[copy](../../../../../org/apache/spark/ml/param/Params.html#copy-org.apache.spark.ml.param.ParamMap-)` in interface `[Params](../../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param")`
  Parameters: `extra` \- (undocumented)
  Returns: (undocumented)
* #### findFrequentSequentialPatterns
  public [Dataset](../../../../../org/apache/spark/sql/Dataset.html "class in org.apache.spark.sql")<[Row](../../../../../org/apache/spark/sql/Row.html "interface in org.apache.spark.sql")> findFrequentSequentialPatterns([Dataset](../../../../../org/apache/spark/sql/Dataset.html "class in org.apache.spark.sql")<?> dataset)
  Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
  Parameters: `dataset` \- A Dataset or DataFrame containing a sequence column of type `ArrayType(ArrayType(T))`, where T is the item type of the input dataset.
  Returns: A `DataFrame` that contains a sequence column and the corresponding frequency column. Its schema is:
  - `sequence: ArrayType(ArrayType(T))` (T is the item type)
  - `freq: Long`
* #### getMaxLocalProjDBSize
  public long getMaxLocalProjDBSize()
* #### getMaxPatternLength
  public int getMaxPatternLength()
* #### getMinSupport
  public double getMinSupport()
* #### getSequenceCol
  public String getSequenceCol()
* #### maxLocalProjDBSize
  public [LongParam](../../../../../org/apache/spark/ml/param/LongParam.html "class in org.apache.spark.ml.param") maxLocalProjDBSize()
  Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: `32000000`). If a projected database exceeds this size, another iteration of distributed prefix growth is run.
  Returns: (undocumented)
* #### maxPatternLength
  public [IntParam](../../../../../org/apache/spark/ml/param/IntParam.html "class in org.apache.spark.ml.param") maxPatternLength()
  Param for the maximal pattern length (default: `10`).
  Returns: (undocumented)
* #### minSupport
  public [DoubleParam](../../../../../org/apache/spark/ml/param/DoubleParam.html "class in org.apache.spark.ml.param") minSupport()
  Param for the minimal support level (default: `0.1`). Sequential patterns that appear more than (minSupport \* size-of-the-dataset) times are identified as frequent sequential patterns.
  Returns: (undocumented)
* #### params
  public [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<?>[] params()
  Description copied from interface: `[Params](../../../../../org/apache/spark/ml/param/Params.html#params--)`
  Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param").
  Specified by: `[params](../../../../../org/apache/spark/ml/param/Params.html#params--)` in interface `[Params](../../../../../org/apache/spark/ml/param/Params.html "interface in org.apache.spark.ml.param")`
  Returns: (undocumented)
* #### sequenceCol
  public [Param](../../../../../org/apache/spark/ml/param/Param.html "class in org.apache.spark.ml.param")<String> sequenceCol()
  Param for the name of the sequence column in dataset (default "sequence"); rows with nulls in this column are ignored.
  Returns: (undocumented)
* #### setMaxLocalProjDBSize
  public [PrefixSpan](../../../../../org/apache/spark/ml/fpm/PrefixSpan.html "class in org.apache.spark.ml.fpm") setMaxLocalProjDBSize(long value)
* #### setMaxPatternLength
  public [PrefixSpan](../../../../../org/apache/spark/ml/fpm/PrefixSpan.html "class in org.apache.spark.ml.fpm") setMaxPatternLength(int value)
* #### setMinSupport
  public [PrefixSpan](../../../../../org/apache/spark/ml/fpm/PrefixSpan.html "class in org.apache.spark.ml.fpm") setMinSupport(double value)
* #### setSequenceCol
  public [PrefixSpan](../../../../../org/apache/spark/ml/fpm/PrefixSpan.html "class in org.apache.spark.ml.fpm") setSequenceCol(String value)
* #### uid
  public String uid()
  An immutable unique ID for the object and its derivatives.
  Specified by: `[uid](../../../../../org/apache/spark/ml/util/Identifiable.html#uid--)` in interface `[Identifiable](../../../../../org/apache/spark/ml/util/Identifiable.html "interface in org.apache.spark.ml.util")`
  Returns: (undocumented)
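The role of maxLocalProjDBSize can be illustrated with a small plain-Java sketch of a projected database. This is a simplification under stated assumptions, not Spark's internal representation: every itemset is assumed to hold a single item, so a sequence is just a list of items; the size counts items only and ignores the delimiters mentioned in the parameter description; and the names `ProjectedDbSketch`, `project`, `itemCount`, and `fitsLocally` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: projects sequences on a one-item prefix.
final class ProjectedDbSketch {

    /** Toy input: each inner list is one sequence of single items (illustrative data only). */
    static List<List<Integer>> sampleSequences() {
        return Arrays.asList(
            Arrays.asList(1, 2, 3),
            Arrays.asList(1, 3, 2, 1, 2),
            Arrays.asList(1, 2, 5),
            Arrays.asList(6));
    }

    /**
     * Projected database for a one-item prefix: for every sequence that contains
     * the prefix item, keep the non-empty suffix after its first occurrence.
     */
    static List<List<Integer>> project(List<List<Integer>> sequences, int prefixItem) {
        List<List<Integer>> projected = new ArrayList<>();
        for (List<Integer> seq : sequences) {
            int pos = seq.indexOf(prefixItem);
            if (pos >= 0 && pos + 1 < seq.size()) {
                projected.add(new ArrayList<>(seq.subList(pos + 1, seq.size())));
            }
        }
        return projected;
    }

    /** Total number of items across all suffixes (delimiters are not modeled here). */
    static long itemCount(List<List<Integer>> projected) {
        long count = 0;
        for (List<Integer> suffix : projected) {
            count += suffix.size();
        }
        return count;
    }

    /** A projected database that does not exceed the limit can be processed locally. */
    static boolean fitsLocally(long projectedSize, long maxLocalProjDBSize) {
        return projectedSize <= maxLocalProjDBSize;
    }

    public static void main(String[] args) {
        List<List<Integer>> projected = project(sampleSequences(), 1);
        long size = itemCount(projected);
        System.out.println(projected);                      // [[2, 3], [3, 2, 1, 2], [2, 5]]
        System.out.println(size);                           // 8
        System.out.println(fitsLocally(size, 32_000_000L)); // true
        System.out.println(fitsLocally(size, 5L));          // false
    }
}
```

In this sketch a `false` result corresponds to the case the parameter description covers with "another iteration of distributed prefix growth is run".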