PrefixSpan (Spark 3.5.5 JavaDoc)
Object
- org.apache.spark.mllib.fpm.PrefixSpan
All Implemented Interfaces:
java.io.Serializable, org.apache.spark.internal.Logging
public class PrefixSpan
extends Object
implements org.apache.spark.internal.Logging, scala.Serializable
A parallel PrefixSpan algorithm to mine frequent sequential patterns. The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth.
param: minSupport the minimal support level of the sequential pattern; any pattern that appears more than (minSupport * size-of-the-dataset) times will be output.
param: maxPatternLength the maximal length of the sequential pattern.
param: maxLocalProjDBSize the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing. If a projected database exceeds this size, another iteration of distributed prefix growth is run.
See Also:
Sequential Pattern Mining (Wikipedia), Serialized Form
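The three parameters above correspond one-to-one to the setter methods documented below. A minimal configuration sketch in Java; the support threshold of 0.5 and pattern length of 5 are illustrative values, not defaults or recommendations:

```java
import org.apache.spark.mllib.fpm.PrefixSpan;

public class PrefixSpanConfigSketch {
  public static void main(String[] args) {
    // Each setter returns the same PrefixSpan instance, so the calls can be chained.
    PrefixSpan prefixSpan = new PrefixSpan()
        .setMinSupport(0.5)                 // output patterns appearing more than 0.5 * (number of sequences) times
        .setMaxPatternLength(5)             // do not mine sequential patterns longer than 5 items
        .setMaxLocalProjDBSize(32000000L);  // projected databases at or below this size are mined locally
  }
}
```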
Nested Class Summary
Nested Classes
static class PrefixSpan.FreqSequence&lt;Item&gt;
    Represents a frequent sequence.
static class PrefixSpan.Postfix$
static class PrefixSpan.Prefix$

Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging:
org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary
Constructors
PrefixSpan()
    Constructs a default instance with default parameters {minSupport: 0.1, maxPatternLength: 10, maxLocalProjDBSize: 32000000L}.

Method Summary
long getMaxLocalProjDBSize()
    Gets the maximum number of items allowed in a projected database before local processing.
int getMaxPatternLength()
    Gets the maximal pattern length (i.e. the length of the longest sequential pattern to consider).
double getMinSupport()
    Gets the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
&lt;Item,Itemset extends Iterable&lt;Item&gt;,Sequence extends Iterable&lt;Itemset&gt;&gt; PrefixSpanModel&lt;Item&gt; run(JavaRDD&lt;Sequence&gt; data)
    A Java-friendly version of run() that reads sequences from a JavaRDD and returns frequent sequences in a PrefixSpanModel.
&lt;Item&gt; PrefixSpanModel&lt;Item&gt; run(RDD&lt;Object[]&gt; data, scala.reflect.ClassTag&lt;Item&gt; evidence$1)
    Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
PrefixSpan setMaxLocalProjDBSize(long maxLocalProjDBSize)
    Sets the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000L).
PrefixSpan setMaxPatternLength(int maxPatternLength)
    Sets the maximal pattern length (default: 10).
PrefixSpan setMinSupport(double minSupport)
    Sets the minimal support level (default: 0.1).

Methods inherited from class Object:
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface org.apache.spark.internal.Logging:
`$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize`
Constructor Detail
PrefixSpan
public PrefixSpan()
Constructs a default instance with default parameters {minSupport: `0.1`, maxPatternLength: `10`, maxLocalProjDBSize: `32000000L`}.
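A quick sketch of constructing the default instance and reading those defaults back through the getters; this exercises only the constructor and getters, so no SparkContext is needed:

```java
import org.apache.spark.mllib.fpm.PrefixSpan;

public class PrefixSpanDefaultsSketch {
  public static void main(String[] args) {
    PrefixSpan prefixSpan = new PrefixSpan();                // default-configured instance
    System.out.println(prefixSpan.getMinSupport());          // 0.1
    System.out.println(prefixSpan.getMaxPatternLength());    // 10
    System.out.println(prefixSpan.getMaxLocalProjDBSize());  // 32000000
  }
}
```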
Method Detail
org$apache$spark$internal$Logging$$log_
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()

org$apache$spark$internal$Logging$$log__$eq
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)

getMinSupport
public double getMinSupport()
Gets the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
Returns: (undocumented)

setMinSupport
public PrefixSpan setMinSupport(double minSupport)
Sets the minimal support level (default: `0.1`).
Parameters: `minSupport` - (undocumented)
Returns: (undocumented)

getMaxPatternLength
public int getMaxPatternLength()
Gets the maximal pattern length (i.e. the length of the longest sequential pattern to consider).
Returns: (undocumented)

setMaxPatternLength
public PrefixSpan setMaxPatternLength(int maxPatternLength)
Sets the maximal pattern length (default: `10`).
Parameters: `maxPatternLength` - (undocumented)
Returns: (undocumented)

getMaxLocalProjDBSize
public long getMaxLocalProjDBSize()
Gets the maximum number of items allowed in a projected database before local processing.
Returns: (undocumented)

setMaxLocalProjDBSize
public PrefixSpan setMaxLocalProjDBSize(long maxLocalProjDBSize)
Sets the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: `32000000L`).
Parameters: `maxLocalProjDBSize` - (undocumented)
Returns: (undocumented)

run
public &lt;Item&gt; PrefixSpanModel&lt;Item&gt; run(RDD&lt;Object[]&gt; data, scala.reflect.ClassTag&lt;Item&gt; evidence$1)
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
Parameters: `data` - sequences of itemsets. `evidence$1` - (undocumented)
Returns: a PrefixSpanModel that contains the frequent patterns

run
public &lt;Item,Itemset extends Iterable&lt;Item&gt;,Sequence extends Iterable&lt;Itemset&gt;&gt; PrefixSpanModel&lt;Item&gt; run(JavaRDD&lt;Sequence&gt; data)
A Java-friendly version of `run()` that reads sequences from a `JavaRDD` and returns frequent sequences in a PrefixSpanModel.
Parameters: `data` - ordered sequences of itemsets stored as a Java Iterable of Iterables
Returns: a PrefixSpanModel that contains the frequent sequential patterns
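An end-to-end sketch of the Java-friendly run() overload, assuming a local SparkContext; the tiny four-sequence dataset and the 0.5 support threshold are illustrative only:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.fpm.PrefixSpan;
import org.apache.spark.mllib.fpm.PrefixSpanModel;

public class PrefixSpanRunSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("PrefixSpanRunSketch").setMaster("local[2]");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Each sequence is an ordered list of itemsets; each itemset is a list of items.
    JavaRDD<List<List<Integer>>> sequences = sc.parallelize(Arrays.asList(
        Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
        Arrays.asList(Arrays.asList(1), Arrays.asList(3, 2), Arrays.asList(1, 2)),
        Arrays.asList(Arrays.asList(1, 2), Arrays.asList(5)),
        Arrays.asList(Arrays.asList(6))), 2);

    PrefixSpan prefixSpan = new PrefixSpan()
        .setMinSupport(0.5)
        .setMaxPatternLength(5);

    PrefixSpanModel<Integer> model = prefixSpan.run(sequences);

    // FreqSequence.javaSequence() exposes the pattern as a List of itemsets; freq() is its support count.
    for (PrefixSpan.FreqSequence<Integer> freqSeq : model.freqSequences().toJavaRDD().collect()) {
      System.out.println(freqSeq.javaSequence() + ", " + freqSeq.freq());
    }

    sc.stop();
  }
}
```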