PrefixSpan (Spark 3.5.5 JavaDoc)
Object
- org.apache.spark.mllib.fpm.PrefixSpan
All Implemented Interfaces:
java.io.Serializable, org.apache.spark.internal.Logging
public class PrefixSpan
extends Object
implements org.apache.spark.internal.Logging, scala.Serializable
A parallel PrefixSpan algorithm to mine frequent sequential patterns. The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth.
param: minSupport the minimal support level of the sequential pattern; any pattern that appears more than (minSupport * size-of-the-dataset) times will be output.
param: maxPatternLength the maximal length of the sequential pattern.
param: maxLocalProjDBSize the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing. If a projected database exceeds this size, another iteration of distributed prefix growth is run.
See Also:
Sequential Pattern Mining (Wikipedia), Serialized Form
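The three parameters above correspond one-to-one to the setter methods documented below. A minimal configuration sketch in Java; the support threshold of 0.5 and pattern length of 5 are illustrative values, not defaults or recommendations:

```java
import org.apache.spark.mllib.fpm.PrefixSpan;

public class PrefixSpanConfigSketch {
  public static void main(String[] args) {
    // Each setter returns the same PrefixSpan instance, so the calls can be chained.
    PrefixSpan prefixSpan = new PrefixSpan()
        .setMinSupport(0.5)                 // output patterns appearing more than 0.5 * (number of sequences) times
        .setMaxPatternLength(5)             // do not mine sequential patterns longer than 5 items
        .setMaxLocalProjDBSize(32000000L);  // projected databases at or below this size are mined locally
  }
}
```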
Nested Class Summary
Nested Classes
static class PrefixSpan.FreqSequence&lt;Item&gt;
    Represents a frequent sequence.
static class PrefixSpan.Postfix$
static class PrefixSpan.Prefix$

Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging:
org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary
Constructors
PrefixSpan()
    Constructs a default instance with default parameters {minSupport: 0.1, maxPatternLength: 10, maxLocalProjDBSize: 32000000L}.

Method Summary
long getMaxLocalProjDBSize()
    Gets the maximum number of items allowed in a projected database before local processing.
int getMaxPatternLength()
    Gets the maximal pattern length (i.e. the length of the longest sequential pattern to consider).
double getMinSupport()
    Gets the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
&lt;Item,Itemset extends Iterable&lt;Item&gt;,Sequence extends Iterable&lt;Itemset&gt;&gt; PrefixSpanModel&lt;Item&gt; run(JavaRDD&lt;Sequence&gt; data)
    A Java-friendly version of run() that reads sequences from a JavaRDD and returns frequent sequences in a PrefixSpanModel.
&lt;Item&gt; PrefixSpanModel&lt;Item&gt; run(RDD&lt;Object[]&gt; data, scala.reflect.ClassTag&lt;Item&gt; evidence$1)
    Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
PrefixSpan setMaxLocalProjDBSize(long maxLocalProjDBSize)
    Sets the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000L).
PrefixSpan setMaxPatternLength(int maxPatternLength)
    Sets the maximal pattern length (default: 10).
PrefixSpan setMinSupport(double minSupport)
    Sets the minimal support level (default: 0.1).

Methods inherited from class Object:
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface org.apache.spark.internal.Logging:
`$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize`
Constructor Detail
PrefixSpan
public PrefixSpan()
Constructs a default instance with default parameters {minSupport: `0.1`, maxPatternLength: `10`, maxLocalProjDBSize: `32000000L`}.
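A quick sketch of constructing the default instance and reading those defaults back through the getters; this exercises only the constructor and getters, so no SparkContext is needed:

```java
import org.apache.spark.mllib.fpm.PrefixSpan;

public class PrefixSpanDefaultsSketch {
  public static void main(String[] args) {
    PrefixSpan prefixSpan = new PrefixSpan();                // default-configured instance
    System.out.println(prefixSpan.getMinSupport());          // 0.1
    System.out.println(prefixSpan.getMaxPatternLength());    // 10
    System.out.println(prefixSpan.getMaxLocalProjDBSize());  // 32000000
  }
}
```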
Method Detail
org$apache$spark$internal$Logging$$log_
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()

org$apache$spark$internal$Logging$$log__$eq
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)

getMinSupport
public double getMinSupport()
Gets the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
Returns: (undocumented)

setMinSupport
public PrefixSpan setMinSupport(double minSupport)
Sets the minimal support level (default: `0.1`).
Parameters: `minSupport` - (undocumented)
Returns: (undocumented)

getMaxPatternLength
public int getMaxPatternLength()
Gets the maximal pattern length (i.e. the length of the longest sequential pattern to consider).
Returns: (undocumented)

setMaxPatternLength
public PrefixSpan setMaxPatternLength(int maxPatternLength)
Sets the maximal pattern length (default: `10`).
Parameters: `maxPatternLength` - (undocumented)
Returns: (undocumented)

getMaxLocalProjDBSize
public long getMaxLocalProjDBSize()
Gets the maximum number of items allowed in a projected database before local processing.
Returns: (undocumented)

setMaxLocalProjDBSize
public PrefixSpan setMaxLocalProjDBSize(long maxLocalProjDBSize)
Sets the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: `32000000L`).
Parameters: `maxLocalProjDBSize` - (undocumented)
Returns: (undocumented)

run
public &lt;Item&gt; PrefixSpanModel&lt;Item&gt; run(RDD&lt;Object[]&gt; data, scala.reflect.ClassTag&lt;Item&gt; evidence$1)
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
Parameters: `data` - sequences of itemsets. `evidence$1` - (undocumented)
Returns: a PrefixSpanModel that contains the frequent patterns

run
public &lt;Item,Itemset extends Iterable&lt;Item&gt;,Sequence extends Iterable&lt;Itemset&gt;&gt; PrefixSpanModel&lt;Item&gt; run(JavaRDD&lt;Sequence&gt; data)
A Java-friendly version of `run()` that reads sequences from a `JavaRDD` and returns frequent sequences in a PrefixSpanModel.
Parameters: `data` - ordered sequences of itemsets stored as a Java Iterable of Iterables
Returns: a PrefixSpanModel that contains the frequent sequential patterns
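An end-to-end sketch of the Java-friendly run() overload, assuming a local SparkContext; the tiny four-sequence dataset and the 0.5 support threshold are illustrative only:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.fpm.PrefixSpan;
import org.apache.spark.mllib.fpm.PrefixSpanModel;

public class PrefixSpanRunSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("PrefixSpanRunSketch").setMaster("local[2]");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Each sequence is an ordered list of itemsets; each itemset is a list of items.
    JavaRDD<List<List<Integer>>> sequences = sc.parallelize(Arrays.asList(
        Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
        Arrays.asList(Arrays.asList(1), Arrays.asList(3, 2), Arrays.asList(1, 2)),
        Arrays.asList(Arrays.asList(1, 2), Arrays.asList(5)),
        Arrays.asList(Arrays.asList(6))), 2);

    PrefixSpan prefixSpan = new PrefixSpan()
        .setMinSupport(0.5)
        .setMaxPatternLength(5);

    PrefixSpanModel<Integer> model = prefixSpan.run(sequences);

    // FreqSequence.javaSequence() exposes the pattern as a List of itemsets; freq() is its support count.
    for (PrefixSpan.FreqSequence<Integer> freqSeq : model.freqSequences().toJavaRDD().collect()) {
      System.out.println(freqSeq.javaSequence() + ", " + freqSeq.freq());
    }

    sc.stop();
  }
}
```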