PrefixSpan (Spark 3.5.5 JavaDoc)


public class PrefixSpan
extends Object
implements org.apache.spark.internal.Logging, scala.Serializable
A parallel PrefixSpan algorithm for mining frequent sequential patterns. The PrefixSpan algorithm is described in J. Pei et al., "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth".
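The core idea of prefix-projected pattern growth can be shown with a small, single-machine sketch. It is not Spark's implementation. It assumes that every sequence element is a single item (the algorithm itself also handles itemsets), it ignores the distributed/local split controlled by maxLocalProjDBSize, and the names `PrefixSpanSketch`, `FreqPattern` and `mine` are hypothetical. A pattern is treated as frequent when it occurs in at least ceil(minSupport * n) of the n input sequences, which is an assumption of this sketch. Starting from the empty prefix, each step counts items in the current projected database (the suffixes that follow the prefix), keeps the frequent ones, and recurses on the suffixes that follow each kept item.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical single-machine sketch of prefix-projected pattern growth.
// Simplification: every sequence element is a single item (no itemsets).
final class PrefixSpanSketch {

    // A frequent pattern and the number of sequences that contain it.
    record FreqPattern(List<Integer> items, int support) {}

    static List<FreqPattern> mine(List<List<Integer>> db, double minSupport, int maxPatternLength) {
        // Absolute threshold: number of sequences a pattern must appear in (assumed ceil).
        long minCount = (long) Math.ceil(minSupport * db.size());
        // Projected database entries are {sequence index, start offset} pairs;
        // initially every sequence is projected from position 0.
        List<int[]> projected = new ArrayList<>();
        for (int i = 0; i < db.size(); i++) {
            projected.add(new int[] {i, 0});
        }
        List<FreqPattern> out = new ArrayList<>();
        grow(db, projected, new ArrayList<>(), minCount, maxPatternLength, out);
        return out;
    }

    private static void grow(List<List<Integer>> db, List<int[]> projected, List<Integer> prefix,
                             long minCount, int maxPatternLength, List<FreqPattern> out) {
        if (prefix.size() >= maxPatternLength) {
            return; // patterns may not grow beyond maxPatternLength items
        }
        // Count each item at most once per suffix: support counts sequences, not occurrences.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int[] p : projected) {
            List<Integer> seq = db.get(p[0]);
            Set<Integer> distinct = new HashSet<>(seq.subList(p[1], seq.size()));
            for (int item : distinct) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() < minCount) {
                continue; // infrequent items cannot extend the prefix
            }
            int item = e.getKey();
            List<Integer> extended = new ArrayList<>(prefix);
            extended.add(item);
            out.add(new FreqPattern(extended, e.getValue()));
            // Project again: keep what follows the first occurrence of item in each suffix.
            List<int[]> next = new ArrayList<>();
            for (int[] p : projected) {
                List<Integer> seq = db.get(p[0]);
                for (int j = p[1]; j < seq.size(); j++) {
                    if (seq.get(j) == item) {
                        next.add(new int[] {p[0], j + 1});
                        break;
                    }
                }
            }
            grow(db, next, extended, minCount, maxPatternLength, out);
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> db = List.of(List.of(1, 2, 3), List.of(1, 3), List.of(2, 3), List.of(3, 1));
        for (FreqPattern p : mine(db, 0.5, 10)) {
            System.out.println(p.items() + " " + p.support());
        }
    }
}
```

With minSupport 0.5 over four sequences the threshold is two sequences, so `main` prints `[1] 3`, `[1, 3] 2`, `[2] 2`, `[2, 3] 2` and `[3] 4`; `[1, 2]` and `[3, 1]` each occur in only one sequence and are dropped. Storing suffixes as (index, offset) pairs rather than copying them is what keeps projection cheap.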
param: minSupport the minimal support level of a sequential pattern; a pattern is output if it is contained in at least (minSupport * number of sequences in the dataset) sequences
param: maxPatternLength the maximal length of a sequential pattern
param: maxLocalProjDBSize the maximum number of items (including the delimiters used in the internal storage format) allowed in a projected database before it is processed locally. If a projected database exceeds this size, another iteration of distributed prefix growth is run.
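The maxLocalProjDBSize rule can be made concrete with a hypothetical sketch of how a projected database might be measured and how the local-versus-distributed decision could be taken. The 0-delimited encoding below is an assumption modeled on the delimiter-based storage format mentioned above, not a description of Spark's internal classes, and `ProjDbSizeSketch`, `encode`, `size` and `processLocally` are illustrative names only. The point it shows is that delimiters count toward the size, so a sequence with k items in m itemsets occupies k + m + 1 slots in this encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the maxLocalProjDBSize rule; not Spark's code.
final class ProjDbSizeSketch {

    // Assumed flat encoding: a 0 delimiter before the first itemset, between
    // itemsets and after the last one, with items as positive ints.
    // Example: <(1,2),(3)> -> [0, 1, 2, 0, 3, 0].
    static int[] encode(List<List<Integer>> sequence) {
        List<Integer> flat = new ArrayList<>();
        flat.add(0);
        for (List<Integer> itemset : sequence) {
            flat.addAll(itemset);
            flat.add(0);
        }
        return flat.stream().mapToInt(Integer::intValue).toArray();
    }

    // Size of a projected database: every stored int counts, delimiters included.
    static long size(List<int[]> projectedDb) {
        long total = 0;
        for (int[] s : projectedDb) {
            total += s.length;
        }
        return total;
    }

    // true: small enough to finish locally; false: exceeds the limit, so run
    // another iteration of distributed prefix growth.
    static boolean processLocally(List<int[]> projectedDb, long maxLocalProjDBSize) {
        return size(projectedDb) <= maxLocalProjDBSize;
    }

    public static void main(String[] args) {
        List<int[]> db = List.of(
                encode(List.of(List.of(1, 2), List.of(3))), // 6 ints
                encode(List.of(List.of(4))));               // 3 ints
        System.out.println(Arrays.toString(db.get(0)));     // [0, 1, 2, 0, 3, 0]
        System.out.println(size(db));                       // 9
        System.out.println(processLocally(db, 9));          // true
        System.out.println(processLocally(db, 8));          // false
    }
}
```

A limit of 9 lets this two-sequence database be mined locally, while a limit of 8 would send it through another distributed round even though it holds only four actual items.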
See Also:
Sequential Pattern Mining (Wikipedia), Serialized Form