A Survey on Frequent Pattern Mining Techniques in Sequence Data Sets

Finding Sequential Patterns from Large Sequence Data

2010

Data mining is the task of discovering interesting patterns in large amounts of data. It covers many tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining extracts frequent subsequences, that is, ordered sets of data items that occur together frequently, from a sequence database. It has attracted a great deal of interest in recent data mining research because it is the basis of many applications, such as web user analysis, stock trend prediction, DNA sequence analysis, finding linguistic patterns in natural language texts, and using the history of symptoms to predict certain kinds of disease. Given the diversity of these applications, it may not be possible to apply a single sequential pattern model to all of them; each application may require its own model and solution. A number of research projects have been established in recent years to develop meaningful sequential pattern models and efficient algorithms for mining these patterns. In this paper, we provide a brief theoretical overview of three types of sequential pattern models.

Frequent Pattern Mining and Analysis in DNA Subsequence

Frequent Pattern Mining (FPM) algorithms help to mine subsequences, which may be contiguous or non-contiguous patterns. Mining such patterns is an important problem that requires efficient algorithms. It has various applications, such as the discovery of motifs in DNA sequences, the financial industry, the analysis of web logs, customer shopping sequences, and the investigation of scientific or medical processes. A more appropriate algorithm for pattern matching is needed. In this work, an approximate pattern matching algorithm is used to find approximate subsequences. The proposed method is tested on matching frequent approximate subsequences. It can be used to check whether two or more sequences match or to find the occurrences of a given pattern. The method is effective because it always finds the pattern if the pattern exists.
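The abstract above does not say which notion of approximation it uses. As a minimal sketch, assuming that "approximate" means a bounded number of character mismatches (Hamming distance), an exhaustive window scan could look like the following. The function name `approximate_matches` and the parameter `max_mismatches` are hypothetical and are not taken from the paper.

```python
from typing import List


def approximate_matches(text: str, pattern: str, max_mismatches: int) -> List[int]:
    """Return start positions where pattern matches text with at most
    max_mismatches differing characters (Hamming distance)."""
    m = len(pattern)
    hits = []
    # Every window of length m is examined, so no qualifying occurrence is missed.
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, abandon this window
        else:
            # The inner loop finished without break: the window qualifies.
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(approximate_matches("ACGTTACGA", "ACGT", 1))
```

Because every window is checked, the scan finds the pattern whenever it exists within the allowed mismatch budget, at a cost proportional to the text length times the pattern length.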

IJERT-A Time and Space Efficient Algorithm for Mining Sequential Pattern

International Journal of Engineering Research and Technology (IJERT), 2014

https://www.ijert.org/a-time-and-space-efficient-algorithm-for-mining-sequential-pattern
https://www.ijert.org/research/a-time-and-space-efficient-algorithm-for-mining-sequential-pattern-IJERTV3IS091040.pdf

Sequential pattern mining is an important data mining technique that extracts frequent patterns from given sequences. It is used in various fields such as medical treatment, customer shopping sequences, DNA sequences, and gene structures. Sequential pattern mining approaches are classified into two categories: the Apriori (generate-and-test) approach and the pattern-growth (divide-and-conquer) approach. In this paper, we introduce a more time- and space-efficient algorithm for sequential pattern mining. The proposed algorithm consumes less time and space than previous algorithms. We compare two pattern-growth algorithms for sequential pattern mining: P-prefixspan, which discovers frequent sequential patterns using the probability of inter-arrival time, and the newly proposed algorithm, named the Precursive algorithm. Our experiments show that the proposed algorithm is more efficient and scalable than the P-prefixspan algorithm.
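To make the pattern-growth (divide-and-conquer) idea concrete, here is a minimal sketch in the style of plain PrefixSpan, simplified to sequences of single items. It is not P-prefixspan and not the algorithm proposed in the paper; the function name `prefixspan` and the representation of a projected database as `(sequence index, offset)` pairs are choices made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def prefixspan(database: Sequence[Sequence[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return {pattern: support} for all frequent subsequences of single items."""
    results: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...], projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence index, offset) pairs: the suffix of each
        # sequence that remains after the earliest match of the current prefix.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seen = set()
            seq = database[seq_id]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:  # count each sequence once per item
                    seen.add(item)
                    occurrences[item].append((seq_id, pos + 1))
        for item, new_projected in occurrences.items():
            if len(new_projected) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(new_projected)
                grow(pattern, new_projected)  # recurse on the smaller projection

    grow((), [(i, 0) for i in range(len(database))])
    return results


if __name__ == "__main__":
    for pattern, sup in sorted(prefixspan(["abcb", "acb", "bca"], 2).items()):
        print("".join(pattern), sup)
```

Unlike a generate-and-test approach, no candidate is produced unless it already occurs in the projected database, and each recursive call works on a smaller projection than its parent.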

AN ENHANCED ALGORITHM FOR FREQUENT PATTERN MINING FROM BIOLOGICAL SEQUENCES

Bio-data analysis deals with the vital problem of similarity search and of finding relationships among biological sequences and structures. In this paper, we address the problem of discovering the most frequently occurring patterns in a given DNA or protein sequence. Several existing tools require the user to specify gap constraints in advance in order to find specific patterns. In practice, it is not possible for the user to provide these gap constraints. The need therefore arises for an algorithm that finds the patterns on its own, without the user having to specify gap constraints. We have two analytical methods to find the recurrent subsequences and estimate the maximum support, with complexity O(|T| · Sup), where |T| is the length of the text sequence and Sup is the number of occurrences of the pattern. We propose a modified version of the previously proposed algorithm with complexity O(|T|).

Comparative Study of Various Sequential Pattern Mining Algorithms

International Journal of Computer Applications, 2014

Sequential pattern mining represents an important class of data mining problems with a wide range of applications. It is a very challenging problem because it requires the careful scanning of a combinatorially large number of possible subsequence patterns. Broadly, sequential pattern mining algorithms can be classified into three types: Apriori-based approaches, pattern-growth algorithms, and early-pruning algorithms. These algorithms have further classifications and extensions. A detailed explanation of each algorithm, along with its important features, pseudocode, advantages, and disadvantages, is given in the subsequent sections of the paper. At the end, a comparative analysis of all the algorithms and their supporting features is given in the form of a table. This paper aims to enrich the knowledge and understanding of the various approaches to sequential pattern mining.

Frequent Contiguous Pattern Mining Algorithms for Biological Data Sequences

Transaction sequences in market-basket analysis have a large alphabet and short length, whereas biological sequences have a small alphabet and long length, with gaps. The pattern-finding algorithms for these two kinds of sequences therefore differ. Short repeated patterns are more likely to occur in bio-sequences than in transaction sequences. These repeatedly occurring short patterns are called Frequent Contiguous Patterns (FCPs). Finding FCPs is the challenging task in pattern finding for bio-sequences. FCPs give clues for genetic discovery and functional analysis, and also help to assemble the whole genome of a species. Most existing FCP algorithms are based on the Apriori method. They require repeated scans of the database and a large number of intermediate tables to produce their results, and therefore need large space and high computation time. In this paper, we analyze a few of the currently available FCP algorithms along with their advantages and disadvantages.
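As a minimal sketch of what a frequent contiguous pattern is, the following counts fixed-length substrings (k-mers) in a single pass over each sequence. This is a plain counting approach, not one of the Apriori-based FCP algorithms the paper analyzes. The function name and the choice to define support as the number of sequences containing the pattern are assumptions made for this illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def frequent_contiguous_patterns(sequences: Iterable[str], k: int, min_support: int) -> Dict[str, int]:
    """Return {pattern: support} for contiguous patterns of length k,
    where support is the number of sequences containing the pattern."""
    counts: Counter = Counter()
    for seq in sequences:
        # A set ensures each pattern is counted at most once per sequence.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return {p: c for p, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    dna = ["ACGTACGT", "TACGA", "GGACG"]
    print(frequent_contiguous_patterns(dna, k=3, min_support=2))
```

With the small alphabet of DNA, the number of distinct patterns of length k is bounded by 4^k, which is why short contiguous patterns recur so often in such data.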

PERFORMANCE COMPARISON BETWEEN PATTERN GROWTH ALGORITHMS FOR MINING SEQUENTIAL PATTERN

Prachi Batwara, Basant Verma, Ph.D.

Sequential pattern mining is an important concept in data mining that finds frequent patterns in given sequences. It is used in various domains such as medical treatment, customer shopping sequences, DNA sequences, and gene structures. Sequential pattern mining approaches are classified into two categories: the Apriori (generate-and-test) approach and the pattern-growth (divide-and-conquer) approach. In this paper, we introduce a more efficient algorithm for sequential pattern mining. The proposed algorithm consumes less time and space than previous algorithms. We compare two pattern-growth algorithms for sequential pattern mining: P-prefix span, which discovers frequent sequential patterns using the probability of inter-arrival time, and the newly proposed algorithm, named the Percussive algorithm. Our experiments show that the proposed algorithm is more efficient and scalable than the P-prefix span algorithm. Keywords: Data Mining, Sequential Pattern Mining, Frequent Item set, Support count, Sequence database.

Pattern directed mining of sequence data

1998

Sequence data arise naturally in many applications, and can be viewed as an ordering of events, where each event has an associated time of occurrence. An important characteristic of event sequences is the occurrence of episodes, i.e. a collection of events occurring in a certain pattern. Of special interest are frequent episodes, i.e. episodes occurring with a frequency above a certain threshold. In this paper, we study the problem of mining for frequent episodes in sequence data. We present a framework for efficient mining of frequent episodes which goes beyond previous work in a number of ways. First, we present a language for specifying episodes of interest. Second, we describe a novel data structure, called the sequential pattern tree (SP Tree), which captures the relationships specified in the pattern language in a very compact manner. Third, we show how this data structure can be used by a standard bottom-up mining algorithm to generate frequent episodes in an efficient manner. Finally, we show how the SP Tree can be optimized by sharing common conditions, and evaluating each such expression only once. We present the results of an evaluation of the proposed techniques.

A Review of Frequent sequential Pattern Mining Methods

2017

Among the mining capabilities of the various data mining methodologies, there are several interesting extensions of frequent pattern mining. The discovery of sequential patterns is one of them, and it has a vast array of real-world applications. Extending the memory indexing approach for efficient mining of generalized sequential patterns is worthy of study. This paper presents a critical review of sequential pattern mining methods.

Mining Sequences - Approaches and Analysis

Sequential pattern mining discovers, from a database of sequences, the sequential patterns that meet a user-specified minimum support, where the support of a pattern is the number of sequences that contain it. Each sequence in the database is a list of transactions ordered by transaction time, and each transaction is a set of items. Closed sequential pattern mining has the same capability as sequential pattern mining, but it reduces the number of redundant patterns that are generated and stored, which makes it more economical. This paper presents the approaches and key features of the algorithms ClaSP, CM-ClaSP, CloSpan, and BIDE, which mine closed sequential patterns, as well as those of GSP, SPADE, PrefixSpan, SPAM, and LAPIN, which mine sequential patterns. It shows that the number of sequences generated in closed sequential pattern mining is much smaller than in sequential pattern mining. The algorithms are compared on the total time required to find frequent sequences, the number of frequent sequences generated, and the maximum memory required.
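The definition of support given above can be sketched directly: a sequence is a time-ordered list of transactions, each transaction is a set of items, and a pattern is contained in a sequence if its itemsets are subsets of distinct transactions in the same order. The function names `contains` and `support` and the small example database are illustrative assumptions, not taken from the paper.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def contains(sequence: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """True if each itemset of pattern is a subset of a distinct
    transaction of sequence, with the order of itemsets preserved."""
    pos = 0
    for itemset in pattern:
        # Greedily take the earliest transaction that covers this itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next itemset must match a strictly later transaction
    return True


def support(database: Sequence[Sequence[Itemset]], pattern: Sequence[Itemset]) -> int:
    """Number of sequences in database that contain pattern."""
    return sum(1 for seq in database if contains(seq, pattern))


def seq(*transactions: str) -> List[Itemset]:
    """Helper: seq("a", "abc") builds [{a}, {a, b, c}] from strings of items."""
    return [frozenset(t) for t in transactions]


if __name__ == "__main__":
    db = [
        seq("a", "abc", "ac", "d", "cf"),
        seq("ad", "c", "bc", "ae"),
        seq("ef", "ab", "df", "c", "b"),
        seq("e", "g", "af", "c", "b", "c"),
    ]
    print(support(db, seq("a", "c")), support(db, seq("ab", "c")))
```

A pattern is frequent when its support reaches the user-specified minimum. It is closed when no super-pattern has the same support, which is why storing only closed patterns loses no support information.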