A Graph Mining Approach for Ranking and Discovering the Interesting Frequent Subgraph Patterns

A-RAFF: A Ranked Frequent Pattern-growth Subgraph Pattern Discovery Approach

Journal of Internet Technology, 2019

Graph mining is a branch of data mining in which voluminous, complex data are represented as graphs and mined to infer useful knowledge. Frequent subgraph mining (FSM) is an active research field and is considered the essence of graph mining. FSM is defined as finding all the subgraph patterns that occur frequently over the entire set of graphs. FSM is extensively used in graph clustering, classification, and index building in databases. In the literature, different FSM algorithms have been proposed, such as AGM, FSG, SPIN, SUBDUE, gSpan, FFSM, CloseGraph, and GREW. Most of these FSM techniques perform very well on small to medium-sized graph datasets, but the computational cost of FSM becomes critical as the graph size increases. In addition, the number of frequent subgraph patterns grows exponentially with the increasing size of graph datasets. Consequently, in this research work, a novel FSM approach A RAnked Frequent p...
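The definition of FSM given above (patterns that occur frequently over the entire set of graphs) can be illustrated with the simplest possible case: counting, for every single-edge pattern, how many graphs in a database contain it. This is a minimal sketch and not the A-RAFF method; the graph representation, the function names, and the `min_support` threshold are assumptions made for illustration only.

```python
from collections import Counter

# Assumed representation: a graph is a dict with
#   "labels": vertex id -> label, and "edges": list of undirected (u, v) pairs.

def edge_patterns(graph):
    """Return the set of single-edge patterns (sorted label pairs) in one graph."""
    labels = graph["labels"]
    return {tuple(sorted((labels[u], labels[v]))) for u, v in graph["edges"]}

def frequent_edges(database, min_support):
    """Support of a pattern = number of graphs containing it; keep frequent ones."""
    support = Counter()
    for graph in database:
        # edge_patterns returns a set, so each graph counts at most once per pattern
        support.update(edge_patterns(graph))
    return {p: s for p, s in support.items() if s >= min_support}

# Hypothetical toy database of three small labeled graphs
database = [
    {"labels": {0: "C", 1: "O", 2: "N"}, "edges": [(0, 1), (0, 2)]},
    {"labels": {0: "C", 1: "O"}, "edges": [(0, 1)]},
    {"labels": {0: "C", 1: "N", 2: "N"}, "edges": [(0, 1), (1, 2)]},
]

result = frequent_edges(database, min_support=2)
print(result)
```

Larger patterns require subgraph isomorphism tests to count support, which is where the computational cost mentioned in the abstract comes from.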

A Survey of Frequent Subgraph Mining Algorithms

2012

Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective.
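Goal (i) above, generating candidates without duplicates, is commonly handled by mapping each candidate to a canonical form so that isomorphic candidates collapse to one key. The sketch below uses a brute-force canonical form over all vertex orderings, which is only practical for very small patterns; it is an illustrative assumption, not the scheme used by any specific surveyed algorithm, and the function names are hypothetical.

```python
from itertools import permutations

def canonical_form(labels, edges):
    """Smallest (labels, edges) code over all vertex renumberings (brute force)."""
    n = len(labels)
    best = None
    for perm in permutations(range(n)):
        # perm[old] is the new index assigned to vertex `old`
        new_labels = [None] * n
        for old, new in enumerate(perm):
            new_labels[new] = labels[old]
        new_edges = sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        code = (tuple(new_labels), tuple(new_edges))
        if best is None or code < best:
            best = code
    return best

def deduplicate(candidates):
    """Keep one representative per isomorphism class of labeled candidates."""
    seen = {}
    for labels, edges in candidates:
        seen.setdefault(canonical_form(labels, edges), (labels, edges))
    return list(seen.values())

# Hypothetical candidates: the first two are the same pattern (C bonded to O and N)
# written with different vertex numbering; the third has O in the middle.
candidates = [
    (["C", "O", "N"], [(0, 1), (0, 2)]),
    (["N", "C", "O"], [(0, 1), (1, 2)]),
    (["C", "O", "N"], [(0, 1), (1, 2)]),
]

unique = deduplicate(candidates)
print(len(unique))
```

The factorial cost of enumerating orderings is the reason practical miners rely on cheaper canonical codes and on generation strategies that avoid producing duplicates in the first place.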

IJERT-Review on Frequent Subgraph Pattern Mining Algorithms

International Journal of Engineering Research and Technology (IJERT), 2013

https://www.ijert.org/review-on-frequent-subgraph-pattern-mining-algorithms https://www.ijert.org/research/review-on-frequent-subgraph-pattern-mining-algorithms-IJERTV2IS100903.pdf Frequent subgraph pattern mining is one of the most popular research topics in data mining. The aim of graph mining is to find interesting patterns within data that represent novel knowledge. Nowadays, frequent subgraph mining is used in various domains such as chemical compounds, social networks, and biological networks. Mining patterns from a graph database is difficult because of subgraph testing and its related operations. This paper gives an overview of different subgraph mining algorithms, grouped by their approaches.

A Survey of Frequent Subgraphs and Subtree Mining Methods

A graph is a basic data structure which can be used to model complex structures and the relationships between them, such as XML documents, social networks, communication networks, chemical informatics, biology networks, and the structure of web pages. Frequent subgraph pattern mining is one of the most important fields in graph mining. In light of its many applications, there is extensive research in this area, such as analysis and processing of XML documents, document clustering and classification, image and video indexing, graph indexing for graph querying, routing in computer networks, web link analysis, drug design, and carcinogenesis. Several frequent pattern mining algorithms have been proposed in recent years, and new ones are introduced every day. Because these algorithms use various methods on different datasets, pattern mining types, and graph and tree representations, it is not easy to study them in terms of features and performance. This paper presents a brief report of an intensive investigation of current frequent subgraph and subtree mining algorithms. The algorithms were also categorized based on different features.

Comparative Study of Frequent Subgraph Algorithms

One of the main graph mining processes is frequent subgraph mining. Several algorithms have been developed to recognize common patterns in a graph database, including FSG, FFSM, gSpan and GASTON. This research is intended to analyse the behaviour of these four state-of-the-art algorithms through a set of experiments designed to identify whether one of them is the most efficient in every use case. If there is no absolute winner, we expect to define which FSM algorithm is the best choice for each scenario.

Performance Evaluation of Frequent Subgraph Discovery Techniques

Mathematical Problems in Engineering, 2014

Due to the rapid development of Internet technology and new scientific advances, the number of applications that model data as graphs is increasing, because graphs have highly expressive power to model complicated structures. Graph mining is a well-explored area of research which is gaining popularity in the data mining community. A graph is a general model to represent data and has been used in many domains such as cheminformatics, web information management systems, computer networks, and bioinformatics, to name a few. In graph mining, frequent subgraph discovery is a challenging task. Frequent subgraph mining is concerned with the discovery of those subgraphs that have frequent or multiple instances within the given graph dataset. In the literature, a large number of frequent subgraph mining algorithms have been proposed; these include FSG, AGM, gSpan, CloseGraph, SPIN, Gaston, and Mofa. The objective of this research work is to perform a quantitative comparison of the above listed techniques. The performance of these techniques has been evaluated through a number of experiments based on three different state-of-the-art graph datasets. This novel work will provide a base for anyone who is working to design a new frequent subgraph discovery technique.

Survey of Finding Frequent Patterns in Graph Mining: Algorithms and Techniques

Graphs have become increasingly important in modeling complicated structures, such as circuits, images, chemical compounds, protein structures, biological networks, social networks, the web, workflows, and XML documents. Many graph search algorithms have been developed in chemical informatics, computer vision, video indexing, and text retrieval. With the increasing demand for the analysis of large amounts of structured data, graph mining has become an active and important theme in data mining.

Efficient mining of frequent subgraphs in the presence of isomorphism

2003

Frequent subgraph mining is an active research topic in the data mining community. A graph is a general model to represent data and has been used in many domains like cheminformatics and bioinformatics. Mining patterns from graph databases is challenging since graph related operations, such as subgraph testing, generally have higher time complexity than the corresponding operations on itemsets, sequences, and trees, which have been studied extensively. In this paper, we propose a novel frequent subgraph mining algorithm: FFSM, which employs a vertical search scheme within an algebraic graph framework we have developed to reduce the number of redundant candidates proposed. Our empirical study on synthetic and real datasets demonstrates that FFSM achieves a substantial performance gain over the current state-of-the-art subgraph mining algorithm gSpan.

A general framework for mining frequent subgraphs from labeled graphs

2004

The derivation of frequent subgraphs from a dataset of labeled graphs has high computational complexity because the hard problems of isomorphism and subgraph isomorphism have to be solved as part of this derivation. To deal with this computational complexity, all previous approaches have focused on one particular kind of graph. In this paper, we propose an approach to conduct a complete search for various classes of frequent subgraphs in a massive dataset of labeled graphs within a practical time. The power of our approach comes from the algebraic representation of graphs, its associated operations and well-organized bias constraints to limit the search space efficiently. The performance has been evaluated using real world datasets, and the high scalability and flexibility of our approach have been confirmed with respect to the amount of data and the computation time.

FS3: A sampling based method for top-k frequent subgraph mining

2014 IEEE International Conference on Big Data (Big Data), 2014

Mining labeled subgraphs is a popular research task in data mining because of its potential application in many different scientific domains. All the existing methods for this task explicitly or implicitly solve the subgraph isomorphism task, which is computationally expensive, so they suffer from a lack of scalability when the graphs in the input database are large. In this work, we propose FS3, which is a sampling based method. It mines a small collection of subgraphs that are most frequent in the probabilistic sense. FS3 performs a Markov Chain Monte Carlo (MCMC) sampling over the space of fixed-size subgraphs such that the potentially frequent subgraphs are sampled more often. Besides, FS3 is equipped with an innovative queue manager. It stores the sampled subgraphs in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on databases of large graphs show that FS3 is efficient, and that it obtains subgraphs that are the most frequent amongst the subgraphs of a given size.
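The idea of keeping the most frequent sampled subgraphs in a finite queue can be sketched with a bounded min-heap. This is not the queue manager described in the FS3 paper, whose details are not given in the abstract; the class name `TopKQueue`, its methods, and the use of string pattern identifiers with a precomputed support value are all assumptions for illustration.

```python
import heapq

class TopKQueue:
    """Bounded container that retains the k patterns with the highest support."""

    def __init__(self, k):
        self.k = k
        self._heap = []        # min-heap of (support, pattern); weakest entry on top
        self._members = set()  # patterns currently stored, to skip repeated samples

    def offer(self, pattern, support):
        """Insert a sampled pattern, evicting the weakest entry if the queue is full."""
        if pattern in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (support, pattern))
            self._members.add(pattern)
        elif support > self._heap[0][0]:
            _, evicted = heapq.heapreplace(self._heap, (support, pattern))
            self._members.discard(evicted)
            self._members.add(pattern)

    def top(self):
        """Return stored (support, pattern) pairs, most frequent first."""
        return sorted(self._heap, reverse=True)

# Hypothetical stream of sampled patterns with their estimated supports
queue = TopKQueue(k=2)
for pattern, support in [("A-B", 5), ("B-C", 2), ("A-C", 7), ("A-B", 5), ("C-D", 1)]:
    queue.offer(pattern, support)

print(queue.top())
```

A heap keeps each insertion at logarithmic cost in k, which matters when a sampler revisits the queue at every step of a long random walk.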