Graph-based Mining of Complex Data

Graph-based data mining

IEEE Intelligent Systems, 2000

The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns in it. In response to this problem, researchers have developed techniques and systems for discovering concepts in databases [1-3]. Much of the collected data, however, has an explicit or implicit structural component (spatial or temporal), which few discovery systems are designed to handle [4]. So, in addition to the need to accelerate data mining of large databases, there is an urgent need to develop scalable tools for discovering concepts in structural databases. One method for discovering knowledge in structural data is the identification of common substructures within the data. Substructure discovery is the process of identifying concepts describing interesting and repetitive substructures within structural data. The discovered substructure concepts allow abstraction from the detailed data structure and provide relevant attributes for interpreting the data. The substructure discovery method is the basis of Subdue, which performs data mining on databases represented as graphs. The system performs two key data-mining techniques: unsupervised pattern discovery and supervised concept learning from examples. Our test applications have demonstrated the scalability and effectiveness of these techniques on a variety of structural databases.

Efficient Mining of Graph-Based Data

International Conference on Artificial Intelligence, 2000

With the increasing amount of structural data being collected, there arises a need to efficiently mine information from this type of data. The goal of this research is to provide a system that performs data mining on structural data represented as a labeled graph. We demonstrate how the graph-based discovery system Subdue can be used to perform...

Iterative Structure Discovery in Graph-Based Data

International Journal on Artificial Intelligence Tools, 2005

Much of current data mining research is focused on discovering sets of attributes that discriminate data entities into classes, such as shopping trends for a particular demographic group. In contrast, we are working to develop data mining techniques to discover patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event-driven or relationally structured. In this paper we present approaches to address two related challenges: the need to assimilate incremental data updates and the need to mine monolithic datasets. Many realistic problems are continuous in nature and therefore require a data mining approach that can evolve discovered knowledge over time. Similarly, many problems present data sets that are too large to fit into dynamic memory on conventional computer systems. We address incremental data mining by introducing a mechanism for summarizing discoveries from previous data increments so that the g...

Learning from Supervised Graphs

Studies in Computational Intelligence, 2007

We describe an approach to learning patterns in relational data represented as a graph. The approach, implemented in the Subdue system, searches for patterns that maximally compress the input graph. Subdue can be used for supervised learning, as well as unsupervised pattern discovery and clustering. Mining graph-based data raises challenges not found in linear attribute-value data. However, additional requirements can further complicate the problem. In particular, we describe how concepts can be learned from training examples which are embedded into a single connected graph, or supervised graph. We demonstrate the technique using data from a NASA SST domain as well as a homeland security domain.
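
As a rough illustration of what "patterns that maximally compress the input graph" can mean, the sketch below scores a candidate pattern by how much the graph shrinks when every non-overlapping instance of the pattern is collapsed into a single vertex. The size measure (vertices plus edges), the function names, and the example numbers are assumptions made for this illustration; the sketch is not Subdue's actual encoding.

```python
def graph_size(num_vertices, num_edges):
    """Hypothetical size measure: number of vertices plus number of edges."""
    return num_vertices + num_edges


def compression_value(graph_v, graph_e, pattern_v, pattern_e, num_instances):
    """Return size(G) / (size(S) + size(G|S)) for a pattern S in graph G.

    G|S is G after each of the num_instances non-overlapping instances
    of S has been replaced by one placeholder vertex. Values above 1
    mean the pattern compresses the graph.
    """
    # Each instance turns pattern_v vertices into one vertex ...
    compressed_v = graph_v - num_instances * (pattern_v - 1)
    # ... and removes the pattern's internal edges.
    compressed_e = graph_e - num_instances * pattern_e
    original = graph_size(graph_v, graph_e)
    compressed = graph_size(pattern_v, pattern_e) + graph_size(compressed_v, compressed_e)
    return original / compressed


# A triangle pattern (3 vertices, 3 edges) in a 20-vertex, 30-edge graph.
often = compression_value(20, 30, 3, 3, num_instances=4)
once = compression_value(20, 30, 3, 3, num_instances=1)
```

Under this measure a pattern that occurs only once never pays for its own description, so the search naturally favors substructures that are both large and repeated.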

Unsupervised and Supervised Pattern Learning in Graph Data

Cook/Mining Graph Data, 2006

The success of machine learning and data mining for business and scientific purposes has fueled the expansion of its scope to new representations and techniques. Much collected data is structural in nature, containing entities as well as relationships between these entities. Compelling data in bioinformatics [32], network intrusion detection [15], web analysis [2, 8], and social network analysis [7, 27] has become available that requires effective handling of structural data. The ability to learn... (Footnote 1: This work is partially supported by the National Science Foundation grants IIS-0505819 and IIS-0097517.)

Mining Patterns from Structured Data by Beam-Wise Graph-Based Induction

Lecture Notes in Computer Science, 2002

Graph-Based Induction (GBI) extracts typical patterns from graph data by stepwise pair expansion (pairwise chunking). It is very efficient because of its greedy search strategy, but at the same time it suffers from the incompleteness of its search. Its search capability is improved without adding much computational complexity by 1) incorporating a beam search, 2) using a different evaluation function to extract patterns that are more discriminatory than those simply occurring frequently, and 3) adopting canonical labeling to enumerate identical patterns accurately. This new algorithm, called Beam-wise GBI (B-GBI for short), was tested against the promoter dataset from the UCI repository and was shown to be successful in extracting discriminatory substructures. The effect of beam width on the number of discovered attributes and on predictive accuracy was evaluated. The best result obtained by this approach was better than the previously best known result. B-GBI was then applied to a real-world dataset, the Hepatitis dataset provided by Chiba University. Our very preliminary results indicate that B-GBI can actually handle graphs with a few thousand nodes and extract discriminatory patterns.
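
The candidate-selection step of pairwise chunking with a beam can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is undirected, pairs are ranked by plain frequency (the abstract says B-GBI uses a more discriminatory evaluation function), and the function name `top_pairs` and the toy graph are invented for the example. Rewriting the graph after a pair is chosen and canonical labeling are not shown.

```python
from collections import Counter


def top_pairs(labels, edges, beam_width):
    """Rank adjacent label pairs by frequency and keep the best beam_width.

    labels: dict mapping node id -> label
    edges:  iterable of (u, v) node-id pairs, treated as undirected
    """
    counts = Counter()
    for u, v in edges:
        # Sort the two labels so (a, b) and (b, a) count as the same pair.
        pair = tuple(sorted((labels[u], labels[v])))
        counts[pair] += 1
    # Highest count first; ties broken by the pair itself for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:beam_width]


labels = {0: "a", 1: "b", 2: "a", 3: "b", 4: "c"}
edges = [(0, 1), (2, 3), (1, 2), (3, 4)]
greedy = top_pairs(labels, edges, beam_width=1)
beam = top_pairs(labels, edges, beam_width=2)
```

With `beam_width=1` the selection reduces to the greedy choice of a single pair; a wider beam keeps alternatives alive that a purely greedy search would discard.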

Complete Mining of Frequent Patterns from Graphs: Mining Graph Data

Machine Learning - ML, 2003

Basket Analysis, which is a standard method for data mining, derives frequent itemsets from a database. However, its mining ability is limited to transaction data consisting of items. In reality, there are many applications where data are described in a more structural way, e.g. chemical compounds and Web browsing history. There are a few approaches in the field of machine learning that can discover characteristic patterns from graph-structured data. However, almost none of them are suitable for applications that require a complete search for all frequent subgraph patterns in the data. In this paper, we propose a novel principle and its algorithm that derive the characteristic patterns which frequently appear in graph-structured data. Our algorithm can derive all frequent induced subgraphs from both directed and undirected graph-structured data having loops (including self-loops) with labeled or unlabeled nodes and links. Its performance is evaluated through the applications t...
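
The notion of support carries over from itemsets to graphs: a pattern is frequent if it occurs in at least a minimum number of graph transactions. The sketch below shows only that counting idea, restricted to single labeled edges in undirected graphs; it does not implement the paper's algorithm for induced subgraphs. The function name, the data layout, and the toy molecules are assumptions made for the example.

```python
from collections import Counter


def frequent_edges(transactions, min_support):
    """Return labeled edges that occur in at least min_support graphs.

    transactions: list of (labels, edges) pairs, where labels maps
    node id -> label and edges is a list of undirected (u, v) pairs.
    """
    counts = Counter()
    for labels, edges in transactions:
        # A set ensures each graph contributes at most once per pattern.
        seen = {tuple(sorted((labels[u], labels[v]))) for u, v in edges}
        counts.update(seen)
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


transactions = [
    ({0: "C", 1: "O", 2: "H"}, [(0, 1), (0, 2)]),
    ({0: "C", 1: "O"}, [(0, 1)]),
    ({0: "N", 1: "H"}, [(0, 1)]),
]
result = frequent_edges(transactions, min_support=2)
```

A complete miner would then grow the surviving patterns one step at a time, pruning any candidate that contains an infrequent smaller pattern.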

Graph-Based Data Mining in Dynamic Networks: Empirical Comparison of Compression-Based and Frequency-Based Subgraph Mining

2008 IEEE International Conference on Data Mining Workshops, 2008

We propose a dynamic graph-based relational mining approach using graph-rewriting rules to learn patterns in networks that structurally change over time. A dynamic graph containing a sequence of graphs over time represents dynamic properties as well as structural properties of the network. Our approach discovers graph-rewriting rules, which describe the structural transformations between two sequential graphs over time, and also learns description rules that generalize over the discovered graph-rewriting rules. The discovered graph-rewriting rules show how networks change over time, and the description rules in the graph-rewriting rules show temporal patterns in the structural changes. We apply our approach to biological networks to understand how the biosystems change over time. Our compression-based discovery of the description rules is compared with the frequent subgraph mining approach using several evaluation metrics.
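
The raw material for a graph-rewriting rule is the difference between two consecutive snapshots. The sketch below computes only that difference, as sets of removed and added edges; the approach in the abstract goes further and learns substructures and description rules from such changes. Representing each snapshot as a set of edge tuples, and the function names, are assumptions made for the example.

```python
def rewriting_rule(g_prev, g_next):
    """Return (removed, added) edge sets that turn g_prev into g_next."""
    removed = g_prev - g_next
    added = g_next - g_prev
    return removed, added


def rules_over_time(graphs):
    """Compute one (removed, added) pair per consecutive snapshot pair."""
    return [rewriting_rule(a, b) for a, b in zip(graphs, graphs[1:])]


snapshots = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "D")},
    {("A", "B"), ("C", "D"), ("D", "E")},
]
rules = rules_over_time(snapshots)
```

Applying a rule is the inverse operation: `(g_prev - removed) | added` reproduces the next snapshot.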

Graph Mining: An Overview

2009

In the early years of data mining and knowledge discovery in databases, method development focused on rigidly and plainly structured data. Most often efforts were even confined to data that can be represented as a simple table, which describes a set of sample cases by attribute-value pairs. Recent years, however, have seen a constantly growing interest in the analysis of more complex data, with a less rigid and/or more sophisticated structure.

Mining edge-disjoint patterns in graph-relational data

Proc. Workshop on Data Mining for …, 2007

Diverse types of data are associated with proteins, including network and categorical data. While graph mining techniques have long focused on data with no more than one label per node, generalizations have recently been developed. We show that existing generalizations are not well suited to typical biological networks and are likely to return few or no results on protein regulatory networks. They are, furthermore, ill-suited to graphs that are dense or show the small-world property, which are typical features of biological networks. A graph-relational edge-disjoint instance mining algorithm (GR-EDI) is presented that resolves these problems. Our algorithm treats bipartite edges separately and only constrains unipartite edges to be disjoint. We introduce a new pattern constraint that recovers the downward closure property. The algorithm uses a search lattice traversal strategy that allows more effective mining of graphs that cannot be considered sparse due to hubs. Effectiveness is demonstrated for a real biological example. While existing techniques return few or no patterns, GR-EDI is able to extract many patterns.
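
To make the edge-disjointness constraint concrete, the sketch below selects a set of pattern instances in which no edge is used twice, while still allowing instances to share vertices such as hubs. The greedy selection, the function name, and the toy instances are assumptions made for the example; the abstract does not say how GR-EDI selects instances, and a greedy pass gives only a lower bound on the largest edge-disjoint set.

```python
def edge_disjoint_instances(instances):
    """Greedily pick instances so that no undirected edge is reused.

    instances: list of instances, each a list of (u, v) edges.
    Instances may share vertices; only shared edges cause a conflict.
    """
    used = set()
    chosen = []
    # Smaller instances first; sorted() is stable, so ties keep input order.
    for inst in sorted(instances, key=len):
        edges = {frozenset(e) for e in inst}  # (u, v) and (v, u) coincide
        if used.isdisjoint(edges):
            used |= edges
            chosen.append(inst)
    return chosen


# Three single-edge instances around hub vertex 0 do not conflict.
hub = edge_disjoint_instances([[(0, 1)], [(0, 2)], [(0, 3)]])
# The second path reuses edge (2, 3) and is therefore skipped.
paths = edge_disjoint_instances(
    [[(1, 2), (2, 3)], [(3, 2), (3, 4)], [(4, 5), (5, 6)]]
)
```

The hub case shows why an edge-based constraint suits dense networks: a vertex-disjoint rule would keep only one of the three instances around vertex 0.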