Empirical comparison of graph classification algorithms

GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2010

Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community.

Graph Classification using Machine Learning Algorithms

Monica Golahalli Seenappa

In the graph classification problem, we are given a family of graphs and a set of categories, and we aim to classify each graph in the family into one of the given categories. Earlier approaches, such as graph kernels and graph embedding techniques, have focused on extracting certain features by processing the entire graph. However, real-world graphs are complex and noisy, and these traditional approaches are computationally intensive. With the introduction of the deep learning framework, there have been numerous attempts to create more efficient classification approaches. For this project, we will be focusing on modifying an existing kernel graph convolutional neural network approach. In this approach, subgraphs (patches) are extracted from the graph using a community detection algorithm. These patches are provided as input to a graph kernel, and max pooling is applied. We will be experimenting with different community detection algorithms and graph kernels and comparing their efficiency and performance. For the experiments, we use eight publicly available real-world datasets, ranging from biological to social networks. Additionally, for these datasets we provide results using a baseline algorithm and a spectral decomposition of the graph Laplacian for comparison purposes.

An application of boosting to graph classification

Advances in neural information …, 2005

This paper presents an application of Boosting for classifying labeled graphs, general structures for modeling a number of real-world data, such as chemical compounds, natural language texts, and bio sequences. The proposal consists of i) decision stumps that use subgraphs as features, and ii) a Boosting algorithm in which subgraph-based decision stumps are used as weak learners. We also discuss the relation between our algorithm and SVMs with convolution kernels. Two experiments using natural language data and chemical compounds show that our method achieves comparable or even better performance than SVMs with convolution kernels while also improving testing efficiency.
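The idea of boosting over subgraph-based decision stumps can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that the set of subgraph patterns occurring in each graph has already been computed (here as plain strings such as "C-N"), whereas a full method would have to search the space of subgraphs. The function names, the toy data, and the use of standard AdaBoost weight updates are all assumptions made for illustration.

```python
import math

def stump_predict(patterns, feature, sign):
    """Decision stump: return `sign` if the pattern occurs in the graph, else -sign."""
    return sign if feature in patterns else -sign

def best_stump(graphs, labels, weights, candidates):
    """Return (error, feature, sign) for the stump with the lowest weighted error."""
    best = None
    for feature in candidates:
        for sign in (+1, -1):
            err = sum(w for g, y, w in zip(graphs, labels, weights)
                      if stump_predict(g, feature, sign) != y)
            if best is None or err < best[0]:
                best = (err, feature, sign)
    return best

def adaboost(graphs, labels, candidates, rounds=5):
    """Standard AdaBoost with pattern-presence stumps as weak learners."""
    n = len(graphs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, feature, sign = best_stump(graphs, labels, weights, candidates)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feature, sign))
        # Increase the weight of misclassified graphs, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(g, feature, sign))
                   for g, y, w in zip(graphs, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, patterns):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(patterns, f, s) for a, f, s in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data: each graph is the set of patterns it contains.
graphs = [{"C-N", "C=O"}, {"C-N"}, {"C=O"}, {"C-C"}]
labels = [1, 1, -1, -1]
candidates = sorted(set().union(*graphs))
model = adaboost(graphs, labels, candidates)
```

Because prediction only needs to test whether a handful of selected patterns occur in a new graph, a model of this shape is cheap to evaluate, which is consistent with the testing-efficiency claim in the abstract.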

Edge distance graph kernel and its application to small molecule classification

TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES

Graph classification is an important problem in graph mining with various applications in different fields. Kernel methods have been successfully applied to this problem, recently producing promising results. To apply kernel methods to a graph classification problem, one must define a graph kernel, and this choice largely determines classification performance. Although there are various previously proposed graph kernels, the problem is still worth investigating, as the available kernels are far from perfect. In this paper, we propose a new graph kernel based on a recently proposed concept called edge distance-k graphs. These new graphs are derived from the original graph and have the potential to be used as novel graph descriptors. We propose a method to convert these graphs into a multiset of strings that is further used to compute a kernel for graphs. The proposed graph kernel is then evaluated on various data sets in comparison to a recently proposed group of graph kernels. The results are promising, both in terms of performance and computational requirements.

Two New Graph Kernels and Applications to Chemoinformatics

Lecture Notes in Computer Science, 2011

Chemoinformatics is a well-established research field concerned with the discovery of molecules' properties through informational techniques. The computer science research fields most relevant to chemoinformatics are machine learning and graph theory. From this point of view, graph kernels provide a nice framework combining machine learning techniques with graph theory. Such kernels prove their efficiency on several chemoinformatics problems. This paper presents two new graph kernels applied to regression and classification problems within the chemoinformatics field. The first kernel is based on the notion of edit distance, while the second is based on subtree enumeration. Several experiments show the complementarity of the two approaches.

GPM: A graph pattern matching kernel with diffusion for chemical compound classification

2008 8th IEEE International Conference on BioInformatics and BioEngineering, 2008

Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogeneous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for chemical graphs emerges as a new challenge.

Graph Classification via Neural Networks

2016

For a long time, the preferred machine learning algorithms for graph classification have been kernel based. The reasoning has been that kernels represent an elegant way to handle structured data that cannot be easily represented using numerical vectors or matrices. An important reason for the success of kernel methods is the "kernel trick", which essentially replaces computing the feature representation with a call to a kernel function, thus saving computation and memory cost. However, for some of the most successful kernels in the graph domain, such as graphlets, this is not feasible, and one must compute the entire feature distribution in order to obtain the kernel. We present experimental evidence that using graphlet features presented to different neural networks gives comparable accuracy results to kernelized SVMs. As neural networks are parametric models that scale well with data size and can yield faster predictions than SVMs, our results suggest that they are attracti...
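The point about graphlets requiring an explicit feature distribution can be made concrete with a small sketch. It assumes the simplest case, induced subgraphs on three nodes, which fall into four types according to how many of the three possible edges are present. The function names and the toy graphs are hypothetical, and the brute-force enumeration over all node triples is chosen for clarity rather than efficiency.

```python
from itertools import combinations

def graphlet3_counts(n, edges):
    """Count induced 3-node subgraphs by number of edges (0, 1, 2 or 3)."""
    adj = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for trio in combinations(range(n), 3):
        k = sum(frozenset(pair) in adj for pair in combinations(trio, 2))
        counts[k] += 1
    return counts

def graphlet3_features(n, edges):
    """Normalize the counts into a graphlet distribution (explicit feature vector)."""
    counts = graphlet3_counts(n, edges)
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 4

def graphlet_kernel(f, g):
    """Kernel value as the dot product of two explicit feature vectors."""
    return sum(a * b for a, b in zip(f, g))

# Hypothetical toy graphs: a triangle and a path on four nodes.
triangle = (3, [(0, 1), (1, 2), (0, 2)])
path = (4, [(0, 1), (1, 2), (2, 3)])
f_triangle = graphlet3_features(*triangle)
f_path = graphlet3_features(*path)
k_value = graphlet_kernel(f_triangle, f_path)
```

Here the kernel is just an inner product of vectors that had to be computed anyway, so the same vectors can be handed directly to any other learner, a neural network included.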

GPM: A Graph Pattern Matching Kernel with Diffusion for Accurate Graph Classification

2008

Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community. In this paper, we demonstrate a novel technique called the Graph Pattern Matching kernel (GPM). Our idea is to leverage existing frequent pattern discovery methods and to explore the application of kernel classifiers (e.g., support vector machines) in building highly accurate graph classification. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the graph database and use a process we call "pattern diffusion" to label nodes in the graphs. Finally, we design a novel graph matching algorithm to compute a graph kernel. We have perf...

Two new graphs kernels in chemoinformatics

2012

Chemoinformatics is a well-established research field concerned with the discovery of molecules' properties through informational techniques. The computer science research fields most relevant to chemoinformatics are machine learning and graph theory. From this point of view, graph kernels provide a nice framework combining machine learning and graph theory techniques. Such kernels prove their efficiency on several chemoinformatics problems, and this paper presents two new graph kernels applied to regression and classification problems. The first kernel is based on the notion of edit distance, while the second is based on subtree enumeration. The design of this last kernel is based on a variable selection step in order to obtain kernels defined on parsimonious sets of patterns. The performance of both kernels is investigated through experiments.