Graph kernels for chemical informatics

Graph kernels based on relevant patterns and cycle information for chemoinformatics

2012

Chemoinformatics aims to predict the properties of molecules through computational techniques. The computer science fields most concerned with chemoinformatics are machine learning and graph theory. From this point of view, graph kernels provide a natural framework combining these two fields. In this paper, we present a graph kernel based on an optimal combination of kernels based on subtree enumeration. We additionally propose a new kernel between the cyclic systems of two graphs. These two extensions have been validated on several chemoinformatics datasets.

GPM: A graph pattern matching kernel with diffusion for chemical compound classification

2008 8th IEEE International Conference on BioInformatics and BioEngineering, 2008

Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogeneous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for chemical graphs has emerged as a new challenge.

Symbolic Learning vs. Graph Kernels: An Experimental Comparison in a Chemical Application

2010

In this paper we present a quantitative comparison between two approaches, Graph Kernels and Symbolic Learning, within a classification scheme. The experimental case study is predictive toxicology evaluation, that is, the inference of the toxic characteristics of chemical compounds from their structure. The results demonstrate that both approaches are comparable in terms of accuracy, but each has pros and cons that are discussed in the last part of the paper.

Two New Graph Kernels and Applications to Chemoinformatics

Lecture Notes in Computer Science, 2011

Chemoinformatics is a well-established research field concerned with the discovery of the properties of molecules through computational techniques. The computer science fields most concerned with chemoinformatics are machine learning and graph theory. From this point of view, graph kernels provide a natural framework combining machine learning techniques with graph theory. Such kernels have proved efficient on several chemoinformatics problems. This paper presents two new graph kernels applied to regression and classification problems within the chemoinformatics field. The first kernel is based on the notion of edit distance, while the second is based on subtree enumeration. Several experiments show the complementarity of the two approaches.

Two new graphs kernels in chemoinformatics

2012

Chemoinformatics is a well-established research field concerned with the discovery of the properties of molecules through computational techniques. The computer science fields most concerned with chemoinformatics are machine learning and graph theory. From this point of view, graph kernels provide a natural framework combining machine learning and graph theory techniques. Such kernels have proved efficient on several chemoinformatics problems, and this paper presents two new graph kernels applied to regression and classification problems. The first kernel is based on the notion of edit distance, while the second is based on subtree enumeration. The design of the second kernel relies on a variable selection step in order to obtain kernels defined on parsimonious sets of patterns. The performance of both kernels is investigated through experiments.
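The abstract above does not say how the edit distance is turned into a kernel. One common construction, shown here purely as an assumption and not as the authors' method, is to pass precomputed pairwise distances through a Gaussian function. The function name, the `sigma` parameter, and the toy distance values are all hypothetical. A matrix built this way from graph edit distances is not guaranteed to be positive semidefinite, so in practice it may need to be checked or regularized before use in a kernel machine.

```python
import math


def gaussian_kernel_from_distances(distances, sigma):
    """Map a symmetric distance matrix to similarities exp(-d^2 / (2 sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    scale = 2.0 * sigma * sigma
    return [[math.exp(-(d * d) / scale) for d in row] for row in distances]


# Made-up edit distances between three molecules (illustrative values only).
toy_distances = [
    [0.0, 2.0, 5.0],
    [2.0, 0.0, 4.0],
    [5.0, 4.0, 0.0],
]

gram = gaussian_kernel_from_distances(toy_distances, sigma=2.0)
```

Identical molecules (distance 0) get similarity 1, and similarity decays as the edit distance grows; `sigma` controls how quickly.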

Edge distance graph kernel and its application to small molecule classification

Turkish Journal of Electrical Engineering & Computer Sciences

Graph classification is an important problem in graph mining with various applications in different fields. Kernel methods have recently been applied to this problem with promising results. To apply kernel methods to a graph classification problem, a graph kernel has to be defined, and the choice of kernel largely determines classification performance. Although various graph kernels have been proposed, the problem is still worth investigating, as the available kernels are far from perfect. In this paper, we propose a new graph kernel based on a recently proposed concept called edge distance-k graphs. These new graphs are derived from the original graph and have the potential to be used as novel graph descriptors. We propose a method to convert these graphs into a multiset of strings that is further used to compute a kernel for graphs. The proposed graph kernel is then evaluated on various data sets in comparison to a recently proposed group of graph kernels. The results are promising, both in terms of performance and computational requirements.

GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2010

Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data has emerged as a new challenge that has not been fully explored in the data mining community.

Two new graph kernels and applications to Chemoinformatic

Chemoinformatics is a well-established research field concerned with the discovery of the properties of molecules through computational techniques. The computer science fields most concerned with chemoinformatics are machine learning and graph theory. From this point of view, graph kernels provide a natural framework combining machine learning techniques with graph theory. Such kernels have proved efficient on several chemoinformatics problems. This paper presents two new graph kernels applied to regression and classification problems within the chemoinformatics field. The first kernel is based on the notion of edit distance, while the second is based on subtree enumeration. Several experiments show the complementarity of the two approaches.

Kernel Functions for Attributed Molecular Graphs – A New Similarity-Based Approach to ADME Prediction in Classification and Regression

QSAR & Combinatorial Science, 2006

Kernel methods, such as the well-known Support Vector Machine (SVM), have attracted growing interest in recent years for designing QSAR/QSPR models with high predictive strength. One of the key concepts of SVMs is the use of a so-called kernel function, which can be thought of as a special similarity measure. In this paper we consider kernels for molecular structures, which are based on a graph representation of chemical compounds. The similarity score is calculated by computing an optimal assignment of the atoms from one molecule to those of another, including information on specific chemical properties, membership in a substructure (e.g. aromatic ring, carbonyl group, etc.) and neighborhood for each atom. We show that by using this kernel we can achieve a generalization performance comparable to a classical model with a few descriptors that are known a priori to be relevant for the problem, and significantly better results than descriptor-based models built with or without automatic descriptor selection. For this purpose we investigate ADME classification and regression datasets for predicting bioavailability (Yoshida), human intestinal absorption (HIA), blood-brain-barrier (BBB) penetration and a dataset consisting of 4 different inhibitor classes (SOL). We further explore the effect of combining our kernel with a problem-dependent descriptor set. We also demonstrate the usefulness of an extension of our method to a reduced graph representation of molecules, in which certain structural features, such as rings, donors or acceptors, are represented as a single node in the molecular graph.
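The optimal assignment idea described above can be sketched for very small molecules as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the atom representation (a dict with `element` and `aromatic` keys), the scoring rule in `atom_similarity`, and both function names are hypothetical, and the brute-force search over assignments is only feasible for a handful of atoms (a real implementation would use a polynomial-time assignment solver).

```python
from itertools import permutations


def atom_similarity(a, b):
    """Hypothetical atom score: element must match; aromaticity adds a bonus."""
    if a["element"] != b["element"]:
        return 0.0
    return 1.5 if a["aromatic"] == b["aromatic"] else 1.0


def optimal_assignment_score(mol_a, mol_b):
    """Best total similarity over all one-to-one assignments of the smaller
    molecule's atoms to distinct atoms of the larger one (brute force)."""
    small, large = sorted((mol_a, mol_b), key=len)
    best = 0.0
    for perm in permutations(range(len(large)), len(small)):
        total = sum(atom_similarity(small[i], large[j]) for i, j in enumerate(perm))
        best = max(best, total)
    return best


# Toy atom lists (illustrative only, not taken from any dataset).
mol_a = [
    {"element": "C", "aromatic": True},
    {"element": "O", "aromatic": False},
]
mol_b = [
    {"element": "O", "aromatic": False},
    {"element": "C", "aromatic": True},
    {"element": "N", "aromatic": False},
]

score = optimal_assignment_score(mol_a, mol_b)
```

Here the best assignment pairs the two carbons and the two oxygens and leaves the nitrogen unmatched. The abstract also mentions neighborhood and substructure information, which this sketch leaves out of the atom score.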

Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity

…, 2005

Motivation: Small molecules play a fundamental role in organic chemistry and biology. They can be used to probe biological systems and to discover new drugs and other useful compounds. As increasing numbers of large datasets of small molecules become available, it is necessary to develop computational methods that can deal with molecules of variable size and structure and predict their physical, chemical and biological properties. Results: Here we develop several new classes of kernels for small molecules using their 1D, 2D and 3D representations. In 1D, we consider string kernels based on SMILES strings. In 2D, we introduce several similarity kernels based on conventional or generalized fingerprints. Generalized fingerprints are derived by counting in different ways subpaths contained in the graph of bonds, using depth-first searches. In 3D, we consider similarity measures between histograms of pairwise distances between atom classes. These kernels can be computed efficiently and are applied to problems of classification and prediction of mutagenicity, toxicity and anti-cancer activity on three publicly available datasets. The results derived using cross-validation methods are state-of-the-art. Tradeoffs between various kernels are briefly discussed. Availability: Datasets available from
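As a rough illustration of the 2D approach described above, the sketch below counts labeled paths in a molecular graph with a depth-first search and compares two molecules with a Tanimoto ratio on the sets of paths found. It is a simplified sketch under assumptions, not the paper's code: the graph encoding (a list of atom labels plus adjacency lists), the function names, and the choice of a set-based Tanimoto ratio are all hypothetical, and bond types are ignored.

```python
from collections import Counter


def path_counts(labels, adjacency, max_atoms):
    """Count label sequences of all simple paths with at most max_atoms atoms."""
    counts = Counter()

    def dfs(node, visited, path):
        counts[path] += 1
        if len(path) == max_atoms:
            return
        for nxt in adjacency[node]:
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, path + (labels[nxt],))

    for start in range(len(labels)):
        dfs(start, {start}, (labels[start],))
    return counts


def tanimoto(fp_a, fp_b):
    """Shared paths divided by all paths present in either fingerprint."""
    keys_a, keys_b = set(fp_a), set(fp_b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 1.0


# Heavy-atom skeletons of ethanol (C-C-O) and methanol (C-O), hydrogens omitted.
ethanol = path_counts(["C", "C", "O"], [[1], [0, 2], [1]], max_atoms=2)
methanol = path_counts(["C", "O"], [[1], [0]], max_atoms=2)

similarity = tanimoto(ethanol, methanol)
```

Each path is counted once per direction, so a C-C bond contributes two `("C", "C")` entries; a variant that uses the counts themselves rather than path presence would correspond to a different member of the fingerprint family.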