Nils Kriege - Academia.edu
Papers by Nils Kriege
We propose graph kernels based on subgraph matchings, i.e., structure-preserving bijections between subgraphs. While recently proposed kernels based on common subgraphs (Wale et al., 2008; Shervashidze et al., 2009) in general cannot be applied to attributed graphs, our approach allows mappings of subgraphs to be rated by a flexible scoring scheme that compares vertex and edge attributes by kernels. We show that subgraph matching kernels generalize several known kernels. To compute the kernel, we propose a graph-theoretical algorithm inspired by a classical relation between common subgraphs of two graphs and cliques in their product graph observed by Levi (1973). Encouraging experimental results on a classification task of real-world graphs are presented.
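The relation between common subgraphs and cliques mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes unlabeled graphs, induced subgraphs, and unit weights, so it simply counts matchings by enumerating cliques of the modular product graph. The function names `modular_product` and `count_cliques` are hypothetical.

```python
from itertools import combinations, product


def modular_product(v1, e1, v2, e2):
    """Build the modular product of two graphs.

    v1, v2 are vertex lists; e1, e2 are sets of frozenset({u, w}) edges.
    Two pairs (a, x) and (b, y) are adjacent when a != b, x != y, and
    a-b is an edge in the first graph exactly when x-y is an edge in the second.
    """
    nodes = list(product(v1, v2))
    adj = {n: set() for n in nodes}
    for (a, x), (b, y) in combinations(nodes, 2):
        if a == b or x == y:
            continue  # a bijection cannot reuse a vertex
        if (frozenset((a, b)) in e1) == (frozenset((x, y)) in e2):
            adj[(a, x)].add((b, y))
            adj[(b, y)].add((a, x))
    return adj


def count_cliques(adj):
    """Count all non-empty cliques by plain recursive enumeration."""
    nodes = sorted(adj)

    def extend(candidates):
        total = 0
        for i, n in enumerate(candidates):
            # Keep only later candidates adjacent to n, so each clique is counted once.
            total += 1 + extend([m for m in candidates[i + 1:] if m in adj[n]])
        return total

    return extend(nodes)


# A path a-b-c compared with a single edge x-y.
path_edges = {frozenset(("a", "b")), frozenset(("b", "c"))}
edge_edges = {frozenset(("x", "y"))}
adj = modular_product(["a", "b", "c"], path_edges, ["x", "y"], edge_edges)
print(count_cliques(adj))  # 6 single-vertex matchings + 4 edge matchings = 10
```

Each clique corresponds to one structure-preserving bijection between induced subgraphs; a weighted variant would multiply vertex and edge kernel values along the clique instead of adding 1.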
Journal of Graph Algorithms and Applications, 2014
Communications in Computer and Information Science, 2014
Lecture Notes in Computer Science, 2015
Lecture Notes in Computer Science, 2014
Lecture Notes in Computer Science, 2014
Sequential agglomerative hierarchical non-overlapping (SAHN) clustering techniques belong to the classical clustering methods that are applied heavily in many application domains, e.g., in cheminformatics. Asymptotically optimal SAHN clustering algorithms are known for arbitrary dissimilarity measures, but their quadratic time and space complexity, even in the best case, still limits their applicability to small data sets. We present a new pivot-based heuristic SAHN clustering algorithm that exploits the properties of metric distance measures to obtain a best-case running time of O(n log n) for input size n. Our approach requires only linear space and supports median and centroid linkage. It is especially suitable for expensive distance measures, as it needs only a linear number of exact distance computations. In extensive experimental evaluations on real-world and synthetic data sets, we compare our approach to exact state-of-the-art SAHN algorithms in terms of quality and running time. The evaluations show a subquadratic running time in practice and a very low memory footprint.
2011 IEEE 27th International Conference on Data Engineering, 2011
Efficient subgraph queries in large databases are a time-critical task in many application areas such as biology or chemistry, where biological networks or chemical compounds are modeled as graphs. The NP-completeness of the underlying subgraph isomorphism problem renders an exact subgraph test for each database graph infeasible. Therefore, efficient methods have to be found that avoid most of these tests but still identify all graphs containing the query pattern. We propose a new approach based on the filter-verification paradigm, using a new hash-key fingerprint technique with a combination of tree and cycle features for filtering and a new subgraph isomorphism test for verification. Our approach copes with edge and vertex labels and also supports wildcard patterns in the search. We present an experimental comparison of our approach with state-of-the-art methods, using a benchmark set of both real-world and generated graph instances, that shows its practicability. Our approach is implemented as part of the Scaffold Hunter software, a tool for the visual analysis of chemical compound databases.
2014 IEEE International Conference on Data Mining, 2014
As many real-world data can elegantly be represented as graphs, various graph kernels and methods for computing them have been proposed. Surprisingly, many of the recent graph kernels no longer employ the kernel trick but rather compute an explicit feature map and report higher efficiency. So, is there really no benefit of the kernel trick when it comes to graphs? Triggered by this question, we investigate under which conditions it is possible to compute a graph kernel explicitly and for which graph properties this computation is actually more efficient. We give a sufficient condition for R-convolution kernels that enables kernel computation by explicit mapping. We theoretically and experimentally analyze the efficiency and flexibility of implicit kernel functions and dot products of explicitly computed feature maps for widely used graph kernels such as random walk kernels, subgraph matching kernels, and shortest-path kernels. For walk kernels, we observe a phase transition when comparing runtime with respect to label diversity and walk length, leading to the conclusion that explicit computation is only favourable for smaller label sets and walk lengths, whereas implicit computation is superior for longer walk lengths and data sets with larger label diversity.
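The contrast between implicit and explicit computation can be sketched on a deliberately simple case. The example below is an assumption chosen for illustration, not one of the kernels analyzed in the paper: a vertex-label kernel that compares all pairs of vertices with a Dirac kernel on their labels. The function names are hypothetical.

```python
from collections import Counter


def implicit_kernel(labels_g, labels_h):
    """Sum a Dirac kernel over all vertex pairs (quadratic in the vertex counts)."""
    return sum(1 for a in labels_g for b in labels_h if a == b)


def feature_map(labels):
    """Explicit feature map: one dimension per label, holding its count."""
    return Counter(labels)


def explicit_kernel(phi_g, phi_h):
    """Dot product of two sparse feature vectors; Counter returns 0 for absent labels."""
    return sum(count * phi_h[label] for label, count in phi_g.items())


g = ["C", "C", "O"]
h = ["C", "O", "O", "N"]
print(implicit_kernel(g, h))                            # 4
print(explicit_kernel(feature_map(g), feature_map(h)))  # 4
```

Both routes give the same value here. The explicit route pays once per graph for the feature map, and its cost grows with the number of distinct features, which is in line with the abstract's remark that label diversity affects which route is faster.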
Communications in Computer and Information Science, 2013
Lecture Notes in Computer Science, 2014
We propose graph kernels based on subgraph matchings, i.e., structure-preserving bijections between subgraphs. While recently proposed kernels based on common subgraphs in general cannot be applied to attributed graphs, our approach allows mappings of subgraphs to be rated by a flexible scoring scheme that compares vertex and edge attributes by kernels. We show that subgraph matching kernels generalize several known kernels. To compute the kernel, we propose a graph-theoretical algorithm inspired by a classical relation between common subgraphs of two graphs and cliques in their product graph observed by Levi (1973). Encouraging experimental results on a classification task of real-world graphs are presented.
Lecture Notes in Computer Science, 2010
Scaffold Hunter is a Java-based software tool for the analysis of structure-related biochemical data. It facilitates the interactive exploration of chemical space by enabling the generation of, and navigation in, a scaffold tree hierarchy annotated with various data. The graphical visualization of structural relationships allows large data sets to be analyzed, e.g., to correlate chemical structure and biochemical activity.
Journal of Cheminformatics, 2014
Molecular Informatics, 2013
The growing interest in chemogenomics approaches over the last years has led to an increasing amount of data regarding chemical space and the corresponding biological activity space. The resulting data, collected in either in-house or public databases, need to be analyzed efficiently to speed up the increasingly difficult task of drug discovery. Unfortunately, the discovery of new chemical entities or new targets for known drugs (‘drug repurposing’) is not amenable to a fully automated analysis or a simple drill-down process. Visual interactive interfaces that allow chemical space to be explored in a systematic manner and that facilitate analytical reasoning can help to overcome these problems. Scaffold Hunter is a tool for the visual analysis of chemical compound databases that provides integrated visualization and analysis of biological activity data and fosters the interactive exploration of data imported from a variety of sources. We describe its features and illustrate its use by means of an exemplary analysis workflow.
DNA nanoarchitectures require carefully designed oligonucleotides with certain non-hybridization guarantees, which can be formalized as the q-uniqueness property on the sequence level. We study the optimization problem of finding a longest q-unique DNA sequence. We first present a convenient formulation as an integer linear program on the underlying De Bruijn graph that allows a variety of constraints to be incorporated flexibly; solution times for practically relevant values of q are short. We then provide additional insights into the problem structure using the quotient graph of the De Bruijn graph with respect to the equivalence relation induced by reverse complementarity. Specifically, for odd q the quotient graph is Eulerian, so finding a longest q-unique sequence is equivalent to finding an Euler tour and can be solved in linear time with respect to the output string length. For even q, self-complementary edges complicate the problem, and the graph has to be Eulerized by deleting a minimum number of edges. Two sub-cases arise, for one of which we present a complete solution, while the other one remains open.
The success of kernel methods has initiated the design of novel positive semidefinite functions, in particular for structured data. A leading design paradigm for this is the convolution kernel, which decomposes structured objects into their parts and sums over all pairs of parts. Assignment kernels, in contrast, are obtained from an optimal bijection between parts, which can provide a more valid notion of similarity. In general, however, optimal assignments yield indefinite functions, which complicates their use in kernel methods. We characterize a class of base kernels used to compare parts that guarantees positive semidefinite optimal assignment kernels. These base kernels give rise to hierarchies from which the optimal assignment kernels are computed in linear time by histogram intersection. We apply these results by developing the Weisfeiler-Lehman optimal assignment kernel for graphs. It provides high classification accuracy on widely used benchmark data sets, improving over the original Weisfeiler-Lehman kernel.
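The link between optimal assignments and histogram intersection can be checked on the simplest possible case. The sketch below is an assumption for illustration only: it uses a flat Dirac base kernel on part labels (a one-level hierarchy) rather than the weighted hierarchies of the paper, and it compares a brute-force optimal assignment with the histogram intersection of label counts. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def optimal_assignment_bruteforce(xs, ys):
    """Best total Dirac similarity over all injective assignments (exponential time)."""
    if len(xs) > len(ys):
        xs, ys = ys, xs  # assign every part of the smaller set
    return max(
        sum(1 for a, b in zip(xs, perm) if a == b)
        for perm in permutations(ys, len(xs))
    )


def histogram_intersection(xs, ys):
    """Sum of per-label minimum counts (linear in the number of parts)."""
    cx, cy = Counter(xs), Counter(ys)
    return sum(min(count, cy[label]) for label, count in cx.items())


xs = ["a", "a", "b"]
ys = ["a", "b", "b", "c"]
print(optimal_assignment_bruteforce(xs, ys))  # 2
print(histogram_intersection(xs, ys))         # 2
```

For this base kernel the two values agree, which is why the expensive search over bijections can be replaced by a pass over label counts.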