Diane Cook - Academia.edu
Papers by Diane Cook
International Journal on Artificial Intelligence Tools, Mar 1, 2001
Knowledge Discovery and Data Mining, Jul 31, 1994
Journal of Artificial Intelligence Research, 1994
The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUB...
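The idea of scoring a substructure by how well it compresses the data can be illustrated with a small sketch. This is not the SUBDUE implementation: it assumes a simplified size measure (number of vertices plus number of edges) as a stand-in for true description length in bits, and the function names `size`, `compress`, and `compression_value` are hypothetical.

```python
def size(vertices, edges):
    """Simplified stand-in for description length: vertex count plus edge count."""
    return len(vertices) + len(edges)


def compress(vertices, edges, instances):
    """Collapse each instance (a set of vertex ids) into one placeholder vertex.

    vertices: dict mapping vertex id -> label
    edges: list of (source id, target id, label) tuples
    instances: list of non-overlapping sets of vertex ids
    """
    owner = {}
    for i, inst in enumerate(instances):
        for v in inst:
            if v in owner:
                raise ValueError("instances must not overlap")
            owner[v] = f"SUB_{i}"

    new_vertices = {}
    for v, label in vertices.items():
        if v in owner:
            new_vertices[owner[v]] = "SUB"
        else:
            new_vertices[v] = label

    new_edges = []
    for u, v, label in edges:
        cu, cv = owner.get(u, u), owner.get(v, v)
        if u in owner and cu == cv:
            continue  # edge lies inside one instance, so the placeholder absorbs it
        new_edges.append((cu, cv, label))
    return new_vertices, new_edges


def compression_value(vertices, edges, sub_vertices, sub_edges, instances):
    """Ratio of original size to (substructure size + compressed graph size).

    Values above 1 mean the substructure shortens the overall description.
    """
    c_vertices, c_edges = compress(vertices, edges, instances)
    return size(vertices, edges) / (size(sub_vertices, sub_edges) + size(c_vertices, c_edges))


# Toy data (invented for illustration): two triangles joined by one bridge edge.
vertices = {n: "a" for n in range(1, 7)}
edges = [(1, 2, "e"), (2, 3, "e"), (3, 1, "e"),
         (4, 5, "e"), (5, 6, "e"), (6, 4, "e"),
         (3, 4, "bridge")]
triangle_vertices = {1: "a", 2: "a", 3: "a"}
triangle_edges = [(1, 2, "e"), (2, 3, "e"), (3, 1, "e")]
instances = [{1, 2, 3}, {4, 5, 6}]

compressed_vertices, compressed_edges = compress(vertices, edges, instances)
value = compression_value(vertices, edges, triangle_vertices, triangle_edges, instances)
print(round(value, 3))
```

Running `compress` again on the compressed graph with a newly found substructure would give the kind of multi-pass hierarchy the abstract describes, although the sketch requires the instances to be supplied by hand and does no searching or inexact matching.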
Proceedings of the 2006 SIAM International Conference on Data Mining, 2006
2009 IEEE Symposium on Computational Intelligence and Data Mining, 2009
Biomedical Data and Applications, 2009
International Journal of Pattern Recognition and Artificial Intelligence, 2003
The World Wide Web provides an immense source of information. Accessing information of interest presents a challenge to scientists and analysts, particularly if the desired information is structural in nature. Our goal is to design a structural search engine that uses the hyperlink structure of the Web, in addition to textual information, to search for sites of interest. Our structural search engine, called WebSUBDUE, searches not only for particular words or topics but also for a desired hyperlink structure. Enhanced by WordNet text functions, our search engine retrieves sites corresponding to structures formed by graph-based user queries. We hypothesize that this system can form the heart of a structural query engine, and demonstrate the approach on a number of structural web queries.
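A query that combines text with hyperlink structure can be sketched as a small labeled subgraph search. The code below is an illustration only, not WebSUBDUE: it assumes a brute-force search over page tuples, uses plain substring matching in place of the WordNet text functions, and the function `find_matches`, the URLs, and the page texts are all hypothetical.

```python
from itertools import permutations


def find_matches(pages, links, query_keywords, query_edges):
    """Return tuples of pages that satisfy a structural query.

    pages: dict mapping url -> page text
    links: set of (source url, target url) hyperlinks
    query_keywords: one keyword per query node
    query_edges: list of (i, j) pairs meaning query node i must link to node j
    """
    matches = []
    # Brute force: try every ordered assignment of distinct pages to query nodes.
    for combo in permutations(pages, len(query_keywords)):
        text_ok = all(kw.lower() in pages[url].lower()
                      for kw, url in zip(query_keywords, combo))
        links_ok = all((combo[i], combo[j]) in links for i, j in query_edges)
        if text_ok and links_ok:
            matches.append(combo)
    return matches


# Toy web graph (invented for illustration).
pages = {
    "a.example/home": "Department of Computer Science",
    "a.example/faculty": "Faculty list",
    "a.example/news": "News archive",
}
links = {
    ("a.example/home", "a.example/faculty"),
    ("a.example/home", "a.example/news"),
}

# Query: a page mentioning "department" that links to a page mentioning "faculty".
result = find_matches(pages, links, ["department", "faculty"], [(0, 1)])
print(result)
```

The exhaustive search grows factorially with query size, so it is only meant to make the notion of a graph-based query concrete on a handful of pages.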
Machine Learning in Cyber Trust, 2009
IGARSS 2008 - 2008 IEEE International Geoscience and Remote Sensing Symposium, 2008
2009 IEEE Symposium on Computational Intelligence and Data Mining, 2009
Advanced Information and Knowledge Processing
Journal of Parallel and Distributed Computing, 2001
International Journal on Artificial Intelligence Tools, 2001
Hierarchical conceptual clustering has proven to be a useful, although greatly under-explored data mining technique. A graph-based representation of structural information combined with a substructure discovery technique has been shown to be successful in knowledge discovery. The SUBDUE substructure discovery system provides the advantages of both approaches. This work presents SUBDUE and the development of its clustering functionalities. Several examples are used to illustrate the validity of the approach both in structured and unstructured domains, as well as compare SUBDUE to earlier clustering algorithms. Results show that SUBDUE successfully discovers hierarchical clusterings in both structured and unstructured data.
International Journal on Artificial Intelligence Tools, 2008
We describe an algorithm and experiments for inference of edge replacement graph grammars. This method generates candidate recursive graph grammar productions based on isomorphic subgraphs which overlap by two nodes. If there is no edge between the two overlapping nodes, the method generates a recursive graph grammar production with a virtual edge. We guide the search for the graph grammar based on the size of the grammar and the portion of the graph described by the grammar. We show experiments where we generate graphs from known graph grammars, use our method to infer the grammar from the generated graphs, and then measure the error between the original and inferred grammars. Experiments show that the method performs well on several types of grammars, and specifically that error decreases with increased numbers of unique labels in the graph.
International Journal on Artificial Intelligence Tools, 2004
We present an algorithm for the inference of context-free graph grammars from examples. The algorithm builds on an earlier system for frequent substructure discovery, and is biased toward grammars that minimize description length. Grammar features include recursion, variables and relationships. We present an illustrative example, demonstrate the algorithm's ability to learn in the presence of noise, and show real-world examples.
International Journal on Artificial Intelligence Tools, 2005
Much of current data mining research is focused on discovering sets of attributes that discriminate data entities into classes, such as shopping trends for a particular demographic group. In contrast, we are working to develop data mining techniques to discover patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event-driven or relationally structured. In this paper we present approaches to address two related challenges; the need to assimilate incremental data updates and the need to mine monolithic datasets. Many realistic problems are continuous in nature and therefore require a data mining approach that can evolve discovered knowledge over time. Similarly, many problems present data sets that are too large to fit into dynamic memory on conventional computer systems. We address incremental data mining by introducing a mechanism for summarizing discoveries from previous data increments so that the g...
IEEE Transactions on Knowledge and Data Engineering, 2007
IEEE Engineering in Medicine and Biology Magazine, 2001