Graph-based hierarchical conceptual clustering
Related papers
Conceptual Clustering in Structured Databases: A Practical Approach
1995
Many machine-learning techniques, whether supervised or unsupervised, assume that data come in attribute-value form. But this formalism is largely insufficient for many applications. Therefore, much of the ongoing research now focuses on first-order learning systems. But complex formalisms lead to high computational complexity. On the other hand, most currently installed databases have been designed according to the entity-relationship formalism and are usually implemented on a relational database management system. This formalism is far less complex than first-order logic, but much more expressive than attribute-value lists. In that context, the database schema defines an abstraction space, and learning must occur at each level of abstraction. This paper describes a clustering system able to discover useful groupings in structured databases. It is based on the COBWEB algorithm, to which it adds the ability to cluster structured objects.
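The abstract above does not say how the system scores a grouping. As background only, the following is a minimal sketch of category utility, the measure that standard COBWEB uses on flat attribute-value instances. It does not reproduce the paper's extension to structured objects, and the names `sum_sq_probs` and `category_utility` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# One instance is a flat attribute -> value mapping (an assumption of this sketch).
Instance = Dict[str, str]


def sum_sq_probs(instances: List[Instance]) -> float:
    """Sum over all (attribute, value) pairs of P(attribute = value) squared."""
    n = len(instances)
    counts = Counter((a, v) for inst in instances for a, v in inst.items())
    return sum((c / n) ** 2 for c in counts.values())


def category_utility(clusters: List[List[Instance]]) -> float:
    """Category utility of a partition: average gain in expected correct guesses."""
    everything = [inst for cluster in clusters for inst in cluster]
    n = len(everything)
    baseline = sum_sq_probs(everything)
    gain = sum(
        (len(cluster) / n) * (sum_sq_probs(cluster) - baseline)
        for cluster in clusters
        if cluster  # skip empty clusters, which contribute nothing
    )
    return gain / len(clusters)


if __name__ == "__main__":
    red = [{"color": "red", "shape": "round"}] * 2
    blue = [{"color": "blue", "shape": "square"}] * 2
    print(category_utility([red, blue]))   # well-separated partition
    print(category_utility([red + blue]))  # everything in one cluster
```

A partition whose clusters make attribute values more predictable than the pooled data scores higher. COBWEB uses this score when deciding where to place each new instance in its hierarchy.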
Automatic structuring of knowledge bases by conceptual clustering
IEEE Transactions on Knowledge and Data Engineering, 1995
An important structuring mechanism for knowledge bases is building an inheritance hierarchy of classes based on the content of their knowledge objects. This hierarchy facilitates group-related processing tasks such as answering set queries, discriminating between objects, finding similarities among objects, etc. Building this hierarchy is a difficult task for the knowledge engineer. Conceptual clustering may be used to automate or assist the engineer in the creation of such a classification structure. This article introduces a new conceptual clustering method which addresses the problem of clustering large amounts of structured objects. The conditions under which the method is applicable are discussed.
IEEE Intelligent Systems, 2000
The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns in it. In response to this problem, researchers have developed techniques and systems for discovering concepts in databases [1-3]. Much of the collected data, however, has an explicit or implicit structural component (spatial or temporal), which few discovery systems are designed to handle [4]. So, in addition to the need to accelerate data mining of large databases, there is an urgent need to develop scalable tools for discovering concepts in structural databases. One method for discovering knowledge in structural data is the identification of common substructures within the data. Substructure discovery is the process of identifying concepts describing interesting and repetitive substructures within structural data. The discovered substructure concepts allow abstraction from the detailed data structure and provide relevant attributes for interpreting the data. The substructure discovery method is the basis of Subdue, which performs data mining on databases represented as graphs. The system performs two key data-mining techniques: unsupervised pattern discovery and supervised concept learning from examples. Our test applications have demonstrated the scalability and effectiveness of these techniques on a variety of structural databases.
DynamicWEB: a conceptual clustering algorithm for a changing world
2011
This research was motivated by problems in network security, where an attacker often deliberately changes their identifying information and behaviour in order to camouflage malicious activity. Addressing this problem has resulted in a new adaptation of the unsupervised machine learning technique COBWEB. In machine learning and data mining, the aim is to extract patterns from data in order to discover a meaning underlying the processes that are taking place. In most cases, each object is observed once, and the extracted patterns can then be used to classify newly observed objects. Conceptual clustering aims to do this in such a way that the learned patterns are human-readable. Concept drift algorithms allow concepts to change over time, although most do this in a supervised manner, which presents a challenge when looking for novel classes. This research focuses on the classification of objects that change over time across multiple observations. The ...
Knowledge discovery from structural data
Journal of Intelligent Information Systems, 1995
Discovering repetitive substructure in a structural database improves the ability to interpret and compress the data. This paper describes the Subdue system that uses domain-independent and domain-dependent heuristics to find interesting and repetitive structures in structural data. This substructure discovery technique can be used to discover fuzzy concepts, compress the data description, and formulate hierarchical substructure definitions. Examples from the domains of scene analysis, chemical compound analysis, computer-aided design, and program analysis demonstrate the benefits of the discovery technique.
Substructure discovery in the subdue system
1994
Because many databases contain or can be embellished with structural information, a method for identifying interesting and repetitive substructures is an essential component of discovering knowledge in such databases. This paper describes the SUBDUE system, which uses the minimum description length (MDL) principle to discover substructures that compress the database and represent structural concepts in the data. By replacing previously discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. Inclusion of background knowledge guides SUBDUE toward appropriate substructures for a particular domain or discovery goal, and the use of an inexact graph match allows a controlled amount of deviation in the instances of a substructure concept. We describe the application of SUBDUE to a variety of domains. We also discuss approaches to combining SUBDUE with non-structural discovery systems.
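The idea of a substructure that "compresses the database" can be illustrated with a small sketch. It is not SUBDUE's MDL encoding. As a stated simplification, it measures graph size as the number of vertices plus the number of edges, assumes the instances of the substructure are already known and do not overlap, and uses hypothetical function names.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def compress(vertices: Dict[int, str], edges: List[Edge],
             instances: List[Set[int]], label: str = "SUB"):
    """Collapse each non-overlapping instance into a single new vertex."""
    mapping: Dict[int, int] = {}
    new_vertices: Dict[int, str] = {}
    next_id = max(vertices) + 1
    for inst in instances:
        for v in inst:
            mapping[v] = next_id
        new_vertices[next_id] = label
        next_id += 1
    for v, lab in vertices.items():
        if v not in mapping:  # vertex outside every instance is kept as-is
            mapping[v] = v
            new_vertices[v] = lab
    # Edges internal to an instance vanish; others are re-attached.
    new_edges = [(mapping[a], mapping[b]) for a, b in edges
                 if mapping[a] != mapping[b]]
    return new_vertices, new_edges


def compression_value(vertices: Dict[int, str], edges: List[Edge],
                      sub_vertices: int, sub_edges: int,
                      instances: List[Set[int]]) -> float:
    """size(G) / (size(S) + size(G compressed by S)); above 1 means S compresses G."""
    new_v, new_e = compress(vertices, edges, instances)
    before = len(vertices) + len(edges)
    after = (sub_vertices + sub_edges) + (len(new_v) + len(new_e))
    return before / after


if __name__ == "__main__":
    # Two triangles joined by one edge; the triangle is the candidate substructure.
    verts = {i: "atom" for i in range(6)}
    edgs = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
    print(compression_value(verts, edgs, 3, 3, [{0, 1, 2}, {3, 4, 5}]))
```

Running `compress` again on its own output, with a substructure that contains `SUB` vertices, gives the kind of multi-pass hierarchical description the abstract mentions.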
Ontology discovery for the semantic web using hierarchical clustering
2002
According to a proposal by Tim Berners-Lee, the World Wide Web should be extended to make a Semantic Web where human-understandable content is structured in such a way as to make it machine-processable. Central to this conception is the establishment of shared ontologies, which specify the fundamental objects and relations important to particular online communities. Normally, such ontologies are hand-crafted by domain experts.