Automatic Construction and Refinement of a Class Hierarchy over Semi-Structured Data
In many applications, it is crucial to help users access a huge amount of data by clustering it into a small number of classes described at an appropriate level of abstraction. In this paper, we present an approach to the automatic clustering of semi-structured data based on two languages for describing classes. The first language has a high power of abstraction and guides the construction of a lattice of classes covering the whole data set. The second language, more expressive and more precise, is the basis for refining the part of the lattice that the user wants to focus on. Our approach has been implemented and tested on real data in the setting of the GAEL project, which aims at building flexible electronic catalogs organized as a hierarchy of product classes. Our experiments were conducted on real data from the C/Net (http://www.cnet.com) electronic catalog of computer products.
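The abstract does not spell out how the lattice is built or refined. As a minimal sketch, assume that the coarse language describes each product only by the set of attribute names it carries and that the finer language uses attribute=value pairs. Under those assumptions, both steps can be illustrated with a formal concept lattice, whose classes are (extent, intent) pairs with intents closed under intersection. The functions `concept_lattice` and `refine` and the sample catalog are hypothetical and are not the GAEL implementation.

```python
from collections.abc import Iterable, Mapping

# (extent: object ids, intent: features shared by all of them)
Concept = tuple[frozenset[str], frozenset[str]]

def concept_lattice(objects: Mapping[str, Iterable[str]]) -> list[Concept]:
    """Hypothetical sketch: enumerate all formal concepts of `objects`.

    Every intent is an intersection of object descriptions (the full feature
    set is included for the bottom class), so closing the family of intents
    under intersection yields every class of the lattice."""
    descs = {o: frozenset(d) for o, d in objects.items()}
    all_features = frozenset().union(*descs.values())
    intents = {all_features}
    for desc in descs.values():
        # Adding one description keeps the family closed under intersection.
        intents |= {desc & i for i in intents}
    concepts = [
        (frozenset(o for o, d in descs.items() if intent <= d), intent)
        for intent in intents
    ]
    # Most general classes (largest extents) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))

def refine(extent: Iterable[str], detailed: Mapping[str, Iterable[str]]) -> list[Concept]:
    """Hypothetical refinement: rebuild a sub-lattice for one chosen class
    using the more precise description language."""
    return concept_lattice({o: detailed[o] for o in extent})

# Coarse language (assumed): attribute names only.
catalog = {
    "laptop1": {"cpu", "ram", "screen"},
    "laptop2": {"cpu", "ram", "screen", "battery"},
    "printer1": {"ppm", "resolution"},
    "monitor1": {"screen", "resolution"},
}
lattice = concept_lattice(catalog)
print(len(lattice))  # 8
laptops = next(e for e, i in lattice if i == {"cpu", "ram", "screen"})

# Finer language (assumed): attribute=value pairs, used only inside the chosen class.
detailed = {
    "laptop1": {"cpu=x86", "ram=8GB", "screen=13in"},
    "laptop2": {"cpu=x86", "ram=16GB", "screen=13in", "battery=6h"},
}
refined = refine(laptops, detailed)  # top class: both laptops, cpu=x86 and screen=13in
```

Closing intents under intersection can produce exponentially many classes. In this sketch, the coarse description is therefore used for the full lattice, and the precise one only for a user-selected class.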
Related papers
Building Classes in Object-Based Languages by Automatic Clustering
Lecture Notes in Computer Science, 1999
The paper deals with clustering of objects described both by properties and by relations. Relational attributes may make object descriptions recursively depend on themselves, so that attribute values cannot be compared before the objects themselves are. An approach to clustering is presented whose core element is an object dissimilarity measure. All kinds of object attributes are compared in a uniform manner, possibly exploiting existing taxonomic knowledge. Dissimilarity values for mutually dependent pairs of objects are computed as solutions of a system of linear equations. An example of building classes on objects with self-references demonstrates the advantages of the suggested approach.
Instances of instances modeled via higher-order classes
Foundational Aspects of Ontologies, 2005
In many languages used for expressing ontologies a strict division between classes and instances of classes is enforced. Other languages permit instances of instances without end. In both cases, ontological meaning is often mistakenly assumed from these purely syntactic features. A rigorous set of definitions for different levels of classes is presented to enable concepts to be unambiguously defined.
A Unified Framework for Class-Based Representation Formalisms
Proceedings of the Fourth …, 1994
The notion of class is ubiquitous in Computer Science and is central in many knowledge representation languages. In this paper we propose a representation formalism in the style of concept languages, with the aim of providing a unified framework for class-based formalisms. The language we consider is quite expressive and features a novel combination of constructs including number restrictions, inverse roles and inclusion assertions with no restrictions on cycles. We are able to show that such a language is powerful enough to model frame systems, object-oriented database languages and semantic data models. As a consequence of the established correspondences, several significant extensions of each of the above formalisms become available. The high expressivity of the language and the need for capturing the reasoning in different contexts force us to distinguish between unrestricted and finite model reasoning. A notable feature of our proposal is that reasoning in both cases is decidable. For the unrestricted case we exploit a correspondence with propositional dynamic logic and extend it to the treatment of number restrictions. For the finite model case we develop a new method based on the use of linear programming techniques. We argue that, by virtue of the high expressive power and of the associated reasoning techniques on both unrestricted and finite models, our language provides a unified framework for class-based representation formalisms.
Towards a theory of formal classification
2005
Classifications have been used for centuries with the goal of cataloguing and searching large sets of objects. In the early days it was mainly books; lately it has been Web pages, pictures and all kinds of electronic information items. Classifications describe their contents using natural language labels, an approach that has proved very effective in manual classification. However, natural language labels show their limitations when one tries to automate the process, as they make it almost impossible to reason about classifications and their contents. In this paper we introduce the novel notion of Formal Classification, a graph structure in which labels are written in a logical concept language. The main property of Formal Classifications is that each node can be associated with a normal-form formula that uniquely describes its contents. This in turn allows us to reduce document classification and query answering to fully automatic propositional reasoning.
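The abstract does not give the classification procedure in detail. The sketch below assumes a simplified reading:

- Node labels are propositional formulas.
- A node's formula is the conjunction of the labels on its path from the root.
- Entailment is checked by enumerating truth assignments.
- A document, itself described by a formula, is filed under the most specific nodes whose formula it entails.

The tuple encoding of formulas, the sample tree, and this filing rule are illustrative assumptions, not the paper's exact definitions.

```python
from __future__ import annotations

from itertools import product

# Hypothetical encoding: ("atom", name) | ("not", f) | ("and", f, g) | ("or", f, g)
Formula = tuple

def atoms(f: Formula) -> set[str]:
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def holds(f: Formula, v: dict[str, bool]) -> bool:
    op = f[0]
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return not holds(f[1], v)
    if op == "and":
        return holds(f[1], v) and holds(f[2], v)
    return holds(f[1], v) or holds(f[2], v)  # "or"

def entails(f: Formula, g: Formula) -> bool:
    """f |= g iff no assignment makes f true and g false (truth-table check)."""
    names = sorted(atoms(f) | atoms(g))
    for bits in product([False, True], repeat=len(names)):
        v = dict(zip(names, bits))
        if holds(f, v) and not holds(g, v):
            return False
    return True

def node_formula(tree: dict[str, tuple[Formula, str | None]], node: str) -> Formula:
    """Conjunction of the labels from the root down to `node`."""
    label, parent = tree[node]
    return label if parent is None else ("and", node_formula(tree, parent), label)

def classify(tree: dict[str, tuple[Formula, str | None]], doc: Formula) -> set[str]:
    hits = {n for n in tree if entails(doc, node_formula(tree, n))}
    # Keep only the most specific hits: nodes with no child that is also a hit.
    return {n for n in hits if not any(tree[c][1] == n for c in hits)}

# Hypothetical classification: node -> (label formula, parent node)
tree = {
    "computers": (("atom", "computer"), None),
    "laptops": (("atom", "portable"), "computers"),
    "desktops": (("not", ("atom", "portable")), "computers"),
}
print(classify(tree, ("and", ("atom", "computer"), ("atom", "portable"))))  # {'laptops'}
print(classify(tree, ("atom", "computer")))  # {'computers'}
```

Truth-table checking is exponential in the number of atoms. It is used here only as the shortest correct decision procedure; realistic label sets would call for a SAT solver.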