Constraint-Based Mining and Inductive Databases (original) (raw)

Towards an Algebraic Framework for Querying Inductive Databases

Lecture Notes in Computer Science, 2010

In this paper, we present a theoretical foundation for querying inductive databases, which can accommodate disparate mining tasks. We present a data mining algebra including some essential operations for manipulating data and patterns and illustrate the use of a fix-point operator in a logic-based mining language. We show that the mining algebra has equivalent expressive power as the logic-based paradigm with a fixpoint operator.

Querying inductive databases: A case study on the MINE RULE operator

Lecture Notes in Computer Science, 1998

Knowledge discovery in databases (KDD) is a process that can include steps like forming the data set, data transformations, discovery of patterns, searching for exceptions to a pattern, zooming on a subset of the data, and postprocessing some patterns. We describe a comprehensive framework in which all these steps can be carried out by means of queries over an inductive database. An inductive database is a database that in addition to data also contains intensionally defned generalizations about the data. We formalize this concept: an inductive database consists of a normal database together with a subset of patterns from a class of patterns, and an evaluation function that tells how the patterns occur in the data. Then, looking for potential query languages built on top of SQL, we consider the research on the MINE RULE operator by Meo, Psaila and Ceri. It is a serious step towards an implementation framework for inductive databases, though it addresses only the association rule mining problem. Perspectives are then discussed.

A Logic-Based Approach to Mining Inductive Databases

Lecture Notes in Computer Science, 2007

In this paper, we discuss the main problems of inductive query languages and optimisation issues. We present a logic-based inductive query language and illustrate the use of aggregates and exploit a new join operator to model specific data mining tasks. We show how a fixpoint operator works for association rule mining and a clustering method. A preliminary experimental result shows that fixpoint operator outperforms SQL and Apriori methods. The results of our framework could be useful for inductive query language design in the development of inductive database systems.

Inductive Databases in the Relational Model: The Data as the Bridge

Lecture Notes in Computer Science, 2006

We present a new and comprehensive approach to inductive databases in the relational model. The main contribution is a new inductive query language extending SQL, with the goal of supporting the whole knowledge discovery process, from pre-processing via data mining to post-processing. A prototype system supporting the query language was developed in the SINDBAD (structured inductive database development) project. Setting aside models and focusing on distance-based and instance-based methods, closure can easily be achieved. An example scenario from the area of gene expression data analysis demonstrates the power and simplicity of the concept. We hope that this preliminary work will help to bring the fundamental issues, such as the integration of various pattern domains and data mining techniques, to the attention of the inductive database community.

Data Mining Query Languages

Data Mining and Knowledge Discovery, 2005

Many data mining algorithms enable to extract different types of patterns from data (e.g., local patterns like itemsets and association rules, models like classifiers). To support the whole knowledge discovery process, we need for integrated systems which can deal either with patterns and data. The inductive database approach has emerged as an unifying framework for such systems. Following this database perspective, knowledge discovery processes become querying processes for which query languages have to be designed. In the prolific field of association rule mining, different proposals of query languages have been made to support the more or less declarative specification of both data and pattern manipulations. In this chapter, we survey some of these proposals. It enables to identify nowadays shortcomings and to point out some promising directions of research in this area.

An inductive database and query language in the relational model

Proceedings of the 11th international conference on Extending database technology Advances in database technology - EDBT '08, 2008

In the demonstration, we will present the concepts and an implementation of an inductive database -as proposed by Imielinski and Mannila -in the relational model. The goal is to support all steps of the knowledge discovery process, from pre-processing via data mining to post-processing, on the basis of queries to a database system. The query language SIQL (structured inductive query language), an SQL extension, offers query primitives for feature selection, discretization, pattern mining, clustering, instance-based learning and rule induction. A prototype system processing such queries was implemented as part of the SINDBAD (structured inductive database development) project. Key concepts of this system, among others, are the closure of operators and distances between objects. To support the analysis of multi-relational data, we incorporated multi-relational distance measures based on set distances and recursive descent. The inclusion of rule-based classification models made it necessary to extend the data model and the software architecture significantly. The prototype is applied to three different applications: gene expression analysis, gene regulation prediction and structure-activity relationships (SARs) of small molecules.

A comparison between query languages for the extraction of association rules

2002

Recently inductive databases (IDBs) have been proposed to afford the problem of knowledge discovery from huge databases. With an IDB the user/analyst performs a set of very different operations on data using a special-purpose language, powerful enough to perform all the required manipulations, such as data preprocessing, pattern discovery and pattern post-processing.

A constraint-based querying system for exploratory pattern discovery

Information Systems, 2009

In this article we present ConQueSt, a constraint-based querying system able to support the intrinsically exploratory (i.e., human-guided, interactive and iterative) nature of pattern discovery. Following the inductive database vision, our framework provides users with an expressive constraint-based query language, which allows the discovery process to be effectively driven toward potentially interesting patterns. Such constraints are also exploited to reduce the cost of pattern mining computation. ConQueSt is a comprehensive mining system that can access real-world relational databases from which to extract data. Through the interaction with a friendly graphical user interface (GUI), the user can define complex mining queries by means of few clicks. After a pre-processing step, mining queries are answered by an efficient and robust pattern mining engine which entails the state-of-the-art of data and search space reduction techniques. Resulting patterns are then presented to the user in a pattern browsing window, and possibly stored back in the underlying database as relations.