XQuake as a Constraint-Based Mining Language

OntoDM: An ontology of data mining

Data Mining Workshops, …, 2008

Motivated by the need for unification of the field of data mining and the growing demand for formalized representation of outcomes of research, we address the task of constructing an ontology of data mining. The proposed ontology, named OntoDM, is based on a recent proposal of a general framework for data mining, and includes definitions of basic data mining entities, such as datatype and dataset, data mining task, data mining algorithm and components thereof (e.g., distance function), etc. It also allows for the definition of more complex entities, e.g., constraints in constraint-based data mining, sets of such constraints (inductive queries) and data mining scenarios (sequences of inductive queries). Unlike most existing approaches to constructing ontologies of data mining, OntoDM is a deep/heavy-weight ontology and follows best practices in ontology engineering, such as not allowing multiple inheritance of classes, using a predefined set of relations, and using a top-level ontology.

Discovering Knowledge using a Constraint-based Language

2011

Discovering pattern sets or global patterns is an attractive topic for the pattern mining community because such patterns provide useful information. By combining local patterns that satisfy a joint meaning, this approach produces higher-level patterns that are more useful to the data analyst than the usual local patterns, while reducing the number of patterns. In parallel, recent work investigating the relationships between data mining and constraint programming (CP) shows that the CP paradigm is a suitable framework for modeling and mining such patterns in a declarative and generic way. We present a constraint-based language that enables us to define queries addressing pattern sets and global patterns. The usefulness of such a declarative approach is highlighted by several examples drawn from clustering based on associations. This language has been implemented in the CP framework.

Inductive Databases and Constraint-Based Data Mining

2010

An XML-Based definition of a database for knowledge discovery

2003

The data mining results of recent years consist mainly of efficient techniques for extracting knowledge from large data sets, represented by patterns such as association rules, classified data, clusters, etc. Each pattern takes a peculiar format. Inductive databases have been proposed in [9] as general purpose databases that solve the problems of integration between data and patterns. Unfortunately, the heterogeneity of the patterns and of the different conceptual tools used to extract them makes integration in a unique framework difficult. In this paper, we explore the feasibility of using XML as the unifying framework for inductive databases, and propose a new model, called XDM (XML for Data Mining). We will show the basic features of the model, such as the storage in the same database of both data and patterns. Crucial for the interpretation of patterns is the storage of the pattern derivation process. The latter is described by statements based on data mining operators. Some of the statements are automatically generated by the system while maintaining consistency between source and derived data. Furthermore, we show how the use of XML namespaces allows the effective coexistence of different data mining operators and provides extensibility to new operators. Finally, we show that with the use of XML-Schema it is possible to define the schema, the state and the integrity constraints of an inductive database.

Querying inductive databases: A case study on the MINE RULE operator

Lecture Notes in Computer Science, 1998

Knowledge discovery in databases (KDD) is a process that can include steps like forming the data set, data transformations, discovery of patterns, searching for exceptions to a pattern, zooming on a subset of the data, and postprocessing some patterns. We describe a comprehensive framework in which all these steps can be carried out by means of queries over an inductive database. An inductive database is a database that, in addition to data, also contains intensionally defined generalizations about the data. We formalize this concept: an inductive database consists of a normal database together with a subset of patterns from a class of patterns, and an evaluation function that tells how the patterns occur in the data. Then, looking for potential query languages built on top of SQL, we consider the research on the MINE RULE operator by Meo, Psaila and Ceri. It is a serious step towards an implementation framework for inductive databases, though it addresses only the association rule mining problem. Perspectives are then discussed.
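
The formalization in this abstract (a normal database, a subset of patterns from a pattern class, and an evaluation function) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `InductiveDatabase` class, the `select` method, the choice of itemsets as the pattern class, and relative support as the evaluation function do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set

Itemset = FrozenSet[str]


def support(pattern: Itemset, data: List[Itemset]) -> float:
    """Fraction of rows that contain every item of the pattern."""
    return sum(1 for row in data if pattern <= row) / len(data)


@dataclass
class InductiveDatabase:
    """Hypothetical container: data, stored patterns, evaluation function."""
    data: List[Itemset]          # the "normal database"
    patterns: Set[Itemset]       # subset of a pattern class (here: itemsets)
    evaluate: Callable[[Itemset, List[Itemset]], float]

    def select(self, min_value: float) -> Set[Itemset]:
        """An inductive query: patterns whose evaluation reaches min_value."""
        return {p for p in self.patterns
                if self.evaluate(p, self.data) >= min_value}


idb = InductiveDatabase(
    data=[
        frozenset({"bread", "milk"}),
        frozenset({"bread", "beer"}),
        frozenset({"bread", "milk", "beer"}),
        frozenset({"milk"}),
    ],
    patterns={
        frozenset({"bread"}),
        frozenset({"milk"}),
        frozenset({"bread", "milk"}),
        frozenset({"beer", "milk"}),
    },
    evaluate=support,
)

frequent = idb.select(0.5)
```

With a threshold of 0.5, the query keeps the patterns {bread}, {milk} and {bread, milk}, and drops {beer, milk}, which occurs in only one of the four rows.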

A Logic-Based Approach to Mining Inductive Databases

Lecture Notes in Computer Science, 2007

In this paper, we discuss the main problems of inductive query languages and optimisation issues. We present a logic-based inductive query language, illustrate the use of aggregates, and exploit a new join operator to model specific data mining tasks. We show how a fixpoint operator works for association rule mining and a clustering method. A preliminary experimental result shows that the fixpoint operator outperforms SQL and Apriori methods. The results of our framework could be useful for inductive query language design in the development of inductive database systems.
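
To make the idea of a fixpoint computation for frequent itemset mining concrete, here is a minimal Python sketch. It is not the logic-based language or the operator from the paper; the function name `frequent_itemsets_fixpoint` and the sample transactions are assumptions chosen for illustration. The loop keeps deriving larger frequent itemsets from those already known and stops when an iteration adds nothing new.

```python
def frequent_itemsets_fixpoint(transactions, min_count):
    """Iterate a derivation step until the set of frequent itemsets stops growing."""
    def count(itemset):
        # Number of transactions that contain the whole itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    # Base facts: frequent single items.
    known = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}

    while True:
        # Derivation step: extend a known itemset by one item and keep it if frequent.
        derived = {
            a | b
            for a in known
            for b in known
            if len(a | b) == len(a) + 1 and count(a | b) >= min_count
        }
        if derived <= known:      # nothing new: the fixpoint is reached
            return known
        known |= derived


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
result = frequent_itemsets_fixpoint(transactions, min_count=2)
```

For these four transactions and a minimum count of 2, the fixpoint contains the three single items plus {a, b} and {a, c}; {b, c} and {a, b, c} each occur only once and are never derived.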

Database Mining through Inductive Logic Programming

2007

Rapid growth in the automation of business transactions has led to an explosion in the size of databases. It has been realised for a long time that the data in these databases contains hidden information which needs to be extracted. Data mining is a step in this ...

Logical Languages for Data Mining

Logics for Emerging Applications of Databases, 2004

Data mining focuses on the development of methods and algorithms for such tasks as classification, clustering, rule induction, and discovery of associations. In the database field, the view of data mining as advanced querying has recently stimulated much research into the development of data mining query languages. In the field of machine learning, inductive logic programming has broadened its scope toward extending standard data mining tasks from the usual attribute-value setting to a multirelational setting. After a concise description of data mining, the contribution of logic to both fields is discussed. At the end, we indicate the potential use of logic for unifying different existing data mining formalisms.

DATA MINING IN RELATIONAL SYSTEMS

The subject of the research is methods of relational database mining. The purpose of the research is to develop scientifically grounded models for supporting intelligent technologies for integrating and managing the information resources of distributed computing systems, to explore the features of the operational specification of the relational data model, and to develop a method for evaluating a relational data model and a procedure for constructing functional association rules when solving problems of mining relational databases. In accordance with this goal, the article considers the following tasks: analysis of existing methods and technologies for data mining; research into methods for representing intelligent models by means of relational systems; development of a technology for evaluating the relational data model for building functional association rules in the mining of relational databases; development of tools for the design and maintenance of applied data mining tasks; and development of applied data mining problems. Results: existing methods and technologies for data mining are analyzed. The features of the structural specification of a relational database and the formation of association rules for building a decision support system are investigated. An information technology and a methodology for designing information and analytical systems based on the relational data model have been developed for solving practical mining problems, together with practical recommendations for using a relational data model to build functional association rules in the mining of relational databases. Conclusion: the main source of knowledge for database operation can be a relational database. In this regard, the study of data properties is an urgent task in the construction of systems of association rules.
On the one hand, association rules are close to logical models, which makes it possible to organize efficient inference procedures on them; on the other hand, they reflect knowledge more clearly than classical models. They do not have the strict limitations typical of logical calculus, which makes it possible to change the interpretation of product elements. The search for association rules is far from the trivial task it might seem at first glance. One of the problems is the algorithmic complexity of finding frequently occurring itemsets: as the number of items grows, the number of potential itemsets grows exponentially.
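
The exponential growth mentioned above follows from a simple count: n distinct items yield 2^n - 1 non-empty candidate itemsets. The short Python sketch below illustrates this; the helper names `candidate_itemsets` and `count_candidates` are assumptions introduced for the example and do not come from the article.

```python
from itertools import combinations


def candidate_itemsets(items):
    """Yield every non-empty itemset over the given items."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def count_candidates(n):
    """Closed form for the number of non-empty itemsets over n items."""
    return 2 ** n - 1


# Enumeration and closed form agree on a small example.
small = list(candidate_itemsets({"a", "b", "c"}))

# The closed form shows how quickly the search space grows.
growth = {n: count_candidates(n) for n in (3, 10, 20)}
```

Three items give 7 candidates, 10 items give 1,023, and 20 items already give 1,048,575, which is why practical algorithms prune the search space instead of enumerating it.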