Expressive power of an algebra for data mining

Towards data mining operators in database systems: Algebra and implementation

2004

The KDD process is a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. This process comprises several steps which are invoked and parametrized in an interactive and iterative manner. A uniform framework for different kinds of patterns and operators is needed to support KDD efficiently and in an integrated way. Furthermore, because of large data sets it is necessary to scale up mining algorithms in order to achieve fast user support. One task of scaling data mining algorithms is the integration of KDD operators in database management systems. Two aspects of supporting KDD are addressed in this paper. First, a uniform framework is proposed that is based on constraint database concepts as well as interestingness values of patterns. Different operators are defined uniformly in this model. Second, DBMS-coupled implementations of selected operators for decision tree mining are discussed.
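As a rough illustration of what coupling a decision tree operator to a DBMS can look like, the sketch below pushes the class-count aggregation for one candidate split attribute into SQL and computes the weighted entropy of that split in the client. This is not the implementation from the paper: the use of SQLite, the function names, and the table and column names are all assumptions chosen for the example.

```python
import sqlite3
from math import log2


def class_counts_by_value(conn, table, attribute, class_column):
    """Let the database aggregate class counts per attribute value.

    The table and column names are hypothetical and are interpolated
    directly, so they must come from trusted code, not user input.
    """
    query = (
        f'SELECT "{attribute}", "{class_column}", COUNT(*) '
        f'FROM "{table}" GROUP BY "{attribute}", "{class_column}"'
    )
    counts = {}
    for value, label, n in conn.execute(query):
        counts.setdefault(value, {})[label] = n
    return counts


def weighted_entropy(counts):
    """Entropy of the class label after splitting, weighted by partition size."""
    total = sum(sum(c.values()) for c in counts.values())
    result = 0.0
    for c in counts.values():
        size = sum(c.values())
        entropy = -sum((n / size) * log2(n / size) for n in c.values())
        result += (size / total) * entropy
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weather (outlook TEXT, play TEXT)")
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?)",
        [("sunny", "no"), ("sunny", "no"), ("rain", "yes"), ("rain", "no")],
    )
    counts = class_counts_by_value(conn, "weather", "outlook", "play")
    print(counts, weighted_entropy(counts))
```

The design point is that only the aggregated counts leave the database, not the raw rows, which is one common motivation for evaluating splits inside the DBMS.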

DATA MINING IN RELATIONAL SYSTEMS

The subject of the research is methods of relational database mining. The purpose of the research is to develop scientifically grounded models that support intelligent technologies for integrating and managing the information resources of distributed computing systems, to explore the features of the operational specification of the relational data model, and to develop a method for evaluating a relational data model and a procedure for constructing functional association rules when solving problems of mining relational databases. In accordance with this goal, the article considers the following tasks: analysis of existing methods and technologies for data mining; research into methods for representing intelligent models by means of relational systems; development of a technology for evaluating the relational data model for building functional association rules in the tasks of mining relational databases; development of tools for designing and maintaining applied data mining tasks; and development of applied data mining problems. Results: the existing methods and technologies for data mining are analyzed. The features of the structural specification of a relational database and the formation of association rules for building a decision support system are investigated. An information technology and a methodology for designing information and analytical systems based on the relational data model have been developed for solving practical mining problems, along with practical recommendations for using a relational data model to build functional association rules in problems of mining relational databases. Conclusion: the main source of knowledge for database operation can be a relational database. In this regard, the study of data properties is an urgent task in the construction of systems of association rules.
On the one hand, association rules are close to logical models, which makes it possible to organize efficient inference procedures over them; on the other hand, they reflect knowledge more clearly than classical models. They do not have the strict limitations typical of logical calculus, which makes it possible to change the interpretation of product elements. The search for association rules is far from the trivial task it might seem at first glance. One of the problems is the algorithmic complexity of finding frequently occurring itemsets: as the number of items grows, the number of potential itemsets grows exponentially.
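The exponential growth mentioned above can be made concrete with a small sketch. Over n distinct items there are 2^n - 1 non-empty itemsets, so a brute-force search that tests every candidate quickly becomes infeasible. The function names and the toy transactions below are assumptions for illustration only; this is a naive enumeration, not a method proposed in the article.

```python
from itertools import combinations


def count_candidate_itemsets(n_items: int) -> int:
    """Number of non-empty itemsets over n_items distinct items (2^n - 1)."""
    return 2 ** n_items - 1


def frequent_itemsets(transactions, min_support):
    """Brute force: test every non-empty itemset against every transaction.

    min_support is an absolute count of transactions.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[candidate] = support
    return result


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, count_candidate_itemsets(n))
    baskets = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
    print(frequent_itemsets(baskets, min_support=2))
```

Practical algorithms avoid most of this search space by pruning: any superset of an infrequent itemset is itself infrequent, so it never needs to be counted.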

Data mining - a semantic model

2002

Data mining techniques applied to decision support in real-life problems require a multi-step process. Inputs and outputs of these steps require some standard format to be followed in order to achieve a useful platform for the execution of data mining algorithms. There is a need to develop a uniform model where every operation can be expressed in a standard way, allowing algorithms to cooperate and to reuse results. We present, first, a common structure for the representation of inter-step results, and second, a model of the operator, i.e. the entity that handles and transforms this common structure according to a basic data mining algorithm.

Towards a Unified Theoretical Model of Database Mining

Since the early 1990s, Data Base Mining has grown tremendously as a field of research. It has broadened its horizon from merely mining market basket data to nearly every conceivable domain where decision making depends, or could depend, on the patterns formed within large databases. Interestingly, data accumulation in databases is governed by the architectures of the various DBMS and by the tools developed and implemented under these models. The situation has become even more diverse with the rapid proliferation of the Internet and its distributed repositories, along with the emergence of other data models such as Object Relational, Active, Deductive, and Temporal. As a result, Data Base Mining has also become a hugely diverse sphere of activity, which in turn has made the various Data Mining tasks either too specific to a particular DBMS model or too abstract to relate directly to a particular DBMS or its instance. Data will continue to grow under any DBMS model, but the task of Data Mining remains similar irrespective of the model on which the repositories are built. That is, the various Data Mining tasks are general in nature and applicable to any data repository under any DBMS model. Unfortunately, the algorithms for these Data Mining tasks are not generic enough to be applicable to all kinds of databases. In this paper, the issues related to these problems are closely examined, and we analyse whether a foundation for a unified theoretical model of Data Mining tasks is possible, so that it is uniformly applicable to any underlying data model.

Logical Languages for Data Mining

Logics for Emerging Applications of Databases, 2004

Data mining focuses on the development of methods and algorithms for such tasks as classification, clustering, rule induction, and discovery of associations. In the database field, the view of data mining as advanced querying has recently stimulated much research into the development of data mining query languages. In the field of machine learning, inductive logic programming has broadened its scope toward extending standard data mining tasks from the usual attribute-value setting to a multirelational setting. After a concise description of data mining, the contribution of logic to both fields is discussed. At the end, we indicate the potential use of logic for unifying different existing data mining formalisms.

9 Logical Languages for Data Mining

Logics for emerging …, 2004

Querying inductive databases: A case study on the MINE RULE operator

Lecture Notes in Computer Science, 1998

Knowledge discovery in databases (KDD) is a process that can include steps like forming the data set, data transformations, discovery of patterns, searching for exceptions to a pattern, zooming on a subset of the data, and postprocessing some patterns. We describe a comprehensive framework in which all these steps can be carried out by means of queries over an inductive database. An inductive database is a database that in addition to data also contains intensionally defined generalizations about the data. We formalize this concept: an inductive database consists of a normal database together with a subset of patterns from a class of patterns, and an evaluation function that tells how the patterns occur in the data. Then, looking for potential query languages built on top of SQL, we consider the research on the MINE RULE operator by Meo, Psaila and Ceri. It is a serious step towards an implementation framework for inductive databases, though it addresses only the association rule mining problem. Perspectives are then discussed.
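The three components named in this definition (a normal database, a subset of patterns from a pattern class, and an evaluation function) can be sketched as a small data structure. The sketch assumes itemsets as the pattern class and relative frequency as the evaluation function; the class and method names are hypothetical and are not taken from the paper or from MINE RULE.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set

Row = FrozenSet[str]
Pattern = FrozenSet[str]


def frequency(data: List[Row], pattern: Pattern) -> float:
    """Evaluation function: fraction of rows that contain the pattern."""
    if not data:
        return 0.0
    return sum(1 for row in data if pattern <= row) / len(data)


@dataclass
class InductiveDatabase:
    data: List[Row]                                   # the normal database
    patterns: Set[Pattern]                            # stored patterns
    evaluate: Callable[[List[Row], Pattern], float]   # how patterns occur in data

    def select_patterns(self, min_value: float) -> Set[Pattern]:
        """Query over the pattern part: keep patterns scoring at least min_value."""
        return {p for p in self.patterns if self.evaluate(self.data, p) >= min_value}

    def select_rows(self, pattern: Pattern) -> List[Row]:
        """Query over the data part: zoom in on rows that match a pattern."""
        return [row for row in self.data if pattern <= row]


if __name__ == "__main__":
    db = InductiveDatabase(
        data=[frozenset("ab"), frozenset("ac"), frozenset("abc")],
        patterns={frozenset("a"), frozenset("bc")},
        evaluate=frequency,
    )
    print(db.select_patterns(0.5))
    print(db.select_rows(frozenset("bc")))
```

Both methods are ordinary selections, one over patterns and one over data, which is the sense in which KDD steps become queries in this framework.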

Data Mining Query Languages

Data Mining and Knowledge Discovery, 2005

Many data mining algorithms make it possible to extract different types of patterns from data (e.g., local patterns like itemsets and association rules, models like classifiers). To support the whole knowledge discovery process, we need integrated systems that can deal with both patterns and data. The inductive database approach has emerged as a unifying framework for such systems. Following this database perspective, knowledge discovery processes become querying processes for which query languages have to be designed. In the prolific field of association rule mining, different query languages have been proposed to support the more or less declarative specification of both data and pattern manipulations. In this chapter, we survey some of these proposals. This makes it possible to identify current shortcomings and to point out some promising directions of research in this area.

The GUHA Method and Foundations of (Relational) Data Mining

Lecture Notes in Computer Science, 2003

In this paper we present two systems for dealing with relations, the RelView and the Rath system. After a short introduction to both systems we exhibit their usual domain of application by presenting some typical examples. Cooperation for this paper was supported by European COST Action 274 "Theory and Applications of Relational Structures as Knowledge Instruments" (TARSKI).

Towards an Algebraic Framework for Querying Inductive Databases

Lecture Notes in Computer Science, 2010

In this paper, we present a theoretical foundation for querying inductive databases, which can accommodate disparate mining tasks. We present a data mining algebra that includes some essential operations for manipulating data and patterns, and we illustrate the use of a fixpoint operator in a logic-based mining language. We show that the mining algebra has the same expressive power as the logic-based paradigm with a fixpoint operator.
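To give an intuition for why a fixpoint operator is a natural fit for mining, the sketch below computes frequent itemsets by repeatedly applying one extension step until the result stops changing. It is only an illustration of fixpoint iteration on a familiar task, written under our own assumptions; it does not reproduce the algebra or the logic-based language of the paper, and the function name is hypothetical.

```python
def frequent_itemsets_fixpoint(transactions, min_support):
    """Iterate 'extend frequent itemsets by one item' until nothing new appears.

    min_support is an absolute count of transactions.
    """
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = set().union(*transactions)
    # Base case: frequent single items.
    frequent = {
        frozenset([i]) for i in items if support(frozenset([i])) >= min_support
    }
    while True:
        # One application of the step: grow every known itemset by one item.
        extended = {p | {i} for p in frequent for i in items if i not in p}
        new = {c for c in extended if support(c) >= min_support} - frequent
        if not new:
            # Fixpoint reached: another step would add nothing.
            return frequent
        frequent |= new


if __name__ == "__main__":
    baskets = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
    print(frequent_itemsets_fixpoint(baskets, min_support=2))
```

The loop terminates because the set of itemsets over a finite item universe is finite and `frequent` only grows.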
