Survey on using constraints in data mining

Use of Constraints in Pattern Mining: A Survey

Constraint-based pattern mining and association rules are used in many applications, such as genetic sequence analysis, bankruptcy prediction in finance, fraud detection in securities, and plant classification in agriculture, to extract knowledge that interests the user. Constraints help eliminate unwanted rules and also address the rule explosion problem. Many algorithms have been proposed for constraint-based pattern mining and association rule generation. The constraints take the form of attributes, item length, time or duration, regular expressions, etc. Pushing constraints into the mining process focuses discovery on what the user finds interesting. The literature survey shows that the performance of an algorithm improves when constraints are applied during the mining process. The paper surveys the literature on the use of constraints in association rule generation, covering the different categories of constraints and their properties.
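As a minimal sketch of what pushing constraints into the mining process can look like, the Python below runs a level-wise (Apriori-style) search and applies two anti-monotone constraints during the search rather than as a post-filter: a maximum item length and a set of excluded items. The function name, its parameters, and the toy transactions are illustrative assumptions, not taken from any of the surveyed algorithms.

```python
from itertools import combinations


def mine_with_constraints(transactions, min_support, max_length,
                          excluded_items=frozenset()):
    """Return {itemset: support count} for frequent itemsets that satisfy
    the length and item constraints (hypothetical helper for illustration)."""
    excluded = frozenset(excluded_items)
    # Item constraint pushed first: excluded items never enter the search.
    rows = [frozenset(t) - excluded for t in transactions]
    items = sorted({i for row in rows for i in row})

    level = [frozenset([i]) for i in items]
    result = {}
    size = 1
    # Length constraint pushed into the loop: no level beyond max_length
    # is ever generated or counted.
    while level and size <= max_length:
        counts = {c: sum(1 for row in rows if c <= row) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)

        # Build candidates one item larger; keep a candidate only if all of
        # its subsets of the current size are frequent (Apriori pruning).
        next_level = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == size + 1 and all(
                frozenset(sub) in frequent
                for sub in combinations(union, size)
            ):
                next_level.add(union)
        level = list(next_level)
        size += 1
    return result


if __name__ == "__main__":
    toy = [
        {"bread", "milk"},
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"milk", "butter", "beer"},
    ]
    found = mine_with_constraints(toy, min_support=2, max_length=2,
                                  excluded_items={"beer"})
    for itemset, support in sorted(found.items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

Because both constraints are anti-monotone (any superset of a violating itemset also violates them), pruning early cannot discard a valid result. Constraints without that property generally need a different pushing strategy.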

Discovering Knowledge using a Constraint-based Language

2011

Discovering pattern sets or global patterns is an attractive issue for the pattern mining community in order to provide useful information. By combining local patterns satisfying a joint meaning, this approach produces patterns of higher level and thus more useful for the data analyst than the usual local patterns, while reducing the number of patterns. In parallel, recent works investigating relationships between data mining and constraint programming (CP) show that the CP paradigm is a nice framework to model and mine such patterns in a declarative and generic way. We present a constraint-based language which enables us to define queries addressing pattern sets and global patterns. The usefulness of such a declarative approach is highlighted by several examples coming from the clustering based on associations. This language has been implemented in the CP framework.

Constrained pattern mining in the new era

Knowledge and Information Systems, 2015

Twenty years of research on frequent itemset mining, or pattern mining, has led to the existence of a set of efficient algorithms for identifying different types of patterns, from transactional to sequential. Despite the great advances in this field, big data brought a completely new context to operate, with new challenges arising from the growth in data size, dynamics and complexity. These challenges include the shift not only from static to dynamic data, but also from tabular to complex data sources, such as social networks (expressed as graphs) and data warehouses (expressed as multi-relational models). In this new context, and more than ever, users need effective ways to control the large number of discovered patterns, and to be able to choose what patterns to consider at each time. The most accepted and common approach to minimize these drawbacks has been to capture and represent the semantics of the domain through constraints, and use them not only to reduce the number of results, but also to focus the algorithms in areas where it is more likely to gain information and return more interesting results. The use of constraints in pattern mining has been widely studied, and there are a lot of proposed types of constraints and pushing strategies. In this paper, we present a new global view of the work done on the incorporation of constraints in the pattern mining process. In particular, we propose a new framework for constrained pattern mining, that allows us to organize and analyze existing algorithms and strategies, based on the different types and properties of constraints, and on the data sources they are able to handle.

Keywords: Data mining • Pattern mining • Domain knowledge • Constraints

This work was partially supported by FCT-Fundação para a Ciência e a Tecnologia, under Project D2PM

Extending the Soft Constraint Based Mining Paradigm

Lecture Notes in Computer Science, 2007

The paradigm of pattern discovery based on constraints has been recognized as a core technique in inductive querying: constraints provide to the user a tool to drive the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation. So far the research on this paradigm has mainly focussed on the latter aspect: the development of efficient algorithms for the evaluation of constraint-based mining queries. Due to the lack of research on methodological issues, the constraint-based pattern mining framework still suffers from many problems which limit its practical relevance. In our previous work [5], we analyzed such limitations and showed how they flow out from the same source: the fact that in the classical constraint-based mining, a constraint is a rigid boolean function which returns either true or false. To overcome such limitations we introduced the new paradigm of pattern discovery based on Soft Constraints, and instantiated our idea to the fuzzy soft constraints. In this paper we extend the framework to deal with probabilistic and weighted soft constraints: we provide theoretical basis and detailed experimental analysis. We also discuss a straightforward solution to deal with top-k queries. Finally we show how the ideas presented in this paper have been implemented in a real Inductive Database system.

Soft constraint based pattern mining

Data & Knowledge Engineering, 2007

The paradigm of pattern discovery based on constraints was introduced with the aim of providing to the user a tool to drive the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation. So far the research on this paradigm has mainly focused on the latter aspect: the development of efficient algorithms for the evaluation of constraint-based mining queries. Due to the lack of research on methodological issues, the constraint-based pattern mining framework still suffers from many problems which limit its practical relevance. In this paper we analyze such limitations and we show how they flow out from the same source: the fact that in the classical constraint-based mining, a constraint is a rigid boolean function which returns either true or false. Indeed, interestingness is not a dichotomy. Following this consideration, we introduce the new paradigm of pattern discovery based on Soft Constraints, where constraints are no longer rigid boolean functions.
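To make the contrast between a rigid boolean constraint and a soft one concrete, here is a small Python sketch. It assumes a fuzzy reading in which each constraint returns a degree of satisfaction in [0, 1] and a pattern's overall interest is the minimum over its constraints. That combination rule, the `ramp` helper, the thresholds, and the pattern fields are all assumptions made for illustration, since the abstract above does not spell them out.

```python
def ramp(value, low, high):
    """Degree of satisfaction in [0, 1]: 0 at or below `low`,
    1 at or above `high`, linear in between."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def crisp_support(pattern, threshold=50):
    """Classical constraint: a rigid boolean test."""
    return pattern["support"] >= threshold


def fuzzy_interest(pattern, constraints):
    """Assumed fuzzy combination: a pattern is only as interesting
    as its least satisfied constraint."""
    return min(constraint(pattern) for constraint in constraints)


def rank(patterns, constraints, min_interest=0.0):
    """Score every pattern, drop those below `min_interest`,
    and return (score, pattern) pairs from most to least interesting."""
    scored = [(fuzzy_interest(p, constraints), p) for p in patterns]
    kept = [pair for pair in scored if pair[0] >= min_interest]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    soft_constraints = [
        lambda p: ramp(p["support"], 10, 50),    # "support should be high"
        lambda p: ramp(p["avg_price"], 5, 20),   # "items should be expensive"
    ]
    patterns = [
        {"items": ("wine", "cheese"), "support": 30, "avg_price": 20.0},
        {"items": ("bread", "milk"), "support": 60, "avg_price": 5.0},
        {"items": ("tea", "honey"), "support": 50, "avg_price": 12.5},
    ]
    for score, pattern in rank(patterns, soft_constraints, min_interest=0.25):
        print(f"{score:.2f}", pattern["items"])
```

With the crisp threshold of 50, the first pattern (support 30) would simply be rejected. Under the soft version it keeps a partial score and can still be ranked against the others.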

Constraint based Data Mining Focusing Farming Case Study

International journal of computer applications, 2012

A data mining process may uncover thousands of patterns from a given data set, most of which may be unrelated to the users' interests. These rules also occupy more memory, take more time to produce, and demand more effort from the decision maker during analysis. Users often have a good sense of which direction of mining may lead to the relevant or interesting patterns they would like to find. Therefore, a good heuristic is to have the users specify such intuition or expectation as constraints to limit the search space. In this paper, efforts are made to discover valuable patterns using user-supplied constraints.

Mining patterns using relaxations of user defined constraints

2004

The main drawbacks of sequential pattern mining have been its lack of focus on user expectations and the high number of discovered patterns. However, the commonly accepted solution, the use of constraints, brings the mining process close to a hypothesis-testing task. In this paper, we propose a new methodology to mine sequential patterns, keeping the focus on user expectations, without compromising the discovery of unknown patterns. Our methodology is based on the use of constraint relaxations, and it consists in using them to filter accepted patterns during the mining process. We propose a hierarchy of relaxations, applied to constraints expressed as context-free languages. An extended pushdown automaton (ePDA) is a pushdown automaton E=(Q, Σ, Γ, δ, q0, Z0, F), where δ is a mapping function from Q×P(Σ)∪{ε}×Γ* to finite subsets of Q×Λ*, with Λ equal to Γ* and P(Σ) representing the powerset of Σ.

Mining with Constraints by Pruning and Avoiding Ineffectual Processing

Lecture Notes in Computer Science, 2005

It is known that algorithms for discovering association rules generate an overwhelming number of those rules. While many new and very efficient algorithms have recently been proposed to allow the mining of extremely large datasets, the problem caused by the sheer number of rules discovered still remains. In this paper we propose a new way of pushing constraints in dual mode, starting from the set of maximal patterns, which is an order of magnitude smaller than the set of all frequent patterns.
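The sketch below illustrates why maximal patterns are a compact starting point. It is not the dual-mode algorithm of the paper, only a hedged Python illustration of the definition: an itemset is maximal if no proper superset is frequent, and every frequent itemset is a subset of some maximal one. The function names and the toy data are assumptions.

```python
def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    sets = [frozenset(s) for s in frequent]
    return [s for s in sets if not any(s < other for other in sets)]


def covered_by_maximal(itemset, maximal):
    """An itemset is frequent exactly when some maximal itemset contains it."""
    candidate = frozenset(itemset)
    return any(candidate <= m for m in maximal)


if __name__ == "__main__":
    frequent = [
        {"a"}, {"b"}, {"c"}, {"d"},
        {"a", "b"}, {"a", "c"}, {"b", "c"},
        {"a", "b", "c"},
    ]
    border = maximal_itemsets(frequent)
    print(len(frequent), "frequent itemsets,", len(border), "maximal")
    print(covered_by_maximal({"a", "c"}, border))
    print(covered_by_maximal({"a", "d"}, border))
```

The maximal set tells us which itemsets are frequent but not their exact support counts, so an approach built on it still needs some way to recover or recount supports where a constraint depends on them.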

A Survey of Constrained Clustering

2016

Traditional data mining methods for clustering only use unlabeled data objects as input. The aim of such methods is to find a partition of these unlabeled data objects in order to discover the underlying structure of the data. In some cases, there may be some prior knowledge about the data in the form of (a small number of) labels or constraints. Performing traditional clustering methods while ignoring this prior knowledge may result in extracting information that is irrelevant to the user. Constrained clustering, i.e., clustering with side information or semi-supervised clustering, addresses this problem by incorporating prior knowledge into the clustering process to discover relevant information from the data. In this chapter, a survey of advances in the area of constrained clustering will be presented. Different types of prior knowledge considered in the literature, and clustering approaches that make use of this prior knowledge will be reviewed.

Interestingness is not a dichotomy: Introducing softness in constrained pattern mining

2005

The paradigm of pattern discovery based on constraints was introduced with the aim of providing to the user a tool to drive the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation. So far the research on this paradigm has mainly focussed on the latter aspect: the development of efficient algorithms for the evaluation of constraint-based mining queries. Due to the lack of research on methodological issues, the constraint-based pattern mining framework still suffers from many problems which limit its practical relevance. As a solution, in this paper we introduce the new paradigm of pattern discovery based on Soft Constraints. Albeit simple, the proposed paradigm overcomes all the major methodological drawbacks of the classical constraint-based paradigm, representing an important step further towards practical pattern discovery.