Reducing Attributes in Rough Set Theory with the Viewpoint of Mining Frequent Patterns
Related papers
Frequent Itemset Mining using Rough-Sets
Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data: which products are often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sales campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one bottleneck of frequent itemset mining is that as the data grow, the time and resources required to mine them increase at an exponential rate. In this investigation a new algorithm is proposed which can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processing algorithm which uses entropy and rough sets to carry out record reduction and feature (attribute) selection, respectively. FASTER can speed up frequent itemset mining by a factor of 3.1 compared with the original algorithm while maintaining an accuracy of 71%.
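To make the notion of "frequent itemset" concrete, here is a minimal, purely illustrative sketch of support counting over toy basket data (the transactions and item names below are invented; real miners use Apriori or FP-growth pruning rather than this brute-force scan):

```python
from itertools import combinations

# Toy transaction database (invented baskets).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def frequent_itemsets(transactions, min_support, max_size=3):
    """Return itemsets whose support (fraction of baskets containing
    them) is at least min_support. Brute-force enumeration: real miners
    use Apriori or FP-growth pruning to tame the exponential search."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            support = sum(set(candidate) <= t for t in transactions) / len(transactions)
            if support >= min_support:
                result[candidate] = support
    return result

freq = frequent_itemsets(transactions, min_support=0.6)
```

For the toy baskets above, {diapers} has support 0.8 and {beer, diapers} has support 0.6, so both are frequent at a 0.6 threshold.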
An Innovative Approach for Attribute Reduction in Rough Set Theory
Intelligent Information Management, 2014
Rough set theory is used in data mining with emphasis on the treatment of uncertain or vague information. In the case of classification, this theory implicitly calculates reducts of the full set of attributes, eliminating those that are redundant or meaningless. Such reducts may even serve as input to classifiers other than rough sets. The typical high dimensionality of current databases precludes the use of greedy methods to find optimal or suboptimal reducts in the search space and requires the use of stochastic methods. In this context, the calculation of reducts is typically performed by a genetic algorithm, but other metaheuristics have been proposed with better performance. This work proposes the innovative use of two known metaheuristics for this calculation, Variable Neighborhood Search and Variable Neighborhood Descent, in addition to a third heuristic called Decrescent Cardinality Search, a new heuristic proposed specifically for reduct calculation. On databases commonly found in the literature of the area, the reducts obtained have lower cardinality, i.e., a lower number of attributes.
A Novel Algorithm For Attribute Reduction in Rough Sets Based on Relation-Matrix
This paper establishes a correspondence between information systems and relation matrices, from which a heuristic is constructed that objectively depicts the degree of importance of attributes. On this basis, an efficient attribute reduction algorithm for information systems (ARFA) is proposed. Compared with existing reduction algorithms, it has greater flexibility: it can remove unimportant attributes step by step, avoiding repeated recalculation of their importance, and it improves search efficiency. Example analysis and experimental results show that the reduction algorithm is both feasible and effective.
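The paper's relation-matrix heuristic is its own, but the underlying idea — scoring an attribute by how many decision-relevant object pairs it discerns — can be sketched as follows (the decision table and attribute layout are invented for illustration):

```python
from itertools import combinations

# Tiny decision table: (condition attribute values, decision); data invented.
rows = [
    ((0, 1, 0), "yes"),
    ((0, 1, 1), "yes"),
    ((1, 0, 1), "no"),
    ((1, 1, 1), "no"),
]

def discernibility_counts(rows):
    """For each attribute, count the object pairs with different decisions
    that it discerns -- a simple matrix-style importance heuristic in the
    spirit of (but not identical to) the paper's relation-matrix measure."""
    n_attrs = len(rows[0][0])
    counts = [0] * n_attrs
    for (x, dx), (y, dy) in combinations(rows, 2):
        if dx == dy:
            continue                     # only decision-relevant pairs
        for a in range(n_attrs):
            if x[a] != y[a]:
                counts[a] += 1
    return counts

counts = discernibility_counts(rows)
```

On this toy table the first attribute discerns all four decision-relevant pairs, so a heuristic of this kind would remove the other attributes first.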
Application of Rough Set Theory in Data Mining
Rough set theory is a method that deals with vagueness and uncertainty, with emphasis on decision making. Data mining is a discipline that makes an important contribution to data analysis, the discovery of new meaningful knowledge, and autonomous decision making. Rough set theory offers a viable approach to extracting decision rules from data. This paper introduces the fundamental concepts of rough set theory and other aspects of data mining, with a discussion of data representation in rough set theory including attribute-value pair blocks, information tables, reducts, the indiscernibility relation, and decision tables. Additionally, the rough set approach to lower and upper approximations and the concepts of certain and possible rule sets are introduced. Finally, some applications of data mining systems based on rough set theory are described.
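The lower and upper approximations mentioned here can be computed directly from the indiscernibility classes of an information table; a minimal sketch on invented data:

```python
from collections import defaultdict

# Information table: objects with condition attribute tuples and decisions.
# Objects and values are invented for illustration.
objects = {
    "o1": ((0, 1), "flu"),
    "o2": ((0, 1), "cold"),
    "o3": ((1, 0), "flu"),
    "o4": ((1, 1), "cold"),
}

def approximations(objects, target_decision):
    """Lower/upper approximation of the set of objects with the target
    decision, under the indiscernibility relation induced by all
    condition attributes."""
    blocks = defaultdict(set)            # indiscernibility classes
    for name, (attrs, _) in objects.items():
        blocks[attrs].add(name)
    target = {n for n, (_, d) in objects.items() if d == target_decision}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:              # class entirely inside the target
            lower |= block
        if block & target:               # class overlaps the target
            upper |= block
    return lower, upper

lower, upper = approximations(objects, "flu")
```

Here o3 certainly has flu (lower approximation), while o1 and o2 share the same attribute values and so only possibly have flu (upper approximation).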
Optimistic Rough Sets Attribute Reduction using Dynamic Programming
ijcset.com
Nowadays, with the progress in technology and business sales, databases with large amounts of data exist, especially in retail companies. The main objective of this study is to reduce the complexity of classification problems while maintaining prediction quality. We propose to apply rough set theory, a mathematical approach to data analysis based on the classification of objects of interest into similarity classes that are indiscernible with respect to some features. Since some features are of high interest, this leads to the fundamental concept of "attribute reduction". The goal of rough set reduction is to enumerate good attribute subsets that have high dependence, discriminating index, and significance. The naïve way is to generate all possible subsets of attributes, but in high-dimensional cases this approach is very inefficient, as it requires 2^d iterations for d attributes. Therefore, we propose a dynamic programming technique to enumerate dynamically the optimal subsets of reduced attributes of high interest while reducing the degree of complexity. An implementation has been developed, applied, and tested over 3 years of historical business data in Retail Business (RB). Simulations and visual analysis are shown and discussed in order to validate the accuracy of the proposed tool.
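The 2^d blow-up of the naive approach is easy to see in code. This sketch (toy table and attribute names invented) measures rough-set dependency and scans all subsets for a smallest one that preserves it — exactly the exponential scan that dynamic programming and heuristics aim to avoid:

```python
from collections import defaultdict
from itertools import combinations

# Toy decision table; a0..a2 are invented condition attributes.
table = [
    ((0, 0, 1), "y"),
    ((0, 1, 1), "n"),
    ((1, 0, 0), "y"),
    ((1, 1, 0), "n"),
]

def dependency(table, attrs):
    """Fraction of objects whose indiscernibility class (w.r.t. the chosen
    attributes) is pure in the decision -- the rough-set positive region."""
    blocks = defaultdict(list)
    for values, decision in table:
        blocks[tuple(values[a] for a in attrs)].append(decision)
    pure = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return pure / len(table)

def exhaustive_reduct(table):
    """Naive scan of all 2^d attribute subsets for a smallest one that
    preserves full dependency -- the exponential search that dynamic
    programming and heuristics are meant to avoid."""
    d = len(table[0][0])
    full = dependency(table, range(d))
    for k in range(1, d + 1):
        for subset in combinations(range(d), k):
            if dependency(table, subset) == full:
                return subset
    return tuple(range(d))
```

On this toy table the single attribute a1 already determines the decision, so the exhaustive scan returns a one-attribute reduct.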
On applications of rough sets theory to knowledge discovery
2008
Knowledge Discovery in Databases (KDD) is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Data preprocessing is a step of the KDD process that reduces the complexity of the data and offers better conditions for subsequent analysis. Rough sets theory, in which sets are approximated using elementary sets, is another approach to developing methods for the KDD process. In this doctoral thesis, we propose new algorithms based on rough sets theory for three data preprocessing steps: discretization, feature selection, and instance selection. In discretization, continuous features are transformed into new categorical features, which is required by KDD algorithms that work strictly with categorical features. In feature selection, the new subset of features leads to a dataset of lower dimension, where it is easier to perform a KDD task. When a dataset is very large, an instance selection process is required to decrease the computatio...
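Of the three preprocessing steps, discretization is the simplest to illustrate. Here is a minimal equal-width binning sketch; it is one of many discretization schemes and not necessarily the rough-set-driven one the thesis develops:

```python
def equal_width_bins(values, k):
    """Equal-width discretization: map each continuous value to one of
    k interval labels 0..k-1. A minimal sketch of the discretization
    step (the thesis may use a different, rough-set-driven scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0       # guard against constant features
    return [min(int((v - lo) / width), k - 1) for v in values]

labels = equal_width_bins([0.0, 2.5, 5.0, 7.5, 10.0], 4)
```

The `min(..., k - 1)` clamp keeps the maximum value inside the last bin instead of spilling into a k-th label.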
Data Mining on Information System Using Fuzzy Rough Set Theory
2020
Today, thanks to the strong development of information technology and Internet applications in many fields, huge databases have been created. The number of records, and the size of each record, collected very quickly makes it difficult to store and process information. Exploiting information from large databases effectively is an urgent issue and plays an important role in solving practical problems. In addition to traditional methods of exploiting information, researchers have developed attribute reduction methods to reduce the size of the data space and eliminate irrelevant attributes. Our attribute reduction is based on the dependence between attributes in traditional rough set theory and in fuzzy rough sets. The author built a tool, based on the inclusion degree and a tolerance-based contingency table, to solve the problem of finding approximation sets on set-valued information systems.
Tabu search for attribute reduction in rough set theory
Soft Computing-A Fusion of …, 2008
Attribute reduction of an information system is a key problem in rough set theory and its applications. Using computational intelligence (CI) tools to solve such problems has recently fascinated many researchers. CI tools are practical and robust for many real-world problems, and they are developing rapidly nowadays. However, some classes of CI tools, such as memory-based heuristics, have not been applied to information systems and data mining applications as widely as other well-known CI tools such as evolutionary computing and neural networks. In this paper, we consider a memory-based heuristic of tabu search to solve the attribute reduction problem in rough set theory. The proposed method, called tabu search attribute reduction (TSAR), shows promising and competitive performance compared with some other CI tools in terms of solution quality. Moreover, TSAR shows a superior performance in saving computational costs.
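As a rough illustration of how a tabu-style search applies to attribute reduction (a toy sketch, not the TSAR algorithm itself; the table and parameters are invented): flip one non-tabu attribute in or out of the current subset per move, keep recently flipped attributes on a tabu list, and remember the best full-dependency subset seen.

```python
from collections import defaultdict

# Toy decision table: rows of (condition attribute values, decision).
table = [
    ((0, 0, 1), "y"),
    ((0, 1, 1), "n"),
    ((1, 0, 0), "y"),
    ((1, 1, 0), "n"),
]

def dependency(table, attrs):
    """Decision purity of the partition induced by the attribute subset."""
    blocks = defaultdict(list)
    for values, decision in table:
        blocks[tuple(values[a] for a in attrs)].append(decision)
    pure = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return pure / len(table)

def tabu_reduce(table, iterations=20, tabu_len=2):
    """Tabu-flavored local search: flip one non-tabu attribute per move,
    preferring full-dependency neighbors with fewer attributes, and keep
    the best full-dependency subset seen. Illustrative only."""
    d = len(table[0][0])
    full = dependency(table, range(d))
    current = set(range(d))
    best = tuple(sorted(current))
    tabu = []
    for _ in range(iterations):
        moves = [a for a in range(d) if a not in tabu]
        if not moves:
            tabu.pop(0)                  # free the oldest tabu move
            continue
        def score(a):
            nbr = current ^ {a}
            return (dependency(table, sorted(nbr)) >= full, -len(nbr))
        a = max(moves, key=score)
        current ^= {a}
        tabu.append(a)
        if len(tabu) > tabu_len:
            tabu.pop(0)
        if dependency(table, sorted(current)) >= full and len(current) < len(best):
            best = tuple(sorted(current))
    return best

reduct = tabu_reduce(table)
```

The tabu list is what distinguishes this from plain hill-climbing: forbidding recent flips forces the search out of local optima instead of oscillating between the same two subsets.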
RECORD-TO-RECORD TRAVEL ALGORITHM FOR ATTRIBUTE REDUCTION IN ROUGH SET THEORY
Attribute reduction is the process of selecting a minimal attribute subset from a problem domain while retaining suitably high accuracy in representing the original attributes. In this work, we propose a new attribute reduction algorithm called the record-to-record travel (RRT) algorithm and employ rough set theory as a mathematical tool to evaluate the quality of the obtained solutions. RRT is an optimization algorithm inspired by simulated annealing that depends on a single parameter called DEVIATION. Experimental results on 13 well-known UCI datasets show that the proposed method, coded as RRTAR, is comparable with other rough set-based attribute reduction methods in the literature.
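The DEVIATION-based acceptance rule that defines record-to-record travel can be sketched as follows (toy data and cost function invented; this is an illustration of the acceptance rule, not the paper's RRTAR):

```python
import random
from collections import defaultdict

# Toy decision table (invented data).
table = [
    ((0, 0, 1), "y"),
    ((0, 1, 1), "n"),
    ((1, 0, 0), "y"),
    ((1, 1, 0), "n"),
]

def dependency(table, attrs):
    """Decision purity of the partition induced by the attribute subset."""
    blocks = defaultdict(list)
    for values, decision in table:
        blocks[tuple(values[a] for a in attrs)].append(decision)
    return sum(len(d) for d in blocks.values() if len(set(d)) == 1) / len(table)

def rrt_reduce(table, deviation=1.0, iterations=200, seed=1):
    """Record-to-record travel sketch: accept a random one-attribute flip
    whenever its cost stays within DEVIATION of the best (RECORD) cost
    seen so far. Cost penalizes subset size and any loss of dependency."""
    rng = random.Random(seed)
    d = len(table[0][0])
    full = dependency(table, range(d))

    def cost(attrs):
        loss = full - dependency(table, sorted(attrs))
        return len(attrs) + 10 * d * loss    # heavy penalty for quality loss

    current = set(range(d))
    record, best = cost(current), tuple(sorted(current))
    for _ in range(iterations):
        candidate = current ^ {rng.randrange(d)}
        if candidate and cost(candidate) <= record + deviation:
            current = candidate
            if cost(current) < record:
                record, best = cost(current), tuple(sorted(current))
    return best
```

Unlike simulated annealing, there is no temperature schedule: the single DEVIATION parameter fixes how far above the record cost an accepted move may go.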
Rough Sets and Association Rule Generation
Fundamenta Informaticae, 1999
Association rule (see [1]) extraction methods have been developed as the main methods for mining real-life data, in particular in basket data analysis. In this paper we present a novel approach to the generation of association rules, based on rough set and Boolean reasoning methods. We show the relationship between the problems of association rule extraction for transaction data and relative reducts (or α-reducts) generation for a decision table. Moreover, the present approach can be used to extract association rules in general form. The experimental results show that the presented methods are quite efficient. A large number of association rules with given support and confidence can be extracted in a short time.