Hiding association rules by using confidence and support
Knowledge and …, 2004
Large repositories of data contain sensitive information that must be protected against unauthorized access. The protection of the confidentiality of this information has been a long-term goal for the database security research community and for government statistical agencies. Recent advances in data mining and machine learning algorithms have increased the disclosure risks that one may encounter when releasing data to outside parties.
Study of Hiding Sensitive Data in Data Mining Using Association Rules
This paper describes the Apriori algorithm for association rules and its use in hiding sensitive data in data mining. Large data sets contain sensitive information that must be protected from unauthorized users. Here, we hide this sensitive information using association rules; when hiding is applied to the data, some information may be falsely hidden and fake rules may be falsely generated. We therefore examine confidentiality issues of a broad category of rules, called association rules. If the disclosure risk of some of these rules is above a certain privacy threshold, those rules must be characterized as sensitive. In some cases, sensitive rules should not be disclosed to the public since, among other things, they may be used to infer sensitive data or may give business competitors an advantage.
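As a rough illustration of the Apriori-style mining the abstract above refers to, the sketch below finds frequent itemsets level by level and then derives rules from them. It is not taken from the paper: the function names (`support`, `apriori`, `rules`), the toy transactions, and the thresholds are all assumptions chosen for the example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in sorted(items)]
    frequent = {}
    k = 1
    while current:
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, sup, conf))
    return out

# Hypothetical toy database, for illustration only.
transactions = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
    {"milk"}, {"bread", "milk"},
)]
frequent = apriori(transactions, min_support=0.4)
mined = rules(frequent, min_confidence=0.7)
```

A rule whose support and confidence both clear the thresholds is disclosed by mining; a rule hiding method tries to push a sensitive rule below one of them.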
A survey on association rule hiding methods
Rapid growth of information technology has led to the creation of huge volumes of data, which are useless unless they are efficiently analyzed. Various techniques have therefore been provided for retrieving valuable information from huge amounts of data, one of the most common of which is mining association rules. As important as data mining can be for extracting hidden knowledge from data, it can also reveal sensitive information, which has raised concerns for data owners. Thus, the issue of hiding sensitive knowledge and preserving privacy was raised in data mining. In this paper, different methods for preserving privacy were studied, and by describing the advantages and disadvantages of each method, a suitable platform was provided for researchers to implement the best technique for sanitizing the database under consideration.
Hiding informative association rule sets
Expert Systems with Applications, 2007
Privacy-preserving data mining is a novel research direction in data mining and statistical databases, where data mining algorithms are analyzed for the side effects they incur in data privacy [
Association Rules Hiding for Privacy Preserving Data Mining: A Survey
International Journal of Computer Applications, 2016
Privacy-preserving data mining (PPDM) is a recent research area in the data mining (DM) field. Many efficient and practical techniques have been proposed for hiding sensitive patterns or information from being discovered by DM algorithms. Association rule mining (ARM) is the most important tool in DM; it is considered a powerful and interesting tool for discovering relationships between items that are hidden in large databases and that may provide business competitors with an advantage. Hiding association rules is therefore the most important point in PPDM for protecting sensitive and crucial data against unauthorized access. Many practical techniques and approaches have been proposed for hiding association rules in PPDM. This paper summarizes the existing techniques and algorithms for all approaches to association rule hiding (ARH).
Survey on Association Rule Hiding Techniques
International Journal of Scientific Research in Science, Engineering and Technology, 2019
The data mining process extracts useful information from a large amount of data. The most interesting part of data mining is discovering unseen patterns without disclosing sensitive knowledge. Privacy Preserving Data Mining (PPDM) deals with the issue of sustaining the privacy of information. This methodology protects sensitive information from disclosure. PPDM techniques are established for hiding sensitive information even after data mining has been performed. One practice for hiding sensitive association rules is termed association rule hiding. The main objective of an association rule hiding algorithm is to slightly adjust the original database so that no sensitive association rule can be derived from it. This article presents a detailed survey of various association rule hiding techniques for preserving privacy in data mining. First, different techniques developed by previous researchers are studied in detail. Then, a comparative analysis is carried out to identify the limitations of each technique and to suggest future improvements in association rule hiding for privacy preservation.
Association Rule Hiding by Positions Swapping of Support and Confidence
Many strategies have been proposed in the literature to hide information containing sensitive items. Some use distributed databases over several sites, some use data perturbation, some use clustering, and some use data distortion. The present paper focuses on the data distortion technique. Algorithms based on this technique either hide a specific rule by altering data or hide rules depending on the sensitivity of the items to be hidden. The proposed approach is based on data distortion in which the position of the sensitive items is altered but their support is never changed. It uses the idea of representative rules to prune the rules first and then hides the sensitive rules. Experimental results show that the proposed approach hides more rules in fewer database scans than existing algorithms based on the same data distortion technique. Index Terms: Support, Confidence, Representative Rule Sanitization, Distortion
A heuristic algorithm for quick hiding of association rules
The increasing use of data mining and the extraction of association rules has led to the introduction of privacy preservation in data mining. Publishing a complete database is inconsistent with security policies and would result in the disclosure of some sensitive data once data mining is performed. Individuals and organizations should secure the database before publication, because neglecting this issue exposes them to harm. When choosing an approach for hiding association rules, database owners consider factors such as database size, precision of immunization, and speed. Besides handling large volumes of data and achieving precise immunization, the running time should be optimized, an issue that has received little attention. In this paper, the FHA algorithm is introduced for hiding sensitive patterns. The algorithm tries to reduce the overhead of ordering transactions by decreasing the number of database scans. It also reduces side effects by selecting the appropriate item to modify. The experiments conducted indicate that the algorithm hides sensitive association rules appropriately.
Hiding Sensitive Association Rules with Limited Side Effects
IEEE Transactions on Knowledge and Data Engineering, 2007
Data mining techniques have been widely used in various applications. However, the misuse of these techniques may lead to the disclosure of sensitive information. Researchers have recently made efforts at hiding sensitive association rules. Nevertheless, undesired side effects, e.g., nonsensitive rules falsely hidden and spurious rules falsely generated, may be produced in the rule hiding process. In this paper, we present a novel approach that strategically modifies a few transactions in the transaction database to decrease the supports or confidences of sensitive rules without producing the side effects. Since the correlation among rules can make it impossible to achieve this goal, in this paper, we propose heuristic methods for increasing the number of hidden sensitive rules and reducing the number of modified entries. The experimental results show the effectiveness of our approach, i.e., undesired side effects are avoided in the rule hiding process. The results also report that in most cases, all the sensitive rules are hidden without spurious rules falsely generated. Moreover, the good scalability of our approach in terms of database size and the influence of the correlation among rules on rule hiding are observed.
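The idea of lowering a sensitive rule's support or confidence by modifying a few transactions can be sketched as follows. This is a simple greedy illustration, not the heuristic method of the paper above: the function `hide_rule`, the choice of which item to remove, the shortest-transaction-first ordering, the toy database, and the thresholds are all assumptions made for the example.

```python
def support(itemset, db):
    """Fraction of transactions in db that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(lhs, rhs, db):
    """support(lhs and rhs together) / support(lhs); 0.0 if lhs never occurs."""
    s_lhs = support(lhs, db)
    return support(set(lhs) | set(rhs), db) / s_lhs if s_lhs else 0.0

def hide_rule(db, lhs, rhs, min_support, min_confidence):
    """Greedy sketch: remove one consequent item from supporting transactions
    until the rule lhs -> rhs falls below either threshold.
    Returns (sanitized database, number of modified entries)."""
    work = [set(t) for t in db]          # work on a copy
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    victim = sorted(rhs)[0]              # assumed choice of item to remove
    modified = 0
    # Assumed ordering: shorter transactions first, as they take part in
    # fewer other itemsets.
    supporting = sorted((t for t in work if lhs | rhs <= t), key=len)
    for t in supporting:
        hidden = (support(lhs | rhs, work) < min_support
                  or confidence(lhs, rhs, work) < min_confidence)
        if hidden:
            break
        t.discard(victim)                # mutates the set stored in work
        modified += 1
    return [frozenset(t) for t in work], modified

# Hypothetical toy database, for illustration only.
db = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
    {"milk"}, {"bread", "milk"},
)]
sanitized, modified = hide_rule(db, {"bread"}, {"milk"},
                                min_support=0.4, min_confidence=0.7)
```

On this toy data, one modified entry is enough to bring the confidence of bread → milk below 0.7, but the same change also pushes milk → bread below 0.7. If that second rule is nonsensitive, it is an example of a rule falsely hidden as a side effect.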