An Adaptive Privacy Preserving Framework for Distributed Association Rule Mining in Healthcare Databases

Privacy Preserving Data Mining Framework for Negative Association Rules: An Application to Healthcare Informatics

IEEE Access

Protecting the privacy of healthcare information is an important part of encouraging data custodians to provide accurate records so that mining may proceed with confidence. Association rule mining has been widely applied to healthcare data. Most applications focus on positive association rules and ignore the negative consequences of particular diagnostic techniques. When it comes to bridging divergent diseases and drugs, negative association rules may give more helpful information than positive ones, especially for physicians and social organizations (e.g., a certain symptom will not arise when certain other symptoms exist). Data mining in healthcare must be done in a way that protects the identity of patients, especially when dealing with sensitive information. Revealing this information, however, puts it at risk of attack. Healthcare data privacy protection has lately been addressed by technologies that perturb data (data sanitization) and reconstruct aggregate distributions so that data mining research can proceed. This study investigates metaheuristic-based data sanitization for healthcare data mining in order to protect patient privacy. Using the Tabu-genetic algorithm as an optimization tool, the suggested technique chooses item sets to be sanitized (modified) from transactions that satisfy sensitive negative criteria, with the goal of minimizing changes to the original database. Experiments with benchmark healthcare datasets show that the suggested privacy preserving data mining (PPDM) method outperforms existing algorithms in terms of Hiding Failure (HF), Artificial Rule Generation (AR), and Lost Rules (LR).
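To make the notion of a negative association rule in the abstract above concrete, the following is a minimal Python sketch of how the support and confidence of a rule of the form A => not B can be derived from ordinary itemset supports, using supp(A and not B) = supp(A) - supp(A and B). The function names and the toy transactions are illustrative assumptions and are not taken from the paper.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def negative_rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the negative rule antecedent => NOT consequent.

    "NOT consequent" is taken here to mean that the transaction does not
    contain the full consequent itemset (an assumption of this sketch).
    """
    supp_a = support(transactions, antecedent)
    supp_ab = support(transactions, set(antecedent) | set(consequent))
    supp_neg = supp_a - supp_ab                      # supp(A and not B)
    conf_neg = supp_neg / supp_a if supp_a else 0.0  # equals 1 - conf(A => B)
    return supp_neg, conf_neg


# Hypothetical toy records: each transaction is the set of findings for one patient.
records = [
    {"fever", "cough"},
    {"fever"},
    {"fever", "rash"},
    {"cough"},
]
supp_neg, conf_neg = negative_rule_stats(records, {"fever"}, {"rash"})
```

A sanitization method of the kind described would then try to push such values below the mining thresholds for the sensitive rules only, while changing as few transactions as possible.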

Privacy-preserving association rule mining for horizontally partitioned healthcare data: a case study on the heart diseases

Sādhanā

In recent years, hospitals have increasingly adopted electronic health record (EHR) systems, which has generated a huge amount of electronically stored patient data. Association rule mining is very helpful in numerous healthcare applications (e.g., finding correlations between diseases and symptoms, relating diseases to effective treatments, and predicting disease risk from historical data). The data collected by an EHR system are very important for medical research. Currently, a patient health report is derived on the basis of a physician's own experience and on the association rule mining results of a local EHR system maintained by a particular hospital. Association rule mining results will be more accurate if the data of all local EHR systems are integrated before mining. Integrating local EHR systems requires sharing local EHR data, but sharing patient records violates the privacy of patients. Hence, medical research is focused on the problem of mining association rules without sharing local private EHR data. Privacy-preserving distributed association rule mining (PPDARM) solves this issue by mining the association rules while preserving the privacy of patients. In this paper, a PPDARM approach is proposed in which all local EHR systems collaboratively perform association rule mining while preserving privacy. The proposed approach is also analysed with the heart disease dataset.
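The abstract above does not say which protocol the authors use. As one possible illustration of mining over horizontally partitioned data without sharing records, the sketch below simulates a masked secure-sum round, a common building block in this setting: each hospital contributes only its local support count and local transaction count, and the initiating site hides the running total behind a random mask. The function names, the modulus, and the example counts are assumptions made for this sketch. It runs in a single process and only mimics the message flow. It is not a hardened implementation and assumes honest, non-colluding parties.

```python
import secrets


def secure_sum(local_values, modulus=2**61 - 1):
    """Simulate a ring-based secure sum of non-negative integers.

    The initiator starts from a random mask, each site adds its private value
    modulo `modulus`, and the initiator removes the mask at the end. In a real
    deployment each site would see only the masked running total.
    """
    mask = secrets.randbelow(modulus)
    running = mask
    for value in local_values:
        running = (running + value) % modulus
    return (running - mask) % modulus


def global_support(local_counts, local_sizes):
    """Global support of one itemset from per-site counts and database sizes."""
    total_count = secure_sum(local_counts)
    total_size = secure_sum(local_sizes)
    return total_count / total_size


# Hypothetical example: three hospitals, one candidate itemset.
counts = [30, 45, 25]    # local transactions containing the itemset
sizes = [100, 150, 250]  # local database sizes
support_value = global_support(counts, sizes)
```

The itemset is globally frequent when the resulting support meets the agreed minimum support threshold. No site has to disclose its own count or size directly.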

Privacy preserving in association rules using a genetic algorithm

TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES, 2014

Association rule mining is one of the data mining techniques used to extract hidden knowledge from large datasets. This hidden knowledge contains useful and confidential information that users want to keep private from the public. Similarly, privacy preserving data mining techniques are used to preserve such confidential information or restrictive patterns from unauthorized access. The pattern can be represented in the form of a frequent itemset or association rule. Furthermore, a rule or pattern is marked as sensitive if its disclosure risk is above a given threshold. Numerous techniques have been used to hide sensitive association rules by performing some modifications in the original dataset. Due to these modifications, some nonrestrictive patterns may be lost, called lost rules, and new patterns are also generated, known as ghost rules. In the current research work, a genetic algorithm is used to counter the side effects of lost rules and ghost rules. Moreover, the technique can be applied for small as well as for large datasets in the domain of medical, military, and business datasets.
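The side effects named in the abstract above can be measured by comparing the rule set mined from the original database with the rule set mined from the sanitized one. The following is a minimal Python sketch of that comparison. The function name, the representation of rules as strings, and the example rules are illustrative assumptions. The hiding-failure ratio follows the usual definition: the share of sensitive rules that can still be mined after sanitization.

```python
def side_effects(original_rules, sanitized_rules, sensitive_rules):
    """Compare rule sets mined before and after sanitization.

    Returns the hiding-failure ratio, the set of lost rules (non-sensitive
    rules that disappeared), and the set of ghost rules (rules that appear
    only after sanitization).
    """
    original = set(original_rules)
    sanitized = set(sanitized_rules)
    sensitive = set(sensitive_rules)

    non_sensitive = original - sensitive
    still_visible = sensitive & sanitized
    hiding_failure = len(still_visible) / len(sensitive) if sensitive else 0.0
    lost_rules = non_sensitive - sanitized
    ghost_rules = sanitized - original
    return hiding_failure, lost_rules, ghost_rules


# Hypothetical rule sets, written as "antecedent->consequent" strings.
before = {"A->B", "B->C", "C->D"}
after = {"B->C", "E->F"}
hf, lost, ghost = side_effects(before, after, sensitive_rules={"A->B"})
```

An optimizer such as a genetic algorithm could use a weighted combination of these three quantities as its fitness function. That pairing is a design suggestion of this sketch and is not stated in the abstract.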

IJERT-Survey on Privacy Preserving Data Mining Techniques

International Journal of Engineering Research and Technology (IJERT), 2020

https://www.ijert.org/survey-on-privacy-preserving-data-mining-techniques
https://www.ijert.org/research/survey-on-privacy-preserving-data-mining-techniques-IJERTV9IS060568.pdf

Privacy preservation in data mining has become more prominent and popular because it maintains the privacy of sensitive data used for analysis. In this decade, an enormous volume of data is being created by many sectors, especially healthcare, and it is vital to analyze it and extract the right information from it. For instance, integrating patients' medical records and health test data helps to identify the relation between atypical test results and disease. Applying association rule mining to this data helps create new information that contributes to disease prevention. During the association rule mining procedure, it is crucial to maintain the privacy and security of the data; the business's vital information should not be leaked. In this paper, we provide an effective solution for privacy preservation along with association rule mining. Our paper is focused on healthcare datasets; however, it can be extended and implemented in various areas.

Algorithms for balancing privacy and knowledge discovery in association rule mining

2003

The discovery of association rules from large databases has proven beneficial for companies since such rules can be very effective in revealing actionable knowledge that leads to strategic decisions. In tandem with this benefit, association rule mining can also pose a threat to privacy protection. The main problem is that from non-sensitive information or unclassified data, one is able to infer sensitive information, including personal information, facts, or even patterns that are not supposed to be disclosed. This scenario reveals a pressing need for techniques that ensure privacy protection, while facilitating proper information accuracy and mining. In this paper, we introduce new algorithms for balancing privacy and knowledge discovery in association rule mining. We show that our algorithms require only two scans, regardless of the database size and the number of restrictive association rules that must be protected. Our performance study compares the effectiveness and scalability of the proposed algorithms and analyzes the fraction of association rules which are preserved after sanitizing a database. We also report the main results of our performance evaluation and discuss some open research issues.

Privacy preserving association rule mining over distributed databases using genetic algorithm

Neural Computing and Applications, 2013

Privacy preservation in distributed databases is an active area of research. With the advancement of technology, massive amounts of data are continuously being collected and stored in distributed database applications. Indeed, temporal associations and correlations among items in large transactional datasets of distributed databases can help in many business decision-making processes. One such task is mining frequent itemsets and computing their association rules, which is a nontrivial issue. In a typical situation, multiple parties may wish to collaborate in extracting interesting global information, such as frequent associations, without revealing their respective data to each other. This may be particularly useful in applications such as retail market basket analysis, medical research, and academia. In the proposed work, we aim to find frequent items and to develop a global association rules model based on the genetic algorithm (GA). The GA is used because of its inherent features, such as robustness with respect to local maxima/minima and its domain-independent nature as a large-space search technique for finding exact or approximate solutions to optimization and search problems. For privacy preservation of the data, the concept of a trusted third party with two offsets has been used. The data are first anonymized at the local party end, and then the aggregation and global association are done by the trusted third party. The proposed algorithms address various types of partitions such as horizontal, vertical, and arbitrary.

Preserving Privacy in Association Rule Mining Using Metaheuristic-Based Algorithms: A Systematic Literature Review

IEEE Access, 2024

The current state of Association Rule Mining (ARM) technology is heading towards a critical yet profitable direction. The ARM process uncovers numerous association rules, determining correlations between itemsets, forming building blocks that have led to revolutionary scientific discoveries. However, a high level of privacy is vital for protecting sensitive rules, raising privacy concerns. Researchers have recently highlighted challenges in the Privacy-Preserving Association Rule Mining (PPARM) field. Many studies have proposed workarounds for the PPARM dilemma by using metaheuristics. This paper conducts a systematic literature review on metaheuristic-based algorithms addressing PPARM challenges. It explores existing studies, providing insights into diverse metaheuristic approaches tackling PPARM problems. A detailed taxonomy is presented, offering a structured classification of metaheuristic-based algorithms specific to PPARM. This classification facilitates a nuanced understanding of the field by categorizing these algorithms into metaphor-based and non-metaphor-based groups, with a discussion of the nature of the representation schemes for each category identified in the survey. The review extends its analysis to encompass the latest applied approaches, highlighting the diversification of existing metaheuristic algorithms in the PPARM context. Moreover, common datasets and evaluation metrics identified from selected studies are documented to provide a deeper understanding of the methodological choices made by researchers in this domain. Finally, a discussion of existing challenges and potential future directions is presented. This review serves as a helpful guide that outlines previous research and presents potential future opportunities for metaheuristic-based algorithms in the context of PPARM.

A survey on privacy preserving association rule mining

With advances in information technology and in methods of producing and collecting data, a great amount of data is collected every day in commercial and medical databases. Some of this information is sensitive because of competition between organizations and the risk of misuse against individuals. Data mining tools are now used to extract knowledge from these large volumes of data. Privacy preservation in data mining aims to protect this information and prevent the disclosure of private data while keeping processing fast. In this article, some techniques for preserving privacy in association rule mining are introduced and some association rule hiding algorithms are evaluated.

Association Rule Hiding in Privacy Preserving Data Mining

International Journal of Information Security and Privacy, 2018

This article describes how privacy preserving data mining has become one of the most important and interesting research directions in data mining. With the help of data mining techniques, people can extract hidden information and discover patterns and relationships between the data items. In most of the situations, the extracted knowledge contains sensitive information about individuals and organizations. Moreover, this sensitive information can be misused for various purposes which violate the individual's privacy. Association rules frequently predetermine significant target marketing information about a business. Significant association rules provide knowledge to the data miner as they effectively summarize the data, while uncovering any hidden relations among items that hold in the data. Association rule hiding techniques are used for protecting the knowledge extracted by the sensitive association rules during the process of association rule mining. Association rule hiding re...

Privacy Preserving Mining Based Framework for Analyzing the Patient Behavior

2014

Most existing data mining algorithms apply data-driven mining. The limitation of this method is that expert analysis is required before the derived information can be used. To analyze data for predicting patient behavior, we adopt the strategy of domain-driven data mining and utilize association rules, clustering, and decision trees. The proposed system is named the Combined Mining based patient Behavior Prediction System (CM-PBPS). In this paper, we implement a domain-driven data mining strategy and exploit clustering, decision trees, and association rules to explore patient behavior data. The system identifies which diseases patients are infected by. In the implementation, association rules are first used to analyze patient behavior. A clustering algorithm is then used for patient segmentation. To reduce data imbalance, we delete the clusters of patients who are not infected by ...