Identification of Outliers in Medical Diagnostic System Using Data Mining Techniques
Related papers
Cluster-based outlier detection is an important task in data mining research. In the proposed work, a feature selection method is used to remove irrelevant data points and eliminate redundant data instances before clustering. After the data elements are clustered, outliers are identified and discarded based on a threshold value. A genetic algorithm (GA) is used to reduce large data sets to their relevant attributes and to find the optimal set of parameters for the clustering process. Features selected from biomedical data can be essential in disease diagnostics, and there are a number of features that can be tested. The research work uses both Euclidean and Mahalanobis distance for identifying outliers. Outlier rejection during the clustering process is necessary for avoiding loss of life and improving the efficiency of diagnostic work. The Pima Indian diabetes data set is taken from the UCI machine learning repository. Various experiments are conducted and compared for the proposed method, covering selection of relevant subsets, clustering, and outlier removal with less computational time.
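The distance-and-threshold idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-feature restriction, the toy data, the threshold of 2.0 and all function names are assumptions made for illustration, and the feature selection and GA steps are omitted.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length numeric tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance_2d(rows, mu):
    """Sample covariance of two features, returned as (sxx, sxy, syy)."""
    n = len(rows)
    sxx = sum((r[0] - mu[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mu[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows) / (n - 1)
    return sxx, sxy, syy

def euclidean(row, mu):
    """Straight-line distance; ignores scale and correlation of features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row, mu)))

def mahalanobis_2d(row, mu, cov):
    """Distance scaled by the covariance; the 2x2 inverse is written out."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = row[0] - mu[0], row[1] - mu[1]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

def flag_outliers(rows, threshold):
    """Indices of rows whose Mahalanobis distance from the mean exceeds threshold."""
    mu = mean_vector(rows)
    cov = covariance_2d(rows, mu)
    return [i for i, r in enumerate(rows) if mahalanobis_2d(r, mu, cov) > threshold]

# Toy data (hypothetical): eight points on a unit square and one distant point.
rows = [(0, 0), (1, 0), (0, 1), (1, 1),
        (0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
print(flag_outliers(rows, threshold=2.0))
```

Euclidean distance treats every feature equally, whereas Mahalanobis distance accounts for the spread and correlation of the features, which is one reason a study might compare the two.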
OUTLIER MINING IN MEDICAL DATABASES BY USING STATISTICAL METHODS
International Journal of Engineering Science, 2012
Outlier detection in the medical and public health domains typically works with patient records and is a critical problem. This paper elaborates how outliers can be detected using statistical methods. A total of 78, 67, 82, 78 and 69 outliers are detected in five medical datasets using the statistics leverage, R-standard, R-student, DFFITS, Cook's D and covariance ratio. The results of the present investigation suggest that (i) the extraordinary behavior of outliers facilitates the exploration of the valuable knowledge hidden in their domain and helps decision makers provide improved, reliable and efficient healthcare services; (ii) medical doctors can use the present experimental results as a tool to make sensible predictions from vast medical databases; and (iii) a thorough understanding of the complex relationships that appear with regard to patient symptoms, diagnoses and behavior is the most promising area of outlier mining.
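Two of the statistics named above, leverage and Cook's D, can be sketched for a simple linear regression with one predictor. This is an illustrative sketch under stated assumptions, not the paper's procedure: the toy data and the function name are hypothetical, and the remaining statistics (R-standard, R-student, DFFITS, covariance ratio) are not shown.

```python
def leverage_and_cooks_d(x, y):
    """Leverage h_i and Cook's D_i for the least-squares fit y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    # Leverage: how far each x value lies from the mean of x.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's D: combines residual size and leverage into one influence score.
    cooks = [(e * e / (p * s2)) * h / (1 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return leverage, cooks

# Toy data (hypothetical): the last observation is far from the others in x
# and does not follow their trend.
x = [1, 2, 3, 4, 10]
y = [2, 4, 6, 8, 5]
leverage, cooks = leverage_and_cooks_d(x, y)
most_influential = max(range(len(x)), key=lambda i: cooks[i])
print(most_influential, round(leverage[most_influential], 2))
```

A common rule of thumb flags observations whose leverage exceeds 2p/n; the cut-offs used in the paper itself are not stated in the abstract.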
Improving Mining of Medical Data by Outliers Prediction
18th IEEE Symposium on Computer-Based Medical Systems (CBMS'05), 2005
This paper presents a new outlier prediction method that should improve classification performance when mining medical data. The method introduces the class confusion score metric, which is based on the classification results of a set of classifiers induced by an evolutionary decision tree induction algorithm. The classification improvement should be achieved by removing the identified outliers from a training set. Our proposition is that a classifier trained on a filtered dataset captures a better, more general knowledge model and should therefore also perform better on unseen cases. The proposed method is applied to two cardiovascular datasets and the obtained results are discussed.
2016
Data mining refers to the extraction of interesting patterns from massive data sets. Outlier detection is one of the important tasks of data mining. It finds objects that are considerably dissimilar, incomparable or inconsistent with respect to the remaining data. Outlier detection has wide applications, which include data analysis, network intrusion detection, financial fraud detection, and clinical diagnosis of diseases. This paper proposes three outlier detection models, OFWDT (Outlier Finding with Decision Tree), OFWNB (Outlier Finding with Naïve Bayes) and OFWQR (Outlier Finding With Quartile Range), with three different applications. The OFWDT model has three steps. In the first step, the data are grouped into a number of clusters using the Farthest First clustering algorithm. Because this reduces the size of the dataset, the computation time is greatly reduced. In the second step, outliers are detected from Wisconsin breast cancer datas...
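The truncated abstract does not say how OFWQR uses the quartile range. As a hedged illustration only, the sketch below applies the widely used 1.5 x IQR fence rule; the multiplier, the inclusive quantile method, the toy values and the function name are all assumptions rather than details from the paper.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Toy measurements (hypothetical): one value is far above the rest.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_outliers(values))
```

The rule relies only on quartiles, so a single extreme value does not distort the fences the way it would distort a mean and standard deviation.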
Machine learning based outlier detection for medical data
Indonesian Journal of Electrical Engineering and Computer Science
Machine learning produces good results on health care data and also reduces the workload of the health care industry. It can potentially overcome existing issues and uncover novel knowledge for the development of medical data in the health care industry. This paper proposes a new algorithm for finding outliers using different datasets. Medical data are indicative of both health problems and activities. The proposed algorithm is based on supervised and unsupervised learning and detects outliers in medical data. It considers the effectiveness of local and global data factors for outlier detection in medical data in real time. The model used in this scenario is obtained from training and testing on medical data. The cleaning process is based on similarity operations over the complete attributes of the dataset. Experiments are conducted on various built-in medical datasets. The statistical outcomes show that the machine learning based outlie...
Application of Data Mining Techniques For Diabetic DataSet
Medical data mining has great potential for exploring the hidden patterns in the data sets of the medical domain. These patterns can be utilized for faster and better clinical decision making, and can also help physicians curb the occurrence of particular diseases. However, the available raw medical data are widely distributed, heterogeneous in nature and voluminous. Data mining and statistics both strive towards discovering hidden patterns and structures in data. Statistics deals with heterogeneous numbers only, whereas data mining deals with heterogeneous fields. We have identified one area of healthcare where data mining techniques can be applied for knowledge discovery. In this paper, the impact of two data mining techniques (FP-Growth and Apriori) on a known diabetic dataset is examined. The rules generated by the FP-Growth approach are also matched and correlated with those generated by the Apriori algorithm. KEYWORDS Data mining, Knowledge discovery in database (KDD), Assoc...
A Systematic Review of Outliers Detection Techniques in Medical Data-Preliminary Study
Background: Patient medical records contain many entries relating to patient conditions, treatments and lab results. They generally involve multiple types of data and produce a large amount of information. These databases can provide important information for clinical decisions and support the management of the hospital. Medical databases have some specificities not often found in other, non-medical databases. In this context, outlier detection techniques can be used to detect abnormal patterns in health records (for instance, problems in data quality), thus contributing to better data and better knowledge in the decision-making process. Aim: This systematic review intends to provide a better understanding of the techniques used to detect outliers in healthcare data, in order to create automated procedures for those methods and facilitate access to quality information in healthcare.
A Heuristic Approach for Observing Outlying Points in Diabetes Data Set
Data mining is the process of analyzing large amounts of data and is useful for knowledge discovery. Detection of outliers is critically essential in the knowledge-based society. The focus on outlier detection in offline data streams has increased in the past few years. This work proposes a new CLOPD algorithm for identifying mislabelled data (anomalies) during clustering and increasing the accuracy of cluster analysis in medical data sets. It consists of two phases, Partition and Detection. Clustering aims to partition the data into groups based on distance metrics. Instances that do not fit any of the clusters are identified (Detection) as outliers. The main purpose of the research work is to extract mislabelled instances (outliers) from the data set, and its merits are discussed for further exploration. Finally, the results are compared and depicted.
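The two-phase structure described above (partition, then detection) can be sketched generically. This is not the CLOPD algorithm, whose details the abstract does not give: the fixed centroids, the "factor times mean distance" rule, the toy points and the function names are all assumptions for illustration.

```python
import math

def assign_to_nearest(points, centroids):
    """Partition phase: index of the nearest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def detect_outliers(points, centroids, labels, factor=2.0):
    """Detection phase: flag points lying farther from their own centroid
    than `factor` times the mean member distance of that cluster."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for c in range(len(centroids)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        mean_d = sum(dists[i] for i in members) / len(members)
        flagged.extend(i for i in members if dists[i] > factor * mean_d)
    return sorted(flagged)

# Toy data (hypothetical): two tight groups and one point between them.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids = [(0.5, 0.5), (10.5, 10.5)]  # assumed to be known in advance
labels = assign_to_nearest(points, centroids)
print(detect_outliers(points, centroids, labels))
```

In practice the centroids would come from a clustering algorithm such as k-means rather than being supplied by hand.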
Survey on Outlier Detection in Data Mining
International Journal of Computer Applications, 2013
Data mining is used to extract useful information from a collection of databases or data warehouses. In recent years, data mining has become an important field. This paper surveys data mining and the various techniques, such as clustering, that are used to extract useful information, and also surveys the techniques used to detect outliers. It also presents various techniques used by different researchers to detect outliers and present efficient results to the user.