Analysis of Different Classifiers for Medical Dataset using Various Measures
Related papers
Empirical Study on Performance of Decision Trees (CART) and Ensemble Methods in Medical Diagnosis
2013
This paper investigates the ability of decision trees and ensemble methods to predict the probability of occurrence of hypertension and diabetes in a mixed patient population. A detailed database comprising healthy, hypertensive and diabetic patients from a university hospital was used to construct the decision trees with the CART algorithm. Ensemble algorithms such as bagging and multiple versions of boosting were used to improve on the basic CART algorithm when building classification models for predicting medical diagnoses. Percentage misclassification error was used to measure the effectiveness of each classifier model. Even though CART shows acceptable classification error on the given datasets, ensemble methods such as bagging improve performance further by building multiple trees.
International Journal of Engineering Research and Technology (IJERT), 2013
https://www.ijert.org/empirical-study-on-performance-of-decision-trees-cart-and-ensemble-methods-in-medical-diagnosis https://www.ijert.org/research/empirical-study-on-performance-of-decision-trees-cart-and-ensemble-methods-in-medical-diagnosis-IJERTV2IS120394.pdf
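The bagging idea in the abstract above (train many trees on bootstrap resamples, then take a majority vote and report percentage misclassification error) can be sketched roughly as follows. This is only an illustration under stated assumptions: one-level threshold stumps on a single numeric feature stand in for full CART trees, and `toy_data`, `fit_stump`, `bagging_fit` and the other names are hypothetical, not taken from the paper.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-level 'tree': choose the threshold and left-side label that
    minimise misclassifications on (x, label) pairs with labels 0 or 1."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for left_label in (0, 1):
            errors = sum(
                (left_label if x < threshold else 1 - left_label) != y
                for x, y in sample
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return threshold, left_label

def predict_stump(stump, x):
    threshold, left_label = stump
    return left_label if x < threshold else 1 - left_label

def bagging_fit(data, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # sample with replacement
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    """Majority vote over the individual stumps."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]

def misclassification_pct(models, data):
    """Percentage of records the ensemble labels incorrectly."""
    wrong = sum(bagging_predict(models, x) != y for x, y in data)
    return 100.0 * wrong / len(data)

# Hypothetical toy data: (measurement, diagnosis) with 0 = healthy, 1 = affected.
toy_data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
models = bagging_fit(toy_data)
print(bagging_predict(models, 1), bagging_predict(models, 9))
print(misclassification_pct(models, toy_data))
```

An odd number of models is used so that a two-class vote cannot tie.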
An Efficient Ensemble Based Classification Techniques for Medical Diagnosis
Building accurate and efficient classifiers for medical databases is one of the essential tasks of data mining and machine learning research. One of the most active areas of research in supervised machine learning has been the study of methods for constructing good ensembles of learners. This paper aims to establish an accurate ensemble classification model for medical prediction that makes full use of the invaluable information in clinical data, especially information that most existing methods ignore when they aim for high prediction accuracy. The paper compares different ensemble classifiers on the Wisconsin Breast Cancer (WBC) and Diabetes data sets. The ensemble classification techniques are compared in the Weka software, and the results show that Random Forest has higher prediction accuracy than the other methods. Different methods for breast cancer detection are also explored and their accuracies compared. From these results, we infer that Random Forest is more suitable for the ensemble classification problem of medical prediction, and we recommend this approach for similar ensemble classification problems.
Comparative Analysis of Data Mining Classifiers in Analyzing Clinical Data
International Journal of Engineering Research and Technology (IJERT), 2013
https://www.ijert.org/comparative-analysis-of-data-mining-classifiers-in-analyzing-clinical-data https://www.ijert.org/research/comparative-analysis-of-data-mining-classifiers-in-analyzing-clinical-data-IJERTV2IS120344.pdf
Health-care providers know there is a wealth of valuable information trapped in the handwritten notes on patients' charts, but the challenge of collecting and interpreting the data on a large scale remains to be solved. Researchers have now taken a step forward in mining patient-based information by using existing language-analysis methods to identify drug side effects before the NAFDAC (National Agency for Food and Drug Administration and Control) issues official alerts. Bioinformatics is an interdisciplinary field that develops and improves methods for storing, retrieving, organizing and analyzing biological data. A major activity in bioinformatics is to generate useful biological knowledge. In experimental molecular biology, bioinformatics techniques such as image and signal processing allow extraction of useful results from large amounts of raw data. This research is restricted to biological and clinical data. It plays a role in the text mining of biological and computer science literature to organize, query and mine biological data. The biological data obtained is processed following the Cross-Industry Standard Process for Data Mining, with WEKA serving as the bioinformatics tool for biological knowledge extraction. In this work, the K-Nearest Neighbour (K-NN) and Classification and Regression Tree (CART) algorithms will be applied to biological data via the WEKA tool, with the aim of predicting the effect of the drug used, taking as the target the patient's symptoms after using anti-malaria drugs.
A Comparative Result Analysis of Human Cancer Diagnosis using Ensemble Classification Methods
International Journal of Computer Applications, 2013
Cancer research has been an interesting and challenging research area in the field of medical science. Classification techniques have been found useful in the early diagnosis of cancer and in better treatment. Various classification methods are used for cancer diagnosis, but each suffers from one or more disadvantages. This paper discusses ensemble-based classification methods, which combine the predictions of individual classifiers to generate the final prediction. The methods discussed are Bagging, Boosting and the Random Forest algorithm. These ensemble methods have shown improved result quality compared with commonly used single classifiers such as a decision tree or a neural network. The improvement in classification, however, comes at the cost of extra processing time and higher storage, since a single decision tree or neural network is faster than ensemble-based techniques. Ideas for further improvement in this field are also discussed. The methods discussed in the paper are applied to human cancer data for appropriate cancer gene selection, which leads to classification of cancer.
Prediction System for Heart Disease Based on Ensemble Classifiers
2020
The heart is an essential organ in the human body. If it is affected, other vital parts of the body are affected as well. Heart disease is the leading cause of death worldwide, which creates high demand for an effective prediction system to help treat affected patients. This study analyzes prediction systems in order to design an automated medical diagnosis system that takes advantage of the collected database. Ensemble classifiers were implemented to classify the data of a medical database, with discretization applied during the preprocessing phase. The data was obtained from the University of California, Irvine (UCI) machine learning repository; the dataset used was Statlog heart disease. Performance measures such as accuracy, sensitivity, and specificity were used to evaluate the proposed methods' performance. The proposed method achieved an accura...
A Decision Tree Based Classifier for Classification & Prediction of Diseases
IJSRD, 2013
In this paper, we propose a modified classification algorithm based on the concept of decision trees. The proposed algorithm is better than previous algorithms and provides more accurate results. We have tested the proposed method on an example patient data set. The methodology uses a greedy approach to select the best attribute: information gain is computed, and the attribute with the highest information gain is selected. If the information gain is not good, the attribute values are again divided into groups. These steps are repeated until a good classification/misclassification ratio is obtained. The proposed algorithm classifies the data sets more accurately and efficiently.
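The greedy attribute selection described in this abstract can be illustrated with the standard entropy-based definition of information gain. The sketch below is a minimal example, not the authors' modified algorithm: the `patients` records and the attribute names are hypothetical, and the regrouping step for poor gains is not shown.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting
    the rows on the values of one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

def best_attribute(rows, attributes, target):
    """Greedy choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical patient records, for illustration only.
patients = [
    {"fever": "yes", "cough": "yes", "sick": "yes"},
    {"fever": "yes", "cough": "no",  "sick": "yes"},
    {"fever": "no",  "cough": "yes", "sick": "no"},
    {"fever": "no",  "cough": "no",  "sick": "no"},
]
print(best_attribute(patients, ["fever", "cough"], "sick"))
```

In this toy data `fever` separates the two classes perfectly, so it carries the full one bit of information, while `cough` carries none.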
Heterogeneous ensemble classifier in computer systems for medical diagnostics
2024
The work is dedicated to the solution of an important scientific and technical problem: building a diagnostic decision-support system in medicine. The foundation of this system is a model developed as a heterogeneous ensemble classifier, which implements two primary approaches to formulating a diagnostic conclusion through basic models. The first of these approaches is probabilistic. It is based on the analysis of a training sample of patients with a confirmed diagnosis, which enables estimation of the probability of the presence of a particular disease based on available data. The second approach is expert-based, relying on expert information about the structure of symptom complexes that characterize each individual disease. It is important to note that both of these approaches address the same problem from different perspectives, and their combined use holds great promise for developing effective diagnostic systems. The purpose of this study is to synthesize a heterogeneous ensemble classifier that integrates both expert and probabilistic components into the diagnostic process. An analysis of various diagnostic methods used by doctors in alignment with the current requirements of evidence-based medicine was carried out as part of the study. Methods of constructing diagnostic decision rules in medical decision-support systems were also considered. Based on these studies, a mathematical model of a heterogeneous ensemble classifier was developed, with the choice of its constituent parts being justified. Widely used classification methods were selected as the probabilistic component in this system, particularly the standard comparison method, the k-nearest neighbors method, and the potential functions method. Expert knowledge concerning the structure of symptom complexes is formalized by expressing the symptom complexes of each disease in the form of numerical intervals. In this framework, linguistic variables are used, which can indicate “below the norm”, “norm”, or “above the norm”. Various strategies for aggregating different types of basic models within the heterogeneous ensemble classifier are reviewed. This approach preserves the advantages of each method and enhances the overall classification accuracy. Requirements for the developed system's functionality were formulated, and the design tools, the main development platform (Java), and the database management system (MySQL) were defined. The decision-support system was designed, and a comprehensive evaluation of the developed system was conducted on real medical data. The results of these tests confirmed the effectiveness of the system.
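One way to picture the expert component described above is to map each measurement onto a linguistic level relative to a norm interval and then compare the resulting pattern with a disease's symptom complex. The sketch below assumes that reading; the interval bounds, the marker names and the function names are all hypothetical, and the abstract does not specify how the system actually encodes or aggregates these rules.

```python
def linguistic_level(value, low, high):
    """Map a measurement to a linguistic variable relative to [low, high]."""
    if value < low:
        return "below the norm"
    if value > high:
        return "above the norm"
    return "norm"

def matches_symptom_complex(measurements, norms, symptom_complex):
    """True if every indicator in the symptom complex takes the expected
    linguistic level for this patient."""
    return all(
        linguistic_level(measurements[name], *norms[name]) == expected
        for name, expected in symptom_complex.items()
    )

# Hypothetical norm intervals and symptom complex, for illustration only.
norms = {"marker_a": (3.0, 6.0), "marker_b": (10.0, 20.0)}
disease_x = {"marker_a": "above the norm", "marker_b": "norm"}

patient = {"marker_a": 7.2, "marker_b": 15.0}
print(matches_symptom_complex(patient, norms, disease_x))
```

A full ensemble would still need a rule for combining such expert matches with the outputs of the probabilistic classifiers, which is not shown here.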
Decision tree classifiers for automated medical diagnosis
Neural Computing and Applications, 2012
Decision support systems help physicians and play an important role in medical decision-making. They are based on different models, and the best of them provide an explanation together with an accurate, reliable and quick response. This paper presents a decision support tool for the detection of breast cancer based on three types of decision tree classifiers: single decision tree (SDT), boosted decision tree (BDT) and decision tree forest (DTF). Decision tree classification provides a rapid and effective method of categorizing data sets. Decision-making is performed in two stages: training the classifiers with features from the Wisconsin breast cancer data set, and then testing. The performance of the proposed structure is evaluated in terms of accuracy, sensitivity, specificity, confusion matrix and receiver operating characteristic (ROC) curves. The results showed that the overall accuracies of SDT and BDT in the training phase were 97.07 % (429 correct classifications) and 98.83 % (437 correct classifications), respectively. BDT performed better than SDT on all performance indices. The ROC value and Matthews correlation coefficient (MCC) for BDT in the training phase were 0.99971 and 0.9746, respectively, which was superior to the SDT classifier. During the validation phase, DTF achieved 97.51 %, which was superior to SDT (95.75 %) and BDT (97.07 %). The ROC value and MCC for DTF were 0.99382 and 0.9462, respectively. BDT showed the best performance in terms of sensitivity, and SDT was the best only when considering speed.
Keywords: Computer-aided diagnosis (CAD) · Decision support systems (DSS) · Decision tree classification · Single decision tree · Boosted decision tree · Decision tree forest · k-fold cross-validation
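Accuracy, sensitivity, specificity and MCC, as named in this abstract, can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts in `example` are hypothetical and are not taken from the paper's results.

```python
import math

def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts:
    tp/fn are diseased cases classified right/wrong, tn/fp healthy ones."""
    total = tp + fp + tn + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        # MCC is conventionally set to 0 when a marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Hypothetical counts, for illustration only.
example = diagnostic_metrics(tp=40, fp=5, tn=50, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells symmetrically, so it remains informative when the two classes are unbalanced.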