A Hybrid Data Mining Model to Predict Coronary Artery Disease Cases Using Non-Invasive Clinical Data

SUPERVISED FEATURE SELECTION FOR DIAGNOSIS OF CORONARY ARTERY DISEASE BASED ON GENETIC ALGORITHM

Feature selection (FS) has become a focus of research in decision support systems, where datasets with very large numbers of variables are analyzed. In this paper we present a new method for the diagnosis of coronary artery disease (CAD) based on FS with a genetic algorithm (GA) wrapped around a Naïve Bayes (BN) classifier. The CAD dataset contains two classes described by 13 features. In the GA–BN algorithm, the GA generates a subset of attributes in each iteration, and the BN classifier then evaluates that subset in the second step of the selection procedure. The final set of attributes is the most relevant feature subset, the one that increases accuracy. In this case the algorithm achieves 85.50% classification accuracy in the diagnosis of CAD. The algorithm is then compared with Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and C4.5 decision tree classifiers, whose classification accuracies are 83.5%, 83.16% and 80.85%, respectively. The GA-wrapped BN algorithm is also compared with other FS algorithms. The results obtained are very promising for the diagnosis of CAD.
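The wrapper loop described in this abstract (a GA proposes feature subsets, a Naïve Bayes classifier scores each one) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Gaussian Naïve Bayes variant, the even/odd hold-out split, truncation selection, one-point crossover, the mutation rate, and the synthetic data are all assumptions made for the example.

```python
import math
import random

def fit_nb(X, y, cols):
    """Fit a Gaussian Naive Bayes model on the chosen feature columns."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for c in cols:
            vals = [r[c] for r in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / len(y)), stats)
    return model

def predict_nb(model, x, cols):
    """Return the label with the highest log-posterior."""
    def log_post(label):
        prior, stats = model[label]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[c] - mean) ** 2 / (2 * var)
            for c, (mean, var) in zip(cols, stats))
    return max(model, key=log_post)

def fitness(mask, X, y):
    """Hold-out accuracy of NB restricted to features where mask is 1 (assumed scoring)."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    train = [i for i in range(len(X)) if i % 2 == 0]
    test = [i for i in range(len(X)) if i % 2 == 1]
    model = fit_nb([X[i] for i in train], [y[i] for i in train], cols)
    hits = sum(predict_nb(model, X[i], cols) == y[i] for i in test)
    return hits / len(test)

def ga_select(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Evolve bit masks over the features; return the best mask and its score."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection, keeps the elites
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda m: fitness(m, X, y))
    return best, fitness(best, X, y)

# Synthetic stand-in data: feature 0 is informative, features 1-3 are noise.
data_rng = random.Random(1)
y = [(i // 2) % 2 for i in range(80)]
X = [[3 * label + data_rng.gauss(0, 1)] + [data_rng.gauss(0, 1) for _ in range(3)]
     for label in y]
best, score = ga_select(X, y)
print(best, score)
```

In a real study the fitness would more plausibly be a cross-validated accuracy on the 13-feature CAD dataset; the single hold-out split here only keeps the sketch short.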

A data mining approach for diagnosis of coronary artery disease

Computer Methods and Programs in Biomedicine, 2013

Computer Methods and Programs in Biomedicine 111 (2013) 52-61

Keywords: Classification; Data mining; Coronary artery disease; SMO; Bagging; Neural Networks

Abstract: Cardiovascular diseases are very common and are one of the main causes of death. Coronary artery disease (CAD) is among the major types of these diseases, so its correct and timely diagnosis is very important. Angiography is the most accurate CAD diagnosis method;

The use of genetic algorithm and particle swarm optimization on tiered feature selection method in machine learning-based coronary heart disease diagnosis system

International Journal of Electrical and Computer Engineering (IJECE), 2024

Coronary heart disease (CHD) is a leading global cause of death. Early detection is an important step toward reducing mortality rates and treatment costs. Early detection systems can be developed using machine learning on patient medical record datasets. Unfortunately, these datasets contain excessive features, which can reduce machine learning performance. It is therefore necessary to remove redundant features and irrelevant data. This research proposes a tiered feature selection model using a genetic algorithm (GA) and particle swarm optimization (PSO) to improve the performance of the diagnosis model. The feature selection model is evaluated using metrics derived from the confusion matrix, with CatBoost as the machine learning algorithm. Model testing uses the z-Alizadeh Sani, Cleveland, Statlog, and Hungarian datasets. The best results were obtained on the z-Alizadeh Sani dataset, with 6 features selected from 54: accuracy 99.32%, specificity 98.57%, sensitivity 100.00%, area under the curve (AUC) 99.28%, and F1-score 99.37%. The proposed feature selection model delivers machine learning performance in the "very good" category, and the proposed diagnostic model is of an excellent standard.
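The confusion-matrix metrics named in this abstract follow from four counts. The sketch below shows the standard definitions; the counts in the usage line are invented for illustration and are not taken from the paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts, chosen only to exercise the formulas.
m = confusion_metrics(tp=40, fp=5, tn=45, fn=10)
print(m)
```

A sensitivity of 100%, as reported for the z-Alizadeh Sani dataset, corresponds to `fn == 0` in these terms.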

Predicting coronary artery disease: a comparison between two data mining algorithms

BMC Public Health, 2019

Background: Cardiovascular diseases (CVDs) are the leading cause of death across the world. The World Health Organization has estimated that the mortality caused by heart diseases will rise to 23 million cases by 2030. Hence, data mining algorithms could be useful in predicting coronary artery disease (CAD). The present study therefore aimed to compare the positive predictive value (PPV) for CAD of artificial neural network (ANN) and SVM algorithms, and how the two differ in predicting CAD in the selected hospitals. Methods: The present study was conducted using data mining techniques. The research sample was the medical records of patients with coronary artery disease who were hospitalized in three hospitals affiliated to AJA University of Medical Sciences between March 2016 and March 2017 (n = 1324). The dataset and the predicting variables used in this study were the same for both data mining techniques. In total, 25 variables affecting CAD were selected and the related data were extracted. After normalization and cleaning, the data were entered into SPSS (V23.0) and Excel 2013. Then, R 3.3.2 was used for statistical computing. Results: The SVM model had lower MAPE (112.03), a higher Hosmer-Lemeshow test result (16.71), and higher sensitivity (92.23). Moreover, variables affecting CAD (74.42) yielded better goodness of fit in the SVM model and provided more accurate results than the ANN model. In addition, since the area under the receiver operating characteristic (ROC) curve was larger for the SVM algorithm than for the ANN model, it could be concluded that the SVM model had higher accuracy than the ANN model. Conclusion: According to the results, the SVM algorithm presented higher accuracy and better performance than the ANN model and was characterized by higher power and sensitivity. Overall, it provided a better classification for the prediction of CAD. The use of other data mining algorithms is suggested to improve the positive predictive value of the disease prediction.
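The comparison of areas under the ROC curve can be illustrated with a small rank-based computation. The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores below are made up for the example; they are not the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # A correctly ordered pair scores 1, a tie scores 0.5, otherwise 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier outputs for four patients (1 = CAD, 0 = no CAD).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

Computing this value for each model on the same test records gives the kind of SVM-versus-ANN comparison the abstract refers to.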

Feature selection and risk prediction for patients with coronary artery disease using data mining

Medical & Biological Engineering & Computing, 2020

Coronary artery disease (CAD) is an important cause of mortality across the globe. Early risk prediction of CAD could reduce the death rate by allowing early and targeted treatment. In healthcare, some studies have applied data mining techniques and machine learning algorithms to CAD risk prediction using patient data collected by hospitals and medical centers. However, most of these studies used all the attributes in the datasets, which might reduce the performance of prediction models due to data redundancy. The objective of this research is to identify significant features for building models that predict the risk level of patients with CAD. Significant features were selected using three methods (Chi-squared test, recursive feature elimination, and Embedded Decision Tree). The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address the imbalanced dataset issue. The prediction models were built from the identified significant features and eight machine learning algorithms, using Acute Coronary Syndrome (ACS) datasets provided by the National Cardiovascular Disease Database (NCVD) Malaysia. The prediction models were evaluated and compared using six performance evaluation metrics, and the top-performing models achieved an AUC of more than 90%.
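The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched as below. This is a simplified illustration with an assumed neighbour count and toy data, not the configuration used in the study; a maintained library implementation would normally be preferred in practice.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Toy minority class with two numeric features (values are illustrative only).
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_points = smote(minority, n_new=6)
print(new_points)
```

Every synthetic point lies on a line segment between two existing minority points, so the class region is filled in rather than extended.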

Data mining approach for Coronary Artery Disease screening

Coronary artery disease (CAD) is the major cause of mortality in the world. Despite significant advances in medical science and technology, the disease remains a challenge for the general population. The aim of this study is to develop a computer-assisted screening system that will support early detection of CAD and improved patient management with the limited resources available in developing countries. The present system is developed from an initial annotated dataset. Ten risk factors have been investigated for the risk stratification of CAD. Two decision tree models, ID3 and CART, have been applied to find a preliminary set of rules from the annotated database. The extracted rules have been clinically validated by a group of cardiologists, drawing on their medical experience and acumen, to arrive at a final rule base. The dataset used for automatic generation of the model consists of 500 subjects. The present screening system provides risk stratification for CAD based on easily available medical data, and it produces rules that can be easily interpreted by medical experts. The developed system is ready for clinical validation on a large dataset.
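ID3 chooses the attribute to split on by information gain, which is how rules such as "if risk factor X is present, then high risk" emerge from the data. The sketch below shows that calculation. The two risk factors and the four records are hypothetical placeholders, since the abstract does not list the ten risk factors used.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the records on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical records: attribute names and values are assumptions for illustration.
rows = [
    {"smoker": "yes", "diabetic": "yes"},
    {"smoker": "yes", "diabetic": "no"},
    {"smoker": "no", "diabetic": "yes"},
    {"smoker": "no", "diabetic": "no"},
]
labels = ["high", "high", "low", "low"]
gains = {a: information_gain(rows, labels, a) for a in ("smoker", "diabetic")}
best_attr = max(gains, key=gains.get)
print(gains, best_attr)
```

In this toy data the "smoker" attribute separates the classes perfectly, so ID3 would split on it first; CART would make the analogous choice using Gini impurity instead of entropy.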

Diagnosis of Cardiovascular Diseases using Hybrid Feature Selection and Classification Algorithms

2017

Current diagnostic systems for identifying cardiovascular diseases (CVDs), such as Echocardiography (ECG), require highly skilled physicians to evaluate complex combinations of clinical and pathological data. Inaccurate decision making is the challenge in this process and cannot be permitted in the healthcare industry. Data mining methodologies can be applied to large medical datasets to extract insights that aid healthcare professionals in the diagnosis of cardiovascular diseases. In CVD data mining, classification categorizes a patient as having CVD or being free from it based on similarity to previous examples of other patients. The classification accuracy rate is highly influenced by the feature selection technique, which eliminates features or attributes carrying little or no information from the dataset. Thus, feature selection and classification are treated as a global "combinatorial optimization" problem. The aim of this research i...

An Intelligent Machine Learning Approaches for Predicting Coronary Artery Disease

Coronary Artery Disease (CAD) damages the inner layer of the artery, and this damage in turn allows fatty deposits to worsen the injury. CAD is one of the most common causes of death worldwide, so early detection of CAD will help reduce these rates. The medical industry gathers large amounts of data, some of which contain hidden information that could make decisions more effective, and it also uses data processing methods. The CAD prediction indicates the probability of a patient developing artery disease. In this research, we propose various Machine Learning (ML) methods to predict CAD from historical data. These ML methods enable the system to learn from several datasets and extract valuable knowledge. The programmable capability of ML in examining, interpreting, and processing datasets is beneficial to decision-makers in the medical field. The method uses 10 medical parameters, obtained from KEEL (Knowledge Extraction based on Evolutionary Learning), to forecast artery disease. An experiment is performed with algorithms such as Naive Bayes, Decision Tree, Neural Network (MLP Classifier), Logistic Regression, and Random Forest, using performance metrics such as accuracy, precision, and recall.

Coronary artery disease detection using computational intelligence methods

Knowledge-Based Systems, 2016

Nowadays, cardiovascular diseases are very common and are one of the main causes of death worldwide. One major type of such diseases is coronary artery disease (CAD). The best and most accurate method for the diagnosis of CAD is angiography, which has significant complications and costs. Researchers are, therefore, seeking novel modalities for CAD diagnosis via data mining methods. To that end, several algorithms and datasets have been developed. However, few studies have considered the stenosis of each major coronary artery separately. We attempted to achieve a high rate of accuracy in the diagnosis of the stenosis of each major coronary artery. Analytical methods were used to investigate the importance of features for artery stenosis. Further, a proposed classification model was built to predict the status of each artery in new visitors. To further enhance the models, a proposed feature selection method was employed to select more discriminative feature subsets for each artery. According to the experiments, accuracy rates of 86.14%, 83.17%, and 83.50% were achieved for the diagnosis of the stenosis of the left anterior descending (LAD) artery, left circumflex (LCX) artery and right coronary artery (RCA), respectively. To the best of our knowledge, these are the highest accuracy rates that have been obtained in the literature so far. In addition, a number of rules with high confidence were introduced for deciding whether the arteries were stenotic or not. We also applied the proposed method to two challenging datasets and obtained the best accuracy in comparison with other methods.

Diagnoses of coronary heart disease (CHD) using data mining techniques based on classification

2018

Coronary heart disease (CHD) attracts attention around the world because it leads to death. We have designed a system to help diagnose this disease while reducing the cost and time required for the process. We used a programming language with data mining classification techniques. These algorithms gave high accuracy. We applied our study to different CHD datasets. We obtained the best accuracy, 99%, using the Random Forest (RF) algorithm on the two-class Hungarian dataset. On the Cleveland dataset we obtained 94% using the same algorithm, while the best accuracy on the same dataset in the previous study was 58% using the SVM algorithm. On the five-class Hungarian dataset we also obtained 99% as the best accuracy using the random forest classifier, compared with close to 67% achieved on this dataset in previous work using the SVM algorithm. In addition, we obtained 88% accuracy using the AdaBoost classifier on the Hungarian dataset and 87% using the logistic regression classifier on the heart.csv dataset. On the Switzerland dataset we obtained 95% as the best accuracy using random forest, and 91% as the best accuracy on the long-beach dataset using the same classifier. We used a train/test split and preprocessing for the CHD datasets, and we processed the missing values found in the attributes. This processing accounts for the large difference in accuracy from the previous study on the same CHD datasets.
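The preprocessing steps mentioned here, handling missing attribute values and a train/test split, might look roughly like the sketch below. The abstract does not say how missing values were processed, so column-mean imputation is an assumption, as are the split ratio and the toy records.

```python
import random

def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        seen = [r[c] for r in rows if r[c] is not None]
        means.append(sum(seen) / len(seen))
    return [[means[c] if v is None else v for c, v in enumerate(r)] for r in rows]

def train_test_split(rows, labels, test_ratio=0.25, seed=0):
    """Shuffle record indices and hold out a fraction of them for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(rows) * test_ratio))
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

# Toy records with two numeric attributes; None marks a missing value.
rows = [[63, 233], [41, None], [None, 250], [56, 236]]
labels = [1, 0, 1, 0]
filled = impute_mean(rows)
X_train, y_train, X_test, y_test = train_test_split(filled, labels)
print(filled, len(X_train), len(X_test))
```

For an unbiased accuracy estimate, the imputation means would normally be computed on the training portion only and then applied to the test portion; the order is reversed here purely for brevity.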