A comprehensive study of machine learning for predicting cardiovascular disease using Weka and Statistical Package for Social Sciences tools

A comprehensive study of machine learning for predicting cardiovascular disease using Weka and SPSS tools

International Journal of Electrical and Computer Engineering (IJECE), 2023

Artificial intelligence (AI) is the simulation of human intelligence processes by machines and software to help humans make accurate, informed, and fast decisions based on data analysis. The medical field can make use of such AI simulators because medical data records are enormous, with many overlapping parameters. In-depth classification techniques and data analysis can be the first step in identifying and reducing risk factors. In this research, we evaluate a dataset of cardiovascular abnormalities affecting a group of potential patients. We employ AI simulators such as Weka to understand the effect of each parameter on the risk of suffering from cardiovascular disease (CVD). We use seven classifiers: baseline accuracy, naïve Bayes, k-nearest neighbor, decision tree, support vector machine, linear regression, and artificial neural network multilayer perceptron. The classifiers are assisted by a correlation-based filter that selects the most influential attributes, with the aim of obtaining a higher classification accuracy. The results from Weka and the Statistical Package for Social Sciences (SPSS) are analyzed in terms of sensitivity, specificity, accuracy, and precision. A decision tree method (J48) classified CVD cases with a high accuracy of 95.76%.
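The four evaluation measures named in the abstract above can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions; the class name, function name, and the counts in the example are hypothetical and purely illustrative, and are not taken from the paper's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (positive = CVD present)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, accuracy, and precision."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of real cases detected
        "specificity": c.tn / (c.tn + c.fp),  # share of healthy cases cleared
        "accuracy": (c.tp + c.tn) / total,    # share of all cases classified correctly
        "precision": c.tp / (c.tp + c.fp),    # share of positive calls that are right
    }


# Made-up counts for illustration only (not results from the study).
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four measures together matters in medical screening, because a classifier can reach high accuracy on an imbalanced dataset while still missing many true cases (low sensitivity).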

A Comparative Study on Predicting Cardiovascular Disease Using Machine Learning Algorithms

International Journal of Innovative Research in Computer Science and Technology (IJIRCST), 2024

Heart disease is a global health concern because of eating patterns, office work cultures, and lifestyle changes. A machine learning-based heart attack prediction system is like having a vigilant watchdog in the medical field. Estimating the danger of a heart attack comes down to analyzing data with complex algorithms. Four primary categories were established at the outset of this study: age, gender, BMI, and blood pressure. The heart disease data was then classified using a variety of machine learning approaches, including XGBoost, Gradient Boosting, Random Forest, Logistic Regression, and Decision Trees. The results were then compared in terms of accuracy, false positive rate, precision, sensitivity, and specificity. Accuracy, precision, recall, and F1 score were highest when using Logistic Regression (LR). It is therefore strongly recommended that data on cardiac disease be classified using the logistic regression technique.

Cardiovascular Disease Prediction Model using Machine Learning Algorithms

International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2020

Cardiovascular disease (CVD) is a general term for conditions affecting the heart or blood vessels. It is commonly associated with an increased risk of blood clots and a build-up of fatty deposits inside the arteries (atherosclerosis). Sometimes, it can also be associated with damage to arteries in organs such as the brain, kidneys, heart and eyes. CVD causes the highest number of deaths globally each year. Most cardiovascular diseases can be prevented by leading a healthy lifestyle and by addressing behavioural risk factors such as unhealthy diet and obesity, tobacco use, harmful use of alcohol and physical inactivity through population-wide strategies. Machine learning can play an important role in predicting cardiovascular disease. Such information, if predicted well in advance, can provide significant insights to doctors, who can then adapt their treatment and diagnosis for each patient accordingly. In the proposed research method, the attributes are first selected from the dataset. Data pre-processing then takes place, using techniques such as removal of noisy data, removal of missing data, filling in default values where applicable, and classification of attributes for prediction and decision making at different levels. Classification, accuracy, sensitivity and specificity analysis is done to measure the performance of the diagnosis model. A prediction model is proposed that predicts whether or not a person has heart disease and hence provides a diagnosis or discussion of the results. This is accomplished by applying rules to the individual results of classification algorithms obtained on the dataset: Gradient Boosting Classifier, Random Forest Classifier, Support Vector Machine, Extremely Randomized Trees Classifier (Extra Trees Classifier), Logistic Regression and Multi-Layer Perceptron (MLP) Classifier.

IJERT-A Method of Cardiovascular Disease Prediction using Machine Learning

International Journal of Engineering Research and Technology (IJERT), 2021

https://www.ijert.org/a-method-of-cardiovascular-disease-prediction-using-machine-learning https://www.ijert.org/research/a-method-of-cardiovascular-disease-prediction-using-machine-learning-IJERTCONV9IS05050.pdf

In recent times, cardiac diseases have been the major reason for the increasing death rate. It is impractical for a typical person to undergo expensive tests like the ECG frequently. Hence, there is an immediate need to reduce the death rate by setting up a framework for identifying cardiovascular diseases at an early stage, as there is a sharp increase in the rate of cardiac arrests at adolescent age. For this purpose, various machine learning classification algorithms can predict heart weakness from basic indicators such as age, sex, cholesterol, glucose levels, heart rate and so on. In this paper, a popular classification method, the K Nearest Neighbor (KNN) algorithm, has been used to detect heart disease at an early stage. The UCI dataset, which contains the medical records of 303 patients, has been used for classification. The KNN algorithm obtained an accuracy of 87%.
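The KNN idea used in the abstract above can be sketched in a few lines: a new record receives the majority label of its k closest training records. This is a minimal illustration only, assuming Euclidean distance and features already scaled to a comparable range. The function name, the two-feature toy records, and k=3 are hypothetical and do not reproduce the paper's configuration or the UCI data.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training rows nearest to query."""
    # Pair every training row's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    # Vote among the labels of the k nearest rows.
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical, pre-scaled toy records: (scaled age, scaled cholesterol).
# Label 1 = heart disease present, 0 = absent. Not real patient data.
train_X = [(0.20, 0.10), (0.30, 0.20), (0.25, 0.15),
           (0.80, 0.90), (0.90, 0.80), (0.85, 0.95)]
train_y = [0, 0, 0, 1, 1, 1]

high_risk = knn_predict(train_X, train_y, (0.82, 0.88), k=3)
low_risk = knn_predict(train_X, train_y, (0.22, 0.12), k=3)
print(high_risk, low_risk)
```

Because KNN relies on raw distances, features such as age and cholesterol generally need to be scaled first, otherwise the attribute with the largest numeric range dominates the vote.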

Application of Machine Learning for Cardiovascular Disease Risk Prediction

Computational Intelligence and Neuroscience, 2023

Cardiovascular diseases (CVDs) are a common cause of heart failure globally. The need to explore possible ways to tackle the disease necessitated this study. The study designed a machine learning model for cardiovascular disease risk prediction in accordance with a dataset that contains 11 features which may be used to forecast the disease. The dataset from Kaggle on cardiovascular disease includes approximately 70,000 patient records that were used to determine the outcome. Compared to the UCI dataset, the Kaggle dataset has many more training and validation records. Models created using neural networks, random forests, Bayesian networks, C5.0, and QUEST were compared for this dataset. On training and testing data sets, the results acquired a high accuracy (99.1 percent), which is significantly superior to previous methods. Ahead-of-time detection and diagnosis of cardiac disease, as well as better treatment outcomes, are strong possibilities for the suggested prediction model. Additionally, it may help patients better manage their illness or life forms in order to increase their chances of recovery/survival. The result showed greater accuracy and promising signs that machine-learning algorithms can indeed assist in early identification of the disease and improvement of the treatment outcome.

IJERT-Heart Disease Prediction using Artificial Intelligence

International Journal of Engineering Research and Technology (IJERT), 2021

https://www.ijert.org/heart-disease-prediction-using-artificial-intelligence https://www.ijert.org/research/heart-disease-prediction-using-artificial-intelligence-IJERTCONV9IS04015.pdf

Artificial Intelligence techniques have been widely used in clinical decision support systems for prediction and diagnosis of various diseases with good accuracy. These classification techniques are very effective in designing clinical support systems because of their ability to find hidden patterns and relationships in medical data provided by medical professionals. One of the most important applications of such systems is the diagnosis of heart disease, because it is one of the leading causes of death all over the world. Almost all systems that predict heart disease use clinical datasets with parameters and inputs from complex tests conducted in labs. None of the systems predicts heart disease from risk factors like age, case history, diabetes, hypertension, high cholesterol, tobacco smoking, alcohol intake, obesity or physical inactivity. Heart disease patients have many of those visible risk factors in common, which may be used very effectively for diagnosis. A system based on such risk factors would not only help medical professionals but would also warn patients about the probable presence of heart disease even before they visit a hospital or go for costly medical checkups. Hence this paper presents a technique for prediction of heart disease using major risk factors with the help of different classification algorithms. The technique involves four major classification algorithms: K Neighbors, Support Vector, Decision Tree, and Random Forest.

Risk prediction of cardiovascular disease using machine learning classifiers

Open Medicine

Cardiovascular disease (CVD) makes our heart and blood vessels dysfunctional and often leads to death or physical paralysis. Therefore, early and automatic detection of CVD can save many human lives. Multiple investigations have been carried out to achieve this objective, but there is still room for improvement in performance and reliability. This study is yet another step in this direction. In this study, two reliable machine learning techniques, multi-layer perceptron (MLP), and K-nearest neighbour (K-NN) have been employed for CVD detection using publicly available University of California Irvine repository data. The performances of the models are optimally increased by removing outliers and attributes having null values. Experimental-based results demonstrate that a higher accuracy in detection of 82.47% and an area-under-the-curve value of 86.41% are obtained using the MLP model, unlike the K-NN model. Therefore, the proposed MLP model was recommended for automatic CVD detectio...

Performance Analysis of Some Selected Machine Learning Algorithms on Heart Disease Prediction Using the Noble UCI Datasets

International Journal of Engineering Applied Sciences and Technology

Heart disease is one of the major causes of morbidity and mortality in the world. Diagnosis and treatment are very complex, especially in low-income countries, because efficient diagnostic tools are rarely available and physicians are in short supply, which affects proper prediction and treatment of patients. Lack of awareness, inadequate preventive measures, and a lack of experienced medical professionals are among the factors that contribute to a high risk of heart disease. Although a large proportion of heart disease could be prevented, cases continue to rise, mainly because the preventive measures taken are inadequate. Nowadays, several clinical decision support systems for heart disease prediction have been developed using the most popular machine learning algorithms and tools. This paper analyses the performance of these algorithms on heart disease prediction using the noble UCI datasets. They include Naïve Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT-J48), Random Forest (RF), K-Nearest Neighbor (KNN) and Neural Network (NN). Our investigation found that these were the most frequently used algorithms and that RF appeared the best in the prediction of heart disease using the mentioned datasets. Of the 34 studies investigated, RF was used 10 times and appeared the best 4 times, followed by SVM, which was used 18 times with 6 best performances. Among the most popular algorithms, KNN was employed 10 times but appeared the best only once. Others such as LR and MLP were used 7 and 5 times respectively, but neither recorded a single best performance in the prediction of heart disease, while FCM and Vote were not popular and were rarely considered.
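The abstract above does not state how the algorithms were ranked, but one reading that is consistent with RF being placed ahead of SVM is to compare how often each algorithm was best relative to how often it was used. The sketch below works through that arithmetic with the counts quoted in the abstract; the "best rate" measure and the variable names are assumptions introduced here for illustration, not the authors' stated method.

```python
# (times used, times best) as quoted in the abstract above.
usage = {
    "RF": (10, 4),
    "SVM": (18, 6),
    "KNN": (10, 1),
    "LR": (7, 0),
    "MLP": (5, 0),
}

# Assumed measure: fraction of uses in which the algorithm performed best.
best_rate = {name: wins / used for name, (used, wins) in usage.items()}

# Rank from highest to lowest best rate.
ranked = sorted(best_rate, key=best_rate.get, reverse=True)

for name in ranked:
    used, wins = usage[name]
    print(f"{name}: best in {wins} of {used} uses ({best_rate[name]:.1%})")
```

Under this measure RF comes out at 40% and SVM at about 33%, even though SVM has more best performances in absolute terms (6 versus 4).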

Prediction of Cardiovascular Disease on Different Parameters Using Machine Learning

International Journal of Scientific Research in Science, Engineering and Technology, 2021

The most common serious diseases affecting human health are cardiovascular diseases (CVDs). Early diagnosis can prevent or mitigate CVDs, which can reduce the rate of death. Identifying risk factors using machine learning models is a promising approach. We propose a model that combines different methods to predict heart disease effectively. To make the proposed model succeed, we employed effective data collection, data pre-processing and data transformation methods so that the training model receives precise information. A combined dataset has been used (Cleveland, Long Beach VA, Switzerland, Hungarian and Statlog). The appropriate function is selected using AASSO (Advanced Absolute Shrinkage and Selection Operator techniques) and AASSO techniques. Appropriate features are selected. New hybrids are developed by integrating the traditional bagging and boosting methods, such as the Decision Tree Bagging Method (DTBM), the Random Forest Bagging Method (RFBM), the K-Nearest Neighbour Bagging Method (KNNBM), the AdaBoost Boosting Method (ABBM), and the GBBM. Our machine learning algorithms, along with Negative Predictive Value (NGR, false positive rates), and false negative flow rates, also were implemented to calculate accuracy of our model, sensitivity (SEN), error rate, accuracy of the model (FRE) and the F1 score (F1) (FNR). The results are shown separately for comparison. Based on the result analysis, our proposed model produced the highest precision and accuracy using RFBM and relief selection methods (99.05 percent).

Using Machine Learning Classifiers, Analyze and Predict Cardiovascular Disease

International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2022

Heart disease refers to several illnesses, including restricted or blocked veins, that can result in myocardial infarction, indigestion, or even death. A supervised classifier predicts the condition depending on the extent of the patient's side effects. This research intends to investigate how machine learning tree classifiers perform in heart disease prediction. Random Forest, Decision Tree, Logistic Regression, Support Vector Machine (SVM), and K-nearest Neighbors (KNN) classifiers are analyzed based on their correctness and AUC Gryphon scores. With an execution time of 1.32 seconds, a better precision of 85%, and a coefficient of determination (r score) of 0.8739, the Random Forest classifier outperformed the others in this investigation of coronary heart disease detection.