Heart Disease Identification Method Using Machine Learning Classification in E-Healthcare
Abstract
Heart disease is a complex disease that affects many people worldwide. Timely and efficient identification of heart disease plays a key role in healthcare, particularly in cardiology. In this article, we propose an efficient and accurate system for diagnosing heart disease based on machine learning techniques. The system is built on classification algorithms, including support vector machine (SVM), logistic regression (LR), artificial neural network (ANN), K-nearest neighbor (K-NN), Naïve Bayes (NB), and decision tree (DT), while standard feature selection (FS) algorithms, namely Relief, minimal redundancy maximal relevance (MRMR), least absolute shrinkage and selection operator (LASSO), and local-learning-based feature selection (LLBFS), are used to remove irrelevant and redundant features. We also propose a novel fast conditional mutual information (FCMIM) feature selection algorithm to solve the feature selection problem. Feature selection is used to increase classification accuracy and reduce the execution time of the classification system. Furthermore, the leave-one-subject-out (LOSO) cross-validation method is used for model assessment and hyperparameter tuning. Performance metrics are used to assess the classifiers, which are evaluated on the features chosen by each feature selection algorithm. The experimental results show that the proposed feature selection algorithm (FCMIM), combined with a support vector machine classifier, is feasible for designing a high-level intelligent system to identify heart disease. The suggested diagnosis system (FCMIM-SVM) achieved good accuracy compared with previously proposed methods. Additionally, the proposed system can easily be implemented in healthcare for the identification of heart disease.
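As a minimal sketch of the leave-one-subject-out protocol mentioned in the abstract, the following pure-Python example holds out each subject once, trains on the remaining subjects, and averages the hits. The K-NN classifier, the toy data, and the function names are illustrative assumptions, not the authors' implementation. With one record per subject, as assumed here, leave-one-subject-out reduces to ordinary leave-one-out.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Squared Euclidean distance is sufficient for ranking neighbours.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=3):
    """Hold out each subject once, train on the rest, and return the hit rate."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: two features per subject, label 1 = disease, 0 = healthy.
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [3.0, 3.1], [3.2, 2.9], [2.8, 3.0]]
y = [0, 0, 0, 1, 1, 1]

loo_acc = leave_one_out_accuracy(X, y, k=3)
print(f"leave-one-out accuracy: {loo_acc:.2f}")
```

The same loop can be reused for hyperparameter tuning by calling `leave_one_out_accuracy` for several candidate values of `k` and keeping the best one.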
INDEX TERMS Heart disease classification, features selection, disease diagnosis, intelligent system, medical data analytics.
Figures (29)
TABLE 1. Summary of the previous methods.
Algorithm 1 Pseudo-Code for Relief FS Algorithm
FIGURE 1. Proposed heart disease identification system.
FIGURE 2. Histograms of heart disease dataset.
TABLE 5. The results of statistical operation on the dataset.
FIGURE 4. The score of features and ranking selected by FS algorithms.
FIGURE 3. The heat map for correlation features of heart disease dataset.
FIGURE 5. Cross validation MSE of LASSO fit.
EIA, OPK, PES, and THA. The classification performance of the classifiers on these selected features is very good. The AGE and FBS features are not selected by this algorithm. In Table 7, we report the features selected by the FCMIM FS algorithm along with their scores, which are shown graphically in Figure 6.
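To illustrate how a conditional-mutual-information criterion can score and rank features, here is a minimal sketch of the CMIM selection rule (Fleuret's conditional mutual information maximization) for discrete features. The details of the proposed FCMIM algorithm are not given in this text, so this should be read as an assumption about the general technique rather than as the authors' method; the feature names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def cmim_select(features, target, k):
    """Greedily pick k features; each score is the minimum information the
    feature adds about the target given any single already-selected feature."""
    scores = {name: mutual_info(col, target) for name, col in features.items()}
    selected = []
    while len(selected) < k and scores:
        best = max(scores, key=scores.get)
        selected.append(best)
        del scores[best]
        for name in scores:
            cmi = cond_mutual_info(features[name], target, features[best])
            scores[name] = min(scores[name], cmi)
    return selected


# Hypothetical toy data: 'duplicate' copies 'informative', so it adds nothing new.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "informative": [0, 0, 0, 0, 1, 1, 1, 0],
    "duplicate": [0, 0, 0, 0, 1, 1, 1, 0],
    "complementary": [0, 0, 0, 1, 0, 1, 1, 1],
}

# Ranking by plain mutual information alone, for comparison.
by_mi = sorted(features, key=lambda n: mutual_info(features[n], target), reverse=True)[:2]
selected = cmim_select(features, target, k=2)
print("top 2 by mutual information:", by_mi)
print("top 2 by CMIM:", selected)
```

In this toy case, ranking by mutual information alone keeps the redundant copy, whereas the conditional criterion skips it in favour of the feature that still carries information once the first feature is known.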
TABLE 6. Selected features by Relief, MRMR, LASSO, and LLBFS.
TABLE 7. Features selected by FCMIM FS algorithm.
FIGURE 6. Features selected by FCMIM FS algorithm.
other parameter values were also passed during the training process. Table 8 presents the performance evaluation of the classifiers with LOSO CV. According to Table 8, logistic regression performed well at C = 10 compared with other values of the parameter C, obtaining 84% accuracy, 93% specificity, 75% sensitivity, and an MCC of 84%, with a processing time of 0.003 seconds. For K-NN, experiments were conducted with different values of k, and performance was excellent at k = 7. ANN was trained with different numbers of hidden neurons and performed best at 10 hidden neurons, with 60% accuracy, 100% specificity, and 0% sensitivity. SVM (RBF) with C = 100 and g = 0.001 has 61% specificity, 70% sensitivity, and 70% accuracy. The SVM with linear kernel has 95% specificity, 75% sensitivity, and 85% accuracy. NB has 90% specificity, 78% sensitivity, and 80% accuracy. DT has 72% specificity, 83% sensitivity, and 70% accuracy. Figure 7 shows that SVM outperformed the other five classifiers. The accuracy of SVM (linear) is 85%, with sensitivity 77% and specificity 95%. Logistic regression is the second-best classifier, with 84% accuracy, and NB is the third-best. The worst classifiers were K-NN at k = | with LOSO cross-validation. The MCC of SVM is 85%, which is good, so SVM is a good classifier for heart disease prediction. Figure 11 shows the execution time of each algorithm: SVM (linear) at C = 100 and g = 0.009 takes 30.145 seconds, whereas logistic regression at C = 10 takes 0.003 seconds, a very fast execution time compared with the other classifiers under the LOSO cross-validation method. Table 8 shows the performance of the classifiers on the full feature set with LOSO cross-validation.
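The metrics quoted above (accuracy, sensitivity, specificity, MCC) all derive from a binary confusion matrix. The sketch below uses the standard definitions. The counts are hypothetical, chosen only to reproduce the pattern reported for ANN on the full feature set (60% accuracy, 100% specificity, 0% sensitivity), and are not taken from the paper's tables.

```python
from math import sqrt


def binary_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity, and MCC from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate: patients correctly flagged
    specificity = tn / (tn + fp)   # true negative rate: healthy correctly cleared
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty;
    # a common convention, assumed here, is to report 0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts: a classifier that labels every subject as healthy.
metrics = binary_metrics(tp=0, tn=60, fp=0, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A result with 100% specificity and 0% sensitivity means the classifier never predicts disease, so its accuracy merely reflects the share of healthy subjects. This is why MCC is a useful companion to accuracy on imbalanced data.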
TABLE 8. Performance of classifiers on full features set.
TABLE 10. Results of classifiers on features selected by MRMR.
perceptron, and various numbers of hidden neurons were used in the MLP. With 20 hidden neurons, the ANN (MLP) gives its best results on the selected feature set under the LOSO validation method, obtaining a classification accuracy of 82%, whereas on the full feature set the accuracy was 55%. This clearly shows the performance improvement due to feature selection. The computation time of the ANN algorithm also fell from 9.777 seconds to 5.931 seconds. The specificity of ANN was 94% at 20 hidden neurons; therefore, the ANN is good at detecting healthy people. The results of SVM (rbf) at C = 100 and
TABLE 9. Performance of classifiers on features selected by Relief.
FIGURE 11. Classifiers performances on the 6 important features selected by LLBFS.
TABLE 12. Results of classifiers on features selected by LLBFS.
TABLE 11. Results of classifiers on s features selected by LASSO.
LOSO validation are also good for heart diagnosis. As reported in Table 13, the specificity of the ANN classifier is best with the Relief FS algorithm, compared with the specificity obtained with the MRMR, LASSO, LLBFS, and FCMIM feature selection algorithms. Therefore, Relief combined with the ANN classifier is the best diagnosis system for correctly classifying healthy people. In terms of accuracy, SVM performs well, achieving 92.37% accuracy on the features selected by the proposed FS algorithm (FCMIM), compared with the state-of-the-art FS algorithms (Relief, MRMR, LASSO, LLBFS) with LOSO CV. Hence, in terms of accuracy, FCMIM is the best algorithm for feature selection and SVM is a suitable classifier for HD diagnosis. LASSO and MRMR performances in terms of accuracy with
with classifier logistic regression are good low as compared to MRMR FS algorithms. Table 15 shows that the accuracy of LR improved from 84% to 88% on the reduced feature set selected by the LLBFS algorithm. Similarly, the accuracy of SVM (linear) improved from 85% to 92.37% on the reduced feature set selected by FCMIM. Thus, the performance of the classifiers improved with feature selection. Finally, we conclude that the diagnosis system using the FCMIM FS algorithm with the SVM classifier is effective for diagnosing heart disease. The proposed system (FCMIM + SVM) achieved 92.37% accuracy, higher than that of the other feature selection algorithms and classifiers.
FIGURE 12. Classifiers accuracy on features selected by FCMIM FS algorithm.
specificity of logistic regression with MRMR is also best for the correct prediction of healthy people. The sensitivity of logistic regression is 98% on the features selected by the FCMIM FS algorithm, so it correctly classifies people with heart disease. The sensitivity of NB on the feature set selected by the LASSO FS algorithm is also better than the sensitivity of SVM (linear) with the Relief FS algorithm. In terms of MCC, FCMIM chooses appropriate features for the LR classifier, which achieved the best MCC of 91%, compared with the MCC values obtained with the MRMR, LASSO, LLBFS, and Relief FS algorithms. The computation time of Relief, LASSO, LLBFS, and FCMIM FS algorithms
TABLE 13. Classifiers results on features selected by FCMIM.
TABLE 16. Training parameters for BPDNNs.
12) PERFORMANCE COMPARISON OF PROPOSED METHOD WITH PREVIOUSLY PROPOSED METHODS
TABLE 15. Performance comparison of best classifiers before and after feature selection using standard feature selection algorithms and the proposed FCMIM FS algorithm.
TABLE 14. Best performance metric results and best classifiers with feature selection algorithms.
FIGURE 14. Performance comparison of the proposed method with previously proposed methods.
TABLE 17. Proposed method performance comparison with existing methods.
ASIF KHAN received the B.Sc. (Hons.) and Master of Computer Science and Application (M.C.A.) degrees from Aligarh Muslim University, India, and the Ph.D. degree (Hons.) in computer science and technology from the University of Electronic Science and Technology of China (UESTC), China, in 2016. He was an Adjunct Faculty with the University of Bridgeport, USA, for the China Program, in Summer 2016. He was a Visiting Scholar of big data mining and application with the Chongqing Institute of Green and Intelligent Technology (CIGIT), Chinese Academy of Sciences, Chongqing, China. He is currently a Postdoctoral Scientific Research Fellow with UESTC. He is also an Assistant Professor with BSA Crescent University, India. He has contributed articles on robotics and vision analysis to many international journals. His research interests include machine learning, robotics vision, and new ideas regarding vision-based information critical theoretical research. He received the Academic Achievement Award and the Excellent Performance Award from UESTC, from 2015 to 2016.
AMIN UL HAQ received the M.S. degree in computer science. He is currently pursuing the Ph.D. degree with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, China. He has vast academic, technical, and professional experience in Pakistan. He is also a Lecturer with Agricultural University, Peshawar, Pakistan. He is associated with the Wavelets Active Media Technology and the Big Data Laboratory, as an International Student. He has published high-level research articles in good journals. His research interests include machine learning, medical big data, the IoT, e-health and telemedicine, and related technologies and algorithms.