tsehay admassu - Academia.edu
Papers by tsehay admassu
International Journal of Power Electronics and Drive Systems/International Journal of Electrical and Computer Engineering, Apr 1, 2024
IAES International Journal of Artificial Intelligence, Sep 1, 2022
In recent years, machine learning has attained higher precision and accuracy in classifying clinical heart disease datasets. However, the literature shows that the quality of the heart disease features used to train a model has a significant impact on the outcome of the predictive model. Thus, this study explores the impact of feature quality on the performance of machine learning models for heart disease prediction by employing recursive feature elimination with cross-validation (RFECV). Furthermore, the study explores heart disease features with a significant effect on model output. The dataset for experimentation is obtained from the University of California Irvine (UCI) machine learning repository. The experiment is implemented using a support vector machine (SVM), logistic regression (LR), decision tree (DT), and random forest (RF), and the performance of the SVM, LR, DT, and RF models is compared. The results suggest that feature quality significantly affects model performance. Overall, RF outperforms the other algorithms, achieving a predictive accuracy of 99.7%.
International Journal of Electrical and Computer Engineering (IJECE)
This study investigates the Shapley additive explanation (SHAP) of the extreme gradient boosting (XGBoost) model for breast cancer diagnosis. The study employed Wisconsin’s breast cancer dataset, characterized by 30 features extracted from an image of a breast cell. The SHAP module generated explainer values representing the impact of each breast cancer feature on the diagnosis. The experiment computed SHAP values for 569 samples of the breast cancer dataset. The SHAP explanation indicates that perimeter and concave points have the highest impact on breast cancer diagnosis. SHAP explains the diagnosis outcome of the XGBoost model by showing which features affect its output. The developed XGBoost model achieves an accuracy of 98.42%.
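To make concrete what a SHAP value measures, the following is a minimal sketch of exact Shapley values computed by enumerating feature subsets. It is not the SHAP library and not the paper's XGBoost model: `toy_model`, `sample`, and `baseline` are hypothetical, and features left out of a coalition are simply replaced by a baseline value, which is one common simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature subsets.

    Features outside a coalition are replaced by their baseline value
    (an assumption made for this sketch).
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model with one interaction term (not from the paper).
def toy_model(f):
    return 2.0 * f[0] + 1.0 * f[1] + 0.5 * f[0] * f[2]


sample = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, and the interaction term is shared between the two features involved. Exact enumeration grows exponentially with the number of features, so it is only practical for very small feature sets.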
Proceedings of engineering and technology innovation, 2023
This study aims to investigate the effectiveness of local interpretable model-agnostic explanation (LIME) and Shapley additive explanation (SHAP) approaches for chronic heart disease detection. The efficiency of LIME and SHAP is evaluated by analyzing the diagnostic results of the XGBoost model and the stability and quality of counterfactual explanations. Firstly, 1025 heart disease samples are collected from the University of California Irvine. Then, the performance of LIME and SHAP is compared by using the XGBoost model with various measures, such as consistency and proximity. Finally, the simulation is carried out in the Python 3.7 programming language using the Jupyter Notebook integrated development environment. The simulation result shows that the XGBoost model achieves 99.79% accuracy, indicating that the counterfactual explanation of the XGBoost model describes the smallest changes in the feature values for changing the diagnosis outcome to the predefined output.
Mathematical Modelling of Engineering Problems
Chronic kidney disease is one of the leading causes of death around the world. Early detection of chronic kidney disease is crucial to reducing the mortality caused by the disease. Machine learning methods have recently become popular for the detection of chronic kidney disease. This study investigates the influence of resampling on chronic kidney disease detection using an imbalanced chronic kidney disease dataset. Choosing an optimal feature subset for medical datasets is important for improving the performance of data-driven predictive models. The influence of imbalanced class distribution on predictive models has become an increasingly important topic due to recent advances in automatic decision-making processes and the continuous expansion in the volume of data collected by medical institutions. To address the identified research gap, an experimental evaluation of the synthetic minority oversampling and near miss undersampling techniques was performed on a real-world chronic kidney disease dataset using several classification methods: decision tree, random forest, K-nearest neighbor, adaptive boosting, and support vector machine. The results demonstrate that a number of variables, including performance metrics, classification algorithm, and dataset characteristics, influence the best class distribution. The study also offers useful information about resampling methods for imbalanced classification problems, which will help improve classification accuracy.
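The core idea behind synthetic minority oversampling can be sketched as follows: a new minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration under stated assumptions, not the study's implementation; the function name `smote_like_oversample` and the toy points are hypothetical, and near miss undersampling is not shown.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (simplified SMOTE-style sketch)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Hypothetical imbalanced toy data: 6 majority points, 3 minority points.
majority = [[5.0, 5.0], [5.5, 4.8], [6.0, 5.2], [5.2, 6.1], [4.9, 5.7], [6.3, 4.6]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]

new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority points, it stays inside the region already occupied by the minority class. Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.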
International Journal of Power Electronics and Drive Systems, Jun 1, 2023
Feature selection improves the classification performance of machine learning models. It also identifies the important features and eliminates those with little significance. Furthermore, feature selection reduces the dimensionality of training and testing data points. This study proposes a feature selection method that uses a multivariate sample similarity measure. The method selects features with significant contributions using a machine-learning model. The multivariate sample similarity measure is evaluated using the University of California, Irvine heart disease dataset and compared with existing feature selection methods. It is evaluated with metrics such as minimum subset selected, accuracy, F1-score, and area under the curve (AUC). The results show that the proposed method identifies chest pain, thallium scan, and major vessels scanned using X-rays as features with a high capability to distinguish between healthy and heart disease patients, with 99.6% accuracy.
Iraqi Journal of Science
Heart disease identification is one of the most challenging tasks and requires highly experienced cardiologists. However, in developing nations such as Ethiopia, there are few cardiologists, and heart disease detection is more challenging. As an alternative to relying on cardiologists, this study proposes a more effective model for heart disease detection by employing random forest and sequential feature selection (SFS). SFS is an effective approach to improving the performance of the random forest model on heart disease detection. SFS removes unrelated features in the heart disease dataset that tend to mislead the random forest model. Thus, removing inappropriate and duplicate features from the training set with the sequential feature selection approach plays a significant role in improving the performance of the proposed model. The proposed feature selection approach is evaluated using a real-world clinical heart disease dataset collected from University of California Irvine (...
Bulletin of Electrical Engineering and Informatics
This article evaluates the performance of the support vector machine (SVM), decision tree (DT), and random forest (RF) on a dataset that contains the medical records of 299 patients with heart failure (HF) collected at the Faisalabad Institute of Cardiology and the Allied hospital in Pakistan. The dataset contains 13 descriptive features of physical, clinical, and lifestyle information. The study compared the performance of the three classification algorithms employing pre-processing techniques such as min-max scaling and principal component analysis (PCA). The simulation result shows that the performance of the DT and RF decreased with dimensionality reduction, while the SVM improved with dimensionality reduction. The SVM achieved 84.44%. Thus, feature scaling improves the performance of the SVM. The RF performs at 82.22% and the DT at 81.11%, and the SVM shows an improvement of 1.64% with scaled features compared to the original dataset.
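Min-max scaling maps each feature to the range [0, 1] via x' = (x - min) / (max - min). A minimal sketch follows; the `ages` column is hypothetical and not taken from the heart failure dataset. Models that depend on distances or kernels over the raw feature values, such as an SVM, are generally sensitive to differences in feature ranges, which is consistent with the improvement reported for the SVM above.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical feature column (illustrative values only).
ages = [40.0, 55.0, 70.0, 95.0]
scaled = min_max_scale(ages)
print(scaled)
```

In practice the minimum and maximum would be computed on the training split only and then reused to transform the test split, so that no information from the test data leaks into pre-processing.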
Indonesian Journal of Electrical Engineering and Computer Science
The existing heart failure risk prediction models are developed based on machine learning predictors. The objective of this study is to identify the key risk factors that affect the survival time of heart patients and to develop a heart failure survival prediction model using the identified risk factors. A Cox proportional hazards regression method is applied to generate the proposed heart failure survival model. We used the dataset from the University of California Irvine (UCI) clinical heart failure data repository. To develop the model, we used multiple risk factors such as age, anemia, creatinine phosphokinase, diabetes history, ejection fraction, presence of high blood pressure, platelet count, serum creatinine, sex, and smoking history. Among the risk factors, high blood pressure is identified as one of the novel risk factors for heart failure. We validated the performance of the model via statistical and empirical validation. The experimental result shows that the pro...
Indonesian Journal of Electrical Engineering and Computer Science, 2022
Breast cancer is the most common type of cancer, occurring mostly in females. In recent years, many researchers have devoted themselves to automating the diagnosis of breast cancer by developing different machine learning models. However, the quality and quantity of features in a breast cancer diagnostic dataset have a significant effect on the accuracy and efficiency of a predictive model. Feature selection is an effective method for reducing the dimensionality and improving the accuracy of a predictive model. The purpose of feature selection is to determine the features required for training a model and to remove irrelevant and duplicate features. A duplicate feature is a feature that is highly correlated with another feature. The objective of this study is to conduct experimental research on three different feature selection methods for breast cancer prediction. Sequential, embedded, and chi-square feature selection are implemented using a breast cancer diagnostic dataset. The study compares the performance of sequential embedde...
Bulletin of Electrical Engineering and Informatics
The objective of this study is to evaluate the effectiveness of different regression models in concrete compressive strength estimation. A concrete compressive strength dataset is employed for fitting the regression models. Regression models such as linear regressor, ridge regressor, k-neighbors regressor, decision tree regressor, random forest regressor, gradient boosting regressor, AdaBoost regressor, and support vector regressor are used for developing the model that predicts the concrete strength. Cross-validation techniques and grid search are used to tune the parameters for better model performance. The Python 3.8 programming language is used to conduct the experiment. The performance evaluation result reveals that the gradient boosting regressor performs better than the other models in terms of root mean square error (RMSE).
International Journal of Informatics and Communication Technology (IJ-ICT), 2021
In this study, the author proposes a k-nearest neighbor (KNN) based heart disease prediction model. The author conducted an experiment to evaluate the performance of the proposed model, and the results of this evaluation are analyzed. To conduct the study, the author obtained heart disease data from the Kaggle machine learning data repository. The dataset consists of 1025 observations, of which 499 (48.68%) are heart disease negative and 526 (51.32%) are heart disease positive. Finally, the performance of the KNN algorithm is analyzed on the test set. The performance analysis on the Kaggle heart disease data shows that the accuracy of the KNN is 91.99%.
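A minimal sketch of k-nearest neighbor classification is given below: a query is assigned the majority label among its k closest training points. The two-feature toy data and the choice k=3 are hypothetical and unrelated to the Kaggle heart disease data; the abstract does not state which k or distance metric the author used, so Euclidean distance is an assumption here.

```python
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among the k nearest
    training points (squared Euclidean distance, assumed for this sketch)."""

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    nearest = sorted(
        range(len(train_X)), key=lambda i: sq_dist(train_X[i], query)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: label 0 = negative, label 1 = positive.
train_X = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [6.0, 6.0], [7.0, 6.0], [6.0, 7.0]]
train_y = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict(train_X, train_y, [1.5, 1.5])
pred_b = knn_predict(train_X, train_y, [6.5, 6.5])
print(pred_a, pred_b)
```

Because the prediction depends directly on distances, features measured on very different scales would usually be rescaled before applying KNN.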
Bulletin of Electrical Engineering and Informatics, 2022
Explaining why a model outputs diabetes positive or negative is crucial for diabetes diagnosis, because reasoning about the predictive outcome helps to understand why the model assigned an instance to the diabetes positive or negative class. In recent years, high predictive accuracy and promising results have been achieved with models ranging from simple linear models to complex deep neural networks. However, complex models such as ensembles and deep learning involve a trade-off between accuracy and interpretability. In response to the problem of interpretability, different approaches have been proposed to explain the predictive outcome of complex models. However, the relationship between the proposed approaches, and which approach is preferable for diabetes prediction, is not clear. To address this problem, the authors implement and compare existing model interpretation approaches, namely local interpretable model-agnostic explanation (LIME), Shapley additive explanation (SHAP), and permutation feature importance, by employing extreme gradient boosting (XGBoost). The experiment is conducted on a diabetes dataset with the aim of identifying the features with the most influence on model output. Overall, the experimental results indicate that blood glucose has the highest impact on the model's prediction outcome.
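Of the three interpretation approaches named above, permutation feature importance is the simplest to sketch: shuffle one feature column at a time and record how much the model's accuracy drops. The example below is illustrative only. `threshold_model`, the glucose cut-off of 125, and the data values are hypothetical stand-ins, not the paper's XGBoost model or diabetes dataset.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [
                row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)
            ]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical model: predicts positive when glucose (column 0) exceeds 125
# and ignores age (column 1). Values are illustrative, not clinical guidance.
def threshold_model(row):
    return 1 if row[0] > 125 else 0


X = [[90, 30], [100, 45], [110, 50], [140, 35], [160, 60], [180, 40]]
y = [0, 0, 0, 1, 1, 1]

importances = permutation_importance(threshold_model, X, y)
print(importances)
```

A feature the model never uses gets an importance of exactly zero, while shuffling a feature the model relies on lowers accuracy. Unlike LIME and SHAP, this gives a global ranking of features rather than an explanation of a single prediction.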
International Journal of Electrical and Computer Engineering (IJECE), 2022
Heart disease is one of the most widespread and deadliest diseases across the world. In this study, we propose a hybrid model for heart disease prediction that employs random forest and support vector machine. With random forest, iterative feature elimination is carried out to select heart disease features that improve the predictive outcome of the support vector machine for heart disease prediction. The proposed model is evaluated on a test set, and the experimental results indicate that the hybrid model performs better than an individual random forest or support vector machine. Overall, we have developed a more accurate and computationally efficient model for heart disease prediction, with an accuracy of 98.3%. Moreover, an experiment is conducted to analyze the effect of the regularization parameter (C) and gamma on the performance of the support vector machine. The experimental results reveal that the support vector machine is very sensitive to C and gamma.
International journal of health sciences
Recent years have seen an upsurge in the use of machine learning (ML) algorithms for disease diagnosis and prediction. An ML model can be employed in the diagnosis of breast cancer. In this research, an effective breast cancer prediction model with a grid search approach is presented. Grid search is used with the random forest approach to find the best n-estimator value, which may provide the highest possible accuracy for predicting breast cancer. The accuracy of the proposed model is then compared with that of a standard random forest model (RFM). The experimental results show that the optimized model achieves 97.07 percent accuracy, whereas the regular random forest technique achieves 94.73 percent in breast cancer detection.
IAES International Journal of Artificial Intelligence (IJ-AI), 2021
In this study, a breast cancer prediction model is proposed with decision tree and adaptive boosting (AdaBoost). Furthermore, an extensive experimental evaluation of the predictive performance of the proposed model is conducted. The study is conducted on a breast cancer dataset collected from the Kaggle data repository. The dataset consists of 569 observations, of which 212 or 37.25% are benign or breast cancer negative and 62.74% are malignant or breast cancer positive. The class distribution shows that the dataset is highly imbalanced, and a learning algorithm such as decision tree is biased toward the benign observations and results in poor performance on predicting the malignant observations. To improve the performance of the decision tree on the malignant observations, a boosting algorithm, namely adaptive boosting, is employed. Finally, the predictive performance of the decision tree and adaptive boosting is analyzed. The analysis on predictive performance of the model on the kaggle b...
SN Computer Science, 2021
International Journal of Power Electronics and Drive Systems/International Journal of Electrical and Computer Engineering, Apr 1, 2024
IAES International Journal of Artificial Intelligence, Sep 1, 2022
In recent years, machine learning is attaining higher precision and accuracy in clinical heart di... more In recent years, machine learning is attaining higher precision and accuracy in clinical heart disease dataset classification. However, literature shows that the quality of heart disease feature used for the training model has a significant impact on the outcome of the predictive model. Thus, this study focuses on exploring the impact of the quality of heart disease features on the performance of the machine learning model on heart disease prediction by employing recursive feature elimination with cross-validation (RFECV). Furthermore, the study explores heart disease features with a significant effect on model output. The dataset for experimentation is obtained from the University of California Irvine (UCI) machine learning dataset. The experiment is implemented using a support vector machine (SVM), logistic regression (LR), decision tree (DT), and random forest (RF) are employed. The performance of the SVM, LR, DT, and RF models. The result appears to prove that the quality of the feature significantly affects the performance of the model. Overall, the experiment proves that RF outperforms as compared to other algorithms. In conclusion, the predictive accuracy of 99.7% is achieved with RF.
International Journal of Electrical and Computer Engineering (IJECE)
This study investigates the Shapley additive explanation (SHAP) of the extreme boosting (XGBoost)... more This study investigates the Shapley additive explanation (SHAP) of the extreme boosting (XGBoost) model for breast cancer diagnosis. The study employed Wisconsin’s breast cancer dataset, characterized by 30 features extracted from an image of a breast cell. SHAP module generated different explainer values representing the impact of a breast cancer feature on breast cancer diagnosis. The experiment computed SHAP values of 569 samples of the breast cancer dataset. The SHAP explanation indicates perimeter and concave points have the highest impact on breast cancer diagnosis. SHAP explains the XGB model diagnosis outcome showing the features affecting the XGBoost model. The developed XGB model achieves an accuracy of 98.42%.
Proceedings of engineering and technology innovation, 2023
This study aims to investigate the effectiveness of local interpretable model-agnostic explanatio... more This study aims to investigate the effectiveness of local interpretable model-agnostic explanation (LIME) and Shapley additive explanation (SHAP) approaches for chronic heart disease detection. The efficiency of LIME and SHAP are evaluated by analyzing the diagnostic results of the XGBoost model and the stability and quality of counterfactual explanations. Firstly, 1025 heart disease samples are collected from the University of California Irvine. Then, the performance of LIME and SHAP is compared by using the XGBoost model with various measures, such as consistency and proximity. Finally, Python 3.7 programming language with Jupyter Notebook integrated development environment is used for simulation. The simulation result shows that the XGBoost model achieves 99.79% accuracy, indicating that the counterfactual explanation of the XGBoost model describes the smallest changes in the feature values for changing the diagnosis outcome to the predefined output.
Mathematical Modelling of Engineering Problems
Chronic kidney disease is one of the leading causes of death around the world. Early detection of... more Chronic kidney disease is one of the leading causes of death around the world. Early detection of chronic kidney disease is crucial to the reduction of mortality caused as a result of the disease. Machine learning methods are recently becoming popular for the detection of chronic kidney disease. This study investigates the influence of resampling for chronic kidney disease detection using an imbalanced chronic kidney disease dataset. Choosing an optimal feature subset for medical datasets is important for improving the performance of data-driven predictive models. The influence of imbalanced class distribution on predictive models has become an increasingly important topic due to the recent advances in automatic decision-making processes and the continuous expansion in the volume of the data collected by medical institutions. To address the identified research gap, an experimental evaluation of synthetic minority oversampling and near miss undersampling technique was performed on a real-world chronic kidney disease dataset using several classification methods such as decision tree, random forest, K-nearest neighbor, adaptive boosting, and support vector machine. The results demonstrate that a number of variables, including performance metrics, classification algorithm, and dataset characteristics, influence the best class distribution. The study also offers useful information about resampling methods for an imbalanced classification problem which will help improve classification accuracy.
International Journal of Power Electronics and Drive Systems, Jun 1, 2023
Feature selection improves the classification performance of machine learning models. It also ide... more Feature selection improves the classification performance of machine learning models. It also identifies the important features and eliminates those with little significance. Furthermore, feature selection reduces the dimensionality of training and testing data points. This study proposes a feature selection method that uses a multivariate sample similarity measure. The method selects features with significant contributions using a machine-learning model. The multivariate sample similarity measure is evaluated using the University of California, Irvine heart disease dataset and compared with existing feature selection methods. The multivariate sample similarity measure is evaluated with metrics such as minimum subset selected, accuracy, F1-score, and area under the curve (AUC). The results show that the proposed method is able to diagnose chest pain, thallium scan, and major vessels scanned using X-rays with a high capability to distinguish between healthy and heart disease patients with a 99.6% accuracy.
Iraqi Journal of Science
Heart disease identification is one of the most challenging task that requires highly experienced... more Heart disease identification is one of the most challenging task that requires highly experienced cardiologists. However, in developing nations such as Ethiopia, there are a few cardiologists and heart disease detection is more challenging. As an alternative solution to cardiologist, this study proposed a more effective model for heart disease detection by employing random forest and sequential feature selection (SFS). SFS is an effective approach to improve the performance of random forest model on heart disease detection. SFS removes unrelated features in heart disease dataset that tends to mislead random forest model on heart disease detection. Thus, removing inappropriate and duplicate features from the training set with sequential feature selection approach plays significant role in improving the performance of the proposed model. The proposed feature selection approach is evaluated using real world clinical heart disease dataset collected from University of California Irvine (...
Bulletin of Electrical Engineering and Informatics
This article evaluates the performance of the support vector machine (SVM), decision tree (DT), a... more This article evaluates the performance of the support vector machine (SVM), decision tree (DT), and random forest (RF) on the dataset that contains the medical records of 299 patients with heart failure (HF) collected at the Faisalabad Institute of Cardiology and the Allied hospital in Pakistan. The dataset contains 13 descriptive features of physical, clinical, and lifestyle information. The study compared the performance of three classification algorithms employing pre-processing techniques such as min-max scaling, and principal component analysis (PCA). The simulation result shows that the performance of the DT, and RF decreased with dimensionality reduction while the SVM improved with dimensionality reduction. The SVM achieved 84.44%. Thus, feature scaling improves the performance of the SVM. The RF performs at 82.22%, the DT at 81.11%, and the SVM shows an improvement of 1.64% with scaled features, compared to the original dataset.
Proceedings of Engineering and Technology Innovation
This study aims to investigate the effectiveness of local interpretable model-agnostic explanatio... more This study aims to investigate the effectiveness of local interpretable model-agnostic explanation (LIME) and Shapley additive explanation (SHAP) approaches for chronic heart disease detection. The efficiency of LIME and SHAP are evaluated by analyzing the diagnostic results of the XGBoost model and the stability and quality of counterfactual explanations. Firstly, 1025 heart disease samples are collected from the University of California Irvine. Then, the performance of LIME and SHAP is compared by using the XGBoost model with various measures, such as consistency and proximity. Finally, Python 3.7 programming language with Jupyter Notebook integrated development environment is used for simulation. The simulation result shows that the XGBoost model achieves 99.79% accuracy, indicating that the counterfactual explanation of the XGBoost model describes the smallest changes in the feature values for changing the diagnosis outcome to the predefined output.
Indonesian Journal of Electrical Engineering and Computer Science
The existing heart failure risk prediction models are developed based on machine learning predict... more The existing heart failure risk prediction models are developed based on machine learning predictors. The objective of this study is to identify the key risk factors that affect the survival time of heart patients and to develop a heart failure survival prediction model using the identified risk factors. A cox proportional hazard regression method is applied to generate the proposed heart failure survival model. We used the dataset from the University of California Irvine (UCI) clinical heart failure data repository. To develop the model we have used multiple risk factors such as age, anemia, creatinine phosphokinase, diabetes history, ejection fraction, presence of high blood pressure, platelet count, serum creatinine, sex, and smoking history. Among the risk factors, high blood pressure is identified as one of the novel risk factors for heart failure. We have validated the performance of the model via statistical and empirical validation. The experimental result shows that the pro...
Indonesian Journal of Electrical Engineering and Computer Science, 2022
Breast cancer is the most common type of cancer, occurring mostly in females. In recent years, many researchers have worked to automate the diagnosis of breast cancer by developing different machine learning models. However, the quality and quantity of features in a breast cancer diagnostic dataset have a significant effect on the accuracy and efficiency of the predictive model. Feature selection is an effective method for reducing dimensionality and improving the accuracy of the predictive model. Feature selection is used to determine the features required for training the model and to remove irrelevant and duplicate features. A duplicate feature is a feature that is highly correlated with another feature. The objective of this study is to conduct experimental research on three different feature selection methods for breast cancer prediction. Sequential, embedded, and chi-square feature selection are implemented using a breast cancer diagnostic dataset. The study compares the performance of sequential embedde...
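The notion of a duplicate feature as one that is highly correlated with another can be sketched in a few lines. This is a minimal illustration, not the study's procedure: the function names, the toy columns, and the 0.95 threshold are all assumptions made here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def drop_duplicate_features(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept.

    `columns` maps feature name -> list of values. The 0.95 threshold is an
    illustrative assumption, not a value taken from the study.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: "perimeter" is an exact multiple of "radius".
toy = {
    "radius": [1.0, 2.0, 3.0, 4.0],
    "perimeter": [2.0, 4.0, 6.0, 8.0],
    "texture": [1.0, 0.0, 1.0, 0.0],
}
selected = drop_duplicate_features(toy)
```

The sketch assumes no column is constant, since a zero variance would make the correlation undefined.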
Bulletin of Electrical Engineering and Informatics
The objective of this study is to evaluate the effectiveness of different regression models in concrete compressive strength estimation. A concrete compressive strength dataset is employed to fit the regression models. Regression models such as linear regressor, ridge regressor, k-neighbors regressor, decision tree regressor, random forest regressor, gradient boosting regressor, AdaBoost regressor, and support vector regressor are used to develop the model that predicts the concrete strength. Cross-validation and grid search are used to tune the parameters for better model performance. The Python 3.8 programming language is used to conduct the experiment. The performance evaluation result reveals that the gradient boosting regressor performs better than the other models in terms of root mean square error (RMSE).
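As a reminder of how an RMSE-based comparison works, the following minimal sketch computes RMSE and picks the model with the lowest value. The model names and all numbers are hypothetical placeholders, not results from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def best_by_rmse(y_true, predictions):
    """Return the name of the model whose predictions have the lowest RMSE.

    `predictions` maps a model name to its list of predicted values.
    """
    return min(predictions, key=lambda name: rmse(y_true, predictions[name]))

# Hypothetical compressive strengths and predictions from two unnamed models.
y_true = [30.0, 45.0, 25.0, 60.0]
predictions = {
    "model_a": [35.0, 40.0, 30.0, 50.0],
    "model_b": [31.0, 44.0, 27.0, 58.0],
}
winner = best_by_rmse(y_true, predictions)
```

Lower RMSE is better, so `min` rather than `max` selects the preferred model.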
International Journal of Informatics and Communication Technology (IJ-ICT), 2021
In this study, the author proposes a k-nearest neighbor (KNN) based heart disease prediction model. The author conducts an experiment to evaluate the performance of the proposed model and analyzes the result of the experimental evaluation of its predictive performance. To conduct the study, the author obtained heart disease data from the Kaggle machine learning data repository. The dataset consists of 1025 observations, of which 499 (48.68%) are heart disease negative and 526 (51.32%) are heart disease positive. Finally, the performance of the KNN algorithm is analyzed on the test set. The performance analysis on the Kaggle heart disease data shows that the accuracy of the KNN is 91.99%.
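The KNN idea, together with the class shares quoted above, can be sketched as follows. This is a generic majority-vote KNN with Euclidean distance, written as an assumption about the general technique; the abstract does not state the distance metric or the value of k, and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature toy data: label 0 near (1, 1), label 1 near (4, 4).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = [0, 0, 0, 1, 1, 1]

# Class shares reported in the abstract: 499 negative and 526 positive of 1025.
negative_share = round(100 * 499 / 1025, 2)
positive_share = round(100 * 526 / 1025, 2)
```

`math.dist` requires Python 3.8 or later.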
Bulletin of Electrical Engineering and Informatics, 2022
Explaining why a model outputs diabetes positive or negative is crucial for diabetes diagnosis, because reasoning about the predictive outcome of the model helps to understand why it assigned an instance to the diabetes positive or negative class. In recent years, high predictive accuracy and promising results have been achieved with models ranging from simple linear models to complex deep neural networks. However, complex models such as ensembles and deep learning trade interpretability for accuracy. In response to the problem of interpretability, different approaches have been proposed to explain the predictive outcome of complex models. However, the relationship between the proposed approaches, and which approach is preferable for diabetes prediction, is not clear. To address this problem, the authors implement and compare existing model interpretation approaches, namely local interpretable model-agnostic explanation (LIME), Shapley additive explanation (SHAP), and permutation feature importance, using extreme gradient boosting (XGBoost). The experiment is conducted on a diabetes dataset with the aim of identifying the features that most influence the model output. Overall, the experimental result indicates that blood glucose has the highest impact on the model's prediction outcome.
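Of the three interpretation approaches named above, permutation feature importance is the simplest to sketch: shuffle one feature column and measure how much accuracy drops. The code below is a minimal illustration under stated assumptions. The threshold "model", the two-column toy rows (glucose, age), and the repeat count are all hypothetical and stand in for the XGBoost model and diabetes dataset used in the study.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, column, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / repeats

# Hypothetical stand-in model: predicts positive when column 0 reaches an
# arbitrary cutoff, and ignores column 1 entirely.
def toy_model(row):
    return int(row[0] >= 100)

# Hypothetical rows of (glucose, age) with labels consistent with the toy model.
rows = [(80, 30), (90, 45), (95, 52), (85, 60), (150, 33), (160, 41), (170, 58), (140, 49)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

glucose_importance = permutation_importance(toy_model, rows, labels, column=0)
age_importance = permutation_importance(toy_model, rows, labels, column=1)
```

A feature the model never reads keeps its accuracy under shuffling, so its importance is zero, while shuffling a feature the model depends on lowers accuracy.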
International Journal of Electrical and Computer Engineering (IJECE), 2022
Heart disease is one of the most widespread and deadliest diseases in the world. In this study, we propose a hybrid model for heart disease prediction that combines random forest and support vector machine. With random forest, iterative feature elimination is carried out to select the heart disease features that improve the predictive outcome of the support vector machine. The proposed model is evaluated on a test set, and the experimental result indicates that the hybrid model performs better than an individual random forest or support vector machine. Overall, we have developed a more accurate and computationally efficient model for heart disease prediction, with an accuracy of 98.3%. Moreover, an experiment is conducted to analyze the effect of the regularization parameter (C) and gamma on the performance of the support vector machine. The experimental result reveals that the support vector machine is very sensitive to C and gamma.
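One way to see why gamma matters is to look at the radial basis function (RBF) kernel, K(x, y) = exp(-gamma * ||x - y||^2). The abstract does not say which kernel was used, so the RBF kernel here is an assumption, and the points and gamma values are arbitrary illustrations.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Two arbitrary points at squared distance 2.
x, y = (0.0, 0.0), (1.0, 1.0)

# Kernel similarity of the same pair of points under increasing gamma.
similarities = {gamma: rbf_kernel(x, y, gamma) for gamma in (0.01, 0.1, 1.0, 10.0)}
```

As gamma grows, the similarity between distinct points decays toward zero, so each training point influences a smaller neighborhood and the decision boundary can become much more irregular.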
International Journal of Health Sciences
Recent years have seen an upsurge in the acceptance of illness diagnosis and prediction using machine learning (ML) algorithms. An ML model can be employed in the diagnosis of breast cancer. In this research, an effective breast cancer prediction model with a grid search approach is presented. With the random forest approach, grid search is used to find the best n-estimator value, the one that gives the highest accuracy for predicting breast cancer. The accuracy of the proposed model is then compared with that of a standard random forest model (RFM). The experimental result analysis shows that the optimized model has 97.07 percent accuracy, whereas the regular random forest technique has an accuracy of 94.73 percent in breast cancer detection.
IAES International Journal of Artificial Intelligence (IJ-AI), 2021
In this study, a breast cancer prediction model is proposed with decision tree and adaptive boosting (AdaBoost). Furthermore, an extensive experimental evaluation of the predictive performance of the proposed model is conducted. The study is conducted on a breast cancer dataset collected from the Kaggle data repository. The dataset consists of 569 observations, of which 212 or 37.25% are benign or breast cancer negative and 62.74% are malignant or breast cancer positive. The class distribution shows that the dataset is highly imbalanced, so a learning algorithm such as decision tree is biased toward the benign observations and performs poorly in predicting the malignant observations. To improve the performance of the decision tree on the malignant observations, a boosting algorithm, namely adaptive boosting, is employed. Finally, the predictive performance of the decision tree and adaptive boosting is analyzed. The analysis on predictive performance of the model on the Kaggle b...
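The way adaptive boosting shifts attention to hard observations can be sketched with a single reweighting round of classic binary AdaBoost. The abstract does not state which AdaBoost variant was used, so this is an assumption about the general technique; the function name and the four-sample example are hypothetical.

```python
import math

def adaboost_round(weights, is_misclassified):
    """One round of classic binary AdaBoost reweighting.

    `weights` must sum to 1, and the weighted error must lie strictly
    between 0 and 1. Returns the learner weight alpha and the new
    normalized sample weights.
    """
    err = sum(w for w, m in zip(weights, is_misclassified) if m)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples are scaled up, correctly classified ones down.
    updated = [
        w * math.exp(alpha if m else -alpha)
        for w, m in zip(weights, is_misclassified)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Hypothetical example: four equally weighted samples, the last one misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, [False, False, False, True])
```

After the update, the misclassified samples together carry half of the total weight, which forces the next weak learner to pay more attention to them.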
SN Computer Science, 2021