An XGBoost Model for Age Prediction from COVID-19 Blood Test

Machine Learning Approach to Predicting COVID-19 Disease Severity Based on Clinical Blood Test Data: Statistical Analysis and Model Development

JMIR Medical Informatics, 2021

Background: Accurate prediction of the disease severity of patients with COVID-19 would greatly improve care delivery and resource allocation and thereby reduce mortality risks, especially in less developed countries. Many patient-related factors, such as pre-existing comorbidities, affect disease severity and can be used to aid this prediction. Objective: Because rapid automated profiling of peripheral blood samples is widely available, we aimed to investigate how data from the peripheral blood of patients with COVID-19 can be used to predict clinical outcomes. Methods: We investigated clinical data sets of patients with COVID-19 with known outcomes by combining statistical comparison and correlation methods with machine learning algorithms; the latter included decision tree, random forest, variants of gradient boosting machine, support vector machine, k-nearest neighbor, and deep learning methods. Results: Our work revealed that several clinical parameters that are measurable in blood...

Age-Stratified Analysis of COVID-19 Outcome Using Machine Learning Predictive Models

Healthcare

Since the emergence of COVID-19, most health systems around the world have experienced a series of spikes in the number of infected patients, leading to the collapse of health systems in many countries. Clinical laboratory tests can help discriminate between levels of disease severity, identifying the profile of patients at higher risk of mortality. In this paper, we study the results of applying predictive models to data on COVID-19 outcomes, using three datasets obtained by stratifying patients by age. The extreme gradient boosting (XGBoost) algorithm was employed as the predictive method and performed very well: the area under the receiver operating characteristic curve (AUROC) was 0.97 for the subgroup of patients up to 65 years of age. In addition, SHAP (Shapley additive explanations) was used to analyze feature importance in the resulting models.
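Age-stratified evaluation means scoring a model separately within each age subgroup rather than on the pooled cohort. The minimal sketch below assumes hypothetical `(age, outcome, predicted_risk)` records and uses the 65-year cutoff from the abstract. It computes AUROC in its rank-based (Mann-Whitney) form, without any particular library, and it does not train an XGBoost model or compute SHAP values. The helper names `auroc` and `stratified_auroc` are illustrative.

```python
from typing import Dict, List, Tuple

def auroc(labels: List[int], scores: List[float]) -> float:
    """Rank-based AUROC: share of (positive, negative) pairs where the positive scores higher (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each stratum needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_auroc(records: List[Tuple[int, int, float]], cutoff: int = 65) -> Dict[str, float]:
    """Split (age, outcome, predicted_risk) records at an age cutoff and score each stratum separately."""
    strata: Dict[str, List[Tuple[int, float]]] = {f"<={cutoff}": [], f">{cutoff}": []}
    for age, outcome, risk in records:
        key = f"<={cutoff}" if age <= cutoff else f">{cutoff}"
        strata[key].append((outcome, risk))
    return {key: auroc([y for y, _ in rows], [s for _, s in rows]) for key, rows in strata.items()}

# Toy (age, outcome, predicted risk) records; hypothetical, not data from the study.
records = [(40, 0, 0.1), (50, 1, 0.9), (60, 0, 0.3), (62, 1, 0.7),
           (70, 0, 0.6), (75, 1, 0.5), (80, 1, 0.8), (85, 0, 0.2)]
print(stratified_auroc(records))  # {'<=65': 1.0, '>65': 0.75}
```

The same predictions can look near-perfect in one age band and much weaker in another, which a single pooled AUROC would hide.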

A Machine Learning Model Reveals Older Age and Delayed Hospitalization as Predictors of Mortality in Patients with COVID-19

2020

Abstract. Objective: The recent pandemic of novel coronavirus disease 2019 (COVID-19) is increasingly causing severe acute respiratory syndrome (SARS) and significant mortality. We aim here to identify the risk factors associated with mortality of coronavirus-infected persons using a supervised machine learning approach. Research Design and Methods: Clinical data of 1085 cases of COVID-19 from 13 January to 28 February 2020 were obtained from Kaggle, an online community of data scientists. Of these, 430 cases were selected for the final analysis. A random forest classification algorithm was applied to the dataset to identify the important predictors and their effects on mortality. Results: The area under the ROC curve obtained during model validation on the test dataset was 0.97. Age was the most important variable in predicting mortality, followed by the time gap between symptom onset and hospitalization. Conclusions: Patients older than 62 years are at higher risk of fatality whereas hospitalizati...

Early prediction of in-hospital death of COVID-19 patients: a machine-learning model based on age, blood analyses, and chest x-ray score

eLife, 2021

An early-warning model to predict in-hospital mortality on admission of COVID-19 patients at an emergency department (ED) was developed and validated using machine learning. In total, 2782 patients were enrolled between March 2020 and December 2020, including 2106 patients from the first wave and 676 patients from the second wave of the COVID-19 outbreak in Italy. The first-wave patients were divided into two groups: 1474 patients were used to train the model and 632 to validate it. The 676 second-wave patients were used to test the model. Age, 17 blood analytes, and the Brescia chest X-ray score were the variables processed using a random forests classification algorithm to build and validate the model. Receiver operating characteristic (ROC) analysis was used to assess model performance. A web-based death-risk calculator was implemented and integrated within the Laboratory Information System of the hospital. The final score was constructed by age (the most powerful predictor), b...
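The design above is a temporal split. The model is fit and tuned on the first wave, and the untouched second wave serves as the test set, so performance is measured on patients from a later period. The abstract does not say how the first wave was divided. This sketch assumes a seeded random shuffle with a 70% training fraction, which reproduces the reported 1474/632 counts. The patient dictionaries with a `wave` field and the `wave_split` helper are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Patient = Dict[str, int]

def wave_split(patients: List[Patient], train_frac: float = 0.7,
               seed: int = 0) -> Tuple[List[Patient], List[Patient], List[Patient]]:
    """Temporal split: first-wave patients -> train/validation, second-wave patients -> test."""
    first_wave = [p for p in patients if p["wave"] == 1]
    second_wave = [p for p in patients if p["wave"] == 2]
    # Assumption: first-wave patients are shuffled randomly before splitting (not stated in the abstract).
    rng = random.Random(seed)
    shuffled = first_wave[:]
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:], second_wave

# Hypothetical cohort with the same counts as the abstract: 2106 first-wave, 676 second-wave patients.
cohort = ([{"id": i, "wave": 1} for i in range(2106)]
          + [{"id": 2106 + i, "wave": 2} for i in range(676)])
train, val, test = wave_split(cohort)
print(len(train), len(val), len(test))  # 1474 632 676
```

Keeping the second wave out of all training and tuning decisions is what makes the test score an estimate of performance on future patients rather than on a random holdout from the same period.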

Machine Learning-Based COVID-19 Diagnosis by Demographic Characteristics and Clinical Data

Advances in Respiratory Medicine

Introduction: Effective screening that enables rapid diagnosis of COVID-19 can alleviate the challenges facing healthcare systems. We aimed to develop a machine learning-based prediction of COVID-19 diagnosis and to design a graphical user interface (GUI) that diagnoses COVID-19 cases from their symptoms and demographic features. Methods: We implemented several classification models, including support vector machine (SVM), decision tree (DT), Naïve Bayes (NB), and k-nearest neighbor (KNN), to predict individuals' COVID-19 test results. We trained these models on data from 16,973 individuals (90% of all individuals included in the data collection) and tested them on 1885 individuals (the remaining 10%). Maximum relevance minimum redundancy (MRMR) algorithms were used to score features for predicting the COVID-19 test result. A user-friendly GUI was designed to predict COVID-19 test results for individuals. Results: The study results revealed that coughing had the highest...
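MRMR ranks features greedily. At each step it picks the feature whose relevance to the target (mutual information) minus its average redundancy with already-selected features (mutual information between features) is largest. The sketch below implements this common "difference" variant for discrete features. It is an assumption that this matches the variant used in the study. The feature names (`cough`, `cough_copy`, `fever`) and values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

def mutual_info(x: Sequence, y: Sequence) -> float:
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mrmr(features: Dict[str, Sequence], target: Sequence, k: int) -> List[str]:
    """Greedy MRMR (difference form): maximize relevance minus mean redundancy with selected features."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            relevance = mutual_info(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(mutual_info(features[name], features[s]) for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)  # ties resolve to the earlier feature
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical binary data: cough_copy duplicates cough; fever flags the one case cough misses.
target = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "cough":      [1, 1, 1, 0, 0, 0, 0, 0],
    "cough_copy": [1, 1, 1, 0, 0, 0, 0, 0],
    "fever":      [0, 0, 0, 1, 0, 0, 0, 0],
}
print(mrmr(features, target, k=2))  # ['cough', 'fever']
```

Ranking by relevance alone would pick `cough` and its duplicate `cough_copy`. The redundancy term pushes the duplicate down and selects the less relevant but complementary `fever` instead.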

COVID-19 diagnosis by routine blood tests using machine learning

Scientific Reports, 2021

Physicians taking care of patients with COVID-19 have described various changes in routine blood parameters. However, these changes alone are not sufficient for diagnosing COVID-19. We constructed a machine learning model for COVID-19 diagnosis that was based on, and cross-validated with, the routine blood tests of 5333 patients with various bacterial and viral infections and 160 COVID-19-positive patients. We selected the operational ROC point at a sensitivity of 81.9% and a specificity of 97.9%. The cross-validated AUC was 0.97. According to the feature importance scoring of the XGBoost algorithm, the five most useful routine blood parameters for COVID-19 diagnosis were MCHC, eosinophil count, albumin, INR, and prothrombin activity percentage. t-SNE visualization showed that the blood parameters of patients with a severe COVID-19 course are more similar to those of a bacterial than a viral infection. The reported diagnostic accuracy is at least comparable and probably complementa...
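The abstract reports a single operating point on the ROC curve (sensitivity 81.9%, specificity 97.9%) but not the rule used to choose it. One common rule, assumed here, is to require a minimum specificity and then take the threshold with the highest sensitivity. In this sketch, `operating_point`, `min_specificity`, and the toy scores are hypothetical, and a case is called positive when its score is at or above the threshold.

```python
from typing import List, Optional, Tuple

def operating_point(labels: List[int], scores: List[float],
                    min_specificity: float) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity) maximizing sensitivity subject to a specificity floor."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        # Keep the threshold only if it meets the floor and improves sensitivity.
        if spec >= min_specificity and (best is None or sens > best[1]):
            best = (t, sens, spec)
    return best  # None if no threshold reaches the specificity floor

# Hypothetical labels (1 = COVID-19 positive) and model scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]
print(operating_point(labels, scores, min_specificity=0.8))  # (0.6, 0.75, 0.8333333333333334)
```

Favoring specificity in this way keeps false positives rare, at the cost of missing some positive cases.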

COVID-19 Mortality Risk Prediction using Clinical and Laboratory Examination: Machine Learning Approach for Implementation

Background and Aim: We aimed to propose a mortality risk prediction tool to help frontline physicians manage and allocate COVID-19 patients on the day of admission. Methods: We used a dataset of confirmed COVID-19 patients admitted to three general hospitals in Tehran. Clinical and laboratory values on admission were gathered. Different machine learning methods were used to assess the risk of in-hospital mortality, including logistic regression, k-nearest neighbor (KNN), gradient boosting classifier, random forest, support vector machine, and deep neural network (DNN). Least absolute shrinkage and selection operator (LASSO) regression and Boruta feature selection methods were used for feature selection. The proposed model was selected using the area under the receiver operating characteristic curve (AUC). Furthermore, a dataset from a fourth hospital was used for external validation. Results: 5320 hospitalized COVID-19 patients were enrolled in the study with a mean age of 61.6 ± 17.6 y...

Prediction of COVID-19 Diagnosis based on Symptoms

The COVID-19 pandemic has had a significant global impact, creating a need for accurate prediction models. Such models can inform public health policies, guide resource allocation decisions, and support decision-making by individuals and society as a whole. Several prediction models that combine multiple features to estimate infection risk have been developed from various data sources, such as case counts, testing rates, and demographic information. These models aim to assist healthcare professionals in triaging patients, especially in resource-limited settings. Our model accurately predicted COVID-19 test results using features such as sex, age ≥60 years, known contact with an infected person, and the presence of initial clinical symptoms. Although no prediction model is perfect, such models can provide valuable insights and contribute to ongoing efforts to mitigate the impact of COVID-19.