Biomarker selection for medical diagnosis using the partial area under the ROC curve

Two simple algorithms on linear combination of multiple biomarkers to maximize partial area under the ROC curve

Computational Statistics & Data Analysis, 2015

In clinical practice, it is common for several biomarkers to be related to a specific disease while no single marker has enough diagnostic power on its own. An effective way to improve diagnostic accuracy is to combine multiple markers. The area under the receiver operating characteristic curve (AUC) is widely used to evaluate a diagnostic tool. Su and Liu (1993) derived the best linear combination that maximizes AUC when the markers are multivariate normally distributed. However, many applications do not operate over the entire range of the curve, but only in particular regions of it, for example, high-specificity regions. In these cases, it is more practical to analyze the partial area under the curve (pAUC). In this paper, we propose two easily implemented algorithms to find the best linear combination of multiple biomarkers that optimizes the pAUC for a given range of specificity. Analysis of synthesized and real datasets shows that the proposed algorithms achieve larger predictive pAUC values on future observations than existing methods, such as Su and Liu's method, logistic regression and others.
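For orientation, the Su and Liu (1993) result cited in this abstract, together with the usual definition of the pAUC, can be written as follows. The symbols are introduced here for illustration only and are not taken from the paper: \boldsymbol{\mu}_D, \Sigma_D and \boldsymbol{\mu}_H, \Sigma_H denote the mean vectors and covariance matrices of the markers in the diseased and non-diseased populations, \Phi is the standard normal distribution function, and t_0 is the largest false positive rate of interest (so specificity is at least 1 - t_0).

```latex
% Su and Liu (1993): AUC-optimal linear combination under multivariate normality
\[
\mathbf{a}^{*} \propto (\Sigma_D + \Sigma_H)^{-1}(\boldsymbol{\mu}_D - \boldsymbol{\mu}_H),
\qquad
\mathrm{AUC}(\mathbf{a}^{*}) =
\Phi\!\left(\sqrt{(\boldsymbol{\mu}_D - \boldsymbol{\mu}_H)^{\top}
(\Sigma_D + \Sigma_H)^{-1}(\boldsymbol{\mu}_D - \boldsymbol{\mu}_H)}\right)
\]

% Partial AUC over the false positive rates in [0, t_0]
\[
\mathrm{pAUC}(t_0) = \int_{0}^{t_0} \mathrm{ROC}(t)\,dt, \qquad 0 < t_0 \le 1
\]
```

The combination that maximizes the AUC need not maximize the pAUC for a small t_0, which is the gap the abstract addresses.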

Combining multiple biomarkers linearly to maximize the partial area under the ROC curve

Statistics in Medicine, 2017

It is now common in clinical practice to make clinical decisions based on combinations of multiple biomarkers. In this paper, we propose new approaches for combining multiple biomarkers linearly to maximize the partial area under the receiver operating characteristic curve (pAUC). The parametric and nonparametric methods that have been developed for this purpose have limitations. When the biomarker values for populations with and without a given disease follow a multivariate normal distribution, it is easy to implement our proposed parametric approach, which adopts an alternative analytic expression of the pAUC. When normality assumptions are violated, a kernel-based approach is presented, which handles multiple biomarkers simultaneously. We evaluated the proposed as well as existing methods through simulations and discovered that when the covariance matrices for the disease and nondisease samples are disproportional, traditional methods (such as the logistic regression) are more li...
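As a rough illustration of a parametric pAUC under normality, the sketch below evaluates the standard binormal ROC curve, ROC(t) = Phi(a + b * Phi^{-1}(t)), for a single (already combined) score and integrates it numerically. It is not the alternative analytic expression the abstract refers to, which is not given here. The function names and the midpoint-rule integration are assumptions made for this example.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def binormal_parameters(mean_d, sd_d, mean_h, sd_h):
    """Return (a, b) for scores that are normal in diseased (d) and healthy (h) groups."""
    return (mean_d - mean_h) / sd_d, sd_h / sd_d


def binormal_roc(fpr, a, b):
    """Binormal ROC curve: sensitivity at a false positive rate in (0, 1)."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpr))


def binormal_pauc(max_fpr, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of ROC(t) over [0, max_fpr]."""
    width = max_fpr / steps
    total = 0.0
    for k in range(steps):
        midpoint = (k + 0.5) * width  # strictly inside (0, 1), so inv_cdf is defined
        total += binormal_roc(midpoint, a, b)
    return total * width


if __name__ == "__main__":
    a, b = binormal_parameters(mean_d=1.0, sd_d=1.0, mean_h=0.0, sd_h=1.0)
    print(binormal_pauc(0.2, a, b))  # pAUC for specificity of at least 0.8
```

For a linear combination w of several markers, the same function applies with the means replaced by w'mu and the variances by w'Sigma w in each group.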

Marker selection via maximizing the partial area under the ROC curve of linear risk scores

Biostatistics, 2011

Rather than viewing receiver operating characteristic (ROC) curves directly to compare the performances of diagnostic methods, the whole and the partial areas under the ROC curve (area under the ROC curve [AUC] and partial area under the ROC curve [pAUC]) are 2 of the most popularly used summaries of the curve. Moreover, when high specificity is a prerequisite, as in some medical diagnostics, pAUC is preferable. In this paper, we propose a wrapper-type algorithm to select the best linear combination of markers that has high sensitivity within a confined specificity range. The markers selected by the proposed algorithm are different from those selected by AUC-based algorithms and therefore provide different information for further studies. Most notably, for example, within the given range of specificity, the markers selected by the proposed algorithm always have higher individual sensitivities than those selected by other AUC-based methods. This characteristic makes the proposed method a good addition to existing methods. Without assuming the underlying distributions of markers, we prove that the pAUC obtained with the proposed algorithm is a strongly consistent estimate of the true pAUC and then illustrate its performance with numerical studies using synthesized data and 2 real examples. The results are compared with those obtained by its AUC-based counterpart. We found that the classification performance of the final classifier based on the selected markers is very competitive.
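A distribution-free pAUC of a linear risk score can be estimated directly from the empirical ROC curve. The sketch below is a minimal illustration, not the wrapper algorithm of the paper. The names `linear_scores` and `empirical_pauc` are hypothetical, the ROC curve is treated as a step function, and ties among healthy scores are assumed absent.

```python
def linear_scores(rows, weights):
    """Linear risk score sum_j w_j * x_ij for each subject (one row per subject)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def empirical_pauc(diseased, healthy, max_fpr):
    """Area under the empirical ROC step function for false positive rates in [0, max_fpr].

    Assumes higher scores indicate disease and that healthy scores have no ties.
    """
    n_h = len(healthy)
    n_d = len(diseased)
    area = 0.0
    # Using the (k+1)-th largest healthy score as cutoff gives a false positive rate of k / n_h.
    for k, cutoff in enumerate(sorted(healthy, reverse=True)):
        left = k / n_h
        right = min((k + 1) / n_h, max_fpr)
        if right <= left:
            break  # the rest of the curve lies outside the specificity range
        sensitivity = sum(1 for d in diseased if d > cutoff) / n_d
        area += (right - left) * sensitivity
    return area


if __name__ == "__main__":
    diseased_rows = [[1.0, 0.0], [3.0, 0.0]]
    healthy_rows = [[0.0, 0.0], [2.0, 0.0]]
    weights = [1.0, 0.0]  # illustrative weights, not estimated from data
    d = linear_scores(diseased_rows, weights)
    h = linear_scores(healthy_rows, weights)
    print(empirical_pauc(d, h, max_fpr=0.5))
```

With `max_fpr=1.0` the function reduces to the usual empirical AUC (the Mann-Whitney statistic when there are no ties).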

ROC curve inference for best linear combination of two biomarkers subject to limits of detection

Biometrical Journal, 2011

The receiver operating characteristic (ROC) curve is a tool commonly used to evaluate biomarker utility in clinical diagnosis of disease. Often, multiple biomarkers are developed to evaluate the discrimination for the same outcome. Levels of multiple biomarkers can be combined via best linear combination (BLC) such that their overall discriminatory ability is greater than any of them individually. Biomarker measurements frequently have undetectable levels below a detection limit sometimes denoted as limit of detection (LOD). Ignoring observations below the LOD or substituting some replacement value as a method of correction has been shown to lead to negatively biased estimates of the area under the ROC curve for some distributions of single biomarkers. In this paper, we develop asymptotically unbiased estimators, via the maximum likelihood technique, of the area under the ROC curve of BLC of two bivariate normally distributed biomarkers affected by LODs. We also propose confidence intervals for this area under curve. Point and confidence interval estimates are scrutinized by simulation study, recording bias and root mean square error and coverage probability, respectively. An example using polychlorinated biphenyl (PCB) levels to classify women with and without endometriosis illustrates the potential benefits of our methods.

The linear combinations of biomarkers which maximize the partial area under the ROC curves

Computational Statistics, 2013

As biotechnology has made remarkable progress, data collection has also improved greatly, with lower cost and higher-quality outcomes. More often than not, investigators can obtain measurements of many disease-related features simultaneously. When multiple potential biomarkers are available for constructing a diagnostic tool for a disease, an effective approach is to combine these biomarkers into a single indicator. For continuous-scaled variables, linear combinations are popular because they are easy to interpret. Su and Liu (J Am Stat Assoc 88(424):1350-1355, 1993) derived the best linear combination under the criterion of the area under the receiver operating characteristic (ROC) curve, when joint normality of the biomarkers is assumed. However, in many investigations, the emphasis is placed only on a limited, clinically relevant region rather than on the whole ROC curve. The goal of this study is to find the linear combination that maximizes the partial area under an ROC curve (pAUC) for a pre-specified range. In order to find an analytic solution, the first derivative of the pAUC under the normal assumption is derived. The explicit form is so complicated that further validation of the Hessian matrix is difficult. Moreover, we find that the pAUC maximizer may not be unique and that local maximizers do exist in some cases. Consequently, the existing algorithms find a solution that depends on the initial point and are inadequate for our needs. Hence, we propose a new algorithm that adopts several initial points at one time. Intensive numerical studies have been performed to show the adequacy of the proposed algorithm. Real examples are also provided for illustration.
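The idea of starting a search from several initial points, so that a local maximizer is less likely to be mistaken for the global one, can be sketched as below. This is not the authors' algorithm. It assumes two markers, an empirical (distribution-free) pAUC instead of the normal-theory one, weights parameterized by an angle, and a simple step-halving hill climb. All function names are hypothetical.

```python
import math


def empirical_pauc(diseased, healthy, max_fpr):
    """Empirical pAUC for false positive rates in [0, max_fpr]; assumes no ties."""
    n_h, n_d = len(healthy), len(diseased)
    area = 0.0
    for k, cutoff in enumerate(sorted(healthy, reverse=True)):
        left, right = k / n_h, min((k + 1) / n_h, max_fpr)
        if right <= left:
            break
        area += (right - left) * sum(1 for d in diseased if d > cutoff) / n_d
    return area


def pauc_at_angle(theta, diseased_rows, healthy_rows, max_fpr):
    """pAUC of the score cos(theta) * x1 + sin(theta) * x2."""
    w1, w2 = math.cos(theta), math.sin(theta)
    diseased = [w1 * x1 + w2 * x2 for x1, x2 in diseased_rows]
    healthy = [w1 * x1 + w2 * x2 for x1, x2 in healthy_rows]
    return empirical_pauc(diseased, healthy, max_fpr)


def local_search(theta, objective, step=0.5, min_step=1e-3):
    """Hill climb from one initial angle, halving the step when no neighbour improves."""
    best = objective(theta)
    while step > min_step:
        improved = False
        for candidate in (theta - step, theta + step):
            value = objective(candidate)
            if value > best:
                theta, best, improved = candidate, value, True
                break
        if not improved:
            step /= 2
    return theta, best


def multi_start_search(diseased_rows, healthy_rows, max_fpr, n_starts=8):
    """Run the local search from evenly spaced initial angles and keep the best result."""
    def objective(theta):
        return pauc_at_angle(theta, diseased_rows, healthy_rows, max_fpr)

    starts = [2 * math.pi * k / n_starts for k in range(n_starts)]
    return max((local_search(s, objective) for s in starts), key=lambda r: r[1])


if __name__ == "__main__":
    diseased_rows = [(2.0, 0.1), (1.5, -0.3), (2.5, 0.4), (1.8, 0.0)]
    healthy_rows = [(0.0, 0.2), (0.5, -0.1), (-0.3, 0.3), (0.2, 0.0)]
    theta, value = multi_start_search(diseased_rows, healthy_rows, max_fpr=0.5)
    print(theta, value)
```

Because each local search can only improve on its own starting value, the returned pAUC is at least as large as the pAUC at every initial angle. With more than two markers the angle would be replaced by a point on the unit sphere.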

Combining binary and continuous biomarkers by maximizing the area under the receiver operating characteristic curve

Communications in Statistics - Simulation and Computation, 2020

In any clinical case, a decision should be made with the maximum possible accuracy. When multiple diagnostic tests or biomarkers are available, biomarker combinations aim to achieve that accuracy. Because existing biomarker combination methods combine only continuous biomarkers, this study proposes an approach that uses Youden's J statistic to combine binary biomarkers. The proposed approach facilitates combinations of binary and continuous biomarkers. A simulation study was conducted to compare the performance of the proposed combination approach across different sample sizes. In both the analysis of real data and the simulation studies, the proposed approach yielded favorable results and a higher area under the curve.
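Youden's J statistic is defined as sensitivity + specificity - 1. The sketch below computes it for a single binary marker. It does not reproduce the combination approach of the paper, which the abstract does not spell out. The function name and the 0/1 coding of marker and disease status are assumptions for this example.

```python
def youden_j(marker, disease):
    """Youden's J = sensitivity + specificity - 1 for a binary marker.

    Both arguments are equal-length sequences of 0/1 values: marker result
    (1 = positive) and true disease status (1 = diseased). Each group must be non-empty.
    """
    pairs = list(zip(marker, disease))
    true_pos = sum(1 for m, d in pairs if m == 1 and d == 1)
    false_neg = sum(1 for m, d in pairs if m == 0 and d == 1)
    true_neg = sum(1 for m, d in pairs if m == 0 and d == 0)
    false_pos = sum(1 for m, d in pairs if m == 1 and d == 0)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity + specificity - 1


if __name__ == "__main__":
    marker = [1, 1, 0, 0, 1, 0]
    disease = [1, 1, 1, 0, 0, 0]
    print(youden_j(marker, disease))
```

J ranges from -1 to 1, with 0 corresponding to a marker that carries no diagnostic information and 1 to a perfect marker.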

Receiver Operator Characteristic Analysis of Biomarkers Evaluation in Diagnostic Research

Journal of Clinical and Diagnostic Research, 2018

Receiver Operator Characteristic (ROC) analysis is the method of choice for evaluating biomarkers in bioinformatics research. However, there is no single method and no single accuracy index for evaluating diagnostic tools. This review provides an extensive illustration of different methods of ROC curve analysis that can be used in clinical diagnostic studies. It covers their early use for rating data and the recent developments for quantitative data, with a discussion of model selection in parametric ROC analysis compared with the non-parametric approach. The relevant methodological issues of these two alternative approaches are discussed in terms of the bias and sampling variability of the area under the curve (AUC) index, which may influence the assessed performance of diagnostic tests. The methods are illustrated with two relevant clinical examples. The semi-parametric and parametric Gaussian mixture models are comparable with the purely nonparametric approach. The choice between methods depends on practical convenience unless there is a severe departure from binormality. Recent developments, the gaps in knowledge concerning their behaviour in actual applications in medical research, and a guideline for future research are also discussed.

Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation

Caspian Journal of Internal Medicine, 2013

This review provides the basic principles and rationale for ROC analysis of rating and continuous diagnostic test results versus a gold standard. Derived indexes of accuracy, in particular the area under the curve (AUC), have a meaningful interpretation for discriminating diseased from healthy subjects. The methods for estimating the AUC and testing it in single diagnostic tests and in comparative studies, the advantage of the ROC curve for determining optimal cut-off values, and the issues of bias and confounding are discussed.

Generalized ROC curve inference for a biomarker subject to a limit of detection and measurement error

Statistics in Medicine, 2009

The receiver operating characteristic (ROC) curve is a tool commonly used to evaluate biomarker utility in clinical diagnosis of disease, especially during biomarker development research. Emerging biomarkers are often measured with random measurement error and subject to limits of detection that hinder their potential utility or mask an ability to discriminate by negatively biasing the estimates of ROC curves and subsequent area under the curve. Methods have been developed to correct the ROC curve for each of these types of sources of bias but here we develop a method by which the ROC curve is corrected for both simultaneously through replicate measures and maximum likelihood. Our method is evaluated via simulation study and applied to two potential discriminators of women with and without preeclampsia.

Evaluation of strategies to combine multiple biomarkers in diagnostic testing

2012

A challenge in clinical medicine is that of correct diagnosis of disease. Medical researchers invest considerable time and effort to enhance accurate disease diagnosis. Diagnostic tests are important components in modern medical practice. The receiver operating characteristic (ROC) is a commonly used statistical tool for describing the discriminatory accuracy and performance of a diagnostic test. A popular summary index of discriminatory accuracy is the area under ROC curve (AUC).

First of all, I thank ALLAH for his Grace and Mercy showered upon me. I heartily express my profound gratitude to my supervisors, Professor Henry G. Mwambi and Dr Lori E. Dodd, for the invaluable learned guidance, advice, encouragement, understanding and continued support they have provided me throughout the duration of my studies, which led to the compilation of this thesis. I will be always indebted to them for introducing me to this fascinating area of application in health research and creating my interest in Biostatistics. I lovingly thank my dear husband Ayoub, who supported me each step of the way and without whose help and encouragement it simply never would have been possible to finish this work. I also would like to thank my lovely parents Hanan and Balla for their continuous support and best wishes. I am grateful for the facilities made available to me by the School of Mathematics, Statistics and Computer Science of the University of KwaZulu-Natal (UKZN), Pietermaritzburg. I am also grateful for the financial support that I have received from UKZN. My thanks extend to Professor