Combining multiple biomarkers linearly to maximize the partial area under the ROC curve

Two simple algorithms on linear combination of multiple biomarkers to maximize partial area under the ROC curve

Computational Statistics & Data Analysis, 2015

In clinical practice, it is common that several biomarkers are related to a specific disease and that no single marker has enough diagnostic power. An effective way to improve diagnostic accuracy is to combine multiple markers. The area under the receiver operating characteristic curve (AUC) is a very popular measure for evaluating a diagnostic tool. Su and Liu (1993) derived the best linear combination that maximizes the AUC when the markers are multivariate normally distributed. However, many applications do not operate over the entire range of the curve but only in particular regions of it, for example, high-specificity regions. In these cases, it is more practical to analyze the partial area under the curve (pAUC). In this paper, we propose two easy-to-implement algorithms to find the best linear combination of multiple biomarkers that optimizes the pAUC for a given range of specificity. Analysis of synthesized and real datasets shows that the proposed algorithms achieve larger predictive pAUC values on future observations than existing methods, such as Su and Liu's method, logistic regression, and others.
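As a rough illustration of the quantity being optimized, the sketch below computes an empirical pAUC for a fixed linear combination over the high-specificity region where the false positive rate is at most max_fpr. It is not one of the two algorithms proposed in the paper; the function names partial_auc and combine, the parameter max_fpr, and the toy data are all assumptions made for this example, and ties between case and control scores are ignored.

```python
def combine(markers, weights):
    """Linear score for each subject: sum of weight * marker value."""
    return [sum(w * v for w, v in zip(weights, row)) for row in markers]


def partial_auc(case_scores, control_scores, max_fpr):
    """Empirical pAUC over FPR in [0, max_fpr], i.e. specificity >= 1 - max_fpr.

    The empirical ROC curve is treated as a step function: on the FPR
    interval [k/n, (k+1)/n) the threshold is the (k+1)-th largest control
    score, and the TPR is the fraction of cases strictly above it.
    """
    n = len(control_scores)
    m = len(case_scores)
    ranked = sorted(control_scores, reverse=True)
    area = 0.0
    for k, threshold in enumerate(ranked):
        left = k / n
        if left >= max_fpr:
            break
        width = min((k + 1) / n, max_fpr) - left
        tpr = sum(1 for x in case_scores if x > threshold) / m
        area += width * tpr
    return area


# Toy data (hypothetical): two markers per subject.
cases = [[2.0, 1.0], [3.0, 0.5], [1.5, 2.0]]
controls = [[0.5, 0.2], [1.0, 1.0], [0.2, 0.8]]
weights = [0.7, 0.3]  # an arbitrary candidate combination

value = partial_auc(combine(cases, weights), combine(controls, weights), 0.2)
print(value)
```

With max_fpr set to 1 the same function returns the usual empirical AUC, and the result can never exceed max_fpr, which is why pAUC values are often rescaled before being compared across ranges.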

Biomarker selection for medical diagnosis using the partial area under the ROC curve

BMC Research Notes, 2014

Background: A biomarker is usually used as a diagnostic or assessment tool in medical research. Finding an ideal biomarker is not easy and combining multiple biomarkers provides a promising alternative. Moreover, some biomarkers based on the optimal linear combination do not have enough discriminatory power. As a result, the aim of this study was to find the significant biomarkers based on the optimal linear combination maximizing the pAUC for assessment of the biomarkers.

ROC curve inference for best linear combination of two biomarkers subject to limits of detection

Biometrical Journal, 2011

The receiver operating characteristic (ROC) curve is a tool commonly used to evaluate biomarker utility in clinical diagnosis of disease. Often, multiple biomarkers are developed to evaluate the discrimination for the same outcome. Levels of multiple biomarkers can be combined via best linear combination (BLC) such that their overall discriminatory ability is greater than any of them individually. Biomarker measurements frequently have undetectable levels below a detection limit sometimes denoted as limit of detection (LOD). Ignoring observations below the LOD or substituting some replacement value as a method of correction has been shown to lead to negatively biased estimates of the area under the ROC curve for some distributions of single biomarkers. In this paper, we develop asymptotically unbiased estimators, via the maximum likelihood technique, of the area under the ROC curve of BLC of two bivariate normally distributed biomarkers affected by LODs. We also propose confidence intervals for this area under curve. Point and confidence interval estimates are scrutinized by simulation study, recording bias and root mean square error and coverage probability, respectively. An example using polychlorinated biphenyl (PCB) levels to classify women with and without endometriosis illustrates the potential benefits of our methods.
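For orientation, the sketch below evaluates the best linear combination of two fully observed, bivariate normal biomarkers using the result of Su and Liu (1993): the coefficients are proportional to the inverse of the summed covariance matrices applied to the mean difference, and the resulting AUC is the standard normal CDF of the square root of the associated quadratic form. It deliberately ignores limits of detection, so it is not the estimator developed in the paper; the function name and the example parameters are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def best_linear_combination(mu_d, mu_h, cov_d, cov_h):
    """Su-Liu coefficients and AUC for two normal markers (no LOD handling).

    mu_d, mu_h: mean vectors (length 2) for diseased and healthy groups.
    cov_d, cov_h: 2x2 covariance matrices as nested lists.
    """
    # Sum of the two covariance matrices.
    s = [[cov_d[i][j] + cov_h[i][j] for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    delta = [mu_d[0] - mu_h[0], mu_d[1] - mu_h[1]]
    # Coefficients: (cov_d + cov_h)^{-1} (mu_d - mu_h).
    coef = [inv[0][0] * delta[0] + inv[0][1] * delta[1],
            inv[1][0] * delta[0] + inv[1][1] * delta[1]]
    # AUC of the combination: Phi(sqrt(delta' (cov_d + cov_h)^{-1} delta)).
    auc = NormalDist().cdf(sqrt(delta[0] * coef[0] + delta[1] * coef[1]))
    return coef, auc


identity = [[1.0, 0.0], [0.0, 1.0]]
coef, auc = best_linear_combination([1.0, 1.0], [0.0, 0.0], identity, identity)
print(coef, auc)
```

In this hypothetical setting each marker alone has an AUC of about 0.76, while the combination reaches about 0.84, which is the kind of gain the BLC is meant to deliver.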

A Comparison of Parametric and Nonparametric Approaches to ROC Analysis of Quantitative Diagnostic Tests

Medical Decision Making, 1997

Receiver operating characteristic (ROC) analysis, which yields indices of accuracy such as the area under the curve (AUC), is increasingly being used to evaluate the performances of diagnostic tests that produce results on continuous scales. Both parametric and nonparametric ROC approaches are available to assess the discriminant capacity of such tests, but there are no clear guidelines as to the merits of each, particularly with non-binormal data. Investigators may worry that when data are non-Gaussian, estimates of diagnostic accuracy based on a binormal model may be distorted. The authors conducted a Monte Carlo simulation study to compare the bias and sampling variability in the estimates of the AUCs derived from parametric and nonparametric procedures. Each approach was assessed in data sets generated from various configurations of pairs of overlapping distributions; these included the binormal model and non-binormal pairs of distributions where one or both pair members were mixtures of Gaussian (MG) distributions with different degrees of departures from binormality. The biases in the estimates of the AUCs were found to be very small for both parametric and nonparametric procedures. The two approaches yielded very close estimates of the AUCs and of the corresponding sampling variability even when data were generated from non-binormal models. Thus, for a wide range of distributions, concern about bias or imprecision of the estimates of the AUC should not be a major factor in choosing between the nonparametric and parametric approaches. Key words: ROC analysis; quantitative diagnostic test; comparison, parametric; binormal model; LABROC; nonparametric procedure; area under the curve (AUC). (Med Decis Making 1997;17:94-102)
During the past ten years, receiver operator characteristic (ROC) analysis has become a popular method for evaluating the accuracy/performance of medical diagnostic tests [1-3]. The most attractive property of ROC analysis is that the accuracy indices derived from this technique are not distorted by fluctuations caused by the use of an arbitrarily chosen decision "criterion" or "cutoff" [4-8]. One index available from an ROC analysis, the area under the curve (AUC), measures the ability of a diagnostic
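The two families of AUC estimates compared in this abstract can be sketched as follows. The nonparametric estimate is the Mann-Whitney proportion of case-control pairs ranked correctly (ties count one half). The parametric estimate shown here is a simple plug-in binormal formula using sample means and variances; it is an assumption made for illustration and is not the LABROC maximum likelihood procedure named in the key words. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def auc_nonparametric(cases, controls):
    """Mann-Whitney estimate: share of (case, control) pairs with case > control."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def auc_binormal(cases, controls):
    """Plug-in binormal estimate: Phi((mean_D - mean_H) / sqrt(var_D + var_H))."""
    gap = mean(cases) - mean(controls)
    spread = sqrt(variance(cases) + variance(controls))
    return NormalDist().cdf(gap / spread)


cases = [2.1, 3.4, 1.8, 2.9, 3.0]
controls = [1.0, 2.0, 1.5, 0.7, 2.4]
print(auc_nonparametric(cases, controls), auc_binormal(cases, controls))
```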

Combining binary and continuous biomarkers by maximizing the area under the receiver operating characteristic curve

Communications in Statistics - Simulation and Computation, 2020

In clinical practice, decisions should be made with the greatest possible accuracy. When multiple diagnostic tests or biomarkers are available, biomarker combinations aim to maximize that accuracy. Because existing biomarker combination methods handle only continuous biomarkers, this study proposes an approach based on Youden's J statistic for combining binary biomarkers. The proposed approach facilitates combinations of binary and continuous biomarkers. A simulation study was conducted to compare the performance of the proposed combination approach across different sample sizes. In both the analysis of real data and the simulation studies, the proposed approach yielded favorable results and a higher area under the curve.
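Youden's J statistic itself is simple to compute for a binary marker: J = sensitivity + specificity - 1. The sketch below shows only that calculation; how the paper uses J to build the combination is not described in the abstract, so nothing beyond the statistic is implemented. The function name and the toy data are assumptions.

```python
def youden_j(marker, disease):
    """Youden's J = sensitivity + specificity - 1 for 0/1 marker and disease labels."""
    pairs = list(zip(marker, disease))
    tp = sum(1 for m, d in pairs if m == 1 and d == 1)
    fn = sum(1 for m, d in pairs if m == 0 and d == 1)
    tn = sum(1 for m, d in pairs if m == 0 and d == 0)
    fp = sum(1 for m, d in pairs if m == 1 and d == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


marker = [1, 1, 0, 0, 1, 0]
disease = [1, 1, 1, 0, 0, 0]
print(youden_j(marker, disease))
```

J ranges from -1 to 1, with 0 corresponding to a marker that is no better than chance and 1 to a marker that classifies every subject correctly.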

Marker selection via maximizing the partial area under the ROC curve of linear risk scores

Biostatistics, 2011

Rather than viewing receiver operating characteristic (ROC) curves directly to compare the performances of diagnostic methods, the whole and the partial areas under the ROC curve (area under the ROC curve [AUC] and partial area under the ROC curve [pAUC]) are 2 of the most popularly used summaries of the curve. Moreover, when high specificity is a prerequisite, as in some medical diagnostics, pAUC is preferable. In this paper, we propose a wrapper-type algorithm to select the best linear combination of markers that has high sensitivity within a confined specificity range. The markers selected by the proposed algorithm are different from those selected by AUC-based algorithms and therefore provide different information for further studies. Most notably, for example, within the given range of specificity, the markers selected by the proposed algorithm always have higher individual sensitivities than those selected by other AUC-based methods. This characteristic makes the proposed method a good addition to existing methods. Without assuming the underlying distributions of markers, we prove that the pAUC obtained with the proposed algorithm is a strongly consistent estimate of the true pAUC and then illustrate its performance with numerical studies using synthesized data and 2 real examples. The results are compared with those obtained by its AUC-based counterpart. We found that the classification performance of the final classifier based on the selected markers is very competitive.
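For reference, the target of such algorithms can be written compactly. The notation below is an assumption introduced for this note and is not taken from the paper: a is the coefficient vector of the linear risk score a^T X, S_{D,a} and S_{H,a} are the survival functions of that score in the diseased and healthy populations, and t_0 is the largest false positive rate allowed, so the specificity range is [1 - t_0, 1].

```latex
\mathrm{ROC}_a(t) = S_{D,a}\!\left(S_{H,a}^{-1}(t)\right), \qquad
\mathrm{pAUC}_a(t_0) = \int_0^{t_0} \mathrm{ROC}_a(t)\,dt, \qquad
\hat{a} = \arg\max_{a}\ \mathrm{pAUC}_a(t_0)
```

Because the ROC curve is unchanged when a is multiplied by a positive constant, the maximizer is determined only up to scale, and 0 <= pAUC_a(t_0) <= t_0 for every a.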

Generalized ROC curve inference for a biomarker subject to a limit of detection and measurement error

Statistics in Medicine, 2009

The receiver operating characteristic (ROC) curve is a tool commonly used to evaluate biomarker utility in clinical diagnosis of disease, especially during biomarker development research. Emerging biomarkers are often measured with random measurement error and subject to limits of detection that hinder their potential utility or mask an ability to discriminate by negatively biasing the estimates of ROC curves and subsequent area under the curve. Methods have been developed to correct the ROC curve for each of these types of sources of bias but here we develop a method by which the ROC curve is corrected for both simultaneously through replicate measures and maximum likelihood. Our method is evaluated via simulation study and applied to two potential discriminators of women with and without preeclampsia.

Compare diagnostic tests using transformation-invariant smoothed ROC curves

Journal of Statistical Planning and Inference, 2010

The receiver operating characteristic (ROC) curve, which plots true positive rates against false positive rates as the threshold varies, is an important tool for evaluating biomarkers in diagnostic medicine studies. By definition, the ROC curve is monotone increasing from 0 to 1 and is invariant to any monotone transformation of test results. It is also often a curve with a certain level of smoothness when test results from the diseased and non-diseased subjects follow continuous distributions. Most existing ROC curve estimation methods do not guarantee all of these properties. One of the exceptions is Du and Tang (2009), which applies a monotone spline regression procedure to empirical ROC estimates. However, their method does not consider the inherent correlations between empirical ROC estimates, which makes the derivation of the asymptotic properties very difficult. In this paper we propose a penalized weighted least squares estimation method, which incorporates the covariance between empirical ROC estimates as a weight matrix. The resulting estimator satisfies all the aforementioned properties, and we show that it is also consistent. A resampling approach is then used to extend our method to comparisons of two or more diagnostic tests. Our simulations show a significantly improved performance over the existing method, especially for steep ROC curves. We then apply the proposed method to a cancer diagnostic study that compares several newly developed diagnostic biomarkers to a traditional one.
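The invariance property mentioned above is easy to check on the unsmoothed empirical ROC curve. The sketch below builds the empirical ROC points and confirms that a strictly increasing transformation of the test results (here the exponential) leaves them unchanged. It does not implement the penalized weighted least squares estimator from the paper; the function name and data are hypothetical.

```python
from math import exp


def roc_points(cases, controls):
    """Empirical ROC points (FPR, TPR), calling a subject positive when score >= threshold."""
    thresholds = sorted(set(cases) | set(controls), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        fpr = sum(1 for y in controls if y >= c) / len(controls)
        tpr = sum(1 for x in cases if x >= c) / len(cases)
        points.append((fpr, tpr))
    return points


cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.1]

original = roc_points(cases, controls)
transformed = roc_points([exp(x) for x in cases], [exp(y) for y in controls])
print(original == transformed)
```

The points depend only on the ordering of the scores, which is why any strictly increasing rescaling of a biomarker gives the same empirical curve.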

The linear combinations of biomarkers which maximize the partial area under the ROC curves

Computational Statistics, 2013

As biotechnology has made remarkable progress, there has also been a great improvement in data collection, with lower cost and higher-quality outcomes. More often than not, investigators can obtain the measurements of many disease-related features simultaneously. When multiple potential biomarkers are available for constructing a diagnostic tool for a disease, an effective approach is to combine these biomarkers to build one single indicator. For continuous-scaled variables, the use of linear combinations is popular due to its easy interpretation. Su and Liu (J Am Stat Assoc 88(424):1350-1355, 1993) derived the best linear combination under the criterion of the area under the receiver operating characteristic (ROC) curve, when the joint normality of biomarkers is assumed. However, in many investigations, the emphasis is placed only on a limited region of clinical relevance, instead of the whole ROC curve. The goal of this study is to find the linear combination that maximizes the partial area under an ROC curve (pAUC) for a pre-specified range. In order to find an analytic solution, the first derivative of the pAUC under the normal assumption is derived. The explicit form is so complicated that further validation of the Hessian matrix is difficult. On the other hand, we find that the pAUC maximizer may not be unique and that local maximizers do exist in some cases. Consequently, the existing algorithms find an initial-point-dependent solution and are inadequate for our needs. Hence, we propose a new algorithm that adopts several initial points at one time. Intensive numerical studies have been performed to show the adequacy of the proposed algorithm. Real examples are also provided for illustration.
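The idea of guarding against local maximizers by starting from several initial points can be sketched for two markers as follows. This is only an illustration under stated assumptions, not the algorithm proposed in the paper: the objective is an empirical pAUC rather than the normal-theory pAUC, the combination is parametrized by an angle theta (the pAUC does not depend on the scale of the coefficients), the local step is a simple step-halving hill climb, and every name and data value is hypothetical.

```python
from math import cos, sin, pi


def partial_auc(case_scores, control_scores, max_fpr):
    """Empirical pAUC over FPR in [0, max_fpr] (step-function ROC, ties ignored)."""
    n, m = len(control_scores), len(case_scores)
    area = 0.0
    for k, threshold in enumerate(sorted(control_scores, reverse=True)):
        left = k / n
        if left >= max_fpr:
            break
        width = min((k + 1) / n, max_fpr) - left
        area += width * sum(1 for x in case_scores if x > threshold) / m
    return area


def objective(theta, cases, controls, max_fpr):
    """pAUC of the score cos(theta) * marker1 + sin(theta) * marker2."""
    w1, w2 = cos(theta), sin(theta)
    case_scores = [w1 * a + w2 * b for a, b in cases]
    control_scores = [w1 * a + w2 * b for a, b in controls]
    return partial_auc(case_scores, control_scores, max_fpr)


def local_search(theta, cases, controls, max_fpr, step=0.2, min_step=1e-3):
    """Hill climb on theta: move while the pAUC improves, otherwise halve the step."""
    best = objective(theta, cases, controls, max_fpr)
    while step > min_step:
        moved = False
        for candidate in (theta + step, theta - step):
            value = objective(candidate, cases, controls, max_fpr)
            if value > best:
                theta, best, moved = candidate, value, True
                break
        if not moved:
            step /= 2
    return theta, best


def multi_start(cases, controls, max_fpr, n_starts=8):
    """Run the local search from evenly spaced angles and keep the best result."""
    starts = [2 * pi * k / n_starts for k in range(n_starts)]
    results = [local_search(t, cases, controls, max_fpr) for t in starts]
    return max(results, key=lambda r: r[1])


# Toy data (hypothetical): marker 1 separates the groups, marker 2 does not.
cases = [(2.0, 0.0), (3.0, 1.0), (4.0, 0.0), (5.0, 1.0)]
controls = [(0.0, 1.0), (1.0, 0.0), (0.5, 1.0), (1.5, 0.0)]
theta, value = multi_start(cases, controls, max_fpr=0.5)
print((cos(theta), sin(theta)), value)
```

Each start may end at a different local maximizer; comparing the final pAUC values across starts is what makes the answer less dependent on any single initial point.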

Receiver Operator Characteristic Analysis of Biomarkers Evaluation in Diagnostic Research

JOURNAL OF CLINICAL AND DIAGNOSTIC RESEARCH, 2018

Receiver Operator Characteristic (ROC) analysis is the method of choice for evaluating biomarkers in bioinformatics research. However, there is no single method and no single accuracy index for evaluating diagnostic tools. This review provides an extensive illustration of different methods of ROC curve analysis that can be used in the clinical practice of diagnostic studies. It covers their early use for rating data and the recent developments for quantitative data, with a discussion of model selection in parametric ROC analysis compared with the non-parametric approach. The relevant methodological issues of these two alternative approaches are discussed in terms of the bias and sampling variability of the area under the curve (AUC) index, which may influence the assessed performance of diagnostic tests. The methods are illustrated with two relevant clinical examples. The semi-parametric and parametric mixture-of-Gaussian models are comparable with the purely nonparametric approach. The choice between methods depends on practical convenience unless there is a severe departure from binormality. Recent developments, the gaps in knowledge concerning the behaviour of these methods in actual medical research applications, and guidelines for future research are also discussed.