A Unifying Framework for Evaluating the Predictive Power of Genetic Variants Based on the Level of Heritability Explained

Using the Optimal Robust Receiver Operating Characteristic (ROC) Curve for Predictive Genetic Tests

Biometrics, 2010

Ongoing genome-wide association studies represent a powerful approach to uncovering previously unknown common genetic variants that cause common complex diseases. The discovery of these genetic variants offers an important opportunity for early disease prediction, prevention and individualized treatment. We describe here a method of combining multiple genetic variants for early disease prediction, based on the optimality theory of the likelihood ratio. This theory shows that the receiver operating characteristic (ROC) curve based on the likelihood ratio (LR) has maximum performance at each cutoff point and that the area under the ROC curve (AUC) so obtained is the highest among all approaches. Through simulations and a real data application, we compared it with the commonly used logistic regression and classification tree approaches. The three approaches show similar performance if we know the underlying disease model. However, for most common diseases we have little prior knowledge of the disease model, and in this situation the new method has an advantage over logistic regression and classification tree approaches. We applied the new method to the Type 1 diabetes genome-wide association data from the Wellcome Trust Case Control Consortium. Based on five single nucleotide polymorphisms (SNPs), the test reaches medium level classification accuracy. With more genetic findings to be discovered in the future, we believe a predictive genetic test for Type 1 diabetes can be successfully constructed and eventually implemented for clinical use.
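
The idea of ranking individuals by likelihood ratio can be sketched in a few lines. The code below is an illustration only, not the authors' estimator: it assumes each person is summarized by a multilocus genotype (here a tuple of risk-allele counts), estimates LR = P(genotype | case) / P(genotype | control) from raw sample frequencies, and orders genotypes from highest to lowest LR to trace the ROC curve. The function names `lr_roc` and `auc` are hypothetical. Because the LR is estimated and evaluated on the same data, the resulting AUC is optimistic; a real analysis would need held-out data or cross-validation.

```python
from collections import Counter


def lr_roc(cases, controls):
    """Return ROC points (FPR, TPR) obtained by ranking genotypes by empirical LR.

    cases, controls: lists of hashable multilocus genotypes,
    e.g. tuples of risk-allele counts such as (0, 1, 2).
    """
    case_counts = Counter(cases)
    control_counts = Counter(controls)
    n_case, n_ctrl = len(cases), len(controls)
    genotypes = set(case_counts) | set(control_counts)

    def likelihood_ratio(g):
        p_case = case_counts[g] / n_case
        p_ctrl = control_counts[g] / n_ctrl
        # Genotypes never seen in controls get an infinite LR.
        return float("inf") if p_ctrl == 0 else p_case / p_ctrl

    # Lower the cutoff step by step: highest-LR genotypes are called positive first.
    ranked = sorted(genotypes, key=likelihood_ratio, reverse=True)
    points = [(0.0, 0.0)]
    tpr = fpr = 0.0
    for g in ranked:
        tpr += case_counts[g] / n_case
        fpr += control_counts[g] / n_ctrl
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Toy data with a single SNP coded as carrier (1) or non-carrier (0).
    toy_cases = [(1,), (1,), (1,), (0,)]
    toy_controls = [(0,), (0,), (0,), (1,)]
    print(auc(lr_roc(toy_cases, toy_controls)))
```

Ranking by LR makes the ROC curve concave, which is why no other ordering of the same genotypes can give a larger area on the data used to estimate it.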

Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach

PLoS ONE, 2008

Background: The prediction of the genetic disease risk of an individual is a powerful public health tool. While predicting risk has been successful in diseases which follow simple Mendelian inheritance, it has proven challenging in complex diseases for which a large number of loci contribute to the genetic variance. The large numbers of single nucleotide polymorphisms now available provide new opportunities for predicting genetic risk of complex diseases with high accuracy.

Predictive genetic testing for the identification of high-risk groups: A simulation study on the impact of predictive ability

Genome Medicine, 2011

Background: Genetic risk models could potentially be useful in identifying high-risk groups for the prevention of complex diseases. We investigated the performance of this risk stratification strategy by examining epidemiological parameters that impact the predictive ability of risk models. Methods: We assessed sensitivity, specificity, and positive and negative predictive value for all possible risk thresholds that can define high-risk groups and investigated how these measures depend on the frequency of disease in the population, the frequency of the high-risk group, and the discriminative accuracy of the risk model, as assessed by the area under the receiver-operating characteristic curve (AUC). In a simulation study, we modeled genetic risk scores of 50 genes with equal odds ratios and genotype frequencies, and varied the odds ratios and the disease frequency across scenarios. We also performed a simulation of age-related macular degeneration risk prediction based on published odds ratios and frequencies for six genetic risk variants. Results: We show that when the frequency of the high-risk group was lower than the disease frequency, positive predictive value increased with the AUC but sensitivity remained low. When the frequency of the high-risk group was higher than the disease frequency, sensitivity was high but positive predictive value remained low. When both frequencies were equal, both positive predictive value and sensitivity increased with increasing AUC, but higher AUC was needed to maximize both measures. Conclusions: The performance of risk stratification is strongly determined by the frequency of the high-risk group relative to the frequency of disease in the population. The identification of high-risk groups with appreciable combinations of sensitivity and positive predictive value requires higher AUC.
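
The dependence of sensitivity and positive predictive value on the size of the high-risk group can be explored with a small calculation. The sketch below does not reproduce the study's 50-gene simulation; it assumes instead a binormal model in which risk scores are standard normal in unaffected people and shifted by a constant in affected people, with the shift chosen to match a given AUC. The function name `stratification_measures` and its parameters are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def stratification_measures(auc, prevalence, high_risk_fraction):
    """Sensitivity, specificity, PPV and NPV when the top `high_risk_fraction`
    of the population (by risk score) is labelled high risk.

    Assumption: scores ~ N(0, 1) in unaffected and N(d, 1) in affected people,
    so that AUC = Phi(d / sqrt(2)).
    """
    d = 2 ** 0.5 * STD_NORMAL.inv_cdf(auc)

    def flagged(threshold):
        # Fraction of the whole population scoring above the threshold.
        return (prevalence * (1 - STD_NORMAL.cdf(threshold - d))
                + (1 - prevalence) * (1 - STD_NORMAL.cdf(threshold)))

    # Bisection: flagged() decreases as the threshold rises.
    lo, hi = -10.0, 10.0 + d
    for _ in range(100):
        mid = (lo + hi) / 2
        if flagged(mid) > high_risk_fraction:
            lo = mid
        else:
            hi = mid
    threshold = (lo + hi) / 2

    sensitivity = 1 - STD_NORMAL.cdf(threshold - d)
    specificity = STD_NORMAL.cdf(threshold)
    # Bayes' rule: PPV = P(disease | high risk), NPV = P(no disease | not high risk).
    ppv = prevalence * sensitivity / high_risk_fraction
    npv = (1 - prevalence) * specificity / (1 - high_risk_fraction)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv}


if __name__ == "__main__":
    for fraction in (0.05, 0.10, 0.20):
        print(fraction, stratification_measures(0.80, 0.10, fraction))
```

The identity PPV = sensitivity × prevalence / (high-risk fraction) explains the pattern reported above: when the high-risk group is smaller than the disease frequency, sensitivity cannot exceed their ratio, and when it is larger, PPV is capped in the same way.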

Incremental value of rare genetic variants for the prediction of multifactorial diseases

Genome Medicine, 2013

Background: It is often assumed that rare genetic variants will improve available risk prediction scores. We aimed to estimate the added predictive ability of rare variants for risk prediction of common diseases in hypothetical scenarios. Methods: In simulated data, we constructed risk models with an area under the ROC curve (AUC) ranging between 0.50 and 0.95, to which we added a single variant representing the cumulative frequency and effect (odds ratio, OR) of multiple rare variants. The frequency of the rare variant ranged between 0.0001 and 0.01 and the OR between 2 and 10. We assessed the resulting AUC, increment in AUC, integrated discrimination improvement (IDI), net reclassification improvement (NRI(>0.01)) and categorical NRI. The analyses were illustrated by a simulation of atrial fibrillation risk prediction based on a published clinical risk model. Results: We observed minimal improvement in AUC with the addition of rare variants. All measures increased with the frequency and OR of the variant, but maximum increment in AUC remained below 0.05. Increment in AUC and NRI(>0.01) decreased with higher AUC of the baseline model, whereas IDI remained constant. In the atrial fibrillation example, the maximum increment in AUC was 0.02 for a variant with frequency = 0.01 and OR = 10. IDI and NRI showed at most minimal increase for variants with frequency greater than or equal to 0.005 and OR greater than or equal to 5. Conclusions: Since rare variants are present in only a minority of affected individuals, their predictive ability is generally low at the population level. To improve the predictive ability of clinical risk models for complex diseases, genetic variants must be common and have substantial effect on disease risk.
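
A back-of-the-envelope calculation shows why a rare variant discriminates poorly at the population level. The sketch below is not the paper's simulation and does not compute an increment over a baseline model; it only gives the AUC of a single binary marker used on its own, which equals 0.5 + 0.5 × (carrier frequency in cases − carrier frequency in controls). It assumes that "frequency" means the carrier frequency among unaffected people and that the OR compares carriers with non-carriers. The function names are hypothetical.

```python
def carrier_frequency_in_cases(freq_controls, odds_ratio):
    """Carrier frequency among cases implied by the control frequency and the OR."""
    odds_in_cases = odds_ratio * freq_controls / (1 - freq_controls)
    return odds_in_cases / (1 + odds_in_cases)


def auc_binary_marker(freq_controls, odds_ratio):
    """AUC of a single carrier/non-carrier marker, counting ties as one half."""
    freq_cases = carrier_frequency_in_cases(freq_controls, odds_ratio)
    return 0.5 + 0.5 * (freq_cases - freq_controls)


if __name__ == "__main__":
    # Ranges quoted in the abstract: frequency 0.0001 to 0.01, OR 2 to 10.
    for freq in (0.0001, 0.001, 0.01):
        for odds_ratio in (2, 5, 10):
            print(freq, odds_ratio, round(auc_binary_marker(freq, odds_ratio), 4))
```

Even with an OR of 10, a variant carried by 1% of unaffected people is carried by only about 9% of affected people, so more than 90% of cases are indistinguishable from controls on this marker.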

Analytical and simulation methods for estimating the potential predictive ability of genetic profiling: a comparison of methods and results

European Journal of Human Genetics, 2012

Various modeling methods have been proposed to estimate the potential predictive ability of polygenic risk variants that predispose to various common diseases. However, it is unknown whether differences between them affect their conclusions on predictive ability. We reviewed input parameters, assumptions and output of the five most common methods and compared their estimates of the area under the receiver operating characteristic (ROC) curve (AUC) using hypothetical data representing effect sizes and frequencies of genetic variants, population disease risk and number of variants. To assess the accuracy of the estimated AUCs, we aimed to reproduce the AUCs of published empirical studies. All methods assumed that the combined effect of genetic variants on disease risk followed a multiplicative risk model of independent genetic effects, but they either assumed per allele, per genotype or dominant/recessive effects for the genetic variants. Modeling strategy and input parameters differed. Methods used simulation analysis or analytical formulas with effect sizes quantified by odds ratios (ORs) or relative risks. Estimated AUC values were similar for lower ORs (<1.2). When AUCs were larger (>0.7) due to variants with strong effects, differences in estimated AUCs between methods increased. The simulation methods accurately reproduced the AUC values of empirical studies, but the analytical methods did not. We conclude that despite differences in input parameters, the modeling methods estimate similar AUC for realistic values of the ORs. When one or more variants have stronger effects and AUC values are higher, the simulation methods tend to be more accurate.
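
A simulation approach of the general kind described above can be sketched as follows. This is not any of the five reviewed methods; it is a minimal example that assumes independent variants with the same risk-allele frequency, a per-allele effect that multiplies the odds of disease by the same OR, and a hypothetical `intercept` that sets the baseline log-odds. The AUC is computed from the rank sum of cases (the Mann-Whitney statistic), with tied scores sharing an average rank. All function and parameter names are illustrative.

```python
import math
import random


def auc_from_scores(scores, labels):
    """AUC via the Mann-Whitney rank-sum statistic, with average ranks for ties."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_cases = sum(1 for y in labels if y)
    n_controls = len(labels) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)


def simulate_auc(n_people, n_variants, allele_freq, odds_ratio, intercept, seed=0):
    """Simulate genotypes and disease status, then score by risk-allele count."""
    rng = random.Random(seed)
    log_or = math.log(odds_ratio)
    counts, labels = [], []
    for _ in range(n_people):
        # Two alleles per variant, each a risk allele with probability allele_freq.
        count = sum(rng.random() < allele_freq for _ in range(2 * n_variants))
        # Each risk allele multiplies the odds of disease by odds_ratio.
        risk = 1 / (1 + math.exp(-(intercept + count * log_or)))
        counts.append(count)
        labels.append(rng.random() < risk)
    return auc_from_scores(counts, labels)


if __name__ == "__main__":
    print(simulate_auc(4000, 20, 0.3, 1.2, intercept=-4.4))
```

With equal ORs the risk-allele count orders individuals exactly as the model's predicted risk does, so scoring by the count loses nothing; unequal ORs would require a weighted sum of log ORs instead.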

Genomic Prediction of Complex Disease Risk

We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistive) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curves (AUC) in the range ~ 0.58-0.71 using SNP data alone. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of PGS) with 3 - 8 times higher risk than typical individuals. We validate predictors out-of-sample using the eMERGE ...

Predictive testing for complex diseases using multiple genes: Fact or fiction?

Genetics in Medicine, 2006

There is ongoing debate about whether testing low-risk genes at multiple loci will be useful in clinical care and public health. We investigated the usefulness of multiple genetic testing using simulated data. Methods: Usefulness was evaluated by the area under the receiver-operating characteristic curve (AUC), which indicates the accuracy of genetic profiling in discriminating between future patients and nonpatients. The AUC was investigated in relation to the number of genes assumed to be involved, the risk allele frequency, the odds ratio of the risk genotypes, and to the proportion of variance explained by genetic factors as an approximation of the heritability of the disease. Results: We demonstrated that a high (AUC > 0.80) to excellent discriminative accuracy (AUC > 0.95) can be obtained by simultaneously testing multiple susceptibility genes. A higher discriminative accuracy is obtained when genetic factors play a larger role in the disease, as indicated by the proportion of explained variance. The maximum discriminative accuracy of future genetic profiling can be estimated at present from the heritability and prevalence of disease. Conclusions: Genetic profiling may have the potential to identify individuals at higher risk of disease depending on the prevalence and heritability of the disease.

Genetic-based prediction of disease traits: prediction is very difficult, especially about the future

Frontiers in genetics, 2014

Translation of results from genetic findings to inform medical practice is a highly anticipated goal of human genetics. The aim of this paper is to review and discuss the role of genetics in medically-relevant prediction. Germline genetics presages disease onset and therefore can contribute prognostic signals that augment laboratory tests and clinical features. As such, the impact of genetic-based predictive models on clinical decisions and therapy choice could be profound. However, given that (i) medical traits result from a complex interplay between genetic and environmental factors, (ii) the underlying genetic architectures for susceptibility to common diseases are not well-understood, and (iii) replicable susceptibility alleles, in combination, account for only a moderate amount of disease heritability, there are substantial challenges to constructing and implementing genetic risk prediction models with high utility. In spite of these challenges, concerted progress has continued...