Poly-omic prediction of complex traits: OmicKriging

Heather E Wheeler et al. Genet Epidemiol. 2014 Jul.

Abstract

High-confidence prediction of complex traits such as disease risk or drug response is an ultimate goal of personalized medicine. Although genome-wide association studies have discovered thousands of well-replicated polymorphisms associated with a broad spectrum of complex traits, the combined predictive power of these associations for any given trait is generally too low to be of clinical relevance. We propose a novel systems approach to complex trait prediction, which leverages and integrates similarity in genetic, transcriptomic, or other omics-level data. We translate the omic similarity into phenotypic similarity using a method called Kriging, commonly used in geostatistics and machine learning. Our method called OmicKriging emphasizes the use of a wide variety of systems-level data, such as those increasingly made available by comprehensive surveys of the genome, transcriptome, and epigenome, for complex trait prediction. Furthermore, our OmicKriging framework allows easy integration of prior information on the function of subsets of omics-level data from heterogeneous sources without the sometimes heavy computational burden of Bayesian approaches. Using seven disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), we show that OmicKriging allows simple integration of sparse and highly polygenic components yielding comparable performance at a fraction of the computing time of a recently published Bayesian sparse linear mixed model method. Using a cellular growth phenotype, we show that integrating mRNA and microRNA expression data substantially increases performance over either dataset alone. Using clinical statin response, we show improved prediction over existing methods. We provide an R package to implement OmicKriging (http://www.scandb.org/newinterface/tools/OmicKriging.html).

Keywords: Kriging; complex trait prediction; polygenic modeling; polygenic prediction; systems biology.

© 2014 WILEY PERIODICALS, INC.

Figures

Figure 1

Kriging and whole-genome prediction connection. This figure shows the analogous relationships between components of the Kriging method used in geostatistics and whole-genome prediction. The prediction at an unobserved location (?) is computed as a weighted average of the variable (here, rainfall) at observed locations. The weights are functions of the correlation between the rainfall at the new location and the rainfall at each observed location. The shorter the distance between an observed location and the new location, the higher the weight. In complex trait prediction, locations correspond to individuals and physical proximity corresponds to genetic relatedness. The correlation between two locations or individuals is the key component of this method. In animal breeding approaches, the genetic relatedness matrix or kinship matrix is used. In OmicKriging, a genetic relatedness matrix, a gene expression similarity matrix, or any combination of available high-throughput data similarity measures can be tested for complex trait prediction performance.
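The weighted-average idea in this caption can be sketched in a few lines of NumPy. This is a minimal illustration of simple Kriging under stated assumptions, not the implementation in the OmicKriging R package: the function name `krige_predict`, the centering on the training mean, and the toy similarity matrix are all hypothetical choices made for the example.

```python
import numpy as np

def krige_predict(sigma, y_train, train_idx, test_idx):
    """Predict phenotypes of test individuals from training phenotypes.

    sigma is an (n, n) similarity (correlation) matrix over all individuals.
    Assumption: predictions are centered on the training mean.
    """
    train_idx = np.asarray(train_idx)
    test_idx = np.asarray(test_idx)
    y_train = np.asarray(y_train, dtype=float)

    s_tt = sigma[np.ix_(train_idx, train_idx)]  # similarity among observed individuals
    s_nt = sigma[np.ix_(test_idx, train_idx)]   # similarity of new vs. observed individuals

    mu = y_train.mean()
    # weights = s_nt @ inv(s_tt); solve a linear system instead of inverting
    weights = np.linalg.solve(s_tt, s_nt.T).T
    return mu + weights @ (y_train - mu)

# Toy data (hypothetical): individual 2 is unobserved and is far more
# similar to individual 0 than to individual 1.
sigma = np.array([[1.0, 0.1, 0.8],
                  [0.1, 1.0, 0.2],
                  [0.8, 0.2, 1.0]])
pred = krige_predict(sigma, y_train=[2.0, 0.0], train_idx=[0, 1], test_idx=[2])
print(pred)
```

In the toy data the predicted value lies closer to the phenotype of individual 0, the more similar individual, which mirrors the "closer location, higher weight" behavior described for rainfall.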

Figure 2

OmicKriging data integration and weighting. (A) The individual weights, depicted as w1 and w2, in the Kriging method are given by the product of the composite similarity matrix Σ and the correlation of omic data between the individual of unknown phenotype (?) and the individuals of known phenotype. (B) The composite similarity matrix Σ integrates different omic correlation matrices such as a genetic relationship matrix (GRM) derived from SNPs and a gene expression correlation matrix (GXM) derived from gene expression levels in this example. Σ also includes an environmental component, i.e. noise term (*). (C) In OmicKriging, we optimize the matrix weights, θ1 and θ2, by testing the θi values of the grid space depicted in color. (D) The optimal matrix weights θi give the highest values of AUC for binary traits and R² for quantitative traits.
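The weighting and grid search described in panels B to D can be sketched as follows. This is a rough illustration under explicit assumptions, not the package's code: it assumes the composite matrix has the form Σ = θ1·GRM + θ2·GXM + (1 − θ1 − θ2)·I, with the identity matrix standing in for the noise term, and it scores each grid point by the squared correlation between cross-validated predictions and observed values. The function names `composite_matrix`, `cv_r2`, and `grid_search` are hypothetical.

```python
import itertools
import numpy as np

def composite_matrix(mats, thetas):
    """Weighted sum of similarity matrices plus an identity noise term (assumed form)."""
    thetas = np.asarray(thetas, dtype=float)
    noise = 1.0 - thetas.sum()
    if np.any(thetas < 0) or noise < -1e-9:
        raise ValueError("weights must be non-negative and sum to at most 1")
    sigma = noise * np.eye(mats[0].shape[0])
    for theta, mat in zip(thetas, mats):
        sigma = sigma + theta * mat
    return sigma

def cv_r2(sigma, y, n_folds, rng):
    """Squared correlation between k-fold Kriging predictions and observed y."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    pred = np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        s_tt = sigma[np.ix_(train, train)]
        s_nt = sigma[np.ix_(test, train)]
        mu = y[train].mean()
        pred[test] = mu + s_nt @ np.linalg.solve(s_tt, y[train] - mu)
    return np.corrcoef(pred, y)[0, 1] ** 2

def grid_search(mats, y, step=0.1, n_folds=5, seed=0):
    """Return (best_thetas, best_score) over a regular grid of matrix weights."""
    y = np.asarray(y, dtype=float)
    grid = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)
    best_thetas, best_score = None, -np.inf
    for thetas in itertools.product(grid, repeat=len(mats)):
        if sum(thetas) > 1.0 - 1e-9:
            continue  # keep a positive noise term so Sigma stays invertible
        rng = np.random.default_rng(seed)  # same fold split for every grid point
        score = cv_r2(composite_matrix(mats, thetas), y, n_folds, rng)
        if score > best_score:
            best_thetas, best_score = thetas, score
    return best_thetas, best_score
```

Calling `grid_search([grm, gxm], y)` with two precomputed similarity matrices returns the weight pair with the highest cross-validated score. For a binary trait, the score function would be swapped for AUC, as the caption notes.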

Figure 3

iGrowth prediction using OmicKriging. Predicted versus true iGrowth (n = 99) using (A) the optimally weighted gene expression matrix (GXM) alone, (B) the optimally weighted microRNA expression matrix (MXM) alone, and (C) the optimally weighted combination of the two matrices from the grid search. The solid black lines represent the slopes of the regression between the predicted and true values. The red dashed lines are the identity lines representing perfect prediction (slope 1, intercept 0). (D) Results of the grid search, showing that the best iGrowth prediction correlation (R² = 0.48 [0.45, 0.52]) was obtained with (MXM, GXM) matrix weights of (0.1, 0.8). The R² values presented in the contour plot are the mean values from 500 random samplings of the data into 16 cross-validation folds.

Figure 4

OmicKriging prediction performance for WTCCC disease risk prediction. Mean area under the ROC curve (AUC) for two implementations of OmicKriging for each disease from the WTCCC: a single common SNP genetic relationship matrix (OK:SingleGRM) and two optimally weighted GRMs of common SNPs and known loci (OK:DoubleGRM) for the predictions. The known loci were obtained from studies that did not include the WTCCC data to avoid overfitting. For comparison, we also show mean AUC results of the polygenic score method using genome-wide significant loci with 10 principal components (Baseline) and the lambda-optimized elastic-net penalized model (ElasticNet). Error bars represent the 95% confidence intervals from multiple cross-validation runs (see Methods). BD, bipolar disorder; CAD, coronary artery disease; CD, Crohn's disease; HT, hypertension; RA, rheumatoid arthritis; T1D, type 1 diabetes; T2D, type 2 diabetes.
