Poly-omic prediction of complex traits: OmicKriging
Heather E Wheeler et al. Genet Epidemiol. 2014 Jul.
Abstract
High-confidence prediction of complex traits such as disease risk or drug response is an ultimate goal of personalized medicine. Although genome-wide association studies have discovered thousands of well-replicated polymorphisms associated with a broad spectrum of complex traits, the combined predictive power of these associations for any given trait is generally too low to be of clinical relevance. We propose a novel systems approach to complex trait prediction, which leverages and integrates similarity in genetic, transcriptomic, or other omics-level data. We translate the omic similarity into phenotypic similarity using a method called Kriging, commonly used in geostatistics and machine learning. Our method, called OmicKriging, emphasizes the use of a wide variety of systems-level data, such as those increasingly made available by comprehensive surveys of the genome, transcriptome, and epigenome, for complex trait prediction. Furthermore, our OmicKriging framework allows easy integration of prior information on the function of subsets of omics-level data from heterogeneous sources without the sometimes heavy computational burden of Bayesian approaches. Using seven disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), we show that OmicKriging allows simple integration of sparse and highly polygenic components, yielding comparable performance at a fraction of the computing time of a recently published Bayesian sparse linear mixed model method. Using a cellular growth phenotype, we show that integrating mRNA and microRNA expression data substantially increases performance over either dataset alone. Using clinical statin response, we show improved prediction over existing methods. We provide an R package to implement OmicKriging (http://www.scandb.org/newinterface/tools/OmicKriging.html).
Keywords: Kriging; complex trait prediction; polygenic modeling; polygenic prediction; systems biology.
© 2014 WILEY PERIODICALS, INC.
Figures
Figure 1
Kriging and whole-genome prediction connection. This figure shows the analogous relationships between the components of the Kriging method used in geostatistics and those of whole-genome prediction. The prediction at an unobserved location (?) is computed as a weighted average of the variable (here, rainfall) at observed locations. The weights are functions of the correlation between the rainfall at the new location and the rainfall at each observed location: the closer an observed location is to the new location, the higher its weight. In complex trait prediction, locations correspond to individuals, and physical proximity corresponds to genetic relatedness. The correlation between two locations or individuals is the key component of this method. Animal breeding approaches use the genetic relatedness matrix, or kinship matrix. In OmicKriging, a genetic relatedness matrix, a gene expression similarity matrix, or any combination of available high-throughput data similarity measures can be tested for complex trait prediction performance.
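The weighted-average idea in Figure 1 can be sketched as the standard simple-Kriging predictor, in which the weights w solve Σw = c, where Σ holds the similarities among individuals with known phenotype and c holds the similarities between the new individual and each of them. This is a minimal, dependency-free Python sketch, not the OmicKriging R package; the function names solve and krige are hypothetical, and plugging in the sample mean of the observed phenotypes as the Kriging mean is an assumption.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes a is non-singular (e.g. a positive-definite similarity matrix).
    """
    n = len(a)
    # Augmented copy [a | b] so the caller's matrix is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from all rows below the pivot.
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def krige(sigma: Matrix, c: List[float], y: List[float]) -> float:
    """Simple-Kriging prediction for one new individual.

    sigma: similarity matrix among the n individuals with known phenotype
    c:     similarity of the new individual to each of those n individuals
    y:     their observed phenotypes
    """
    # Assumption: the sample mean stands in for the known Kriging mean.
    mu = sum(y) / len(y)
    # Kriging weights w solve sigma @ w = c.
    w = solve(sigma, c)
    # Weighted average of centred phenotypes, shifted back by the mean.
    return mu + sum(wi * (yi - mu) for wi, yi in zip(w, y))
```

For example, krige([[2.0, 1.0], [1.0, 2.0]], [1.0, 1.0], [3.0, 1.0]) gives each observed individual a weight of 1/3 and returns 2.0 (up to rounding). Applied to individuals rather than rain gauges, sigma would be a relatedness or expression-similarity matrix.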
Figure 2
OmicKriging data integration and weighting. (A) In the Kriging method, the individual weights, depicted as w₁ and w₂, are given by the product of the composite similarity matrix Σ and the correlation of omic data between the individual of unknown phenotype (?) and the individuals of known phenotype. (B) The composite similarity matrix Σ integrates different omic correlation matrices, in this example a genetic relationship matrix (GRM) derived from SNPs and a gene expression correlation matrix (GXM) derived from gene expression levels. Σ also includes an environmental component, i.e., a noise term (*). (C) In OmicKriging, we optimize the matrix weights θ₁ and θ₂ by testing the θᵢ values of the grid space depicted in color. (D) The optimal matrix weights θᵢ give the highest AUC for binary traits and the highest R² for quantitative traits.
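A sketch of the composite matrix and grid search in Figure 2, assuming Σ = θ₁K₁ + θ₂K₂ + (1 − θ₁ − θ₂)I, so that the identity matrix plays the role of the noise term, and assuming a grid step of 0.1. The score argument is a hypothetical caller-supplied function (for example, the cross-validated AUC or R² of a Kriging predictor built from Σ); the names composite and grid_search are illustrative, not the package's API.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def composite(mats: Sequence[Matrix], thetas: Sequence[float]) -> Matrix:
    """Sigma = sum_k theta_k * K_k + (1 - sum_k theta_k) * I.

    Assumption: the identity matrix carries the remaining weight as noise.
    """
    n = len(mats[0])
    noise = 1.0 - sum(thetas)
    return [
        [
            sum(t * k[i][j] for t, k in zip(thetas, mats))
            + (noise if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def grid_search(
    mats: Sequence[Matrix],
    score: Callable[[Matrix], float],
    step: float = 0.1,
) -> Tuple[Optional[Tuple[float, ...]], float]:
    """Evaluate every theta vector on a grid with sum(theta) <= 1.

    Returns the first theta vector reaching the highest score, and that score.
    """
    steps = int(round(1.0 / step))
    # Rounding keeps grid values clean (0.3 rather than 0.30000000000000004).
    grid = [round(i * step, 10) for i in range(steps + 1)]
    best_thetas, best_score = None, float("-inf")
    for thetas in product(grid, repeat=len(mats)):
        if sum(thetas) > 1.0 + 1e-9:
            continue  # the noise weight 1 - sum(theta) must stay non-negative
        s = score(composite(mats, thetas))
        if s > best_score:
            best_thetas, best_score = thetas, s
    return best_thetas, best_score
```

The exhaustive grid is cheap for two or three matrices but grows as (1/step + 1) raised to the number of matrices, which is why a coarse step is a natural default here.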
Figure 3
iGrowth prediction using OmicKriging. Predicted versus true iGrowth (n = 99) using (A) the optimally weighted gene expression matrix (GXM) alone, (B) the optimally weighted microRNA expression matrix (MXM) alone, and (C) the optimally weighted combination of the two matrices from the grid search. The solid black lines are the fitted regression lines between the predicted and true values. The red dashed lines are the identity lines representing perfect prediction (slope 1, intercept 0). (D) Results of the grid search, showing that the best iGrowth prediction correlation (R² = 0.48 [0.45, 0.52]) was obtained with (MXM, GXM) matrix weights of (0.1, 0.8). The R² values in the contour plot are the means over 500 random partitions of the data into 16 cross-validation folds.
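The repeated cross-validation behind Figure 3D (500 random partitions into 16 folds) can be sketched as follows. Here R² is taken as the squared Pearson correlation between the pooled out-of-fold predictions and the observed values; whether the paper pools predictions or averages per fold is not stated in this section, so treat that as an assumption. The predict callback and the function names are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def r_squared(pred: Sequence[float], true: Sequence[float]) -> float:
    """Squared Pearson correlation (fails if either input is constant)."""
    mp, mt = mean(pred), mean(true)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    vp = sum((p - mp) ** 2 for p in pred)
    vt = sum((t - mt) ** 2 for t in true)
    return cov * cov / (vp * vt)


def cv_r_squared(
    y: Sequence[float],
    predict: Callable[[List[int], List[int]], List[float]],
    n_folds: int = 16,
    n_repeats: int = 500,
    seed: int = 0,
) -> float:
    """Mean R^2 over repeated random K-fold partitions of the samples.

    predict(train_idx, test_idx) must return predictions for test_idx using
    only the phenotypes at train_idx (e.g. a Kriging predictor).
    """
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # Deal the shuffled indices round-robin into n_folds folds.
        folds = [idx[k::n_folds] for k in range(n_folds)]
        pred = [0.0] * n
        for test in folds:
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            for i, p in zip(test, predict(train, test)):
                pred[i] = p
        # Assumption: score the pooled out-of-fold predictions per repeat.
        scores.append(r_squared(pred, y))
    return mean(scores)
```

Averaging over many random partitions smooths out the dependence of a single K-fold split on which samples happen to land together.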
Figure 4
OmicKriging prediction performance for WTCCC disease risk prediction. Mean area under the ROC curve (AUC) for two implementations of OmicKriging for each disease from the WTCCC: a single common SNP genetic relationship matrix (OK:SingleGRM) and two optimally weighted GRMs of common SNPs and known loci (OK:DoubleGRM) for the predictions. The known loci were obtained from studies that did not include the WTCCC data to avoid overfitting. For comparison, we also show mean AUC results of the polygenic score method using genome-wide significant loci with 10 principal components (Baseline) and the lambda-optimized elastic-net penalized model (ElasticNet). Error bars represent the 95% confidence intervals from multiple cross-validation runs (see Methods). BD, bipolar disorder; CAD, coronary artery disease; CD, Crohn's disease; HT, hypertension; RA, rheumatoid arthritis; T1D, type 1 diabetes; T2D, type 2 diabetes.
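The AUC reported in Figure 4 for binary disease status can be computed from predicted scores with the Mann-Whitney formulation: the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half. This O(cases × controls) Python sketch is for illustration only; the function name auc is hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for cases, 0 for controls. A tied case/control pair counts 0.5.
    """
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    # Compare every case with every control.
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls; for large cohorts a rank-based O(n log n) version would be preferable.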
Similar articles
- TIGAR: An Improved Bayesian Tool for Transcriptomic Data Imputation Enhances Gene Mapping of Complex Traits. Nagpal S, Meng X, Epstein MP, Tsoi LC, Patrick M, Gibson G, De Jager PL, Bennett DA, Wingo AP, Wingo TS, Yang J. Am J Hum Genet. 2019 Aug 1;105(2):258-266. doi: 10.1016/j.ajhg.2019.05.018. Epub 2019 Jun 20. PMID: 31230719.
- Polygenic modeling with bayesian sparse linear mixed models. Zhou X, Carbonetto P, Stephens M. PLoS Genet. 2013;9(2):e1003264. doi: 10.1371/journal.pgen.1003264. Epub 2013 Feb 7. PMID: 23408905.
- Fast and accurate Bayesian polygenic risk modeling with variational inference. Zabad S, Gravel S, Li Y. Am J Hum Genet. 2023 May 4;110(5):741-761. doi: 10.1016/j.ajhg.2023.03.009. Epub 2023 Apr 7. PMID: 37030289.
- Polygenic Epidemiology. Dudbridge F. Genet Epidemiol. 2016 May;40(4):268-72. doi: 10.1002/gepi.21966. Epub 2016 Apr 7. PMID: 27061411. Review.
- Genetic prediction of complex traits with polygenic scores: a statistical review. Ma Y, Zhou X. Trends Genet. 2021 Nov;37(11):995-1011. doi: 10.1016/j.tig.2021.06.004. Epub 2021 Jul 6. PMID: 34243982. Review.
Cited by
- Bayesian linear mixed model with multiple random effects for prediction analysis on high-dimensional multi-omics data. Hai Y, Ma J, Yang K, Wen Y. Bioinformatics. 2023 Nov 1;39(11):btad647. doi: 10.1093/bioinformatics/btad647. PMID: 37882747.
- Deep Clinical Phenotyping of Schizophrenia Spectrum Disorders Using Data-Driven Methods: Marching towards Precision Psychiatry. Habtewold TD, Hao J, Liemburg EJ, Baştürk N, Bruggeman R, Alizadeh BZ. J Pers Med. 2023 Jun 5;13(6):954. doi: 10.3390/jpm13060954. PMID: 37373943.
- A penalized linear mixed model with generalized method of moments for prediction analysis on high-dimensional multi-omics data. Wang X, Wen Y. Brief Bioinform. 2022 Jul 18;23(4):bbac193. doi: 10.1093/bib/bbac193. PMID: 35649346.
- Incorporation of Trait-Specific Genetic Information into Genomic Prediction Models. Shi S, Zhang Z, Li B, Zhang S, Fang L. Methods Mol Biol. 2022;2467:329-340. doi: 10.1007/978-1-0716-2205-6_11. PMID: 35451781.
- Ten Years of EWAS. Wei S, Tao J, Xu J, Chen X, Wang Z, Zhang N, Zuo L, Jia Z, Chen H, Sun H, Yan Y, Zhang M, Lv H, Kong F, Duan L, Ma Y, Liao M, Xu L, Feng R, Liu G, Project TE, Jiang Y. Adv Sci (Weinh). 2021 Oct;8(20):e2100727. doi: 10.1002/advs.202100727. Epub 2021 Aug 11. PMID: 34382344. Review.
Grants and funding
- F32 CA165823/CA/NCI NIH HHS/United States
- U01 GM061393/GM/NIGMS NIH HHS/United States
- K12 CA139160/CA/NCI NIH HHS/United States
- T32 CA009594/CA/NCI NIH HHS/United States
- Wellcome Trust/United Kingdom
- R01 MH090937/MH/NIMH NIH HHS/United States
- P60 DK020595/DK/NIDDK NIH HHS/United States
- UL1 TR000430/TR/NCATS NIH HHS/United States
- P30 CA014599/CA/NCI NIH HHS/United States
- U19 HL065962/HL/NHLBI NIH HHS/United States
- P50 MH094267/MH/NIMH NIH HHS/United States
- U19 HL069757/HL/NHLBI NIH HHS/United States
- R01 MH101820/MH/NIMH NIH HHS/United States
- U19 HL069757-11/HL/NHLBI NIH HHS/United States
- U01 HL069757/HL/NHLBI NIH HHS/United States
- P30 CA014599-36/CA/NCI NIH HHS/United States
- P30 DK020595/DK/NIDDK NIH HHS/United States