MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS
Paul F O'Reilly et al. PLoS One. 2012.
Abstract
The genome-wide association study (GWAS) approach has discovered hundreds of genetic variants associated with diseases and quantitative traits. However, despite clinical overlap and statistical correlation between many phenotypes, GWAS are generally performed one-phenotype-at-a-time. Here we compare the performance of modelling multiple phenotypes jointly with that of the standard univariate approach. We introduce a new method and software, MultiPhen, that models multiple phenotypes simultaneously in a fast and interpretable way. By performing ordinal regression, MultiPhen tests the linear combination of phenotypes most associated with the genotypes at each SNP, and thus potentially captures effects hidden to single phenotype GWAS. We demonstrate via simulation that this approach provides a dramatic increase in power in many scenarios. There is a boost in power for variants that affect multiple phenotypes and for those that affect only one phenotype. While other multivariate methods have similar power gains, we describe several benefits of MultiPhen over these. In particular, we demonstrate that other multivariate methods that assume the genotypes are normally distributed, such as canonical correlation analysis (CCA) and MANOVA, can have highly inflated type-1 error rates when testing case-control or non-normal continuous phenotypes, while MultiPhen produces no such inflation. To test the performance of MultiPhen on real data we applied it to lipid traits in the Northern Finland Birth Cohort 1966 (NFBC1966). In these data MultiPhen discovers 21% more independent SNPs with known associations than the standard univariate GWAS approach, while applying MultiPhen in addition to the standard approach provides 37% increased discovery. 
The most associated linear combinations of the lipids estimated by MultiPhen at the leading SNPs accurately reflect the Friedewald Formula, suggesting that MultiPhen could be used to refine the definition of existing phenotypes or uncover novel heritable phenotypes.
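The abstract says MultiPhen performs ordinal regression to find and test the linear combination of phenotypes most associated with the genotypes at each SNP. Below is a minimal sketch of that idea. It assumes a proportional-odds model with the genotype (0, 1 or 2 allele copies) as the ordinal outcome and the standardised phenotypes as predictors, tested by a likelihood-ratio test against a thresholds-only null. It uses NumPy/SciPy and is not the MultiPhen software's code. The function names (`_neg_loglik`, `ordinal_lrt`) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logit
from scipy.stats import chi2


def _neg_loglik(params, g, Y):
    """Proportional-odds negative log-likelihood for genotypes g in {0, 1, 2}.

    params = [alpha_0, log(alpha_1 - alpha_0), beta_1, ..., beta_p], with
    P(g <= k | y) = expit(alpha_k - y @ beta) for k = 0, 1.
    """
    a0 = params[0]
    a1 = a0 + np.exp(params[1])  # keeps the two thresholds ordered
    eta = Y @ params[2:]  # linear combination of phenotypes
    c0 = expit(a0 - eta)  # P(g <= 0)
    c1 = expit(a1 - eta)  # P(g <= 1)
    probs = np.column_stack([c0, c1 - c0, 1.0 - c1])
    p_obs = probs[np.arange(len(g)), g]
    return -np.sum(np.log(np.clip(p_obs, 1e-300, None)))


def ordinal_lrt(g, Y):
    """Likelihood-ratio test of H0: beta = 0, i.e. no linear combination of
    the phenotypes is associated with the SNP. Returns (stat, p, beta)."""
    g = np.asarray(g, dtype=int)
    Y = np.asarray(Y, dtype=float)
    if set(np.unique(g).tolist()) != {0, 1, 2}:
        raise ValueError("this sketch needs all three genotype classes")
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)  # standardise each phenotype
    p = Y.shape[1]
    # Thresholds start at the empirical cumulative logits (the null-model MLE).
    a0, a1 = logit(np.mean(g == 0)), logit(np.mean(g <= 1))
    start = np.array([a0, np.log(a1 - a0)])
    # Null model: zero phenotype columns, so eta is identically 0.
    null = minimize(_neg_loglik, start, args=(g, Y[:, :0]), method="BFGS")
    full = minimize(_neg_loglik, np.concatenate([start, np.zeros(p)]),
                    args=(g, Y), method="BFGS")
    stat = max(2.0 * (null.fun - full.fun), 0.0)
    # beta (full.x[2:]) is the estimated linear combination of phenotypes.
    return stat, chi2.sf(stat, df=p), full.x[2:]


# Illustrative data: the SNP affects phenotype 1; phenotype 2 is correlated.
rng = np.random.default_rng(1)
g = rng.binomial(2, 0.3, size=2000)
y1 = 0.3 * g + rng.normal(size=2000)
y2 = 0.5 * y1 + rng.normal(size=2000)
stat, pval, beta = ordinal_lrt(g, np.column_stack([y1, y2]))
```

Regressing the genotype on the phenotypes (rather than each phenotype on the genotype) lets a single model take any number of phenotypes as predictors, which is what makes a joint test with `df = p` possible.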
Conflict of interest statement
Competing Interests: The authors have declared that no competing interests exist.
Figures
Figure 1. The power of MultiPhen in different scenarios of effect and correlation between phenotypes.
Power results based on simulations described in the text for MultiPhen (red lines) and the standard single-phenotype approach (black lines). Left panel: causal variant explains 0.5% of the phenotypic variance of both phenotypes. Middle panel: causal variant explains 0.5% of the phenotypic variance of the first phenotype and 0.1% of the variance of the second phenotype. Right panel: causal variant explains 0.5% of the phenotypic variance of the first phenotype and 0% of the second phenotype.
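The caption specifies each scenario only by the fraction of phenotypic variance the causal variant explains; the full simulation design is given in the paper's text. Below is a minimal sketch of one way to generate such data. It assumes additive genotype coding under Hardy-Weinberg equilibrium, unit-variance phenotypes, and a residual correlation `rho` between the two phenotypes. These are assumptions, not the authors' exact scheme, and `simulate_two_phenotypes` is a hypothetical helper.

```python
import numpy as np


def simulate_two_phenotypes(n, maf, h1, h2, rho, seed=0):
    """Simulate additive SNP genotypes and two unit-variance phenotypes.

    h1, h2: fraction of variance in phenotypes 1 and 2 explained by the SNP
            (e.g. 0.005 for 0.5%).
    rho:    correlation between the two non-genetic (residual) components;
            an assumption about how phenotypic correlation is induced.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, maf, size=n)  # genotypes under Hardy-Weinberg
    # Standardise genotypes with their expected mean and variance.
    g_std = (g - 2 * maf) / np.sqrt(2 * maf * (1 - maf))
    resid = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    y1 = np.sqrt(h1) * g_std + np.sqrt(1 - h1) * resid[:, 0]
    y2 = np.sqrt(h2) * g_std + np.sqrt(1 - h2) * resid[:, 1]
    return g, np.column_stack([y1, y2])


# A middle-panel-style scenario: 0.5% and 0.1% of variance explained.
g, Y = simulate_two_phenotypes(n=5000, maf=0.3, h1=0.005, h2=0.001, rho=0.4)
```

Sweeping `rho` and recording how often each test reaches significance over many replicates would give power curves of the kind plotted in the figure.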
Figure 2. The correlation structure between pairs of lipids.
The left panel shows the correlation structure between total cholesterol (CHOL) and low-density lipoprotein (LDL) in 5655 individuals from the Northern Finland Birth Cohort 1966; each circle marks one individual's CHOL (X-axis) and LDL (Y-axis) values in mmol/L. The right panel shows the correlation structure between LDL and high-density lipoprotein (HDL), in mmol/L, in the same individuals. The arrows in each plot show the direction of effect of a variant affecting only CHOL or only HDL: moving through the points in the direction of an arrow, individuals are increasingly likely to carry risk alleles for the labelled lipid. The diagonal arrows are based on the Friedewald Formula (Friedewald et al., 1972). The arrows show that variant effects can point in very different directions in these 2-dimensional spaces; the aim of modelling and testing linear combinations of phenotypes is to capture effects in any direction.
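The Friedewald Formula estimates LDL from total cholesterol, HDL and triglycerides (TG), which makes LDL a linear combination of the other lipids. The abstract reports that MultiPhen's most associated combinations reflect this relation. Below is a minimal sketch of the formula. It uses the original mg/dL form (TG/5) and the widely used mmol/L conversion (TG/2.2), with the conventional TG cut-off of about 400 mg/dL (about 4.5 mmol/L). The function name and the `FRIEDEWALD_DIRECTION` coefficients are illustrative, not values estimated in the paper.

```python
def friedewald_ldl(total_chol, hdl, tg, units="mmol/L"):
    """Estimate LDL cholesterol with the Friedewald Formula.

    mg/dL:  LDL = TC - HDL - TG / 5
    mmol/L: LDL = TC - HDL - TG / 2.2   (common unit conversion)
    The estimate is conventionally not used when TG exceeds about
    400 mg/dL (about 4.5 mmol/L).
    """
    if units == "mg/dL":
        divisor, tg_limit = 5.0, 400.0
    elif units == "mmol/L":
        divisor, tg_limit = 2.2, 4.5
    else:
        raise ValueError(f"unsupported units: {units!r}")
    if tg > tg_limit:
        raise ValueError("Friedewald estimate is unreliable at this TG level")
    return total_chol - hdl - tg / divisor


# Rearranged, the formula says LDL - CHOL + HDL + TG/2.2 = 0 (mmol/L): one
# fixed direction in the 4-dimensional lipid space (illustrative coefficients).
FRIEDEWALD_DIRECTION = {"CHOL": -1.0, "HDL": 1.0, "LDL": 1.0, "TG": 1.0 / 2.2}

print(friedewald_ldl(5.0, 1.2, 2.2))  # about 2.8 mmol/L
```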
Figure 3. Genome-wide significant results from standard GWAS approach and MultiPhen tested on combinations of the lipids using NFBC1966 data.
Each bar shows the number of SNPs reaching genome-wide significance in one phenotype-combination analysis, labelled by the first letters of the traits analysed (for example, CHL denotes the analysis of CHOL, HDL and LDL). White segments show SNPs discovered by both the univariate approach and MultiPhen, grey segments show SNPs discovered by the univariate approach only, and black segments show SNPs discovered by MultiPhen only. The bars labelled ALL2 and ALL3 combine results across the analyses of all combinations of two and three lipid traits, respectively, while ALL combines results across all two-, three- and four-trait combinations. A complete breakdown of these results is presented in Tables S5–S15.
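The labelling and bar segments in Figure 3 can be reproduced mechanically. Below is a minimal sketch that assumes the four lipid traits are CHOL, HDL, LDL and triglycerides (TG, which the Friedewald Formula involves). Under that assumption there are 6 two-trait, 4 three-trait and 1 four-trait analyses, 11 in total, matching Tables S5–S15. The pooling by set union over distinct SNP IDs is also an assumption; the paper may count independent loci instead. All names and SNP IDs here are hypothetical.

```python
from itertools import combinations

LIPIDS = ["CHOL", "HDL", "LDL", "TG"]  # TG assumed as the fourth trait


def analysis_labels(traits):
    """Label every 2-, 3- and 4-trait combination by the traits' first letters."""
    labels = []
    for k in range(2, len(traits) + 1):
        for combo in combinations(traits, k):
            labels.append("".join(t[0] for t in combo))
    return labels


def bar_segments(univariate_hits, multiphen_hits):
    """Split significant SNPs into the three bar segments of Figure 3."""
    uni, multi = set(univariate_hits), set(multiphen_hits)
    return {
        "both": len(uni & multi),             # white segment
        "univariate_only": len(uni - multi),  # grey segment
        "multiphen_only": len(multi - uni),   # black segment
    }


def pooled_segments(results):
    """Pool (univariate_hits, multiphen_hits) pairs across analyses, as for
    ALL2/ALL3/ALL, by taking the union of distinct SNP IDs."""
    uni = set().union(*(u for u, _ in results))
    multi = set().union(*(m for _, m in results))
    return bar_segments(uni, multi)


print(analysis_labels(LIPIDS))
# ['CH', 'CL', 'CT', 'HL', 'HT', 'LT', 'CHL', 'CHT', 'CLT', 'HLT', 'CHLT']
```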
Grants and funding
- R01 HL087679/HL/NHLBI NIH HHS/United States
- R01 MH063706/MH/NIMH NIH HHS/United States
- G0801056/MRC_/Medical Research Council/United Kingdom
- 5R01MH63706/MH/NIMH NIH HHS/United States
- 1RL1MH083268-01/MH/NIMH NIH HHS/United States
- 5R01HL087679-02/HL/NHLBI NIH HHS/United States
- RL1 MH083268/MH/NIMH NIH HHS/United States