A hierarchical clustering method for dimension reduction in joint analysis of multiple phenotypes
Xiaoyu Liang et al. Genet Epidemiol. 2018 Jun.
Abstract
Genome-wide association studies (GWAS) have become a very effective research tool for identifying genetic variants underlying various complex diseases. Although GWAS have identified thousands of reproducible associations between genetic variants and complex diseases, the association between genetic variants and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be more powerful than univariate analysis and can shed new light on the underlying biological mechanisms of complex diseases. In this paper, we develop a novel variable reduction method that uses a hierarchical clustering method (HCM) for joint analysis of multiple phenotypes in association studies. The proposed method involves two steps. The first step reduces the dimension by choosing a representative phenotype for each cluster of phenotypes. The second step uses existing methods to test the association between genetic variants and the representative phenotypes rather than the individual phenotypes. We perform extensive simulation studies to compare the power of multivariate analysis of variance (MANOVA), the joint model of multiple phenotypes (MultiPhen), and the trait-based association test that uses an extended Simes procedure (TATES), each applied with HCM, against the power of the same tests applied without HCM. Our simulation studies show that the tests are more powerful with HCM than without it in most scenarios. We also illustrate the usefulness of HCM by analyzing whole-genome genotyping data from a lung function study.
Keywords: association study; dimension reduction; hierarchical clustering; multiple phenotypes.
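The two-step procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the linkage rule, the distance measure, the rule for cutting the dendrogram, or how the representative phenotype is formed, so everything below is an assumption made for illustration: average linkage on the distance 1 − r, a fixed cut height (`cut`), a representative defined as the mean of the standardized phenotypes in a cluster, and a simple correlation-based test with a normal approximation standing in for MANOVA, MultiPhen, or TATES. The function names and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_phenotypes(phenos, cut):
    """Step 1a (assumed details): average-linkage agglomerative clustering
    on the distance 1 - r, stopping once the closest pair of clusters is
    farther apart than `cut`."""
    k = len(phenos)
    dist = [[0.0 if i == j else 1.0 - pearson(phenos[i], phenos[j])
             for j in range(k)] for i in range(k)]
    clusters = [[i] for i in range(k)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cut:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def representatives(phenos, clusters):
    """Step 1b (assumed): one representative per cluster, taken as the
    mean of the standardized phenotypes in that cluster."""
    n = len(phenos[0])
    z = []
    for p in phenos:
        m = sum(p) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in p) / (n - 1))
        z.append([(v - m) / sd for v in p])
    return [[sum(z[j][i] for j in c) / len(c) for i in range(n)]
            for c in clusters]


def association(genotype, rep):
    """Step 2 stand-in: t statistic for the genotype-representative
    correlation, with a two-sided p-value from the normal approximation."""
    n = len(rep)
    r = pearson(genotype, rep)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return t, math.erfc(abs(t) / math.sqrt(2.0))


# Toy data (hypothetical): six phenotypes driven by two latent factors,
# with the variant (MAF 0.3) affecting only the first factor.
rng = random.Random(0)
n = 1000
genotype = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
factor1 = [0.5 * g + rng.gauss(0, 1) for g in genotype]
factor2 = [rng.gauss(0, 1) for _ in range(n)]
phenos = [[f + 0.5 * rng.gauss(0, 1) for f in factor]
          for factor in (factor1, factor1, factor1, factor2, factor2, factor2)]

clusters = cluster_phenotypes(phenos, cut=0.5)
reps = representatives(phenos, clusters)
results = [association(genotype, rep) for rep in reps]
for members, (t, p) in zip(clusters, results):
    print(f"cluster {members}: t = {t:.2f}, p = {p:.3g}")
```

In this sketch the number of tests falls from six phenotypes to two representatives, which is the kind of dimension reduction the abstract describes. In practice the second step would apply one of the multivariate tests named above to the representatives jointly.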
© 2018 WILEY PERIODICALS, INC.
Conflict of interest statement
COMPETING FINANCIAL INTERESTS
The authors declare no conflict of interest.
Figures
Figure 1
Power of the six tests (HCMANOVA, MANOVA, HCMultiPhen, MultiPhen, HCTATES, and TATES) as a function of effect size β for 20 quantitative phenotypes. MAF is 0.3. The sample size is 5000. The number of replications is 1000. The within-factor correlation is 0.5 (c² = 0.5) and the between-factor correlation is 0.1 (ρc² = 0.1). Power is evaluated at the 5% significance level.
Figure 2
Power of the six tests (HCMANOVA, MANOVA, HCMultiPhen, MultiPhen, HCTATES, and TATES) as a function of effect size β for 40 quantitative phenotypes. MAF is 0.3. The sample size is 5000. The number of replications is 1000. The within-factor correlation is 0.5 (c² = 0.5) and the between-factor correlation is 0.1 (ρc² = 0.1). Power is evaluated at the 5% significance level.
Figure 3
The dendrogram of the seven phenotypes in the COPDGene study.