A Novel Hierarchical Clustering Approach for Joint Analysis of Multiple Phenotypes Uncovers Obesity Variants Based on ARIC - PubMed (original) (raw)
A Novel Hierarchical Clustering Approach for Joint Analysis of Multiple Phenotypes Uncovers Obesity Variants Based on ARIC
Liwan Fu et al. Front Genet. 2022.
Abstract
Genome-wide association studies (GWASs) have successfully discovered numerous variants underlying various diseases. Generally, one-phenotype one-variant association study in GWASs is not efficient in identifying variants with weak effects, indicating that more signals have not been identified yet. Nowadays, jointly analyzing multiple phenotypes has been recognized as an important approach to elevate the statistical power for identifying weak genetic variants on complex diseases, shedding new light on potential biological mechanisms. Therefore, hierarchical clustering based on different methods for calculating correlation coefficients (HCDC) is developed to synchronously analyze multiple phenotypes in association studies. There are two steps involved in HCDC. First, a clustering approach based on the similarity matrix between two groups of phenotypes is applied to choose a representative phenotype in each cluster. Then, we use existing methods to estimate the genetic associations with the representative phenotypes rather than the individual phenotypes in every cluster. A variety of simulations are conducted to demonstrate the capacity of HCDC for boosting power. As a consequence, existing methods embedding HCDC are either more powerful or comparable with those of without embedding HCDC in most scenarios. Additionally, the application of obesity-related phenotypes from Atherosclerosis Risk in Communities via existing methods with HCDC uncovered several associated variants. Among these, _UQCC1_-rs1570004 is reported as a significant obesity signal for the first time, whose differential expression in subcutaneous fat, visceral fat, and muscle tissue is worthy of further functional studies.
Keywords: GWAS; bioinformatics; hierarchical clustering; multiple phenotypes; obesity.
Copyright © 2022 Fu, Wang, Li, Yang and Hu.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Figures
FIGURE 1
Power comparisons of the nine methods as a function of β in the four models. Sample size N = 2,000, the number of phenotypes M = 16 (A–D) and M = 32 (E–H), c2 = 0.5, ρc2 = 0.1, and MAF = 0.3. The power of all the methods is evaluated by 1,000 replicated samples at a significance level of 0.05.
FIGURE 2
The dendrogram of the nine phenotypes in the ARIC study via HCM (A) and HCDC (B). h represents the maximum value of correlation coefficient in each clustering process, which is taken as the “branch length” of the clustering tree. K reveals the final clustering times according to the stopping criteria. BMI is body mass index; Triceps is average skinfold thickness of triceps brachii; Scapular is mean subscapular skinfold thickness; WC is waist; HC is hip girth; WHR is waist-to-hip ratio; Calf is calf girth; and Wrist is wrist breadth.
FIGURE 3
The regional association plots of the significant SNPs identified in ARIC. The _p_-values of rs17017947 and rs41470552 are evaluated by MANOVA method. The _p_-values of 7927943 and rs1945647 are estimated by TATES method. The _p_-values of rs9939609, rs8050136, and rs201561 are assessed by HCDCMANOVA method. The _p_-values of rs1570004 is evaluated by HCDCTATES. LD is constructed using the hg19 version of the 1000 Genome (American). The plots where rs7927943 and rs1945647 are located show the 1,000-kb range around these most significant SNPs. The plots where the rest SNPs (rs7927943, rs1945647, rs9939609, rs8050136, rs201561, and rs1570004) are located present the 400-kb range around these identified significantly SNPs. SNP, single-nucleotide polymorphism.
FIGURE 4
The relationship between the genotypes of the significant SNPs discovered by HCDC method and eQTL in subcutaneous adipose tissue, visceral adipose tissue, and muscle tissue (data are from GTEx website).
References
- Author Anonymous (1989). The Atherosclerosis Risk in Communities (ARIC) Study: Design and Objectives. The ARIC Investigators. Am. J. Epidemiol. 129, 687–702. - PubMed
- Bates D. M., DebRoy S. (2004). Linear Mixed Models and Penalized Least Squares. J. Multivariate Anal. 91, 1–17. 10.1016/j.jmva.2004.04.013 - DOI
- Bien J., Wegkamp M. (2013). Discussion of “Correlated Variables in Regression: Clustering and Sparse Estimation”. J. Stat. Plan. Infer 143, 1859–1862. 10.1016/j.jspi.2013.05.020 - DOI
LinkOut - more resources
Full Text Sources
Research Materials