A Novel Approach Integrating Hierarchical Clustering and Weighted Combination for Association Study of Multiple Phenotypes and a Genetic Variant
Liwan Fu et al. Front Genet. 2021.
Abstract
As a pivotal research tool, genome-wide association studies (GWAS) have successfully identified numerous genetic variants underlying distinct diseases. However, the identified variants explain only a small proportion of the phenotypic variation for certain diseases, suggesting that more genetic signals remain to be detected. One reason may be that testing one phenotype against one variant at a time is inefficient at detecting variants with weak effects. Joint analysis of multiple phenotypes may boost the statistical power to detect pathogenic variants with weak genetic effects on complex diseases, providing more clues to their underlying biological mechanisms. We therefore propose a Weighted Combination of multiple phenotypes following Hierarchical Clustering (WCHC) method for analyzing multiple phenotypes simultaneously in association studies. In a series of simulations, WCHC is either the most powerful method or comparable to the most powerful competitor in most scenarios. We also evaluated WCHC on obesity-related phenotypes from the Atherosclerosis Risk in Communities study and report several associated variants.
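The general idea of clustering phenotypes and then testing cluster-level combinations against a variant can be sketched as follows. This is a minimal illustration, not the authors' WCHC statistic: the abstract does not specify the weights, so the sketch assumes equal weights within each cluster, average-linkage clustering on a 1 − correlation distance, and an asymptotic chi-square score-type test. The function names `cluster_phenotypes` and `combined_score_test`, the factor loadings, and the effect size 0.3 are all hypothetical; only N = 1,000 and MAF = 0.3 echo the simulation settings in the figure captions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from scipy.stats import chi2


def cluster_phenotypes(Y, n_clusters):
    """Group phenotype columns by average-linkage clustering on 1 - correlation."""
    corr = np.corrcoef(Y, rowvar=False)
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def combined_score_test(Y, g, labels):
    """Test genotype g against one equal-weight combination per cluster.

    Assumption: equal weights stand in for the paper's weighting scheme.
    Returns (statistic, p_value); the statistic is n * r' R^{-1} r, which is
    asymptotically chi-square with one degree of freedom per cluster under
    the null of no association.
    """
    n = len(g)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    # One combined phenotype per cluster: the mean of its standardized members.
    C = np.column_stack([Ys[:, labels == k].mean(axis=1) for k in np.unique(labels)])
    C = (C - C.mean(axis=0)) / C.std(axis=0)
    gs = (g - g.mean()) / g.std()
    r = C.T @ gs / n   # correlation of genotype with each combination
    R = C.T @ C / n    # correlation matrix of the combinations
    stat = float(n * r @ np.linalg.solve(R, r))
    return stat, float(chi2.sf(stat, df=C.shape[1]))


# Illustrative simulated data: two latent factors, four phenotypes each.
rng = np.random.default_rng(0)
n = 1000
g = rng.binomial(2, 0.3, size=n).astype(float)   # genotype with MAF = 0.3
f = rng.standard_normal((n, 2))
Y = 0.8 * np.repeat(f, 4, axis=1) + 0.6 * rng.standard_normal((n, 8))
Y[:, :4] += 0.3 * g[:, None]                     # variant affects the first cluster only

labels = cluster_phenotypes(Y, n_clusters=2)
stat, p_value = combined_score_test(Y, g, labels)
print(labels, stat, p_value)
```

Repeating the last block over many simulated replicates and counting the fraction with `p_value` below 0.05 would give an empirical power estimate of the kind reported in the figures.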
Keywords: GWAS; hierarchical cluster; multiple phenotypes; obesity; score test.
Copyright © 2021 Fu, Wang, Li and Hu.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Figures
FIGURE 1
Power comparisons of the seven methods as a function of β in the six models. Sample size is N = 1,000, the number of phenotypes is M = 16, c² = 0.5, ρc² = 0.1, and MAF = 0.3. The power of each method is estimated from 1,000 replicated samples at a significance level of 0.05.
FIGURE 2
Power comparisons of the seven methods as a function of β in the six models. Sample size is N = 1,000, the number of phenotypes is M = 32, c² = 0.5, ρc² = 0.1, and MAF = 0.3. The power of each method is estimated from 1,000 replicated samples at a significance level of 0.05.
FIGURE 3
Power comparisons of the seven methods as a function of c² in the six models. Sample size is N = 1,000, the number of phenotypes is M = 16, ρc² = 0.1, and MAF = 0.3. β = 0.09 for models 1 and 2; β = 0.08 for model 3; β = 0.1 for models 4 and 5; β = 0.07 for model 6. The power of each method is estimated from 1,000 replicated samples at a significance level of 0.05.
FIGURE 4
Power comparisons of the seven methods as a function of c² in the six models. Sample size is N = 1,000, the number of phenotypes is M = 32, ρc² = 0.1, and MAF = 0.3. β = 0.1 for models 1 and 4–6; β = 0.09 for model 2; β = 0.08 for model 3. The power of each method is estimated from 1,000 replicated samples at a significance level of 0.05.
FIGURE 5
GO enrichment analysis of genes whose expression is probably regulated by the significant SNPs. (A) Red, blue, and green bars indicate the biological process, cellular component, and molecular function categories, respectively. The numbers above the bars indicate the number of genes in each category; (B) Bar charts of GO enrichment analysis; (C) Volcano plot of GO enrichment analysis. For more information about GO enrichment, see http://geneontology.org/docs/go-enrichment-analysis/.
FIGURE 6
KEGG enrichment analysis and PPI network diagram of genes whose expression is probably regulated by the significant SNPs. (A) Bar charts of KEGG enrichment analysis; (B) Volcano plot of KEGG enrichment analysis; (C) PPI network diagram; data are from
.