A Novel Approach Integrating Hierarchical Clustering and Weighted Combination for Association Study of Multiple Phenotypes and a Genetic Variant

Liwan Fu et al. Front Genet. 2021.

Abstract

As a pivotal research tool, the genome-wide association study (GWAS) has identified numerous genetic variants underlying distinct diseases. However, the identified variants explain only a small proportion of the phenotypic variation for certain diseases, suggesting that more genetic signals remain to be detected. One possible reason is that testing one phenotype against one variant at a time is inefficient at detecting variants with weak effects. Joint analysis of multiple phenotypes may boost the statistical power to detect pathogenic variants with weak genetic effects on complex diseases and provide more clues to their underlying biological mechanisms. We therefore propose WCHC, a Weighted Combination of multiple phenotypes following Hierarchical Clustering, for simultaneously analyzing multiple phenotypes in association studies. In a series of simulations, WCHC is either the most powerful method or comparable with the most powerful competitor in most scenarios. We also apply WCHC to obesity-related phenotypes from the Atherosclerosis Risk in Communities study and report several associated variants.
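The abstract names two stages, hierarchical clustering of the phenotypes and a weighted combination within clusters, but does not specify the distance, the linkage, the weights, or how significance is assessed. The following is only a minimal sketch of that two-stage idea under stated assumptions: distance is 1 minus the Pearson correlation, linkage is average linkage, weights are equal within each cluster, and each combined phenotype is scored against the genotype with a simple linear-model score statistic. All function names are hypothetical and none of this should be read as the authors' implementation.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(x):
    """Center to mean 0 and scale to unit (population) variance."""
    n = len(x)
    mx = sum(x) / n
    sd = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    return [(a - mx) / sd for a in x]


def cluster_phenotypes(phenos, n_clusters):
    """Average-linkage agglomerative clustering, distance = 1 - correlation.

    Assumed choices; returns a list of clusters of phenotype indices.
    """
    m = len(phenos)
    dist = [[1.0 - pearson(phenos[i], phenos[j]) for j in range(m)]
            for i in range(m)]
    clusters = [[i] for i in range(m)]
    while len(clusters) > n_clusters:
        best = None  # (distance, a, b) for the closest pair of clusters
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def combine_cluster(phenos, members, weights):
    """Weighted sum of the standardized phenotypes in one cluster."""
    z = [standardize(phenos[k]) for k in members]
    n = len(z[0])
    return [sum(w * zk[i] for w, zk in zip(weights, z)) for i in range(n)]


def score_statistic(y, g):
    """Score statistic n * r^2 for the slope in a linear model of y on g."""
    return len(y) * pearson(y, g) ** 2


def demo(seed=0, n=300):
    """Toy data: phenotypes 0-1 share one factor, phenotypes 2-3 another."""
    rng = random.Random(seed)
    g = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
    f1 = [rng.gauss(0, 1) for _ in range(n)]
    f2 = [rng.gauss(0, 1) for _ in range(n)]
    phenos = [
        [0.2 * g[i] + f1[i] + 0.5 * rng.gauss(0, 1) for i in range(n)],
        [0.2 * g[i] + f1[i] + 0.5 * rng.gauss(0, 1) for i in range(n)],
        [f2[i] + 0.5 * rng.gauss(0, 1) for i in range(n)],
        [f2[i] + 0.5 * rng.gauss(0, 1) for i in range(n)],
    ]
    clusters = cluster_phenotypes(phenos, n_clusters=2)
    stats = []
    for members in clusters:
        weights = [1.0 / len(members)] * len(members)  # equal weights: assumption
        stats.append(score_statistic(combine_cluster(phenos, members, weights), g))
    return clusters, stats


if __name__ == "__main__":
    found_clusters, cluster_stats = demo()
    print(found_clusters, cluster_stats)
```

Equal weights keep the sketch free of any data-driven tuning, so each per-cluster statistic can be compared with a chi-square distribution with one degree of freedom. Weights chosen from the same data would generally require a different null distribution, for example one obtained by permutation.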

Keywords: GWAS; hierarchical cluster; multiple phenotypes; obesity; score test.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1

Power comparisons of the seven methods as a function of β in the six models. Sample size is N = 1,000, the number of phenotypes is M = 16, c² = 0.5, ρc² = 0.1, and MAF = 0.3. The power of all seven methods is estimated from 1,000 replicated samples at a significance level of 0.05.

FIGURE 2

Power comparisons of the seven methods as a function of β in the six models. Sample size is N = 1,000, the number of phenotypes is M = 32, c² = 0.5, ρc² = 0.1, and MAF = 0.3. The power of all seven methods is estimated from 1,000 replicated samples at a significance level of 0.05.

FIGURE 3

Power comparisons of the seven methods as a function of c² in the six models. Sample size is N = 1,000, the number of phenotypes is M = 16, ρc² = 0.1, and MAF = 0.3. β = 0.09 for models 1 and 2; β = 0.08 for model 3; β = 0.1 for models 4 and 5; β = 0.07 for model 6. The power of all seven methods is estimated from 1,000 replicated samples at a significance level of 0.05.

FIGURE 4

Power comparisons of the seven methods as a function of c² in the six models. Sample size is N = 1,000, the number of phenotypes is M = 32, ρc² = 0.1, and MAF = 0.3. β = 0.1 for models 1 and 4–6; β = 0.09 for model 2; β = 0.08 for model 3. The power of all seven methods is estimated from 1,000 replicated samples at a significance level of 0.05.
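The captions for Figures 1 to 4 describe power as estimated from replicated samples at a significance level of 0.05, that is, as the fraction of simulated data sets in which a test rejects the null hypothesis. A minimal sketch of that Monte Carlo procedure follows. As a stand-in for the seven compared methods it uses a one-phenotype, one-variant score test, and it assumes additive genotype coding under Hardy-Weinberg proportions with standard normal noise; these choices and all function names are illustrative assumptions, not the paper's simulation models.

```python
import random

# 0.95 quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1_CRIT_05 = 3.841458820694124


def simulate_once(rng, n, maf, beta):
    """Draw genotypes (0/1/2 minor-allele counts) and one phenotype y = beta*g + e."""
    g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    y = [beta * gi + rng.gauss(0, 1) for gi in g]
    return g, y


def score_statistic(g, y):
    """Score statistic n * r^2 for the slope in a linear model of y on g."""
    n = len(g)
    mg, my = sum(g) / n, sum(y) / n
    sgy = sum((a - mg) * (b - my) for a, b in zip(g, y))
    sgg = sum((a - mg) ** 2 for a in g)
    syy = sum((b - my) ** 2 for b in y)
    if sgg == 0 or syy == 0:
        return 0.0  # monomorphic genotype or constant phenotype: no evidence
    return n * sgy * sgy / (sgg * syy)


def estimate_power(n, maf, beta, n_rep, seed=0):
    """Fraction of n_rep replicates whose statistic exceeds the 0.05 critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        g, y = simulate_once(rng, n, maf, beta)
        if score_statistic(g, y) > CHI2_1_CRIT_05:
            rejections += 1
    return rejections / n_rep


if __name__ == "__main__":
    # N and MAF follow the captions; the beta values and replicate count are illustrative.
    for beta in (0.0, 0.1, 0.2):
        print(beta, estimate_power(n=1000, maf=0.3, beta=beta, n_rep=200))
```

With β = 0 the same function estimates the type I error rate, which should sit near the nominal 0.05 if the test is well calibrated.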

FIGURE 5

GO enrichment analysis of genes whose expression is probably regulated by the significant SNPs. (A) Red, blue, and green bars indicate the biological process, cellular component, and molecular function categories, respectively. The numbers above the bars indicate the number of genes in each category; (B) Bar charts of GO enrichment analysis; (C) Volcano plot of GO enrichment analysis. For more information about GO enrichment, see http://geneontology.org/docs/go-enrichment-analysis/

FIGURE 6

KEGG enrichment analysis and PPI network diagram of genes whose expression is probably regulated by the significant SNPs. (A) Bar charts of KEGG enrichment analysis; (B) Volcano plot of KEGG enrichment analysis; (C) PPI network diagram; data are from https://www.string-db.org/

