Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees

Mohamad Saad et al. Genet Epidemiol. 2014 Nov.

Abstract

In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for identifying both rare and common variants. Because sequencing remains costly, imputation methods are important for increasing the amount of information at low cost. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), can handle large pedigrees and accurately impute rare variants, but it does less well for common variants, where population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data in order to make use of such imputed data. We call this extension "famSKAT-RC." We compare the performance of famSKAT-RC with that of several existing burden and kernel association tests. In simulated pedigree sequence data, our results show that the combining approach increases imputation accuracy. They also show that, in the context of rare and common variants, the association tests have more power with the combined data than with either family- or population-based imputation alone. Moreover, famSKAT-RC performs better than the other tests considered in most of the scenarios investigated here.

Keywords: MCMC; association analysis; burden test; inheritance vectors; kernel statistic; mixed linear model; sequence data; variance components.

Figures

Figure 1

Joint probabilities of possible genotypes (AA, Aa, aa) and their variances.

Figure 2

Correlation between allelic dosages obtained by GIGI and the true genotypes (x-axis) versus correlation between allelic dosages obtained by BEAGLE and the true genotypes (y-axis), for different bins of MAFs: A) LowLD pattern, B) HighLD pattern.

Figure 3

Correlation between allelic dosages obtained by GIGI+BEAGLE and the true genotypes (x-axes) versus correlation between allelic dosages obtained by: BEAGLE (first row figures), GIGI (second row figures), and the MAX between the correlations obtained by GIGI and BEAGLE (third row figures) with the true genotypes (y-axes). A) LowLD pattern, B) HighLD pattern. Left part of every LD pattern column figures: MAF>0.01; Right part of every LD pattern column figures: MAF<=0.01.

Figure 4

Power of famSKAT, famSKAT-B, famSKAT-RC, and famCMWS in the sequence data, under the LowLD pattern, for the different settings of number of associated and non-associated SNPs and the proportion of common SNPs among them. A) For a model with associated SNPs only: _A_=10, _fc_=0.3; _A_=10, _fc_=0.5; _A_=20, _fc_=0.3; and _A_=20, _fc_=0.5. B) For a model with associated and non-associated SNPs: _A_=10, _U_=20, _fc_=0.3; _A_=10, _U_=20, _fc_=0.5; _A_=20, _U_=40, _fc_=0.3; and _A_=20, _U_=40, _fc_=0.5, where _fc_ is the proportion of common SNPs.

Figure 5

Power of famCMWS for the different imputation and the sequence data, for a model with associated SNPs only, for the different settings of number of associated SNPs and the proportion of common SNPs among them: _A_=10, _fc_=0.3; _A_=10, _fc_=0.5; _A_=20, _fc_=0.3; and _A_=20, _fc_=0.5, where _fc_ is the proportion of common associated SNPs. A) LowLD pattern; B) HighLD pattern.

Figure 6

Power of famCMWS for the different combined imputation data (GIGI+BEAGLE, G+B+T, and G_S+B), under the LowLD pattern, for a model with associated SNPs only, for the different settings of number of associated SNPs and the proportion of common SNPs among them: _A_=10, _fc_=0.3; _A_=10, _fc_=0.5; _A_=20, _fc_=0.3; and _A_=20, _fc_=0.5, where _fc_ is the proportion of common associated SNPs.
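
The accuracy measure named in the captions of Figures 2 and 3, the correlation between imputed allelic dosages and the true genotypes, can be illustrated with a minimal sketch. It assumes that the dosage is the expected count of allele a given genotype probabilities ordered (AA, Aa, aa), and that the correlation is Pearson's; the function names and the toy data are hypothetical and are not taken from the paper or from GIGI or BEAGLE.

```python
from math import sqrt


def allelic_dosage(p_AA, p_Aa, p_aa):
    """Expected number of copies of allele 'a' (assumed definition of dosage)."""
    return p_Aa + 2.0 * p_aa


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


# Toy data for one variant: imputed genotype probabilities (AA, Aa, aa)
# for four subjects, and their true genotypes coded as counts of allele 'a'.
imputed_probs = [
    (0.9, 0.1, 0.0),
    (0.1, 0.8, 0.1),
    (0.0, 0.2, 0.8),
    (0.7, 0.3, 0.0),
]
true_genotypes = [0, 1, 2, 0]

dosages = [allelic_dosage(*p) for p in imputed_probs]
accuracy = pearson(dosages, true_genotypes)
```

Computing this value per variant for each imputation source, and then grouping variants by MAF bin, would give the kind of comparison the captions describe, under the assumptions stated above.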

Cited by

How local reference panels improve imputation in French populations.
Herzig AF, Velo-Suárez L; FrEx Consortium; FranceGenRef Consortium; Dina C, Redon R, Deleuze JF, Génin E. Sci Rep. 2024 Jan 3;14(1):370. doi: 10.1038/s41598-023-49931-3. PMID: 38172507.
Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data.
Boutry S, Helaers R, Lenaerts T, Vikkula M. PLoS Comput Biol. 2023 Sep 14;19(9):e1011488. doi: 10.1371/journal.pcbi.1011488. PMID: 37708232.
A joint use of pooling and imputation for genotyping SNPs.
Clouard C, Ausmees K, Nettelblad C. BMC Bioinformatics. 2022 Oct 13;23(1):421. doi: 10.1186/s12859-022-04974-7. PMID: 36229780.
Burden of Type 2 Diabetes and Associated Cardiometabolic Traits and Their Heritability Estimates in Endogamous Ethnic Groups of India: Findings From the INDIGENIUS Consortium.
Venkatesan V, Lopez-Alvarenga JC, Arya R, Ramu D, Koshy T, Ravichandran U, Ponnala AR, Sharma SK, Lodha S, Sharma KK, Shaik MV, Resendez RG, Venugopal P, R P, Saju N, Ezeilo JA, Bejar C, Wander GS, Ralhan S, Singh JR, Mehra NK, Vadlamudi RR, Almeida M, Mummidi S, Natesan C, Blangero J, Medicherla KM, Thanikachalam S, Panchatcharam TS, Kandregula DK, Gupta R, Sanghera DK, Duggirala R, Paul SFD. Front Endocrinol (Lausanne). 2022 Apr 14;13:847692. doi: 10.3389/fendo.2022.847692. PMID: 35498404.
Alternative Applications of Genotyping Array Data Using Multivariant Methods.
Samuels DC, Below JE, Ness S, Yu H, Leng S, Guo Y. Trends Genet. 2020 Nov;36(11):857-867. doi: 10.1016/j.tig.2020.07.006. Epub 2020 Aug 6. PMID: 32773169. Review.

