Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees

Mohamad Saad et al. Genet Epidemiol. 2014 Nov.

Abstract

In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for identifying both rare and common variants. Because sequencing remains costly, imputation methods are important for increasing the amount of information at low cost. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), can handle large pedigrees and accurately impute rare variants, but performs less well for common variants, where population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data in order to make use of such imputed data. We call this extension "famSKAT-RC." We compare the performance of famSKAT-RC with that of several existing burden and kernel association tests. In simulated pedigree sequence data, our combining approach increases imputation accuracy. It also increases the power of the association tests relative to using either family- or population-based imputation alone, in the context of rare and common variants. Moreover, famSKAT-RC performs better than the other tests considered in most scenarios investigated here.

Keywords: MCMC; association analysis; burden test; inheritance vectors; kernel statistic; mixed linear model; sequence data; variance components.

© 2014 WILEY PERIODICALS, INC.
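
The abstract notes that family-based imputation (GIGI) does better for rare variants and population-based imputation does better for common variants, but it does not spell out the combining rule. The sketch below is only one hypothetical way to act on that observation: for each variant, take the family-based dosage when the minor allele frequency (MAF) is at or below a cutoff, and the population-based dosage otherwise. The function name, argument names, and the 0.01 cutoff are assumptions made for illustration and are not taken from the paper.

```python
def combine_dosages(family_dosage, population_dosage, maf, rare_maf_threshold=0.01):
    """Pick one dosage per variant from two imputation sources.

    Hypothetical rule (an assumption, not the paper's method):
    use the family-based dosage for rare variants (MAF <= threshold)
    and the population-based dosage for common variants.
    """
    if not (len(family_dosage) == len(population_dosage) == len(maf)):
        raise ValueError("all inputs must have one entry per variant")
    combined = []
    for fam, pop, freq in zip(family_dosage, population_dosage, maf):
        # Rare variant: trust the family-based value; common: the population-based one.
        combined.append(fam if freq <= rare_maf_threshold else pop)
    return combined


# Illustrative values only: one rare variant (MAF 0.005) and one common variant (MAF 0.2).
example = combine_dosages([0.1, 1.8], [0.3, 1.2], [0.005, 0.2])
```

A per-variant switch like this is the simplest possible design; a rule that weighs the certainty of each method's genotype probabilities would be a natural alternative.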

Figures

Figure 1

Joint probabilities of possible genotypes (AA, Aa, aa) and their variances.

Figure 2

Correlation between allelic dosages obtained by GIGI and the true genotypes (x-axis) versus correlation between allelic dosages obtained by BEAGLE and the true genotypes (y-axis), for different bins of MAFs: A) LowLD pattern, B) HighLD pattern.

Figure 3

Correlation between allelic dosages obtained by GIGI+BEAGLE and the true genotypes (x-axes) versus correlation with the true genotypes of allelic dosages obtained by: BEAGLE (first row), GIGI (second row), and the maximum (MAX) of the GIGI and BEAGLE correlations (third row) (y-axes). A) LowLD pattern, B) HighLD pattern. Within each LD pattern, the left column shows MAF>0.01 and the right column shows MAF<=0.01.

Figure 4

Power of famSKAT, famSKAT-B, famSKAT-RC, and famCMWS in the sequence data, under the LowLD pattern, for the different settings of number of associated and non-associated SNPs and the proportion of common SNPs among them. A) For a model with associated SNPs only: A=10, fc=0.3; A=10, fc=0.5; A=20, fc=0.3; and A=20, fc=0.5. B) For a model with associated and non-associated SNPs: A=10, U=20, fc=0.3; A=10, U=20, fc=0.5; A=20, U=40, fc=0.3; and A=20, U=40, fc=0.5, where fc is the proportion of common SNPs.

Figure 5

Power of famCMWS for the different imputation data and the sequence data, for a model with associated SNPs only, for the different settings of number of associated SNPs and the proportion of common SNPs among them: A=10, fc=0.3; A=10, fc=0.5; A=20, fc=0.3; and A=20, fc=0.5, where fc is the proportion of common associated SNPs. A) LowLD pattern; B) HighLD pattern.

Figure 6

Power of famCMWS for the different combined imputation data (GIGI+BEAGLE, G+B+T, and G_S+B), under the LowLD pattern, for a model with associated SNPs only, for the different settings of number of associated SNPs and the proportion of common SNPs among them: A=10, fc=0.3; A=10, fc=0.5; A=20, fc=0.3; and A=20, fc=0.5, where fc is the proportion of common associated SNPs.
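
The captions for Figures 2 and 3 measure imputation accuracy as the correlation between imputed allelic dosages and the true genotypes. As a minimal sketch of that quantity, the code below computes a dosage as the expected count of the "a" allele from genotype probabilities (AA, Aa, aa) and then takes the Pearson correlation with the true allele counts. The function names and the example probabilities are hypothetical, and the choice of Pearson correlation is an assumption, since the captions say only "correlation".

```python
from math import sqrt


def allelic_dosage(p_het, p_hom_alt):
    """Expected count of the 'a' allele: 1 * P(Aa) + 2 * P(aa)."""
    return p_het + 2.0 * p_hom_alt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


# Made-up genotype probabilities (P(AA), P(Aa), P(aa)) for four subjects at one SNP.
probs = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.1, 0.9), (0.8, 0.2, 0.0)]
true_genotypes = [0, 1, 2, 0]  # true count of the 'a' allele per subject

dosages = [allelic_dosage(p_het, p_hom) for _, p_het, p_hom in probs]
accuracy = pearson(dosages, true_genotypes)
```

Computing this per SNP and grouping SNPs into MAF bins would give the kind of method-versus-method comparison the captions describe.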
