Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering

Am J Hum Genet. 2007 Nov;81(5):1084-97.

doi: 10.1086/521987. Epub 2007 Sep 21.

PMID: 17924348
PMCID: PMC2265661
DOI: 10.1086/521987

Sharon R Browning et al. Am J Hum Genet. 2007 Nov.

Abstract

Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.
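The abstract reports accuracy as the share of masked alleles that were imputed correctly. The Python sketch below shows one way to do that bookkeeping. It assumes genotypes are stored as unordered allele pairs and that accuracy is scored per allele; the abstract does not spell out the scoring rule, so treat this as an illustration. The function name `masked_allele_accuracy` is hypothetical and is not part of Beagle.

```python
from collections import Counter


def masked_allele_accuracy(true_genotypes, imputed_genotypes):
    """Return the fraction of masked alleles that were imputed correctly.

    Assumption (not from the paper): each genotype is an unordered pair of
    alleles such as (1, 2), and accuracy is scored per allele by matching the
    two pairs as multisets. So (1, 2) vs (2, 1) scores 2 of 2 alleles and
    (1, 2) vs (1, 1) scores 1 of 2 alleles.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("true and imputed lists must have the same length")
    correct = total = 0
    for truth, guess in zip(true_genotypes, imputed_genotypes):
        shared = Counter(truth) & Counter(guess)  # multiset intersection
        correct += sum(shared.values())
        total += len(truth)
    if total == 0:
        raise ValueError("no masked genotypes to score")
    return correct / total


# Hypothetical toy example: three masked genotypes and their imputed values.
truth = [(1, 2), (1, 1), (2, 2)]
imputed = [(2, 1), (1, 2), (2, 2)]
print(masked_allele_accuracy(truth, imputed))  # 5 of 6 alleles correct
```

Scoring by multiset overlap makes the measure insensitive to allele order within a genotype, which is what you want when only the unphased genotype was masked.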

Figures

Figure 1.

Example of a directed acyclic graph representing the localized haplotype-cluster model for four markers, with the haplotype counts given in table 1. For each marker, allele 1 is represented by a solid line, and allele 2 by a dashed line. The bold-line edges from the root node to the terminal node represent the haplotype 2112. The node marked by an asterisk (*) is the parent node for edge e_F.
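Figure 1 describes the localized haplotype-cluster model as a directed acyclic graph. Edges between successive levels carry the allele at one marker, and each haplotype, such as 2112, is a path from the root node onward. The Python sketch below shows one way to store such a graph, keep a haplotype count on every edge, and trace a haplotype's path. It builds only an unmerged prefix tree, which is a special case of a DAG, and omits the node-merging step that produces the compact graph with a single terminal node shown in the figure. The counts are hypothetical toy values, not those in table 1, and the class and function names are illustrative rather than Beagle's.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    level: int  # how many markers precede this node (root = 0)
    out_edges: dict = field(default_factory=dict)  # allele -> Edge


@dataclass(eq=False)
class Edge:
    parent: Node = field(repr=False)  # node the edge leaves from
    child: Node = field(repr=False)  # node the edge enters
    allele: str
    count: int = 0  # haplotypes whose path uses this edge


def build_prefix_tree(haplotype_counts):
    """Insert each haplotype (a string of alleles, e.g. "2112") as a path."""
    root = Node(level=0)
    for haplotype, n in haplotype_counts.items():
        node = root
        for allele in haplotype:
            edge = node.out_edges.get(allele)
            if edge is None:
                edge = Edge(parent=node, child=Node(level=node.level + 1), allele=allele)
                node.out_edges[allele] = edge
            edge.count += n
            node = edge.child
    return root


def trace(root, haplotype):
    """Return the edges on the path spelled by haplotype, starting at the root."""
    path, node = [], root
    for allele in haplotype:
        edge = node.out_edges[allele]  # KeyError if the path is absent
        path.append(edge)
        node = edge.child
    return path


# Hypothetical toy counts (NOT the counts from table 1).
root = build_prefix_tree({"1111": 3, "2112": 2, "2121": 1})
path = trace(root, "2112")
print([e.count for e in path])  # [3, 3, 2, 2]
print(path[2].parent.level)     # 2: the parent node sits after two markers
```

In this tree every node has exactly one incoming edge. After nodes are merged, a node can have several incoming edges, but each edge still has exactly one parent node, which is the relationship the asterisk in Figure 1 marks.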

Figure 2.

Error rates for selected haplotype-phasing methods. Three classes of data were considered: low-density data with ∼1 SNP per 10 kb (left column), high-density data with ∼1 SNP per 3 kb (middle column), and Affymetrix 500K data for the WTCCC controls (right column). Within each plot, three sample sizes (n) are shown. Each row of graphs gives a different measure of accuracy (_Y_-axis). The relative error graphs show differences in error rate between each method and a reference method, which is Beagle with _R_ = 25 samples per individual. All estimates are averaged across the data sets, with error bars showing ±2 SEs.
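The Figure 2 caption describes relative error as the difference in error rate between a method and the reference method, averaged across data sets with ±2 SE error bars. The Python sketch below computes such a summary. It assumes the differences are paired by data set and that the SE is the sample standard deviation of those differences divided by the square root of the number of data sets. The caption does not state how the SE was computed, so this is an assumption. The function name and the error rates are hypothetical.

```python
import statistics


def relative_error_summary(method_errors, reference_errors):
    """Average (method - reference) error-rate difference with a ±2 SE band.

    Assumption: both lists hold one error rate per data set, in the same order,
    and the SE is the sample standard deviation of the paired differences
    divided by sqrt(number of data sets).
    """
    if len(method_errors) != len(reference_errors):
        raise ValueError("need one reference error rate per data set")
    if len(method_errors) < 2:
        raise ValueError("need at least two data sets to estimate an SE")
    diffs = [m - r for m, r in zip(method_errors, reference_errors)]
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / len(diffs) ** 0.5
    return mean, mean - 2 * se, mean + 2 * se


# Hypothetical per-data-set error rates for one method and the reference.
mean, low, high = relative_error_summary([0.05, 0.06, 0.07], [0.04, 0.04, 0.04])
print(f"{mean:.4f} [{low:.4f}, {high:.4f}]")
```

A positive mean difference indicates that the method made more errors than the reference. Pairing by data set removes variation that is shared between methods, such as some data sets being harder to phase.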

Cited by

Insights into trait-association of selection signatures and adaptive eQTL in indigenous African cattle.
Friedrich J, Liu S, Fang L, Prendergast J, Wiener P. BMC Genomics. 2024 Oct 19;25(1):981. doi: 10.1186/s12864-024-10852-8. PMID: 39425030. Free PMC article.
Genome-Wide Association Study on Body Conformation Traits in Xinjiang Brown Cattle.
Zhang M, Wang Y, Chen Q, Wang D, Zhang X, Huang X, Xu L. Int J Mol Sci. 2024 Sep 30;25(19):10557. doi: 10.3390/ijms251910557. PMID: 39408884. Free PMC article.
Analysis of Gyimes Csango population samples on a high-resolution genome-wide basis.
Bánfai Z, Büki G, Ádám V, Sümegi K, Szabó A, Hadzsiev K, Erős K, Gallyas F, Miseta A, Kásler M, Melegh B. BMC Genomics. 2024 Oct 7;25(1):942. doi: 10.1186/s12864-024-10833-x. PMID: 39375616. Free PMC article.
Genome-wide association testing beyond SNPs.
Harris L, McDonagh EM, Zhang X, Fawcett K, Foreman A, Daneck P, Sergouniotis PI, Parkinson H, Mazzarotto F, Inouye M, Hollox EJ, Birney E, Fitzgerald T. Nat Rev Genet. 2024 Oct 7. doi: 10.1038/s41576-024-00778-y. Online ahead of print. PMID: 39375560. Review.
GWAS and polygenic risk score of severe COVID-19 in Eastern Europe.
Kovalenko E, Shaheen L, Vergasova E, Kamelin A, Rubinova V, Kharitonov D, Kim A, Plotnikov N, Elmuratov A, Borovkova N, Storozheva M, Solonin S, Gilyazova I, Mironov P, Khusnutdinova E, Petrikov S, Ilinskaya A, Ilinsky V, Rakitko A. Front Med (Lausanne). 2024 Sep 19;11:1409714. doi: 10.3389/fmed.2024.1409714. eCollection 2024. PMID: 39364016. Free PMC article.

References

Web Resources

1. Beagle genetic analysis software package, http://www.stat.auckland.ac.nz/~browning/beagle/beagle.html
1. WTCCC, http://www.wtccc.org.uk/
