Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering
Am J Hum Genet. 2007 Nov;81(5):1084-97.
doi: 10.1086/521987. Epub 2007 Sep 21.
- PMID: 17924348
- PMCID: PMC2265661
Sharon R Browning et al. Am J Hum Genet. 2007 Nov.
Abstract
Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.
Figures
Figure 1.
Example of a directed acyclic graph representing the localized haplotype-cluster model for four markers, with the haplotype counts given in table 1. For each marker, allele 1 is represented by a solid line, and allele 2 by a dashed line. The bold-line edges from the root node to the terminal node represent the haplotype 2112. The node marked by an asterisk (*) is the parent node for edge e_F.
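The graph structure described in this caption can be illustrated with a minimal sketch. Everything below is hypothetical: the node names, the edge counts, and the `trace_haplotype` and `path_probability` helpers are invented for illustration and are not the values in the paper's table 1 or code from the Beagle package. The sketch assumes that each edge carries one allele label and a haplotype count, that a haplotype corresponds to one root-to-terminal path, and that an edge's transition probability is its count divided by the total count leaving its parent node.

```python
from fractions import Fraction

# Hypothetical leveled DAG for four markers. Each node maps an allele (1 or 2)
# to a (child node, haplotype count) pair. Counts are made up for illustration
# and are NOT the counts from the paper's table 1.
TOY_DAG = {
    "root": {1: ("a", 6), 2: ("b", 4)},
    "a": {1: ("c", 4), 2: ("d", 2)},
    "b": {1: ("d", 3), 2: ("c", 1)},  # shares children with "a": merged clusters
    "c": {1: ("e", 5)},
    "d": {1: ("e", 3), 2: ("f", 2)},
    "e": {1: ("end", 4), 2: ("end", 4)},
    "f": {2: ("end", 2)},
}


def trace_haplotype(dag, haplotype, root="root"):
    """Return the (parent, allele, child) edges visited by a haplotype string."""
    node, path = root, []
    for ch in haplotype:
        allele = int(ch)
        child, _count = dag[node][allele]  # KeyError if no such edge exists
        path.append((node, allele, child))
        node = child
    return path


def path_probability(dag, path):
    """Multiply edge count / parent-node count along the path (assumed model)."""
    prob = Fraction(1)
    for parent, allele, _child in path:
        edge_count = dag[parent][allele][1]
        parent_count = sum(count for _c, count in dag[parent].values())
        prob *= Fraction(edge_count, parent_count)
    return prob


path = trace_haplotype(TOY_DAG, "2112")
print([parent for parent, _allele, _child in path])
print(path_probability(TOY_DAG, path))
```

In this toy graph the parent node of any edge is simply the first element of its `(parent, allele, child)` tuple, which mirrors the role of the asterisked node in the figure.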
Figure 2.
Error rates for selected haplotype-phasing methods. Three classes of data were considered: low-density data with ∼1 SNP per 10 kb (left column), high-density data with ∼1 SNP per 3 kb (middle column), and Affymetrix 500K data for the WTCCC controls (right column). Within each plot, three sample sizes (n) are shown. Each row of graphs gives a different measure of accuracy (_Y_-axis). The relative error graphs show differences in error rate between each method and a reference method, which is Beagle with _R_=25 samples per individual. All estimates are averaged across the data sets, with error bars showing ±2 SEs.
Similar articles
- Extending long-range phasing and haplotype library imputation algorithms to large and heterogeneous datasets.
Money D, Wilson D, Jenko J, Whalen A, Thorn S, Gorjanc G, Hickey JM. Genet Sel Evol. 2020 Jul 8;52(1):38. doi: 10.1186/s12711-020-00558-2. PMID: 32640985. Free PMC article.
- A haplotype inference algorithm for trios based on deterministic sampling.
Iliadis A, Watkinson J, Anastassiou D, Wang X. BMC Genet. 2010 Aug 23;11:78. doi: 10.1186/1471-2156-11-78. PMID: 20727218. Free PMC article.
- Fast and Robust Identity-by-Descent Inference with the Templated Positional Burrows-Wheeler Transform.
Freyman WA, McManus KF, Shringarpure SS, Jewett EM, Bryc K; 23 and Me Research Team; Auton A. Mol Biol Evol. 2021 May 4;38(5):2131-2151. doi: 10.1093/molbev/msaa328. PMID: 33355662. Free PMC article.
- Missing data imputation and haplotype phase inference for genome-wide association studies.
Browning SR. Hum Genet. 2008 Dec;124(5):439-50. doi: 10.1007/s00439-008-0568-7. Epub 2008 Oct 11. PMID: 18850115. Free PMC article. Review.
- [Analysis and application of SNP and haplotype in the human genome].
Li J, Pan YC, Li YX, Shi TL. Yi Chuan Xue Bao. 2005 Aug;32(8):879-89. PMID: 16231744. Review. Chinese.
Cited by
- Insights into trait-association of selection signatures and adaptive eQTL in indigenous African cattle.
Friedrich J, Liu S, Fang L, Prendergast J, Wiener P. BMC Genomics. 2024 Oct 19;25(1):981. doi: 10.1186/s12864-024-10852-8. PMID: 39425030. Free PMC article.
- Genome-Wide Association Study on Body Conformation Traits in Xinjiang Brown Cattle.
Zhang M, Wang Y, Chen Q, Wang D, Zhang X, Huang X, Xu L. Int J Mol Sci. 2024 Sep 30;25(19):10557. doi: 10.3390/ijms251910557. PMID: 39408884. Free PMC article.
- Analysis of Gyimes Csango population samples on a high-resolution genome-wide basis.
Bánfai Z, Büki G, Ádám V, Sümegi K, Szabó A, Hadzsiev K, Erős K, Gallyas F, Miseta A, Kásler M, Melegh B. BMC Genomics. 2024 Oct 7;25(1):942. doi: 10.1186/s12864-024-10833-x. PMID: 39375616. Free PMC article.
- Genome-wide association testing beyond SNPs.
Harris L, McDonagh EM, Zhang X, Fawcett K, Foreman A, Daneck P, Sergouniotis PI, Parkinson H, Mazzarotto F, Inouye M, Hollox EJ, Birney E, Fitzgerald T. Nat Rev Genet. 2024 Oct 7. doi: 10.1038/s41576-024-00778-y. Online ahead of print. PMID: 39375560. Review.
- GWAS and polygenic risk score of severe COVID-19 in Eastern Europe.
Kovalenko E, Shaheen L, Vergasova E, Kamelin A, Rubinova V, Kharitonov D, Kim A, Plotnikov N, Elmuratov A, Borovkova N, Storozheva M, Solonin S, Gilyazova I, Mironov P, Khusnutdinova E, Petrikov S, Ilinskaya A, Ilinsky V, Rakitko A. Front Med (Lausanne). 2024 Sep 19;11:1409714. doi: 10.3389/fmed.2024.1409714. eCollection 2024. PMID: 39364016. Free PMC article.
References
Web Resources
- Beagle genetic analysis software package, http://www.stat.auckland.ac.nz/~browning/beagle/beagle.html
- WTCCC, http://www.wtccc.org.uk/