A general approach for haplotype phasing across the full spectrum of relatedness
PLoS Genet. 2014 Apr 17;10(4):e1004234.
doi: 10.1371/journal.pgen.1004234. eCollection 2014 Apr.
Jared O'Connell, Deepti Gurdasani, Olivier Delaneau, Nicola Pirastu, Sheila Ulivi, Massimiliano Cocca, Michela Traglia, Jie Huang, Jennifer E Huffman, Igor Rudan, Ruth McQuillan, Ross M Fraser, Harry Campbell, Ozren Polasek, Gershim Asiki, Kenneth Ekoru, Caroline Hayward, Alan F Wright, Veronique Vitart, Pau Navarro, Jean-Francois Zagury, James F Wilson, Daniela Toniolo, Paolo Gasparini, Nicole Soranzo, Manjinder S Sandhu, Jonathan Marchini
- PMID: 24743097
- PMCID: PMC3990520
Jared O'Connell et al. PLoS Genet. 2014.
Abstract
Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally 'unrelated' individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts than other methods, including those designed specifically for isolated populations. In particular, when large amounts of identity-by-descent (IBD) sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results, we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel hidden Markov model (HMM) method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors and the detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.
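Switch error rate is the accuracy measure behind the comparisons above. The sketch below uses a common definition: at each site heterozygous in the true haplotypes, record which true haplotype the first inferred haplotype follows, then count how often that orientation flips between consecutive heterozygous sites. It is an illustrative sketch, not the paper's evaluation code; the function name, the 0/1 allele coding and the requirement that inferred genotypes match the truth are assumptions.

```python
from typing import Sequence


def switch_error_rate(true_h0: Sequence[int], true_h1: Sequence[int],
                      inf_h0: Sequence[int], inf_h1: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous-site intervals whose relative phase is flipped.

    Hypothetical helper: alleles are coded 0/1, only sites heterozygous in the
    truth are used, and the inferred genotype must agree with the truth there.
    """
    orientations = []
    for t0, t1, h0, h1 in zip(true_h0, true_h1, inf_h0, inf_h1):
        if t0 == t1:
            continue  # homozygous sites carry no phase information
        if {h0, h1} != {t0, t1}:
            raise ValueError("inferred genotype disagrees with truth at a het site")
        # 0 if inferred haplotype 0 carries the allele of true haplotype 0, else 1
        orientations.append(0 if h0 == t0 else 1)
    if len(orientations) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(orientations, orientations[1:]))
    return switches / (len(orientations) - 1)
```

For example, with true haplotypes [0,1,0,1,1]/[1,0,1,0,1] and inferred haplotypes [0,1,1,0,1]/[1,0,0,1,1], the four heterozygous sites have orientations 0,0,1,1, giving one switch over three intervals (a rate of 1/3). Swapping the two inferred haplotypes along the whole chromosome gives no switches, since only relative phase matters.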
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
Figure 1. Examples of inferred haplotypes with true recombination events and switch errors (SEs).
Each example shows the two parental haplotypes, the two child haplotypes and the pattern of gene flow. Top: Correctly inferred haplotypes in a region containing a true recombination event, which causes a transition in the duo HMM. The other four examples add SEs to these true parental and child haplotypes. Middle left: an SE added to the child's haplotypes causes a transition. Middle right: an SE added to the parent's haplotypes causes a transition. Bottom left: an SE added to the parent's haplotypes at the site of the recombination event causes the transition to be missed. Bottom right: SEs added to both the child's and the parent's haplotypes at the same position cause a transition.
Figure 2. Summary of IBD sharing in cohorts.
Left: The proportion of heterozygous sites phased by SLRP for all individuals (pink) and when individuals with close relatives are removed (blue). Right: The distributions of the average number of "surrogate" parents for each cohort when closely related pairs are removed.
Figure 3. The duo HMM Viterbi paths for 50 father-child duos from the Val Borbera cohort on chromosome 10.
The four possible IBD states (A, B, C, D) are shown in pale blue, dark blue, light red and dark red, respectively. The left and right panels show the results of the duo HMM applied to the SHAPEIT2 and Beagle haplotypes, respectively. A change between a blue and a red colour corresponds to either of two transition types, both of which imply an SE in the child. A change between light and dark blue, or between light and dark red, corresponds to a transition that reflects a change in IBD state in the parent, which could be caused by a recombination or by an SE in the parent. The x-axis shows the sex-averaged genetic distance across the chromosome in centiMorgans.
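The state logic in this caption can be illustrated with a small four-state HMM. In the sketch below, each hidden state is a pair (p, c): p is the parental haplotype transmitted at a site and c is the child haplotype that carries it. A change in c matches the blue/red change (an SE in the child); a change in p alone matches a light/dark change (a recombination or an SE in the parent). This is a simplified illustration, not the published duoHMM: the constant per-interval probabilities r and s, the mismatch probability eps standing in for genotyping error, the function names, and the correspondence between (p, c) tuples and the labels A to D are all assumptions.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

# Hidden state (p, c): p = parental haplotype transmitted at this site,
# c = which of the child's two haplotypes carries the transmitted allele.
# Simplified illustration with constant rates; not the published parameterization.
State = Tuple[int, int]
STATES: List[State] = list(product((0, 1), (0, 1)))


def viterbi_duo(parent: Sequence[Sequence[int]], child: Sequence[Sequence[int]],
                r: float = 1e-3, s: float = 1e-3, eps: float = 1e-2) -> List[State]:
    """Most likely (p, c) path for one parent-child duo (hypothetical helper).

    r: per-interval probability that p changes (recombination or parent SE)
    s: per-interval probability that c changes (child SE)
    eps: probability that the child's allele differs from the transmitted one
    """
    n = len(child[0])

    def log_emit(state: State, i: int) -> float:
        p, c = state
        return math.log(1 - eps) if child[c][i] == parent[p][i] else math.log(eps)

    def log_trans(a: State, b: State) -> float:
        lp = math.log(r if a[0] != b[0] else 1 - r)
        lc = math.log(s if a[1] != b[1] else 1 - s)
        return lp + lc

    # Uniform initial distribution over the four states
    score = {st: math.log(0.25) + log_emit(st, 0) for st in STATES}
    backptrs = []
    for i in range(1, n):
        new_score, ptr = {}, {}
        for b in STATES:
            best = max(STATES, key=lambda a: score[a] + log_trans(a, b))
            new_score[b] = score[best] + log_trans(best, b) + log_emit(b, i)
            ptr[b] = best
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state
    state = max(STATES, key=lambda st: score[st])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def classify_transitions(path: Sequence[State]) -> List[Tuple[int, str]]:
    """Label each state change following the caption's colour scheme."""
    events = []
    for i in range(1, len(path)):
        (p_prev, c_prev), (p_cur, c_cur) = path[i - 1], path[i]
        if c_prev != c_cur:
            events.append((i, "child switch error"))
        elif p_prev != p_cur:
            events.append((i, "recombination or parent switch error"))
    return events
```

With parental haplotypes [0,0,0,0,1,1,1,1] and [1,1,1,1,0,0,0,0] and a child haplotype of eight 0s, the decoded path stays in (0, 0) for the first four sites and moves to (1, 0) at site 4, which is classified as a recombination or parent SE.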
Figure 4. Switch error rates for individuals in extended pedigrees for different phasing pipelines across all European cohorts (chromosome 10).
Points are coloured according to which relationships Beagle used to phase each individual (red means no relationships were used). Left: Beagle using duo/trio phasing versus SHAPEIT2 using no relationships. Centre: Beagle using duo/trio phasing versus SHAPEIT2+duoHMM using no relationships. Right: Beagle using duo/trio phasing versus SHAPEIT2+duoHMM using no relationships, with loci flagged by the duoHMM as probable genotyping errors masked. Switch error is reduced for both methods, suggesting that the masking is sensible.
Figure 5. Inferred gene flow by Merlin (purple) and our method (grey) for ten informative meioses on chromosome 10 taken from Val Borbera cohort pedigrees.
The light and dark purple represent genetic material from the grandpaternal and grandmaternal chromosomes (as inferred by Merlin's Viterbi algorithm), so a change from light to dark implies a recombination event. The grey rectangles contain the posterior probability (in black) of recombination from our method. The two methods broadly agree, although Merlin has inferred a number of implausibly small crossover events.
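The grey rectangles report a posterior probability of recombination rather than a single best path. A minimal sketch of how such a posterior can be obtained with the scaled forward-backward algorithm follows. It assumes a simplified two-state model in which the child's transmitted haplotype is already known (for example, after child SEs have been corrected), so only the parental haplotype of origin is hidden; r, eps and the function name are illustrative assumptions, not the paper's parameters.

```python
from typing import List, Sequence


def recombination_posteriors(parent: Sequence[Sequence[int]], transmitted: Sequence[int],
                             r: float = 1e-3, eps: float = 1e-2) -> List[float]:
    """P(parental haplotype of origin changes between site i-1 and i | data), i = 1..n-1.

    Hypothetical helper for a two-state simplification; alleles are coded 0/1.
    """
    n = len(transmitted)

    def emit(p: int, i: int) -> float:
        return 1 - eps if transmitted[i] == parent[p][i] else eps

    def trans(a: int, b: int) -> float:
        return r if a != b else 1 - r

    # Scaled forward pass: each alpha row sums to 1, scale[i] holds the normaliser
    first = [0.5 * emit(p, 0) for p in (0, 1)]
    scale = [sum(first)]
    alpha = [[x / scale[0] for x in first]]
    for i in range(1, n):
        row = [sum(alpha[-1][a] * trans(a, b) for a in (0, 1)) * emit(b, i) for b in (0, 1)]
        c = sum(row)
        scale.append(c)
        alpha.append([x / c for x in row])

    # Scaled backward pass, reusing the forward normalisers
    beta = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        row = [sum(trans(a, b) * emit(b, i + 1) * beta[i + 1][b] for b in (0, 1))
               for a in (0, 1)]
        beta[i] = [x / scale[i + 1] for x in row]

    # Posterior of a switch: sum over a != b of the pairwise posterior xi_i(a, b)
    return [sum(alpha[i - 1][a] * trans(a, 1 - a) * emit(1 - a, i) * beta[i][1 - a]
                for a in (0, 1)) / scale[i]
            for i in range(1, n)]
```

For parental haplotypes [0,0,0,0,1,1,1,1] and [1,1,1,1,0,0,0,0] and a transmitted haplotype of eight 0s, almost all posterior mass falls on the interval between sites 3 and 4. The neighbouring intervals receive only about 1% each, because placing the crossover there requires one allele mismatch.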
Figure 6. Distributions of the number of detected crossovers for all cohorts.
Only duos that were part of an informative pedigree were used. Top: The mean number of recombinations per meiosis (for all informative duos from all cohorts) found for each chromosome against the expected number (from the 2002 deCODE map) for paternal meioses (left) and maternal meioses (right). Merlin's values are substantially inflated, whilst SHAPEIT2's are more consistent with the well-known deCODE map genetic lengths. Bottom: Q-Q plots of the observed against the expected number of recombinations estimated by each method for paternal meioses (left) and maternal meioses (right). For the expected distribution, a Poisson distribution using the genetic lengths from the 2002 deCODE map was used (with rate parameters 42.81 and 25.9 for maternal and paternal recombinations, respectively). SHAPEIT2's rates are less inflated than Merlin's.
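The Q-Q panels compare observed crossover counts per meiosis with a Poisson distribution whose rate is 42.81 (maternal) or 25.9 (paternal). A minimal sketch of how such points could be computed: sort the observed counts and pair each with the Poisson quantile at plotting position (i + 0.5)/n. The plotting-position convention and the function names are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def poisson_quantile(q: float, lam: float) -> int:
    """Smallest k with P(X <= k) >= q for X ~ Poisson(lam).

    Direct summation of the pmf; adequate for rates of the size used here.
    """
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < q:
        k += 1
        pmf *= lam / k  # P(X = k) = P(X = k - 1) * lam / k
        cdf += pmf
    return k


def qq_points(observed: Sequence[int], lam: float) -> List[Tuple[int, int]]:
    """(expected quantile, observed count) pairs for a Poisson Q-Q plot (hypothetical helper)."""
    n = len(observed)
    obs = sorted(observed)
    return [(poisson_quantile((i + 0.5) / n, lam), obs[i]) for i in range(n)]
```

A method that calls spurious crossovers pushes the observed counts above the Poisson quantiles, so the points drift above the diagonal. In the figure, that inflation is larger for Merlin than for SHAPEIT2.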
Figure 7. Recombination detection accuracy in uninformative duos simulated from chromosome X data in the Val Borbera cohort.
The green values are for a cohort of nominally unrelated individuals, and the orange values are for a cohort that has been filtered so that no individuals are closely related. Left: ROC curves for recombination detection in uninformative duos for our duo HMM using the SHAPEIT2 haplotypes. Right: The average number of correct detections against the average posterior probability. Setting a high probability threshold ensures a very low false discovery rate.
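The link between a posterior threshold and the false discovery rate can be sketched as follows. If the posteriors are calibrated, the expected FDR among calls at or above a threshold is the mean of (1 - posterior) over those calls. With simulated truth, as in this figure, empirical ROC points can also be computed. Both functions are hypothetical helpers and a sketch under the calibration assumption; the paper's exact procedure is not given here.

```python
from typing import List, Sequence, Tuple


def expected_fdr(posteriors: Sequence[float], threshold: float) -> float:
    """Expected false discovery rate among calls with posterior >= threshold.

    Assumes the posterior probabilities are calibrated.
    """
    calls = [p for p in posteriors if p >= threshold]
    if not calls:
        return 0.0
    return sum(1 - p for p in calls) / len(calls)


def roc_points(posteriors: Sequence[float], is_true: Sequence[bool],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) at each threshold, given simulated truth."""
    pos = sum(is_true)
    neg = len(is_true) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for p, y in zip(posteriors, is_true) if p >= t and y)
        fp = sum(1 for p, y in zip(posteriors, is_true) if p >= t and not y)
        points.append((fp / neg if neg else 0.0, tp / pos if pos else 0.0))
    return points
```

For posteriors [0.99, 0.95, 0.6, 0.2, 0.05], a threshold of 0.9 keeps two calls with an expected FDR of (0.01 + 0.05)/2 = 0.03. Lowering it to 0.5 admits the 0.6 call and raises the expected FDR to about 0.153, showing why a high threshold keeps false discoveries rare.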
Similar articles
- HAPLORE: a program for haplotype reconstruction in general pedigrees without recombination. Zhang K, Sun F, Zhao H. Bioinformatics. 2005 Jan 1;21(1):90-103. doi: 10.1093/bioinformatics/bth388. Epub 2004 Jul 1. PMID: 15231536.
- Detection of recombination events, haplotype reconstruction and imputation of sires using half-sib SNP genotypes. Ferdosi MH, Kinghorn BP, van der Werf JH, Gondro C. Genet Sel Evol. 2014 Feb 4;46(1):11. doi: 10.1186/1297-9686-46-11. PMID: 24495596. Free PMC article.
- Efficient computation of minimum recombination with genotypes (not haplotypes). Wu Y, Gusfield D. J Bioinform Comput Biol. 2007 Apr;5(2a):181-200. doi: 10.1142/s0219720007002631. PMID: 17589959.
- Fast and Accurate Shared Segment Detection and Relatedness Estimation in Un-phased Genetic Data via TRUFFLE. Dimitromanolakis A, Paterson AD, Sun L. Am J Hum Genet. 2019 Jul 3;105(1):78-88. doi: 10.1016/j.ajhg.2019.05.007. Epub 2019 Jun 6. PMID: 31178127. Free PMC article.
- A comparison of different algorithms for phasing haplotypes using Holstein cattle genotypes and pedigree data. Miar Y, Sargolzaei M, Schenkel FS. J Dairy Sci. 2017 Apr;100(4):2837-2849. doi: 10.3168/jds.2016-11590. Epub 2017 Feb 1. PMID: 28161175.
Cited by
- Genome-wide association analysis of cystatin c and creatinine kidney function in Chinese women. Cai Y, Lv H, Yuan M, Wang J, Wu W, Fang X, Chen C, Mu J, Liu F, Gu X, Xie H, Liu Y, Xu H, Fan Y, Shen C, Ma X. BMC Med Genomics. 2024 Nov 18;17(1):272. doi: 10.1186/s12920-024-02048-6. PMID: 39558362. Free PMC article.
- Genome-wide local ancestry and the functional consequences of admixture in African and European cattle populations. McHugo GP, Ward JA, Ng'ang'a SI, Frantz LAF, Salter-Townshend M, Hill EW, O'Gorman GM, Meade KG, Hall TJ, MacHugh DE. Heredity (Edinb). 2024 Nov 8. doi: 10.1038/s41437-024-00734-w. Online ahead of print. PMID: 39516247.
- Discovery and prioritization of genetic determinants of kidney function in 297,355 individuals from Taiwan and Japan. Chen HL, Chiang HY, Chang DR, Cheng CF, Wang CCN, Lu TP, Lee CY, Chattopadhyay A, Lin YT, Lin CC, Yu PT, Huang CF, Lin CH, Yeh HC, Ting IW, Tsai HK, Chuang EY, Tin A, Tsai FJ, Kuo CC. Nat Commun. 2024 Oct 29;15(1):9317. doi: 10.1038/s41467-024-53516-7. PMID: 39472450. Free PMC article.
- Gut microbial and human genetic signatures of inflammatory bowel disease increase risk of comorbid mental disorders. Lee J, Oh SJ, Ha E, Shin GY, Kim HJ, Kim K, Lee CK. NPJ Genom Med. 2024 Oct 29;9(1):52. doi: 10.1038/s41525-024-00440-w. PMID: 39472439. Free PMC article.
- Two founder variants account for over 90% of pathogenic BRCA alleles in the Orkney and Shetland Isles in Scotland. Kerr SM, Klaric L, Muckian MD, Cowan E, Snadden L, Tzoneva G, Shuldiner AR, Miedzybrodzka Z, Wilson JF. Eur J Hum Genet. 2024 Dec;32(12):1624-1631. doi: 10.1038/s41431-024-01704-w. Epub 2024 Oct 22. PMID: 39438716. Free PMC article.
Grants and funding
- CZB/4/710/CSO_/Chief Scientist Office/United Kingdom
- G0901213/MRC_/Medical Research Council/United Kingdom
- MC_PC_U127561128/MRC_/Medical Research Council/United Kingdom
- MC_PC_U127592696/MRC_/Medical Research Council/United Kingdom
- G0801823/MRC_/Medical Research Council/United Kingdom
- MR/K013491/1/MRC_/Medical Research Council/United Kingdom
- WT_/Wellcome Trust/United Kingdom
- 090058/Z/09/Z/WT_/Wellcome Trust/United Kingdom