Fast "coalescent" simulation
Paul Marjoram et al. BMC Genet. 2006.
Abstract
Background: The amount of genome-wide molecular data is increasing rapidly, as is interest in developing methods appropriate for such data. Consequently, there is a growing need for methods that can simulate such data efficiently. In this paper we implement the sequentially Markovian coalescent (SMC) algorithm described by McVean and Cardin and present a further modification to that algorithm which slightly improves the closeness of the approximation to the full coalescent model. The algorithm ignores a class of recombination events that are known to affect the genealogy of the sample but that do not appear to affect the generated samples to any substantial degree.
Results: We show that our software can simulate large chromosomal regions, such as those relevant to genome-wide data, several orders of magnitude faster than existing coalescent algorithms.
Conclusion: This algorithm provides a useful resource for those needing to simulate large quantities of data for chromosomal-length regions using an approach that is much more efficient than traditional coalescent models.
Figures
Figure 1
The various categories of recombination. Ancestral material is shown as solid red lines, and non-ancestral material as dotted red lines. The location of each recombination is shown below and to the left of the recombination event. The type of each recombination is indicated by a blue numeral above the event.
Figure 2
Illustration of FastCoal algorithm. This figure shows how the algorithm forms the next tree along the chromosome, moving from left-to-right, given the state of the current tree.
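The left-to-right tree update can be illustrated for the simplest case of two sampled sequences, where each local tree reduces to a single coalescence time. The sketch below is not the FastCoal implementation. It follows the commonly stated SMC and SMC' update rules, and the function name, the scaling convention (time in coalescent units, so that a tree of height `t` meets a recombination at rate `rho * t` per unit length), and the segment output format are all assumptions made for illustration.

```python
import random


def simulate_pairwise_smc(rho, length, prime=False, rng=None):
    """Simulate coalescence times along a sequence for a sample of size two.

    rho    : assumed scaled recombination rate per unit length (must be > 0)
    length : total sequence length, in the same units as rho
    prime  : False for the SMC rule, True for the SMC' rule
    Returns a list of (start, end, tmrca) segments covering [0, length].
    """
    rng = rng or random.Random()
    t = rng.expovariate(1.0)  # height of the first tree
    pos = 0.0
    segments = []
    while True:
        # Total branch length is 2t and each lineage recombines at rate rho/2,
        # so the distance to the next recombination has rate rho * t.
        nxt = pos + rng.expovariate(rho * t)
        if nxt >= length:
            segments.append((pos, length, t))
            return segments
        segments.append((pos, nxt, t))

        # The recombination falls uniformly on the tree; by symmetry only its
        # height u on one of the two branches matters.
        u = rng.uniform(0.0, t)
        if prime:
            # SMC': the old branch is kept, so between u and t the floating
            # lineage has two targets (rate 2), one of which is its own branch.
            w = rng.expovariate(2.0)
            if u + w < t:
                if rng.random() < 0.5:
                    new_t = t          # rejoined its own branch: tree unchanged
                else:
                    new_t = u + w      # joined the other sample's lineage
            else:
                new_t = t + rng.expovariate(1.0)  # above the old root, rate 1
        else:
            # SMC: the old branch is deleted, leaving a single target lineage.
            new_t = u + rng.expovariate(1.0)

        pos, t = nxt, new_t
```

For example, `simulate_pairwise_smc(rho=5.0, length=10.0, prime=True, rng=random.Random(1))` returns contiguous segments whose heights change only at recombination points. Under the SMC' rule some adjacent segments share the same height, because a recombination can leave the tree unchanged.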
Figure 3
Decay of _r_². This figure shows how _r_² decays as a function of distance for both the SMC and SMC' algorithms and for an exact coalescent model (simulated using ms). Data were simulated for a 2 Mb region and a sample size of n = 20.
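For readers who want to reproduce this kind of comparison, the statistic itself is simple to compute from simulated haplotypes. The sketch below uses the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B. The function name and the 0/1 allele coding are assumptions for illustration and are not taken from the paper's software.

```python
import math


def r_squared(site_a, site_b):
    """Return r^2 between two biallelic sites.

    site_a, site_b : equal-length sequences of 0/1 alleles, one entry per
                     sampled haplotype (assumed coding: 1 = derived allele).
    Returns NaN if either site is monomorphic, since r^2 is then undefined.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")

    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan

    d = p_ab - p_a * p_b                       # linkage disequilibrium D
    return d * d / denom
```

Averaging this value over pairs of sites grouped by physical distance gives a decay curve of the kind the figure describes.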
Grants and funding
- P50 HG002790/HG/NHGRI NIH HHS/United States
- R01 GM069890/GM/NIGMS NIH HHS/United States
- GM069890-01A1/GM/NIGMS NIH HHS/United States
- HG002790-01A1/HG/NHGRI NIH HHS/United States