scrm: efficiently simulating long sequences using the approximated coalescent with recombination

Paul R Staab et al. Bioinformatics. 2015.

Abstract

Motivation: Coalescent-based simulation software for genomic sequences allows the efficient in silico generation of short- and medium-sized genetic sequences. However, the simulation of genome-size datasets as produced by next-generation sequencing is currently only possible using fairly crude approximations.

Results: We present the sequential coalescent with recombination model (SCRM), a new method that efficiently and accurately approximates the coalescent with recombination, closing the gap between current approximations and the exact model. We present an efficient implementation and show that it can simulate genomic-scale datasets with an essentially correct linkage structure.

© The Author 2015. Published by Oxford University Press.

Figures

Fig. 1.

Approximation of genetic linkage. Shown is the correlation ρ (_y_-axis) of the total local branch length at two sites δ base pairs apart (_x_-axis). The linkage in the coalescent with recombination (CWR), simulated with _ms_ using the options `20 1 -r 4000 10000001 -T`, is indicated in black. Results for scrm using different exact window sizes (see legend) are indicated in colour.
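As a worked check of the _ms_ options in this caption: _ms_ takes `-r ρ nsites` with ρ = 4N₀r, where r is the per-generation crossover probability between the two ends of the simulated locus, i.e. the per-base rate times (nsites − 1). The sketch below assumes N₀ = 10,000 (an assumption; the caption does not state it) and the per-base rate of 10⁻⁸ quoted for Fig. 2; under these assumptions `-r 4000 10000001` describes a 10 Mb locus. The function name `ms_rho` is hypothetical.

```python
def ms_rho(n0: float, r_per_base: float, n_sites: int) -> float:
    """Scaled recombination rate for ms's `-r rho nsites` option.

    rho = 4 * N0 * r, where r is the per-generation crossover probability
    between the two ends of the locus: r = r_per_base * (n_sites - 1).
    """
    return 4.0 * n0 * r_per_base * (n_sites - 1)


# ASSUMPTION: N0 = 10,000 is not given in the caption; it is the value for
# which a per-base rate of 1e-8 reproduces "-r 4000 10000001".
rho = ms_rho(n0=10_000, r_per_base=1e-8, n_sites=10_000_001)
# The two ends of 10,000,001 sites are 10**7 bp (10 Mb) apart.
print(f"rho = {rho:.1f}")
```

Under this reading, choosing 10,000,001 sites makes nsites − 1 exactly 10⁷, so the locus spans 10 Mb.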

Fig. 2.

Efficiency for different approximations. Shown is the deviation (_y_-axis) against run-time (_x_-axis) for simulating 10 Mb with a recombination rate of 10⁻⁸ per base per generation. The deviation of an approximation from the correct values is measured as the square root of the area between the ρ-δ correlation curves (see Fig. 1) for data simulated with the approximation and for _ms_-generated data. For scrm and MaCS, multiple approximation levels are drawn using different exact window sizes or history parameters. The recently published Cosi2 (Shlyakhter et al., 2014) does not output trees and could not be included in this figure; for a comparison of Cosi2 and scrm using different summary statistics, see Supplementary Figure S5.
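The deviation measure in this caption, built on the ρ-δ curves of Fig. 1, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each replicate is summarized as a hypothetical list of (segment end, total local branch length) pairs, ρ(δ) is taken as the correlation across replicates between a reference site `x0` and the site `x0 + δ`, and the area between two curves is approximated with the trapezoidal rule over a grid of δ values. The names `local_length`, `correlation_curve` and `deviation` are illustrative and not part of scrm or _ms_; the paper's own estimator may differ, for example in how sites are sampled.

```python
import bisect
import math
from typing import List, Sequence, Tuple

# HYPOTHETICAL representation (not the ms or scrm output format): one replicate
# is a list of (segment_end, total_branch_length) pairs sorted by segment_end,
# where each local tree covers the positions (previous_end, segment_end].
Replicate = List[Tuple[float, float]]


def local_length(rep: Replicate, pos: float) -> float:
    """Total branch length of the local tree covering `pos` (pos <= last end)."""
    ends = [end for end, _ in rep]
    return rep[bisect.bisect_left(ends, pos)][1]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either sample has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def correlation_curve(reps: Sequence[Replicate], x0: float,
                      deltas: Sequence[float]) -> List[float]:
    """rho(delta): correlation across replicates of the total local branch
    length at x0 and at x0 + delta (an ASSUMED estimator)."""
    at_x0 = [local_length(r, x0) for r in reps]
    return [pearson(at_x0, [local_length(r, x0 + d) for r in reps])
            for d in deltas]


def deviation(deltas: Sequence[float], curve_a: Sequence[float],
              curve_b: Sequence[float]) -> float:
    """Square root of the area between two rho-delta curves; the area is
    approximated by the trapezoidal rule on |curve_a - curve_b| over an
    ascending delta grid."""
    diff = [abs(a - b) for a, b in zip(curve_a, curve_b)]
    area = sum((deltas[i + 1] - deltas[i]) * (diff[i] + diff[i + 1]) / 2
               for i in range(len(deltas) - 1))
    return math.sqrt(area)


# Toy input for illustration only: three made-up replicates of a 10 bp region.
toy = [[(5.0, 1.0), (10.0, 3.0)],
       [(5.0, 2.0), (10.0, 2.0)],
       [(5.0, 3.0), (10.0, 1.0)]]
print(correlation_curve(toy, x0=1.0, deltas=[0.0, 6.0]))  # → [1.0, -1.0]
```

In practice `curve_a` would come from _ms_ replicates and `curve_b` from an approximate simulator evaluated on the same δ grid; identical curves give a deviation of zero.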

References

    1. Chen G.K., et al. (2009) Fast and flexible simulation of DNA sequence data, Genome Res., 19, 136–142.
    1. Eriksson A., et al. (2009) Sequential Markov coalescent algorithms for population models with demographic structure, Theor. Popul. Biol., 76, 84–91.
    1. Excoffier L., Foll M. (2011) fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios, Bioinformatics, 27, 1332–1334.
    1. Hudson R.R. (2002) Generating samples under a Wright–Fisher neutral model, Bioinformatics, 18, 337–338.
    1. Marjoram P., Wall J. (2006) Fast “coalescent” simulation, BMC Genetics, 7, 16.
