Fast and flexible simulation of DNA sequence data
- Gary K. Chen1,2,
- Paul Marjoram1 and
- Jeffrey D. Wall2,3
- 1 Department of Preventive Medicine, University of Southern California, Los Angeles, California 90033, USA;
- 2 Institute for Human Genetics and Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94143, USA
Abstract
Simulation of genomic sequences under the coalescent with recombination has conventionally been impractical for regions beyond tens of megabases. This work presents an algorithm, implemented as the program MaCS (Markovian Coalescent Simulator), that can efficiently simulate haplotypes under an arbitrary model of population history. We present several metrics comparing the performance of MaCS with other available simulation programs. Practical usage of MaCS is demonstrated through a comparison of measures of linkage disequilibrium between program output and real genotype data from populations considered to be structured.
Footnotes
3 Corresponding author.
E-mail wallj{at}humgen.ucsf.edu; fax (415) 476-1356. [Supplemental material is available online at www.genome.org. The MaCS source code is freely available at http://www-hsc.usc.edu/∼garykche/.]
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.083634.108.
- Received July 22, 2008.
- Accepted October 7, 2008.
Copyright © 2009, Cold Spring Harbor Laboratory Press