Fast and flexible simulation of DNA sequence data

Gary K. Chen (1,2), Paul Marjoram (1) and Jeffrey D. Wall (2,3)

(1) Department of Preventive Medicine, University of Southern California, Los Angeles, California 90033, USA
(2) Institute for Human Genetics and Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94143, USA

Abstract

Simulation of genomic sequences under the coalescent with recombination has conventionally been impractical for regions beyond tens of megabases. This work presents an algorithm, implemented as the program MaCS (Markovian Coalescent Simulator), that can efficiently simulate haplotypes under any arbitrary model of population history. We present several metrics comparing the performance of MaCS with other available simulation programs. Practical usage of MaCS is demonstrated through a comparison of measures of linkage disequilibrium between generated program output and real genotype data from populations considered to be structured.

Footnotes