Detecting heterozygosity in shotgun genome assemblies: Lessons from obligately outcrossing nematodes

  1. Antoine Barrière1,
  2. Shiaw-Pyng Yang2,
  3. Elizabeth Pekarek1,
  4. Cristel G. Thomas3,
  5. Eric S. Haag3,4 and
  6. Ilya Ruvinsky1,4

  1 Department of Ecology and Evolution and Institute for Genomics and Systems Biology, The University of Chicago, Chicago, Illinois 60637, USA;
  2 Genome Sequencing Center, Washington University, St. Louis, Missouri 63108, USA;
  3 Department of Biology and Molecular and Cell Biology Program, University of Maryland, College Park, Maryland 20742, USA

Abstract

The majority of nematodes are gonochoristic (dioecious) with distinct male and female sexes, but the best-studied species, Caenorhabditis elegans, is a self-fertile hermaphrodite. The sequencing of the genomes of C. elegans and a second hermaphrodite, C. briggsae, was facilitated in part by the low amount of natural heterozygosity, which typifies selfing species. Ongoing genome projects for gonochoristic Caenorhabditis species seek to approximate this condition by intense inbreeding prior to sequencing. Here we show that despite this inbreeding, the heterozygous fraction of the whole-genome shotgun (WGS) assemblies of three gonochoristic Caenorhabditis species, C. brenneri, C. remanei, and C. japonica, is considerable. We first demonstrate experimentally that independently assembled sequence variants in C. remanei and C. brenneri are allelic. We then present gene-based approaches for recognizing heterozygous regions of WGS assemblies. We also develop a simple method for quantifying heterozygosity that can be applied to assemblies lacking gene annotations. Consistently, we find that ∼10% and 30% of the C. remanei and C. brenneri genomes, respectively, are represented by two alleles in the assemblies. Heterozygosity is restricted to autosomes, and its retention is accompanied by substantial inbreeding depression, suggesting that it is caused by multiple recessive deleterious alleles and not merely by chance. Both the overall amount and the chromosomal distribution of heterozygous DNA are highly variable between assemblies of close relatives produced by identical methodologies, and allele frequencies have continued to change after strains were sequenced. Our results highlight the impact of mating systems on genome sequencing projects.

Footnotes