Major Improvements to the Heliconius melpomene Genome Assembly Used to Confirm 10 Chromosome Fusion Events in 6 Million Years of Butterfly Evolution
2016, G3: Genes, Genomes, Genetics
The Heliconius butterflies are a widely studied adaptive radiation of 46 species spread across Central and South America, several of which are known to hybridize in the wild. Here, we present a substantially improved assembly of the Heliconius melpomene genome, developed using novel methods that should be applicable to improving other genome assemblies produced with short-read sequencing. First, we whole-genome sequenced a pedigree to produce a linkage map incorporating 99% of the genome. Second, we made extensive use of haplotype scaffolds to produce a more complete haploid version of the draft genome. Third, we added 20x coverage of Pacific Biosciences sequencing and scaffolded the haploid genome using an assembly of these long reads. These improvements yield a genome of 795 scaffolds totalling 275 Mb, with an N50 length of 2.1 Mb and an N50 number of 34, with 99% of the genome placed and 84% anchored on chromosomes. We use the new genome assembly to confirm that the Heliconius genome underwent 10 chromosome fusions since the split with its sister genus Eueides, over a period of about 6 million years.

KEYWORDS: Heliconius; genome assembly; linkage mapping; chromosome fusions; Eueides

Understanding evolution and speciation requires an understanding of genome architecture. Chromosome inversions can maintain phenotypic variation within a population (Lowry and Willis 2010; Joron et al. 2011; Wang et al. 2013) and may lead to species divergence (Noor et al. 2001; Feder and Nosil 2009) or to the spread of phenotypes by introgression (Kirkpatrick and Barrett 2015). Genetic divergence and genome composition are affected by variation in recombination rate (Nachman and Payseur 2012; Nam and Ellegren 2012). Gene flow between species can be extensive (Martin et al. 2013) and varies considerably across chromosomes (Via and West 2008; Weetman et al. 2012).
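The abstract summarizes contiguity with two statistics, an N50 length (2.1 Mb) and an N50 number (34). The sketch below uses a common definition of both. Sort scaffolds from longest to shortest and accumulate their lengths. The N50 length is the length of the scaffold at which the running total first reaches half of the assembly. The N50 number (often called L50) is how many scaffolds it took to get there. The function name `n50_stats` and the toy lengths are hypothetical and are not data from this assembly. The paper's own tooling may differ in detail, for example in how ties or gaps are handled.

```python
from typing import List, Tuple


def n50_stats(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50 length, N50 number) for a list of scaffold lengths.

    Hypothetical helper illustrating a common N50 definition:
    walk scaffolds from longest to shortest, keeping a running sum,
    and stop at the first scaffold where the running sum reaches at
    least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Compare 2 * running against total so odd totals need no floats.
        if 2 * running >= total:
            return length, count
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    # Toy assembly of five scaffolds (arbitrary units), total length 15.
    print(n50_stats([5, 4, 3, 2, 1]))
```

For the toy input, the two longest scaffolds (5 + 4 = 9) already cover more than half of 15, so the call returns `(4, 2)`. A higher N50 length and a lower N50 number both indicate a more contiguous assembly.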
Fully describing chromosome inversions, recombination rate variation, and gene flow requires genome assemblies of the study species that are as close to chromosome scale as possible. Recombination rate varies along chromosomes and is influenced by chromosome length (Fledel-Alon et al. 2009; Kawakami et al. 2014), and inversions are often hundreds of kilobases to megabases long. However, many draft genomes generated with short-read technologies contain thousands of scaffolds, often