Chromosome-scale reference genome assembly of a diploid potato clone derived from an elite variety

Haplotype-resolved genome analyses of a heterozygous diploid potato

Nature Genetics, 2020

Potato (Solanum tuberosum L.) is the most important tuber crop worldwide. Efforts are underway to transform the crop from a clonally propagated tetraploid into a seed-propagated, inbred-line-based hybrid, but this process requires a better understanding of the potato genome. Here, we report the 1.67-Gb haplotype-resolved assembly of a diploid potato, RH89-039-16, using a combination of multiple sequencing strategies, including circular consensus sequencing. Comparison of the two haplotypes revealed ~2.1% intragenomic diversity, including 22,134 predicted deleterious mutations in 10,642 annotated genes. In 20,583 pairs of allelic genes, 16.6% and 30.8% exhibited differential expression and methylation between alleles, respectively. Deleterious mutations and differentially expressed alleles were dispersed throughout both haplotypes, complicating strategies to eradicate deleterious alleles or stack beneficial alleles via meiotic recombination. This study offers a holistic view of the genom...

Possibilities and Challenges of the Potato Genome Sequence

Potato Research, 2014

This paper describes the progress that has been made since the draft genome sequence of potato was obtained and the analyses that still need to be done to make further progress. Although sequencing has become less expensive and read lengths have increased, making optimal use of the information obtained is still difficult, particularly in the tetraploid potato crop. Major challenges in potato genomics are standardized genome assembly and haplotype analysis. Sequencing methods need to be improved further to achieve precision breeding. With the current new generation sequencing technology, the focus in potato breeding will shift from phenotype improvement to genotype improvement. In this respect, it is essential to realize that different alleles of the same gene can lead to different phenotypes depending on the genetic background and that there is significant epistatic interaction between different alleles. Genome-wide association studies will gain statistical power when binary single nucleotide polymorphism (SNP) data can be replaced with multi-allelic haplotype data. Binary SNPs can be distributed across the many different alleles per locus or may be haplotype-specific, and may tag specific alleles that clearly differ in their contribution to a certain trait value. Assembling reads from the same linkage phase proved sufficient to construct haplotype tracts long enough to ensure their uniqueness. Combining large phenotyping data sets with modern approaches to sequencing and haplotype analysis and proper software will allow the efficiency of potato breeding to increase.

Genome sequence and analysis of the tuber crop potato

Nature, 2011

Potato (Solanum tuberosum L.) is the world's most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop.

Sequencing the Potato Genome: Outline and First Results to Come from the Elucidation of the Sequence of the World’s Third Most Important Food Crop

American Journal of Potato Research, 2009

Potato is a member of the Solanaceae, a plant family that includes several other economically important species, such as tomato, eggplant, petunia, tobacco and pepper. The Potato Genome Sequencing Consortium (PGSC) aims to elucidate the complete genome sequence of potato, the third most important food crop in the world. The PGSC is a collaboration between 13 research groups from China, India, Poland, Russia, the Netherlands, Ireland, Argentina, Brazil, Chile, Peru, USA, New Zealand and the UK. The potato genome consists of 12 chromosomes and has a (haploid) length of approximately 840 million base pairs, making it a medium-sized plant genome. The sequencing project builds on a diploid potato genomic bacterial artificial chromosome (BAC) clone library of 78000 clones, which has been fingerprinted and aligned into ~7000 physical map contigs. In addition, the BAC-ends have been sequenced and are publicly available. Approximately 30000 BACs are anchored to the Ultra High Density genetic map of potato, composed of 10000 unique AFLP™ markers. From this integrated genetic-physical map, between 50 and 150 seed BACs have currently been identified for every chromosome. Fluorescent in situ hybridization experiments on selected BAC clones confirm these anchor points. The seed clones provide the starting point for a BAC-by-BAC sequencing strategy. This strategy is being complemented by whole genome shotgun sequencing approaches using both 454 GS FLX and Illumina GA2 instruments. Assembly and annotation of the sequence data will be performed using publicly available and tailor-made tools. The availability of the annotated data will help to characterize germplasm collections based on allelic variance and to assist potato breeders to more fully exploit the genetic potential of potato.

A hybrid BAC physical map of potato: a framework for sequencing a heterozygous genome

BMC Genomics, 2011

Background: Potato is the world's third most important food crop, yet cultivar improvement and genomic research in general remain difficult because of the heterozygous and tetraploid nature of its genome. The development of physical map resources that can facilitate genomic analyses in potato has so far been very limited. Here we present the methods of construction and the general statistics of the first two genome-wide BAC physical maps of potato, which were made from the heterozygous diploid clone RH89-039-16 (RH). Results: First, a gel electrophoresis-based physical map was made by AFLP fingerprinting of 64478 BAC clones, which were aligned into 4150 contigs with an estimated total length of 1361 Mb. Screening of BAC pools, followed by the KeyMaps in silico anchoring procedure, identified 1725 AFLP markers in the physical map, and 1252 BAC contigs were anchored to the ultradense potato genetic map. A second, sequence-tag-based physical map was constructed from 65919 whole genome profiling (WGP) BAC fingerprints and these were aligned into 3601 BAC contigs spanning 1396 Mb. The 39733 BAC clones that overlap between both physical maps provided anchors to 1127 contigs in the WGP physical map, and reduced the number of contigs to around 2800 in each map separately. Both physical maps were 1.64 times longer than the 850 Mb potato genome. Genome heterozygosity and incomplete merging of BAC contigs are two factors that can explain this map inflation. The contig information of both physical maps was united in a single table that describes the hybrid potato physical map.

Construction of Reference Chromosome-Scale Pseudomolecules for Potato: Integrating the Potato Genome with Genetic and Physical Maps

The genome of potato, a major global food crop, was recently sequenced. The work presented here details the integration of the potato reference genome (DM) with a new sequence-tagged site marker-based linkage map and other physical and genetic maps of potato and the closely related species tomato. Primary anchoring of the DM genome assembly was accomplished by the use of a diploid segregating population, which was genotyped with several types of molecular genetic markers to construct a new 936 cM linkage map comprising 2469 marker loci. In silico anchoring approaches used genetic and physical maps from the diploid potato genotype RH89-039-16 (RH) and tomato. This combined approach has allowed 951 superscaffolds to be ordered into pseudomolecules corresponding to the 12 potato chromosomes. These pseudomolecules represent 674 Mb (~93%) of the 723 Mb genome assembly and 37,482 (~96%) of the 39,031 predicted genes. The superscaffold order and orientation within the pseudomolecules are closely collinear with independently constructed high density linkage maps. Comparisons between marker distribution and physical location reveal regions of greater and lesser recombination, as well as regions exhibiting significant segregation distortion. The work presented here has led to a greatly improved ordering of the potato reference genome superscaffolds into chromosomal "pseudomolecules".
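The anchoring percentages quoted in this abstract can be rechecked from the stated totals. The short sketch below only redoes that arithmetic; the variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract above.
assembly_mb = 723        # total genome assembly size (Mb)
anchored_mb = 674        # sequence placed in the 12 pseudomolecules (Mb)
genes_total = 39031      # predicted genes in the assembly
genes_anchored = 37482   # predicted genes located on pseudomolecules

# Fractions of sequence and of genes that were anchored.
frac_sequence = anchored_mb / assembly_mb
frac_genes = genes_anchored / genes_total

print(f"anchored sequence: {frac_sequence:.1%}")
print(f"anchored genes:    {frac_genes:.1%}")
```

Both ratios round to the ~93% and ~96% given in the text.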

Construction of a 10,000-Marker Ultradense Genetic Recombination Map of Potato: Providing a Framework for Accelerated Gene Isolation and a Genomewide Physical Map

An ultradense genetic linkage map with >10,000 AFLP loci was constructed from a heterozygous diploid potato population. To our knowledge, this is the densest meiotic recombination map ever constructed. A fast marker-ordering algorithm was used, based on the minimization of the total number of recombination events within a given marker order in combination with genotyping error-detection software. This resulted in "skeleton bin maps," which can be viewed as the most parsimonious marker order. The unit of distance is not expressed in centimorgans but in "bins." A bin is a position on the genetic map with a unique segregation pattern that is separated from adjacent bins by a single recombination event. Putative centromeres were identified by a strong clustering of markers, probably due to cold spots for recombination. Conversely, recombination hot spots resulted in large intervals of up to 15 cM without markers. The current level of marker saturation suggests that marker density is proportional to physical distance and independent of recombination frequency. Most chromatids (92%) recombined once or never, suggesting strong chiasma interference. Absolute chiasma interference within a chromosome arm could not be demonstrated. Two examples of contig construction and map-based cloning have demonstrated that the marker spacing was in accordance with the expected physical distance: approximately one marker per BAC length. Currently, the markers are used for genetic anchoring of a physical map of potato to deliver a sequence-ready minimal tiling path of BAC contigs of specific chromosomal regions for the potato genome sequencing consortium (http://www.potatogenome.net).
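The bin concept described in this abstract can be illustrated with a toy example. The sketch below is not the authors' fast ordering algorithm: it groups hypothetical markers that share a segregation pattern into bins, then finds the bin order with the fewest recombination events by brute force, which is only feasible for a handful of bins. The marker names and patterns are invented for illustration.

```python
from itertools import permutations

# Hypothetical segregation patterns: one character per offspring,
# 'a' or 'b' for the parental allele inherited at that marker.
markers = {
    "m1": "aaabbb",
    "m2": "aaabbb",   # same pattern as m1, so it falls in the same bin
    "m3": "aaabba",
    "m4": "aabbba",
}

def build_bins(marker_patterns):
    """Group markers with identical segregation patterns into bins."""
    bins = {}
    for name, pattern in marker_patterns.items():
        bins.setdefault(pattern, []).append(name)
    return bins

def recombination_events(pattern_a, pattern_b):
    """Count offspring whose allele differs between two patterns."""
    return sum(a != b for a, b in zip(pattern_a, pattern_b))

def total_recombinations(order):
    """Sum recombination events between adjacent bins in an order."""
    return sum(recombination_events(a, b) for a, b in zip(order, order[1:]))

bins = build_bins(markers)

# Exhaustive search for the most parsimonious order (toy sizes only).
best = min(permutations(bins), key=total_recombinations)

for pattern in best:
    print(pattern, bins[pattern])
print("total recombination events:", total_recombinations(best))
```

In this toy data each adjacent pair of bins in the best order differs by exactly one recombination event, matching the abstract's definition of a bin. The reverse order is equally parsimonious, since the criterion cannot distinguish map orientation.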

Cultivar-specific transcriptome and pan-transcriptome reconstruction of tetraploid potato

Background: Although the reference genome of Solanum tuberosum group Phureja double-monoploid (DM) clone is available, knowledge on the genetic diversity of the highly heterozygous tetraploid group Tuberosum, representing most cultivated varieties, remains largely unexplored. This lack of knowledge hinders further progress in potato research and its subsequent applications in breeding. Results: For the DM genome assembly, two only partially overlapping gene models exist, differing in a unique set of genes and intron/exon structure predictions. The first step was to merge the two gene models and manually curate the result, creating a union of genes in the Phureja scaffold. We next compiled available RNA-Seq datasets (ca. 1.5 billion reads) for three tetraploid potato genotypes (cultivar Désirée, cultivar Rywal, and breeding clone PW363) with diverse breeding pedigrees. Short-read transcriptomes were assembled using CLC, Trinity, Velvet, and rnaSPAdes de novo assemblers using different settings to tes...