A technical guide to TRITEX, a computational pipeline for chromosome-scale sequence assembly of plant genomes

Marina Püpke Marone et al. Plant Methods. 2022.

Abstract

Background: As complete and accurate genome sequences are becoming easier to obtain, more researchers wish to get one or more of them to support their research endeavors. Reliable and well-documented sequence assembly workflows find use in reference or pangenome projects.

Results: We describe modifications to the TRITEX genome assembly workflow motivated by the rise of fast and easy long-read contig assembly of inbred plant genomes and the routine deployment of the toolchains in pangenome projects. New features include the use of reference genome sequences as surrogates of or complements to dense genetic maps, and the introduction of user-editable tables to make the curation of contig placements easier and more intuitive.

Conclusion: Even maximally contiguous sequence assemblies of the telomere-to-telomere sort require validation, correction, and comparison to reference standards, and fragmented assemblies do so to an even greater extent. As pangenomics is burgeoning, these tasks are bound to become more widespread, and TRITEX is one tool to get them done. This technical guide is supported by a step-by-step computational tutorial available at https://tritexassembly.bitbucket.io/. The TRITEX source code is hosted at https://bitbucket.org/tritexassembly.

Keywords: Chromosome conformation capture sequencing; Genetic map; Genome sequence assembly; Long-read sequence assembly; Pangenome.

© 2022. The Author(s).

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1

Graphical overview of the TRITEX pipeline. Steps in red boxes are run as Unix shell scripts; those in blue boxes are run at the R prompt

Fig. 2

Manual curation in TRITEX's correct-map-inspect cycle. (A) The Hi-C contacts show a pattern indicative of an inversion in the terminal contig. The orientation is swapped in the Excel table and a new Hi-C matrix is computed with the updated configuration. The revised Hi-C matrix has fewer off-diagonal signals. (B) Hi-C contacts show a pattern indicative of a misplaced contig. The order of the final two rows is reversed in the Excel table and the Hi-C matrix is computed with the new configuration. The revised Hi-C matrix has fewer off-diagonal signals
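
The two table edits described in this caption can be illustrated with a minimal Python sketch. It is not TRITEX code (the pipeline itself runs in shell and R, and the table is edited in Excel); the `Placement` record, its field names, and the contig identifiers are hypothetical and serve only to show what "swap the orientation" and "reverse the final two rows" mean for a contig placement table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Placement:
    """One row of a hypothetical contig placement table."""
    contig: str       # illustrative contig identifier
    chrom: str        # chromosome the contig is assigned to
    orientation: int  # +1 for forward, -1 for reverse


def flip_orientation(table, contig):
    """Return a new table in which the named contig's orientation is reversed."""
    return [
        replace(row, orientation=-row.orientation) if row.contig == contig else row
        for row in table
    ]


def swap_last_two(table):
    """Return a new table in which the order of the final two rows is reversed."""
    if len(table) < 2:
        return list(table)
    return list(table[:-2]) + [table[-1], table[-2]]


# Row order in the list stands for contig order along the chromosome.
table = [
    Placement("contig_1", "chr8", 1),
    Placement("contig_2", "chr8", 1),
    Placement("contig_3", "chr8", 1),
]

flipped = flip_orientation(table, "contig_3")  # the case in panel A
swapped = swap_last_two(table)                 # the case in panel B
```

In the workflow the caption describes, each such edit is followed by recomputing the Hi-C matrix and inspecting it again; that step is outside the scope of this sketch.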

Fig. 3

Hi-C contact matrices of reference-guided pseudomolecules (A and C) and collinearity of guide map markers to the maize B73 RefGen_v5 reference genome sequence (B and D). Grey lines mark contig boundaries

Fig. 4

Marker-guided Hi-C map construction. Contact matrix (A), collinearity with the maize B73 RefGen_v5 reference genome sequence (B), and collinearity with the IBM guide map (C) for chromosome 8. Analogous plots for other chromosomes are shown in Additional file 1: Fig. S4

Fig. 5

Impact of Hi-C data density on assembly quality. Hi-C maps were constructed from downsampled Hi-C matrices. Results for chromosome 8 are shown. Hi-C contact matrices (A–C). Collinearity of Hi-C maps guided by the maize RefGen_v5 reference genomes (D–F). Collinearity of the Hi-C maps and their underlying guide maps. The header of each column reports the downsampling level, the number of Hi-C links after thinning, how many of these are on chromosome 8 (in parentheses), and the correlation coefficient between marker positions in the Hi-C map and the reference genome

Fig. 6

Impact of guide map density on assembly quality. Hi-C maps guided by genetic maps of different sizes were constructed. Results for chromosome 8 are shown. Hi-C contact matrices (A–C). Collinearity of Hi-C maps guided by the maize RefGen_v5 reference genomes (D–F). Collinearity of the Hi-C maps and their underlying guide maps (G–I). The header of each column reports the downsampling level, the number of markers in the downsampled guide map, the marker count on chromosome 8 (in parentheses), and the correlation coefficient between marker positions in the Hi-C map and the reference genome
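
The downsampling experiments in Figs. 5 and 6 report two quantities per column: how many links or markers remain after thinning, and a correlation coefficient between marker positions in the Hi-C map and in the reference genome. The Python sketch below illustrates both under stated assumptions. It is not the authors' implementation: uniform random thinning, the use of Pearson's coefficient (the captions do not name the type), the function names, and the toy positions are all assumptions made for illustration.

```python
import math
import random


def thin(items, fraction, seed=0):
    """Keep a uniformly random subset of `items` (assumed thinning scheme)."""
    rng = random.Random(seed)
    keep = round(len(items) * fraction)
    return rng.sample(items, keep)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long position lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data: 100 hypothetical Hi-C links, each a pair of bin indices.
links = [(i, i + 1) for i in range(100)]
thinned = thin(links, 0.1)

# Toy marker positions: perfectly collinear, then fully inverted.
hic_map_pos = [1, 2, 3, 4, 5]
reference_pos = [10, 20, 30, 40, 50]
r_collinear = pearson(hic_map_pos, reference_pos)
r_inverted = pearson(hic_map_pos, reference_pos[::-1])
```

A coefficient near 1 indicates that the marker order in the Hi-C map agrees with the reference, while a strongly negative value over a region points to an inversion relative to it.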
