Single-virus genomics reveals hidden cosmopolitan and abundant viruses

doi: 10.1038/ncomms15892.

Oscar Fornas 2 3, Monica Lluesma Gomez 1, Benjamin Bolduc 4, Maria Jose de la Cruz Peña 1, Joaquín Martínez Martínez 5, Josefa Anton 1, Josep M Gasol 6, Riccardo Rosselli 7, Francisco Rodriguez-Valera 7, Matthew B Sullivan 4 8, Silvia G Acinas 6, Manuel Martinez-Garcia 1

Francisco Martinez-Hernandez et al. Nat Commun. 2017.

Abstract

Microbes drive ecosystems under constraints imposed by viruses. However, a lack of virus genome information hinders our ability to answer fundamental biological questions concerning microbial communities. Here we apply single-virus genomics (SVGs) to assess whether portions of marine viral communities are missed by current techniques. The majority of the 44 viral single-amplified genomes (vSAGs) identified here are more abundant in global ocean virome data sets than published metagenome-assembled viral genomes or isolates. This indicates that vSAGs likely best represent the dsDNA viral populations dominating the oceans. Species-specific recruitment patterns and virome simulation data suggest that vSAGs are highly microdiverse and that microdiversity hinders metagenomic assembly, which could explain why their genomes have not been identified before. Altogether, SVGs enable the discovery of some of the likely most abundant and ecologically relevant marine viral species, such as vSAG 37-F6, which were overlooked by other methodologies.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Global viral protein-sharing network.

A total of 5,539 partial and full-length genomes, and 634,497 relationships (edges) from GOV, environmental phage from Genbank, archaeal and bacterial viral references (indicated by a black star, *), and vSAGs (this study, indicated by a black dot •, bold font) were included in the analyses. Only viral clusters—with each viral cluster indicated by unique colours—including ≥1 vSAG sequences are represented. Edges between nodes indicate a statistically significant weighted pairwise similarity between the protein profiles of each node (see Methods) with similarity scores ≥1. Viral clusters (italic font) are determined by applying the Markov Cluster Algorithm (MCL) to the edges. vSAG 37-F6 is indicated by a red star.

Figure 2. Relative abundance and distribution of surface marine viruses.

Virome and microbiome metagenomic fragment recruitments of marine viruses in each ocean. Rings represent the relative microbiome and virome recruitment frequency for each genomic data set, corresponding to the relative abundance of viral populations. The outer ring shows the microbiome recruitment at a ≥95% nucleotide identity threshold (species level). The inner and middle rings depict the virome recruitment at two different nucleotide identity cut-offs, ≥70% and ≥95%, corresponding to the genus and species levels, respectively (Supplementary Fig. 10 and Supplementary Notes 4 and 5). Viral genomic data sets used were: 40 surface vSAGs (this study), 179 reference virus isolates (Supplementary Table 9), 1,148 viral fosmids, 20 viral genomes from uncultured prokaryotic single cells and 3,018 surface viral contigs from the Tara expedition. For this calculation: (1) normalized recruitment, as the total recruited nucleotides (kb) per kb of viral genome per Gb of virome (KPKG), was estimated for each virus genome, (2) mean normalized recruitment was calculated for each virus genomic data set (see also Supplementary Fig. 11) and (3) each mean was divided by the sum of the means from all virus genomic data sets and expressed as relative recruitment. Statistically significant differences between the average recruitment frequency of the vSAGs and that of the other viral groups are indicated (***; ANOVA P value <0.001). Viromes and microbiomes from previous surveys used here are abbreviated as: Pacific Ocean (PO), Chile-Peru oceanic region (CP), South Atlantic (SA), Red Sea (RS), Mediterranean Sea (MS), Northwest Arabian Sea upwelling (NA), Indian Monsoon gyre province (IM), Eastern Africa Coastal Province (EA), Benguela Current (BC) and Sargasso Sea (SS). The microbiome and virome from Blanes Bay Microbial Observatory (BB), where surface vSAGs were obtained, were constructed in this study.
Global oceanic virome and microbiome fragment recruitment for each virus genomic data set is represented in the centre of the picture; circles represent the overall oceanic top-5 ranking of the most abundant viruses at the species level (≥95% identity).
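
The three-step calculation in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `kpkg` and `relative_recruitment`, the input layout and the example numbers are all hypothetical, and a single virome of known size (in Gb) is assumed.

```python
from statistics import mean

def kpkg(recruited_kb, genome_kb, virome_gb):
    """Step 1: kb of recruited nucleotides per kb of viral genome per Gb of virome."""
    return recruited_kb / genome_kb / virome_gb

def relative_recruitment(datasets, virome_gb):
    """Steps 2 and 3 for one virome.

    `datasets` (hypothetical layout) maps a data set name to a list of
    (recruited_kb, genome_kb) pairs, one pair per virus genome.
    """
    # Step 2: mean normalized recruitment per genomic data set.
    means = {
        name: mean(kpkg(r, g, virome_gb) for r, g in genomes)
        for name, genomes in datasets.items()
    }
    # Step 3: divide each mean by the sum of the means of all data sets.
    total = sum(means.values())
    return {name: m / total for name, m in means.items()}

# Made-up numbers, for illustration only.
example = {
    "vSAGs": [(300.0, 30.0), (100.0, 20.0)],
    "isolates": [(40.0, 40.0), (150.0, 50.0)],
}
shares = relative_recruitment(example, virome_gb=2.0)
```

The returned shares sum to 1 and would correspond to the ring segments for one ocean region. The sketch assumes at least one data set recruits reads; a virome with no recruitment at all would need a guard against division by zero.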

Figure 3. Biogeography of most abundant marine viruses.

Abundance of the most abundant surface dsDNA viruses in each virus genome data set, grouped by the procedure used for genome recovery: single-virus genomics (red), viruses from single bacterial cells (blue), viruses cloned in fosmids (grey), virus isolates (green) and viromics from the Tara Oceans data set (yellow). Fragment recruitment data were used to estimate the overall abundance for each region. Bubbles represent the fragment recruitment estimate expressed in KPKG (as in Fig. 2).

Figure 4. Ecogenomics of the putative most abundant surface marine virus, the vSAG 37-F6.

(a) Virome, microbial metagenome and proteome fragment recruitment in different data sets. A hypervariable genomic island in virus 37-F6 was detected between genomic positions 9,000 and 9,700 (unknown protein). (b) Genome annotation, synteny and whole-genome alignment of vSAG 37-F6 with its closest viral relatives. Colour in the alignment (from black to white) denotes identity values among all four genomes for each genome position. (c) Whole-genome similarity with the closest viral relatives. (d,e) Phylogeny of the large subunit of viral terminases (TerL) and the large conserved hypothetical protein X (HP X) based on the maximum-likelihood method. Bootstrap values are indicated at the nodes.

Figure 5. Capsid protein of vSAG 37-F6 and abundance in proteomic Tara viral data set.

(a) Peptide alignment of vSAG 37-F6 with the capsid proteins of cluster CAM_CRCL_773. For convenience, only eight protein sequences out of 152 total capsid proteins are shown. Coloured lines above the amino-acid sequence of vSAG 37-F6 represent the perfect matches of predicted peptide sequences from the Tara expedition (100% identity and query coverage). Colour denotes the origin of the peptides. Conserved amino-acid positions in the protein alignment are denoted with ‘*’. (b) Representative 3D-structural model, obtained with the I-TASSER prediction server, of the 37-F6 capsid protein compared with the nearest viral capsid proteins: the Tara Contig 67SUR_4106 and viruses from SAGs AAA160-P02 (Flavobacteria) and AAA164-I21 (Verrucomicrobia). (c) Total number of recruited peptides from the Tara expedition (100% identity and query coverage) for the top two most recruiting viruses from each viral genomic data set.

Figure 6. Assessment of natural vSAGs microdiversity and impact on metagenomic assembly.

(a) Species-specific recruitment patterns (also referred to as diversity curves) for vSAGs and highly abundant viral contigs from viromics. Curves represent the percentage of recruited reads (Y axis) at different nucleotide identity values (X axis) for vSAGs and Tara Oceans contigs in their own viromes. The five most recruiting viruses of each viral data set are shown for convenience. (b) SNP frequency for the most abundant viral populations at the species level (≥95% nucleotide identity) of vSAGs and viral contigs (within the top 30 ranking in recruitment) recovered by viromics from the Blanes Bay Microbial Observatory (same sampling site as the surface vSAGs) and the Tara Mediterranean MS022 data set. For the Blanes Bay Microbial Observatory, mean±s.d. of the most abundant viral contigs (25 contigs) and vSAGs (4 contigs) are shown. (c) Impact of viral diversity and microdiversity on genome reconstruction by metagenomics. Three populations of virus 37-F6 with different (micro)-diversities were simulated within the virome Tara MS022 (ref. 3) (see details in Supplementary Fig. 20 and Supplementary Note 5). Population A lacked microdiversity (two simulated nearly identical genomes of 37-F6 with 20 SNPs). A chimeric contig with a mixture of SNPs was obtained (SNPs in blue from simulated genome 1, and in red from vSAG 37-F6). Population B simulated a simplistic scenario with five genomes (ANI≥95%) without high genetic variability in the hypervariable genomic island (Fig. 4; Supplementary Fig. 14). The SPAdes assembler reconstructed a consensus contig from only one of the simulated genomes. Population C simulated a more realistic microdiverse scenario than that in panel A, with 10 simulated co-existing viruses (ANI 75-95%) and high variability in the genomic island (see details in Supplementary Fig. 20 and Supplementary Note 5). A genome was assembled almost entirely, but only from the distantly related viruses 7 and 9, whereas the 37-F6 genome could not be assembled.
Blue arrows depict the simulated genomes. Black blocks depict the contigs assembled by the IDBA_UD and SPAdes assemblers.
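
The diversity curves in panel (a) plot the percentage of recruited reads against nucleotide identity. A minimal Python sketch of that tabulation follows. It assumes that each recruited read already has a percent-identity value from some read mapper; the function name `diversity_curve`, the 1% binning, the 70% lower cut-off used here and the example values are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

def diversity_curve(identities, min_identity=70):
    """Percentage of recruited reads in each 1% nucleotide-identity bin.

    `identities` holds one percent-identity value per recruited read
    (hypothetical input). Reads below `min_identity` are ignored.
    """
    # Truncate each identity to its integer bin, e.g. 99.2 -> 99.
    kept = [int(i) for i in identities if min_identity <= i <= 100]
    if not kept:
        return {}
    counts = Counter(kept)
    n = len(kept)
    # One entry per bin so that empty bins appear as 0.0 on the curve.
    return {b: 100.0 * counts.get(b, 0) / n for b in range(min_identity, 101)}

# Made-up identities, for illustration only.
reads = [99.2, 98.7, 99.9, 100.0, 85.3, 72.0, 65.0, 99.1]
curve = diversity_curve(reads)
```

A population with little microdiversity would concentrate its reads in the bins close to 100%, whereas a microdiverse population would spread them towards lower identities.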

References

    1. Suttle C. A. Marine viruses-major players in the global ecosystem. Nat. Rev. Microbiol. 5, 801–812 (2007).
    2. Roux S. et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537, 689–693 (2016).
    3. Brum J. R. et al. Ocean plankton. Patterns and ecological drivers of ocean viral communities. Science 348, 1261498 (2015).
    4. Paez-Espino D. et al. Uncovering Earth’s virome. Nature 536, 425–430 (2016).
    5. Manrique P. et al. Healthy human gut phageome. Proc. Natl Acad. Sci. USA 113, 10400–10405 (2016).