Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut

Natalya Yutin et al. Nat Microbiol. 2018 Jan.

Abstract

Metagenomic sequence analysis is rapidly becoming the primary source of virus discovery [1-3]. A substantial majority of the currently available virus genomes come from metagenomics, and some of these represent extremely abundant viruses, even though they have never been grown in the laboratory. A particularly striking case of a virus discovered via metagenomics is crAssphage, which is by far the most abundant human-associated virus known, comprising up to 90% of sequences in the gut virome [4]. Over 80% of the predicted proteins encoded in the approximately 100 kilobase crAssphage genome showed no significant similarity to available protein sequences, precluding classification of this virus and hampering further study. Here we combine a comprehensive search of genomic and metagenomic databases with sensitive methods for protein sequence analysis to identify an expansive, diverse group of bacteriophages related to crAssphage and predict the functions of the majority of phage proteins, in particular those that comprise the structural, replication and expression modules. Most, if not all, of the crAss-like phages appear to be associated with diverse bacteria from the phylum Bacteroidetes, whose members include some of the most abundant bacteria in the human gut microbiome and are also common in various other habitats. These findings provide a basis for experimental characterization of the most abundant but poorly understood members of the human-associated virome.

Conflict of interest statement

Competing interests

The authors declare no competing financial interests.

Figures

Figure 1. Architecture and evolution of the capsid gene module of the crAss-like phage family

The phylogenetic tree was constructed from concatenated multiple alignments of the 5 proteins of the capsid module. The genomic maps of the capsid gene block are shown for each branch. The 5 genes of the capsid module are color-coded, and uncharacterized adjacent genes are shown by empty block arrows. The colors of labels and branches indicate host or metagenome source: red, human gut or fecal metagenome; green, termite gut metagenome; purple, terrestrial/groundwater; brown, soda lake (hypersaline brine); turquoise, marine sediment; orange, Populus root microbiome; black, _Flavobacterium psychrophilum_ 950106-1/1 (fish pathogen). Tree branch colors indicate the DNA polymerase family represented in the respective genome (contig): purple, family B DNAP; red, family A DNAP; green, no DNAP; black, unknown (incomplete genomes). Support values were obtained using 100 bootstrap replications; values greater than 50% are shown. The scale bar represents the number of amino acid substitutions per site. No outgroup was included because of the low (or absent) similarity between the crAss-like family proteins and homologs from other phages.

Figure 2. Whole genome maps of crAssphage and IAS virus, the two members of the crAss-like family that are abundant in the human gut virome

Conserved crAss-like family genes are color-coded. Dashed boxes highlight capsid, tail, replication, and transcription gene blocks. Genome regions encoding proteins with strong similarity to Bacteroidetes are shaded in pale green. Gene numbers are according to the crAssphage and IAS virus MetaGeneMark translations. Abbreviations: ssb, single-stranded DNA-binding protein; SF1, SF1 helicase; UDG, uracil-DNA glycosylase; PolB, DNA polymerase family B; SF2, SNF2-family helicase; RecT, phage RecT recombinase; primase, DnaG family primase; ligase, ATP-dependent DNA ligase; dNK, deoxynucleoside monophosphate kinase; ThyX, flavin-dependent thymidylate synthase; Gp157, Siphovirus Gp157; dUTP, dUTPase; N4_gp49, phage protein of N4_gp49/Sf6_gp66 family; RepL, plasmid replication initiation protein RepL; IHF, integration host factor IHF subunit; PD-(D/E)XK, PD-(D/E)XK family nuclease; Rep_Org, putative replisome organizer protein; DnaB, DnaB replicative DNA helicase; AAA, AAA domain ATPase; rIIA, rIIA-like protector protein; rIIB, rIIB-like protector protein; NRDD, anaerobic ribonucleoside-triphosphate reductase; RNR, anaerobic ribonucleoside-triphosphate reductase activating protein; PolA, DNA polymerase family A; DprA, DNA processing protein DprA. For further details on the annotation, see Supplementary Table 2.

Figure 3. The replicative gene module of the crAss-like phage family

A. The crAssphage group. B. The rest of the crAss-like family. Homologous genes are marked by the same colors and labels. Genes with no predicted function are numbered according to crAssphage translation, OBV-13 virus translation (suffix ‘b’), Cellulophaga phage phi14:2 translation (suffix ‘c’), or IAS virus translation (suffix ‘i’). Abbreviations: RNRm, class II ribonucleotide reductase; RNRa, ribonucleoside reductase alpha chain; RNRb, ribonucleoside reductase beta chain; GGCT, gamma-glutamyl cyclotransferase; Gn_AT, glucosamine-fructose-6-phosphate aminotransferase; NTP-PPase, nucleoside triphosphate pyrophosphohydrolase. The rest of the abbreviations are as in Figure 2.

Figure 4. The genome expression gene module of the crAss-like phage family

The predicted RNAP subunits, as well as the RNAP and protease motifs, are color-coded as shown at the bottom of the figure. The PD-DxK nucleases are most likely encoded in group I introns.

References

    1. Rohwer F. Global phage diversity. Cell. 2003;113:141.
    2. Suttle CA. Marine viruses--major players in the global ecosystem. Nat Rev Microbiol. 2007;5:801–812. doi: 10.1038/nrmicro1750.
    3. Simmonds P, et al. Consensus statement: Virus taxonomy in the age of metagenomics. Nat Rev Microbiol. 2017;15:161–168. doi: 10.1038/nrmicro.2016.177.
    4. Dutilh BE, et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat Commun. 2014;5:4498. doi: 10.1038/ncomms5498.
    5. Dutilh BE. Metagenomic ventures into outer sequence space. Bacteriophage. 2014;4:e979664. doi: 10.4161/21597081.2014.979664.