A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes

Bas E Dutilh 1, Noriko Cassman 2, Katelyn McNair 3, Savannah E Sanchez 4, Genivaldo G Z Silva 5, Lance Boling 4, Jeremy J Barr 4, Daan R Speth 6, Victor Seguritan 4, Ramy K Aziz 7, Ben Felts 8, Elizabeth A Dinsdale 9, John L Mokili 4, Robert A Edwards 10

Bas E Dutilh et al. Nat Commun. 2014.

Abstract

Metagenomics, or sequencing of the genetic material from a complete microbial community, is a promising tool to discover novel microbes and viruses. Viral metagenomes typically contain many unknown sequences. Here we describe the discovery of a previously unidentified bacteriophage present in the majority of published human faecal metagenomes, which we refer to as crAssphage. Its ~97 kbp genome is six times more abundant in publicly available metagenomes than all other known phages together; it comprises up to 90% and 22% of all reads in virus-like particle (VLP)-derived metagenomes and total community metagenomes, respectively; and it totals 1.68% of all human faecal metagenomic sequencing reads in the public databases. The majority of crAssphage-encoded proteins match no known sequences in the database, which is why it was not detected before. Using a new co-occurrence profiling approach, we predict a Bacteroides host for this phage, consistent with Bacteroides-related protein homologues and a unique carbohydrate-binding domain encoded in the phage genome.

Figures

Figure 1. Schematic representation of the circular crAssphage genome.

The genome contains 80 ORFs that were predicted with Glimmer trained on _Caudovirales_. The total coverage of each nucleotide in the F2T1 metagenome, and in all public metagenomes in MG-RAST, is indicated (466 human faecal and 2,440 other metagenomes, as determined by blastn mapping: ≥75 bp aligned with ≥95% identity, see Methods). Green bars indicate the 36 regions that were validated by long-range PCR (see Table 2 and Supplementary Table 1). Selected regions of several PCR amplicons (indicated as light green regions in the green bars) were sequenced by Sanger dideoxynucleotide sequencing to validate that the amplicons were indeed derived from the crAssphage genome (Supplementary Table 1). See Supplementary Fig. 6 for the fully annotated figure.
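The mapping criterion in this caption (≥75 bp aligned with ≥95% identity) can be illustrated with a minimal sketch. It assumes the blastn hits are available in the standard 12-column tabular format (`-outfmt 6`), with the phage genome as the subject sequence. The function names and the per-position counting are illustrative assumptions, not the pipeline described in the paper's Methods.

```python
def passes_threshold(pident, length, min_len=75, min_identity=95.0):
    """Return True if a hit meets the caption's criterion:
    at least min_len aligned positions at min_identity percent identity."""
    return length >= min_len and pident >= min_identity


def coverage_from_hits(lines, genome_length, min_len=75, min_identity=95.0):
    """Count, for every genome position, how many accepted hits cover it.

    `lines` are rows of blastn tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The genome is assumed to be the subject, so sstart/send give the covered span.
    """
    coverage = [0] * genome_length
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        sstart, send = int(fields[8]), int(fields[9])
        if not passes_threshold(pident, length, min_len, min_identity):
            continue
        # Reverse-strand hits report sstart > send, so order the coordinates.
        lo, hi = sorted((sstart, send))
        # BLAST coordinates are 1-based and inclusive.
        for pos in range(lo - 1, min(hi, genome_length)):
            coverage[pos] += 1
    return coverage


# Hypothetical hits against a 100 bp toy genome: only the first passes both cut-offs.
example_hits = [
    "r1\tgenome\t98.0\t80\t1\t0\t1\t80\t11\t90\t1e-30\t140",
    "r2\tgenome\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t100",
    "r3\tgenome\t99.0\t50\t0\t0\t1\t50\t1\t50\t1e-15\t90",
]
example_coverage = coverage_from_hits(example_hits, genome_length=100)
```

Summing such per-position counts over many metagenomes would give a total-coverage track of the kind drawn around the genome map, under the assumptions stated above.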

Figure 2. CRISPR spacers similar to regions of the crAssphage genome.

CRISPR spacers were identified in 2,773 complete bacterial genomes from Genbank, and in 404 genomes of intestinal isolates from HMP and MetaHIT. The CRISPR spacers that were most similar to the crAssphage genome were found in _Prevotella intermedia_ 17 (Genbank genomes) and in _Bacteroides_ sp. 20_3 (HMP and MetaHIT genomes). Conserved A, C, G, and T nucleotides are displayed in red, green, yellow and blue, respectively.

Figure 3. Phage–host prediction based on co-occurrence across metagenomes.

Unrooted co-occurrence cladogram of correlated depth profiles across 151 HMP faecal metagenomes of the crAssphage, two known _Bacteroides fragilis_-infecting phages, and 404 potential hosts. Colours indicate bacterial phyla. The phages are indicated with blue dashed lines. See Supplementary Fig. 7 for the fully annotated figure.
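The idea behind co-occurrence profiling can be sketched as follows: a phage and its host should rise and fall together across samples, so their depth profiles across metagenomes should correlate. The sketch below ranks candidate hosts by Pearson correlation with the phage profile. The choice of Pearson correlation, the function names, and the toy profiles are assumptions made for illustration; the caption does not state which correlation measure or clustering method was used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length depth profiles.
    Returns 0.0 when either profile is constant (correlation undefined)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_hosts(phage_profile, host_profiles):
    """Rank candidate hosts by how well their depth across metagenomes
    tracks the phage's depth, best match first."""
    scores = {name: pearson(phage_profile, profile)
              for name, profile in host_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical depth profiles across five metagenomes.
phage = [0, 5, 10, 0, 20]
candidates = {
    "host_a": [1, 6, 11, 1, 21],   # tracks the phage closely
    "host_b": [20, 0, 1, 15, 0],   # abundant where the phage is absent
}
ranking = rank_hosts(phage, candidates)
```

In this toy case `host_a` ranks first because its profile is a shifted copy of the phage profile. A cladogram such as the one in the figure would additionally require clustering all pairwise distances, which this sketch omits.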

Figure 4. Abundance ubiquity plot of phage genomes in public metagenomes.

Reads from 2,944 publicly available shotgun metagenomes were aligned to a database of 1,193 phage genomes (see Methods). The average depth of aligned reads per nucleotide of the phage genome (abundance) is plotted against the number of metagenomes it is found in (ubiquity). See Supplementary Data 5 for details.
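The two axes of this plot can be made concrete with a short sketch. It assumes that, for one phage genome, the number of aligned bases in each metagenome is already known, and that abundance is pooled over all metagenomes before dividing by genome length. That pooling, the function name, and the example numbers are assumptions; the Methods and Supplementary Data 5 define the actual calculation.

```python
def abundance_and_ubiquity(aligned_bases_by_metagenome, genome_length):
    """Compute the two plot coordinates for one phage genome.

    abundance: average depth per nucleotide, here taken as the total
               aligned bases over all metagenomes divided by genome length
               (an assumed pooling strategy).
    ubiquity:  number of metagenomes with at least one aligned base.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    total_bases = sum(aligned_bases_by_metagenome.values())
    abundance = total_bases / genome_length
    ubiquity = sum(1 for bases in aligned_bases_by_metagenome.values() if bases > 0)
    return abundance, ubiquity


# Hypothetical counts for a 1,000 bp phage genome in three metagenomes.
example = abundance_and_ubiquity({"m1": 2000, "m2": 0, "m3": 1000}, genome_length=1000)
```

Here the phage is found in two of three metagenomes with a pooled average depth of 3.0, so it would be plotted at ubiquity 2 and abundance 3.0.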

Figure 5. Normalized coverage plot of the crAssphage genome in 940 public metagenomes.

Rows are metagenomes, with the sequence volume in nucleotides indicated to the right (see Supplementary Data 4 for the order and detailed annotations of the metagenomes). The x axis of the heat map displays the 97,065 bp length of the crAssphage genome sequence. The colour bar indicates the percentage of nucleotides in each metagenome that aligns to each position. Black arrowheads at the top of the figure indicate metaviromic islands. Details are available in Supplementary Data 4. Note that some of the metagenomes at the bottom of the plot that are annotated as ‘Plant-associated’ are also faecal metagenomes.
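The normalization behind the colour bar can be sketched in a few lines. The sketch assumes that each row of the heat map starts from a per-position depth vector and is divided by the total number of nucleotides sequenced in that metagenome, expressed as a percentage. The function name and the example values are hypothetical, and the exact procedure is the one given in the paper's Methods.

```python
def normalized_coverage(depth_per_position, metagenome_nucleotides):
    """Express the depth at each genome position as a percentage of all
    nucleotides sequenced in the metagenome, so that metagenomes of very
    different sequencing volume can share one colour scale."""
    if metagenome_nucleotides <= 0:
        raise ValueError("metagenome_nucleotides must be positive")
    return [100.0 * depth / metagenome_nucleotides for depth in depth_per_position]


# Hypothetical depths at three positions in a metagenome of 1,000 nucleotides.
row = normalized_coverage([0, 5, 10], 1000)
```

Positions with persistently low values across otherwise well-covered rows would correspond to the gaps marked by arrowheads in the figure.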
