Clades of huge phages from across Earth's ecosystems

Nature. 2020 Feb;578(7795):425-431.

doi: 10.1038/s41586-020-2007-4. Epub 2020 Feb 12.

Rohan Sachdeva # 1, Lin-Xing Chen 1, Fred Ward 1, Patrick Munk 2, Audra Devoto 1, Cindy J Castelle 1, Matthew R Olm 1, Keith Bouma-Gregson 3, Yuki Amano 4, Christine He 1, Raphaël Méheust 1, Brandon Brooks 1, Alex Thomas 1, Adi Lavy 1, Paula Matheus-Carnevali 1, Christine Sun 5, Daniela S A Goltsman 5, Mikayla A Borton 6, Allison Sharrar 3, Alexander L Jaffe 1, Tara C Nelson 7, Rose Kantor 1, Ray Keren 1, Katherine R Lane 1, Ibrahim F Farag 1, Shufei Lei 3, Kari Finstad 8, Ronald Amundson 8, Karthik Anantharaman 3, Jinglie Zhou 9, Alexander J Probst 1, Mary E Power 10, Susannah G Tringe 9, Wen-Jun Li 11, Kelly Wrighton 6, Sue Harrison 12, Michael Morowitz 13, David A Relman 5, Jennifer A Doudna 1, Anne-Catherine Lehours 14, Lesley Warren 7, Jamie H D Cate 1, Joanne M Santini 15, Jillian F Banfield 16 17 18 19

Basem Al-Shayeb et al. Nature. 2020 Feb.

Abstract

Bacteriophages typically have small genomes [1] and depend on their bacterial hosts for replication [2]. Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is, to our knowledge, the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems.

Conflict of interest statement

The Regents of the University of California have patents pending for CRISPR technologies on which the authors are inventors. J.A.D. is a co-founder of Caribou Biosciences, Editas Medicine, Intellia Therapeutics, Scribe Therapeutics and Mammoth Biosciences; a scientific advisory board member of Caribou Biosciences, Intellia Therapeutics, eFFECTOR Therapeutics, Scribe Therapeutics, Synthego, Mammoth Biosciences and Inari; and a Director at Johnson & Johnson, and has research projects sponsored by Biogen and Pfizer. J.F.B. is a founder of Metagenomi.

Figures

Fig. 1. Distribution of the genome sizes and tRNAs of phages.

a, Size distribution of circularized bacteriophage genomes from this study, Lak megaphage genomes reported recently for a subset of the same samples and reference sources. Reference genomes were collected from all complete RefSeq r92 dsDNA genomes and non-artefactual assemblies with lengths of more than 200 kb from a previous study. b, Histogram of the genome size distribution of phages with genomes of more than 200 kb from this study, Lak and reference genomes. Box-and-whisker plot of tRNA counts per genome from this study and Lak phages as a function of genome size (Spearman’s ρ = 0.61, P = 4.5 × 10^-22, n = 201 individual phage genomes). The middle line for each box marks the median tRNA count for each size bin, the box marks the interquartile range, and the whiskers represent the maximum and minimum.
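
The rank correlation quoted in this caption can be illustrated with a minimal sketch. The genome sizes and tRNA counts below are invented toy values, not data from the study, and the helper names `average_ranks` and `spearman_rho` are hypothetical. The code computes Spearman's ρ as the Pearson correlation of tie-averaged ranks; it does not compute a P value.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

# Toy values only (assumption): genome sizes in kb and tRNA counts per genome.
genome_kb = [210, 250, 310, 400, 520, 735]
trna_counts = [3, 1, 8, 12, 10, 40]

rho = spearman_rho(genome_kb, trna_counts)
print(f"Spearman rho = {rho:.3f}")
```

Working on ranks rather than raw values means a single very large genome does not dominate the statistic, which suits a size distribution with a long upper tail.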

Fig. 2. Phylogenetic reconstruction of the evolutionary history of huge phages.

The phylogeny of phages was reconstructed using large terminase sequences from this study (n = 397) and similar matches from all RefSeq r92 proteins (n = 532). The tree also includes large terminase sequences from complete RefSeq phage, the Lak megaphage clade (n = 9) and non-artefactual phage genomes that are more than 200 kb, from a previous study. Huge phage clades identified in this study were independently corroborated with a phylogenetic reconstruction of major capsid protein (MCP) genes (Extended Data Fig. 5a) and protein clustering (Extended Data Fig. 5b). The tree was rooted using eukaryotic herpesvirus terminases (n = 7). The inner to outer rings display the presence of CRISPR–Cas in this study, host phylum, environmental sampling type and genome size. Host phylum and genome size were not included for RefSeq protein database matches for which the sequence may be from an integrated prophage or part of organismal genome projects. Scale bars show the number of substitutions per site (left) and number of base pairs (right).

Fig. 3. A model for phage interception and redirection of host translational systems.

Potential mechanisms for how phage-encoded capacities could function to redirect the translational system of the host to produce phage proteins (bacterial components in blue, phage proteins in red). No huge phage encodes all translation-related genes, but many have tRNAs and tRNA synthetases (Supplementary Table 6). Phage proteins with up to six ribosomal protein S1 domains occur in a few genomes. The S1 binds to mRNA to bring it into the site on the ribosome where it is decoded. Phage ribosomal protein S21 might promote translation initiation of phage mRNAs, and many sequences have N-terminal extensions that may be involved in binding RNA (dashed blue line in ribosome insert (RCSB Protein Data Bank (PDB) code: 6BU8)), analysed with UCSF Chimera. Many other proteins of the translational apparatus that belong to all steps of the translation cycle are encoded by huge phages. aaRS, aminoacyl-tRNA synthetase; CCA-adding, tRNA nucleotidyltransferase; EF, elongation factor; IF, initiation factor; PDF, peptide deformylase; QueC/D/F, queuosine synthesis and tRNA modification; RF, release factor; RNA Pol, RNA polymerase; RRF, ribosome recycling factor; TFIIB, transcription factor II B.

Fig. 4. Phage and bacterial CRISPR-interaction dynamics.

a, Cell diagram of bacterium–phage and phage–phage interactions that involve CRISPR targeting during superinfection. Arrows indicate CRISPR–Cas targeting of the prophage and phage genomes. Phage names indicate related groups delineated by whole-genome alignment. We only included CRISPR interactions from samples of subjects of the same human cohort. b, Maximum likelihood phylogenetic tree of Cas12 subtypes a–i. Phage-encoded Cas12i and CasΦ, the new effector, are outlined in red, with bacteria-encoded proteins in blue. Bootstrap values >90 are shown on the branches (circles). Cas14 and type V-U trees are provided separately (Supplementary Fig. 11). Scale bar indicates the number of substitutions per site. c, Top, alignment of the consensus repeats from the A9 phage array and predicted host bacterial arrays. Bottom, interaction network showing the targeting of bacteria-encoded (blue) and phage-encoded (red) CRISPR spacers. The number of edges indicates the number of spacers from the array with targets to the smaller node. Solid edges denote spacer targets with no or one mismatch, and dashed edges denote two to three mismatches (to account for degeneration in old-end phage spacers, diversity in different subjects or phage mutation to avoid targeting).
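
The solid and dashed edge rule in panel c can be sketched as a simple mismatch count. This is an illustration only: it assumes an ungapped, equal-length comparison between a spacer and a candidate target, the sequences are invented, and the function names `count_mismatches` and `edge_style` are hypothetical rather than part of the study's pipeline.

```python
def count_mismatches(spacer, target):
    """Count positions at which two equal-length sequences differ (ungapped)."""
    if len(spacer) != len(target):
        raise ValueError("spacer and target must have the same length")
    return sum(a != b for a, b in zip(spacer.upper(), target.upper()))

def edge_style(spacer, target):
    """Map a mismatch count to the edge style used in the caption.

    0-1 mismatches -> "solid", 2-3 mismatches -> "dashed",
    more than 3 -> None (no edge drawn).
    """
    mismatches = count_mismatches(spacer, target)
    if mismatches <= 1:
        return "solid"
    if mismatches <= 3:
        return "dashed"
    return None

# Invented example sequences (assumption), 20 nt each.
spacer = "ACGTTGCAAGGCTTAACCGT"
targets = {
    "exact": "ACGTTGCAAGGCTTAACCGT",
    "two_mismatches": "TCGTTGCAAGGCTTAACCGA",
    "four_mismatches": "TGGTTGCAAGGCTTAAGCGA",
}

for name, target in targets.items():
    print(name, count_mismatches(spacer, target), edge_style(spacer, target))
```

Allowing up to three mismatches trades specificity for sensitivity, which matches the caption's stated aim of tolerating spacer degeneration and phage escape mutations.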

Extended Data Fig. 1. Graphical abstract describing the approach and main findings of this study.

aaRS, aminoacyl-tRNA synthetase; CCA-adding, tRNA nucleotidyltransferase; EF, elongation factor; IF, initiation factor; PDF, peptide deformylase; QueC/D/F, queuosine synthesis and tRNA modification; RF, release factor; RNA Pol, RNA polymerase; RRF, ribosome recycling factor; TFIIB, transcription factor II B.

Extended Data Fig. 2. Ecosystems with phage genomes and plasmid-like sequences of more than 200 kb.

Genomes grouped by sampling-site type. Each box represents a phage genome or plasmid-like sequence, and boxes are horizontally arranged in order of decreasing genome size. The size range for each site type is listed to the right. Colours indicate putative host phylum on the basis of genome taxonomic profile, with confirmation by CRISPR spacer targeting (X) or rpS21 (+).

Extended Data Fig. 3. Examples of phage genomes that display GC skew indicative of bidirectional replication.

a, b, Example phage genomes with GC skew patterns that are strongly indicative of bidirectional replication (origin-to-terminus) that is typically found in bacteria (however, the origin may not correspond to the start of the genome). c, d, Phage genomes with GC skew patterns that are suggestive of unidirectional patterns.
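
GC skew is conventionally computed per window as (G − C)/(G + C), and the cumulative sum of these values is often used to locate the points where the skew changes sign. The sketch below assumes non-overlapping windows and uses a synthetic toy sequence, not a genome from the study; `gc_skew` and `cumulative` are hypothetical helper names.

```python
def gc_skew(seq, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C contribute a skew of zero.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

def cumulative(values):
    """Running sum of a list of numbers."""
    total, out = 0.0, []
    for value in values:
        total += value
        out.append(total)
    return out

# Synthetic sequence (assumption): a G-rich half followed by a C-rich half.
toy_genome = "GGGCAT" * 50 + "CCCGAT" * 50

skews = gc_skew(toy_genome, window=60)
cum = cumulative(skews)
peak_window = cum.index(max(cum))  # window after which the skew flips sign
print(skews)
print("cumulative skew peaks at window", peak_window)
```

In this toy case the cumulative skew rises through the first half and falls through the second, giving a single peak at the switch point; a pattern with one minimum and one maximum around a circular genome is the kind usually read as bidirectional replication.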

Extended Data Fig. 4. Example of the alternative coding of phages.

Comparisons of gene predictions for a region with genes of clearly predicted function in M05_PHAGE_COMPLETE_32_3. a, The standard (code 11) genetic code. b, Both TAG and TAA repurposed (code 6). c, TAG repurposed (code 16). Overall, analysis of well-annotated genes supported code 16 as the best choice (TAG to X, as X could not be clearly resolved on the basis of sequence alignments with related proteins).
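
The effect of repurposing a stop codon on gene prediction can be shown with a small translation sketch. It assumes a single forward reading frame and an invented open reading frame; the reassigned residue is written as X, as in the caption, because the encoded amino acid was not resolved. The `translate` function and `toy_orf` are hypothetical names for illustration.

```python
from itertools import product

# Standard genetic code (used by code 11), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq, reassigned=None):
    """Translate frame 0 up to the first stop codon.

    `reassigned` optionally maps codons to residues, overriding the
    standard table (for example {"TAG": "X"} for a repurposed TAG).
    """
    table = dict(STANDARD)
    table.update(reassigned or {})
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = table[seq[i:i + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Invented ORF (assumption) with an in-frame TAG before the final TAA.
toy_orf = "ATGGCTTAGAAAGGTTAA"

standard = translate(toy_orf)               # TAG read as a stop
recoded = translate(toy_orf, {"TAG": "X"})  # TAG read through as X
print(standard, recoded)
```

Under the standard code the in-frame TAG truncates the product, whereas reading TAG as a sense codon yields a longer protein; fragmented predictions of this kind are what motivate testing alternative codes.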

Extended Data Fig. 5. Phylogenetic and protein-cluster relationships between phages.

a, The phylogenetic tree of phages based on the MCPs. The outer ring shows genome length; bars in red indicate genomes reconstructed and reported in this study and bars in blue indicate database genomes. The next ring indicates the environment of origin. The inner ring indicates the phylum of the host (black indicates unknown). Superimposed colours indicate named clades that consist of huge phages that were identified in the terminase tree. Colours are as in Fig. 2. b, Hierarchical clustering dendrogram of phage genomes based on the Jaccard distance between the presence or absence profiles of protein families, performed using an average linkage method. The outermost ring shows phage genome length, the next ring shows the environment of origin, then predicted phylum affiliation of bacterial hosts. Superimposed colours indicate named clades that consist of huge phages that were identified in the terminase tree. Colours are as in Fig. 2. The clustering supports the phylogenetic analyses shown in a and Fig. 2.
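
The distance measure behind the dendrogram in panel b can be sketched as follows. The genome names and protein-family identifiers are invented, and `jaccard_distance` and `average_linkage` are hypothetical helpers; the sketch shows only how pairwise Jaccard distances and an average-linkage cluster distance are computed, not a full clustering run.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of protein-family identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise Jaccard distance between members of two clusters."""
    pairs = [(x, y) for x in cluster_a for y in cluster_b]
    total = sum(jaccard_distance(profiles[x], profiles[y]) for x, y in pairs)
    return total / len(pairs)

# Invented presence profiles (assumption): genome -> protein families present.
profiles = {
    "phage_A": {"fam1", "fam2", "fam3", "fam4"},
    "phage_B": {"fam1", "fam2", "fam3"},
    "phage_C": {"fam5", "fam6"},
}

distances = {
    pair: jaccard_distance(profiles[pair[0]], profiles[pair[1]])
    for pair in combinations(sorted(profiles), 2)
}
closest = min(distances, key=distances.get)  # first pair to be merged
merged_vs_c = average_linkage(["phage_A", "phage_B"], ["phage_C"], profiles)
print(distances, closest, merged_vs_c)
```

Because the Jaccard distance ignores families absent from both genomes, two large genomes are not scored as similar merely for lacking the same rare families.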

Extended Data Fig. 6. Protein-clustering network for phages and plasmids.

Network analysis using vContact2 and Cytoscape based on the number of shared protein clusters between the genomes in this study, RefSeq prokaryotic virus genomes and 400 randomly sampled plasmid sequences from RefSeq. Each node represents a genome and each edge is the hypergeometric similarity (>30) between genomes based on shared protein clusters. This analysis was used to help classify genomes as phage, plasmid or unknown.

Extended Data Fig. 7. Phylogenetic analysis of tRNA synthetase.

a, Aminoacyl tRNA synthetases were detected in many huge phages reported in this study (Supplementary Table 6). The phylogenetic subtree for glutamate-tRNA synthetase sequences from phages (red text and small triangles) that place within or close to sequences from Bacteroidetes hosts is shown as an example. Bacterial sequences from public databases are indicated by black text and those from metagenomes from which huge phage genomes were reconstructed are indicated by blue text. Coloured circles indicate the predicted phylum of the bacterial host for each phage. b, Phylogenetic tree of phage-encoded ribosomal protein S21 and the top RefSeq hits for each protein, constructed using IQTREE. Sequences from this study are indicated by red branches.

Extended Data Fig. 8. Phylogenetic trees of Cas14, CRISPR–Cas type V-U and Cas9.

a, Phylogenetic tree for Cas14 and type V-U. b, Phylogenetic tree for Cas9. Sequences from this study are indicated by red branches.

Extended Data Fig. 9. Variant type I CRISPR–Cas system and Cas4-like proteins found in the genomes of huge phages.

a, Locus architecture for type-I variant CRISPR phages. An interesting type-I system identified in huge phages lacks Cas6 but has Cas5, which is most similar to the Cas5d protein from type I-C, in which Cas5d acts as the pre-crRNA endonuclease (a role commonly reserved for Cas6). The proposed active site residues of Cas5d are to some extent different in the Cas5 of this system, although this may still confer processing activity, as this change is also observed in other Cas6 homologues. b, Phylogenetic tree of superfamily 1–6 helicases, including Cas3 and the unidentified helicase in the type I-C variant system. Sequences from this study are indicated by red branches. c, Phylogenetic tree of Cas4, Cas4-like proteins from the phage and plasmid genomes reported here, and the top 50 RefSeq hits to the Cas4-like proteins. Cas4-like genes from this study are denoted by red circles.

Extended Data Fig. 10. Distribution of phage- and plasmid-encoded CRISPR array sizes.

The indicated count is the number of recovered repeats.

References

    1. Yuan Y, Gao M. Jumbo bacteriophages: an overview. Front. Microbiol. 2017;8:403.
    2. Breitbart M, Bonnain C, Malki K, Sawaya NA. Phage puppet masters of the marine microbial realm. Nat. Microbiol. 2018;3:754–766.
    3. Rascovan N, Duraisamy R, Desnues C. Metagenomics and the human virome in asymptomatic individuals. Annu. Rev. Microbiol. 2016;70:125–141.
    4. Emerson JB, et al. Host-linked soil viral ecology along a permafrost thaw gradient. Nat. Microbiol. 2018;3:870–880.
    5. Balcazar JL. Bacteriophages as vehicles for antibiotic resistance genes in the environment. PLoS Pathog. 2014;10:e1004219.
