Computational approaches to predict bacteriophage-host relationships - PubMed

Review

Computational approaches to predict bacteriophage-host relationships

Robert A Edwards et al. FEMS Microbiol Rev. 2016 Mar.

Abstract

Metagenomics has changed the face of virus discovery by enabling the accurate identification of viral genome sequences without requiring isolation of the viruses. As a result, metagenomic virus discovery leaves the first and most fundamental question about any novel virus unanswered: What host does the virus infect? The diversity of the global virosphere and the volumes of data obtained in metagenomic sequencing projects demand computational tools for virus-host prediction. We focus on bacteriophages (phages, viruses that infect bacteria), the most abundant and diverse group of viruses found in environmental metagenomes. By analyzing 820 phages with annotated hosts, we review and assess the predictive power of in silico phage-host signals. Sequence homology approaches are the most effective at identifying known phage-host pairs. Compositional and abundance-based methods contain significant signal for phage-host classification, providing opportunities for analyzing the unknowns in viral metagenomes. Together, these computational approaches further our knowledge of the interactions between phages and their hosts. Importantly, we find that all reviewed signals significantly link phages to their hosts, illustrating how current knowledge and insights about the interaction mechanisms and ecology of coevolving phages and bacteria can be exploited to predict phage-host relationships, with potential relevance for medical and industrial applications.

Keywords: CRISPR; co-occurrence; metagenomics; oligonucleotide usage; phages; viruses of microbes.

Figures

Figure 1.

ROC curves displaying the classification accuracy of computational phage–host prediction approaches. (A) Pearson correlation of phage and bacterial abundance profiles across environments; (B) overall alignment length of blastn hits between phage and bacterial genome sequences; (C) number of matching proteins in blastx search of phage DNA to bacterial proteins; (D) percent identity of CRISPR spacers aligned to phage genomes; (E) number of matching CRISPR spacers in phage genomes; (F) length of longest exact match between phage and bacterial genomes; (G) Pearson correlation of oligonucleotide usage profiles (tetramers, k = 4, for other lengths of k, see Fig. S2, Supporting Information); (H) similarity in codon usage profiles of phage and bacterial coding regions; (I) similarity in GC content between phage and bacterial genomes. Note that in some ROC plots, the TP and FP rates do not continue to FP rate = 1; TP rate = 1. In those cases, we used cutoffs for assignment of a hit.
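
As an illustration of the compositional signal in panel (G), the following is a minimal sketch of comparing tetranucleotide usage profiles with a Pearson correlation. It is not the authors' implementation: the function names `kmer_profile` and `pearson` are hypothetical, only the forward strand is counted, and windows containing ambiguous bases are skipped, all of which are assumptions made here for brevity.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Relative k-mer frequencies over the ACGT alphabet, in a fixed order.

    Assumption: forward strand only; windows with non-ACGT letters are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


# Toy sequences, invented for illustration only.
phage_profile = kmer_profile("ACGTACGTTTGACCA")
host_profile = kmer_profile("ACGTACGTTAGACCA")
score = pearson(phage_profile, host_profile)
```

Under this sketch, a phage would be scored against every candidate host genome and the hosts ranked by `score`; the same pattern applies to other values of k by changing the `k` argument.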

Figure 2.

The identification of the number of phages matching a CRISPR spacer in a bacterial genome depends on the number of mismatches between the spacer and the phage genome. (A) Number of phages that match at least one CRISPR spacer in a given host; (B) number of phages that match at least two CRISPR spacers in a given host. Incorrect host predictions are shown with solid bars and correct host predictions are shown with grey bars.
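
The dependence on the mismatch threshold can be sketched as follows. This is a simplified illustration, not the pipeline used in the review: the helper names `min_mismatches` and `count_matching_spacers` are hypothetical, only ungapped forward-strand comparisons are made (a real search would also consider the reverse complement and typically use an aligner), and the sequences are toy data.

```python
def min_mismatches(spacer, genome):
    """Smallest Hamming distance between the spacer and any same-length
    window of the genome, or None if the genome is shorter than the spacer."""
    n = len(spacer)
    if n == 0 or len(genome) < n:
        return None
    best = n
    for i in range(len(genome) - n + 1):
        window = genome[i:i + n]
        distance = sum(a != b for a, b in zip(spacer, window))
        best = min(best, distance)
        if best == 0:
            break  # an exact match cannot be improved on
    return best


def count_matching_spacers(spacers, phage_genome, max_mismatches=0):
    """Number of spacers that match the phage genome within the threshold."""
    hits = 0
    for spacer in spacers:
        distance = min_mismatches(spacer, phage_genome)
        if distance is not None and distance <= max_mismatches:
            hits += 1
    return hits


# Toy data, invented for illustration only.
phage = "AAACGTTTGGGCCC"
host_spacers = ["ACGT", "GGGA", "TTTT"]
exact_hits = count_matching_spacers(host_spacers, phage, max_mismatches=0)
relaxed_hits = count_matching_spacers(host_spacers, phage, max_mismatches=1)
```

Relaxing `max_mismatches` increases the number of spacers counted as hits, which is the trade-off the figure describes: more phages receive a prediction, but some of the additional matches may point to incorrect hosts.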

Figure 3.

Histogram showing the length of the longest exact match for each phage, divided into correct and incorrect hosts. The approximate size ranges of several mechanisms leading to exact matches between phage and bacterial genomes are indicated. Note that multiple bacterial genomes can have the same longest exact match with a given phage, in which case they are all included.

Figure 4.

Percentage of phages with a correctly predicted bacterial species among the top scoring hosts using the different computational phage–host prediction approaches. Only the highest scoring bacteria were included, but if multiple top scoring hosts were present, the prediction was scored as correct if the correct host was among the predicted hosts. For details, including the percentage of phages with a correctly predicted host at different taxonomic levels, see Tables S1–18 (Supporting Information).
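
The scoring rule described in this caption (ties among top-scoring hosts count as correct if the true host is among them) can be sketched as below. The function names `top_hosts` and `percent_correct`, the dictionary layout, and the example scores are assumptions made for illustration; they are not taken from the paper or its supporting tables.

```python
def top_hosts(scores):
    """Set of hosts that share the highest score for one phage."""
    if not scores:
        return set()
    best = max(scores.values())
    return {host for host, value in scores.items() if value == best}


def percent_correct(predictions, known_hosts):
    """Percentage of phages whose annotated host is among the top-scoring hosts.

    predictions: phage -> {host: score}
    known_hosts: phage -> annotated host
    """
    if not known_hosts:
        return 0.0
    correct = sum(
        1
        for phage, true_host in known_hosts.items()
        if true_host in top_hosts(predictions.get(phage, {}))
    )
    return 100.0 * correct / len(known_hosts)


# Hypothetical scores for two phages against two candidate hosts.
predictions = {
    "phage_1": {"Escherichia coli": 5, "Bacillus subtilis": 5},  # tie
    "phage_2": {"Escherichia coli": 3, "Bacillus subtilis": 1},
}
known_hosts = {"phage_1": "Bacillus subtilis", "phage_2": "Bacillus subtilis"}
accuracy = percent_correct(predictions, known_hosts)
```

In this toy case the tie for `phage_1` is scored as correct and `phage_2` as incorrect. Evaluating at a higher taxonomic level would, under the same sketch, amount to mapping host names to genus or family before the comparison.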

Cited by

Diversity and distribution of viruses inhabiting the deepest ocean on Earth.
Jian H, Yi Y, Wang J, Hao Y, Zhang M, Wang S, Meng C, Zhang Y, Jing H, Wang Y, Xiao X. ISME J. 2021 Oct;15(10):3094-3110. doi: 10.1038/s41396-021-00994-y. Epub 2021 May 10. PMID: 33972725. Free PMC article.
A network-based integrated framework for predicting virus-prokaryote interactions.
Wang W, Ren J, Tang K, Dart E, Ignacio-Espinoza JC, Fuhrman JA, Braun J, Sun F, Ahlgren NA. NAR Genom Bioinform. 2020 Jun;2(2):lqaa044. doi: 10.1093/nargab/lqaa044. Epub 2020 Jun 23. PMID: 32626849. Free PMC article.
Ecology of inorganic sulfur auxiliary metabolism in widespread bacteriophages.
Kieft K, Zhou Z, Anderson RE, Buchan A, Campbell BJ, Hallam SJ, Hess M, Sullivan MB, Walsh DA, Roux S, Anantharaman K. Nat Commun. 2021 Jun 9;12(1):3503. doi: 10.1038/s41467-021-23698-5. PMID: 34108477. Free PMC article.
DeepPBI-KG: a deep learning method for the prediction of phage-bacteria interactions based on key genes.
Wei T, Lu C, Du H, Yang Q, Qi X, Liu Y, Zhang Y, Chen C, Li Y, Tang Y, Zhang WH, Tao X, Jiang N. Brief Bioinform. 2024 Sep 23;25(6):bbae484. doi: 10.1093/bib/bbae484. PMID: 39344712. Free PMC article.
Host-linked soil viral ecology along a permafrost thaw gradient.
Emerson JB, Roux S, Brum JR, Bolduc B, Woodcroft BJ, Jang HB, Singleton CM, Solden LM, Naas AE, Boyd JA, Hodgkins SB, Wilson RM, Trubl G, Li C, Frolking S, Pope PB, Wrighton KC, Crill PM, Chanton JP, Saleska SR, Tyson GW, Rich VI, Sullivan MB. Nat Microbiol. 2018 Aug;3(8):870-880. doi: 10.1038/s41564-018-0190-y. Epub 2018 Jul 16. PMID: 30013236. Free PMC article.
