RNA viromes from terrestrial sites across China expand environmental viral diversity
Data availability
The sequence reads generated in this study are available in the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA716119. All viral sequences generated in this study have been deposited in GenBank under accession numbers MW784004-MW784109, MW896840-MW897324, MZ218144-MZ218759, MZ556337-MZ556592, MZ678955-MZ680357, ON049747-ON050964 and ON161767-ON164489 (https://www.ncbi.nlm.nih.gov/nuccore?term=716119%5BBioProject%5D). All other data are available in the paper or in the supplementary materials. The CheckV database used to estimate viral genome quality and completeness is available at https://bitbucket.org/berkeleylab/checkv. The Conserved Domain Database (CDD) used for ORF annotation is available at https://www.ncbi.nlm.nih.gov/cdd/. The UniRef30_2021_03 database used in the HHblits analysis is available at http://wwwuser.gwdg.de/~compbiol/uniclust/2021_03/. The SILVA database used for rRNA removal is available at https://www.arb-silva.de/. Source data are provided with this paper.
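The GenBank accessions above are given as inclusive ranges. As a minimal sketch, the ranges can be expanded into individual accession numbers, for example to build a batch-download list. The helper `expand_range` is hypothetical, and it assumes that both ends of a range share the same letter prefix and digit width and that every number in between belongs to the range.

```python
import re

# Accession ranges copied from the Data availability statement.
ACCESSION_RANGES = [
    "MW784004-MW784109",
    "MW896840-MW897324",
    "MZ218144-MZ218759",
    "MZ556337-MZ556592",
    "MZ678955-MZ680357",
    "ON049747-ON050964",
    "ON161767-ON164489",
]

# A GenBank nucleotide accession: a letter prefix followed by a zero-padded number.
_ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def expand_range(spec: str) -> list[str]:
    """Expand an inclusive 'START-END' accession range into individual accessions.

    Hypothetical helper: assumes START and END share a prefix and digit width.
    """
    start, end = spec.split("-")
    m_start, m_end = _ACCESSION.match(start), _ACCESSION.match(end)
    if m_start is None or m_end is None:
        raise ValueError(f"not an accession range: {spec!r}")
    if m_start.group(1) != m_end.group(1):
        raise ValueError(f"prefix mismatch in {spec!r}")
    prefix, width = m_start.group(1), len(m_start.group(2))
    first, last = int(m_start.group(2)), int(m_end.group(2))
    if last < first:
        raise ValueError(f"range end precedes start in {spec!r}")
    # Re-pad with zeros so that, e.g., ON049747 keeps its leading zero.
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


if __name__ == "__main__":
    accessions = [acc for spec in ACCESSION_RANGES for acc in expand_range(spec)]
    print(len(accessions), accessions[0], accessions[-1])
```

Whether each expanded accession resolves to a record is best checked against GenBank itself rather than assumed from the range notation.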
References
- Shi, M. et al. Redefining the invertebrate RNA virosphere. Nature 540, 539–543 (2016).
- Zhang, Y.-Z., Shi, M. & Holmes, E. C. Using metagenomics to characterize an expanding virosphere. Cell 172, 1168–1172 (2018).
- Li, C.-X. et al. Unprecedented genomic diversity of RNA viruses in arthropods reveals the ancestry of negative-sense RNA viruses. eLife 4, e05378 (2015).
- Starr, E. P., Nuccio, E. E., Pett-Ridge, J., Banfield, J. F. & Firestone, M. K. Metatranscriptomic reconstruction reveals RNA viruses with the potential to shape carbon cycling in soil. Proc. Natl Acad. Sci. USA 116, 25900–25908 (2019).
- Wolf, Y. I. et al. Doubling of the known set of RNA viruses by metagenomic analysis of an aquatic virome. Nat. Microbiol. 5, 1262–1270 (2020).
- Zayed, A. A. et al. Cryptic and abundant marine viruses at the evolutionary origins of Earth’s RNA virome. Science 376, 156–162 (2022).
- Simmonds, P. et al. Virus taxonomy in the age of metagenomics. Nat. Rev. Microbiol. 15, 161–168 (2017).
- Trubl, G., Hyman, P., Roux, S. & Abedon, S. T. Coming-of-age characterization of soil viruses: a user’s guide to virus isolation, detection within metagenomes, and viromics. Soil Syst. 4, 23 (2020).
- Jin, M. et al. Diversities and potential biogeochemical impacts of mangrove soil viruses. Microbiome 7, 58 (2019).
- Trubl, G. et al. Soil viruses are underexplored players in ecosystem carbon processing. mSystems 3, e00076-18 (2018).
- Steward, G. F. et al. Are we missing half of the viruses in the ocean? ISME J. 7, 672–679 (2013).
- Paul, K. I., Scott Black, A. & Conyers, M. K. in Advances in Agronomy Vol. 78 (ed. Sparks, D. L.) 187–214 (Elsevier, 2003).
- Urayama, S., Takaki, Y. & Nunoura, T. FLDS: a comprehensive dsRNA sequencing method for intracellular RNA virus surveillance. Microbes Environ. 31, 33–40 (2016).
- Armbrust, E. V. The life of diatoms in the world’s oceans. Nature 459, 185–192 (2009).
- Wu, W., Jin, Y., Bai, F. & Jin, S. in Molecular Medical Microbiology (eds Tang, Y.-W., Liu, D., Schwartzman, J., Sussman, M. & Poxton, I.) 753–767 (Elsevier, 2015).
- Cooney, S., O’Brien, S., Iversen, C. & Fanning, S. in Encyclopedia of Food Safety (ed. Motarjemi, Y.) 433–441 (Elsevier, 2014).
- Geoghegan, J. L. et al. Hidden diversity and evolution of viruses in market fish. Virus Evol. 4, vey031 (2018).
- Lauber, C. et al. Deciphering the origin and evolution of hepatitis B viruses by means of a family of non-enveloped fish viruses. Cell Host Microbe 22, 387–399.e6 (2017).
- Shi, M., Zhang, Y.-Z. & Holmes, E. C. Meta-transcriptomics and the evolutionary biology of RNA viruses. Virus Res. 243, 83–90 (2018).
- Turnbull, O. M. H. et al. Meta-transcriptomic identification of divergent Amnoonviridae in fish. Viruses 12, 1254 (2020).
- Bauermann, F. V., Hause, B., Buysse, A. R., Joshi, L. R. & Diel, D. G. Identification and genetic characterization of a porcine hepe-astrovirus (bastrovirus) in the United States. Arch. Virol. 164, 2321–2326 (2019).
- Oude Munnink, B. B. et al. A novel astrovirus-like RNA virus detected in human stool. Virus Evol. 2, vew005 (2016).
- Williamson, K. E. et al. Estimates of viral abundance in soils are strongly influenced by extraction and enumeration methods. Biol. Fertil. Soils 49, 857–869 (2013).
- Wang, C., Liu, D. & Bai, E. Decreasing soil microbial diversity is associated with decreasing microbial biomass under nitrogen addition. Soil Biol. Biochem. 120, 126–133 (2018).
- Wang, Q. et al. Effects of nitrogen and phosphorus inputs on soil bacterial abundance, diversity, and community composition in Chinese fir plantations. Front. Microbiol. 9, 1543 (2018).
- Payne, S. in Viruses (ed. Payne, S.) 219–226 (Elsevier, 2017).
- Hillman, B. I. & Cai, G. The family Narnaviridae. Adv. Virus Res. 86, 149–176 (2013).
- Wolf, Y. I. et al. Origins and evolution of the global RNA virome. mBio 9, e02329-18 (2018).
- Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
- Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
- Wu, F. et al. A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020).
- Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
- Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
- Capella-Gutierrez, S., Silla-Martinez, J. M. & Gabaldon, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
- Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
- Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analysis in R. Bioinformatics 35, 526–528 (2019).
- Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
- Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
- Nayfach, S. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat. Biotechnol. 39, 578–585 (2021).
- Almagro Armenteros, J. J. et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 37, 420–423 (2019).
- Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580 (2001).
- Gupta, R., Jung, E. & Brunak, S. NetNGlyc 1.0 Server (2017). DTU Health Tech. http://www.cbs.dtu.dk/services/NetNGlyc/
- Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 45, D170–D176 (2017).
- Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 9, 173–175 (2012).
- Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2012).
- Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
- Lagkouvardos, I., Fischer, S., Kumar, N. & Clavel, T. Rhea: a transparent and modular R pipeline for microbial profiling based on 16S rRNA gene amplicons. PeerJ 5, e2836 (2017).
- McLeod, A., Xu, C. & Lai, Y. Package ‘bestglm’. CRAN (2020).
- Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Acknowledgements
We thank D.-X. Wang, K. Li, W.-B. Zhao, X.-N. Diao, A.-J. Gong, Y.-L. Zhang, J.-B. Wang, H. Luo, D.-A. Zhang, Y.-Q. Zhao and M.-Li for their contributions to sample collection, and X.-Q. Luo, R.-X. Hu, M.-Z. Liu, J. Liu, Y. Jiang, J.-J. Guo, J.-J. Wang and P. Lu for assisting with PCR confirmations. This study was supported by the National Natural Science Foundation of China (grant nos. 32130002, 31930001, 32041004, 81861138003 and 81672057 to Y.-Z.Z.) and the National Key R&D Program of China (2016YFC1201900 to Y.-Z.Z.). E.C.H. was supported by an ARC Australian Laureate Fellowship (FL170100022).
Author information
Author notes
- These authors contributed equally: Yan-Mei Chen, Sabrina Sadiq.
Authors and Affiliations
- Shanghai Public Health Clinical Center, Shanghai Key Laboratory of Organ Transplantation of Zhongshan Hospital, State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
  Yan-Mei Chen, Wen Wang, Edward C. Holmes & Yong-Zhen Zhang
- Sydney Institute for Infectious Diseases, School of Life and Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney, Australia
  Sabrina Sadiq, Michelle Wille & Edward C. Holmes
- Wuhan Center for Disease Control and Prevention, Wuhan, Hubei, China
  Jun-Hua Tian
- College of Marine Sciences, South China Agricultural University, Guangzhou, Guangdong, China
  Xiao Chen
- Wenzhou Center for Disease Control and Prevention, Wenzhou, Zhejiang, China
  Xian-Dan Lin
- Yancheng Center for Disease Control and Prevention, Yancheng, Jiangsu, China
  Jin-Jin Shen & Feng Li
- Jiangsu Yancheng Wetland National Nature Reserve of Rare Birds, Yancheng, Jiangsu, China
  Hao Chen
- Henan Center for Disease Control and Prevention, Zhengzhou, Henan, China
  Zong-Yu Hao
- Professional Committee of Native Aquatic Organisms and Water Ecosystem of China Fisheries Association, Beijing, China
  Zhuo-Cheng Zhou
- Jiyuan People’s Hospital, Jiyuan, Henan, China
  Jun Wu
- Neixiang Center for Disease Control and Prevention, Nanyang, Henan, China
  Hong-Wei Wang
- College of Ocean and Earth Science, Xiamen University, Xiamen, Fujian, China
  Wei-Di Yang
- Yili Prefecture Center for Disease Control and Prevention, Yili, China
  Qi-Yi Xu
- Department of Zoonosis, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
  Wen Wang & Wen-Hua Gao
Authors
- Yan-Mei Chen
- Sabrina Sadiq
- Jun-Hua Tian
- Xiao Chen
- Xian-Dan Lin
- Jin-Jin Shen
- Hao Chen
- Zong-Yu Hao
- Michelle Wille
- Zhuo-Cheng Zhou
- Jun Wu
- Feng Li
- Hong-Wei Wang
- Wei-Di Yang
- Qi-Yi Xu
- Wen Wang
- Wen-Hua Gao
- Edward C. Holmes
- Yong-Zhen Zhang
Contributions
Y.-Z.Z. conceived and designed the study. Y.-M.C., J.-H.T., X.C., X.-D.L., J.-J.S., H.C., Z.-Y.H., W.-D.Y., Z.-C.Z., J.W., F.L., H.-W.W. and Q.-Y.X. performed sample collection and geographic information recording. S.S., Y.-M.C., M.W. and E.C.H. analysed the data. Y.-M.C., W.-H.G. and W.W. performed the experiments. S.S., Y.-M.C., E.C.H. and Y.-Z.Z. wrote the paper with input from all authors. Y.-Z.Z. led the study.
Corresponding author
Correspondence to Yong-Zhen Zhang.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Environments and Chinese provinces sampled in this study.
Extended Data Fig. 2 Viral composition of each library.
Relative proportions were calculated as the number of reads corresponding to contigs with viral hits to each viral clade, divided by the total number of viral reads, in each of 442 biologically independent samples (analysis performed using DIAMOND BLASTX).
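The per-library proportion described in this caption can be sketched as follows. This is a minimal illustration, assuming each viral contig has already been assigned to a single clade (for example from its top DIAMOND BLASTX hit) and that read counts per contig are available; the function name, contig identifiers and clade labels are hypothetical.

```python
from collections import Counter


def clade_proportions(contig_clade: dict[str, str],
                      contig_reads: dict[str, int]) -> dict[str, float]:
    """Share of viral reads per clade in one library.

    contig_clade: viral contig ID -> clade assigned from its BLASTX hit (assumed input).
    contig_reads: contig ID -> number of reads assigned to that contig (assumed input).
    """
    reads_per_clade = Counter()
    for contig, clade in contig_clade.items():
        reads_per_clade[clade] += contig_reads.get(contig, 0)
    # Denominator: all reads on contigs with a viral hit in this library.
    total_viral_reads = sum(reads_per_clade.values())
    if total_viral_reads == 0:
        return {}
    return {clade: n / total_viral_reads for clade, n in reads_per_clade.items()}


if __name__ == "__main__":
    clades = {"contig_1": "Narnaviridae", "contig_2": "Picornavirales", "contig_3": "Narnaviridae"}
    reads = {"contig_1": 30, "contig_2": 50, "contig_3": 20}
    print(clade_proportions(clades, reads))
```

Reads that map to non-viral contigs are excluded from the denominator, so the proportions within each library sum to one.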
Extended Data Fig. 3 Ecological factors significantly associated with viral abundance in environmental viruses.
(A) Model-averaged effect sizes (with 95% confidence intervals) of ecological metadata on viral abundance for 265 biologically independent samples (identical to Fig. 6a). Factors significant in the best subset selected model (p < 0.05) are shown in orange; their confidence intervals do not overlap the central line, and they are denoted with an asterisk (*). Boxplots show abundance values plotted against (B) environment, (C) location, (D) total phosphorus, (E) total potassium, (F) available phosphorus, (G) available potassium, (H) organic content, (I) eukaryote species abundance and (J) eukaryote species richness. In all boxplots, the horizontal box lines represent the first quartile, the median and the third quartile; whiskers span the points lying between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
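The captions refer to best subset model selection and to model-averaged effect sizes. The reference list cites the R package bestglm, but the averaging step is not spelled out here, so the following is only a sketch of one common approach: Gaussian linear models fitted to every subset of predictors, scored by AIC, and combined with Akaike weights, where a coefficient absent from a model counts as zero. It uses plain-Python least squares rather than bestglm, omits confidence intervals, and the predictor names and data are synthetic.

```python
import itertools
import math


def ols(X: list[list[float]], y: list[float]) -> tuple[list[float], float]:
    """Ordinary least squares via the normal equations; returns (coefficients, RSS)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        if abs(piv) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][p] for i in range(p)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return beta, rss


def all_subsets(predictors: dict[str, list[float]], y: list[float]) -> list[tuple]:
    """Fit y ~ intercept + each subset of predictors, scoring every fit by Gaussian AIC."""
    n = len(y)
    fits = []
    for k in range(len(predictors) + 1):
        for subset in itertools.combinations(predictors, k):
            X = [[1.0] + [predictors[name][i] for name in subset] for i in range(n)]
            beta, rss = ols(X, y)
            # AIC up to an additive constant: n*ln(RSS/n) + 2*(number of coefficients).
            aic = n * math.log(rss / n) + 2 * (k + 1)
            fits.append((subset, dict(zip(("intercept",) + subset, beta)), aic))
    return fits


def model_average(fits: list[tuple]) -> tuple[tuple, dict[str, float], list[float]]:
    """Return the lowest-AIC subset and Akaike-weighted average coefficients."""
    best_aic = min(aic for _, _, aic in fits)
    raw = [math.exp(-(aic - best_aic) / 2) for _, _, aic in fits]
    total = sum(raw)
    weights = [w / total for w in raw]
    averaged: dict[str, float] = {}
    for (_, coefs, _), w in zip(fits, weights):
        for name, value in coefs.items():
            # Models lacking a term contribute zero to its average.
            averaged[name] = averaged.get(name, 0.0) + w * value
    best_subset = min(fits, key=lambda fit: fit[2])[0]
    return best_subset, averaged, weights


if __name__ == "__main__":
    # Synthetic data: y depends on x1 only; x2 is an irrelevant covariate.
    x1 = [float(i) for i in range(1, 21)]
    x2 = [((i * 7) % 11) / 11 for i in range(20)]
    y = [1.0 + 3.0 * a + (0.3 if i % 2 else -0.3) for i, a in enumerate(x1)]
    best, coefs, _ = model_average(all_subsets({"x1": x1, "x2": x2}, y))
    print(best, round(coefs["x1"], 3))
```

Exhaustive enumeration grows as 2^k in the number of predictors, which is practical only for a modest set of ecological covariates; bestglm's default criterion (BIC) may also differ from the AIC used here.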
Extended Data Fig. 4 Ecological factors significantly associated with viral abundance in animal-associated viruses.
(A) Model-averaged effect sizes (with 95% confidence intervals) of ecological metadata on viral abundance for 265 biologically independent samples (identical to Fig. 7a). Factors significant in the best subset selected model (p < 0.05) are shown in orange; their confidence intervals do not overlap the central line, and they are denoted with an asterisk (*). Boxplots show abundance values plotted against (B) location, (C) organic content and (D) eukaryote species abundance. In all boxplots, the horizontal box lines represent the first quartile, the median and the third quartile; whiskers span the points lying between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
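The boxplot convention used in these captions (quartiles plus whiskers bounded by 1.5 times the interquartile range) can be reproduced with Python's standard library. This sketch assumes the R/ggplot2 default quantile definition (type 7), which corresponds to `statistics.quantiles(..., method="inclusive")`; other quantile definitions can shift the box edges slightly on small samples. The function name is hypothetical.

```python
import statistics


def boxplot_summary(values: list[float]) -> dict[str, object]:
    """Quartiles, whisker ends and outliers for one boxplot group."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


if __name__ == "__main__":
    print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```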
Extended Data Fig. 5 Ecological factors significantly associated with Shannon diversity in environmental viruses.
(A) Model-averaged effect sizes (with 95% confidence intervals) of ecological metadata on Shannon diversity for 265 biologically independent samples (identical to Fig. 6c). Factors significant in the best subset selected model (p < 0.05) are shown in orange; their confidence intervals do not overlap the central line, and they are denoted with an asterisk (*). Boxplots show Shannon diversity values plotted against (B) environment, (C) location, (D) pH, (E) total nitrogen, (F) total potassium, (G) available nitrogen, (H) organic content and (I) eukaryote species richness. In all boxplots, the horizontal box lines represent the first quartile, the median and the third quartile; whiskers span the points lying between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
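For reference, the Shannon index is H = −Σ p_i ln p_i, where p_i is the proportion of the sample's abundance belonging to viral taxon i. A minimal sketch, assuming natural logarithms and per-taxon read counts as input (the exact abundance measure and log base are not stated in this section); the function name is hypothetical.

```python
import math


def shannon_diversity(counts: list[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    # Zero counts are skipped: p * ln(p) tends to 0 as p tends to 0.
    proportions = (c / total for c in counts if c > 0)
    return -sum(p * math.log(p) for p in proportions)


if __name__ == "__main__":
    print(shannon_diversity([10, 10, 10, 10]))
```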
Extended Data Fig. 6 Ecological factors significantly associated with true diversity in environmental viruses.
(A) Model-averaged effect sizes (with 95% confidence intervals) of ecological metadata on true diversity for 265 biologically independent samples (identical to Fig. 6d). Factors significant in the best subset selected model (p < 0.05) are shown in orange; their confidence intervals do not overlap the central line, and they are denoted with an asterisk (*). Boxplots show true diversity values plotted against (B) environment, (C) location, (D) pH, (E) total nitrogen, (F) total potassium, (G) available nitrogen, (H) organic content, (I) eukaryote species richness and (J) eukaryote species true diversity. In all boxplots, the horizontal box lines represent the first quartile, the median and the third quartile; whiskers span the points lying between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
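"True diversity" is commonly defined as an effective number of taxa, the Hill number of order q: D_q = (Σ p_i^q)^(1/(1−q)), which tends to exp(H) as q approaches 1. This section does not state which order was used, so the sketch below defaults to the order-1 form (the exponential of Shannon diversity) as an assumption; the function name is hypothetical.

```python
import math


def hill_number(counts: list[float], q: float = 1.0) -> float:
    """Effective number of taxa (Hill number of order q); q=1 gives exp(Shannon)."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    if not proportions:
        return 0.0
    if math.isclose(q, 1.0):
        # Limit q -> 1: exponential of the Shannon index.
        return math.exp(-sum(p * math.log(p) for p in proportions))
    return sum(p ** q for p in proportions) ** (1.0 / (1.0 - q))


if __name__ == "__main__":
    print(hill_number([10, 10, 10, 10]), hill_number([50, 30, 15, 5], q=2))
```

For a perfectly even community every order gives the number of taxa, and q = 0 reduces to plain richness; uneven communities give values that shrink as q grows.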
Extended Data Fig. 7 Ecological factors significantly associated with Shannon diversity in animal-associated viruses.
(A) Model-averaged effect sizes (with 95% confidence intervals) of ecological metadata on Shannon diversity for 265 biologically independent samples (identical to Fig. 7c). Factors significant in the best subset selected model (p < 0.05) are shown in orange; their confidence intervals do not overlap the central line, and they are denoted with an asterisk (*). The boxplot shows Shannon diversity values plotted against (B) environment. In all boxplots, the horizontal box lines represent the first quartile, the median and the third quartile; whiskers span the points lying between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
Extended Data Fig. 8 Ecological factors significantly associated with true diversity in animal-associated viruses.
(A) Model-averaged effect sizes (with 95% confidence intervals) of ecological metadata on true diversity for 265 biologically independent samples (identical to Fig. 7d). Factors significant in the best subset selected model (p < 0.05) are shown in orange; their confidence intervals do not overlap the central line, and they are denoted with an asterisk (*). The boxplot shows true diversity values plotted against (B) environment. In all boxplots, the horizontal box lines represent the first quartile, the median and the third quartile; whiskers span the points lying between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
Extended Data Fig. 9 Ecological factors significantly associated with viral richness in environmental viruses.
(A) Model-averaged effect sizes (with 95% confidence intervals) of ecological metadata on viral richness for 265 biologically independent samples (identical to Fig. 6b). Factors significant in the best subset selected model (p < 0.05) are shown in orange; their confidence intervals do not overlap the central line, and they are denoted with an asterisk (*). Boxplots show richness values plotted against (B) environment, (C) location, (D) pH, (E) total phosphorus, (F) available nitrogen, (G) organic content, (H) eukaryote species richness, (I) eukaryote species Shannon diversity and (J) eukaryote species true diversity. In all boxplots, the horizontal box lines represent the first quartile, the median and the third quartile; whiskers span the points lying between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
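Viral richness, in its simplest form, is the number of distinct viral taxa detected in a sample. The sketch below counts taxa whose read count meets a minimum threshold; the `min_reads` cut-off and the function name are hypothetical and not taken from the paper, and raw richness remains sensitive to sequencing depth.

```python
def viral_richness(taxon_reads: dict[str, int], min_reads: int = 1) -> int:
    """Number of taxa with at least `min_reads` reads in one sample (hypothetical threshold)."""
    return sum(1 for reads in taxon_reads.values() if reads >= min_reads)


if __name__ == "__main__":
    sample = {"taxon_A": 120, "taxon_B": 0, "taxon_C": 3}
    print(viral_richness(sample), viral_richness(sample, min_reads=5))
```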
Extended Data Fig. 10 Ecological factors significantly associated with viral richness in animal-associated viruses.
(A) Model-averaged effect sizes (with 95% confidence intervals) of ecological metadata on viral richness for 265 biologically independent samples (identical to Fig. 7b). Factors significant in the best subset selected model (p < 0.05) are shown in orange; their confidence intervals do not overlap the central line, and they are denoted with an asterisk (*). Boxplots show richness values plotted against (B) environment, (C) location, (D) pH, (E) total phosphorus, (F) organic content, (G) eukaryote species richness and (H) eukaryote species true diversity. In all boxplots, the horizontal box lines represent the first quartile, the median and the third quartile; whiskers span the points lying between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
Supplementary information
Source data
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, YM., Sadiq, S., Tian, JH. et al. RNA viromes from terrestrial sites across China expand environmental viral diversity. Nat Microbiol 7, 1312–1323 (2022). https://doi.org/10.1038/s41564-022-01180-2
- Received: 25 January 2022
- Accepted: 21 June 2022
- Published: 28 July 2022
- Version of record: 28 July 2022
- Issue date: August 2022
- DOI: https://doi.org/10.1038/s41564-022-01180-2