Eu-Detect: An algorithm for detecting eukaryotic sequences in metagenomic data sets
Abstract
Physical partitioning techniques are routinely employed during sample preparation to separate the prokaryotic and eukaryotic fractions of metagenomic samples. Despite these efforts, several metagenomic studies focusing on bacterial and archaeal populations have reported contaminating eukaryotic sequences in their data sets. These contaminating sequences originate not only from the genomes of micro-eukaryotic species but also from the genomes of (higher) eukaryotic host cells; the latter scenario typically arises in host-associated metagenomes. Identifying and removing contaminating sequences is important, since these sequences affect both estimates of microbial diversity and the accuracy of several downstream analyses. Because they are alignment based, the computational techniques currently used to identify contaminating eukaryotic sequences are slow and inefficient, and they require substantial computing resources. In this article, we present Eu-Detect, an alignment-free algorithm that can rapidly identify eukaryotic sequences contaminating metagenomic data sets. Validation results indicate that, on a desktop computer with modest hardware specifications, Eu-Detect rapidly segregates DNA sequence fragments of prokaryotic and eukaryotic origin with high sensitivity. A Web server for the Eu-Detect algorithm is available at http://metagenomics.atc.tcs.com/Eu-Detect/.
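The abstract describes Eu-Detect only as "alignment-free" and does not spell out its internals. As a hedged illustration of what an alignment-free, composition-based classifier can look like, the sketch below represents each DNA fragment by its tetranucleotide (4-mer) frequency vector, a widely used alignment-free sequence signature, and assigns a fragment to whichever class centroid is nearest in Euclidean distance. This is not the published Eu-Detect method: the function names `tetra_freq`, `centroid` and `classify`, the nearest-centroid rule, and the toy reference sequences `PROK_REF` and `EUK_REF` are all hypothetical and chosen purely for illustration.

```python
import math
from itertools import product

# All 4^4 = 256 tetranucleotides in a fixed lexicographic order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tetra_freq(seq):
    """Normalised tetranucleotide frequency vector of one DNA fragment.

    Overlapping 4-base windows containing anything other than A/C/G/T
    (e.g. 'N') are skipped; a fragment with no valid window maps to zeros.
    """
    counts = [0] * len(TETRAMERS)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(fragment, centroids):
    """Return the label whose centroid is nearest (Euclidean) to the fragment."""
    v = tetra_freq(fragment)
    return min(centroids, key=lambda label: math.dist(v, centroids[label]))


# Hypothetical toy reference fragments: arbitrary placeholders with no
# biological meaning. A real classifier would build each centroid from many
# fragments drawn from sequenced prokaryotic and eukaryotic genomes.
PROK_REF = "GCGCGGCCGCGCGGCCGCGGCGCC"
EUK_REF = "ATATTAAATTATATAATTTAAATA"
centroids = {
    "prokaryotic": centroid([tetra_freq(PROK_REF)]),
    "eukaryotic": centroid([tetra_freq(EUK_REF)]),
}

print(classify("GCGGCCGCGCGC", centroids))  # prints: prokaryotic
```

Euclidean nearest-centroid is simply the most compact rule to show here; a Mahalanobis distance, a k-nearest-neighbour vote over many reference fragments, or longer k-mers are common alternatives in composition-based binning, and which (if any) of these Eu-Detect uses is not stated in the abstract.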
Author information
Authors and Affiliations
- Bio-Sciences R&D Division, TCS Innovation Labs, Tata Consultancy Services Limited, Hyderabad, 500 081, India
Monzoorul Haque Mohammed, Sudha Chadaram, Dinakar Komanduri, Tarini Shankar Ghosh & Sharmila S Mande
Corresponding author
Correspondence to Sharmila S Mande.
Additional information
Corresponding editor: REINER A VEITIA
[Mohammed MH, Chadaram S, Komanduri D, Ghosh TS and Mande SS 2011 Eu-Detect: An algorithm for detecting eukaryotic sequences in metagenomic data sets. J. Biosci. 36 709–717] DOI 10.1007/s12038-011-9105-2
Supplementary materials pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/Sep2011/pp709–717/suppl.pdf
About this article
Cite this article
Mohammed, M.H., Chadaram, S., Komanduri, D. et al. Eu-Detect: An algorithm for detecting eukaryotic sequences in metagenomic data sets. J Biosci 36, 709–717 (2011). https://doi.org/10.1007/s12038-011-9105-2
- Received: 01 February 2011
- Accepted: 31 May 2011
- Published: 10 September 2011
- Issue Date: September 2011
- DOI: https://doi.org/10.1007/s12038-011-9105-2