UProC: tools for ultra-fast protein domain classification
Peter Meinicke. Bioinformatics. 2015.
Abstract
Motivation: With rapidly increasing volumes of biological sequence data, the functional analysis of new sequences in terms of similarities to known protein families challenges classical bioinformatics.
Results: The ultrafast protein classification (UProC) toolbox implements a novel algorithm ('Mosaic Matching') for large-scale sequence analysis. UProC is three orders of magnitude faster than profile-based methods and, in a metagenome simulation study, achieved up to 80% higher sensitivity on unassembled 100 bp reads.
Availability and implementation: UProC is available as open-source software at https://github.com/gobics/uproc. Precompiled databases (Pfam) are linked on the UProC homepage: http://uproc.gobics.de/.
Contact: peter@gobics.de.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press.
Figures
Fig. 1.
UProC workflow and Mosaic Matching sketch. For DNA input sequences, all ORFs of at least 60 bp are first identified, filtered and translated. The protein sequences are then analysed with the Mosaic Matching algorithm, which compares all oligopeptides in the query sequence with oligopeptides from reference sequences in the database. For all matching reference oligopeptides with the same family label, a maximum substitution score is computed for each residue and summed over the whole sequence to give the total Mosaic Matching score. If this score exceeds a length-dependent noise threshold, the protein hit and the corresponding score are written to the output. The substitution scores that result from oligopeptide comparisons using the PSSM are indicated by heatmap color (red: high, blue: low). The example shows all matching oligopeptides that contribute to the total score of Pfam family PF01370.
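The scoring step described in the Fig. 1 caption can be illustrated with a small Python sketch. This is not the UProC implementation: the word length `k`, the toy identity-based scoring function (standing in for the PSSM), the `min_positive` match rule, and the linear form of the length-dependent threshold are all assumptions chosen only to make the per-residue maximum and the final summation concrete.

```python
from collections import defaultdict


def substitution_scores(query_word, ref_word):
    """Toy per-position scores (assumption): +1.0 for an identical residue,
    -0.5 otherwise. A real PSSM would give position- and residue-specific values."""
    return [1.0 if q == r else -0.5 for q, r in zip(query_word, ref_word)]


def mosaic_scores(query, references, k=3, min_positive=2):
    """Return a total score per family for one protein query.

    references: list of (reference_oligopeptide, family_label) pairs.
    A reference word counts as matching (assumption) when at least
    `min_positive` of its positions score above zero.
    """
    # best[family][residue_position] = maximum score seen for that residue
    best = defaultdict(dict)
    for start in range(len(query) - k + 1):
        word = query[start:start + k]
        for ref_word, family in references:
            scores = substitution_scores(word, ref_word)
            if sum(s > 0 for s in scores) < min_positive:
                continue  # this reference word does not match the query word
            for offset, score in enumerate(scores):
                pos = start + offset
                previous = best[family].get(pos)
                if previous is None or score > previous:
                    best[family][pos] = score
    # Sum the per-residue maxima over the whole sequence.
    return {family: sum(per_pos.values()) for family, per_pos in best.items()}


def classify(query, references, k=3, threshold_per_residue=0.5):
    """Keep families whose total exceeds a length-dependent threshold.

    The linear threshold (threshold_per_residue * len(query)) is a
    placeholder for the noise threshold mentioned in the caption.
    """
    threshold = threshold_per_residue * len(query)
    totals = mosaic_scores(query, references, k=k)
    return {family: s for family, s in totals.items() if s > threshold}


if __name__ == "__main__":
    toy_db = [("ACD", "FAM_A"), ("DEF", "FAM_A"), ("CDG", "FAM_B")]
    print(mosaic_scores("ACDEF", toy_db))
    print(classify("ACDEF", toy_db))
```

In this toy database, overlapping words from the same family cover the query like mosaic tiles, and each residue contributes only its best score, so residues covered by several matching words are not counted more than once.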
Fig. 2.
Contributions of different word positions to the PSSM in terms of the SSW obtained from regularized least-squares classifier training (see text).
Similar articles
- OrfM: a fast open reading frame predictor for metagenomic data.
Woodcroft BJ, Boyd JA, Tyson GW. Bioinformatics. 2016 Sep 1;32(17):2702-3. doi: 10.1093/bioinformatics/btw241. Epub 2016 May 3. PMID: 27153669 Free PMC article.
- MMseqs software suite for fast and deep clustering and searching of large protein sequence sets.
Hauser M, Steinegger M, Söding J. Bioinformatics. 2016 May 1;32(9):1323-30. doi: 10.1093/bioinformatics/btw006. Epub 2016 Jan 6. PMID: 26743509
- Struo: a pipeline for building custom databases for common metagenome profilers.
de la Cuesta-Zuluaga J, Ley RE, Youngblut ND. Bioinformatics. 2020 Apr 1;36(7):2314-2315. doi: 10.1093/bioinformatics/btz899. PMID: 31778148
- MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices.
Li D, Luo R, Liu CM, Leung CM, Ting HF, Sadakane K, Yamashita H, Lam TW. Methods. 2016 Jun 1;102:3-11. doi: 10.1016/j.ymeth.2016.02.020. Epub 2016 Mar 21. PMID: 27012178 Review.
- Web Resources for Metagenomics Studies.
Dudhagara P, Bhavsar S, Bhagat C, Ghelani A, Bhatt S, Patel R. Genomics Proteomics Bioinformatics. 2015 Oct;13(5):296-303. doi: 10.1016/j.gpb.2015.10.003. Epub 2015 Nov 18. PMID: 26602607 Free PMC article. Review.
Cited by
- Tax4Fun2: prediction of habitat-specific functional profiles and functional redundancy based on 16S rRNA gene sequences.
Wemheuer F, Taylor JA, Daniel R, Johnston E, Meinicke P, Thomas T, Wemheuer B. Environ Microbiome. 2020 May 18;15(1):11. doi: 10.1186/s40793-020-00358-7. PMID: 33902725 Free PMC article.
- The green impact: bacterioplankton response toward a phytoplankton spring bloom in the southern North Sea assessed by comparative metagenomic and metatranscriptomic approaches.
Wemheuer B, Wemheuer F, Hollensteiner J, Meyer FD, Voget S, Daniel R. Front Microbiol. 2015 Aug 11;6:805. doi: 10.3389/fmicb.2015.00805. eCollection 2015. PMID: 26322028 Free PMC article.
- Metagenomic Profiling of Ocular Surface Microbiome Changes in Meibomian Gland Dysfunction.
Zhao F, Zhang D, Ge C, Zhang L, Reinach PS, Tian X, Tao C, Zhao Z, Zhao C, Fu W, Zeng C, Chen W. Invest Ophthalmol Vis Sci. 2020 Jul 1;61(8):22. doi: 10.1167/iovs.61.8.22. PMID: 32673387 Free PMC article.
- 16S rDNA profiling of Loach (Misgurnus anguillicus) fed with soybean fermented powder intestinal flora in response to Lipopolysaccharide (LPS) infection.
Dai W, Liu Y, Zhang X, Dai L. Heliyon. 2023 Nov 11;9(11):e22369. doi: 10.1016/j.heliyon.2023.e22369. eCollection 2023 Nov. PMID: 38053882 Free PMC article.
- Comprehensive Longitudinal Microbiome Analysis of the Chicken Cecum Reveals a Shift From Competitive to Environmental Drivers and a Window of Opportunity for Campylobacter.
Ijaz UZ, Sivaloganathan L, McKenna A, Richmond A, Kelly C, Linton M, Stratakos AC, Lavery U, Elmi A, Wren BW, Dorrell N, Corcionivoschi N, Gundogdu O. Front Microbiol. 2018 Oct 15;9:2452. doi: 10.3389/fmicb.2018.02452. eCollection 2018. PMID: 30374341 Free PMC article.