MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data

Ankit Gupta et al. PLoS One. 2014.

Abstract

The identification of virulent proteins in a de novo sequenced genome is useful for estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable for comparing the metagenomes of healthy and diseased individuals and for estimating the proportion of pathogenic species. The common challenge in both tasks is the identification of virulent proteins, since a significant proportion of genomic and metagenomic proteins are novel and as yet unannotated. Currently available tools for identifying virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed MP3, a standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 uses an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. It displayed sensitivity, specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively, on a blind dataset constructed using complete proteins. On the two metagenomic blind datasets (Blind A: 51-100 amino acids and Blind B: 30-50 amino acids), it displayed sensitivity, specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100-150 bp) metagenomic reads and also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php.
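The four reported measures (sensitivity, specificity, MCC and accuracy) all follow from a binary confusion matrix. The sketch below shows the standard definitions. The function name `binary_metrics` and the counts in the usage example are hypothetical: the abstract does not give the confusion matrix, so the example assumes a balanced set of 100 pathogenic and 100 nonpathogenic proteins, chosen only because such counts are consistent with the values reported for complete proteins.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: pathogenic proteins predicted correctly/incorrectly.
    tn/fp: nonpathogenic proteins predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts (NOT from the paper): balanced 100 + 100 blind set.
m = binary_metrics(tp=92, fn=8, tn=100, fp=0)
print({name: round(value, 2) for name, value in m.items()})
```

With these assumed counts the function gives sensitivity 0.92, specificity 1.00, accuracy 0.96 and an MCC that rounds to 0.92. The real class sizes in the blind dataset may differ.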

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Steps used by the HMM module for the prediction of pathogenic proteins.

Figure 2. Prediction of pathogenic or nonpathogenic proteins using MP3.

HS: predictions from both HMM and SVM are in consensus, H: prediction is based only on HMM module, S: prediction is based only on SVM module.
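The HS/H/S labels can be read as a small decision rule over the two module outputs. The sketch below is one possible reading, not the MP3 implementation. It assumes each module either returns a label or abstains (`None`), and it marks disagreement between the modules as `"conflict"` because the caption does not say how such cases are resolved. The function name `combine_predictions` and the label strings are hypothetical.

```python
from typing import Optional, Tuple


def combine_predictions(
    hmm: Optional[str], svm: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Combine HMM and SVM calls into (label, source tag).

    Assumption: each input is "pathogenic", "nonpathogenic", or None
    when that module makes no call. Tags follow the Figure 2 legend.
    """
    if hmm is not None and svm is not None:
        if hmm == svm:
            return hmm, "HS"  # both modules in consensus
        # Assumption: the legend does not cover disagreement,
        # so it is flagged here rather than resolved.
        return None, "conflict"
    if hmm is not None:
        return hmm, "H"  # prediction based only on the HMM module
    if svm is not None:
        return svm, "S"  # prediction based only on the SVM module
    return None, None  # neither module made a call


print(combine_predictions("pathogenic", "pathogenic"))
print(combine_predictions("nonpathogenic", None))
print(combine_predictions(None, "pathogenic"))
```

Returning the source tag alongside the label lets consensus calls be separated from single-module calls when predictions are tabulated.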

Grants and funding

The work was supported by the Institutional Research Fund of IISER Bhopal. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.