A Hidden Markov Model Method, Capable of Predicting and Discriminating β-Barrel Outer Membrane Proteins

The Prediction of Membrane Protein Structure and Genome Structural Annotation

Comparative and Functional Genomics, 2003

New methods, essentially based on hidden Markov models (HMM) and neural networks (NN), can predict the topography of both β-barrel and all-α membrane proteins with high accuracy and a low rate of false positives and false negatives. These methods have been integrated in a suite of programs to filter proteomes of Gram-negative bacteria, searching for new membrane proteins.

EBGW_OMP: A sequence-based method for accurate prediction of outer membrane proteins

2014 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, 2014

Outer membrane proteins (OMPs) play important roles in bacterial cellular processes. Discriminating OMPs from proteins of other fold types helps in predicting their structures and in designing OMP-targeted drugs. In this paper, we developed a novel prediction method based on primary sequence features and support vector machine (SVM) algorithms. Discriminative features were extracted from protein sequences by combining sequence encoding based on grouped weights (EBGW), amino acid compositions and biochemical properties. Feature subsets were screened using the F-score algorithm to train an SVM-based classifier, named EBGW_OMP. The performance of EBGW_OMP was examined on a benchmark dataset of 1087 proteins. The results show that EBGW_OMP can discriminate OMPs from globular proteins, α-helical membrane proteins or non-OMPs with cross-validated accuracy of 98.0%, 97.6% or 97.9%, respectively, outperforming existing sequence-based methods. EBGW_OMP also successfully distinguished 681 out of 722 OMPs with 97.0% accuracy in another benchmark dataset of 2657 proteins. Genome-wide tests show that EBGW_OMP has an excellent capability to correctly detect OMPs and is well suited to genomic OMP prediction. The web server implementing EBGW_OMP is freely accessible at http://bioinfo.tmmu.edu.cn/EBGW_OMP.
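The feature screening step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the F-score in question is the two-class criterion commonly paired with SVMs (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances); the abstract does not give the formula. The function names and the toy data are hypothetical and are not taken from EBGW_OMP.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative class.

    Assumed definition: between-class separation of the means divided by the
    sum of the within-class sample variances. Each class needs >= 2 samples.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    return between / within if within > 0 else float("inf")


def rank_features(X, y):
    """Return (feature indices sorted by decreasing F-score, list of scores)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [
    [0.9, 0.5],
    [0.8, 0.4],
    [0.1, 0.5],
    [0.2, 0.4],
]
y = [1, 1, 0, 0]  # 1 = OMP, 0 = non-OMP
order, scores = rank_features(X, y)
print(order, scores)
```

In a pipeline of this kind, the top-ranked features would then be passed to an SVM trainer; how many features to keep is a tuning choice that the abstract does not specify.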

A sequence-profile-based HMM for predicting and discriminating β barrel membrane proteins

Bioinformatics, 2002

Motivation: Membrane proteins are an abundant and functionally relevant subset of proteins that putatively make up from about 15 up to 30% of the proteome of fully sequenced organisms. These estimates are mainly computed on the basis of sequence comparison and membrane protein prediction. It is therefore urgent to develop methods capable of selecting membrane proteins, especially outer membrane proteins, which are barely taken into consideration when proteome-wide analysis is performed. This will also help protein annotation when no homologous sequence is found in the database. Outer membrane proteins solved so far at atomic resolution interact with the external membrane of bacteria through a characteristic β barrel structure comprising different even numbers of β strands (β barrel membrane proteins). In this they differ from the membrane proteins of the cytoplasmic membrane, which are endowed with alpha helix bundles (all alpha membrane proteins), and they therefore need specialised predictors.

OMPdb: a Database of β-Barrel Outer Membrane Proteins From Gram-Negative Bacteria

Nucleic Acids Research, 2011

We describe here OMPdb, which is currently the most complete and comprehensive collection of integral β-barrel outer membrane proteins from Gram-negative bacteria. The database currently contains 69354 proteins, which are classified into 85 families, based mainly on structural and functional criteria. Although OMPdb follows the annotation scheme of Pfam, many of the families included in the database were not previously described or annotated in other publicly available databases. There are also cross-references to other databases, references to the literature and annotation for sequence features, like transmembrane segments and signal peptides. Furthermore, via the web interface, the user can not only browse the available data but also submit advanced text searches, run BLAST queries against the database protein sequences, and run domain searches against the collection of profile Hidden Markov Models that represent each family’s domain organization. The database is freely accessible for academic users at http://bioinformatics.biol.uoa.gr/OMPdb and we expect it to be useful for genome-wide analyses and comparative genomics, as well as for providing training and test sets for predictive algorithms regarding transmembrane β-barrels.

Fishing new proteins in the twilight zone of genomes: The test case of outer membrane proteins in Escherichia coli K12, Escherichia coli O157:H7, and other Gram-negative bacteria

Protein Science, 2003

We address the problem of clustering the whole protein content of genomes into three different categories (globular, all-α, and all-β membrane proteins) with the aim of fishing new membrane proteins in the pool of nonannotated proteins (twilight zone). The focus is then mainly on outer membrane proteins. This is performed by using an integrated suite of programs (Hunter) specifically developed for predicting the occurrence of signal peptides in proteins of Gram-negative bacteria and the topography of all-α and all-β membrane proteins. Hunter is tested on the well and partially annotated proteins (2160 and 760, respectively) of Escherichia coli K12, scoring as high as 95.6% in the correct assignment of each chain to the category. Of the remaining 1253 nonannotated sequences, 1099 are predicted globular, 136 are all-α, and 18 are all-β membrane proteins. In Escherichia coli O157:H7 we filtered 1901 nonannotated proteins. Our analysis classifies 1564 globular chains, 327 inner membrane proteins, and 10 outer membrane proteins. With Hunter, new membrane proteins are added to the list of putative membrane proteins of Gram-negative bacteria. The content of outer membrane proteins per genome (nine are analyzed) ranges from 1.5% to 2.4%, and it is one order of magnitude lower than that of inner membrane proteins. The finding is particularly relevant when it is considered that this is the first large-scale analysis based on validated tools that can predict the content of outer membrane proteins in a genome and can allow cross-comparison of the same protein type between different species.

Prediction of membrane proteins based on classification of transmembrane segments

Protein Engineering Design and Selection, 1998

The number of transmembrane segments often corresponds to a structural or functional class of membrane proteins, such as seven-transmembrane receptors and six-transmembrane ion channels. We have developed a new prediction method to detect the membrane protein class that is defined by the number of transmembrane segments, as well as to locate the transmembrane segments in the amino acid sequence. Each membrane protein class is represented by a model of the ordering of different types of transmembrane segments. Specifically, we have classified the transmembrane segments in known membrane proteins into five groups (types) using the Mahalanobis distance, with the average hydrophobicity and the periodicity of hydrophobicity as the measures of similarity. The discriminant functions derived for these groups were then used to detect transmembrane segments and to match them with the models for one- to fourteen-spanning membrane proteins and for globular proteins. Using a test data set of 89 membrane proteins whose transmembrane positions are known from experimental evidence, 61.8% of the proteins and 85.1% of the transmembrane segments were correctly predicted. Because of its new ability to predict membrane protein classes, the method should be useful in the functional assignment of genomic sequences.
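The grouping of segments by Mahalanobis distance described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Kyte-Doolittle hydropathy scale, the use of a 100-degree hydrophobic moment as the periodicity measure, and the two group definitions (means and covariances) are all hypothetical choices, since the abstract names neither the scale nor the parameters of its five groups.

```python
import math

# Kyte-Doolittle hydropathy scale (an assumption; the abstract names no scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def avg_hydrophobicity(seg):
    """Mean hydropathy of a segment."""
    return sum(KD[a] for a in seg) / len(seg)


def periodicity(seg, angle_deg=100.0):
    """Hydrophobic moment per residue at a helical period of angle_deg degrees."""
    d = math.radians(angle_deg)
    s = sum(KD[a] * math.sin(i * d) for i, a in enumerate(seg))
    c = sum(KD[a] * math.cos(i * d) for i, a in enumerate(seg))
    return math.hypot(s, c) / len(seg)


def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance of a 2-D point x from a group (mean mu, 2x2 cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # diff^T * cov^-1 * diff, with cov^-1 = (1 / det) * [[d, -b], [-c, a]]
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


# Hypothetical groups: (mean vector, covariance matrix) in feature space
# (average hydrophobicity, periodicity). Real values would be fitted to data.
GROUPS = {
    "hydrophobic": ((2.5, 0.3), [[0.25, 0.0], [0.0, 0.04]]),
    "amphipathic": ((1.0, 1.0), [[0.25, 0.0], [0.0, 0.04]]),
}


def classify(seg):
    """Assign a segment to the group with the smallest Mahalanobis distance."""
    x = (avg_hydrophobicity(seg), periodicity(seg))
    return min(GROUPS, key=lambda g: mahalanobis_2d(x, *GROUPS[g]))


print(classify("L" * 20))
```

The Mahalanobis distance scales each feature by the group's own spread, so a group with a wide range of hydrophobicity values tolerates larger deviations along that axis than a tight one.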