Structural Classification of Proteins through Amino Acid Sequence using Interval Type-2 Fuzzy Logic System

Multi-Output Interval Type-2 Fuzzy Logic System for Protein Secondary Structure Prediction

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2015

This paper introduces a new multi-output interval type-2 fuzzy logic system (MOIT2FLS) for protein secondary structure prediction. The three outputs of the MOIT2FLS correspond to the three structure classes: helix, strand (sheet), and coil. Quantitative properties of amino acids are used to characterize the twenty amino acids instead of the widely used, computationally expensive binary encoding scheme. Three clustering tasks are performed using the adaptive vector quantization method to construct an equal number of initial rules for each type of secondary structure. A genetic algorithm is applied to tune the parameters of the MOIT2FLS, with a fitness function based on the Q3 measure. Experimental results show that the proposed approach outperforms traditional methods, namely the Chou-Fasman method, the Garnier-Osguthorpe-Robson method, and artificial neural network models.
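The Q3 measure used as the fitness criterion above is commonly defined as the percentage of residues whose three-state label is predicted correctly. The following is a minimal sketch of that definition only; the function name `q3_score`, the one-letter state codes (H, E, C), and the example strings are assumptions for illustration and are not taken from the paper.

```python
def q3_score(true_states: str, predicted_states: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly.

    Both arguments are strings over an assumed alphabet:
    H (helix), E (strand), C (coil).
    """
    if len(true_states) != len(predicted_states):
        raise ValueError("sequences must have the same length")
    if not true_states:
        raise ValueError("sequences must be non-empty")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return 100.0 * correct / len(true_states)


# Hypothetical example: 8 of the 10 residues are labelled correctly.
observed = "HHHHCCEEEE"
predicted = "HHHCCCEEEC"
print(q3_score(observed, predicted))  # 80.0
```

A genetic algorithm could use such a score directly as the fitness of a candidate parameter set, although the exact fitness design in the paper is not given here.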

Protein Family Recognition based on Fuzzy Logic

With the rapid rise of research in biometrics, bioinformatics, and genomics, many research fields and issues are still subject to uncertainty. One of the most active areas in this field is protein informatics, which relates protein data to modern information technology and includes protein mapping and classification. This paper contributes an intelligent system based on adaptive neuro-fuzzy computation that can recognize and classify proteins into families. An intelligent trainer based on a perceptron neural network is used to build a fuzzy inference system capable of predicting and classifying the data into categories according to the function of each protein. The system preprocesses the data set and extracts unique features from it. The system was built using a highly developed programming language. The results show that the system achieves about 92% accuracy when more than 1000 input sequences from the validation sample are processed.

An efficient technique for superfamily classification of amino acid sequences: feature extraction, fuzzy clustering and prototype selection

Fuzzy Sets and Systems, 2005

In this article, we propose an efficient technique for classifying amino acid sequences into different superfamilies. The proposed method first extracts 20 features from a set of training sequences. The extracted features are such that they take into consideration the probabilities of occurrences of the amino acids in the different positions of the sequences. Thereafter, a genetic fuzzy clustering approach is used to automatically evolve a set of prototypes representing each class. The characteristic of this clustering method is that it does not require the a priori information about the number of clusters, and is also able to come out of locally optimal configurations. Finally, the nearest neighbor rule is used to classify an unknown sequence into a particular superfamily class, based on its proximity to the prototypes evolved using the genetic fuzzy clustering technique. This results in a significant improvement in the time required for classifying unknown sequences. Results for three superfamilies, namely globin, trypsin and ras, demonstrate the effectiveness of the proposed technique with respect to the case where all the training sequences are considered for classification using the same set of features. Comparison with the well-known technique BLAST also shows that the proposed method provides a significant improvement in terms of the time required for classification while providing comparable classification performance.
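The final classification step described above, assigning an unknown sequence to the class of its nearest prototype, can be sketched as follows. This is an illustrative toy only: it uses 2-dimensional feature vectors instead of the 20 features of the paper, the prototype values are invented, and the prototypes are hard-coded rather than evolved by genetic fuzzy clustering. The name `nearest_prototype` is hypothetical.

```python
import math


def nearest_prototype(features, prototypes):
    """Return the label of the class whose prototype is closest to `features`.

    `prototypes` maps a class label to a list of prototype vectors;
    closeness is measured with the Euclidean distance (an assumption).
    """
    best_label, best_dist = None, math.inf
    for label, vectors in prototypes.items():
        for proto in vectors:
            dist = math.dist(features, proto)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


# Invented prototypes for the three superfamilies named in the abstract.
prototypes = {
    "globin": [(0.9, 0.1), (0.8, 0.3)],
    "trypsin": [(0.2, 0.9)],
    "ras": [(0.1, 0.1)],
}
print(nearest_prototype((0.75, 0.2), prototypes))  # globin
```

Because only a handful of prototypes are compared instead of every training sequence, the cost per query drops, which is consistent with the speed-up the abstract reports.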

Generating fuzzy rules for protein classification

2008

ABSTRACT. This paper considers the generation of interpretable fuzzy rules for assigning an amino acid sequence to the appropriate protein superfamily. Since the main objective of this classifier is the interpretability of its rules, we use the distribution of amino acids in the protein sequences as features. These features are the occurrence probabilities of six exchange groups in the sequences. To generate the fuzzy rules, we use modified versions of a common approach. The generated rules are simple and understandable, especially for biologists. To evaluate our fuzzy classifiers, we use four protein superfamilies from the UniProt database. Experimental results show that the generated fuzzy rules are comprehensible and achieve comparable classification accuracy.
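The exchange-group features mentioned above can be sketched as relative frequencies over a sequence. The abstract does not list the six groups, so the grouping below is an assumption: it is the partition commonly used in the protein classification literature. The names `EXCHANGE_GROUPS` and `exchange_group_frequencies` are hypothetical.

```python
# Assumed grouping (not stated in the abstract): a partition of the
# 20 amino acids that is commonly used in the literature.
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}


def exchange_group_frequencies(sequence: str) -> dict:
    """Return the occurrence probability of each exchange group in a sequence.

    Residues outside the 20 standard amino acids are ignored.
    """
    counts = {name: 0 for name in EXCHANGE_GROUPS}
    for residue in sequence.upper():
        for name, members in EXCHANGE_GROUPS.items():
            if residue in members:
                counts[name] += 1
                break
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {name: count / total for name, count in counts.items()}


# Hypothetical 10-residue sequence.
print(exchange_group_frequencies("HKDECSTMFW"))
```

Each sequence is thereby reduced to six numbers, which keeps rules of the form "if the frequency of e1 is high then ..." short enough to read.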

Helix Segment Assignment In Proteins Using Fuzzy Logic

IRANIAN JOURNAL of …, 2007

The automatic assignment of protein secondary structure from three-dimensional coordinates is an essential step in the characterization of protein structure. Although the recognition of secondary structures such as alpha-helices and beta-sheets seems straightforward, there are ...

Overlapping Clusters and Support Vector Machines Based Interval Type-2 Fuzzy System for the Prediction of Peptide Binding Affinity

IEEE Access

In the post-genome era, it is becoming more complex to process high-dimensional, low-instance, and nonlinear biological datasets. This paper aims to address these characteristics, as they have adverse effects on the performance of predictive models in bioinformatics. An interval type-2 Takagi-Sugeno fuzzy predictive model is proposed to manage the high dimensionality and nonlinearity that are common in bioinformatics datasets. For this purpose, a new clustering framework based on the overlapping regions between clusters is proposed to simplify the antecedent operations of an interval type-2 fuzzy system. Cluster analysis of the partitions, together with statistical information derived from them, identifies the upper and lower membership functions that form the premise part. The model is further enhanced by using the regression version of support vector machines in the consequent part. The proposed method is used in experiments to quantitatively predict the binding affinities of peptides to biomolecules. This case study poses a challenge in post-genome studies and remains an open problem due to the complexity of the biological system, the diversity of peptides, and the curse of dimensionality of the amino acid index representation that characterizes the peptides. On four different peptide binding affinity datasets, the proposed method generalized better on all of them, yielding an improved prediction accuracy of up to 58.2% on unseen peptides in comparison with the predictive methods presented in the literature. Source code of the algorithm is available at https://github.com/sekerbigdatalab.

INDEX TERMS: Interval type-2 fuzzy systems, support vector regression, overlapping clusters, peptide binding affinity, clustering, high-dimensionality.
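To illustrate what upper and lower membership functions mean in an interval type-2 fuzzy set, here is a minimal sketch using a Gaussian with a fixed mean and an uncertain width. This is a generic textbook construction, not the paper's method: the paper derives its bounds from overlapping clusters, which is not reproduced here. The function name and parameters are hypothetical.

```python
import math


def it2_gaussian_membership(x, mean, sigma_narrow, sigma_wide):
    """Return (lower, upper) membership grades of x.

    The primary membership is a Gaussian whose width is only known to lie
    in [sigma_narrow, sigma_wide]. The narrow Gaussian gives the lower
    bound and the wide Gaussian gives the upper bound.
    """
    if not 0 < sigma_narrow <= sigma_wide:
        raise ValueError("require 0 < sigma_narrow <= sigma_wide")
    lower = math.exp(-0.5 * ((x - mean) / sigma_narrow) ** 2)
    upper = math.exp(-0.5 * ((x - mean) / sigma_wide) ** 2)
    return lower, upper


# Hypothetical values: the membership of x = 1.0 is an interval, not a number.
lower, upper = it2_gaussian_membership(1.0, mean=0.0, sigma_narrow=0.5, sigma_wide=1.0)
print(lower, upper)
```

The gap between the two grades is what lets an interval type-2 rule carry uncertainty about the membership function itself through to the output.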

Protein motif extraction with neuro-fuzzy optimization

2002

Motivation: This work attempts to improve the speed and flexibility of protein motif identification. The proposed algorithm is able to extract both rigid and flexible protein motifs. Results: We present a new algorithm for extracting the consensus pattern, or motif, from a group of related protein sequences. The algorithm uses a statistical method to find short, high-frequency patterns and then trains a neural network to optimize the final classification accuracy. Fuzzy logic is used to increase the flexibility of the protein motifs. C2H2 zinc finger protein and epidermal growth factor protein sequences are used to demonstrate the capability of the proposed algorithm in finding motifs.
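The first stage described above, finding short patterns that occur with high frequency, might look roughly like the sketch below. The abstract does not specify the statistical method, so this is an assumed stand-in: it counts exact (rigid) substrings of length k and keeps those present in a minimum fraction of the sequences. The neural network and fuzzy stages are not covered, and `frequent_kmers` and the toy sequences are hypothetical.

```python
from collections import Counter


def frequent_kmers(sequences, k, min_support):
    """Return {k-mer: support} for k-mers found in enough sequences.

    Support is the fraction of sequences that contain the k-mer at least
    once; only k-mers with support >= min_support are returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not sequences:
        raise ValueError("at least one sequence is required")
    support = Counter()
    for seq in sequences:
        # A set, so that each sequence counts a given k-mer only once.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    n = len(sequences)
    return {kmer: count / n for kmer, count in support.items() if count / n >= min_support}


# Invented toy sequences: "CHC" appears in three of the four.
sequences = ["ACHCKL", "GGCHCA", "TCHCPP", "MMMMMM"]
print(frequent_kmers(sequences, k=3, min_support=0.75))  # {'CHC': 0.75}
```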

Extraction and optimization of fuzzy protein sequences classification rules using GRBF neural networks

2003

Abstract: Traditionally, two protein sequences are classified into the same class if their feature patterns have high homology. These feature patterns were originally extracted by sequence alignment algorithms, which measure the similarity between an unseen protein sequence and identified protein sequences. Neural network approaches, while reasonably accurate at classification, give no information about the relationship between the unseen case and the classified items that would be useful to biologists.

Fuzzy clustering of physicochemical and biochemical properties of amino acids

Amino Acids, 2012

In this article, we categorize the presently available experimental and theoretical knowledge of various physicochemical and biochemical features of amino acids, as collected in the AAindex database of 544 known amino acid (AA) indices. Previously, 402 indices were categorized into six groups using a hierarchical clustering technique, and 142 were left unclustered. However, due to the increasing diversity of the database, these indices overlap, so a crisp clustering method may not provide optimal results. Moreover, in various large-scale bioinformatics analyses of whole proteomes, the proper selection of amino acid indices representing their biological significance is crucial for efficient and less error-prone encoding of short functional sequence motifs. In most cases, researchers perform an exhaustive manual selection of the most informative indices. These two facts motivated us to analyse the widely used AA indices. The goal of this article is twofold. First, we present a novel method of partitioning bioinformatics data using consensus fuzzy clustering, in which recently proposed fuzzy clustering techniques are exploited. Second, we prepare three high-quality subsets of all available indices. The superiority of the consensus fuzzy clustering method is demonstrated quantitatively, visually, and statistically by comparing it with the previously proposed hierarchical clustering results. The processed AAindex1 database, supplementary material, and the software are available at http://sysbio.icm.edu.pl/aaindex/.

Keywords: Amino acids · AAindex database · Consensus fuzzy clustering · High-quality indices · Validity measures · Physico-chemical features
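The argument above, that overlapping indices are poorly served by crisp clustering, can be made concrete with the standard fuzzy c-means membership formula, in which a point belongs to every cluster to a degree. This sketch shows only that formula; it is not the authors' consensus fuzzy clustering method, and the function name, fuzzifier value, and toy points are assumptions.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the Euclidean
    distance to center i and m > 1 is the fuzzifier. Memberships sum to 1.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    distances = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in distances):
        # The point coincides with one or more centers: share full membership.
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances) for d_i in distances]


# Toy example: a point at distance 1 from one center and 2 from the other.
print(fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))
```

With m = 2 the point above receives memberships of about 0.8 and 0.2, whereas a crisp method would assign it wholly to the nearer cluster and discard the overlap.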