A Novel Approach For Protein Classification Using Fourier Transform
Related papers
Motif-Based Protein Sequence Classification Using Neural Networks
Journal of Computational Biology, 2005
We present a system for multi-class protein classification based on neural networks. The basic issue concerning the construction of neural network systems for protein classification is the sequence encoding scheme that must be used in order to feed the neural network. To deal with this problem we propose a method that maps a protein sequence into a numerical feature space using the matching scores of the sequence to groups of conserved patterns (called motifs) into protein families. We consider two alternative ways for identifying the motifs to be used for feature generation and provide a comparative evaluation of the two schemes. We also evaluate the impact of the incorporation of background features (2-grams) on the performance of the neural system. Experimental results on real datasets indicate that the proposed method is highly efficient and is superior to other well-known methods for protein classification.
Neural networks for protein classification.
Applied bioinformatics, 2004
This paper describes a biomolecular classification methodology based on multilayer perceptron neural networks. The system developed is used to classify enzymes found in the Protein Data Bank. The primary goal of classification, here, is to infer the function of an (unknown) enzyme by analysing its structural similarity to a given family of enzymes. A new codification scheme was devised to convert the primary structure of enzymes into a real-valued vector. The system was tested with a different number of neural networks, training set sizes and training epochs. For all experiments, the proposed system achieved a higher accuracy rate when compared with profile hidden Markov models. Results demonstrated the robustness of this approach and the possibility of implementing fast and efficient biomolecular classification using neural networks.
A probabilistic neural network approach for protein superfamily classification
2005
The protein superfamily classification problem, which consists of determining the superfamily membership of a given unknown protein sequence, is very important for a biologist for many practical reasons, such as drug discovery, prediction of molecular function and medical diagnosis. In this work, we propose a new approach for protein classification based on a Probabilistic Neural Network and feature selection. Our goal is to predict the functional family of novel protein sequences based on the features extracted from the protein's primary structure, i.e., the sequence only. For this purpose, datasets extracted from the Protein Data Bank (PDB), a curated protein family database, are used as training datasets. In the experiments conducted, the performance of the classifier is compared to other known data mining approaches and sequence comparison methods. The computational results show that the proposed method performs better than the others and looks promising for problems with characteristics similar to this one.
2010 Annual IEEE India Conference (INDICON), 2010
Classification, or supervised learning, is one of the major data mining processes. Protein classification focuses on predicting the function or the structure of new proteins. This can be done by assigning a new protein to a given family with previously known characteristics. There are many approaches available for classification tasks, such as statistical techniques, decision trees and neural networks. In this paper, three types of neural networks are implemented: a feedforward neural network, a probabilistic neural network and a radial basis function neural network. The main objective of the paper is to build an efficient classifier using neural networks. The measures used to estimate the performance of the classifier are Precision, Sensitivity and Specificity.
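The three measures named in the abstract above can be illustrated with a minimal Python sketch. It assumes the standard one-vs-rest definitions (precision = TP / (TP + FP), sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the function name `binary_metrics` and the toy labels are hypothetical and do not come from the paper.

```python
def binary_metrics(y_true, y_pred, positive):
    """Precision, sensitivity and specificity with one class treated as positive."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly predicted member of the family
        elif pred == positive:
            fp += 1  # predicted as member, actually not
        elif truth == positive:
            fn += 1  # missed member
        else:
            tn += 1  # correctly rejected non-member
    # Guard against empty denominators so the sketch never divides by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return precision, sensitivity, specificity


# Toy example with two hypothetical families "A" and "B".
y_true = ["A", "A", "B", "B", "B"]
y_pred = ["A", "B", "B", "B", "A"]
precision, sensitivity, specificity = binary_metrics(y_true, y_pred, positive="A")
```

For a multi-class problem, the same function can be called once per family and the results averaged, which is one common convention rather than something the abstract specifies.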
IEEE Transactions on NanoBioscience, 2000
Here, we consider a two-level (four classes in level 1 and 27 folds in level 2) protein fold determination problem. We propose several new features and use some existing features including frequencies of adjacent residues, frequencies of residues separated by one residue, and triplets (trio) of amino acid compositions (AACs). The dimensionality of the trio AAC features is drastically reduced using a neural network based novel online feature selection scheme. We also propose new sets of features called trio potential computed using the hydrophobicity values considering only the selected trio AACs. We demonstrate that the proposed features including the selected trio AACs and trio potential have good discriminating power for protein fold determination. As machine learning tools, we use multilayer perceptron network, radial basis function network, and support vector machine. To improve the recognition accuracies further, we use fusion of different classifiers using the same set of features as well as different sets of features. The effectiveness of our schemes is demonstrated with a benchmark structural classification of proteins (SCOP) dataset. Our system achieves 84.9% test accuracy for the SCOP structural class (four classes) determination and 68.6% test accuracy for the fold recognition with 27 folds. In order to demonstrate the consistency of feature sets and fusion schemes, we also perform the fivefold cross-validation experiments. Index Terms: fusion, majority voting, multilayer perceptron (MLP), online feature selection (OFS), protein structure prediction, radial basis function (RBF), structural classification of proteins (SCOP), support vector machine (SVM), triplets of amino acid composition (trio AAC).
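Two of the existing features mentioned in the abstract above, the frequencies of adjacent residues and of residues separated by one residue, can be sketched in Python as follows. This is an illustrative reading only: the function name `pair_frequencies`, the `gap` parameter, the alphabet ordering and the normalisation by the number of valid pairs are assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in alphabetical one-letter order (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pair_frequencies(seq, gap=0):
    """Return a 400-dimensional vector of residue-pair frequencies.

    gap=0 counts adjacent residues; gap=1 counts residues separated by one residue.
    Pairs containing a non-standard symbol are skipped.
    """
    step = gap + 1
    pairs = Counter()
    for i in range(len(seq) - step):
        a, b = seq[i], seq[i + step]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            pairs[a + b] += 1
    total = sum(pairs.values())
    return [
        pairs[a + b] / total if total else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


adjacent = pair_frequencies("ACAC", gap=0)    # pairs: AC, CA, AC
separated = pair_frequencies("ACAC", gap=1)   # pairs: AA, CC
```

Concatenating the two vectors gives an 800-dimensional fixed-length input regardless of sequence length, which is the property that makes such features usable with an MLP, RBF network or SVM.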
Prediction of protein structural classes by neural network
Biochimie, 2000
Protein structures can be classified as all-α, all-β, α/β, α+β and ζ according to protein chain folding topologies. Previous studies have shown evidence that some correlation between the protein structural class and amino acid composition does exist, and the protein structural class can be predicted to some extent according to amino acid composition alone. In this study we apply Kohonen's self-organization neural network to approach this problem. The results obtained show that the structural class of a protein is considerably correlated with its amino acid composition, and the neural network is a useful tool for predicting the structural classes of proteins.
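The amino acid composition used as the sole input in the abstract above is a 20-dimensional vector of residue fractions. A minimal Python sketch is given below; the function name `amino_acid_composition` and the choice to ignore non-standard symbols are assumptions made for illustration.

```python
# The 20 standard amino acids in alphabetical one-letter order (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:  # skip gaps and ambiguous symbols such as X
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


composition = amino_acid_composition("AAGG")  # half alanine, half glycine
```

Each protein thus becomes a point in a 20-dimensional space, which is the kind of fixed-length input a self-organizing map can be trained on.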
Protein Sequence Classification Using Probabilistic Motifs and Neural Networks
Lecture Notes in Computer Science, 2003
The basic issue concerning the construction of neural network systems for protein classification is the sequence encoding scheme that must be used in order to feed the network. To deal with this problem we propose a method that maps a protein sequence into a numerical feature space using the matching local scores of the sequence to groups of conserved patterns (called motifs). We consider two alternative schemes for discovering a group of D motifs within a set of K-class sequences. We also evaluate the impact of the background features (2-grams) to the performance of the neural system. Experimental results on real datasets indicate that the proposed method is superior to other known protein classification approaches.
A New Method for Binary Classification of Proteins with Machine Learning
Computational Science and Its Applications – ICCSA 2021, 2021
In this work we set out to find a method to classify protein structures using a Deep Learning methodology. Our Artificial Intelligence has been trained to recognize complex biomolecule structures extrapolated from the Protein Data Bank (PDB) database and reprocessed as images; for this purpose various tests have been conducted with pretrained Convolutional Neural Networks, such as InceptionResNetV2 or InceptionV3, in order to extract significant features from these images and correctly classify the molecule. A comparative analysis of the performances of the various networks will therefore be produced.
Protein classification artificial neural system
Protein Science, 1992
A neural network classification method is developed as an alternative approach to the large database search/organization problem. The system, termed Protein Classification Artificial Neural System (ProCANS), has been implemented on a Cray supercomputer for rapid superfamily classification of unknown proteins based on the information content of the neural interconnections. The system employs an n-gram hashing function that is similar to the k-tuple method for sequence encoding. A collection of modular back-propagation networks is used to store the large amount of sequence patterns. The system has been trained and tested with the first 2,148 of the 8,309 entries of the annotated Protein Identification Resource protein sequence database (release 29). The entries included the electron transfer proteins and the six enzyme groups (oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases), with a total of 620 superfamilies. After a total training time of seven Cray central processing unit (CPU) hours, the system has reached a predictive accuracy of 90%. The classification is fast (i.e., 0.1 Cray CPU second per sequence), as it only involves a forward-feeding through the networks. The classification time on a full-scale system embedded with all known superfamilies is estimated to be within 1 CPU second. Although the training time will grow linearly with the number of entries, the classification time is expected to remain low even if there is a 10-100-fold increase of sequence entries. The neural database, which consists of a set of weight matrices of the networks, together with the ProCANS software, can be ported to other computers and made available to the genome community. The rapid and accurate superfamily classification would be valuable to the organization of protein sequence databases and to the gene recognition in large sequencing projects.
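The idea of an n-gram hashing encoder, as mentioned in the abstract above, can be sketched in Python. This is not the ProCANS hashing function, whose details the abstract does not give; it is a generic illustration that assumes CRC32 as the hash, and the function name `ngram_hash_vector` and the parameters `n` and `size` are hypothetical.

```python
import zlib


def ngram_hash_vector(seq, n=2, size=64):
    """Count overlapping n-grams of a sequence into a fixed number of hash buckets."""
    vec = [0] * size
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        # zlib.crc32 is deterministic across runs, unlike Python's built-in hash() for str.
        bucket = zlib.crc32(gram.encode("ascii")) % size
        vec[bucket] += 1
    return vec


features = ngram_hash_vector("MKVLAAGIV", n=2, size=64)
```

The output length depends only on `size`, not on the sequence length, so sequences of any length map to network inputs of the same dimension. Distinct n-grams may collide in one bucket, which is the usual trade-off of hashing encoders.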
Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, 2016
The function of any protein depends directly on its secondary and tertiary structure. Proteins can fold into a three-dimensional shape, which depends primarily on the arrangement of amino acids in the primary structure. In recent years, with the explosive growth in protein sequencing, it has become infeasible to perform detailed experimental studies, as these methodologies are very expensive and time consuming. This leaves the structure of the majority of currently available protein sequences unknown. In this paper, a predictive model is therefore presented for the classification of the secondary structures of protein sequences, namely alpha helix and beta sheet. The proteins used throughout this study were collected from the Structural Classification of Proteins-extended (SCOPe) database, which contains manually curated information from proteins with known structure. Two sets of proteins are used for all alpha and all beta protein sequences. The first set comprises sequences with less than 40...