Multiple sequence comparison — a peptide matching approach

Theoretical Computer Science, 1997

Abstract: In this paper we present a peptide matching approach to the multiple comparison of a set of protein sequences. This approach consists of looking for all the words that are common to q of these sequences, where q is a parameter.
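A minimal sketch of the word-finding idea. It assumes exact words of one fixed length k and reads "common to q of these sequences" as "occurring in at least q sequences". The function name `common_words` and the toy sequences are hypothetical, and this is not the paper's algorithm, which the abstract does not detail.

```python
from collections import defaultdict

def common_words(sequences, k, q):
    """Return every length-k word that occurs in at least q of the sequences,
    mapped to the sorted indices of the sequences that contain it.

    Simplification (assumption): exact words of a single fixed length k only.
    """
    # Map each word to the set of sequence indices in which it appears.
    occurrences = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            occurrences[seq[start:start + k]].add(idx)
    # Keep only words shared by at least q distinct sequences.
    return {word: sorted(ids) for word, ids in occurrences.items() if len(ids) >= q}

proteins = ["MKVLAAGIW", "GIWKVLTT", "TTKVLGIWA"]
print(common_words(proteins, k=3, q=3))  # {'KVL': [0, 1, 2], 'GIW': [0, 1, 2]}
```

Indexing every word once in a hash map keeps the scan roughly linear in the total sequence length; q then acts as a simple filter on how many distinct sequences support each word.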

A sequential development towards a unified approach to protein sequence comparison based on classified groups of amino acids

International Journal of Engineering & Technology, 2018

Methods that compare protein sequences on the basis of classified groups of amino acids make a significant contribution to the literature on protein sequence comparison. However, these methods vary with the choice of classified groups. The purpose of the paper is therefore to develop a unified approach to protein sequence comparison based on the classification of amino acids into groups of different cardinality. The paper considers 4-group, 5-group and 6-group classifications of amino acids, and in each case applies the unified method to two sets of protein sequences, viz., 9 proteins of the ND5 category and 50 coronavirus spike proteins. The results agree with those obtained earlier by other methods based on classified groups of amino acids. In any case, the present unified formula is found to be relatively simpler than, and fundamentally different from, the earlier ones. Further, it can be applied co...
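The abstract does not give the unified formula or the exact groupings, so the following is only a sketch of the general group-classification idea: each residue is replaced by its group label, and sequences are then compared through the relative frequencies of adjacent group pairs. The 4-group split (nonpolar, polar uncharged, basic, acidic), the feature choice and the Euclidean distance are all assumptions, as are the names `reduce_sequence`, `group_pair_profile` and `profile_distance`.

```python
from itertools import product
from math import sqrt

# Hypothetical 4-group classification; the paper's own groups may differ.
GROUPS = {
    "N": "AVLIMFWPG",  # nonpolar
    "P": "STCYNQ",     # polar, uncharged
    "B": "KRH",        # basic (positively charged)
    "A": "DE",         # acidic (negatively charged)
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}

def reduce_sequence(seq):
    """Rewrite a protein over the 4-letter group alphabet.
    Raises KeyError for non-standard residues such as 'X'."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq)

def group_pair_profile(seq):
    """Relative frequencies of the 16 possible adjacent group pairs."""
    reduced = reduce_sequence(seq)
    pairs = ["".join(p) for p in product(GROUPS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(reduced) - 1):
        counts[reduced[i:i + 2]] += 1
    total = max(len(reduced) - 1, 1)  # avoid division by zero for length-1 input
    return [counts[p] / total for p in pairs]

def profile_distance(a, b):
    """Euclidean distance between the group-pair profiles of two proteins."""
    pa, pb = group_pair_profile(a), group_pair_profile(b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))

print(reduce_sequence("MKVLDE"))  # NBNNAA
```

Swapping in a 5-group or 6-group classification only requires editing `GROUPS`; the profile length becomes the square of the number of groups.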

An algorithm for the identification of similar oligopeptides between amino acid sequences

Bioinformatics, 1993

Molecular mimicry is the origin of common structural patterns in sequences of viral and host proteins, and it appears to be related to the development of autoimmune diseases. The identification of structural molecular similarities between viral and host proteins is thus very relevant to the development of engineered antiviral vaccines that avoid potentially dangerous effects. In this respect, it is of interest to identify pairs of similar oligopeptides between given proteins, independently of the overall degree of similarity of their amino acid sequences. To this end we have designed and implemented an algorithm capable of finding, and classifying by statistical significance, all possible pairs of similar oligopeptides between two proteins, irrespective of the length, number, location and ordering of the pairs along the sequences. The algorithm is very efficient and much better suited to this kind of local search than standard alignment programs, which treat the sequences as a whole and are therefore of very limited applicability in such cases. We have used the algorithm to compare a glycoprotein of human immunodeficiency virus (HIV) type 1 with the ^-chains of human leukocyte antigen (HLA). Besides a previously identified peptide, we have found a new peptide located in the fusion site of HIV that shares high similarity with the transmembrane domains of HLA.
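To make "irrespective of location and ordering" concrete, here is a sketch that exhaustively compares every length-k window of one protein with every length-k window of the other and keeps pairs above an identity threshold. It is a brute-force, ungapped stand-in, not the authors' algorithm, and it does not perform their statistical-significance classification; the name `similar_oligopeptides` and the parameters `k` and `min_identity` are hypothetical.

```python
def similar_oligopeptides(a, b, k, min_identity):
    """Return (i, j, identity) for every pair of windows a[i:i+k], b[j:j+k]
    with at least min_identity identical aligned positions.

    Exhaustive, ungapped comparison: pairs may lie anywhere in either
    sequence and in any relative order.
    """
    hits = []
    for i in range(len(a) - k + 1):
        wa = a[i:i + k]
        for j in range(len(b) - k + 1):
            wb = b[j:j + k]
            identity = sum(x == y for x, y in zip(wa, wb))
            if identity >= min_identity:
                hits.append((i, j, identity))
    # Report the most similar pairs first (stable sort keeps (i, j) order on ties).
    hits.sort(key=lambda h: -h[2])
    return hits

print(similar_oligopeptides("AAGIWKK", "TTGIWRR", k=3, min_identity=3))  # [(2, 2, 3)]
```

The cost is proportional to the product of the two sequence lengths times k, which is acceptable for a pair of proteins; a substitution matrix could replace the plain identity count if conservative replacements should also count as similar.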

PatMatch: a program for finding patterns in peptide and nucleotide sequences

2005

Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences, such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The program can be used to find matches to a user-specified sequence pattern that can be described using ambiguous sequence codes and a powerful and flexible pattern syntax based on regular expressions. A recent upgrade has improved performance and now supports both mismatches and wildcards in a single pattern. This enhancement has been achieved by replacing the previous searching algorithm, scan_for_matches [D'Souza et al. (1997), Trends in Genetics, 13, 497-498], with nondeterministic reverse grep (NR-grep), a general pattern matching tool that allows for approximate string matching [Navarro (2001), Software Practice and Experience, 31, 1265-1312]. We have tailored NR-grep to be used for DNA and protein searches with PatMatch.
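The two ideas in this abstract, ambiguous sequence codes expressed as regular expressions and matching with mismatches, can be sketched with Python's standard `re` module. This does not reproduce PatMatch's pattern syntax or NR-grep; the mismatch search is a brute-force stand-in for approximate matching, and the function names are hypothetical. Only the standard IUPAC nucleotide ambiguity codes are assumed.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_pattern(pattern, sequence):
    """0-based start positions of all, possibly overlapping, exact matches."""
    regex = "".join(IUPAC[c] for c in pattern.upper())
    # A zero-width lookahead lets successive matches overlap.
    return [m.start() for m in re.finditer(f"(?={regex})", sequence.upper())]

def find_with_mismatches(pattern, sequence, max_mismatches):
    """Brute-force scan allowing up to max_mismatches positions that fall
    outside the set of bases permitted by each ambiguity code."""
    allowed = [set(IUPAC[c].strip("[]")) for c in pattern.upper()]
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        mismatches = sum(seq[start + i] not in allowed[i] for i in range(len(pattern)))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits

print(find_pattern("GATN", "GATAGATC"))             # [0, 4]
print(find_with_mismatches("GATC", "GATAGATC", 1))  # [0, 4]
```

The regex route is convenient for exact ambiguous patterns; allowing mismatches and wildcards together is where a dedicated approximate matcher such as NR-grep pays off, since the brute-force scan costs time proportional to sequence length times pattern length.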

A Substitution and Alignment Free Similarity Measure for Protein Sequences

The literature reports a large number of approaches for measuring the similarity between protein sequences. Most of these approaches estimate this similarity using alignment-based techniques that do not necessarily yield biologically plausible results, for two reasons. First, for the case of non-alignable (i.e., not yet definitively aligned and biologically approved) sequences such as multi-domain, circular permutation and tandem repeat protein sequences, alignment-based approaches do not succeed in producing biologically plausible results. This is due to the nature of the alignment, which is based on the matching of subsequences in equivalent positions, while non-alignable proteins often have similar and conserved domains in non-equivalent positions. Second, the alignment-based approaches lead to similarity measures that depend heavily on the parameters set by the user for the alignment (e.g., gap penalties and substitution matrices). For easily alignable protein sequences, it'...
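The abstract's argument, that conserved domains in non-equivalent positions defeat position-based alignment, can be illustrated with a generic alignment-free measure: cosine similarity of k-mer count vectors. This is not the paper's measure (the abstract is truncated before describing it). It is a standard substitution-free stand-in, and the names `kmer_counts` and `cosine_similarity` and the two toy "domains" are hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seq, k):
    """Count every overlapping length-k word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_similarity(a, b, k=2):
    """Cosine of the angle between the k-mer count vectors of a and b.
    No alignment, gap penalties or substitution matrix are involved."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

# Two hypothetical domains joined in opposite orders (a domain swap).
d1, d2 = "MKTAYIAKQR", "GSHMLEDPVN"
print(cosine_similarity(d1 + d2, d2 + d1))  # 18/19, about 0.947
```

Only the two junction 2-mers differ between the swapped proteins, so the score stays close to 1 even though a global alignment would have to place the domains in equivalent positions.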

An algorithm for random match probability calculation from peptide sequences

Forensic Science International: Genetics, 2020

The limits of protein sequence comparison?

Current Opinion in Structural Biology, 2005

Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized.
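The "statistical estimates" referred to here are, for local alignment scores, usually Karlin-Altschul statistics: a raw score S between sequences of lengths m and n has expected chance count E = K m n e^(-λS), and the bit score S' = (λS - ln K)/ln 2 gives the equivalent form E = m n 2^(-S'). The sketch below implements these textbook formulas; it omits the effective-length (edge) corrections real search tools apply, and the λ and K values in the example are assumptions chosen only for illustration.

```python
from math import exp, log

def bit_score(raw_score, lam, k):
    """Normalised score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw_score - log(k)) / log(2)

def evalue(raw_score, m, n, lam, k):
    """Expected number of chance local alignments scoring >= S:
    E = K * m * n * exp(-lambda * S). No edge-effect correction."""
    return k * m * n * exp(-lam * raw_score)

def pvalue(e):
    """Probability of at least one chance hit, treating hits as Poisson."""
    return 1.0 - exp(-e)

# Illustrative (assumed) parameters for a 300-residue query against 10^6 residues.
lam, k = 0.267, 0.041
e = evalue(50, 300, 1_000_000, lam, k)
print(e, pvalue(e), bit_score(50, lam, k))
```

Because E scales with m n, the same raw score becomes less significant as the database grows, which is why more accurate statistics matter more as databases expand.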

Investigation of protein sequence similarity based on physio-chemical properties of amino acids

2020

Comparing protein sequence similarity is an important study, because it can reveal the evolutionary relationships among protein sequences. Effective computational algorithms are therefore required to compare the similarities among the vast number of available sequences. The aim of this research is to develop efficient tools for protein sequence comparison and phylogenetic study. The proposed method performs a feature-generation process, based on the physicochemical properties of amino acids, that best describes the evolutionary relationships among the species in a protein family. Each protein sequence is transformed into an 80-dimensional feature vector over groups of amino acids. Four different datasets were used to validate the accuracy of the proposal, and a correlation coefficient of 0.94417 with ClustalW was obtained on the ND5 dataset. This is much higher than the values obtained by some other methods. Finally, the results demonstrate the effectiveness of the method for similarity analysis among genome sequences.
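The abstract reports a correlation coefficient against ClustalW without saying how it is computed. One common way to obtain such a figure, assumed here, is the Pearson correlation between the pairwise distances produced by the feature vectors and the corresponding reference (for example ClustalW-derived) distances, taken over the same unordered species pairs. The feature vectors and reference distances below are hypothetical placeholders, not the paper's 80-dimensional features or data.

```python
from itertools import combinations
from math import sqrt

def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair, in combinations() order."""
    return [sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
            for u, v in combinations(vectors, 2)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical 3-dimensional feature vectors for four species, and
# hypothetical reference distances listed in the same pair order.
features = [[0.1, 0.4, 0.5], [0.2, 0.3, 0.5], [0.6, 0.1, 0.3], [0.5, 0.2, 0.3]]
reference = [0.10, 0.55, 0.45, 0.50, 0.40, 0.08]
print(pearson(pairwise_distances(features), reference))
```

Keeping both distance lists in `combinations()` order is what makes the comparison pair-for-pair; a value near 1 means the feature-based distances rank species pairs much as the reference method does.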

Defining a similarity threshold for a functional protein sequence pattern: The signal peptide cleavage site

Proteins: Structure, Function and Bioinformatics, 1996

When preparing data sets of amino acid or nucleotide sequences, it is necessary to exclude redundant or homologous sequences in order to avoid overestimating the predictive performance of an algorithm. For some time, methods for doing this have been available in the area of protein structure prediction. We have developed a similar procedure, based on pairwise alignments, for sequences with functional sites. We show how a correlation coefficient between sequence similarity and functional homology can be used to compare the efficiency of different similarity measures and to choose a non-arbitrary threshold value for excluding redundant sequences. The impact of the choice of scoring matrix used in the alignments is examined. We demonstrate that the parameter determining the quality of the correlation is the relative entropy of the matrix, rather than the assumed (PAM or identity) substitution model. Results are presented for the prediction of cleavage sites in signal peptides. By inspecting the false positives, several errors in the database were found. The procedure presented may be used as a general outline for finding a problem-specific similarity measure and threshold value for the analysis of other functional amino acid or nucleotide sequence patterns.
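Since the abstract singles out the relative entropy of the scoring matrix, here is a sketch of that quantity for a log-odds matrix: H = Σ q_ij log2(q_ij / (p_i p_j)), where q_ij are the target frequencies of aligned residue pairs (over ordered pairs, summing to 1) and p_i are background frequencies. H is the expected score per aligned pair, in bits, under the target distribution. The two-letter alphabet and all frequencies below are toy assumptions, and `relative_entropy` is a hypothetical helper name.

```python
from math import log2

def relative_entropy(target, background):
    """H = sum over ordered pairs of q_ab * log2(q_ab / (p_a * p_b)), in bits.

    target: dict mapping (a, b) -> q_ab, summing to 1.
    background: dict mapping a -> p_a.
    Pairs with q_ab == 0 contribute nothing and are skipped.
    """
    return sum(q * log2(q / (background[a] * background[b]))
               for (a, b), q in target.items() if q > 0)

background = {"X": 0.5, "Y": 0.5}
# Toy target frequencies favouring identical pairs (a strongly "identity-like" matrix).
target = {("X", "X"): 0.4, ("Y", "Y"): 0.4, ("X", "Y"): 0.1, ("Y", "X"): 0.1}
print(relative_entropy(target, background))  # about 0.278 bits per pair
```

When the target frequencies equal the background products p_a p_b, every log-odds score is zero and H is 0: such a matrix cannot distinguish related from unrelated sequences, which is why H is a natural summary of a matrix's discriminating power.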