Comparative Study
Protein sequence comparison at genome scale
E V Koonin et al. Methods Enzymol. 1996.
Abstract
An adequate set of computer procedures tailored to the genome-scale analysis of protein sequences will greatly increase the beneficial impact of genome sequencing projects on the progress of biological research. This is especially pertinent given that, for model organisms, one-half or more of the putative gene products have not been functionally characterized. Here we describe several programs that may form the core of such a set and their application to the analysis of about 3000 proteins comprising 75% of the E. coli gene products. We find that the protein sequences encoded in this model genome are a rich source of information, with biologically relevant similarities detected for more than 80% of them. In the majority of cases, these similarities are evident directly from the results of BLAST searches. However, methods for motif analysis significantly increase search sensitivity and are particularly important for the detection of ancient conserved regions. As a result of sequence similarity analysis, generalized functional predictions can be made for the majority of uncharacterized ORF products, allowing experimental effort to be focused efficiently. Clustering of the E. coli proteins on the basis of sequence similarity shows that almost one-half of the bacterial proteins have at least one paralog and that the likelihood that a protein belongs to a small or a large cluster depends on the function of that protein.
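The clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering procedure was used, so single-linkage grouping via union-find is an assumption made here for illustration only. The protein identifiers and the list of similar pairs are hypothetical; in practice such pairs might come from similarity search hits that pass some significance cutoff.

```python
from collections import defaultdict


def cluster_by_similarity(proteins, similar_pairs):
    """Group proteins into single-linkage clusters (an assumed method).

    Two proteins end up in the same cluster if they are connected by a
    chain of pairwise similarities.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Walk up to the cluster root, shortening the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for p in proteins:
        clusters[find(p)].append(p)
    # Largest clusters first.
    return sorted(clusters.values(), key=len, reverse=True)


def fraction_with_paralog(clusters):
    """Fraction of proteins that share a cluster with at least one other."""
    total = sum(len(c) for c in clusters)
    in_multi = sum(len(c) for c in clusters if len(c) > 1)
    return in_multi / total if total else 0.0


# Hypothetical toy input, not data from the paper.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
similar_pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]

clusters = cluster_by_similarity(proteins, similar_pairs)
print(clusters)
print(fraction_with_paralog(clusters))
```

In this toy input, five of the six proteins fall into clusters with more than one member. Single linkage is the simplest choice but can chain unrelated proteins together through multidomain intermediates, so a real analysis would likely need stricter criteria.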