The structure of the protein universe and genome evolution (original) (raw)

References

  1. Holm, L. & Sander, C. Mapping the protein universe. Science 273, 595–603 (1996).
    Article ADS CAS Google Scholar
  2. Zhang, C. & DeLisi, C. Protein folds: molecular systematics in three dimensions. Cell. Mol. Life Sci. 58, 72–79 (2001).
    Article CAS Google Scholar
  3. Rost, B. Did evolution leap to create the protein universe? Curr. Opin. Struct. Biol. 12, 409–416 (2002).
    Article CAS Google Scholar
  4. Dayhoff, M. The origin and evolution of protein superfamilies. Fed. Proc. 35, 2132–2138 (1976).
    CAS PubMed Google Scholar
  5. Dayhoff, M. O., Barker, W. C. & Hunt, L. T. Establishing homologies in protein sequences. Methods Enzymol. 91, 524–545 (1983).
    Article CAS Google Scholar
  6. Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995).
    CAS PubMed Google Scholar
  7. Murzin, A. G. Structural classification of proteins: new superfamilies. Curr. Opin. Struct. Biol. 6, 386–394 (1996).
    Article CAS Google Scholar
  8. Orengo, C. A. et al. CATH—a hierarchic classification of protein domain structures. Structure 5, 1093–1108 (1997).
    Article CAS Google Scholar
  9. Todd, A. E., Orengo, C. A. & Thornton, J. M. Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. 307, 1113–1143 (2001).
    Article CAS Google Scholar
  10. Lo Conte, L., Brenner, S. E., Hubbard, T. J., Chothia, C. & Murzin, A. G. SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res. 30, 264–267 (2002).
    Article CAS Google Scholar
  11. Orengo, C. A. et al. The CATH protein family database: a resource for structural and functional annotation of genomes. Proteomics 2, 11–21 (2002).
    Article CAS Google Scholar
  12. Branden, C.-I & Tooze, J. Introduction to Protein Structure (Garland Publishing, New York, 1999).
    Google Scholar
  13. Anantharaman, V., Koonin, E. V. & Aravind, L. Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res. 30, 1427–1464 (2002).
    Article CAS Google Scholar
  14. Anantharaman, V., Koonin, E. V. & Aravind, L. Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains. J. Mol. Biol. 307, 1271–1292 (2001).
    Article CAS Google Scholar
  15. Saraste, M., Sibbald, P. R. & Wittinghofer, A. The P-loop—a common motif in ATP- and GTP-binding proteins. Trends Biochem. Sci. 15, 430–434 (1990).
    Article Google Scholar
  16. Koonin, E. V. A superfamily of ATPases with diverse functions containing either classical or deviant ATP-binding motif. J. Mol. Biol. 229, 1165–1174 (1993).
    Article CAS Google Scholar
  17. Aravind, L., Mazumder, R., Vasudevan, S. & Koonin, E. V. Trends in protein evolution inferred from sequence and structure analysis. Curr. Opin. Struct. Biol. 12, 392–399 (2002).
    Article CAS Google Scholar
  18. Galperin, M. Y., Walker, D. R. & Koonin, E. V. Analogous enzymes: independent inventions in enzyme evolution. Genome Res. 8, 779–790 (1998).
    Article CAS Google Scholar
  19. Martin, A. C. et al. Protein folds and functions. Structure 6, 875–884 (1998).
    Article CAS Google Scholar
  20. Fitch, W. M. Distinguishing homologous from analogous proteins. Syst. Zool. 19, 99–113 (1970).
    Article CAS Google Scholar
  21. Fitch, W. M. Homology a personal view on some of the problems. Trends Genet. 16, 227–231 (2000).
    Article CAS Google Scholar
  22. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (1997).
    Article ADS CAS Google Scholar
  23. Tatusov, R. L., Galperin, M. Y., Natale, D. A. & Koonin, E. V. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28, 33–36 (2000).
    Article CAS Google Scholar
  24. Jordan, I. K., Makarova, K. S., Spouge, J. L., Wolf, Y. I. & Koonin, E. V. Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res. 11, 555–565 (2001).
    Article CAS Google Scholar
  25. Remm, M., Storm, C. E. & Sonnhammer, E. L. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol. 314, 1041–1052 (2001).
    Article CAS Google Scholar
  26. Lespinet, O., Wolf, Y. I., Koonin, E. V. & Aravind, L. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 12, 1048–1059 (2002).
    Article CAS Google Scholar
  27. Henikoff, S. et al. Gene families: the taxonomy of protein paralogs and chimeras. Science 278, 609–614 (1997).
    Article ADS CAS Google Scholar
  28. Alexandrov, N. N. & Go, N. Biological meaning, statistical significance, and classification of local spatial similarities in nonhomologous proteins. Protein Sci. 3, 866–875 (1994).
    Article CAS Google Scholar
  29. Orengo, C. A., Jones, D. T. & Thornton, J. M. Protein superfamilies and domain superfolds. Nature 372, 631–634 (1994).
    Article ADS CAS Google Scholar
  30. Zuckerkandl, E. The appearance of new structures and functions in proteins during evolution. J. Mol. Evol. 7, 1–57 (1975).
    Article ADS CAS Google Scholar
  31. Chothia, C. One thousand families for the molecular biologist. Nature 357, 543–544 (1992).
    Article ADS CAS Google Scholar
  32. Zhang, C. T. Relations of the numbers of protein sequences, families and folds. Protein Eng. 10, 757–761 (1997).
    Article CAS Google Scholar
  33. Wang, Z. X. A re-estimation for the total numbers of protein folds and superfamilies. Protein Eng. 11, 621–626 (1998).
    Article CAS Google Scholar
  34. Zhang, C. & DeLisi, C. Estimating the number of protein folds. J. Mol. Biol. 284, 1301–1305 (1998).
    Article CAS Google Scholar
  35. Govindarajan, S., Recabarren, R. & Goldstein, R. A. Estimating the total number of protein folds. Proteins 35, 408–414 (1999).
    Article CAS Google Scholar
  36. Wolf, Y. I., Grishin, N. V. & Koonin, E. V. Estimating the number of protein folds and families from complete genome data. J. Mol. Biol. 299, 897–905 (2000).
    Article CAS Google Scholar
  37. Coulson, A. F. & Moult, J. A unifold, mesofold, and superfold model of protein fold use. Proteins 46, 61–71 (2002).
    Article CAS Google Scholar
  38. Kuznetsov, V. A. in Computational and Statistical Approaches to Genomics (eds Zhang, W. & Shmulevich, I.) 125–171 (Kluwer, Boston, 2002).
    Google Scholar
  39. Karev, G. P., Wolf, Y. I., Rzhetsky, A. Y., Berezovskaya, F. S. & Koonin, E. V. in Computational Genomics: from Sequence to Function (eds Galperin, M. Y. & Koonin, E. V.) (Horizon, Amsterdam, in the press).
  40. Karev, G. P., Wolf, Y. I., Rzhetsky, A. Y., Berezovskaya, F. S. & Koonin, E. V. Birth and death of protein domains: a simple model of evolution explains power law behavior. BMC Evol. Biol. (in the press).
  41. Huynen, M. A. & van Nimwegen, E. The frequency distribution of gene family sizes in complete genomes. Mol. Biol. Evol. 15, 583–589 (1998).
    Article CAS Google Scholar
  42. Qian, J., Luscombe, N. M. & Gerstein, M. Protein family and fold occurrence in genomes: power-law behaviour and evolutionary model. J. Mol. Biol. 313, 673–681 (2001).
    Article CAS Google Scholar
  43. Harrison, P. M. & Gerstein, M. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J. Mol. Biol. 318, 1155–1174 (2002).
    Article CAS Google Scholar
  44. Luscombe, N., Qian, J., Zhang, Z., Johnson, T. & Gerstein, M. The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties. Genome Biol. 3, research0040.1–0040.7 (2002).
    Article Google Scholar
  45. Barabasi, A. L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
    Article ADS MathSciNet CAS Google Scholar
  46. Bilke, S. & Peterson, C. Topological properties of citation and metabolic networks. Phys. Rev. E 64, 036106-1–036106-5 (2001).
    Article ADS Google Scholar
  47. Barabasi, A. L. Linked: The New Science of Networks (Perseus, New York, 2002).
    Google Scholar
  48. Albert, R. & Barabasi, A. L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002).
    Article ADS MathSciNet Google Scholar
  49. Gisiger, T. Scale invariance in biology: coincidence or footprint of a universal mechanism? Biol. Rev. Camb. Phil. Soc. 76, 161–209 (2001).
    Article CAS Google Scholar
  50. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. & Barabasi, A. L. The large-scale organization of metabolic networks. Nature 407, 651–654 (2000).
    Article ADS CAS Google Scholar
  51. Zipf, G. K. Human Behaviour and the Principle of Least Effort (Addison-Wesley, Boston, 1949).
    Google Scholar
  52. Pareto, V. Cours d'Economie Politique (Rouge et Cie, Paris, 1897).
    Google Scholar
  53. Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabasi, A. L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002).
    Article ADS CAS Google Scholar
  54. Jeong, H., Mason, S. P., Barabasi, A. L. & Oltvai, Z. N. Lethality and centrality in protein networks. Nature 411, 41–42 (2001).
    Article ADS CAS Google Scholar
  55. Li, H., Helling, R., Tang, C. & Wingreen, N. Emergence of preferred structures in a simple model of protein folding. Science 273, 666–669 (1996).
    Article ADS CAS Google Scholar
  56. Li, H., Tang, C. & Wingreen, N. S. Are protein folds atypical? Proc. Natl Acad. Sci. USA 95, 4987–4990 (1998).
    Article ADS CAS Google Scholar
  57. Rzhetsky, A. & Gomez, S. M. Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics 17, 988–996 (2001).
    Article CAS Google Scholar
  58. Yule, G. U. A mathematical theory of evolution, based on the conclusions of Dr. J.C. Willis, F.R.S. Phil. Trans. R. Soc. Lond. B 213, 21–87 (1924).
    Article Google Scholar
  59. Gould, S. J. The Structure of Evolutionary Theory (Harvard Univ. Press, Cambridge, MA, 2002).
    Book Google Scholar
  60. Doolittle, W. F. Lateral genomics. Trends Cell Biol. 9, M5–M8 (1999).
    Article CAS Google Scholar
  61. Doolittle, W. F. Phylogenetic classification and the universal tree. Science 284, 2124–2129 (1999).
    Article CAS Google Scholar
  62. Doolittle, W. F. You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet 14, 307–311 (1998).
    Article CAS Google Scholar
  63. Koonin, E. V., Makarova, K. S. & Aravind, L. Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 55, 709–742 (2001).
    Article CAS Google Scholar
  64. Ragan, M. A. Detection of lateral gene transfer among microbial genomes. Curr. Opin. Genet. Dev. 11, 620–626 (2001).
    Article CAS Google Scholar
  65. Marcotte, E. M. et al. Detecting protein function and protein-protein interactions from genome sequences. Science 285, 751–753 (1999).
    Article CAS Google Scholar
  66. Enright, A. J., Illopoulos, I., Kyrpides, N. C. & Ouzounis, C. A. Protein interaction maps for complete genomes based on gene fusion events. Nature 402, 86–90 (1999).
    Article ADS CAS Google Scholar
  67. Galperin, M. Y. & Koonin, E. V. Who's your neighbor? New computational approaches for functional genomics. Nature Biotechnol. 18, 609–613 (2000).
    Article CAS Google Scholar
  68. Aravind, L. Guilt by association: contextual information in genome analysis. Genome Res. 10, 1074–1077 (2000).
    Article CAS Google Scholar
  69. Koonin, E. V., Aravind, L. & Kondrashov, A. S. The impact of comparative genomics on our understanding of evolution. Cell 101, 573–576 (2000).
    Article CAS Google Scholar
  70. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
    Article ADS CAS Google Scholar
  71. Wolf, Y. I., Brenner, S. E., Bash, P. A. & Koonin, E. V. Distribution of protein folds in the three superkingdoms of life. Genome Res. 9, 17–26 (1999).
    CAS PubMed Google Scholar
  72. Wuchty, S. Scale-free behavior in protein domain networks. Mol. Biol. Evol. 18, 1694–1702 (2001).
    Article CAS Google Scholar
  73. Apic, G., Gough, J. & Teichmann, S. A. An insight into domain combinations. Bioinformatics 17 (Suppl. 1), S83–S89 (2001).
    Article Google Scholar
  74. Bork, P. et al. A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. FASEB J. 11, 68–76 (1997).
    Article CAS Google Scholar
  75. Derbyshire, D. J. et al. Crystal structure of human 53BP1 BRCT domains bound to p53 tumour suppressor. EMBO J. 21, 3863–3872 (2002).
    Article CAS Google Scholar
  76. Vitkup, D., Melamud, E., Moult, J. & Sander, C. Completeness in structural genomics. Nature Struct. Biol. 8, 559–566 (2001).
    Article CAS Google Scholar
  77. Marchler-Bauer, A. et al. CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 30, 281–283 (2002).
    Article CAS Google Scholar

Download references