The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences
Abstract
This paper continues an examination of the hypothesis that modern proteins evolved from random heteropeptide sequences. In support of the hypothesis, White and Jacobs (1993, J Mol Evol 36:79–95) have shown that any sequence chosen randomly from a large collection of nonhomologous proteins has a 90% or better chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation, regardless of amino acid type. The goal of the present study was to investigate whether the random-origin hypothesis could explain the lengths of modern protein sequences without invoking specific mechanisms such as gene duplication or exon splicing. The sets of sequences examined were taken from the 1989 PIR database and consisted of 1,792 “super-family” proteins selected to have little sequence identity, 623 E. coli sequences, and 398 human sequences. The length distributions of the proteins could be described with high significance by either of two closely related probability density functions: the gamma distribution with parameter 2 or the distribution for the sum of two independent exponential random variables. A simple theory for the distributions was developed which assumes that (1) protoprotein sequences had independent, exponentially distributed random lengths, (2) the length dependence of protein stability determined which of these protoproteins could fold into compact primitive proteins and thereby attain the potential for biochemical activity, (3) the useful protein sequences were preserved by the primitive genome, and (4) the resulting distribution of sequence lengths is reflected by modern proteins. The theory successfully predicts the two observed distributions, which can be distinguished by the functional form of the dependence of protein stability on length.
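The link between assumptions (1) and (2) and the two fitted densities can be sketched as follows. This is an illustration only, not the paper's derivation: the rate $\lambda$, the selection weight $s(n)$, and the constant $k$ are symbols assumed here, and the two forms of $s(n)$ are chosen simply because they reproduce the two distributions named in the abstract.

```latex
% Assumed starting point: exponentially distributed protoprotein lengths n
f_0(n) = \lambda e^{-\lambda n}, \qquad n \ge 0

% Assumed selection step: a length-dependent weight s(n) for stable folding
f(n) = \frac{s(n)\, f_0(n)}{\int_0^\infty s(m)\, f_0(m)\, dm}

% Case A (assumed): s(n) \propto n gives the gamma density with shape parameter 2
f_A(n) = \lambda^2\, n\, e^{-\lambda n}

% Case B (assumed): s(n) \propto 1 - e^{-k n} gives the density of the sum of
% two independent exponential variables with rates \lambda and \lambda + k
f_B(n) = \frac{\lambda(\lambda + k)}{k}\left(e^{-\lambda n} - e^{-(\lambda + k) n}\right)
```

In both cases short sequences are suppressed and the long-length tail stays exponential, which is why the two densities are closely related and differ only through the assumed form of the stability dependence.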
The theory leads to three interesting conclusions. First, it predicts that a _tetra_-nucleotide was the signal for primitive translation termination. This prediction is entirely consistent with the observations of Brown et al. (1990a,b, Nucleic Acids Res 18:2079–2086 and 18:6339–6345), which show that tetra-nucleotides (stop codon plus following nucleotide) are the actual signals for termination of translation in both prokaryotes and eukaryotes. Second, the strong dependence of statistical length distributions on sequence-termination signaling codes implies that the evolution of stop codons and translation-termination processes was as important as gene splicing in early evolution. Third, because the theory is based upon a simple no-exon stochastic model, it provides a plausible alternative to a limited universe of exons from which all proteins evolved by gene duplication and exon splicing (Dorit et al. 1990, Science 250:1377–1382).
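A rough sketch of why the termination signal would set the length scale is given below. It rests on assumptions that are not stated in the abstract: bases are independent and equally frequent, termination is tested once per codon, the three standard stop codons are used, and the tetra-nucleotide signal requires one specific nucleotide after the stop codon. The function name `mean_read_length` and the constants `P_TRIPLET` and `P_TETRA` are hypothetical.

```python
import random


def mean_read_length(p_stop, n_chains=20000, seed=0):
    """Average number of sense codons read before a random termination signal.

    Each codon position independently terminates with probability p_stop,
    so lengths are geometric (the discrete analogue of an exponential).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_chains):
        length = 0
        # Keep extending the chain until a termination signal is drawn.
        while rng.random() >= p_stop:
            length += 1
        total += length
    return total / n_chains


# Assumed signal probabilities under uniform, independent base frequencies.
P_TRIPLET = 3 / 64    # 3 stop codons among 64 triplets
P_TETRA = 3 / 256     # stop codon plus one specific following nucleotide

triplet_mean = mean_read_length(P_TRIPLET)
tetra_mean = mean_read_length(P_TETRA)
print(f"triplet signal: mean length ~ {triplet_mean:.1f} codons")
print(f"tetra-nucleotide signal: mean length ~ {tetra_mean:.1f} codons")
```

Under these assumptions the expected lengths are (1 - p)/p, roughly 20 codons for a triplet signal and roughly 84 for a tetra-nucleotide signal, so the more specific signal yields chains about four times longer on average.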
References
- Barker WC, George DG, Hunt LT, Garavelli JS (1991) The PIR protein sequence database. Nucleic Acids Res Suppl 19:2231–2236
- Blake CCF (1983) Exons—present from the beginning? Nature 306:535–537
- Bossi L, Roth JR (1980) The influence of codon context on genetic code translation. Nature 286:123–127
- Brown CM, Stockwell PA, Trotman CNA, Tate WP (1990a) The signal for termination of protein synthesis in prokaryotes. Nucleic Acids Res 18:2079–2086
- Brown CM, Stockwell PA, Trotman CNA, Tate WP (1990b) Sequence analysis suggests that tetra-nucleotides signal the termination of protein synthesis in eukaryotes. Nucleic Acids Res 18:6339–6345
- Cavalier-Smith T (1985) Selfish DNA and the origin of introns. Nature 315:283–284
- Chan HS, Dill KA (1990) Origins of structure in globular proteins. Proc Natl Acad Sci USA 87:6388–6392
- Darnell JE (1978) Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science 202:1257–1260
- Dill KA (1985) Theory of the folding and stability of globular proteins. Biochemistry 24:1501–1509
- Doolittle RF (1979) Protein evolution. In: Neurath H, Hill RL (eds) The proteins, vol IV. Academic Press, New York, pp 1–118
- Doolittle RF (1991) Counting and discounting the universe of exons. Science 253:677–679
- Doolittle WF (1978) Genes in pieces: were they ever together? Nature 272:581–582
- Doolittle WF (1990) Understanding introns: origins and functions. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 43–62
- Dorit RL, Schoenbach L, Gilbert W (1990) How big is the universe of exons? Science 250:1377–1382
- Dorit RL, Gilbert W (1991) The limited universe of exons. Curr Opin Struct Biol 1:973–977
- Eck RV, Dayhoff MO (1966) Evolution of the structure of ferredoxin based on living relics of primitive amino acid sequences. Science 152:363–366
- Flory PJ (1953) Principles of polymer chemistry. Cornell University Press, Ithaca, NY, pp 1–672
- Gilbert W (1978) Why genes in pieces? Nature 271:501
- Hanyu N, Kuchino Y, Nishimura S (1986) Dramatic events in ciliate evolution: alteration of UAA and UAG termination codons to glutamine codons due to anticodon mutations in two Tetrahymena tRNAs(Gln). EMBO J 5:1307–1311
- Hawkins JD (1988) A survey on intron and exon lengths. Nucleic Acids Res 16:9893–9908
- Holland SK, Blake CCF (1990) Proteins, exons, and molecular evolution. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 10–42
- Iranpour R, Chacon P (1991) Basic stochastic processes. Macmillan, New York, pp 1–258
- Jukes TH (1982) Possible evolutionary steps in the genetic code. Biochem Biophys Res Comm 107:225–228
- Jukes TH, Osawa S, Muto A, Lehman N (1987) Evolution of anticodons: variations in the genetic code. Cold Spring Harbor Sympos Quant Biol 52:769–776
- Lau KF, Dill KA (1990) Theory for protein mutability and biogenesis. Proc Natl Acad Sci USA 87:638–642
- McLachlan AD (1972) Repeating sequences and gene duplication in proteins. J Mol Biol 64:417–437
- Monod J (1971) Chance and necessity. An essay on the natural philosophy of modern biology. Alfred A. Knopf, New York, pp 1–199
- Naora H, Deacon NJ (1982) Relationship between total size of exons and introns in protein-coding genes of higher eukaryotes. Proc Natl Acad Sci USA 79:6196–6200
- Nei M, Chakraborty R, Fuerst PA (1976) Infinite allele model with varying mutation rate. Proc Natl Acad Sci USA 73:4164–4168
- Osawa S, Jukes TH (1988) Evolution of the genetic code as affected by anticodon content. Trends Genet 4:191–198
- Patthy L (1991) Exons—original building blocks of proteins? BioEssays 13:187–192
- Ross SM (1989) Introduction to probability models, 4th ed. Academic Press, San Diego, pp 1–544
- Rossmann MG (1990) Introductory comments on the function of domains in protein structure. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 3–9
- Senapathy P (1986) Origin of eukaryotic introns: a hypothesis, based on codon distribution statistics in genes, and its implications. Proc Natl Acad Sci USA 83:2133–2137
- Senapathy P (1988) Possible evolution of splice-junction signals in eukaryotic genes from stop codons. Proc Natl Acad Sci USA 85:1129–1133
- Shakhnovich EI, Gutin AM (1989) Formation of unique structure in polypeptide chains: theoretical investigation with the aid of a replica approach. Biophys Chem 34:187–199
- Shakhnovich EI, Gutin AM (1990) Implications of thermodynamics of protein folding for evolution of primary sequences. Nature 346:773–775
- Sharp PA (1985) On the origin of RNA splicing and introns. Cell 42:397–400
- Smith MW (1988) Structure of vertebrate genes: a statistical analysis implicating selection. J Mol Evol 27:45–55
- Sommer SS, Cohen JE (1980) The size distributions of proteins, mRNA, and nuclear RNA. J Mol Evol 15:37–57
- Tate WP, Brown CM (1992) Translational termination: “stop” for protein synthesis or “pause” for regulation of gene expression? Biochemistry 31:2443–2450
- Traut TW (1988) Do exons code for structural or functional units in proteins? Proc Natl Acad Sci USA 85:2944–2948
- White SH (1992) The amino acid preferences of small proteins: implications for protein stability and evolution. J Mol Biol 227:991–995
- White SH, Jacobs RE (1990) Statistical distribution of hydrophobic residues along the length of protein chains—implications for protein folding and evolution. Biophys J 57:911–921
- White SH, Jacobs RE (1993) The evolution of proteins from random amino acid sequences I. Evidence from the lengthwise distribution of amino acids in modern protein sequences. J Mol Evol 36:79–95
Author information
Authors and Affiliations
- Department of Physiology and Biophysics, University of California, Irvine, CA 92717, USA
Stephen H. White
About this article
Cite this article
White, S.H. The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences. J Mol Evol 38, 383–394 (1994). https://doi.org/10.1007/BF00163155
- Received: 24 June 1992
- Revised: 24 May 1993
- Issue Date: April 1994
- DOI: https://doi.org/10.1007/BF00163155