Universally conserved translation initiation factors

Abstract

The process by which translation is initiated has long been considered similar in Bacteria and Eukarya but accomplished by a different unrelated set of factors in the two cases. This not only implies separate evolutionary histories for the two but also implies that at the universal ancestor stage, a translation initiation mechanism either did not exist or was of a different nature than the extant processes. We demonstrate herein that (i) the “analogous” translation initiation factors IF-1 and eIF-1A are actually related in sequence, (ii) the “eukaryotic” translation factor SUI1 is universal in distribution, and (iii) the eukaryotic/archaeal translation factor eIF-5A is homologous to the bacterial translation factor EF-P. Thus, the rudiments of translation initiation would seem to have been present in the universal ancestor stage. However, significant development and refinement subsequently occurred independently on both the bacterial lineage and on the archaeal/eukaryotic line.


The three major cellular information processing systems (replication, transcription, and translation) differ greatly in the degree to which their componentry is universally conserved (1, 2). At the one extreme is genome replication, where not even the central DNA polymerase is orthologous between the Bacteria and the Archaea/Eukarya (1, 3). At the other is translation, where most of the componentry is universal in distribution: ribosomal RNAs are strongly conserved in both primary and secondary structure among all organisms (1, 4); the majority of the ribosomal proteins are as well, as are most of the elongation factors, the tRNAs, and aminoacyl-tRNA synthetases (1, 2). The only major exception appears to be translation initiation.

Although translation initiation is functionally similar in all organisms, the underlying componentry seems to be quite dissimilar between the Bacteria and the Archaea/Eukarya. The bacterial mechanism seems simple; three (single subunit) proteins are involved: initiation factors IF-1, IF-2, and IF-3 (5). By contrast, eukaryotic initiation is complex, involving a larger number of protein factors, many of which comprise multiple subunits (6, 7).

The relationship between these two translation initiation systems addresses a central evolutionary question: How advanced in their evolutionary development were the various cellular information processing systems at the time the universal ancestor gave rise to the primary lines of organismal descent; more generally, what was the nature of this entity we call the universal ancestor? Was it more rudimentary than the cells we study today—and, if so, in what ways (2, 8, 9)?

The publication of the Methanococcus jannaschii genome (10–12) allowed the first complete (comparative) examination of the componentry of archaeal information processing systems. Before this, our understanding of archaeal translation initiation was at best rudimentary (13). The fact that archaeal mRNAs are polycistronic, uncapped, lack long poly(A) tails, and have Shine–Dalgarno sequences suggested (erroneously) that the archaeal process resembled the bacterial one. Yet, the M. jannaschii genome showed that archaeal translation initiation is remarkably similar to that seen in eukaryotes (10). Homologs of eukaryotic factors eIF-1A, eIF-2 (all three subunits), two of the five eIF-2B subunits (α and δ), eIF-4A, and eIF-5A were reported (10), a list that covers most eukaryotic factors (except for those involved with mRNA cap recognition).

The three recognized bacterial translation initiation factors have functional counterparts among the eukaryotic factors, although function in all cases is to one degree or another poorly defined. Bacterial IF-1 is thought to enhance the rate of ribosomal subunit dissociation and to stimulate the IF-2-dependent fMet-tRNAiMet binding to the small ribosomal subunit (5), a functionality quite like that ascribed to eIF-1A (14, 15). Bacterial IF-2 is the “central player” in translation initiation, for it associates the initiator Met-tRNAiMet (and GTP) with the small ribosomal subunit (5). Eukaryotic eIF-2, a more complex trimeric protein, acts similarly (6), although the two functions differ somewhat in their mechanistic details (e.g., the stage at which GTP or mRNA enters the complex) (5, 6). Finally, IF-3 (16) acts as a subunit antiassociation factor and promotes the selection of the initiator Met-tRNAiMet, a function quite similar to that of the eukaryotic SUI1 (17).

In Eukarya, the subsequent binding of the 43S preinitiation complex to the mRNA and the scanning for the start codon is mediated by eIF-4 (eIF-4A, eIF-4B, eIF-4E, eIF-4G, and eIF-4F) (6, 7). Finally, eIF-5 [whose N-terminal domain is homologous to the C-terminal domain of the eIF-2β (N.C.K., unpublished data)] is used to promote the hydrolysis of GTP bound to the preinitiation complex, the release of eIF-2⋅GDP, and the joining of the 60S ribosomal subunit (19). [In Bacteria, GTP hydrolysis is performed by IF-2 (20)].

Despite functional resemblance, the three bacterial factors are thought to be unrelated (specifically) in sequence to their eukaryotic counterparts. The two systems are seen as merely analogous (i.e., each evolved separately), with the strong attached implication that translation initiation at the universal ancestor stage was either nonexistent or very different (presumably much simpler) than the extant processes.

Herein, we show what the knowledge of archaeal translation initiation has made obvious, i.e., that there exists far more homology between these bacterial and archaeal/eukaryotic processes than had previously been thought—which dramatically changes the way we look at the evolution of that process and, for that matter, at the nature of the universal ancestor.

In the present communication, we consider the relationship and the phylogenetic distribution of the initiation factors IF-1/eIF-1A, eIF-1/SUI1, and EF-P/eIF-5A; the more complex IF-2/eIF-2/eIF-2B relationships will be considered separately.

MATERIALS AND METHODS

Databases.

The nonredundant protein sequence database at the National Center for Biotechnology Information was used for all of the sequence similarity searches. The complete database of Methanobacterium thermoautotrophicum (22) gene products was obtained from the web page of Genome Therapeutics, at http://www.cric.com/htdocs/sequences/methanobacter/abstract.html. The Archaeoglobus fulgidus (23) complete sequence database was retrieved from The Institute for Genomic Research at http://www.tigr.org. A query for the mere presence or absence of a sequence from the unpublished genomes of Deinococcus radiodurans, Treponema pallidum, and Borrelia burgdorferi was done through the blast server for The Institute for Genomic Research at http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-tigrbl. (A list of genome databases is available at http://geta.life.uiuc.edu/~nikos/genomes.html.)

Database searches were performed with blast (24) and wu-blast 2.0 (25) programs, by using the BLOSUM62 substitution matrix and default parameters, at http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-newblast?Jform=1.

Access to bibliographical databases was greatly facilitated by using entrez (26) (at http://www.ncbi.nlm.nih.gov/Entrez/), and srs (27) (at http://srs.ebi.ac.uk:5000/).

Sequence Analysis.

Multiple sequence alignments were performed with clustalw (28) and the pileup program of the GCG package, version 8.1 from the University of Wisconsin (29). Visualization of the conserved residues was facilitated by the boxshade (version 3.21) program, at http://ulrec3.unil.ch/software/BOX_form.html. Sequence profiles (30) were generated from the multiple sequence alignment of individual families and used to search protein sequence databases. Motif identification searches were performed with the meta-MEME motif search tool (31) (see also http://www.sdsc.edu/MEME/meme/website/).
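The idea behind a profile search can be illustrated with a minimal sketch. It is not the profile method of ref. 30, which weights columns with a substitution matrix and allows gaps. It is a simplified, ungapped position-specific log-odds model, assuming a uniform background frequency and a pseudocount of 1. The function names and the toy alignment are hypothetical and are not taken from the figures.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for column in zip(*alignment):
        # Gap characters and unknown symbols are ignored when counting.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, window):
    """Sum the per-column scores of an ungapped window of residues."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, window))


def best_hit(profile, sequence):
    """Slide the profile along a sequence; return (score, offset) or None."""
    width = len(profile)
    return max(
        ((score_window(profile, sequence[i:i + width]), i)
         for i in range(len(sequence) - width + 1)),
        default=None,
    )


# Hypothetical toy alignment and query sequence, for illustration only.
toy_alignment = ["KGEVV", "KGKVI", "RGEVV", "KGEIV"]
toy_profile = build_profile(toy_alignment)
score, offset = best_hit(toy_profile, "MAAKGEVVLT")
print(f"best window starts at offset {offset} with score {score:.2f}")
```

A database search in this spirit would score every sequence in the database and rank the hits by score; assessing significance requires a statistical model that this sketch omits.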

RESULTS AND DISCUSSION

Bacterial Translation Factor IF-1 and Eukaryotic eIF-1A Are Homologs.

As stated above, despite their general functional similarity (each facilitates ribosomal subunit dissociation and stabilizes Met-tRNAiMet and mRNA binding to the small ribosomal subunit) (5, 14, 15), bacterial IF-1 and eukaryotic eIF-1A have been considered analogs, not homologs (32). However, this is not the case. Homology between the IF-1 and eIF-1A sequences is immediately apparent from profile searches (data not shown) and the alignment of Fig. 1. The archaeal and eukaryotic sequences are on average 38% identical, whereas the archaeal and bacterial sequences are 30% identical and the bacterial and eukaryotic sequences are 21% identical. However, within each of the three major groups, identities are greater than 50%.
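Average identities of this kind can be computed from an alignment as sketched below. The text does not state which denominator was used, so the sketch assumes that identity is counted over the columns in which neither sequence has a gap. The sequences are hypothetical toy strings, not the sequences of Fig. 1.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Assumption: columns with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def mean_identity(group_a, group_b):
    """Average identity over every pair drawn from two groups of sequences."""
    values = [percent_identity(a, b) for a in group_a for b in group_b]
    return sum(values) / len(values)


# Hypothetical aligned fragments, for illustration only.
toy_bacterial = ["AKEDVIEV", "AKEDAIEV"]
toy_archaeal = ["AKGEVIRV"]
print(mean_identity(toy_bacterial, toy_archaeal))
```

Other conventions (for example, dividing by the length of the shorter sequence) give somewhat different values, so percentages are comparable only when the same convention is used throughout.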

Figure 1.


Multiple sequence alignment of bacterial IF-1, eukaryotic eIF-1A, and their archaeal homologs (aIF-1A). Positions in which sequence conservation is >50% identity are highlighted in black. The last line is a consensus of the six S1 motifs found in E. coli ribosomal protein S1: highlighted uppercase type denotes those residues that are also highly conserved in the IF-1/eIF-1A/aIF-1A family, whereas highlighted lowercase type denotes the residues for which a related amino acid occurs in the family. (Dots denote nonconserved positions in the S1 consensus, and dashes denote gaps/insertions of more than one residue.) The horizontal arrows display the positions of the β-strands according to the three-dimensional structure (33). Protein names (and accession number—when different) are as follows: IF1_ECOLI, E. coli IF-1; IF1-SYNEC, Synechocystis sp. IF-1 (EMBL:D90905_47); IF1_BACSU, Bacillus subtilis IF-1; IF1-DEIRA, unpublished ORF from Deinococcus radiodurans; IF1A_METJA, Methanococcus jannaschii aIF-1A; IF1A-METTH, Methanobacterium thermoautotrophicum unpublished ORF; IF1A-ARCFU, Archaeoglobus fulgidus unpublished ORF; YRP2_THEAC, Thermoplasma acidophilum hypothetical protein; IF1A_WHEAT, Triticum aestivum eIF-1A; IF1A_HUMAN, human eIF-1A; IF1A_RABIT, Oryctolagus cuniculus eIF-1A; IF1A_YEAST, Saccharomyces cerevisiae eIF-1A.
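The >50% shading rule used in the figure can be sketched as follows. This is an illustration and not the boxshade implementation: it assumes that a column is shaded when the most common residue occurs in more than half of all sequences, with gaps counting against conservation. The toy alignment is hypothetical.

```python
from collections import Counter


def conserved_columns(alignment, threshold=0.5, gap="-"):
    """Return one boolean per column: True if the top residue exceeds threshold."""
    flags = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        top = counts.most_common(1)[0][1] if counts else 0
        # Fraction is taken over all sequences, so gaps lower conservation.
        flags.append(top / len(column) > threshold)
    return flags


def conservation_mask(alignment, threshold=0.5):
    """Render the flags as a string: '*' for conserved columns, '.' otherwise."""
    return "".join("*" if flag else "."
                   for flag in conserved_columns(alignment, threshold))


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["KGEVV", "KGKVI", "RGEIV", "RAEIL"]
print(conservation_mask(toy_alignment))
```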

Given that the archaeal IF-1A sequences are unique but more similar to their eukaryotic than to their bacterial counterparts, we support the previously proposed renaming of them to archaeal IF-1A (aIF-1A) (32).

A well-known sequence motif, characteristic of bacterial ribosomal protein S1, also has been reported in IF-1 and eIF-2α, as well as in a number of other RNA-binding proteins (34, 35). The (solution) structures for Escherichia coli IF-1 (36) and the S1 domain (33) are built around a five-stranded antiparallel β-barrel, a structure that displays striking resemblance to proteins belonging to the OB (oligonucleotide/oligosaccharide binding) family, many members of which are single-stranded nucleic acid-binding proteins (37). However, the composition of the S1 motif correlates only weakly with the conservation patterns of the IF-1/eIF-1A alignment (see Fig. 1), not only among Bacteria, Archaea, and Eukarya but even within the Bacteria themselves. These conservation patterns would seem, then, to suggest functionality over and above simple nucleic acid binding.

Eukaryotic Initiation Factor eIF-1/SUI1 Occurs in Archaea and Some Bacteria.

Although mammalian eIF-1 is a single subunit factor (38) and yeast SUI1 is one of the eight eIF-3 subunits (17), the two are very similar at the sequence level (59% identity), and both are reported to have similar functions, i.e., stabilizing mRNA and initiator tRNA binding to the 40S ribosomal subunit (16).

As the alignment of Fig. 2 shows, all known archaeal genomes contain homologs of eIF-1/SUI1 (11). A few bacterial examples are known as well [i.e., enteric bacteria (39, 40) and cyanobacteria (41)], although the majority of the known bacterial genomes [i.e., Helicobacter pylori (42), Mycoplasma genitalium (43), M. pneumoniae (44), Treponema pallidum, Borrelia burgdorferi, and Deinococcus radiodurans] contain none.

Figure 2.


Multiple sequence alignment of the eukaryotic SUI1/eIF-1 protein family with their archaeal and bacterial homologs. Protein names (and accession number—when different) are as follows: YCIH_ECOLI, E. coli hypothetical protein; YCIH_SALTY, Salmonella typhimurium hypothetical protein; YCIH_HAEIN, Haemophilus influenzae hypothetical protein; YCIH-SYNEC, Synechocystis sp. hypothetical protein (European Molecular Biology Laboratory accession no. D64003_48); SUI1-METJA, M. jannaschii hypothetical ORF MJ0463 (Protein Information Resource accession no. G64357); YRP1_METVA, Methanococcus vannielii hypothetical protein; SUI1-METTH, M. thermoautotrophicum unpublished ORF; SUI1-ARCFU, A. fulgidus unpublished ORF; SUI1_HUMAN, human SUI1; SUI1_ANOGA, Anopheles gambiae SUI1; SUI1_YEAST, Saccharomyces cerevisiae SUI1; SUI1_ARATH, Arabidopsis thaliana SUI1.

According to Table 1, percent identities between domains are more or less the same, in the 25–30% identity range, a break with the now-familiar pattern of the highest similarity being between archaeal and eukaryotic components of the translation apparatus. Within each major group, the percent identities are greater than 55%, except for the bacterial examples (see Table 1). The 38% identity between the enteric Bacteria and the cyanobacterium calls their orthology into question. Although it is reasonable to assume that Archaea possess an eIF-1/SUI1 type of function (based on the overall similarity of archaeal and eukaryotic translation initiation), one wonders about the functional significance of the few phylogenetically scattered bacterial examples. Given that the bacterial versions of eIF-1/SUI1 are no more diverged from the archaeal and eukaryotic versions than these are from one another, the idea that this molecule is involved in translation initiation in some Bacteria (despite its being paralogous therein) must be seriously considered and subject to experimental test.

Table 1.

Pairwise amino acid percentage identity between the eukaryotic SUI1 family and its archaeal and bacterial homologs

                     1     2     3     4     5     6     7     8     9
1 (YCIH_ECOLI)       -  38.7  34.0  29.4  32.2  35.0  31.0  26.0  30.7
2 (YCIH-SYNEC)    38.7     -  30.9  30.6  25.2  29.0  30.7  35.5  31.3
3 (SUI1-METJA)    34.0  30.9     -  78.4  67.0  60.4  31.9  30.5  28.4
4 (YRP1_METVA)    29.4  30.6  78.4     -  61.0  55.4  35.4  29.3  32.6
5 (SUI1-METTH)    32.2  25.2  67.0  61.0     -  54.5  26.3  27.0  23.7
6 (SUI1-ARCFU)    35.0  29.0  60.4  55.4  54.5     -  29.1  25.5  28.0
7 (SUI1_HUMAN)    31.0  30.7  31.9  35.4  26.3  29.1     -  59.4  58.5
8 (SUI1_YEAST)    26.0  35.5  30.5  29.3  27.0  25.5  59.4     -  53.7
9 (SUI1_ARATH)    30.7  31.3  28.4  32.6  23.7  28.0  58.5  53.7     -
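The between-domain and within-domain comparisons drawn from Table 1 can be checked by averaging its entries, as in the sketch below. The identity values are those of Table 1 and the domain assignments follow the legend of Fig. 2. The function name and the use of an unweighted mean over all pairs are assumptions made for illustration.

```python
# Domain of each numbered protein in Table 1 (from the legend of Fig. 2).
DOMAINS = ["Bacteria", "Bacteria",
           "Archaea", "Archaea", "Archaea", "Archaea",
           "Eukarya", "Eukarya", "Eukarya"]

# Pairwise percent identities from Table 1; None marks the diagonal.
IDENTITY = [
    [None, 38.7, 34.0, 29.4, 32.2, 35.0, 31.0, 26.0, 30.7],
    [38.7, None, 30.9, 30.6, 25.2, 29.0, 30.7, 35.5, 31.3],
    [34.0, 30.9, None, 78.4, 67.0, 60.4, 31.9, 30.5, 28.4],
    [29.4, 30.6, 78.4, None, 61.0, 55.4, 35.4, 29.3, 32.6],
    [32.2, 25.2, 67.0, 61.0, None, 54.5, 26.3, 27.0, 23.7],
    [35.0, 29.0, 60.4, 55.4, 54.5, None, 29.1, 25.5, 28.0],
    [31.0, 30.7, 31.9, 35.4, 26.3, 29.1, None, 59.4, 58.5],
    [26.0, 35.5, 30.5, 29.3, 27.0, 25.5, 59.4, None, 53.7],
    [30.7, 31.3, 28.4, 32.6, 23.7, 28.0, 58.5, 53.7, None],
]


def mean_between(domain_a, domain_b):
    """Unweighted mean identity over all pairs spanning the two domains.

    Passing the same domain twice gives the within-domain mean.
    """
    n = len(DOMAINS)
    values = [
        IDENTITY[i][j]
        for i in range(n)
        for j in range(i + 1, n)  # upper triangle: each pair counted once
        if {DOMAINS[i], DOMAINS[j]} == {domain_a, domain_b}
    ]
    return sum(values) / len(values)


for pair in [("Bacteria", "Archaea"), ("Bacteria", "Eukarya"),
             ("Archaea", "Eukarya"), ("Archaea", "Archaea"),
             ("Eukarya", "Eukarya"), ("Bacteria", "Bacteria")]:
    print(pair, round(mean_between(*pair), 1))
```

The three between-domain means come out close to one another (roughly 29–31%), whereas the archaeal and eukaryotic within-domain means are about twice as high.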

It has been suggested that SUI1 functions in concert with eIF-2 to confine Met-tRNAiMet recognition to the AUG initiator codon, because mutants of SUI1 allow tRNAiMet to initiate protein synthesis at UUG codons as well (45). Because it is believed that there may be a functional relationship between the eukaryotic and the bacterial start-site selection processes (46), the presence of SUI1 homologs in some bacterial genomes supports similar start-site recognition mechanisms among Archaea, Eukarya, and some Bacteria.

Eukaryotic translation initiation factor eIF-5A appears to promote the synthesis of the initial peptide bond in mRNA translation (47) and is unique in being the only known cellular protein that contains the unusual amino acid hypusine, formed by a posttranslational modification of a specific lysine residue (48). This (modified) protein has been identified in all Eukarya and Archaea so far analyzed, but it appears to be absent from Bacteria (49). However, a similar facilitating function may be performed in Bacteria by translation factor EF-P (50).

Fig. 3 shows that, despite the lack of hypusine in the bacterial case, bacterial EF-P and eukaryotic/archaeal eIF-5A are homologs. Although sequence similarity between the eukaryotic and bacterial sequences is relatively low (in the 20% range), both the alignment of Fig. 3 and profile searches (data not shown) generated with either the bacterial or the archaeal/eukaryotic family make clear their relatedness.

Figure 3.


Multiple sequence alignment of the eukaryotic translation initiation eIF-5A protein family with their archaeal homologs and the bacterial translation elongation EF-P family. The minimum domain of eukaryotic IF-5A needed for hypusine modification (51) is boxed; the asterisk denotes the lysine residue that is posttranslationally modified to hypusine. Protein names (and accession number—when different) are as follows: IF5A_METJA, M. jannaschii aIF-5A; IF5A-METTH, M. thermoautotrophicum unpublished ORF; IF5A-ARCFU, A. fulgidus unpublished ORF; IF5A_SULAC, Sulfolobus acidocaldarius aIF-5A; IF5A_HUMAN, human eIF-5A; IF5A_DICDI, Dictyostelium discoideum eIF-5A; IF51_YEAST, S. cerevisiae eIF-5A; IF52_CAEEL, Caenorhabditis elegans eIF-5A; EFP_BACSU, B. subtilis EF-P; EFP_SYNY3, Synechocystis sp. EF-P; EFP_ECOLI, E. coli EF-P; EFP_HELPY, Helicobacter pylori EF-P.

The archaeal IF-5A family displays an average of 32% and 26% identity to the eukaryotic and bacterial families, respectively, whereas the bacterial and eukaryotic families display an average of only 20%. The highest density of sequence conservation is found in the vicinity of the minimal region of the protein that must be conserved for lysine → hypusine conversion to occur (see boxed area of Fig. 3) (50).

Although hypusine is not present in the bacterial examples, a lysine residue usually is found in the corresponding position in the sequence. Note in Fig. 3 that in all archaeal and eukaryotic examples, the modified lysine residue is followed by histidine but that in all bacterial examples, an amino acid residue is absent at the corresponding position. Note also that bacterial EF-P sequences have a well-conserved C-terminal section that is absent in both archaeal and eukaryotic eIF-5A sequences, which suggests functional differentiation of the two molecular types.

A number of detailed suggestions have been made regarding eIF-5A function: the essential hypusine modification is suggested to stabilize Met-tRNAiMet binding to the peptidyltransferase center of the ribosome because formylation of the initiator Met residue (which removes its positive charge) renders it less dependent upon the presence of eIF-5A function (18). Complete intracellular depletion of eIF-5A (by gene deletion) results in inhibition of cell growth, although protein synthesis seems to be only slightly reduced (∼25%) (21). This has been interpreted to mean that eIF-5A may participate only in the translation of a subclass of mRNAs required directly for cell growth (21).

CONCLUSIONS

The above analysis makes clear that, contrary to generally accepted understanding, parts of the componentry of the translation initiation system (i.e., SUI1 and its relatives, the IF-1/eIF-1A group, and the eIF-5A/EF-P group) are indeed universally distributed; representatives occur in at least some members of each primary line of descent. Whether the various members of any one group are orthologs or merely paralogs is not clear and, in any case, is a matter to be decided by experimentation. However, in at least one instance, the SUI1 group, it seems likely that functional orthology will hold across the entire group: Molecular length is nearly constant throughout the group; the sequence conservation pattern covers the entire molecule; and the degree of similarity is about the same among all three major taxa. Thus, it would appear that at least some aspects of a translation initiation system existed at the universal ancestor stage, which suggests a more advanced ancestral translation function than would otherwise be the case.

The three initiation factors under discussion are seen as playing relatively “peripheral” roles in the process. The central role, that of introducing the initiator tRNA into the mechanism, falls either to IF-2 (in Bacteria) or eIF-2 (in Eukarya and Archaea). Yet, IF-2 and eIF-2 are not specifically related to one another. Herein lies a puzzle: One does not expect molecules that refine or embellish a process to have evolved as such (functionally) before the evolution of the fundaments of that process. Although our observations do not lend themselves to satisfying conclusions at this time, they do offer the promise of ultimately understanding a great deal about the evolution of translation initiation (and translation in general) and focus on the need for comparative (experimental) approaches to this process.

Acknowledgments

We thank C. Ouzounis (European Bioinformatics Institute/United Kingdom) and D. Graham (University of Illinois at Urbana-Champaign) for critical reading of the manuscript. This work was supported by grants from the National Aeronautics and Space Administration (NAG 54500) and Department of Energy (DEFG C02-95ER61963) to C.R.W.

ABBREVIATION

IF, initiation factor

Note Added in Proof

After the submission of this paper, the complete genomes of A. fulgidus and M. thermoautotrophicum were published. Below are the ORF numbers of the genes mentioned above: IF1A-METTH, MTH1004; IF1A-ARCFU, AF0777; SUI1-METTH, MTH10; SUI1-ARCFU, AF0914; IF5A-METTH, MTH869; and IF5A-ARCFU, AF0645.

References