The T box mechanism: tRNA as a regulatory molecule

Author manuscript; available in PMC: 2011 Jan 21.

Abstract

The T box mechanism is widely used in Gram-positive bacteria to regulate expression of aminoacyl-tRNA synthetase genes and genes involved in amino acid biosynthesis and uptake. Binding of a specific uncharged tRNA to a riboswitch element in the nascent transcript causes a structural change in the transcript that promotes expression of the downstream coding sequence. In most cases, this occurs by stabilization of an antiterminator element that competes with formation of a terminator helix. Specific tRNA recognition by the nascent transcript results in increased expression of genes important for tRNA aminoacylation in response to decreased pools of charged tRNA.

Keywords: transcription attenuation, antitermination, tRNA, regulation, riboswitch

1. Introduction

Maintenance of appropriate pools of aminoacylated tRNAs (aa-tRNAs) is essential for cell viability. This requires not only balanced levels of tRNAs and their cognate aminoacyl-tRNA synthetases (aaRSs) but also an adequate supply of the matching amino acid. In Escherichia coli, regulation of aaRS gene expression is mediated by a variety of mechanisms, including transcriptional control (AlaRS), translational control (ThrRS), and ribosome-mediated transcriptional attenuation (PheRS) [1]. In contrast, in Bacillus subtilis and many other Gram-positive bacteria, the majority of aaRS genes (as well as a number of other amino acid-related genes) are regulated by the T box regulatory mechanism [2]. In this system, a riboswitch element in the upstream (or “leader”) region of the nascent transcript of regulated genes monitors the relative amounts of charged vs. uncharged species of a specific tRNA through direct binding of the tRNA by the leader RNA.

Regulation by the T box mechanism most commonly occurs at the level of transcription attenuation [3]. In genes in this class, the nascent transcript includes an element (a G+C-rich helix followed by a run of U residues) that serves as an intrinsic transcriptional terminator. Sequences that form the 5′ side of the terminator helix can also participate in formation of an alternate, less stable antiterminator structure. Formation of the competing antiterminator element is dependent on binding of a specific uncharged tRNA, which stabilizes the antiterminator and therefore prevents formation of the terminator helix. Binding of charged tRNA promotes termination indirectly, by preventing binding of the uncharged tRNA. Regulation at the level of translation initiation has also been predicted for T box riboswitches in certain bacteria [4]. Translationally regulated leader RNAs do not have a terminator helix, and instead include a structure with the ability to sequester the Shine-Dalgarno (SD) sequence for the downstream regulated coding sequence by pairing of the SD region with a complementary anti-SD (ASD) sequence. Binding of uncharged tRNA stabilizes a structure analogous to the antiterminator that includes the ASD sequence, and formation of this alternate structure releases the SD sequence for binding of the 30S ribosomal subunit.

The T box mechanism was initially proposed based on analysis of a single gene in B. subtilis [5]. Subsequent bioinformatics analyses [4, 6, 7] have identified >1000 genes with features conserved in genes in this family. Recent genetic and biochemical studies have provided information about the sequence and structural requirements for T box riboswitch function, and the basis for specific tRNA recognition and tRNA-dependent regulation. tRNA-dependent antitermination, and leader RNA-tRNA binding, have been reproduced in purified systems, which illustrates the ability of the leader RNA to recognize the cognate tRNA in the absence of other cellular factors. This demonstration that the T box RNA can directly monitor regulatory signals in the absence of other cellular factors nucleated the discovery of metabolite-binding riboswitch RNAs in the Henkin, Breaker and Nudler laboratories [8]. T box RNAs are therefore the founding members of this growing group of RNAs that are phylogenetically conserved, structurally complex, and capable of direct sensing of physiological signals to control downstream gene expression.

2. Identification of the T box system

We initiated the study of aaRS gene regulation in B. subtilis by characterization of the tyrS gene, encoding tyrosyl-tRNA synthetase (TyrRS) [5]. We recognized that many aaRS genes in Bacillus sp. exhibit a common organization, in which the coding region of the transcript is preceded by a long leader region that contains an intrinsic transcriptional terminator, immediately upstream of which is a conserved 14 nt sequence that we designated the T box sequence [5]. Analysis of tyrS expression in vivo showed that transcription initiation is constitutive, the leader region terminator is functional, and readthrough is stimulated when cells are grown under conditions where tyrosine availability is limited. In contrast, limitation for amino acids other than tyrosine has no effect [5]. Disruption of the stringent response to amino acid starvation (which is mediated by uncharged tRNA) had no effect on tyrS regulation, indicating that the T box mechanism operates independently of ppGpp synthesis (F. J. G. and T.M.H., unpublished results). We also demonstrated that the conserved T box sequence is required for readthrough of the terminator [5]. The conservation of the leader region arrangement suggested a common mechanism for gene regulation, and the specificity of the amino acid response suggested that regulation of genes in different amino acid classes must in some way differentially monitor amino acid availability, either directly or indirectly.

The issue of amino acid specificity was resolved when we uncovered a complex pattern of sequence and structural features conserved in all 10 aaRS leader sequences available at the time [9]. This pattern, which was derived by manual examination of the sequences for covariation in helical regions and conserved placement of primary sequence elements, provided the crucial framework for all subsequent work. The basic leader RNA pattern includes three helical domains (Stems I, II and III) and a pseudoknot element (Stem IIA/B), preceding the T box sequence and terminator (Fig. 1) [9-11]. The key breakthrough was the identification of a single codon (which corresponds to the amino acid specificity of the downstream aaRS gene) displayed in each RNA within a specific internal loop in Stem I. Mutational analysis of the tyrS gene revealed that alteration of the UAC tyrosine codon to a UUC phenylalanine codon causes the gene to respond to limitation for phenylalanine and not tyrosine [9]. This result clearly demonstrated that the codon, designated the “Specifier Sequence,” is indeed responsible for amino acid specificity. These studies were later expanded for tyrS [12] and confirmed for other T box genes by multiple research groups [13-20]. It is important to note that not all changes in the Specifier Sequence resulted in a switch in the amino acid specificity response, and no switch resulted in expression levels equivalent to the wild-type, suggesting the existence of specificity determinants in addition to the Specifier Sequence.
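
The relationship between the Specifier Sequence and the amino acid response can be illustrated with a minimal Python sketch. It looks up a Specifier codon in a small excerpt of the standard genetic code and reproduces the UAC-to-UUC switch described above. The function name `predicted_amino_acid` and the partial table are illustrative assumptions, not part of the original analysis, and the sketch deliberately ignores the additional specificity determinants noted above.

```python
# Excerpt of the standard genetic code (RNA codons). Only codons
# mentioned in the text are included; this partial table is an
# illustrative assumption, not a complete codon table.
CODON_TABLE = {
    "UAC": "Tyr", "UAU": "Tyr",
    "UUC": "Phe", "UUU": "Phe",
    "GGC": "Gly",
}

def predicted_amino_acid(specifier):
    """Return the amino acid class predicted by a Specifier Sequence codon."""
    codon = specifier.upper().replace("T", "U")  # accept DNA or RNA spelling
    if codon not in CODON_TABLE:
        raise KeyError(f"codon {codon} is not in this partial table")
    return CODON_TABLE[codon]

wild_type = predicted_amino_acid("UAC")  # tyrS Specifier Sequence
mutant = predicted_amino_acid("UUC")     # A-to-U change at the second position
print(wild_type, "->", mutant)           # prints: Tyr -> Phe
```

The lookup captures only the first determinant of specificity; as the mutational data indicate, a matching codon is necessary but not always sufficient for an efficient response.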

Figure 1.

Structural model of the B. subtilis tyrS leader RNA. The sequence is shown (and numbered) from the transcription start-point through the end of the leader region transcriptional terminator (U274); the coding sequence for TyrRS is further downstream. The structural model of the terminator conformation is shown; the alternate antiterminator conformation is shown above the terminator element. Structural domains (Stem I, Stem II, Stem IIA/B pseudoknot, Stem III) and conserved sequence and structural elements (GA motif, S-turns, AG bulge, Specifier Loop, T box) are labeled. Residues that are highly conserved are marked with asterisks (*). The pseudoknot pairing is shown in purple. Sequences on the 5′ side of the terminator (blue) also participate in pairing with a portion of the T box sequence (red) to form the antiterminator helix. The Specifier sequence residues (UAC tyrosine codon) are boxed. Residues that participate in pairing with the tRNA (UACA in the Specifier Loop, UGGU in the antiterminator bulge) are shown in green.

3. tRNA is the effector

Based on identification of a codon as the crucial _cis_-acting determinant of amino acid specificity, we proposed that the codon base-pairs with the anticodon of the corresponding tRNA [9]. Codons are used as regulatory signals in other transcription attenuation systems, like the E. coli trp operon, in the context of translation of a short peptide encoded in the leader region [3]. In contrast to systems of this type, the Specifier Sequence was not found within a leader peptide coding sequence. Furthermore, introduction of a frameshift mutation immediately upstream of the Specifier Sequence in tyrS had no effect, indicating that translation is unlikely to be involved [9].

Studies using Specifier Sequence mutations, nonsense suppressors and tRNA overproduction for B. subtilis tyrS [9, 12, 21], and isolation of mutants of tRNALeu for the B. subtilis ilv-leu operon [22], supported the model that uncharged tRNA acts as the effector to promote antitermination. Mutually exclusive terminator and antiterminator forms of the leader RNA structure were proposed. Base-pairing between the acceptor end of the uncharged tRNA (NCCA) and residues in the antiterminator bulge (UGGN, where the N residues covary to maintain base-pairing) was hypothesized to stabilize the antiterminator, and mutational analysis was consistent with the requirement for antiterminator/acceptor end pairing [21]. The variable position of the antiterminator therefore serves as a second determinant of the specificity of tRNA recognition; however, substitution of both the Specifier Sequence and the antiterminator variable position was insufficient to allow certain tRNAs to promote antitermination of the tyrS gene, supporting the hypothesis that there are additional determinants for specific tRNA recognition [12].
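
The covariation between the antiterminator bulge (UGGN) and the tRNA acceptor end (NCCA) can be checked with a short Python sketch. Both sequences are written 5′ to 3′, so the pairing is antiparallel. The function name and the restriction to Watson-Crick pairs are simplifying assumptions made for illustration. The bulge sequences UGGU (tyrS) and UGGA (glyQS) are taken from the text, whereas the acceptor ends ACCA and UCCA are inferred here from complementarity and are not quoted from the source.

```python
# Watson-Crick pairs only; wobble pairs are ignored (a simplifying assumption).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def acceptor_pairs_with_bulge(bulge, acceptor_end):
    """Return True if the bulge (5'->3') can pair with the acceptor end (5'->3').

    The two strands pair antiparallel, so the first bulge residue is
    matched against the last acceptor-end residue, and so on.
    """
    if len(bulge) != len(acceptor_end):
        return False
    return all(
        (b, a) in WATSON_CRICK
        for b, a in zip(bulge, reversed(acceptor_end))
    )

print(acceptor_pairs_with_bulge("UGGU", "ACCA"))  # tyrS bulge, inferred ACCA end
print(acceptor_pairs_with_bulge("UGGA", "UCCA"))  # glyQS bulge, inferred UCCA end
print(acceptor_pairs_with_bulge("UGGU", "UCCA"))  # mismatch at the variable position
```

Only the variable N position distinguishes one tRNA class from another in this check, which is why it acts as a second, weaker specificity determinant alongside the Specifier Sequence.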

Overproduction of an unchargeable variant of tRNATyr resulted in induction of tyrS expression during growth in rich medium, demonstrating that amino acid limitation acts through accumulation of uncharged tRNA [21]. The presence in the cell of a chargeable variant of the tRNA matching the Specifier Sequence reduced induction by the corresponding uncharged tRNA. These observations suggested that while uncharged tRNA is the key effector for promotion of antitermination, the system normally monitors both uncharged and charged tRNA.

These results led to the general model shown in Fig. 2. This model accounts for the specific amino acid response of each member of the T box family, and also for sensitivity to amino acid limitation, measured via the charging ratio of the cognate tRNA. A number of T box genes have been demonstrated to respond as predicted to amino acid limitation with increased readthrough correlated with a decrease in the tRNA charging ratio.

Figure 2.

The T box mechanism. Expression of genes in the T box family is regulated by the ratio of charged to uncharged tRNA in the cell. A. Aminoacylated tRNA binds only to the Specifier Loop; the presence of the amino acid prevents interaction of the acceptor end of the tRNA with the antiterminator. The more stable terminator helix (blue-black) forms and transcription terminates. B. Uncharged tRNA interacts at both the Specifier Loop and the antiterminator; this stabilizes the antiterminator (red-blue) which sequesters sequences (blue) that otherwise participate in formation of the terminator helix, and transcription reads through the termination site and into the downstream coding sequence. Binding of uncharged tRNA results in structural changes throughout the leader RNA. The tRNA is shown in cyan; the amino acid (aa) is shown as a yellow circle attached to the 3′ end of the charged tRNA. Positions of base-pairing between the leader RNA and the tRNA (Specifier Loop-tRNA anticodon, antiterminator bulge-tRNA acceptor end) are shown as green lines.

A series of tRNATyr mutations was generated to identify tRNA features required for antitermination [22]. These studies showed that the intact tRNA three-dimensional structure, including the tertiary interaction between the D-loop and T-loop, is essential, although changes within the helices were well tolerated. The possibility of sequence-specific recognition of elements in the D- and T-arms was examined, but was not supported by mutational analysis [24]. Replacement of the long variable arm of tRNATyr with a short variable arm, or insertion of a longer helical element within the variable arm, had little effect, suggesting that this domain of the tRNA is not important for tRNA-directed antitermination [22]. These studies were limited, however, by the requirement that the tested tRNA variants be expressed stably within the cell.

Processing of the readthrough transcript was reported for certain T box genes [25]; the cleavage event occurs in the antiterminator region, at least in some cases, and the RNase responsible was identified as RNase J1 [26]. The processing event was proposed to increase the stability of the readthrough transcript, thereby amplifying the tRNA-directed antitermination effect, and may also serve to release the tRNA from the readthrough transcript to allow its return to the cellular tRNA pool.

4. tRNA is sufficient for antitermination in vitro

A key question that was difficult to approach using in vivo analyses alone was whether the response to uncharged tRNA requires the activity of additional cellular factors. To address this issue, we attempted to reproduce tRNA-directed antitermination in a purified in vitro transcription system. This work focused on the B. subtilis glyQS T box leader RNA, which is found upstream of the genes encoding the subunits of the GlyRS enzyme. The glyQS leader sequence (like most glycyl leaders) is a natural deletion variant from which the major Stem II and IIA/B pseudoknot elements are missing. An additional advantage of glyQS for biochemical analysis is that the anticodon loop of the corresponding tRNAGly is unmodified in vivo, increasing the probability that completely unmodified tRNA generated in vitro by T7 RNAP transcription would be functional.

We demonstrated tRNAGly-dependent antitermination of a glyQS construct, using RNAP purified from either B. subtilis or E. coli (which lacks the T box mechanism) [27]. tRNA-directed antitermination occurs in the absence of any additional cellular factors, indicating that the tRNA alone can interact with the nascent transcript. This activity of the tRNA is specific, as antitermination requires a match between the tRNA and the leader construct at both the Specifier Sequence and the antiterminator bulge. A similar analysis of the B. subtilis thrS T box leader required either high spermidine or cellular extracts for antitermination activity [28].

We employed the glyQS in vitro antitermination system to test a variety of tRNA variants for antitermination activity [29]. While the results in some cases correspond to our previous in vivo analyses [23], for example in the requirement for the D-loop/T-loop interaction, we were also able to test variants that could not be generated in vivo because of cellular tRNA repair systems. To highlight a few interesting results, we showed that addition of a single residue to either the 5′ or 3′ end of the tRNA completely disrupts antitermination, presumably because of steric hindrance of the required pairing of the acceptor end of uncharged tRNA with the antiterminator bulge [29]. Subsequent studies demonstrated that addition to the 3′ end generates a useful stable mimic of charged tRNAGly, and that this RNA (tRNAGlyEX1C) acts as a competitive inhibitor of binding and antitermination by wild-type tRNA [30]. We also tested alterations in the lengths of the acceptor end and anticodon helices. Extension of the anticodon helix by 2 bp significantly reduces antitermination activity [29]. In contrast, addition of up to 4 bp in the acceptor helix had no inhibitory effect; however, addition of 5 or 8 bp abolishes antitermination. To our surprise, addition of 11 bp (one full turn of the RNA helix) completely restores antitermination activity (but 1.5 or 2 turns does not). This revealed a face-of-the-helix dependence in the presentation of the acceptor end of the tRNA to the antiterminator, with some flexibility in the allowable distance between the Specifier Loop and the antiterminator. We inserted residues at various positions of the glyQS leader RNA (e.g., between Stems I and III, or between Stem III and the antiterminator) in an attempt to compensate for the length increase in the tRNA variant with 2 extra turns, but antitermination was not restored. These results suggest that there is a limit to the flexibility of one or both of the RNA partners, or that there are other interactions (either within the leader RNA or between leader RNA and tRNA domains) that are disrupted in these constructs.
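
The face-of-the-helix argument can be made concrete with a small Python calculation of how far the acceptor end rotates when base pairs are added to the acceptor helix. This is a rough geometric sketch that assumes an ideal helix with 11 bp per turn (the value given above for one full turn); the function name `helical_phase_degrees` is hypothetical, and the model ignores any flexibility in either RNA.

```python
def helical_phase_degrees(added_bp, bp_per_turn=11):
    """Rotation of the acceptor end, in degrees within [0, 360), caused by
    inserting added_bp base pairs into an ideal helix with bp_per_turn
    base pairs per turn (an idealized assumption)."""
    return (added_bp % bp_per_turn) * 360.0 / bp_per_turn

# Insert lengths discussed in the text, plus two full turns (22 bp).
for n in (0, 4, 5, 8, 11, 22):
    print(f"{n:2d} bp added -> {helical_phase_degrees(n):5.1f} degrees out of phase")
```

In this idealized model an 11 bp insertion returns the acceptor end to its original face, consistent with the restored activity, whereas 5 or 8 bp leave it far out of phase. Rotation alone does not explain why 4 bp is tolerated or why two full turns fail; those observations point instead to the limited flexibility and distance constraints discussed above.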

5. Kinetics of leader RNA transcription affect tRNA-directed antitermination

Efficient in vitro antitermination was achieved using single round transcription reactions and low NTP concentrations to reduce the rate of transcription [27]. We used time course experiments to characterize the pattern of RNAP pausing during leader RNA transcription, and the effect of variation of NTP concentration on both pausing and tRNA-directed antitermination [31]. These studies showed that severely reduced pausing at specific sites and increased overall rate of transcription could be tolerated. However, interpretation of these experiments was confounded by the presence of populations of transcription complexes occupying various positions along the template.

To generate uniform pools of transcription complexes containing defined segments of the nascent RNA, we took advantage of the E111Q variant of restriction endonuclease EcoRI. This variant is active in DNA binding but defective in DNA endonuclease activity, allowing its use as a reversible roadblock to transcription elongation [32]. Insertion of EcoRI sites at strategic positions within the leader sequence (that were tested to show that they had no effect on antitermination) allowed us to prebind EcoRI E111Q protein to the DNA template as a roadblock to processivity of RNAP. Addition of high KCl releases the EcoRI protein, allowing transcription to continue. This approach generated separate pools of transcription complexes from which a precisely defined portion of the leader RNA had emerged, and the ability of the tRNA to bind to and promote antitermination of each population was tested [30]. We found that transcription complexes poised anywhere along the template, including at a position immediately upstream of the termination site, so that the complete leader including the antiterminator element was exposed, are fully competent for tRNA binding and antitermination. This indicates that the nascent RNA can fold into the appropriate structure for tRNA binding in the absence of the tRNA, and does not need to fold in a stepwise manner around the tRNA during transcription of the leader RNA.

The ability of the tRNAGlyEX1C charged tRNA mimic to displace uncharged tRNAGly at each roadblock point was also measured. Our results indicate that both charged and uncharged tRNA have equal access to the nascent RNA until the antiterminator element is complete, at which point the uncharged tRNA forms a stable complex that cannot be displaced [30]. This suggests that the transcription complex can continuously monitor the relative amounts of charged and uncharged tRNA until antiterminator synthesis is complete, at which point a commitment step is reached. It also suggests that the antiterminator-acceptor end pairing significantly enhances the stability of the complex. This could be due simply to the four additional basepairs, or could involve a more complex structural rearrangement. We favor the latter hypothesis, based on our structural mapping experiments [33].

6. Structural analysis of the leader RNA-tRNA complex

The ability of the full-length nascent leader RNA to interact with uncharged tRNAGly led us to generate both RNAs by T7 RNAP transcription and test for binding. Our initial binding assays exploited the difference in size between the two RNA molecules, and separated bound from free radiolabeled tRNA using size exclusion filtration. These studies demonstrated sequence-specific binding that is disrupted by codon-anticodon mismatches [33]. Binding occurs to a significant degree using RNAs that contain only the Stem I element, but is substantially weaker, consistent with the absence of the tRNA acceptor end-antiterminator contacts. Binding assays also were developed in collaboration with the Hines and Agris labs using fluorescent residues inserted at specific positions within the antiterminator [34, 35] or Specifier Loop [36] and monitoring changes in fluorescence with binding of either the full-length tRNA or an appropriate helical element designed to mimic either the acceptor end or the anticodon end of the tRNA. These studies confirmed the importance of interactions at both ends of the tRNA, and demonstrated specific recognition of the corresponding tRNA element by the isolated antiterminator and Specifier Loop domains.

Since the binding studies suggested that T7 RNAP-transcribed RNAs could be correctly refolded, we used the purified RNAs for structural mapping of both RNA partners in the complex [33]. Initial studies were carried out with high Mg2+ at high pH, which stimulates in-line attack on positions within the RNA backbone that are free to rotate into the appropriate angle, while constrained positions (e.g., in helices) are protected from cleavage. These studies indicated that in the absence of tRNA the leader folds into a structure consistent with the phylogenetic model, although several regions shown as unpaired in the model (e.g., the terminal loop of Stem I) are not susceptible to cleavage, suggesting that they are indeed structured. Addition of tRNAGly resulted in protection of not only the Specifier Sequence but also the next residue (consistent with the prediction that the conserved purine residue downstream of the Specifier Sequence interacts with the universal U33 residue 5′ to the tRNA anticodon). Protection of all 7 residues of the antiterminator bulge was also observed, despite predicted base-pairing for only the first 4 residues (UGGA). The linker regions between Stem I and Stem III, and between Stem III and the antiterminator, are also highly protected. All of these changes are dependent on both codon-anticodon and tRNA acceptor end-antiterminator pairing. Addition of the charged tRNA mimic resulted in protection only of the Specifier Loop region, indicating that all other interactions are dependent on acceptor end-antiterminator pairing [33]. These results are consistent with the hypothesis that this final pairing promotes a structural transition that stabilizes the complex. Previous structural mapping studies with the thrS RNA also gave data consistent with the phylogenetic model, but showed no tRNA-dependent changes in vitro [37], making it difficult to determine if the RNAs used were correctly refolded.

Studies using a variety of other cleavage agents have revealed changes throughout the leader RNA in response to tRNA binding, including regions not known to base-pair with the tRNA (N. J. G., F. J. G. and T. M. H., unpublished). Similar studies with tRNAGly showed protection of the anticodon loop (including the conserved U33 residue) and also protection in the D loop, further supporting the model that changes occur in both RNA partners upon complex formation [33]. We also used oligonucleotide-directed RNase H cleavage of leader RNAs generated in the presence or absence of tRNA to show that transcripts generated in the presence of uncharged tRNAGly are in the antiterminator form, while transcripts generated in the absence of tRNAGly, or in the presence of the charged tRNA mimic tRNAGlyEX1C, are in the terminator form, providing a clear demonstration of the tRNA-dependent structural switch [33]. Both structural mapping data and biochemical analysis of leader RNA and tRNA mutants suggest that interaction with the tRNA occurs in at least two steps, an initial interaction dependent primarily on codon-anticodon pairing, and a secondary interaction dependent on acceptor end-antiterminator pairing; this second interaction is required for a global rearrangement of the RNA structure. The nature of this rearrangement, and the position of leader RNA and tRNA elements relative to each other at each stage, remain unknown.

7. Leader RNA structural motifs

The basic features of the regulatory model and leader structural model were generated based on 10 T box genes. The tremendous increase in availability of microbial genomic sequence data has now provided us with >1000 leader sequences [5]. We used these sequences to refine the structural model, identify novel structural arrangements and sequence variants, and derive new insight into leader RNA structure, function, and interaction with the tRNA. Mutational analysis of the tyrS and glyQS leader sequences generally showed that conservation of sequence and structural elements correlates with the requirements for function in vivo, with some interesting exceptions [11, 38, 39; N. J. G., F. J. G. and T. M. H., unpublished].

We carried out more detailed phylogenetic and mutational analyses of the highly conserved antiterminator domain, which interacts with the acceptor end of the tRNA [38; F. J. G. and T. M. H., unpublished]. In collaboration with the laboratory of Jennifer Hines we completed a set of biochemical studies on the antiterminator domain alone and its interaction with tRNA in vitro [40], and determined the solution structure of the antiterminator RNA [41]. The antiterminator bulge is highly flexible, predicting an “induced fit” mode of binding of the tRNA. The level of flexibility is important for antiterminator function, as introduction of a substitution at one of the conserved C residues results in a major increase in bulge flexibility, and a corresponding decrease in tRNA binding activity and tRNA-dependent antitermination in vivo and in vitro [40, 41]. Of special importance is our identification of the antiterminator element as a target for identification of a novel class of antibiotics [42]. We have identified compounds that either destabilize the tRNA-antiterminator interaction (i.e., lead compounds to be developed as antibiotics), or stabilize the antiterminator in the absence of tRNA (which provide information about the antiterminator structural transition).

We also focused our attention on the GA motif at the base of Stem I. The pattern of conservation of this motif matches the consensus for a “kink-turn” structural element, and sensitivity to mutation is consistent with this prediction [39, 43]. Further mutational analyses supported this prediction, but a surprising result is that mutations in this motif that severely compromise antitermination in vivo have little effect on antitermination or tRNA binding in vitro (F. J. G. and T. M. H., unpublished). The basis for this difference is not yet understood. Kink-turn elements in other RNAs are recognition elements for proteins in the L7Ae family, and two ORFs of unknown function that are predicted to encode members of this family were identified in the B. subtilis genome. These ORFs are conserved in organisms that contain T box genes, but are generally absent in organisms without T box genes. In-frame deletion of these genes, either singly or in combination, had no effect on cell viability or T box gene expression (F. J. G. and T. M. H., unpublished). It therefore appears that the T box kink-turn motif functions in the absence of a partner protein (as also appears to be true for the corresponding elements in S box and L box riboswitches; [44, 45]).

We showed that mutations in the highly conserved (but not universal) S-turn element (comprised of AGUA and GAA residues in the 5′ and 3′ sides of the Specifier Loop, respectively, above the Specifier Sequence) resulted in loss of tyrS antitermination in vivo [11] and of glyQS antitermination in vivo and in vitro [N. J. G., F. J. G. and T. M. H., unpublished]. In the context of the glyQS T box leader RNA, these mutations also disrupt tRNA binding in a fluorescence-based assay using a model RNA containing 2-aminopurine at the A98 residue immediately preceding the GGC Specifier Sequence [36]. A subgroup of leader RNAs (notably the thrS genes) lack the S-turn element, suggesting an alternate arrangement in this domain. In contrast, deletion of A98, which is located between the S-turn and the GGC and is not highly conserved, has no effect on leader RNA function [33]. NMR analysis of a model RNA based on the tyrS Specifier Loop domain provides support for the presence of the S-turn in the Specifier Loop, which is lost upon mutation of the corresponding residues; these data support a model in which the S-turn helps to present the Specifier Sequence residues for pairing with the tRNA anticodon loop [J. Wang, T. M. H. and E. P. Nikonowicz, unpublished].

Another element that is highly conserved in the majority of T box sequences, but is absent from a subset including the glycyl genes, is the Stem IIA/B pseudoknot. Pairing of residues in the loop of Stem IIA with downstream residues (to form the Stem IIB pairing) is highly conserved, and was demonstrated by mutational analysis in the context of the B. subtilis tyrS gene [11]. In addition, high conservation of the sequence at the base of Stem IIA and the adjacent residues predicted to form the “turn” of the pseudoknot was shown to be functionally important. The upstream Stem II element, which is present or absent in T box leader sequences in conjunction with the Stem IIA/B pseudoknot, is generally variable in sequence and length, but often contains an element predicted to form an S turn; mutation of this element in the context of the tyrS leader resulted in reduced expression in vivo [11]. The role of the Stem II and Stem IIA/B elements is unknown, but their high (albeit not universal) conservation and sensitivity to mutation in leader sequences in which they are found suggest that they play an important role in tRNA-dependent antitermination.

We have also observed high conservation of sequence motifs and structural arrangements of domains at the top of Stem I. Mutation of these elements in the context of the tyrS and glyQS leader sequences results in defects in antitermination in vivo and in vitro [11; N. J. G., F. J. G. and T. M. H., unpublished]. The role of these elements remains to be determined.

Together, these studies demonstrate the importance of the structural motifs identified in T box leader RNAs from the phylogenetic analyses. Mutational studies have clearly shown the requirement for base-pairing between the Specifier Sequence and tRNA anticodon, and between the antiterminator and tRNA acceptor end. Structural elements that surround or are immediately adjacent to the residues that participate in base-pairing interactions are likely to be involved in correct presentation of the bases for pairing. The roles of other conserved elements, including those at the top of Stem I, Stem II and the Stem IIA/B pseudoknot, remain to be elucidated.

8. Phylogenetics

We initially identified T box leader sequences by two approaches: 1) search raw genomic data for conserved primary sequence elements of T box leader RNAs, and manually assemble the elements to decide if a complete leader element was present; and 2) search genomic data using coding sequences likely to be regulated by the T box mechanism (e.g., aaRS genes) and analyze the upstream regions of these genes for T box elements. These manual efforts yielded several hundred T box elements. The first approach was biased toward sequences exhibiting good agreement with the elements used in the search, while the second was restricted to known genes and likely organisms. To circumvent these biases, we collaborated with the laboratory of Dr. Enrique Merino (UNAM, Mexico) to generate a computerized search algorithm that would include many of the most conserved elements of T box leader RNAs, but would allow imperfect matches [4]. This search protocol yielded hundreds of additional T box elements, each of which was checked manually to ensure that it fit the appropriate parameters. The computerized search also yielded a number of false-positives, and a number of leader sequences in which the Specifier Sequence was improperly predicted, requiring manual correction; in addition, a subset of leaders in which major structural deletions or rearrangements have occurred were missed. Nevertheless, it allows a rapid survey of new genomic data. It is clear that the combination of computerized and manual approaches is essential.
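
The idea of a search that tolerates imperfect matches can be sketched in Python as a sliding-window comparison against a motif, counting mismatches and treating N as a wildcard. This is only a toy illustration of the general approach, not the algorithm used in [4]; the function name, the mismatch threshold, and the short UGGN motif and test sequence are all assumptions chosen for brevity.

```python
def find_imperfect_matches(sequence, motif, max_mismatches=1):
    """Return (start, mismatches) for each window of `sequence` that matches
    `motif` with at most `max_mismatches` differences.
    'N' in the motif matches any residue."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        mismatches = sum(
            1 for m, s in zip(motif, window) if m != "N" and m != s
        )
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

# Toy sequence and motif, for illustration only (not real leader data).
toy_sequence = "AAUGGACCUGCA"
print(find_imperfect_matches(toy_sequence, "UGGN", max_mismatches=0))
print(find_imperfect_matches(toy_sequence, "UGGN", max_mismatches=1))
```

Relaxing the threshold recovers a second, weaker candidate in the toy sequence, which mirrors the trade-off described above: tolerance for imperfect matches finds more elements but also yields false positives that must be checked manually.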

Our current dataset includes >1000 carefully annotated T box elements [4], and while there is significant correspondence between our results and those of a similar analysis [6], it also includes a number of new variants. Leader sequences have been identified in members of all groups of Gram-positive bacteria, as well as in members of a few other groups of bacteria, including the deeply rooted Deinococcus and Thermus, and a few Gram-negative organisms, including Geobacter and Chloroflexus [4, 6]. Continuous searching of new genomes as they become available will allow this dataset to grow rapidly. This highly curated dataset is much larger than any other riboswitch dataset currently available, and is especially valuable in examining amino acid class-specific variations.

The collection of new T box leaders provided important information about T box element structural variability and arrangement. For example, a novel group of ileS leader sequences was identified in Mycobacterium sp. and a related subgroup of Actinomycetes, and correlation between the phylogenetic distribution of leader structure and the downstream IleRS coding sequence provided interesting data about coevolution of IleRS enzyme and leader variants [F. J. G., J. R. Brown, S. M. Rollins and T. M. H., unpublished]. We also discovered a new class of T box leaders that contain a (noncognate) tRNA gene embedded within Stem III; we hypothesize that this tRNA is removed by processing after the termination/readthrough decision, possibly stabilizing the readthrough transcript and recycling both terminated leader RNA and the cognate tRNA bound to the leader region in the readthrough transcript [4].

The T box mechanism is most highly represented in aaRS genes (62% of identified elements), in agreement with its original identification in that context [4]. The remaining genes identified downstream of T box elements are involved in amino acid biosynthesis (18%) and transport (12%). A handful of regulatory genes have been identified (notably, the anti-TRAP protein involved in regulation of tryptophan biosynthesis in B. subtilis [46]) and 8% are genes of unknown function. The predictive value of the Specifier Sequence for the tRNA class to which the gene responds provides insight into the probable physiological role of the regulated gene [4, 6, 7]. This is evident for amino acid transporters, whose substrates can be difficult to assign unambiguously. Another example is provided by a set of leader sequences preceding genes annotated as involved in aspartate/asparagine biosynthesis, which were found unexpectedly to have alanine Specifier Sequences; the corresponding genes in B. subtilis (which are not T box regulated) have a role in alanine biosynthesis, and the Specifier Sequence data suggest that these genes are misannotated in many genomes. We also uncovered genes likely to be responsible for a novel isoleucine biosynthesis pathway in members of the Clostridiales [4]. There are many additional examples, including unusual aaRS genes, amino acid transporters and regulatory genes that are difficult to classify based on sequence homology alone.
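The way a Specifier Sequence predicts the amino acid class of the regulated gene can be sketched in a few lines. The sketch assumes the Specifier Sequence is a three-nucleotide codon read with the standard genetic code, and it derives the complementary anticodon by simple Watson-Crick pairing; it ignores wobble pairing and modified nucleotides. The function names are hypothetical and are not taken from any published annotation pipeline.

```python
from itertools import product

# Standard genetic code in compact form (bases ordered U, C, A, G;
# the third codon position varies fastest).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def _normalize(specifier):
    """Upper-case the triplet and convert DNA-style T to RNA-style U."""
    codon = specifier.upper().replace("T", "U")
    if len(codon) != 3 or any(base not in COMPLEMENT for base in codon):
        raise ValueError(f"not a valid triplet: {specifier!r}")
    return codon


def specifier_to_amino_acid(specifier):
    """One-letter amino acid encoded by the Specifier Sequence ('*' = stop)."""
    return CODON_TABLE[_normalize(specifier)]


def cognate_anticodon(specifier):
    """Watson-Crick complementary anticodon, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(_normalize(specifier)))


# Example: a GCU Specifier Sequence implies an alanine-responsive leader.
print(specifier_to_amino_acid("GCU"), cognate_anticodon("GCU"))
```

Applied to the upstream region of a gene of uncertain function, a lookup of this kind gives a first hypothesis about which amino acid the gene product handles, which can then be weighed against its sequence-based annotation.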

9. Conclusions

Cells take advantage of a wide variety of regulatory mechanisms, and have evolved the means to sense a variety of signals and transmit that information to the gene expression machinery. The T box mechanism demonstrates the versatility of tRNA as a regulatory molecule and the ability of cells to use nascent transcripts to recognize a specific tRNA class and to discriminate between uncharged and charged tRNA species by using both base-pairing and structural features of the tRNA.

Acknowledgements

This work was supported by National Institutes of Health grant R01-GM47823.

References