RNA Synthesis and RNA Processing

Introduction

We have thus far considered how chromosomes are organized as very large DNA-protein complexes and how they are duplicated before a cell divides. But the main function of a chromosome is to act as a template for the synthesis of RNA molecules, since only in this way does the genetic information stored in chromosomes become directly useful to the cell. There is a great deal of RNA synthesis in a cell: the total rate at which nucleotides are incorporated into RNA during interphase is about 20 times the rate at which nucleotides are incorporated into DNA during S phase.

RNA synthesis, which is also called DNA transcription, is a highly selective process. In most mammalian cells, for example, only about 1% of the DNA nucleotide sequence is copied into functional RNA sequences (mature messenger RNA or structural RNA). The selectivity occurs at two levels, which we discuss in turn in this section: (1) only part of the DNA sequence is transcribed to produce nuclear RNAs, and (2) only a minor proportion of the nucleotide sequences in nuclear RNAs survives the RNA processing steps that precede the export of RNA molecules to the cytoplasm. We begin by describing RNA polymerases, the enzymes that catalyze all DNA transcription.

RNA Polymerase Exchanges Subunits as It Begins Each RNA Chain 29

As described in outline in Chapter 6, transcription begins when an RNA polymerase molecule binds to a promoter sequence on the DNA double helix. Next, in a step that is not well understood, the two strands of the DNA are separated locally to form an open complex. At this stage the template strand is exposed, and synthesis of the complementary RNA chain can begin. The polymerase then moves along the template strand, extending its growing RNA chain in the 5'-to-3' direction by the stepwise addition of ribonucleoside triphosphates until it reaches a stop (termination) signal, at which point the newly synthesized RNA chain and the polymerase are released from the DNA. Each RNA molecule thus represents a single-strand copy of the nucleotide sequence of one DNA strand in a relatively short region of the genome (see Figure 6-2). This transcribed segment of DNA is called a transcription unit.
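The copying relationship between the template strand and the growing RNA chain can be sketched in a few lines of Python. This is only an illustration of base-pairing: the function name `transcribe` and the short template sequence are hypothetical, and promoter recognition, strand opening, and termination are not modeled.

```python
# Base added to the RNA chain opposite each base of the DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA chain, written 5'-to-3', copied from a DNA template
    strand that is written here in the 3'-to-5' direction."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

# Hypothetical template strand (3'-to-5'), chosen only for illustration.
template = "TACGGCATT"
print(transcribe(template))  # AUGCCGUAA
```

Because the polymerase reads the template 3'-to-5' while extending the RNA 5'-to-3', writing the template in the 3'-to-5' direction lets the two sequences line up position by position.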

RNA polymerases are generally formed from multiple polypeptide chains and have masses of 500,000 daltons or more. The enzymes in bacteria and eucaryotes are evolutionarily related (Figure 8-42). Since the bacterial enzyme has been far easier to study, its properties provide a basis for understanding its eucaryotic relatives. The E. coli enzyme contains four different subunits, α, β, β', and sigma, there being two copies of α and one each of the others. The complete amino acid sequence of each subunit has been determined from the nucleotide sequence of its gene.

Figure 8-42

The common origin of bacterial and eucaryotic RNA polymerases. Amino acid sequence similarities between the β' subunit of E. coli RNA polymerase and the largest subunit of eucaryotic RNA polymerase II are among the comparisons that reveal a common (more...)

The sigma (σ) subunit of the E. coli polymerase has a specific role in the initiation of transcription: it enables the enzyme to find promoter sequences to which it binds. After about eight nucleotides of an RNA molecule have been synthesized (step 4 in Figure 8-43), the σ subunit dissociates and a number of elongation factors - important for chain elongation and termination - become associated with the enzyme instead. The elongation factors include several proteins that function in ways that are incompletely understood. The initiation of transcription is an important control point where the cell can regulate the expression of a gene; for this reason it is discussed in more detail in Chapter 9.

Figure 8-43

Schematic diagram of the steps in the initiation of RNA synthesis (DNA transcription) catalyzed by RNA polymerase. The steps indicated have been revealed by studies of the E. coli enzyme. A DNA molecule containing a promoter sequence for the E. coli polymerase (more...)

Three Kinds of RNA Polymerase Make RNA in Eucaryotes 30

Although the mechanism of DNA transcription is similar in eucaryotes and procaryotes such as E. coli, the machinery is considerably more complex in eucaryotes. In eucaryotes as diverse as yeasts and humans, for example, there are three types of RNA polymerases, each responsible for transcribing different sets of genes. These enzymes - denoted as RNA polymerases I, II, and III - are structurally similar to one another and have some common subunits, although other subunits are unique. Each is more complex than E. coli RNA polymerase and is thought to contain 10 or more polypeptide chains. Another important distinction between the bacterial and eucaryotic enzymes is that, whereas the purified bacterial enzyme can bind to promoters and initiate transcription on its own, the eucaryotic enzymes require the presence of additional initiation proteins that must bind to the promoter before the enzyme can bind. For this reason it was not until 1979 that systems with all the needed components became available so that eucaryotic initiation mechanisms could be analyzed in vitro. Because these initiation proteins and their interactions with the polymerases are intimately involved with the control of transcription initiation, we shall defer discussion of them until Chapter 9.

The three eucaryotic RNA polymerases were initially distinguished by their chemical differences during purification and by their sensitivity to α-amanitin, a poison isolated from the deadly toadstool Amanita phalloides. RNA polymerase I is unaffected by α-amanitin; RNA polymerase II is very sensitive to this poison; and RNA polymerase III is moderately sensitive to it. The sensitivity of RNA synthesis to α-amanitin is still used to determine which polymerase transcribes a gene. Such studies indicate that RNA polymerase II transcribes the genes whose RNAs will be translated into proteins. The other two polymerases synthesize RNAs that have structural or catalytic roles, chiefly as part of the protein synthetic machinery: polymerase I makes the large ribosomal RNAs, and polymerase III makes a variety of very small, stable RNAs - including the small 5S ribosomal RNA and the transfer RNAs. However, most of the small RNAs that form snRNPs, which we discuss later when we consider RNA processing, are made by polymerase II.

Mammalian cells typically contain 20,000 to 40,000 molecules of each of the RNA polymerases, and studies with cultured cells indicate that the concentrations of these enzymes are regulated individually according to the rate of cell growth.

RNA Polymerase II Transcribes Some DNA Sequences Much More Often Than Others 31

Because RNA polymerase II makes all of the mRNA precursors and thus determines which proteins a cell will make, we shall focus most of our discussion on its activities and on the fate of its products. Although experiments with purified polymerases in vitro are essential for establishing the mechanism of transcription, much can also be learned about how the process occurs in a cell by using the electron microscope to examine genes in action, with their bound RNA polymerases caught in the act of transcription.

Ordinary thin-section electron micrographs of interphase nuclei show granular clumps of chromatin (see Figure 8-71) but reveal very little about how genes are transcribed. A much more detailed picture emerges if the nucleus is ruptured and its contents spilled out onto an electron microscope grid (Figures 8-44 and 8-45). At the farthest point from the center of the lysed nucleus, the chromatin is diluted sufficiently to make individual chromatin strands visible in the expanded, beads-on-a-string form shown previously in Figure 8-9B.

Figure 8-71

Electron micrograph of a mammalian cell nucleus. Note that the condensed chromatin underlying the nuclear envelope is excluded from regions around the nuclear pores. (Courtesy of Larry Gerace.)

Figure 8-44

A typical cell nucleus visualized by electron microscopy using the procedure shown in Figure 8-45. An enormous tangle of chromatin can be seen spilling out of the lysed nucleus; only the chromatin at the outermost edge of this tangle will be sufficiently (more...)

Figure 8-45

A method for examining chromatin in the electron microscope. The nuclei are first lysed, and then the chromatin is freed from cellular debris and spread out on a grid.

RNA polymerase molecules actively engaged in transcription appear as globular particles with a single RNA molecule trailing behind. Particles representing active RNA polymerase II molecules are usually seen as single units, without nearby neighbors. This observation indicates that most genes are transcribed into mRNA precursors only infrequently, so that one polymerase finishes transcription before another one begins. Occasionally, however, many polymerase molecules (and their associated RNA transcripts) are seen clustered together. These clusters occur on the relatively few genes that are transcribed at high frequency (Figure 8-46). The length of the attached RNA molecules in such a cluster increases in the direction of transcription, producing a characteristic pattern. This pattern defines the RNA polymerase II start site and direction of transcription for a specific transcription unit (Figure 8-47).

Figure 8-46

Electron micrograph of a region of chromatin containing a gene being transcribed at unusually high frequency. Many RNA polymerase II molecules with their growing RNA transcripts are visible. The direction of transcription is from left to right (see Figure (more...)

Figure 8-47

An idealized transcription unit. The drawing illustrates how the electron microscope appearance (see Figure 8-46) demonstrates the direction of transcription, as well as the start site of the unit.

Biochemical studies have confirmed and extended the results obtained by electron microscopy, leading to three major conclusions:

Eucaryotic RNA polymerase molecules, like those in procaryotes, begin at specific sites on the chromosome.

The average length of the complete RNA molecule produced by RNA polymerase II from a single transcription unit is about 7000 nucleotides, and RNA molecules 10,000 to 20,000 nucleotides long are common. These lengths, which are much longer than the 1200 nucleotides of RNA needed to code for an average protein of 400 amino acid residues, reflect the complex structure of eucaryotic genes and, in particular, the presence of long intron sequences, which, as we discuss later, are removed from the RNA.

Although chain elongation rates of about 30 nucleotides per second are observed for all RNAs, different RNA polymerase II start sites have different initiation frequencies, so that some genes are transcribed at much higher rates than others. As indicated in Table 8-2, the majority of the genes that are transcribed give rise to very few mRNA molecules.
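The figures quoted in these conclusions can be checked with a little arithmetic. The sketch below uses only numbers given in the text (three nucleotides per amino acid, about 30 nucleotides per second, transcripts of 7000 to 20,000 nucleotides); the function names are illustrative, and a single uniform elongation rate with no pausing is a simplifying assumption.

```python
CODON_LENGTH = 3        # nucleotides of RNA per amino acid residue
ELONGATION_RATE = 30    # nucleotides per second, as quoted in the text

def coding_nucleotides(amino_acids: int) -> int:
    """Nucleotides of RNA needed to encode a protein of the given length."""
    return amino_acids * CODON_LENGTH

def transcription_seconds(transcript_length: int,
                          rate: float = ELONGATION_RATE) -> float:
    """Time to synthesize one transcript at a constant elongation rate."""
    return transcript_length / rate

coding = coding_nucleotides(400)
print(coding)                                   # 1200
print(f"{coding / 7000:.0%} of an average 7000-nucleotide transcript is coding")

for length in (7000, 20000):
    minutes = transcription_seconds(length) / 60
    print(f"{length} nucleotides take about {minutes:.1f} minutes to transcribe")
```

On these assumptions only about a sixth of an average primary transcript is needed to encode the protein, and a single polymerase spends several minutes on each transcription unit.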

Table 8-2

The Population of mRNA Molecules in a Typical Mammalian Cell.

The Precursors of Messenger RNA Are Covalently Modified at Both Ends 32

In eucaryotes mature mRNA is produced in several steps. The RNA molecules freshly synthesized by RNA polymerase II in the nucleus are known as primary transcripts; the collection of such transcripts was originally called heterogeneous nuclear RNA (hnRNA) because of the large variation in RNA size, contrasting with the more uniform and smaller size of the RNA sequences actually needed to encode proteins. We shall see shortly that much of this variation is due to the presence of long intron sequences in the primary transcripts. As they are being synthesized, these transcripts are covalently modified at both their 5' end and their 3' end in ways that clearly distinguish them from transcripts made by other RNA polymerases. These modifications will be used later in the cytoplasm as signals that these transcripts are to be translated into protein.

The 5' end of the RNA molecule (which is the end synthesized first during transcription) is first capped by the addition of a methylated G nucleotide. Capping occurs almost immediately, after about 30 nucleotides of RNA have been synthesized, and it involves condensation of the triphosphate group of a molecule of GTP with a diphosphate left at the 5' end of the initial transcript (Figure 8-48). This 5' cap will later play an important part in the initiation of protein synthesis; it also seems to protect the growing RNA transcript from degradation.

Figure 8-48

The reactions that cap the 5' end of each RNA molecule synthesized by RNA polymerase II. The final cap contains a novel 5'-to-5' linkage between the positively charged 7-methyl G residue and the 5' end of the RNA transcript (see Figure 6-26). At least (more...)

The 3' end of most polymerase II transcripts is defined not by the termination of transcription but by a second modification in which the growing transcript is cleaved at a specific site and a poly-A tail is added by a separate polymerase to the cut 3' end. The signal for the cleavage is the appearance in the RNA chain of the sequence AAUAAA located 10 to 30 nucleotides upstream from the site of cleavage, plus a less well-defined downstream sequence. Immediately after cleavage, a poly-A polymerase enzyme adds 100 to 200 residues of adenylic acid (as poly A) to the 3' end of the RNA chain to complete the primary RNA transcript. Meanwhile, the polymerase fruitlessly continues transcribing for hundreds or thousands of nucleotides until termination occurs at one of several later sites; the extra piece of functionless RNA transcript thus generated presumably lacks a 5' cap and is rapidly degraded (Figure 8-49).
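The cleavage and polyadenylation rule described above lends itself to a small sketch. The code below scans an RNA sequence for AAUAAA and reports the stretch 10 to 30 nucleotides downstream in which cleavage would be expected, then cuts the transcript and appends a poly-A tail. Several things here are assumptions made for illustration: the function names, the test sequence, the choice to measure the 10-to-30-nucleotide distance from the end of the signal, and the omission of the less well-defined downstream sequence.

```python
import re

POLY_A_SIGNAL = "AAUAAA"

def candidate_cleavage_windows(rna: str, min_gap: int = 10, max_gap: int = 30):
    """For each AAUAAA in the RNA, return (signal_start, first_cut, last_cut).

    A cut position i means the transcript rna[:i] is kept. The window covers
    cuts lying min_gap to max_gap nucleotides past the end of the signal
    (an assumed way of measuring the distance), clipped to the RNA length.
    """
    windows = []
    # A lookahead pattern also finds overlapping copies of the signal.
    for match in re.finditer(f"(?={POLY_A_SIGNAL})", rna):
        signal_end = match.start() + len(POLY_A_SIGNAL)
        first_cut = signal_end + min_gap
        last_cut = min(signal_end + max_gap, len(rna))
        if first_cut <= last_cut:
            windows.append((match.start(), first_cut, last_cut))
    return windows

def cleave_and_add_tail(rna: str, cut: int, tail_length: int = 200) -> str:
    """Discard everything from the cut onward and add a poly-A tail."""
    return rna[:cut] + "A" * tail_length

# Hypothetical transcript: three leading bases, the signal, then 40 C residues.
rna = "GGC" + POLY_A_SIGNAL + "C" * 40
print(candidate_cleavage_windows(rna))          # [(3, 19, 39)]

processed = cleave_and_add_tail(rna, cut=25, tail_length=150)
print(len(processed))                           # 175
```

In the cell the RNA downstream of the cut is the functionless piece that the polymerase continues to make and that is rapidly degraded; here it is simply dropped by the slice.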

Figure 8-49

Synthesis of a primary RNA transcript (an mRNA precursor) by RNA polymerase II. This diagram starts with a polymerase that has just begun synthesizing an RNA chain (step 4 of Figure 8-43). Recognition of a poly-A addition signal in the growing RNA transcript (more...)

The poly-A tail appears to have several functions: (1) as described later, it aids in the export of mature mRNA from the nucleus; (2) it is thought to affect the stability of at least some mRNAs in the cytoplasm; and (3) it seems to serve as a recognition signal for the ribosome that is required for efficient translation of mRNA. The latter feature - in combination with the 5' cap - would enable a ribosome to determine whether the mRNA was intact before expending energy and precursors to begin its translation.

Even though polymerase II transcripts comprise more than half of the RNA being synthesized by DNA transcription, we shall see below that most of the RNA in these transcripts is unstable and therefore short-lived. Consequently, the hnRNA in the cell nucleus and the cytoplasmic mRNA derived from it constitute only a minor fraction of the total RNA in a cell (Table 8-3). Despite their relative scarcity, these RNA molecules can be readily purified because of their poly-A tails. When the total cellular RNA is passed through a column containing poly dT linked to a solid support, the complementary base-pairing between T and A residues selectively binds the molecules with poly-A tails to the column; the bound molecules can then be released for further analysis. This procedure is widely used to separate the hnRNA and mRNA molecules from the ribosomal and transfer RNA molecules that predominate in cells.
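The logic of the poly-dT column can be mimicked as a simple filter on sequences. In this sketch an RNA counts as bound if it ends in a run of A residues; the 20-residue threshold, the function names, and all four example sequences are invented for illustration and say nothing about real binding conditions.

```python
def binds_poly_dT(rna: str, min_tail: int = 20) -> bool:
    """True if the RNA ends in at least min_tail consecutive A residues."""
    return rna.endswith("A" * min_tail)

def poly_dT_select(rnas: dict, min_tail: int = 20):
    """Split a pool of named RNAs into column-bound and flow-through sets."""
    bound = {name: seq for name, seq in rnas.items()
             if binds_poly_dT(seq, min_tail)}
    flow_through = {name: seq for name, seq in rnas.items()
                    if name not in bound}
    return bound, flow_through

# Made-up pool: two polyadenylated transcripts and two without tails.
total_rna = {
    "mRNA_1": "AUGGCUGAAUGGUAA" + "A" * 150,
    "hnRNA_1": "AUGGCUGUAAGUCCCUUUCAGGAAUGGUAA" + "A" * 200,
    "rRNA_like": "GGCUACGGCCAUACCACCCUGAACGC",
    "tRNA_like": "GCGGAUUUAGCUCAGUUGGGAGAGCGCCA",
}

bound, flow_through = poly_dT_select(total_rna)
print(sorted(bound))          # ['hnRNA_1', 'mRNA_1']
print(sorted(flow_through))   # ['rRNA_like', 'tRNA_like']
```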

Table 8-3

Selected Data on Amounts of RNA in a Typical Mammalian Cell.

Only RNA polymerase II transcripts have 5' caps and 3' poly-A tails. This seems to be because the capping and cleavage plus poly-A addition reactions are mediated by enzymes that bind selectively to polymerase II. Thus, if a gene that is normally transcribed by polymerase II is separated from its promoter by recombinant DNA methods and fused to a promoter recognized by polymerase I or by polymerase III, the RNA transcripts produced from it by these polymerases are neither capped nor polyadenylated. The requirement for a specific capping and polyadenylation of mRNA precursors may explain why these RNAs are synthesized by a separate type of RNA polymerase molecule in eucaryotes.

RNA Processing Removes Long Nucleotide Sequences from the Middle of RNA Molecules 33

The discovery of interrupted genes in 1977 was entirely unexpected. Previous studies in bacteria had shown that their genes are composed of a continuous string of the nucleotides needed to encode the amino acids of a protein, and there seemed to be no obvious reason why a gene should be organized in any other way. The first indication that eucaryotic genes are not continuous like bacterial genes came when new methods allowing an accurate comparison of mRNA and DNA sequences were applied to mRNAs produced by a human adenovirus (a large DNA virus). The region of the viral DNA producing these RNAs turned out to contain sequences that are not present in the mature RNAs. The possibility that this situation was unique to viruses was quickly eliminated by the finding of similar interruptions in the ovalbumin and β-globin genes of vertebrates. As discussed earlier, the sequences present in the DNA but omitted from the mRNA are called intron sequences, while those present in the mRNA are called exon sequences (Figures 8-50 and 8-51).

Figure 8-50

Early evidence for the existence of introns in eucaryotic genes. The evidence was provided by the "R-loop technique," in which a base-paired complex between mRNA and DNA molecules is visualized in the electron microscope. An unusually abundant mRNA molecule, (more...)

Figure 8-51

The transcribed portion of the human β-globin gene. The sequence of the DNA strand corresponding to the mRNA sequence is given, with the primary RNA transcript surrounded by a green line and the nucleotides in the three exons shaded red. Note (more...)

Before the discovery of introns, the significance of hnRNA and its relationship to mRNA had seemed very mysterious. It had long been known that most of the RNA synthesized by RNA polymerase II is rapidly degraded in the nucleus. The hnRNA molecules of cultured cells can be radiolabeled by brief exposure to 3H-uridine and followed over a long period. This sort of experiment showed that the average length of the hnRNA molecules in the labeled population decreases rapidly, starting from about 7000 nucleotides, to reach the size of cytoplasmic mRNA molecules (an average of about 1500 nucleotides) after only about 30 minutes; at about the same time, radioactively labeled RNA molecules begin to leave the nucleus as mRNA molecules. Only about 5% of the mass of the labeled hnRNA ever reaches the cytoplasm, however; the remainder is degraded into small fragments in the nucleus over a period of about an hour. This seemed strangely wasteful. And the puzzle was deepened by the finding that even though the hnRNA molecules became progressively shorter, they retained their 5' caps and their 3' poly-A tails.

With the discovery of introns, the explanation became clear: the primary RNA transcript is a faithful copy of the gene, containing both exon and intron sequences, and the latter sequences are cut out of the middle of the RNA transcript to produce an mRNA molecule that codes directly for a protein (see Figure 3-15). Because the coding RNA sequences on either side of an intron sequence are joined to each other after the intron sequence has been cut out, this reaction is known as RNA splicing. RNA splicing occurs in the cell nucleus, out of reach of the ribosomes, and RNA is exported to the cytoplasm only when processing is complete.

Because most mammalian genes contain much more intron than exon sequence (see Table 8-1), RNA splicing can account for the conversion of the very long nuclear hnRNA molecules to the much shorter cytoplasmic mRNA molecules.

Before discussing the distribution of introns in eucaryotic genes and some of their consequences for cell function, it is necessary to explain how intron sequences are recognized and removed by the splicing machinery.

hnRNA Transcripts Are Immediately Coated with Proteins and snRNPs 34

Newly made RNA in eucaryotic cells, unlike that in bacteria, appears to become immediately condensed into a string of closely spaced protein-containing particles. Each particle consists of about 500 nucleotides of RNA wrapped around a protein complex that serves to condense and package each growing RNA transcript in a manner reminiscent of the DNA-protein complexes of nucleosomes. The resulting hnRNP particles (heterogeneous nuclear ribonucleoprotein particles) can be purified after nuclei have been treated with ribonucleases at levels just sufficient to destroy the linker RNA between them. Each particle has a diameter of about 20 nm, which is twice that of a nucleosome, and the protein core is more complex and less well characterized, being composed of a set of at least eight different proteins. Except for histones, the proteins in this core are the most abundant proteins in the cell nucleus. Several of them contain a conserved domain of about 80 amino acids, which is often repeated and shared by many other RNA-binding proteins.

The hnRNP particles are generally distorted by the standard spreading techniques used to view gene transcription in the electron microscope (see Figure 8-45). These micrographs, however, reveal especially stable particles of a less common type, whose position on the RNA strongly implicates them in RNA splicing. These particles form very quickly at specific RNA sequences - at or near the junctions between intron and exon sequences - and, as the RNA transcript elongates, they coalesce in pairs to form a larger assembly that is thought to be the spliceosome that catalyzes RNA splicing (Figure 8-52).

Figure 8-52

Spliceosomes. (A) Electron micrograph of a chromatin spread showing large ribonucleoprotein particles assembling at the 5' and 3' splice site regions to form a spliceosome. The RNA transcripts are being produced from a gene encoding a Drosophila chorion (more...)

Biochemical analysis has revealed that the cell nucleus contains many complexes of proteins with small RNAs (generally RNAs of 250 nucleotides or less), which have arbitrarily been designated U1, U2, . . . , U12 RNAs. These complexes, called small nuclear ribonucleoproteins (snRNPs - pronounced "snurps"), resemble ribosomes in that each contains a set of proteins complexed to a stable RNA molecule. They are much smaller than ribosomes, however - only 250,000 daltons compared with 4.5 million daltons for a ribosome. Some proteins are present in several types of snRNPs, whereas others are unique to one type. This was first demonstrated using serum from patients with the autoimmune disease systemic lupus erythematosus, who make antibodies directed against one or more of their own snRNP proteins: a single antibody was found that binds the U1, U2, U5, and U4/U6 snRNPs, for example, and we now know that they all contain common proteins.

Individual snRNPs are believed to recognize specific nucleic acid sequences through RNA-RNA base-pairing. Some mediate RNA splicing, one is known to be involved in the cleavage reaction that generates the 3' ends of newly formed histone RNAs, while the function of others is unknown. The evidence for the role of snRNPs in RNA splicing comes from experiments on RNA splicing in vitro, as well as from analyses of yeast cells that are mutant in one of the snRNP components.

Intron Sequences Are Removed as Lariat-shaped RNA Molecules 35

Introns range in size from about 80 nucleotides to 10,000 nucleotides or more. Unlike the sequence of an exon, the exact nucleotide sequence of an intron seems to be unimportant. Thus introns have accumulated mutations rapidly during evolution, and it is often possible to alter most of an intron's nucleotide sequence without greatly affecting gene function. This has led to the suggestion that intron sequences have no function at all and are largely genetic "junk," a proposition we shall examine at the end of the chapter. The only highly conserved sequences in introns are those required for intron removal, which are found at or near the ends of an intron and are very similar in all known intron sequences; they generally cannot be altered without affecting the splicing process that normally removes the intron sequence from the primary RNA transcript. These conserved boundary sequences at the 5' splice site (donor site) and the 3' splice site (acceptor site) of introns from higher eucaryotes are shown in Figure 8-53. The RNA breaking and rejoining reactions must be carried out precisely because an error of even one nucleotide would shift the reading frame in the resulting mRNA molecule and make nonsense of its message.
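The need for single-nucleotide precision can be made concrete with a toy example. The sketch below removes an intron from a short invented pre-mRNA, first at the correct boundaries and then with the 3' cut misplaced by one nucleotide, and compares the resulting codons. The sequences and function names are hypothetical; the intron merely begins with GU and ends with AG, as the consensus requires, and no real recognition of splice sites is attempted.

```python
def splice(pre_mrna: str, intron_start: int, intron_end: int) -> str:
    """Remove pre_mrna[intron_start:intron_end] and join the flanking exons."""
    return pre_mrna[:intron_start] + pre_mrna[intron_end:]

def codons(mrna: str) -> list:
    """Split an mRNA into complete triplets, reading from the first base."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

# Invented sequences for illustration only.
exon1 = "AUGGCU"
intron = "GUAAGUCCCUUUCAG"      # begins with GU and ends with AG
exon2 = "GAAUGGUAA"
pre_mrna = exon1 + intron + exon2

start = len(exon1)
end = start + len(intron)

correct = splice(pre_mrna, start, end)
print(codons(correct))   # ['AUG', 'GCU', 'GAA', 'UGG', 'UAA']

# Cutting one nucleotide early at the 3' splice site leaves an extra G behind.
shifted = splice(pre_mrna, start, end - 1)
print(codons(shifted))   # ['AUG', 'GCU', 'GGA', 'AUG', 'GUA']
```

Every codon downstream of the faulty junction changes, and the stop codon UAA at the end of the correct message is no longer read in frame.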

Figure 8-53

Consensus sequences for RNA splicing in higher eucaryotes. The sequence given is that for the RNA chain; the nearly invariant GU and AG dinucleotides at either end of the intron sequence are highlighted in yellow (see also Figure 8-51).

The pathway by which the intron sequences are removed from primary RNA transcripts has been elucidated by in vitro studies in which a pure RNA species containing a single intron is prepared by incubating an appropriately designed DNA fragment with an RNA polymerase (see Figure 7-36). When these RNA molecules are added to a cell extract, they become spliced in a two-step enzymatic reaction that requires prolonged incubation with ATP, the U1, U2, U5, and U4/U6 snRNPs, and a number of additional proteins; these components assemble into a large multicomponent ribonucleoprotein complex, or spliceosome. Characterization of the RNA species that appear as intermediates during the reaction, as well as the snRNPs required to produce them, led to the discovery that the intron is excised in the form of a lariat, according to the splicing pathway shown in Figures 8-54 and 8-55.

Figure 8-54

The RNA splicing mechanism. RNA splicing is catalyzed by a spliceosome formed from the assembly of U1, U2, U5, and U4/U6 snRNPs (shown as green circles) plus other components (not shown). After assembly of the spliceosome, the reaction occurs in two steps: (more...)

Figure 8-55

Structure of the branched RNA chain that forms during nuclear RNA splicing. The nucleotide shown in yellow is the A nucleotide highlighted in Figure 8-54. The branch is formed in step 1 of the splicing reaction illustrated there, when the 5' end of the (more...)

Individual roles have been defined for several of the snRNPs. The U1 snRNP, for example, binds to the 5' splice site, guided by a nucleotide sequence in the U1 RNA that forms base pairs complementary to the nine-nucleotide splice site consensus sequence (see Figure 8-53). Since RNA is capable of acting like an enzyme, either the RNA or the protein components of the spliceosome could be responsible for catalyzing the breakage and formation of covalent bonds required for RNA splicing.

Multiple Intron Sequences Are Usually Removed from Each RNA Transcript 36

Because the spliceosome seems mainly to work by recognizing the consensus sequences that mark the two boundaries of an intron sequence (and for all intron sequences these consensus sequences are alike), the 5' splice site (donor site) at the end of any one intron sequence can in principle be spliced to the 3' splice site (acceptor site) of any other intron sequence. Indeed, when an RNA molecule is created artificially, with donor and acceptor splice sites from different intron sequences inserted into it, the intervening RNA is often recognized by the spliceosome and removed.

In view of this result, it is surprising that vertebrate genes can contain as many as 50 introns (see Table 8-1). If any two 5' and 3' splice sites were mispaired for splicing, some functional mRNA sequences would be lost, with disastrous consequences. Somehow such mistakes are avoided: the RNA processing machinery normally guarantees that each 5' splice site pairs only with the 3' splice site that is closest to it in the downstream (5'-to-3') direction of the linear RNA sequence (Figure 8-56). How this sequential pairing of splice sites is accomplished is not known, although the assembly of the spliceosome while the RNA transcript is still growing (see Figure 8-52) is presumed to play a major part in ensuring an orderly pairing of the appropriate splice sites. There is also evidence that the exact three-dimensional conformations adopted by the intron and exon sequences in the RNA transcript are important. We shall see in Chapter 9, however, that this simple 5'-to-3' splicing can be altered by specialized control mechanisms that allow a single gene to produce several different mRNAs and hence several different proteins.
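The pairing rule stated above, in which each 5' splice site is joined to the nearest 3' splice site downstream, is easy to express as a small algorithm even though the cellular mechanism that enforces it is unknown. In the sketch below the coordinates, the transcript length, and the function names are invented; a donor position marks the first nucleotide of an intron and an acceptor position marks the first nucleotide of the following exon.

```python
from bisect import bisect_right

def pair_splice_sites(donors, acceptors):
    """Pair each donor (5' splice site) with the closest acceptor
    (3' splice site) that lies downstream of it."""
    acceptors = sorted(acceptors)
    pairs = []
    for donor in sorted(donors):
        i = bisect_right(acceptors, donor)   # first acceptor past the donor
        if i < len(acceptors):
            pairs.append((donor, acceptors[i]))
    return pairs

def spliced_length(transcript_length, pairs):
    """Length of the RNA left after removing every (donor, acceptor) span."""
    return transcript_length - sum(acc - don for don, acc in pairs)

# Hypothetical 1500-nucleotide transcript with three introns.
donors = [100, 400, 900]
acceptors = [250, 700, 1200]

pairs = pair_splice_sites(donors, acceptors)
print(pairs)                          # [(100, 250), (400, 700), (900, 1200)]
print(spliced_length(1500, pairs))    # 750

# Mispairing the first donor with the second acceptor skips an exon.
mispaired = [(100, 700), (900, 1200)]
print(spliced_length(1500, mispaired))  # 600
```

The 150 nucleotides missing from the mispaired product are the exon lying between positions 250 and 400, which is the kind of loss of functional mRNA sequence that the text describes.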

Figure 8-56

Splicing the primary RNA transcript from the chicken ovalbumin gene. The drawing shows the organized removal of seven introns required to obtain a functional mRNA molecule. The 5' splice sites (donor sites) are denoted by D, and 3' splice sites (acceptor (more...)

Studies of Thalassemia Reveal How RNA Splicing Can Allow New Proteins to Evolve 37

Recombinant DNA techniques have made humans with inherited diseases an increasingly important source of material for genetic studies of cellular mechanisms. In a group of human genetic diseases called the thalassemia syndromes, for example, patients have an abnormally low level of hemoglobin - the oxygen-carrying protein in red blood cells. The change in the DNA sequence has been determined for more than 50 such mutants, and a large proportion of them have been found to have alterations in the pattern of splicing of globin RNA transcripts. Thus single nucleotide changes have been detected that inactivate a splice site. Surprisingly, analysis of the mRNAs produced in these mutant individuals reveals that the loss of a splice site does not prevent splicing but instead causes its normal partner site to seek out and become joined to a new "cryptic" site nearby; often a number of alternative splices are made in these mutants, causing the mutant gene to produce a set of altered proteins rather than just one (Figure 8-57). Other single nucleotide changes create new splice sites by changing a sequence in an intron or an exon into a consensus splice site. These results demonstrate that RNA splicing is a flexible process in higher eucaryotic cells, and they suggest that changes in the splicing pattern caused by random mutations could be an important pathway in the evolution of genes and organisms.

Figure 8-57

Abnormal processing of the β-globin primary RNA transcript in humans with β thalassemia. The site of each mutation is denoted by a black arrowhead. The dark blue boxes represent the three normal exon sequences illustrated previously in Figure (more...)

Spliceosome-catalyzed RNA Splicing Probably Evolved from Self-splicing Mechanisms 38

When the lariat intermediate in nuclear RNA splicing was first discovered, it puzzled molecular biologists. Why was this bizarre pathway used rather than the apparently simpler alternative of bringing the 5' and 3' splice sites together in an initial step, followed by their direct cleavage and rejoining? The answer seems to lie in the way the spliceosome evolved.

As explained in Chapter 1, it is thought that early cells may have used RNA molecules rather than proteins as their major catalysts and stored their genetic information in RNA rather than DNA sequences. RNA-catalyzed splicing reactions presumably played important roles in these early cells, and some self-splicing RNA introns remain today - for example, in the nuclear rRNA genes of the ciliate Tetrahymena, in bacteriophage T4, and in some mitochondrial and chloroplast genes. A self-splicing intron sequence can be identified in a test tube by incubating a pure RNA molecule that contains the intron sequence and observing the splicing reaction; it can also be identified from the RNA sequence, inasmuch as large parts of the intron sequence need to be conserved in order to fold to create a catalytic surface in the RNA molecule. Two major classes of self-splicing intron sequences can be readily distinguished in this way. Group I intron sequences begin the splicing reaction by binding a G nucleotide to the intron sequence; the G is thereby activated to form the attacking group that will break the first of the phosphodiester bonds cleaved during splicing (the bond at the 5' splice site). In group II intron sequences a specially reactive A residue in the intron sequence is the attacking group, and a lariat intermediate is generated. Otherwise the reaction pathways for the two types of sequences are the same. Both are presumed to represent vestiges of very ancient mechanisms (Figure 8-58).

Figure 8-58

The two known classes of self-splicing intron sequences. The group I intron sequences bind a free G nucleotide to a specific site to initiate splicing (see Figure 3-21), while the group II intron sequences use a specially reactive A nucleotide in the intron (more...)

In the evolution of nuclear RNA splicing, the reaction pathway used by the group II self-splicing intron sequences seems to have been retained, with the catalytic role of the intron sequences being replaced by separate spliceosome components. Thus the small RNAs U1 and U2, for example, may well be remnants of catalytic RNA sequences that were originally present in intron sequences. Shifting the catalysis from intron sequence to spliceosome presumably lifted most of the constraints on the evolution of introns, allowing many new intron sequences to evolve.

The Transport of mRNAs to the Cytoplasm Is Delayed Until Splicing Is Complete 39

Finished mRNA molecules are thought to be recognized by receptor proteins in the nuclear pore complex and to be transported actively to the cytoplasm (discussed in Chapter 12). The major proteins of the hnRNP particles and various processing molecules bound to the RNA, by contrast, are largely confined to the nucleus, although some of them pass into the cytoplasm with the transported mRNA before being rapidly stripped from the RNA and returned to the nucleus (Figure 8-59).

Figure 8-59

The transport of mRNA molecules through nuclear pores. (A) Schematic illustration of the change in the proteins bound to the RNA molecule as it moves out of the nucleus. (B) Electron micrograph of a large mRNA molecule produced in an insect salivary gland (more...)

Studies of mutant yeasts suggest that, for RNAs that have splice sites, transport out of the nucleus can occur only after the splicing reaction has been completed. When a mutation creates a defect in the splicing machinery so that splicing cannot occur, unspliced mRNA precursors remain in the nucleus, while those mRNAs that do not require splicing (which include most of the mRNAs in this single-cell eucaryote) are transported normally to the cytosol. This observation suggests that RNAs may be retained in the nucleus by their bound spliceosome components, which seem to form numerous large aggregates throughout the nucleus of higher eucaryotes. These aggregates could serve as "splicing islands" (Figure 8-60); although it is not known how they form or function, they may be analogous to the nucleolus, a much larger and more prominent structure in the nucleus, whose organization and function are better understood.

Figure 8-60

Possible splicing islands. This immunofluorescence micrograph shows the staining of a human fibroblast nucleus with a monoclonal antibody that detects the snRNP particles involved in nuclear splicing of mRNA precursor molecules. The snRNP particles are (more...)

The nucleolus is the site where ribosomal RNA (rRNA) molecules are processed from a larger precursor RNA and assembled into ribosomes by the binding of ribosomal proteins. Before discussing nucleolar structure, however, we need to consider how the precursor rRNA molecules are synthesized from rRNA genes.

Ribosomal RNAs (rRNAs) Are Transcribed from Tandemly Arranged Sets of Identical Genes 40

Many of the most abundant proteins of a differentiated cell, such as hemoglobin in the red blood cell and myoglobin in a muscle cell, are synthesized from genes that are present in only a single copy per haploid genome. These proteins are abundant because each of the many mRNA molecules transcribed from the gene can be translated into as many as 10 protein molecules per minute. This will normally produce more than 10,000 protein molecules per mRNA molecule in each cell generation. Such an amplification step is not available for the synthesis of the intrinsic RNA components of the ribosome, however, since they are the final gene products. Yet a growing higher eucaryotic cell must synthesize 10 million copies of each type of ribosomal RNA molecule in each cell generation in order to construct its 10 million ribosomes. Adequate quantities of ribosomal RNAs, in fact, can be produced only because the cell contains multiple copies of the rRNA genes that code for ribosomal RNAs.
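The amplification argument above can be checked with a short calculation. The following is a minimal sketch, not a measurement: the translation rate and the 10 million rRNA copies come from the text, while the 24-hour cell generation and the function names are assumptions introduced here for illustration.

```python
"""Back-of-the-envelope check of the amplification argument (illustrative only)."""

# Figures quoted in the text.
PROTEINS_PER_MRNA_PER_MINUTE = 10   # upper bound on translation output per mRNA
RRNA_COPIES_NEEDED = 10_000_000     # copies of each rRNA per cell generation

# Assumption, not from the text: one cell generation lasts about 24 hours.
GENERATION_MINUTES = 24 * 60


def proteins_per_mrna(rate_per_minute: float, minutes: float) -> float:
    """Protein molecules made from one mRNA that is translated for `minutes`."""
    return rate_per_minute * minutes


def transcripts_needed(final_molecules: float, amplification: float) -> float:
    """Transcripts required to yield `final_molecules` final gene products."""
    return final_molecules / amplification


if __name__ == "__main__":
    amplification = proteins_per_mrna(PROTEINS_PER_MRNA_PER_MINUTE, GENERATION_MINUTES)
    print(f"Proteins per mRNA per generation: {amplification:,.0f}")
    # Translation supplies this amplification for a protein-coding gene.
    print(f"mRNAs needed for 10 million proteins: "
          f"{transcripts_needed(RRNA_COPIES_NEEDED, amplification):,.0f}")
    # rRNA is itself the final product, so the amplification factor is 1.
    print(f"Transcripts needed for 10 million rRNAs: "
          f"{transcripts_needed(RRNA_COPIES_NEEDED, 1):,.0f}")
```

Under these assumptions a few hundred mRNA molecules suffice for 10 million protein molecules, whereas every one of the 10 million rRNA molecules must be transcribed directly, which is why extra gene copies are needed.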

Even E. coli needs seven copies of its rRNA genes to keep up with the cell's need for ribosomes. Human cells contain about 200 rRNA gene copies per haploid genome, spread out in small clusters on five different chromosomes, while cells of the frog Xenopus contain about 600 rRNA gene copies per haploid genome in a single cluster on one chromosome. In eucaryotes the multiple copies of the highly conserved rRNA genes on a given chromosome are located in a tandemly arranged series in which each gene (8000 to 13,000 nucleotide pairs long, depending on the organism) is separated from the next by a nontranscribed region known as _spacer DNA_, which can vary greatly in length and sequence. We shall see later that such multiple copies of tandemly arranged genes tend to co-evolve.

Because of their repeating arrangement, and because they are transcribed at a very high rate, the tandem arrays of rRNA genes can easily be seen in spread chromatin preparations. The RNA polymerase molecules and their associated transcripts are so densely packed (typically about 100 per gene) that the transcripts stick out perpendicularly from the DNA to give each transcription unit a "Christmas tree" appearance (Figure 8-61). As noted earlier (see Figure 8-47), the tip of each of these "trees" represents the point on the DNA at which transcription begins and where the transcripts are thus shortest, while the other end of the rRNA transcription unit is sharply demarcated by the sudden disappearance of RNA polymerase molecules and their transcripts.
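To get a feel for why such dense packing is needed, the numbers quoted in this section can be combined into a rough estimate of the load on each rRNA gene. This is a sketch under stated assumptions rather than an established result: the gene count, polymerase density, rRNA demand, and the human transcript length of about 13,000 nucleotides are taken from this section, whereas the diploid cell, the 24-hour generation time, the idea that every gene copy is active, and the function names are assumptions made here.

```python
"""Rough per-gene load for rRNA synthesis (illustrative estimate only)."""

# Figures quoted in this section.
RRNA_COPIES_NEEDED = 10_000_000   # per cell generation
GENES_PER_HAPLOID_GENOME = 200    # human rRNA gene copies
POLYMERASES_PER_GENE = 100        # typical packing seen in chromatin spreads
TRANSCRIPT_LENGTH_NT = 13_000     # human primary rRNA transcript

# Assumptions, not from the text.
PLOIDY = 2                        # diploid cell
GENERATION_SECONDS = 24 * 3600    # 24-hour cell generation
# It is also assumed that every gene copy is active all of the time.


def transcripts_per_gene(total: float, genes_per_haploid: int, ploidy: int) -> float:
    """Transcripts each gene copy must supply in one generation."""
    return total / (genes_per_haploid * ploidy)


def required_elongation_rate(per_gene: float, seconds: float,
                             polymerases: int, length_nt: int) -> float:
    """Nucleotides per second each polymerase must add at steady state.

    A gene carrying `polymerases` evenly spaced enzymes moving at rate v
    releases polymerases * v / length_nt transcripts per second.
    """
    transcripts_per_second = per_gene / seconds
    return transcripts_per_second * length_nt / polymerases


if __name__ == "__main__":
    per_gene = transcripts_per_gene(RRNA_COPIES_NEEDED, GENES_PER_HAPLOID_GENOME, PLOIDY)
    rate = required_elongation_rate(per_gene, GENERATION_SECONDS,
                                    POLYMERASES_PER_GENE, TRANSCRIPT_LENGTH_NT)
    print(f"Transcripts per gene per generation: {per_gene:,.0f}")
    print(f"Required elongation rate: {rate:.1f} nucleotides per second")
```

If fewer gene copies are active, or the generation time is shorter, the required rate per polymerase rises in proportion, so the estimate is best read as a lower bound under these assumptions.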

Figure 8-61

Transcription from tandemly arranged rRNA genes, as visualized in the electron microscope. The pattern of alternating transcribed gene and nontranscribed spacer is readily seen in the lower-magnification view in the upper panel. The large particles at (more...)

The rRNA genes are transcribed by RNA polymerase I, and each gene produces the same primary RNA transcript. In humans this RNA transcript, known as 45S rRNA, is about 13,000 nucleotides long. Before it leaves the nucleus in assembled ribosomal particles, the 45S rRNA is cleaved to give one copy each of the 28S rRNA (about 5000 nucleotides), the 18S rRNA (about 2000 nucleotides), and the 5.8S rRNA (about 160 nucleotides) of the final ribosome. The derivation of these three rRNAs from the same primary transcript ensures that they will be made in equal quantities. The remaining part of each primary transcript (about 6000 nucleotides) is degraded in the nucleus (Figure 8-62). Some of these extra RNA sequences are thought to play a transient part in ribosome assembly, which begins immediately as specific proteins bind to the growing 45S rRNA transcripts in the nucleus.

Figure 8-62

The processing of a 45S rRNA precursor molecule into three separate ribosomal RNAs. Nearly half of the nucleotide sequences in the primary RNA transcript are degraded in the nucleus.
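The nucleotide accounting for the 45S precursor can be verified directly from the approximate lengths given above. The sketch below uses only those rounded figures; the variable and function names are illustrative.

```python
"""Nucleotide accounting for 45S rRNA processing, using the rounded lengths in the text."""

PRIMARY_TRANSCRIPT_NT = 13_000
MATURE_RRNA_NT = {"28S": 5000, "18S": 2000, "5.8S": 160}


def discarded_nucleotides(primary_nt: int, mature: dict[str, int]) -> int:
    """Nucleotides of the primary transcript that do not end up in mature rRNAs."""
    return primary_nt - sum(mature.values())


def discarded_fraction(primary_nt: int, mature: dict[str, int]) -> float:
    """Fraction of the primary transcript that is degraded in the nucleus."""
    return discarded_nucleotides(primary_nt, mature) / primary_nt


if __name__ == "__main__":
    lost = discarded_nucleotides(PRIMARY_TRANSCRIPT_NT, MATURE_RRNA_NT)
    frac = discarded_fraction(PRIMARY_TRANSCRIPT_NT, MATURE_RRNA_NT)
    print(f"Retained in mature rRNAs: {sum(MATURE_RRNA_NT.values()):,} nucleotides")
    print(f"Degraded: {lost:,} nucleotides ({frac:.0%} of the primary transcript)")
```

The difference comes to a little under 6000 nucleotides, consistent with the statement that nearly half of the primary transcript is degraded.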

Another set of tandemly arranged genes with nontranscribed spacers codes for the 5S rRNA of the large ribosomal subunit (the only rRNA that is transcribed separately). The 5S rRNA genes are only about 120 nucleotide pairs in length, and like a number of other genes encoding small stable RNAs (most notably the transfer RNA [tRNA] genes), they are transcribed by RNA polymerase III. Humans have about 2000 5S rRNA genes tandemly arranged in a single cluster far from all the other rRNA genes. It is not known why this one type of rRNA is transcribed separately.

The Nucleolus Is a Ribosome-producing Machine 40

The continuous transcription of multiple gene copies ensures an adequate supply of the rRNAs, which are immediately packaged with ribosomal proteins to form ribosomes. The packaging occurs in the nucleus, in a large, distinct structure called the nucleolus. The nucleolus contains large loops of DNA emanating from several chromosomes, each of which contains a cluster of rRNA genes. Each such gene cluster is known as a nucleolar organizer region. Here the rRNA genes are transcribed at a rapid rate by RNA polymerase I. The beginning of the rRNA packaging process can be seen in electron micrographs of these genes: the 5' tail of each transcript is encased by a protein-rich granule (see Figure 8-61). These granules, which do not appear on other types of RNA transcripts, presumably reflect the first of the protein-RNA interactions that take place in the nucleolus.

The biosynthetic functions of the nucleolus can be traced by briefly labeling newly made RNA with ³H-uridine. After varying intervals of further incubation without ³H-uridine, a cell fractionation procedure can be used to break the rRNA genes free of their chromosomes, thereby allowing the radioactive nucleoli to be isolated in relatively pure form (Figure 8-63). Such experiments show that the intact 45S transcript is first packaged into a large complex containing many different proteins imported from the cytoplasm, where all proteins are synthesized. Most of the 80 different polypeptide chains that will make up the ribosome, as well as the 5S rRNAs, are incorporated at this stage. Other molecules are needed to process the 45S rRNA and to guide the assembly process. Thus the nucleolus also contains other RNA-binding proteins and certain small ribonucleoprotein particles (including U3 snRNP) that are believed to help catalyze the construction of ribosomes. These components remain in the nucleolus when the ribosomal subunits are exported to the cytoplasm in finished form. An especially notable component is nucleolin, an abundant, well-characterized RNA-binding protein that seems to coat only ribosomal transcripts; this protein stains with silver in the characteristic manner of the nucleolus itself.

Figure 8-63

The nucleolus. This highly schematic view of a nucleolus in a human cell shows the contributions of loops of chromatin containing rRNA genes from 10 separate chromosomes. Purified nucleoli are very useful for biochemical studies of nucleolar function; (more...)

As the 45S rRNA molecule is processed, it gradually loses some of its RNA and protein and then splits to form separate precursors of the large and small ribosomal subunits (Figure 8-64). Within 30 minutes of radioactive pulse labeling, the first mature small ribosomal subunits, containing their 18S rRNA, emerge from the nucleolus and appear in the cytoplasm. Assembly of the mature large ribosomal subunit, with its 28S, 5.8S, and 5S rRNAs, takes about an hour to complete. The nucleolus therefore contains many more incomplete large ribosomal subunits than small ones.

Figure 8-64

The function of the nucleolus in ribosome synthesis. The 45S rRNA transcript is packaged in a large ribonucleoprotein particle containing many ribosomal proteins imported from the cytoplasm. While this particle remains in the nucleolus, selected pieces (more...)

The last steps in ribosome maturation occur only as these subunits are transferred to the cytoplasm. This delay prevents functional ribosomes from gaining access to the incompletely processed hnRNA molecules in the nucleus.

The Nucleolus Is a Highly Organized Subcompartment of the Nucleus 41

As seen in the light microscope, the large spheroidal nucleolus is the most obvious structure in the nucleus of a nonmitotic cell. Consequently, it was so closely scrutinized by early cytologists that an 1898 review could list some 700 references. By the 1940s cytologists had demonstrated that the nucleolus contains high concentrations of RNA and proteins, but its major function in ribosomal RNA synthesis and ribosome assembly was not discovered until the 1960s.

Some of the details of nucleolar organization can be seen in the electron microscope. Unlike the cytoplasmic organelles, the nucleolus is not bounded by a membrane; instead, it seems to be constructed by the specific binding of unfinished ribosome precursors to one another to form a large network. In a typical electron micrograph three partially segregated regions can be distinguished (Figure 8-65): (1) a pale-staining fibrillar center, which contains DNA that is not being actively transcribed; (2) a dense fibrillar component, which contains RNA molecules in the process of being synthesized; and (3) a granular component, which contains maturing ribosomal precursor particles.

Figure 8-65

Electron micrograph of a thin section of a nucleolus in a human fibroblast, showing its three distinct zones. (A) View of entire nucleus. (B) High-power view of the nucleolus. (Courtesy of E.G. Jordan and J. McGovern.)

The size of the nucleolus reflects its activity. Its size therefore varies greatly in different cells and can change in a single cell. It is very small in some dormant plant cells, for example, but can occupy up to 25% of the total nuclear volume in cells that are making unusually large amounts of protein. The differences in size are due largely to differences in the amount of the granular component, which is probably controlled at the level of ribosomal gene transcription: electron microscopy of spread chromatin shows that both the fraction of activated ribosomal genes and the rate at which each gene is transcribed can vary according to circumstances.

The Nucleolus Is Reassembled on Specific Chromosomes After Each Mitosis 42

The appearance of the nucleolus changes dramatically during the cell-division cycle. As the cell approaches mitosis, the nucleolus first decreases in size and then disappears as the chromosomes condense and all RNA synthesis stops, so that generally there is no nucleolus in a metaphase cell. When ribosomal RNA synthesis restarts at the end of mitosis (in telophase), tiny nucleoli reappear at the chromosomal locations of the ribosomal RNA genes (Figure 8-66).

Figure 8-66

Changes in the appearance of the nucleolus in a human cell during the cell cycle. Only the cell nucleus is represented in this diagram. In most eucaryotic cells the nuclear membrane breaks down during mitosis, as indicated by the dashed circles.

In humans the ribosomal RNA genes are located near the tips of each of 5 different chromosomes, as shown previously in Figure 8-32 (that is, on 10 of the 46 chromosomes in a diploid cell). Correspondingly, 10 small nucleoli form after mitosis in a human cell, although they are rarely seen as separate entities because they quickly grow and fuse to form the single large nucleolus typical of many interphase cells (Figure 8-67).

Figure 8-67

Nucleolar fusion. These light micrographs of human fibroblasts grown in culture show various stages of nucleolar fusion. (Courtesy of E.G. Jordan and J. McGovern.)

What happens to the RNA and protein components of the disassembled nucleolus during mitosis? It seems that at least some of them become distributed over the surface of all of the metaphase chromosomes and are carried as cargo to each of the two daughter cell nuclei. As the chromosomes decondense at telophase, these "old" nucleolar components help reestablish the newly emerging nucleoli.

Individual Chromosomes Occupy Discrete Territories in the Nucleus During Interphase 43

As we have just seen, specific genes from separate interphase chromosomes are brought together at a single site in the nucleus when the nucleolus forms. Are other parts of chromosomes also nonrandomly ordered in the nucleus? First raised by biologists in the late nineteenth century, this fundamental question still has not been answered satisfactorily.

A certain degree of chromosomal order results from the configuration that the chromosomes always have at the end of mitosis. Just before a cell divides, the condensed chromosomes are pulled to each spindle pole by microtubules attached to the centromeres; thus, as the chromosomes move, the centromeres lead the way and the distal arms (terminating in telomeres) lag behind. The chromosomes in many nuclei tend to retain this so-called Rabl orientation throughout interphase, with their centromeres facing one pole of the nucleus and their telomeres pointing toward the opposite pole (Figure 8-68A). In some cases the nucleus is specifically oriented in the cell: in the early Drosophila embryo, for example, all the centromeres face apically (Figure 8-68B). Such fixed nuclear orientations might have important effects on cell polarity, but it is difficult to design experiments to test this possibility.

Figure 8-68

The polarized orientation of chromosomes in interphase cells of the early Drosophila embryo. (A) Diagrams of the Rabl orientation, with all centromeres facing one nuclear pole and all telomeres pointing toward the opposite pole. In the embryo each nucleus (more...)

In most cells the various chromosomes are indistinguishable from one another during interphase. Consequently, it is difficult to assess their arrangement in more detail than just described. The giant interphase chromosomes of the polytene cells of Drosophila larvae, however, are an exception. Here the individual chromosome bands can be resolved clearly enough to determine the precise positions of specific genes in intact nuclei by microscopic optical-sectioning and reconstruction techniques. The results of such analyses suggest that the interphase chromosome set is not highly ordered: although the Rabl orientation tends to be maintained, two apparently identical cells often have different chromosomes as nearest neighbors.

These analyses of polytene chromosomes have also indicated that each chromosome occupies its own territory in the interphase nucleus - that is, the individual chromosomes are not extensively intertwined (Figure 8-69). Other experiments have shown that nonpolytene chromosomes also tend to occupy discrete domains in interphase nuclei. In situ hybridization experiments with an appropriate DNA probe, for example, can outline a single chromosome in hybrid mammalian cells grown in culture (Figure 8-70). Most of the DNA of such a chromosome is seen to occupy only a small portion of the interphase nucleus, suggesting that each individual chromosome remains compact and organized while allowing selected portions of its DNA to be active in RNA synthesis.

Figure 8-69

A stereo pair that displays the three-dimensional arrangement of the polytene chromosomes in a single nucleus of a Drosophila larval gland cell. The large ball is the nucleolus, and the course of each chromosome arm is represented by a line running along (more...)

Figure 8-70

Selective labeling of a single chromosome in a cultured mammalian cell nucleus during interphase. In (A) and (B), an interphase nucleus freed from its cytoplasm is shown at the right, with scattered mitotic chromosomes released from a second cell on the (more...)

How Well Ordered Is the Nucleus? 44

The interior of the nucleus is not a random jumble of its many RNA, DNA, and protein components. We have seen that the nucleolus is organized as an efficient ribosome-construction machine, and clusters of spliceosome components are organized, possibly as discrete RNA-splicing islands (see Figure 8-60). Order is also seen in the electron microscope when one focuses on the regions around nuclear pores: the chromatin that lines the inner nuclear membrane (which is unusually condensed chromatin and therefore clearly visible in electron micrographs) is excluded from a considerable region beneath and around each nuclear pore, clearing a path between the cytoplasm and the nucleoplasm (Figure 8-71). In some special cases, moreover, the nuclear pores are found to be highly organized in the nuclear envelope (Figure 8-72), presumably reflecting a corresponding organization of the nuclear lamina to which the pores are attached.

Figure 8-72

Freeze-fracture electron micrograph of the elongated nuclear envelope of a fern spore. Note the ordered arrangement of the nuclear pore complexes in parallel rows. In other cells either concentrated clusters of nuclear pores or unusual areas free of nuclear (more...)

Is there an intranuclear framework, analogous to the cytoskeleton, on which nuclear components are organized? Many cell biologists believe there is. The nuclear matrix, or scaffold, has been defined as the insoluble material left in the nucleus after a series of biochemical extraction steps. Some of the proteins that constitute it can be shown to bind specific DNA sequences called SARs or MARs (for scaffold- or matrix-associated regions). Such DNA sequences have been postulated to form the base of chromosomal loops (see Figure 8-18). By means of such chromosomal attachment sites, the matrix might help organize chromosomes, localize genes, and regulate DNA transcription and replication within the nucleus. Because the structural components of the matrix have not yet been identified, however, it remains uncertain whether the matrix isolated by cell biologists represents a structure that is present in intact cells.

Summary

RNA polymerase, the enzyme that catalyzes DNA transcription, is a complex molecule containing many polypeptide chains. In eucaryotic cells there are three RNA polymerases, designated polymerases I, II, and III; they are evolutionarily related to one another and to bacterial RNA polymerase, and they have some subunits in common. After initiating transcription, each enzyme is thought to release one or more subunits and to bind other subunits that are required for RNA chain elongation and termination.

Most of the cell's mRNA is produced by a complex process beginning with the synthesis of heterogeneous nuclear RNA (hnRNA). The primary hnRNA transcript is made by RNA polymerase II. It is then capped by the addition of a special nucleotide to its 5' end and is cleaved and then polyadenylated at its 3' end. The modified RNA molecules are usually then subjected to one or more RNA splicing events, in which intron sequences are removed from the middle of the RNA molecule by a reaction catalyzed by a large ribonucleoprotein complex known as a spliceosome. In this process most of the mass of the primary RNA transcript is removed and degraded in the nucleus. As a result, although the rate of production of hnRNA typically accounts for about half of a cell's RNA synthesis, the mRNA produced represents only about 3% of the steady-state quantity of RNA in a cell.

Unlike genes that code for proteins, which are transcribed by polymerase II, the genes that code for most structural RNAs are transcribed by polymerases I and III. These genes are usually repeated many times in the genome and are often clustered in tandem arrays. RNA polymerase III makes a variety of small stable RNAs, including the tRNAs and the small 5S rRNA of the ribosome. RNA polymerase I makes the large rRNA precursor molecule (45S rRNA) containing the major rRNAs. Except for the ribosomes in mitochondria and chloroplasts, all the cell's ribosomes are assembled in the nucleolus - a distinct intranuclear organelle that is formed around the tandemly arranged rRNA genes, which are brought together from several chromosomes.