Modular regulatory principles of large non–coding RNAs

Nature. Author manuscript; available in PMC 2014 Oct 14.

Published in final edited form as:

PMCID: PMC4197003

NIHMSID: NIHMS631736

Mitchell Guttman

1Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA

2Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

John L. Rinn

1Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA

3Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA

Abstract

It is clear that RNA has a diverse set of functions and is more than just a messenger between gene and protein. The mammalian genome is extensively transcribed, giving rise to thousands of non–coding transcripts. Whether all of these transcripts are functional is debated, but it is evident that there are many functional large non–coding RNAs (ncRNAs). Recent studies have begun to explore the functional diversity and mechanistic role of these large ncRNAs. Here we synthesize these studies to provide an emerging model whereby large ncRNAs might achieve regulatory specificity through modularity, assembling diverse combinations of proteins and possibly RNA and DNA interactions.

More than half a century after being placed as the central component in the flow of genetic information from gene to protein, it is now accepted that RNA can perform diverse roles. Shortly after the discovery of messenger RNA, a large class of heterogeneous nuclear RNAs (hnRNAs)1 was described, which did not include mRNA or associate with polyribosomes2. Following years of sifting through these hnRNAs, the first RNA subfamilies were identified. These included small nuclear RNAs involved in splicing regulation3 and small nucleolar RNAs involved in ribosome biogenesis4, as well as the ribosomal RNAs and transfer RNAs involved in translation5,6.

The world of RNA genes became even more complex with the discovery of RNAs that resembled mRNA in length and splicing structure but did not code for proteins. The first example was H19, which was identified as an RNA that was induced during liver development in the mouse7. The mouse H19 transcript contained no large open reading frames (ORFs), but instead only small sporadic ORFs that were not evolutionarily conserved, did not template translation in vivo and did not produce an identifiable protein product8. Shortly afterwards, another non-coding RNA (ncRNA), termed XIST, was found to be expressed exclusively from the inactive X chromosome9 and later demonstrated to be required for X inactivation in mammals10. Over the next two decades, more large ncRNA genes were discovered including Airn11, Tug1 (ref. 12), NRON13 and HOTAIR14. With the availability of a draft sequence of the human genome, it became clear that much of the mammalian genome is transcribed15–18. These transcripts were mapped to discrete loci throughout the genome. Over the next 10 years, both large and small RNA transcripts were discovered at an unprecedented rate15,17–20; however, the functional significance of most of these transcripts was unclear. Although some of these could be considered noise21,22, there are still many large ncRNAs that are known to have diverse functions23–29.

This Review focuses on the classic examples of large ncRNAs that have helped to form the basis of more recent global studies of coding potential, function and mechanism. We discuss the concepts that have emerged from these examples that provide a framework for understanding the principles of RNA interactions. We propose that by assembling distinct regulatory components, large ncRNAs could produce intricate functional specificity, which is suggestive of a possible modular RNA code.

RNA maps

After the sequencing of the human genome, the next major hurdle was to define the genes it encoded. To do this, several research groups developed tiling microarrays17,19,20 and complementary DNA sequencing methods15 to investigate transcriptional activity across the human genome, which led to the observation of widespread transcription of the genome. These studies, although limited to specific tissues and cell types, demonstrated that the mammalian genome encodes many thousands of non-coding transcripts including both short (<200 nucleotides in length) and long (>200 nucleotides in length) transcripts. In this Review, we focus on large ncRNAs produced from long transcripts, including those that originate from intergenic loci or overlapping protein-coding genes.

Dramatic innovations in sequencing technologies have allowed the deep sequencing of cDNAs, known as RNA-Seq30; this deep sequencing, coupled with new computational methods for assembling the transcriptome31, has identified non-coding transcripts across many different cell types and tissues31,32. It is now clear that there are thousands of well-expressed large ncRNAs with exquisite cell-type and tissue specificity31–33.

As the numbers of identified non-coding transcripts increased, so did the uncertainty regarding their function; this led some authors to express concern that many of these transcripts may be just transcriptional noise21,22 with no function or incidental by-products of transcription from enhancer regions34,35. These concerns are supported by the observations that many of these transcripts are expressed at extremely low levels32,36 and they have lower levels of evolutionary conservation than protein-coding genes25,31,37. Although some of these transcripts may indeed be transcriptional noise21, the remaining transcripts consist of many distinct subclasses, including processed small RNAs18,29,38, promoter-associated RNAs18,39, transcripts from enhancer regions34,35 and functional large ncRNAs14,23; each class varies in its expression and conservation properties31,37. Distinguishing between these classes of RNA transcripts requires additional biological information including the coding potential of the RNA and the chromatin modifications of the corresponding genomic region (Fig. 1a).

Figure 1. Layering of genomic regions

a, Genomic regions are colour-coded by the presence of different genomic annotations. RNA transcription of a locus (grey), K4–K36 chromatin signature (red), K4me1 modification and transcriptional activator p300 (green) and protein-coding potential (blue). By overlaying this information, distinct transcripts are revealed, including ncRNAs (red), protein-coding genes (purple) and transcripts from enhancer regions (green). b, A cross-species alignment of a coding and a non-coding gene. Boxes represent codons, and each row represents a different aligned species. Blue boxes represent mutations that cause a synonymous substitution, and red boxes represent mutations that cause a non-synonymous substitution. A score capturing the coding potential of a sequence across species aligns sequences in all frames and scores mutations that maintain coding potential (blue boxes) relative to mutations that break coding potential (that is, non-synonymous mutations, stop codons and frameshifting insertions or deletions) (red boxes). c, The coding potential score is shown for three gene types, SIRT1 (a protein-coding gene), XIST (ncRNA gene) and tarsal-less (small-peptide coding gene), in which positive scores represent coding regions (blue) and negative scores represent non-coding regions (red). In each example, the gene structure is shown, where blue boxes represent known protein-coding exons and red boxes represent non-coding exons. SIRT1 with an ORF length of 576 amino acids (aa) contains a positive score over each coding exon but not the non-coding regions. XIST with an ORF length of 172 amino acids contains negative scores over the entire transcribed region. tarsal-less with an ORF of 11 and 32 amino acids, contains positive scores over all known small peptides.

Chromatin signatures

Genomic DNA is wrapped around histone proteins and packaged into higher-order structures termed chromatin40. These histones can be modified in different ways that are indicative of the underlying DNA functional state. Advances in sequencing technologies have allowed the comprehensive characterization of the chromatin-modification landscape of mammalian genomes41–44. These studies revealed combinations of histone modifications (termed chromatin signatures) that correspond to various gene properties, including a signature for active transcription41,44. This signature consists of a short stretch of trimethylation of histone protein H3 at the lysine in position 4 (H3K4me3), which corresponds to promoter regions, followed by a longer stretch of trimethylation of histone H3 at the lysine in position 36 (H3K36me3), which covers the entire transcribed region41,44 (Fig. 1a).

Chromatin maps revealed that, similar to protein-coding genes, many ncRNA genes also contain a ‘K4–K36’ signature44. By searching for K4–K36 domains that do not overlap with known genes, chromatin signatures revealed approximately 1,600 regions in the mouse genome and approximately 2,500 regions in the human genome that were actively transcribed25,45. The vast majority of these intergenic K4–K36 domains produce multi-exonic RNAs that have little capability to encode a conserved protein25,31. RNAs expressed from these K4–K36 domains were termed large intergenic ncRNAs (lincRNAs) because identification by this chromatin signature required the RNAs to be contained within the intergenic regions25. Similarly, chromatin-state maps revealed that active enhancer regions contained short stretches of H3 lysine 4 monomethylation (H3K4me1) (ref. 43) and the transcriptional coactivator p300 (ref. 42), as well as additional modifications46 (Fig. 1a). By coupling RNA sequencing and chromatin maps, many of the already identified non-coding transcripts were observed to be transcribed from active enhancers34,35. However, lincRNAs and transcripts from enhancer regions are distinct classes, which are marked by different chromatin signatures25,34. Although it needs to be determined whether transcripts originating from enhancers have a function34,35, the functional importance of lincRNAs is becoming clearer14,23,24,26,28,47. Several of these lincRNAs have been shown to have enhancer-like functions as they activate the expression of neighbouring genes24,28.
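The search for intergenic K4–K36 domains described above can be sketched as simple interval logic. This is a minimal illustration only, not the pipeline used in the cited studies: the function names, the `max_gap` tolerance and the single-chromosome, strand-agnostic treatment are all assumptions made for the example.

```python
# Toy sketch (assumed names and thresholds): pair each H3K36me3 domain with an
# adjacent H3K4me3 peak and keep the combined span only if it is intergenic.
# Intervals are half-open (start, end) tuples on a single chromosome.

def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def gap(a, b):
    """Distance between two intervals; 0 if they touch or overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))

def intergenic_k4_k36(k4_peaks, k36_domains, known_genes, max_gap=500):
    """Return K4-K36 spans that do not overlap any known gene.

    max_gap is a hypothetical tolerance for how far the K4 peak may sit
    from the K36 domain and still count as one signature.
    """
    domains = []
    for k36 in k36_domains:
        for k4 in k4_peaks:
            if gap(k4, k36) <= max_gap:
                span = (min(k4[0], k36[0]), max(k4[1], k36[1]))
                if not any(overlaps(span, gene) for gene in known_genes):
                    domains.append(span)
                break  # use the first adjacent K4 peak only
    return domains

k4 = [(100, 200), (5000, 5100)]
k36 = [(200, 3000), (5100, 9000)]
genes = [(4800, 9500)]
candidates = intergenic_k4_k36(k4, k36, genes)
```

In this made-up input, the second K4–K36 pair falls inside an annotated gene and is discarded, leaving one candidate intergenic domain.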

Coding potential

Determining whether a transcript is non-coding is challenging because a long non-coding transcript is likely to contain an ORF purely by chance48. Accordingly, the evidence for the absence of coding potential for the XIST and H19 genes came from the lack of evolutionary conservation of the identified ORFs, the lack of homology to known protein domains and the inability to template significant protein production8,49. These principles have been generalized to classify coding potential across thousands of transcripts by scoring conserved ORFs across dozens of species50,51, by searching for homology in large protein-domain databases52, and by sequencing RNA associated with polyribosomes53.
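To make the point about chance ORFs concrete, the following minimal sketch scans a sequence for its longest ATG-initiated ORF and estimates how often a stretch of random codons would lack a stop. The function names are illustrative, only the forward strand is scanned, and the probability estimate assumes uniform, independent bases, which real transcripts do not satisfy.

```python
# Toy sketch: why long transcripts tend to contain ORFs by chance.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_codons(seq):
    """Length in codons (stop excluded) of the longest ATG-to-stop ORF
    across the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # codon index at which the currently open ORF began
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        for idx, codon in enumerate(codons):
            if start is None and codon == "ATG":
                start = idx
            elif start is not None and codon in STOP_CODONS:
                best = max(best, idx - start)
                start = None
    return best

def p_no_stop(n_codons):
    """Probability that n random codons contain no stop codon, assuming
    all 64 codons are equally likely (3 of them are stops)."""
    return (61 / 64) ** n_codons

orf_len = longest_orf_codons("CCATGAAATTTGGGTAACC")
```

Under the uniform-codon assumption a stop is expected roughly every 21 codons, so short ORFs are common in any transcript of several kilobases; this is why ORF length alone is weak evidence of coding potential.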

Computational methods such as the ‘codon substitution frequency’ algorithm50,51 leverage evolutionary information to determine whether an ORF is conserved across species and provide a general strategy for determining coding potential (Fig. 1b, c). Owing to the large number of available genome sequences, these methods have been used to accurately determine conserved coding potential in regions as small as 5 amino acids25, which makes them extremely sensitive to the potentially small peptides, such as the 11 amino acid peptide encoded by the tarsal-less gene54,55 (Fig. 1c). Despite their sensitivity, conservation-based methods may fail to detect newly evolved proteins because they do not contain a conserved ORF50,51. However, because many ncRNAs show clear evolutionary constraint25,31,37 but no evolutionarily conserved ORF, this indicates that the observed evolutionary selection is not due to a newly evolved protein.
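The intuition behind conservation-based scoring can be illustrated with a deliberately simplified score. This is not the codon substitution frequency algorithm itself, which uses substitution frequencies estimated from known coding and non-coding regions; here, as an assumption for illustration, each synonymous substitution simply adds one point and each non-synonymous or stop-introducing substitution subtracts one. Gapped codons are skipped, so frameshifts are not modelled.

```python
# Toy sketch (NOT the published CSF method): reward substitutions that keep
# the encoded amino acid, penalize those that change it or introduce a stop.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def toy_coding_score(ref_codons, other_species):
    """Sum +1 / -1 over aligned codons in each non-reference species."""
    score = 0
    for row in other_species:
        for ref, alt in zip(ref_codons, row):
            if "-" in ref or "-" in alt or ref == alt:
                continue  # gapped or identical codons carry no signal here
            ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
            if ref_aa == alt_aa and alt_aa != "*":
                score += 1  # synonymous substitution
            else:
                score -= 1  # non-synonymous substitution or stop codon
    return score

reference = ["ATG", "GCT", "AAA"]
coding_like = toy_coding_score(reference, [["ATG", "GCC", "AAG"],
                                           ["ATG", "GCA", "AAA"]])
noncoding_like = toy_coding_score(reference, [["ATG", "GAT", "TAA"]])
```

A positive total, as for the first alignment, reflects substitutions that preserve the protein, whereas the second alignment accumulates a negative score, in the spirit of the blue and red regions of Fig. 1c.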

Experimental methods, such as ribosome profiling, have provided a strategy for identifying ribosome occupancy on RNA, which has been proposed as a method for distinguishing between coding and non-coding transcripts53. However, this still needs to be tested because non-coding transcripts that show an association with the ribosome have not been shown to have a protein product53,56. Importantly, an association of RNA with a ribosome alone cannot be taken as evidence of protein-coding potential because both the H19 and TUG1 ncRNAs can be detected in the ribosome53,57 despite having clear roles as ncRNAs8,45,58,59.

An alternative explanation for these observed associations is ‘translational noise’, spurious association that may lead to non-functional translation products22. Consistent with this, virtually all of the transcripts that have been suggested to encode small peptides by ribosome profiling53 lack the evolutionary conservation of their proposed coding regions25,31, which is in striking contrast to almost all known protein-coding genes60, including the few well-characterized functional small peptides56,61,62 (Fig. 1c). Accordingly, identification of any new protein-coding gene requires the clear demonstration of the function of the protein product in vivo54,55.

Global identification of ncRNA function

Identifying the functional role of an ncRNA requires direct perturbation experiments, such as loss-of-function and gain-of-function. Individual ncRNAs involved in specific processes have been functionally characterized (see ref. 63 for a review). For example, XIST is crucial for random inactivation of the X chromosome10; Air is crucial for imprinting control at the Igf2r locus11; HOTAIR affects expression of the HOXD gene family14, as well as other genes throughout the genome45,64,65; HOTTIP affects expression of the HOXA gene family28; lincRNA-RoR affects reprogramming efficiency47; NRON affects NFAT transcription factor activity13; and Tug1 affects retina development through the regulation of the cell cycle12. Although there are now many examples of large ncRNAs that are required for the correct regulation of gene expression, this is just one of many functions in which they are involved, ranging from telomere replication66 to translation67.

The global characterization of ncRNA function has proved to be challenging because, in most cases, it is unclear which phenotype to investigate13. One approach to classifying the putative function of ncRNAs uses ‘guilt-by-association’25. This approach associates ncRNAs with biological processes based on a common expression pattern across cell types and tissues (Fig. 2a) and can therefore identify groups of ncRNAs that are associated with specific cellular processes (Fig. 2b). This approach has been used to predict roles for hundreds of ncRNAs in diverse biological processes such as stem cell pluripotency, immune responses, neural processes and cell-cycle regulation25,27,36.
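The guilt-by-association idea can be sketched as a correlation between an ncRNA's expression profile and the profiles of genes in a known pathway. The example below is a simplified stand-in: it averages Pearson correlations over a gene set, whereas the published analyses rely on more elaborate statistics. All gene names, pathway names and expression values are invented for illustration.

```python
# Toy sketch (assumed scoring scheme): associate an ncRNA with pathways by
# the mean Pearson correlation between its expression and each pathway gene.
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def association_scores(ncrna_expr, gene_expr, gene_sets):
    """Map each pathway name to the mean correlation with the ncRNA."""
    return {
        name: sum(pearson(ncrna_expr, gene_expr[g]) for g in genes) / len(genes)
        for name, genes in gene_sets.items()
    }

# Hypothetical expression across four tissues.
ncrna = [1.0, 2.0, 3.0, 4.0]
genes = {"geneA": [2.0, 4.0, 6.0, 8.0],
         "geneB": [1.0, 3.0, 5.0, 7.0],
         "geneC": [8.0, 6.0, 4.0, 2.0]}
pathways = {"pathway_up": ["geneA", "geneB"], "pathway_down": ["geneC"]}
scores = association_scores(ncrna, genes, pathways)
```

A strongly positive score corresponds to the positive association (red box) in Fig. 2a and a strongly negative score to the negative association (blue box); either one is a hypothesis for follow-up perturbation, not evidence of function.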

Figure 2. Classification of ncRNA function

a, Illustration of an ncRNA with expression patterns related to the NFκB pathway. Each row represents a gene, and a positive association (red box) is assigned between the ncRNA and the pathway based on the correlation of the genes in the process. Similarly, the ncRNA is assigned negative association (blue box) with the p53 pathway based on anticorrelation with the genes in the process. b, The scores for each functional term and ncRNA can be clustered to identify classes of ncRNAs. In this example (adapted, with permission, from ref. 25) each column represents a different ncRNA, and each row represents a different functional term. c, A model of ncRNAs that have a _cis_-function by remaining tethered to their site of transcription. In this model, RNA polymerase (green) transcribes an RNA (red), which can associate with regulatory proteins (purple) to affect neighbouring regions, as proposed for XIST9,71. d, One model for ncRNA _trans_-regulation. In this model an ncRNA can associate with DNA-binding proteins (blue) and regulatory proteins to localize and affect the expression of the targets, as proposed for HOTAIR64. e, A model for ncRNAs that bind regulatory proteins and change their activity, in this case leading to a change in modification state and expression of the target gene, as proposed for the CCND1 ncRNAs, which interact with the TLS protein89. f, A model for ncRNAs that act as ‘decoys’. In this model, ncRNAs bind protein complexes and prevent them from binding to their proper regulatory targets, as proposed for GAS5 and PANDA27.

Although these correlations cannot prove that ncRNAs have a function in these processes, they do provide a hypothesis for targeted loss-of-function experiments. For example, lincRNA-p21 was predicted to be associated with the p53-mediated DNA damage response25, and indeed lincRNA-p21 was found to be a target of p53 and on perturbation was shown to regulate apoptosis in response to DNA damage26. In the same way, the ncRNA PANDA (p21 associated ncRNA DNA damage activated) was implicated, and was demonstrated to have a function, in the regulation of apoptosis27. Another ncRNA, lincEnc1 (ref. 25), was predicted to have a role in cell-cycle regulation in embryonic stem (ES) cells and has been shown in a separate study to affect the proliferation of ES cells68.

Alternatively, global approaches can be used to determine function, such as systematic RNA interference (RNAi) knockdown followed by gene-expression profiling. Unlike correlation analysis, these perturbation-based experiments provide evidence for the function of an ncRNA23. Methods to classify function using this approach are conceptually similar to guilt-by-association because the function can be inferred on the basis of the genes that are affected by loss of function of ncRNAs23. A systematic perturbation study demonstrated that knockdown of the vast majority of lincRNAs expressed in ES cells had a major effect on gene expression23. The gene-expression signatures revealed dozens of lincRNAs that block key lineage-commitment programs within ES cells and function in crucial ES cell regulatory and signalling pathways. Importantly, this study also identified 26 lincRNAs that are required to maintain the pluripotent state23.

Not all non-coding transcripts are functional RNA molecules. Several examples of intergenic transcription have been identified in which the process of transcription alone changes the chromatin- and transcription-factor-binding landscape to allow activation and repression of neighbouring genes69,70. Methods that degrade RNA after its transcription, such as RNAi, can distinguish between a functional RNA molecule and the process of transcription, on which there should be no observable effect after RNA degradation. Collectively, the genome-wide guilt-by-association approach and targeted and global perturbation studies have demonstrated that large ncRNAs have a crucial regulatory role in diverse biological processes23,25–27,32,47.

_cis_- versus _trans_-regulatory mechanisms

The discovery that the XIST product was an ncRNA led immediately to the suggestion of a model for how it could function in an allele-specific manner9. In theory, an ncRNA has an intrinsic _cis_-regulatory capacity because it can function while remaining tethered to its own locus9,71 (Fig. 2c), whereas an mRNA must be dissociated, exported and translated for it to function. Here we define a _cis_-regulator as one that exerts its function on a neighbouring gene on the same allele from which it is transcribed, and define a _trans_-regulator as one that does not meet this criterion. Owing to the unique _cis_-regulatory capability of ncRNAs, it has been speculated that _cis_-regulation could be a common mechanism for large ncRNAs24,71. However, global functional evidence strongly suggests that this is not the case (Box 1).

Box 1

Distinguishing _cis_- from _trans_-regulation

If an ncRNA is a _cis_-regulator, then several observations will be true: (i) the gene-expression levels of a neighbouring gene will be correlated with the RNA expression across all conditions; (ii) loss-of-function of the RNA would affect expression of a neighbouring gene, and (iii) the ncRNA would affect expression of a neighbouring gene on the same allele that it is expressed from. The absence of any of these criteria supports _trans_-regulation. We illustrate this point using five common regulatory models. The figure shows what would be observed using specific computational and experimental methods for each regulatory model. The boxes with a tick indicate observed effects on neighbouring genes for each method, and boxes with a cross indicate no observed effect on neighbouring genes. Known ncRNA examples of each of these regulatory models are shown to the right of the figure.
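The three criteria can be written as a small decision rule. This is only a sketch of the logic stated in the box, with hypothetical argument names; `None` is used here, as an assumption, to mark a criterion that has not yet been tested, such as the allele-specific experiment.

```python
# Toy sketch of the Box 1 logic: any failed criterion supports trans-regulation;
# cis-regulation is only called when all three criteria are met.

def classify_regulation(correlated_with_neighbour,
                        neighbour_changes_on_knockdown,
                        effect_on_same_allele=None):
    """Return 'cis', 'trans' or 'undetermined' from three observations.

    Each argument is True (observed), False (tested, not observed) or
    None (not yet tested).
    """
    criteria = (correlated_with_neighbour,
                neighbour_changes_on_knockdown,
                effect_on_same_allele)
    if any(c is False for c in criteria):
        return "trans"
    if all(c is True for c in criteria):
        return "cis"
    return "undetermined"

call_all_met = classify_regulation(True, True, True)
call_no_knockdown_effect = classify_regulation(True, False)
call_allele_untested = classify_regulation(True, True)
```

The third call shows why correlation and knockdown data alone are insufficient: without the allele-specific test, the classification remains undetermined.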


To distinguish _cis_- from _trans_-regulatory models, initial studies have used correlation analysis and identified a significant correlation of expression between ncRNAs and their neighbouring protein-coding genes21,72. However, several of these cases have been demonstrated to be _trans_-regulatory models, and the apparent correlations are due to shared upstream regulation (such as lincRNA-p21 (ref. 26) and lincRNA-Sox2 (ref. 25)), positional correlation (such as HOTAIR14), transcriptional ‘ripple effects’21 and indirect regulation of neighbouring genes (Box 1). Consistent with these explanations, a recent study showed that an increased correlation of expression between ncRNAs and their neighbouring genes is comparable to that observed for protein-coding genes32.

Recently, loss-of-function experiments have been used to investigate _cis_- versus _trans_-effects of lincRNAs. One study knocked down seven lincRNAs and identified no effects on neighbouring genes but did show an effect on other genes45. A second study knocked down 12 lincRNAs, 7 of which had modest effects on some of the genes within a wide genomic neighbourhood24. More recently, a systematic study knocked down approximately 150 lincRNAs and identified no effect on the neighbouring genes for about 95% of the lincRNAs, which is similar to that observed for protein-coding genes23.

Although perturbation experiments can demonstrate that an RNA functions as a _trans_-regulator, evidence for RNA acting as a _cis_-regulator is more difficult to obtain (Box 1). For example, perturbation experiments demonstrated that the ncRNA from JPX affects the expression of the neighbouring XIST gene, but as a _trans_-regulator73. Conclusive proof of _cis_-regulation requires the demonstration that an RNA regulates a neighbouring gene on the same allele (Box 1). So far, few studies have performed this test, and it is unclear what percentage of ncRNAs that are suggested to have a _cis_-function by loss-of-function experiments24,28 will pass this test. Together, these studies indicate that although some ncRNAs are _cis_-regulators9,11,74–76, the vast majority, which have been identified and characterized so far, function as _trans_-regulators14,23,26,45,73,77.

Formation of RNA–protein interactions

The precise mechanism by which ncRNAs function remains poorly understood. However, one emerging theme is the interaction between ncRNAs and protein complexes. The functional importance of many ncRNA–protein interactions for correct transcriptional regulation has been demonstrated14,23,45,78–81, including several ncRNAs that are required for the correct localization of chromatin proteins to genomic DNA targets79–83.

The XIST ncRNA is a key example demonstrating that RNA can play a direct role in silencing large genomic regions81 by physically interacting with the polycomb complex84, leading to the condensation of chromatin and transcriptional repression of an entire X chromosome85 (Fig. 2c). Similar to XIST, many ncRNAs have been identified that physically associate with chromatin-regulatory complexes and ‘guide’ the associated complexes to specific genomic DNA regions, including HOTAIR14, AIR86, KCNQ1ot1 (ref. 75) and lincRNA-p21 (ref. 26) (Fig. 2d).

Biochemical evidence has demonstrated that many large ncRNAs interact with chromatin regulators23,45,87,88. The precise numbers vary depending on the experimental approach45,87, but a conservative estimate suggests that at least 30% of lincRNAs associate with at least 1 of 12 distinct chromatin-regulatory complexes, which include readers, writers and erasers of chromatin modifications23.

Importantly, lincRNAs can provide regulatory specificity to these complexes because the knockdown of these lincRNAs affects a subset of the genes that are normally regulated by these complexes23,45. One hypothesis is that ncRNAs provide regulatory specificity by localizing chromatin-regulatory complexes to genomic DNA targets14,26,28,45,78,86. Several methods have been developed to generate maps of RNA–DNA proximity82,83, but it still needs to be determined what percentage of ncRNAs localize to genomic DNA regions and how these interactions occur.

In addition to their role in chromatin regulation, ncRNAs can also modulate the regulatory activity of protein complexes (Fig. 2e). As an example, an ncRNA upstream of cyclin D1 can bind to the TLS (translocation in liposarcoma) RNA-binding protein, which changes it from an inactive to an active state89. Similarly, the NRON ncRNA can bind to the NFAT (nuclear factor of activated T cells)-transcription factor rendering it inactive because it prevents nuclear accumulation13. ncRNAs can also function as molecular ‘decoys’ by preventing correct regulation through competitive binding (Fig. 2f). For example, the GAS5 ncRNA binds to the glucocorticoid receptor and prevents the receptor from binding to its correct regulatory elements90, and the PANDA ncRNA can prevent NF-Y localization, which leads to apoptosis27. Similarly, several studies have shown that ncRNAs can function as decoys to other RNA species, such as miRNAs, to control miRNA levels91,92.

Large ncRNAs as molecular scaffolds of proteins

One emerging theme common to many large ncRNAs is the formation of multiple distinct RNA–protein interactions that are used to carry out their function (Fig. 3). The first indication of this phenomenon came from the discovery of telomerase93. Telomerase activity requires a telomerase RNA component (TERC)94, which serves as a template for telomeric regulation and as a molecular scaffold for the polymerase enzyme around the RNA95 (Fig. 3b). Importantly, genetic studies demonstrated that TERC plays a modular functional role, as genetically swapping particular domains of TERC retained the overall function66. This indicated that TERC was made up of discrete functional modules to bring multiple proteins into the proximity of a protein66.

Figure 3. Modular principles of large ncRNA genes

a, The four principles of nucleic acid and protein interactions. (1) RNA–protein interactions, (2) DNA–RNA hybridization-based interactions, (3) DNA–protein interactions and (4) RNA–RNA hybridization-based interactions. b, Each of these principles can be combined to build distinct complexes. For example, combining RNA–protein and RNA–DNA interactions can localize a protein complex to a specific DNA sequence in an RNA-dependent manner, as has been implicated for the DHFR99 promoter and localization of DNMT3b98. Combining RNA–protein and protein–DNA principles can also localize a diverse set of proteins, which have a molecular scaffold created by RNA, to a specific DNA sequence in a protein-dependent manner. The ribosome is a multifaceted combination of RNA–protein interactions that facilitate correct RNA–RNA interactions for the ribozyme activity of the ribosome. The telomere replication activity of telomerase is an example of combining RNA–protein, RNA–DNA and protein–DNA interactions.

More recently, HOTAIR was shown to contain distinct protein-interaction domains that can associate with polycomb repressive complex 2 (PRC2) (ref. 14) and the CoREST–LSD1 complex64, which together are required for correct function (Fig. 3b). XIST also has discrete functional domains. Through a series of genetic deletions XIST was shown to contain at least two discrete domains that are responsible for silencing (RepA) and localization (RepC)81 (Fig. 3b). These functional domains could be independently deleted without affecting the role of the other domain, which suggests the modular nature of the XIST ncRNA81. These functional domains of XIST also interact with discrete proteins; the silencing domain (RepA) binds to PRC2 and the localization domain (RepC) binds to YY1 (ref. 96) and hnRNPU97. These examples show that large ncRNAs can function as molecular scaffolds of protein complexes. Importantly, this phenomenon is likely to be a general one because approximately 30% of ES cell lincRNAs associate with multiple regulatory complexes23.

In addition to interacting with multiple proteins, ncRNAs have been shown in several examples to interact directly with both DNA and RNA. For example, ncRNAs can form triplex structures with DNA98,99 (Fig. 3a), such as an ncRNA that binds to the ribosomal DNA promoter and interacts with the DNMT3b protein to silence expression98. Furthermore, RNA can form traditional duplex base-pairing interactions with DNA, a property that has long been speculated for large ncRNAs71. Finally, RNA can form base-pair interactions with RNA (Fig. 3a), which are crucial for processes such as tRNA–mRNA anticodon recognition5, ribonuclease P recognition of pre-tRNAs5, miRNA targeting100, ribosome structure as a ribozyme67 and splicing regulation6. Despite these examples, the interactions between large ncRNAs, genomic DNA and other RNAs are not well characterized.

A potential modular RNA code

Collectively, the studies reviewed here suggest an intriguing hypothesis: large ncRNAs are flexible modular scaffolds23,64,66,81. In this model, RNA contains discrete domains that interact with specific protein complexes. These RNAs, through a combination of domains, bring specific regulatory components into proximity with each other, which results in the formation of a unique functional complex. These RNA regulatory complexes can include interactions with proteins but can also extend to RNA–DNA and RNA–RNA regulatory interactions.

RNA is well-suited for this role because it is a malleable evolutionary substrate compared with a protein, allowing for the selection of discrete interaction domains5. Specifically, RNA can be easily mutated, tested and selected without breaking its core functionality5. This model of modular interactions can explain the observation that there are highly conserved ‘patches’ within large ncRNA genes25,31,37 that could have evolved for specific protein interactions26,81,84. The remaining regions may be more evolutionarily flexible, allowing the formation of new functional domains by random mutation and selection. This is consistent with the observation that non-constrained regions of telomerase are dispensable66.

The model of RNA as a modular scaffold is not limited to protein interactions. RNA can also base-pair with DNA, which might be used to guide complexes to specific DNA sequences. Alternatively, RNAs might guide complexes by bridging together sets of DNA-binding proteins. Such a model could explain how the same protein complexes are guided to different DNA loci in distinct cell types.

Large ncRNAs can also form RNA–RNA interactions, raising intriguing possibilities for future investigations. For example, two large RNA molecular scaffolds might be linked through RNA–RNA interactions. Another possibility is that RNA–RNA interactions could result in unique RNA structures, not attainable by the individual units, that can interact with protein complexes. This has been observed in the ribosome, where the combination of RNA–RNA and RNA–protein interactions is required for correct complex formation.

Outlook

We are only beginning to understand the mechanisms by which large ncRNAs carry out their regulatory functions. A modular RNA regulatory code is an attractive hypothesis but remains to be tested; in particular, the way in which large ncRNAs and proteins interact, and the underlying molecular principles, are still unknown. Understanding these principles will require the identification of the sites of the RNA–protein interactions and the exact RNA-binding proteins in vivo. Furthermore, the way in which large ncRNAs localize to their target genes is unknown but could involve direct RNA–DNA interactions (Fig. 3a) or interactions with proteins that contain DNA recognition elements, as has been suggested for XIST96 and HOTAIR64. To gain insight into these processes, it will be important to catalogue the interactions that ncRNAs form with genomic DNA and RNAs. These data will help elucidate the rules that guide these interactions as well as the functional implications of these associations, which can then be tested experimentally.

If large ncRNAs are truly modular, then each individual domain would have a unique function that is independent of other domains. Demonstrating modularity will require the genetic deletion of domains and spacer regions, as well as domain-swapping experiments. Learning these principles would result in a defined ‘modular RNA code’ for how RNAs can affect cell states. By truly understanding this modular RNA code, it may be possible to create synthetically engineered RNAs that could interact with both nucleic acids and protein modules to carry out engineered regulatory roles. However, at present, it is premature to dismiss the possibility that large ncRNAs have other mechanisms of action that may not fit neatly into this modular RNA code. In the meantime, it is clear that mammalian genomes encode a diverse set of important large ncRNAs.

Acknowledgments

We thank M. Cabili, J. Engreitz, M. Garber, P. McDonel and A. Pauli for their reading and suggestions; T. Cech for comments and suggestions; E. Lander for helpful discussions and ideas; and S. Knemeyer and L. Gaffney for assistance with figures in this Review.

Footnotes

Author Information The authors declare no competing financial interests.

References

1. Warner JR, Soeiro R, Birnboim HC, Girard M, Darnell JE. Rapidly labeled HeLa cell nuclear RNA. I. Identification by zone sedimentation of a heterogeneous fraction separate from ribosomal precursor RNA. J Mol Biol. 1966;19:349–361. [PubMed] [Google Scholar]

2. Salditt-Georgieff M, Harpold MM, Wilson MC, Darnell JE., Jr Large heterogeneous nuclear ribonucleic acid has three times as many 5′ caps as polyadenylic acid segments, and most caps do not enter polyribosomes. Mol Cell Biol. 1981;1:179–187. This paper demonstrates an abundant class of RNA species that do not enter polyribosomes. [PMC free article] [PubMed] [Google Scholar]

3. Weinberg RA, Penman S. Small molecular weight monodisperse nuclear RNA. J Mol Biol. 1968;38:289–304. [PubMed] [Google Scholar]

4. Zieve G, Penman S. Small RNA species of the HeLa cell: metabolism and subcellular localization. Cell. 1976;8:19–31. [PubMed] [Google Scholar]

5. Gesteland RF, Cech T, Atkins JF. The RNA World: The Nature of Modern RNA Suggests a Prebiotic RNA World. 3rd. Cold Spring Harbor Laboratory Press; 2006. [Google Scholar]

6. Eddy SR. Non-coding RNA genes and the modern RNA world. Nature Rev Genet. 2001;2:919–929. [PubMed] [Google Scholar]

7. Pachnis V, Brannan CI, Tilghman SM. The structure and expression of a novel gene activated in early mouse embryogenesis. EMBO J. 1988;7:673–681. [PMC free article] [PubMed] [Google Scholar]

8. Brannan CI, Dees EC, Ingram RS, Tilghman SM. The product of the H19 gene may function as an RNA. Mol Cell Biol. 1990;10:28–36. This paper was the first report of a large ncRNA showing that the H19 transcript lacked conserved ORFs and did not make a protein product in vivo. [PMC free article] [PubMed] [Google Scholar]

9. Brown CJ, et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature. 1991;349:38–44. [PubMed] [Google Scholar]

10. Penny GD, Kay GF, Sheardown SA, Rastan S, Brockdorff N. Requirement for Xist in X chromosome inactivation. Nature. 1996;379:131–137. [PubMed] [Google Scholar]

11. Sleutels F, Zwart R, Barlow DP. The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature. 2002;415:810–813. [PubMed] [Google Scholar]

12. Young TL, Matsuda T, Cepko CL. The noncoding RNA taurine upregulated gene 1 is required for differentiation of the murine retina. Curr Biol. 2005;15:501–512. [PubMed] [Google Scholar]

13. Willingham AT, et al. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science. 2005;309:1570–1573. [PubMed] [Google Scholar]

14. Rinn JL, et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell. 2007;129:1311–1323. [PMC free article] [PubMed] [Google Scholar]

15. Carninci P, et al. The transcriptional landscape of the mammalian genome. Science. 2005;309:1559–1563. This paper describes the large-scale cDNA sequencing efforts in the mouse genome and reveals many thousands of non-coding transcripts. [PubMed] [Google Scholar]

16. Birney E, et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816. [PMC free article] [PubMed] [Google Scholar]

17. Bertone P, et al. Global identification of human transcribed sequences with genome tiling arrays. Science. 2004;306:2242–2246. [PubMed] [Google Scholar]

18. Kapranov P, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007;316:1484–1488. [PubMed] [Google Scholar]

19. Rinn JL, et al. The transcriptional activity of human Chromosome 22. Genes Dev. 2003;17:529–540. [PMC free article] [PubMed] [Google Scholar]

20. Kapranov P, et al. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002;296:916–919. [PubMed] [Google Scholar]

21. Ebisuya M, Yamamoto T, Nakajima M, Nishida E. Ripples from neighbouring transcription. Nature Cell Biol. 2008;10:1106–1113. [PubMed] [Google Scholar]

22. Struhl K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Struct Mol Biol. 2007;14:103–105. [PubMed] [Google Scholar]

23. Guttman M, et al. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature. 2011;477:295–300. [PMC free article] [PubMed] [Google Scholar]

24. Orom UA, et al. Long noncoding RNAs with enhancer-like function in human cells. Cell. 2010;143:46–58. [PMC free article] [PubMed] [Google Scholar]

25. Guttman M, et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009;458:223–227. This paper applied a chromatin signature to identify lincRNAs and used a guilt-by-association approach to classify their likely functions in diverse biological processes. [PMC free article] [PubMed] [Google Scholar]

26. Huarte M, et al. A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell. 2010;142:409–419. [PMC free article] [PubMed] [Google Scholar]

27. Hung T, et al. Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nature Genet. 2011;43:621–629. [PMC free article] [PubMed] [Google Scholar]

28. Wang KC, et al. A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature. 2011;472:120–124. [PMC free article] [PubMed] [Google Scholar]

29. Wilusz JE, Freier SM, Spector DL. 3′ end processing of a long nuclear-retained noncoding RNA yields a tRNA-like cytoplasmic RNA. Cell. 2008;135:919–932. [PMC free article] [PubMed] [Google Scholar]

30. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008;5:621–628. [PubMed] [Google Scholar]

31. Guttman M, et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnol. 2010;28:503–510. [PMC free article] [PubMed] [Google Scholar]

32. Cabili MN, et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011;25:1915–1927. [PMC free article] [PubMed] [Google Scholar]

33. Mercer TR, Dinger ME, Sunkin SM, Mehler MF, Mattick JS. Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci USA. 2008;105:716–721. [PMC free article] [PubMed] [Google Scholar]

34. De Santa F, et al. A large fraction of extragenic RNA Pol II transcription sites overlap enhancers. PLoS Biol. 2010;8:e1000384. [PMC free article] [PubMed] [Google Scholar]

35. Kim TK, et al. Widespread transcription at neuronal activity-regulated enhancers. Nature. 2010;465:182–187. [PMC free article] [PubMed] [Google Scholar]

36. Ravasi T, et al. Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res. 2006;16:11–19. [PMC free article] [PubMed] [Google Scholar]

37. Ponjavic J, Ponting CP, Lunter G. Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res. 2007;17:556–565. [PMC free article] [PubMed] [Google Scholar]

38. Taft RJ, et al. Tiny RNAs associated with transcription start sites in animals. Nature Genet. 2009;41:572–578. [PubMed] [Google Scholar]

40. Kouzarides T. Chromatin modifications and their function. Cell. 2007;128:693–705. [PubMed] [Google Scholar]

41. Barski A, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–837. [PubMed] [Google Scholar]

42. Visel A, et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature. 2009;457:854–858. [PMC free article] [PubMed] [Google Scholar]

43. Heintzman ND, et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009;459:108–112. [PMC free article] [PubMed] [Google Scholar]

44. Mikkelsen TS, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007;448:553–560. [PMC free article] [PubMed] [Google Scholar]

45. Khalil AM, et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci USA. 2009;106:11667–11672. [PMC free article] [PubMed] [Google Scholar]

46. Ernst J, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49. [PMC free article] [PubMed] [Google Scholar]

47. Loewer S, et al. Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nature Genet. 2010;42:1113–1117. [PMC free article] [PubMed] [Google Scholar]

48. Dinger ME, Pang KC, Mercer TR, Mattick JS. Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput Biol. 2008;4:e1000176. [PMC free article] [PubMed] [Google Scholar]

49. Brockdorff N, et al. The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell. 1992;71:515–526. [PubMed] [Google Scholar]

50. Lin MF, Deoras AN, Rasmussen MD, Kellis M. Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes. PLoS Comput Biol. 2008;4:e1000067. [PMC free article] [PubMed] [Google Scholar]

51. Lin MF, Jungreis I, Kellis M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics. 2011;27:i275–i282. [PMC free article] [PubMed] [Google Scholar]

53. Ingolia NT, Lareau LF, Weissman JS. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011;147:789–802. [PMC free article] [PubMed] [Google Scholar]

54. Galindo MI, Pueyo JI, Fouix S, Bishop SA, Couso JP. Peptides encoded by short ORFs control development and define a new eukaryotic gene family. PLoS Biol. 2007;5:e106. This paper demonstrates the existence of functional small peptides within a presumed ‘non-coding’ transcript through ORF conservation, in vivo protein identification and functional analysis. [PMC free article] [PubMed] [Google Scholar]

55. Kondo T, et al. Small peptides switch the transcriptional activity of Shavenbaby during Drosophila embryogenesis. Science. 2010;329:336–339. [PubMed] [Google Scholar]

56. Jiao Y, Meyerowitz EM. Cell-type specific analysis of translating RNAs in developing flowers reveals new levels of control. Mol Syst Biol. 2010;6:419. [PMC free article] [PubMed] [Google Scholar]

57. Li YM, et al. The H19 transcript is associated with polysomes and may regulate IGF2 expression in trans. J Biol Chem. 1998;273:28247–28252. [PubMed] [Google Scholar]

58. Cai X, Cullen BR. The imprinted H19 noncoding RNA is a primary microRNA precursor. RNA. 2007;13:313–316. [PMC free article] [PubMed] [Google Scholar]

59. Yang L, et al. ncRNA- and Pc2 methylation-dependent gene relocation between nuclear structures mediates gene activation programs. Cell. 2011;147:773–788. [PMC free article] [PubMed] [Google Scholar]

60. Clamp M, et al. Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad Sci USA. 2007;104:19428–19433. [PMC free article] [PubMed] [Google Scholar]

61. Kastenmayer JP, et al. Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res. 2006;16:365–373. [PMC free article] [PubMed] [Google Scholar]

62. Hanada K, Zhang X, Borevitz JO, Li WH, Shiu SH. A large number of novel coding small open reading frames in the intergenic regions of the Arabidopsis thaliana genome are transcribed and/or under purifying selection. Genome Res. 2007;17:632–640. [PMC free article] [PubMed] [Google Scholar]

64. Tsai MC, et al. Long noncoding RNA as modular scaffold of histone modification complexes. Science. 2010;329:689–693. This paper identified multiple protein-interaction domains within HOTAIR that together allowed it to carry out its function, which demonstrated that a large ncRNA can act as a molecular scaffold. [PMC free article] [PubMed] [Google Scholar]

65. Gupta RA, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464:1071–1076. [PMC free article] [PubMed] [Google Scholar]

66. Zappulla DC, Cech TR. Yeast telomerase RNA: a flexible scaffold for protein subunits. Proc Natl Acad Sci USA. 2004;101:10024–10029. This paper demonstrated that telomerase RNA can bridge proteins by showing that protein interaction domains can be swapped and spacer regions deleted with minimal impact on the function of the RNA. [PMC free article] [PubMed] [Google Scholar]

67. Korostelev A, Noller HF. The ribosome in focus: new structures bring new insights. Trends Biochem Sci. 2007;32:434–441. [PubMed] [Google Scholar]

68. Ivanova N, et al. Dissecting self-renewal in stem cells with RNA interference. Nature. 2006;442:533–538. [PubMed] [Google Scholar]

69. Martens JA, Laprade L, Winston F. Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature. 2004;429:571–574. [PubMed] [Google Scholar]

70. Schmitt S, Prestel M, Paro R. Intergenic transcription through a Polycomb group response element counteracts silencing. Genes Dev. 2005;19:697–708. [PMC free article] [PubMed] [Google Scholar]

71. Lee JT. Lessons from X-chromosome inactivation: long ncRNA as guides and tethers to the epigenome. Genes Dev. 2009;23:1831–1842. [PMC free article] [PubMed] [Google Scholar]

72. Ponjavic J, Oliver PL, Lunter G, Ponting CP. Genomic and transcriptional co-localization of protein-coding and long non-coding RNA pairs in the developing brain. PLoS Genet. 2009;5:e1000617. [PMC free article] [PubMed] [Google Scholar]

73. Tian D, Sun S, Lee JT. The long noncoding RNA, Jpx, is a molecular switch for X chromosome inactivation. Cell. 2010;143:390–403. [PMC free article] [PubMed] [Google Scholar]

74. Koerner MV, Pauler FM, Huang R, Barlow DP. The function of non-coding RNAs in genomic imprinting. Development. 2009;136:1771–1783. [PMC free article] [PubMed] [Google Scholar]

75. Pandey RR, et al. Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol Cell. 2008;32:232–246. [PubMed] [Google Scholar]

76. Bertani S, Sauer S, Bolotin E, Sauer F. The noncoding RNA Mistral activates Hoxa6 and Hoxa7 expression and stem cell differentiation by recruiting MLL1 to chromatin. Mol Cell. 2011;43:1040–1046. [PMC free article] [PubMed] [Google Scholar] Retracted

77. Feng J, et al. The Evf-2 noncoding RNA is transcribed from the Dlx-5/6 ultraconserved region and functions as a Dlx-2 transcriptional coactivator. Genes Dev. 2006;20:1470–1484. [PMC free article] [PubMed] [Google Scholar]

78. Koziol MJ, Rinn JL. RNA traffic control of chromatin complexes. Curr Opin Genet Dev. 2010;20:142–148. [PMC free article] [PubMed] [Google Scholar]

79. Maison C, et al. Higher-order structure in pericentric heterochromatin involves a distinct pattern of histone modification and an RNA component. Nature Genet. 2002;30:329–334. [PubMed] [Google Scholar]

80. Bernstein E, et al. Mouse polycomb proteins bind differentially to methylated histone H3 and RNA and are enriched in facultative heterochromatin. Mol Cell Biol. 2006;26:2560–2569. [PMC free article] [PubMed] [Google Scholar]

81. Wutz A, Rasmussen TP, Jaenisch R. Chromosomal silencing and localization are mediated by different domains of Xist RNA. Nature Genet. 2002;30:167–174. This paper reported the generation of deletion mutants across the Xist locus and identified the discrete domains responsible for the silencing and localization roles of the RNA. [PubMed] [Google Scholar]

82. Chu C, Qu K, Zhong FL, Artandi SE, Chang HY. Genomic maps of long noncoding RNA occupancy reveal principles of RNA–chromatin interactions. Mol Cell. 2011;44:667–678. [PMC free article] [PubMed] [Google Scholar]

83. Simon MD, et al. The genomic binding-sites of a non-coding RNA. Proc Natl Acad Sci USA. 2011;108:20497–20502. [PMC free article] [PubMed] [Google Scholar]

84. Zhao J, Sun BK, Erwin JA, Song JJ, Lee JT. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science. 2008;322:750–756. [PMC free article] [PubMed] [Google Scholar]

85. Plath K, Mlynarczyk-Evans S, Nusinow DA, Panning B. Xist RNA and the mechanism of X chromosome inactivation. Annu Rev Genet. 2002;36:233–278. [PubMed] [Google Scholar]

86. Nagano T, et al. The Air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science. 2008;322:1717–1720. [PubMed] [Google Scholar]

87. Zhao J, et al. Genome-wide identification of Polycomb-associated RNAs by RIP-seq. Mol Cell. 2010;40:939–953. [PMC free article] [PubMed] [Google Scholar]

88. Kaneko S, et al. Phosphorylation of the PRC2 component Ezh2 is cell cycle-regulated and up-regulates its binding to ncRNA. Genes Dev. 2010;24:2615–2620. [PMC free article] [PubMed] [Google Scholar]

89. Wang X, et al. Induced ncRNAs allosterically modify RNA-binding proteins in cis to inhibit transcription. Nature. 2008;454:126–130. [PMC free article] [PubMed] [Google Scholar]

90. Kino T, Hurt DE, Ichijo T, Nader N, Chrousos GP. Noncoding RNA Gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci Signal. 2010;3:ra8. [PMC free article] [PubMed] [Google Scholar]

91. Salmena L, Poliseno L, Tay Y, Kats L, Pandolfi PP. A ceRNA hypothesis: the Rosetta stone of a hidden RNA language? Cell. 2011;146:353–358. [PMC free article] [PubMed] [Google Scholar]

92. Cesana M, et al. A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell. 2011;147:358–369. [PMC free article] [PubMed] [Google Scholar]

93. Greider CW, Blackburn EH. Identification of a specific telomere terminal transferase activity in Tetrahymena extracts. Cell. 1985;43:405–413. [PubMed] [Google Scholar]

94. Feng J, et al. The RNA component of human telomerase. Science. 1995;269:1236–1241. [PubMed] [Google Scholar]

95. Lingner J, et al. Reverse transcriptase motifs in the catalytic subunit of telomerase. Science. 1997;276:561–567. [PubMed] [Google Scholar]

96. Jeon Y, Lee JT. YY1 tethers Xist RNA to the inactive X nucleation center. Cell. 2011;146:119–133. [PMC free article] [PubMed] [Google Scholar]

97. Hasegawa Y, Brockdorff N, Kawano S, Tsutui K, Nakagawa S. The matrix protein hnRNP U is required for chromosomal localization of Xist RNA. Dev Cell. 2010;19:469–476. [PubMed] [Google Scholar]

98. Schmitz KM, Mayer C, Postepska A, Grummt I. Interaction of noncoding RNA with the rDNA promoter mediates recruitment of DNMT3b and silencing of rRNA genes. Genes Dev. 2010;24:2264–2269. [PMC free article] [PubMed] [Google Scholar]

99. Martianov I, Ramadass A, Serra Barros A, Chow N, Akoulitchev A. Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature. 2007;445:666–670. [PubMed] [Google Scholar]