Long noncoding RNA in genome regulation: Prospects and mechanisms

Abstract

Long noncoding RNAs (lncRNAs) are pervasively transcribed and critical regulators of the epigenome.1,2 These long, polyadenylated RNAs do not code for proteins, but function directly as RNAs, recruiting chromatin modifiers to mediate transcriptional changes in processes ranging from X-inactivation (XIST) to imprinting (H19).3 The recent discovery that lncRNA HOTAIR can link chromatin changes to cancer metastasis4 furthers the relevance of lncRNAs to human disease. Here, we discuss lncRNAs as regulatory modules and explore the implications for disease pathogenesis.

Key words: lncRNA, HOTAIR, chromatin, metastasis, PRC2


Although large-scale analyses of mammalian transcriptomes have revealed that more than 50% of transcripts have no protein-coding potential,2,5,6 the functions of these putative transcripts are largely unknown. A subset of these noncoding transcripts is termed long noncoding RNAs (lncRNAs), based on an arbitrary minimum length of 200 nucleotides. LncRNAs are roughly classified by their position relative to protein-coding genes: intergenic (between genes), intragenic/intronic (within genes) and antisense.2 Initial efforts to characterize these molecules demonstrated that they function in cis, regulating their immediate genomic neighbors. Examples include AIR, XIST and Kcnq1ot1 (reviewed in refs. 1, 7 and 8), which recruit chromatin-modifying complexes to silence adjacent sites. The scope of lncRNAs in gene regulation broadened with the finding that the lncRNA HOTAIR exhibits trans regulatory capacity.
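The positional classification described above can be illustrated with a minimal Python sketch. It is not a method from the cited studies: the `Feature` type, the `classify_noncoding` function and the use of genomic span as a stand-in for transcript length are all assumptions made for illustration, and real annotation pipelines also account for exon structure and splicing.

```python
from dataclasses import dataclass
from typing import Sequence

MIN_LNCRNA_LENGTH = 200  # nucleotides; the arbitrary cutoff mentioned in the text


@dataclass(frozen=True)
class Feature:
    """Hypothetical genomic interval (0-based start, end-exclusive)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def length(self) -> int:
        # Assumption: genomic span approximates transcript length (no splicing).
        return self.end - self.start


def overlaps(a: Feature, b: Feature) -> bool:
    """True if two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_noncoding(transcript: Feature, coding_genes: Sequence[Feature]) -> str:
    """Label a noncoding transcript by its position relative to coding genes."""
    if transcript.length < MIN_LNCRNA_LENGTH:
        return "short ncRNA"
    hits = [gene for gene in coding_genes if overlaps(transcript, gene)]
    if not hits:
        return "intergenic"
    # Overlapping a coding gene on the opposite strand is treated as antisense.
    if any(gene.strand != transcript.strand for gene in hits):
        return "antisense"
    return "intragenic"
```

In this sketch a transcript that overlaps genes on both strands is reported as antisense; that precedence is a design choice of the example, not a convention taken from the literature.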

HOTAIR is transcribed at the intersection of opposing chromatin domains in the HOXC locus, but targets Polycomb Repressive Complex 2 (PRC2) to silence 40 kilobases of HOXD,9 a locus involved in developmental patterning. A subsequent study revealed that HOTAIR is overexpressed in approximately one quarter of human breast cancers, directing PRC2 to approximately 800 ectopic sites in the genome, which leads to histone H3 lysine 27 trimethylation and changes in gene expression.4 The impacts of lncRNA-mediated chromatin changes are noteworthy: not only did HOTAIR drive metastasis in a mouse model, but HOTAIR expression in human breast cancer was found to be an independent prognostic marker for death and metastasis.4 The fact that HOTAIR drives chromatin reprogramming genome-wide suggests that long-range regulation by lncRNAs may be a widespread mechanism. This is supported by a study showing that >20% of tested lncRNAs are bound by PRC2 and other chromatin modifiers.10 Furthermore, this is an underestimate of the total RNAs involved in chromatin modification, as PRC2 target genes also transcribe smaller 50–200 nt RNAs that interact with SUZ12 to mediate gene repression.11 These findings provoke questions regarding the initial triggers for HOTAIR overexpression and whether understanding of lncRNA mechanics may have clinical relevance.

Long Noncoding RNAs and Disease

The association of HOTAIR with cancer metastasis adds to a growing cohort of lncRNAs associated with disease.12 For example, although blepharophimosis syndrome (BPES) is driven by dysregulation of the FOXL2 gene, numerous extragenic mutations have been reported in patients.13 One particular deletion occurring 283 kb away from FOXL2 disrupts a lncRNA, PISRT1, that was shown by chromosome conformation capture to physically loop with FOXL2.14 Another example is the lncRNA EVF2, which recruits the transcription factor Dlx2 to activate the protein-coding genes DLX5 and DLX6 that are associated with the Split Hand/Split Foot malformation disorder.15,16 Importantly, none of the known disease mutations reside within the protein-coding loci, suggesting that disruption of noncoding transcripts may initiate pathogenesis. A third example is the BC200 RNA, a primate brain transcript that is reduced by 70% in Alzheimer's brain tissues.17 A role for BC200 in neurological disorders is further supported by its direct interaction with the fragile X mental retardation protein (FMRP),18 which is selectively lost in the majority of fragile X patients.19

Functional analysis of a noncoding locus within 9p21 that coincides with several exons of the 4 kb lncRNA ANRIL has bolstered a role for lncRNAs in disease. Despite a lack of protein-coding genes, sequence polymorphisms in this 58 kb region are associated with coronary artery disease,20 including two SNPs within exons of ANRIL.21 Targeted deletion of the orthologous 70 kb region in a mouse model altered cardiac transcript levels of the neighboring genes Cdkn2a and Cdkn2b and resulted in aberrant cell proliferation.22 Thus, a lncRNA locus may provide a mechanistic link between a disease polymorphism and its associated phenotype.

Altogether, the long-range regulation of mRNAs by noncoding sequences appears to be a recurring theme in disease development. As-yet-undiscovered lncRNAs may underlie the functional significance of unexplained disease polymorphisms and expand the catalogue of potential “first hits” in pathogenesis. For example, one avenue to explore would be whether the gross overexpression of HOTAIR in metastatic tumors can be explained by mutations of the noncoding gene.

Mechanisms for Targeting of Long Noncoding RNAs

The role of lncRNAs in disease processes creates an urgency to understand the mechanisms by which these RNAs seek their targets. The earliest lncRNAs suggested a simple model in which the RNA remains tethered to its site of origin to regulate transcriptional changes in cis. One example is a lncRNA upstream of the CCND1 promoter that recruits the RNA-binding protein TLS to mediate heterochromatin formation.23 However, with trans-acting RNAs such as HOTAIR effecting genome-wide chromatin changes, it is clear that additional targeting mechanisms must be involved. The extensive sequence space available to lncRNAs provides plausible strategies for highly discriminative binding to the genome in an allele- or gene-specific fashion (reviewed in ref. 3). Possible RNA targeting schemes include the following (see Fig. 1):

Figure 1. Possible lncRNA targeting mechanisms.

Sequence-specific recognition: RNA-RNA.

Global RNA targeting may occur through direct sequence homology, a mechanism that is common for antisense lncRNAs such as p15AS24 and an RNA antisense to CDKN1A.25 A skewed equilibrium between sense and antisense transcripts can lead to disease, as seen in tissues from vascular anomalies with altered ratios of TIE-1 mRNA to TIE1-AS lncRNA.26 Since as many as 70% of transcripts have antisense partners,27 antisense regulation is likely to be a widespread phenomenon.
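To make the idea of sequence-specific RNA-RNA recognition concrete, the following Python sketch measures the longest stretch over which an antisense transcript could form perfect Watson-Crick pairs with a sense transcript. It is an illustration under simplifying assumptions, not an analysis from the cited work: the function names `reverse_complement` and `longest_duplex` are hypothetical, and the sketch ignores G-U wobble pairing, mismatches and RNA secondary structure.

```python
# Watson-Crick complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def longest_duplex(sense: str, antisense: str) -> int:
    """Length of the longest perfectly paired stretch between two RNAs.

    Pairing with the antisense strand is equivalent to matching the sense
    strand against the antisense strand's reverse complement, so this
    reduces to a longest-common-substring search.
    """
    target = reverse_complement(antisense)
    best = 0
    prev = [0] * (len(target) + 1)
    for s_base in sense.upper():
        curr = [0] * (len(target) + 1)
        for j, t_base in enumerate(target, start=1):
            if s_base == t_base:
                # Extend the paired run that ended at the previous bases.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best
```

For example, passing a made-up sense sequence together with `reverse_complement` of one of its internal segments returns the length of that segment, since the two strands can pair along its full extent.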

RNA-DNA hybrids.

Sequence complementarity can also be employed in more complex configurations such as RNA-DNA duplexes and triplexes. An example occurs at the DHFR locus, where a lncRNA forms a triplex with the promoter to mediate sequence-specific transcriptional repression.28 In the case of XIST, a lncRNA that spreads over 150 Mb of the inactive X chromosome to mediate gene silencing, there is also evidence that genomic sequence plays a role in the silencing function of the RNA, albeit in a nonhomologous fashion. By identifying DNA features unique to genes that undergo or escape inactivation, the inactivation status of 80% of the genes could be predicted from sequence alone.29 This example suggests that even where lncRNA-mediated silencing is nearly compulsory, genomic sequence factors can still confer specificity on the targets. However, despite having identical genomic sequences, different fibroblast cell lines show heterogeneous inactivation profiles,30 suggesting that DNA sequence alone is not sufficient to guide the lncRNA complexes.

Structure-mediated interactions.

The ability of RNA molecules to form secondary and tertiary structures enables more complex schemes for lncRNA targeting. First, base-pairing and looping within an RNA molecule may connect distant sequences to create a binding module that is not evident from the primary sequence. Second, lncRNAs appear to form secondary structure configurations that mediate their functions. For example, repA RNA forms a duplex of four double hairpin repeats that mediates binding to PRC2.31,32 Similarly, Gas5 lncRNA serves as a decoy for the glucocorticoid receptor to titrate it away from its DNA binding sites,33 and a mutation disrupting a hairpin structure inhibits this activity. The importance of secondary structure can be seen when comparing two RNAs, human Alu and mouse B2. Although they have no obvious sequence homology, both are trans-acting RNAs that prevent RNA polymerase II from initiating transcription.34,35 Indeed, thousands of mouse and human transcripts with no primary sequence conservation share commonalities in RNA structure.36 Thus, it is possible that a structure-mediated mechanism may underlie the specificity of lncRNA targeting.

Protein-mediated interactions.

Nearly a fourth of known human proteins have nucleic acid binding domains,37 so it is possible that proteins may link lncRNAs to target loci. The telomere complex is a prime model for proteins serving as adapters between RNAs and DNAs:38,39 the ribonucleoprotein hnRNP A2 binds both telomerase RNA40 and telomeric DNA repeats.41 Likewise, the telomere repeat factor TRF2 forms a stable complex with telomere-repeat-encoding RNA (TERRA) and telomere DNA repeats.42 The strategies used to probe these interactions should be applied to the lncRNA field to determine whether these formations occur beyond the telomere.

Altogether, the diversity and abundance of noncoding transcripts suggest that several permutations of the aforementioned mechanisms may exist. In the early examples of cis regulatory RNAs, it was difficult to distinguish lncRNA targeting from mere diffusion of the RNAs. For example, the mechanism by which the XIST RNA spreads across the entire X chromosome is still undefined. The recent analysis of HOTAIR thus provides a unique avenue to explore the requirements for lncRNA targeting. Specifically, analysis of sequence elements associated with the 800+ genes affected by HOTAIR overexpression may identify features that guide these RNAs. Future analysis of ectopic protein localization mediated by other disease-associated RNAs would further enhance our understanding of ncRNAs in disease. As the mechanics of lncRNA localization are elucidated, we may eventually develop strategies to interfere with their targeting, thus blocking the epigenetic reprogramming that contributes to diseases such as cancer.

Acknowledgements

We thank members of the Chang lab for discussion. Supported by National Science Foundation, American Society for Engineering Education (T.H.), National Institutes of Health (H.Y.C.) and California Institute for Regenerative Medicine (H.Y.C.). H.Y.C. is an Early Career Scientist of the Howard Hughes Medical Institute.

References