The Landscape of long noncoding RNA classification - PubMed

Review

The Landscape of long noncoding RNA classification

Georges St Laurent et al. Trends Genet. 2015 May.

Abstract

Advances in the depth and quality of transcriptome sequencing have revealed many new classes of long noncoding RNAs (lncRNAs). lncRNA classification has mushroomed to accommodate these new findings, even though the real dimensions and complexity of the noncoding transcriptome remain unknown. Although evidence of functionality of specific lncRNAs continues to accumulate, conflicting, confusing, and overlapping terminology has fostered ambiguity and a lack of clarity in the field in general. The lack of a fundamental, conceptually unambiguous classification framework results in a number of challenges in the annotation and interpretation of noncoding transcriptome data. It might also undermine the integration of new genomic methods and datasets in the effort to unravel the function of lncRNAs. Here, we review existing lncRNA classifications, nomenclature, and terminology. Then, we describe the conceptual guidelines that have emerged for their classification and functional annotation based on expanding and more comprehensive use of large systems biology-based datasets.

Keywords: annotation of long non-coding RNAs; classification of long non-coding RNAs; function of long non-coding RNAs; lincRNA; lncRNA; long non-coding RNA; systems biology; transcriptome; vlincRNA.

Copyright © 2015 Elsevier Ltd. All rights reserved.

Figures

Figure 1. Schematic diagram illustrating various classes of ncRNAs

Three hypothetical loci are shown. Protein coding exons are shown as green (locus 1) or yellow boxes (locus 3). Locus 2 signifies a pseudogene of locus 1. Regulatory regions of locus 1 are shown in purple (promoter) and magenta (enhancer). Repeats are denoted by brown boxes. Lines with arrows represent ncRNAs. CAR: chromatin-associated RNA. ceRNA: competing endogenous RNA. ciRNA: chromatin-interlinking RNA (grey) or circular intronic RNA (green). ecircRNA: exonic circular RNA. eRNA: enhancer-associated RNA. lincRNA: long intervening non-coding RNA. ncRNA-a: activating non-coding RNAs. PALR: promoter-associated long RNA. PIN: partially intronic RNA. TIN: totally intronic RNA. TSSa-RNA: transcription start site-associated RNA. T-UCR: Transcribed Ultraconserved Regions. uaRNA: 3′UTR-derived RNAs. vlincRNA: very long intergenic non-coding RNA. The role depicted here for CARs and ciRNAs in stabilizing a chromatin loop is hypothetical.

Figure 2. Properties of different published lists of human transcripts representing various classes of ncRNAs

Sequence conservation was defined by the conserved elements from the Vertebrate Multiz Alignment & Conservation (100 Species) track from the UCSC Browser [147]. Relative conservation represents the fraction of conserved bases relative to the total length of each list of ncRNAs. Relative mass and expression levels represent averages of several malignant and normal tissues profiled using single-molecule RNA-seq analysis [5, 29]. Only uniquely aligning non-rRNA and non-chrM reads were considered. Relative mass represents the proportion of reads mapping to a particular genomic element relative to all reads. The relative expression is the relative mass divided by the total length of each list and normalized to the relative expression of coding exons (defined by UCSC Genes). Promoter-associated RNAs were defined by the regions 3 kb upstream of annotated start sites of UCSC Genes. Given the lack of a comprehensive list of standalone human intronic RNAs, we extrapolated the relative mass of those based on mouse data [50]. The GENCODE annotations [33] are based on v19.
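
The three quantities defined in this caption can be written out as a minimal Python sketch. The function and parameter names below are hypothetical and chosen only for illustration, and the sketch assumes that read counts, conserved-base counts, and total lengths have already been computed for each list; it is not the authors' pipeline.

```python
# Illustrative sketch of the Figure 2 definitions; all names are hypothetical.

def relative_conservation(conserved_bp: int, total_length_bp: int) -> float:
    """Fraction of conserved bases relative to the total length of a list."""
    return conserved_bp / total_length_bp


def relative_mass(reads_in_list: int, total_reads: int) -> float:
    """Proportion of reads mapping to a list relative to all reads.

    Assumes both counts already exclude non-unique, rRNA, and chrM reads.
    """
    return reads_in_list / total_reads


def relative_expression(
    reads_in_list: int,
    list_length_bp: int,
    coding_reads: int,
    coding_length_bp: int,
    total_reads: int,
) -> float:
    """Relative mass per base, normalized to the same value for coding exons."""
    list_density = relative_mass(reads_in_list, total_reads) / list_length_bp
    coding_density = relative_mass(coding_reads, total_reads) / coding_length_bp
    return list_density / coding_density


if __name__ == "__main__":
    # Made-up numbers, used only to exercise the functions.
    print(relative_conservation(conserved_bp=100, total_length_bp=4000))
    print(relative_mass(reads_in_list=200, total_reads=1000))
    print(
        relative_expression(
            reads_in_list=200,
            list_length_bp=4000,
            coding_reads=500,
            coding_length_bp=1000,
            total_reads=1000,
        )
    )
```

Under these definitions coding exons have a relative expression of 1 by construction, so a value below 1 indicates a lower read density per base than coding exons.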

Figure 3. Outline of the consolidated conceptual framework of ncRNA classification

Highly accurate empirical RNA-seq data drives both annotation and quantification of the longest ncRNA (Tier 1) and of processed ncRNA species (Tier 2) across the entire genome. The quantitation data serves as the basis for the combined global matrix of knowledge of expression of each (coding and non-coding) RNA gene and transcript across multiple biological sources (Tier 3). This information provides the input for the functional annotation of non-coding transcripts using systems biology approaches. Mapping of RNA modifications provides the final layer of knowledge in this scheme.

Figure 4. A genomic view of the 8q24 region upstream of the human MYC gene

This clinically important locus, which contains many GWAS hits associated with several cancers, is an example of a genomic region that could clearly benefit from the new annotation scheme. RNA-seq analysis reveals a fairly strong signal on both strands covering most of this >1 Mbp region. Yet the known lncRNA annotations represent only a small fraction of this locus and, judging by the distribution of the RNA-seq signal and known promoters, are likely part of much larger transcript units (for example, the vlincRNAs shown in the figure). Transcriptome RNA-seq data are represented by polyA- nuclear RNA from normal epidermal keratinocytes (NHEK) and embryonic stem cells (H1) generated by the ENCODE consortium [3]. In addition, vlincRNAs [29], promoters [32], and disease-associated variants from genome-wide association studies (GWAS) [148] are shown. Reproduced with permission from [12].

References

    1. Kapranov P, et al. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002;296:916–919.
    2. Okazaki Y, et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature. 2002;420:563–573.
    3. Djebali S, et al. Landscape of transcription in human cells. Nature. 2012;489:101–108.
    4. Bernstein BE, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.
    5. Kapranov P, et al. The majority of total nuclear-encoded non-ribosomal RNA in a human cell is ‘dark matter’ un-annotated RNA. BMC Biol. 2010;8:149.
