Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells

Alla A Sigova et al. Proc Natl Acad Sci U S A. 2013.

Abstract

Many long noncoding RNA (lncRNA) species have been identified in mammalian cells, but the genomic origin and regulation of these molecules in individual cell types is poorly understood. We have generated catalogs of lncRNA species expressed in human and murine embryonic stem cells and mapped their genomic origin. A surprisingly large fraction of these transcripts (>60%) originate from divergent transcription at promoters of active protein-coding genes. The divergently transcribed lncRNA/mRNA gene pairs exhibit coordinated changes in transcription when embryonic stem cells are differentiated into endoderm. Our results reveal that transcription of most lncRNA genes is coordinated with transcription of protein-coding genes.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

Most lncRNAs are associated with active protein-coding genes in hESCs. (A) Schematic diagram of the pipeline for identification of lncRNAs in hESCs. An “initial RNA pool” was compiled from transcripts assembled de novo from RNA-seq reads (this study; SI Materials and Methods) and published data (20). Four criteria required for the selection of expressed transcripts from this pool are indicated in red. Transcripts were required to be expressed from a high-confidence start site (occupied by H3K4me3), to be noncoding [lacking features of protein-coding RNAs as defined by the CPC (28)], to be long (>100 nt), and to be nonrepetitive. (B) Summary of various types and numbers of lncRNA loci in hESCs, which are listed in Dataset S1. Diagrams at right depict lncRNA loci as red lines, protein-coding genes as blue lines, and an enhancer as an open box. An arrow indicates direction of transcription initiation. Enhancer-associated lncRNAs overlap or originate at genomic regions enriched in nucleosomes with histone 3 acetylated at lysine 27 (H3K27Ac). Enriched regions for H3K27Ac are available in Dataset S2. (C) Example of lncRNA locus whose 5′ end occurs within 2 kb of the TSS of a protein-coding gene (promoter-associated lncRNA). Gene tracks represent ChIP-seq data for H3K4me3-modified nucleosomes (48) together with reads for polyadenylated RNA in the vicinity of CAPN10. Transcription at lncRNA locus 959 generates three alternatively spliced lncRNA transcripts that are divergent from CAPN10. The x axis represents the linear sequence of genomic DNA, and the y axis represents the total number of ChIP-seq and RNA-seq mapped reads. RNA-seq reads that map to Watson (blue) and Crick (red) strands of genomic DNA are shown separately. The scale is indicated in the upper right. (D) Distribution of TSS of lncRNAs relative to the TSS of protein-coding genes. Coding regions are normalized to equal length, and the regions upstream of associated promoters are divided into one hundred 100-bp bins. Distance between TSS of protein-coding gene and 5′ end of lncRNA is indicated on x axis and expressed in kilobases (kb). Antisense lncRNA loci are indicated in red. Sense lncRNA loci are indicated in blue.
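The promoter-association rule in the Fig. 1 caption (a lncRNA 5′ end within 2 kb of a protein-coding TSS, on the sense or antisense strand) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the `Tss` record, the `classify` function, the label strings, and the rule that "divergent" means antisense and at or upstream of the gene TSS are all assumptions made for this example.

```python
from dataclasses import dataclass

PROMOTER_WINDOW_BP = 2_000  # "within 2 kb of the TSS", taken from the caption


@dataclass(frozen=True)
class Tss:
    """Hypothetical record for a transcript 5' end."""
    chrom: str
    pos: int     # genomic coordinate of the 5' end
    strand: str  # "+" (Watson) or "-" (Crick)


def signed_offset(gene: Tss, lnc: Tss) -> int:
    """lncRNA 5' end relative to the gene TSS, in the gene's orientation.

    Negative values lie upstream of the gene, positive values downstream.
    """
    delta = lnc.pos - gene.pos
    return delta if gene.strand == "+" else -delta


def classify(gene: Tss, lnc: Tss) -> str:
    """Label a lncRNA relative to one protein-coding gene (assumed rules)."""
    if gene.chrom != lnc.chrom:
        return "unassociated"
    offset = signed_offset(gene, lnc)
    if abs(offset) > PROMOTER_WINDOW_BP:
        return "unassociated"
    if gene.strand == lnc.strand:
        return "sense"
    # Assumption: antisense and at/upstream of the TSS counts as divergent.
    return "divergent" if offset <= 0 else "antisense"
```

For a gene on the Watson strand with its TSS at position 10,000, an antisense lncRNA starting at 9,700 is labeled `divergent` under these assumed rules, whereas one starting at 13,000 falls outside the 2-kb window and is labeled `unassociated`.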

Fig. 2.

lncRNAs are derived from divergent transcription of active protein-coding genes in hESCs. (A) Alignment of GRO-seq reads for the 2,318 protein-coding genes that contain lncRNAs within 2 kb of their TSS. Reads are aligned in 250-bp bins. The x axis indicates the distance from the TSS in kilobases. The y axis indicates the average number of uniquely mapped GRO-seq counts normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on graph. (B) Example of lncRNA locus whose 5′ end occurs within 2 kb of the TSS of a protein-coding gene (promoter-associated lncRNA). Gene tracks represent ChIP-seq data for H3K4me3-modified nucleosomes (48) together with GRO-seq reads and reads for polyadenylated RNA in the vicinity of STAM. Transcription at lncRNA locus 3182 generates two alternatively spliced lncRNA transcripts that are divergent from STAM. The x axis represents the linear sequence of genomic DNA, and the y axis represents the total number of ChIP-seq, GRO-seq, and RNA-seq mapped reads. RNA-seq reads that map to Watson (blue) and Crick (red) strands of genomic DNA are shown separately. GRO-seq reads that map to Watson (purple) and Crick (magenta) strands of genomic DNA are shown separately. The scale is indicated in the upper right. (C) Alignment of RNA-seq reads for the 2,318 protein-coding genes that contain lncRNAs within 2 kb of their TSS. Reads are aligned in 250-bp bins. The x axis indicates the distance from the TSS in kb. The y axis indicates the average number of uniquely mapped RNA-seq counts normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on the graph.
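The y-axis normalization described in the Fig. 2 caption (reads counted in 250-bp bins around the TSS, averaged over genes, and scaled per million uniquely mapped reads) can be sketched as below. This is an assumption-laden illustration rather than the authors' code: the function name `metagene_profile`, the input layout (one list of TSS-relative read offsets per gene), and the default window of 2 kb on each side are hypothetical choices.

```python
BIN_BP = 250  # bin width stated in the caption


def metagene_profile(offsets_per_gene, total_mapped_reads,
                     window_bp=2_000, bin_bp=BIN_BP):
    """Average reads per bin per million uniquely mapped reads.

    offsets_per_gene: one list per gene of read positions relative to that
        gene's TSS, in bp (negative = upstream). Strand-specific profiles
        are obtained by calling this once per strand.
    total_mapped_reads: uniquely mapped reads in the whole library.
    window_bp: assumed half-width of the plotted region (hypothetical default).
    """
    if not offsets_per_gene or total_mapped_reads <= 0:
        raise ValueError("need at least one gene and a positive read total")
    n_bins = 2 * window_bp // bin_bp
    counts = [0] * n_bins
    for offsets in offsets_per_gene:
        for off in offsets:
            if -window_bp <= off < window_bp:
                counts[(off + window_bp) // bin_bp] += 1
    # Scale to per-million reads, then average over genes.
    scale = 1_000_000 / total_mapped_reads / len(offsets_per_gene)
    return [c * scale for c in counts]
```

Calling the function separately on Watson-strand and Crick-strand reads would give the two curves that the caption describes as plotted in black and red.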

Fig. 3.

Most lncRNAs are divergently transcribed from protein-coding genes in mESCs. (A) Summary of various types and numbers of lncRNA loci in the mESC catalog, which are listed in Dataset S1. Diagrams at right depict lncRNA loci as red lines, protein-coding genes as blue lines, and an enhancer as an open box. An arrow indicates direction of transcription initiation. Enhancer-associated lncRNAs overlap or originate at genomic regions enriched in H3K27Ac (49). Enriched regions for H3K27Ac are available in Dataset S2. (B) Example of lncRNA locus whose 5′ end occurs within 2 kb of the TSS of a protein-coding gene (promoter-associated lncRNA). Gene tracks represent ChIP-seq data for H3K4me3-modified nucleosomes (this study), together with GRO-seq reads and reads for polyadenylated RNA in the vicinity of Nol10. Transcription at lncRNA locus 1160 generates lncRNA transcripts that are divergent from Nol10. The x axis represents the linear sequence of genomic DNA, and the y axis represents the total number of ChIP-seq, GRO-seq, and RNA-seq mapped reads. RNA-seq reads that map to Watson (blue) and Crick (red) strands of genomic DNA are shown separately. GRO-seq reads that map to Watson (purple) and Crick (magenta) strands of genomic DNA are shown separately. The scale is indicated above the track. (C) Alignment of GRO-seq reads for the 1,030 protein-coding genes that contain lncRNAs within 2 kb of their TSS. Reads are aligned in 250-bp bins. The x axis indicates the distance from the TSS in kb. The y axis represents average number of uniquely mapped GRO-seq counts normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on graph. (D) Alignment of RNA-seq reads for the 1,030 protein-coding genes that contain lncRNAs within 2 kb of their TSS. Reads are aligned in 250-bp bins. The x axis indicates the distance from the TSS in kilobases. The y axis indicates the average number of uniquely mapped RNA-seq counts normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on the graph.

Fig. 4.

Divergent lncRNA/mRNA pairs exhibit coordinated changes in transcription as ESCs differentiate into endoderm. (A) Summary of the genomic distribution of lncRNA loci 48 h after induction of endodermal differentiation in hESCs. Diagrams at right depict lncRNA loci as red lines, protein-coding genes as blue lines, and an enhancer as an open box. An arrow indicates direction of transcription initiation. Enhancer-associated lncRNAs overlap or originate at genomic regions enriched in H3K27Ac. Enriched regions for H3K27Ac are available in Dataset S2. (B) Alignment of GRO-seq reads 48 h after induction of endodermal differentiation in hESCs for the 2,680 protein-coding genes that contain lncRNAs within 2 kb of their TSS. The x axis indicates the distance from the TSS in kilobases, and the y axis indicates the average number of uniquely mapped GRO-seq reads normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on the graph. (C) Alignment of RNA-seq reads 48 h after induction of endodermal differentiation in hESCs for the 2,680 protein-coding genes that contain lncRNAs within 2 kb of their TSS. The x axis indicates the distance from the TSS in kilobases, and the y axis indicates the average number of uniquely mapped RNA-seq counts normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on graph. (D) Example of lncRNA/mRNA pairs exhibiting coordinated transcriptional induction 48 h after hESCs were differentiated toward the endoderm. Gene tracks represent GRO-seq data in the vicinity of GATA6. Divergent transcription generates antisense lncRNA locus 5689 upstream of GATA6. The x axis represents the linear sequence of genomic DNA, and the y axis represents the number of GRO-seq reads normalized to total number of mapped reads. GRO-seq reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately. GRO-seq reads mapped to the Crick (red) strand of genomic DNA are shown flipped/rotated beneath. The scale is indicated in kilobases (kb) above the track. (E) Example of lncRNA/mRNA pairs exhibiting coordinated transcriptional induction 48 h after hESCs were differentiated toward the endoderm. Gene tracks represent GRO-seq data in the vicinity of LHX5. Divergent transcription generates antisense lncRNA locus 4010 upstream of LHX5. The x axis represents the linear sequence of genomic DNA, and the y axis represents the number of GRO-seq reads normalized to total number of mapped reads. (F) Coordinate transcriptional induction of lncRNA/mRNA gene pairs. A total of 683 lncRNA/mRNA pairs were selected, in which the numbers of GRO-seq reads of mRNA increased at least 1.25-fold after 48 h of endodermal differentiation. The average number of uniquely mapped GRO-seq reads from the strands encoding the mRNA transcripts is shown in black (Upper). The average number of uniquely mapped GRO-seq reads from the strands encoding the lncRNA transcripts is shown in red (Lower). Solid lines represent transcription in hESCs, and dashed lines represent transcription 48 h after induction of differentiation toward the endoderm. The x axis indicates the linear distance in kilobases, and the y axis indicates the average reads per genomic bin per million uniquely mapped reads.
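The selection rule in panel F (pairs whose mRNA GRO-seq signal increased at least 1.25-fold after 48 h of differentiation) amounts to a simple fold-change filter. The sketch below is a hedged illustration only: the function name `induced_pairs`, the tuple layout, and the decision to skip pairs with a zero baseline are assumptions, and the pair identifiers and read counts are made-up placeholders rather than data from the study.

```python
MIN_FOLD = 1.25  # threshold stated in the caption


def induced_pairs(pairs, min_fold=MIN_FOLD):
    """Return ids of lncRNA/mRNA pairs whose mRNA signal rose >= min_fold.

    pairs: iterable of (pair_id, mrna_reads_escs, mrna_reads_48h) tuples,
        where reads are normalized GRO-seq counts.
    Pairs with a zero baseline are skipped because their fold change is
    undefined (an assumption of this sketch, not stated in the source).
    """
    selected = []
    for pair_id, before, after in pairs:
        if before > 0 and after / before >= min_fold:
            selected.append(pair_id)
    return selected


# Hypothetical example values, for illustration only.
example = [
    ("pair_a", 100, 125),  # exactly 1.25-fold: kept
    ("pair_b", 100, 120),  # 1.2-fold: dropped
    ("pair_c", 0, 50),     # undefined fold change: skipped
    ("pair_d", 40, 80),    # 2-fold: kept
]
print(induced_pairs(example))
```

The lncRNA strand of each selected pair can then be profiled before and after differentiation to check whether its signal changes in the same direction as the mRNA.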

References

    1. Djebali S, et al. Landscape of transcription in human cells. Nature. 2012;489(7414):101–108.
    2. Esteller M. Non-coding RNAs in human disease. Nat Rev Genet. 2011;12(12):861–874.
    3. Yu W, et al. Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature. 2008;451(7175):202–206.
    4. Zhao J, Sun BK, Erwin JA, Song JJ, Lee JT. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science. 2008;322(5902):750–756.
    5. Pandey RR, et al. Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol Cell. 2008;32(2):232–246.
