Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome

Jia Qian Wu et al. Genome Biol. 2008.

Abstract

Background: Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced.

Results: We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays, and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins.

Conclusion: We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.

Figures

Figure 1

Frequency of PCR products obtained from different genomic regions. Primers designed to the sense and antisense strands of exons, novel transcriptionally active regions (TARs) and nontranscribed regions were used to generate rapid amplification of cDNA ends (RACE) products. The frequency of PCR products obtained is indicated. nontx, region not previously shown to be transcribed.

Figure 2

Distribution of RACE product sequences in the DRG1 and FBXO7 regions. (a) DRG1 region and (b) FBXO7 region. Products from the sense strand (+) are shown in the top half of the panel. Products from the antisense strand are in the bottom half of the panel. Blue products are detected sequences from 5'-rapid amplification of cDNA ends (RACE); red products are detected sequences from 3'-RACE; black indicates RefSeq; black asterisks indicate consensus splice sites (GT-AG, GC-AG, or AT-AC); and green asterisks indicate novel isoforms with more than 50% consensus splice sites. Note that the antisense products that lack consensus splice sites are indicated in lighter colors. (c) cDNA and RNA hybridization signals in the DRG1 region. The blue tracks indicate the signals that were generated from hybridization of cDNA prepared from NB4 cells using reverse transcriptase to the strand-specific microarray. The red tracks indicate hybridization of RNA that has been labeled directly by chemical means, thus omitting the use of reverse transcriptase, to the strand-specific microarray. Products from the sense strand (+) are shown in the top half of the panel. Products from the antisense strand are in the bottom half of the panel.

Figure 3

RACE sequencing can detect transcripts not previously detected by microarray analysis in NB4 cells. (a) Integrated Genome Browser (IGB) view of SYN3 and TIMP3 rapid amplification of cDNA ends (RACE) products in NB4 RNA. (b) Real-time PCR quantification of SYN3 and TIMP3 transcripts relative to HPRT1 in NB4 cells.

Figure 4

RACE products from novel TARs and nonTx regions. (a) Novel transcriptionally active regions (TARs) and (b) regions not previously shown to be transcribed (nonTx regions). Pink indicates the novel TARs, and green the nonTx regions, from which the primers were designed. Note that the products are primarily unspliced.

Figure 5

Features of the RACE products. (a) Connectivity of detected transcripts to known exons/novel transcriptionally active regions (TARs). (b) Frequency of spliced and unspliced rapid amplification of cDNA ends (RACE) products derived from known exons, novel TARs, and untranscribed regions. (c) Average microarray intensities of regions encoding spliced and unspliced RACE products. nontx, region not previously shown to be transcribed.

Figure 6

Example of a novel transcript detected by RACE sequencing. (a) Novel transcript 5NGSP2F8 (with consensus splice site) has a potential open reading frame of 142 amino acids; also, there is spliced expressed sequence tag (EST) evidence for it. (b) Real-time PCR relative quantification of the novel transcript to HPRT1 in placenta polyA+ RNA. RACE, rapid amplification of cDNA ends.
