Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome
Jia Qian Wu et al. Genome Biol. 2008.
Abstract
Background: Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced.
Results: We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins.
Conclusion: We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.
Figures
Figure 1
Frequency of PCR products obtained from different genomic regions. Primers designed to the sense and antisense strands of exons, novel transcriptionally active regions (TARs) and nontranscribed regions were used to generate rapid amplification of cDNA ends (RACE) products. The frequency of PCR products obtained is indicated. nontx, region not previously shown to be transcribed.
Figure 2
Distribution of RACE product sequences in the DRG1 and FBXO7 regions. (a) DRG1 region and (b) FBXO7 region. Products from the sense strand (+) are shown in the top half of the panel. Products from the antisense strand are in the bottom half of the panel. Blue products are detected sequences from 5'-rapid amplification of cDNA ends (RACE); red products are detected sequences from 3'-RACE; black indicates RefSeq; black asterisks indicate consensus splice sites (GT-AG, GC-AG, or AT-AC); and green asterisks indicate novel isoforms with more than 50% consensus splice sites. Note that the antisense products that lack consensus splice sites are indicated in lighter colors. (c) cDNA and RNA hybridization signals in the DRG1 region. The blue tracks indicate the signals generated by hybridizing cDNA, prepared from NB4 cells using reverse transcriptase, to the strand-specific microarray. The red tracks indicate hybridization to the same strand-specific microarray of RNA that was labeled directly by chemical means, thus omitting the use of reverse transcriptase. Products from the sense strand (+) are shown in the top half of the panel. Products from the antisense strand are in the bottom half of the panel.
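The consensus splice-site criterion named in this caption can be illustrated with a minimal sketch. It assumes each intron is available as a plain DNA string on the transcribed strand; the function names `is_consensus_intron` and `consensus_fraction` are hypothetical and are not taken from the paper, whose actual pipeline is not described here.

```python
# Donor/acceptor dinucleotide pairs listed in the caption: GT-AG, GC-AG, AT-AC.
CONSENSUS_PAIRS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def is_consensus_intron(intron_seq: str) -> bool:
    """Return True if the intron starts and ends with a consensus pair.

    Assumption: intron_seq is the intron sequence read 5'->3' on the
    strand of the transcript (hypothetical input format).
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        # Too short to carry both a donor and an acceptor dinucleotide.
        return False
    return (seq[:2], seq[-2:]) in CONSENSUS_PAIRS


def consensus_fraction(introns) -> float:
    """Fraction of introns in one isoform that have consensus splice sites."""
    introns = list(introns)
    if not introns:
        return 0.0
    return sum(is_consensus_intron(s) for s in introns) / len(introns)


# Example isoform with three introns, two of which are consensus.
example_introns = ["GTAAGTCCCTTTCAG", "ATATCCTTTTAC", "CTAAAAAAAC"]
passes_threshold = consensus_fraction(example_introns) > 0.5
```

Under these assumptions, an isoform would be marked as in the caption (more than 50% consensus splice sites) when `consensus_fraction` exceeds 0.5.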
Figure 3
RACE sequencing can detect transcripts not previously detected by microarray analysis in NB4 cells. (a) Integrated Genome Browser (IGB) view of SYN3 and TIMP3 rapid amplification of cDNA ends (RACE) products in NB4 RNA. (b) Real-time PCR quantification of SYN3 and TIMP3 transcripts relative to HPRT1 in NB4 cells.
Figure 4
RACE products from novel TARs and nonTx regions. (a) Novel transcriptionally active regions (TARs) and (b) regions not previously shown to be transcribed (nonTx regions). Pink indicates the novel TARs, and green the nonTx regions, from which the primers were designed. Note that the products are primarily unspliced.
Figure 5
Features of the RACE products. (a) Connectivity of detected transcripts to known exons/novel transcriptionally active regions (TARs). (b) Frequency of spliced and unspliced rapid amplification of cDNA ends (RACE) products derived from known exons, novel TARs, and untranscribed regions. (c) Average microarray intensities of regions encoding spliced and unspliced RACE products. nontx, region not previously shown to be transcribed.
Figure 6
Example of a novel transcript detected by RACE sequencing. (a) Novel transcript 5NGSP2F8 (with consensus splice site) has a potential open reading frame of 142 amino acids and is also supported by spliced expressed sequence tag (EST) evidence. (b) Real-time PCR quantification of the novel transcript relative to HPRT1 in placenta polyA+ RNA. RACE, rapid amplification of cDNA ends.