Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study
Lei Sun et al. BMC Bioinformatics. 2012.
Abstract
Background: Research on long non-coding RNAs (lncRNAs) has been accelerated by high-throughput RNA sequencing (RNA-Seq). However, identifying lncRNAs from RNA-Seq data is still not trivial, and uncovering their functions remains a challenge.
Results: We present a computational pipeline for detecting novel lncRNAs from RNA-Seq data. First, genome-guided transcriptome reconstruction is used to generate initially assembled transcripts. Possible partial transcripts and artefacts are then filtered out according to their quantified expression level. After that, novel lncRNAs are detected by further filtering out known transcripts and those with high protein-coding potential, using a newly developed program called lncRScan. We applied our pipeline to a mouse Klf1 knockout dataset and used differential expression analysis to discuss the plausible functions of the novel lncRNAs we detected. We identified 308 novel lncRNA candidates, which have shorter transcripts, fewer exons, and shorter putative open reading frames than known protein-coding transcripts. Of these lncRNAs, 52 large intergenic ncRNAs (lincRNAs) show lower expression levels than the protein-coding transcripts, and 13 lncRNAs show significant differential expression between the wild-type and Klf1 knockout conditions.
Conclusions: Our method can predict a set of novel lncRNAs from RNA-Seq data. Some of the lncRNAs are differentially expressed between the wild-type and Klf1 knockout strains, suggesting that these novel lncRNAs can be given high priority in further functional studies.
Figures
Figure 1
Pipeline for predicting novel lncRNAs. (a) Initial assembly. Raw reads are first mapped onto the mouse reference genome. The unmapped reads are trimmed and then re-mapped. The read alignments of all 6 replicates are merged to increase the read coverage. At the assembly stage, RABT generates synthetic reads from the RefSeq gene annotation to compensate for read coverage gaps over transcripts; (b) Novel lncRNA detection. The initial assemblies are compared with the combined gene annotations and categorized by cuffcompare. Low-quality transcripts are then filtered out according to the optimum FPKM threshold (2.12). The lncRScan program is then run to detect novel lncRNAs from the remaining high-quality assemblies according to multiple criteria.
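The expression filter in step (b) can be illustrated with a minimal Python sketch. It assumes the textbook definition of FPKM (fragments per kilobase of transcript per million mapped fragments); the actual values in the study come from the assembler's own statistical estimate, which is not reproduced here. The `Assembly` record and the function names are hypothetical, and only the 2.12 cutoff is taken from the caption.

```python
from dataclasses import dataclass

FPKM_CUTOFF = 2.12  # optimum FPKM threshold quoted in the Figure 1 caption


def fpkm(fragments: int, length_nt: int, total_mapped: int) -> float:
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_nt * total_mapped)


@dataclass
class Assembly:
    """Hypothetical record for one assembled transcript."""
    transcript_id: str
    fpkm: float


def keep_high_quality(assemblies, cutoff=FPKM_CUTOFF):
    """Keep assemblies whose FPKM is strictly larger than the cutoff."""
    return [a for a in assemblies if a.fpkm > cutoff]


if __name__ == "__main__":
    # Illustrative numbers only: 500 fragments on a 2 kb transcript, 10 million mapped fragments.
    print(fpkm(500, 2000, 10_000_000))
```

The strict comparison mirrors the wording "larger than a given threshold" used for the FPKM classifier in Figure 5; whether the original filter treats a value equal to the cutoff as passing is not stated.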
Figure 2
Steps of lncRScan. (1) ‘extract_category’ extracts five candidate categories of assemblies (Transcripts-1), namely ‘i’, ‘j’, ‘o’, ‘u’ and ‘x’; (2) ‘extract_length’ extracts the transcripts with length > 200 nt (Transcripts-2); (3) ‘extract_ORF’ selects the transcripts with maximum putative ORF < 300 nt (Transcripts-3); (4) ‘extract_PhyloCSF’ extracts the transcripts with a PhyloCSF score < 0 or a failed test due to ORF < 25 aa (Transcripts-4); (5) ‘extract_Pfam’ searches the remaining transcripts against the Pfam database and excludes the transcripts with significant protein domain hits. At the end of lncRScan, the remaining 308 transcripts (Transcripts-5) are defined as the novel lncRNAs.
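The five lncRScan steps amount to a chain of filters, which can be sketched in Python as follows. This is not the lncRScan code: the `Transcript` record is hypothetical, the PhyloCSF score and Pfam result are assumed to have been computed beforehand by those tools, and the ORF finder is a simple assumption (longest ATG-to-stop stretch in the three forward frames), since the caption does not say how the putative ORF is defined.

```python
from dataclasses import dataclass
from typing import Optional

CANDIDATE_CODES = {"i", "j", "o", "u", "x"}  # cuffcompare class codes kept in step 1
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG-to-stop ORF
    in the three forward reading frames. Assumed definition, see lead-in."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


@dataclass
class Transcript:
    """Hypothetical record for one high-quality assembly."""
    transcript_id: str
    class_code: str
    sequence: str
    phylocsf_score: Optional[float]  # None means the PhyloCSF test failed (ORF < 25 aa)
    has_pfam_hit: bool               # True if a significant Pfam domain hit was found


def is_novel_lncrna(t: Transcript) -> bool:
    """Apply the five criteria from the caption in order."""
    return (
        t.class_code in CANDIDATE_CODES                              # (1) extract_category
        and len(t.sequence) > 200                                    # (2) extract_length
        and longest_orf_nt(t.sequence) < 300                         # (3) extract_ORF
        and (t.phylocsf_score is None or t.phylocsf_score < 0)       # (4) extract_PhyloCSF
        and not t.has_pfam_hit                                       # (5) extract_Pfam
    )
```

Because every criterion is a pure predicate on one transcript, the order of the checks does not change the final set; ordering them from cheapest to most expensive only matters when the PhyloCSF and Pfam results are computed on demand.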
Figure 3
Differential expression tests. The cuffdiff program performs differential expression tests between the WT and Klf1 KO samples based on the read alignments (BAM) of the six replicates and high-quality assemblies (GTF).
Figure 4
FPKM distributions of complete and partial transcripts. The ‘=’ classcode is assigned to transcripts whose intron chain completely matches that of a reference transcript, so they can be treated as complete transcripts, while the ‘c’ classcode is assigned to transcripts contained in a reference transcript, which are defined as partial assemblies. The complete (‘=’, red curve) and partial (‘c’, blue curve) transcripts assembled from the read alignments show distinguishable FPKM distributions (∼29.67 vs ∼4.86).
Figure 5
Performance of FPKM in distinguishing between complete and partial transcripts. An assembled transcript is classified as a complete assembly (‘=’ classcode) if its FPKM is larger than a given threshold; otherwise it is classified as a partial assembly (‘c’ classcode). The blue ROC curve [39] represents the performance of FPKM in classifying the complete and partial transcripts. The corresponding area under the curve (AUC) is 0.7825.
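The threshold sweep behind such a ROC curve can be sketched in plain Python. The function names are hypothetical and the inputs are simply two lists of FPKM values, one for ‘=’ transcripts and one for ‘c’ transcripts. Picking the threshold by Youden's J (maximum of TPR minus FPR) is an assumption made for illustration; the caption does not say which criterion produced the optimum FPKM of 2.12.

```python
def roc_curve(complete, partial):
    """Sweep FPKM thresholds; a transcript is called 'complete' if FPKM > threshold.

    Returns (threshold, FPR, TPR) rows ordered from the strictest to the loosest threshold.
    """
    thresholds = sorted(set(complete) | set(partial), reverse=True)
    rows = []
    for thr in thresholds:
        tpr = sum(v > thr for v in complete) / len(complete)  # complete called complete
        fpr = sum(v > thr for v in partial) / len(partial)    # partial called complete
        rows.append((thr, fpr, tpr))
    rows.append((float("-inf"), 1.0, 1.0))  # threshold below every observed value
    return rows


def auc(rows):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(rows, rows[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def youden_threshold(rows):
    """Threshold maximizing TPR - FPR (assumed criterion, see lead-in)."""
    return max(rows, key=lambda r: r[2] - r[1])[0]


if __name__ == "__main__":
    # Toy FPKM values, not data from the study.
    rows = roc_curve(complete=[10.0, 20.0, 30.0], partial=[1.0, 2.0, 12.0])
    print(auc(rows), youden_threshold(rows))
```

An AUC of 0.5 corresponds to a threshold that separates the two classes no better than chance, and 1.0 to perfect separation, which puts the reported 0.7825 in context.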
Figure 6
Comparison between novel lncRNAs and NONCODE lncRNAs. NONCODE 3.0 annotates 36991 lncRNAs, and our method predicts 308. Of the 80 overlapping lncRNAs (25.97% of our predictions), 5 exactly match NONCODE 3.0 annotations.
Figure 7
Comparisons of transcript length, exon number and ORF length. (a) Comparison of transcript length. On average, the novel lncRNAs are shorter (∼1.2 kb) than either RefSeq protein-coding (∼3.1 kb) or non-coding transcripts (∼1.9 kb); (b) Comparison of exon number. On average, the lncRNAs have fewer exons (∼2.8) than the other two categories of transcripts (∼10.0 and ∼3.3, respectively); (c) Comparison of ORF length. On average, the novel lncRNAs have a shorter putative ORF (∼0.17 kb) than either of the two RefSeq gene categories (∼1.6 kb and ∼0.3 kb, respectively). All means are marked by red points.
Figure 8
Comparison of expression level between protein-coding transcripts and novel lncRNAs. (a) In the WT condition, the protein-coding transcripts (∼50.92) show a slightly higher expression level than the novel lncRNAs (∼44.54), but a significantly higher expression level than the lincRNAs (∼11.29) extracted from the lncRNAs; (b) In the Klf1 KO condition, the protein-coding transcripts (∼37.63) also show a slightly higher expression level than the lncRNAs (∼34.06), but a significantly higher expression level than the lincRNAs (∼9.6). In addition, the protein-coding transcripts and the novel lncRNAs have similar median expression in both the WT (10.29 vs 9.509) and Klf1 KO (9.421 vs 7.722) conditions. All means are marked by red points.
Figure 9
Differential expression of transcripts between WT and Klf1 KO. The three volcano plots illustrate the differential expression (DE) between the WT and Klf1 KO samples at the gene or transcript level: (a) DE of all genes. At the gene level, Klf1 appears to act globally as an activator, since more genes are significantly repressed (334, red points over the positive x-axis) than activated (250, red points over the negative x-axis) after Klf1 is knocked out; (b) DE of all transcripts. At the transcript/isoform level, Klf1 also behaves like an activator, since more transcripts are significantly repressed (262) than activated (147) after Klf1 is knocked out; (c) DE of the novel lncRNAs. Among the 13 significantly DE lncRNA transcripts, Klf1 again behaves like an activator, since 10 lncRNAs are repressed and 3 are activated after Klf1 is knocked out. All significantly DE transcripts are shown as red points.
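The repressed/activated tallies in this caption can be reproduced from a table of DE test results with a short Python sketch. The `DETest` record and its field names are hypothetical. The sign convention is an assumption inferred from the caption, where features repressed after the knockout sit over the positive x-axis: the fold change is taken as log2(WT / KO), so a positive value means lower expression in the KO.

```python
from dataclasses import dataclass


@dataclass
class DETest:
    """Hypothetical record for one differential expression test."""
    feature_id: str
    log2_wt_over_ko: float  # assumed convention: log2(WT expression / KO expression)
    significant: bool       # True if the test was called significant


def summarize(tests):
    """Count significant features by direction of change after the knockout."""
    repressed = sum(1 for t in tests if t.significant and t.log2_wt_over_ko > 0)
    activated = sum(1 for t in tests if t.significant and t.log2_wt_over_ko < 0)
    return {"repressed_in_ko": repressed, "activated_in_ko": activated}


if __name__ == "__main__":
    # Toy values, not data from the study.
    demo = [
        DETest("g1", 2.3, True),
        DETest("g2", -1.1, True),
        DETest("g3", 0.4, False),
    ]
    print(summarize(demo))
```

If the DE tool reports the opposite ratio (KO over WT), the two comparisons swap, so the convention should be checked against the tool's output before counting.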
Similar articles
- Systematic identification of long intergenic non-coding RNAs expressed in bovine oocytes. Wang J, Koganti PP, Yao J. Reprod Biol Endocrinol. 2020 Feb 21;18(1):13. doi: 10.1186/s12958-020-00573-4. PMID: 32085734. Free PMC article.
- In silico prediction of long intergenic non-coding RNAs in sheep. Bakhtiarizadeh MR, Hosseinpour B, Arefnezhad B, Shamabadi N, Salami SA. Genome. 2016 Apr;59(4):263-75. doi: 10.1139/gen-2015-0141. Epub 2016 Feb 19. PMID: 27002388.
- Preliminary RNA-Seq Analysis of Long Non-Coding RNAs Expressed in Human Term Placenta. Majewska M, Lipka A, Paukszto L, Jastrzebski JP, Gowkielewicz M, Jozwik M, Majewski MK. Int J Mol Sci. 2018 Jun 27;19(7):1894. doi: 10.3390/ijms19071894. PMID: 29954144. Free PMC article.
- Roles of long noncoding RNAs in brain development, functional diversification and neurodegenerative diseases. Wu P, Zuo X, Deng H, Liu X, Liu L, Ji A. Brain Res Bull. 2013 Aug;97:69-80. doi: 10.1016/j.brainresbull.2013.06.001. Epub 2013 Jun 10. PMID: 23756188. Review.
- Characterizing and annotating the genome using RNA-seq data. Chen G, Shi T, Shi L. Sci China Life Sci. 2017 Feb;60(2):116-125. doi: 10.1007/s11427-015-0349-4. Epub 2016 Jun 13. PMID: 27294835. Review.
Cited by
- Genome-Wide Identification of Long Non-Coding RNAs and Their Regulatory Networks Involved in Apis mellifera ligustica Response to Nosema ceranae Infection. Chen D, Chen H, Du Y, Zhou D, Geng S, Wang H, Wan J, Xiong C, Zheng Y, Guo R. Insects. 2019 Aug 9;10(8):245. doi: 10.3390/insects10080245. PMID: 31405016. Free PMC article.
- Long Non-Coding RNA as Potential Biomarker for Prostate Cancer: Is It Making a Difference? Deng J, Tang J, Wang G, Zhu YS. Int J Environ Res Public Health. 2017 Mar 7;14(3):270. doi: 10.3390/ijerph14030270. PMID: 28272371. Free PMC article. Review.
- Genome-wide discovery of long intergenic noncoding RNAs and their epigenetic signatures in the rat. Li A, Zhou ZY, Hei X, Otecko NO, Zhang J, Liu Y, Zhou H, Zhao Z, Wang L. Sci Rep. 2017 Nov 1;7(1):14817. doi: 10.1038/s41598-017-13844-9. PMID: 29093522. Free PMC article.
- Long non-coding RNA discovery across the genus anopheles reveals conserved secondary structures within and beyond the Gambiae complex. Jenkins AM, Waterhouse RM, Muskavitch MA. BMC Genomics. 2015 Apr 23;16(1):337. doi: 10.1186/s12864-015-1507-3. PMID: 25903279. Free PMC article.
- Comprehensive transcriptome and methylome analysis delineates the biological basis of hair follicle development and wool-related traits in Merino sheep. Zhao B, Luo H, He J, Huang X, Chen S, Fu X, Zeng W, Tian Y, Liu S, Li CJ, Liu GE, Fang L, Zhang S, Tian K. BMC Biol. 2021 Sep 9;19(1):197. doi: 10.1186/s12915-021-01127-9. PMID: 34503498. Free PMC article.
References
- Baker M. Long noncoding RNAs: the search for function. Nat Meth. 2011;8(5):379–383. doi: 10.1038/nmeth0511-379.
- Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermüller J, Hofacker IL, Bell I, Cheung E, Drenkow J, Dumais E, Patel S, Helt G, Ganesh M, Ghosh S, Piccolboni A, Sementchenko V, Tammana H, Gingeras TR. RNA Maps Reveal New RNA Classes and a Possible Function for Pervasive Transcription. Science. 2007;316(5830):1484–1488. doi: 10.1126/science.1138341.