miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades
Marc R Friedländer et al. Nucleic Acids Res. 2012 Jan.
Abstract
microRNAs (miRNAs) are a large class of small non-coding RNAs which post-transcriptionally regulate the expression of a large fraction of all animal genes and are important in a wide range of biological processes. Recent advances in high-throughput sequencing allow miRNA detection at unprecedented sensitivity, but the computational task of accurately identifying the miRNAs in the background of sequenced RNAs remains challenging. For this purpose, we have designed miRDeep2, a substantially improved algorithm which identifies canonical and non-canonical miRNAs such as those derived from transposable elements and informs on high-confidence candidates that are detected in multiple independent samples. Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6-99.9% and reported hundreds of novel miRNAs. To test the accuracy of miRDeep2, we knocked down the miRNA biogenesis pathway in a human cell line and sequenced small RNAs before and after. The vast majority of the >100 novel miRNAs expressed in this cell line were indeed specifically downregulated, validating most miRDeep2 predictions. Last, a new miRNA expression profiling routine, low time and memory usage and user-friendly interactive graphic output can make miRDeep2 useful to a wide range of researchers.
Figures
Figure 1.
Flow charts of modules. Flow charts for (A) the miRDeep2 module (identifies known and novel miRNAs in high-throughput sequencing data), (B) the Mapper module (processes Illumina output and maps it to the reference genome) and (C) the Quantifier module (sums up read counts for known miRNAs in a sequencing data set). For each module the input, internal work flow (in black borders) and output are shown. Files are presented in rectangular boxes; processes are presented in rounded boxes. Files and processes that are novel to miRDeep2 are in yellow. Files and processes that have been modified are in green. Those that remain largely unchanged from the first version of miRDeep are in blue, while those that are optional are in grey. The file formats are: .fa, fasta; .arf, arf mapping format; .str, RNAfold output; .rand, randfold output; .mrd, miRDeep2 text output; .csv, csv spreadsheet; _seq.txt, raw sequence output from the Illumina platform; seq, sequence given on command line (see online documentation for description of formats). The ‘work flow of miRDeep2 modules’ results section contains detailed descriptions of all steps.
Figure 2.
Novel human miRNA detected in three independent liver samples. The upper left table gives the miRDeep2 score break-down for the reported miRNA, along with read counts for the mature, loop and star sequence. The upper right figure shows the predicted RNA secondary structure of the hairpin, partitioned according to miRNA biogenesis: red, mature; yellow, loop; purple, star. The middle density plot shows the distribution of reads in the predicted precursor sequence. The sequences below indicate the positions of the mature, loop and star strand. The positions of the star strand as expected from Drosha/Dicer processing are shown in light blue, while the star consensus positions as observed from the sequencing data are shown in purple. The dotted lines below show the aligned reads. Mismatched nucleotides are presented in upper case. mm, number of mismatches. Both mature and star miRNA strands of this particular miRNA are detected in each of three independent liver samples (NL1-NL3).
Figure 3.
miRDeep2 performance on sequencing data from seven animal clades. miRDeep2 was run on Illumina sequencing data from seven animal species, representing deuterostomes (human, mouse, sea squirts), ecdysozoans (fruit fly, nematode), lophotrochozoans (planaria) and non-bilaterians (sea anemone). Accuracy is calculated as accuracy = sensitivity × prevalence + specificity × (1 - prevalence) and ranges from 98.6% to 99.9%. Sensitivity is calculated as the fraction of correctly classified miRNA loci. Specificity is the fraction of correctly classified non-miRNA loci. Prevalence is the fraction of analyzed loci which are miRNA loci. In the calculations, miRNA loci are set equivalent to miRBase miRNA loci; for a discussion of this assumption, see the ‘Results’ section. The true positive rate of novel miRNAs is estimated from the miRDeep2 built-in controls. In five of the seven species, both mature and star strand of novel miRDeep2 miRNAs were detected in at least two independent samples. No such independent detection was possible in the sea anemone data, because data from only a single sample were analyzed.
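The accuracy definition in the Figure 3 caption can be illustrated with a short calculation. The following is a minimal sketch, not code from miRDeep2 itself: the function names and the locus counts in the usage example are hypothetical, and the formula is read as sensitivity weighted by prevalence plus specificity weighted by one minus prevalence.

```python
def accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Prevalence-weighted accuracy as defined in the Figure 3 caption."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


def accuracy_from_counts(tp: int, fn: int, tn: int, fp: int) -> float:
    """Derive the three rates from locus counts, then apply the formula.

    tp/fn: miRNA loci classified correctly/incorrectly
    tn/fp: non-miRNA loci classified correctly/incorrectly
    """
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)   # fraction of miRNA loci recovered
    specificity = tn / (tn + fp)   # fraction of non-miRNA loci rejected
    prevalence = (tp + fn) / total  # fraction of loci that are miRNA loci
    return accuracy(sensitivity, specificity, prevalence)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = accuracy_from_counts(tp=80, fn=20, tn=9890, fp=10)
print(f"accuracy = {example:.3f}")
```

Because miRNA loci are a small fraction of all analyzed loci, the specificity term dominates this measure; the result is algebraically the same as (true positives + true negatives) / all loci.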
Figure 4.
Effect of Dicer silencing on small RNA expression. RNA interference was used to silence Dicer in an MCF-7 cell line. (A) Schematic representation of the experiment; levels of Dicer mRNA in total cells (B) or cytoplasm (C) before and after silencing. Fold-changes in small RNA expression are noted for (D) snoRNAs, (E) tRNAs, (F) rRNAs, (G) genomic control sequences, (H) miRBase miRNAs, (I) novel miRNAs reported by miRDeep2. The median fold-change is indicated above each plot. Predictions by miRDeep2 were compared with those of (J) miRanalyzer, (K) MIReNA and (L) miRTRAP. The predicted precursors are assigned to sets based on sequencing support, e.g. all the precursors in sets labeled ‘5’ are supported by five or more sequencing reads (control + siDicer). Precursors reported only by miRDeep2 are in blue, precursors reported only by the competing program are in orange. Precursors reported by both are in purple.
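The median fold-change reported above each panel of Figure 4 can be sketched as follows. This is an illustrative calculation only: the caption does not describe the normalization used, so the sketch assumes read counts that are already comparable between the two libraries, and the pseudocount, function names and example counts are hypothetical.

```python
from statistics import median


def fold_changes(control: dict[str, float],
                 knockdown: dict[str, float],
                 pseudocount: float = 1.0) -> dict[str, float]:
    """Per-locus fold-change (knockdown / control).

    The pseudocount is an assumption of this sketch; it avoids division
    by zero for loci with no reads in the control library.
    """
    return {
        locus: (knockdown.get(locus, 0.0) + pseudocount)
               / (count + pseudocount)
        for locus, count in control.items()
    }


def median_fold_change(control: dict[str, float],
                       knockdown: dict[str, float],
                       pseudocount: float = 1.0) -> float:
    """Median of the per-locus fold-changes for one class of small RNAs."""
    return median(fold_changes(control, knockdown, pseudocount).values())


# Hypothetical read counts for three loci of one RNA class.
control_counts = {"locus_a": 99, "locus_b": 9, "locus_c": 3}
sidicer_counts = {"locus_a": 24, "locus_b": 4, "locus_c": 3}
print(median_fold_change(control_counts, sidicer_counts))
```

A median below 1 for a class of loci indicates that most of its members decreased after Dicer silencing, which is the pattern expected for genuine miRNAs but not for control classes such as tRNAs or rRNAs.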
Figure 5.
Example miRDeep2 output: performance survey and novel human miRNAs. For each analysis, miRDeep2 outputs a single .html page that links to all results generated by the module. At the top of the .html file is a survey of miRDeep2 performance for varying score cut-offs, providing estimates of sensitivity and number of true positive novel miRNAs. Below is a table of novel miRNAs discovered in the sequencing data. Each line includes the following information on one novel miRNA candidate: the miRDeep2 score, the probability that the miRNA candidate is genuine given the evidence from the sequencing data, sequence and read count summaries, a link to a graphic representation of structure and read signature (example seen in Figure 2), a link to the UCSC genome browser for the species analyzed, and a link to NCBI blast results for the candidate precursor sequence.