miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades

Marc R Friedländer et al. Nucleic Acids Res. 2012 Jan.

Abstract

microRNAs (miRNAs) are a large class of small non-coding RNAs which post-transcriptionally regulate the expression of a large fraction of all animal genes and are important in a wide range of biological processes. Recent advances in high-throughput sequencing allow miRNA detection at unprecedented sensitivity, but the computational task of accurately identifying the miRNAs in the background of sequenced RNAs remains challenging. For this purpose, we have designed miRDeep2, a substantially improved algorithm which identifies canonical and non-canonical miRNAs such as those derived from transposable elements and informs on high-confidence candidates that are detected in multiple independent samples. Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6-99.9% and reported hundreds of novel miRNAs. To test the accuracy of miRDeep2, we knocked down the miRNA biogenesis pathway in a human cell line and sequenced small RNAs before and after. The vast majority of the >100 novel miRNAs expressed in this cell line were indeed specifically downregulated, validating most miRDeep2 predictions. Last, a new miRNA expression profiling routine, low time and memory usage and user-friendly interactive graphic output can make miRDeep2 useful to a wide range of researchers.

Figures

Figure 1.

Flow charts of modules. Flow charts for (A) the miRDeep2 module (identifies known and novel miRNAs in high-throughput sequencing data), (B) the Mapper module (processes Illumina output and maps it to the reference genome) and (C) the Quantifier module (sums up read counts for known miRNAs in a sequencing data set). For each module, the input, internal work flow (in black borders) and output are shown. Files are presented in rectangular boxes; processes are presented in rounded boxes. Files and processes that are novel to miRDeep2 are in yellow. Files and processes that have been modified are in green. Those that remain largely unchanged from the first version of miRDeep are in blue, while those that are optional are in grey. The file formats are: .fa, fasta; .arf, arf mapping format; .str, RNAfold output; .rand, randfold output; .mrd, miRDeep2 text output; .csv, csv spreadsheet; _seq.txt, raw sequence output from the Illumina platform; seq, sequence given on the command line (see online documentation for a description of the formats). The ‘work flow of miRDeep2 modules’ results section contains detailed descriptions of all steps.

Figure 2.

Novel human miRNA detected in three independent liver samples. The upper left table gives the miRDeep2 score break-down for the reported miRNA, along with read counts for the mature, loop and star sequences. The upper right figure shows the predicted RNA secondary structure of the hairpin, partitioned according to miRNA biogenesis: red, mature; yellow, loop; purple, star. The middle density plot shows the distribution of reads in the predicted precursor sequence. The sequences below indicate the positions of the mature, loop and star strands. The positions of the star strand as expected from Drosha/Dicer processing are shown in light blue, while the star consensus positions as observed from the sequencing data are shown in purple. The dotted lines below show the aligned reads. Mismatched nucleotides are presented in upper case. mm, number of mismatches. Both mature and star miRNA strands of this particular miRNA are detected in each of three independent liver samples (NL1-NL3).

Figure 3.

miRDeep2 performance on sequencing data from seven animal clades. miRDeep2 was run on Illumina sequencing data from seven animal species, representing deuterostomes (human, mouse, sea squirts), ecdysozoans (fruit fly, nematode), lophotrochozoans (planaria) and non-bilaterians (sea anemone). Accuracy is calculated as accuracy = sensitivity × prevalence + specificity × (1 − prevalence) and ranges from 98.6% to 99.9%. Sensitivity is the fraction of correctly classified miRNA loci. Specificity is the fraction of correctly classified non-miRNA loci. Prevalence is the fraction of analyzed loci which are miRNA loci. In the calculations, miRNA loci are set equivalent to miRBase miRNA loci; for a discussion of this assumption, see the ‘Results’ section. The true positive rate of novel miRNAs is estimated from the miRDeep2 built-in controls. In five of the seven species, both mature and star strands of novel miRDeep2 miRNAs were detected in at least two independent samples. No such independent detection was possible in the sea anemone data, as data from a single sample were analyzed.
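The accuracy formula in this caption can be illustrated with a minimal Python sketch. The function names and the locus counts below are hypothetical and chosen only for illustration; they are not values from the paper, and this is not miRDeep2's own code. The sketch also shows that the prevalence-weighted form equals the usual (true positives + true negatives) / total.

```python
def rates_from_counts(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Derive sensitivity, specificity and prevalence from classification counts.

    tp/fn: miRNA loci classified correctly/incorrectly.
    tn/fp: non-miRNA loci classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)   # fraction of correctly classified miRNA loci
    specificity = tn / (tn + fp)   # fraction of correctly classified non-miRNA loci
    prevalence = (tp + fn) / total  # fraction of analyzed loci that are miRNA loci
    return sensitivity, specificity, prevalence


def accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """accuracy = sensitivity * prevalence + specificity * (1 - prevalence)."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Hypothetical counts (illustration only, not data from the study).
tp, fn, tn, fp = 80, 20, 9890, 10
sens, spec, prev = rates_from_counts(tp, fn, tn, fp)
acc = accuracy(sens, spec, prev)
print(f"sensitivity={sens:.3f} specificity={spec:.4f} prevalence={prev:.3f} accuracy={acc:.4f}")
```

Because miRNA loci are typically a small fraction of all analyzed loci, the accuracy is dominated by the specificity term, which is why it can be very high even when sensitivity is moderate.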

Figure 4.

Effect of Dicer silencing on small RNA expression. RNA interference was used to silence Dicer in an MCF-7 cell line. (A) Schematic representation of the experiment; levels of Dicer mRNA in total cells (B) or cytoplasm (C) before and after silencing. Fold-changes in small RNA expression are noted for (D) snoRNAs, (E) tRNAs, (F) rRNAs, (G) genomic control sequences, (H) miRBase miRNAs, (I) novel miRNAs reported by miRDeep2. The median fold-change is indicated above each plot. Predictions by miRDeep2 are compared with those of (J) miRanalyzer, (K) MIReNA and (L) miRTRAP. The predicted precursors are assigned to sets based on sequencing support, e.g. all the precursors in sets labeled ‘5’ are supported by five or more sequencing reads (control + siDicer). Precursors reported only by miRDeep2 are in blue; precursors reported only by the competing program are in orange. Precursors reported by both are in purple.
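The quantities in this caption (median fold-change, read-support sets, and the three-way split of predictions) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' analysis: the function names, the pseudocount of 1, the use of raw rather than library-size-normalized counts, and all read counts are hypothetical.

```python
from statistics import median


def median_fold_change(control: dict[str, int], knockdown: dict[str, int],
                       pseudocount: float = 1.0) -> float:
    """Median knockdown/control read-count ratio over RNAs seen in the control.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    ratios = [(knockdown.get(name, 0) + pseudocount) / (count + pseudocount)
              for name, count in control.items()]
    return median(ratios)


def supported(control: dict[str, int], knockdown: dict[str, int],
              min_reads: int) -> set[str]:
    """Precursors supported by at least `min_reads` reads (control + siDicer)."""
    names = set(control) | set(knockdown)
    return {n for n in names
            if control.get(n, 0) + knockdown.get(n, 0) >= min_reads}


def compare_predictions(mirdeep2: set[str], other: set[str]) -> dict[str, set[str]]:
    """Split predicted precursors into the three groups shown in panels J-L."""
    return {
        "miRDeep2 only": mirdeep2 - other,   # blue
        "competitor only": other - mirdeep2,  # orange
        "both": mirdeep2 & other,             # purple
    }


# Hypothetical read counts (illustration only).
control = {"pre-a": 99, "pre-b": 9, "pre-c": 3}
si_dicer = {"pre-a": 49, "pre-b": 4, "pre-c": 3}

print(median_fold_change(control, si_dicer))
print(sorted(supported(control, si_dicer, min_reads=10)))
print(compare_predictions({"pre-a", "pre-b"}, {"pre-b", "pre-c"}))
```

A median fold-change below 1 for a class of small RNAs indicates that most of its members decrease after Dicer silencing, which is the pattern expected for genuine miRNAs.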

Figure 5.

Example miRDeep2 output: performance survey and novel human miRNAs. For each analysis, miRDeep2 outputs a single .html page that links to all results generated by the module. At the top of the .html file is a survey of miRDeep2 performance for varying score cut-offs, providing estimates of sensitivity and the number of true positive novel miRNAs. Below is a table of novel miRNAs discovered in the sequencing data. Each line includes the following information on one novel miRNA candidate: the miRDeep2 score, the probability that the miRNA candidate is genuine given the evidence from the sequencing data, sequence and read count summaries, a link to a graphic representation of structure and read signature (example seen in Figure 2), a link to the UCSC genome browser for the species analyzed, and a link to NCBI blast results for the candidate precursor sequence.
