A Uniform System for the Annotation of Vertebrate microRNA Genes and the Evolution of the Human microRNAome

MirGeneDB2.0: the curated microRNA Gene Database

Non-coding RNAs (ncRNA), a significant part of the increasingly popular dark matter of the human genome, have gained substantial attention due to their involvement in animal development and human disorders such as cardiovascular diseases and cancer. Although many different types of regulatory ncRNAs have been discovered over the last 25 years, microRNAs (miRNAs) are unique within these as they are the only class of ncRNAs with individual genes sequentially conserved across the animal kingdom. Because of the conserved roles miRNAs play in establishing robustness of gene regulatory networks across Metazoa, it is important that homologous miRNAs in different species are correctly identified, annotated, and named using consistent criteria against the backdrop of numerous other types of coding and non-coding RNA fragments.

miRNEST database: an integrative approach in microRNA search and annotation

Nucleic Acids Research, 2012

Despite accumulating data on animal and plant microRNAs and their functions, existing public miRNA resources usually collect miRNAs from a very limited number of species. Many microRNAs, including those from model organisms, remain undiscovered. As a result, there is a continuous need to search for new microRNAs. We present miRNEST (http://mirnest.amu.edu.pl), a comprehensive database of animal, plant and virus microRNAs. The core part of the database is built from our miRNA predictions conducted on Expressed Sequence Tags of 225 animal and 202 plant species. The miRNA search was performed based on sequence similarity, and as many as 10 004 miRNA candidates in 221 animal and 199 plant species were discovered. Of these, only 299 have already been deposited in miRBase. Additionally, miRNEST has been integrated with external miRNA data from the literature and 13 databases, including miRNA sequences, small RNA sequencing data, expression, polymorphism and target data, as well as links to external miRNA resources, whenever applicable. All this makes miRNEST a considerable miRNA resource in terms of the number of species covered (544), integrating scattered miRNA data into a uniform format with a user-friendly web interface.

miRBase: from microRNA sequences to function

Nucleic Acids Research, 2018

miRBase catalogs, names and distributes microRNA gene sequences. The latest release of miRBase (v22) contains microRNA sequences from 271 organisms: 38 589 hairpin precursors and 48 860 mature microRNAs. We describe improvements to the database and website to provide more information about the quality of microRNA gene annotations, and the cellular functions of their products. We have collected 1493 small RNA deep sequencing datasets and mapped a total of 5.5 billion reads to microRNA sequences. The read mapping patterns provide strong support for the validity of between 20% and 65% of microRNA annotations in different well-studied animal genomes, and evidence for the removal of >200 sequences from the database. To improve the availability of microRNA functional information, we are disseminating Gene Ontology terms annotated against miRBase sequences. We have also used a text-mining approach to search for microRNA gene names in the full-text of open access articles. Over 500 000 sentences from 18 542 papers contain microRNA names. We score these sentences for functional information and link them with 12 519 microRNA entries. The sentences themselves, and word clouds built from them, provide effective summaries of the functional information about specific microRNAs. miRBase is publicly and freely available at http://mirbase.org/.

miROrtho: computational survey of microRNA genes

Nucleic Acids Research, 2009

MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature ~22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database (http://cegg.unige.ch/mirortho) presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support.

MirGeneDB 2.0: the metazoan microRNA complement

Nucleic Acids Research, 2019

Small non-coding RNAs have gained substantial attention due to their roles in animal development and human disorders. Among them, microRNAs are special because individual gene sequences are conserved across the animal kingdom. In addition, unique and mechanistically well understood features can clearly distinguish bona fide miRNAs from the myriad other small RNAs generated by cells. However, making this distinction is not a common practice and, thus, not surprisingly, the heterogeneous quality of available miRNA complements has become a major concern in microRNA research. We addressed this by extensively expanding our curated microRNA gene database, MirGeneDB, to 45 organisms, encompassing a wide phylogenetic swath of animal evolution. By consistently annotating and naming 10,899 microRNA genes in these organisms, we show that previous microRNA annotations contained not only many false positives, but surprisingly lacked >2000 bona fide microRNAs. Indeed, curated microRNA complements of closely related organisms are very similar and can be used to reconstruct ancestral miRNA repertoires. MirGeneDB represents a robust platform for microRNA-based research, providing deeper and more significant insights into the biology and evolution of miRNAs as well as biomedical and biomarker research. MirGeneDB is publicly and freely available at http://mirgenedb.org/.

Large-scale validation of miRNAs by disease association, evolutionary conservation and pathway activity

RNA Biology, 2018

The validation of microRNAs (miRNAs) identified by next generation sequencing involves amplification-free and hybridization-based detection of transcripts as criteria for confirming valid miRNAs. Since such validation is frequently not performed, miRNA repositories likely still contain a substantial fraction of false positive candidates, while true miRNAs are not yet stored in the repositories. Especially if downstream analyses are performed with these candidates (e.g. target or pathway prediction), the results may be misleading. In the present study, we evaluated 558 mature miRNAs from miRBase and 1,709 miRNA candidates from next generation sequencing experiments by amplification-free hybridization and investigated their distributions in patients with various disease conditions. Notably, the most significant miRNAs in diseases are often not contained in miRBase. However, these candidates are evolutionarily highly conserved. From the expression patterns, target gene and pathway analyses and evolutionary conservation analyses, we were able to shed light on the complexity of miRNAs in humans. Our data also highlight that a more thorough validation of miRNAs identified by next generation sequencing is required. The results are available in miRCarta (https://mircarta.cs.uni-saarland.de).

Towards a Consistent, Quantitative Evaluation of MicroRNA Evolution

Journal of integrative bioinformatics, 2017

The miRBase currently reports more than 25,000 microRNAs in several hundred genomes that belong to more than 1000 families of homologous sequences. Quantitative investigations of miRNA gene evolution require the construction of data sets that are consistent in their coverage and include those genomes that are of interest in a given study. Given the size and structure of the data, this can be achieved only with the help of a fully automatic pipeline that improves the available seed alignments, extends the set of available sequences by homology search, and reliably identifies true positive homology search results. Here we describe the current progress towards such a system, emphasizing the task of improving and completing the initial seed alignment.

Large-scale genome analysis reveals unique features of microRNAs

Gene, 2009

Keywords: microRNA; Precursor; Minimal folding free energy; Minimal folding free energy index; Metazoan

Although great progress has been made in identifying microRNAs (miRNAs) and their functions, their essential functional features remain largely unknown. In this study, we systematically investigated the nucleotide and thermodynamic folding distribution characteristics of 3853 miRNAs currently reported for metazoans. We determined that uracil is the dominant nucleotide in both mature and precursor sequences, and that it is particularly enriched at three sites in mature miRNAs: the first, ninth, and the five terminal 3′ nucleotides. The location of these enriched uracil nucleotides is particularly interesting because positions one and nine are the edges of the "seed region", which is responsible for targeting mRNAs for gene regulation. The prevalence of U residues at these sites may contribute to the mechanism whereby miRNAs target and bind to their corresponding mRNAs. A comparison of the overall lengths of metazoan pre-miRNAs revealed that they ranged from 53 to 215 nt in length with an average of 88.10 ± 14.14 nt, significantly higher than previously reported. Comparisons of miRNA diversity at different taxonomic levels revealed that the 12 features investigated in this study varied significantly among miRNAs represented by different phyla, with particularly high levels of divergence in platyhelminths relative to nematodes, arthropods or vertebrates. By comparison, lower levels of diversity were observed at lower taxonomic levels such that there was a direct relationship between divergence in miRNA features and taxonomic level. We conclude that large-scale genome analysis shows that miRNAs have many more unique features than previously reported. In particular, the distribution of nucleotides suggests an important role for uracil at the boundaries of the 'seed' region and at their termini. These results will facilitate the design of new computational programs for identifying novel miRNAs and investigating the mechanism of miRNA-mediated gene regulation.
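The positional enrichment of uracil described above can be illustrated with a minimal Python sketch. The function name `positional_base_frequency` and the three toy sequences are hypothetical and are not taken from the study; the sketch only shows how a per-position base frequency might be tallied, not the authors' actual analysis.

```python
from collections import Counter


def positional_base_frequency(sequences, position):
    """Return the fraction of each base at a 1-based position.

    Sequences shorter than `position` are skipped.
    """
    bases = [seq[position - 1].upper() for seq in sequences if len(seq) >= position]
    total = len(bases)
    if total == 0:
        return {}
    counts = Counter(bases)
    return {base: count / total for base, count in counts.items()}


# Invented toy sequences (assumption: NOT real miRNAs), used only to
# exercise the function at positions 1 and 9.
toy_mature = ["UACGGAUCUAGC", "UGGCAUAGUCCA", "AGCUUGACGGAU"]

freq_pos1 = positional_base_frequency(toy_mature, 1)
freq_pos9 = positional_base_frequency(toy_mature, 9)
print("U fraction at position 1:", freq_pos1.get("U", 0.0))
print("U fraction at position 9:", freq_pos9.get("U", 0.0))
```

Applied to a real set of mature sequences, the same tally at each position would show whether uracil stands out at positions one and nine.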

Lowly Expressed Human MicroRNA Genes Evolve Rapidly

Molecular Biology and Evolution, 2009

To study the evolution of human microRNAs (miRNAs), we examined nucleotide variation in humans, sequence divergence between species, and genomic clustering patterns for miRNAs with different expression levels. We found that expression level is a major indicator of the rate of evolution and that ~30% of currently annotated human miRNA genes are almost free of selective pressure.

miRNAMap 2.0: genomic maps of microRNAs in metazoan genomes

Nucleic Acids Research, 2007

MicroRNAs (miRNAs) are small non-coding RNA molecules that can negatively regulate gene expression and thus control numerous cellular mechanisms. This work develops a resource, miRNAMap 2.0, for collecting experimentally verified microRNAs and experimentally verified miRNA target genes in human, mouse, rat and other metazoan genomes. Three computational tools, miRanda, RNAhybrid and TargetScan, were employed to identify miRNA targets in the 3'-UTR of genes as well as the known miRNA targets. Various criteria for filtering the putative miRNA targets are applied to reduce the false positive prediction rate of miRNA target sites. Additionally, miRNA expression profiles can provide valuable clues on the characteristics of miRNAs, including tissue specificity and differential expression in cancer/normal cells. Therefore, quantitative polymerase chain reaction experiments were performed to monitor the expression profiles of 224 human miRNAs in 18 major normal human tissues. The negative correlation between the miRNA expression profile and the expression profiles of its target genes typically helps to elucidate the regulatory functions of the miRNA. The interface has also been redesigned and enhanced. miRNAMap 2.0 is now available at http://miRNAMap.mbc.nctu.edu.tw/.
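The negative correlation between a miRNA expression profile and that of a target gene can be sketched in Python as follows. The abstract does not state which correlation measure miRNAMap uses; the Pearson coefficient here is an assumption, and the function name `pearson` and the expression values are hypothetical illustrations rather than data from the resource.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Invented expression levels across four tissues (assumption: toy values).
mirna_profile = [1.0, 2.0, 3.0, 4.0]
target_profile = [8.0, 6.0, 5.0, 1.0]

r = pearson(mirna_profile, target_profile)
print("correlation:", r)
```

A clearly negative coefficient across tissues is consistent with the miRNA repressing that target, although correlation alone does not establish regulation.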