Origin, Evolution, and Biological Role of miRNA Cluster in DLK-DIO3 Genomic Region in Placental Mammals
Journal Article
Commonwealth Scientific and Industrial Research Organisation CSIRO Livestock Industries, Queensland Bioscience Precinct QBP, St Lucia, Brisbane, Queensland, Australia
Accepted: 04 February 2008
Published: 04 February 2008
Evgeny A. Glazov, Sean McWilliam, Wesley C. Barris, Brian P. Dalrymple, Origin, Evolution, and Biological Role of miRNA Cluster in DLK-DIO3 Genomic Region in Placental Mammals, Molecular Biology and Evolution, Volume 25, Issue 5, May 2008, Pages 939–948, https://doi.org/10.1093/molbev/msn045
Abstract
MicroRNAs (miRNAs) are a rapidly growing family of small regulatory RNAs modulating gene expression in plants and animals. In animals, most of the miRNAs discovered in early studies were found to be evolutionarily conserved across the whole kingdom. More recent studies, however, have identified many miRNAs that are specific to a particular group of organisms or even a single species. This raises a question about the evolution of individual miRNAs and their role in establishing and maintaining lineage-specific functions and characteristics.
In this study, we describe a detailed analysis of the miRNA cluster (hereafter mir-379/mir-656 cluster) located within the imprinted DLK-DIO3 region on human chromosome 14. We show that orthologous miRNA clusters are present in all sequenced genomes of the placental (eutherian) mammals but not in the marsupial (metatherian), monotreme (prototherian), or any other vertebrate genomes. We provide evidence that the locus encompassing this cluster emerged in an early eutherian ancestor prior to the radiation of modern placental mammals by tandem duplication of the ancient precursor sequence. The original amplified cluster may have contained in excess of 250 miRNA precursor sequences, most of which now appear to be inactive. Examination of the eutherian genomes showed that the cluster has been maintained in evolution for approximately 100 Myr.
Analysis of genes that contain predicted evolutionarily conserved targets for miRNAs from this cluster revealed significant overrepresentation of the Gene Ontology terms associated with biological processes such as neurogenesis, embryonic development, transcriptional regulation, and RNA metabolism. Consistent with these findings, a survey of the miRNA expression data within the cluster demonstrates a strong bias toward brain and placenta samples from adult organisms and some embryonic tissues.
Our results suggest that emergence of the mir-379/mir-656 miRNA cluster was one of the factors that facilitated evolution of the placental mammals. Overrepresentation of genes involved in regulation of neurogenesis among predicted miRNA targets indicates an important role of the mir-379/mir-656 cluster in this biological process in the placental mammals.
Introduction
MicroRNAs (miRNAs) are small 21–25 nt regulatory RNAs modulating gene expression in animals and plants. In animals, regulation of gene expression by miRNAs is achieved by sequence-specific targeting of the 3′ untranslated regions (UTRs) of messenger RNAs (mRNAs) by the RNA-induced silencing complex, which results in translational repression of protein synthesis (He and Hannon 2004). In the past few years, the number of discovered miRNAs has increased from tens to thousands and is likely to grow further (Griffiths-Jones et al. 2006). Although most of the miRNAs discovered early were found to be highly conserved in evolution, more and more of the newly identified miRNAs are present in only a small group of organisms and in some cases in a single species (Bentwich et al. 2005; Berezikov, Thuemmler, et al. 2006; Berezikov, van Tetering, et al. 2006; Ruby et al. 2007). The functional significance of these evolutionarily divergent miRNAs has not been established experimentally. However, it is hypothesized that these miRNAs might play a role in establishing and maintaining phenotypic diversity between different groups of organisms (Plasterk 2006; Sempere et al. 2006). A few comparative studies have established connections between some miRNAs and evolutionary changes in animal body plan (Tanzer and Stadler 2004; Sempere et al. 2006; Prochnik et al. 2007).
The miRNA cluster mir-379/mir-656 was originally described as 2 families of related repeats adjacent to a small nucleolar RNA cluster located within the imprinted DLK-DIO3 region on human chromosome 14 (Cavaille et al. 2002). Since then, mature miRNAs derived from most of these repeats have been experimentally identified in mouse, rat, chimpanzee, human, and cow (Houbaviy et al. 2003; Lagos-Quintana et al. 2003; Kim et al. 2004; Seitz et al. 2004; Suh et al. 2004; Bentwich et al. 2005; Berezikov, Thuemmler, et al. 2006; Berezikov, van Tetering, et al. 2006; Coutinho et al. 2007). Currently, the miRBase miRNA database contains 38 human and 35 mouse miRNAs that originate from this cluster, which makes it the largest known miRNA cluster in vertebrates (Griffiths-Jones et al. 2005). In mouse, some of these miRNAs were shown to be expressed as products of a large noncoding transcript named Mirg (Seitz et al. 2003).
We used a combination of comparative genomics and bioinformatics approaches to examine the evolutionary history of the mir-379/mir-656 cluster and its function in vertebrate biology and evolution.
Materials and Methods
Sources of Sequences and Assemblies
Draft genome assemblies of armadillo, elephant, and tenrec were produced by the Broad Institute at Massachusetts Institute of Technology and Harvard (http://www.broad.mit.edu/). Draft genome assembly of platypus genome was produced by the Genome Sequencing Center at Washington University School of Medicine in St Louis (http://genome.wustl.edu/genome_group_index.cgi). Draft genome assembly of cow genome was produced by Baylor College of Medicine Sequencing Center (http://www.hgsc.bcm.tmc.edu/projects/bovine/). Human, chimpanzee, mouse, rat, dog, possum, chicken, and puffer fish genomes were produced by their respective genome sequencing consortiums (Lander et al. 2001; Aparicio et al. 2002; Waterston et al. 2002; Consortium 2004, 2005; Gibbs et al. 2004; Lindblad-Toh et al. 2005; Mikkelsen et al. 2007). Unless specified otherwise, sequences of miRNA precursors and mature miRNAs were obtained from the latest release of miRNA registry (miRBase 10.0, August 2007, http://microrna.sanger.ac.uk/sequences/) (Griffiths-Jones et al. 2005, 2006).
Sequence Searches and Analysis
Pairwise whole-genome sequence alignments and “RefSeq” gene annotation data were obtained from the UCSC genome browser (http://genome.ucsc.edu/) (Kent et al. 2002, 2003; Karolchik et al. 2003). DLK-DIO3 syntenic regions were identified using UCSC whole-genome chained sequence alignments as described by Kent et al. (2003). The regions from different species were considered syntenic if the gene order was preserved. BLAT was used for sequence similarity cross-searches between different genomes (Kent 2002). BLAT parameters were determined empirically by searching for known human and mouse miRNAs reciprocally in the genomes of these 2 organisms. Maximum search sensitivity was achieved with a tile size set to 6 and overall minimum sequence identity set to 65%. These parameters were used to query vertebrate genomes using known human and mouse pre-miRNA sequences. Only the alignments covering at least 90% of the query pre-miRNA sequence were considered as orthologous pre-miRNA candidate sequences. The following genome assemblies were used in this study: human, hg18; chimpanzee, panTro2; mouse, mm8; rat, rn3; dog, canFam2; cow, bTau2; chicken, galGal3; elephant, loxAfr1; armadillo, dasNov1; opossum, monDom4; tenrec, echTel1; and fugu, fr2.
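The 90% query-coverage criterion can be illustrated with a short Python sketch that filters BLAT output in PSL format. This is only an illustration under stated assumptions, not the procedure actually used in the study: coverage is taken here as the aligned query span divided by query length, identity is a simplified ratio of matching to aligned bases (not BLAT's own score), and the query and target names in the example records are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class PslHit(NamedTuple):
    query: str
    target: str
    coverage: float   # fraction of the query spanned by the alignment
    identity: float   # simplified identity over aligned bases


def parse_psl_line(line: str) -> PslHit:
    """Parse one tab-separated PSL record (21 standard columns, no header)."""
    f = line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_name, q_size, q_start, q_end = f[9], int(f[10]), int(f[11]), int(f[12])
    t_name = f[13]
    aligned = matches + mismatches + rep_matches
    coverage = (q_end - q_start) / q_size
    identity = (matches + rep_matches) / aligned if aligned else 0.0
    return PslHit(q_name, t_name, coverage, identity)


def candidate_orthologs(psl_lines: Iterable[str],
                        min_coverage: float = 0.90,
                        min_identity: float = 0.65) -> List[PslHit]:
    """Keep hits that span at least min_coverage of the query pre-miRNA."""
    hits = (parse_psl_line(l) for l in psl_lines if l.strip())
    return [h for h in hits
            if h.coverage >= min_coverage and h.identity >= min_identity]


# Two made-up PSL records for a hypothetical 95-nt query.
EXAMPLE_PSL = [
    "\t".join(["80", "10", "0", "0", "0", "0", "0", "0", "+", "query_A", "95",
               "2", "92", "chrT", "100000", "500", "590", "1", "90,", "2,", "500,"]),
    "\t".join(["50", "5", "0", "0", "0", "0", "0", "0", "+", "query_B", "95",
               "10", "65", "chrT", "100000", "900", "955", "1", "55,", "10,", "900,"]),
]

kept = candidate_orthologs(EXAMPLE_PSL)
```

In this toy input only `query_A` is retained, because its alignment spans 90 of 95 query bases, whereas `query_B` covers only 55 of 95.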
Detection and Analysis of Sequence Motifs
Sequence motifs were identified using the MEME algorithm at the San Diego Supercomputing Center Web site (http://meme.sdsc.edu/meme/intro.html) (Bailey and Elkan 1994). The following parameters were applied: model = tcm, minimum width = 6, maximum width = 100, minimum sites = 2, maximum sites = 300. Sequence logos were generated using WebLogo 2.8.2 at http://weblogo.berkeley.edu/logo.cgi (Crooks et al. 2004).
The miRNA Target Genes, Gene Ontology Enrichment, and P Values
Predicted miRNA target genes were obtained from the TargetScan 4.0 Web site (http://www.targetscan.org/) (Lewis et al. 2003). To reduce the false-positive rate of miRNA target prediction in our analyses, we considered a gene to be a true miRNA target if it contained at least 2 evolutionarily conserved miRNA target sites within its 3′ UTR. Gene Ontology (GO) annotations were downloaded from the GO consortium Web site (April 2007, http://www.geneontology.org/) (Camon et al. 2004; Harris et al. 2004). “Known Isoforms” identifiers for UCSC human (hg18) and mouse (mm8) “Known Genes” were used to make sure that a gene was only counted once where there were multiple isoforms. A Perl script and Structured Query Language code were created to calculate enrichment of terms and “Fisher's exact” P values against a background of all GO-annotated genes in the UCSC Known Genes database. For significance, we required at least 2-fold enrichment, P < 1 × 10⁻⁵, and at least 10 associated Known Genes in the target genes sample.
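The enrichment test and the three significance cutoffs can be sketched in Python as follows. The original analysis used a Perl script and SQL, so this is a substitute illustration rather than the authors' code. It assumes a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution); whether the original test was one- or two-sided is not stated. All function names and the gene counts in the example are hypothetical.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Ratio of the term frequency among targets (k/n) to background (K/N)."""
    return (k / n) / (K / N)


def fisher_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact P value, P(X >= k).

    N: GO-annotated background genes; K: background genes with the term;
    n: target genes; k: target genes with the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def is_significant(k: int, n: int, K: int, N: int,
                   min_fold: float = 2.0,
                   max_p: float = 1e-5,
                   min_genes: int = 10) -> bool:
    """Apply the three cutoffs: fold enrichment, P value, and gene count."""
    return (k >= min_genes
            and fold_enrichment(k, n, K, N) >= min_fold
            and fisher_upper_tail(k, n, K, N) < max_p)


# Hypothetical counts: 30 of 500 targets carry a term found in 200 of 20000 genes.
example_call = is_significant(30, 500, 200, 20000)
```

With these made-up counts the term is 6-fold enriched and passes all three cutoffs; a term with fewer than 10 associated target genes would be rejected regardless of its P value.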
Results
Evolution of the mir-379/mir-656 Cluster
To address the evolutionary origin of the mir-379/mir-656 cluster, we examined the sequenced genomes of 11 mammals, chicken, and puffer fish. Using whole-genome sequence alignments from the UCSC genome browser, we searched for the DLK1-DIO3 syntenic regions in the assembled vertebrate genomes. We found that DLK1-DIO3 synteny is maintained in all mammalian and bird genomes but is absent in the puffer fish genome (table 1).
Table 1
Synteny of the mir-379/mir-656 Cluster
Organism | Genome Size (Gb) | Chromosome, Scaffold, Contig | DLK1-DIO3 Locus Size (kb) | mir-379/mir-656 Size (kb) |
---|---|---|---|---|
Human | 2.8 | chr14 | 826.2 | 44.7 |
Chimp | 2.8 | chr14 | 846.3 | 46.5 |
Mouse | 2.5 | chr12 | 818.5 | 35.4 |
Rat | 2.7 | chr6 | ND | 37.9 |
Dog | 2.4 | chr8 | 726.7 | 41.4 |
Cow | 2.4 | chr21 | 859.5 | 42.9 |
Armadillo | ∼3.0 | Scaffold 5303, scaffold 19965 | ND | ∼38 |
Elephant | ∼3.0 | Scaffold 4770 | ND | ND |
Tenrec | ∼3.0 | ND | ND | ND |
Possum | 3.4 | chr1 | 1603.5 | A |
Platypus | ∼3.0 | Ultracontig 378 | ∼700 | A |
Chicken | 1.0 | chr5 | 346.3 | A |
Puffer fish | 0.33 | chrUn | ND | A |
NOTE.—The table summarizes sizes and genomic locations of DLK1-DIO3 syntenic loci and the miRNAs clusters in different vertebrate genomes. Where whole-genome assemblies are available, the chromosomes bearing DLK1-DIO3 loci and the miRNA cluster are listed. Genomic scaffold or contig numbers are provided for partially assembled genomes. ND, not determined; A, absent; the miRNA cluster is completely absent in nonplacental vertebrates.
Table 2
Evolution of the mir-379/mir-656 miRNA Cluster
To identify miRNAs orthologous to human miRNAs from the mir-379/mir-656 cluster within the syntenic DLK1-DIO3 regions of vertebrate genomes, we performed sequence similarity searches using BLAT (Kent 2002). The orthologous miRNA clusters were easily identifiable in all examined genomes of placental mammals but were not detectable in the marsupial (Monodelphis domestica), monotreme (Ornithorhynchus anatinus), or any other nonmammalian vertebrate genome (tables 1 and 2 and Supplementary Material online). These data are consistent with the results of a similar analysis performed by Seitz et al. (2004) in worm (Caenorhabditis elegans), fruit fly (Drosophila melanogaster), and puffer fish (Fugu rubripes) genomes. Together, these results demonstrate that the mir-379/mir-656 cluster is an evolutionary innovation that is uniquely present in the placental mammals.
To examine the evolution of this cluster in detail, we looked at the evolutionary conservation of the individual miRNA sequences within the cluster between different vertebrate genomes. The results of this analysis are summarized in table 2. It is evident that most of the sequences of the known experimentally validated miRNAs are present in all examined genomes of the placental mammals. Although the assemblies of elephant (Loxodonta africana), lesser hedgehog tenrec (Echinops telfairi), and armadillo (Dasypus novemcinctus) genomes are incomplete and the exact evolutionary fate of some miRNAs could not be resolved at present, it is important to note that most of the miRNA sequences from the mir-379/mir-656 cluster are present in these genomes. These 3 species are descendants of the lineages that diverged from the common placental ancestor at the early stages of mammalian evolution (fig. 1). The estimated divergence time between elephant, tenrec, armadillo, and human lineages is approximately 100 Myr, whereas separation of the eutherian lineage from the common mammalian ancestor is thought to have occurred between 180 and 140 MYA (Hedges et al. 2006). This leads us to conclude that the mir-379/mir-656 cluster emerged early in the eutherian lineage prior to radiation of modern placental mammals. The fact that the cluster has been maintained in different groups of placental mammals for approximately 100 Myr without any major structural rearrangements indicates that the whole cluster may function as a coordinated unit with an important biological role in this group of organisms.
FIG. 1.—
Summary of phylogenetic relationships of vertebrate species addressed in this study. The tree structure and estimated divergence times are used with modifications from Hedges et al. (2006) and Murphy et al. (2001).
Origin of the mir-379/mir-656 Cluster, Novel miRNA Candidates, and Regulatory Sequence Motifs
Sequence similarity observed between the individual miRNA precursors within the mir-379/mir-656 cluster led previous studies to conclude that these miRNAs originated from a common ancestral sequence by a process of tandem duplication (Seitz et al. 2004; Hertel et al. 2006). To identify the unit of amplification and to determine whether miRNAs from the cluster may share some regulatory elements, we examined human genomic sequences located between miRNA precursor sequences and 1 kb adjacent to the miRNA cluster on either side. We searched for overrepresented sequence motifs of a variable length using the motif discovery algorithm MEME (Bailey and Elkan 1994). This analysis identified 2 motifs that were very significantly overrepresented within the cluster as compared with a random set of genomic sequences of a similar total length. Motif 1 was 21 nt long and was present 147 times within the mir-379/mir-656 cluster, which corresponds to the MEME-calculated e value of 1.2 × 10⁻²⁰⁹ (fig. 2A). Motif 2 was 23 nt long and was present at 115 sites within the cluster, which corresponds to the MEME-calculated e value of 2.4 × 10⁻⁷⁰ (fig. 2A). Further inspection of the distribution of the motifs within the miRNA cluster revealed that both motifs are often present adjacent to known experimentally validated miRNA precursor sequences, suggesting a regulatory function in expression and/or processing of the primary miRNA transcripts. We also noticed that copies of motif 2 frequently followed a copy of motif 1 and that both motifs have a regular periodic distribution across the ∼45-kb genomic region encompassing the mir-379/mir-656 cluster (supplementary fig. 2, Supplementary Material online). To analyze this further, we calculated distances between neighboring pairs of motifs 1 and 2. After plotting the resulting distribution of the distances, we found that it had 1 major peak at ∼160 bases (fig. 3).
Interestingly, the distribution of the distances constituting this peak strongly resembled length distributions of a subset of the 38 known human miRNA precursor sequences with the adjacent motifs 1 and 2 (fig. 2B). This result suggests that we are observing the vestiges of an original amplified array consisting of an ∼160-base-long repeat unit comprising a single copy each of motif 1, motif 2, and miRNA precursor sequence (fig. 2B). Consistent with this, we were able to identify several additional sequences located within the ∼45-kb locus that share similarity with known active miRNA precursors. Although some of these are only partially similar to the known miRNAs and are likely to be remnants of the ancestral repeat sequences that gave rise to the mir-379/mir-656 cluster, others are highly similar to the known miRNAs and show evolutionary sequence conservation in at least 2 of the examined mammalian genomes. Table 2 shows 13 of these sequences. Importantly, 2 of them were experimentally validated by Berezikov, Thuemmler, et al. (2006) during the course of this study (supplementary table 2, Supplementary Material online). Another 6 were previously identified as putative miRNA candidates (Seitz et al. 2004). The remaining 5 sequences are reported here as potential miRNA candidates for the first time (table 2, Additional file 2 [Supplementary Material online]). Although there is no experimental evidence to identify these sequences as functional miRNAs, the pattern of their evolutionary conservation suggests that they are expressed at least in some of the eutherian mammals.
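The distance analysis described above can be sketched in Python. This is one possible reading of "distances between neighboring pairs of motifs 1 and 2", not a reconstruction of the original script: motif occurrences are merged by genomic position, and the distance is recorded whenever two consecutive occurrences are of different motif types. The bin size of 5 and the 400-nt cutoff follow the figure 3 legend; the function names and the toy coordinates are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def adjacent_motif_distances(motif1_starts: Sequence[int],
                             motif2_starts: Sequence[int]) -> List[int]:
    """Distances between consecutive motif occurrences of different types."""
    tagged = sorted([(p, 1) for p in motif1_starts] +
                    [(p, 2) for p in motif2_starts])
    return [b[0] - a[0]
            for a, b in zip(tagged, tagged[1:])
            if a[1] != b[1]]


def histogram(distances: Sequence[int],
              bin_size: int = 5,
              cutoff: int = 400) -> Dict[int, int]:
    """Count distances per bin (keyed by bin start), ignoring values past cutoff."""
    counts = Counter((d // bin_size) * bin_size
                     for d in distances if d <= cutoff)
    return dict(sorted(counts.items()))


def modal_bin(hist: Dict[int, int]) -> int:
    """Return the start of the most populated bin."""
    return max(hist, key=hist.get)


# Toy coordinates only; these are not positions from the human cluster.
toy_motif1 = [0, 200, 400]
toy_motif2 = [160, 362, 558]
toy_hist = histogram(adjacent_motif_distances(toy_motif1, toy_motif2))
toy_peak = modal_bin(toy_hist)
```

For the toy coordinates, the most populated bin starts at 160, mimicking the kind of single dominant peak reported for the real data.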
FIG. 2.—
Amplified repeat units and sequence motifs within the mir-379/mir-656 miRNA cluster. (A) Sequence logos of the 2 overrepresented sequence motifs within the human mir-379/mir-656 miRNA cluster. The y axis shows informational content at each base position within the motif. Sequence logos for motifs 1 and 2 were created based on 147 and 115 individual sequences present within the mir-379/mir-656 miRNA cluster, respectively. A blue bar indicates a possible hnRNP A1–binding site. (B) Structure of the proposed amplified repeat unit. The figure shows relative positions of the miRNA precursor sequence and of the 2 motifs. Vertical arrows indicate suggested positions of the splice site boundaries in the ancestral amplified array.
FIG. 3.—
Size of the amplified repeat unit. Blue bars represent the distribution of pairwise distances between the adjacent motifs within the human mir-379/mir-656 miRNA cluster. Orange bars represent the distribution of lengths of the 38 known human miRNA precursors from within the cluster measured with the 2 adjacent motifs. Each vertical bar represents the number of occurrences within a bin of size 5. There were 21 individual single-point motif distances beyond the cutoff of the x axis at 400 nt.
Evolving miRNAs
Despite the overall similarity in structure and sequence conservation between the orthologous mir-379/mir-656 miRNA clusters in placental mammals, we found several examples demonstrating an ongoing evolutionary selection of the individual miRNAs within the cluster. The loss and gain of the individual miRNAs is best illustrated by the rodent lineage. For example, we were able to identify mouse and rat sequences orthologous to human miRNAs mir-329-2, mir-655, mir-487a, and mir-656. However, the detailed analysis of multiple sequence alignments between rodents and other mammals showed that mouse and rat sequences have accumulated nucleotide substitutions, small deletions, and/or insertions, which are likely to affect the secondary structure necessary for correct processing of the mature miRNAs. To assess this, we compared MFOLD 3.2–predicted RNA secondary structures of the rodent sequences and experimentally validated orthologous miRNA sequences from human (Zuker 2003). We found that rodent sequences failed to produce characteristic hairpin-like miRNA precursor structures (supplementary fig. 1, Supplementary Material online). Therefore, we conclude that these sequences do not code for functional miRNAs in either mouse or rat and are likely to be remnants of the ancestral miRNA sequences (table 2, supplementary fig. 1 [Supplementary Material online]). The opposite process, the evolutionary fixation of individual lineage-specific miRNAs, is also evident in rodents. Using similar analysis, we found that whereas degenerate orthologous sequences for rodent mir-679, mir-666, and mir-667 are present in all genomes of the placental mammals, the functional miRNAs seem to be present only in mouse and rat. Examples of the lineage-specific evolutionary selection of miRNAs have also been reported for primate and other lineages (table 2) (Berezikov, Thuemmler, et al. 2006; Hertel et al. 2006).
The miRNA Target Genes Point to Eutherian-Specific Biological Processes
In the publications describing the mir-379/mir-656 cluster, Seitz et al. (2003, 2004) hypothesized that the common origin of the miRNAs within the cluster and their coexpression from a large polycistronic transcript Mirg may also result in a common set of target genes. To investigate this possibility, we used TargetScan 4.0 data for the predicted evolutionarily conserved vertebrate target sites for miRNAs from the mir-379/mir-656 cluster to examine GO annotations of biological processes associated with miRNA target genes in the human and mouse genomes (Lewis et al. 2003). The results of this analysis show that 5 functional categories of GO terms were significantly overrepresented among predicted miRNA target genes (fig. 4, supplementary fig. 3 [Supplementary Material online]). These categories can be broadly defined as regulation of transcription, RNA metabolism, cell motility, neurogenesis, and embryonic development. While genes involved in regulation of transcription and RNA metabolism appear to be common targets for many miRNAs and have been reported in several studies (Lewis et al. 2003; John et al. 2004; Grun et al. 2005), overrepresentation of target genes involved in neurogenesis, cell motility, and embryonic development is highly specific for the mir-379/mir-656 cluster. Consistent with this result, these 3 groups of genes did not show any significant overrepresentation when the same analysis was repeated with a random set of miRNAs of the same size, although, as expected, genes involved in regulation of transcription were overrepresented in this set (data not shown).
FIG. 4.—
GO terms significantly overrepresented among conserved vertebrate miRNA targets of the mir-379/mir-656 miRNA cluster. The diagram shows significantly overrepresented GO terms from annotations of biological processes. Connections between broad high-hierarchy terms and more specific low-hierarchy terms are shown as arrows. The shading color code is as follows. Yellow-shaded boxes represent significantly overrepresented terms that passed both cutoff criteria, P < 1 × 10⁻⁵ and at least 2-fold enrichment. Open boxes represent associated terms with highly significant P values but lower than 2-fold enrichment. The terms in gray-shaded boxes do not show any enrichment and are provided only as guidance for term connections and hierarchy. The subset of the GO terms associated with regulation of transcription was not included in this diagram; see supplementary figure 3 (Supplementary Material online).
Because the miRNA cluster has emerged after the divergence of the bird lineage and prior to the radiation of the eutherian mammals, it is likely that most, if not all, of the targets would be present in the eutherian mammals but not in the birds. To test this hypothesis, we repeated the GO analysis with only those genes that contained predicted miRNA-binding sites that are conserved in the eutherian genomes but not in chicken genome. We found that GO terms associated with neurogenesis and cell motility were still significantly enriched in this set of target genes, but GO terms associated with embryonic development were no longer significantly overrepresented.
Interestingly, GO terms related to different aspects of nervous system development were most common in the whole set of the overrepresented terms. These terms showed higher overall enrichment and lower P values compared with terms related to other biological processes. The biological process term that showed the highest enrichment was axon guidance (fig. 4). Logically, this term unites 2 other significantly overrepresented biological process terms: cell migration and axonogenesis. Examination of the individual target genes within this class demonstrates that some of them, like brain-derived neurotrophic factor, contain up to 7 evolutionarily conserved miRNA target sites within their 3′ UTR for different miRNAs from the mir-379/mir-656 cluster. As can be expected from the GO annotations, the predicted miRNA target genes would be highly expressed in the tissues related to GO annotations—namely, embryonic tissues and various parts of the developing and the adult brain. Indeed, out of 18 miRNA target genes associated with the GO term of axon guidance, 14, including human homolog of Robo1, 2 ephrin receptors, and neurogenin 2, show high expression levels in various parts of the brain and the remaining 4 genes show moderate expression in at least one brain region (see GNF Gene Expression Atlas at http://symatlas.gnf.org/SymAtlas/ and Allen Brain Atlas at http://www.brain-map.org data [Su et al. 2004; Lein et al. 2007]). More importantly, our survey of the miRNA expression data from within the cluster also shows that most of these miRNAs were frequently detected in or cloned from the various adult brain–derived samples and some embryonic tissue samples (Seitz et al. 2004; Bentwich et al. 2005; Berezikov, Thuemmler, et al. 2006; Cummins et al. 2006) (for detailed summary and additional references, see supplementary table 2, Supplementary Material online).
These results demonstrate a significant overlap between the expression profile of the miRNAs from the mir-379/mir-656 cluster and their predicted target genes. Such overlap in the expression patterns between miRNAs and their predicted target genes strongly suggests that these results reflect biologically relevant miRNA–target interactions rather than unexpected biases in miRNA target predictions or GO annotations.
Discussion
Origin and Evolution of the mir-379/mir-656 miRNA Cluster
Consistent with the earlier studies, our results show that the mir-379/mir-656 cluster is an evolutionary innovation that appears first in the eutherian mammals (Seitz et al. 2004; Hertel et al. 2006). Acquisition of novel miRNA genes in evolution is a common trend in different groups of the metazoa that has been well documented recently (Hertel et al. 2006). However, unlike other novel miRNA genes that mostly originated from individual duplications of existing miRNA genes or exaptation of different genomic sequences (Smalheiser and Torvik 2005), the mir-379/mir-656 cluster has a different origin. We have shown that not only the individual members of the mir-379/mir-656 cluster but also the entire ∼45-kb genomic region encompassing these miRNAs originated from an ancestral repeat unit that was amplified over 250 times (fig. 2B). Although initially the amplified copies of the repeat unit might have been functionally identical, their subsequent evolutionary fate followed 1 of the 3 alternatives described by the duplication–degeneration–complementation model (Force et al. 1999). This model suggests that duplication of a gene has one of three outcomes: complete loss of function of the redundant copy (degeneration), not necessarily with loss of the sequence itself (generation of a pseudogene); evolutionary preservation of both copies if they evolve to perform complementary functions (subfunctionalization); or evolution of one of the copies to perform an entirely new function (neofunctionalization) (Force et al. 1999). It is evident that in the case of the mir-379/mir-656 cluster, all these possibilities have been realized, resulting in the array of all known miRNAs within the cluster. Events of functional degeneration are readily identifiable within the mir-379/mir-656 cluster. Although the remains of many of the ancestral repeat units can still be readily recognized within the ∼45-kb region, most appear to have degenerated over the last 100 Myr and have lost one or more components required for activity.
Interestingly, in humans, motif 1 and motif 2 appear to be preserved better than some of the ancestral sequences that gave rise to miRNA precursors. Significant overrepresentation of the detected sequence motifs within the mir-379/mir-656 cluster, but not elsewhere in the genome, clearly indicates a functional relationship between these motifs and miRNAs within the cluster. The fact that motifs are still detectable even in places where miRNA precursor sequences have degenerated beyond recognition, as well as the overall uniform distribution of the motifs across the 45-kb region of the mir-379/mir-656 cluster, suggests that their role may be in the regulation of the entire cluster as well as individual miRNAs within it. In this context, it is noteworthy that events of sub- and neofunctionalization also apply to regulatory elements controlling function of the duplicated gene. In practice, this could mean that some miRNAs with identical sequences may still perform different functions due to sub- and/or neofunctionalization of their regulatory elements. Consistent with this is the fact that the 45-kb region is relatively depleted in transposable elements indicating that the sequences between active pre-miRNA sequences may also be functional despite the lack of evolutionary conservation at the primary sequence level (Simons et al. 2006).
One possible explanation of the biological role of these motifs could be in the regulation of processing of the primary miRNA (pri-miRNA) transcript. In fact, transcription and expression data available for the ∼45-kb locus encompassing the mir-379/mir-656 cluster suggest that the entire region may be transcribed into a single noncoding RNA precursor called Mirg, which is then processed to give rise to the individual miRNA precursors (Seitz et al. 2003, 2004; Mineno et al. 2006). Recent studies have demonstrated that processing of pri-miRNA transcripts can be complex and may include alternative pathways such as the mirtron pathway in Drosophila and the splicing repressor protein heterogeneous nuclear ribonucleoprotein (hnRNP) A1–dependent miR-18a processing in humans (Guil and Caceres 2007; Okamura et al. 2007). In this context, it is worth noting that of the 11 known intron–exon boundaries from the miRNA cluster supported by expressed sequence tag and mRNA data (e.g., GenBank accessions AK021542 and AA861571 in humans, AJ517767 and AW244689 in mouse, and AW916103 in the rat), 5 are located between positions 5 and 9 of copies of motif 1. In contrast, no association between exon–intron boundaries and either motif is observed. However, one such splice site is in the vicinity of the probable 5′ end of the mir-369 precursor in mouse (in GenBank accession AJ517767). We speculate that the ancient amplified repeat unit may have contained a 3′ splice site within motif 1 and perhaps also a 5′ splice site at the 5′ end of the miRNA precursor sequence (fig. 2A). Over time, many of these sites may have lost their function in splicing and other new sites may have evolved. We also note that motif 2 contains a conserved sequence that is similar to hnRNP A1–binding sites.
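The positional association between intron–exon boundaries and motif 1 described above amounts to a coordinate lookup: for each boundary, find the motif copy that contains it and record the position within that copy. The Python sketch below illustrates the idea only. The function name, the motif length of 20 nt, and every coordinate are hypothetical placeholders rather than values from this study, and motif copies are assumed not to overlap.

```python
from bisect import bisect_right

def boundaries_in_motif_window(motif_starts, motif_len, boundaries, lo=5, hi=9):
    """Return (boundary, motif_start, offset) for boundaries that fall at
    motif positions lo..hi (1-based, inclusive) of some motif copy."""
    starts = sorted(motif_starts)
    hits = []
    for b in boundaries:
        # Index of the last motif copy that starts at or before the boundary.
        i = bisect_right(starts, b) - 1
        if i < 0:
            continue  # boundary lies upstream of every motif copy
        offset = b - starts[i] + 1  # 1-based position within that copy
        if offset <= motif_len and lo <= offset <= hi:
            hits.append((b, starts[i], offset))
    return hits

# Hypothetical coordinates, for illustration only (not data from the study).
motif_starts = [100, 480, 910, 1500]
boundaries = [106, 300, 484, 918, 1520, 2000]
hits = boundaries_in_motif_window(motif_starts, motif_len=20, boundaries=boundaries)
print(len(hits), "of", len(boundaries), "boundaries fall at motif positions 5-9")
```

With real data, the motif start coordinates would come from the motif search output and the boundaries from the EST and mRNA alignments; both would need to be expressed on the same strand and coordinate system before comparison.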
Although we favor a role in processing for the motifs within the cluster, it is important to point out that several potentially overlapping molecular processes have been reported to take place within this cluster: maternal imprinting, RNA editing, and tissue-specific expression (Seitz et al. 2003, 2004; Kawahara et al. 2007). Each of these processes requires regulation at different levels, which can result in various sequence and structural constraints present in this genomic region. Our analyses suggest several hypotheses to be tested experimentally in the future.
Role of mir-379/mir-656 Cluster in Placental Mammals
To discuss possible biological roles of the members of the mir-379/mir-656 cluster, it is important to summarize the results of this and other studies that have shown that the mir-379/mir-656 cluster is uniquely present in the placental mammals, it originated from a common ancestral precursor sequence, and it is imprinted and expressed from the maternally derived chromosome predominantly in embryonic brain and placental tissues (Cavaille et al. 2002; Seitz et al. 2003, 2004; Hertel et al. 2006). Together, these findings consistently indicate involvement of the mir-379/mir-656 cluster in biological functions specific to eutherian mammals.
Our results showed that genes associated with the biological process of axon guidance are among the most likely candidates targeted by miRNAs from the mir-379/mir-656 cluster. Although neither axon guidance nor associated processes of neurogenesis and cell migration are exclusive to eutherian mammals, closer investigation reveals that the nervous system underwent a significant upgrade and rewiring in this group of organisms as compared with nonplacental mammals. For example, one of the most significant evolutionary innovations in the eutherian brain is the emergence of a large interhemispheric connective structure called the “corpus callosum” (reviewed by Mihrshahi [2006]). Like the mir-379/mir-656 cluster, the corpus callosum is exclusively present in placental mammals and has not been found in any of the nonplacental species. Formation of the corpus callosum relies on the correct specification of the commissural neurons and precise axon guidance across the midline to their final destination in the opposite hemisphere (Mihrshahi 2006; Lindwall et al. 2007).
Although we do not have strong evidence to suggest that any of the miRNAs from the mir-379/mir-656 cluster are directly involved in the regulation of axon guidance in the developing corpus callosum, we find that a few genes with known functions in the development of the corpus callosum, including Robo1 and SLIT-like proteins (SLITRK1, SLITRK2, SLITRK3, and SLITRK6), are present among predicted targets of miRNAs from the mir-379/mir-656 cluster (Lindwall et al. 2007). Other genes implicated in biological processes that involve regulation of axon guidance, such as thalamocortical patterning and motoneuron projections, were also predicted to be targeted by several miRNAs from the mir-379/mir-656 cluster.
Our survey of miRNA expression data also revealed that miRNAs from the mir-379/mir-656 cluster are often detectable in the placenta. However, analysis of miRNA target genes and associated GO biological processes failed to show any significant overrepresentation of terms related to placental development or function. There also appears to be relatively limited knowledge about many biological processes in the placenta and consequently a lack of explicit GO annotations relating to the placenta.
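Overrepresentation of a GO term among predicted target genes is commonly assessed with a one-sided hypergeometric test. The Python sketch below shows that calculation as an illustration; it is not a statement of the exact statistic used in this study, the function name is hypothetical, and the counts in the example are invented.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): probability of seeing at least k annotated genes when
    n target genes are drawn from a background of N genes, K of which
    carry the GO term of interest."""
    upper = min(K, n)
    # comb() returns 0 for impossible draws, so no lower-bound guard is needed.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Invented toy counts, for illustration only (not data from the study):
# background of 10 genes, 3 annotated with the term, 4 predicted targets,
# 2 of which carry the term.
p_value = hypergeom_sf(k=2, N=10, K=3, n=4)
print(f"P(X >= 2) = {p_value:.4f}")
```

A term with few or no annotated genes in the background, as the text suggests may be the case for placental processes, cannot reach significance under this test regardless of the underlying biology.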
Conclusions
It is clear that the mir-379/mir-656 cluster of miRNAs was generated by a large amplification event between the branching of the marsupial lineage and the radiation of the eutherian mammals. This appears to have been followed by a fairly rapid divergence of the miRNA sequences, some of which evolved into new specificities and have become fixed in evolution. The remnants of the original event can be seen today, but most of the sequence in the region appears to be nonfunctional. Consistency between the results of bioinformatics analyses of miRNA target genes, their function and expression pattern, and analyses of miRNA expression pattern strongly suggests that the miRNAs in the cluster are likely to act cooperatively to influence novel regulatory pathways that emerged in the eutherian mammals.
This work was supported by CSIRO Emerging Sciences Initiatives in Epigenetics and Cellular Reprogramming. The authors wish to acknowledge the members of the Broad Institute at MIT and Harvard, Baylor College of Medicine Sequencing Center, and Genome Sequencing Center at Washington University for making their data and genome assemblies available in advance of formal publications. The authors would like to thank Ross Tellam for encouraging us to study this region of the mammalian genome, and Michael J. Pheasant, Cas Simons, Fai Wong, and Aaron Ingham for critical reading of the manuscript and discussions. E.A.G. performed detailed data analysis and wrote the final version of the manuscript. B.P.D. initiated and coordinated this study, participated in its design, performed initial data analysis of the repeats, and prepared the initial draft of the manuscript. S.M. and W.C.B. performed initial data analysis of the repeats. All authors have read and approved the final manuscript.
References
et al. (41 co-authors). Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes, Science, 2002, vol. 297 (pg. 1301-1310)
Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, 1994, Menlo Park (CA), AAAI Press (pg. 28-36)
et al. (13 co-authors). Identification of hundreds of conserved and nonconserved human microRNAs, Nat Genet, 2005, vol. 37 (pg. 766-770)
Diversity of microRNAs in human and chimpanzee brain, Nat Genet, 2006, vol. 38 (pg. 1375-1377)
et al. Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis, Genome Res, 2006, vol. 16 (pg. 1289-1298)
(14 co-authors). The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology, Nucleic Acids Res, 2004, vol. 32 (pg. D262-D266)
Identification of tandemly-repeated C/D snoRNA genes at the imprinted human 14q32 domain reminiscent of those at the Prader-Willi/Angelman syndrome region, Hum Mol Genet, 2002, vol. 11 (pg. 1527-1538)
Initial sequence of the chimpanzee genome and comparison with the human genome, Nature, 2005, vol. 437 (pg. 69-87)
Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution, Nature, 2004, vol. 432 (pg. 695-716)
Discovery and profiling of bovine microRNAs from immune-related and embryonic tissues, Physiol Genomics, 2007, vol. 29 (pg. 35-43)
WebLogo: a sequence logo generator, Genome Res, 2004, vol. 14 (pg. 1188-1190)
et al. (16 co-authors). The colorectal microRNAome, Proc Natl Acad Sci USA, 2006, vol. 103 (pg. 3687-3692)
Preservation of duplicate genes by complementary, degenerative mutations, Genetics, 1999, vol. 151 (pg. 1531-1545)
et al. (203 co-authors). Genome sequence of the Brown Norway rat yields insights into mammalian evolution, Nature, 2004, vol. 428 (pg. 493-521)
miRBase: microRNA sequences, targets and gene nomenclature, Nucleic Acids Res, 2006, vol. 34 (pg. D140-D144)
Rfam: annotating non-coding RNAs in complete genomes, Nucleic Acids Res, 2005, vol. 33 (pg. D121-D124)
microRNA target predictions across seven Drosophila species and comparison to mammalian targets, PLoS Comput Biol, 2005, vol. 1 (pg. e13)
The multifunctional RNA-binding protein hnRNP A1 is required for processing of miR-18a, Nat Struct Mol Biol, 2007, vol. 14 (pg. 591-596)
et al. (59 co-authors). The Gene Ontology (GO) database and informatics resource, Nucleic Acids Res, 2004, vol. 32 (pg. D258-D261)
MicroRNAs: small RNAs with a big role in gene regulation, Nat Rev Genet, 2004, vol. 5 (pg. 522-531)
TimeTree: a public knowledge-base of divergence times among organisms, Bioinformatics, 2006, vol. 22 (pg. 2971-2972)
The expansion of the metazoan microRNA repertoire, BMC Genomics, 2006, vol. 7 (pg. 25)
Embryonic stem cell-specific MicroRNAs, Dev Cell, 2003, vol. 5 (pg. 351-358)
Human microRNA targets, PLoS Biol, 2004, vol. 2 (pg. e363)
et al. (13 co-authors). The UCSC genome browser database, Nucleic Acids Res, 2003, vol. 31 (pg. 51-54)
Redirection of silencing targets by adenosine-to-inosine editing of miRNAs, Science, 2007, vol. 315 (pg. 1137-1140)
BLAT–the BLAST-like alignment tool, Genome Res, 2002, vol. 12 (pg. 656-664)
Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes, Proc Natl Acad Sci USA, 2003, vol. 100 (pg. 11484-11489)
The human genome browser at UCSC, Genome Res, 2002, vol. 12 (pg. 996-1006)
Identification of many microRNAs that copurify with polyribosomes in mammalian neurons, Proc Natl Acad Sci USA, 2004, vol. 101 (pg. 360-365)
New microRNAs from mouse and human, RNA, 2003, vol. 9 (pg. 175-179)
et al. (255 co-authors). Initial sequencing and analysis of the human genome, Nature, 2001, vol. 409 (pg. 860-921)
et al. (108 co-authors). Genome-wide atlas of gene expression in the adult mouse brain, Nature, 2007, vol. 445 (pg. 168-176)
Prediction of mammalian microRNA targets, Cell, 2003, vol. 115 (pg. 787-798)
et al. (236 co-authors). Genome sequence, comparative analysis and haplotype structure of the domestic dog, Nature, 2005, vol. 438 (pg. 803-819)
Commissure formation in the mammalian forebrain, Curr Opin Neurobiol, 2007, vol. 17 (pg. 3-14)
The corpus callosum as an evolutionary innovation, J Exp Zool B Mol Dev Evol, 2006, vol. 306 (pg. 8-17)
et al. (235 co-authors). Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences, Nature, 2007, vol. 447 (pg. 167-177)
et al. (11 co-authors). The expression profile of microRNAs in mouse embryos, Nucleic Acids Res, 2006, vol. 34 (pg. 1765-1771)
et al. (11 co-authors). Resolution of the early placental mammal radiation using Bayesian phylogenetics, Science, 2001, vol. 294 (pg. 2348-2351)
The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila, Cell, 2007, vol. 130 (pg. 89-100)
Micro RNAs in animal development, Cell, 2006, vol. 124 (pg. 877-881)
Evidence for a microRNA expansion in the bilaterian ancestor, Dev Genes Evol, 2007, vol. 217 (pg. 73-77)
Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs, Genome Res, 2007, vol. 17 (pg. 1850-1864)
A large imprinted microRNA gene cluster at the mouse Dlk1-Gtl2 domain, Genome Res, 2004, vol. 14 (pg. 1741-1748)
Imprinted microRNA genes transcribed antisense to a reciprocally imprinted retrotransposon-like gene, Nat Genet, 2003, vol. 34 (pg. 261-262)
The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint, J Exp Zool B Mol Dev Evol, 2006, vol. 306B (pg. 575-588)
Transposon-free regions in mammalian genomes, Genome Res, 2006, vol. 16 (pg. 164-172)
Mammalian microRNAs derived from genomic repeats, Trends Genet, 2005, vol. 21 (pg. 322-326)
et al. (13 co-authors). A gene atlas of the mouse and human protein-encoding transcriptomes, Proc Natl Acad Sci USA, 2004, vol. 101 (pg. 6062-6067)
et al. (12 co-authors). Human embryonic stem cells express a unique set of microRNAs, Dev Biol, 2004, vol. 270 (pg. 488-498)
Molecular evolution of a microRNA cluster, J Mol Biol, 2004, vol. 339 (pg. 327-335)
et al. (222 co-authors). Initial sequencing and comparative analysis of the mouse genome, Nature, 2002, vol. 420 (pg. 520-562)
Mfold web server for nucleic acid folding and hybridization prediction, Nucleic Acids Res, 2003, vol. 31 (pg. 3406-3415)
Author notes
Andrew Roger, Associate Editor
Published by Oxford University Press 2008