Origin, Evolution, and Biological Role of miRNA Cluster in DLK-DIO3 Genomic Region in Placental Mammals

Journal Article

Commonwealth Scientific and Industrial Research Organisation CSIRO Livestock Industries, Queensland Bioscience Precinct QBP, St Lucia, Brisbane, Queensland, Australia

Accepted: 04 February 2008

Published: 04 February 2008

Evgeny A. Glazov, Sean McWilliam, Wesley C. Barris, Brian P. Dalrymple, Origin, Evolution, and Biological Role of miRNA Cluster in DLK-DIO3 Genomic Region in Placental Mammals, Molecular Biology and Evolution, Volume 25, Issue 5, May 2008, Pages 939–948, https://doi.org/10.1093/molbev/msn045

Abstract

MicroRNAs (miRNAs) are a rapidly growing family of small regulatory RNAs that modulate gene expression in plants and animals. In animals, most of the miRNAs discovered in early studies were found to be evolutionarily conserved across the whole kingdom. More recent studies, however, have identified many miRNAs that are specific to a particular group of organisms or even to a single species. These findings raise questions about the evolution of individual miRNAs and their role in establishing and maintaining lineage-specific functions and characteristics.

In this study, we describe a detailed analysis of the miRNA cluster (hereafter, the mir-379/mir-656 cluster) located within the imprinted DLK-DIO3 region on human chromosome 14. We show that orthologous miRNA clusters are present in all sequenced genomes of placental (eutherian) mammals but not in the marsupial (metatherian), monotreme (prototherian), or any other vertebrate genomes. We provide evidence that the locus encompassing this cluster emerged by tandem duplication of an ancient precursor sequence in an early eutherian ancestor, prior to the radiation of modern placental mammals. The original amplified cluster may have contained more than 250 miRNA precursor sequences, most of which now appear to be inactive. Examination of eutherian genomes showed that the cluster has been maintained in evolution for approximately 100 Myr.

Analysis of genes that contain predicted evolutionarily conserved target sites for miRNAs from this cluster revealed significant overrepresentation of Gene Ontology terms associated with biological processes such as neurogenesis, embryonic development, transcriptional regulation, and RNA metabolism. Consistent with these findings, a survey of expression data for miRNAs within the cluster demonstrates a strong bias toward brain and placenta samples from adult organisms and toward some embryonic tissues.

Our results suggest that the emergence of the mir-379/mir-656 miRNA cluster was one of the factors that facilitated the evolution of placental mammals. The overrepresentation of genes involved in regulation of neurogenesis among predicted miRNA targets indicates an important role for the mir-379/mir-656 cluster in this biological process in placental mammals.

Introduction

MicroRNAs (miRNAs) are small (21–25 nt) regulatory RNAs that modulate gene expression in animals and plants. In animals, miRNAs regulate gene expression by guiding the RNA-induced silencing complex to sequence-specific targets in the 3′ untranslated regions (UTRs) of messenger RNAs (mRNAs), which results in translational repression (He and Hannon 2004). In the past few years, the number of discovered miRNAs has increased from tens to thousands and is likely to grow further (Griffiths-Jones et al. 2006). Although most of the miRNAs discovered early were found to be highly conserved in evolution, a growing proportion of newly identified miRNAs are present in only a small group of organisms and, in some cases, in a single species (Bentwich et al. 2005; Berezikov, Thuemmler, et al. 2006; Berezikov, van Tetering, et al. 2006; Ruby et al. 2007). The functional significance of these evolutionarily divergent miRNAs has not been established experimentally. However, it has been hypothesized that these miRNAs might play a role in establishing and maintaining phenotypic diversity between different groups of organisms (Plasterk 2006; Sempere et al. 2006). A few comparative studies have established connections between some miRNAs and evolutionary changes in the animal body plan (Tanzer and Stadler 2004; Sempere et al. 2006; Prochnik et al. 2007).

The miRNA cluster mir-379/mir-656 was originally described as 2 families of related repeats adjacent to a small nucleolar RNA cluster located within the imprinted DLK-DIO3 region on human chromosome 14 (Cavaille et al. 2002). Since then, mature miRNAs derived from most of these repeats have been experimentally identified in mouse, rat, chimpanzee, human, and cow (Houbaviy et al. 2003; Lagos-Quintana et al. 2003; Kim et al. 2004; Seitz et al. 2004; Suh et al. 2004; Bentwich et al. 2005; Berezikov, Thuemmler, et al. 2006; Berezikov, van Tetering, et al. 2006; Coutinho et al. 2007). Currently, the miRBase miRNA database contains 38 human and 35 mouse miRNAs that originate from this cluster, which makes it the largest known miRNA cluster in vertebrates (Griffiths-Jones et al. 2005). In mouse, some of these miRNAs were shown to be expressed as products of a large noncoding transcript named Mirg (Seitz et al. 2003).

We used a combination of comparative genomics and bioinformatics approaches to examine the evolutionary history of the mir-379/mir-656 cluster and its function in vertebrate biology and evolution.

Materials and Methods

Sources of Sequences and Assemblies

Draft genome assemblies of armadillo, elephant, and tenrec were produced by the Broad Institute at Massachusetts Institute of Technology and Harvard (http://www.broad.mit.edu/). The draft platypus genome assembly was produced by the Genome Sequencing Center at Washington University School of Medicine in St Louis (http://genome.wustl.edu/genome_group_index.cgi). The draft cow genome assembly was produced by the Baylor College of Medicine Sequencing Center (http://www.hgsc.bcm.tmc.edu/projects/bovine/). The human, chimpanzee, mouse, rat, dog, possum, chicken, and puffer fish genomes were produced by their respective genome sequencing consortia (Lander et al. 2001; Aparicio et al. 2002; Waterston et al. 2002; Consortium 2004, 2005; Gibbs et al. 2004; Lindblad-Toh et al. 2005; Mikkelsen et al. 2007). Unless specified otherwise, sequences of miRNA precursors and mature miRNAs were obtained from the latest release of the miRNA registry (miRBase 10.0, August 2007, http://microrna.sanger.ac.uk/sequences/) (Griffiths-Jones et al. 2005, 2006).

Sequence Searches and Analysis

Pairwise whole-genome sequence alignments and “RefSeq” gene annotation data were obtained from the UCSC Genome Browser (http://genome.ucsc.edu/) (Kent et al. 2002, 2003; Karolchik et al. 2003). DLK-DIO3 syntenic regions were identified using UCSC whole-genome chained sequence alignments as described by Kent et al. (2003). Regions from different species were considered syntenic if gene order was preserved. BLAT was used for cross-genome sequence similarity searches (Kent 2002). BLAT parameters were determined empirically by reciprocally searching for known human and mouse miRNAs in the genomes of these 2 organisms. Maximum search sensitivity was achieved with a tile size of 6 and an overall minimum sequence identity of 65%. These parameters were then used to query vertebrate genomes with known human and mouse pre-miRNA sequences. Only alignments covering at least 90% of the query pre-miRNA sequence were considered orthologous pre-miRNA candidates. The following genome assemblies were used in this study: human, hg18; chimpanzee, panTro2; mouse, mm8; rat, rn3; dog, canFam2; cow, bTau2; chicken, galGal3; elephant, loxAfr1; armadillo, dasNov1; opossum, monDom4; tenrec, echTel1; and fugu, fr2.
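The 90% coverage filter on BLAT hits can be sketched in Python as below. This is a minimal illustration, assuming BLAT was run with its `-tileSize=6 -minIdentity=65` options and wrote standard PSL output; the function and class names are hypothetical, and coverage is taken here as the number of query bases inside aligned blocks divided by the query length, which may differ from how the authors computed it.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PslHit:
    """The subset of a PSL record needed for the coverage filter."""
    q_name: str
    q_size: int
    t_name: str
    t_start: int
    t_end: int
    aligned_bases: int  # sum of blockSizes (query bases inside aligned blocks)

    @property
    def coverage(self) -> float:
        return self.aligned_bases / self.q_size


def parse_psl_line(line: str) -> PslHit:
    """Parse one tab-separated PSL data line (21 standard columns)."""
    f = line.rstrip("\n").split("\t")
    block_sizes = [int(x) for x in f[18].split(",") if x]  # PSL lists end with a comma
    return PslHit(
        q_name=f[9],
        q_size=int(f[10]),
        t_name=f[13],
        t_start=int(f[15]),
        t_end=int(f[16]),
        aligned_bases=sum(block_sizes),
    )


def ortholog_candidates(psl_lines: Iterable[str], min_coverage: float = 0.90) -> List[PslHit]:
    """Keep hits whose aligned blocks cover at least `min_coverage` of the query pre-miRNA."""
    hits = []
    for line in psl_lines:
        # Data lines start with the integer 'matches' column; header lines do not.
        if not line.strip() or not line[0].isdigit():
            continue
        hit = parse_psl_line(line)
        if hit.coverage >= min_coverage:
            hits.append(hit)
    return hits
```

Filtering on aligned blocks rather than on the qStart–qEnd span avoids counting large internal gaps in the query as covered sequence.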

Detection and Analysis of Sequence Motifs

Sequence motifs were identified using the MEME algorithm at the San Diego Supercomputer Center Web site (http://meme.sdsc.edu/meme/intro.html) (Bailey and Elkan 1994) with the following parameters: model = tcm (two-component mixture, allowing any number of sites per sequence), minimum width = 6, maximum width = 100, minimum sites = 2, and maximum sites = 300. Sequence logos were generated using WebLogo 2.8.2 (http://weblogo.berkeley.edu/logo.cgi) (Crooks et al. 2004).
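To make the y axis of a sequence logo concrete, the sketch below computes per-position information content (in bits) for a set of aligned motif sites. It is a simplified illustration, not WebLogo's implementation: it assumes the sites contain only A, C, G, and T, uses a uniform background (maximum 2 bits per position), and omits the small-sample correction commonly applied to logos built from few sequences. The function name is hypothetical.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def information_content(sites: Sequence[str]) -> List[float]:
    """Per-position information content (bits) of equal-length aligned DNA sites.

    Simplifications: uniform background and no small-sample correction.
    """
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must all have the same length")
    n = len(sites)
    ic = []
    for i in range(width):
        counts = Counter(s[i].upper() for s in sites)
        # Shannon entropy of the base distribution at position i
        entropy = -sum((c / n) * log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic
```

For example, a fully conserved column scores 2.0 bits, and a column in which all 4 bases occur equally often scores 0.0 bits.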

The miRNA Target Genes, Gene Ontology Enrichment, and P Values

Predicted miRNA target genes were obtained from the TargetScan 4.0 Web site (http://www.targetscan.org/) (Lewis et al. 2003). To reduce the false-positive rate of miRNA target prediction, we considered a gene to be a true miRNA target only if its 3′ UTR contained at least 2 evolutionarily conserved miRNA target sites. Gene Ontology (GO) annotations were downloaded from the GO Consortium Web site (April 2007, http://www.geneontology.org/) (Camon et al. 2004; Harris et al. 2004). UCSC “Known Isoforms” identifiers for human (hg18) and mouse (mm8) “Known Genes” were used to ensure that each gene was counted only once when multiple isoforms existed. A Perl script and Structured Query Language (SQL) code were written to calculate term enrichment and Fisher's exact test P values against a background of all GO-annotated genes in the UCSC Known Genes database. A term was considered significant if it showed at least 2-fold enrichment, P < 1 × 10^−5, and at least 10 associated Known Genes in the target gene sample.
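The three significance criteria can be sketched as follows. This is an illustration, not the authors' Perl/SQL code: it assumes a one-sided (overrepresentation) Fisher's exact test, which for a 2 × 2 table with fixed margins equals the hypergeometric upper tail, and the function names and example counts are hypothetical. Here k is the number of target genes annotated with a term, n the number of target genes, K the number of background genes annotated with the term, and N the number of background genes.

```python
from math import comb


def fisher_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact P value for overrepresentation.

    Equals the hypergeometric upper tail P(X >= k), where X counts term-annotated
    genes among n targets drawn from N background genes, K of which carry the term.
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)  # exact integer arithmetic before the final division


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Term frequency among targets divided by its frequency in the background."""
    return (k * N) / (n * K)


def passes_cutoffs(k, n, K, N, min_fold=2.0, max_p=1e-5, min_genes=10) -> bool:
    """The criteria stated in the text: >= 2-fold, P < 1e-5, and >= 10 genes."""
    return (
        k >= min_genes
        and fold_enrichment(k, n, K, N) >= min_fold
        and fisher_upper_tail(k, n, K, N) < max_p
    )
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division.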

Results

Evolution of the mir-379/mir-656 Cluster

To address the evolutionary origin of the mir-379/mir-656 cluster, we examined the sequenced genomes of 11 mammals, chicken, and puffer fish. Using whole-genome sequence alignments from the UCSC genome browser, we searched for the DLK1-DIO3 syntenic regions in the assembled vertebrate genomes. We found that DLK1-DIO3 synteny is maintained in all mammalian and bird genomes but is absent in the puffer fish genome (table 1).

Table 1

Synteny of the mir-379/mir-656 Cluster

Organism | Genome size (Gb) | Chromosome, scaffold, or contig | DLK1-DIO3 locus size (kb) | mir-379/mir-656 cluster size (kb)
Human | 2.8 | chr14 | 826.2 | 44.7
Chimp | 2.8 | chr14 | 846.3 | 46.5
Mouse | 2.5 | chr12 | 818.5 | 35.4
Rat | 2.7 | chr6 | ND | 37.9
Dog | 2.4 | chr8 | 726.7 | 41.4
Cow | 2.4 | chr21 | 859.5 | 42.9
Armadillo | ∼3.0 | Scaffold 5303, scaffold 19965 | ND | ∼38
Elephant | ∼3.0 | Scaffold 4770 | ND | ND
Tenrec | ∼3.0 | ND | ND | ND
Possum | 3.4 | chr1 | 1603.5 | A
Platypus | ∼3.0 | Ultracontig 378 | ∼700 | A
Chicken | 1.0 | chr5 | 346.3 | A
Puffer fish | 0.33 | chrUn | ND | A

NOTE.—The table summarizes the sizes and genomic locations of DLK1-DIO3 syntenic loci and of the miRNA clusters in different vertebrate genomes. Where whole-genome assemblies are available, the chromosomes bearing the DLK1-DIO3 locus and the miRNA cluster are listed; genomic scaffold or contig numbers are provided for partially assembled genomes. ND, not determined; A, absent (the miRNA cluster is completely absent in nonplacental vertebrates).

Table 2

Evolution of the mir-379/mir-656 miRNA Cluster

To identify miRNAs orthologous to human miRNAs from the mir-379/mir-656 cluster within the syntenic DLK1-DIO3 regions of vertebrate genomes, we performed sequence similarity searches using BLAT (Kent 2002). Orthologous miRNA clusters were easily identified in all examined genomes of placental mammals but were not detectable in the marsupial (Monodelphis domestica), monotreme (Ornithorhynchus anatinus), or any nonmammalian vertebrate genome (tables 1 and 2 and Supplementary Material online). These data are consistent with the results of a similar analysis performed by Seitz et al. (2004) in the worm (Caenorhabditis elegans), fruit fly (Drosophila melanogaster), and puffer fish (Fugu rubripes) genomes. Together, these results demonstrate that the mir-379/mir-656 cluster is an evolutionary innovation unique to placental mammals.

To examine the evolution of this cluster in detail, we looked at the evolutionary conservation of individual miRNA sequences within the cluster across vertebrate genomes. The results of this analysis are summarized in table 2. Most of the known, experimentally validated miRNA sequences are present in all examined genomes of placental mammals. Although the assemblies of the elephant (Loxodonta africana), lesser hedgehog tenrec (Echinops telfairi), and armadillo (Dasypus novemcinctus) genomes are incomplete and the exact evolutionary fate of some miRNAs could not be resolved at present, it is important to note that most of the miRNA sequences from the mir-379/mir-656 cluster are present in these genomes. These 3 species descend from lineages that diverged from the common placental ancestor early in eutherian evolution (fig. 1). The estimated divergence time between the elephant, tenrec, armadillo, and human lineages is approximately 100 Myr, whereas the eutherian lineage is thought to have separated from the common mammalian ancestor between 180 and 140 MYA (Hedges et al. 2006). This leads us to conclude that the mir-379/mir-656 cluster emerged early in the eutherian lineage, prior to the radiation of modern placental mammals. The fact that the cluster has been maintained in different groups of placental mammals for approximately 100 Myr without any major structural rearrangements indicates that the whole cluster may function as a coordinated unit with an important biological role in this group of organisms.
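The dating argument above is a parsimony inference: if the cluster is found in Afrotheria (elephant, tenrec), Xenarthra (armadillo), and Boreoeutheria, but not in opossum, platypus, chicken, or fugu, the simplest explanation is a single gain on the branch leading to the common placental ancestor. The sketch below makes that reasoning explicit on a toy tree following fig. 1. The tree encoding, the internal node labels, and the function names are assumptions for illustration; the root of Placentalia is deliberately left as a three-way polytomy because the inference does not depend on how it is resolved.

```python
from typing import Dict, Iterable, List, Set

# Child -> parent map following the topology of fig. 1 (illustrative labels).
PARENT: Dict[str, str] = {
    "human": "Primates", "chimp": "Primates",
    "mouse": "Rodentia", "rat": "Rodentia",
    "Primates": "Euarchontoglires", "Rodentia": "Euarchontoglires",
    "dog": "Laurasiatheria", "cow": "Laurasiatheria",
    "Euarchontoglires": "Boreoeutheria", "Laurasiatheria": "Boreoeutheria",
    "elephant": "Afrotheria", "tenrec": "Afrotheria",
    "armadillo": "Xenarthra",
    "Boreoeutheria": "Placentalia", "Afrotheria": "Placentalia", "Xenarthra": "Placentalia",
    "Placentalia": "Theria", "opossum": "Theria",
    "Theria": "Mammalia", "platypus": "Mammalia",
    "Mammalia": "Amniota", "chicken": "Amniota",
    "Amniota": "Vertebrata", "fugu": "Vertebrata",
}


def path_to_root(node: str) -> List[str]:
    """The node itself followed by all of its ancestors up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lowest_common_ancestor(taxa: Iterable[str]) -> str:
    paths = [path_to_root(t) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from any one taxon, the first shared node is the LCA.
    return next(node for node in paths[0] if node in shared)


def leaves_under(clade: str) -> Set[str]:
    """All terminal taxa descending from `clade`."""
    internal = set(PARENT.values())
    return {leaf for leaf in PARENT if leaf not in internal and clade in path_to_root(leaf)}
```

If the LCA of the taxa carrying the cluster is Placentalia and every leaf under Placentalia carries it, a single gain on the stem branch explains the pattern without invoking any losses.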

FIG. 1.—Summary of phylogenetic relationships of vertebrate species addressed in this study. The tree structure and estimated divergence times are used with modifications from Hedges et al. (2006) and Murphy et al. (2001).

Origin of the mir-379/mir-656 Cluster, Novel miRNA Candidates, and Regulatory Sequence Motifs

Sequence similarity between the individual miRNA precursors within the mir-379/mir-656 cluster led previous studies to conclude that these miRNAs originated from a common ancestral sequence by tandem duplication (Seitz et al. 2004; Hertel et al. 2006). To identify the unit of amplification and to determine whether miRNAs from the cluster share regulatory elements, we examined the human genomic sequences located between miRNA precursor sequences, together with 1 kb of flanking sequence on either side of the miRNA cluster. We searched for overrepresented sequence motifs of variable length using the motif discovery algorithm MEME (Bailey and Elkan 1994). This analysis identified 2 motifs that were highly significantly overrepresented within the cluster compared with a random set of genomic sequences of similar total length. Motif 1 was 21 nt long and was present 147 times within the mir-379/mir-656 cluster, corresponding to a MEME-calculated E value of 1.2 × 10^−209 (fig. 2A). Motif 2 was 23 nt long and was present at 115 sites within the cluster, corresponding to a MEME-calculated E value of 2.4 × 10^−70 (fig. 2A). Further inspection of the distribution of the motifs within the miRNA cluster revealed that both motifs often lie adjacent to known, experimentally validated miRNA precursor sequences, suggesting a regulatory function in the expression and/or processing of the primary miRNA transcripts. We also noticed that copies of motif 2 frequently followed a copy of motif 1 and that both motifs have a regular periodic distribution across the ∼45-kb genomic region encompassing the mir-379/mir-656 cluster (supplementary fig. 2, Supplementary Material online). To analyze this further, we calculated the distances between neighboring pairs of motifs 1 and 2. The resulting distribution of distances had 1 major peak at ∼160 bases (fig. 3).
Interestingly, the distribution of the distances constituting this peak strongly resembled the length distribution of a subset of the 38 known human miRNA precursor sequences measured together with the adjacent motifs 1 and 2 (fig. 2B). This result suggests that we are observing the vestiges of an original amplified array built from an ∼160-base repeat unit comprising a single copy each of motif 1, motif 2, and a miRNA precursor sequence (fig. 2B). Consistent with this, we identified several additional sequences within the ∼45-kb locus that share similarity with known active miRNA precursors. Although some of these are only partially similar to the known miRNAs and are likely to be remnants of the ancestral repeat sequences that gave rise to the mir-379/mir-656 cluster, others are highly similar to the known miRNAs and show evolutionary sequence conservation in at least 2 of the examined mammalian genomes. Table 2 shows 13 of these sequences. Importantly, 2 of them were experimentally validated by Berezikov, Thuemmler, et al. (2006) during the course of this study (supplementary table 2, Supplementary Material online). Another 6 were previously identified as putative miRNA candidates (Seitz et al. 2004). The remaining 5 sequences are reported here as potential miRNA candidates for the first time (table 2; Additional file 2 [Supplementary Material online]). Although there is no experimental evidence identifying these sequences as functional miRNAs, their pattern of evolutionary conservation suggests that they are expressed in at least some eutherian mammals.
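The distance analysis behind fig. 3 can be sketched as follows. This is a minimal illustration under stated assumptions: motif start coordinates are already available (for example, from MEME site output) on a single strand, a "neighboring pair" is taken to mean each motif 1 copy and the nearest motif 2 copy downstream of it, and distances above 400 nt are counted separately, mirroring the x-axis cutoff in fig. 3. The function names and the example coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def adjacent_motif_distances(m1_starts: Sequence[int], m2_starts: Sequence[int]) -> List[int]:
    """Distance from each motif 1 start to the next motif 2 start downstream.

    Assumption: 'neighboring pair' = motif 1 followed by the nearest motif 2.
    """
    m2 = sorted(m2_starts)
    distances = []
    for start in sorted(m1_starts):
        i = bisect_right(m2, start)  # index of the first motif 2 strictly downstream
        if i < len(m2):
            distances.append(m2[i] - start)
    return distances


def binned_histogram(
    values: Sequence[int], bin_size: int = 5, cutoff: int = 400
) -> Tuple[Dict[int, int], int]:
    """Counts per bin (keyed by bin lower edge) plus the number of values above the cutoff."""
    counts = Counter((v // bin_size) * bin_size for v in values if v <= cutoff)
    overflow = sum(1 for v in values if v > cutoff)
    return dict(counts), overflow


def modal_bin(counts: Dict[int, int]) -> int:
    """Lower edge of the most populated bin (the 'major peak')."""
    return max(counts, key=counts.get)
```

With a 5-nt bin size, a peak at ∼160 bases would appear as the 155 or 160 bin depending on where the distances fall.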

FIG. 2.—Amplified repeat units and sequence motifs within the mir-379/mir-656 miRNA cluster. (A) Sequence logos of the 2 overrepresented sequence motifs within the human mir-379/mir-656 miRNA cluster. The y axis shows the information content at each position within the motif. Sequence logos for motifs 1 and 2 were created from 147 and 115 individual sites within the cluster, respectively. A blue bar indicates a possible hnRNP A1–binding site. (B) Structure of the proposed amplified repeat unit, showing the relative positions of the miRNA precursor sequence and the 2 motifs. Vertical arrows indicate the suggested positions of splice site boundaries in the ancestral amplified array.

FIG. 3.—Size of the amplified repeat unit. Blue bars represent the distribution of pairwise distances between adjacent motifs within the human mir-379/mir-656 miRNA cluster. Orange bars represent the distribution of lengths of the 38 known human miRNA precursors from within the cluster, measured including the 2 adjacent motifs. Each bar shows the number of occurrences within a 5-nt bin. Twenty-one motif distances exceeded the x-axis cutoff of 400 nt and are not shown.

Evolving miRNAs

Despite the overall similarity in structure and sequence conservation among the orthologous mir-379/mir-656 miRNA clusters in placental mammals, we found several examples of ongoing evolutionary selection acting on individual miRNAs within the cluster. The loss and gain of individual miRNAs is best illustrated by the rodent lineage. For example, we identified mouse and rat sequences orthologous to human mir-329-2, mir-655, mir-487a, and mir-656. However, detailed analysis of multiple sequence alignments between rodents and other mammals showed that the mouse and rat sequences have accumulated nucleotide substitutions, small deletions, and/or insertions that are likely to affect the secondary structure necessary for correct processing of the mature miRNAs. To assess this, we compared MFOLD 3.2–predicted RNA secondary structures of the rodent sequences with those of the experimentally validated orthologous human miRNA sequences (Zuker 2003). The rodent sequences failed to form the hairpin-like structures characteristic of miRNA precursors (supplementary fig. 1, Supplementary Material online). We therefore conclude that these sequences do not encode functional miRNAs in either mouse or rat and are likely to be remnants of the ancestral miRNA sequences (table 2; supplementary fig. 1 [Supplementary Material online]). The opposite process, evolutionary fixation of individual lineage-specific miRNAs, is also evident in rodents. Using a similar analysis, we found that although degenerate sequences orthologous to rodent mir-679, mir-666, and mir-667 are present in all placental mammal genomes, the functional miRNAs appear to be present only in mouse and rat. Examples of lineage-specific evolutionary selection of miRNAs have also been reported for primates and other lineages (table 2) (Berezikov, Thuemmler, et al. 2006; Hertel et al. 2006).
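MFOLD predicts secondary structures by free-energy minimization, which is too involved to reproduce here. As a much simpler stand-in, the sketch below uses the Nussinov base-pair maximization algorithm to show how a dynamic program can score whether a sequence is able to fold back on itself. This is a swapped-in teaching technique, not the method used in this study, and its output is only a crude proxy for hairpin formation. The minimum hairpin loop of 3 nt, the allowed pairs (Watson–Crick plus G·U wobble), and the function names are assumptions.

```python
from typing import List

# Watson-Crick pairs plus G-U wobble pairs.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov algorithm), O(n^3) time.

    A pair (i, j) is allowed only if at least `min_loop` bases lie between them.
    """
    s = seq.upper().replace("T", "U")  # accept DNA input
    n = len(s)
    if n == 0:
        return 0
    dp: List[List[int]] = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (s[i], s[j]) in CAN_PAIR:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


def paired_fraction(seq: str, min_loop: int = 3) -> float:
    """Fraction of bases that can be paired; higher means more of the sequence can pair."""
    return 2 * nussinov_max_pairs(seq, min_loop) / len(seq) if seq else 0.0
```

Because Nussinov ignores stacking energies and loop penalties, two sequences with the same pair count can differ greatly in stability; an energy-based tool such as MFOLD is needed for conclusions like those drawn above.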

The miRNA Target Genes Point to Eutherian-Specific Biological Processes

In the publications describing the mir-379/mir-656 cluster, Seitz et al. (2003, 2004) hypothesized that the common origin of the miRNAs within the cluster and their coexpression from a large polycistronic transcript, Mirg, may also result in a common set of target genes. To investigate this possibility, we used TargetScan 4.0 predictions of evolutionarily conserved vertebrate target sites for miRNAs from the mir-379/mir-656 cluster to examine the GO biological process annotations of miRNA target genes in the human and mouse genomes (Lewis et al. 2003). This analysis showed that 5 functional categories of GO terms were significantly overrepresented among predicted miRNA target genes (fig. 4; supplementary fig. 3 [Supplementary Material online]). These categories can be broadly defined as regulation of transcription, RNA metabolism, cell motility, neurogenesis, and embryonic development. While genes involved in regulation of transcription and RNA metabolism appear to be common targets for many miRNAs and have been reported in several studies (Lewis et al. 2003; John et al. 2004; Grun et al. 2005), overrepresentation of target genes involved in neurogenesis, cell motility, and embryonic development is highly specific to the mir-379/mir-656 cluster. Consistent with this result, these 3 groups of genes did not show any significant overrepresentation when the same analysis was repeated with a random set of miRNAs of the same size, although, as expected, genes involved in regulation of transcription were overrepresented in this set (data not shown).
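The target-gene definition from Materials and Methods and the random-control comparison described above can be sketched as follows. This is an illustration under assumptions the text does not spell out: each input record is one conserved 3′ UTR site given as a (gene, miRNA) pair, sites are pooled across all miRNAs in a set when applying the 2-site threshold, and the control set is drawn from miRNAs outside the cluster. All names in the example are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, Sequence, Set, Tuple


def target_genes(
    sites: Iterable[Tuple[str, str]], mirnas: Set[str], min_sites: int = 2
) -> Set[str]:
    """Genes with at least `min_sites` conserved 3' UTR sites for miRNAs in `mirnas`.

    Assumption: sites are pooled across all miRNAs of the set.
    """
    per_gene = Counter(gene for gene, mirna in sites if mirna in mirnas)
    return {gene for gene, count in per_gene.items() if count >= min_sites}


def random_control_mirnas(
    all_mirnas: Sequence[str], cluster_mirnas: Set[str], seed: int = 0
) -> Set[str]:
    """Draw a same-sized random miRNA set from outside the cluster (assumed sampling scheme)."""
    pool = sorted(set(all_mirnas) - set(cluster_mirnas))  # sorted for reproducibility
    rng = random.Random(seed)
    return set(rng.sample(pool, len(cluster_mirnas)))
```

Each resulting gene set would then be tested for GO term enrichment against the same background, as described in Materials and Methods.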

FIG. 4.—GO terms significantly overrepresented among conserved vertebrate targets of miRNAs from the mir-379/mir-656 cluster. The diagram shows significantly overrepresented GO biological process terms. Arrows connect broad high-level terms to more specific lower-level terms. Yellow boxes represent terms that passed both cutoff criteria (P < 1 × 10^−5 and at least 2-fold enrichment). Open boxes represent associated terms with highly significant P values but less than 2-fold enrichment. Gray boxes represent terms that show no enrichment; they are included only to show the connections and hierarchy between terms. GO terms associated with regulation of transcription are not included in this diagram; see supplementary figure 3 (Supplementary Material online).

Because the miRNA cluster emerged after the divergence of the bird lineage and before the radiation of eutherian mammals, most, if not all, of its target sites are likely to be conserved in eutherian mammals but not in birds. To test this hypothesis, we repeated the GO analysis using only genes containing predicted miRNA-binding sites that are conserved in the eutherian genomes but not in the chicken genome. GO terms associated with neurogenesis and cell motility were still significantly enriched in this set of target genes, but GO terms associated with embryonic development were no longer significantly overrepresented.
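The eutherian-specific filter can be expressed as a simple set test on each predicted site. In this sketch, which assumes a data layout rather than reproducing the TargetScan file format, each site records the set of species in which it is conserved; a site counts as eutherian-specific if it is conserved in every species of an illustrative eutherian panel and is not conserved in chicken. The species panel, the reuse of the 2-site threshold from Materials and Methods, and all names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set, Tuple

# Illustrative species panel; the panel actually used for conservation calls may differ.
EUTHERIAN_PANEL: FrozenSet[str] = frozenset({"human", "mouse", "rat", "dog"})
OUTGROUP = "chicken"

Site = Tuple[str, str, FrozenSet[str]]  # (gene, miRNA, species in which the site is conserved)


def is_eutherian_specific(conserved_in: FrozenSet[str]) -> bool:
    """Conserved in every panel eutherian and not conserved in chicken."""
    return EUTHERIAN_PANEL <= conserved_in and OUTGROUP not in conserved_in


def eutherian_specific_targets(sites: Iterable[Site], min_sites: int = 2) -> Set[str]:
    """Genes with at least `min_sites` eutherian-specific conserved sites."""
    per_gene = Counter(
        gene for gene, _mirna, species in sites if is_eutherian_specific(species)
    )
    return {gene for gene, count in per_gene.items() if count >= min_sites}
```

The resulting gene set would then be passed through the same GO enrichment test as the full target set.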

Interestingly, GO terms related to different aspects of nervous system development were the most common among all overrepresented terms. These terms also showed higher overall enrichment and lower P values than terms related to other biological processes. The biological process term with the highest enrichment was axon guidance (fig. 4). This term links 2 other significantly overrepresented biological process terms: cell migration and axonogenesis. Examination of the individual target genes in this class shows that some of them, such as brain-derived neurotrophic factor, contain up to 7 evolutionarily conserved target sites within their 3′ UTR for different miRNAs from the mir-379/mir-656 cluster. Given their GO annotations, the predicted miRNA target genes would be expected to be highly expressed in the corresponding tissues, namely embryonic tissues and various parts of the developing and adult brain. Indeed, of the 18 miRNA target genes associated with the GO term axon guidance, 14 (including the human homolog of Robo1, 2 ephrin receptors, and neurogenin 2) show high expression levels in various parts of the brain, and the remaining 4 show moderate expression in at least one brain region (see the GNF Gene Expression Atlas at http://symatlas.gnf.org/SymAtlas/ and the Allen Brain Atlas at http://www.brain-map.org [Su et al. 2004; Lein et al. 2007]). More importantly, our survey of expression data for miRNAs from the cluster also shows that most of these miRNAs were frequently detected in or cloned from various adult brain–derived samples and some embryonic tissue samples (Seitz et al. 2004; Bentwich et al. 2005; Berezikov, Thuemmler, et al. 2006; Cummins et al. 2006) (for a detailed summary and additional references, see supplementary table 2, Supplementary Material online).

These results demonstrate a significant overlap between the expression profile of miRNAs from the mir-379/mir-656 cluster and that of their predicted target genes. Such overlap in expression patterns strongly suggests that these results reflect biologically relevant miRNA–target interactions rather than unexpected biases in miRNA target prediction or GO annotation.

Discussion

Origin and Evolution of the mir-379/mir-656 miRNA Cluster

Consistent with earlier studies, our results show that the mir-379/mir-656 cluster is an evolutionary innovation that first appears in eutherian mammals (Seitz et al. 2004; Hertel et al. 2006). Acquisition of novel miRNA genes during evolution is a common trend across different groups of metazoa and has been well documented recently (Hertel et al. 2006). However, unlike other novel miRNA genes, which mostly originated from individual duplications of existing miRNA genes or from exaptation of other genomic sequences (Smalheiser and Torvik 2005), the mir-379/mir-656 cluster has a different origin. We have shown that not only the individual members of the mir-379/mir-656 cluster but also the entire ∼45-kb genomic region encompassing these miRNAs originated from an ancestral repeat unit that was amplified more than 250 times (fig. 2B). Although the amplified copies of the repeat unit might initially have been functionally identical, their subsequent evolutionary fate followed one of the 3 alternatives described by the duplication–degeneration–complementation model (Force et al. 1999). According to this model, after a gene is duplicated, one copy may completely lose its function (degeneration) without necessarily losing its sequence, thereby becoming a pseudogene; both copies may be preserved if they evolve complementary functions (subfunctionalization); or one copy may evolve an entirely new function (neofunctionalization) (Force et al. 1999). It is evident that, in the case of the mir-379/mir-656 cluster, all of these possibilities have been realized, resulting in the array of known miRNAs within the cluster. Events of functional degeneration are readily identifiable: although the remains of many ancestral repeat units can still be recognized within the ∼45-kb region, most appear to have degenerated over the last 100 Myr and have lost one or more components required for activity.
Interestingly, in humans, motif 1 and motif 2 appear to be better preserved than some of the ancestral sequences that gave rise to miRNA precursors. The significant overrepresentation of the detected sequence motifs within the mir-379/mir-656 cluster, but not elsewhere in the genome, clearly indicates a functional relationship between these motifs and the miRNAs within the cluster. The motifs remain detectable even where miRNA precursor sequences have degenerated beyond recognition, and they are distributed uniformly overall across the 45-kb region of the mir-379/mir-656 cluster; together, these observations suggest that the motifs may regulate the entire cluster as well as individual miRNAs within it. In this context, it is noteworthy that sub- and neofunctionalization can also act on the regulatory elements controlling the function of a duplicated gene. In practice, this could mean that some miRNAs with identical sequences may still perform different functions because their regulatory elements have undergone sub- and/or neofunctionalization. Consistent with this, the 45-kb region is relatively depleted in transposable elements, indicating that the sequences between active pre-miRNA sequences may also be functional despite the lack of evolutionary conservation at the primary sequence level (Simons et al. 2006).
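Overrepresentation of a motif within a region is commonly assessed by comparing the observed number of hits with the number expected from the genome-wide background density. The sketch below assumes a simple Poisson background model; the text does not state which test was used in this study, and the function names and example counts are hypothetical.

```python
import math


def _log_poisson_pmf(x: int, lam: float) -> float:
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed from whichever tail is small."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k > lam:
        # Upper tail: pmf terms shrink as x moves above the mean, so sum upward.
        x, term, total = k, math.exp(_log_poisson_pmf(k, lam)), 0.0
        while term > total * 1e-16:
            total += term
            x += 1
            term *= lam / x
        return min(total, 1.0)
    # Otherwise take the complement of P(X <= k - 1), summed downward from k - 1.
    x, term, cdf = k - 1, math.exp(_log_poisson_pmf(k - 1, lam)), 0.0
    while x >= 0 and term > cdf * 1e-16:
        cdf += term
        term *= x / lam  # pmf(x - 1) = pmf(x) * x / lam
        x -= 1
    return max(0.0, 1.0 - cdf)


def motif_enrichment(region_hits: int, region_len: int, genome_hits: int, genome_len: int):
    """Fold enrichment and Poisson P value for motif hits in a region.

    Assumes genome_hits > 0 so that the background density is nonzero.
    """
    expected = genome_hits / genome_len * region_len  # hits expected from background density
    return region_hits / expected, poisson_sf(region_hits, expected)


# Hypothetical counts: 30 hits in a 45-kb region versus 3,000 hits genome-wide.
print(motif_enrichment(30, 45_000, 3_000, 3_000_000_000))
```

A Poisson model treats motif occurrences as independent; overlapping or tandemly repeated hits violate that assumption, so the resulting P value is best read as a rough guide.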

One possible biological role for these motifs is in regulating the processing of the primary miRNA (pri-miRNA) transcript. Indeed, transcription and expression data available for the ∼45-kb locus encompassing the mir-379/mir-656 cluster suggest that the entire region may be transcribed into a single noncoding RNA precursor, called Mirg, which is then processed to give rise to the individual miRNA precursors (Seitz et al. 2003, 2004; Mineno et al. 2006). Recent studies have demonstrated that processing of pri-miRNA transcripts can be complex and may include alternative pathways, such as the mirtron pathway in Drosophila and the processing of miR-18a in humans, which depends on the splicing repressor heterogeneous nuclear ribonucleoprotein (hnRNP) A1 (Guil and Caceres 2007; Okamura et al. 2007). In this context, it is worth noting that of the 11 known intron–exon boundaries (3′ splice sites) in the miRNA cluster that are supported by expressed sequence tag and mRNA data (e.g., GenBank accessions AK021542 and AA861571 in human, AJ517767 and AW244689 in mouse, and AW916103 in rat), 5 are located between positions 5 and 9 of copies of motif 1. In contrast, no association is observed between exon–intron boundaries (5′ splice sites) and either motif. However, one such 5′ splice site lies in the vicinity of the probable 5′ end of the mir-369 precursor in mouse (GenBank accession AJ517767). We speculate that the ancient amplified repeat unit may have contained a 3′ splice site within motif 1, and perhaps also a 5′ splice site at the 5′ end of the miRNA precursor sequence (fig. 2A). Over time, many of these sites may have lost their function in splicing, and new sites may have evolved. We also note that motif 2 contains a conserved sequence that is similar to hnRNP A1–binding sites.
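Whether 5 of 11 boundaries falling between positions 5 and 9 of motif 1 copies exceeds chance depends on what fraction of the transcribed sequence those positions cover, which the text does not report. The sketch below frames the question under a simple binomial null model in which each boundary independently lands in a motif window with probability p. The helper names, coordinates, and the 2% coverage fraction are hypothetical assumptions, not values from this study.

```python
from math import comb


def boundaries_in_motif_window(boundaries, motif_starts, first=5, last=9):
    """Count boundary coordinates lying within positions first..last (1-based)
    of any motif copy; motif_starts are 0-based start coordinates."""
    windows = [(s + first - 1, s + last - 1) for s in motif_starts]
    return sum(any(lo <= b <= hi for lo, hi in windows) for b in boundaries)


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that k or more of n
    boundaries land in motif windows if each does so independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical: 11 boundaries, 5 inside windows that cover 2% of the locus.
print(binomial_sf(5, 11, 0.02))
```

With real data, p would be the combined length of the motif windows divided by the length of sequence in which a boundary could plausibly occur.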
Although we favor a role in processing for the motifs within the cluster, it is important to point out that several potentially overlapping molecular processes have been reported to take place within this cluster: maternal imprinting, RNA editing, and tissue-specific expression (Seitz et al. 2003, 2004; Kawahara et al. 2007). Each of these processes requires regulation at different levels, which may impose various sequence and structural constraints on this genomic region. Our analyses suggest several hypotheses that remain to be tested experimentally.

Role of mir-379/mir-656 Cluster in Placental Mammals

Before discussing the possible biological roles of the members of the mir-379/mir-656 cluster, it is useful to summarize what this and other studies have shown: the cluster is present only in the placental mammals, it originated from a common ancestral precursor sequence, and it is imprinted and expressed from the maternally derived chromosome, predominantly in embryonic brain and placental tissues (Cavaille et al. 2002; Seitz et al. 2003, 2004; Hertel et al. 2006). Together, these findings consistently point to the involvement of the mir-379/mir-656 cluster in biological functions specific to eutherian mammals.

Our results showed that genes associated with the biological process of axon guidance are among the most likely candidate targets of miRNAs from the mir-379/mir-656 cluster. Although neither axon guidance nor the associated processes of neurogenesis and cell migration are exclusive to eutherian mammals, closer investigation reveals that the nervous system underwent substantial elaboration and rewiring in this group compared with nonplacental mammals. For example, one of the most significant evolutionary innovations in the eutherian brain is the emergence of a large interhemispheric connective structure, the corpus callosum (reviewed by Mihrshahi 2006). Like the mir-379/mir-656 cluster, the corpus callosum is present exclusively in placental mammals and has not been found in any nonplacental species. Formation of the corpus callosum relies on the correct specification of commissural neurons and on precise guidance of their axons across the midline to their final destination in the opposite hemisphere (Mihrshahi 2006; Lindwall et al. 2007).

Although we do not have strong evidence that any of the miRNAs from the mir-379/mir-656 cluster is directly involved in regulating axon guidance in the developing corpus callosum, we find that a few genes with known functions in corpus callosum development, including Robo1 and members of the SLIT and NTRK-like family (SLITRK1, SLITRK2, SLITRK3, and SLITRK6), are among the predicted targets of miRNAs from the mir-379/mir-656 cluster (Lindwall et al. 2007). Other genes implicated in biological processes that involve the regulation of axon guidance, such as thalamocortical patterning and motoneuron projection, were also predicted to be targeted by several miRNAs from the cluster.
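Predicted miRNA targets such as those discussed here are typically identified by searching 3′ UTRs for sites complementary to the miRNA "seed" (nucleotides 2–8). The following minimal sketch finds exact 7-nt seed-match sites. It is not the prediction pipeline used in this study, it omits conservation and site-context scoring, and the miRNA and UTR sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def seed_match_sites(mirna: str, utr: str) -> list[int]:
    """0-based start positions in a 3' UTR that pair perfectly with
    miRNA nucleotides 2-8 (a 7-nt seed match)."""
    seed = mirna.upper()[1:8]  # nucleotides 2-8, counting from 1
    # The target site is the reverse complement of the seed, written in DNA letters.
    site = "".join(COMPLEMENT[base] for base in reversed(seed))
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA input
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


# Hypothetical miRNA (not a real mir-379/mir-656 member) and UTR fragment.
print(seed_match_sites("UACGAUCAGGCAUCGAUGCAAU", "AAATGATCGTCCCTGATCGTAA"))
```

Real predictors add filters on top of this scan, for example requiring the site to be conserved across species, which greatly reduces false positives.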

Our survey of miRNA expression data also revealed that miRNAs from the mir-379/mir-656 cluster are often detectable in the placenta. However, analysis of the miRNA target genes and their associated GO biological processes failed to show any significant overrepresentation of terms related to placental development or function. This may partly reflect the relatively limited knowledge of many biological processes in the placenta and the consequent lack of explicit GO annotations relating to the placenta.
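GO overrepresentation among predicted targets is commonly tested with a one-sided hypergeometric test: given N annotated genes, of which K carry a term, how likely is it to observe k or more term-annotated genes among n predicted targets? The sketch below is a generic illustration under that assumption, not the exact statistics used in this study; it omits correction for testing many GO terms, and the example counts are hypothetical.

```python
from math import comb


def go_enrichment(k: int, n: int, K: int, N: int):
    """One-sided hypergeometric test for a single GO term.

    N: annotated background genes; K: background genes carrying the term (K > 0);
    n: predicted targets drawn from the background; k: targets carrying the term.
    Returns (fold enrichment, P(X >= k)).
    """
    expected = n * K / N  # term-annotated targets expected by chance
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k / expected, upper / comb(N, n)


# Hypothetical: 1,200 targets among 18,000 genes, 150 of which carry the term.
print(go_enrichment(25, 1200, 150, 18000))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the tail probability is computed without intermediate overflow.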

Conclusions

It is clear that the mir-379/mir-656 cluster of miRNAs was generated by a large amplification event between the divergence of the marsupial lineage and the radiation of the eutherian mammals. This appears to have been followed by fairly rapid divergence of the miRNA sequences, some of which evolved new specificities and became fixed. Remnants of the original amplification event can still be seen today, but most of the sequence in the region appears to be nonfunctional. The consistency among bioinformatic analyses of miRNA target genes, their functions and expression patterns, and the expression patterns of the miRNAs themselves strongly suggests that the miRNAs in the cluster act cooperatively to influence novel regulatory pathways that emerged in the eutherian mammals.

This work was supported by CSIRO Emerging Sciences Initiatives in Epigenetics and Cellular Reprogramming. The authors wish to acknowledge the members of the Broad Institute at MIT and Harvard, the Baylor College of Medicine Sequencing Center, and the Genome Sequencing Center at Washington University for making their data and genome assemblies available in advance of formal publication. The authors thank Ross Tellam for encouraging us to study this region of the mammalian genome, and Michael J. Pheasant, Cas Simons, Fai Wong, and Aaron Ingham for critical reading of the manuscript and discussions. E.A.G. performed detailed data analysis and wrote the final version of the manuscript. B.P.D. initiated and coordinated this study, participated in its design, performed the initial analysis of the repeats, and prepared the initial draft of the manuscript. S.M. and W.C.B. performed the initial analysis of the repeats. All authors have read and approved the final manuscript.

References

Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002;297:1301–1310.

Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. Menlo Park (CA): AAAI Press; 1994. p. 28–36.

Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet. 2005;37:766–770.

Diversity of microRNAs in human and chimpanzee brain. Nat Genet. 2006;38:1375–1377.

Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res. 2006;16:1289–1298.

The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res. 2004;32:D262–D266.

Identification of tandemly-repeated C/D snoRNA genes at the imprinted human 14q32 domain reminiscent of those at the Prader-Willi/Angelman syndrome region. Hum Mol Genet. 2002;11:1527–1538.

Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005;437:69–87.

Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004;432:695–716.

Discovery and profiling of bovine microRNAs from immune-related and embryonic tissues. Physiol Genomics. 2007;29:35–43.

WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190.

The colorectal microRNAome. Proc Natl Acad Sci USA. 2006;103:3687–3692.

Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999;151:1531–1545.

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004;428:493–521.

miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006;34:D140–D144.

Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005;33:D121–D124.

microRNA target predictions across seven Drosophila species and comparison to mammalian targets. PLoS Comput Biol. 2005;1:e13.

The multifunctional RNA-binding protein hnRNP A1 is required for processing of miR-18a. Nat Struct Mol Biol. 2007;14:591–596.

The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004;32:D258–D261.

MicroRNAs: small RNAs with a big role in gene regulation. Nat Rev Genet. 2004;5:522–531.

TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics. 2006;22:2971–2972.

The expansion of the metazoan microRNA repertoire. BMC Genomics. 2006;7:25.

Embryonic stem cell-specific MicroRNAs. Dev Cell. 2003;5:351–358.

Human microRNA targets. PLoS Biol. 2004;2:e363.

The UCSC genome browser database. Nucleic Acids Res. 2003;31:51–54.

Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science. 2007;315:1137–1140.

BLAT–the BLAST-like alignment tool. Genome Res. 2002;12:656–664.

Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA. 2003;100:11484–11489.

The human genome browser at UCSC. Genome Res. 2002;12:996–1006.

Identification of many microRNAs that copurify with polyribosomes in mammalian neurons. Proc Natl Acad Sci USA. 2004;101:360–365.

New microRNAs from mouse and human. RNA. 2003;9:175–179.

Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.

Genome-wide atlas of gene expression in the adult mouse brain. Nature. 2007;445:168–176.

Prediction of mammalian microRNA targets. Cell. 2003;115:787–798.

Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005;438:803–819.

Commissure formation in the mammalian forebrain. Curr Opin Neurobiol. 2007;17:3–14.

The corpus callosum as an evolutionary innovation. J Exp Zool B Mol Dev Evol. 2006;306:8–17.

Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature. 2007;447:167–177.

The expression profile of microRNAs in mouse embryos. Nucleic Acids Res. 2006;34:1765–1771.

Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001;294:2348–2351.

The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell. 2007;130:89–100.

Micro RNAs in animal development. Cell. 2006;124:877–881.

Evidence for a microRNA expansion in the bilaterian ancestor. Dev Genes Evol. 2007;217:73–77.

Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res. 2007;17:1850–1864.

A large imprinted microRNA gene cluster at the mouse Dlk1-Gtl2 domain. Genome Res. 2004;14:1741–1748.

Imprinted microRNA genes transcribed antisense to a reciprocally imprinted retrotransposon-like gene. Nat Genet. 2003;34:261–262.

The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J Exp Zool B Mol Dev Evol. 2006;306B:575–588.

Transposon-free regions in mammalian genomes. Genome Res. 2006;16:164–172.

Mammalian microRNAs derived from genomic repeats. Trends Genet. 2005;21:322–326.

A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004;101:6062–6067.

Human embryonic stem cells express a unique set of microRNAs. Dev Biol. 2004;270:488–498.

Molecular evolution of a microRNA cluster. J Mol Biol. 2004;339:327–335.

Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562.

Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–3415.

Author notes

Andrew Roger, Associate Editor

Published by Oxford University Press 2008
