Sonja Prohaska | Universität Leipzig
Papers by Sonja Prohaska
Bioinformatics/Computer Applications in the Biosciences, 2005
Summary: Most multi-alignment methods are fully automated, i.e. they are based on a fixed set of mathematical rules. For various reasons, such methods may fail to produce biologically meaningful alignments. Herein, we describe a semi-automatic approach to multiple sequence alignment where biological expert knowledge can be used to influence the alignment procedure. The user can specify parts of
In order to describe a cell at molecular level, a notion of a "gene" is neither necessary nor helpful. It is sufficient to consider the molecules (i.e. chromosomes, transcripts, proteins) and their interactions to describe cellular processes. The downside of the resulting high resolution is that it becomes very tedious to address features on the or-
Nucleic Acids Research, 2014
The cell cycle genes homology region (CHR) has been identified as a DNA element with an important role in transcriptional regulation of late cell cycle genes. It has been shown that such genes are controlled by DREAM, MMB and FOXM1-MuvB and that these protein complexes can contact DNA via CHR sites. However, it has not been elucidated which sequence variations of the canonical CHR are functional and how frequently CHR-based regulation is utilized in mammalian genomes. Here, we define the spectrum of functional CHR elements. As the basis for a computational meta-analysis, we identify new CHR sequences and compile phylogenetic motif conservation as well as genome-wide protein-DNA binding and gene expression data. We identify CHR elements in most late cell cycle genes binding DREAM, MMB, or FOXM1-MuvB. In contrast, Myb- and forkhead-binding sites are underrepresented in both early and late cell cycle genes. Our findings support a general mechanism: sequential binding of DREAM, MMB and FO...
The analysis of the publicly available Hox gene sequences from the sea lamprey Petromyzon marinus provides evidence that the Hox clusters in lampreys and other vertebrate species arose from independent duplications. In particular, our analysis supports the hypothesis that the last common ancestor of agnathans and gnathostomes had only a single Hox cluster which was subsequently duplicated independently in the two lineages.
The Hox gene clusters of gnathostomes have a strong tendency to exclude repetitive DNA elements. In contrast, no such trend can be found in the Hox gene clusters of protostomes. Repeats "invade" the gnathostome Hox clusters from the 5' and 3' ends while the core of the clusters remains virtually free of repetitive DNA.
Molecular Phylogenetics and Evolution, 2004
Evolutionarily conserved non-coding genomic sequences represent a potentially rich source for the discovery of gene regulatory regions. Since these elements are subject to stabilizing selection they evolve much more slowly than adjacent non-functional DNA. These so-called phylogenetic footprints can be detected by comparison of the sequences surrounding orthologous genes in different species. Therefore the loss of phylogenetic footprints as well as the acquisition of conserved non-coding sequences in some lineages, but not in others, can provide evidence for the evolutionary modification of cis-regulatory elements. We introduce here a statistical model of footprint evolution that allows us to estimate the loss of sequence conservation that can be attributed to gene loss and other structural reasons. This approach to studying the pattern of cis-regulatory element evolution, however, requires the comparison of relatively long sequences from many species. We have therefore developed an efficient software tool for the identification of corresponding footprints in long sequences from multiple species. We apply this novel method to the published sequences of HoxA clusters of shark, human, and the duplicated zebrafish and Takifugu clusters as well as the published HoxB cluster sequences. We find that there is a massive loss of sequence conservation in the intergenic region of the HoxA clusters, consistent with the finding in [Chiu et al., PNAS 99 (2002) 5492]. The loss of conservation after cluster duplication is more extensive than expected from structural reasons. This suggests that binding site turnover and/or adaptive modification may also contribute to the loss of sequence conservation.
Most multi-alignment methods are fully automated, i.e. they are based on a fixed set of mathematical rules. For various reasons, such methods may fail to produce biologically meaningful alignments. Herein, we describe a semi-automatic approach to multiple sequence alignment where biological expert knowledge can be used to influence the alignment procedure. The user can specify parts of the sequences that are biologically related to each other; our software program will use these sites as anchor points and create a multiple alignment respecting these user-defined constraints. By using functionally, structurally or evolutionarily related positions of the input sequences as anchor points, our method can produce alignments that reflect the true biological relations among the input sequences more accurately than fully automated procedures can do. Availability: Our software is available online at GÖttingen BIoinformatics Compute Server (GOBICS),
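The abstract above does not say how user-defined anchor points are validated, so the following is only a minimal sketch of one plausible prerequisite: for two sequences, a set of anchors can be respected by a single alignment only if the anchors do not cross. The function name `anchors_are_consistent` and the representation of anchors as `(i, j)` position pairs are assumptions made for illustration, not part of the described software.

```python
def anchors_are_consistent(anchors):
    """Return True if all pairwise anchor points can be honoured at once.

    `anchors` is an iterable of (i, j) pairs (hypothetical representation):
    position i of the first sequence should be aligned with position j of
    the second sequence. An alignment preserves sequence order, so anchors
    must be strictly increasing in both coordinates once sorted.
    """
    ordered = sorted(set(anchors))
    # Checking neighbours is enough: strict increase is transitive.
    return all(a[0] < b[0] and a[1] < b[1]
               for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(anchors_are_consistent([(2, 3), (5, 7)]))  # anchors in the same order
    print(anchors_are_consistent([(2, 7), (5, 3)]))  # crossing anchors
```

A multiple alignment would need this kind of check across every pair of sequences, plus transitivity between pairs; the sketch covers only the pairwise case.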
Gene expression is a complex multiple-step process involving multiple levels of regulation, from transcription, nuclear processing, export, posttranscriptional modifications, translation, to degradation. Over evolutionary timescales, many of the interactions determining the fate of a gene have left traces in the genomic DNA. Comparative genomics, therefore, promises a rich source of data on the functional interplay of cellular mechanisms. In this chapter we review a few aspects of such a research agenda.
The diverse fields of Omics research share a common logical structure combining a cataloging effort for a particular class of molecules or interactions, the underlying -ome, and a quantitative aspect attempting to record spatiotemporal patterns of concentration, expression, or variation. Consequently, these fields also share a common set of difficulties and limitations. In spite of the great success stories of Omics projects over the last decade, much remains to be understood not only at the technological, but also at the conceptual level. Here, we focus on the dark corners of Omics research, where the problems, limitations, conceptual difficulties, and lack of knowledge are hidden.
Gene expression in eukaryotic cells is regulated by a complex network of interactions, in which transcription factors and their binding sites on the genomic DNA play a determining role. As transcription factors rarely, if ever, act in isolation, binding sites of interacting factors are typically arranged in close proximity forming so-called cis-regulatory modules. Even when the individual binding sites
PLoS ONE, 2014
The elucidation of orthology relationships is an important step both in gene function prediction and towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.
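To make the notion of adjacencies concrete, here is a minimal sketch that counts gene adjacencies shared by two genomes with multiple linear chromosomes. It is not FFAdj-MCS or any part of PoFF: it assumes, for illustration only, that orthologous genes already carry the same identifier, that each gene occurs once per genome, and that gene orientation is ignored. The function names and the toy genomes are hypothetical.

```python
def adjacencies(chromosome):
    """Return the unordered pairs of neighbouring genes on one linear chromosome."""
    return {frozenset(pair) for pair in zip(chromosome, chromosome[1:])}


def conserved_adjacencies(genome_a, genome_b):
    """Return the adjacencies present in both genomes.

    Each genome is a list of chromosomes; each chromosome is a list of gene
    identifiers in their order along the chromosome (assumed representation).
    """
    adj_a = set().union(*(adjacencies(c) for c in genome_a))
    adj_b = set().union(*(adjacencies(c) for c in genome_b))
    return adj_a & adj_b


if __name__ == "__main__":
    # Toy genomes with two linear chromosomes each (made-up gene names).
    genome_a = [["g1", "g2", "g3"], ["g4", "g5"]]
    genome_b = [["g2", "g1", "g4"], ["g3", "g5"]]
    shared = conserved_adjacencies(genome_a, genome_b)
    print(len(shared), sorted(sorted(p) for p in shared))
```

The more adjacencies two gene orders share, the fewer breakpoints separate them, which is the sense in which the abstract relates the measure to the breakpoint distance.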
Theory in Biosciences, 2005
A plethora of new functions of non-coding RNAs have been discovered in the past few years. In fact, RNA is emerging as the central player in cellular regulation, taking on active roles in multiple regulatory layers from transcription, RNA maturation, and
Theory in Biosciences, 2004
Higher teleost fishes, including zebrafish and fugu, have duplicated their Hox genes relative to the gene inventory of other gnathostome lineages. The most widely accepted theory contends that the duplicate Hox clusters originated synchronously during a single genome duplication event in the early history of ray-finned fishes. In this contribution we collect and re-evaluate all publicly available sequence information. In particular, we show that the short Hox gene fragments from published PCR surveys of the killifish Fundulus heteroclitus, the medaka Oryzias latipes and the goldfish Carassius auratus can be used to determine with little ambiguity not only their paralog group but also their membership in a particular cluster. Together with a survey of the genomic sequence data from the pufferfish Tetraodon nigroviridis we show that at least percomorpha, and possibly all euteleosts, share a system of 7 or 8 orthologous Hox gene clusters. There is little doubt about the orthology of the two teleost duplicates of the HoxA and HoxB clusters. A careful analysis of both the coding sequence of Hox genes and of conserved non-coding sequences provides additional support for the "duplication early" hypothesis that the Hox clusters in teleosts are derived from eight ancestral clusters by means of subsequent gene loss; the data remain ambiguous, however, in particular for the HoxC clusters. Assuming the "duplication early" hypothesis we use the new evidence on the Hox gene complements to determine the phylogenetic positions of gene-loss events in the wake of the cluster duplication. Surprisingly, we find that the resolution of redundancy seems to be a slow process that is still ongoing. A few suggestions on which additional sequence data would be most informative for resolving the history of the teleostean Hox genes are discussed.
Physical Biology, 2013
Chromatin-related mechanisms, such as histone modifications, are known to be involved in regulatory switches within the transcriptome. Only recently, mathematical models of these mechanisms have been established. So far they have not been applied to genome-wide data. We here introduce a mathematical model of transcriptional regulation by histone modifications and apply it to data of trimethylation of histone 3 at lysine 4 (H3K4me3) and 27 (H3K27me3) in mouse pluripotent and lineage-committed cells. The model describes binding of protein complexes to chromatin which are capable of reading and writing histone marks. Molecular interactions of the complexes with DNA and modified histones create a regulatory switch of transcriptional activity. The regulatory states of the switch depend on the activity of histone (de-)methylases, the strength of complex-DNA-binding and the number of nucleosomes capable of cooperatively contributing to complex-binding. Our model explains experimentally measured length distributions of modified chromatin regions. It suggests (i) that high CpG-density facilitates recruitment of the modifying complexes in embryonic stem cells and (ii) that re-organization of extended chromatin regions during lineage specification into neuronal progenitor cells requires targeted de-modification. Our approach represents a basic step towards multi-scale models of transcriptional control during development and lineage specification.
Nature, 2007
Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
Molecular Phylogenetics and Evolution, 2004
Hox genes code for transcription factors that play a major role in the development of all animal phyla. In invertebrates these genes usually occur as tightly linked clusters, with a few exceptions where the clusters have been dissolved. Only in vertebrates have multiple clusters been demonstrated, which arose by duplication from a single ancestral cluster. This history of Hox cluster duplications, in particular during the early elaboration of the vertebrate body plan, is still poorly understood. In this paper we report the results of a PCR survey on genomic DNA of the Pacific hagfish Eptatretus stoutii. Hagfishes are one of two clades of recent jawless fishes that are an offshoot of the early radiation of jawless vertebrates. Our data provide evidence for at least 33 distinct Hox genes in the hagfish genome, which is most compatible
Molecular Biology and Evolution, 2004
In many eukaryotic genomes only a small fraction of the DNA codes for proteins, but the non-protein coding DNA harbors important genetic elements directing the development and the physiology of the organisms, like promoters, enhancers, insulators, and micro-RNA genes. The molecular evolution of these genetic elements is difficult to study because their functional significance is hard to deduce from sequence information alone. Here we propose an approach to the study of the rate of evolution of functional non-coding sequences at a macro-evolutionary scale. We identify functionally important noncoding sequences as Conserved Non-Coding Nucleotide (CNCN) sequences from the comparison of two outgroup species. The CNCN sequences so identified are then compared to their homologous sequences in a pair of ingroup species, and we monitor the degree of modification these sequences suffered in the two ingroup lineages. We propose a method to test for rate differences in the modification of CNCN sequences among the two ingroup lineages, as well as a method to estimate their rate of modification. We apply this method to the full sequences of the HoxA clusters from six gnathostome species: a shark, Heterodontus francisci; a basal ray-finned fish, Polypterus senegalus; the amphibian, Xenopus tropicalis; as well as three mammalian species, human, rat and mouse. The results show that the evolutionary rate of CNCN sequences is not distinguishable among the three mammalian lineages, while the Xenopus lineage has a significantly increased rate of evolution. Furthermore the estimates of the rate parameters suggest that in the stem lineage of mammals the rate of CNCN sequence evolution was more than twice the rate observed within the placental amniotes clade, suggesting a high rate of evolution of cis-regulatory elements during the origin of amniotes and mammals.
We conclude that the proposed methods can be used for testing hypotheses about the rate and pattern of evolution of putative cis-regulatory elements.
Molecular Biology and Evolution, 2009
Vault RNAs (vRNAs) are small, about 100 nt long, poly-III transcripts contained in the vault particles of eukaryotic cells. Presumably due to their enigmatic function they have received little attention compared to other ncRNA families. Here we report on a systematic study of this rapidly evolving class of ncRNAs in deuterostomes, providing a comprehensive collection of computationally predicted vRNA genes. Previously known vRNAs are located at a conserved genomic region linked to the protocadherin gene cluster, an association that is conserved throughout gnathostomes. Lineage specific expansions to small vRNA gene clusters are frequently observed at this locus. Expression of several paralogous vRNA genes, most but not all located at the canonical syntenically conserved locus, was verified by RT-PCR in both zebrafish and medaka. Homology search furthermore identifies an additional vRNA gene in eutheria that was misclassified as a microRNA. Lineage specific loss of one of the two loci in several eutherian lineages suggests compensation among vRNA transcripts and supports the annotation of the novel locus as functional vRNA. The comparative analysis of the promoter structure shows substantial differences between the two eutherian vRNA loci, explaining their differential expression patterns in human cancer cell lines.
Bioinformatics/computer Applications in The Biosciences, 2005
Summary: Most multi-alignment methods are fully automated, i.e. they are based on a x ed set of m... more Summary: Most multi-alignment methods are fully automated, i.e. they are based on a x ed set of math- ematical rules. For various reasons, such methods may fail to produce biologically meaningful alignments. Herein, we describe a semi-automatic approach to multiple sequence alignment where biological expert knowledge can be used to inuence the alignment procedure. The user can specify parts of
In order to describe a cell at molecular level, a notion of a "gene" is neither necessa... more In order to describe a cell at molecular level, a notion of a "gene" is neither necessary nor helpful. It is suf - ficient to consider the molecules (i.e. chromosomes, tran- scripts, proteins) and their interactions to describe cell ular processes. The downside of the resulting high resolution is that it becomes very tedious to address features on the or-
Nucleic acids research, 2014
The cell cycle genes homology region (CHR) has been identified as a DNA element with an important... more The cell cycle genes homology region (CHR) has been identified as a DNA element with an important role in transcriptional regulation of late cell cycle genes. It has been shown that such genes are controlled by DREAM, MMB and FOXM1-MuvB and that these protein complexes can contact DNA via CHR sites. However, it has not been elucidated which sequence variations of the canonical CHR are functional and how frequent CHR-based regulation is utilized in mammalian genomes. Here, we define the spectrum of functional CHR elements. As the basis for a computational meta-analysis, we identify new CHR sequences and compile phylogenetic motif conservation as well as genome-wide protein-DNA binding and gene expression data. We identify CHR elements in most late cell cycle genes binding DREAM, MMB, or FOXM1-MuvB. In contrast, Myb- and forkhead-binding sites are underrepresented in both early and late cell cycle genes. Our findings support a general mechanism: sequential binding of DREAM, MMB and FO...
The analysis of the publicly available Hox gene sequences from the sea lamprey Petromyzon marinus... more The analysis of the publicly available Hox gene sequences from the sea lamprey Petromyzon marinus provides evidence that the Hox clusters in lampreys and other vertebrate species arose from independent duplications. In particular, our analysis supports the hypothesis that the last common ancestor of agnathans and gnathostomes had only a single Hox cluster which was subsequently duplicated independently in the two lineages.
The Hox gene clusters of gnathostomes have a strong tendency to exclude repetitive DNA elements. ... more The Hox gene clusters of gnathostomes have a strong tendency to exclude repetitive DNA elements. In contrast, no such trend can be found in the Hox gene clusters of protostomes. Repeats "invade" the gnathostome Hox clusters from the 5' and 3' ends while the core of the clusters remains virtually free of repetitive DNA.
Molecular Phylogenetics and Evolution, 2004
Evolutionarily conserved non-coding genomic sequences represent a potentially rich source for the... more Evolutionarily conserved non-coding genomic sequences represent a potentially rich source for the discovery of gene regulatory regions. Since these elements are subject to stabilizing selection they evolve much more slowly than adjacent non-functional DNA. These so-called phylogenetic footprints can be detected by comparison of the sequences surrounding orthologous genes in different species. Therefore the loss of phylogenetic footprints as well as the acquisition of conserved non-coding sequences in some lineages, but not in others, can provide evidence for the evolutionary modification of cis-regulatory elements. We introduce here a statistical model of footprint evolution that allows us to estimate the loss of sequence conservation that can be attributed to gene loss and other structural reasons. This approach to studying the pattern of cis-regulatory element evolution, however, requires the comparison of relatively long sequences from many species. We have therefore developed an efficient software tool for the identification of corresponding footprints in long sequences from multiple species. We apply this novel method to the published sequences of HoxA clusters of shark, human, and the duplicated zebrafish and Takifugu clusters as well as the published HoxB cluster sequences. We find that there is a massive loss of sequence conservation in the intergenic region of the HoxA clusters, consistent with the finding in [Chiu et al., PNAS 99 (2002) 5492]. The loss of conservation after cluster duplication is more extensive than expected from structural reasons. This suggests that binding site turnover and/or adaptive modification may also contribute to the loss of sequence conservation.
Most multi-alignment methods are fully automated, i.e. they are based on a fixed set of mathemati... more Most multi-alignment methods are fully automated, i.e. they are based on a fixed set of mathematical rules. For various reasons, such methods may fail to produce biologically meaningful alignments. Herein, we describe a semi-automatic approach to multiple sequence alignment where biological expert knowledge can be used to influence the alignment procedure. The user can specify parts of the sequences that are biologically related to each other; our software program will use these sites as anchor points and create a multiple alignment respecting these user-defined constraints. By using functionally, structurally or evolutionarily related positions of the input sequences as anchor points, our method can produce alignments that reflect the true biological relations among the input sequences more accurately than fully automated procedures can do. Availability: Our software is online available at GÖttingen BIoinformatics Compute Server (GOBICS),
Gene expression is a complex multiple-step process involving multiple levels of regulation, from ... more Gene expression is a complex multiple-step process involving multiple levels of regulation, from transcription, nuclear processing, export, posttranscriptional modifications, translation, to degradation. Over evolutionary timescales, many of the interactions determining the fate of a gene have left traces in the genomic DNA. Comparative genomics, therefore, promises a rich source of data on the functional interplay of cellular mechanisms. In this chapter we review a few aspects of such a research agenda.
The diverse fields of Omics research share a common logical structure combining a cataloging effo... more The diverse fields of Omics research share a common logical structure combining a cataloging effort for a particular class of molecules or interactions, the underlying -ome, and a quantitative aspect attempting to record spatiotemporal patterns of concentration, expression, or variation. Consequently, these fields also share a common set of difficulties and limitations. In spite of the great success stories of Omics projects over the last decade, much remains to be understood not only at the technological, but also at the conceptual level. Here, we focus on the dark corners of Omics research, where the problems, limitations, conceptual difficulties, and lack of knowledge are hidden.
Gene expression in eukaryotic cells is regulated by a complex network of interac- tions, in which... more Gene expression in eukaryotic cells is regulated by a complex network of interac- tions, in which transcription factors and their binding sites on the genomic DNA play a determining role. As transcriptions factor rarely, if ever, act in isolation, binding sites of interacting factors are typically arranged in close proximity forming so-called cis-regulatory modules. Even when the individual binding sites
PLoS ONE, 2014
The elucidation of orthology relationships is an important step both in gene function prediction ... more The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more finegrained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.
Theory in Biosciences, 2005
A plethora of new functions of non-coding RNAs have been discovered in past few years. In fact, R... more A plethora of new functions of non-coding RNAs have been discovered in past few years. In fact, RNA is emerging as the central player in cellular regulation, taking on active roles in multiple regulatory layers from transcription, RNA maturation, and
Theory in Biosciences, 2004
Higher teleost fishes, including zebrafish and fugu, have duplicated their Hox genes relative to ... more Higher teleost fishes, including zebrafish and fugu, have duplicated their Hox genes relative to the gene inventory of other gnathostome lineages. The most widely accepted theory contends that the duplicate Hox clusters orginated synchronously during a single genome duplication event in the early history of ray-finned fishes. In this contribution we collect and re-evaluate all publicly available sequence information. In particular, we show that the short Hox gene fragments from published PCR surveys of the killifish Fundulus heteroclitus, the medaka Oryzias latipes and the goldfish Carassius auratus can be used to determine with little ambiguity not only their paralog group but also their membership in a particular cluster.Together with a survey of the genomic sequence data from the pufferfish Tetraodon nigroviridis we show that at least percomorpha, and possibly all eutelosts, share a system of 7 or 8 orthologous Hox gene clusters. There is little doubt about the orthology of the two teleost duplicates of the HoxA and HoxB clusters. A careful analysis of both the coding sequence of Hox genes and of conserved non-coding sequences provides additional support for the "duplication early" hypothesis that the Hox clusters in teleosts are derived from eight ancestral clusters by means of subsequent gene loss; the data remain ambiguous, however, in particular for the HoxC clusters.Assuming the "duplication early" hypothesis we use the new evidence on the Hox gene complements to determine the phylogenetic positions of gene-loss events in the wake of the cluster duplication. Surprisingly, we find that the resolution of redundancy seems to be a slow process that is still ongoing. A few suggestions on which additional sequence data would be most informative for resolving the history of the teleostean Hox genes are discussed.
Physical Biology, 2013
Chromatin-related mechanisms, as e.g. histone modifications, are known to be involved in regulato... more Chromatin-related mechanisms, as e.g. histone modifications, are known to be involved in regulatory switches within the transcriptome. Only recently, mathematical models of these mechanisms have been established. So far they have not been applied to genome-wide data. We here introduce a mathematical model of transcriptional regulation by histone modifications and apply it to data of trimethylation of histone 3 at lysine 4 (H3K4me3) and 27 (H3K27me3) in mouse pluripotent and lineage-committed cells. The model describes binding of protein complexes to chromatin which are capable of reading and writing histone marks. Molecular interactions of the complexes with DNA and modified histones create a regulatory switch of transcriptional activity. The regulatory states of the switch depend on the activity of histone (de-) methylases, the strength of complex-DNA-binding and the number of nucleosomes capable of cooperatively contributing to complex-binding. Our model explains experimentally measured length distributions of modified chromatin regions. It suggests (i) that high CpG-density facilitates recruitment of the modifying complexes in embryonic stem cells and (ii) that re-organization of extended chromatin regions during lineage specification into neuronal progenitor cells requires targeted de-modification. Our approach represents a basic step towards multi-scale models of transcriptional control during development and lineage specification.
Nature, 2007
Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
Molecular Phylogenetics and Evolution, 2004
Hox genes code for transcription factors that play a major role in the development of all animal phyla. In invertebrates these genes usually occur as a tightly linked cluster, with a few exceptions where the clusters have been dissolved. Only in vertebrates have multiple clusters been demonstrated, which arose by duplication from a single ancestral cluster. The history of Hox cluster duplications, in particular during the early elaboration of the vertebrate body plan, is still poorly understood. In this paper we report the results of a PCR survey on genomic DNA of the Pacific hagfish Eptatretus stoutii. Hagfishes are one of two clades of recent jawless fishes that are an offshoot of the early radiation of jawless vertebrates. Our data provide evidence for at least 33 distinct Hox genes in the hagfish genome, which is most compatible
Molecular Biology and Evolution, 2004
In many eukaryotic genomes only a small fraction of the DNA codes for proteins, but the non-protein-coding DNA harbors important genetic elements directing the development and the physiology of the organisms, such as promoters, enhancers, insulators, and micro-RNA genes. The molecular evolution of these genetic elements is difficult to study because their functional significance is hard to deduce from sequence information alone. Here we propose an approach to the study of the rate of evolution of functional non-coding sequences at a macro-evolutionary scale. We identify functionally important non-coding sequences as Conserved Non-Coding Nucleotide (CNCN) sequences from the comparison of two outgroup species. The CNCN sequences so identified are then compared to their homologous sequences in a pair of ingroup species, and we monitor the degree of modification these sequences suffered in the two ingroup lineages. We propose a method to test for rate differences in the modification of CNCN sequences between the two ingroup lineages, as well as a method to estimate their rate of modification. We apply this method to the full sequences of the HoxA clusters from six gnathostome species: a shark, Heterodontus francisci; a basal ray-finned fish, Polypterus senegalus; the amphibian, Xenopus tropicalis; as well as three mammalian species, human, rat and mouse. The results show that the evolutionary rate of CNCN sequences is not distinguishable among the three mammalian lineages, while the Xenopus lineage has a significantly increased rate of evolution. Furthermore, the estimates of the rate parameters suggest that in the stem lineage of mammals the rate of CNCN sequence evolution was more than twice the rate observed within the placental amniotes clade, pointing to a high rate of evolution of cis-regulatory elements during the origin of amniotes and mammals.
We conclude that the proposed methods can be used for testing hypotheses about the rate and pattern of evolution of putative cis-regulatory elements.
Molecular Biology and Evolution, 2009
Vault RNAs (vRNAs) are small, about 100 nt long, pol III transcripts contained in the vault particles of eukaryotic cells. Presumably due to their enigmatic function, they have received little attention compared to other ncRNA families. Here we report on a systematic study of this rapidly evolving class of ncRNAs in deuterostomes, providing a comprehensive collection of computationally predicted vRNA genes. Previously known vRNAs are located at a conserved genomic region linked to the protocadherin gene cluster, an association that is conserved throughout gnathostomes. Lineage-specific expansions to small vRNA gene clusters are frequently observed at this locus. Expression of several paralogous vRNA genes, most but not all located at the canonical syntenically conserved locus, was verified by RT-PCR in both zebrafish and medaka. Homology search furthermore identifies an additional vRNA gene in eutheria that was misclassified as a microRNA. Lineage-specific loss of one of the two loci in several eutherian lineages suggests compensation among vRNA transcripts and supports the annotation of the novel locus as a functional vRNA. The comparative analysis of the promoter structure shows substantial differences between the two eutherian vRNA loci, explaining their differential expression patterns in human cancer cell lines.