Adam Siepel | Cornell University
Papers by Adam Siepel
The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data.
Recent advances in computing have revolutionized many branches of science, but none more so than molecular biology and genetics. Many of the hottest areas in these fields, such as genome sequencing, microarray analysis, comparative genomics, population genetics, and disease association mapping, rest on a foundation of computational tools and resources. Furthermore, computing has moved beyond its initial support role in these fields into the forefront of biological discovery.
An expectation maximization (EM) algorithm is derived to estimate the parameters of a phylogenetic model, a probabilistic model of molecular evolution that considers the phylogeny, or evolutionary tree, by which a set of present-day organisms are related. The EM algorithm is then extended for use with a combined phylogenetic and hidden Markov model.
Clusters of genes that evolved from single progenitors via repeated segmental duplications present significant challenges to the generation of a truly complete human genome sequence. Such clusters can confound both accurate sequence assembly and downstream computational analysis, yet they represent a hotbed of functional innovation, making them of extreme interest.
Soil and water conservation practices are increasingly being considered for curbing nonpoint sources from agricultural land. Such issues are addressed by the Water Erosion Prediction Project (WEPP) of the USDA-ARS. The WEPP technology is relatively demanding in parameter information, which is commonly available only in the USA.
Comparative genomics of closely related bacterial species with different pathogenesis and host preference can provide a means of identifying the specifics of adaptive differences. Streptococcus dysgalactiae (SD) comprises two subspecies: S. dysgalactiae subsp. equisimilis is both a human commensal organism and a human pathogen, and S. dysgalactiae subsp. dysgalactiae is strictly an animal pathogen.
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases.
While most work in computational molecular biology since its inception in the 1970s has focused on problems involving DNA and amino acid sequences, there has been growing interest during the past decade in the use of alternative models of molecular evolution that are based on the order and content of genes in complete genomes.
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets.
Transcription is the first step connecting genetic information with an organism's phenotype. While expression of annotated genes in the human brain has been characterized extensively, our knowledge about the scope and the conservation of transcripts located outside of the known genes' boundaries is limited. Here, we use high-throughput transcriptome sequencing (RNA-Seq) to characterize the total non-ribosomal transcriptome of human, chimpanzee, and rhesus macaque brain.
Comparative genomics allows us to search the human genome for segments that were extensively changed in the last ~5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human.
'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes.
Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny.
This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu.
A few models have appeared in recent years that consider not only the way substitutions occur through evolutionary history at each site of a genome, but also the way the process changes from one site to the next. These models combine phylogenetic models of molecular evolution, which apply to individual sites, and hidden Markov models, which allow for changes from site to site.
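As a rough illustration of how a phylogenetic model and a hidden Markov model can be combined, the following toy sketch scores an alignment with the forward algorithm, where each hidden state scales the branch lengths by its own rate. Everything specific here is an assumption made for illustration and is not taken from the paper: the Jukes-Cantor substitution model, the star-shaped tree, and the function names `jc_prob`, `column_likelihood`, and `forward_likelihood`.

```python
import math

BASES = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor probability that base a is replaced by base b over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def column_likelihood(column, branch_lengths, rate):
    """Likelihood of one alignment column on a star tree (assumed toy phylogeny).

    The root base is unobserved, so we sum over it; `rate` scales every branch.
    """
    total = 0.0
    for root in BASES:
        p = 0.25  # uniform root distribution under Jukes-Cantor
        for base, t in zip(column, branch_lengths):
            p *= jc_prob(root, base, rate * t)
        total += p
    return total

def forward_likelihood(columns, branch_lengths, rates, init, trans):
    """Forward algorithm: alignment likelihood summed over all hidden state paths.

    State s emits a column with the phylogenetic likelihood computed at rates[s].
    """
    k = len(rates)
    f = [init[s] * column_likelihood(columns[0], branch_lengths, rates[s])
         for s in range(k)]
    for col in columns[1:]:
        emit = [column_likelihood(col, branch_lengths, rates[s]) for s in range(k)]
        f = [emit[s] * sum(f[r] * trans[r][s] for r in range(k)) for s in range(k)]
    return sum(f)

if __name__ == "__main__":
    # Two species, three sites; state 0 is "slow" (conserved), state 1 is "fast".
    alignment = ["AA", "AA", "AC"]
    like = forward_likelihood(alignment, [0.1, 0.3], [0.3, 1.0],
                              [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]])
    print(like)
```

For long alignments the raw products underflow, so a practical version would work with scaled or log-space forward values; a general tree would also need Felsenstein's pruning algorithm in place of the star-tree sum.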
To explore the global mechanisms of estrogen-regulated transcription, we used chromatin immunoprecipitation coupled with DNA microarrays to determine the localization of RNA polymerase II (Pol II), estrogen receptor alpha (ERα), steroid receptor coactivator proteins (SRC), and acetylated histones H3/H4 (AcH) at estrogen-regulated promoters in MCF-7 cells with or without estradiol (E2) treatment.
The problem of estimating evolutionary distance from differences in gene order has been distilled to the problem of finding the reversal distance between two signed permutations. During the last decade, much progress was made both in computing reversal distance and in finding a minimum sequence of sorting reversals.
Figure S1: Distinctive accumulation of short human paralogs to the PCBP2 exonized instance containing uc.338. A UCSC genome browser shot (http://genome.ucsc.edu) of the PCBP2 exonized instance and the two exons flanking it (3.6 kb region). Tracks (top to bottom) show: The region conserved with coelacanth; PCBP2 (whole and fragmented) isoforms, showing the alternatively-spliced nature of the exonization event; Multi-species conservation track (Siepel et al., 2005); Location of uc.
The rearrangement distance between single-chromosome genomes can be estimated as the minimum number of inversions required to transform the gene ordering observed in one into that observed in the other. This measure, known as "inversion distance," can be computed as the reversal distance between signed permutations.
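To make the definition concrete, here is a minimal sketch that computes the reversal distance between two small signed permutations by breadth-first search over all reversals. This brute-force search is only an illustration of the definition and is not the method of the paper (efficient algorithms exist for this problem); the function name `reversal_distance` is hypothetical, and the search is practical only for a handful of genes because the state space grows as 2^n · n!.

```python
from collections import deque

def reversal_distance(source, target):
    """Minimum number of signed reversals that turn `source` into `target`.

    A reversal takes a contiguous segment, reverses its order, and flips the
    sign (orientation) of every gene in it. Exhaustive BFS; small inputs only.
    """
    source, target = tuple(source), tuple(target)
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("permutations must contain the same genes")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        n = len(perm)
        for i in range(n):
            for j in range(i, n):
                # Reverse the segment perm[i..j] and flip every sign in it.
                flipped = tuple(-g for g in reversed(perm[i:j + 1]))
                nxt = perm[:i] + flipped + perm[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise AssertionError("every signed permutation is reachable by reversals")

if __name__ == "__main__":
    print(reversal_distance((-3, -2, -1), (1, 2, 3)))  # one inversion of the whole segment
    print(reversal_distance((2, 1), (1, 2)))
```

For example, `reversal_distance((2, 1), (1, 2))` returns 3: swapping two genes while keeping both orientations positive cannot be done in fewer than three inversions.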