Adam Siepel | Cornell University

Papers by Adam Siepel

The UCSC genome browser database: update 2006

The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data.

Computational education for molecular biology and genetics

Recent advances in computing have revolutionized many branches of science, but none more so than molecular biology and genetics. Many of the hottest areas in these fields, such as genome sequencing, microarray analysis, comparative genomics, population genetics, and disease association mapping, rest on a foundation of computational tools and resources. Furthermore, computing has moved beyond its initial support role in these fields into the forefront of biological discovery.

Expectation Maximization for Combined Phylogenetic and Hidden Markov Models

An expectation maximization (EM) algorithm is derived to estimate the parameters of a phylogenetic model, a probabilistic model of molecular evolution that considers the phylogeny, or evolutionary tree, by which a set of present-day organisms are related. The EM algorithm is then extended for use with a combined phylogenetic and hidden Markov model.

Evolutionary history reconstruction for Mammalian complex gene clusters

Clusters of genes that evolved from single progenitors via repeated segmental duplications present significant challenges to the generation of a truly complete human genome sequence. Such clusters can confound both accurate sequence assembly and downstream computational analysis, yet they represent a hotbed of functional innovation, making them of extreme interest.

A SIMPLE HILLSLOPE EROSION MODEL WITH VEGETATION ELEMENTS

Soil and water conservation practices are increasingly being considered for curbing nonpoint sources from agricultural land. Such issues are addressed by the Water Erosion Prediction Project (WEPP) of the USDA-ARS. The WEPP technology is relatively demanding in parameter information which is commonly available only in the USA.

Patterns of positive selection on the mammalian tree

Comparative genomic analysis of the Streptococcus dysgalactiae species group: gene content, molecular adaptation, and promoter evolution

Comparative genomics of closely related bacterial species with different pathogenesis and host preference can provide a means of identifying the specifics of adaptive differences. Streptococcus dysgalactiae (SD) is comprised of two subspecies: S. dysgalactiae subsp. equisimilis is both a human commensal organism and a human pathogen, and S. dysgalactiae subsp. dysgalactiae is strictly an animal pathogen.

A high-resolution map of human evolutionary constraint using 29 mammals

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases.

Exact algorithms for the reversal median problem

While most work in computational molecular biology since its inception in the 1970s has focused on problems involving DNA and amino acid sequences, there has been growing interest during the past decade in the use of alternative models of molecular evolution that are based on the order and content of genes in complete genomes.

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets.

Intergenic and repeat transcription in human, chimpanzee and macaque brains measured by RNA-Seq

Transcription is the first step connecting genetic information with an organism's phenotype. While expression of annotated genes in the human brain has been characterized extensively, our knowledge about the scope and the conservation of transcripts located outside of the known genes' boundaries is limited. Here, we use high-throughput transcriptome sequencing (RNA-Seq) to characterize the total non-ribosomal transcriptome of human, chimpanzee, and rhesus macaque brain.

Forces shaping the fastest evolving regions in the human genome

Comparative genomics allow us to search the human genome for segments that were extensively changed in the last ~5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human.

Comparative and demographic analysis of orang-utan genomes

'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes.

Detection of nonneutral substitution rates on mammalian phylogenies

Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny.

28-way vertebrate alignment and conservation track in the UCSC Genome Browser

This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu.

Combining phylogenetic and hidden Markov models in biosequence analysis

A few models have appeared in recent years that consider not only the way substitutions occur through evolutionary history at each site of a genome, but also the way the process changes from one site to the next. These models combine phylogenetic models of molecular evolution, which apply to individual sites, and hidden Markov models, which allow for changes from site to site.

Genomic analyses of transcription factor binding, histone acetylation, and gene expression reveal mechanistically distinct classes of estrogen-regulated promoters

To explore the global mechanisms of estrogen-regulated transcription, we used chromatin immunoprecipitation coupled with DNA microarrays to determine the localization of RNA polymerase II (Pol II), estrogen receptor alpha (ERα), steroid receptor coactivator proteins (SRC), and acetylated histones H3/H4 (AcH) at estrogen-regulated promoters in MCF-7 cells with or without estradiol (E2) treatment.

An algorithm to enumerate all sorting reversals

The problem of estimating evolutionary distance from differences in gene order has been distilled to the problem of finding the reversal distance between two signed permutations. During the last decade, much progress was made both in computing reversal distance and in finding a minimum sequence of sorting reversals.

An enhancer near ISL1 and an ultraconserved PCBP2 exon are derived from a novel retroposon

Figure S1: Distinctive accumulation of short human paralogs to the PCBP2 exonized instance containing uc.338. A UCSC genome browser shot (http://genome.ucsc.edu) of the PCBP2 exonized instance and the two exons flanking it (3.6 kb region). Tracks (top to bottom) show: The region conserved with coelacanth; PCBP2 (whole and fragmented) isoforms, showing the alternatively-spliced nature of the exonization event; Multi-species conservation track (Siepel et al., 2005); Location of uc.

An algorithm to enumerate sorting reversals for signed permutations

The rearrangement distance between single-chromosome genomes can be estimated as the minimum number of inversions required to transform the gene ordering observed in one into that observed in the other. This measure, known as "inversion distance," can be computed as the reversal distance between signed permutations.
