Jennifer Wortman - Academia.edu

Papers by Jennifer Wortman

The Genome Sequence of Trypanosoma cruzi, Etiologic Agent of Chagas Disease

Science, 2005

Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.

Revitalization of a Forward Genetic Screen Identifies Three New Regulators of Fungal Secondary Metabolism in the Genus Aspergillus

mBio

The study of aflatoxin in Aspergillus spp. has garnered the attention of many researchers due to aflatoxin’s carcinogenic properties and frequency as a food and feed contaminant. Significant progress has been made by utilizing the model organism Aspergillus nidulans to characterize the regulation of sterigmatocystin (ST), the penultimate precursor of aflatoxin. A previous forward genetic screen identified 23 A. nidulans mutants involved in regulating ST production. Six mutants were characterized from this screen using classical mapping (five mutations in mcsA) and complementation with a cosmid library (one mutation in laeA). The remaining mutants were backcrossed and sequenced using Illumina and Ion Torrent sequencing platforms. All but one mutant contained one or more sequence variants in predicted open reading frames. Deletion of these genes resulted in identification of mutant alleles responsible for the loss of ST production in 12 of the 17 remaining mutants. Eight of these mu...

Emergence of epidemic multidrug-resistant Enterococcus faecium from animal and commensal strains

Enterococcus faecium, natively a gut commensal organism, emerged as a leading cause of multidrug-resistant hospital-acquired infection in the 1980s. As the living record of its adaptation to changes in habitat, we sequenced the genomes of 51 strains, isolated from various ecological environments, to understand how E. faecium emerged as a leading hospital pathogen. Because of the scale and diversity of the sampled strains, we were able to resolve the lineage responsible for epidemic, multidrug-resistant human infection from other strains and to measure the evolutionary distances between groups. We found that the epidemic hospital-adapted lineage is rapidly evolving and emerged approximately 75 years ago, concomitant with the introduction of antibiotics, from a population that included the majority of animal strains, and not from human commensal lines. We further found that the lineage that included most strains of animal origin diverged from the main human commensal line approximately 3,000 years ago, a time that corresponds to increasing urbanization of humans, development of hygienic practices, and domestication of animals, which we speculate contributed to their ecological separation. Each bifurcation was accompanied by the acquisition of new metabolic capabilities and colonization traits on mobile elements and the loss of function and genome remodeling associated with mobile element insertion and movement. As a result, diversity within the species, in terms of sequence divergence as well as gene content, spans a range usually associated with speciation.

Identification of 50 class D β-lactamases and 65 Acinetobacter-derived cephalosporinases in Acinetobacter spp.

Identification of 50 Class D β-Lactamases and 65 Acinetobacter-Derived Cephalosporinases in Acinetobacter spp. Updated information and services can be found at http://aac.asm.org/content/58/2/936.

3S-Ca01 The Aspergillus Genome Database: Integrating a Wealth of Aspergillus Omics Data

Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement

Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3-5 kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies, and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants, including duplications, and to resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.
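Base-level correction of a draft assembly, one of the tasks listed above, can be illustrated with a simple pileup vote: at each position, if enough aligned reads agree on a base that differs from the draft, the draft base is replaced. The Python sketch below is a deliberately simplified, hypothetical illustration of that idea only; it is not Pilon's algorithm, the function name `polish_bases` is invented, and the `min_depth` and `min_fraction` thresholds are arbitrary assumptions. Real polishing also has to handle indels, base and mapping quality, and mis-assemblies.

```python
from collections import Counter


def polish_bases(draft, pileup, min_depth=5, min_fraction=0.8):
    """Substitution-only correction by majority vote (hypothetical sketch, not Pilon).

    pileup[i] holds the read bases aligned to draft position i.
    Returns the corrected sequence and a list of (position, old, new) changes.
    """
    if len(pileup) != len(draft):
        raise ValueError("need one pileup column per draft position")
    corrected = list(draft)
    changes = []
    for pos, column in enumerate(pileup):
        if len(column) < min_depth:
            continue  # too little coverage to justify a change
        base, count = Counter(column).most_common(1)[0]
        if base != draft[pos] and count / len(column) >= min_fraction:
            corrected[pos] = base
            changes.append((pos, draft[pos], base))
    return "".join(corrected), changes


draft = "ACGT"
pileup = ["AAAAA", "CCCCC", "TTTTT", "TT"]  # position 3 has depth 2 only
print(polish_bases(draft, pileup))  # ('ACTT', [(2, 'G', 'T')])
```

Requiring both a minimum depth and a strong majority keeps the sketch from "correcting" low-coverage or genuinely mixed positions, which matters for the diploid case the abstract mentions.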

Genomics of Loa loa, a Wolbachia-free filarial parasite of humans

DAGchainer: a tool for mining segmental genome duplications and synteny

The position-specific weight matrix (PWM) model, which assumes that each position in the DNA site contributes independently to the overall protein-DNA interaction, has been the primary means to describe transcription factor binding site motifs. Recent biological experiments, however, suggest that positions within binding sites can be interdependent. To exploit this interdependence to aid motif discovery, we extend the PWM model to include pairs of correlated positions and design a Markov chain Monte Carlo algorithm to sample from the model space. We then combine the model sampling step with the Gibbs sampling framework for de novo motif discovery. Results: Testing on experimentally validated binding sites, we find that about 25% of the transcription factor binding motifs show significant within-site position correlations, and 80% of these motif models can be improved by considering the correlated positions. Using both simulated data and real promoter sequences, we show that the new de novo motif-finding algorithm can infer the true correlated position pairs accurately and is more precise in finding putative transcription factor binding sites than the standard Gibbs sampling algorithms.
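The PWM model described above scores a candidate site by treating each position independently, summing per-position log-odds against a background. The following minimal Python sketch shows that scoring, plus a hypothetical extension in which a disjoint set of correlated position pairs is scored jointly, in the spirit of the pairwise-dependence idea in the abstract. The function names, the uniform background, and the probability tables are illustrative assumptions, not the authors' implementation, and the sketch leaves out the MCMC model search and Gibbs sampling entirely.

```python
import math

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}  # assumed background base frequencies


def pwm_log_odds(site, pwm, background=UNIFORM):
    """Independent-position score: sum_i log(p_i(b_i) / q(b_i))."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(math.log(pwm[i][b] / background[b]) for i, b in enumerate(site))


def paired_log_odds(site, pwm, pairs, background=UNIFORM):
    """Score with some positions modeled as correlated pairs (hypothetical extension).

    pairs maps (i, j) -> {dinucleotide: joint probability}; pairs are assumed
    disjoint. Paired positions use the joint term, all others the PWM term.
    """
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    covered = set()
    score = 0.0
    for (i, j), joint in pairs.items():
        bi, bj = site[i], site[j]
        score += math.log(joint[bi + bj] / (background[bi] * background[bj]))
        covered.update((i, j))
    for k, b in enumerate(site):
        if k not in covered:
            score += math.log(pwm[k][b] / background[b])
    return score


# Hypothetical 3-bp motif favoring "AC" at positions 0-1; position 2 is uninformative.
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
joint01 = {a + b: (0.5 if a + b == "AC" else 0.5 / 15) for a in BASES for b in BASES}

print(pwm_log_odds("ACG", pwm))                         # equals 2 * log(2.8)
print(paired_log_odds("ACG", pwm, {(0, 1): joint01}))   # equals log(8)
```

With the pair term, the contribution of "AC" at positions 0-1 comes from their joint frequency directly rather than from the product of two marginals, which is the kind of within-site dependence an independent PWM cannot express.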

Unraveling the genomic diversity of small eukaryotes

Genome sequencing and analysis of the filamentous fungus Penicillium chrysogenum

Nature Biotechnology, 2008

Evolution of extensively drug-resistant tuberculosis over four decades revealed by whole genome sequencing of Mycobacterium tuberculosis from KwaZulu-Natal, South Africa

International Journal of Mycobacteriology, 2015

International Journal of Mycobacteriology 4 (2015) 24-25. Available at www.sciencedirect.com (ScienceDirect); journal homepage: www.elsevier.com/locate/IJMYCO

Illumina identification of RsrA, a conserved C2H2 transcription factor coordinating the NapA mediated oxidative stress signaling pathway in Aspergillus

BMC Genomics, 2014

Background: Chemical mutagenesis screens are useful for identifying mutants involved in biological processes of interest. Identifying the mutation from such screens, however, often fails when using methods that rely on transforming the mutant back to the wild-type phenotype with DNA libraries.

The genomic diversification of the whole Acinetobacter genus: origins, mechanisms, and consequences

Genome Biology and Evolution, 2014

Bacterial genomics has greatly expanded our understanding of microdiversification patterns within a species, but analyses at higher taxonomic levels are necessary to understand and predict the independent rise of pathogens in a genus. We have sampled, sequenced, and assessed the diversity of genomes of validly named and tentative species of the Acinetobacter genus, a clade including major nosocomial pathogens and biotechnologically important species. We inferred a robust global phylogeny and delimited several new putative species. The genus is very ancient and extremely diverse: genomes of highly divergent species share more orthologs than certain strains within a species. We systematically characterized elements and mechanisms driving genome diversification, such as conjugative elements, insertion sequences, and natural transformation. We found many error-prone polymerases that may play a role in resistance to toxins and antibiotics and in the generation of genetic variation. Surpr...

Kinannote, a computer program to identify and classify members of the eukaryotic protein kinase superfamily

Bioinformatics, 2013

Motivation: Kinases of the eukaryotic protein kinase superfamily are key regulators of most aspects of eukaryotic cellular behavior and have provided several drug targets, including kinases dysregulated in cancers. The rapid increase in the number of genomic sequences has created an acute need to identify and classify members of this important class of enzymes efficiently and accurately. Results: Kinannote produces a draft kinome and comparative analyses for a predicted proteome using a single command line, and it is currently the only tool that automatically classifies protein kinases using the controlled vocabulary of Hanks and Hunter. A hidden Markov model in combination with a position-specific scoring matrix is used by Kinannote to identify kinases, which are subsequently classified using a BLAST comparison with a local version of KinBase, the curated protein kinase dataset from www.kinase.com. Kinannote was tested on the predicted proteomes from four divergent species. The average sensitivity and precision for kinome retrieval from the test species are 94.4% and 96.8%, respectively. The ability of Kinannote to classify identified kinases was also evaluated, and the average sensitivity and precision for full classification of conserved kinases are 71.5% and 82.5%, respectively. Kinannote has had a significant impact on eukaryotic genome annotation, providing protein kinase annotations for 36 genomes made public by the Broad Institute in the period spanning 2009 to the present. Availability: Kinannote is freely available at http://sourceforge.net/projects/kinannote.
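The sensitivity and precision figures quoted above follow the standard definitions: sensitivity = TP / (TP + FN) and precision = TP / (TP + FP). A minimal Python sketch that computes both from a predicted and a reference set of kinase identifiers is below; the function name and identifiers are hypothetical, and this is not Kinannote's own evaluation code.

```python
def sensitivity_precision(predicted, reference):
    """Return (sensitivity, precision) for predicted vs. reference identifier sets."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # true kinases that were recovered
    fn = len(reference - predicted)   # true kinases that were missed
    fp = len(predicted - reference)   # predictions not in the reference kinome
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


# Hypothetical reference kinome of 4 genes; the prediction recovers 3 and adds 1 false hit.
reference = {"kin1", "kin2", "kin3", "kin4"}
predicted = {"kin1", "kin2", "kin3", "notkin1"}
print(sensitivity_precision(predicted, reference))  # (0.75, 0.75)
```

Sensitivity penalizes missed kinases while precision penalizes spurious ones, which is why the abstract reports the two together.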

Genomics of Aspergillus fumigatus

Revista Iberoamericana de Micología, 2005

Aspergillus fumigatus is a filamentous fungal saprophyte that is ubiquitous in the environment. It is also a human pathogen and induces allergic responses, significantly affecting health care and associated costs around the world. Much of the basic biology of this organism is poorly understood, but the recent completion and publication of its genome sequence provides an excellent tool for researchers to gain insight into these processes. In this review we summarize some of the more salient features revealed by analysis of the genome, including the search for candidate pathogenicity genes and the switch to a pathogenic lifestyle, allergen proteins, DNA repair, secondary metabolite gene clusters that produce compounds both useful and toxic, a theoretical capability of this asexual organism to reproduce sexually, signalling, and transcription. A. fumigatus was compared with the food biotechnology fungus Aspergillus oryzae and the sexual fungus Aspergillus nidulans, as well as other fungi, in an attempt to discern key differences between these organisms.

Sequencing and comparative analysis of Aspergillus nidulans

Structural, Functional, and Comparative Annotation of Plant Genomes

Structural, Functional, and Comparative Annotation of Plant Genomes. Françoise Thibaud-Nissen, Jennifer Wortman, C. Robin Buell, and Wei Zhu. Abstract: While genome sequencing technologies have advanced ...

O24-1

Annotation of the rice genome

A high-quality finished sequence of the rice genome was completed in 2005. However, to make maximal use of the sequence, high-quality annotation of the genes and genome features is necessary. The process of annotation is iterative in nature and requires the application and refinement of computational tools coupled with manual curation and evaluation. We are funded by the U.S. National Science Foundation to annotate the rice genome and have constructed pseudomolecules for the 12 Oryza sativa subspecies japonica var. Nipponbare chromosomes, which are publicly available through our project Web site (http://rice.tigr.org). We identified genes, gene models, and other annotation features in the rice genome. We expanded our annotation features to include a rice transcript assembly and its alignment with the rice genome, small noncoding RNAs, simple sequence repeats, and single nucleotide polymorphisms and insertions/deletions based on alignment with the indica subspecies. We updated our Oryza repeat database, which has allowed us to better quantify the repetitive sequences within the rice genome; these total 29% of the genome. To assist users in accessing the genome and our annotation, we expanded the content and functions of our Rice Genome Browser so that it supports 37 annotation tracks and downloads of the underlying annotation data in various formats.

The Human Microbiome Project (HMP) and the Data Analysis and Coordination Center (DAAC) portal to the HMP (GSC 8 Meeting)
