Jennifer Wortman - Academia.edu
Papers by Jennifer Wortman
Science, 2005
Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.
mBio
The study of aflatoxin in Aspergillus spp. has garnered the attention of many researchers due to aflatoxin’s carcinogenic properties and frequency as a food and feed contaminant. Significant progress has been made by utilizing the model organism Aspergillus nidulans to characterize the regulation of sterigmatocystin (ST), the penultimate precursor of aflatoxin. A previous forward genetic screen identified 23 A. nidulans mutants involved in regulating ST production. Six mutants were characterized from this screen using classical mapping (five mutations in mcsA) and complementation with a cosmid library (one mutation in laeA). The remaining mutants were backcrossed and sequenced using Illumina and Ion Torrent sequencing platforms. All but one mutant contained one or more sequence variants in predicted open reading frames. Deletion of these genes resulted in identification of mutant alleles responsible for the loss of ST production in 12 of the 17 remaining mutants. Eight of these mu...
Enterococcus faecium, natively a gut commensal organism, emerged as a leading cause of multidrug-resistant hospital-acquired infection in the 1980s. As the living record of its adaptation to changes in habitat, we sequenced the genomes of 51 strains, isolated from various ecological environments, to understand how E. faecium emerged as a leading hospital pathogen. Because of the scale and diversity of the sampled strains, we were able to resolve the lineage responsible for epidemic, multidrug-resistant human infection from other strains and to measure the evolutionary distances between groups. We found that the epidemic hospital-adapted lineage is rapidly evolving and emerged approximately 75 years ago, concomitant with the introduction of antibiotics, from a population that included the majority of animal strains, and not from human commensal lines. We further found that the lineage that included most strains of animal origin diverged from the main human commensal line approximately 3,000 years ago, a time that corresponds to increasing urbanization of humans, development of hygienic practices, and domestication of animals, which we speculate contributed to their ecological separation. Each bifurcation was accompanied by the acquisition of new metabolic capabilities and colonization traits on mobile elements and the loss of function and genome remodeling associated with mobile element insertion and movement. As a result, diversity within the species, in terms of sequence divergence as well as gene content, spans a range usually associated with speciation.
Identification of 50 Class D β-Lactamases and 65 Acinetobacter-Derived Cephalosporinases in Acinetobacter spp. Updated information and services can be found at: http://aac.asm.org/content/58/2/936
Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3-5 kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies, and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants, including duplications, and to resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.
The position-specific weight matrix (PWM) model, which assumes that each position in the DNA site contributes independently to the overall protein-DNA interaction, has been the primary means to describe transcription factor binding site motifs. Recent biological experiments, however, suggest that there exists interdependence among positions in the binding sites. In order to exploit this interdependence to aid motif discovery, we extend the PWM model to include pairs of correlated positions and design a Markov chain Monte Carlo algorithm to sample in the model space. We then combine the model sampling step with the Gibbs sampling framework for de novo motif discoveries. Results: Testing on experimentally validated binding sites, we find that about 25% of the transcription factor binding motifs show significant within-site position correlations, and 80% of these motif models can be improved by considering the correlated positions. Using both simulated data and real promoter sequences, we show that the new de novo motif-finding algorithm can infer the true correlated position pairs accurately and is more precise in finding putative transcription factor binding sites than the standard Gibbs sampling algorithms.
Nature Biotechnology, 2008
International Journal of Mycobacteriology, 2015
International Journal of Mycobacteriology 4 (2015) 24-25. Available at www.sciencedirect.com (ScienceDirect). Journal homepage: www.elsevier.com/locate/IJMYCO
BMC Genomics, 2014
Background: Chemical mutagenesis screens are useful to identify mutants involved in biological processes of interest. Identifying the mutation from such screens, however, often fails when using methodologies involving transformation of the mutant to wild type phenotype with DNA libraries.
Genome biology and evolution, 2014
Bacterial genomics has greatly expanded our understanding of microdiversification patterns within a species, but analyses at higher taxonomical levels are necessary to understand and predict the independent rise of pathogens in a genus. We have sampled, sequenced, and assessed the diversity of genomes of validly named and tentative species of the Acinetobacter genus, a clade including major nosocomial pathogens and biotechnologically important species. We inferred a robust global phylogeny and delimited several new putative species. The genus is very ancient and extremely diverse: Genomes of highly divergent species share more orthologs than certain strains within a species. We systematically characterized elements and mechanisms driving genome diversification, such as conjugative elements, insertion sequences, and natural transformation. We found many error-prone polymerases that may play a role in resistance to toxins, antibiotics, and in the generation of genetic variation. Surpr...
Bioinformatics, 2013
Motivation: Kinases of the eukaryotic protein kinase superfamily are key regulators of most aspects of eukaryotic cellular behavior and have provided several drug targets, including kinases dysregulated in cancers. The rapid increase in the number of genomic sequences has created an acute need to identify and classify members of this important class of enzymes efficiently and accurately. Results: Kinannote produces a draft kinome and comparative analyses for a predicted proteome using a single line command, and it is currently the only tool that automatically classifies protein kinases using the controlled vocabulary of Hanks and Hunter. A hidden Markov model in combination with a position-specific scoring matrix is used by Kinannote to identify kinases, which are subsequently classified using a BLAST comparison with a local version of KinBase, the curated protein kinase dataset from www.kinase.com. Kinannote was tested on the predicted proteomes from four divergent species. The average sensitivity and precision for kinome retrieval from the test species are 94.4 and 96.8%, respectively. The ability of Kinannote to classify identified kinases was also evaluated, and the average sensitivity and precision for full classification of conserved kinases are 71.5 and 82.5%, respectively. Kinannote has had a significant impact on eukaryotic genome annotation, providing protein kinase annotations for 36 genomes made public by the Broad Institute in the period spanning 2009 to the present. Availability: Kinannote is freely available at http://sourceforge.net/projects/kinannote.
Revista Iberoamericana De Micologia, 2005
Aspergillus fumigatus is a filamentous fungal saprophyte that is ubiquitous in the environment. It is also a human pathogen and induces an allergenic response, significantly increasing the burden on health care and its associated costs around the world. Much of the basic biology of this organism is only poorly understood, but the recent completion and publication of its genome sequence provides an excellent tool for researchers to gain insight into these processes. In this review we will summarize some of the more salient features revealed by analysis of the genome, including the search for candidate pathogenicity genes and the switch to a pathogenic lifestyle, allergen proteins, DNA repair, secondary metabolite gene clusters that produce compounds both useful and toxic, a theoretical capability of this asexual organism to reproduce sexually, signalling, and transcription. A. fumigatus was compared with the food biotechnology fungus Aspergillus oryzae and sexual fungus Aspergillus nidulans, as well as other fungi, in an attempt to discern key differences between these organisms.
18. Structural, Functional, and Comparative Annotation of Plant Genomes. Françoise Thibaud-Nissen, Jennifer Wortman, C. Robin Buell, and Wei Zhu. Abstract: While genome sequencing technologies have advanced ...
A high-quality finished sequence of the rice genome was completed in 2005. However, to maximally use the sequences, quality annotation of the genes and genome features is necessary. The process of annotation is iterative in nature and requires the application and refinement of computational tools coupled with manual curation and evaluation. We are funded by the U.S. National Science Foundation to annotate the rice genome and have constructed pseudomolecules for the 12 Oryza sativa subspecies japonica var. Nipponbare chromosomes, which are publicly available through our project Web site (http://rice.tigr.org). We identified genes, gene models, and other annotation features in the rice genome. We expanded our annotation features to include a rice transcript assembly and its alignment with the rice genome, small noncoding RNAs, simple sequence repeats, as well as single nucleotide polymorphisms and insertions/deletions based on alignment with the indica subspecies. We updated our Oryza repeat database, which has allowed us to better quantify the repetitive sequences within the rice genome, which total 29% of the genome. To assist users in accessing the genome and our annotation, we expanded the content and functions of our Rice Genome Browser such that it supports 37 annotation tracks and data downloads of the underlying annotation data in various formats.