João Setubal - Academia.edu
Papers by João Setubal
Annual Review of Phytopathology, 2002
This review deals with a comparative analysis of seven genome sequences from plant-associated bacteria. These are the genomes of Agrobacterium tumefaciens, Mesorhizobium loti, Sinorhizobium meliloti, Xanthomonas campestris pv campestris, Xanthomonas axonopodis pv citri, Xylella fastidiosa, and Ralstonia solanacearum. Genome structure and the metabolism pathways available highlight the compromise between the genome size and lifestyle. Despite the recognized importance of the type III secretion system in controlling host compatibility, its presence is not universal in all necrogenic pathogens. Hemolysins, hemagglutinins, and some adhesins, previously reported only for mammalian pathogens, are present in most organisms discussed. Different numbers and combinations of cell wall degrading enzymes and genes to overcome the oxidative burst generally induced by the plant host are characterized in these genomes. A total of 19 genes not involved in housekeeping functions were found common to all these bacteria.
We describe a program that builds contig scaffolds from contig assemblies, to be used in a whole-genome sequencing project. Our program builds scaffolds based on forward/reverse pair information (both from small clones, such as plasmids, and from large ...
We present experimental results for three bipartite matching algorithms on three classes of sparse graphs. Goldberg's maximum flow algorithm [Gol87, GT88b], specialized for unweighted bipartite graphs, is the most robust algorithm, being the fastest by a significant margin in two of the classes and competitive in the other one. The other two algorithms are Hopcroft and Karp's [HK73] and Alt, Blum, Mehlhorn, and Paul's [ABMP91], and
We present experimental results for four bipartite matching algorithms on 11 classes of graphs. The algorithms are depth-first search (dfs), breadth-first search (bfs), the push-relabel algorithm [GT88b], and the algorithm by Alt, Blum, Mehlhorn, and Paul (abmp) [ABMP91]. dfs was thought to be a good choice for bipartite matching but our results show that, depending on the input graph, it can have very poor
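As an illustration of the depth-first search approach named in the abstract above, here is a minimal Python sketch of augmenting-path bipartite matching. It is not the authors' implementation: the function name `max_bipartite_matching`, the adjacency-list input format, and the example graph are all assumptions made for this sketch.

```python
def max_bipartite_matching(adj, n_right):
    """Return (matching size, match_right) for a bipartite graph.

    adj[u] lists the right-side vertices adjacent to left vertex u
    (hypothetical input format chosen for this sketch).
    match_right[v] is the left vertex matched to right vertex v, or -1.
    """
    match_right = [-1] * n_right

    def try_augment(u, visited):
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched elsewhere.
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adj)):
        # Each left vertex gets one search with a fresh visited set.
        if try_augment(u, set()):
            size += 1
    return size, match_right


if __name__ == "__main__":
    # Left vertices 0..2, right vertices 0..2 (illustrative graph).
    adj = [[0, 1], [0], [1, 2]]
    size, match_right = max_bipartite_matching(adj, 3)
    print(size, match_right)
```

Each search costs time proportional to the number of edges, so this simple variant runs in O(VE) in the worst case, which is consistent with the abstract's remark that dfs can perform poorly on some inputs. The recursion is kept for brevity; large graphs would need an iterative search.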
ACM Journal of Experimental Algorithms, 1998
We conduct a computational study of unit capacity flow and bipartite matching algorithms. Our goal is to determine which variant of the push-relabel method is most efficient in practice and to compare push-relabel algorithms with augmenting path algorithms. We have implemented and compared three push-relabel algorithms, three augmenting-path algorithms (one of which is new), and one augment-relabel algorithm. The depth-first
Brazilian Workshop on Bioinformatics, 2004
The present study reports the identification of immune-related transcripts from hemocytes of the spider Acanthoscurria gomesiana by high-throughput sequencing of expressed sequence tags (ESTs). To generate ESTs from hemocytes, two cDNA libraries were prepared: one by directional cloning (primary) and the other by the normalization of the first (normalized). A total of 7584 clones were sequenced and the identical ESTs were clustered, resulting in 3723 assembled sequences (AS). At least 20% of these sequences are putative novel genes. The automatic functional annotation of AS based on Gene Ontology revealed several abundant transcripts related to the following functional classes: hemocyanin, lectin, and structural constituents of ribosome and cytoskeleton. From this annotation, 73 transcripts possibly involved in immune response were also identified, suggesting the existence of several molecular processes not previously described for spiders, such as: pathogen recognition, coagulation, complement activation, cell adhesion and intracellular signaling pathway for the activation of cellular defenses.
Brazilian Workshop on Bioinformatics, 2003
Microbiology (Reading, England), 2006
Lipoproteins are of great interest in understanding the molecular pathogenesis of spirochaetes. Because spirochaete lipobox sequences exhibit more plasticity than those of other bacteria, application of existing prediction algorithms to emerging sequence data has been problematic. In this paper a novel lipoprotein prediction algorithm is described, designated SpLip, constructed as a hybrid of a lipobox weight matrix approach supplemented by a set of lipoprotein signal peptide rules allowing for conservative amino acid substitutions. Both the weight matrix and the rules are based on a training set of 28 experimentally verified spirochaetal lipoproteins. The performance of the SpLip algorithm was compared to that of the hidden Markov model-based LipoP program and the rules-based algorithm Psort for all predicted protein-coding genes of Leptospira interrogans sv. Copenhageni, L. interrogans sv. Lai, Borrelia burgdorferi, Borrelia garinii, Treponema pallidum and Treponema denticola. Pso...
The ISME Journal, Jan 17, 2015
Understanding the evolutionary history and potential of bacterial pathogens is critical to prevent the emergence of new infectious bacterial diseases. Xanthomonas axonopodis subsp. citri (Xac) (synonym X. citri subsp. citri), which causes citrus canker, is one of the hardest-fought plant bacterial pathogens in US history. Here, we sequenced 21 Xac strains (14 XacA, 3 XacA* and 4 XacA(w)) with different host ranges from North America and Asia and conducted comparative genomic and evolutionary analyses. Our analyses suggest that acquisition of beneficial genes and loss of detrimental genes most likely allowed XacA to infect a broader range of hosts as compared with XacA(w) and XacA*. Recombination was found to have occurred frequently on the relatively ancient branches, but rarely on the young branches of the clonal genealogy. The ratio of recombination/mutation ρ/θ was 0.0790±0.0005, implying that the Xac population was clonal in structure. Positive selection has affected 14% (395 out ...
Frontiers in Microbiology, 2014
Dramatic increases in research in the area of microbial biofuel production coupled with high-throughput data generation on bioenergy-related microbes have led to a deluge of information in the scientific literature and in databases. Consolidating this information and making it easily accessible requires a unified vocabulary. The Gene Ontology (GO) fulfills that requirement, as it is a well-developed structured vocabulary that describes the activities and locations of gene products in a consistent manner across all kingdoms of life. The Microbial ENergy processes Gene Ontology (MENGO) project is extending the GO to include new terms to describe microbial processes of interest to bioenergy production. Our effort has added over 600 bioenergy-related terms to the Gene Ontology. These terms will aid in the comprehensive annotation of gene products from diverse energy-related microbial genomes. An area of microbial energy research that has received a lot of attention is microbial production of ...
Frontiers in Microbiology, 2014
Methane (CH4) is a valuable fuel, constituting 70-95% of natural gas, and a potent greenhouse gas. Release of CH4 into the atmosphere contributes to climate change. Biological CH4 production or methanogenesis is mostly performed by methanogens, a group of strictly anaerobic archaea. The direct substrates for methanogenesis are H2 plus CO2, acetate, formate, methylamines, methanol, methyl sulfides, and ethanol or a secondary alcohol plus CO2. In numerous anaerobic niches in nature, methanogenesis facilitates mineralization of complex biopolymers such as carbohydrates, lipids and proteins generated by primary producers. Thus, methanogens are critical players in the global carbon cycle. The same process is used in anaerobic treatment of municipal, industrial and agricultural wastes, reducing the biological pollutants in the wastes and generating methane. It also holds potential for commercial production of natural gas from renewable resources. This process operates in digestive systems of many animals, including cattle, and humans. In contrast, in deep-sea hydrothermal vents methanogenesis is a primary production process, allowing chemosynthesis of biomaterials from H2 plus CO2. In this report we present Gene Ontology (GO) terms that can be used to describe processes, functions and cellular components involved in methanogenic biodegradation and biosynthesis of specialized coenzymes that methanogens use. Some of these GO terms were previously available and the rest were generated in our Microbial Energy Gene Ontology (MENGO) project. A recently discovered non-canonical CH4 production process is also described. We have performed manual GO annotation of selected methanogenesis genes, based on experimental evidence, providing "gold standards" for machine annotation and automated discovery of methanogenesis genes or systems in diverse genomes.
Most of the GO-related information presented in this report is available at the MENGO website (http://www.mengo.biochem.vt.edu/).
Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures - SPAA '92, 1992
We describe an efficient parallel implementation of Goldberg’s maximum flow algorithm for a shared-memory multiprocessor. Our main technical innovation is a method that allows a “global relabeling” heuristic to be executed concurrently with the main algorithm; this heuristic is essential for good performance in practice. We present performance results from a Sequent Symmetry for a variety of input distributions. We
Both authors contributed equally to this work.
PLoS ONE, 2014
Metagenomics-based functional profiling analysis is an effective means of gaining deeper insight into the composition of marine microbial populations and developing a better understanding of the interplay between the functional genome content of microbial communities and abiotic factors. Here we present a comprehensive analysis of 24 datasets covering surface and depth-related environments at 11 sites around the world's oceans. The complete dataset comprises approximately 12 million sequences, totaling 5,358 Mb. Based on profiling patterns of Clusters of Orthologous Groups (COGs) of proteins, a core set of reference photic and aphotic depth-related COGs, and a collection of COGs that are associated with extreme oxygen limitation were defined. Their inferred functions were utilized as indicators to characterize the distribution of light- and oxygen-related biological activities in marine environments. The results reveal that, while light level in the water column is a major dete...
BMC Microbiology, Jan 3, 2014
Background: Today there are more than 2 billion alcohol users and about 1.3 billion tobacco users worldwide. The chronic and heavy use of these two substances is at the heart of numerous diseases and may wreak havoc on the human oral microbiome. This study delves into the changes that alcohol and tobacco may cause on biofilms of the human oral microbiome. To do so, we used swabs to sample the oral biofilm of 22 subjects, including 9 control individuals with no or very low consumption of alcohol and no consumption of tobacco, 7 who were chronic and heavy users of both substances and 6 active smokers that reported no significant alcohol consumption. DNA was extracted from swabs and the V1 region of the 16S rRNA gene was PCR amplified and sequenced using the Ion Torrent PGM platform, generating 3.7 million high-quality reads. DNA sequences were clustered and OTUs were assigned using the ARB SILVA database and QIIME. Results: We found no differences in species diversity and evenness among th...