Wolfgang Gerlach - Academia.edu

Papers by Wolfgang Gerlach

Dynamic FM-Index for a Collection of Texts with Application to Space-efficient Construction of the

Container Orchestration for Scientific Workflows

2015 IEEE International Conference on Cloud Engineering, 2015

A scalable data analysis platform for metagenomics

2013 IEEE International Conference on Big Data, 2013

Compressed suffix tree--a basis for genome-scale sequence analysis

Bioinformatics (Oxford, England), 2007

Suffix tree is one of the most fundamental data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing those algorithms and applying them to real genomic sequences, often the main memory size becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length $n$ from alphabet $\sigma = \{A, C, G, T\}$ can be stored in $n \log |\sigma| = 2n$ bits, its suffix tree occupies $O(n \log n)$ bits. In practice, the size difference easily reaches factor 50. We provide an implementation of the compressed suffix tree very recently proposed by Sadakane (Theory of Computing Systems, in press). The compressed suffix tree occupies space proportional to the text size, i.e. $O(n \log |\sigma|)$ bits, and supports all typical suffix tree operations with at most $\log n$ factor slowdown. Our experiments show that, e.g. on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of normal suffix...

Building Scalable Data Management and Analysis Infrastructure for Metagenomics

MG-RAST Technical report and manual for version 3.3. 6–rev

Workload characterization for MG-RAST metagenomic data analytics service in the cloud

Skyport: container-based execution environment management for multi-cloud scientific workflows

A metagenomics portal for a democratized sequencing world

The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, this has enabled the dramatic scaling up of pipeline throughput while remaining on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST.

A RESTful API for Accessing Microbial Community Data for MG-RAST

PLoS computational biology, 2015

Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST has been used to annotate over 110,000 data sets totaling over 43 Terabases. With metagenomic sequencing finding even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface) we have greatly expanded access to MG-RAST data, as well as provided a mechanism for the use of third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us) we have implemented a web serv...

Mycoplasma salivarium as a Dominant Coloniser of Fanconi Anaemia Associated Oral Carcinoma

PLoS ONE, 2014

Taxonomic classification of metagenomic shotgun sequences with CARMA3

Nucleic Acids Research, 2011

Engineering a compressed suffix tree implementation

Journal of Experimental Algorithmics, 2009

Compressed suffix tree--a basis for genome-scale sequence analysis

Bioinformatics, 2007

GUUGle: a utility for fast exact matching under RNA complementary rules including G-U base pairing

Bioinformatics, 2006

WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads

BMC Bioinformatics, 2009
