Wolfgang Gerlach - Academia.edu


Papers by Wolfgang Gerlach


2015 IEEE International Conference on Cloud Engineering, 2015


2013 IEEE International Conference on Big Data, 2013


Bioinformatics (Oxford, England), 2007

The suffix tree is one of the most fundamental data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing those algorithms and applying them to real genomic sequences, main memory size often becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length $n$ over the alphabet $\Sigma = \{A, C, G, T\}$ can be stored in $n \log |\Sigma| = 2n$ bits, its suffix tree occupies $O(n \log n)$ bits. In practice, the size difference easily reaches a factor of 50. We provide an implementation of the compressed suffix tree very recently proposed by Sadakane (Theory of Computing Systems, in press). The compressed suffix tree occupies space proportional to the text size, i.e. $O(n \log |\Sigma|)$ bits, and supports all typical suffix tree operations with at most a $\log n$ factor slowdown. Our experiments show that, e.g. on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of normal suffix...
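To make the space arithmetic in this abstract concrete, here is a minimal Python sketch, assuming a plain 2-bit code per base (A=0, C=1, G=2, T=3). It illustrates the $n \log |\Sigma| = 2n$ bit bound and, for contrast, the $n \lceil \log_2 n \rceil$ bits needed just to store one position per suffix (as a suffix array does). It is an illustration of the bounds only, not the compressed suffix tree implementation described in the paper; the names `pack`, `unpack`, `text_bits` and `suffix_positions_bits` are hypothetical.

```python
# Hypothetical sketch: 2-bit packing of DNA vs. the cost of one pointer per suffix.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> int:
    """Pack seq into one integer, 2 bits per base (first base in the highest bits)."""
    value = 0
    for ch in seq:
        value = (value << 2) | CODE[ch]
    return value


def unpack(value: int, n: int) -> str:
    """Recover an n-base sequence; n is needed because leading A's (code 0) vanish."""
    out = []
    for _ in range(n):
        out.append(BASES[value & 3])  # lowest 2 bits hold the last remaining base
        value >>= 2
    return "".join(reversed(out))


def ceil_log2(x: int) -> int:
    """Exact ceil(log2 x) for x >= 2, using integer arithmetic only."""
    return (x - 1).bit_length()


def text_bits(n: int, sigma: int = 4) -> int:
    """Bits for the text with a fixed-width code: n * ceil(log2 sigma) (2n for DNA)."""
    return n * ceil_log2(sigma)


def suffix_positions_bits(n: int) -> int:
    """Bits to store one starting position per suffix: n * ceil(log2 n)."""
    return n * ceil_log2(n)


if __name__ == "__main__":
    n = 10 * 2**20  # roughly the 10 MB sequence size mentioned in the abstract
    print("packed text bits:     ", text_bits(n))
    print("suffix positions bits:", suffix_positions_bits(n))
```

A pointer-based suffix tree stores several such $\lceil \log_2 n \rceil$-bit values per node, which is why its size grows as $O(n \log n)$ while the packed text stays at $2n$ bits.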


The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene-calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for computing similarities. Together with changes in the underlying software infrastructure, this has enabled a dramatic scale-up of pipeline throughput while remaining on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST.


PLoS computational biology, 2015

Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST had been used to annotate over 110,000 data sets totaling over 43 terabases. With metagenomic sequencing finding even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface), we have greatly expanded access to MG-RAST data and provided a mechanism for using third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us), we have implemented a web serv...


PLoS ONE, 2014


Nucleic Acids Research, 2011


Journal of Experimental Algorithmics, 2009


Bioinformatics, 2007


Bioinformatics, 2006


BMC Bioinformatics, 2009
