Wolfgang Gerlach - Academia.edu
Papers by Wolfgang Gerlach
2015 IEEE International Conference on Cloud Engineering, 2015
2013 IEEE International Conference on Big Data, 2013
Bioinformatics (Oxford, England), 2007
The suffix tree is one of the most fundamental data structures in string algorithms and biological sequence analysis. Unfortunately, when these algorithms are implemented and applied to real genomic sequences, main memory size often becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length $n$ over the alphabet $\Sigma = \{A, C, G, T\}$ can be stored in $n \log |\Sigma| = 2n$ bits, its suffix tree occupies $O(n \log n)$ bits. In practice, the size difference easily reaches a factor of 50. We provide an implementation of the compressed suffix tree very recently proposed by Sadakane (Theory of Computing Systems, in press). The compressed suffix tree occupies space proportional to the text size, i.e. $O(n \log |\Sigma|)$ bits, and supports all typical suffix tree operations with at most a $\log n$ factor slowdown. Our experiments show that, e.g. on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of normal suffix...
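The $2n$-bit figure comes from storing each base in $\log_2 4 = 2$ bits. The following Python sketch is an illustration only, not the implementation described in the paper; all function names are hypothetical. It packs four bases per byte and, for comparison, counts the bits used by a single array of $n$ suffix pointers of $\lceil \log_2 n \rceil$ bits each, which is where the $\log n$ factor in the suffix tree's size comes from (a pointer-based suffix tree stores several such values per node).

```python
import math

# 2-bit code per nucleotide: log2(|Sigma|) = log2(4) = 2 bits per symbol.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def bits_per_symbol(alphabet_size: int) -> int:
    """Bits needed to encode one symbol: ceil(log2 |Sigma|)."""
    return math.ceil(math.log2(alphabet_size))


def pack(seq: str) -> bytes:
    """Pack a DNA string at 2 bits per base, 4 bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Base i lands in byte i // 4 at bit offset 2 * (i % 4).
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(packed: bytes, n: int) -> str:
    """Recover the first n bases from a packed byte string."""
    return "".join(
        BASES[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)
    )


def suffix_pointer_bits(n: int) -> int:
    """Bits for one array of n suffix positions, each ceil(log2 n) bits wide."""
    return n * math.ceil(math.log2(n))
```

For a sequence of about $10^7$ bases, the packed text needs $2 \times 10^7$ bits, while `suffix_pointer_bits(10_000_000)` already gives $2.4 \times 10^8$ bits, twelve times more, before any tree nodes or child pointers are counted.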
The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, this has enabled a dramatic scaling up of pipeline throughput while remaining on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST.
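The abstract does not say how the 90%-identity clustering step is implemented. As a toy illustration of the general idea only, the Python sketch below performs greedy centroid clustering: sequences are visited longest first, and each one joins the first centroid it matches at or above the threshold, otherwise it starts a new cluster. The identity measure here is an assumption, a crude ungapped comparison over the shorter sequence, standing in for a real alignment; the function names are hypothetical and this is not MG-RAST's code.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy ungapped identity: matching positions / length of the shorter sequence.
    A stand-in for a proper alignment score (assumption, not MG-RAST's method)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: List[str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy centroid clustering: longest sequences first; each sequence joins the
    first centroid with identity >= threshold, otherwise it becomes a new centroid."""
    clusters: Dict[str, List[str]] = {}
    for s in sorted(seqs, key=len, reverse=True):
        for centroid in clusters:
            if identity(s, centroid) >= threshold:
                clusters[centroid].append(s)
                break
        else:
            # No centroid was close enough: start a new cluster.
            clusters[s] = [s]
    return clusters
```

With the hypothetical protein fragments `["MKTAYIAKQR", "MKTAYIAKQK", "GGGGGGGGGG"]`, the first two share 9 of 10 residues (exactly 90%) and fall into one cluster, while the third forms its own. Only the centroid of each cluster needs to be searched for similarities afterwards, which is one way clustering reduces the downstream workload.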
PLoS Computational Biology, 2015
Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST has been used to annotate over 110,000 data sets totaling over 43 terabases. With metagenomic sequencing finding even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface), we have greatly expanded access to MG-RAST data and provided a mechanism for using third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us) we have implemented a web serv...
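A RESTful API that returns JSON objects can be consumed with nothing more than an HTTP GET and a JSON decoder. The Python sketch below shows that pattern using only the standard library. The base URL, the `<base>/<resource>/<id>` path layout, and the resource and identifier names in the usage example are hypothetical placeholders, not documented MG-RAST endpoints; consult the API's own documentation for the real resource paths.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical placeholder base URL; NOT a documented MG-RAST endpoint.
API_BASE = "https://api.example.org"


def resource_url(resource: str, identifier: str) -> str:
    """Build a REST-style URL of the assumed form <base>/<resource>/<id>."""
    return f"{API_BASE}/{resource}/{identifier}"


def parse_object(body: bytes) -> Dict[str, Any]:
    """Decode a UTF-8 JSON response body into a Python dict."""
    return json.loads(body.decode("utf-8"))


def fetch_object(resource: str, identifier: str) -> Dict[str, Any]:
    """GET one data object and decode its JSON body (requires network access)."""
    with urllib.request.urlopen(resource_url(resource, identifier)) as resp:
        return parse_object(resp.read())
```

Keeping URL construction and JSON decoding separate from the network call makes it easy to test the parsing offline, e.g. `parse_object(b'{"id": "example-1", "sequences": 42}')["sequences"]` evaluates to `42`, and to point the same client at a different base URL.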
PLoS ONE, 2014
Nucleic Acids Research, 2011
Journal of Experimental Algorithmics, 2009
Bioinformatics, 2007
Bioinformatics, 2006
BMC Bioinformatics, 2009