A compression mechanism for sequence databases to improve the efficiency of conventional tools

Journal Article

R. Doelz

Biocomputing, Basel University

Biozentrum, Klingelbergstrasse 70, CH-4056 Basel, Switzerland


F. Eggenberger

Biocomputing, Basel University

Biozentrum, Klingelbergstrasse 70, CH-4056 Basel, Switzerland

¹ To whom correspondence should be addressed


Received: 01 November 1994

Revision received: 11 November 1994

Accepted: 13 January 1995


Abstract

This paper describes a method for compressing molecular biology databases, which contain an increasing proportion of data derived from genome projects. We tested the tool's performance on various data files from the EMBL nucleotide sequence database. The best compression ratios were achieved on EST (Expressed Sequence Tag) data, which typically comes from large-scale sequencing projects. We also tested the compression of sequence database updates in combination with the common Unix compression program ‘compress’: on average, our tool improved the efficiency of ‘compress’ by 16%.
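The abstract does not describe how the tool itself works. As a purely illustrative sketch, and explicitly not the authors' method, the Python below shows one way to measure whether a pre-processing step makes an LZW coder (the algorithm behind Unix ‘compress’) more effective on nucleotide data. The 2-bit base packing, the helper names (pack_bases, lzw_compressed_bits, relative_gain) and the definition of "improvement" as a percentage reduction in compressed size are all hypothetical assumptions. The size estimate also ignores the header and dictionary resets of the real ‘compress’ program.

```python
"""Illustrative sketch only: compare LZW output size with and without a
hypothetical 2-bit base-packing pre-processor. Not the method from the paper."""

BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def pack_bases(seq: str) -> bytes:
    """Hypothetical pre-processor: 4-byte big-endian length, then 2 bits per base."""
    out = bytearray(len(seq).to_bytes(4, "big"))
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            if base not in BASE_TO_BITS:
                raise ValueError(f"unsupported symbol {base!r}; only A/C/G/T are packed")
            byte |= BASE_TO_BITS[base] << (6 - 2 * j)  # first base in the high bits
        out.append(byte)
    return bytes(out)


def unpack_bases(data: bytes) -> str:
    """Inverse of pack_bases; the stored length trims padding in the last byte."""
    n = int.from_bytes(data[:4], "big")
    bases = [BITS_TO_BASE[(byte >> (6 - 2 * j)) & 0b11] for byte in data[4:] for j in range(4)]
    return "".join(bases[:n])


def lzw_compressed_bits(data: bytes, max_bits: int = 16) -> int:
    """Approximate output size, in bits, of an LZW coder whose code width grows
    from 9 bits up to max_bits (as in Unix 'compress'). Ignores the file header,
    clear codes and dictionary resets of the real program."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code, width, total = 256, 9, 0
    w = b""
    for value in data:
        wc = w + bytes([value])
        if wc in dictionary:
            w = wc  # keep extending the current phrase
            continue
        total += width  # emit the code for the longest known phrase w
        if next_code < (1 << max_bits):  # stop growing once the table is full
            dictionary[wc] = next_code
            next_code += 1
            if next_code > (1 << width):  # newest code no longer fits: widen
                width += 1
        w = bytes([value])
    if w:
        total += width  # flush the final phrase
    return total


def relative_gain(plain_bits: int, preprocessed_bits: int) -> float:
    """Percentage reduction in compressed size attributable to pre-processing."""
    if plain_bits == 0:
        return 0.0
    return 100.0 * (plain_bits - preprocessed_bits) / plain_bits


if __name__ == "__main__":
    seq = "ACGTTGCA" * 500 + "GATTACA" * 300  # toy data, not EMBL records
    plain = lzw_compressed_bits(seq.encode("ascii"))
    packed = lzw_compressed_bits(pack_bases(seq))
    print(f"LZW on ASCII bases:  {plain} bits")
    print(f"LZW on packed bases: {packed} bits")
    print(f"relative gain:       {relative_gain(plain, packed):.1f}%")
```

The sketch only shows how such a measurement could be set up. It makes no claim about which pre-processing works best. Real database entries also contain headers, annotation and ambiguity codes such as N, and this toy packer rejects those symbols.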
