A compression mechanism for sequence databases to improve the efficiency of conventional tools
Journal Article
R. Doelz
Biocomputing, Basel University
Biozentrum, Klingelbergstrasse 70, CH-4056 Basel, Switzerland
¹ To whom correspondence should be addressed
Received: 01 November 1994
Revision received: 11 November 1994
Accepted: 13 January 1995
Abstract
This paper describes a method to compress molecular biology databases that are characterized by an increasing proportion of data derived from genome projects. The performance of our tool has been tested on various data files of the EMBL nucleotide sequence database. The best compression ratios were achieved on EST (Expressed Sequence Tags) data, typically derived from large-scale sequence projects. The compression of sequence database updates was tested in combination with the common Unix compression program ‘compress’. Our tool improved the efficiency of ‘compress’ on average by 16%.
© Oxford University Press