Trace2PS and FSA2PS: Two software toolkits for converting trace and fsa files to PostScript format

A software system for data analysis in automated DNA sequencing

Genome research, 1998

Software for gel image analysis and base-calling in fluorescence-based sequencing consisting of two primary programs, BaseFinder and GelImager, is described. BaseFinder is a framework for trace processing, analysis, and base-calling. BaseFinder is highly extensible, allowing the addition of trace analysis and processing modules without recompilation. Powerful scripting capabilities combined with modularity and multilane handling allow the user to customize BaseFinder to virtually any type of trace processing. We have developed an extensive set of data processing and analysis modules for use with the program in fluorescence-based sequencing. GelImager is a framework for gel image manipulation. It can be used for gel visualization, lane retracking, and as a front end to the Washington University Getlanes program. The programs were designed using a cross-platform development environment, currently allowing them to run in Windows NT, Windows 95, Openstep/Mach, and Rhapsody. Work is ongo...

A trace display and editing program for data from fluorescence based sequencing machines

Nucleic Acids Research, 1991

'Ted' (Trace editor) is a graphical editor for sequence and trace data from automated fluorescence sequencing machines. It provides facilities for viewing sequence and trace data (in top or bottom strand orientation), for editing the base sequence, for automated or manual trimming of the head (vector) and tail (uncertain data) from the sequence, for vertical and horizontal trace scaling, for keeping a history of sequence editing, and for output of the edited sequence. Ted has been used extensively in the C. elegans genome sequencing project, both as a standalone program and integrated into the Staden sequence assembly package, and has greatly improved the efficiency and accuracy of sequence editing. It runs in the X Window environment on Sun workstations and is available from the authors. Ted currently supports sequence and trace data from the ABI 373A and Pharmacia A.L.F. sequencers.

ZTR: a new format for DNA sequence trace data

Bioinformatics (Oxford, England), 2002

To produce an open and extensible file format for DNA trace data which produces compact files suitable for large-scale storage and efficient use of internet bandwidth. We have created an extensible format named ZTR. For a set of data taken from an ABI-3700 the ZTR format produces trace files which require 61.6% of the disk space used by gzipped SCFv3, and which can be written and read at greater speed. The compression algorithms used for the trace amplitudes are used within the National Center for Biotechnology Information (NCBI) trace archive. lmb.cam.ac.uk/pub/staden/io_lib/test_data.

preAssemble: a tool for automatic sequencer trace data processing

BMC Bioinformatics, 2006

Background: Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment or sequencers. Each file contains information which can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands. Sequencing quality assessment on various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing. Results: The preAssemble pre-assembly sequence processing pipeline has been developed for small to large scale automatic processing of DNA sequencer chromatogram (trace) data. The Staden Package Pregap4 module and the base-calling program Phred are utilized in the pipeline, which produces detailed and self-explanatory output that can be displayed with a web browser. preAssemble can be used successfully with very little previous experience; however, options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and LINUX operating systems. It is available for downloading and will run as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site, where preAssemble jobs can be run on the project server. Conclusion: preAssemble is a tool for quality assessment of sequences generated by automatic sequencing equipment. preAssemble is flexible, since both interactive jobs on the preAssemble server and the stand-alone downloadable version are available. Virtually no previous experience is necessary to run a default preAssemble job, while options for parameter tuning are provided. Consequently, preAssemble can be used as efficiently for just several trace files as for large scale sequence processing.

IMAS: The Interactive Multigenomic Analysis System

This paper introduces a new Visual Analysis tool named IMAS (Interactive Multigenomic Analysis System), which combines common analysis tools such as Glimmer, BLAST, and Clustal-W into a unified Visual Analytic framework. IMAS displays the primary DNA sequence being analyzed by the biologist in a highly interactive, zoomable visual display. The user may analyze the sequence in a number of ways, and visualize these analyses in a coherent, sequence aligned form, with all related analysis products grouped together. This enables the user to rapidly perform analyses of DNA sequences without the need for tedious and error-prone cutting and pasting of sequence data from text files to and from web-based databases and data analysis services, as is now common practice.

Challenges and requirements for an effective trace exploration tool

2004

Abstract: Building efficient tools for the analysis and exploration of large execution traces can be a very challenging task. Our experience with building a tool called SEAT (software exploration and analysis tool) shows that there is a need to address several key research questions in order to overcome these challenges. SEAT is intended to integrate several filtering techniques to tackle the size explosion problem that makes traces hard to understand.

Slim-Filter: an interactive windows-based application for illumina genome analyzer data assessment and manipulation

BMC Bioinformatics, 2012

Background: The emergence of Next Generation Sequencing technologies has made it possible for individual investigators to generate gigabases of sequencing data per week. Effective analysis and manipulation of these data is limited due to large file sizes, so even simple tasks such as data filtration and quality assessment have to be performed in several steps. This requires (potentially problematic) interaction between the investigator and a bioinformatics/computational service provider. Furthermore, such services are often performed using specialized computational facilities. Results: We present a Windows-based application, Slim-Filter, designed to interactively examine the statistical properties of sequencing reads produced by the Illumina Genome Analyzer and to perform a broad spectrum of data manipulation tasks, including: filtration of low quality and low complexity reads; filtration of reads containing undesired subsequences (such as parts of adapters and PCR primers used during the sample and sequencing library preparation steps); exclusion of duplicated reads (while keeping each read's copy number information in a specialized data format); and sorting of reads by copy number, allowing for easy access and manual editing of the resulting files. Slim-Filter is organized as a sequence of windows summarizing the statistical properties of the reads. Each data manipulation step has roll-back abilities, allowing for return to previous steps of the data analysis process. Slim-Filter is written in C++ and is compatible with fasta, fastq, and the specialized AS file formats presented in this manuscript. Setup files and a user's manual are available for download at the supplementary web site (https://www.bioinfo.uh.edu/Slim_Filter/). Conclusion: The presented Windows-based application has been developed with the goal of providing individual investigators with integrated sequencing read analysis, curation, and manipulation capabilities.
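
The filtering steps listed in this abstract (quality filtration, removal of reads containing undesired subsequences, duplicate collapsing with copy numbers, and sorting by copy number) can be illustrated with a minimal Python sketch. This is not Slim-Filter's code, which is written in C++, and it does not reproduce its AS format. The function names, the mean-quality threshold of 20, the Phred+33 quality encoding, and the example adapter fragment are all assumptions made for illustration.

```python
from collections import Counter


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding is assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_and_collapse(reads, min_mean_q=20.0, unwanted=()):
    """Hypothetical helper: filter reads, then collapse duplicates.

    reads    -- iterable of (sequence, quality_string) pairs
    unwanted -- subsequences (e.g. adapter fragments) that disqualify a read
    Returns a list of (sequence, copy_number), highest copy number first.
    """
    counts = Counter()
    for seq, qual in reads:
        if not seq or len(seq) != len(qual):
            continue  # skip empty or malformed records
        if mean_quality(qual) < min_mean_q:
            continue  # low-quality read
        if any(u in seq for u in unwanted):
            continue  # read contains an undesired subsequence
        counts[seq] += 1  # duplicates collapse into one entry with a count
    # Sort by descending copy number, then by sequence for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


reads = [
    ("ACGTACGT", "IIIIIIII"),          # mean quality 40, kept
    ("ACGTACGT", "IIIIIIII"),          # duplicate of the read above
    ("TTTTGGGG", "!!!!!!!!"),          # mean quality 0, dropped
    ("GGAGATCGGAAG", "IIIIIIIIIIII"),  # contains the example adapter, dropped
    ("CCCCAAAA", "55555555"),          # mean quality 20, kept
]
result = filter_and_collapse(reads, unwanted=("AGATCGGAAG",))
print(result)
```

Keeping a count per distinct sequence, rather than discarding duplicates outright, preserves the copy number information that the abstract says is retained.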

WebTraceMiner: a web service for processing and mining EST sequence trace files

Nucleic Acids Research, 2007

Expressed sequence tags (ESTs) remain a dominant approach for characterizing the protein-encoding portions of various genomes. Due to inherent deficiencies, they also present serious challenges for data quality control. Before GenBank submission, EST sequences are typically screened and trimmed of vector and adapter/linker sequences, as well as polyA/T tails. Removal of these sequences presents an obstacle for data validation of error-prone ESTs and impedes data mining of certain functional motifs, whose detection relies on accurate annotation of positional information for polyA tails added posttranscriptionally. As raw DNA sequence information is made increasingly available from public repositories, such as the NCBI Trace Archive, new tools will be necessary to reanalyze and mine this data for new information. WebTraceMiner (www.conifergdb.org/software/wtm) was designed as a public sequence processing service for raw EST traces, with a focus on detection and mining of sequence features that help characterize the 3′ and 5′ termini of cDNA inserts, including vector fragments, adapter/linker sequences, insert-flanking restriction endonuclease recognition sites and polyA or polyT tails. WebTraceMiner complements other public EST resources and should prove to be a unique tool to facilitate data validation and mining of error-prone ESTs (e.g. discovery of new functional motifs).
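
As a rough illustration of why positional information about polyA tails matters, the sketch below locates a terminal run of A residues and reports where it starts instead of trimming it away. The abstract does not describe WebTraceMiner's actual detection method, so this is only an assumed, simplified stand-in: the function name, the exact-match rule (no mismatches tolerated), and the minimum run length of 10 are all hypothetical.

```python
def find_polya_tail(seq, min_len=10):
    """Hypothetical helper: return the 0-based start index of a terminal
    polyA run of at least min_len bases, or None if there is no such run.

    Only an uninterrupted run of 'A' at the 3' end is recognised; real
    base-called reads would usually need some tolerance for miscalls.
    """
    seq = seq.upper()
    i = len(seq)
    # Walk backwards from the 3' end while the bases are still 'A'.
    while i > 0 and seq[i - 1] == "A":
        i -= 1
    return i if len(seq) - i >= min_len else None


read = "ACGTGC" + "A" * 12
start = find_polya_tail(read)
print(start, read[:start])
```

Returning the start position, rather than a trimmed sequence, keeps the annotation that motif searches near the tail would rely on.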

TraceVisReport2011.pdf

The analysis of execution traces can reveal important information about the behaviour of software. This information can in turn be used to help with a variety of software engineering applications including software maintenance, performance analysis, and software security. Traces, however, tend to be extremely large. Various visualization techniques have been proposed to help software engineers browse large traces in an effective manner. These techniques include the use of graphs, UML-like diagrams, metaphors, space-filling methods, and many more. Also, existing tools employ several interaction features. However, it is still not clear how these features and the visualization techniques compare to each other, what their advantages and limitations are, and how they can be better characterized to enable their reuse, and to prevent reinventing

Anvaya: A workflows environment for automated genome analysis

2012

Anvaya is a workflow environment for automated genome analysis that provides an interface for several bioinformatics tools and databases, loosely coupled together in a coordinated system, enabling the execution of a set of analysis tools in series or in parallel. It is a client-server workflow environment that has an advantage over existing software as it enables extensive pre- and post-processing of biological data in an efficient manner. "Anvaya" offers the user novel functionalities to carry out exhaustive comparative analysis via "custom tools," which are tools with new functionality not available in standard tools, and "built-in PERL parsers," which automate data-flow between tools that hitherto required manual intervention. It also provides a set of 11 pre-defined workflows for frequently used pipelines in genome annotation and comparative genomics, ranging from EST assembly and annotation to phylogenetic reconstruction and microarray analysis. It provides a platform that serves as a single-stop solution for biologists to carry out hassle-free and comprehensive analysis, without having to deal with the nuances of tool installation, command line parameters, the format conversions required to connect tools, and the management and processing of multiple data sets at a single instance.