Comparative Study

Automated finishing with autofinish

D Gordon et al. Genome Res. 2001 Apr.

Abstract

Currently, the genome sequencing community is producing shotgun sequence data at a very high rate, but finishing (collecting additional directed sequence data to close gaps and improve the quality of the data) is not matching that rate. One reason for the difference is that shotgun sequencing is highly automated but finishing is not: most finishing decisions, such as which directed reads to obtain and which specialized sequencing techniques to use, are made by people. If finishing rates are to increase to match shotgun sequencing rates, most finishing decisions also must be automated. The Autofinish computer program (which is part of the Consed computer software package) does this by automatically choosing finishing reads. Autofinish is able to suggest most finishing reads required for completion of each sequencing project, greatly reducing the amount of human attention needed. Autofinish sometimes completely finishes the project, with no human decisions required. It cannot solve the most complex problems, so we recommend that Autofinish be allowed to suggest reads for the first three rounds of finishing, and if the project still is not finished completely, a human finisher complete the work. We compared this Autofinish-Hybrid method of finishing against a human finisher in five different projects with a variety of shotgun depths by finishing each project twice, once with each method. This comparison shows that the Autofinish-Hybrid method saves many hours over a human finisher alone, while using roughly the same number and type of reads and closing gaps at roughly the same rate. Autofinish currently is in production use at several large sequencing centers. It is designed to be adaptable to the finishing strategy of the lab: it can finish using some or all of the following: resequencing reads, reverses, custom primer walks on either subclone templates or whole clone templates, PCR, or minilibraries. Autofinish has been used for finishing cDNA, genomic clones, and whole bacterial genomes (see http://www.phrap.org).
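
The recommended Autofinish-Hybrid procedure (automated read suggestion for up to three rounds, then a human finisher if the project is still unfinished) can be sketched as a simple control loop. This is only an illustration of the workflow described in the abstract: `suggest_reads`, `collect_reads`, `is_finished`, and `human_finish` are hypothetical callables standing in for Autofinish, the lab's sequencing step, a completion check, and manual finishing. None of them is part of the actual Consed/Autofinish interface.

```python
def autofinish_hybrid(project, suggest_reads, collect_reads, is_finished,
                      human_finish, max_auto_rounds=3):
    """Run up to max_auto_rounds automated rounds, then fall back to a human.

    All callables are hypothetical placeholders supplied by the caller.
    Returns (automated rounds performed, whether a human finisher was needed).
    """
    rounds = 0
    while rounds < max_auto_rounds and not is_finished(project):
        reads = suggest_reads(project)
        if not reads:
            break  # the automated step has nothing more to propose
        collect_reads(project, reads)
        rounds += 1
    needed_human = not is_finished(project)
    if needed_human:
        human_finish(project)  # remaining, more complex problems
    return rounds, needed_human
```

The default of three automated rounds mirrors the recommendation in the abstract; a lab with a different finishing strategy would presumably tune it.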

Figures

Figure 1. Finishing procedures with Autofinish and with a human finisher.

Figure 2. Gap Closing: Autofinish-Hybrid vs. Human-only for five different BACs.

Figure 3. Finishing reads required: Autofinish-Hybrid vs. Human-only.

Figure 4. PCR reads required: Autofinish-Hybrid vs. Human-only.

Figure 5. Custom primers required: Autofinish-Hybrid vs. Human-only.

Figure 6. Human hours required for choosing reads: Autofinish-Hybrid vs. Human-only.

Figure 7. Autofinish checks each contig to see if either end is the clone end. Masked clone vector and subclone vector both appear as Xs, and Bs are high-quality bases that do not match subclone or clone vector. Notice that the vector-insert junctions of reads 1–4 are aligned. If read 5 were not present, this figure would suggest a typical clone end, with the Xs being clone vector. If only read 1 and read 2 were present, the Xs could instead be subclone vector that just happens to align; the presence of two additional reads (read 3 and read 4), or of additional vector bases in one read, makes this less likely. However, if read 5 were present (all high-quality bases surrounding the putative vector/insert junction), it would be unlikely that this is the clone end.
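
The clone-end reasoning in the Figure 7 caption can be sketched in Python. This is an illustrative heuristic only, not Autofinish's actual implementation. The function name `looks_like_clone_end`, the per-read string encoding (`X` for masked vector, `B` for a high-quality non-vector base, `-` for no coverage), the `window` size, and the `min_reads` threshold of 3 are all assumptions made for this sketch. The caption's alternative evidence ("additional vector bases in one read") is not modeled.

```python
def looks_like_clone_end(aligned_reads, junction, min_reads=3, window=3):
    """Hypothetical heuristic modeled on the Figure 7 caption.

    aligned_reads: equal-length strings in contig coordinates, one per read;
        'X' = masked vector, 'B' = high-quality non-vector base, '-' = no coverage.
    junction: column index of the first putative vector base.
    """
    supporting = 0
    for read in aligned_reads:
        left = read[max(0, junction - window):junction]
        right = read[junction:junction + window]
        if not left or not right or "-" in left or "-" in right:
            continue  # read does not span the junction: no evidence either way
        if set(left) == {"B"} and set(right) == {"B"}:
            # Like read 5: high-quality insert bases run straight through the
            # putative junction, so the Xs are unlikely to be clone vector.
            return False
        if set(left) == {"B"} and set(right) == {"X"}:
            supporting += 1  # a vector-insert junction aligned at this column
    # With only a couple of reads, the Xs could be subclone vector that
    # happens to align, so require several agreeing reads (assumed threshold).
    return supporting >= min_reads
```

For example, four reads encoded as `"BBBBXXXX"` with `junction=4` are treated as a clone end, two such reads are not, and adding a fifth read `"BBBBBBBB"` rejects the clone-end call.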
