Short-insert libraries as a method of problem solving in genome sequencing

A A McMurray et al. Genome Res. 1998 May.

Abstract

As the Human Genome Project moves into its sequencing phase, a serious problem has arisen. The same problem has been increasingly vexing in the closing phase of the Caenorhabditis elegans project. The difficulty lies in sequencing efficiently through certain regions in which the templates (DNA substrates for the sequencing process) form complex folded secondary structures that are inaccessible to the enzymes. The solution, however, is simply to break them up. Specifically, the offending fragments are sonicated heavily and recloned, as much smaller fragments, into pUC vector. The sequences obtained from the resulting library can subsequently be assembled, free from the effects of secondary structure, to produce high-quality, complete sequence. Because of the success and simplicity of this procedure, we have begun to use it for the sequencing of all regions in which standard primer walking has been at all difficult.

Figures

Figure 1

Y48E1–C. elegans chromosome II. Restriction digest analysis showed that a fragment of 800 bases was missing from the assembly, and although three pUC18 shotgun subclones spanned the gap, they were unsequenceable in that region. The small-insert clones obtained from the inserts of two of the shotgun pUC18 subclones provided complete and unequivocal contiguation of the gap, which could then be identified as containing one arm of a 1-kb inverted repeat. EMBL accession no. Z93392, bases 263250–264250. Sequence starts, ATCATGGTTGATAACGTAAATTCCCAGAC; sequence ends, CGCTGCGTATCGATTTTTATGAAACTGTG.

Figure 2

179I15–H. sapiens chromosome 13 BRCA2 region. (A) After finishing, 179I15 contained a region of 11 bp within a CpG island in which the sequence was unreadable using standard dye primer or dye terminator sequencing. (B) An example of a reverse direction dye primer terminator sequencing reaction over the region (read no. 3284); the sequence obtained from a small insert clone across the same region (read no. 4052). EMBL accession no. Z92540, bases 13000–131140. Sequence starts, CCTGCACGGCTCCCGGGAGCTGGGAGAAA; sequence ends, GTGAGTGCGAGGGGCCAGGCGGAGGGCCA.

Figure 3

F59D12–C. elegans chromosome II. (A) Restriction digest revealed that although the assembly appeared contiguous, there was a 400-bp fragment missing between two identical repeat motifs. This was present in the cosmid but had become deleted from all shotgun subclones. (B) A PCR product was obtained across the region, but this concurred with the original deleted assembly. The PCR reaction had “skipped” between the two repeat regions giving a product that was also missing the 400-bp fragment. (C) A restriction fragment containing the missing sequence was isolated and sonicated to give a small insert library which, when sequenced, revealed the missing 400 bp. EMBL accession no. Z81558, bases 18910–19550. Sequence starts, GTCCACTTACGGGAAAAGGCAAAAATTTA; sequence ends, TTCCCATGACTTTCCGAAAAAAAGGCGGG.
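The deletion described in this caption can be modelled in a few lines. The following is a minimal sketch, assuming the loss removes one copy of a direct repeat together with everything between the two copies; the function name `delete_between_repeats` and the toy sequences are hypothetical and are not taken from F59D12.

```python
def delete_between_repeats(seq: str, repeat: str) -> str:
    """Model a deletion between the first two copies of a direct repeat.

    One repeat copy and everything between the two copies is lost, so the
    product still reads as contiguous sequence across a single repeat copy.
    If fewer than two copies are present, the sequence is returned unchanged.
    """
    first = seq.find(repeat)
    if first == -1:
        return seq
    second = seq.find(repeat, first + len(repeat))
    if second == -1:
        return seq
    # Keep everything before the first copy, then resume at the second copy.
    return seq[:first] + seq[second:]


# Toy example with hypothetical sequences (not from the cosmid).
repeat = "GATTACAGATTACA"
unique = "CCGTTAGC" * 5  # 40 bases standing in for the missing fragment
full = "AAAT" + repeat + unique + repeat + "TTTG"

collapsed = delete_between_repeats(full, repeat)
lost = len(full) - len(collapsed)
print(f"full: {len(full)} bases, collapsed: {len(collapsed)} bases, lost: {lost}")
```

The collapsed product contains no junction that looks wrong on its own, which is consistent with the caption's point that only the restriction digest exposed the missing fragment.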

Figure 4

View of the finished region of F59D12 in DOTTER (Sonnhammer et al. 1994) showing comparison of the sequence with itself. The main diagonal from top left to bottom right shows the in-phase identity. The three broken lines perpendicular to the main diagonal represent the three inverted repeats that caused the problem. The short lines parallel to the main diagonal are the tandem repeats that allowed 400 bp to delete.
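The geometry of such a self-comparison plot can be illustrated with a small word-match sketch. This is not DOTTER's algorithm; it is a simplified exact k-mer comparison, and the function names, the word size `k=8`, and the toy sequence are assumptions made for illustration. Direct (tandem) repeats produce dots with a constant offset `j - i`, which lie parallel to the main diagonal; inverted repeats produce dots with a constant sum `i + j`, which lie perpendicular to it.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def self_dots(seq: str, k: int = 8):
    """Return (direct, inverted) sets of (i, j) positions with i < j.

    direct:   the k-mer at i equals the k-mer at j.
    inverted: the k-mer at i equals the reverse complement of the k-mer at j.
    """
    index = {}
    for pos in range(len(seq) - k + 1):
        index.setdefault(seq[pos:pos + k], []).append(pos)

    direct, inverted = set(), set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        for j in index.get(word, []):
            if i < j:
                direct.add((i, j))
        for j in index.get(revcomp(word), []):
            if i < j:
                inverted.add((i, j))
    return direct, inverted


# Toy sequence (hypothetical): one inverted repeat and one direct repeat.
k = 8
left, spacer, mid, gap, right = "TTTTT", "CACACA", "AGAGAG", "CTCTCTCT", "AAAAA"
arm = "ATGGCTTACGGATCCAGTTC"
unit = "GGATTCAACGTCTA"
seq = left + arm + spacer + revcomp(arm) + mid + unit + gap + unit + right

arm_start = len(left)
rc_start = arm_start + len(arm) + len(spacer)
unit1 = rc_start + len(arm) + len(mid)
unit2 = unit1 + len(unit) + len(gap)

direct, inverted = self_dots(seq, k)

# Dots on one line share an offset (direct) or a sum (inverted).
print("direct offsets (j - i):", Counter(j - i for i, j in direct).most_common(3))
print("inverted sums (i + j):", Counter(i + j for i, j in inverted).most_common(3))
```

Plotting the `direct` set gives short lines parallel to the main diagonal, and plotting the `inverted` set gives lines perpendicular to it, matching the two features the caption describes.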

References

    1. Alderton R, Kitau J, Beck S. Automated DNA hybridization. Anal Biochem. 1994;218:98–102.
    2. Barnes W. PCR amplification of up to 35kB DNA with high fidelity and high yield from lambda-bacteriophage templates. Proc Natl Acad Sci. 1994;91:2216–2220.
    3. Beck S, Alderton R. A strategy for the amplification, purification, and selection of M13 templates for large-scale DNA-sequencing. Anal Biochem. 1993;212:498–505.
    4. Berks M. The C. elegans genome sequencing project. Genome Res. 1995;5:99–104.
    5. Devine SE, Chissoe SL, Eby Y, Wilson RK, Boeke JD. A transposon-based strategy for sequencing repetitive DNA in eukaryotic genomes. Genome Res. 1997;7:551–563.
