Short-insert libraries as a method of problem solving in genome sequencing
A A McMurray et al. Genome Res. 1998 May.
Abstract
As the Human Genome Project moves into its sequencing phase, a serious problem has arisen. The same problem has been increasingly vexing in the closing phase of the Caenorhabditis elegans project. The difficulty lies in sequencing efficiently through certain regions in which the templates (DNA substrates for the sequencing process) form complex folded secondary structures that are inaccessible to the enzymes. The solution, however, is simply to break them up. Specifically, the offending fragments are sonicated heavily and recloned, as much smaller fragments, into pUC vector. The sequences obtained from the resulting library can subsequently be assembled, free from the effects of secondary structure, to produce high-quality, complete sequence. Because of the success and simplicity of this procedure, we have begun to use it for the sequencing of all regions in which standard primer walking has been at all difficult.
Figures
Figure 1
Y48E1–C. elegans chromosome II. Restriction digest analysis showed that a fragment of 800 bases was missing from the assembly, and although three pUC18 shotgun subclones spanned the gap, they were unsequenceable in that region. The small-insert clones obtained from the inserts of two of the shotgun pUC18 subclones provided complete and unequivocal contiguation of the gap, which could then be identified as containing one arm of a 1-kb inverted repeat. EMBL accession no. Z93392, bases 263250–264250. Sequence starts, ATCATGGTTGATAACGTAAATTCCCAGAC; sequence ends, CGCTGCGTATCGATTTTTATGAAACTGTG.
Figure 2
179I15–H. sapiens chromosome 13 BRCA2 region. (A) After finishing, 179I15 contained a region of 11 bp within a CpG island in which the sequence was unreadable using standard dye primer or dye terminator sequencing. (B) An example of a reverse direction dye primer terminator sequencing reaction over the region (read no. 3284); the sequence obtained from a small insert clone across the same region (read no. 4052). EMBL accession no. Z92540, bases 13000–131140. Sequence starts, CCTGCACGGCTCCCGGGAGCTGGGAGAAA; sequence ends, GTGAGTGCGAGGGGCCAGGCGGAGGGCCA.
Figure 3
F59D12–C. elegans chromosome II. (A) Restriction digest revealed that although the assembly appeared contiguous, there was a 400-bp fragment missing between two identical repeat motifs. This was present in the cosmid but had become deleted from all shotgun subclones. (B) A PCR product was obtained across the region, but this concurred with the original deleted assembly. The PCR reaction had “skipped” between the two repeat regions giving a product that was also missing the 400-bp fragment. (C) A restriction fragment containing the missing sequence was isolated and sonicated to give a small insert library which, when sequenced, revealed the missing 400 bp. EMBL accession no. Z81558, bases 18910–19550. Sequence starts, GTCCACTTACGGGAAAAGGCAAAAATTTA; sequence ends, TTCCCATGACTTTCCGAAAAAAAGGCGGG.
Figure 4
View of the finished region of F59D12 in DOTTER (Sonnhammer et al. 1994) showing comparison of the sequence with itself. The main diagonal from top left to bottom right shows the in-phase identity. The three broken lines perpendicular to the main diagonal represent the three inverted repeats that caused the problem. The short lines parallel to the main diagonal are the tandem repeats that allowed 400 bp to delete.
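The geometry described in the Figure 4 caption can be illustrated with a small self-comparison sketch. This is not DOTTER's method: it uses exact k-mer matching, and the function names (`reverse_complement`, `kmer_positions`, `self_dot_plot`) and the toy sequences are assumptions made purely for illustration. Direct matches at (i, i) form the main diagonal, direct matches at a fixed offset run parallel to it (direct repeats, such as tandem repeats), and reverse-complement matches share a constant i + j, so they run perpendicular to the main diagonal (inverted repeats).

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def kmer_positions(seq, k):
    """Map every k-mer in seq to the list of start positions where it occurs."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def self_dot_plot(seq, k):
    """Compare seq with itself using exact k-mer matches.

    Returns two sets of (i, j) start positions:
      direct   - the k-mer at i equals the k-mer at j (same strand)
      inverted - the k-mer at i equals the reverse complement of the k-mer at j
    """
    index = kmer_positions(seq, k)
    direct, inverted = set(), set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        for j in index.get(word, []):
            direct.add((i, j))
        for j in index.get(reverse_complement(word), []):
            inverted.add((i, j))
    return direct, inverted


# Hypothetical toy sequences, not taken from F59D12.
tandem = "GATTACA" * 2                 # direct repeat with period 7
hairpin = "AAACCC" + "T" + "GGGTTT"    # two inverted-repeat arms around a 1-base loop

tandem_direct, _ = self_dot_plot(tandem, 4)
_, hairpin_inverted = self_dot_plot(hairpin, 4)

# Off-diagonal direct matches: points (i, i + 7) lie parallel to the main diagonal.
off_diagonal = sorted(p for p in tandem_direct if p[0] < p[1])

# Inverted matches between the two arms: i + j is constant along the line.
anti_diagonal = sorted(p for p in hairpin_inverted if p[0] + p[1] == 9)
```

In a real plot each point would be drawn at (i, j); an inverted repeat long enough to let the template fold back on itself shows up as a perpendicular line, which is the feature the caption associates with the sequencing problem.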
References
- Alderton R, Kitau J, Beck S. Automated DNA hybridization. Anal Biochem. 1994;218:98–102.
- Beck S, Alderton R. A strategy for the amplification, purification, and selection of M13 templates for large-scale DNA-sequencing. Anal Biochem. 1993;212:498–505.
- Berks M. The C. elegans genome sequencing project. Genome Res. 1995;5:99–104.