Systematic sequencing of cDNA clones using the transposon Tn5

Yuriy Shevchenko et al. Nucleic Acids Res. 2002.

Abstract

In parallel with the production of genomic sequence data, attention is being focused on the generation of comprehensive cDNA-sequence resources. Such efforts increasingly emphasize the production of high-accuracy sequence corresponding to the entire insert of cDNA clones, especially those presumed to reflect the full-length mRNA. The complete sequencing of cDNA clones on a large scale presents unique challenges because of the generally small, yet heterogeneous, sizes of the cloned inserts. We have developed a strategy for high-throughput sequencing of cDNA clones using the transposon Tn5. This approach has been tailored for implementation within an existing large-scale 'shotgun-style' sequencing program, although it could be readily adapted for use in virtually any sequencing environment. In addition, we have developed a modified version of our strategy that can be applied to cDNA clones with large cloning vectors, thereby overcoming a potential limitation of transposon-based approaches. Here we describe the details of our cDNA-sequencing pipeline, including a summary of our experience in sequencing more than 4200 cDNA clones to produce more than 8 million base pairs of high-accuracy cDNA sequence. These data provide convincing evidence that the insertion of Tn5 into cDNA clones is sufficiently random for its effective use in large-scale cDNA sequencing, and they offer insight into the sequence context that Tn5 prefers for insertion.

Figures

Figure 1

Pipeline for transposon-based sequencing of cDNA clones. The general pipeline for the systematic sequencing of cDNA clones using the transposon Tn5 is depicted, with additional details provided in the text. Note that ‘transposon subclones’ correspond to subclones derived from the starting cDNA clone by the insertion of a transposon. For some steps, the thickness of the arrows is intended to reflect the relative number of cDNA clones traversing that portion of the pipeline (see Table 1). The gray box designates the portion of the pipeline that can be readily performed as part of the ‘main production’ component of a DNA-sequencing facility.

Figure 2

Assessing randomness of transposon Tn5 insertions. The binomial test was used to assess the distribution of transposon-insertion events. The insertions of Tn5 into 1955 cDNA clones were analyzed and assigned to bins (see Materials and Methods for details). The resulting _P_-values indicate how consistent the insertion events observed in each bin are with random insertion. Plotted are the numbers of bins falling within each _P_-value interval of width 0.01. _P_-values >0.05 correspond to bins for which the observed insertion events are consistent with random insertion. _P_-values ≤0.05 (indicated by gray bars) correspond to bins for which the observed insertion events cannot be confidently described as random occurrences.
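The kind of per-bin binomial test described in this caption can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the actual binning scheme is given in the paper's Materials and Methods and is not reproduced here. The sketch assumes equal-width bins along a single insert, 0-based insertion positions, and an exact two-sided test; the function names `binom_two_sided_p` and `bin_p_values` are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials (0 < p < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def binom_two_sided_p(k, n, p):
    """Exact two-sided P-value: total probability of every outcome that is
    no more likely than the observed count k."""
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so ties are not lost to rounding error.
    cutoff = observed * (1 + 1e-9)
    total = sum(pr for i in range(n + 1)
                if (pr := binom_pmf(i, n, p)) <= cutoff)
    return min(1.0, total)


def bin_p_values(positions, insert_length, n_bins):
    """Assign insertion positions to equal-width bins (an assumption of this
    sketch) and test each bin count against the uniform expectation."""
    if n_bins < 2:
        raise ValueError("need at least two bins")
    counts = [0] * n_bins
    for pos in positions:
        index = min(pos * n_bins // insert_length, n_bins - 1)
        counts[index] += 1
    n = len(positions)
    p_bin = 1 / n_bins  # chance that a random insertion lands in any one bin
    return counts, [binom_two_sided_p(c, n, p_bin) for c in counts]


if __name__ == "__main__":
    # Toy data: 100 evenly spaced insertions in a hypothetical 1000-bp insert.
    counts, p_values = bin_p_values(list(range(0, 1000, 10)), 1000, 10)
    for count, p_value in zip(counts, p_values):
        flag = "consistent with random" if p_value > 0.05 else "non-random?"
        print(count, round(p_value, 3), flag)
```

Bins with _P_-values ≤0.05 would correspond to the gray bars in the figure. Because many bins are tested, some low _P_-values are expected by chance alone, which is why the distribution of _P_-values across bins is more informative than any single bin.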

Figure 3

Base composition at the Tn5-insertion site and immediately flanking it. The frequency of each base flanking 24,493 Tn5-insertion events was cataloged (see Table 2). From those data, the relative compositions of GC and AT (A) and pyrimidine (Py) and purine (Pu) nucleotides (B) were determined and then plotted relative to the 9-bp target site, where position 1 is the 5′ end of the site.
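A per-position composition tally of the kind summarized in this figure might look like the sketch below. It assumes each insertion event has already been reduced to a fixed-length sequence window aligned on the target site. The function name `composition_by_position` and the two toy windows are hypothetical and do not come from the paper's data.

```python
def composition_by_position(windows):
    """Return (gc_fraction, purine_fraction) lists, one value per position,
    for equal-length sequence windows aligned on the insertion site."""
    if not windows:
        raise ValueError("no sequence windows supplied")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")

    gc_counts = [0] * length
    purine_counts = [0] * length
    for window in windows:
        for i, base in enumerate(window.upper()):
            if base in "GC":
                gc_counts[i] += 1
            if base in "AG":  # purines; C and T are pyrimidines
                purine_counts[i] += 1

    n = len(windows)
    return [c / n for c in gc_counts], [c / n for c in purine_counts]


if __name__ == "__main__":
    # Two made-up 9-bp windows, standing in for aligned target sites.
    toy_windows = ["GACGTCAGC", "AACGTTAGC"]
    gc, purine = composition_by_position(toy_windows)
    for position, (g, r) in enumerate(zip(gc, purine), start=1):
        print(position, g, r)
```

The AT fraction at each position is one minus the GC fraction, and the pyrimidine fraction is one minus the purine fraction, provided the windows contain only A, C, G and T.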

Figure 4

Modified strategy for transposon-based sequencing of cDNA clones involving Gateway cloning technology. The transposon-based approach for sequencing cDNA clones described here can be implemented in the most straightforward fashion with clones containing relatively small vectors, such as pOTB7 (A). In these cases, most of the resulting transposon-containing subclones harbor a transposon within the cDNA insert. While insertions within the vector backbone occur, those inserting within the essential components of the vector (e.g. antibiotic resistance gene, origin of replication) yield non-viable subclones; thus, only a small minority of the recovered subclones harbor a transposon in the vector. For cDNA clones with larger vectors, such as pCMV-SPORT6.0 (B), a much larger proportion of transposon-insertion events occur within the vector backbone, with only a small fraction occurring within the essential components of the vector. Undesirable ‘background’ subclones (i.e. those with an inserted transposon in the vector) can be eliminated by using the Gateway-transfer system (27,28) to shuttle the cDNA inserts into a suitable recipient vector (e.g. pDONR223). By then selecting for the recipient vector backbone and the presence of a transposon, virtually all of the resulting subclones should harbor a transposon within the transferred cDNA insert. Subclones containing a cDNA insert devoid of a transposon would be non-viable (indicated by crosses). Note that the vectors and cDNA inserts are not drawn to scale. At both ends of each inserted transposon are annealing sites for sequencing primers (arrows).
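The size argument in this caption can be made concrete with a rough calculation. The sketch assumes that insertions are uniformly distributed along the plasmid and that every insertion into an essential vector component is lost. The lengths used are illustrative placeholders, not measured sizes of pOTB7 or pCMV-SPORT6.0, and the function name `fraction_in_insert` is hypothetical.

```python
def fraction_in_insert(insert_bp, vector_bp, essential_bp):
    """Expected fraction of viable subclones whose transposon lies in the
    cDNA insert, assuming uniform insertion and that hits in essential
    vector sequence yield no recoverable subclone."""
    if essential_bp > vector_bp:
        raise ValueError("essential region cannot exceed the vector length")
    nonessential_vector_bp = vector_bp - essential_bp
    return insert_bp / (insert_bp + nonessential_vector_bp)


if __name__ == "__main__":
    # Illustrative lengths only (base pairs); same insert, two vector sizes.
    small_vector = fraction_in_insert(2000, 1800, 1500)
    large_vector = fraction_in_insert(2000, 4400, 1500)
    print(round(small_vector, 2), round(large_vector, 2))
```

With the same insert and the same essential region, the larger vector leaves more non-essential backbone to absorb insertions, so a smaller share of recovered subclones is useful. This is the background that the Gateway transfer step is meant to remove.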

References

    1. Green,E.D. (2001) Strategies for the systematic sequencing of complex genomes. Nature Rev. Genet., 2, 573–583.
    2. Adams,M.D., Kelley,J.M., Gocayne,J.D., Dubnick,M., Polymeropoulos,M.H., Xiao,H., Merril,C.R., Wu,A., Olde,B., Moreno,R.F. et al. (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science, 252, 1651–1656.
    3. Hillier,L., Lennon,G., Becker,M., Bonaldo,M.F., Chiapelli,B., Chissoe,S., Dietrich,N., DuBuque,T., Favello,A., Gish,W. et al. (1996) Generation and analysis of 280,000 human expressed sequence tags. Genome Res., 6, 807–828.
    4. Marra,M., Hillier,L., Kucaba,T., Allen,M., Barstead,R., Beck,C., Blistain,A., Bonaldo,M., Bowers,Y., Bowles,L. et al. (1999) An encyclopedia of mouse genes. Nature Genet., 21, 191–194.
    5. Marra,M.A., Hillier,L. and Waterston,R.H. (1998) Expressed sequence tags—ESTablishing bridges between genomes. Trends Genet., 14, 4–7.
