High throughput fingerprint analysis of large-insert clones

M A Marra et al. Genome Res. 1997 Nov.

Abstract

As part of the Human Genome Project, the Washington University Genome Sequencing Center has commenced systematic sequencing of human chromosome 7. To organize and supply the effort, we have undertaken the construction of sequence-ready physical maps for defined chromosomal intervals. Map construction is a serial process composed of three main activities. First, candidate STS-positive large-insert PAC and BAC clones are identified. Next, these candidate clones are subjected to fingerprint analysis. Finally, the fingerprint data are used to assemble sequence-ready maps. The fingerprinting method we have devised is key to the success of the overall approach. We present here the details of the method and show that the fingerprints are of sufficient quality to permit the construction of megabase-size contigs in defined regions of the human genome. We anticipate that the high throughput and precision characteristic of our fingerprinting method will make it of general utility.

Figures

Figure 1

Schematic illustration of the role played by fingerprinting in the construction of sequence-ready contigs. (A) For construction of “local” sequence-ready maps, STSs and probes specific for a small region (typically 1–2 Mb) are used to identify clones by either hybridization or PCR. Fingerprinting provides higher-resolution, clone-specific information that facilitates contig construction and the selection of clones for sequencing. Intercontig gaps are closed by chromosomal walks using probes developed from end sequences. Clones identified during walks are fingerprinted to determine their relationship to the contigs, permitting recognition of clones that span intercontig gaps. (B) A typical agarose mapping gel showing human PACs digested with HindIII. Each clone is present in triplicate to verify stability during propagation and to control for possible cross-contamination of glycerol stocks in the initial 384-well format. DNA size standards, a mixture of three commercially available markers (see Methods), are loaded in every fifth lane. The sizes, in base pairs, of the marker fragments are indicated.
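
Because size standards are loaded in every fifth lane, band positions in the neighbouring sample lanes can be converted to base-pair sizes by calibrating against the markers. The Python sketch below assumes a simple semi-log model, in which log10(size) is interpolated linearly between the two marker bands that bracket a fragment. This is a common textbook calibration, not necessarily the one performed by the program Image, and the function name is hypothetical.

```python
import bisect
import math


def size_from_mobility(distance, marker_distances, marker_sizes):
    """Estimate a fragment size in bp from its migration distance.

    Assumed semi-log calibration: log10(size) is interpolated linearly
    between the two marker bands that bracket the fragment. Marker
    distances must be sorted ascending (farther migration = smaller size).
    Outside the marker range the end segments are extrapolated.
    """
    if len(marker_distances) != len(marker_sizes) or len(marker_distances) < 2:
        raise ValueError("need at least two paired marker bands")
    # Index of the first marker that migrated farther than this fragment,
    # clamped so that a valid neighbouring pair always exists.
    i = bisect.bisect_right(marker_distances, distance)
    i = min(max(i, 1), len(marker_distances) - 1)
    d0, d1 = marker_distances[i - 1], marker_distances[i]
    s0 = math.log10(marker_sizes[i - 1])
    s1 = math.log10(marker_sizes[i])
    frac = (distance - d0) / (d1 - d0)
    return 10 ** (s0 + frac * (s1 - s0))
```

For example, with hypothetical markers at 10, 20, 30, and 40 mm of 20,000, 10,000, 5,000, and 2,000 bp, a band at 25 mm is estimated at about 7,071 bp, the geometric mean of its two neighbours. Summing the estimated sizes of every fragment in a lane gives the kind of total clone-size estimate reported in window E of Figure 2.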

Figure 2

A specific example of contig construction in FPC, on human chromosome 7. Contigs are constructed as described in Methods (computer analysis and contig construction). Windows A, B, and C show the results of searching individual clones against the FPC database. The PAC clones used here as queries, 0897M19a, 1136G02b, and 0975M14a, correspond to the theoretical “clone 1,” “clone 2,” and “clone 3” described in Methods (computer analysis and contig construction). Window D shows the FPC fingerprint viewer displaying the fingerprints of these three clones as a representation of the original agarose gel fingerprint data output by the program Image. Note, however, that the lanes displayed are substantially altered versions of the original agarose gel image collected on the Molecular Dynamics FluorImager: band positions have been normalized to those of the marker DNA, and only a one-pixel-wide “slice” of the original 17-pixel-wide agarose gel image can be displayed. Manually verified band positions are indicated by the hash marks flanking the lanes; only these bands are used by FPC in database comparisons. Other band-like features in the lanes are computer artifacts generated when the original gel image was modified for display in FPC, fluorescent foreign particles embedded in the agarose and detected by the FluorImager during data acquisition, or, rarely, legitimate bands that were missed during manual band calling in Image. Examination of the original agarose gel images, retained in hard copy and electronic format, allows these possibilities to be distinguished. Window E shows the restriction fragment sizes (in base pairs) for clone 0897M19a. A + to the right of a fragment size indicates a fragment that the user has selected by clicking on its hash mark in the fingerprint viewer (D); clicking the hash mark a second time deselects the fragment.
The sum of the sizes of the selected fragments is given at the bottom of window E, together with the total estimated size of the clone, which is calculated by summing the sizes of all of its restriction fragments. This total-size feature is a non-FPC function added at Washington University. Window F is an FPC display of a manually generated diagram of the contig after analysis is complete. An asterisk beside a clone indicates that one or more redundant clones are contained entirely within it. Such redundant clones are referred to as “buried” clones and are not visible in this view. A total of 40 clones have been incorporated into this contig; the clones considered specifically in this example are boxed. The additional search results used to construct this contig are not shown. Window G provides an example of the raw data used by FPC to calculate the Sulston score: the normalized relative mobilities (in tenths of a millimeter) of all detected restriction fragments for clones 0897M19a and 1136G02b. Shaded values are considered common to the two clones. Values in open boxes indicate fragments unique to 1136G02b; all of these, except two presumed to be the anomalous vector-insert junction fragments found in HindIII-digested PACs, are “confirmed” (Methods) by their presence in clone 0975M14a. Only the confirming fragments of 0975M14a are shown; they are marked Y (for Yes). Details of the FPC search results: the result of searching PAC clone 0897M19a against the FPC database is given in Figure 2A. The window labeled xterm shows the Sulston scores (see Methods) for PAC clones 0741B13a and 1136G02b. 0741B13a has the smaller score, indicating relatively more extensive overlap with 0897M19a (Methods).
Manual comparison of the fingerprints of these two clones (not shown) revealed that 0741B13a (24 restriction fragments) contains only two novel restriction fragments, which may correspond to the PAC vector-insert junction fragments. Manual comparison (Figure 2D) of the 1136G02b fingerprint (43 restriction fragments) with that of 0897M19a (31 restriction fragments) revealed that 20 of the 1136G02b fragments were unique to that clone. 1136G02b was then used to query FPC; the results of this search are shown in Figure 2B. Strong matches to the previously mentioned clones are indicated, as are matches to clones 0975M14a (40 restriction fragments) and 0659J06a (41 restriction fragments). Manual comparison of the 0975M14a fingerprint with that of 1136G02b (Figure 2D) reveals that the restriction fragments identified as unique to 1136G02b in the 1136G02b–0897M19a comparison are present in the fingerprint of 0975M14a (see Fig. 2D,G). These fragments are thus said to have been “confirmed” (Methods), and 0975M14a can be incorporated into the nascent contig.
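
The clone comparisons walked through in this example (counting bands shared within a tolerance, scoring the match, identifying fragments unique to one clone, confirming them against a third clone, and burying redundant clones) can be sketched in Python as follows. The scoring function follows the commonly published form of the Sulston score: the probability that two clones share at least the observed number of bands by coincidence, so smaller values mean stronger evidence of overlap. The greedy band pairing, the function names, the burying threshold, and the tolerance and gel-length values are illustrative assumptions, not FPC's own code. Mobilities are assumed to be sorted, normalized values like those in window G.

```python
import math


def match_bands(a, b, tol):
    """Pair bands of two clones whose mobilities differ by at most tol.

    a and b are sorted lists of normalized mobilities. Greedy two-pointer
    pairing (an assumption); each band is used at most once.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    pairs, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            pairs.append((i, j))
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return pairs


def unmatched(query, reference, tol):
    """Bands of query with no partner in reference ('unique' fragments)."""
    used = {i for i, _ in match_bands(query, reference, tol)}
    return [m for k, m in enumerate(query) if k not in used]


def sulston_score(n_shared, n1, n2, tol, gel_length):
    """Probability of n_shared or more coincidental matches (smaller = stronger)."""
    n_low, n_high = min(n1, n2), max(n1, n2)
    # Chance that one band of the sparser clone falls within tol of some band
    # of the other clone, if bands were spread uniformly over gel_length.
    p = 1.0 - (1.0 - 2.0 * tol / gel_length) ** n_high
    # Binomial tail: at least n_shared of the n_low bands match by chance.
    return sum(
        math.comb(n_low, k) * p**k * (1.0 - p) ** (n_low - k)
        for k in range(n_shared, n_low + 1)
    )


def confirmed(clone1, clone2, clone3, tol):
    """Fragments unique to clone2 (relative to clone1) that clone3 also has."""
    novel = unmatched(clone2, clone1, tol)
    hits = {i for i, _ in match_bands(novel, clone3, tol)}
    return [novel[i] for i in sorted(hits)]


def is_buried(child, parent, tol, max_unmatched=0):
    """True if all but max_unmatched (hypothetical threshold) child bands are in parent."""
    return len(unmatched(child, parent, tol)) <= max_unmatched
```

A nonzero `max_unmatched` would allow for cases like 0741B13a, whose only novel fragments may be PAC vector-insert junction fragments; whether such a clone should be buried remains a manual judgment in the procedure described above.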

Figure 3

Graph showing the accuracy of restriction fragment sizes. On the y axis, “% deviation from true size” is the size in base pairs predicted from agarose gel analysis divided by the size of the same restriction fragment as determined from sequence analysis of the entire clone, expressed as a percentage. The line is a moving average calculated over intervals of 40 data points. The box encloses 95% (102 of 107) of the data points for fragments smaller than 12 kb.
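
The plotted quantity and the smoothing line can be reproduced with a short Python sketch. It assumes the moving average is a simple mean over a sliding window of 40 consecutive points ordered along the x axis; the caption does not state the window type, so this is an interpretation, and the function names are hypothetical. Under the caption's definition, a value of 100 indicates exact agreement with the sequence-derived size.

```python
def percent_of_true_size(gel_sizes, seq_sizes):
    """Gel-estimated fragment size as a percentage of the sequence-derived size.

    100.0 means exact agreement; 105.0 means the gel overestimates by 5%.
    """
    if len(gel_sizes) != len(seq_sizes):
        raise ValueError("size lists must be the same length")
    return [100.0 * g / s for g, s in zip(gel_sizes, seq_sizes)]


def moving_average(values, window=40):
    """Mean of each run of `window` consecutive values (sliding window)."""
    if window <= 0 or window > len(values):
        return []
    running = sum(values[:window])
    averages = [running / window]
    for k in range(window, len(values)):
        # Slide the window one point: add the new value, drop the oldest.
        running += values[k] - values[k - window]
        averages.append(running / window)
    return averages
```

As a check on the caption's arithmetic, 102 of 107 points is 95.3%, consistent with the 95% quoted for the box.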

Figure 4

The FPC contig view of two X chromosome contigs juxtaposed, showing the relative positions of the various clones. Contig A is on the left and Contig B is on the right. Overlap between the PAC 0545D18a and the BAC R038K21a has yet to be verified.

Figure 5

The FPC contig view of a chromosome 7q22 contig. The contig consists of 163 BAC and PAC clones in total. The majority of the redundant clones are hidden in this view. Clones labeled with a “w” were identified by hybridization. Clones labeled with a “p” were identified by PCR. The contig spans ∼2 Mb.
