A physical map of the chicken genome

For large genome sequence assemblies so far, contiguous clone-based maps provide the framework for organizing the sequence5,6. With the increasing emphasis on whole-genome shotgun assemblies, integration of mapping data throughout the assembly and finishing process is key to improving efficiency and assuring accuracy equivalent to human genome standards. The basis of map construction for any genome is the reliance on random breakage of genome structure and the subsequent ordering of these genome pieces. Once assembled, physical maps are a key resource in the dissemination of clones by known location for use in all disciplines of biology.

A large number of chicken genomic resources are available, including a collection of large-insert bacterial artificial chromosome (BAC) clone libraries7,8. These libraries were made from the inbred red jungle fowl 256 (RJF, G. gallus) strain used in whole-genome sequencing and from the white leghorn, a domestic breed selected for egg production. Meiotic linkage maps for three mapping populations were used to develop a consensus map containing about 2,000 sequence-tagged site (STS) markers9,10. Markers that were assigned to BACs were a key resource in map construction.

We constructed a physical clone map from 154,560 fingerprints of G. gallus BACs. Automated band recognition software identified restriction digest fragments11. A small fraction (15%) of these fingerprints were lost due to empty lanes, empty insert clones, incomplete restriction enzyme digestion, or failure by the software to recognize lanes. The FPC program processed the remaining 130,486 fingerprints12. Automated overlap evaluation of these fingerprints generated 6,509 contigs using a Sulston score13 threshold of 1 × 10⁻¹⁷ and a tolerance of 7. The Sulston score approximates the probability that two clones share a given number of fingerprint band matches by coincidence. These parameters left 51,954 singleton clones not assigned to a contig. The final automated step refined clone order using the program CORAL14, whose improved clone-ordering algorithm has proven effective when applied to individual FPC contigs.
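The idea behind a coincidence score of this kind can be illustrated with a binomial tail. The following is a minimal sketch, not the FPC implementation: the function names, the binomial form and the gel length of 3,300 used in the usage note are assumptions made for illustration; only the tolerance of 7 and the cutoff of 1 × 10⁻¹⁷ come from the text.

```python
from math import comb

SCORE_CUTOFF = 1e-17  # threshold quoted in the text for automated assembly


def coincidence_probability(n_low, n_high, matches, tolerance, gel_length):
    """Approximate chance that `matches` or more of the n_low bands of one clone
    coincide with bands of a second clone (n_high bands) purely by chance.

    Hypothetical simplification: band positions are treated as independent and
    uniformly spread over `gel_length` position units.
    """
    # Chance that a single band falls within +/- tolerance of at least one
    # of the n_high bands of the other clone.
    p_single = 1.0 - (1.0 - 2.0 * tolerance / gel_length) ** n_high
    # Upper binomial tail: probability of `matches` or more coincidences.
    return sum(
        comb(n_low, m) * p_single**m * (1.0 - p_single) ** (n_low - m)
        for m in range(matches, n_low + 1)
    )


def is_overlap(score, cutoff=SCORE_CUTOFF):
    """Two clones are called overlapping when the score is at or below the cutoff."""
    return score <= cutoff
```

For example, `is_overlap(coincidence_probability(30, 30, 25, 7, 3300))` asks whether 25 shared bands out of 30 would be accepted at the stringent cutoff under these assumed settings. A lower score means the shared bands are less likely to be coincidental.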

Subsequent to automated map assembly, manual review is an essential step in building a reliable fingerprint map. Although a required element in high-throughput generation of fingerprints, automated identification of bands produces band-calling errors that cannot be completely eliminated. These errors propagate during clone ordering and result in incorrectly assembled map contigs. Dispersed and tandem sequence repeats can also confound software programs, collapsing different regions into a complex mix of falsely overlapping clones. Finally, there is currently no automated procedure for merging related contigs. Therefore, we visually examined the fingerprint images of digested clones in each contig to limit errors in automated contig construction. Clone order errors were resolved and fingerprints not meeting expectations for band pattern were removed from contigs and returned to the singleton pool. To merge overlapping contigs, we relaxed the Sulston score threshold to 1 × 10⁻¹⁰ and then to 1 × 10⁻⁷. Clones comprising the terminal ends of contigs were allowed to join other contigs after manual review. We added individual singleton clones to contigs as needed to increase coverage of sparse regions. If a clone did not provide further band information, it was left in the singleton pool. Of the initial 6,509 contigs, 95% were joined to produce 320 contigs.
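The staged relaxation of the threshold can be pictured as a two-pass search for merge candidates. The sketch below is illustrative only: merges in this work were made after manual review of fingerprint images, so the code merely lists candidate joins for a reviewer. The function name, the input dictionary of best end-clone scores and the union-find bookkeeping are all assumptions.

```python
def propose_merges(end_scores, thresholds=(1e-10, 1e-7)):
    """List candidate contig joins at successively relaxed score thresholds.

    end_scores: dict mapping (contig_a, contig_b) to the best (lowest) score
    between terminal clones of the two contigs (hypothetical input format).
    Returns a list of (contig_a, contig_b, threshold_used) for manual review.
    """
    parent = {}

    def find(contig):
        # Union-find lookup with path halving.
        parent.setdefault(contig, contig)
        while parent[contig] != contig:
            parent[contig] = parent[parent[contig]]
            contig = parent[contig]
        return contig

    proposals = []
    for threshold in thresholds:  # stringent pass first, relaxed pass second
        # Within a pass, consider the strongest evidence first.
        for (a, b), score in sorted(end_scores.items(), key=lambda kv: kv[1]):
            root_a, root_b = find(a), find(b)
            if score <= threshold and root_a != root_b:
                parent[root_b] = root_a
                proposals.append((a, b, threshold))
    return proposals
```

Running the stringent pass first means that the best-supported joins are proposed before weaker ones, and a pair already connected through an earlier join is not proposed again.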

Markers are essential for anchoring the fingerprint map to chromosomes, and they also permit validation of contig integrity. A limited number (n = 911) of STS markers have been mapped to fingerprinted RJF clones; 730 of these have chromosome placements15. The white leghorn fingerprint database, however, is a rich source of marker data, with 1,830 markers (1,717 assigned to chromosomes). To increase the number of links to the genetic map, we added the fingerprints of 49,805 white leghorn BAC clones to the physical map. We assigned 29,663 white leghorn clones to contigs, based on the restrictive condition that each of these clones had a fingerprint match to at least five RJF contig clones. Manual inspection confirmed the positions of 2,244 white leghorn clones having assigned markers. The white leghorn clones were used for their marker information only; their fingerprints were not used to join RJF contigs.
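The restrictive placement rule for white leghorn clones can be sketched as a simple count. This is an illustration under stated assumptions: the function name and data structures are hypothetical, and the sketch assumes the five matching RJF clones must lie in the same contig, which the text does not state explicitly.

```python
from collections import Counter

MIN_MATCHES = 5  # "at least five RJF contig clones", from the text


def assign_clone(matching_rjf_clones, contig_of, min_matches=MIN_MATCHES):
    """Return the contig for a white leghorn clone, or None if unsupported.

    matching_rjf_clones: names of RJF clones whose fingerprints match the
        white leghorn clone (hypothetical input).
    contig_of: dict mapping RJF clone name to contig name; singleton clones
        are simply absent from the dict.
    """
    counts = Counter(
        contig_of[clone] for clone in matching_rjf_clones if clone in contig_of
    )
    if not counts:
        return None
    contig, n_matches = counts.most_common(1)[0]
    # Assumption: all required matches must fall in one contig.
    return contig if n_matches >= min_matches else None
```

A clone matching six RJF clones spread evenly over two contigs would be left unplaced under this reading, which keeps marker transfers conservative.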

We also took advantage of the draft sequence assembly to further refine the fingerprint map. The assembly and fingerprint map are linked by 128,523 BAC-end sequences (BES). After requiring a minimum of six end-sequence links, a total of 189 fingerprint contigs were reliably assigned to the sequence assembly. Some of the contigs left unassigned were linked to assembly contigs in a topologically impossible manner, indicating errors in the assembly or fingerprint map. For example, a pair of contigs cannot be correctly linked by their ends if the middle portions are linked to different contigs. Potentially incorrect regions of the assembly and fingerprint maps were reciprocally examined, revealing 14 misassembled fingerprint contigs that were subsequently split apart. These cases were primarily due to contaminated clones missed during the manual review process.
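Both checks described here lend themselves to a short sketch. The code below is a simplified illustration, not the procedure actually used: the function names and input formats are hypothetical, and the interleaving test is only one possible reading of a "topologically impossible" linkage (a target that reappears after a different target has intervened along the contig).

```python
from collections import Counter

MIN_LINKS = 6  # minimum number of end-sequence links, from the text


def supported_targets(links, min_links=MIN_LINKS):
    """Keep assembly contigs linked to a fingerprint contig by enough BES.

    links: iterable of assembly-contig names, one entry per BES link
    (hypothetical input format).
    """
    counts = Counter(links)
    return {target: n for target, n in counts.items() if n >= min_links}


def has_interleaved_links(ordered_targets):
    """Flag an A, B, A pattern along a fingerprint contig.

    ordered_targets: assembly-contig names in clone order along the contig.
    Returns True if a target reappears after another target intervened,
    which marks the region for reciprocal examination.
    """
    finished = set()  # targets whose run of links has already ended
    previous = None
    for target in ordered_targets:
        if target != previous:
            if target in finished:
                return True
            if previous is not None:
                finished.add(previous)
            previous = target
    return False
```

A flagged contig is not necessarily wrong in the fingerprint map; as the text notes, the error may equally lie in the sequence assembly, so a flag is only a prompt for inspection.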

The assembly also provided preliminary order to 3,557 singleton clones not assigned to fingerprint contigs. Small groups of overlapping singletons often served as bridges that could join contigs. Merges suggested by the assembly were examined and made only when supported by fingerprint data, although a Sulston score of 1 × 10⁻⁶ was accepted with the additional sequence evidence. A summary of clone distribution in the final 260 contigs is shown in Table 1.

Table 1 Summary of clone distribution in the physical map


The genetic map was then used to anchor the fingerprint contigs to chromosomes. We determined the distribution of contigs on each chicken chromosome using a simple plurality of chromosome assignments for the marker–clone pairs in each contig. Markers assigned to white leghorn clones were not considered unless the contig location of the clone had been manually confirmed. A total of 186 contigs were assigned in this way. Linkage of the fingerprint map to the sequence assembly provides positional information for many contigs. All 125 contigs suggested by the assembly to be collinear had consistent chromosome assignments. An additional seven contigs lacking independent marker data were given chromosome assignments based on their linkage to an assigned FPC contig. Finally, markers positioned on assembly contigs by BLAST sequence comparison provided additional support of the chromosome assignments and allowed localization of an additional 33 fingerprint contigs based on their linkage to assembled sequence. A summary of the 226 contigs mapped to chromosomes is found in Table 2.
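The plurality rule can be written in a few lines. This is a minimal sketch with hypothetical names; in particular, returning no assignment when two chromosomes tie is an assumption, since the text does not say how ties were handled. Note that the counts in the paragraph are internally consistent: 186 + 7 + 33 = 226.

```python
from collections import Counter


def assign_chromosome(marker_chromosomes):
    """Assign a contig to the chromosome named by the most marker-clone pairs.

    marker_chromosomes: one chromosome label per marker-clone pair in the
    contig (hypothetical input). Returns None for an empty list or a tie.
    """
    counts = Counter(marker_chromosomes).most_common(2)
    if not counts:
        return None
    if len(counts) == 2 and counts[0][1] == counts[1][1]:
        return None  # assumption: a tied vote is left unassigned
    return counts[0][0]
```

A plurality, unlike a majority, does not require more than half of the pairs to agree, so a contig with markers on GGA1, GGA1, GGA2 and GGA5 would still be placed on GGA1.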

Table 2 Summary statistics of chicken clone map coverage by chromosome


A minimally overlapping set of clones spanning the map contigs is useful for comparative genomic studies16, as a source of specific cloned sequence, and as a reliable estimator of the physical length of each contig. We used the software Minilda (http://mkweb.bcgsc.ca/minilda/) to select clones with the goals of maximizing the amount of unique content in each clone selection, limiting excessive overlap, avoiding gaps between adjacent selections, and avoiding clones with fingerprint bands unconfirmed by overlapping clones. Our estimated minimum tiling path set consists of 9,210 BAC clones with an average clone overlap of 77 kilobases (kb) (http://mkweb.bcgsc.ca/chicken/images/?list=003). Estimates of the physical size of each contig were made and the totals per chromosome are listed in Table 2. The total amount of sequence represented in fingerprint contigs is 0.97 gigabases (Gb), or 91% of the current sequence assembly4.
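The general notion of a minimum tiling path can be illustrated with the classic greedy interval cover. This sketch is not the Minilda algorithm, which also weighs band confirmation and unique content; the function name and the (name, start, end) clone representation are assumptions for illustration.

```python
def greedy_tiling_path(clones):
    """Pick a small set of clones that covers a contig without gaps.

    clones: list of (name, start, end) tuples in contig coordinates
    (hypothetical representation). Returns clone names in path order.
    Raises ValueError if the clones leave a gap.
    """
    if not clones:
        return []
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    span_end = max(end for _, _, end in ordered)
    covered_to = ordered[0][1]
    path = []
    i = 0
    while covered_to < span_end:
        best = None
        # Among clones starting inside the covered region, take the one
        # that extends coverage furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError("gap in clone coverage")
        path.append(best[0])
        covered_to = best[2]
    return path
```

Choosing the furthest-reaching clone at each step limits excessive overlap and avoids gaps, two of the goals listed above, but it ignores fingerprint quality, which is why a dedicated tool was used for the actual selection.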

Low-coverage BAC-based fingerprint maps of the chicken genome were recently published7,8. The maps represent preliminary efforts to generate resources for the community, such as region-specific BAC clones as templates for polymorphic marker development. Ren et al.7 report 2,331 contigs, estimated to cover ∼7.5 × genome equivalents, with 367 markers used to anchor only 11% of the contigs. The estimated 3.6 × genome coverage by Aerts et al.8 was insufficient for accurate comparisons of clone distribution by contig. Both of these reports represented progress towards developing clone resources for functional genomic studies and for improving knowledge of local clone order for selected chromosomes, for example G. gallus chromosome 10 (GGA10)8. However, neither report achieved the comprehensive coverage required for sequence assembly validation or provided a sufficient resource to pick tiling paths of reduced genome representation. Our manually curated map of 180,291 clones and 260 contigs represents a major advance over these early efforts, while incorporating the same libraries. In particular, this map integrates an additional, much larger BAC library (CHORI-261), 2,628 STS markers, 128,523 BAC-end sequences, the chicken whole-genome sequence assembly, and much higher clone redundancy. These additional resources allowed us to greatly improve clone contig distribution, as demonstrated by a 100-fold increase in large contigs, defined as >200 clones per contig (see Table 1). The physical map is available in standard formats, such as CMAP (http://gmod.wustl.edu/cgi-bin/cmap/viewer) and GBROWSE (http://www.animalsciences.nl/ChickFPC). It can also be downloaded directly from http://genome.wustl.edu/pub/groups/mapping/fpc_files/chicken/.

Combining physical maps with a whole-genome, shotgun-based strategy provides an ideal blueprint for sequence assembly17. Critically, the physical map with corresponding marker information allowed reciprocal error checking of the physical map and the whole-genome shotgun assembly while at the same time providing invaluable long-range linking information. This allowed anchoring, ordering and orienting the sequence along the chicken genome. The clone-based map now also provides a key resource for moving towards improving the chicken genome sequence, filling gaps and sorting out difficult regions of the chicken genome. Future avian molecular genetic research will be greatly aided by this BAC-based map and accompanying minimum tiling path.