High-throughput genotyping by whole-genome resequencing

Genome Res. 2009 Jun;19(6):1068-76.

doi: 10.1101/gr.089516.108. Epub 2009 May 6.

Xuehui Huang, Qi Feng, Qian Qian, Qiang Zhao, Lu Wang, Ahong Wang, Jianping Guan, Danlin Fan, Qijun Weng, Tao Huang, Guojun Dong, Tao Sang, Bin Han

Xuehui Huang et al. Genome Res. 2009 Jun.

Abstract

The next-generation sequencing technology coupled with the growing number of genome sequences opens the opportunity to redesign genotyping strategies for more effective genetic mapping and genome analysis. We have developed a high-throughput method for genotyping recombinant populations utilizing whole-genome resequencing data generated by the Illumina Genome Analyzer. A sliding window approach is designed to collectively examine genome-wide single nucleotide polymorphisms for genotype calling and recombination breakpoint determination. Using this method, we constructed a genetic map for 150 rice recombinant inbred lines with an expected genotype calling accuracy of 99.94% and a resolution of recombination breakpoints within an average of 40 kb. In comparison to the genetic map constructed with 287 PCR-based markers for the rice population, the sequencing-based method was approximately 20x faster in data collection and 35x more precise in recombination breakpoint determination. Using the sequencing-based genetic map, we located a quantitative trait locus of large effect on plant height in a 100-kb region containing the rice "green revolution" gene. Through computer simulation, we demonstrate that the method is robust for different types of mapping populations derived from organisms with variable quality of genome sequences and is feasible for organisms with large genome sizes and low polymorphisms. With continuous advances in sequencing technologies, this genome-based method may replace the conventional marker-based genotyping approach to provide a powerful tool for large-scale gene discovery and for addressing a wide range of biological questions.

Figures

Figure 1.

Sequence-based high-throughput genotyping. Rice RILs were developed from a cross between indica and japonica cultivars. Genome sequences of the parents were aligned and SNPs were identified. Genomes of the RILs were resequenced on the Illumina Genome Analyzer using the multiplexed sequencing strategy. Three-base indexed DNAs of 16 RILs were combined and sequenced in one lane. Sequences were sorted and aligned with the pseudomolecules of parental genome sequences for SNP detection. Detected SNPs were arranged along chromosomes according to their physical locations with genotypes indicated. A sliding window approach was used for genotype calling, recombination breakpoint determination, and map construction.
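
The read-sorting step in this caption can be illustrated with a minimal sketch. It assumes that the three-base index sits at the start of each read and is removed before alignment; the function name, the index sequences, and the "unassigned" bucket are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str],
                index_to_ril: Dict[str, str],
                index_len: int = 3) -> Dict[str, List[str]]:
    """Sort reads from one lane into per-RIL lists by their leading index.

    Assumption: the index occupies the first `index_len` bases of the read.
    Reads whose index is not in `index_to_ril` are kept, untrimmed, under
    the hypothetical key "unassigned".
    """
    sorted_reads: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        tag, insert = read[:index_len], read[index_len:]
        ril = index_to_ril.get(tag)
        if ril is None:
            sorted_reads["unassigned"].append(read)
        else:
            # Store the read with the index stripped, ready for alignment.
            sorted_reads[ril].append(insert)
    return dict(sorted_reads)


# Hypothetical index sequences, for illustration only.
lane = ["ACGTTTGACCA", "TGAGGCATCAA", "ACGCCCTAGGT", "NNNACGTACGT"]
by_ril = demultiplex(lane, {"ACG": "RIL_01", "TGA": "RIL_02"})
```

With 16 indexes in `index_to_ril`, the same call would separate the 16 RILs pooled in one lane.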

Figure 2.

Sliding window approach for genotype calling and recombination breakpoint determination. (A) The top stripe of blocks represents SNPs along the hypothetical chromosomal region. This was redrawn from the two stripes of short vertical lines below illustrating SNPs detected by aligning 33-mers with the parental genome sequences. (Red) Indica genotype; (blue) japonica genotype. A sliding window covering 15 SNPs moves from left to right one base at a time. For each window, the ratio of the number of indica to japonica SNPs (ind:jap) is calculated. (B) Genotype calling based on the highest expected probabilities: Call homozygous indica genotype (ind/ind) when ind:jap ≥ 11:4; call heterozygous genotype (ind/jap) when 10:5 ≥ ind:jap ≥ 3:12; call homozygous japonica genotype (jap/jap) when ind:jap ≤ 2:13. Adding together the probabilities of these callings (shaded in black) gives the calling accuracy of 99.94%. (C) As the window slides, genotypes are called and recombination breakpoints are determined. Green and brown arrows point to breakpoints between two homozygous genotypes and between the heterozygous and homozygous genotypes, respectively. The resulting recombination map for this chromosomal region is illustrated in a solid bar, in which red, blue, and yellow represent genotypes ind/ind, jap/jap, and ind/jap, respectively. Identified breakpoints are indicated between SNPs.
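
The calling rule in panels A to C can be sketched as follows. The window size (15) and the thresholds (at least 11 indica SNPs for ind/ind, at most 2 for jap/jap, heterozygous otherwise) come from the caption. Everything else is an assumption made for illustration: SNPs are encoded as a string of "I" and "J" in physical order, the window advances one SNP per step, and a breakpoint is reported wherever the call changes between adjacent windows.

```python
from typing import List, Tuple

WINDOW = 15  # SNPs per window, as in the caption


def call_window(ind_count: int) -> str:
    """Apply the caption's thresholds to a 15-SNP window."""
    if ind_count >= 11:
        return "ind/ind"
    if ind_count <= 2:
        return "jap/jap"
    return "ind/jap"


def call_genotypes(snps: str) -> List[str]:
    """Return one genotype call per window position.

    `snps` is a string of 'I' (indica) and 'J' (japonica) SNP calls in
    physical order (hypothetical encoding).
    """
    if len(snps) < WINDOW:
        return []
    count = snps[:WINDOW].count("I")
    calls = [call_window(count)]
    for i in range(WINDOW, len(snps)):
        # Slide by one SNP: add the entering SNP, drop the leaving one.
        count += (snps[i] == "I") - (snps[i - WINDOW] == "I")
        calls.append(call_window(count))
    return calls


def breakpoints(calls: List[str]) -> List[Tuple[int, str, str]]:
    """List (window index, previous call, new call) wherever the call changes."""
    return [(i, calls[i - 1], calls[i])
            for i in range(1, len(calls)) if calls[i] != calls[i - 1]]


calls = call_genotypes("I" * 20 + "J" * 20)
changes = breakpoints(calls)
```

In this naive form, a clean switch from indica to japonica SNPs passes through a short run of windows with mixed counts, so `changes` reports two transitions rather than one; resolving such runs into a single breakpoint would need an extra rule that this sketch does not attempt.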

Figure 3.

Simulation of genotype calling accuracy. (A) Effect of parental genome sequence quality on calling accuracy. (Left) One parent has high-quality genome sequences that give an SNP error rate of 1%, while the genome sequence quality of the other parent is allowed to vary and gives SNP error rates from 2% to 20%. (Right) Genome sequence qualities of both parents are allowed to vary and give the same SNP error rates from 2% to 20%. Two types of populations, RIL and F2, are considered, with ratios of three genotypes set at 49.5:1:49.5 and 1:2:1, respectively. Window size is set at 15. Genotype calling accuracy is calculated according to Equation 8 in Methods. (B) The effect of window size on calling accuracy. (Left) The critical error rate of 6% that drops the calling accuracy of F2 below 99% in the above figure is used. (Right) Three critical error rates are used, including 16% for both parents that drops the calling accuracy of RIL below 99%, 4% for both parents that drops the accuracy of F2 below 99%, and 12% for both parents that drops the accuracy of F2 below 95%, in the above figure. When window sizes are measured by the number of SNPs covering the same physical distance, increase in window sizes is equivalent to the increase in resequencing coverage. Rice is taken as an example to show resequencing coverage for the corresponding window size. (C) The amount of effective sequences (Se) required for a RIL to reach a range of mapping resolutions (R) as SNP densities (D) vary. (Left) Simulation for the rice genome size, 389 Mb. Red dot indicates the location of the rice RIL of this study (D = 3.2 SNPs/kb, R = 25 SNPs/Mb). (Right) Simulation for the mouse genome size, 2500 Mb. Red dot indicates Se required for a mouse RIL with D = 1.3 and R = 25.
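
To give a feel for how per-SNP error rate and window size interact, here is a simplified binomial sketch. It is not Equation 8 from the paper's Methods, which is not reproduced on this page. It assumes that each SNP in a window is independently scored as indica with probability `p_ind`, and it returns the probability that the indica count falls inside a given calling range; the function name and parameters are hypothetical.

```python
from math import comb


def window_call_probability(p_ind: float, lo: int, hi: int,
                            window: int = 15) -> float:
    """P(lo <= X <= hi) for X ~ Binomial(window, p_ind).

    Assumption: SNPs in a window are scored independently, each as indica
    with probability p_ind. This is an illustrative model only.
    """
    return sum(comb(window, k) * p_ind ** k * (1.0 - p_ind) ** (window - k)
               for k in range(lo, hi + 1))


# Truly ind/ind region, 1% of SNPs mis-scored as japonica:
# probability that the window still shows at least 11 indica SNPs.
p_hom = window_call_probability(0.99, 11, 15)

# Truly heterozygous region with equal sampling of both alleles:
# probability that the indica count lands in the 3..10 range.
p_het = window_call_probability(0.5, 3, 10)
```

Under this model, enlarging `window` (with thresholds rescaled accordingly) narrows the binomial tails, which matches the caption's point that larger windows raise calling accuracy.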

Figure 4.

Recombination and bin maps. (A) Aligned recombination maps of 150 rice RILs. Red, ind/ind; blue, jap/jap; yellow, ind/jap. (B) Aligned chromosome 1 of the first ten RILs. Scale indicates physical distance. A vertical line labels a recombination breakpoint. A region between two vertical lines across all RILs is recognized as a recombination bin. (C) Bin map of the 10 RILs.
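
The bin construction described in panels B and C can be sketched as below: every recombination breakpoint found in any RIL becomes a bin boundary, and each RIL then receives one genotype per bin. The data layout is an assumption made for illustration: each RIL is a list of contiguous `(start, end, genotype)` segments with exclusive ends that together cover the chromosome. The function names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (start, end, genotype), end exclusive


def bin_boundaries(rils: List[List[Segment]], chrom_length: int) -> List[int]:
    """Collect every segment start across all RILs, plus the chromosome ends."""
    points = {0, chrom_length}
    for segments in rils:
        for start, _end, _genotype in segments:
            points.add(start)
    return sorted(points)


def genotype_at(segments: List[Segment], pos: int) -> str:
    """Return the genotype of the segment that contains `pos`."""
    for start, end, genotype in segments:
        if start <= pos < end:
            return genotype
    raise ValueError(f"position {pos} is not covered by any segment")


def bin_map(rils: List[List[Segment]], chrom_length: int):
    """Return (bins, matrix), where matrix[r][b] is RIL r's genotype in bin b."""
    bounds = bin_boundaries(rils, chrom_length)
    bins = list(zip(bounds[:-1], bounds[1:]))
    # No RIL has a breakpoint inside a bin, so the genotype at the bin
    # start holds for the whole bin.
    matrix = [[genotype_at(segs, b_start) for b_start, _b_end in bins]
              for segs in rils]
    return bins, matrix


# Two hypothetical RILs on a 300-unit chromosome.
rils = [
    [(0, 100, "ind/ind"), (100, 300, "jap/jap")],
    [(0, 200, "jap/jap"), (200, 300, "ind/ind")],
]
bins, matrix = bin_map(rils, 300)
```

Each column of `matrix` then behaves like a single marker shared by all RILs, which is what makes a bin map usable as a genetic map.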
