Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing

doi: 10.1038/nbt.1523. Epub 2009 Feb 1.

Andreas Gnirke, Alexandre Melnikov, Jared Maguire, Peter Rogov, Emily M LeProust, William Brockman, Timothy Fennell, Georgia Giannoukos, Sheila Fisher, Carsten Russ, Stacey Gabriel, David B Jaffe, Eric S Lander, Chad Nusbaum

Andreas Gnirke et al. Nat Biotechnol. 2009 Feb.

Abstract

Targeting genomic loci by massively parallel sequencing requires new methods to enrich templates to be sequenced. We developed a capture method that uses biotinylated RNA 'baits' to fish targets out of a 'pond' of DNA fragments. The RNA is transcribed from PCR-amplified oligodeoxynucleotides originally synthesized on a microarray, generating sufficient bait for multiple captures at concentrations high enough to drive the hybridization. We tested this method with 170-mer baits that target >15,000 coding exons (2.5 Mb) and four regions (1.7 Mb total) using Illumina sequencing as read-out. About 90% of uniquely aligning bases fell on or near bait sequence; up to 50% lay on exons proper. The uniformity was such that approximately 60% of target bases in the exonic 'catch', and approximately 80% in the regional catch, had at least half the mean coverage. One lane of Illumina sequence was sufficient to call high-confidence genotypes for 89% of the targeted exon space.

Figures

Figure 1

Overview of hybrid selection method. Illustrated are steps involved in the preparation of a complex pool of biotinylated RNA capture probes (“bait”; top left), whole-genome fragment input library (“pond”; top right) and hybrid-selected enriched output library (“catch”; bottom). Two sequencing targets and their respective baits are shown in red and blue. Thin and thick lines represent single and double strands, respectively. Universal adapter sequences are grey. The excess of single-stranded non-self-complementary RNA (wavy lines) drives the hybridization. See main text and Methods for details.

Figure 2

Coverage profiles of exon targets by end sequencing and shotgun sequencing. Shown are cumulative coverage profiles that sum the per-base sequencing coverage along 7,052 single-bait target exons. Only free-standing baits that were not within 500 bases of another bait were included in this analysis. End sequencing of exon capture 1 with 36-base reads (a) produced a bimodal profile with high sequence coverage near and slightly beyond the ends of the 170-base baits (indicated by the horizontal bar). Shotgun sequencing of capture 2 from a different pond library (containing fragments with generic rather than Illumina-specific adapters) with 36-base reads after concatenating and re-shearing (b) gave more coverage on bait (shaded area) than near bait. Re-sequencing of capture 1 with 76-base end reads (c) had a similar effect, although the peak was slightly wider and the on-bait fraction of the peak area slightly smaller. Note that the scale on the Y-axis, and hence the absolute peak height, is different in each case. The different scales reflect the different numbers of sequenced bases, which are much lower for the GA-I lanes (a, b) than for the GA-II lane (c).
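The cumulative profile described in this caption can be illustrated with a minimal sketch. It assumes per-base coverage is already available as one list per target and that each bait start is known as an index into that list; the function names, the `flank` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
def cumulative_profile(coverages, bait_starts, flank=100, bait_len=170):
    """Sum per-base coverage across targets, aligned on each bait start.

    coverages   -- list of per-base coverage lists, one per target (assumed input)
    bait_starts -- index of the first bait base within each coverage list
    flank       -- bases shown on either side of the bait (hypothetical choice)
    """
    width = flank + bait_len + flank
    profile = [0] * width
    for cov, start in zip(coverages, bait_starts):
        for offset in range(width):
            pos = start - flank + offset
            if 0 <= pos < len(cov):  # ignore positions outside the sampled region
                profile[offset] += cov[pos]
    return profile


def on_bait_fraction(profile, flank=100, bait_len=170):
    """Share of the profile area that lies on the bait itself."""
    total = sum(profile)
    if total == 0:
        return 0.0
    return sum(profile[flank:flank + bait_len]) / total


# Toy example with a 2-base "bait" and 2-base flanks.
toy_cov = [[0, 1, 2, 3, 2, 1, 0], [1, 1, 4, 4, 1, 1, 0]]
toy_profile = cumulative_profile(toy_cov, [2, 2], flank=2, bait_len=2)
toy_fraction = on_bait_fraction(toy_profile, flank=2, bait_len=2)
```

A bimodal profile, as in panel (a), would show up here as two maxima near the bait ends and a lower on-bait fraction than a profile peaked over the bait.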

Figure 3

Sequence coverage along a contiguous target. Shown is base-by-base sequence coverage along a typical 11-kb segment (chr4:118635000-118646000) out of 1.7 Mb. Sequence corresponding to bait is marked in blue. Segments that had more than 40 repeat-masked bases per 170-base window were not targeted by baits and received little or no coverage with sequencing reads aligning uniquely to the genome except directly adjacent to a bait.
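The repeat filter mentioned in this caption (no bait where a 170-base window has more than 40 repeat-masked bases) might be sketched as follows. The sketch assumes a soft-masked sequence in which lowercase letters mark repeat-masked bases, and it tiles windows end to end; both the masking convention and the tiling step are assumptions for illustration, not the authors' stated design procedure.

```python
def select_bait_windows(seq, window=170, max_masked=40, step=170):
    """Return (start, end) windows with at most `max_masked` masked bases.

    seq  -- soft-masked sequence; lowercase = repeat-masked (assumed convention)
    step -- distance between window starts (hypothetical end-to-end tiling)
    """
    kept = []
    for start in range(0, len(seq) - window + 1, step):
        masked = sum(1 for base in seq[start:start + window] if base.islower())
        if masked <= max_masked:
            kept.append((start, start + window))
    return kept


# Toy sequence: a clean window, a window with 41 masked bases,
# and a window with exactly 40 masked bases.
toy_seq = "A" * 170 + "a" * 41 + "C" * 129 + "g" * 40 + "T" * 130
toy_windows = select_bait_windows(toy_seq)
```

Only the middle window is rejected, because the threshold in the caption is "more than 40", so a window with exactly 40 masked bases still qualifies.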

Figure 4

Normalized coverage-distribution plots. Shown is the fraction of bait-covered bases in the genome achieving coverage with uniquely aligned sequence equal to or greater than the normalized coverage indicated on the X-axis. The absolute per-base coverage was divided by the mean coverage of all bait positions (18 in a; 221 in b). The curve for the shotgun-sequenced exon capture (a) is steeper than the curve for the regional capture (b), indicating a less uniform representation of sequencing targets in the exon catch. Dashed lines point to the fraction of bases achieving at least half or one fifth of the mean coverage.
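One point on such a normalized coverage-distribution curve can be computed as in the sketch below. It assumes a flat list of per-base coverage values at bait positions; the function name and toy values are hypothetical.

```python
def fraction_at_least(coverage, normalized_threshold):
    """Fraction of bases whose coverage is >= normalized_threshold * mean coverage."""
    if not coverage:
        return 0.0
    mean = sum(coverage) / len(coverage)
    if mean == 0:
        return 0.0
    cutoff = normalized_threshold * mean
    return sum(1 for c in coverage if c >= cutoff) / len(coverage)


# Toy per-base coverage with mean 10.
toy_coverage = [0, 5, 10, 20, 15]
half_mean = fraction_at_least(toy_coverage, 0.5)   # bases with coverage >= 5
fifth_mean = fraction_at_least(toy_coverage, 0.2)  # bases with coverage >= 2
at_mean = fraction_at_least(toy_coverage, 1.0)     # bases with coverage >= 10
```

Evaluating the function over a range of thresholds traces the full curve; the dashed lines in the figure correspond to the thresholds 0.5 and 0.2.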

Figure 5

Reproducibility of hybrid selection. For each exon (n = 15,565), the ratio of the mean coverage in two independent hybrid selection experiments performed on the same source DNA (NA15510) was plotted against its mean coverage in one experiment (a). Coverage was normalized to adjust for the different number of sequencing reads. The average ratio (black line) is close to 1. Standard deviations are indicated by purple lines. The graph on the right (b) shows base-by-base sequence coverage along one target in three independent hybrid selections, two of them performed on NA15510 (purple and teal lines) and one on NA11994 source DNA (black). Note the similarities, at this fine resolution, of the three profiles, which were normalized to the same height. The positions of the target exon (ENSE00000968562) and bait are indicated by red and blue bars, respectively.
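The per-exon comparison in panel (a) can be approximated with the sketch below. It assumes mean coverage per exon is available as a dictionary for each experiment and normalizes by dividing by the number of reads in that experiment; the caption does not specify the exact normalization, so this scaling, the function name, and the exon identifiers are all assumptions.

```python
from statistics import mean, pstdev


def normalized_ratios(exon_cov_a, exon_cov_b, reads_a, reads_b):
    """Per-exon ratio of read-count-normalized mean coverage (experiment A / B).

    Exons missing from B, or with zero coverage in B, are skipped so the
    ratio is always defined (a simplifying assumption of this sketch).
    """
    ratios = {}
    for exon, cov_a in exon_cov_a.items():
        cov_b = exon_cov_b.get(exon)
        if not cov_b:
            continue
        ratios[exon] = (cov_a / reads_a) / (cov_b / reads_b)
    return ratios


# Toy data: experiment A has twice the reads and twice the raw coverage.
cov_a = {"exon_1": 20.0, "exon_2": 10.0, "exon_3": 5.0}
cov_b = {"exon_1": 10.0, "exon_2": 5.0, "exon_3": 0.0}
toy_ratios = normalized_ratios(cov_a, cov_b, reads_a=2_000_000, reads_b=1_000_000)
toy_mean = mean(toy_ratios.values())
toy_sd = pstdev(toy_ratios.values())
```

A mean ratio near 1 with a small standard deviation, as in the toy data, is what the caption describes for two selections from the same source DNA.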
