Large scale library generation for high throughput sequencing

Erik Borgström et al. PLoS One. 2011.

Abstract

Background: Large efforts have recently been made to automate the sample preparation protocols for massively parallel sequencing in order to match the increasing instrument throughput. Still, the size selection through agarose gel electrophoresis separation is a labor-intensive bottleneck of these protocols.

Methodology/principal findings: In this study, a method for automated library preparation and size selection on a liquid handling robot is presented. The method uses selective precipitation of DNA molecules of certain sizes onto paramagnetic beads for cleanup and selection after standard enzymatic reactions.

Conclusions/significance: The method is used to generate libraries for de novo and re-sequencing on the Illumina HiSeq 2000 instrument with a throughput of 12 samples per instrument in approximately 4 hours. The resulting output data show quality scores and pass filter rates comparable to manually prepared samples. The sample size distribution can be adjusted for each application, and the method is suitable for all high throughput DNA processing protocols seeking to control size intervals.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Flowchart of the sample preparation.

A) Steps a through e show the main steps in Illumina sample preparation: a) the initial genomic DNA, b) fragmentation of genomic DNA, c) end repair, d) addition of A bases to the fragment ends and e) ligation of the adaptors to the fragments. B) Overview of the automated size selection protocol presented here. The first precipitation discards fragments larger than the desired interval. The second precipitation selects all fragments larger than the lower boundary of the desired interval.
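The two-step logic in panel B can be illustrated with a minimal sketch. It is not the authors' protocol or software: the function name `double_size_selection`, the parameters `lower_bp` and `upper_bp`, and the example fragment lengths are all hypothetical, and the sharp cutoffs are an idealization, since precipitation onto beads is presumably gradual around each boundary rather than an exact threshold.

```python
from typing import List


def double_size_selection(fragment_lengths: List[int],
                          lower_bp: int,
                          upper_bp: int) -> List[int]:
    """Idealized model of a two-step bead-based size selection.

    Assumption: each precipitation acts as a hard length cutoff.
    How fragments exactly at a boundary behave is also an assumption.
    """
    # First precipitation: fragments larger than the upper boundary
    # bind to the beads and are discarded; the rest stay in solution.
    kept_in_solution = [n for n in fragment_lengths if n <= upper_bp]

    # Second precipitation: fragments larger than the lower boundary
    # bind to the beads and are kept; shorter fragments are washed away.
    selected = [n for n in kept_in_solution if n > lower_bp]
    return selected


# Hypothetical fragment lengths in bp, selecting a 200-400 bp interval.
library = double_size_selection([100, 250, 300, 450, 800],
                                lower_bp=200, upper_bp=400)
print(library)
```

Modelling the steps as two successive filters mirrors the caption: the upper boundary is set by what the first precipitation removes, and the lower boundary by what the second precipitation retains.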

Figure 2. Average base call quality per cycle.

Quality scores per cycle of 30 HiSeq 2000 lanes sequenced with manually (grey) and automatically (red) prepared spruce samples.

Figure 3. Automated size selection method.

Six different size intervals were selected from the same fragmented sample pool (red), resulting in discrete populations with average lengths between 200 and 700 bp, each about 100 bp wide.

Figure 4. Size distribution of libraries.

a. Bioanalyzer traces of generated libraries. Lanes 4, 5, 7 and 8 correspond to libraries generated using the automatic size selection protocol. Lane 6 (blue) was prepared using ordinary agarose gel selection. b. Insert size distributions of human cancer cell line libraries (lanes 5, 6 and 8) obtained after mapping the reads to the human genome.
