Exome sequencing of a multigenerational human pedigree

PLoS One. 2009 Dec 14;4(12):e8232.

doi: 10.1371/journal.pone.0008232.

Dale J Hedges, Dan Burges, Eric Powell, Cherylyn Almonte, Jia Huang, Stuart Young, Benjamin Boese, Mike Schmidt, Margaret A Pericak-Vance, Eden Martin, Xinmin Zhang, Timothy T Harkins, Stephan Züchner

Dale J Hedges et al. PLoS One. 2009.

Erratum in

Abstract

Over the next few years, the efficient use of next-generation sequencing (NGS) in human genetics research will depend heavily upon effective mechanisms for the selective enrichment of genomic regions of interest. Recently, comprehensive exome capture arrays have become available that target approximately 33 Mb, or approximately 180,000 coding exons, across the human genome. Selective genomic enrichment of the human exome offers an attractive option for new experimental designs aiming to quickly identify potential disease-associated genetic variants, especially in family-based studies. We evaluated a 2.1 M feature human exome capture array on eight individuals from a three-generation family pedigree. We were able to cover up to 98% of the targeted bases at a long-read sequencing depth of ≥3 and 86% at a depth of ≥10, and over 50% of all targets were covered by ≥20 reads. We identified up to 14,284 SNPs and small indels per individual exome, with up to 1,679 of these representing putative novel polymorphisms. With the conservative genotype calling approach HCDiff, the average detection rate for a variant allele, relative to Illumina 1 M BeadChip genotypes, was 95.2% at ≥10x sequence coverage. Further, we propose an advantageous genotype calling strategy for low-coverage targets that empirically determines cut-off thresholds at a given coverage depth based on existing genotype data. This method detected >99% of SNPs covered ≥8x. Our results offer guidance for "real-world" applications in human genetics and provide further evidence that microarray-based exome capture is an efficient and reliable method to enrich for chromosomal regions of interest in next-generation sequencing experiments.

Conflict of interest statement

Competing Interests: Roche provided complimentary reagents and services for conducting some components of this study. The academic members of this study have no conflicting financial interests to declare, such as consulting fees, employment with a commercial partner, or similar. The manuscript was prepared solely under the direction of the academic authors. They adhere to all PLoS ONE policies on sharing data and materials, as detailed online in the guide for authors http://www.plosone.org/static/policies.action#sharing. Authors Benjamin Boese, Xinmin Zhang, and Timothy Harkins are employed by Roche, Inc., which develops and sells next-generation sequencing and genome capture technology.

Figures

Figure 1

Figure 1. Studied three-generational pedigree.

Pedigree of the eight individuals of European descent who were studied with exome capture arrays.

Figure 2

Figure 2. Sequence coverage of targeted exons.

The graph shows the cumulative coverage of targeted bases after sequencing 0.5 Gbp (red), 1 Gbp (blue), 1.5 Gbp (green), and 2 Gbp (purple). 1 Gbp yielded nearly 10x coverage for 50% of all targets; 2 Gbp of data increased this figure to 88%. Depending on a study's goal, maximum coverage may not always be required.
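Each curve in Figure 2 reports, for a depth d, the fraction of targeted bases read at least d times. Assuming per-base depths are already available for every targeted base (with uncovered bases recorded as 0, so they count against the fraction), such a curve can be tabulated as in the minimal sketch below; the function name and the toy depths are illustrative only.

```python
def cumulative_coverage(depths, thresholds):
    """Return {d: fraction of targeted bases with read depth >= d}.

    depths: per-base read depth for every targeted base, zeros included.
    thresholds: depths at which to evaluate the cumulative curve.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("no targeted bases supplied")
    return {d: sum(1 for x in depths if x >= d) / n for d in thresholds}


# Ten hypothetical targeted bases.
depths = [0, 2, 3, 5, 8, 10, 12, 20, 25, 40]
print(cumulative_coverage(depths, [3, 10, 20]))  # {3: 0.8, 10: 0.5, 20: 0.3}
```

The same function evaluated at d = 3, 10 and 20 gives the kind of summary quoted in the abstract (fraction of targeted bases at ≥3, ≥10 and ≥20 reads).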

Figure 3

Figure 3. Estimated error rates.

Sensitivity of genotype calling based on HCDiff SNPs, AllDiff SNPs, and the proposed coverage-dependent genotype calling approach. A) False negative rates, based on concordance with a subset of 44,513 SNPs that overlapped with genotypes obtained with Illumina 1 M Duo BeadChips. The coverage-dependent variant calling approach, which calibrates cut-off rates against array-based genotypes, is the most sensitive method, detecting >96% of SNPs at 5x coverage and >99% of all SNPs at ≥8x coverage. B) False positive rates. HCDiff, the most conservative algorithm, yields a lower false positive rate, while the more relaxed dynamic genotype calling algorithm yields error rates twice as high at lower coverage.
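Figure 3 scores each calling method against array genotypes, broken down by coverage. The caption states the false-negative basis (44,513 SNPs overlapping the Illumina 1 M Duo genotypes) but not the exact false-positive definition, so the sketch below assumes: a false negative is an array-variant site that the NGS data do not call as variant, and a false positive is an array homozygous-reference site that the NGS data do call as variant. The function name, bin edges, and toy sites are hypothetical.

```python
from collections import defaultdict


def error_rates_by_coverage(sites, bin_edges):
    """Tabulate false-negative and false-positive rates per coverage bin.

    sites: (coverage, array_has_variant, ngs_called_variant) tuples.
    bin_edges: sorted lower bounds of coverage bins, starting at 0.
    Returns {bin_lower_bound: (fn_rate, fp_rate)}; a rate is None when the
    bin has no sites of the relevant array class.
    """
    # Per bin: [false negatives, array-variant sites, false positives, array-reference sites]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for coverage, array_var, ngs_var in sites:
        lower = max(edge for edge in bin_edges if coverage >= edge)
        c = counts[lower]
        if array_var:
            c[1] += 1
            if not ngs_var:
                c[0] += 1  # array sees the variant, NGS missed it
        else:
            c[3] += 1
            if ngs_var:
                c[2] += 1  # NGS calls a variant the array does not see
    return {
        lower: (c[0] / c[1] if c[1] else None, c[2] / c[3] if c[3] else None)
        for lower, c in counts.items()
    }


sites = [
    (5, True, True), (5, True, False), (6, False, False), (7, False, True),
    (12, True, True), (15, True, True), (11, False, False), (14, False, False),
]
print(error_rates_by_coverage(sites, [0, 10]))  # {0: (0.5, 0.5), 10: (0.0, 0.0)}
```

Sensitivity, as plotted in panel A, is simply one minus the false-negative rate for each bin.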

Figure 4

Figure 4. Variant read distribution across eight exomes.

Illustration of the dynamic nature of optimal cut-off rates for calling heterozygous versus homozygous variants. At lower coverage (<10x) the ideal cut-off in our data is 88% variant reads, while at coverage ≥20x it is 78%. Optimal use of the data should take advantage even of low-coverage targets. Data are based on comparison to Illumina-genotyped SNPs. Green triangles: Illumina heterozygous genotypes; blue diamonds: Illumina homozygous genotypes. NGS genotypes are placed according to their percentage of variant reads (y axis).
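To make the caption's cut-offs concrete, the sketch below applies them to a single site that already carries a variant: the site is called homozygous when its percentage of variant reads reaches the cut-off for its coverage bin. Only the 88% (<10x) and 78% (≥20x) values come from the caption; the caption gives no value for 10 to 19x, so that bin is left without a cut-off and returns no call. The function and constant names are hypothetical, and these cut-offs were derived from the authors' own data, so they would not necessarily transfer to other datasets.

```python
# (lowest coverage, coverage upper bound (exclusive), cut-off in % variant reads).
# 88% and 78% are from the Figure 4 caption; the 10-19x value is not reported
# there, so it is None (assumption: no call in that bin).
CUTOFFS = [(0, 10, 88.0), (10, 20, None), (20, float("inf"), 78.0)]


def call_zygosity(coverage, variant_reads, cutoffs=CUTOFFS):
    """Call a variant site 'hom' or 'het' from its read counts.

    Returns None when no cut-off is defined for the site's coverage bin.
    """
    if coverage <= 0 or not 0 < variant_reads <= coverage:
        raise ValueError("need coverage > 0 and 0 < variant_reads <= coverage")
    pct = 100.0 * variant_reads / coverage
    for low, high, cut in cutoffs:
        if low <= coverage < high:
            if cut is None:
                return None
            return "hom" if pct >= cut else "het"
    return None


print(call_zygosity(8, 8))    # 100% at 8x  -> hom
print(call_zygosity(8, 7))    # 87.5% at 8x -> het (below the 88% cut-off)
print(call_zygosity(25, 20))  # 80% at 25x  -> hom (above the 78% cut-off)
print(call_zygosity(15, 14))  # 10-19x bin  -> None
```

The second and third calls show why a single fixed threshold is awkward: a site with 87.5% variant reads is called heterozygous at low coverage, while one with 80% is called homozygous at higher coverage.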
