Haplotyping germline and cancer genomes with high-throughput linked-read sequencing

doi: 10.1038/nbt.3432. Epub 2016 Feb 1.

Billy T Lau 2, Michael Schnall-Levin 1, Mirna Jarosz 1, John M Bell 2, Christopher M Hindson 1, Sofia Kyriazopoulou-Panagiotopoulou 1, Donald A Masquelier 1, Landon Merrill 1, Jessica M Terry 1, Patrice A Mudivarti 1, Paul W Wyatt 1, Rajiv Bharadwaj 1, Anthony J Makarewicz 1, Yuan Li 1, Phillip Belgrader 1, Andrew D Price 1, Adam J Lowe 1, Patrick Marks 1, Gerard M Vurens 1, Paul Hardenbol 1, Luz Montesclaros 1, Melissa Luo 1, Lawrence Greenfield 1, Alexander Wong 1, David E Birch 1, Steven W Short 1, Keith P Bjornson 1, Pranav Patel 1, Erik S Hopmans 2, Christina Wood 3, Sukhvinder Kaur 1, Glenn K Lockwood 1, David Stafford 1, Joshua P Delaney 1, Indira Wu 1, Heather S Ordonez 1, Susan M Grimes 2, Stephanie Greer 3, Josephine Y Lee 1, Kamila Belhocine 1, Kristina M Giorda 1, William H Heaton 1, Geoffrey P McDermott 1, Zachary W Bent 1, Francesca Meschi 1, Nikola O Kondov 1, Ryan Wilson 1, Jorge A Bernate 1, Shawn Gauby 1, Alex Kindwall 1, Clara Bermejo 1, Adrian N Fehr 1, Adrian Chan 1, Serge Saxonov 1, Kevin D Ness 1, Benjamin J Hindson 1, Hanlee P Ji 2 3

Grace X Y Zheng et al. Nat Biotechnol. 2016 Mar.

Abstract

Haplotyping of human chromosomes is a prerequisite for cataloguing the full repertoire of genetic variation. We present a microfluidics-based, linked-read sequencing technology that can phase and haplotype germline and cancer genomes using nanograms of input DNA. This high-throughput platform prepares barcoded libraries for short-read sequencing and computationally reconstructs long-range haplotype and structural variant information. We generate haplotype blocks in a nuclear trio that are concordant with expected inheritance patterns and phase a set of structural variants. We also resolve the structure of the EML4-ALK gene fusion in the NCI-H2228 cancer cell line using phased exome sequencing. Finally, we assign genetic aberrations to specific megabase-scale haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma. This approach resolves haplotype information using up to 100 times less genomic DNA than some methods and enables the accurate detection of structural variants.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The following authors, as listed by initials, are employees of 10× Genomics: G.X.Y.Z., M.S., M.J., C.M.H., S.K.P., D.A.M., L.M., J.M.T., P.A.M., P.W.W., R.B., A.J.M., Y.L., P.B., A.D.P., A.J.L., P.J.M., G.M.V., P.H., L.M., M.L., L.G., A.W., D.E.B., S.W.S., K.P.B., P.V.P., S.K., G.K.L., D.L.S., J.P.D., I.W., H.S.O., J.Y.L., Z.K.B., K.M.G., W.H.H., G.P.D., Z.W.B., F.M., N.O.K., R.T.W., J.A.B., S.P.G., A.P.K., C.B., A.N.F., A.C., S.S., K.D.N., B.J.H.

Figures

Figure 1. Overview of the technology for generating linked-reads

(a) Gel beads loaded with primers and barcoded oligonucleotides are first mixed with DNA and an enzyme mixture, and subsequently mixed with an oil-surfactant solution at a microfluidic “double-cross” junction. Gel bead-containing droplets flow to a reservoir where the gel beads are dissolved, initiating whole-genome primer extension. The products are pooled from each droplet. The final library preparation requires shearing the libraries and incorporating Illumina adapters. (b) Top panel, linked-reads of the ALK gene from the NA12878 WGS sample. Each line represents linked-reads with the same barcode, with dots representing reads, and color depicting reads with different barcodes. Middle panel, blue blocks showing exon boundaries of the ALK gene. Bottom panel, linked-reads of the ALK gene from the NA12878 exome data. Although there are reads only in exon regions, reads from neighboring exons are linked because they share barcodes. Only a very small fraction of linked-reads is presented here to conserve space.

Figure 2. Phasing performance of NA12878 trio analysis

(a) Length-weighted molecule size histogram of the trio WGS data. The Y-axis represents the DNA mass in each molecule length bin, calculated as the product of the fraction of molecules in the bin and the median of the bin. (b) Cumulative distribution function of phase block length of the trio WGS samples. (c) Phasing accuracy. For all pairs of SNVs on the same phase block, the probability that a pair is correctly phased is plotted as a function of the distance between the two SNVs. The inset shows SNV pairs that are at least 0.1 Mb apart. (d) Haplotype blocks of the LRRK2 gene in the trio exome libraries, demonstrating Mendelian inheritance. While most of this gene is phased in all trio samples, the beginning of the gene is not phased (represented by SNVs that are not part of the haplotype block). For this gene, NA12882 (child) inherited one allele from Haplotype 2 of NA12877 (father) and one from Haplotype 1 of NA12878 (mother). Grey bars in the phase blocks represent reference alleles, and green bars represent alternative alleles.
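The length weighting described for panel (a) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `length_weighted_histogram`, the bin edges, and the example lengths are hypothetical, and the "median of the length bin" is assumed here to mean the bin midpoint.

```python
from bisect import bisect_right


def length_weighted_histogram(lengths, bin_edges):
    """Return one weight per bin: (fraction of molecules in bin) * (bin midpoint).

    Assumption: the bin midpoint stands in for the "median of the length bin".
    Bins are half-open intervals [edge_i, edge_{i+1}); lengths outside the
    edges are counted in the total but assigned to no bin.
    """
    n_bins = len(bin_edges) - 1
    if not lengths:
        return [0.0] * n_bins

    # Count molecules per bin.
    counts = [0] * n_bins
    for length in lengths:
        i = bisect_right(bin_edges, length) - 1
        if 0 <= i < n_bins:
            counts[i] += 1

    # Weight each bin's molecule fraction by the bin midpoint.
    total = len(lengths)
    weights = []
    for i, count in enumerate(counts):
        midpoint = (bin_edges[i] + bin_edges[i + 1]) / 2
        weights.append(count / total * midpoint)
    return weights


# Hypothetical molecule lengths in kb, binned at 0-50, 50-100, 100-200 kb.
example = length_weighted_histogram([10, 20, 60, 70, 80, 150], [0, 50, 100, 200])
```

Weighting by length makes long molecules count in proportion to the DNA they carry, so a few long molecules can outweigh many short ones in the plot.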

Figure 3. Detecting genomic deletions in NA12878

(a) Heat map of overlapping barcodes for a deletion on Chr6: 78,967,194 – 79,036,419 in NA12878 (top). The areas circled in black represent overlapping barcodes near the breakpoints. The deletion is not observed in NA12882, and the heat map of barcodes in the same region is shown at the bottom as a negative control. (b) Linked-read data of the NA12878 WGS sample spanning Chr6: 78,967,194 – 79,036,419. Each line represents linked-reads with the same barcode, with dots representing reads, and color depicting reads with different barcodes. Dashed vertical black lines represent the breakpoints. The top panel represents the haplotype without a deletion. In this case, overlapping barcodes are observed only in contiguous regions. The bottom panel represents the haplotype with a deletion, as shown by the gap in the linked-reads. In contrast to regions without a deletion, barcodes in the region before the gap overlap with barcodes in the region after the gap. (c) Summary of 8 deletion candidates, including supporting evidence from overlapping barcode count, phasing of the deletion breakpoints, and inheritance support in NA12882. While all 5 high-scoring SV candidates have support from each type of evidence, two of the three lower-scoring SV candidates lack support from any evidence, including targeted sequencing. Haplotype assignment in one phase block is not necessarily the same as the haplotype assignment in a different phase block.
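The barcode-overlap signal behind the heat map in panel (a) can be sketched roughly as follows. This is an illustration only, not the published pipeline: the read representation as `(barcode, position)` tuples, the function names, and the window coordinates are all assumptions made for the example.

```python
def barcodes_in_window(reads, start, end):
    """Collect the distinct barcodes of reads whose position lies in [start, end)."""
    return {barcode for barcode, position in reads if start <= position < end}


def overlap_matrix(reads, windows_a, windows_b):
    """Count shared barcodes for every pair of windows (rows: windows_a, cols: windows_b)."""
    sets_a = [barcodes_in_window(reads, s, e) for s, e in windows_a]
    sets_b = [barcodes_in_window(reads, s, e) for s, e in windows_b]
    return [[len(a & b) for b in sets_b] for a in sets_a]


# Hypothetical reads: barcodes A and B have reads on both sides of a gap,
# as expected when one molecule spans a deletion; C and D do not.
reads = [
    ("A", 100), ("A", 900),
    ("B", 150), ("B", 950),
    ("C", 120),
    ("D", 980),
]
matrix = overlap_matrix(reads, [(0, 200)], [(800, 1000)])
```

In this toy input, the single matrix cell counts the two barcodes (A and B) seen on both sides of the gap. An excess of such shared barcodes between distant windows is the kind of pattern the circled areas of the heat map highlight.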

Figure 4. Rearrangement detection of an EML4-ALK gene fusion from exome sequencing of NCI-H2228

(a) Overlap of barcodes between exons 2–16 of ALK and exons 2–6 of EML4. The heat map depicts the number of overlapping barcodes. Blue bars on ALK and EML4 represent exons. (b) Overlap of barcodes between exon 1 of ALK and exons 7–16 of EML4. (c) Overlap of barcodes between exons 10–11 of ALK and the 5′ half of PTPN3. (d) Barcode counts in the ALK region of the NCI-H2228 WGS sample. The top blue bar is a schematic of the ALK gene structure, with e1, e2, and e10 denoting exon 1, exon 2, and exon 10, respectively. (e) Schematic illustrating the complex chromosomal rearrangement involving ALK, EML4 and PTPN3. Instead of the simple inversion reported in the literature, we observed a deletion, an inversion of ALK on Chr2 with EML4, and an insertion of ALK into PTPN3 on Chr9. (f) Phasing support around the ALK and PTPN3 breakpoints in the EML4-ALK and ALK-PTPN3 gene fusions. Haplotype assignment in one phase block is not necessarily the same as the haplotype assignment in a different phase block.

Figure 5. Phasing analysis of a primary colon cancer genome and structure of the TP53 driver event

(a) Length-weighted molecule size histogram of Patient 1532 Normal (green) and Tumor (purple) samples. (b) Cumulative distribution function of phase block length of the Normal and Tumor pair. (c) Phased haplotype block showing the TP53 C to T mutation on Haplotype 2. (d) Minor allele fraction of the tumor sample (relative to the matched normal) on Chr 17. There is a deletion in the p-arm of Chr 17 in the tumor sample, with a copy number of 1. (e) Barcode count throughout Chr 17 for the tumor (blue) and matched normal (grey). The red box depicts the region where TP53 is located. (f) Phasing analysis of TP53 in the tumor and matched normal. Left, ratio of SNV counts between Tumor and Normal in the TP53 region, with Haplotype 1 in red and Haplotype 2 in black. Right, density of SNV ratios of Haplotype 1 and Haplotype 2. Whereas the SNV density centers around 1 for Haplotype 2, most SNV ratios between Tumor and Normal are only 0.5 on Haplotype 2, indicating that LOH is on Haplotype 2.
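The per-haplotype comparison in panel (f) can be sketched as a tumor-to-normal count ratio summarized by haplotype. The code below is an assumption-laden illustration, not the authors' analysis: the input tuples, the labels `hapA` and `hapB`, and the use of a median as the summary are hypothetical, and the counts are assumed to be already normalized for sequencing depth.

```python
from statistics import median


def haplotype_ratios(snvs):
    """Summarize tumor/normal count ratios per haplotype.

    snvs: iterable of (haplotype_label, tumor_count, normal_count).
    SNVs with a normal count of zero are skipped to avoid division by zero.
    Returns {haplotype_label: median ratio}.
    """
    ratios = {}
    for haplotype, tumor_count, normal_count in snvs:
        if normal_count == 0:
            continue
        ratios.setdefault(haplotype, []).append(tumor_count / normal_count)
    return {haplotype: median(values) for haplotype, values in ratios.items()}


# Hypothetical depth-normalized counts for SNVs phased to two haplotypes.
summary = haplotype_ratios([
    ("hapA", 10, 20), ("hapA", 12, 24), ("hapA", 9, 18),
    ("hapB", 20, 20), ("hapB", 22, 20), ("hapB", 0, 0),
])
```

In this toy input, `hapA` has a median ratio near 0.5 and `hapB` a median near 1. A haplotype whose ratio drops well below 1 in the tumor is the one that has lost signal relative to the matched normal.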
