A hybrid approach for de novo human genome sequence assembly and phasing
Accession codes
Primary accessions
BioProject
Sequence Read Archive
Acknowledgements
This work was supported in part by R01 HG005946 (P.-Y.K.). The DNA sample was obtained from the Coriell Institute for Medical Research, and the Illumina sequence data were obtained from the US National Institute of Standards and Technology (NIST). We thank the expert sequencing staff at the Institute for Human Genetics at UCSF for generating some of the sequencing data.
Author information
Authors and Affiliations
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA
Yulia Mostovoy, Michal Levy-Sakin, Jessica Lam, Catherine Chu, Chin Lin & Pui-Yan Kwok
- BioNano Genomics, Inc., San Diego, California, USA
Ernest T Lam, Alex R Hastie, Joyce Lee, Željko Džakula & Han Cao
- 10X Genomics, Inc., Pleasanton, California, USA
Patrick Marks, Kristina Giorda & Michael Schnall-Levin
- Department of Molecular and Cell Biology, University of Cape Town, Cape Town, South Africa
Stephen A Schlebusch
- Institute for Human Genetics, University of California, San Francisco, San Francisco, California, USA
Jeffrey D Wall & Pui-Yan Kwok
- Department of Dermatology, University of California, San Francisco, San Francisco, California, USA
Pui-Yan Kwok
Authors
- Yulia Mostovoy
- Michal Levy-Sakin
- Jessica Lam
- Ernest T Lam
- Alex R Hastie
- Patrick Marks
- Joyce Lee
- Catherine Chu
- Chin Lin
- Željko Džakula
- Han Cao
- Stephen A Schlebusch
- Kristina Giorda
- Michael Schnall-Levin
- Jeffrey D Wall
- Pui-Yan Kwok
Contributions
P.-Y.K., J.D.W., and Y.M. conceived the project and provided resources and oversight for sequencing and algorithmic analysis. K.G. prepared long libraries for 10XG GemCode sequencing. C.C. and C.L. performed long DNA preparation and BNG genome mapping experiments. E.T.L., A.R.H., Ž.D., J.Lee, and H.C. built initial genome maps and performed BNG alignment and structural variant calling. Y.M. and J.Lam performed scaffold analysis. E.T.L., A.R.H., and J.Lee performed hybrid genome assembly. P.M., K.G., and M.S.-L. performed scaffold phasing. Y.M., M.L.-S., E.T.L., J.Lam, J.Lee, and S.A.S. performed validation and quality measure analyses of the assembled data. Y.M., E.T.L., M.L.-S., and P.-Y.K. primarily wrote the manuscript and revisions, though many coauthors provided edits and Online Methods sections.
Corresponding author
Correspondence to Pui-Yan Kwok.
Ethics declarations
Competing interests
E.T.L., A.R.H., J.Lee, Ž.D., and H.C. are employees of BioNano Genomics. P.M., K.G., and M.S.-L. are employees of 10X Genomics, and P.-Y.K. is on the scientific advisory board of BioNano Genomics.
Integrated supplementary information
Supplementary Figure 1 Ideograms showing scaffold boundaries and segmental duplication locations.
Blue lines mark the boundaries of assembly scaffolds. Black marks show the locations of segmental duplications. Magenta regions mark unassembled regions around the centromeres and telomeres. Ideograms were generated using The Genome Decoration Page, NCBI.
Supplementary Figure 2 Architecture of complex regions at the MHC and Amylase loci.
(a) MHC region (chr6: 28-32 Mb). Upper panel: green bar = reference, blue bars = hybrid assembly (bottom). Bottom panel: green phase blocks separated by SNVs in the hybrid assembly in the middle. (b) Amylase region (chr1: 160-163 Mb). Top panel: green bar = reference, blue bars = assembly. Assembly in the red box expanded to show nicking pattern in 450 kb region (bottom panel). (c) Haplotypes in increasing resolution to show alleles on the same phase block (green line = allele 1, grey line = allele 2).
Cite this article
Mostovoy, Y., Levy-Sakin, M., Lam, J. et al. A hybrid approach for de novo human genome sequence assembly and phasing. Nat Methods 13, 587–590 (2016). https://doi.org/10.1038/nmeth.3865
- Received: 12 January 2016
- Accepted: 08 April 2016
- Published: 09 May 2016
- Issue Date: July 2016
- DOI: https://doi.org/10.1038/nmeth.3865