A hybrid approach for de novo human genome sequence assembly and phasing

Acknowledgements

This work was supported in part by R01 HG005946 (P.-Y.K.). The DNA sample was obtained from the Coriell Institute for Medical Research, and the Illumina sequence data were obtained from the US National Institute of Standards and Technology (NIST). We thank the expert sequencing staff at the Institute for Human Genetics at UCSF for generating some of the sequencing data.

Author information

Authors and Affiliations

  1. Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA
    Yulia Mostovoy, Michal Levy-Sakin, Jessica Lam, Catherine Chu, Chin Lin & Pui-Yan Kwok
  2. BioNano Genomics, Inc., San Diego, California, USA
    Ernest T Lam, Alex R Hastie, Joyce Lee, Željko Džakula & Han Cao
  3. 10X Genomics, Inc., Pleasanton, California, USA
    Patrick Marks, Kristina Giorda & Michael Schnall-Levin
  4. Department of Molecular and Cell Biology, University of Cape Town, Cape Town, South Africa
    Stephen A Schlebusch
  5. Institute for Human Genetics, University of California, San Francisco, San Francisco, California, USA
    Jeffrey D Wall & Pui-Yan Kwok
  6. Department of Dermatology, University of California, San Francisco, San Francisco, California, USA
    Pui-Yan Kwok

Authors

  1. Yulia Mostovoy
  2. Michal Levy-Sakin
  3. Jessica Lam
  4. Ernest T Lam
  5. Alex R Hastie
  6. Patrick Marks
  7. Joyce Lee
  8. Catherine Chu
  9. Chin Lin
  10. Željko Džakula
  11. Han Cao
  12. Stephen A Schlebusch
  13. Kristina Giorda
  14. Michael Schnall-Levin
  15. Jeffrey D Wall
  16. Pui-Yan Kwok

Contributions

P.-Y.K., J.D.W., and Y.M. conceived the project and provided resources and oversight for sequencing and algorithmic analysis. K.G. prepared long libraries for 10XG GemCode sequencing. C.C. and C.L. performed long DNA preparation and BNG genome mapping experiments. E.T.L., A.R.H., Ž.D., J.Lee, and H.C. built initial genome maps and performed BNG alignment and structural variant calling. Y.M. and J.Lam performed scaffold analysis. E.T.L., A.R.H., and J.Lee performed hybrid genome assembly. P.M., K.G., and M.S.-L. performed scaffold phasing. Y.M., M.L.-S., E.T.L., J.Lam, J.Lee, and S.A.S. performed validation and quality measure analyses of the assembled data. Y.M., E.T.L., M.L.-S., and P.-Y.K. primarily wrote the manuscript and revisions, though many coauthors provided edits and Online Methods sections.

Corresponding author

Correspondence to Pui-Yan Kwok.

Ethics declarations

Competing interests

E.T.L., A.R.H., J.Lee, Ž.D., and H.C. are employees of BioNano Genomics. P.M., K.G., and M.S.-L. are employees of 10X Genomics, and P.-Y.K. is on the scientific advisory board of BioNano Genomics.

Integrated supplementary information

Supplementary Figure 1 Ideograms showing scaffold boundaries and segmental duplication locations.

Blue lines mark the boundaries of assembly scaffolds. Black marks show the locations of segmental duplications. Magenta regions mark unassembled regions around the centromeres and telomeres. Ideograms were generated using The Genome Decoration Page, NCBI.

Supplementary Figure 2 Architecture of complex regions at the MHC and Amylase loci.

(a) MHC region (chr6: 28–32 Mb). Upper panel: green bar = reference, blue bars = hybrid assembly (bottom). Bottom panel: green phase blocks separated by SNVs in the hybrid assembly in the middle. (b) Amylase region (chr1: 160–163 Mb). Top panel: green bar = reference, blue bars = assembly. Assembly in the red box expanded to show nicking pattern in 450 kb region (bottom panel). (c) Haplotypes in increasing resolution to show alleles on the same phase block (green line = allele 1, grey line = allele 2).

Cite this article

Mostovoy, Y., Levy-Sakin, M., Lam, J. et al. A hybrid approach for de novo human genome sequence assembly and phasing. Nat Methods 13, 587–590 (2016). https://doi.org/10.1038/nmeth.3865