Genetic variation and the de novo assembly of human genomes
1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
Ng, S. B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 42, 30–35 (2010).
Mills, R. E. et al. Mapping copy number variation by population-scale genome sequencing. Nature 470, 59–65 (2011).
Chaisson, M. J. P. et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature 517, 608–611 (2015). Long-read sequencing paired with local assembly reveals structural variation and closes or extends ~50% of the gaps in the reference human genome.
Steinberg, K. M. et al. Structural diversity and African origin of the 17q21.31 inversion polymorphism. Nat. Genet. 44, 872–880 (2012). High-quality sequencing of the 17q21.31 region reveals a complex haplotype polymorphic region in which certain structural haplotypes predispose for disease.
Boettger, L. M., Handsaker, R. E., Zody, M. C. & McCarroll, S. A. Structural haplotypes and recent evolution of the human 17q21.31 region. Nat. Genet. 44, 881–885 (2012). Uses population genetics to infer the architecture and evolutionary history of chromosome 17q21.31 haplotypes. References 7 and 8 show a rapid rise of a particular inverted haplotype in European and Middle Eastern individuals that is consistent with adaptive selection.
Dennis, M. Y. et al. Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication. Cell 149, 912–922 (2012). Shows that genes potentially responsible for unique aspects of human neuronal development were missing from the reference human genome, highlighting the importance of focusing on obtaining higher-quality reference sequences.
Motahari, A. S., Bresler, G. & Tse, D. N. C. Information theory of DNA shotgun sequencing. IEEE Trans. Inf. Theory 59, 6273–6289 (2013).
Myers, E. W. et al. A whole-genome assembly of Drosophila. Science 287, 2196–2204 (2000).
Weber, J. L. & Myers, E. W. Human whole-genome shotgun sequencing. Genome Res. 7, 401–409 (1997).
Church, D. M. et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 7, e1000112 (2009).
Myers, E. W. The fragment assembly string graph. Bioinformatics 21, ii79–ii85 (2005).
Nagarajan, N. & Pop, M. Sequence assembly demystified. Nat. Rev. Genet. 14, 157–167 (2013). A review of algorithmic details of fragment assembly.
Myers, E. W. Toward simplifying and accurately formulating fragment assembly. J. Comput. Biol. 2, 275–290 (1995).
Huang, X., Wang, J., Aluru, S., Yang, S.-P. & Hillier, L. PCAP: a whole-genome assembly program. Genome Res. 13, 2164–2170 (2003).
Gnerre, S. et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl Acad. Sci. USA 108, 1513–1518 (2011).
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1, 18 (2012).
Pevzner, P. A., Tang, H. & Tesler, G. De novo repeat classification and fragment assembly. Genome Res. 14, 1786–1796 (2004).
Chin, C. S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 10, 563–569 (2013). Describes the method of correcting sequencing error in long SMRT sequences with short SMRT sequences so that they may be assembled using the Celera assembler and consensus called with the Quiver method.
Berlin, K. et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat. Biotechnol. 33, 623–630 (2015). Introduces one of the first SMS assemblers. Draft genomes on par with the original human draft sequence may be efficiently assembled with SMS reads.
Myers, G. in Algorithms in Bioinformatics (eds Raphael, B. & Tang, J.) 52–67 (Springer, 2014).
Loman, N. J., Quick, J. & Simpson, J. T. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods 12, 733–735 (2015).
Dilthey, A., Cox, C., Iqbal, Z., Nelson, M. R. & McVean, G. Improved genome inference in the MHC using a population reference graph. Nat. Genet. 47, 682–688 (2015). The first practical study using a graphical representation of the genome to encode the structural diversity of the major histocompatibility complex region.
Yim, H. S. et al. Minke whale genome and aquatic adaptation in cetaceans. Nat. Genet. 46, 88–92 (2014).
Parker, J. et al. Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502, 228–231 (2013).
Zhang, G. et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science 346, 1311–1320 (2014).
Dong, Y. et al. Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus). Nat. Biotechnol. 31, 135–141 (2013).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
Istrail, S. et al. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc. Natl Acad. Sci. USA 101, 1916–1921 (2004).
Lander, E. S. & Waterman, M. S. Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2, 231–239 (1988).
Alkan, C., Sajjadian, S. & Eichler, E. E. Limitations of next-generation genome sequence assembly. Nat. Methods 8, 61–65 (2011).
Sharp, A. J. et al. Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet. 77, 78–88 (2005).
Antonacci, F. et al. Palindromic GOLGA8 core duplicons promote chromosome 15q13.3 microdeletion and evolutionary instability. Nat. Genet. 46, 1293–1302 (2014).
Pyo, C. W. et al. Recombinant structures expand and contract inter and intragenic diversification at the KIR locus. BMC Genomics 14, 89 (2013).
Altemose, N., Miga, K. H., Maggioni, M. & Willard, H. F. Genomic characterization of large heterochromatic gaps in the human genome assembly. PLoS Comput. Biol. 10, e1003628 (2014).
Eichler, E. E. Recent duplication, domain accretion and the dynamic mutation of the human genome. Trends Genet. 17, 661–669 (2001).
Li, H. Towards better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).
Fuchshuber, A. et al. Refinement of the gene locus for autosomal dominant medullary cystic kidney disease type 1 (MCKD1) and construction of a physical and partial transcriptional map of the region. Genomics 72, 278–284 (2001).
Kirby, A. et al. Mutations causing medullary cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing. Nat. Genet. 45, 299–303 (2013).
Renton, A. E. et al. A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron 72, 257–268 (2011).
DeJesus-Hernandez, M. et al. Expanded GGGGCC hexanucleotide repeat in noncoding region of C9ORF72 causes chromosome 9p-linked FTD and ALS. Neuron 72, 245–256 (2011).
Eichler, E. E. et al. Haplotype and interspersion analysis of the FMR1 CGG repeat identifies two different mutational pathways for the origin of the fragile X syndrome. Hum. Mol. Genet. 5, 319–330 (1996).
Lemmers, R. J. et al. Digenic inheritance of an SMCHD1 mutation and an FSHD-permissive D4Z4 allele causes facioscapulohumeral muscular dystrophy type 2. Nat. Genet. 44, 1370–1374 (2012).
Ryan, D. P. et al. Mutations in potassium channel Kir2.6 cause susceptibility to thyrotoxic hypokalemic periodic paralysis. Cell 140, 88–98 (2010).
Kidd, J. M. et al. A human genome structural variation sequencing resource reveals insights into mutational mechanisms. Cell 143, 837–847 (2010).
Genovese, G. et al. Using population admixture to help complete maps of the human genome. Nat. Genet. 45, 406–414 (2013).
Li, R. et al. Building the sequence map of the human pan-genome. Nat. Biotechnol. 28, 57–63 (2010). Describes how the draft assembly of a personal genome using MPS uncovered 19–40 Mb of sequence missing from the reference.
Falchi, M. et al. Low copy number of the salivary amylase gene predisposes to obesity. Nat. Genet. 46, 492–497 (2014).
Yang, Y. et al. Gene copy-number variation and associated polymorphisms of complement component C4 in human systemic lupus erythematosus (SLE): low copy number is a risk factor for and high copy number is a protective factor against SLE susceptibility in European Americans. Am. J. Hum. Genet. 80, 1037–1054 (2007).
Shen, S., Pyo, C. W., Vu, Q., Wang, R. & Geraghty, D. E. The essential detail: the genetics and genomics of the primate immune response. ILAR J. 54, 181–195 (2013).
Hollox, E. J. & Hoh, B. P. Human gene copy number variation and infectious disease. Hum. Genet. 133, 1217–1233 (2014).
Usher, C. L. et al. Structural forms of the human amylase locus and their relationships to SNPs, haplotypes and obesity. Nat. Genet. 47, 921–925 (2015).
Stefansson, H. et al. A common inversion under selection in Europeans. Nat. Genet. 37, 129–137 (2005).
Koolen, D. A. et al. A new chromosome 17q21.31 microdeletion syndrome associated with a common inversion polymorphism. Nat. Genet. 38, 999–1001 (2006).
Charrier, C. et al. Inhibition of SRGAP2 function by its human-specific paralogs induces neoteny during spine maturation. Cell 149, 923–935 (2012).
Florio, M. et al. Human-specific gene ARHGAP11B promotes basal progenitor amplification and neocortex expansion. Science 347, 1465–1470 (2015).
Chikhi, R. & Rizk, G. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithms Mol. Biol. 8, 22 (2013).
Simpson, J. T. & Durbin, R. Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22, 549–556 (2012).
Weisenfeld, N. I. et al. Comprehensive variation discovery in single human genomes. Nat. Genet. 46, 1350–1355 (2014). Shows that MPS deduces more variation than do resequencing methods.
Li, H. Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics 28, 1838–1844 (2012).
Nurk, S. et al. Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J. Comput. Biol. 20, 714–737 (2013).
Howe, A. C. et al. Tackling soil diversity with the assembly of large, complex metagenomes. Proc. Natl Acad. Sci. USA 111, 4904–4909 (2014).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Adey, A. et al. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res. 24, 2041–2049 (2014).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Selvaraj, S., Dixon, J. R., Bansal, V. & Ren, B. Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing. Nat. Biotechnol. 31, 1111–1118 (2013).
Amini, S. et al. Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing. Nat. Genet. 46, 1343–1349 (2014).
Snyder, M. W., Adey, A., Kitzman, J. O. & Shendure, J. Haplotype-resolved genome sequencing: experimental methods and applications. Nat. Rev. Genet. 16, 344–358 (2015).
Kaper, F. et al. Whole-genome haplotyping by dilution, amplification, and sequencing. Proc. Natl Acad. Sci. USA 110, 5552–5557 (2013).
Kuleshov, V. et al. Whole-genome haplotyping using long reads and statistical methods. Nat. Biotechnol. 32, 261–266 (2014).
Kitzman, J. O. et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat. Biotechnol. 29, 59–63 (2011).
Voskoboynik, A. et al. The genome sequence of the colonial chordate, Botryllus schlosseri. eLife 2, e00569 (2013).
McCoy, R. C. et al. Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PLoS ONE 9, e106689 (2014).
Schwartz, D. C. et al. Ordered restriction maps of Saccharomyces cerevisiae chromosomes constructed by optical mapping. Science 262, 110–114 (1993).
Onmus-Leone, F. et al. Enhanced de novo assembly of high throughput pyrosequencing data using whole genome mapping. PLoS ONE 8, e61762 (2013).
Lam, E. T. et al. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat. Biotechnol. 30, 771–776 (2012).
O'Bleness, M. et al. Finished sequence and assembly of the DUF1220-rich 1q21 region using a haploid human genome. BMC Genomics 15, 387 (2014).
Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).
Rosenstein, J. K., Wanunu, M., Merchant, C. A., Drndic, M. & Shepard, K. L. Integrated nanopore sensing platform with sub-microsecond temporal resolution. Nat. Methods 9, 487–492 (2012).
Hormozdiari, F., Alkan, C., Eichler, E. E. & Sahinalp, S. C. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res. 19, 1270–1278 (2009).
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
Handsaker, R. E., Korn, J. M., Nemesh, J. & McCarroll, S. A. Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat. Genet. 43, 269–276 (2011).
Sharp, A. J. et al. A recurrent 15q13.3 microdeletion syndrome associated with mental retardation and seizures. Nat. Genet. 40, 322–328 (2008).
Chaisson, M. J. & Tesler, G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics 13, 238 (2012).
Quick, J., Quinlan, A. R. & Loman, N. J. A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer. GigaScience 3, 22 (2014).
Huddleston, J. et al. Reconstructing complex regions of genomes using long-read sequencing technology. Genome Res. 24, 688–696 (2014).
Bashir, A. et al. A hybrid approach for the automated finishing of bacterial genomes. Nat. Biotechnol. 30, 701–707 (2012).
Koren, S. et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol. 30, 693–700 (2012).
Prjibelski, A. D. et al. ExSPAnder: a universal repeat resolver for DNA fragment assembly. Bioinformatics 30, i293–i301 (2014).
English, A. C. et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS ONE 7, e47768 (2012).
Callaway, E. 'Platinum' genome takes on disease. Nature 515, 323 (2014).
Human Genome Structural Variation Consortium. The phase 3 structural variant dataset. 1000 Genomes [online], (2015).
Bailey, J. A., Yavor, A. M., Massa, H. F., Trask, B. J. & Eichler, E. E. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 11, 1005–1017 (2001).
Conrad, D. F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010).