Characterization of missing human genome sequences and copy-number polymorphic insertions
- Resource
- Published: 18 April 2010
- Jeffrey M Kidd1,
- Nick Sampas2,
- Francesca Antonacci1,
- Tina Graves3,
- Robert Fulton3,
- Hillary S Hayden1,
- Can Alkan1,
- Maika Malig1,
- Mario Ventura4,
- Giuliana Giannuzzi4,
- Joelle Kallicki3,
- Paige Anderson2,
- Anya Tsalenko2,
- N Alice Yamada2,
- Peter Tsang2,
- Rajinder Kaul1,
- Richard K Wilson3,
- Laurakay Bruhn2 &
- Evan E Eichler1,5
Nature Methods volume 7, pages 365–371 (2010)
Abstract
The extent of human genomic structural variation suggests that there must be portions of the genome yet to be discovered, annotated and characterized at the sequence level. We present a resource and analysis of 2,363 new insertion sequences corresponding to 720 genomic loci. We found that a substantial fraction of these sequences are either missing, fragmented or misassigned when compared to recent de novo sequence assemblies from short-read next-generation sequence data. We determined that 18–37% of these new insertions are copy-number polymorphic, including loci that show extensive population stratification among Europeans, Asians and Africans. Complete sequencing of 156 of these insertions identified new exons and conserved noncoding sequences not yet represented in the reference genome. We developed a method to accurately genotype these new insertions by mapping next-generation sequencing datasets to the breakpoint, thereby providing a means to characterize copy-number status for regions previously inaccessible to single-nucleotide polymorphism microarrays.
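The breakpoint-based genotyping idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the statistical model used, so everything below is an assumption made for illustration: reads spanning the breakpoint are taken to be already classified as supporting either the reference (empty-site) allele or the insertion allele, and the genotype is chosen by a simple binomial likelihood. The function name `genotype_insertion`, the genotype labels and the `error_rate` parameter are hypothetical and do not come from the paper.

```python
from math import log


def genotype_insertion(ref_reads: int, ins_reads: int, error_rate: float = 0.01) -> str:
    """Return the most likely genotype given breakpoint-spanning read counts.

    ref_reads: reads supporting the reference (no-insertion) breakpoint.
    ins_reads: reads supporting the insertion breakpoint.
    error_rate: assumed fraction of reads assigned to the wrong allele
                (hypothetical value, not taken from the paper).
    """
    if ref_reads < 0 or ins_reads < 0:
        raise ValueError("read counts must be non-negative")
    if ref_reads + ins_reads == 0:
        return "no-call"  # no informative reads at this breakpoint

    # Expected fraction of insertion-supporting reads under each genotype.
    expected_ins_fraction = {
        "ref/ref": error_rate,
        "ref/ins": 0.5,
        "ins/ins": 1.0 - error_rate,
    }

    def log_likelihood(p: float) -> float:
        # Binomial log-likelihood; the binomial coefficient is the same for
        # every genotype, so it is omitted from the comparison.
        return ins_reads * log(p) + ref_reads * log(1.0 - p)

    return max(expected_ins_fraction,
               key=lambda g: log_likelihood(expected_ins_fraction[g]))


if __name__ == "__main__":
    for counts in [(20, 0), (10, 10), (0, 20), (0, 0)]:
        print(counts, genotype_insertion(*counts))
```

A real pipeline would also need to handle mapping quality, repeat content around the breakpoint and variable coverage, none of which is modeled here.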
Accession codes
Accessions
Gene Expression Omnibus
Acknowledgements
We thank C. Campbell, G. Cooper and T. Marques-Bonet for thoughtful discussion, P. Sudmant for assistance with Illumina sequence data and members of the University of Washington and Washington University Genomes Centers for assistance with data generation. J.M.K. is supported by a US National Science Foundation Graduate Research Fellowship. This work was supported by the US National Institutes of Health grant HG004120 to E.E.E. E.E.E. receives funds as an Investigator of the Howard Hughes Medical Institute.
Author information
Authors and Affiliations
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
Jeffrey M Kidd, Francesca Antonacci, Hillary S Hayden, Can Alkan, Maika Malig, Rajinder Kaul & Evan E Eichler
- Agilent Laboratories, Santa Clara, California, USA
Nick Sampas, Paige Anderson, Anya Tsalenko, N Alice Yamada, Peter Tsang & Laurakay Bruhn
- Washington University Genome Sequencing Center, School of Medicine, St. Louis, Missouri, USA
Tina Graves, Robert Fulton, Joelle Kallicki & Richard K Wilson
- Department of Genetics and Microbiology, University of Bari, Bari, Italy
Mario Ventura & Giuliana Giannuzzi
- Howard Hughes Medical Institute, Seattle, Washington, USA
Evan E Eichler
Contributions
J.M.K., N.S., F.A., A.T., R.K. and E.E.E. analyzed data. N.S., P.A., A.T., N.A.Y., P.T. and L.B. performed array CGH and copy-number analysis. F.A., M.V. and G.G. performed FISH experiments. C.A. assembled contigs. T.G., R.F., H.S.H., M.M., J.K., R.K. and R.K.W. performed clone characterization and sequencing. J.M.K., R.K., L.B. and E.E.E. designed the study. J.M.K. and E.E.E. wrote the paper with contributions from the other authors.
Corresponding author
Correspondence to Evan E Eichler.
Ethics declarations
Competing interests
N.S., P.A., A.T., N.A.Y., P.T. and L.B. are employees of Agilent Technologies. E.E.E. is a scientific advisory board member for Pacific Biosciences.
Cite this article
Kidd, J., Sampas, N., Antonacci, F. et al. Characterization of missing human genome sequences and copy-number polymorphic insertions. Nat Methods 7, 365–371 (2010). https://doi.org/10.1038/nmeth.1451
- Received: 09 December 2009
- Accepted: 19 March 2010
- Published: 18 April 2010
- Issue Date: May 2010
- DOI: https://doi.org/10.1038/nmeth.1451