Paired-end mapping reveals extensive structural variation in the human genome
Science. 2007 Oct 19;318(5849):420-6.
doi: 10.1126/science.1149504. Epub 2007 Sep 27.
Jan O Korbel, Alexander Eckehart Urban, Jason P Affourtit, Brian Godwin, Fabian Grubert, Jan Fredrik Simons, Philip M Kim, Dean Palejev, Nicholas J Carriero, Lei Du, Bruce E Taillon, Zhoutao Chen, Andrea Tanzer, A C Eugenia Saunders, Jianxiang Chi, Fengtang Yang, Nigel P Carter, Matthew E Hurles, Sherman M Weissman, Timothy T Harkins, Mark B Gerstein, Michael Egholm, Michael Snyder
- PMID: 17901297
- PMCID: PMC2674581
- DOI: 10.1126/science.1149504
Jan O Korbel et al. Science. 2007.
Abstract
Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) approximately 3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.
Figures
Fig. 1
(A) Flow chart illustrating PEM. (i) Genomic DNA was sheared to yield DNA fragments of ~3 kb; (ii) biotinylated hairpin adapters were ligated to the fragment ends; (iii) fragments were circularized; (iv) the circularized DNA was randomly sheared; (v) linker-containing (+) fragments were isolated; (vi) the library was subjected to 454 sequencing (13). (vii) Paired ends were analyzed computationally to determine (viii) the distribution of “paired-end spans” (shown for a single 454 sequencing pool). (B) Types of SVs. Deletions were predicted from paired-end spans larger than a specified cutoff D; simple insertions from spans smaller than a cutoff I; inversions from ends that map to the genome in different relative orientations; other types of insertions (defined in the text as mated and unmated) were detected from evidence of sequence integration from a distal locus.
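The span-based classification in Fig. 1B can be sketched in Python as below. This is a minimal illustration, not the authors' pipeline: the hypothetical `ReadPair` record, the `span_cutoffs` rule (mean ± k standard deviations of the span distribution, with k = 4 chosen arbitrarily), treating opposite strands as the concordant orientation, the same-strand test for inversions, and labeling ends on different chromosomes as "distal" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class ReadPair:
    """Hypothetical record for one paired end mapped to the reference."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def span_cutoffs(spans, k=4.0):
    """Assumed rule: cutoffs D and I as mean +/- k standard deviations
    of the paired-end span distribution of one sequencing pool."""
    mu, sd = mean(spans), stdev(spans)
    return mu + k * sd, mu - k * sd  # (D, I)


def classify(pair, d_cutoff, i_cutoff):
    """Classify a single read pair by chromosome, orientation and span."""
    if pair.chrom1 != pair.chrom2:
        # Ends on different chromosomes: treated here as evidence of
        # sequence from a distal locus (assumption).
        return "distal"
    # Order the two ends along the chromosome.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        # Concordant pairs are assumed to map to opposite strands; the true
        # expected orientation depends on the library chemistry.
        return "inversion"
    span = right_pos - left_pos
    if span > d_cutoff:
        return "deletion"   # reference span longer than the ~3-kb fragment
    if span < i_cutoff:
        return "insertion"  # reference span shorter than the fragment
    return "concordant"


if __name__ == "__main__":
    d, i = span_cutoffs([2900, 3000, 3100])
    print(classify(ReadPair("chr1", 10_000, "+", "chr1", 15_000, "-"), d, i))
```

A practical caller would typically require several discordant pairs supporting the same event before reporting an SV; that clustering step is omitted here.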
Fig. 2
SVs identified in two humans. (A) SVs mapped onto chromosomal ideograms (12). Right side: red, deletion; blue, insertion; yellow, inversion. Double length indicates SVs observed in both individuals. Left side: log-scale size of an event (events ≥1 Mb are drawn at the same length, corresponding to the maximum length of a line); unmated insertions [i.e., events lacking a predicted breakpoint and thus size information (12)] and simple insertions (12) are depicted with 1-kb lines; line colors indicate repetitive sequences in a ±3-kb window around the predicted breakpoint junction (12): red, SDs; blue, LINEs; yellow, LTRs; green, satellites; black, two or more repetitive elements with equal frequency; gray, no repeat association. The arrow indicates the region in (B). A high-resolution image of this figure is available as fig. S1. (B) Enlarged view of a chromosome 4 region. SVs in NA18505 are indicated with dashed lines (validation: squares) and those in NA15510 with dotted lines (validation: circle); SVs shared between individuals are shown as solid lines. Colors are as in (A).
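The repeat annotation behind the line colors in Fig. 2A (the dominant repeat class within ±3 kb of a predicted breakpoint junction) might be computed as in the following sketch. The single-chromosome interval format, the class labels, and the tie rule returning "multiple" (black) or "none" (gray) are illustrative assumptions read from the caption, not the published procedure.

```python
from collections import Counter


def repeat_class(breakpoint, repeats, window=3000):
    """Dominant repeat class near a breakpoint.

    repeats: iterable of (start, end, cls) half-open intervals on the same
    chromosome as the breakpoint (assumed input format).
    Returns the most frequent class, "multiple" when two or more classes tie
    for most frequent, or "none" when no repeat overlaps the window.
    """
    lo, hi = breakpoint - window, breakpoint + window
    counts = Counter(
        cls for start, end, cls in repeats if start < hi and end > lo
    )
    if not counts:
        return "none"
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "multiple"
    return ranked[0][0]
```

Counting overlapping elements, rather than overlapping base pairs, is one possible reading of "frequency" in the caption; weighting by overlap length would be an equally plausible alternative.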
Fig. 3
SV size distribution, sequence coverage, genes, and distribution of gene categories. (A) Size distribution of SVs (NA15510 and NA18505 combined). Arrow indicates the lower size cutoff for deletions. (B) Cumulative number of base pairs affected by SVs in relation to SV size (NA18505 only). (C) Solid line: cumulative number of RefSeq genes intersecting with SVs in relation to SV size (NA18505 only). Dashed line: SV locations randomly shuffled within their local genomic context (±50-kb window), which show increased gene overlap. (D) Enrichment or depletion of GO (annotation level 3) biological processes for genes intersecting with SVs (NA15510 and NA18505 combined). Annotations represented by <10 genes are grouped as “other” and shown in gray. **Significant enrichment in genes belonging to a category (P < 1 × 10^−14) (12); *significant depletion (P < 0.001).
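The local-shuffling control in Fig. 3C can be approximated as a permutation test: each SV is moved by a random offset of at most 50 kb and the number of SVs overlapping a gene is recounted. This is one plausible reading of the caption rather than the authors' code; the single-chromosome half-open intervals, the uniform offset, and all function names are assumptions.

```python
import random


def overlaps_gene(start, end, genes):
    """True if the half-open interval [start, end) overlaps any gene."""
    return any(g_start < end and g_end > start for g_start, g_end in genes)


def count_gene_hits(svs, genes):
    """Number of SVs intersecting at least one gene."""
    return sum(overlaps_gene(s, e, genes) for s, e in svs)


def local_shuffle(svs, rng, window=50_000):
    """Move each SV by a uniform random offset in [-window, +window] bp,
    keeping its length (assumed model of 'local genomic context')."""
    shuffled = []
    for s, e in svs:
        offset = rng.randint(-window, window)
        shuffled.append((s + offset, e + offset))
    return shuffled


def shuffled_hit_counts(svs, genes, n_iter=1000, window=50_000, seed=0):
    """Gene-overlap counts for n_iter locally shuffled SV sets."""
    rng = random.Random(seed)
    return [count_gene_hits(local_shuffle(svs, rng, window), genes)
            for _ in range(n_iter)]
```

Comparing the observed `count_gene_hits` value with the distribution returned by `shuffled_hit_counts` indicates whether SVs overlap genes less often than their local context would predict, which is the pattern the dashed line in Fig. 3C suggests.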
Fig. 4
Validation of SVs. (A) A 170-kb deletion detected with both array-CGH and PEM. (B) PCR products validating SVs as originally predicted from NA18505 (lane 2). Lanes 1 to 4 use DNAs from NA15510, NA18505, NA11997 (HapMap CEU, cell lines derived from 30 trios of European descent), and NA18614 (HapMap CHB, Han Chinese from Beijing), respectively. Primer sequences can be found in table S6. (C) Fiber-FISH validation of heterozygous inversions in NA18505. The inversion in the upper panel was independently validated in NA15510. Alternating patterns of fluorescent labels from adjacent probes indicate genomic rearrangement.
Fig. 5
Sequencing and analysis of SV breakpoint junctions. (A) PCR fragments spanning SVs were pooled and sequenced; breakpoints were determined from assembled contigs or ≥2 sequencing reads. (B) Representative sequenced SVs shown in relation to previous SV and/or CNV assignments [earlier SV and CNV assignments often extend outside of the depicted regions (3, 4)]. From top to bottom: SVs resulting from NHEJ, L1 retrotransposition, HERVK (retrovirus) insertion, (nonallelic) homologous recombination, and gap closure (blue, insertions; red, deletions; orange, sequence gap; yellow, inversions). Note that some SVs affect annotated genes. (C) Example breakpoint sequences (12). Uppercase, green letters denote unaltered sequence and lowercase letters the SV indel; solid arrows mark microhomologies (indicative of NHEJ), duplications of target sequences (at retrotransposon or retrovirus insertion sites), and long stretches of sequence identity (12) (indicative of homologous recombination). The fourth sequence from the top shows an olfactory receptor (OR) gene fusion in the main reading frame (the breakpoints occurred in the long stretch of sequence identity).
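Microhomology at a deletion junction, as marked by the arrows in Fig. 5C, can be measured as the number of bases by which the deletion can slide along the reference without changing the rearranged sequence. The sketch below assumes a simple deletion given in 0-based, half-open coordinates on a reference string; it does not handle insertions or inversions, and any length threshold separating NHEJ-like microhomology from the long identity stretches of homologous recombination would be a further assumption.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Length of breakpoint microhomology for a deletion of ref[start:end].

    Assumes 0-based, half-open coordinates. The deletion can be shifted one
    base to the right without changing the product when ref[start] ==
    ref[end], and one base to the left when ref[start - 1] == ref[end - 1];
    the total number of such shifts is the microhomology length.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("expected 0 <= start < end <= len(ref)")
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    left = 0
    while start - left > 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    return left + right


if __name__ == "__main__":
    print(microhomology_length("GGGGCAGAAAAACAGTTTT", 4, 12))
```

For example, deleting `ref[4:12]` from `GGGGCAGAAAAACAGTTTT` yields a 3-bp microhomology (`CAG`), and the equivalent placement `ref[5:13]` gives the same value, because the measure does not depend on where within the ambiguous stretch the breakpoint is reported.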
Similar articles
- The fine-scale architecture of structural variants in 17 mouse genomes.
  Yalcin B, Wong K, Bhomra A, Goodson M, Keane TM, Adams DJ, Flint J. Genome Biol. 2012;13(3):R18. doi: 10.1186/gb-2012-13-3-r18. PMID: 22439878. Free PMC article.
- Structural variation in the chicken genome identified by paired-end next-generation DNA sequencing of reduced representation libraries.
  Kerstens HH, Crooijmans RP, Dibbits BW, Vereijken A, Okimoto R, Groenen MA. BMC Genomics. 2011 Feb 3;12:94. doi: 10.1186/1471-2164-12-94. PMID: 21291514. Free PMC article.
- An integrative probabilistic model for identification of structural variation in sequencing data.
  Sindi SS, Onal S, Peng LC, Wu HT, Raphael BJ. Genome Biol. 2012;13(3):R22. doi: 10.1186/gb-2012-13-3-r22. PMID: 22452995. Free PMC article.
- Disruption of regulatory domains and novel transcripts as disease-causing mechanisms.
  Allou L, Mundlos S. Bioessays. 2023 Oct;45(10):e2300010. doi: 10.1002/bies.202300010. Epub 2023 Jun 29. PMID: 37381881. Review.
- Challenges in studying genomic structural variant formation mechanisms: the short-read dilemma and beyond.
  Onishi-Seebacher M, Korbel JO. Bioessays. 2011 Nov;33(11):840-50. doi: 10.1002/bies.201100075. Epub 2011 Sep 30. PMID: 21959584. Review.
Cited by
- Evolutionary dynamics of copy number variation in pig genomes in the context of adaptation and domestication.
  Paudel Y, Madsen O, Megens HJ, Frantz LA, Bosse M, Bastiaansen JW, Crooijmans RP, Groenen MA. BMC Genomics. 2013 Jul 5;14:449. doi: 10.1186/1471-2164-14-449. PMID: 23829399. Free PMC article.
- PAV markers in Sorghum bicolour: genome pattern, affected genes and pathways, and genetic linkage map construction.
  Shen X, Liu ZQ, Mocoeur A, Xia Y, Jing HC. Theor Appl Genet. 2015 Apr;128(4):623-37. doi: 10.1007/s00122-015-2458-4. Epub 2015 Jan 30. PMID: 25634103. Free PMC article.
- The Wukong Terminal-Repeat Retrotransposon in Miniature (TRIM) Elements in Diverse Maize Germplasm.
  Liu Z, Li X, Wang T, Messing J, Xu JH. G3 (Bethesda). 2015 May 26;5(8):1585-92. doi: 10.1534/g3.115.018317. PMID: 26019188. Free PMC article.
- Assessing the risks of genotoxicity in the therapeutic development of induced pluripotent stem cells.
  Hong SG, Dunbar CE, Winkler T. Mol Ther. 2013 Feb;21(2):272-81. doi: 10.1038/mt.2012.255. Epub 2012 Dec 4. PMID: 23207694. Free PMC article. Review.
- Interchromosomal segmental duplication drives translocation and loss of P. falciparum histidine-rich protein 3.
  Hathaway NJ, Kim IE, WernsmanYoung N, Hui ST, Crudale R, Liang EY, Nixon CP, Giesbrecht D, Juliano JJ, Parr JB, Bailey JA. Elife. 2024 Oct 7;13:RP93534. doi: 10.7554/eLife.93534. PMID: 39373634. Free PMC article.
References
- Sebat J, et al. Science. 2004;305:525.
- Iafrate AJ, et al. Nat. Genet. 2004;36:949.
- Tuzun E, et al. Nat. Genet. 2005;37:727.
- Redon R, et al. Nature. 2006;444:444.
Grants and funding
- 077008/WT_/Wellcome Trust/United Kingdom
- RR19895/RR/NCRR NIH HHS/United States
- S10 RR019895/RR/NCRR NIH HHS/United States
- 077014/WT_/Wellcome Trust/United Kingdom
- WT_/Wellcome Trust/United Kingdom