Mapping copy number variation by population-scale genome sequencing - PubMed

Nature. 2011 Feb 3;470(7332):59-65.

doi: 10.1038/nature09708.

Ryan E Mills, Klaudia Walter, Chip Stewart, Robert E Handsaker, Ken Chen, Can Alkan, Alexej Abyzov, Seungtai Chris Yoon, Kai Ye, R Keira Cheetham, Asif Chinwalla, Donald F Conrad, Yutao Fu, Fabian Grubert, Iman Hajirasouliha, Fereydoun Hormozdiari, Lilia M Iakoucheva, Zamin Iqbal, Shuli Kang, Jeffrey M Kidd, Miriam K Konkel, Joshua Korn, Ekta Khurana, Deniz Kural, Hugo Y K Lam, Jing Leng, Ruiqiang Li, Yingrui Li, Chang-Yun Lin, Ruibang Luo, Xinmeng Jasmine Mu, James Nemesh, Heather E Peckham, Tobias Rausch, Aylwyn Scally, Xinghua Shi, Michael P Stromberg, Adrian M Stütz, Alexander Eckehart Urban, Jerilyn A Walker, Jiantao Wu, Yujun Zhang, Zhengdong D Zhang, Mark A Batzer, Li Ding, Gabor T Marth, Gil McVean, Jonathan Sebat, Michael Snyder, Jun Wang, Kenny Ye, Evan E Eichler, Mark B Gerstein, Matthew E Hurles, Charles Lee, Steven A McCarroll, Jan O Korbel; 1000 Genomes Project


PMID: 21293372
PMCID: PMC3077050
DOI: 10.1038/nature09708


Ryan E Mills et al. Nature. 2011.

Abstract

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.


Figures

Figure 1. SV discovery and genotyping in population scale sequence data

A. Schematic depicting the different modes (i.e., approaches) of sequence-based SV detection we used. The RP approach assesses the orientation and spacing of the mapped reads of paired-end sequences (reads are denoted by arrows); the RD approach evaluates the read depth-of-coverage; the SR approach maps the boundaries (breakpoints) of SVs by sequence alignment; the AS approach assembles SVs. B. Integrated pipeline for SV discovery, validation, and genotyping. Colored circles represent individual SV discovery methods (listed in Supplementary Table 1), with modes indicated by a color scheme: green=RP; yellow=RD; purple=SR; red=AS; green and yellow=methods evaluating RP and RD (abbreviated as ‘PD’). C. Example of a deletion, previously associated with BMI, identified independently with RP (green), RD (yellow), and SR (red) methods. Grey dots indicate position and mapping quality for individual sequence reads. Targeted assembly confirmed the breakpoints detected by SR.
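The read-pair (RP) and read-depth (RD) signals described in panel A can be illustrated with a minimal sketch. This is not the pipeline used in the study: the record layout (`left_start`, `right_end`, `orientation`), the cutoff of three standard deviations, and both function names are assumptions chosen for illustration only.

```python
from statistics import mean

def discordant_pairs(pairs, library_mean, library_sd, k=3.0):
    """Return read pairs whose mapped span hints at a deletion.

    A pair that maps in the expected forward/reverse ("FR") orientation
    but spans far more reference sequence than the library insert size
    allows is consistent with deleted sequence between the two reads.
    The k-standard-deviation cutoff is an illustrative assumption.
    """
    cutoff = library_mean + k * library_sd
    return [
        p for p in pairs
        if p["orientation"] == "FR"
        and p["right_end"] - p["left_start"] > cutoff
    ]

def depth_ratio(window_depths, genome_mean_depth):
    """Mean coverage in a window relative to the genome-wide mean.

    In a diploid genome, values near 0.5 are consistent with a
    heterozygous deletion and values near 0 with a homozygous one.
    """
    return mean(window_depths) / genome_mean_depth

# Hypothetical toy data: a library with 300 bp mean insert size.
pairs = [
    {"left_start": 100, "right_end": 400, "orientation": "FR"},   # concordant
    {"left_start": 100, "right_end": 2400, "orientation": "FR"},  # span too large
    {"left_start": 100, "right_end": 2400, "orientation": "RF"},  # wrong orientation
]
flagged = discordant_pairs(pairs, library_mean=300, library_sd=30)
ratio = depth_ratio([15, 14, 16], genome_mean_depth=30)
```

Real callers cluster many discordant pairs before reporting an event and correct read depth for GC content and mappability; the sketch omits both.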

Figure 2. Comparative assessment of deletion discovery methods

A. Deletion size-range ascertained by different modes of SV discovery. Three groups are visible, with AS and SR, PD and RP, as well as RD and ‘RL’ (RP analysis involving relatively long range (≥1 kb) insert size libraries, resulting in a different deletion detection size range compared to the predominantly used <500 bp insert size libraries), respectively, ascertaining similar size-ranges. Pie charts display the contribution of different SV discovery modes to the release set. Outer pie = based on number of SV calls; inner pie = based on total number of variable nucleotides. Of note, not all approaches were applied across all individuals (see Supplementary Table 2). B. Sensitivity and FDR estimates for individual deletion discovery methods based on gold standard sets for individuals sequenced at high (NA12878) and low-coverage (NA12156), respectively. All depicted estimates are summarized in Supplementary Tables 3, 4, 6. Vertical dotted lines correspond to the specificity threshold (FDR≤10%). C. Breakpoint mapping resolution of three deletion discovery methods (the respective method names are in Supplementary Table 2). The blue and red histograms are the breakpoint residuals for predicted deletion start and end coordinates, respectively, relative to assembled coordinates (here assessed in low-coverage data). The horizontal lines at the top of each plot mark the 98% confidence intervals (labeled for each panel), with vertical notches indicating the positions of the most probable breakpoint (the distribution mode).
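As a rough illustration of how sensitivity and FDR in panel B can be computed against a gold standard set, the sketch below matches calls to gold-standard deletions by reciprocal overlap. The 50% reciprocal-overlap criterion, the half-open `(start, end)` interval layout, and the function names are assumptions for illustration; the caption does not state the matching rule the authors applied.

```python
def reciprocal_overlap(a, b):
    """Overlap length divided by the longer of two half-open intervals.

    Requiring this value to be >= f is equivalent to requiring that the
    overlap covers at least a fraction f of BOTH intervals.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])

def sensitivity_and_fdr(calls, gold, min_overlap=0.5):
    """Estimate sensitivity and false discovery rate of a call set.

    sensitivity = gold-standard deletions recovered / all gold-standard deletions
    FDR         = calls matching no gold-standard deletion / all calls
    """
    recovered = sum(
        any(reciprocal_overlap(g, c) >= min_overlap for c in calls)
        for g in gold
    )
    unmatched = sum(
        not any(reciprocal_overlap(c, g) >= min_overlap for g in gold)
        for c in calls
    )
    sensitivity = recovered / len(gold) if gold else 0.0
    fdr = unmatched / len(calls) if calls else 0.0
    return sensitivity, fdr

# Hypothetical coordinates on a single chromosome.
gold = [(100, 200), (1000, 2000), (5000, 5100)]
calls = [(110, 205), (1200, 2100), (8000, 8100), (9000, 9500)]
sens, fdr = sensitivity_and_fdr(calls, gold)
```

Treating every unmatched call as false assumes the gold standard is complete; where it is not, this estimate of FDR is an upper bound.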

Figure 3. Analysis of deletion presence and absence in three populations

A-C. Deletion allele frequencies and observed sharing of alleles across populations, displayed for deletions discovered in the CEU, YRI, and JPT+CHB population samples in terms of stacked bars. D. Allele frequency spectra for deletions intersecting with intergenic (blue), intronic (yellow), and protein-coding sequences (red).

Figure 4. Contribution of SV formation mechanisms to the SV size spectrum

A. Breakpoint junction homology/microhomology length plotted as a function of SV size for SVs originally identified as deletions compared to a human reference. Dots are colored according to the SVs’ classification as deletions, insertions/duplications, or “undetermined” relative to inferred ancestral genomic loci. Gray lines mark groups of SVs likely formed by a common formation mechanism. The diagonal highlights tandem duplications (and few reciprocal deletion events), in which the length of the duplicated sequence correlates linearly with the length of the longest breakpoint junction sequence identity stretch. The ellipses indicate MEIs, i.e., Alu (~300 bp) and L1 (~6 kb) insertions, associated with target site duplications of up to 28 bp in size at the breakpoints. The horizontal group corresponds mostly to NH-associated deletions with <10 bp microhomology at the breakpoints. The remaining (ungrouped) SVs comprise truncated MEIs, VNTR expansion and shrinkage events, as well as NAHR-associated deletions and duplications. B. Relative contributions of SV formation mechanisms in the genome. Numbers of SVs are displayed on the outer pie chart and affected base pairs on the inner. Left panel: SVs classified as deletions relative to ancestral loci. Right panel: SVs classified as insertions/duplications. C. Size spectra of deletions classified relative to ancestral loci. D. Size spectra of insertions/duplications.

Figure 5. Mapping hotspots of SV formation in the genome

A. Distribution of SVs on chromosome 10 (“chr10”). Above the ideogram, colored bars indicate SV formation mechanisms (same color scheme as in B and C); bar lengths relate to the logarithm of SV size. Below the ideogram, bar lengths are directly proportional to allele frequencies. Arrows indicate an SV hotspot near the centromere underlying mainly VNTR, and several hotspots near the telomeres underlying mainly NAHR events. B. Enrichment of SVs inferred to be formed by the same formation mechanism for different genomic window sizes. Displayed is an enrichment of nearby, non-overlapping SVs formed by the same mechanism relative to an SV set where mechanism assignments are shuffled randomly. C. SV hotspots are mostly dominated by a single formation mechanism. Colored bars depict numbers of SV hotspots in which at least 50% of the variants were inferred to be formed by a single formation mechanism. The average abundance of NAHR-classified SVs in NAHR hotspots was 70% (compared with 77% for VNTR-hotspots; 69% for NH). The gray bar (“mixed”) corresponds to SV hotspots with no single mechanism dominating.
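The shuffling comparison described in panel B can be sketched as a simple permutation procedure. The version below is a simplified illustration, not the authors' implementation: it reduces each SV to a single position, counts pairs within a window that share a mechanism label, and compares that count with the mean over randomly shuffled labels. The function names, the number of shuffles, and the toy data are assumptions.

```python
import random
from itertools import combinations

def same_mechanism_pairs(positions, mechanisms, window):
    """Count SV pairs lying within `window` bp that share a mechanism label."""
    count = 0
    for i, j in combinations(range(len(positions)), 2):
        if (abs(positions[i] - positions[j]) <= window
                and mechanisms[i] == mechanisms[j]):
            count += 1
    return count

def shuffle_enrichment(positions, mechanisms, window, n_shuffles=200, seed=0):
    """Return (observed count, observed / mean count under shuffled labels).

    Shuffling keeps SV positions and the overall mechanism proportions
    fixed, so any excess in the observed count reflects clustering of
    SVs formed by the same mechanism.
    """
    rng = random.Random(seed)
    observed = same_mechanism_pairs(positions, mechanisms, window)
    labels = list(mechanisms)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        total += same_mechanism_pairs(positions, labels, window)
    expected = total / n_shuffles
    enrichment = observed / expected if expected > 0 else float("inf")
    return observed, enrichment

# Hypothetical toy data: three tight clusters, one mechanism each.
positions = ([10 * i for i in range(10)]
             + [1000 + 10 * i for i in range(10)]
             + [2000 + 10 * i for i in range(10)])
mechanisms = ["NAHR"] * 10 + ["NH"] * 10 + ["VNTR"] * 10
observed, enrichment = shuffle_enrichment(positions, mechanisms, window=50)
```

Repeating the calculation for several values of `window` gives the kind of enrichment-by-window-size curve that panel B describes.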

Cited by

Analysis of genomic copy number variations through whole-genome scan in Yunling cattle.
Dang D, Zhang L, Gao L, Peng L, Chen J, Yang L. Front Vet Sci. 2024 Jul 22;11:1413504. doi: 10.3389/fvets.2024.1413504. eCollection 2024. PMID: 39104544 Free PMC article.
Genome-wide detection of copy number variation in American mink using whole-genome sequencing.
Davoudi P, Do DN, Rathgeber B, Colombo SM, Sargolzaei M, Plastow G, Wang Z, Karimi K, Hu G, Valipour S, Miar Y. BMC Genomics. 2022 Sep 13;23(1):649. doi: 10.1186/s12864-022-08874-1. PMID: 36096727 Free PMC article.
Human-specific CpG "beacons" identify loci associated with human-specific traits and disease.
Bell CG, Wilson GA, Butcher LM, Roos C, Walter L, Beck S. Epigenetics. 2012 Oct;7(10):1188-99. doi: 10.4161/epi.22127. Epub 2012 Sep 11. PMID: 22968434 Free PMC article.
Whole-genome detection of disease-associated deletions or excess homozygosity in a case-control study of rheumatoid arthritis.
Wu CC, Shete S, Jo EJ, Xu Y, Lu EY, Chen WV, Amos CI. Hum Mol Genet. 2013 Mar 15;22(6):1249-61. doi: 10.1093/hmg/dds512. Epub 2012 Dec 6. PMID: 23223014 Free PMC article.
Accurate indel prediction using paired-end short reads.
Grimm D, Hagmann J, Koenig D, Weigel D, Borgwardt K. BMC Genomics. 2013 Feb 27;14:132. doi: 10.1186/1471-2164-14-132. PMID: 23442375 Free PMC article.

