CopywriteR: DNA copy number detection from off-target sequence data

doi: 10.1186/s13059-015-0617-1.

Thomas Kuilman, Arno Velds, Kristel Kemper, Marco Ranzani, Lorenzo Bombardelli, Marlous Hoogstraat, Ekaterina Nevedomskaya, Guotai Xu, Julian de Ruiter, Martijn P Lolkema, Bauke Ylstra, Jos Jonkers, Sven Rottenberg, Lodewyk F Wessels, David J Adams, Daniel S Peeper, Oscar Krijgsman

Thomas Kuilman et al. Genome Biol. 2015.

Abstract

Current methods for detection of copy number variants (CNV) and aberrations (CNA) from targeted sequencing data are based on the depth of coverage of captured exons. Accurate CNA determination is complicated by the uneven genomic distribution and non-uniform capture efficiency of targeted exons. Here we present CopywriteR, which circumvents these problems by exploiting 'off-target' sequence reads. CopywriteR extracts uniformly distributed copy number information, can be used without a reference, and can be applied to sequencing data obtained from various techniques, including chromatin immunoprecipitation and target enrichment on small gene panels. CopywriteR outperforms existing methods and constitutes a widely applicable alternative to available tools.
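The off-target idea described in the abstract can be illustrated with a minimal Python sketch. This is not CopywriteR's implementation: the function names, the 20 kb bin size, the pseudocount, and the toy intervals are all assumptions made for illustration. Reads that overlap a peak (on-target) region are discarded, the remaining reads are counted in fixed-size bins, and each count is scaled by the fraction of the bin that is not masked by peaks. The optional log2 ratio against a reference is one possible downstream step.

```python
from math import log2


def overlaps(start, end, regions):
    """Return True if the half-open interval [start, end) overlaps any region."""
    return any(start < r_end and r_start < end for r_start, r_end in regions)


def masked_length(bin_start, bin_end, regions):
    """Bases of the bin covered by regions (regions assumed non-overlapping)."""
    return sum(
        max(0, min(bin_end, r_end) - max(bin_start, r_start))
        for r_start, r_end in regions
    )


def off_target_counts(reads, peaks, chrom_length, bin_size=20_000):
    """Count off-target reads per bin, scaled up for bases masked by peaks."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start, end in reads:
        if not overlaps(start, end, peaks):  # keep off-target reads only
            counts[start // bin_size] += 1   # assign the read by its start position
    compensated = []
    for i, count in enumerate(counts):
        bin_start = i * bin_size
        bin_end = min(bin_start + bin_size, chrom_length)
        usable = (bin_end - bin_start) - masked_length(bin_start, bin_end, peaks)
        # A fully masked bin carries no off-target information.
        compensated.append(count * bin_size / usable if usable > 0 else None)
    return compensated


def log2_ratios(sample, reference, pseudo=1.0):
    """Per-bin log2 ratio of sample over reference; the pseudocount is an assumption."""
    return [
        None if s is None or r is None else log2((s + pseudo) / (r + pseudo))
        for s, r in zip(sample, reference)
    ]


# Toy data (hypothetical): one chromosome of 40 kb with a single 10 kb peak.
toy_peaks = [(5_000, 15_000)]
toy_reads = [(100, 200), (6_000, 6_100), (16_000, 16_100),
             (25_000, 25_100), (30_000, 30_100)]
toy_counts = off_target_counts(toy_reads, toy_peaks, chrom_length=40_000)
```

In the toy data, the read at 6,000 falls inside the peak and is dropped. The first bin keeps two reads but has only half of its bases available, so its count is doubled.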

Figures

Figure 1

Copy number information can be obtained from off-target reads. (A) Screenshot from the IGV genome browser, showing an example of a genomic region with sequence reads mapping to the genome before and after removal of reads in peaks called by Model-based Analysis of ChIP-Seq (MACS). In addition, the locations of MACS peaks, capture regions, and genes are shown. (B) Germline DNA sample C41 was subjected to WES with capture set Agilent SureSelect Human Exon Kit V4. The nature of MACS peaks that do not overlap capture regions is displayed. Shown are the fractions of these orphan peaks that overlap with pseudogene or Ensembl exons, that do not map to any of the reference genome chromosomes, that are unmappable, and that do not belong to any of these categories. (C) The distribution of sequence reads of both germline and tumor DNA samples is shown for the indicated capture sets. Sequence reads are classified into one of these categories: (1) low mapping quality reads (Phred score < 37 and/or reads that do not pair properly); (2) mitochondrial reads; (3) reads in MACS peaks; (4) remaining reads. Error bars represent standard deviations. (D) Germline DNA sample C45 was subjected to WES, and the number of reads after compensation for reduced effective bin size was calculated and compared to the corresponding read counts from an exon-based method. Density plots of the number of sequence reads per data point are shown for each method. (E) A flowchart of the steps incorporated in the CopywriteR tool.
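The four read categories of panel (C) can be sketched as a simple classifier. This is an illustrative Python sketch, not code from CopywriteR: the `Read` record, the set of mitochondrial contig names, and the order in which the categories are tested are assumptions. Only the categories themselves and the Phred-score threshold of 37 come from the caption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record (not a real BAM/pysam object)."""
    chrom: str
    start: int
    end: int
    mapq: int
    proper_pair: bool


# Assumed contig names for the mitochondrial genome; these depend on the reference build.
MITO_NAMES = {"chrM", "MT", "M"}


def classify(read, peaks, min_mapq=37):
    """Assign a read to one of the four categories listed for panel (C)."""
    if read.mapq < min_mapq or not read.proper_pair:
        return "low_quality"
    if read.chrom in MITO_NAMES:
        return "mitochondrial"
    for p_start, p_end in peaks.get(read.chrom, ()):
        if read.start < p_end and p_start < read.end:
            return "in_peak"
    return "remaining"


def category_fractions(reads, peaks):
    """Fraction of reads per category, as one would plot in a stacked bar."""
    counts = Counter(classify(r, peaks) for r in reads)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Toy example (hypothetical coordinates): one read per category.
toy_peaks = {"chr1": [(1_000, 2_000)]}
toy_reads = [
    Read("chr1", 100, 200, mapq=10, proper_pair=True),       # low mapping quality
    Read("chrM", 100, 200, mapq=60, proper_pair=True),       # mitochondrial
    Read("chr1", 1_500, 1_600, mapq=60, proper_pair=True),   # inside a peak
    Read("chr1", 5_000, 5_100, mapq=60, proper_pair=True),   # remaining (off-target)
]
toy_fractions = category_fractions(toy_reads, toy_peaks)
```

The "remaining" category is the one an off-target approach would use for copy number estimation.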

Figure 2

CopywriteR compares to dedicated copy number detection methods. (A) Six PDX-derived human melanomas were subjected to WES and analyzed on SNP6 arrays. Pseudo counts were derived (see Materials and methods), and used as a basis for copy number profiles, with segmentation values (CBS) depicted in red (left panel). After segmentation, segmentation values were represented as a heatmap to show concordance of the two methods. (B) Four murine small-cell lung carcinomas (SCLC) were subjected to WES and analyzed by arrayCGH. Pseudo counts were created and used for creating copy number profiles, with segmentation values (CBS) depicted in red (right panel). Segmentation values were plotted as in (A) for comparison of the two methods (left panel). (C) Tumor T20 from a breast cancer mouse model was subjected to WES or LC-WGS. Copy number profiles of chromosome 12 generated with onTarget or CopywriteR methods are compared to the profile from LC-WGS data of the same material, with segmentation values (CBS) depicted in red (left panel). Segmentation values of onTarget and CopywriteR methods are plotted against the LC-WGS method, and Euclidean distances and Pearson correlation coefficients of segmentation values are displayed (right panel).

Figure 3

CopywriteR outperforms exonic depth of coverage-based methods. (A) Tumors from a breast cancer mouse model were subjected to WES or LC-WGS, and analyzed using CopywriteR or onTarget methods. Subsequently, copy number data were segmented using propSeg or CBS; the integrated EXCAVATOR tool was also used. Weighted Euclidean distances (left) and Pearson correlation coefficients (right) were calculated between the different approaches for every sample, and the means of those values across all samples are represented as clustered heatmaps. (B) As in (A); the genome-wide copy number plots for sample T3 are displayed for the indicated analysis methods, with segmentation values depicted in red.
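The two concordance measures named in this caption can be written out in a few lines of Python. This is a sketch under stated assumptions: the caption does not define the weighting scheme, so the per-position weights used here (for example, the genomic length each segmentation value represents) are hypothetical, as are the function names and the toy values.

```python
from math import sqrt


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy segmentation values (hypothetical) from two methods at the same positions.
seg_method_a = [0.0, 0.0, 1.0, 1.0]
seg_method_b = [0.0, 0.0, 0.5, 0.5]
toy_weights = [1, 1, 2, 2]  # assumed weights, e.g. relative segment lengths

toy_distance = weighted_euclidean(seg_method_a, seg_method_b, toy_weights)
toy_correlation = pearson(seg_method_a, seg_method_b)
```

In the toy data the two methods find the same gain but at different amplitudes. The correlation is therefore perfect while the distance is non-zero, which is why the two measures are informative together.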

Figure 4

Copy number detection in the absence of a reference. (A) CopywriteR and onTarget methods were applied to WES data of melanoma PDX sample T99, either with or without C43 as a reference. Genome-wide copy number profiles are shown, with segmentation values (CBS) depicted in red. (B) CBS-derived segmentation values of the analysis in (A) are represented in a heatmap. (C) Segmentation values of all six melanoma PDX samples were treated as in (A) and (B), and the weighted Euclidean distances and Pearson correlation coefficients were calculated for every sample between the different methods. The means of those values across all samples are represented as clustered heatmaps.

Figure 5

CopywriteR is widely applicable. (A) Sample T97 (FFPE) was subjected to WES, and copy number profiles relative to C41 (fresh frozen reference material) are displayed for onTarget and CopywriteR methods, with segmentation values (CBS) depicted in red (left panel: whole-genome; right panel: chromosome 9). (B, left panel) ChIPseq data were obtained from ChIP experiments on the MCF7 cell line with the indicated set of antibodies, or from the relevant input control. Copy number data were extracted using CopywriteR and further analyzed using CBS. Segmentation values are represented as a heatmap. (B, right panel) Data were analyzed as for the left panel. ChIPseq data were obtained from ChIP experiments on ER+ breast cancer with ER-antibodies (E), or from the relevant input (I) control. (C, left panel) Matched pre- and post-vemurafenib treatment melanoma samples were subjected to targeted sequencing on a 1,977-gene panel. Copy number information was extracted using CopywriteR, and example regions of the resulting copy number profiles are presented, with segmentation values (CBS) depicted in red. (C, right panel) Segmentation values were plotted as a heatmap for the pre/post-treatment pairs.
