Absolute quantification of somatic DNA alterations in human cancer

Scott L Carter, Kristian Cibulskis, Elena Helman, Aaron McKenna, Hui Shen, Travis Zack, Peter W Laird, Robert C Onofrio, Wendy Winckler, Barbara A Weir, Rameen Beroukhim, David Pellman, Douglas A Levine, Eric S Lander, Matthew Meyerson, Gad Getz

Scott L Carter et al. Nat Biotechnol. 2012 May.

Abstract

We describe a computational method that infers tumor purity and malignant cell ploidy directly from analysis of somatic DNA alterations. The method, named ABSOLUTE, can detect subclonal heterogeneity and somatic homozygosity, and it can calculate statistical sensitivity for detection of specific aberrations. We used ABSOLUTE to analyze exome sequencing data from 214 ovarian carcinoma tumor-normal pairs. This analysis identified both pervasive subclonal somatic point-mutations and a small subset of predominantly clonal and homozygous mutations, which were overrepresented in the tumor suppressor genes TP53 and NF1 and in a candidate tumor suppressor gene CDK12. We also used ABSOLUTE to infer absolute allelic copy-number profiles from 3,155 diverse cancer specimens, revealing that genome-doubling events are common in human cancer, likely occur in cells that are already aneuploid, and influence pathways of tumor progression (for example, with recessive inactivation of NF1 being less common after genome doubling). ABSOLUTE will facilitate the design of clinical sequencing studies and studies of cancer genome evolution and intra-tumor heterogeneity.
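
The abstract notes that ABSOLUTE can calculate statistical sensitivity for detecting specific aberrations. The sketch below illustrates one such calculation and is not the paper's procedure. It rests on three assumptions: cancer cells (fraction `purity`) and diploid normal cells are mixed; a clonal mutation's alternate reads follow a binomial distribution; and a variant caller needs at least `min_alt_reads` supporting reads. `min_alt_reads` is a hypothetical threshold, and all function names are illustrative.

```python
from math import comb


def expected_allelic_fraction(purity: float, local_copy: int, multiplicity: int = 1) -> float:
    """Expected fraction of reads carrying a clonal mutation.

    Assumes cancer cells (fraction `purity`) carry `local_copy` total copies of
    the locus, `multiplicity` of them mutant, and normal cells are diploid.
    """
    return purity * multiplicity / (purity * local_copy + 2 * (1 - purity))


def detection_sensitivity(depth: int, allelic_fraction: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` alternate reads), alt reads ~ Binomial(depth, allelic_fraction).

    `min_alt_reads` is a hypothetical caller threshold, not an ABSOLUTE parameter.
    """
    p_fewer = sum(
        comb(depth, k) * allelic_fraction**k * (1 - allelic_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Example: a heterozygous clonal mutation in a 40%-pure diploid region at 50x coverage.
af = expected_allelic_fraction(purity=0.4, local_copy=2)
print(f"expected AF = {af:.2f}, sensitivity = {detection_sensitivity(50, af, 3):.3f}")
```

Lower purity or higher local copy number lowers the expected allelic fraction. At fixed coverage, that lowers the chance of detecting the mutation, so purity estimates feed directly into study design.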

Figures

Figure 1. Overview of tumor DNA analysis using ABSOLUTE

A constant mass of DNA is extracted from a heterogeneous cell population consisting of cancer and normal cells. This DNA is profiled using either microarray or massively parallel sequencing technology, giving a genome-wide profile of DNA concentrations (blue lines). ABSOLUTE uses statistical models of recurrent cancer karyotypes to interpret the DNA concentrations as discrete copy states corresponding to predominantly clonal somatic copy-number alterations, although some subclonal alterations are often present. If somatic point mutation data are available (from sequencing of the DNA), then the allelic fractions of these mutations (the fraction of sequencing reads bearing the non-reference allele) may be used to help interpret the DNA concentrations. In addition, the allelic fractions may be reinterpreted as integer allelic copies per cancer cell (multiplicity), potentially revealing subclonal point mutations.
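
The caption describes mapping relative DNA concentrations onto discrete copy states. The sketch below uses a two-component mixture relation, which is assumed here rather than quoted from the caption. Let α be purity, τ the cancer-cell ploidy, and D = ατ + 2(1−α) the mean copies per cell. A clonal region with q copies per cancer cell then has expected copy ratio R(q) = (αq + 2(1−α))/D. The levels are evenly spaced by α/D with offset 2(1−α)/D, so a candidate (α, τ) fixes where every integer state should fall. ABSOLUTE additionally scores candidate solutions against karyotype models, and this sketch does not. All function names are hypothetical.

```python
def average_ploidy(purity: float, ploidy: float) -> float:
    """D: mean DNA copies per cell in a mix of cancer cells and diploid normal cells."""
    return purity * ploidy + 2 * (1 - purity)


def expected_copy_ratio(q: int, purity: float, ploidy: float) -> float:
    """Copy ratio (relative to the sample mean) for a clonal region with q copies per cancer cell."""
    return (purity * q + 2 * (1 - purity)) / average_ploidy(purity, ploidy)


def infer_copy_number(ratio: float, purity: float, ploidy: float) -> float:
    """Invert expected_copy_ratio: continuous absolute copy number implied by an observed ratio."""
    return (ratio * average_ploidy(purity, ploidy) - 2 * (1 - purity)) / purity


def call_segments(ratios, purity, ploidy):
    """Snap each segment ratio to the nearest non-negative integer copy state.

    Returns (integer state, residual) pairs; large residuals hint at subclonal
    alterations or at a wrong (purity, ploidy) candidate.
    """
    calls = []
    for r in ratios:
        q_hat = infer_copy_number(r, purity, ploidy)
        q = max(0, round(q_hat))
        calls.append((q, q_hat - q))
    return calls


levels = [expected_copy_ratio(q, purity=0.7, ploidy=3.1) for q in range(5)]
print(call_segments(levels, purity=0.7, ploidy=3.1))
```

Because spacing and offset together determine α and D, two clearly resolved levels are in principle enough to pin down purity and ploidy under this model. Ambiguity arises when few levels are observed.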

Figure 2. ABSOLUTE method validation and comparison. a-d, Performance of ABSOLUTE and ASCAT on four validation assays.

RMSE: root mean squared error. _P_-values were calculated on the squared errors using the paired one-sided Wilcoxon test (*: _P_ < 0.05, **: _P_ < 0.001). See Supplementary Note 1 for the ASCAT2.1 protocol. a, FACS-based ploidy measurements vs. inferred ploidy estimates for 37 primary tumor samples. Dashed line indicates _y_=_x_. b, SKY-based ploidy measurements vs. inferred ploidy estimates for 33 cancer cell lines. Data are displayed as in (a). c, Estimated purity of the 33 cell lines shown in (b). Dashed horizontal line indicates the true purity (1.0). d, Cancer-normal DNA mixing experiment results for two cell lines. DNA from each cancer cell line was mixed with DNA from the matched B-lymphocyte line in varying proportions (_x_-axis). (top) Predicted vs. true DNA mixing fractions compared to the _y_=_x_ line (dashed). (bottom) Predicted cancer cell-line ploidy vs. mixture purity. The copy profiles of several samples were misinterpreted (x's); these points were not included in the RMSE calculations. Ploidy estimates were generally consistent with previous SKY analysis of these cell lines (http://www.path.cam.ac.uk/~pawefish/cell%20line%20catalogues/breast-cell-lines.htm). e, Leukocyte methylation signature enrichment in tumors of histologically underestimated purity. HGS-OvCa samples are shown grouped according to the indicated histological purity estimates (_x_-axis). Black horizontal lines indicate the median purity of each group, as estimated by ABSOLUTE (_y_-axis). The color of each point corresponds to the degree to which that sample's methylation profile resembled that of purified leukocytes (Online Methods).
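
Panel d plots DNA mixing fractions, whereas purity is a fraction of cells. If normal cells are assumed diploid and the cancer line has ploidy τ, a cell fraction α corresponds to a DNA fraction m = ατ/(ατ + 2(1−α)), with inverse α = 2m/(τ(1−m) + 2m). Whether the figure uses exactly this conversion is our assumption. The RMSE reported in panels a-d is the square root of the mean squared error. The function names below are hypothetical.

```python
from math import sqrt


def dna_fraction_from_purity(purity: float, ploidy: float) -> float:
    """Fraction of total DNA contributed by cancer cells (normal cells assumed diploid)."""
    cancer_dna = purity * ploidy
    return cancer_dna / (cancer_dna + 2 * (1 - purity))


def purity_from_dna_fraction(dna_fraction: float, ploidy: float) -> float:
    """Inverse of dna_fraction_from_purity for a cancer genome of the given ploidy."""
    return 2 * dna_fraction / (ploidy * (1 - dna_fraction) + 2 * dna_fraction)


def rmse(predicted, observed) -> float:
    """Root mean squared error between paired estimates and reference values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))


# A near-tetraploid line contributes more DNA per cell than diploid normal cells,
# so equal cell numbers give a DNA fraction above one half.
print(round(dna_fraction_from_purity(0.5, 4.0), 3))
```

For the paired one-sided comparison of squared errors, a standard implementation such as `scipy.stats.wilcoxon(errors_a, errors_b, alternative="less")` could be applied to the per-sample squared errors of the two methods.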

Figure 3. Pan-cancer application of ABSOLUTE

a, ABSOLUTE result types: (i) ‘called’ -- unique purity/ploidy solution; (ii) ‘non-aberrant’ -- sample has no detectable somatic copy-number alterations; (iii) ‘insufficient purity’ -- insufficient fraction of cancer cells; (iv) ‘polygenomic’ -- discrete copy-ratio levels could not be determined. See Online Methods and Supplementary Fig. 5 for a description and examples of each result type. b, Distribution of estimated tumor purity for several datasets. The number of called tumor samples in each group is shown in parentheses. Because heavily contaminated tumors are difficult to call using ABSOLUTE, several of these distributions are biased toward higher-purity samples. c, The number of called tumor samples in each group is shown in parentheses. Because tumors without SCNAs cannot be called using ABSOLUTE, these distributions do not reflect the prevalence of such samples.

Figure 4. Characterization of subclonal evolution in ovarian cancer by integrative analysis of SNP array and whole-exome sequencing data

a, Histogram of allelic fraction (alternate/total read count) values for 29,628 somatic point mutations detected in 214 primary HGS-OvCa samples. b, Allelic fractions for the mutations shown in (a) were converted to point estimates of integer allele counts per cancer cell (cellular multiplicity; _x_-axis) by correcting for sample purity and local copy number. Subclonal mutations were identified using the model defined in equation 10. c, The fraction of each of the 6 distinguishable nucleotide substitutions for clonal vs. subclonal point mutations. The solid grey line indicates _y_=_x_. RMSE: root mean squared error. d-f, Analysis of distinct subclonal populations in HGS-OvCa sample TCGA-24-1603 (purity = 0.96, ploidy = 1.75). d, Tumor SCNA profile with modeled absolute copy numbers, as in Supplementary Fig. 1c,h. Regions of normal homologous copy number (= 1) are grayed out; clonal SCNAs are brown. Subclonal SCNAs (light blue) appear in several clusters (arrows). e, Point mutation allelic-fraction profile. Each solid curve corresponds to a single mutation, with the density given by the posterior (Beta) distribution implied by the observed allelic fraction and local read depth (Online Methods, Eq. 12). Color indicates the degree of classification as clonal or subclonal, as in (b). Dashed curves indicate the summed density of the individual posteriors. f, SCNAs from (d) and point mutations from (e) were rescaled to units of cancer cell fraction. Subclonal cancer cell fractions of ~0.2, 0.3, and 0.6 are supported by both SCNAs and point mutations (purple, blue, and orange arrows, respectively; see corresponding copy ratios in d).
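
Panels b, e and f rescale allelic fractions by purity and local copy number. The sketch below assumes that cancer cells (purity α, q local copies) are mixed with diploid normal cells. Under that assumption, the mutant copies per cancer cell are s = f·(αq + 2(1−α))/α for an observed allelic fraction f. Values well below 1 suggest a subclonal mutation, and for such mutations the same rescaled value can be read as a cancer cell fraction, as in panel f. For the per-mutation curves in panel e, we assume a uniform Beta(1, 1) prior on f, which gives a Beta(alt + 1, ref + 1) posterior. The exact form of Eq. 12 in the Online Methods may differ. All function names are hypothetical.

```python
from math import exp, lgamma, log


def purity_scale(purity: float, local_copy: int) -> float:
    """Factor converting an allelic fraction into mutant copies per cancer cell.

    Assumes cancer cells (fraction `purity`) carry `local_copy` copies of the
    locus and normal cells are diploid.
    """
    return (purity * local_copy + 2 * (1 - purity)) / purity


def multiplicity(alt_reads: int, ref_reads: int, purity: float, local_copy: int) -> float:
    """Point estimate of mutant copies per cancer cell from read counts."""
    allelic_fraction = alt_reads / (alt_reads + ref_reads)
    return allelic_fraction * purity_scale(purity, local_copy)


def beta_logpdf(x: float, a: float, b: float) -> float:
    """Log density of a Beta(a, b) distribution; -inf outside (0, 1)."""
    if not 0.0 < x < 1.0:
        return float("-inf")
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + (a - 1) * log(x) + (b - 1) * log(1 - x))


def multiplicity_density(s: float, alt_reads: int, ref_reads: int,
                         purity: float, local_copy: int) -> float:
    """Posterior density of multiplicity s.

    Assumes a uniform Beta(1, 1) prior on the allelic fraction, so the posterior
    is Beta(alt + 1, ref + 1); the change of variables s = k * f divides the
    density by k.
    """
    k = purity_scale(purity, local_copy)
    return exp(beta_logpdf(s / k, alt_reads + 1, ref_reads + 1)) / k


# A mutation supported by 10 of 100 reads in an 80%-pure diploid region:
print(round(multiplicity(10, 90, purity=0.8, local_copy=2), 3))
```

This mutation's multiplicity is about 0.25, well below 1. Under the assumptions above, that points to a subclonal mutation present in roughly a quarter of cancer cells. Summing `multiplicity_density` over all mutations in a sample would give a curve analogous to the dashed summed density in panel e.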

Figure 5. Classification of somatic mutations by multiplicity analysis in 214 primary HGS-OvCa tumor samples

a, Empirical density estimate of allelic concentration ratios, which are obtained by multiplying the allelic fraction by the copy ratio at that locus. b, Density estimate of allelic multiplicity estimates, as in Fig. 4b, for reference vs. mutant alleles. Mutations were classified into the four indicated categories according to their mutant and reference allele multiplicity. c, Density estimates of allelic concentration ratios for each of the four mutation classes in (b), shown superimposed. d, Mutation classification profiles of genes identified as significantly recurrent in HGS-OvCa, as well as several COSMIC genes with previously observed mutations in these data. Note that only individual point mutations were considered here; the possibility of recessive inactivation via multiple events (compound heterozygosity) was not considered. Histograms of gene classification fractions for 1412 genes having at least 5 recurrent mutations. Dashed vertical lines denote the 5th (top) and 95th (other) percentiles of each distribution. No mutations occurring at multiplicity > 1 were observed in NF1 (not shown).
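
A short derivation shows why multiplying the allelic fraction by the copy ratio is useful. The derivation assumes the same mixture of cancer cells (purity α, ploidy τ) and diploid normal cells, with D = ατ + 2(1−α). Under that assumption, a mutation with s mutant copies at a locus with q copies has expected allelic fraction αs/(αq + 2(1−α)) and copy ratio (αq + 2(1−α))/D. Their product is αs/D, so the local copy number cancels and mutations with equal multiplicity share one concentration ratio across loci. The reference-allele multiplicity is taken here as q − s, where 0 indicates a homozygous mutation. The helper names below are hypothetical.

```python
def allelic_concentration_ratio(allelic_fraction: float, copy_ratio: float) -> float:
    """Allelic fraction multiplied by the copy ratio at the mutated locus."""
    return allelic_fraction * copy_ratio


def expected_af_and_ratio(purity: float, ploidy: float, local_copy: int, mutant_copies: int):
    """Expected allelic fraction and copy ratio under a cancer + diploid-normal mixture (assumption)."""
    normal_copies = 2 * (1 - purity)
    locus_copies = purity * local_copy + normal_copies   # mean copies of this locus per cell
    average_ploidy = purity * ploidy + normal_copies     # D: mean copies genome-wide per cell
    af = purity * mutant_copies / locus_copies
    ratio = locus_copies / average_ploidy
    return af, ratio


def reference_multiplicity(local_copy: int, mutant_copies: int) -> int:
    """Copies per cancer cell still carrying the reference allele; 0 means homozygous mutant."""
    return local_copy - mutant_copies


# The concentration ratio stays at purity / D whatever the local copy number.
for q in range(1, 6):
    af, ratio = expected_af_and_ratio(purity=0.7, ploidy=3.0, local_copy=q, mutant_copies=1)
    print(q, round(allelic_concentration_ratio(af, ratio), 4))
```

Every line of output shows the same value, α/D ≈ 0.2593. In observed data, noise and subclonality spread these values into the peaks seen in panel a.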

Figure 6. Incidence and timing of whole genome doubling events in primary cancers

a, b, Ploidy estimates were obtained from ABSOLUTE. Mean homologue imbalance was calculated as the average difference in the homologous copy numbers at every position in the genome. Genome doubling status was inferred from the homologous copy numbers (Online Methods, Supplementary Fig. 9). c, MPD – myeloproliferative disease, ALL – acute lymphoblastic leukemia, GBM – glioblastoma multiforme, RCC – renal cell carcinoma, HCC – hepatocellular carcinoma, HGS-OvCa – high-grade serous ovarian carcinoma. d, LOH (loss of heterozygosity) was defined as 0 allelic copies. Amplification was defined as > 1 allelic copy for samples with 0 genome doublings, and as > 2 allelic copies for those with 1 genome doubling. Calls were made based on the modal allelic copy numbers of each chromosome arm. Dashed lines indicate _y_=_x_. e, SCNAs, defined as regions differing from the modal absolute copy number of each sample, were binned at adaptive resolution to maintain 200 SCNAs per bin, and renormalized by bin length. The value in each bin was further divided by the number of tumor samples in each genome doubling class, indicated by color as in (a). The black line indicates slope = −1. Linear regression models were fit independently for each class using SCNAs with lengths 0.5 < _x_ < 20 Mb. This resulted in fitted slope values of −1.05, −0.96, and −0.88 for 0, 1, and > 1 genome doublings, respectively (not shown).
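
The sketch below works from allele-specific segments and makes several assumptions. Ploidy is taken as the length-weighted mean total copy number. Mean homologue imbalance is the length-weighted mean of major minus minor homologue copies. Arm-level calls follow the thresholds stated for panel d, applied to each arm's modal allelic state by covered length. The `Segment` record, its fields, and all function names are hypothetical. The caption gives no amplification threshold for more than one doubling, so the sketch refuses that case rather than guess.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    arm: str     # chromosome arm label such as "1p" (hypothetical format)
    length: int  # segment length in bases
    minor: int   # absolute copies of the less abundant homologue
    major: int   # absolute copies of the more abundant homologue


def weighted_mean(values_and_weights) -> float:
    total = sum(w for _, w in values_and_weights)
    return sum(v * w for v, w in values_and_weights) / total


def ploidy(segments) -> float:
    """Length-weighted mean total copy number."""
    return weighted_mean([(s.minor + s.major, s.length) for s in segments])


def mean_homologue_imbalance(segments) -> float:
    """Length-weighted mean difference between homologous copy numbers."""
    return weighted_mean([(s.major - s.minor, s.length) for s in segments])


def arm_calls(segments, genome_doublings: int):
    """LOH / amplification calls from each arm's modal (minor, major) state."""
    if genome_doublings not in (0, 1):
        raise ValueError("amplification threshold only stated for 0 or 1 doublings")
    amp_threshold = 1 if genome_doublings == 0 else 2
    coverage = defaultdict(lambda: defaultdict(int))
    for s in segments:
        coverage[s.arm][(s.minor, s.major)] += s.length
    calls = {}
    for arm, states in coverage.items():
        minor, major = max(states, key=states.get)  # modal allelic state by length
        calls[arm] = {"LOH": minor == 0, "amplified": major > amp_threshold}
    return calls


segs = [Segment("1p", 100, 1, 1), Segment("1q", 60, 0, 2), Segment("1q", 40, 1, 1)]
print(ploidy(segs), mean_homologue_imbalance(segs), arm_calls(segs, genome_doublings=0))
```

In this toy input, arm 1q is called amplified without genome doubling but not after one doubling. The copy-neutral LOH state (0, 2) shows how the threshold shifts once a whole genome has been duplicated.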

Figure 7. Genetic and clinical associations with genome doubling in primary HGS-OvCa samples

a-e, Colors correspond to putative genome doubling status, as indicated. Significance codes: ** – _P_ < 10^-5, * – _P_ < 0.05, NS – _P_ > 0.05. a-c, Number of mutations in indicated classes as a function of genome doublings. _P_-values were calculated with the two-sided Wilcoxon rank-sum test comparing samples with 0 and 1 genome doublings. Error bars indicate standard errors of the means. d, _P_-values were calculated with the two-sided Wilcoxon rank-sum test. e, _P_-values were calculated using the log-rank test.
