Absolute quantification of somatic DNA alterations in human cancer - PubMed

Kristian Cibulskis, Elena Helman, Aaron McKenna, Hui Shen, Travis Zack, Peter W Laird, Robert C Onofrio, Wendy Winckler, Barbara A Weir, Rameen Beroukhim, David Pellman, Douglas A Levine, Eric S Lander, Matthew Meyerson, Gad Getz

PMID: 22544022
PMCID: PMC4383288
DOI: 10.1038/nbt.2203

Absolute quantification of somatic DNA alterations in human cancer

Scott L Carter et al. Nat Biotechnol. 2012 May.

Abstract

We describe a computational method that infers tumor purity and malignant cell ploidy directly from analysis of somatic DNA alterations. The method, named ABSOLUTE, can detect subclonal heterogeneity and somatic homozygosity, and it can calculate statistical sensitivity for detection of specific aberrations. We used ABSOLUTE to analyze exome sequencing data from 214 ovarian carcinoma tumor-normal pairs. This analysis identified both pervasive subclonal somatic point-mutations and a small subset of predominantly clonal and homozygous mutations, which were overrepresented in the tumor suppressor genes TP53 and NF1 and in a candidate tumor suppressor gene CDK12. We also used ABSOLUTE to infer absolute allelic copy-number profiles from 3,155 diverse cancer specimens, revealing that genome-doubling events are common in human cancer, likely occur in cells that are already aneuploid, and influence pathways of tumor progression (for example, with recessive inactivation of NF1 being less common after genome doubling). ABSOLUTE will facilitate the design of clinical sequencing studies and studies of cancer genome evolution and intra-tumor heterogeneity.

Figures

Figure 1. Overview of tumor DNA analysis using ABSOLUTE

A constant mass of DNA is extracted from a heterogeneous cell population consisting of cancer and normal cells. This DNA is profiled using either microarray or massively parallel sequencing technology, giving a genome-wide profile of DNA concentrations (blue lines). ABSOLUTE uses statistical models of recurrent cancer karyotypes to interpret the DNA concentrations as discrete copy states corresponding to predominantly clonal somatic copy number alterations, although some subclonal alterations are often present. If somatic point mutation data are available (from sequencing of the DNA), then the allelic fractions (fraction of sequencing reads bearing the non-reference allele) of these mutations may be used to help interpret the DNA concentrations. In addition, the allelic fractions may be reinterpreted as integer allelic copies per cancer cell (multiplicity), potentially revealing subclonal point mutations.
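The link between measured DNA concentrations and integer copy states can be illustrated with a minimal sketch. It assumes the standard two-component mixture relationship (cancer cells at a given purity and ploidy, normal cells with two copies); the function and parameter names are illustrative and are not taken from the ABSOLUTE software, and the paper's full statistical model is not reproduced here.

```python
def expected_copy_ratio(q, purity, ploidy):
    """Expected relative DNA concentration of a segment.

    q      -- integer copy number of the segment in the cancer cells
    purity -- fraction of cells in the sample that are cancer cells
    ploidy -- average copy number across the cancer genome
    Assumption: normal cells carry exactly 2 copies everywhere (autosomes).
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average copies per cell across the whole mixed sample.
    average_copies = purity * ploidy + 2.0 * (1.0 - purity)
    # Copies per cell at this segment, averaged over cancer and normal cells.
    segment_copies = purity * q + 2.0 * (1.0 - purity)
    return segment_copies / average_copies


def copy_number_from_ratio(ratio, purity, ploidy):
    """Invert expected_copy_ratio: recover the (possibly non-integer) copy number."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    average_copies = purity * ploidy + 2.0 * (1.0 - purity)
    return (ratio * average_copies - 2.0 * (1.0 - purity)) / purity
```

Under these assumptions, a copy number that lands far from an integer after inversion is one hint that a segment may be subclonal, or that the assumed purity and ploidy are wrong.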

Figure 2. ABSOLUTE method validation and comparison

a-d, Performance of ABSOLUTE and ASCAT on 4 validation assays. RMSE: root mean squared error. _P_-values were calculated on the squared errors using the paired one-sided Wilcoxon test (*: _P_ < 0.05, **: _P_ < 0.001). See Supplementary Note 1 for the ASCAT2.1 protocol. a, FACS-based ploidy measurements vs. inferred ploidy estimates for 37 primary tumor samples. Dashed line indicates _y_=x. b, SKY-based ploidy measurements vs. inferred ploidy estimates for 33 cancer cell lines. Data are displayed as in a. c, Estimated purity of the 33 cell lines shown in (b). Dashed horizontal line indicates the true purity (1.0). d, Cancer-normal DNA mixing experiment results for two cell lines. DNA from each cancer cell line was mixed with DNA from the matched B-lymphocyte in varying proportions (_x_-axis). (top) Predicted vs. true DNA mixing fractions compared to the _y_=x line (dashed). (bottom) Predicted cancer cell-line ploidy vs. mixture purity. The copy profiles of several samples were misinterpreted (x's); these points were not included in the RMSE calculations. Ploidy estimates were generally consistent with previous SKY analysis of these cell lines:

http://www.path.cam.ac.uk/∼pawefish/cell%20line%20catalogues/breast-cell-lines.htm

e, Leukocyte methylation signature enrichment in tumors of histologically underestimated purity. HGS-OvCa samples are shown grouped according to the indicated histological purity estimates (_x_-axis). Black horizontal lines indicate the median purity of each group, as estimated by ABSOLUTE (_y_-axis). The color of each point corresponds to the degree to which that sample's methylation profile resembled that of purified leukocytes (Online Methods).
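The RMSE comparison described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the `exclude` parameter (used to drop samples whose copy profile was misinterpreted, as in panel d) are assumptions made for the example.

```python
import math


def squared_errors(predicted, truth, exclude=()):
    """Per-sample squared errors, skipping the indices listed in `exclude`."""
    skip = set(exclude)
    return [
        (p - t) ** 2
        for i, (p, t) in enumerate(zip(predicted, truth))
        if i not in skip
    ]


def rmse(predicted, truth, exclude=()):
    """Root mean squared error over the samples that were not excluded."""
    errors = squared_errors(predicted, truth, exclude)
    if not errors:
        raise ValueError("no samples left after exclusion")
    return math.sqrt(sum(errors) / len(errors))
```

The two per-sample squared-error lists (one per method, same samples in the same order) are the paired values on which a one-sided paired Wilcoxon test would then be run, for instance with a statistics library.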

Figure 3. Pan-cancer application of ABSOLUTE

a, ABSOLUTE result types: (i) ‘called’ – unique purity/ploidy solution; (ii) ‘non-aberrant’ – sample has no detectable somatic copy-number alterations; (iii) ‘insufficient purity’ – insufficient fraction of cancer cells; (iv) ‘polygenomic’ – discrete copy-ratio levels could not be determined. See Online Methods and Supplementary Fig. 5 for a description and examples of each result type. b, Distribution of estimated tumor purity for several datasets. The number of called tumor samples in each group is shown in parentheses. We note that, because heavily contaminated tumors are difficult to call using ABSOLUTE, several of these distributions are biased towards higher purity samples. c, The number of called tumor samples in each group is shown in parentheses. Because tumors without SCNAs cannot be called using ABSOLUTE, these distributions do not incorporate the prevalence of such samples.

Figure 4. Characterization of subclonal evolution in ovarian cancer by integrative analysis of SNP array and whole-exome sequencing data

a, Histogram of allelic fraction (alternate/total read-count) values for 29,628 somatic point-mutations detected in 214 primary HGS-OvCa samples. b, Allelic fractions for the mutations shown in (a) were converted to point estimates of integer allele-counts per cancer cell (cellular multiplicity; _x_-axis) by correcting for sample purity and local copy-numbers. Subclonal mutations were identified using the model defined in equation 10. c, The fraction of each of the 6 distinguishable nucleotide substitutions for clonal vs. subclonal point-mutations. The solid grey line indicates _y_=x. RMSE: root mean squared error. d-f, Analysis of distinct subclonal populations in HGS-OvCa sample TCGA-24-1603 (purity = 0.96, ploidy = 1.75). d, Tumor SCNA profile with modeled absolute copy-numbers, as in Supplementary Fig. 1c,h. Regions of normal homologous copy-number = 1 are grayed out, clonal SCNAs are brown. Subclonal SCNAs (light blue) appear in several clusters (arrows). e, Point mutation allelic-fraction profile. Each solid curve corresponds to a single mutation, with the density according to the posterior (Beta) distribution implied by the observed allelic fraction and local read depth (Online Methods, Eq. 12). Color indicates degree of classification as clonal or subclonal, as in (b). Dashed curves indicate summed density of individual posteriors. f, SCNAs from (d) and point mutations from (e) were rescaled to units of cancer cell fraction. Subclonal cancer cell fractions of ∼0.2, 0.3, and 0.6 are supported both by SCNAs and point mutations (purple, blue, and orange arrows, respectively; see corresponding copy ratios in d).
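The purity and copy-number correction mentioned in panel b can be sketched as below. The formula is the usual mixture relationship and is assumed here rather than quoted from the paper's equations; the function names, the uniform Beta(1, 1) prior, and the example mutation are hypothetical.

```python
def multiplicity(allelic_fraction, purity, local_copy_number):
    """Point estimate of mutant allele copies per cancer cell.

    Assumption: normal cells contribute 2 reference copies at the locus,
    and cancer cells carry `local_copy_number` total copies there.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies of the locus per cell in the mixed sample.
    copies_per_cell = purity * local_copy_number + 2.0 * (1.0 - purity)
    return allelic_fraction * copies_per_cell / purity


def beta_posterior_params(alt_reads, ref_reads):
    """Beta posterior on the allelic fraction under an assumed uniform prior."""
    return alt_reads + 1, ref_reads + 1


# Hypothetical mutation in a sample with purity 0.96 (as for TCGA-24-1603):
# allelic fraction 0.24 at a locus with 2 copies in the cancer cells.
m = multiplicity(0.24, 0.96, 2)
```

In this made-up case the estimate is about 0.5 copies per cancer cell, well below 1, which is the kind of value that would point to a subclonal mutation present in roughly half of the cancer cells.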

Figure 5. Classification of somatic mutations by multiplicity analysis in 214 primary HGS-OvCa tumor samples

a, Empirical density estimate of allelic concentration-ratios, which are obtained by multiplication of the allelic fraction by the copy-ratio at that locus. b, Density estimate of allelic multiplicity estimates, as in Fig. 4b, for reference vs. mutant allele. Mutations were classified into the four indicated categories according to their mutant and reference allele multiplicity. c, The density estimates of allelic concentration-ratios for each of the four mutation classes in b are shown superimposed. d, Mutation classification profiles of genes identified as significantly recurrent in HGS-OvCa, as well as several COSMIC genes with previously observed mutations in these data. Note that only individual point mutations were considered here; the possibility of recessive inactivation via multiple events (compound heterozygosity) was not considered. Histograms of gene classification fractions for 1412 genes having at least 5 recurrent mutations. Dashed vertical lines denote the 5th (top) and 95th (other) percentiles of each distribution. No mutations occurring at multiplicity > 1 were observed in NF1 (not shown).
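The allelic concentration-ratio in panel a is a simple product, and a small numeric sketch shows why it is useful: under the assumed two-component mixture model, the local copy number cancels out and the product depends only on the mutant multiplicity, purity, and ploidy. All values below are hypothetical and the function name is illustrative.

```python
def allelic_concentration_ratio(allelic_fraction, copy_ratio):
    """Allelic fraction multiplied by the copy ratio at the same locus."""
    return allelic_fraction * copy_ratio


# Hypothetical locus: purity 0.8, ploidy 3.0, 4 total copies in cancer cells,
# 2 of which carry the mutation (multiplicity 2).
purity, ploidy = 0.8, 3.0
total_copies, mutant_copies = 4, 2

average_copies = purity * ploidy + 2.0 * (1.0 - purity)      # whole-sample average
locus_copies = purity * total_copies + 2.0 * (1.0 - purity)  # average at this locus

allelic_fraction = purity * mutant_copies / locus_copies
copy_ratio = locus_copies / average_copies

acr = allelic_concentration_ratio(allelic_fraction, copy_ratio)
# Under the assumed model, acr equals purity * mutant_copies / average_copies.
```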

Figure 6. Incidence and timing of whole genome doubling events in primary cancers

a, b, Ploidy estimates were obtained from ABSOLUTE. Mean homologue imbalance was calculated as the average difference in the homologous copy numbers at every position in the genome. Genome doubling status was inferred from the homologous copy numbers (Online Methods, Supplementary Fig. 9). c, MPD – myeloproliferative disease, ALL – acute lymphoblastic leukemia, GBM – glioblastoma multiforme, RCC – renal cell carcinoma, HCC – hepatocellular carcinoma, HGS-OvCa – high-grade serous ovarian carcinoma. d, LOH (loss of heterozygosity) was defined as 0 allelic copies. Amplification was defined as > 1 allelic copy for samples with 0 genome doublings, and as > 2 allelic copies for those with 1 genome doubling. Calls were made based on the modal allelic copy numbers of each chromosome arm. Dashed lines indicate _y_=x. e, SCNAs, defined as regions differing from the modal absolute copy number of each sample, were binned at adaptive resolution to maintain 200 SCNAs per bin, and renormalized by bin length. The value in each bin was further divided by the number of tumor samples in each genome doubling class, indicated by color as in a. The black line indicates slope = −1. Linear regression models were fit independently for each class using SCNAs 0.5 < _x_ < 20 Mb. This resulted in fitted slope values of −1.05, −0.96, and −0.88 for 0, 1, and > 1 genome doublings, respectively (not shown).
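The definitions in panels a, b and d translate directly into code. The sketch below is a minimal reading of the caption: the length-weighted averaging, the segment tuple layout, the function names, and the "neutral" label for arms that are neither LOH nor amplified are assumptions made for illustration.

```python
def mean_homologue_imbalance(segments):
    """Length-weighted average absolute difference between homologue copy numbers.

    segments -- iterable of (length_bp, copies_homologue_a, copies_homologue_b)
    """
    segments = list(segments)
    total_length = sum(length for length, _, _ in segments)
    if total_length <= 0:
        raise ValueError("segments must cover a positive length")
    weighted = sum(length * abs(a - b) for length, a, b in segments)
    return weighted / total_length


def classify_arm(allelic_copies, genome_doublings):
    """Classify the modal allelic copy number of one chromosome arm.

    LOH: 0 allelic copies. Amplification: > 1 copy with 0 genome doublings,
    > 2 copies with 1 genome doubling (thresholds as stated in the caption).
    """
    if genome_doublings not in (0, 1):
        raise ValueError("thresholds are only stated for 0 or 1 genome doublings")
    if allelic_copies == 0:
        return "LOH"
    threshold = 1 if genome_doublings == 0 else 2
    return "amplification" if allelic_copies > threshold else "neutral"
```

Raising the amplification threshold after a genome doubling keeps the call relative to the expected baseline of 2 copies per allele in a doubled genome, rather than counting every doubled arm as amplified.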

Figure 7. Genetic and clinical associations with genome doubling in primary HGS-OvCa samples

a-e, Colors correspond to putative genome doubling status, as indicated. Significance codes: ** – _P_ < 10⁻⁵, * – _P_ < 0.05, NS – _P_ > 0.05. a-c, Number of mutations in indicated classes as a function of genome doublings. _P_-values were calculated with the two-sided Wilcoxon rank-sum test comparing samples with 0 and 1 genome doublings. Error bars indicate standard errors of the means. d, _P_-values were calculated with the two-sided Wilcoxon rank-sum test. e, _P_-values were calculated using the log-rank test.

Comment in

Genetics: Understanding the ABSOLUTE genome.
Razzak M. Nat Rev Clin Oncol. 2012 May 15;9(7):370. doi: 10.1038/nrclinonc.2012.86. PMID: 22585000. No abstract available.
ABSOLUTE cancer genomics.
Van Loo P, Campbell PJ. Nat Biotechnol. 2012 Jul 10;30(7):620-1. doi: 10.1038/nbt.2293. PMID: 22781683. Free PMC article.

Cited by

Aneuploidy as a driver of human cancer.
Sdeor E, Okada H, Saad R, Ben-Yishay T, Ben-David U. Nat Genet. 2024 Oct 2. doi: 10.1038/s41588-024-01916-2. Online ahead of print. PMID: 39358600. Review.
Prognostic and therapeutic implications of tumor-restrictive type III collagen in the breast cancer microenvironment.
Stewart DC, Brisson BK, Dekky B, Berger AC, Yen W, Mauldin EA, Loebel C, Gillette D, Assenmacher CA, Quincey C, Stefanovski D, Cristofanilli M, Cukierman E, Burdick JA, Borges VF, Volk SW. NPJ Breast Cancer. 2024 Oct 2;10(1):86. doi: 10.1038/s41523-024-00690-y. PMID: 39358397. Free PMC article.
Proteogenomic characterization of skull-base chordoma.
Zhang Q, Xu Z, Han R, Wang Y, Ye Z, Zhu J, Cai Y, Zhang F, Zhao J, Yao B, Qin Z, Qiao N, Huang R, Feng J, Wang Y, Rui W, He F, Zhao Y, Ding C. Nat Commun. 2024 Sep 27;15(1):8338. doi: 10.1038/s41467-024-52285-7. PMID: 39333076. Free PMC article.
Prediction of the 3D cancer genome from whole-genome sequencing using InfoHiC.
Lee Y, Park SH, Lee H. Mol Syst Biol. 2024 Sep 25. doi: 10.1038/s44320-024-00065-2. Online ahead of print. PMID: 39322849.
An elevated rate of whole-genome duplications in cancers from Black patients.
Brown LM, Hagenson RA, Koklič T, Urbančič I, Qiao L, Strancar J, Sheltzer JM. Nat Commun. 2024 Sep 19;15(1):8218. doi: 10.1038/s41467-024-52554-5. PMID: 39300140. Free PMC article.

