TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data
Genome Res. 2014 Nov;24(11):1881-93.
doi: 10.1101/gr.180281.114. Epub 2014 Jul 24.
Gavin Ha, Andrew Roth, Jaswinder Khattra, Julie Ho, Damian Yap, Leah M Prentice, Nataliya Melnyk, Andrew McPherson, Ali Bashashati, Emma Laks, Justina Biele, Jiarui Ding, Alan Le, Jamie Rosner, Karey Shumansky, Marco A Marra, C Blake Gilks, David G Huntsman, Jessica N McAlpine, Samuel Aparicio, Sohrab P Shah
- PMID: 25060187
- PMCID: PMC4216928
- DOI: 10.1101/gr.180281.114
Gavin Ha et al. Genome Res. 2014 Nov.
Abstract
The evolution of cancer genomes within a single tumor creates mixed cell populations with divergent somatic mutational landscapes. Inference of tumor subpopulations has been disproportionately focused on the assessment of somatic point mutations, whereas computational methods targeting evolutionary dynamics of copy number alterations (CNA) and loss of heterozygosity (LOH) in whole-genome sequencing data remain underdeveloped. We present a novel probabilistic model, TITAN, to infer CNA and LOH events while accounting for mixtures of cell populations, thereby estimating the proportion of cells harboring each event. We evaluate TITAN on idealized mixtures, simulating clonal populations from whole-genome sequences taken from genomically heterogeneous ovarian tumor sites collected from the same patient. In addition, we show in 23 whole genomes of breast tumors that the inference of CNA and LOH using TITAN critically informs population structure and the nature of the evolving cancer genome. Finally, we experimentally validated subclonal predictions using fluorescence in situ hybridization (FISH) and single-cell sequencing from an ovarian cancer patient sample, thereby recapitulating the key modeling assumptions of TITAN.
© 2014 Ha et al.; Published by Cold Spring Harbor Laboratory Press.
Figures
Figure 1.
Detection of subclonal deletions in whole-genome sequencing data of a triple negative breast cancer genome. Copy number is represented as the log ratio of tumor and normal read depth. Discrete copy number status shown is predicted as a hemizygous deletion (HEMD; green), copy neutral (NEUT; blue), or gain/amplification (AMP; red). Allelic ratios are computed as the proportion of reads matching the reference genome. The LOH status shown is heterozygous (HET; gray), LOH (green), copy neutral LOH (NLOH; blue), or allele-specific gain/amplification (ASCNA; red). Subclonal deletions are observed to have a weaker log ratio signal that is closer to zero and shows less spreading in allelic ratios (Deletion I) compared to clonal deletions (Deletion II); the sample cellular prevalence estimates (proportion of sample) for “Deletion I” indicate it is in a subclonal cluster “Z2.” “Deletion I” and “Deletion III” are clustered into the same subclonal cluster because they share similar signals, and therefore the same cellular prevalence in the data. “Deletion II” is present in all tumor cells, indicated by being in the clonal cluster “Z1.” Tumor cellularity of 84% (normal contamination of 16%) is denoted with a black horizontal line. The average tumor ploidy (haploid coverage factor) was estimated as 1.66 by genome-wide analysis (right). The log ratio and symmetric allelic ratio (max(reference reads, variant reads)/depth) for Gaussian kernel densities are shown for all deletions on Chr 2.
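The two per-position signals named in this caption can be sketched in a few lines of Python. This is a minimal illustration, not TITAN's implementation: the function names are hypothetical, the logarithm base (2) is an assumption because the caption does not state it, and depth is assumed to be the sum of reference and variant reads.

```python
import math


def log_ratio(tumor_depth, normal_depth):
    """Log ratio of tumor to normal read depth (base 2 is an assumption)."""
    return math.log2(tumor_depth / normal_depth)


def symmetric_allelic_ratio(ref_reads, variant_reads):
    """max(reference reads, variant reads) / depth, as defined in the caption.

    Depth is assumed here to be ref_reads + variant_reads.
    """
    depth = ref_reads + variant_reads
    return max(ref_reads, variant_reads) / depth


# Hypothetical counts, for illustration only.
print(log_ratio(30, 60))                # -1.0
print(symmetric_allelic_ratio(12, 28))  # 0.7
```

Under this definition the symmetric ratio sits near 0.5 for a balanced heterozygous position and moves toward 1 as one allele is lost, which is the spreading in allelic ratios the caption refers to.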
Figure 2.
Description of the TITAN probabilistic framework. (A) Representation of the aggregate copy number signal from mixed populations in a heterogeneous tumor sample. c is the aggregate signal that is composed of three components: normal population (white circles), tumor populations with the deletion (green decagons) and without the event (blue decagons). n is the normal proportion; s_z is the tumor proportion for the zth clonal cluster that does not contain the event; c_norm and c_DEL are normal and tumor copy numbers. Therefore, (1 − s_z) corresponds to the proportion of tumor harboring the event, also defined as the tumor cellular prevalence of the zth clonal cluster. (B) Analysis workflow for TITAN. Three inputs are required: (1) heterozygous positions identified in the normal DNA predicted by genotyping tools such as SAMtools mpileup (Li et al. 2009); (2) reference counts a and read depth N are extracted at these positions from aligned reads in the tumor DNA sequence data; and (3) the tumor and normal read depths, N and N_N, are normalized independently to correct GC content and mappability biases; log ratios l = log(N/N_N) of the corrected read counts are computed. The output is the optimal sequence of CNA/LOH genotypes and clonal cluster memberships at each position. Model parameters for normal contamination n, tumor cellular prevalence s_z, and tumor ploidy φ are estimated. (C) Probabilistic graphical model of TITAN. Shaded nodes are known or observed quantities; open nodes are random variables of unknown quantities. Arrows represent conditional dependence between random variables. Full details and definitions are in Methods and Supplemental Table 13. (D) Parameter trace of ω_g,z and μ_g,z when cellular prevalence varies. s_1 and s_2 are shown as the tumor cellular prevalence (i.e., transformed using 1 − s_z). n is normal proportion and φ is average tumor ploidy. Each CNA/LOH genotype is shown (Supplemental Table 14) with the associated integer copy number in parentheses.
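The three-component mixture in panel A can be written out as a weighted sum. The sketch below is one reading of the caption rather than the paper's code: the function name is hypothetical, and it assumes that tumor cells without the event carry the normal copy number (2 by default).

```python
def aggregate_copy_number(n, s_z, c_event, c_norm=2):
    """Aggregate copy number signal from the three populations in panel A.

    n       : normal proportion of the sample
    s_z     : proportion of tumor cells that do NOT contain the event
    c_event : copy number in tumor cells that harbor the event (e.g. c_DEL)
    c_norm  : copy number in cells without the event (assumed diploid)
    """
    normal = n * c_norm
    tumor_without_event = (1 - n) * s_z * c_norm
    tumor_with_event = (1 - n) * (1 - s_z) * c_event
    return normal + tumor_without_event + tumor_with_event


# Clonal hemizygous deletion (s_z = 0) at 16% normal contamination.
print(round(aggregate_copy_number(n=0.16, s_z=0.0, c_event=1), 2))  # 1.16
# Same deletion present in only half of the tumor cells (s_z = 0.5).
print(round(aggregate_copy_number(n=0.16, s_z=0.5, c_event=1), 2))  # 1.58
```

The subclonal case lands closer to the diploid value of 2, which matches the weaker log ratio signal described for subclonal deletions in Figure 1.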
Figure 3.
Performance of TITAN in serial and merging simulations using real intratumoral samples from a high-grade serous (HGS) ovarian carcinoma. (A) Patient DG1136 had biopsies synchronously resected from four sites in the primary tumor of the right ovary and one site from the left pelvic sidewall metastasis. (B) Illustration demonstrating the expected proportions in a simulation of two tumor subpopulations. The tumor content of Sample a (80%) and Sample b (70%) informs the sample cellular prevalence in the merged Sample a + b. Events found in all samples of the mixture represent simulated clonal events. For example, the (green) deletion is present in 75% of the merged sample (or 100% of tumor cells) given that the normal proportion is 25%. Events present in a subset of samples in the mixture simulate subclonal events, such as the (red) gain unique to Sample a, which is present in 40% of the merged sample or 53% of the tumor cells. (C–F) Performance of the serial mixture experiment between TITAN, APOLLOH (Ha et al. 2012) (which includes HMMcopy), Control-FREEC (Boeva et al. 2012), and BIC-seq (Xi et al. 2011). The mixture proportion includes 0.1:0.9, 0.2:0.8,…, 0.9:0.1 relative ratios of DG1136e:DG1136g. Precision (C) and recall (D) are shown for subclonal and clonal events averaged across gains, deletions, and LOH events. Recall performance for truth events found uniquely in Sample e (E) or Sample g (F) is shown. “Mixture Proportion” is defined as the ideal mixing fractions (e.g., 10%, 20%, etc.); expected tumor “cellular prevalence” is defined as the expected tumor contribution, at a given mixture proportion, from each individual sample making up the mixture. The expected tumor cellular prevalence shown was computed by adjusting the mixture proportion for tumor content of 67% and 56% for DG1136e and DG1136g, respectively. Ground truth events were identified in the individual samples of the mixture using APOLLOH/HMMcopy, and expected tumor cellular prevalence values are shown in Supplemental Table 3B. (G,H) Serial mixture performance for TITAN runs initialized with number of clusters ranging from one to five. Recall performance for events found uniquely in DG1136e (G) or DG1136g (H) represents events that are subclonal within the simulated mixture. Average recall across deletions, gains, and LOH events is shown. The one-cluster run represents the scenario in which only one tumor population exists. (I,J) Comparison of recall performance distributions across 10 paired (I) and 10 triplet (J) merging simulations for TITAN (T), APOLLOH/HMMcopy (A), and Control-FREEC (CF). Performance is shown for simulated subclonal events, which were present uniquely in exactly one (Subclonal 1) and exactly two (Subclonal 2) samples making up the mixture; in contrast, clonally dominant events were present in all samples of the mixture (Clonal).
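The percentages quoted for panel B follow from simple bookkeeping, sketched below. The function name and the `weights` parameter are hypothetical; the sketch assumes the two samples are merged in equal proportion (which reproduces the 25% normal proportion in the caption) and that an event is carried by every tumor cell of a sample that has it.

```python
def merged_prevalence(tumor_contents, has_event, weights=None):
    """Expected prevalence of an event in a merged sample.

    tumor_contents : tumor fraction of each constituent sample
    has_event      : whether each constituent sample carries the event
    weights        : mixing fractions (assumed equal when omitted)

    Returns (fraction of merged sample, fraction of tumor cells).
    """
    k = len(tumor_contents)
    if weights is None:
        weights = [1.0 / k] * k
    tumor_fraction = sum(w * t for w, t in zip(weights, tumor_contents))
    event_fraction = sum(
        w * t for w, t, h in zip(weights, tumor_contents, has_event) if h
    )
    return event_fraction, event_fraction / tumor_fraction


# Sample a is 80% tumor, Sample b is 70% tumor.
gain = merged_prevalence([0.8, 0.7], [True, False])     # unique to Sample a
deletion = merged_prevalence([0.8, 0.7], [True, True])  # in both samples
print(round(gain[0], 2), round(gain[1], 2))          # 0.4 0.53
print(round(deletion[0], 2), round(deletion[1], 2))  # 0.75 1.0
```

Passing unequal `weights` gives the analogous calculation for a serial mixture, although the exact adjustment used for panels C–F is described in the supplement rather than in this caption.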
Figure 4.
Performance of TITAN tumor cellular prevalence estimates for serial (30×) and pairwise (60×)/triplet (90×) merging simulations of intratumor samples from a HGS ovarian carcinoma. Pearson correlation coefficients (r) and root mean squared error (RMSE) were computed for TITAN (A–C) and THetA (Oesper et al. 2013) (D,E). Correlation and RMSE were computed by comparing the cellular prevalences of the predicted clusters with the prevalence of the expected clusters across the mixture samples. Each data point represents an expected clonal cluster with a unique tumor cellular prevalence. Ground truth and expected tumor cellular prevalence values were computed from the tumor contribution from each individual sample making up the simulated mixture (Supplemental Table 3B–D).
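The two agreement measures used in this figure are standard and can be computed as follows. This is a generic sketch with hypothetical function names and made-up prevalence values; it does not reproduce how predicted clusters were matched to expected clusters in the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, expected):
    """Root mean squared error between predicted and expected values."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, expected)) / n)


# Hypothetical prevalences, one pair per expected clonal cluster.
expected = [0.20, 0.45, 0.70, 1.00]
predicted = [0.25, 0.40, 0.72, 0.97]
print(round(pearson_r(predicted, expected), 3))
print(round(rmse(predicted, expected), 3))
```

Reporting both is useful because a high correlation can coexist with a systematic offset, which only the RMSE would reveal.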
Figure 5.
Fluorescence in situ hybridization (FISH) validation of TITAN predictions for Chromosomes 1 and 17 in DG1136g. (A) Subclonal hemizygous deletion, SC-DLOH-1, in Chromosome 1 was validated using BAC probe RP11-795A13 (orange, Chr1:69851036–70025173). Control probe for copy neutral regions was RP11-159J14 (green, Chr1:69454844–69606688). FISH imaging shows tumor cells with a deletion (green arrow) and diploid (white arrow) at this region. (B) Clonal deletion, C-DLOH-1, in Chromosome 17 was validated using the centromeric probe, CEP 17. The BAC probes RP11-147K16 (orange, Chr17:3294803–3452243) and RP11-982O5 (blue, Chr17:55475584–55662513) were used as controls. The majority of cells were observed to harbor the deletion. FISH count prevalence was computed as the proportion of nuclei with event:control count ratio that is <1 (deletion) or >1 (gain) (Supplemental Table 9H). FISH imaging is shown at 63× magnification. Copy number predictions are shown using log ratios (normalized tumor depth/normal depth). Copy neutral (blue), hemizygous deletion (green), and copy gain (red) predictions are shown. Cellular prevalence estimates for clonal cluster 1 (Z1) and cluster 2 (Z2) predicted by TITAN are shown; tumor cellularity is indicated by the black horizontal line.
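The FISH count prevalence defined in this caption can be sketched as a simple proportion. The function name and the per-nucleus counts below are hypothetical. The comparison is written as event count versus control count rather than as a ratio, which is equivalent whenever the control count is positive and avoids dividing by zero.

```python
def fish_count_prevalence(event_counts, control_counts, kind="deletion"):
    """Proportion of nuclei whose event:control count ratio indicates the event.

    kind="deletion" counts nuclei with ratio < 1; kind="gain" counts ratio > 1.
    """
    if kind not in ("deletion", "gain"):
        raise ValueError("kind must be 'deletion' or 'gain'")
    pairs = list(zip(event_counts, control_counts))
    if kind == "deletion":
        hits = sum(1 for event, control in pairs if event < control)
    else:
        hits = sum(1 for event, control in pairs if event > control)
    return hits / len(pairs)


# Hypothetical probe signal counts for four nuclei.
event = [1, 1, 2, 1]
control = [2, 2, 2, 2]
print(fish_count_prevalence(event, control, kind="deletion"))  # 0.75
```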
Figure 6.
Single-cell validation of subclonal deletions in DG1136g using deep DNA sequencing of individual nuclei. (A,B) The 28 nuclei for Set1 and 18 nuclei for Set2 were designated as tumor and normal cell type using the status of mutations. The mutant allele ratio (variant reads/depth) for mutations and symmetric allele ratio (max(reference reads, variant reads)/depth) for SNP positions are shown for Set1 (A) and Set2 (B) events. Low coverage positions are shaded in gray. (C,D) The LOH status for each event for Set1 (C) and Set2 (D) were determined using the binomial test for dropout and Wilcoxon rank sum test for allelic ratios. TP53 mutation status is shown. The LOH status for each heterozygous (HET) and LOH (C-DLOH, SC-DLOH) event is shown. “Tumor” nuclei having the LOH event (green) or not having the event (blue) are shown to illustrate the original three-component mixture model (Fig. 2A). Normal nuclei are designated “Normal” (white). Unknown events (gray) were inconclusive for HET or LOH status. See Supplemental Methods for details.
References
- Aparicio S, Caldas C. 2013. The implications of clonal genome evolution for cancer medicine. N Engl J Med 368: 842–851
- Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B Met 57: 289–300