The life history of 21 breast cancers

Cell. 2012 May 25;149(5):994-1007.

doi: 10.1016/j.cell.2012.04.023. Epub 2012 May 17.

Peter Van Loo, David C Wedge, Ludmil B Alexandrov, Christopher D Greenman, King Wai Lau, Keiran Raine, David Jones, John Marshall, Manasa Ramakrishna, Adam Shlien, Susanna L Cooke, Jonathan Hinton, Andrew Menzies, Lucy A Stebbings, Catherine Leroy, Mingming Jia, Richard Rance, Laura J Mudie, Stephen J Gamble, Philip J Stephens, Stuart McLaren, Patrick S Tarpey, Elli Papaemmanuil, Helen R Davies, Ignacio Varela, David J McBride, Graham R Bignell, Kenric Leung, Adam P Butler, Jon W Teague, Sancha Martin, Goran Jönsson, Odette Mariani, Sandrine Boyault, Penelope Miron, Aquila Fatima, Anita Langerød, Samuel A J R Aparicio, Andrew Tutt, Anieta M Sieuwerts, Åke Borg, Gilles Thomas, Anne Vincent Salomon, Andrea L Richardson, Anne-Lise Børresen-Dale, P Andrew Futreal, Michael R Stratton, Peter J Campbell; Breast Cancer Working Group of the International Cancer Genome Consortium

Serena Nik-Zainal et al. Cell. 2012.

Abstract

Cancer evolves dynamically as clonal expansions supersede one another driven by shifting selective pressures, mutational processes, and disrupted cancer genes. These processes mark the genome, such that a cancer's life history is encrypted in the somatic mutations present. We developed algorithms to decipher this narrative and applied them to 21 breast cancers. Mutational processes evolve across a cancer's lifespan, with many emerging late but contributing extensive genetic variation. Subclonal diversification is prominent, and most mutations are found in just a fraction of tumor cells. Every tumor has a dominant subclonal lineage, representing more than 50% of tumor cells. Minimal expansion of these subclones occurs until many hundreds to thousands of mutations have accumulated, implying the existence of long-lived, quiescent cell lineages capable of substantial proliferation upon acquisition of enabling genomic changes. Expansion of the dominant subclone to an appreciable mass may therefore represent the final rate-limiting step in a breast cancer's development, triggering diagnosis.

Copyright © 2012 Elsevier Inc. All rights reserved.

Figures

Figure 1

Genomic Architecture of PD4120a, a Breast Cancer Genome Sequenced to 188-Fold Coverage (A) Copy number profile of the sample, with the upper panel showing the logR of intensity and the middle panel showing the B allele fraction (BAF) of germline heterozygous SNPs. Genomic segments of constant logR and BAF value were identified by the ASCAT algorithm (green lines). These were interpreted to give estimated overall copy number (purple lines) and copy number of the minor allele (blue lines) across the genome (lower panel). (B) Distribution of 70,690 somatically acquired base substitutions according to the total number of reads across that base (x axis) and the fraction of those reads reporting the variant (y axis). Points are colored according to the chromosome the mutation derives from. (C) Statistical modeling of the distribution of clonal and subclonal mutations by a Bayesian Dirichlet process. The empiric histogram of mutations is shown in pale blue, with the fitted distribution as a dark green line. Also shown are the 95% posterior confidence intervals for the fitted distribution (pale green area). Four separate clusters of mutations, named A–D, are identified. (D) Estimated number of mutations found in clusters A–D, with the error bars representing the 95% posterior confidence intervals.
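The variant allele fraction plotted in panel (B) can be related to the fraction of tumor cells carrying a mutation. The following is a minimal sketch of one common simplification, not the Dirichlet process model described in the caption. The function name and the parameters `purity`, `total_cn`, `multiplicity`, and `normal_cn` are assumptions introduced here for illustration.

```python
def cancer_cell_fraction(vaf, purity, total_cn, multiplicity=1, normal_cn=2):
    """Rough estimate of the fraction of tumor cells carrying a mutation.

    vaf          -- fraction of reads reporting the variant
    purity       -- fraction of cells in the sample that are tumor cells
    total_cn     -- total copy number of the locus in tumor cells
    multiplicity -- assumed number of mutated copies per mutated tumor cell
    normal_cn    -- copy number in contaminating normal cells (assumed 2)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell across the mixed sample.
    avg_copies = purity * total_cn + (1 - purity) * normal_cn
    # Each mutated tumor cell contributes `multiplicity` variant copies.
    return vaf * avg_copies / (purity * multiplicity)


# A heterozygous mutation in a diploid region of a 70%-pure sample:
# a variant allele fraction of 0.35 corresponds to roughly all tumor cells.
print(cancer_cell_fraction(0.35, purity=0.7, total_cn=2))
```

Under these assumptions, values well below 1 suggest a subclonal mutation, although sampling noise at a given read depth means single mutations are only weakly informative on their own.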

Figure 2

Subclonal Genetic Variation in PD4120a (A) Battenberg plots of allele fractions for phased parental haplotypes for four chromosomes. Germline SNPs are phased by imputation, with observed allele fraction for one phased chromosomal copy plotted in blue and the other in red. (B) Phasing of mutations (stars) with adjacent germline heterozygous SNPs (vertical lines) allows determination of whether a mutation is on the retained or subclonally deleted parental copy of a chromosome. (C) Distribution of somatically acquired base substitutions on chromosome 13 according to the total number of reads across that base (x axis) and the fraction of those reads reporting the variant (y axis). Points are colored according to whether the mutation derives from the retained copy of chromosome 13 (green points), the subclonally deleted copy of chromosome 13 (brown points) or whether it could not be phased with a nearby heterozygous SNP (black points).

Figure 3

Reconstructing the Evolution of PD4120a (A) Distribution of clonal and subclonal mutations phased onto specific chromosomes. The empiric histogram of mutations is shown in pale blue, with the fitted distribution and posterior intervals as dark green lines. (B) Allele fractions for pairs of subclonal mutations that are found on separate branches of the phylogenetic tree, by virtue of no sequencing read evincing both mutations together. Error bars represent the 95% confidence intervals for the observed fractions. (C) Allele fractions for pairs of subclonal mutations found in the same subclone, where one occurred temporally later than the other. Error bars represent the 95% confidence intervals for the observed fractions. (D) Reconstruction of the phylogenetic tree for PD4120a. The thickness of the branches reflects the proportion of tumor cells comprising that lineage. The length of the branches reflects the number of mutations specific to that lineage.

Figure 4

Timing of Copy Number Gains in 16 Informative Breast Cancer Genomes from the Ploidy of Mutations The point estimates of timing for specific copy number gains are shown as arrows colored by the type of chromosomal aberration, with 95% confidence intervals generated by bootstrapping shown as horizontal lines. Molecular time is shown as an arrow, with the timing estimated as a fraction of point mutation time.
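The idea of timing a gain from the ploidy of mutations can be sketched for the simplest case. This is an illustrative simplification, not the authors' algorithm: it assumes a region that went from 1+1 to 2+0 (uniparental disomy), a constant mutation rate per chromosome copy, and perfect assignment of each mutation to ploidy 1 or 2. The function names are hypothetical.

```python
import random


def gain_timing(n_ploidy2, n_ploidy1):
    """Estimated time of a 1+1 -> 2+0 gain as a fraction of mutation time.

    Mutations present on both copies (ploidy 2) arose before the
    duplication; ploidy-1 mutations arose afterwards on either of the two
    copies, so their count is halved to give a per-copy figure.
    """
    early = n_ploidy2
    late_per_copy = n_ploidy1 / 2
    total = early + late_per_copy
    if total == 0:
        raise ValueError("no mutations to time the gain with")
    return early / total


def bootstrap_interval(n_ploidy2, n_ploidy1, n_boot=2000, seed=0):
    """Percentile bootstrap 95% interval, resampling mutations with replacement."""
    rng = random.Random(seed)
    labels = [2] * n_ploidy2 + [1] * n_ploidy1
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(labels, k=len(labels))
        k2 = sample.count(2)
        estimates.append(gain_timing(k2, len(sample) - k2))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


point = gain_timing(100, 200)
low, high = bootstrap_interval(100, 200)
print(point, low, high)
```

With 100 ploidy-2 and 200 ploidy-1 mutations the point estimate is 0.5, meaning the gain would sit halfway through point mutation time under these assumptions. Other copy number states need different weightings.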

Figure 5

Comparison of Early and Late Point Mutation Signatures in 14 Informative Breast Cancer Genomes (A) Stacked bar charts showing the fraction of early mutations (ploidy > 1) and late mutations (ploidy = 1) accounted for by each mutation type. The p values refer to the overall difference in distribution between early and late mutations (chi-square test). The numbers above each bar refer to the number of mutations in the early or late fraction. (B) Stacked bar charts showing comparison of mutational processes identified by nonnegative matrix factorization. The comparison is across early clonal mutations (ploidy > 1), late fully clonal mutations (ploidy = 1) and subclonal mutations (ploidy < 1) for eight samples. Signature A describes C>T mutations at XpCpG trinucleotides. Signature B was composed predominantly of C>T, C>G, and C>A mutations in a TpC context. Signature C and Signature D were relatively uniform processes across all 96 possible mutated trinucleotides. Signature E specifically identifies C>G mutations at TpCpA, TpCpC, and TpCpT trinucleotides. (C) Timing of kataegis mutation clusters in PD4103a for the amplicon involving chromosome 12 (left) and a TP53 deletion (right). The top panel shows the copy number profiles with genomic rearrangements. The lower panel shows the point mutations as filled black circles for C>∗ mutations in a TpC context (as for kataegis) and open circles for other types of mutation. The y axis denotes the variant allele fraction, divided by the colored bars into the proportions of reads derived from contaminating normal cells (gray bars) and the fraction coming from each copy of that segment in the tumor cells (multiple colored bars).
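The "96 possible mutated trinucleotides" can be enumerated directly. The sketch below assumes the usual convention of reporting each substitution from the pyrimidine (C or T) of the base pair, and uses a label format such as `T[C>G]A` that is an assumption made here, not notation from the caption.

```python
from itertools import product

BASES = "ACGT"
# Six substitution classes, each written from the pyrimidine of the base pair.
SUBSTITUTIONS = [("C", alt) for alt in "AGT"] + [("T", alt) for alt in "ACG"]


def mutation_types():
    """All 96 substitution types with one flanking base on each side."""
    return [
        f"{five}[{ref}>{alt}]{three}"
        for (ref, alt), five, three in product(SUBSTITUTIONS, BASES, BASES)
    ]


def is_tpc_context(mutation_type):
    """True for a mutated C preceded by a 5' T (the TpC context)."""
    return mutation_type[0] == "T" and mutation_type[2] == "C"


types = mutation_types()
tpc = [t for t in types if is_tpc_context(t)]
print(len(types), len(tpc))
```

In this labelling, the three trinucleotides named for Signature E would be `T[C>G]A`, `T[C>G]C`, and `T[C>G]T`.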

Figure 6

Subclonal Genetic Variation among 20 Breast Cancers (A) Bar chart showing point estimates and 95% posterior confidence intervals for the number of fully clonal mutations (blue bars), mutations found in 50%–95% of tumor cells (pink bars), and mutations found in 25%–50% of tumor cells (green bars). (B) Distribution of clonal and subclonal mutations for three representative cancers. The empiric histogram of mutations is shown in pale blue, with the fitted distribution and 95% posterior intervals as dark green lines. (C) Subclonal copy number variation for the 20 breast cancer genomes, estimated by using the Battenberg algorithm. The height of each bar reflects the estimated copy number, and segments are colored by whether they show no subclonal variation (gray) or the estimated frequency of the minor subclone at the given region (green to yellow to brown).

Figure 7

A Model for Breast Cancer Development over Molecular Time The cancer evolves through acquisition of driver mutations (black stars), which produce clonal expansions. These driver mutations occur only infrequently in long-lived lineages of cells, which passively accumulate many mutations without expansion.

Figure S1

Subclonal Mutations in PD4120a, Related to Figure 1 (A) Observed distribution of mutation signatures for different values of the variant allele fraction, showing that even with rare mutations, the C>∗ signature in a TpC context is preserved. (B) Observed fraction of mutation signatures for different values of the variant allele fraction. Those levels of variant allele fraction that show a significantly different distribution from the distribution of fully clonal mutations are marked with an asterisk (∗). (C) Lattice plot showing the distribution of mutations separately for each copy number segment in the PD4120a genome. The x axis denotes the total number of reads covering the mutations and the y axis the variant allele fraction. The points are colored according to the spectrum of mutations, by using the key shown for (A).

Figure S2

Modeling Clusters of Subclonal Mutations, Related to Figure 1 (A) Mutations (blue histogram) from an in silico simulation of a tumor in which fully clonal mutations account for 20% of mutations, 40% of mutations are found in a subclone representing 60% of tumor cells, 10% of mutations in a subclone at 30% and 20% of mutations in a subclone at 20% of tumor cells (pink bars). The simulated mutations have also been subject to correction for the sensitivity of detection at different fractions of tumor cells, hence there are fewer “observed” mutations at 20% of tumor cells than at 100% despite there being more “true” mutations at this level. Statistical modeling by a Bayesian Dirichlet process of the simulated mutations is shown as a dark green line. Also shown are the 95% posterior confidence intervals for the fitted distribution (pale green area). (B) Mutations (blue histogram) from an in silico simulation of a tumor in which there are 40 subclones, evenly spread from 0%–100% of tumor cells and each contributing 2.5% of mutations (pink bars). The simulated mutations have been subject to correction for the sensitivity of detection at different fractions of tumor cells, hence there are fewer “observed” mutations at 20% of tumor cells than at 100% despite there being the same number of “true” mutations at this level. Statistical modeling by a Bayesian Dirichlet process of the simulated mutations is shown as a dark green line. Also shown are the 95% posterior confidence intervals for the fitted distribution (pale green area). (C) Box and whisker plots showing the posterior distributions for the weights of each of the 30 clusters, ordered from greatest to least, for the two simulations shown in (A) and (B). The first simulation, based on four subclones, shows nonnegligible weights for the first 4–5 clusters, but rapidly tails to 0 thereafter. For the second simulation, based on 40 subclones, all at constant weight, the distribution of weights is much flatter, and does not hit 0 until beyond 15–20 clusters. (D) Box and whisker plots showing the posterior distributions for the weights of each of the 30 clusters, ordered from greatest to least, for PD4120a. The first 4–5 clusters show nonnegligible weights, but they tail rapidly to 0 thereafter. (E) Circle plot showing the copy number (black points) and rearrangements for a chromothripsis event involving chromosomes 2, 4, 18, and 21.

Figure S3

Subclonal Copy Number Variation in PD4120a, Related to Figure 2 (A) Battenberg plots of allele fractions for phased parental haplotypes for various chromosomes. Each heterozygous germline SNP is phased into two possible parental states by imputation. The observed allele fraction for one phased chromosomal copy is plotted in blue and the other in red. For a chromosome, such as chromosome 5, showing no subclonal copy number variation, both parental copies are present at exactly equal proportions and the red and blue points are superimposed around an allele fraction of 0.5. For chromosomes showing subclonal copy number variation, such as chromosome 8 and chromosome 13, the parental copies are present at unequal ratios, leading to separation between the red and blue segments. The extent of separation is correlated with the fraction of tumor cells showing the chromosomal gain or loss. (B) Copy number profiles for logR and B allele fraction for chromosomes 2, 5, and 7 of PD4120a. Note that the logR value for chromosome 2 is virtually the same as for chromosome 7 and substantially lower than that for chromosome 5, indicating that, like chromosome 7, chromosome 2 is deleted in a sizable subclone of cells. However, the B allele fraction for chromosome 2 is exactly balanced at 0.5, implying that both parental copies are deleted in equal proportions.
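The link between the separation of the red and blue segments and the size of the affected subclone can be illustrated with a small calculation. This is a sketch under stated assumptions, not the Battenberg algorithm itself: it considers only a single-copy loss (1+1 to 1+0) in a fraction of tumor cells, with diploid contaminating normal cells. The function and parameter names are hypothetical.

```python
def phased_allele_fractions(purity, loss_fraction):
    """Expected allele fractions of the retained and the lost parental copy.

    purity        -- fraction of cells in the sample that are tumor cells
    loss_fraction -- fraction of tumor cells that have lost one parental copy
    """
    if not (0 <= purity <= 1 and 0 <= loss_fraction <= 1):
        raise ValueError("purity and loss_fraction must lie in [0, 1]")
    # Average copies per cell: the retained copy is present in every cell,
    # the other copy is missing from the affected tumor cells only.
    retained = 1.0
    lost = 1.0 - purity * loss_fraction
    total = retained + lost
    return retained / total, lost / total


for fraction in (0.0, 0.25, 0.5, 1.0):
    hi, lo = phased_allele_fractions(purity=0.8, loss_fraction=fraction)
    print(fraction, round(hi, 3), round(lo, 3), round(hi - lo, 3))
```

With no subclonal loss both fractions are 0.5 and the two colors overlap. As `loss_fraction` grows, the gap between them widens, which is the behavior the caption describes.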

Figure S4

Phasing Pairs of Subclonal Point Mutations, Related to Figure 3 (A) Phasing of subclonal mutations (stars) with other nearby subclonal mutations allows determination of whether they are in separate phylogenetic lineages, in which case no sequencing reads will report both variants together (mutually exclusive pair of mutations). (B) Similar phasing analysis can identify cases where the later subclonal mutation has arisen on an allele linked with a previous subclonal mutation. (C) Example of a mutually exclusive pair of mutations from PD4120a. Sequencing read pairs are shown as yellow and blue bars linked by a dotted line. Base calls varying from the reference genome are shown as red squares. Two nearby mutations, indicated by arrows, are never found on the same read pair. (D) Example of a pair of mutations showing subclonal evolution in PD4120a. The right-hand subclonal mutation occurred on an allele already carrying the left-hand mutation, as evidenced by the existence of reads reporting both mutations together and reads reporting the left-hand but not the right-hand mutation, but never the right-hand mutation without the left-hand one. (E) A pair of mutations, both of which phase with the subclonally deleted copy of chromosome 13, but are mutually exclusive.
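The read-level logic described in panels (A) to (D) can be written as a small decision rule. This is a hedged sketch: the function name, the three read-count inputs, and the returned labels are assumptions made for illustration, and it ignores sequencing error, which any real analysis would have to tolerate.

```python
def classify_pair(both, left_only, right_only):
    """Classify two nearby subclonal mutations from read-pair counts.

    both       -- read pairs reporting both variants
    left_only  -- read pairs reporting only the left-hand variant
    right_only -- read pairs reporting only the right-hand variant
    Only read pairs that span both positions should be counted.
    """
    if both == 0 and left_only > 0 and right_only > 0:
        # Never seen together: consistent with separate lineages.
        return "mutually exclusive"
    if both > 0 and left_only > 0 and right_only == 0:
        # Right is only ever seen alongside left: right arose later.
        return "right arose on left"
    if both > 0 and right_only > 0 and left_only == 0:
        return "left arose on right"
    # Includes pairs always seen together and pairs with conflicting evidence.
    return "undetermined"


print(classify_pair(both=0, left_only=12, right_only=9))
print(classify_pair(both=7, left_only=15, right_only=0))
```

The first call corresponds to the situation in panel (C), the second to panel (D).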

Figure S5

Timing of Chromosomal Gains and Genomic Amplifications, Related to Figure 4 (A) Forest plots showing the timing point estimates (diamonds) and 95% confidence intervals, estimated by bootstrapping, for gains of particular chromosomes in two breast cancer genomes. The size of each diamond is proportional to the number of mutations considered, and its color indicates whether the chromosomal gain reflects uniparental disomy (blue) or tetraploid chromosomes (green). The estimates show significant heterogeneity for PD4248a (p < 0.0001) but not for PD4116a (p = 0.3), with the latter indicating the possibility of all the gains occurring as a single endoreduplication event. (B and C) Timing of ERBB2 genomic amplification for PD4199a (B) and PD4192a (C). Here, the top panel shows the copy number segments for the region of chromosome 17 around ERBB2. The lower panel shows the point mutations as black points, with the x axis reflecting the genomic position and the y axis the variant allele fraction. The 95% confidence intervals for the variant allele fraction are shown as vertical bars for each mutation. The allele fraction is divided by the colored bars into the proportions of reads derived from contaminating normal cells (gray bars) and the fraction coming from each of the copies of that segment in the tumor cells (the multiple bars from green to yellow to pink to white). Early mutations will be found relatively higher up these bars, whereas late ones will be seen toward the bottom of the variant allele fraction.

Figure S6

Comparison of Early and Late C>T Point Mutation Signatures in 14 Informative Breast Cancer Genomes, Related to Figure 5 Stacked bar charts showing the fraction of early mutations (ploidy > 1) and late mutations (ploidy = 1) accounted for by each mutation type. The p values refer to the overall difference in distribution between early and late C>T mutations (chi-square test). The numbers above each stacked bar denote the number of early or late mutations analyzed.

Figure S7

Patterns of Subclonal Mutation in 20 Breast Cancer Genomes, Related to Figure 6 (A) Fitted three-parameter logistic curves to bootstrapped estimates of sensitivity for mutations at different levels of subclonality derived from each of the 20 breast cancer genomes. For the five samples colored blue the raw bootstrapped values are shown (as plus [+] symbols), to allow assessment of goodness-of-fit of the logistic curve to the raw data. (B) Comparison of the empiric distributions of subclonal mutations between PCR with deep pyrosequencing on the 454 platform and exome pull-down and sequencing for four patients. For each histogram, point mutations called in the original whole-genome sequencing were identified for which there was independent validation by either 454 sequencing or exome pull-down. The distributions of subclonality obtained from each validation method are then plotted in the relevant histogram. (C) Statistical modeling by a Bayesian Dirichlet process of the distribution of clonal and subclonal mutations for 16 breast cancers. The empiric histogram of mutations is shown in pale blue, with the fitted distribution as a dark green line. Also shown are the 95% posterior confidence intervals for the fitted distribution (pale green area).

Figure S8

Phylogenetic Trees for Three Breast Cancer Patients, Linked to Figure 7 (A) Phylogenetic tree and supporting data for PD3890a. The Battenberg plot for chromosome 7 shows evidence that the q arm shows loss of heterozygosity (LOH; 1+0) in 90% of tumor cells, with normal diploidy (1+1) in 10%. Because LOH is a one-directional “valve” (once lost, heterozygosity cannot be regained), it follows that diploidy is the ancestral state and LOH is the derived state. Hence, this is direct evidence of a subclone representing 90% of tumor cells, also carrying several other copy number changes and approximately 100 point mutations. The centromeric portion of 4q shows evidence for a mix of 2+1 copies in 80% of tumor cells and normal diploidy (1+1) in 20% of tumor cells. From the rearrangement data, this copy number gain is caused by a subclonal tandem duplication, implying that the 2+1 copy number state found in 80% of tumor cells is the derived state. This indicates the existence of an 80% subclone, matched by a small cluster of point mutations seen on sequencing of the exome (inset). Finally, chromosome X, among others, shows evidence for a 62% subclone and a 20% subclone. By repeated application of the pigeonhole principle, each of these subclones must be collinear on the phylogenetic tree. (B) Phylogenetic tree for PD4199a. (C) Phylogenetic tree for PD4005a.
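The pigeonhole argument in panel (A) rests on a simple observation: if two subclones together account for more than 100% of tumor cells, some cells must belong to both, so the two cannot sit on separate branches. The sketch below encodes only that sufficient condition. The function name is hypothetical and the percentages are the ones quoted in the caption.

```python
def must_be_nested(percent_a, percent_b):
    """True if two subclones are forced onto the same lineage.

    Percentages are of tumor cells. If they sum to more than 100, the
    subclones must share cells, so one descends from the other.
    """
    for value in (percent_a, percent_b):
        if not 0 <= value <= 100:
            raise ValueError("percentages must lie between 0 and 100")
    return percent_a + percent_b > 100


print(must_be_nested(90, 80))  # True
print(must_be_nested(80, 62))  # True
```

This test is a sufficient condition only. A pair whose percentages sum to 100 or less is not placed by the sum alone and needs further evidence to be resolved.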
