Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations

Hannah Carter et al. Cancer Res. 2009.

Abstract

Large-scale sequencing of cancer genomes has uncovered thousands of DNA alterations, but the functional relevance of the majority of these mutations to tumorigenesis is unknown. We have developed a computational method, called Cancer-specific High-throughput Annotation of Somatic Mutations (CHASM), to identify and prioritize those missense mutations most likely to generate functional changes that enhance tumor cell proliferation. The method has high sensitivity and specificity when discriminating between known driver missense mutations and randomly generated missense mutations (area under receiver operating characteristic curve, >0.91; area under precision-recall curve, >0.79). CHASM substantially outperformed previously described missense mutation function prediction methods at discriminating known oncogenic mutations in TP53 and the tyrosine kinase epidermal growth factor receptor (EGFR). We applied the method to 607 missense mutations found in a recent glioblastoma multiforme (GBM) sequencing study. Based on a model that assumes the GBM mutations are a mixture of drivers and passengers, we estimate that 8% of these mutations are drivers, causally contributing to tumorigenesis.
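The two summary statistics quoted in the abstract can be illustrated with a minimal sketch. The scores and labels below are invented toy values, not data from the study, and the convention that a higher score means "more driver-like" is an assumption made only for this example. The ROC area is computed as the Mann-Whitney probability that a driver outscores a passenger, and the precision-recall area is approximated by average precision; the study's own evaluation code is not shown in this record and may differ.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Step-wise approximation to the area under the precision-recall curve.

    Ties between scores are broken by input order (a simplification).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = driver, 0 = passenger.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
toy_ap = average_precision(toy_scores, toy_labels)
```

Both functions are quadratic or log-linear in the number of mutations, which is adequate for illustration; a production evaluation would normally rely on an established statistics library.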

Figures

Figure 1. Principal components analysis of nsSNPs vs. synthetic passenger mutations

Synthetic passenger mutations (red) and high minor allele frequency (MAF) nonsynonymous SNPs (nsSNPs) from the HapMap project (blue) overlap substantially in the space defined by principal components one, three, and four, but some regions of that space are occupied only by high MAF nsSNPs and others only by synthetic passengers.

Figure 2. ROC and PR curves calculated for A) CHASM, B) PolyPhen PSIC, and C) SIFT on the training set mutations

CHASM training out-of-bag scores were used to generate the ROC and PR curves in A). A color version is available as Supplementary Figure 6.
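Out-of-bag scoring arises in bagged ensembles such as random forests: each learner is trained on a bootstrap resample, and every training example is scored only by the learners that did not see it, which gives an approximately unbiased score without a separate validation set. The sketch below illustrates the idea with single-feature threshold stumps in place of full decision trees. The function names, the toy feature values, and the stump learner are all assumptions for illustration and are not the CHASM implementation.

```python
import random


def fit_stump(xs, ys):
    """Choose a threshold on one feature that minimizes training errors.

    The stump predicts class 1 when x > threshold. Candidate thresholds are
    -inf, midpoints between consecutive distinct values, and +inf.
    """
    values = sorted(set(xs))
    candidates = [float("-inf")]
    candidates += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    candidates.append(float("inf"))
    best_t, best_err = None, None
    for t in candidates:
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def oob_scores(xs, ys, n_learners=200, seed=0):
    """Fraction of out-of-bag learners voting class 1 for each example."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [0] * n   # class-1 votes received while out of bag
    counts = [0] * n  # number of learners for which the example was out of bag
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        in_bag = set(idx)
        t = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:
                votes[i] += int(xs[i] > t)
                counts[i] += 1
    # None marks an example that was never out of bag (rare for many learners).
    return [v / c if c else None for v, c in zip(votes, counts)]


# Hypothetical one-feature training set: 0 = passenger, 1 = driver.
toy_x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
toy_oob = oob_scores(toy_x, toy_y)
```

The resulting per-example scores can be ranked against the known labels to draw ROC and PR curves for the training set without reusing any learner on data it was fitted to.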

Figure 3. ROC and PR curves calculated for A) CHASM, B) PolyPhen PSIC, and C) SIFT on TP53 and synthetic passenger mutations held out of the CHASM training set

A color version is available as Supplementary Figure 7.

Figure 4. ROC and PR curves calculated for A) CHASM, B) PolyPhen PSIC, and C) SIFT on EGFR and synthetic passenger mutations held out of the CHASM training set

A color version is available as Supplementary Figure 8.

Figure 5. Histograms of CHASM scores for driver mutations and passenger mutations held out from the training set, and 607 mutations experimentally identified in GBM

Estimated kernel density for each set of scores (solid line) and fitted mixture of the driver and passenger score densities (dashed line) are shown superimposed on the histograms.
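One way to fit such a mixture is to hold the driver and passenger score densities fixed (estimated from the held-out sets) and solve only for the mixing weight, which is then read as the estimated driver fraction. The sketch below does this with Gaussian kernel density estimates and a short expectation-maximization loop. The bandwidth, the toy scores, and the function names are assumptions for illustration; this record does not specify the fitting procedure actually used, so the code should be read as one plausible approach and not as the authors' method.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density


def estimate_driver_fraction(mixed, f_driver, f_passenger, n_iter=200):
    """EM estimate of w in the mixture w * f_driver + (1 - w) * f_passenger.

    Both component densities are held fixed; only the weight is updated.
    """
    d = [f_driver(x) for x in mixed]
    p = [f_passenger(x) for x in mixed]
    w = 0.5
    for _ in range(n_iter):
        resp = []
        for di, pi in zip(d, p):
            denom = w * di + (1.0 - w) * pi
            # E-step: posterior probability that this score came from a driver.
            resp.append(w * di / denom if denom > 0.0 else 0.5)
        # M-step: the new weight is the mean responsibility.
        w = sum(resp) / len(resp)
    return w


# Hypothetical toy scores; the direction of the score scale is arbitrary here.
toy_drivers = [0.05, 0.10, 0.15, 0.20, 0.25]
toy_passengers = [0.75, 0.80, 0.85, 0.90, 0.95]
toy_mixed = [0.12, 0.76, 0.78, 0.80, 0.82, 0.84, 0.86, 0.88, 0.90, 0.92]

f_drv = gaussian_kde(toy_drivers, bandwidth=0.05)
f_pas = gaussian_kde(toy_passengers, bandwidth=0.05)
toy_fraction = estimate_driver_fraction(toy_mixed, f_drv, f_pas)
```

In the toy data one of ten mixed scores lies in the driver range, so the estimated weight settles near 0.1. With real, overlapping score distributions the estimate depends on the bandwidth and on how well the held-out sets represent the true components.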
