Simultaneous identification of multiple driver pathways in cancer

Mark D M Leiserson et al. PLoS Comput Biol. 2013.

Abstract

Distinguishing the somatic mutations responsible for cancer (driver mutations) from random, passenger mutations is a key challenge in cancer genomics. Driver mutations generally target cellular signaling and regulatory pathways consisting of multiple genes. This heterogeneity complicates the identification of driver mutations by their recurrence across samples, as different combinations of mutations in driver pathways are observed in different samples. We introduce the Multi-Dendrix algorithm for the simultaneous identification of multiple driver pathways de novo in somatic mutation data from a cohort of cancer samples. The algorithm relies on two combinatorial properties of mutations in a driver pathway: high coverage and mutual exclusivity. We derive an integer linear program that finds sets of mutations exhibiting these properties. We apply Multi-Dendrix to somatic mutations from glioblastoma, breast cancer, and lung cancer samples. Multi-Dendrix identifies sets of mutations in genes that overlap with known pathways - including Rb, p53, PI(3)K, and cell cycle pathways - and also novel sets of mutually exclusive mutations, including mutations in several transcription factors or other genes involved in transcriptional regulation. These sets are discovered directly from mutation data with no prior knowledge of pathways or gene interactions. We show that Multi-Dendrix outperforms other algorithms for identifying combinations of mutations and is also orders of magnitude faster on genome-scale data. Software available at: http://compbio.cs.brown.edu/software.
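
The two properties named in the abstract, high coverage and mutual exclusivity, can be illustrated with a small sketch. It assumes the weight used in the earlier Dendrix formulation, W(M) = 2|Γ(M)| − Σ_{g∈M} |Γ(g)|, where Γ(g) is the set of samples with a mutation in gene g and Γ(M) is the union over the genes in M; the abstract itself does not state this formula. The toy data, the function names, and the brute-force search are hypothetical illustrations and stand in for the integer linear program, which is not reproduced here.

```python
from itertools import combinations

# Hypothetical toy data: gene -> set of samples in which the gene is mutated.
mutations = {
    "A": {"s1", "s2", "s3"},
    "B": {"s4", "s5"},
    "C": {"s6"},
    "D": {"s1", "s2", "s4"},
}

def coverage(gene_set, mutations):
    """Return the samples that carry a mutation in at least one gene of gene_set."""
    covered = set()
    for gene in gene_set:
        covered |= mutations[gene]
    return covered

def weight(gene_set, mutations):
    """Assumed Dendrix-style weight: 2 * |covered samples| - total mutation count.

    The weight equals the coverage when the set is perfectly mutually exclusive
    and drops by one for every extra mutation in an already covered sample.
    """
    total_mutations = sum(len(mutations[gene]) for gene in gene_set)
    return 2 * len(coverage(gene_set, mutations)) - total_mutations

def best_gene_set(mutations, k):
    """Exhaustively search all gene sets of size k (feasible only for tiny inputs)."""
    candidates = combinations(sorted(mutations), k)
    return max(candidates, key=lambda gene_set: weight(gene_set, mutations))

best = best_gene_set(mutations, 3)
print(best, weight(best, mutations))
```

In this toy example the genes A, B, and C are mutated in disjoint samples and together cover all six samples, so they score higher than any set containing D, whose mutations overlap with those of A and B.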

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. The Multi-Dendrix pipeline.

Multi-Dendrix analyzes integrated mutation data from a variety of sources, including single-nucleotide mutations and copy number aberrations. Multiple gene sets are identified using a combinatorial optimization approach. The output is analyzed for subtype-specific mutations and summarized across multiple values of two parameters: the number of gene sets and the maximum size per gene set.

Figure 2. Multi-Dendrix results on the GBM dataset.

(Left) Nodes represent genes in four modules found by Multi-Dendrix using [formula] gene sets of minimum size [formula] and maximum size [formula]. Genes with “(A)” appended are amplification events, genes with “(D)” appended are deletion events, and genes with no annotation are SNVs. Edges connect genes that appear in the same gene set for more than one value of the parameters, with labels indicating the fraction of parameter values for which the pair of genes appears in the same gene set. Color of nodes indicates membership in three signaling pathways noted as important for GBM: RB, p53, and RTK/RAS/PI(3)K signaling. Shape of nodes indicates genes whose mutations are associated with specific GBM subtypes, and dashed edges connect genes associated with different subtypes. The direct interactions statistic of this collection of gene sets is significant ([formula]). (Middle) Known interactions between proteins in each set and P-value for the observed number of interactions. (Right) Mutation matrix for each of the four modules with mutually exclusive (blue) and co-occurring (orange) mutations.
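
The edge labels described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `edge_fractions`, the input layout (one list of gene sets per parameter setting), and the placeholder gene names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def edge_fractions(runs):
    """Summarize gene pairs across parameter settings.

    runs: one entry per parameter setting; each entry is a list of gene sets.
    Returns {(gene_a, gene_b): fraction of settings in which the pair shares a
    gene set}, keeping only pairs that co-occur for more than one setting.
    """
    counts = Counter()
    for gene_sets in runs:
        pairs = set()  # count each pair at most once per parameter setting
        for gene_set in gene_sets:
            pairs.update(combinations(sorted(gene_set), 2))
        counts.update(pairs)
    return {pair: n / len(runs) for pair, n in counts.items() if n > 1}

# Hypothetical results for three parameter settings (placeholder gene names).
runs = [
    [{"G1", "G2", "G3"}, {"G4", "G5"}],
    [{"G1", "G2"}, {"G4", "G5", "G6"}],
    [{"G1", "G3"}, {"G4", "G5"}],
]
fractions = edge_fractions(runs)
for pair, fraction in sorted(fractions.items()):
    print(pair, round(fraction, 2))
```

Pairs that share a gene set under only one parameter setting, such as G2 and G3 here, are dropped, matching the caption's rule that edges require more than one value of the parameters.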

Figure 3. Multi-Dendrix results on the BRCA dataset.

Graphical elements are as described in the Figure 2 caption, except for the following. Color of nodes indicates membership in four signaling pathways noted as important for BRCA: p53 signaling, PI(3)K/AKT signaling, cell cycle checkpoints, and p38-JNK1. The top row of each mutation matrix annotates the subtype of each patient. The regulatory interaction between GATA3 and CDH1 is shown as a dashed line. The direct interactions statistic of this collection of gene sets is significant ([formula]).

Grants and funding

This work is supported by NSF grant IIS-1016648. BJR is supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund, an Alfred P. Sloan Research Fellowship, and an NSF CAREER Award (CCF-1053753). RS was supported by a research grant from the Israel Science Foundation (grant no. 241/11). MDML was supported by NSF GRFP DGE 0228243. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
