HIT'nDRIVE: patient-specific multidriver gene prioritization for precision oncology

Raunak Shrestha et al. Genome Res. 2017 Sep.

Abstract

Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the "random walk facility location" (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: "multihitting time," the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. When applied to 2200 tumors from four major cancer types, HIT'nDRIVE revealed many potentially clinically actionable driver genes. We also demonstrated that it is possible to perform accurate phenotype prediction for tumor samples by only using HIT'nDRIVE-seeded driver gene modules from gene interaction networks. In addition, we identified a number of breast cancer subtype-specific driver modules that are associated with patients' survival outcome. Furthermore, HIT'nDRIVE, when applied to a large panel of pan-cancer cell lines, accurately predicted drug efficacy using the driver genes and their seeded gene modules. Overall, HIT'nDRIVE may help clinicians contextualize massive multiomics data in therapeutic decision making, enabling widespread implementation of precision oncology.
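The "multihitting time" described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes an unweighted, undirected toy network stored as an adjacency list, moves one independent random walker from each source node in lockstep, and estimates by Monte Carlo the expected number of steps until the first walker reaches the target. The function name `multi_hitting_time`, its parameters, and the three-node path graph are hypothetical and chosen only for illustration.

```python
import random


def multi_hitting_time(adj, sources, target, trials=20000, max_steps=10000, seed=0):
    """Monte Carlo estimate of the expected number of steps until the first
    of several simultaneous random walkers (one per source) reaches target.

    adj maps each node to a list of its neighbors (unweighted toy network;
    a real interaction network may be weighted, which this sketch ignores).
    """
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(trials):
        walkers = list(sources)
        steps = 0
        # Advance every walker by one uniformly random neighbor per step.
        while target not in walkers and steps < max_steps:
            walkers = [rng.choice(adj[w]) for w in walkers]
            steps += 1
        total_steps += steps
    return total_steps / trials


# Toy path graph 0 - 1 - 2 (hypothetical example, not from the paper).
path = {0: [1], 1: [0, 2], 2: [1]}

single = multi_hitting_time(path, [0], 2)    # one source: ordinary hitting time
multi = multi_hitting_time(path, [0, 1], 2)  # two sources walking simultaneously
print(single, multi)
```

On this toy graph the ordinary hitting time from node 0 to node 2 is exactly 4 steps, while starting walkers at both 0 and 1 brings the expected time down to 2, so the two estimates should land close to those values. This shows the intended behavior of the measure: a set of source genes reaches a target faster than any single member would on its own.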

© 2017 Shrestha et al.; Published by Cold Spring Harbor Laboratory Press.

Figures

Figure 1.

Overview of the HIT'nDRIVE algorithmic framework. (A) HIT'nDRIVE integrates sequence-level changes in the genome with expression changes in the transcriptome obtained from patients' tumor samples. The influence values derived from the protein interaction network indicate how likely a driver gene is to influence its downstream target genes in the network. (B) The predicted driver genes are used as seeds to discover modules of genes that discriminate between the sample phenotypes using OptDis. (C) Based on this, the driver modules are ranked and thus prioritized.

Figure 2.

Summary of driver genes prioritized by HIT'nDRIVE. (A) Distribution of predicted driver genes in cancer gene databases. The CGC database contains genes for which mutations have been causally implicated in cancer. Genes curated in the CGC database represent likely drivers of cancer. COSMIC is a comprehensive database of somatic mutations that have been reported in different cancers. However, not every gene present in the COSMIC database represents a driver of cancer. (B) Distribution of driver genes in druggable gene databases. Actionable genes in cancer therapy were derived from the TARGET database. The list of druggable genes was extracted from the DGI database. (A,B) The numbers in the panel represent the number of genes in the respective categories. (C) Distribution of patient druggability. Patient druggability was assessed using information in the TARGET and DGI databases. The numbers in the panel represent the number of patients in the respective categories.

Figure 3.

Network properties of driver genes. (A) The centrality of the predicted drivers in the STRING v10 network. The size of the circles is proportional to the alteration frequency of the driver gene. The color scale represents the total influence of the driver gene on the expression outliers. (B) Correlation between influence and centrality. Each dot represents a target node receiving a certain amount of influence from all source nodes in the network. A lowess regression line is shown in blue. (C) Correlation between incoming and outgoing influence of a node. Each dot represents a node in the network, and the color scale represents its betweenness centrality. A linear regression line is shown in blue. (D) Boxplot of the total influence of driver genes predicted by HIT'nDRIVE on the expression outliers compared with that of other altered genes (genes not predicted as drivers). (E) Correlation between gene influence and its alteration frequency in the respective patient cohort. (F) Relative influence of driver genes in each patient in the GBM cohort with a mutation in ABCB1. (G) Relative influence of driver genes in each patient in the PRAD cohort with a mutation in BRAF. All gene influence values have been multiplied by 10^5 before log transformation.

Figure 4.

Phenotype classification using driver-seeded modules. (A) Phenotype (tumor vs. normal) classification accuracy in gene-expression data sets of different cancer types using three different methods: HIT'nDRIVE-unsupervised (left), HIT'nDRIVE-OptDis (middle), and DriverNet (right). (B) Comparison of HIT'nDRIVE with DriverNet.

Figure 5.

BRCA subtype classification using driver modules. (A) Accuracy of classifying different breast cancer subtypes using the activity score of subtype-specific driver modules as features in three distinct data sets. (B) Box plot comparing subtype-specific driver-seeded modules and driver-free modules with respect to three distinct measures: log-rank test P-value, hazard ratio (HR), and concordance index (c-index). (C) A BRCA subtype-specific driver module (BASAL-02) seeded by NCOA3 that distinguished the Basal subtype from the rest of the BRCA subtypes. (D) Activity score of the BASAL-02 module across different BRCA subtypes. (E) Kaplan-Meier plot showing the significant association of the BASAL-02 module with patients' clinical outcome in the three data sets considered.

Figure 6.

Drug efficacy predicted by HIT'nDRIVE-seeded driver genes. (A) Accuracy of drug-response phenotype classification for all 265 drugs used in the GDSC study across 25 cancer types (the remaining five cancer types have too few cell lines available for statistically meaningful results and thus have not been used). The classification accuracy for each drug on each cancer type is measured based on the collective use of at most 10 best-discriminating modules; i.e., the accuracy is maximized across the range of one to 10 (best-discriminating) modules. Note that many of the drugs were not tested on all cancer types; in fact, for the vast majority of cancer types, only a handful of drugs were tested. (B) Classification accuracy of modules that distinguish the drug-response phenotypes after treatment with gefitinib in BRCA cell lines (top), temozolomide in GBM cell lines (middle), and Nutlin-3a in OV cell lines (bottom). Important genes identified in the modules and involved in the dysregulated signaling pathways have been highlighted. (C–E) Dysregulated signaling pathways in the respective drug perturbations.
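The "maximized across the range of one to 10 modules" rule in panel A can be read as a simple search over prefix sizes of a ranked module list. The sketch below is only one plausible reading under stated assumptions: the modules are assumed to be already ranked from most to least discriminating, and `evaluate` stands in for whatever classifier accuracy estimate is used. The names `best_top_k_accuracy`, `evaluate`, and the toy accuracy values are hypothetical and are not taken from the paper.

```python
def best_top_k_accuracy(ranked_modules, evaluate, max_modules=10):
    """Return (k, accuracy) for the prefix size k in 1..max_modules whose
    top-k modules give the highest accuracy according to evaluate.

    ranked_modules: modules sorted from most to least discriminating.
    evaluate: callable taking a list of modules and returning an accuracy.
    """
    best_k, best_acc = 0, float("-inf")
    limit = min(max_modules, len(ranked_modules))
    for k in range(1, limit + 1):
        acc = evaluate(ranked_modules[:k])
        if acc > best_acc:  # ties keep the smaller k
            best_k, best_acc = k, acc
    return best_k, best_acc


# Toy stand-in for a classifier: accuracy looked up by number of modules used.
toy_accuracy = {1: 0.70, 2: 0.82, 3: 0.79}
modules = ["module_a", "module_b", "module_c"]  # hypothetical module labels
result = best_top_k_accuracy(modules, lambda mods: toy_accuracy[len(mods)])
print(result)  # (2, 0.82)
```

Because the maximum is taken over k, the reported figure is an upper envelope over module counts rather than the accuracy at a single fixed k.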
