Cancer Drug Response Profile scan (CDRscan): A Deep Learning Model That Predicts Drug Effectiveness from Cancer Genomic Signature

Yoosup Chang et al. Sci Rep. 2018.

Abstract

In the era of precision medicine, cancer therapy can be tailored to an individual patient based on the genomic profile of a tumour. Despite the ever-increasing abundance of cancer genomic data, linking mutation profiles to drug efficacy remains a challenge. Herein, we report Cancer Drug Response profile scan (CDRscan), a novel deep learning model that predicts anticancer drug responsiveness based on large-scale drug screening assay data encompassing genomic profiles of 787 human cancer cell lines and structural profiles of 244 drugs. CDRscan employs a two-step convolution architecture, in which the genomic mutational fingerprints of cell lines and the molecular fingerprints of drugs are processed individually, then merged by 'virtual docking', an in silico modelling of drug treatment. Analysis of the goodness-of-fit between observed and predicted drug response revealed a high prediction accuracy of CDRscan (R² > 0.84; AUROC > 0.98). We applied CDRscan to 1,487 approved drugs and identified 14 oncology and 23 non-oncology drugs having new potential cancer indications. This, to our knowledge, is the first application of a deep learning model in predicting the feasibility of drug repurposing. With further clinical validation, CDRscan is expected to allow selection of the most effective anticancer drugs for the genomic profile of the individual patient.
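The data flow described in the abstract (two fingerprints processed separately, then merged and processed jointly) can be illustrated with a toy forward pass. This is only a minimal sketch under stated assumptions, not the authors' network: the function names `conv1d` and `predict_response`, the kernel widths, the stride, the random weights, and the 12-bit and 8-bit toy inputs are all hypothetical, and the real model's layer sizes and training procedure are not given in this text.

```python
import random

def conv1d(signal, kernel, stride=1):
    """Slide `kernel` over `signal` (no padding) and apply ReLU to each output."""
    width = len(kernel)
    outputs = []
    for start in range(0, len(signal) - width + 1, stride):
        total = sum(signal[start + k] * kernel[k] for k in range(width))
        outputs.append(max(0.0, total))
    return outputs

def predict_response(cell_bits, drug_bits, params):
    """Toy forward pass: separate branches first, then a joint convolution."""
    # Step 1: each fingerprint is convolved on its own.
    cell_features = conv1d(cell_bits, params["cell_kernel"], stride=2)
    drug_features = conv1d(drug_bits, params["drug_kernel"], stride=2)
    # Step 2: concatenate both feature maps and convolve the joint vector.
    merged = cell_features + drug_features
    joint = conv1d(merged, params["joint_kernel"])
    # Linear read-out to a single predicted response value.
    weighted = sum(j * w for j, w in zip(joint, params["out_weights"]))
    return weighted + params["out_bias"]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
params = {  # hypothetical, untrained weights
    "cell_kernel": [rng.uniform(-1, 1) for _ in range(4)],
    "drug_kernel": [rng.uniform(-1, 1) for _ in range(4)],
    "joint_kernel": [rng.uniform(-1, 1) for _ in range(3)],
    "out_weights": [rng.uniform(-1, 1) for _ in range(6)],
    "out_bias": 0.0,
}
cell_bits = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # 12 toy mutation flags
drug_bits = [0, 1, 1, 0, 1, 0, 0, 1]              # 8 toy fingerprint bits
prediction = predict_response(cell_bits, drug_bits, params)
```

With these toy sizes the cell-line branch yields 5 features and the drug branch 3, so the merged vector has 8 entries and the joint convolution produces the 6 values consumed by the read-out. The weights are random, so the output carries no biological meaning.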

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1

Overview of Cancer Drug Response profile scan (CDRscan). (a) Two main applications of CDRscan and dataset structure. For any given genomic fingerprint (i.e., a list of somatic mutations) of a tumour, CDRscan predicts which of the 244 Genomics of Drug Sensitivity in Cancer (GDSC) anticancer drugs would be effective. Alternatively, the input can be the molecular information of a particular small molecule, for which CDRscan reports the predicted sensitivity of 787 cancer cell lines. The datasets used to train CDRscan were extracted from the COSMIC Cell Lines Project (CCLP) and GDSC databases, and represent 787 cancer cell lines across 25 cancer types defined by TCGA, 28,328 mutation positions in 567 cancer-associated genes, and assay results from treatment with 244 anticancer drugs. (b) Data filtering procedure and final datasets. The CCLP and GDSC databases contain genomic characterisation of 1,001 cancer cell lines and IC50 values measured from treatment of the 1,001 cell lines with 265 anticancer drugs. The datasets were refined to include only the 567 COSMIC Cancer Gene Census genes and the cancer types that have at least 10 cell lines. Drugs without a PubChem Compound Identifier or with a molecular weight greater than 1000 g/mol were excluded. Totals of 28,328 and 3,072 features were extracted from cell line genomic signatures and drugs, respectively, constituting a binary encoding of 31,400 features in total. The graphical image used in Fig. 1a is an original creation by Ye-Bin Jung and is reprinted under a CC BY license with permission from Ye-Bin Jung. All rights reserved.
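The filtering rules and the binary encoding in panel (b) can be sketched in a few lines. This is an illustration under assumptions, not the authors' pipeline: the record fields (`cancer_type`, `pubchem_cid`, `mol_weight`), the helper names, and the toy mutation labels are hypothetical; only the thresholds (at least 10 cell lines per cancer type, molecular weight at most 1000 g/mol, a PubChem Compound Identifier required) and the feature counts come from the caption.

```python
from collections import Counter

MIN_LINES_PER_TYPE = 10   # threshold stated in the caption
MAX_MOL_WEIGHT = 1000.0   # g/mol, threshold stated in the caption

def filter_cell_lines(cell_lines, min_per_type=MIN_LINES_PER_TYPE):
    """Keep cell lines whose cancer type has enough representatives."""
    counts = Counter(line["cancer_type"] for line in cell_lines)
    return [line for line in cell_lines
            if counts[line["cancer_type"]] >= min_per_type]

def filter_drugs(drugs, max_weight=MAX_MOL_WEIGHT):
    """Drop drugs lacking a PubChem CID or exceeding the weight limit."""
    return [d for d in drugs
            if d["pubchem_cid"] is not None and d["mol_weight"] <= max_weight]

def encode_binary(present, vocabulary):
    """Return a 0/1 vector with a 1 for every vocabulary item in `present`."""
    index = {item: i for i, item in enumerate(vocabulary)}
    bits = [0] * len(vocabulary)
    for item in present:
        if item in index:
            bits[index[item]] = 1
    return bits

# Hypothetical toy records: one well-represented type, one rare type.
cell_lines = ([{"name": f"common_{i}", "cancer_type": "TYPE_A"} for i in range(10)]
              + [{"name": f"rare_{i}", "cancer_type": "TYPE_B"} for i in range(3)])
drugs = [
    {"name": "drug_ok", "pubchem_cid": 1, "mol_weight": 450.0},
    {"name": "drug_no_cid", "pubchem_cid": None, "mol_weight": 300.0},
    {"name": "drug_heavy", "pubchem_cid": 2, "mol_weight": 1200.0},
]
kept_lines = filter_cell_lines(cell_lines)
kept_drugs = filter_drugs(drugs)

# Hypothetical 4-position mutation vocabulary and 3-bit drug fingerprint.
mutation_vocabulary = ["GENE1:100", "GENE1:250", "GENE2:42", "GENE3:7"]
genomic_bits = encode_binary({"GENE1:250", "GENE3:7"}, mutation_vocabulary)
drug_fingerprint = [1, 0, 1]
combined = genomic_bits + drug_fingerprint  # one input row per cell line-drug pair

# Feature counts reported in the caption.
GENOMIC_FEATURES = 28_328
DRUG_FEATURES = 3_072
TOTAL_FEATURES = GENOMIC_FEATURES + DRUG_FEATURES
```

In the real dataset the genomic vector would have 28,328 positions and the drug vector 3,072, which sum to the 31,400 features quoted above. How the drug fingerprint bits are derived from the molecular structure is not described in this text and is left out of the sketch.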

Figure 2

Assessment of prediction accuracy of CDRscan. (a) Scatter plots showing correlation between the observed and predicted IC50 values for CDRscan and two other machine learning models used to benchmark the prediction accuracy. The test datasets, which correspond to 5% of the total cell line-drug pairs, were used to assess the coefficient of determination (R²). (b) Table summarizing the R² values and root mean squared errors (RMSE) of CDRscan (mean value of the five models and values for individual models), random forest, and support vector machine.
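The two metrics named in this caption can be computed as follows. This is a minimal sketch with made-up numbers: the helper `holdout_split`, the seed, and the toy values are hypothetical, and the text does not say whether the reported R² uses this residual-based definition or the squared Pearson correlation, so the definition below is an assumption.

```python
import math
import random

def holdout_split(pairs, test_fraction=0.05, seed=0):
    """Shuffle the cell line-drug pairs and hold out a fraction for testing."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Toy values standing in for observed and predicted responses.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
toy_r2 = r_squared(observed, predicted)    # 1 - 0.5 / 5.0 = 0.9
toy_rmse = rmse(observed, predicted)       # sqrt(0.5 / 4), about 0.354

train, test = holdout_split(range(200))    # 5% of 200 pairs -> 10 test pairs
```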

Figure 3

Cell line- and drug-centric correlation analyses. (a) Prediction accuracy assessment for each cell line. Scatter plots show the correlation between observed and CDRscan-predicted ln(IC50) values for the cell lines that showed the strongest (BFTC-909, left) and the weakest agreement (COR-L32, right). The COSMIC IDs of the two cell lines and the corresponding cancer types are indicated above the scatter plots, and the R² values, Pearson correlation coefficient (r), p values, and the number of instances (n) are shown in the upper left corner of each plot. Histograms on the right show the overall distribution of prediction accuracy assessed for individual cell lines using the indicated metrics. (b) Scatter plots showing the strongest and weakest agreement between observed and CDRscan-predicted ln(IC50) in the drug-centric correlation analysis. The drug name and its PubChem ID are indicated in each plot. The R² values, Pearson correlation coefficient (r), p values, and the instance counts (n) are also indicated. Histograms on the right show the overall distribution of prediction accuracy (R²) assessed for individual drugs using the indicated metrics.
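A cell line-centric or drug-centric analysis of this kind amounts to grouping the observed and predicted ln(IC50) pairs by one key and computing a correlation per group. The sketch below assumes a hypothetical record layout (`cell_line`, `drug`, `observed`, `predicted`) and invented values; it computes only Pearson r and n, not the p values shown in the figure.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Pearson correlation coefficient; NaN when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def correlation_by_group(records, key):
    """Map each group (cell line or drug) to (Pearson r, instance count n)."""
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        obs, pred = groups[rec[key]]
        obs.append(rec["observed"])
        pred.append(rec["predicted"])
    return {name: (pearson_r(obs, pred), len(obs))
            for name, (obs, pred) in groups.items()}

# Hypothetical records: (cell line, drug, observed ln(IC50), predicted ln(IC50)).
rows = [
    ("line_A", "drug_1", 1.0, 1.1), ("line_A", "drug_2", 2.0, 2.1),
    ("line_A", "drug_3", 3.0, 3.1), ("line_B", "drug_1", 1.5, 3.0),
    ("line_B", "drug_2", 2.5, 1.0), ("line_B", "drug_3", 3.5, 2.0),
]
records = [dict(zip(("cell_line", "drug", "observed", "predicted"), row))
           for row in rows]

per_line = correlation_by_group(records, key="cell_line")
strongest = max(per_line, key=lambda name: per_line[name][0])
weakest = min(per_line, key=lambda name: per_line[name][0])
```

Passing `key="drug"` instead gives the drug-centric view with the same function. In the toy data `line_A` is predicted almost perfectly (r close to 1) while `line_B` has r = -0.5, so they come out as the strongest and weakest groups.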

Figure 4

Feasibility of drug repurposing using CDRscan. (a) Approved anticancer drugs with potential repurposing opportunity. CDRscan predicted that 23 out of 102 approved anticancer drugs have activity against at least one new cancer type in addition to the originally approved indications. Nine of these showed predicted sensitivity in more than 90% of cancer types, indicating nonspecific antiproliferative/cytotoxic effects. (b) Approved non-oncology drugs with potential repurposing opportunity. Of the 1,385 non-oncology drugs, 27 showed potential anticancer activity. Four of these 27 drugs were predicted to have activity against over 90% of cancer types.
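The counts in this caption (23 and 27) differ from those in the abstract (14 oncology and 23 non-oncology drugs). One reading that reconciles them, offered here as an assumption rather than something the caption states, is that the abstract's figures leave out the drugs predicted to be active against more than 90% of cancer types. The arithmetic under that assumption, with hypothetical variable names:

```python
# Counts taken from the caption.
approved_oncology = 102
approved_non_oncology = 1385
oncology_hits = 23            # anticancer drugs with a predicted new cancer type
oncology_broad = 9            # of those, predicted active in >90% of cancer types
non_oncology_hits = 27        # non-oncology drugs with predicted activity
non_oncology_broad = 4        # of those, predicted active in >90% of cancer types

# Total number of approved drugs screened.
total_screened = approved_oncology + approved_non_oncology

# Assumption: the abstract counts only the drugs that are not broadly active.
selective_oncology = oncology_hits - oncology_broad
selective_non_oncology = non_oncology_hits - non_oncology_broad
```

Under this reading the totals come to 1,487 screened drugs, 14 oncology candidates, and 23 non-oncology candidates, which match the numbers given in the abstract.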
