The Exomes of the NCI-60 Panel: A Genomic Resource for Cancer Biology and Systems Pharmacology

Therapeutics, Targets, and Chemical Biology| July 15 2013

Authors' Affiliations: 1Genetics Branch; 2Laboratory of Molecular Pharmacology, Center for Cancer Research; and 3Division of Cancer Treatment and Diagnosis, National Cancer Institute, NIH, Bethesda, Maryland

Eric C. Polley;

Sean R. Davis;

Yuelin J. Zhu;

Sven Bilke;

Robert L. Walker;

Marbin Pineda;

Yevgeniy Gindin;

Yuan Jiang;

William C. Reinhold;

Susan L. Holbeck;

Richard M. Simon;

James H. Doroshow;

Yves Pommier;

Paul S. Meltzer

Corresponding Authors: Yves Pommier, Laboratory of Molecular Pharmacology, Center for Cancer Research, National Cancer Institute, 37 Convent Dr., Bethesda, MD 20982. Phone: 301-496-5944; Fax: 301-402-0752; E-mail: pommier@nih.gov; and Paul S. Meltzer, Genetics Branch, National Cancer Institute, 37 Convent Dr., Bethesda, MD 20982. Phone: 301-496-5266; Fax: 301-402-3241; E-mail: pmeltzer@mail.nih.gov

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

O.D. Abaan, E.C. Polley, and S.R. Davis contributed equally to this work.

Received: August 24 2012

Revision Received: March 26 2013

Accepted: April 26 2013

Online ISSN: 1538-7445

Print ISSN: 0008-5472

Cancer Res (2013) 73 (14): 4372–4382.

Citation

Ogan D. Abaan, Eric C. Polley, Sean R. Davis, Yuelin J. Zhu, Sven Bilke, Robert L. Walker, Marbin Pineda, Yevgeniy Gindin, Yuan Jiang, William C. Reinhold, Susan L. Holbeck, Richard M. Simon, James H. Doroshow, Yves Pommier, Paul S. Meltzer; The Exomes of the NCI-60 Panel: A Genomic Resource for Cancer Biology and Systems Pharmacology. Cancer Res 15 July 2013; 73 (14): 4372–4382. https://doi.org/10.1158/0008-5472.CAN-12-3342

Abstract

The NCI-60 cell lines are the most frequently studied human tumor cell lines in cancer research. This panel has generated the most extensive cancer pharmacology database worldwide. In addition, these cell lines have been intensely investigated, providing a unique platform for hypothesis-driven research focused on enhancing our understanding of tumor biology. Here, we report a comprehensive analysis of coding variants in the NCI-60 panel of cell lines identified by whole exome sequencing, providing a list of possible cancer-specific variants for the community. Furthermore, we identify pharmacogenomic correlations between specific variants in genes such as TP53, BRAF, ERBBs, and ATAD5 and anticancer agents such as nutlin, vemurafenib, erlotinib, and bleomycin, showing one of many ways the data could be used to validate and generate novel hypotheses for further investigation. As new cancer genes are identified through large-scale sequencing studies, the data presented here for the NCI-60 will be an invaluable resource for identifying cell lines with mutations in such genes for hypothesis-driven research. To enhance the utility of the data for the greater research community, the genomic variants are freely available in different formats and from multiple sources including the CellMiner and Ingenuity websites. Cancer Res; 73(14); 4372–82. ©2013 AACR.

Introduction

The NCI-60 human tumor cell line panel (1) is used by a broad range of cancer investigators and by the NCI Developmental Therapeutics Program (DTP) to discover novel anticancer drugs (2). This panel represents an invaluable and publicly accessible platform of pharmacological, genomic, metabolomic, biochemical, and molecular datasets (3–8). This study reports findings from whole exome sequencing (WES) of the NCI-60 panel of cell lines. In addition, pharmacogenomic analyses provide examples of a few of the many ways the variant data could be used to generate novel hypotheses. Our study complements two recently published large-scale cancer cell line sequencing studies, which used a limited number of genes (9, 10), because our work provides the whole exome variants for all of the NCI-60 cell lines. The data are made available through the CellMiner, NCI DTP, and Ingenuity Systems websites (11).

Materials and Methods

Cell lines

The list of cell lines in the NCI-60 panel and their tissue origins are given in Supplementary Fig. S8. DNA was extracted from cells and fingerprinted as described before (12).

Exome capture and sequencing

Briefly, 38 Mb of coding region for each cell line was captured using the Agilent SureSelect All Exon v1.0 Kit (Agilent). Genomic DNA (3 μg) was sheared with the Covaris S2 ultra-sonicator (Covaris) using the settings duty cycle 10%, intensity 5%, cycle/burst 200, and time 60 s, which yielded a fragment size distribution with a mean at 200 bp. Libraries were generated using the standard Illumina library protocol (Illumina) followed by size selection using ChromaSpin TE200 spin columns (Clontech). Pre- and postcapture steps were conducted following the manufacturer's protocol (Agilent). The samples were sequenced as paired-end 80-mer reads on an Illumina Genome Analyzer IIx instrument (Illumina) following the manufacturer's protocol.

Data processing and variant calls

Fastq files were aligned against the reference human genome build 19 (hg19) using the Burrows-Wheeler Aligner (13). Alignment files were base quality score recalibrated and locally realigned around indels with GATK (14) and marked for duplicates using PICARD tools (picard.sourceforge.net). Alignment files and variant calls can be accessed from the links provided (11). Consensus genotype calls were generated using samtools mpileup (15) and annotated using the Annovar package (16). Variants were further filtered for the SureSelect bait region, a minimum read depth of 6 and a minimum quality score of 30 for single nucleotide variant (SNV) and 60 for indels, producing the final variant calls.
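The final filtering step can be sketched in a few lines of Python. This is a minimal illustration only, not the pipeline used in the study: the `Variant` record, its field names, and the example coordinates are hypothetical, and only the three cut-offs (bait region, depth of at least 6, quality of at least 30 for SNVs and 60 for indels) come from the text.

```python
from dataclasses import dataclass

# Cut-offs quoted in the text; everything else here is a hypothetical layout.
MIN_DEPTH = 6
MIN_QUAL_SNV = 30
MIN_QUAL_INDEL = 60


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int      # read depth at the site
    qual: float     # variant quality score
    in_bait: bool   # True if the site lies in the SureSelect bait region


def is_indel(v: Variant) -> bool:
    """Treat any call whose alleles differ in length as an indel."""
    return len(v.ref) != len(v.alt)


def passes_filter(v: Variant) -> bool:
    """Apply the bait-region, depth, and quality cut-offs."""
    min_qual = MIN_QUAL_INDEL if is_indel(v) else MIN_QUAL_SNV
    return v.in_bait and v.depth >= MIN_DEPTH and v.qual >= min_qual


# Made-up calls that exercise each rule once.
calls = [
    Variant("chr7", 100, "A", "T", depth=12, qual=45.0, in_bait=True),    # SNV, kept
    Variant("chr7", 200, "A", "AT", depth=12, qual=45.0, in_bait=True),   # indel, quality below 60
    Variant("chr17", 300, "C", "T", depth=5, qual=90.0, in_bait=True),    # depth below 6
    Variant("chr17", 400, "G", "A", depth=20, qual=90.0, in_bait=False),  # outside bait region
]
final_calls = [v for v in calls if passes_filter(v)]
print([(v.chrom, v.pos) for v in final_calls])
```

In practice the same rules would be applied to the annotated mpileup output rather than to in-memory records; the sketch only makes the thresholds explicit.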

Drug activity determination

Drug activity was determined by the DTP human cancer cell line screen (11). The concentration of agent required to cause 50% growth inhibition (GI50) as measured at 48 hours by the sulphorhodamine B assay (17) was determined.

Gene expression and other NCI-60 molecular characterization

mRNA expression, miRNA expression, copy number, and protein measurements are publicly available from DTP or from CellMiner (excluding the protein data; ref. 11). The details pertaining to data acquisition and analysis were previously published (18).

Volcano plots

The _x_-axis of a volcano plot depicts the difference in mean log GI50 between the cell lines containing a mutation in the specified gene and the cell lines not containing such a mutation. The _y_-axis depicts the statistical significance level for the comparison of log GI50 for those 2 groups of cell lines with larger values indicating smaller P values. On a volcano plot for a gene, the points represent the compounds. On a volcano plot for a compound, the points represent the genes. For a volcano plot representing a gene, the false discovery rate can be limited to 0.2 or less by restricting attention to the 310 clinical and investigational compounds with P values no greater than 0.0005. When examining all of the screening compounds, the false discovery rate will be greater unless attention is restricted by a more stringent significance cut-off (e.g., 10⁻⁴) and an imposed cut-off on difference in log GI50 between mutated and wild-type groups (e.g., ± 0.5). In general, however, the volcano plots are used either to confirm previously identified hypotheses or to generate hypotheses that require independent validation.
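As a rough illustration, one point on such a plot can be computed as follows. The text does not state which statistical test produced the P values, so the two-sided permutation test used here is an assumption made only to keep the sketch self-contained; the function name and the toy log GI50 values are likewise hypothetical.

```python
import math
import random
from statistics import mean


def volcano_point(log_gi50, mutated, n_perm=2000, seed=0):
    """Return (x, y) for one compound-gene pair.

    x: mean log GI50 of mutated lines minus mean of non-mutated lines.
    y: -log10 of a two-sided permutation P value (an assumed test,
       not necessarily the one used in the study).
    """
    mut = [v for v, m in zip(log_gi50, mutated) if m]
    wt = [v for v, m in zip(log_gi50, mutated) if not m]
    observed = mean(mut) - mean(wt)

    rng = random.Random(seed)
    values = list(log_gi50)
    n_mut = len(mut)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        diff = mean(values[:n_mut]) - mean(values[n_mut:])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated P value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, -math.log10(p_value)


# Toy data: five mutated lines that respond at lower concentrations.
toy_log_gi50 = [-7.0, -6.8, -7.2, -6.9, -7.1, -5.0, -5.2, -4.9, -5.1, -5.0]
toy_mutated = [True] * 5 + [False] * 5
x, y = volcano_point(toy_log_gi50, toy_mutated)
print(f"difference in mean log GI50: {x:.2f}, -log10 P: {y:.2f}")
```

A negative x value means the mutated group is more sensitive (lower GI50); repeating the calculation over every compound for one gene yields the cloud of points that forms the plot.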

Super Learner prediction models

Using GI50 data on the NCI-60 for 103 U.S. Food and Drug Administration (FDA)-approved and 207 investigational oncology drugs and the 711 genes with at least 5 cell lines containing a type II variant in the gene, we estimated a predictor for each drug using the Super Learner algorithm (19). The predictor uses the gene-level mutation profile to predict the log GI50 for each drug. The Super Learner is an ensemble-based prediction methodology that combines different machine-learning predictors into a single optimal predictor based on minimizing the cross-validated risk. The base algorithms for the Super Learner include elastic net regression, gradient-boosting regression, bagging, CART, random forests, neural networks, and support vector machines. In total, 35 prediction algorithms were combined for the Super Learner ensemble. We do not expect a single prediction algorithm (e.g., elastic net regression) to be optimal across all 310 drugs, and the Super Learner allows the final predictor to data-adaptively up-weight the best algorithms. Examining the weights for each algorithm across the 310 drugs (data not shown) shows great variability, indicating we should see a benefit with the Super Learner ensemble approach. Within a drug, the Super Learner predicts the log GI50 based on the gene-level mutation profiles. To compare across the drugs with different potencies, the log GI50 values need to be normalized. We define the normalized log GI50 for a cell line as the log GI50 minus the mean log GI50 for that drug in all the other cell lines. For ROC analysis, we classified a cell line as sensitive to a drug if its true normalized log GI50 was less than −0.5, and insensitive if the value was greater than 0.5.
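The normalization and the sensitivity labels defined above can be written out directly. The sketch below follows the stated definition (value minus the mean over all other cell lines, with cut-offs at −0.5 and 0.5); the function names, the "unclassified" label for values between the cut-offs, and the placeholder cell line names are assumptions for illustration.

```python
from statistics import mean


def normalized_log_gi50(log_gi50_by_line):
    """Subtract, for each cell line, the mean log GI50 of all other lines."""
    normalized = {}
    for line, value in log_gi50_by_line.items():
        others = [v for name, v in log_gi50_by_line.items() if name != line]
        normalized[line] = value - mean(others)
    return normalized


def classify(norm_value, cutoff=0.5):
    """Label a cell line using the -0.5 / 0.5 cut-offs quoted in the text."""
    if norm_value < -cutoff:
        return "sensitive"
    if norm_value > cutoff:
        return "insensitive"
    return "unclassified"  # hypothetical label for values between the cut-offs


# Placeholder cell line names and made-up log GI50 values for one drug.
one_drug = {"LINE_A": -7.0, "LINE_B": -5.0, "LINE_C": -5.2, "LINE_D": -4.8}
norm = normalized_log_gi50(one_drug)
labels = {line: classify(value) for line, value in norm.items()}
print(labels)
```

Because each line is compared with the mean of the others rather than the overall mean, a single very sensitive line does not pull its own reference value toward itself.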

Results

The variant calls were generated as described in Materials and Methods, where we filtered variants with a minimum quality of 30 (60 for small insertions/deletions) and a minimum depth of 6 with at least 3 alternate alleles over the targeted 38 Mb coding region. Because matched normals are not available for cell lines, we conducted a more stringent filtering to identify potential cancer-specific variants. Using this filtering, the variants were divided into 2 groups: type I variants corresponding to common (and possibly germline) variants and type II variants enriched for acquired cancer-specific variants (Supplementary Figs. S1 and S2). We obtained more than 1.2 million type I and 60,005 type II variants in the NCI-60 cell lines.

Although a limitation of cell line sequencing is the lack of available normal-matched tissue for comparison, the NCI-60 panel does allow comparisons between cell lines from 9 distinct tissues of origin. NCI-60 cell lines with known microsatellite instability (MSI; Supplementary Fig. S3) have very high type II variant counts (Fig. 1A). However, HCC2998, a colon cancer cell line not known to have MSI, has the highest number of type II variants. In contrast to the known MSI cell lines, more than 98% of HCC2998 type II variants are SNVs (Supplementary Fig. S4), suggesting that this hypermutator phenotype arises from a mechanism other than MSI. Of interest, HCC2998 carries a POLE exonuclease domain missense variant coding for a P286R mutation in POLϵ (Supplementary Fig. S5). Previous reports indicate that impaired POLϵ proofreading results in a high rate of single nucleotide substitutions and increased tumor formation (20), and POLE mutations in colorectal cancer have recently been reported (21). HCC2998 seems to exemplify this phenomenon, providing a reagent for further investigation and illustrating the utility of the NCI-60 WES data.

Figure 1.

Results of WES variant calling. A, variant counts for each cell line from each tumor type are plotted for types I and II fraction as green squares and red diamonds, respectively. Within each tumor type, the variant counts are sorted from lowest to highest, and a box plot is superimposed to show subgroup mean and spread. Microsatellite unstable cell lines are marked with a red asterisk. B, base ti/tv ratio is plotted for each tumor type in the NCI-60 panel for type II variants that are likely to be tumor specific. The _y_-axis represents the fraction of base conversions from a C:G or a T:A base pair to any other possible base pair change, which cumulatively equals 1. See also Supplementary Fig. S1 for additional details.

Given the diversity in the NCI-60 panel based on the tissue of origin, the WES data reveal important information about the etiology of each subgroup. As is evident from Fig. 1B, there is a wide range of transition-to-transversion ratios (ti/tv) among the NCI-60 panel. Melanoma cell lines have the highest ti/tv (3.93) with higher C:G to T:A transitions, which is the major mode of change for UV-induced DNA damage (22). In contrast, lung cancer cell lines have a ti/tv (0.67) indicative of tobacco smoke-induced DNA damage (23). Thus, the WES data support the prior notion that the NCI-60 panel retains disease etiology signatures (7).
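For readers less familiar with the ti/tv statistic, the following sketch shows how it is computed from a list of single nucleotide changes. It assumes each variant is given as a (reference base, alternate base) pair of distinct single bases; the function names and the toy input are illustrative and are not taken from the study's code.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    return ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Compute the transition-to-transversion ratio for (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    transversions = len(snvs) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Toy input: three transitions (C>T, G>A, C>T) and two transversions (C>A, G>T).
toy_snvs = [("C", "T"), ("G", "A"), ("C", "T"), ("C", "A"), ("G", "T")]
print(ti_tv_ratio(toy_snvs))
```

C>T and G>A are the two strand-equivalent forms of the C:G to T:A transition, so a UV-like profile raises the numerator, whereas an excess of changes such as C>A or G>T raises the denominator and lowers the ratio.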

Figure 2A shows a map of the 10 most frequently mutated genes in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (24). We annotated the WES variant calls as those present in the COSMIC database (v59) and those that are absent in COSMIC but predicted to be deleterious by the Sorting Tolerant From Intolerant (SIFT; ref. 25) or PolyPhen (26) algorithms. TP53 is the most frequently mutated gene overall, whereas BRAF is the most frequently mutated gene among melanoma cell lines (Fig. 2A). Although most of the variants identified in these 10 genes are already annotated in COSMIC, novel variants in these 10 genes were also observed. Although the lack of normal tissue makes it almost impossible to validate these as somatic changes, these variants were not observed in either the 1000 Genomes Project (27) or the 5,600 normal whole exomes available through the NHLBI Exome Sequencing Project (28). Besides the many well-defined cancer genes such as those in Fig. 2A, large-scale tumor sequencing efforts by others continue to lead to the discovery of novel cancer genes, such as the 16 genes listed in Fig. 2B. Because the NCI-60 cell lines are so well characterized and readily available, they are ideal tools for hypothesis-driven research of these novel cancer genes/mutations identified by large-scale sequencing efforts. Details for these particular mutations or for any other gene mutation can be downloaded from the public domains, including the CellMiner (Fig. 3) or Ingenuity (Supplementary Fig. S6) websites.

Figure 2.

Mutation spectrum for the top 10 most frequently mutated genes and novel cancer-related genes in the NCI-60. A, the top 10 COSMIC census cancer genes (sorted by the number of occurrences in the NCI-60 panel) were scored for the presence of mutations in each cell line. Gray marks variants annotated in the COSMIC v59 database. Blue marks variants that are not in the COSMIC database but identified in this study and predicted to be deleterious (either SIFT score < 0.05 or PolyPhen2 score > 0.85). Magenta marks cases where a cell line harbors at least one COSMIC-annotated and at least one novel variant in a particular gene (a gray and a blue mark). B, new cancer genes identified in recent large-scale sequencing studies such as: SETD2 (38), LRP1B (39, 40), PBRM1 (41), SPTA1 (42), DNMT3A (43), ARID1A (44), GRIN2A (45), TRRAP (45), STAG2 (46), EPHA3/5/7 (39), POLE (21), and SYNE1 (47). Blue boxes represent likely loss-of-function mutations (e.g., nonsense, splice site, initiation loss, and frame shift insertions or deletions), whereas magenta indicates missense mutations. Cases with co-occurrence of both types are labeled in gray.

Figure 3.

Snapshot from the CellMiner website. A, to access tabular data, first click on the “Query Genomic Data Sets” tab. Specify data you want by: (i) identifying the query type in step 1 (HUGO name is required); (ii) choosing whether you wish to type in your identifier, or upload your identifier(s) as a file in step 2; (iii) identifying the dataset being queried in step 3 (in this case exome sequencing); (iv) entering your e-mail address in step 6 and clicking “Get data.” B, the tabular data sent to you will include a full set of the data for all 60 cell lines (only 1 cell line is included for reasons of space). Within the output, (i) the probe ID denotes the chromosome number, start location, and the nucleotide change; (ii) AA is amino acid; (iii) dbSNP id; (iv) allele frequency in 1000 Genomes; (v) allele frequency in ESP5400; (vi) SIFT score; (vii) NCBI accession number; (viii) PolyPhen2 score. C, to access graphical data, first click on the “NCI-60 Analysis Tools” tab. Choose the graphical output tool by (i) clicking “Graphical output for DNA:Exome sequencing” in step 1; (ii) choosing whether you wish to type in your identifier, or upload your identifier(s) as a file in step 2; (iii) identifying the gene being queried, also in step 2; (iv) entering your e-mail address in step 3 and clicking “Get data.” D, the graphical data will be sent as an HTML file, with accompanying PNG images. The summary of all variants in BRAF is shown (individual cell lines are also included). The number of variants at each location is depicted by the vertical green, red, or brown lines.

To show the utility of this unique dataset and illustrate one of many ways to apply these data in hypothesis-driven research, we carried out an integrated pharmacogenomic investigation. The fact that the NCI-60 panel has been used to screen thousands of compounds provides a rich resource for testing the relationship of variants in genes to drug response. Among 43,225 compounds screened for activity against the NCI-60 cell lines (as of September 2012), 15,898 showed high dynamic range in their GI50 estimates across all cell lines. For each gene with at least 5 cell lines containing a type II variant, we evaluated the association of log GI50 to variants in genes for all of the screened compounds. TP53, the most frequently mutated gene in the NCI-60 panel, shows strong correlation with drug response. MDM2 inhibitors are effective agents in cell lines with wild-type p53 (Fig. 4A), where they can induce cell death. Of the 15,898 compounds and 310 FDA-approved or investigational oncology drugs, the activities of 2 clinically relevant MDM2 inhibitors show strong negative correlation with mutant p53 (Fig. 4B). Nutlin-3 gives the highest statistical significance score for its activity in p53 wild-type cell lines (Fig. 4B and C). MI-219, a known MDM2 inhibitor, exhibits a similar strong negative correlation with mutant p53 (Fig. 4B). In contrast, National Service Center (NSC)-670177 (Supplementary Table S1) shows significant selectivity for the p53 mutant cells. However, the proposed p53-specific compound reactivation of p53 and induction of tumor cell apoptosis (RITA; NSC-652287; ref. 29), initially identified as a DNA cross-linking agent (30), showed little evidence of selective activity for cell lines with p53 wild-type status and only limited correlation with nutlin-3 (Supplementary Fig. S7A), questioning the claim that RITA acts specifically as a p53-reactivating compound. 
For comparison, RITA displays far less selectivity for p53 wild-type cells than the classical DNA-targeted agent mithramycin. As expected, expression of the well-known components of the p53 pathway, MDM2 and miR-34a (31), correlates with p53 wild-type cell lines (Fig. 4E and F). Additional pharmacogenomic correlations between TP53 mutational status, miRNAs, mRNA transcripts, or other agents are listed in Supplementary Fig. S7B. Integrating additional genomic datasets, such as gene and miRNA expression data (18), strengthens the value of all these comprehensive datasets for the NCI-60 panel.
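The gene selection rule used for the association screen (only genes with a type II variant in at least 5 cell lines are tested) is simple to express in code. The sketch below assumes the variant calls are available as (cell line, gene) pairs; that input layout, the function name, and the placeholder gene and cell line names are hypothetical.

```python
from collections import defaultdict


def genes_with_min_lines(type2_calls, min_lines=5):
    """Map each qualifying gene to the set of cell lines carrying a variant.

    type2_calls: iterable of (cell_line, gene) pairs, one per type II variant.
    A gene qualifies when at least `min_lines` distinct cell lines carry one.
    """
    lines_by_gene = defaultdict(set)
    for cell_line, gene in type2_calls:
        lines_by_gene[gene].add(cell_line)
    return {g: s for g, s in lines_by_gene.items() if len(s) >= min_lines}


# Placeholder data: GENE_X is hit in 5 distinct lines (one line twice), GENE_Y in 2.
toy_calls = [
    ("LINE_1", "GENE_X"), ("LINE_2", "GENE_X"), ("LINE_3", "GENE_X"),
    ("LINE_4", "GENE_X"), ("LINE_5", "GENE_X"), ("LINE_5", "GENE_X"),
    ("LINE_1", "GENE_Y"), ("LINE_2", "GENE_Y"),
]
selected = genes_with_min_lines(toy_calls)
print(sorted(selected))
```

Counting distinct cell lines rather than variants matters here: a gene with many variants in a single hypermutated line would otherwise pass the threshold without providing a usable mutated group.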

Figure 4.

Correlation of TP53 wild-type cells with nutlin-3 and other p53 pathway modulators. A, schematic representation of the p53-MDM2 feedback loop with p53 acting as a positive transcription factor for MDM2 and miRNA-34a whereas nutlin-3 acts as an MDM2 antagonist (48), blocking MDM2-mediated p53 degradation and killing of wild-type p53 cell lines. B, the volcano plots show the difference in mean log GI50 between the cell lines containing a type II variant in TP53 versus those cell lines not containing a variant along the _x_-axis and the −log10 _P_ value on the _y_-axis. Each red point represents one of the 15,898 compounds tested from the NCI screening data plus 310 approved and investigational drugs (green points). A magenta guideline is given at significant _P_-value 10⁻⁴. The NSC numbers or names for the statistically significant and for comparison some nonsignificant compounds are annotated on the plot. _TP53_-reactivating compounds from literature are in red. C, antiproliferative activity of nutlin-3 across the NCI-60 cell lines, where the bar graph is color coded by tissues of origin. D, the TP53 wild-type cells are marked with horizontal bars, red tick marks, and red lettering. E, MDM2 expression is highest in the TP53 wild-type cells and those targeted by nutlin-3 (note mirror image profiles). F, the expression profile of miRNA 34a, an established p53 target. Abbreviations: BR, breast; CNS, central nervous system; CO, colorectal; LE, leukemia; ME, melanoma; LC, lung cancer; OV, ovarian; PR, prostate; and RE, renal. See also Supplementary Fig. S6 for additional correlations.

We further supplemented this work with cross-validated multivariate analyses. For each of the 310 FDA-approved or investigational oncology drugs, we developed a Super Learner ensemble machine-learning model predicting log GI50 based on variants in genes. We included genes with type II variants in 5 or more cell lines across the NCI-60 panel. Leave-one-out cross-validation was used to evaluate the ability of such modeling to distinguish sensitive from insensitive cell lines for individual drugs and to select active drugs for individual cell lines. We developed these 310 models for each loop of a cross-validation in which one cell line was omitted and the remaining cell lines were used as a training set. Those models were then used to predict the log GI50 values for all drugs for the omitted cell line thereby predicting the most active drugs (smallest normalized log GI50; see Materials and Methods) against this cell line (Supplementary Table S2). Using these models, we generated cross-validated receiver operating characteristic (ROC) curves for each cell line (Supplementary Fig. S8). The ROC curve plots sensitivity versus one minus specificity for identifying active drugs. The area under the curve (AUC) between the ROC curve and the diagonal line is a measure of the predictive accuracy of the WES-based models. A large AUC value for a cell line indicates that the mutation spectrum of the cell line is informative for discriminating active from inactive drugs. The set of drugs analyzed, however, contains many cytotoxics, for which the predictive model based only on mutation spectrum was poorly informative. Our models included only mutation status and did not attempt to distinguish the confounding between mutation status and cell line lineage. Further studies with comprehensive models that include copy number, transcript abundance, and methylation status should yield more accurate predictions.
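The per-cell-line ROC summary can be illustrated with a small, dependency-free AUC calculation. This is a sketch under stated assumptions: it computes the conventional area under the ROC curve by pairwise comparison, treats a lower predicted normalized log GI50 as "more active", and uses made-up predictions. Subtracting 0.5 from the result gives the area between the curve and the diagonal, which may be closer to the quantity described above.

```python
def roc_auc(scores, is_active):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    scores: predicted normalized log GI50 per drug; lower means more active.
    is_active: True where the drug was truly active in the held-out line.
    """
    pos = [s for s, a in zip(scores, is_active) if a]
    neg = [s for s, a in zip(scores, is_active) if not a]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive drug")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0   # active drug correctly ranked ahead
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Made-up predictions for six drugs in one held-out cell line.
predicted = [-1.2, -0.9, 0.1, 0.8, -0.7, 1.1]
truly_active = [True, True, False, False, False, True]
auc = roc_auc(predicted, truly_active)
print(f"AUC = {auc:.3f}, area above diagonal = {auc - 0.5:.3f}")
```

In a leave-one-out setting, `predicted` would come from the 310 drug models trained without the held-out cell line, so each cell line receives its own curve and AUC.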

The ROC curves provide valuable insight into cancer biology. For instance, among the NCI-60 melanoma cell lines, SK-MEL-2 has the lowest AUC value (Fig. 5A). This is particularly interesting because SK-MEL-2 is the only non-BRAF-V600E mutant melanoma cell line with an activating NRAS-Q61R mutation. As shown with the volcano plot in Fig. 5B, the 3 _BRAF-V600E_–specific inhibitors PLX-4720, vemurafenib (Fig. 5C), and SB-590885 stand out with extremely high significance and differential mean GI50 in the _BRAF_-mutant cell lines. All the MEK inhibitors (blue font), including selumetinib (Fig. 5D) and hypothemycin (Fig. 5E), show highly significant selectivity and differential GI50, indicating their therapeutic value in cancer cells with an activated mitogen-activated protein kinase (MAPK) pathway. Notably, one compound, NSC-678518, showed extreme selectivity for the _BRAF_-mutated cells. NSC-678518, the anthrax lethal factor, was identified in a screen for agents with similar inhibitory profiles to another MAPK kinase inhibitor, PD098059, and shown to proteolytically inactivate such kinases (32).

Figure 5.

Correlation between MAPK pathway mutations and drug response to compounds that target this pathway in the NCI-60 panel. A, ROC for cross-validated drug predictors for melanoma cells. Cross-validated ROC curves are shown for each cell line. The inset reports the AUC for each cell line and the number of inactive drugs (n1) and active drugs (n2). B, same volcano plot as in Fig. 4B, for BRAF variants. A magenta guideline is given at significant _P_-value 10⁻⁴. The NSC numbers or names for the statistically significant and for comparison some nonsignificant compounds are annotated on the plot. Drug response for the BRAF V600E inhibitor vemurafenib (C), the MEK inhibitor selumetinib (D), and the MEK/ERK inhibitor hypothemycin (E). Cell lines with mutations are labeled in red for the gene(s) indicated to the right. F, heat map showing correlations between mutations in key signaling intermediates (PTEN, PIK3R1, PIK3CA, ERBB2, BRAF, and NRAS) versus drugs that target these pathways; MAPK pathway inhibitors (blue), PI3K pathway inhibitors (green), EGFR/ERBB inhibitors (magenta). Values for each drug represent the mean GI50 for each cell line with the particular gene mutations, including previously published deletions and small mutations (49). The number of cell lines with the particular mutation is given in parentheses.

Parallel studies support the value of correlating genomics and targeted agents (2, 9, 10). Figure 5C to E exemplifies that mutations in protein kinase target genes are strong indicators of response to clinically relevant targeted drugs. In addition, such observation could be generalized to key signaling pathways. Ten distinct kinase inhibitors from 3 major target classes cluster separately depending on the mutations in 6 genes: BRAF, NRAS, PIK3CA, PIK3R1, PTEN, and ERBB2 (Fig. 5F). These effects can be viewed in the context of the MAPK and phosphoinositide 3 kinase (PI3K) pathways downstream of receptor tyrosine kinases (RTK).

One of the most clinically relevant RTKs is the epidermal growth factor receptor (EGFR). However, as shown by Garnett and colleagues (10), it is critical to integrate genomic mutation data with transcript levels to correlate and possibly predict drug responses. The NCI-60 provides a solid background for studying gene expression (see MDM2 example in Fig. 4E; ref. 18), and its large drug database offers unique opportunities to query drug response parameters. To test this possibility, we examined the EGFR inhibitor erlotinib, whose activity is highly correlated with gefitinib and lapatinib in the NCI-60 (see Fig. 6 in ref. 18). Overall, high expression of EGFR (ERBB1) and ERBB2 are determinants of cellular response to erlotinib (Fig. 6B). However, the colon and central nervous system (CNS) cell lines are generally insensitive to erlotinib in spite of high EGFR and ERBB2 expression. This can be rationalized by taking into account mutations in the MAPK or PI3K pathways, a common mechanism of resistance (33), which are present in all 7 colon and 4 of 6 CNS cell lines (Fig. 6B).

Figure 6.

Correlation between erlotinib response and EGFR pathway gene expression and RAS–RAF–PTEN mutations in the NCI-60 panel. A, schematic representation of the EGFR pathway with its 4 components: ERBB1 (EGFR), ERBB2, ERBB3, and ERBB4. Dimerization complexes are indicated as nodes on the double-ended arrows according to Kohn's MIM nomenclature convention (50). Activations are shown as green arrows. Activating mutations of RAS or RAF directly activate MEK and render cells resistant to erlotinib (33). Similarly, inactivation of PTEN confers resistance by direct activation of PI3 kinase. B, (left), antiproliferative activity of erlotinib across the NCI-60. The cell lines are color coded by tissues of origin; (center left) the RAS-RAF-PTEN wild-type (WT) cells are marked as full horizontal bars. Mutant cells (Mut) are shown as short bars; ERBB1 expression is highest in many of the cells targeted by erlotinib (center right; note mirror image profiles); ERBB2 expression profile (far right). The cell lines identified by arrows have focal amplification for ERBB1 (RE:SN12C) and ERBB2 (OV:SKOV3; unpublished data).

Additional examples of correlations between type II variants and the 16,208 compounds, including the 310 FDA-approved or investigational oncology drugs, are included in Supplementary Figs. S9 and S10. Supplementary Fig. S9 contains volcano plots for type II variants in 44 other genes of interest with the corresponding list of significant NSC numbers in Supplementary Table S3. Supplementary Fig. S10 shows volcano plots for 28 selected drugs that are in clinical use or clinical trials. Together, these data again show the potential value of the NCI-60 drug and genomic databases for systems pharmacology.

The power of WES, instead of focused sequencing of preselected genes as published (9, 10), was revealed when we coincidentally found a significant correlation between a germline in-frame deletion (delCAATGT) in ATAD5 (rs72427574) in certain cell lines and their increased sensitivity to the DNA-damaging agent bleomycin. In addition, zorbamycin (NSC-146208) and peplomycin (NSC-276382), which are both bleomycin analogues, show strong activity toward these cell lines. ATAD5, the human homolog of yeast ELG1, is essential for maintaining genome stability through its functions in deubiquitinating proliferating cell nuclear antigen and is known to be mutated in endometrial cancer (34–36). Genotype calls revealed 10 cell lines carrying delCAATGT, of which 5 are heterozygous and 5 are homozygous. Of the 10 cell lines, 3 are renal (ACHN, CAKI-1, RXF-393); earlier work suggests that dimethane sulfonate analogues, such as DMS612, are effective agents against renal cancer (37), and these are being investigated in phase I trials in renal cancer patients (#09-C-0111). Interestingly, there are additional germline variants in ATAD5 that are also present exclusively in the same set of 10 cell lines. When we looked for possible haplotypes in the HapMap database, we discovered a region of linkage disequilibrium spanning more than 300 kb (Fig. 7B). Therefore, this particular haplotype could be a response modifier during chemotherapy with DNA-damaging agents. These results illustrate the discovery potential of exonic variant data when integrated with previously available NCI-60 databases.

Figure 7.

ATAD5 locus as a response modifier for DNA-damaging agents. A, volcano plot as in Fig. 4B, for ATAD5 delCAATGT (rs72427574). A magenta guideline marks the significance threshold of _P_ = 10⁻⁴. The names of the statistically significant compounds are annotated on the plot. B, linkage disequilibrium plot characterizing haplotype blocks in the ATAD5 locus. The black bar marks the ATAD5 gene location. The haplotype blocks were created using the HaploView program, version 4.2 (51).

Discussion

In this study, we provide WES analysis of the widely used NCI-60 cell line panel. We show that the overall pattern of mutation is strikingly divergent between cell lines, ranging from 172 to 9,205 type II variants. As expected, higher variant rates are observed in MSI cell lines; remarkably, however, the highest number of SNVs was observed in HCC2998, a colon cancer cell line in which we discovered a defect in the proofreading domain of POLϵ. The signature of specific carcinogens is readily discernible in lung cancer and melanoma, which show a very low (0.67) and a high (3.93) transition/transversion (ti/tv) ratio, respectively. Variants in established cancer genes are abundantly represented in the NCI-60, and numerous examples of variants in recently implicated cancer genes are also present.
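The ti/tv ratio is the number of transitions (purine to purine or pyrimidine to pyrimidine substitutions) divided by the number of transversions (purine to pyrimidine or the reverse). A minimal sketch of the calculation follows, assuming a hypothetical list of (reference, alternate) base pairs for one cell line; the function names and input format are illustrative and are not taken from the pipeline used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True when both bases are purines or both are pyrimidines."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a single-base substitution between
        # unambiguous bases (indels, N calls, ref == alt).
        if ref == alt or ref not in BASES or alt not in BASES:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")
```

A variant list dominated by C>T changes yields a ratio well above 1, whereas one dominated by G>T changes yields a ratio below 1.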

In addition to the mutational data provided in this article, substantial drug sensitivity data for tens of thousands of compounds and multiple other types of biological data are available for the NCI-60. Using straightforward approaches (see Fig. 3) together with more sophisticated analyses, we were able to show the influence of specific variants in TP53, BRAF, KRAS, NRAS, PIK3CA, PTEN, and ERBBs on the response to clinically relevant targeted agents (nutlin, vemurafenib, selumetinib, hypothemycin, rapamycin, wortmannin, perifosine, erlotinib, afatinib, lapatinib, and neratinib) and to identify aspects of those results that may merit further study. For example, even though targeted inhibitors of activated BRAF-V600E have been widely studied, the comprehensive NCI-60 datasets offer a unique opportunity to identify additional mechanisms of resistance and possibly novel means to overcome acquired resistance. The power of the NCI-60 WES variants is apparent from the observation that common variants in the human population may have a profound effect on drug response. Of course, our observation regarding the ATAD5 gene locus requires further study; however, it opens up a new perspective on common variants and their phenotypes in the context of DNA-damaging agents and the ongoing clinical trials with DMS612 (37).

In comparison to the 2 recent studies conducted with more cell lines (947 in ref. 9 and 639 in ref. 10), our study integrates far more drugs (approximately 20,000 vs. 24 in ref. 9 and 130 in ref. 10; see volcano plots in Figs. 4, 5, 7, and Supplementary Figures) and provides a comprehensive dataset of all exonic variants for the NCI-60 cell lines, whereas 1,600 genes were sequenced in ref. 9 and 64 cancer-related genes in ref. 10. Given the availability of extensive biological and pharmacological data and the vast number of NCI-60 variants identified in this study, comprehensive analyses such as those performed by these 2 studies offer enormous opportunities. The WES data that we are providing for the NCI-60 also enable the vast compound activity database to be used as a resource for drug development to complement genomic studies conducted using larger cell line panels. That is, when one discovers a genomic variant as a molecular target using other cell line resources, one can use the WES data for the NCI-60 to identify screened compounds with potentially selective activity for that target. We have limited our work to the exploration of certain aspects of these invaluable data and have made this dataset public for the greater community to use and analyze. This is critical for expanding our understanding of tumorigenesis and of the genomic bases of drug sensitivity in the years to come, as many more cancer-related gene aberrations are discovered.

Importantly, the availability of these sequencing data will allow increased precision in the use of these common cell lines as experimental models and, as indicated above, expand the utility of other cell line panels for drug development. To enable this important step forward, the complete dataset is readily accessible in 2 forms: the easily searchable CellMiner database and a prefiltered, annotated Ingenuity Systems database. Through these portals, cancer investigators will be able to select precisely the cell line models most genetically suited to their research. The availability of the variant information allows the formulation and testing of hypotheses arising from the entire range of projects using the NCI-60 or its components. In conclusion, our datasets add substantial depth to the already extensive characterization of the NCI-60 tumor cell panel and provide an invaluable resource for ongoing investigations in cancer cell biology and pharmacology.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Conception and design: O.D. Abaan, S. Davis, J.H. Doroshow, Y. Pommier, P.S. Meltzer

Development of methodology: O.D. Abaan, S. Davis, R. Walker, Y. Jiang, R.M. Simon, Y. Pommier, P.S. Meltzer

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): O.D. Abaan, M. Pineda

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): O.D. Abaan, E.C. Polley, S. Davis, Y.J. Zhu, S. Bilke, Y. Gindin, S.L. Holbeck, R.M. Simon, J.H. Doroshow, Y. Pommier, P.S. Meltzer

Writing, review, and/or revision of the manuscript: O.D. Abaan, E.C. Polley, S. Davis, S.L. Holbeck, R.M. Simon, J.H. Doroshow, Y. Pommier, P.S. Meltzer

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): O.D. Abaan, S. Davis, R. Walker, M. Pineda, W.C. Reinhold, J.H. Doroshow, P.S. Meltzer

Study supervision: S. Davis, J.H. Doroshow, P.S. Meltzer

Acknowledgments

The authors thank B. Kopp, NCI-Frederick, for DNA purification and validation. The authors thank the NHLBI GO Exome Sequencing Project and its ongoing studies that produced and provided exome variant calls for comparison: the Lung GO Sequencing Project (HL-102923), the WHI Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926), and the Heart GO Sequencing Project (HL-103010).

Grant Support

This study was supported by the Division of Cancer Treatment and Diagnosis (DCTD), and the Center for Cancer Research (CCR) of the National Cancer Institute, NIH.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

References

Shoemaker. The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer 2006;813–

Weinstein. Drug discovery: cell lines battle cancer. Nature 2012;483:544–

Scherf, Ross, Waltham, Smith, Lee, Tanabe, et al. A gene expression database for the molecular pharmacology of cancer. Nat Genet 2000;236–

Staunton, Slonim, Coller, Tamayo, Angelo, Park, et al. Chemosensitivity prediction by transcriptional profiling. Proc Natl Acad Sci U S A 2001;10787–

Szakacs, Annereau, Lababidi, Shankavaram, Arciello, Bussey, et al. Predicting drug sensitivity and resistance: profiling ABC transporter genes in cancer cells. Cancer Cell 2004;129–

Zoppoli, Solier, Reinhold, Liu, Connelly Jr., Monks, et al. CHEK2 genomic and proteomic analyses reveal genetic inactivation or endogenous activation across the 60 cell lines of the US National Cancer Institute. Oncogene 2012;403–

Liu, D'Andrade, Fulmer-Smentek, Lorenzi, Kohn, Weinstein, et al. mRNA and microRNA expression profiles of the NCI-60 integrated with drug activities. Mol Cancer Ther 2010;1080–

Weinstein, Pommier. Connecting genes, drugs and diseases. Nat Biotechnol 2006;1365–

Barretina, Caponigro, Stransky, Venkatesan, Margolin, Kim, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483:603–

Garnett, Edelman, Heidorn, Greenman, Dastur, Lau, et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 2012;483:570–

Lorenzi, Reinhold, Varma, Hutchinson, Pommier, Chanock, et al. DNA fingerprinting of the NCI-60 cell line panel. Mol Cancer Ther 2009;713–

Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;1754–

DePristo, Banks, Poplin, Garimella, Maguire, Hartl, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011;491–

Handsaker, Wysoker, Fennell, Ruan, Homer, et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009;2078–

Wang, Hakonarson. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;e164

Rubinstein, Shoemaker, Paull, Simon, Tosini, Skehan, et al. Comparison of in vitro anticancer-drug-screening data generated with a tetrazolium assay versus a protein assay against a diverse panel of human tumor cell lines. J Natl Cancer Inst 1990;1113–

Reinhold, Sunshine, Liu, Varma, Kohn, Morris, et al. CellMiner: a web-based suite of genomic and pharmacologic tools to explore transcript and drug patterns in the NCI-60 cell line set. Cancer Res 2012;3499–511

van der Laan, Polley, Hubbard. Super Learner. Stat Appl Genet Mol Biol 2007;Article25

Albertson, Ogawa, Bugni, Hays, Chen, Wang, et al. DNA polymerase epsilon and delta proofreading suppress discrete mutator and cancer phenotypes in mice. Proc Natl Acad Sci U S A 2009;106:17101–

The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012;487:330–

Ikehata, Ono. The mechanisms of UV mutagenesis. J Radiat Res (Tokyo) 2011;115–

DeMarini. Genotoxicity of tobacco smoke and tobacco smoke condensate: a review. Mutat Res 2004;567:447–

Forbes, Clements, Dawson, Bamford, Webb, Dogan, et al. Cosmic 2005. Br J Cancer 2006;318–

Henikoff. Predicting deleterious amino acid substitutions. Genome Res 2001;863–

Adzhubei, Schmidt, Peshkin, Ramensky, Gerasimova, Bork, et al. A method and server for predicting damaging missense mutations. Nat Methods 2010;248–

Abecasis, Altshuler, Auton, Brooks, Durbin, Gibbs, et al. A map of human genome variation from population-scale sequencing. Nature 2010;467:1061–

O'Connor, Jun, Kang, Abecasis, Leal, et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 2013;493:216–

Issaeva, Bozko, Enge, Protopopova, Verhoef, Masucci, et al. Small molecule RITA binds to p53, blocks p53-HDM-2 interaction and activates p53 function in tumors. Nat Med 2004;1321–

Nieves-Neira, Rivera, Kohlhagen, Hursey, Pourquier, Sausville, et al. DNA protein cross-links produced by NSC 652287, a novel thiophene derivative active against human renal cancer cells. Mol Pharmacol 1999;478–

Lim, de Stanchina, Xuan, Liang, et al. A microRNA component of the p53 tumour suppressor network. Nature 2007;447:1130–

Duesbery, Vande Woude. Anthrax lethal factor causes proteolytic inactivation of mitogen-activated protein kinase kinase. J Appl Microbiol 1999;289–

Wheeler, Dunn, Harari. Understanding resistance to EGFR inhibitors-impact on future treatment strategies. Nat Rev Clin Oncol 2010;493–507

Bell, Sikdar, Lee, Price, Chatterjee, Park, et al. Predisposition to cancer caused by genetic and functional defects of mammalian Atad5. PLoS Genet 2011;e1002245

Davidson, Katou, Keszthelyi, Sing, Xia, et al. Endogenous DNA replication stress results in expansion of dNTP pools and a mutator phenotype. EMBO J 2012;895–907

Fox, Lee, Myung. Dynamic regulation of PCNA ubiquitylation/deubiquitylation. FEBS Lett 2011;585:2780–

Mertins, Myers, Holbeck, Medina-Perez, Wang, Kohlhagen, et al. In vitro evaluation of dimethane sulfonate analogues with potential alkylating activity and selective renal cell carcinoma cytotoxicity. Mol Cancer Ther 2004;849–

Dalgliesh, Furge, Greenman, Chen, Bignell, Butler, et al. Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes. Nature 2010;463:360–

Ding, Getz, Wheeler, Mardis, McLellan, Cibulskis, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008;455:1069–

Lee, Jiang, Liu, Haverty, Guan, Stinson, et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 2010;465:473–

Varela, Tarpey, Raine, Huang, Ong, Stephens, et al. Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature 2011;469:539–

Berger, Lawrence, Demichelis, Drier, Cibulskis, Sivachenko, et al. The genomic complexity of primary human prostate cancer. Nature 2011;470:214–

Ley, Ding, Walter, McLellan, Lamprecht, Larson, et al. DNMT3A mutations in acute myeloid leukemia. N Engl J Med 2010;363:2424–

Jones, Wang, Shih Ie, Mao, Nakayama, Roden, et al. Frequent mutations of chromatin remodeling gene ARID1A in ovarian clear cell carcinoma. Science 2010;330:228–

Wei, Walia, Lin, Teer, Prickett, Gartner, et al. Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nat Genet 2011;442–

Solomon, Kim, Diaz-Martinez, Fair, Elkahloun, Harris, et al. Mutational inactivation of STAG2 causes aneuploidy in human cancer. Science 2011;333:1039–

Sjoblom, Jones, Wood, Parsons, Lin, Barber, et al. The consensus coding sequences of human breast and colorectal cancers. Science 2006;314:268–

Vassilev. MDM2 inhibitors for cancer therapy. Trends Mol Med 2007;–

Ikediobi, Davies, Bignell, Edkins, Stevens, O'Meara, et al. Mutation analysis of 24 known cancer genes in the NCI-60 cell line set. Mol Cancer Ther 2006;2606–

Kohn, Aladjem. Circuit diagrams for biological networks. Mol Syst Biol 2006;2006 0002

Barrett, Fry, Maller, Daly. HaploView: analysis and visualization of LD and haplotype maps. Bioinformatics 2005;263–
