Chemotherapeutic drug susceptibility associated SNPs are enriched in expression quantitative trait loci (original) (raw)

Abstract

Pharmacogenomics has employed candidate gene studies and, more recently, genome-wide association studies (GWAS) in efforts to identify loci associated with drug response and/or toxicity. The advantage of GWAS is the simultaneous, unbiased testing of millions of SNPs; the challenge is that functional information is absent for the vast majority of loci that are implicated. In the present study, we systematically evaluated SNPs associated with chemotherapeutic agent–induced cytotoxicity for six different anticancer agents and evaluated whether these SNPs were disproportionately likely to be within a functional class such as coding (consisting of missense, nonsense, or frameshift polymorphisms), noncoding (such as 3′UTRs or splice sites), or expression quantitative trait loci (eQTLs; indicating that a SNP genotype is associated with the transcript abundance level of a gene). We found that the chemotherapeutic drug susceptibility–associated SNPs are more likely to be eQTLs, and, in fact, more likely to be associated with the transcriptional expression level of multiple genes (n ≥ 10) as potential master regulators, than a random set of SNPs in the genome, conditional on minor allele frequency. Furthermore, we observed that this enrichment compared with random expectation is not present for other traditionally important coding and noncoding SNP functional categories. This research therefore has significant implications as a general approach for the identification of genetic predictors of drug response and provides important insights into the likely function of SNPs identified in GWAS analysis of pharmacologic studies.

Keywords: genome-wide association study, pharmacogenomics


Interindividual variation in response to drugs in humans is multifactorial, with genetic factors likely contributing to a considerable degree. Early pharmacogenetic discoveries were based on observations in clinical populations and were limited to phenotypes for which a single candidate gene variant had a large effect on drug activity (1). These types of studies have met with variable success and face inherent limitations because variation in response to most clinically administered drugs is likely to be dependent on the combined contribution of multiple genetic variations with moderate or small independent effects. Relative to disease susceptibility, drug response phenotypes are thought to have greater heritability, partly reflecting the fact that environmental risk factors required for expression of pharmacological phenotypes (e.g., drug exposure) are known, measured, and con-sidered in the analysis (2, 3).

Recent efforts have been aimed at using genome-wide approaches in pharmacogenomic discovery both in patients (46) and in cell-based models (711). The majority of genome-wide significant markers identified in patients through genome-wide association studies (GWASs) were for those rare but serious adverse events, such as the flucloxacillin-induced liver injury and statin-induced myopathy (5). For chemotherapeutic drugs, the challenges of performing GWASs in a clinical setting are staggering in terms of cost and logistics, as it is very rare that a homogeneous patient population receives the same dose of a single chemotherapeutic agent. To get around these issues and avoid confounders such as comorbidities, concomitant medications, and diet, preclinical cell-based models have been developed that provide a useful discovery tool. The International HapMap Project (12) has released genotypic data on millions of SNPs in lymphoblastoid cell lines (LCLs) derived from various world populations, enabling GWASs of quantitative traits that likely represent susceptibility to drug.

Using HapMap cell lines, several groups have identified genetic predictors of susceptibility to drug phenotypes including drug-induced cytotoxicity (1). As has been observed for most GWASs, SNPs within exons of obvious candidate genes, considered “smoking gun mutations,” are not rising to the top of these analyses. This observation is in part because many missense and nonsense mutations are rare and the majority of GWASs to date are not powered to interrogate these rare mutations. However, many intergenic and intronic SNPs have been consistently shown to be significantly associated with a broad spectrum of complex traits (13). To date, a biologically meaningful way to interpret these GWAS findings is lacking. For this reason, we evaluated, in a comprehensive and systematic manner, the SNPs associated with drug response for a broad array of chemotherapeutic drugs belonging to different general classes based on their mechanism of action.

Although polymorphisms in coding regions that affect amino acid composition would seem to have the greatest effect on drug response, the contribution of genetic variation that affects transcript abundance level may also be important. Recent genome-wide studies of gene expression have considered transcript abun-dance as a quantitative trait (1416). These studies have shown that gene expression traits—transcript abundance levels considered as quantitative traits—can be mapped to particular genomic loci by integrating genome-wide genotyping and transcriptome studies of variation in gene expression (14, 17). These genomic loci, known as expression quantitative trait loci (eQTLs), can be identified by using association mapping approaches similar to the ones used in genetic association studies of human disease and quantitative phenotypes. Advances in microarray technology make it possible to assay the transcript abundance of most genes in the genome in a high-throughput fashion to enable the mapping of eQTLs on a genome-wide scale.

We evaluated the genomic regions and functional categories for drug susceptibility–associated SNPs. We determined whether they are overrepresented in any specific class such as missense mutations or eQTLs. We found that a disproportionate number of drug susceptibility–associated SNPs in a broad array of chemotherapeutic agents are eQTLs and, indeed, are associated with the transcriptional expression level of multiple genes as potential master regulators (defined as associated with ≥10 genes).

Results

In a candidate gene study or following a GWAS, SNPs within or near genes have been considered those with the highest likelihood of being functional, causal, and worthy of further study. We therefore, examined whether chemotherapeutic drug susceptibility–associated SNPs identified through GWAS are enriched in these genomic regions. Our laboratory performed GWASs between more than 2 million publicly available HapMap SNPs and drug sensitivity phenotype (i.e., IC50 determined from cell growth inhibition) for carboplatin (18), cisplatin (9), etoposide (8), daunorubicin (19), cytarabine (7), and capecitabine using LCLs derived from samples from a population of white people in Utah who were of Northern and Western European descent (heretofore referred to as the CEU population). We evaluated susceptibility-associated SNPs with two sets of statistical cutoffs: SNP–drug IC50 association P value less than or equal to 10−4, which we termed “strongly suggestive association”; and association P value less than or equal to 10−6, which we termed “significantly associated.” We annotated the resultant HapMap SNPs with the National Center for Biotechnology Information dbSNP (20) functional classes, which include splice junction, 3′UTR, nonsense, missense, and frameshift. The last three constitute nonsynonymous polymorphisms and the 3′UTRs are possible ge-nomic binding sites for microRNAs, an important class of gene regulators (21). Fig. 1 illustrates results of the GWAS with the number of strongly suggestive SNPs for each drug included in our study (table within Fig. 1) and the functional categories that were considered.

Fig. 1.

Fig. 1.

Functional annotation of chemotherapeutic drug susceptibility–associated SNPs. SNPs may fall into more than one functional class. The traditional approach prioritizes coding SNPs (e.g., nonsynonymous polymorphisms) within a gene whereas the eQTL-based approach may yield a set of SNPs throughout the genome that are associated with transcript abundance level for a gene and presumably regulatory, as _trans_- or _cis_-acting SNPs. The embedded table indicates the total number of SNPs associated with the IC50 value of each drug at a P ≤ 10−4 level.

Our findings show that the chemotherapeutic drug susceptibility–associated SNPs are not, in magnitude or in proportion, significantly present in the traditionally important amino acid polymorphisms (Table 1). In fact, there were no observed “frameshift” and “nonsense” polymorphisms among the chemotherapeutic drug susceptibility–associated SNPs for each of the drugs at the P values studied. Very few of the chemotherapeutic drug susceptibility–associated SNPs are in the exon regions (Table 1 and Table S1), with a high proportion of them located either in intergenic or intronic regions of the human genome. An example of this functional class distribution for platinum agents (carboplatin and cisplatin) is shown in Table S1, demonstrating the functional characterizations of the two closely related drugs, cisplatin and carboplatin, are similar to one another as expected. These findings—the relative paucity of coding SNPs and the high proportion of intergenic or intronic SNPs among the chemotherapeutic drug susceptibility–associated SNPs—suggest the possibility of noncoding genetic effects contributing toward drug sensitivity.

Table 1.

Observed count and percent of total missense and 3′-UTR SNPs in the chemotherapeutic drug susceptibility–associated SNPs (P value threshold of 10−4) for each drug

Drug No. of missense (% of total) No. of 3′UTR (% of total)
Carboplatin 3 (1%) 1 (0.3%)
Cisplatin 1 (0.3%) 9 (2.3%)
Etoposide 0 (0%) 4 (0.6%)
Daunorubicin 12 (1.1%) 4 (0.1%)
Cytarabine 11 (2.3%) 2 (0.4%)
Capecitabine 3 (0.6%) 2 (0.4%)

To quantify the level of functional enrichment among these chemotherapeutic drug susceptibility–associated SNPs, we applied a functional standard z score method. This method evaluates the SNPs of interest for their associations with certain functional classes when compared with randomly selected minor allele frequency (MAF) matched SNPs for their association with the same function (for details, see Materials and Methods). As expected, among the chemotherapeutic drug susceptibility–associated SNPs, the observed number of missense polymorphisms does not show significant enrichment relative to the distribution of missense polymorphisms in randomly chosen SNPs (Fig. 2) conditional on MAF; for results from each drug evaluated, see Table 2 (z scores range from −2.13 to approximately 2.50; P > 0.05).

Fig. 2.

Fig. 2.

Distribution of the count of missense polymorphisms in 1,000 simulations, each matching the MAF distribution of the chemotherapeutic drug susceptibility–associated SNPs (chemotherapeutic susceptibility P value threshold of 10−4). The black dot is the observed missense polymorphism count in the chemotherapeutic drug susceptibility–associated SNPs.

Table 2.

Functional z-score of coding, noncoding, and eQTL SNPs

Drug Coding Noncoding eQTL
Missense Nonsense Frameshift 5′UTR 3′UTR Splice site
Carboplatin 1.138 −0.127 −0.071 −0.525 −0.837 −0.063 6.571
Cisplatin −0.318 −0.139 −0.063 −0.582 1.502 −0.044 3.953
Etoposide −2.13 −0.16 −0.105 1.378 −0.634 −0.071 14.706
Daunorubicin 1.773 −0.236 −0.118 −1.069 −1.836 −0.084 10.866
Cytarabine 2.495 −0.185 −0.063 −0. 652 −0.829 −0.032 5.829
Capecitabine 0.474 −0.163 −0.063 −0.689 −1.12 −0.071 7.833

Genome-wide studies of gene expression have demonstrated that gene expression levels are highly heritable (22) and that variation in gene expression can be mapped to the genome as eQTLs (14, 17). We therefore evaluated our broad set of cancer chemotherapeutic drug susceptibility–associated SNPs (P < 0.0001) for enrichment in eQTLs. Fig. 3 illustrates a significant enrichment in the eQTL functional class with 192, 191, 537, 608, 257, and 312 eQTLs for carboplatin, cisplatin, etoposide, daunorubicin, cytarabine, and capecitabine, respectively, all of which are significantly higher than those from randomly selected MAF-matched SNPs. The corresponding z scores for each drug can be found in Table 2. Further evaluation indicated that this enrichment in eQTL function remains significant regardless of the statistical cutoff used to generate the SNPs of interest. We examined results from P value thresholds at 10−4 and 10−6, as well as false discovery rate (FDR)–corrected q value thresholds to determine the robustness of our enrichment findings to the definition of chemotherapeutic drug susceptibility–associated SNPs. The eQTL function enrichment within the chemotherapeutic drug susceptibility–associated SNPs was also evident at a range of P value thresholds, 10−6 to 10−4, in defining eQTLs, showing the robustness of our findings to the eQTL inclusion threshold used to evaluate the association between an eQTL and transcript abundance level.

Fig. 3.

Fig. 3.

Distribution of eQTL (expression P value threshold of 10−4) count in 1,000 simulations, each matching the MAF distribution of the chemotherapeutic drug susceptibility–associated SNPs (chemotherapeutic susceptibility P value threshold of 10−4). The black dot is the observed eQTL count in the chemotherapeutic drug susceptibility–associated SNPs.

Interestingly, when examining potential master regulators, defined as SNPs that are associated with the transcriptional expression level of at least 10 genes, the chemotherapeutic drug susceptibility–associated SNPs (P < 0.0001) are more highly enriched in these specific SNPs, with z scores of 7.3, 4.6, 53.8, 13.4, 2.3, and 3.4 for carboplatin, cisplatin, etoposide, daunorubicin, cytarabine, and capecitabine, respectively (Fig. 4). The enrichment in master regulators is, according to our findings, not driven by linkage disequilibrium (LD), given that the observed enrichment persists for chemotherapeutic drug susceptibility–associated SNPs that are not in LD (r 2 < 0.3) compared with the randomly selected SNPs with matching MAF. Furthermore, the enrichment in master regulators among the chemotherapeutic drug susceptibility–associated SNPs continues to hold robustly regardless of the number of associated genes (e.g., when the SNPs are associated with more than 20 or 50 gene expression traits). As described earlier, these master regulator enrichment findings are also independent of the thresholds used in defining eQTLs. Fig. 5 illustrates a master regulator example. SNP rs1649942, located on chromosome 10 within an intron of the NRG3 gene, is associated with cellular sensitivity to cisplatin and carboplatin. Notably, the SNP is also associated with the expression level of 39 genes (P ≤ 0.0001). The identification of these multiple genes suggests the potential functional pathway that this SNP acts upon to affect cellular susceptibility to cisplatin and carboplatin.

Fig. 4.

Fig. 4.

Distribution of master regulator count in 1,000 simulations. A master regulator is defined to be a SNP that predicts the expression of at least 10 genes (eQTL P value threshold of 10−4). The black dot is the observed master regulator count in the chemotherapeutic drug susceptibility–associated SNPs.

Fig. 5.

Fig. 5.

An example of a master regulator. The genotype of SNP rs1649942, which is associated with sensitivity (cytotoxicity P value threshold of 10−4) for both carboplatin and cisplatin, is associated with the transcriptional expression level of 39 genes throughout the genome (eQTL P value threshold of 10−4). The distance between nodes reflects the strength of the association between the SNP genotype and gene expression levels: the closer the gene is to the SNP, the higher the correlation between the SNP genotype and gene expression.

Discussion

In this study, we systematically evaluated the potential function of GWAS findings that related to pharmacologic phenotypes and evaluated the physical location of SNPs associated with these phenotypes. In a broad array of anticancer drugs, the SNPs associated with pharmacologic phenotypes were not more likely to be nonsense, missense, or frameshift SNPs, nor were they within splice junctions or in 3′UTRs. The SNPs are significantly more likely to be associated with the level of gene expression (as eQTLs) and the enrichment is even more pronounced for SNPs associated with 10 or more gene expression traits (termed master regulators). The implications of these findings are that the “function” of SNPs associated with pharmacologic phenotypes of anticancer drugs may be a result of their role in the regulation of gene expression. Furthermore, the prominence of intronic or intergenic SNPs implicated by other GWASs of drug response (1, 23) raises the possibility that this observation may apply more generally. In other words, an individual's susceptibility to the pharmacologic effects of a drug may depend on subtle differences in the level of expression of many genes. Although the present study restricts itself to the specific case of chemotherapeutic agents, the approach elucidated here to investigate functional enrichment may be applied more generally in other pharmacogenomic contexts or other complex traits. For example, we have observed eQTL enrichment in disease susceptibility–associated SNPs from various GWASs (24). Therefore, use of expression information should provide a means to identify and characterize a larger number of contributing loci, and determine the key pharmacological functions influenced by genetic risk factors.

GWASs have rapidly become a standard method for discovery of genetic variants associated with disease, drug response, or other phenotypes. Although a GWAS offers the advantage of comprehensively evaluating the genome without bias, the greatest challenge with a GWAS approach is finding the true positive signals amid the noise. Oftentimes a bias comes into the analysis that may have arisen from candidate gene studies, that the most likely functional or causal SNPs are those located in or near a gene. In fact, there are a number of GWASs that begin with this assumption by limiting the evaluation to only SNPs located in these regions (9, 17), thereby ignoring the vast majority of SNPs; only 2% of the human genome consists of genes (25). Our results would certainly argue against this bias and consider SNPs throughout the genome, particularly those associated with expression as potentially important. Furthermore, the incorporation of eQTL information to characterize loci associated with pharmacologic phenotypes facilitates a genome-wide view of how a SNP may act to affect drug susceptibility. (For example, see Fig. 5 and Fig. S1 for an illustration of how the cisplatin- and carboplatin-associated SNP rs1649942 may affect cellular sensitivity to these platinating drugs through its effect, as a _trans_-acting eQTL, on the expression of various genes located throughout the genome.)

In an effort to account for a substantial proportion of the missing heritability in complex traits, researchers have directed efforts toward identifying rare variants. Most recently, Goldstein and colleagues (26) suggested that rare, yet undetected variants underlie associations with common variants picked up in GWASs. These so-called rare variants are in “synthetic association” so that the common variants are actually pointing to rare variants with higher penetrance that influence disease susceptibility and other phenotypes. The existence and importance of these synthetic associations in complex trait genetics remain to be thoroughly examined. Efforts such as the 1000 Genomes Project (http://www.1000genomes.org) will likely produce valuable information on these rare variants that will allow the analysis of their contribution to the phenotypic trait. The present study, however, provides a functional framework for investigating the biology underlying drug response variation and the potential mechanism(s) between genotype and phenotype associations. It is possible that some of our chemotherapeutic drug susceptibility–associated SNPs may be pointing toward rare “causal” variants; however, this possibility does not affect our primary observation that chemotherapeutic drug susceptibility–associated SNPs are enriched in eQTLs. Furthermore, in contrast to our approach, the notion of synthetic association does not provide a biological explanation for the observed associations. Last, with an increase in the number of potential variants, the impetus to consider eQTLs for the identification and prioritization of associated SNPs becomes even greater.

Current GWASs have relatively little power to detect associations at low allele frequencies. Given our sample size, our study is not powered to evaluate SNPs with less than 5% MAF for their role as eQTLs or for their relationship with chemotherapeutic susceptibility. There are 398,086 rare SNPs (MAF <5%) in the HapMap CEU samples, which were not included in our current comprehensive analysis. Of these rare SNPs, 3,182 are missense polymorphisms, accounting for 0.8% of SNPs in this MAF bin. (A total of 2.3 million SNPs in the HapMap CEU samples were included in our comprehensive analysis, with more than 11,000 missense polymorphisms.) As expected, the proportion of coding SNPs in the rare MAF bin is approximately two times higher than that in the common MAF bins (MAF >5%). It is likely that the contribution of coding SNPs (along with that of eQTLs) to the phenotypic variation would increase if the rare SNPs were to be included. This, however, remains to be fully evaluated. Nevertheless, our study strongly demonstrates that chemotherapeutic drug susceptibility-associated SNPs are not enriched in common coding variants.

Another common method to prioritize genes and determine the most likely “true signals” coming out of a GWAS is by using statistical methods. Metaanalysis (combined associations across GWASs) can be used to pool information from multiple GWASs to increase the chances of finding true positives (27). Conversely, our data indicate that annotating SNPs with information on expression before GWAS may be a means to reduce the multiple tests and distinguish those associations likely to have biological relevance. One might consider including SNPs within or close to genes along with the eQTLs for the analysis so as not to exclude SNPs affecting protein activity or structure.

Interestingly, the degree of eQTL enrichment (from approximately four- to 15-fold enrichment; Table 2) seems to be drug class–specific. In contrast, the number of SNPs in each functional class coming out of a location-directed method is similar for all drugs (e.g., Table S1 shows a comparison of the platinating agents). The six drugs evaluated in the present study can be classified into three drug classes: platinating agents (cisplatin, carboplatin), topoisomerase inhibitors (daunorubicin and etoposide), and antimetabolites (cytarabine, capecitabine). Our enrichment analysis indicates that the topoisomerase inhibitors show the greatest enrichment (13- to 54-fold) in master regulators. The anti-metabolites show the lowest master regulator enrichment (two- to 3.5-fold) and the platinating agents show midlevel enrichment (4.5- to 7.5-fold). The drug classes, defined in terms of molecular therapeutic action, segregate into enrichment levels according to the functional enrichment score, suggesting the value of master regulator enrichment profile to characterize drug classes. We are currently investigating the role of these potential master regulators in defining drug class action.

Finally, the observed enrichment in eQTLs and master regulators among the chemotherapeutic drug susceptibility–associated SNPs holds robustly, that is, independently of (i) the significance level of the genotype–phenotype associations used to define the chemotherapeutic susceptibility–associated SNPs; (ii) the expression P value threshold used to define eQTLs; and (iii) the number of genes regulated by a SNP. Furthermore, the enrichment analysis that we conducted was conditional on MAF so that the sampling of the SNPs used in the simulations would not draw disproportionately from any single bin in the MAF of the drug-associated SNPs, including the low end in which a GWAS has little power to detect associations.

In summary, although polymorphisms that modify the amino acid composition of proteins are natural candidates for explaining variation in drug response phenotype, our findings highlight the relative importance of genetic variations that affect transcript abundance in elucidating the genetic determinants of cellular susceptibility to chemotherapeutic drugs. Genetic variants significantly associated with drug susceptibility for a range of chemotherapeutic agents are more likely to be eQTLs compared with random expectation and are more likely to be eQTLs associated with multiple gene expression phenotypes. This has important implications for further understanding how genetic variation contributes to differences in response to chemotherapeutic agents and may apply to other drug classes.

Materials and Methods

Drugs and Cytotoxicity.

5′-deoxyfluoruridine (5-DFUR; an active form of capecitabine) was obtained from LKT Laboratories and prepared in equal amounts of PBS solution (Invitrogen) and DMSO (Sigma) as a stock solution at a concentration of 32 mM. Carboplatin (18), cisplatin (9), etoposide (8), daunorubicin (19), and cytarabine (7) were prepared and cells were treated as previously described. Briefly, the cytotoxic effect of carboplatin, cisplatin, etoposide, daunorubicin, cytarabine, and 5-DFUR was determined using a short-term AlamarBlue colorimetric assay. LCLs (>85% viability) were plated in triplicate at a density of 1 × 105 cells/mL and drug added 24 h after plating. Cells were exposed to drug for 72 h (except cisplatin was a 48-h exposure) at the following concentrations: 5, 10, 20, 40, and 80 μM carboplatin; 1, 2.5, 5, 10, and 20 μM cisplatin; 0.02, 0.1, 0.5, and 2.5 μM etoposide; 0.0125, 0.025, 0.05, 0.1, 0.2, and 1 μM daunorubicin; 1, 5, 10, 40, and 80 μM cytarabine; and 1, 2.5, 10, 20, 40, 80, and 160 μM 5-DFUR. Final percent survival was ascertained by averaging at least six replicates from two independent experiments. We then calculated IC50s—the concentration of drug at which 50% cellular growth inhibition occurred—as a representation of cellular sensitivity to each drug. All IC50 values were log2 transformed before statistical modeling, creating a dependent variable from an approximately normal distribution (28).

GWAS Studies.

We retrieved SNP data from phase II of the International HapMap consortium (release 23a; nonredundant and rs_strand version). For the present study, we used the set of HapMap parent-offspring trios of CEU population. We restricted ourselves to polymorphisms from the 22 autosomal chromosomes only. SNPs included in the analysis have a MAF of at least 5% and are without Mendelian errors. We performed a GWAS by using the quantitative trait disequilibrium test (29) method between log2-transformed IC50 and greater than 2 million SNPs in the CEU population. For each anticancer agent included in this study, we categorized the GWAS hits, our so-called chemotherapeutic drug susceptibility–associated SNPs, including those that are highly significant (P ≤ 10−6) and those that are strongly suggestive (P ≤ 10−4). We have constructed a database, PACdb (28), which holds these GWAS results (http://www.pacdb.org). In addition to determining the most significant SNPs, we calculated FDR q values (30) for the SNP associations with chemotherapeutic susceptibility. The use of q values enables the correction of P values for multiple comparisons.

eQTL Mapping.

Genome-wide gene expression data were generated in our lab with Affymetrix GeneChip Human Exon Array 1.0 ST Array (exon array) and all raw exon array data have been deposited into Gene Expression Omnibus (accession no. GSE7761). Using the quantitative trait disequilibrium test, we conducted GWAS of more than 13,000 transcript clusters (gene level) with reliable expression (i.e., the average log2-transformed expression values are >5.34) and more than 2 million SNPs in the CEU population (14).

It has been shown that SNPs in probes can produce false eQTL signals by affecting hybridization efficiency (31). We therefore downloaded SNP data from dbSNP (build 129, human genome assembly build 36) to identify probes in probe sets that hybridize to SNP-containing regions. The genomic coordinates of the probes were downloaded from the Affymetrix website (http://www.affymetrix.com). We mapped SNPs to probes using genomic coordinates and filtered SNP-containing probes in subsequent expression analyses.

We have created an online database, SCAN (http://www.scandb.org) (32), that facilitates queries of the dataset for the results of our eQTL studies in LCLs. SCAN can be used using queries of the SNP and gene datasets to retrieve information on the relationship between SNPs and gene expression levels at user-specified P value thresholds.

Stratification by Minor Allele Frequency.

Allele frequencies on the CEU samples were calculated from HapMap genotype data. Only the parents’ genotypes were included in the frequency calculations as the children's alleles are not independent. We used the widely available toolkit PLINK (33) to calculate frequencies. The MAF distribution of the chemotherapeutic drug susceptibility–associated SNPs is shown in Fig. S2.

To investigate the functional enrichment of chemotherapeutic drug susceptibility–associated SNPs identified in our laboratory, we first selected sets of SNPs randomly chosen with matching MAF to the SNP set of interest as background. Using two genotyping platforms frequently used in GWASs, the Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix 6.0) and the High Density Human 1M-Duo (Illumina), we categorized SNPs probed on these commercial products into nonoverlapping MAF bins, each of a width of 5%. Fig. S3 shows the MAF distribution of SNPs on these arrays. The annotations of SNPs on these genotyping platforms were based on build 36 of the human genome reference assembly. In generating the MAF bins, we used the MAFs of the SNPs in the CEU samples.

Location-Directed Functional Enrichment Analysis.

Our assignment of a location-directed functional class to a SNP follows the dbSNP classification scheme (build 129) (20). If a SNP is located in a gene coding region, its functional class is determined by how each allele alters the translated amino acid sequence. If either allele is nonsynonymous, the SNP is assigned a “missense,” “nonsense,” or “frameshift” annotation. If a SNP is proximal to a gene and not in a coding region (e.g., outside a gene, near a gene or in the intron of a gene), it is assigned a functional class that depends on its location relative to the transcript, such as 5′UTR, 3′UTR, splice site, or intron. If a SNP does not map within a gene but is within 2 kb upstream (5′ end) or 500 bp downstream (3′ end) of a gene, it is assigned a “near-gene” functional class. In particular, a SNP's functional class may not be unique, as in the case of a gene with multiple isoform variations or as in the case of a SNP with different functional relationships to a set of genes. See Fig. 1 for a schematic diagram of the functional annotation approach. Each functional class assigned to a SNP is counted once in our functional enrichment analysis, which thus gives equal weight to a SNP's possibly multiple functions. We conducted simulations (N = 1,000), conditional on MAF, to test for functional enrichment among the chemotherapeutic drug susceptibility–associated SNPs (see next section for a detailed description of the simulation approach as applied to eQTL enrichment; the test for enrichment in each of the functional classes uses the same simulation procedure). The simulations were conducted for each drug separately. Moreover, we quantify the level of functional enrichment, for each functional class, through the use of a z score from the distribution obtained from the permutations. Let X be the log2-tranformed value of n + 1, where n is the number of SNPs in a given functional class, considered as a random variable (the “+ 1” ensures that we can always take the log2.). We define a functional enrichment measure as follows:

graphic file with name pnas.1001827107eq1.jpg

where c is the observed count, μ_X_ is the mean of the distribution, and σ_X_ is the SD of the distribution. Particularly, the use of such a “normalized” score enables comparison of enrichment between functional classes as well as between a functional class and eQTLs. Such a score can also be applied to the enrichment of master regulators to enable comparison between drug classes or between individual drugs.

Expression Quantitative Trait Locus Enrichment Analysis.

We conducted simulations to test for an enrichment of eQTLs among the chemotherapeutic drug susceptibility–associated SNPs for each drug. We generated 1,000 randomized SNP sets Si each of the same size as the original list of chemotherapeutic drug susceptibility–associated SNPs and each containing variants, which are matched on MAF distribution of the original list and sampled (without replacement) from the set of typed SNPs on the combined list of SNPs from the high-throughput platforms Affymetrix 6.0 and Illumina 1M. We had grouped the platform SNPs into discrete MAF bins of a width of 5%, from which were selected the SNPs used in the simulations. Given the MAF distribution of the platform SNPs (which shows a steep curve at the lowest end of the frequency spectrum; Fig. S3), we wanted to avoid sampling disproportionately from the lowest MAF bin, in which GWASs, in general, have relatively little power. For each set, we determined the number of eQTLs, Qi, at a given P value threshold. We tested the robustness of the eQTL enrichment across a range of eQTL P value thresholds (10−6 to 10−4). These simulations (N = 1,000) yield an empirical distribution and an empirical P value, calculated as the proportion of simulations in which the number of eQTLs exceeds the observed number, Q, in the list of chemotherapeutic drug susceptibility–associated SNPs:

graphic file with name pnas.1001827107eq2.jpg

where ind is an indicator function and the summation is taken over all simulations. The simulations were conducted for each drug separately.

The combined list of SNPs from the high-throughput platforms constituted our “null set” from which our random SNP sets were drawn to test enrichment. The use of the platform SNPs for this purpose is reasonable given the extensive use of these platforms to conduct GWAS. As GWASs often report imputed SNPs, we also conducted simulations drawing from the entire set of HapMap CEU SNPs and, as before, matching the MAF distribution of the original list of chemotherapeutic drug susceptibility–associated SNPs. In this case, the enrichment we report in eQTLs and master regulators is more pronounced so that the use of platform SNPs as null set is the more conservative approach.

Master Regulator Enrichment Analysis.

An analogous enrichment analysis (N = 1,000) was conducted on the number of master regulators, defined as eQTLs that control the expression of at least 10 genes at a given eQTL P value threshold. As before, this analysis was done across a range of P value thresholds (10−6 to 10−4) to establish robustness of our results. We also investigated whether the enrichment of master regulators in the chemotherapeutic drug susceptibility–associated SNPs depends on the definition of master regulators, specifically the number of genes whose transcriptional expression levels are associated with the eQTL (tested additionally at 20 and 50 genes). In short, we tested the dependence of our findings on both input parameters: the threshold used in assessing the evidence that a SNP predicts transcript level abundance and the number of genes whose expression is associated with the SNP.

Linkage Disequilibrium Analysis.

To investigate whether the observed master regulator enrichment in the chemotherapeutic drug susceptibility–associated SNPs (corrected for multiple comparisons, with q value <0.25) is driven by LD, an LD-pruned set was generated from the chemotherapeutic drug susceptibility–associated SNPs by using a stringent r 2 value of less than 0.3. We then conducted a master regulator enrichment analysis on this set by using a permutation procedure on 100 randomized sets of SNPs that match the LD-pruned set's MAF distribution. LD measures between SNPs in HapMap CEU were used. To demonstrate that the enrichment is not dependent on this initial LD-pruned set chosen, we generated 100 such LD-pruned sets on each of which 100 MAF-matching simulations were performed, as just described. The number of simulations in which the number of master regulators exceeds the observed number among the chemotherapeutic drug susceptibility–associated SNPs yields a P value for the observed enrichment.

Supplementary Material

Supporting Information

Acknowledgments

This work was supported by National Institutes of Health (NIH)/National Institute of General Medical Sciences Grant U01 GM61393 and by NIH/National Cancer Institute Breast Specialized Program Of Research Excellence Grants P50 CA125183, Leukemia and Lymphoma Society Grant 7015, and R01 CA136765 (to M.E.D.).

Footnotes

*This Direct Submission article had a prearranged editor.

The authors declare no conflict of interest.

Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE7761).

References

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supporting Information