HLA-binding properties of tumor neoepitopes in humans

Author manuscript; available in PMC: 2015 Jun 1.

Abstract

Cancer genome sequencing has enabled the rapid identification of the complete repertoire of coding sequence mutations within a patient’s tumor and facilitated their use as personalized immunogens. While a variety of techniques are available to assist in the selection of mutation-defined epitopes to be included within the tumor vaccine, the ability of the peptide to bind patient MHC is a key gateway to peptide presentation. With advances in the accuracy of predictive algorithms for MHC class I binding, choosing epitopes on the basis of predicted affinity provides a rapid and unbiased approach to epitope prioritization. We show herein the retrospective application of a prediction algorithm to a large set of bona fide T-cell defined mutated human tumor antigens that induced immune responses most of which were associated with tumor regression or long-term disease stability. The results support the application of this approach for epitope selection and reveal informative features of these naturally occurring epitopes to aid in epitope prioritization for use in tumor vaccines.

Keywords: somatic mutations, neoantigens, whole-genome sequencing, whole-exome sequencing, T cell, immunotherapy, vaccine, next-generation sequencing, T cell receptor


We and others (1–3) have suggested that the vast number of personal, tumor-specific mutations found in the genome of cancer patients provides a rich source of unique immunogens (“neoantigens”) for use in tumor vaccination strategies. These tumor neoantigens are attractive as vaccine targets since they are expected to bypass the immune dampening effects of central tolerance and because their expression is exquisitely tumor-specific. In order to bring this highly personalized treatment approach to cancer patients, one crucial challenge is the choice of which of the many possible personal mutated epitopes to incorporate in the vaccine.

Cancer genomes vary widely in the number of total and coding sequence mutations depending on the tumor type (4). The five most common tumor types in the United States (prostate, breast, lung, colon and melanoma) harbor an average of 25 to 500 non-synonymous coding sequence mutations. Vaccination approaches that utilize irradiated whole tumor cells or various forms of cell lysates (5–11) have attempted to capture all such neoantigens (in addition to native tumor-associated antigens). Although these strategies appear comprehensive and have resulted in clinical benefit in some cases, they do not favor any particular T cell immunogen. Hence, potentially highly effective immunogens may be drowned out within the vast sea of immunologically irrelevant antigens. Such complete antigen preparations are similar to the endogenous presentation of the tumor cell to the immune system and lack the “pharmacologic specificity” of a rationally designed vaccine.

A more selective but still comprehensive approach for neoantigens could be envisioned by utilizing every identified coding mutation as a separate immunogen. Although possibly feasible from the technical standpoint -- especially for tumors with a low mutation load -- the dilution of the potent immunogens is likely to reduce the effectiveness of such a vaccine, and thus, a more discriminating approach to identify the most effective subset seems advisable.

Potential strategies to identify and prioritize mutated antigens

Multiple biochemical and biologic techniques are available that can help prioritize candidate mutated antigens for inclusion in tumor vaccines.

Mass Spectrometry

Great strides in the fields of mass spectrometry (MS) and associated computational algorithms have enabled the characterization of the MHC-displayed “ligandome” (12,13). This approach can be used to test if a mutated peptide (or a native tumor-associated peptide) is displayed by tumor cells. This information is important since the peptide-MHC complexes are the substrates recognized by T cell receptors (TCR). However, the approach is limited technically by insufficient amounts of tumor tissue and conceptually by the observation that few peptide-bound MHC targets are needed for an effective T cell response (14,15). As a result, many useful but less abundant targets on the cell may be bypassed in favor of those that are less potent but more highly represented.

Ex vivo T cell assays

Peripheral blood mononuclear cells (PBMC) or tumor-infiltrating lymphocytes (TIL) can be tested in antigen-specific ex vivo assays to identify neoantigens that stimulate existing T-cell populations. This strategy would be expected to reveal the patient’s natural response to neoantigens (2,16). However, routine clinical application of ex vivo assays is costly and technically challenging given the number of neoantigen mutations (requiring rapid preparation of many stimulatory immunogens), the requirement of MHC-matched antigen presenting cells (APC) for some of these assays and the relative insensitivity of these techniques. Most importantly, using ex vivo assays as a filter for neoantigen selection limits the spectrum of T cell reactivity to existing T cell responses. In patients with clinically evident tumors, this would restrict the selected neoantigen repertoire to the existing and possibly ineffective T cell responses. It is currently unknown whether enhancement of an ongoing T cell response or generation of de novo responses is clinically relevant for an effective tumor vaccine. Other biological assays such as in vitro or in vivo immunization of a humanized mouse could also be considered but such assays are likewise technically challenging, costly and conceptually limited.

In silico prediction of peptide-MHC binding

Generation of an immune response to any mutated peptide sequence and recognition of tumor cells containing that peptide depend critically on the ability of the patient’s MHC molecules to effectively bind the mutated peptide and present it to a T cell. Advanced algorithms utilizing neural network-based learning approaches have been developed to capitalize on large amounts of data describing peptides that bind with different strengths to a wide variety of class I MHC molecules (17). These algorithms allow rapid in silico prediction of peptide-binding strength to patient-specific MHC alleles, and potentially enable a more rapid and less restrictive approach to filter the list of candidate neoantigens from sequencing data. Using results from the next-generation DNA sequencing, we have evaluated the binding for more than 100 different predicted peptides to understand the boundaries of the accuracy of prediction by these algorithms (18). In order to link this in silico analysis to potentially clinically and biologically relevant observations, we present here an analysis of 40 neoantigens previously identified as CD8+ T-cell targets in the literature.

The predicted binding characteristics of tumor neoepitopes recognized by T cells in patients with antitumor immunity

We have conducted an extensive search of the literature including recent reviews on neoantigens (1,2) from PUBMED, and the most comprehensive list of cancer vaccine antigens compiled by the Cancer Research Institute (19), identifying reports of spontaneous CD8+ T-cell responses in cancer patients in whom the target epitopes were discovered subsequently. To avoid bias of the results, reports of vaccinations with known epitopes or of selected searches for single T-cell epitopes (such as for an immune response to a known mutated oncogene) were not included. Multiple reports of spontaneous CD8+ T-cell epitopes were identified, and remarkably in each case following an unbiased search for the dominant T-cell epitope, the target epitope was a neoantigen. Two-thirds of the patients in these reports experienced significant partial or complete tumor regression or long-term stable disease, either spontaneously or following therapy.

As shown in Table 1, 31 of these 40 neoepitopes were identified in an unbiased manner based on cDNA expression cloning or MHC/peptide elution, while the remaining 9 were found based on genomic mutation and epitope-binding predictions. These neoantigens resulted from 35 missense mutations and 5 frame-shift mutations (that led to novel open reading frames, neoORFs) and are restricted by 11 different HLA alleles, representing both common and less common alleles as expected from sampling of the population at large. Approximately 80% of these are somatic mutations found exclusively in the tumors of individual patients. The remaining alterations are polymorphic loci within hematopoietically-restricted minor histocompatibility antigens (miHAgs) identified following hematopoietic stem cell transplantation for blood malignancies. In almost every case, the mutated peptide was significantly (>100X) more potent than the cognate native peptide in the induction of T cell IFNγ production or cytotoxicity. These examples represent seven different cancer types (non-small cell lung cancer, melanoma, renal cell carcinoma, bladder cancer, B-cell acute lymphoblastic leukemia, multiple myeloma, chronic lymphocytic leukemia).

Table 1.

Biological features and predicted binding affinities of neoantigen-directed T-cell responses in humans

| Group | Gene | Ref | Identification approach* | Favorable clinical response | MUT >>> NAT T cell response detected (approach)§ | HLA allele | Mutated epitope (native allele) | Predicted binding affinity, MUT (IC50 nM) | Predicted binding affinity, NAT (IC50 nM) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ECGF-1 | 32 (miHAg) | cDNA | Yes | Yes (γ) | B*07:02 | RPHAIRRPLAL(R) | 3 | 2 |
| | ME-1 | 31 | cDNA | Yes | Yes (C) | A*02:01 | FLDEFMEGV(A) | 3 | 2 |
| | PLEKHM2 | 16 | WES | Yes | Yes (γ) | A*01:01 | LTDDRLFTCY(H) | 3 | 97 |
| | FNDC3B | 18 | WES | Yes | (T) (≥10X) | A*02:01 | VVMSWAPPV(L) | 4 | 7 |
| | PRDX5 | 27 | cDNA | NR | Yes (C) | A*02:01 | LLLDDLLVSI(S) | 5 | 7 |
| | MATN2 | 16 | WES | Yes | Yes (γ) | A*11:01 | KTLTSVFQK(E) | 5 | 20 |
| | DDX21 | 33 | cDNA | Yes | Yes (C) | A*68:01 | EAFIQPITR(S) | 10 | 29 |
| | RBAF | 29 | cDNA | Yes | Yes (C) | B*07:02 | RPHVPESAF(G) | 10 | 68 |
| | GAS7 | 16,34 | cDNA (and later WES) | Yes | Yes (C) | A*02:01 | SLADEAEVYL(H) | 12 | 39 |
| | ATR | 35 | WES | Yes | (T) (≥10X) | A*03:01 | KLYEEPLLK(S) | 13 | 13 |
| | SIRT2 | 29 | cDNA | Yes | C (10X) | A*03:01 | KIFSEVTLK(P) | 14 | 16 |
| | EF2 | 36 | MS | NR | Yes (C) | A*68:02 | ETVSEQSNV(E) | 16 | 27 |
| | KIAA0223 (HA-1) | 37 (miHAg) | MS | Yes (Severe GvHD) | NR | A*02:01 | VLHDDLLEA(R) | 17 | 140 |
| | GAPDH | 34 | cDNA | Yes | Yes (C) | A*02:01 | GIVEGLITTV(M) | 21 | 27 |
| | BCL2A1 | 38 (miHAg) | Linkage | NR | Yes (C) | A*24:02 | DYLQYVLQI(C) | 22 | 34 |
| | HSP 70 | 39 | cDNA | NR | Yes (C) | A*02:01 | SLFEGIDIYT(F) | 23 | 7 |
| | ACTININ | 30 | cDNA | Yes | Yes (C) | A*02:01 | FIASNGVKLV(K) | 29 | 44 |
| | CDK12 | 16 | WES | Yes | Yes (γ) | A*11:01 | CILGKLFTK(E) | 33 | 42 |
| | KIAA1440 | 40 | cDNA | Yes | Yes (C) | A*01:01 | QTACEVLDY(T) | 33 | 78 |
| | HAUS3 | 16 | WES | Yes | Yes (γ) | A*02:01 | ILNAMIAKI(T) | 34 | 36 |
| | BCL2A1 | 38 (miHAg)* | Linkage | NR | Yes (C) | B*44:03 | KEFEDDIINW(G) | 36 | 27 |
| | PPP1R3B | 16,28 | cDNA (and later WES) | Yes | Yes (γ) | A*01:01 | YTDFHCQYV(P) | 49 | 72 |
| | HB-1 | 41 (miHAg) | cDNA | NR | Yes (C) | B*44:03 | EEKRGSLHVW(Y) | 81 | 67 |
| | MUM-2 | 42 | cDNA | Yes | Yes (C) | B*44:02 | SELFRSGLDSY(R) | 184 | 182 |
| | KIAA0205 | 43 | cDNA | NR | Yes (C) | B*44:03 | AEPIDIQTW(N) | 258 | 288 |
| | GPNMB | 29 | cDNA | Yes | Yes (C) | A*03:01 | TLDWLLQTPK(G) | 282 | 179 |
| 2 | CSNK1A1 | 16 | WES | Yes | Yes (γ) | A*02:01 | GLFGDIYLAI(S) | 6 | 1312 |
| | CLPP | 44 | cDNA | Yes | Yes (C) | A*02:01 | ILDKVLVHL(P) | 32 | 7566 |
| | CTNNB1 | 45 | cDNA | Yes (?) | Yes (C) | A*24:02 | SYLDSGIHF(S) | 41 | 18746 |
| | SNRP116 | 29 | cDNA | Yes | Yes (C) | A*03:01 | KILDAVVAQK(E) | 48 | 14976 |
| | OS9 | 46 | cDNA | NR | Yes (C) | B*44:03 | KELEGILLL(P) | 60 | 1161 |
| | MYH2 | 47 | cDNA | Yes | Yes (C) | A*03:01 | KINKNPKYK(E) | 141 | 4960 |
| 3 | MART-2 | 48 | cDNA | Yes (weak) | Yes (C) | A*01:01 | FLEGNEVGKTY(G) | 1115 | 4504 |
| | NFYC | 49 | cDNA | NR | Yes (C) | B*52:01 | AQQITKTEV(Q) | 7314 | 5701 |
| | CDK4 | 50 | cDNA | NR | Yes (C) | A*02:01 | ACDPHSGHFV(R) | 11192 | 25222 |
| neoORF | pARF14-ORF3 | 51 | cDNA | Yes | Not Relevant | A*11:01 | AVCPWTWLR | 25 | Not Relevant |
| | HMSD-ν | 52 (miHAg) | cDNA | Yes | Not Relevant | B*44:03 | MEIFIEVFSHF | 36 | Not Relevant |
| | PANE-1 | 53 (miHAg) | MS | NR | Not Relevant | A*03:01 | RVWDLPGVLK | 44 | Not Relevant |
| | MUM1 | 54 | cDNA | Yes | Yes (C) | B*44:02 | (L)EEKLIVVLF (S) | 434 (409) | Not Relevant |
| | P2X5 | 55 (miHAg) | Linkage | Yes | Not Relevant | B*07:02 | TPNQRQNVC | 1769 | Not Relevant |

Because these neoepitopes are associated with biological responses, they provide an ideal set of sequences for retrospective peptide affinity predictions to “reverse engineer” predictable characteristics of effective epitopes. For this analysis, we utilized the netMHCpan v2.4 algorithm (Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark, www.dtu.dk) (20). NetMHCpan is an algorithm based on an artificial neural network and trained on an extensive data set (17), including 43 HLA-A and -B alleles, representing ~90% and ~60%, respectively, of the allelic population distribution, with more than 1000 members each in the training set. NetMHCpan was determined to be one of the most accurate predictive algorithms in a 2012 competition (21). We applied this algorithm to individually predict MHC binding for all possible tiled peptides containing the mutated or the corresponding unmutated residues of these spontaneously observed epitopes in order to determine:

Functional neoepitopes are correctly predicted by a class I MHC-peptide binding algorithm

Thirty-one of the epitopes shown in Table 1 were identified by ex vivo T-cell reactivity or mass spectrometry and did not utilize genomic sequence or binding prediction information as a component of their identification. For all but one of these 31, we found that the reported epitope was the peptide with the strongest predicted MHC-binding affinity among the tiled peptides containing the mutation. The only exception was a MUM1-derived 10mer containing an additional leucine at the N-terminus that had a slightly better predicted affinity (IC50 of 409 nM) than the observed 9mer (IC50 of 434 nM). We conclude that the MHC-peptide binding prediction algorithm netMHCpan consistently predicts the naturally recognized tumor neoepitope from all of the possible epitopes harboring a specific mutation.
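The tiling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names, the choice of 8 to 11 residue windows, and the `predict_ic50` callable (a stand-in for whatever predictor is available, such as a wrapper around netMHCpan output) are assumptions introduced here.

```python
from typing import Callable, Iterable, List, Tuple


def tile_peptides(protein: str, mut_index: int,
                  lengths: Iterable[int] = (8, 9, 10, 11)) -> List[str]:
    """Return every window of the given lengths that contains mut_index.

    protein   : amino-acid sequence carrying the mutation
    mut_index : 0-based position of the mutated residue
    lengths   : peptide lengths to tile (8-11 is an assumption)
    """
    peptides = []
    for k in lengths:
        first = max(0, mut_index - k + 1)          # leftmost start that still covers the mutation
        last = min(mut_index, len(protein) - k)    # rightmost start that fits in the protein
        for start in range(first, last + 1):
            peptides.append(protein[start:start + k])
    return peptides


def strongest_binder(peptides: Iterable[str],
                     predict_ic50: Callable[[str], float]) -> Tuple[str, float]:
    """Pick the peptide with the lowest predicted IC50 (strongest binding).

    predict_ic50 is a hypothetical, user-supplied function returning nM.
    """
    return min(((p, predict_ic50(p)) for p in peptides), key=lambda pair: pair[1])
```

Running the same tiling on the native sequence (with the unmutated residue at `mut_index`) gives the matched native peptides for comparison.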

Most functional neoepitopes have high to moderate predicted IC50

Twenty of 31 (65%) of the naturally recognized missense and neoORF epitopes had predicted IC50 < 50 nM (strong binders) and 3 of 31 (10%) had a predicted IC50 between 50 nM and 150 nM (moderate binders). Thus, 75% of the dominant T cell clones isolated from the naturally occurring T cell populations recognize an epitope with a strong or moderate predicted affinity (IC50 <150 nM) for the patient’s MHC allele. Since an unbiased functional assay (cytolysis, IFNγ production, or MS) was the critical test used to identify each of the stimulating peptides in these 31 examples, it is unlikely that there was an experimental bias towards the identification of epitopes with higher predicted affinity. Four of 31 naturally recognized peptides were predicted to be “weak” binding peptides (IC50 between 150 and 500 nM), indicating that a total of 27 of 31 (87%) of the naturally occurring epitopes would have been considered as binding peptides (IC50 < 500 nM affinity) using netMHCpan.

Conversely, only four of 31 naturally recognized peptides were predicted by netMHCpan to be non-binders (IC50 > 500 nM); they may be false negatives from the prediction algorithm or may represent low affinity yet functional epitopes. Although these alternatives cannot be distinguished based on the available data, three observations are relevant. First, for the three epitopes arising from missense mutations (MART-2, NFYC, CDK4), cytolytic activity was preferentially induced by the mutated peptide and not by the native peptide at a range of peptide concentrations (1 – 10 nM) comparable to those observed with more strongly predicted binding peptides. Second, for the Arg → Cys CDK4 mutation, the highly oxidizable sulfur residue may contribute serendipitously to MHC binding as a “pseudo-” anchor residue that could not have been accounted for by the prediction algorithms. Finally, T cells recognizing the fourth epitope (the miHAg P2X5) represented as much as 1.6% of all circulating T cells following the therapeutic infusion of donor lymphocytes. Results from this dataset of 40 examples suggest that there are limitations to the capability of predictive algorithms and that up to 15% of target T-cell epitopes may be missed by the prediction algorithms.
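The binding classes used in this section can be written out as a short sketch. The thresholds (50, 150 and 500 nM) come from the text; the treatment of values that fall exactly on a threshold, the function name `classify_ic50`, and the selection of the 31 rows (taken here as the Table 1 entries whose identification approach does not involve WES) are assumptions made for illustration.

```python
from collections import Counter


def classify_ic50(ic50_nm: float) -> str:
    """Map a predicted IC50 (nM) to the binding class used in the text.

    Boundary handling (e.g. exactly 50 nM counted as moderate) is an assumption.
    """
    if ic50_nm < 50:
        return "strong"
    if ic50_nm < 150:
        return "moderate"
    if ic50_nm < 500:
        return "weak"
    return "non-binder"


# Predicted IC50 (nM) of the mutated peptide, copied from Table 1 for the
# 31 epitopes not identified through WES (assumed selection, see lead-in).
mut_ic50 = (
    [3, 3, 5, 10, 10, 14, 16, 17, 21, 22, 23, 29, 33, 36, 81, 184, 258, 282]  # Group 1
    + [32, 41, 48, 60, 141]        # Group 2
    + [1115, 7314, 11192]          # Group 3
    + [25, 36, 44, 434, 1769]      # neoORF
)

counts = Counter(classify_ic50(v) for v in mut_ic50)
binders = sum(n for label, n in counts.items() if label != "non-binder")

for label in ("strong", "moderate", "weak", "non-binder"):
    print(f"{label:>10}: {counts[label]} of {len(mut_ic50)}")
print(f"predicted binders (IC50 < 500 nM): {binders} of {len(mut_ic50)}")
```

Under these assumptions the tally gives 20 strong, 3 moderate, 4 weak and 4 non-binding peptides, in line with the counts reported above.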

Most of the cognate native peptides are predicted to bind MHC equally to the mutated peptides

In addition to analyzing MHC binding to the mutated epitopes, we also compared the predicted affinities of the cognate native epitopes corresponding to all 35 missense epitopes in Table 1 and identified 3 distinct classes. The predominant class (26 of 35 or 74%; Group 1) was composed of native/mutated pairs that were predicted to bind with comparable affinity (with 23 of 26 showing strong to moderate predicted binding [IC50 <150 nM] and the remaining 3 showing weak binding [IC50 between 150 and 500 nM]). Despite comparable predicted binding, in almost all cases the mutated peptide had been found to be significantly more potent in stimulating T cells than the native peptide. A smaller group (6 of 35 or 17%; Group 2) showed low predicted binding for the native epitope and strong binding for the mutated epitope, directly correlating with the differential T-cell responses to the mutated and native peptides. Finally, in a minority of cases both mutated and native peptides were predicted to be non-binding (3 of 35 or 9%; Group 3). We note that each Group in Table 1 comprises multiple HLA alleles with no apparent bias in representation. While the existence of the Group 1 and Group 2 epitopes is not surprising, the predominance of the Group 1 epitopes (containing 74% of all missense epitopes and 4 times more abundant than Group 2) with comparable affinities for both the mutated and native peptides is unexpected.
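The grouping of native/mutated pairs can be expressed as a simple rule on the two predicted IC50 values. The sketch below is one possible reading of the definitions above, assuming a single 500 nM cutoff separates binders from non-binders; the function name `assign_group`, the cutoff parameter, and the "unclassified" label for a binding native peptide paired with a non-binding mutated peptide (a case not observed in Table 1) are introduced here for illustration.

```python
def assign_group(mut_ic50: float, nat_ic50: float,
                 binder_cutoff: float = 500.0) -> str:
    """Assign a missense epitope to Group 1, 2 or 3 from predicted IC50 values (nM).

    binder_cutoff = 500 nM is an assumed single threshold for "predicted to bind".
    """
    mut_binds = mut_ic50 < binder_cutoff
    nat_binds = nat_ic50 < binder_cutoff
    if mut_binds and nat_binds:
        return "Group 1"      # both peptides predicted to bind
    if mut_binds and not nat_binds:
        return "Group 2"      # mutation creates a predicted binder
    if not mut_binds and not nat_binds:
        return "Group 3"      # neither peptide predicted to bind
    return "unclassified"     # native binds, mutated does not (not seen in Table 1)


# A few (MUT, NAT) pairs copied from Table 1.
examples = {
    "ECGF-1": (3, 2),
    "GPNMB": (282, 179),
    "CSNK1A1": (6, 1312),
    "MYH2": (141, 4960),
    "MART-2": (1115, 4504),
    "CDK4": (11192, 25222),
}

for gene, (mut, nat) in examples.items():
    print(f"{gene:8s} MUT={mut:>6} nM  NAT={nat:>6} nM  ->  {assign_group(mut, nat)}")
```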

DISCUSSION

The MHC-bound peptide can be considered as a double-sided “key”, which must fit both the MHC and the TCR “locks” in order to stimulate an immune response and for subsequent target-cell cytolysis (Figure 1A). Sequence-specific binding of peptides to the MHC molecule is highly dependent on the interactions of the peptide side chains at particular positions (“anchors”) along the length of the peptide with chemical moieties defined by the polymorphic residues that constitute the MHC binding pocket (22–24); hence, predictive calculations are sequence-dependent (25). Furthermore, analysis of these critical MHC-binding positions and residues over a wide range of MHC alleles shows that only a few positions of the peptide are anchor positions and only a few amino acids at the anchor positions of the peptide contribute to binding in a positive manner (26). On the other side of the “key”, TCR recognition of the peptide/MHC complex gains specificity from the ordered presentation of the other face of the peptide conferred by the anchoring residues.

Figure 1.

(A) The two faces of a bound peptide to the MHC and TCR molecules form a “double-sided key” that must be present in order to stimulate an antigen-specific immune response. Green--Anchor residues in the peptide that interact with MHC. Purple--Regions of the peptide that interact with the TCR surface. (B) A scatter plot of the predicted affinities of epitopes that stimulate detectable neoantigen T-cell responses, shown in Table 1. Group 1 epitopes demonstrate comparable predicted affinities of native and mutated peptides and were determined to have mutations in regions of the peptide critical for interactions with the TCR (dark purple – strong/moderate binders; light purple – weak binders). Group 2 epitopes (green) are mutated peptides with strong/moderate predicted affinity whose corresponding native peptides are not predicted to bind MHC, and were found to have mutations in the peptide residues critical for the interaction with MHC. Group 3 epitopes (grey) represent peptides where neither the native nor the mutated peptide is predicted to be an HLA-binding peptide and may be either false negatives of the prediction algorithm or very low affinity functional epitopes.

For the majority of the missense mutations, both the native and the mutated peptides were predicted to be binding peptides (Group 1, Figure 1B). This observation is almost certainly a consequence of the mutations affecting the region of the peptide “key” that is involved in TCR recognition. In all but 2 of these 26 examples, the mutation was in a non-anchor position (as identified by the online tool provided at www.sypeithi.de ; Ref. # 26). In the two non-conforming examples (PLEKHM2 and KIAA1440), a second anchor residue was already present in the native peptide. Other investigators have also reported mutations with equivalent affinity predictions for the native and the mutated peptide pairs (16,27). Our broader analysis suggests that such mutant epitopes are a common phenomenon. Only a minority of the missense mutations were found in Group 2, which is characterized by non-binding of the native peptide. All of the Group 2 examples except for MYOSIN (listed as MYH2 in Table 1) were mutations to preferred anchor residues at critical anchor positions.

Although the majority of the naturally occurring tumor epitopes were derived from the corresponding native peptides predicted to bind MHC, the vast majority (>98%) of the native human peptidome is not predicted to contain peptides that are binding epitopes of human MHC (our unpublished analysis using netMHCpan). Random mutational events that convert a non-binding peptide (the vastly predominant target) to a binding peptide (Group 2) are expected to be rare because they require mutation to one of only a few specific amino acids at a small number of anchor positions. For most MHC molecules, there are only one or two important anchor positions and usually only 2 or 3 amino acids at those positions promote binding. Conversely, non-anchor positions are 3–4 times more abundant than anchor positions, and most mutations to native binding epitopes in these non-anchor positions would maintain MHC binding (Group 1). This simple probabilistic explanation may be sufficient to account for the predominance of the observed Group 1 epitopes (derived from the vastly under-represented class of native peptides that are predicted to bind MHC). Alternatively, more complex explanations may be required. For example, aspects of central immune tolerance that are currently not well understood may cause the extant TCR repertoire to more effectively respond to peptides presenting a surface chemically distinct from any native peptide (Group 1) than to peptides which more efficiently present an otherwise native peptide surface (Group 2).
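The probabilistic argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is only illustrative and rests on assumptions that are not part of the analysis: a 9-residue peptide, 2 anchor positions, 3 binding-promoting amino acids per anchor, and a substitution that is equally likely to fall on any position and to produce any of the 19 other amino acids (real missense spectra are constrained by the genetic code and are not uniform).

```python
from fractions import Fraction


def p_anchor_gain(peptide_len: int = 9, n_anchor: int = 2,
                  n_preferred: int = 3, n_alternatives: int = 19) -> Fraction:
    """Chance that one random substitution lands on an anchor position AND
    yields a binding-promoting residue (all defaults are assumptions)."""
    p_hit_anchor = Fraction(n_anchor, peptide_len)
    p_preferred = Fraction(n_preferred, n_alternatives)
    return p_hit_anchor * p_preferred


def non_anchor_ratio(peptide_len: int = 9, n_anchor: int = 2) -> Fraction:
    """How many times more non-anchor than anchor positions the peptide has."""
    return Fraction(peptide_len - n_anchor, n_anchor)


p_gain = p_anchor_gain()
ratio = non_anchor_ratio()
print(f"P(substitution creates a preferred anchor) = {p_gain} ~ {float(p_gain):.3f}")
print(f"non-anchor : anchor positions = {float(ratio):.1f} : 1")
```

With these inputs, only a few percent of random substitutions could introduce a preferred anchor residue, whereas non-anchor positions outnumber anchor positions by 3.5 to 1, consistent with the 3–4 fold figure quoted above.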

We did not observe weak native peptide binders that converted to strong/moderate mutated binders or strong/moderate native peptide binders that converted to weak mutated peptide binders. While this may reflect the relatively limited dataset we used, it could also be that such upgrading or downgrading of binding involves anchor residue changes that moderately increase/decrease binding affinity but retain similar chemical structure of the non-anchor residues available for TCR recognition. In these scenarios, because the native peptides could bind MHC, central immune tolerance may have effectively deleted cells with the reactive TCR, rendering both native and mutated peptides non-immunogenic.

From the perspective of efficacy, we propose that mutations resulting in either Group 1 or Group 2 binders should be considered as acceptable for use as immunogens as both types of mutated neoepitopes have been found in vivo in cancer patients with spontaneous tumor regressions and in long-term cancer survivors. Notably, in long-term survivors, T cells specific to mutated tumor epitopes from both Group 1 and Group 2 have been found to persist over many years (28,29).

From the perspective of safety, there have been no reports of immune-mediated toxicities (except for the expected occurrence of graft vs host disease (GVHD) as a result of responses against miHAgs) despite the observation that for most of the mutations, the cognate native peptide was predicted, and experimentally demonstrated in some cases (18,27,30,31), to bind MHC as well as the mutated peptide (Group 1). Importantly, in almost all cases the mutant peptide was shown to be more potent than the native peptide in stimulating T cell cytotoxicity or IFNγ production. The absence of autoimmune toxicity in these patients fits with the model that T cells reactive to the native epitope were eliminated by central immune tolerance and that T cells reactive to the mutated epitope do not cross-react to the native epitope as the mutation exclusively affects the TCR binding region.

In conclusion, in silico peptide-binding predictions provide a useful and rapidly deployable tool to capture the types of immunogens that are naturally observed in cancer patients, many of whom experienced tumor regression and sometimes long-term tumor control. Moreover, results from our retrospective prediction study reveal features of these epitopes to further guide inclusion as immunogens in vaccines. We have recently initiated a clinical study employing personalized neoantigen epitopes identified by whole-exome sequencing and prioritized by MHC-binding predictions in which we will carefully monitor the immune response to each mutation (NCT01970358).

Acknowledgements

We thank Sachet Shukla for helpful and insightful discussions. We acknowledge the generous support of the Blavatnik Family Foundation and the NIH (NHLBI:5 R01 HL103532-03; NCI:1R01CA155010-02) for our work on neoepitope-based vaccines.

Footnotes

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Conception and design: E.F. Fritsch, N. Hacohen, C.J. Wu

Development of methodology: E. F. Fritsch, N. Hacohen, C.J. Wu, M. Rajasagi

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): N. Hacohen, C.J. Wu

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): E.F. Fritsch, N. Hacohen, C.J. Wu, M. Rajasagi, V.A. Brusic

Writing, review, and/or revision of the manuscript: E.F. Fritsch, C.J. Wu, N. Hacohen, M. Rajasagi, P.A. Ott, V. Brusic

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): N. Hacohen, C.J. Wu

Study supervision: N. Hacohen, C.J. Wu

REFERENCES