MHCSeqNet: a deep neural network model for universal MHC binding prediction

ACME: pan-specific peptide-MHC class I binding prediction through attention-based deep neural networks

Motivation: Prediction of peptide binding to the major histocompatibility complex (MHC) plays a vital role in the development of therapeutic vaccines for the treatment of cancer. Algorithms with improved correlations between predicted and actual binding affinities are needed to increase precision and reduce the number of false positive predictions. Results: We present ACME (Attention-based Convolutional neural networks for MHC Epitope binding prediction), a new pan-specific algorithm to accurately predict the binding affinities between peptides and MHC class I molecules, even for new alleles that are not seen in the training data. Extensive tests have demonstrated that ACME can significantly outperform other state-of-the-art prediction methods, increasing the Pearson correlation coefficient between predicted and measured binding affinities by up to 23 percentage points. In addition, its ability to identify strong-binding peptides has been experimentally validated. Moreover, by integrating the convolutional neural network with an attention mechanism, ACME is able to extract interpretable patterns that can provide useful and detailed insights into the binding preferences between peptides and their MHC partners. All these results demonstrate that ACME can provide a powerful and practically useful tool for the study of peptide-MHC class I interactions.
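The evaluation metric named in the abstract above, the Pearson correlation coefficient between predicted and measured binding affinities, can be sketched as follows. This is a minimal illustration and not ACME's evaluation code; the function name `pearson` and the example values are assumptions made only for this sketch.

```python
import math


def pearson(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / len(predicted)
    mean_m = sum(measured) / len(measured)
    # Sum of cross-deviations and of squared deviations around each mean.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)


# Hypothetical affinity scores for five peptides (illustrative values only).
example_predicted = [0.10, 0.35, 0.50, 0.80, 0.90]
example_measured = [0.20, 0.30, 0.55, 0.70, 0.95]
example_r = pearson(example_predicted, example_measured)
```

A difference of 23 percentage points, as reported in the abstract, corresponds to a difference of 0.23 in this coefficient between two methods scored on the same peptides.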

OnionMHC: A deep learning model for peptide — HLA-A*02:01 binding predictions using both structure and sequence feature sets

Journal of Micromechanics and Molecular Physics

Peptide binding to Major Histocompatibility Complex (MHC) proteins is an important step in the antigen-presentation pathway. Thus, predicting the binding potential of peptides with MHC is essential for the design of peptide-based therapeutics. Most of the available machine learning-based models predict peptide-MHC binding from the amino acid sequence alone. Given the importance of structural information in determining the stability of the complex, here we have utilized both the complex structure and the peptide sequence features to predict the binding affinity of peptides to the human receptor HLA-A*02:01. To our knowledge, no model that incorporates both structure- and sequence-based features has previously been developed for the human HLA receptor. Results: We have applied machine learning techniques from natural language processing (NLP) together with a convolutional neural network to design a model that performs comparably with the existing state-of-the-art models. Our mo...

CapsNet-MHC predicts peptide-MHC class I binding based on capsule neural networks

Communications Biology

The Major Histocompatibility Complex (MHC) binds peptides derived from pathogens and presents them to killer T cells on the cell surface. Developing computational methods for accurate, fast, and explainable peptide-MHC binding prediction can facilitate immunotherapies and vaccine development. Various deep learning-based methods rely on separate feature extraction from the peptide and MHC sequences and ignore their pairwise binding information. This paper develops a capsule neural network-based method that efficiently captures features of the peptide-MHC complex to predict peptide-MHC class I binding. Various evaluations confirmed that our method outperforms the alternative methods and provides accurate predictions even when less data is available. Moreover, to provide precise insights into the results, we explored the essential features that contributed to the prediction. Since the simulation results demonstrated consistency with the experimental studies, we concluded that ...

NetTCR: sequence-based prediction of TCR binding to peptide-MHC complexes using convolutional neural networks

Predicting epitopes recognized by cytotoxic T cells has been a long-standing challenge within the field of immuno- and bioinformatics. While reliable predictions of peptide binding are available for most Major Histocompatibility Complex class I (MHCI) alleles, prediction models of T cell receptor (TCR) interactions with MHC class I-peptide complexes remain poor due to the limited amount of available training data. Recent next-generation sequencing projects have, however, generated a considerable amount of data relating TCR sequences to their cognate HLA-peptide complex target. Here, we utilize such data to train a sequence-based predictor of the interaction between TCRs and peptides presented by the most common human MHCI allele, HLA-A*02:01. Our model is based on convolutional neural networks, which are especially designed to meet the challenges posed by the large length variations of TCRs. We show that such a sequence-based model allows for the identification of TCRs binding a giv...

NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction

BMC Bioinformatics, 2009

The major histocompatibility complex (MHC) molecule plays a central role in controlling the adaptive immune response to infections. MHC class I molecules present peptides derived from intracellular proteins to cytotoxic T cells, whereas MHC class II molecules stimulate cellular and humoral immunity through presentation of extracellularly derived peptides to helper T cells. Identification of which peptides will bind a given MHC molecule is thus of great importance for the understanding of host-pathogen interactions, and large efforts have been placed in developing algorithms capable of predicting this binding event.

In silico Antibody-Peptide Epitope prediction for Personalized cancer therapy

The human leukocyte antigen (HLA) system is a complex of genes on chromosome 6 in humans that encodes cell-surface proteins responsible for regulating the immune system. Viral peptides presented to cancer cell surfaces by the HLA trigger the immune system to kill the cells, creating Antibody-peptide epitopes (APE). This study proposes an in-silico approach to identify patient-specific APEs by applying complex networks diagnostics on a novel multiplex data structure as input for a deep learning model. The proposed analytical model identifies patient and tumor-specific APEs with as few as 20 labeled data points. Additionally, the proposed data structure employs complex network theory and other statistical approaches that can better explain and reduce the black box effect of deep learning. The proposed approach achieves an F1-score of 80% and 93% on patients one and two, respectively, and above 90% on tumor-specific tasks. Additionally, it minimizes the required training time and...

Immunopeptidomic Data Integration to Artificial Neural Networks Enhances Protein-Drug Immunogenicity Prediction

Frontiers in Immunology, 2020

Recombinant DNA technology has, in the last decades, contributed to a vast expansion of the use of protein drugs as pharmaceutical agents. However, such biological drugs can lead to the formation of anti-drug antibodies (ADAs) that may result in adverse effects, including allergic reactions and compromised therapeutic efficacy. Production of ADAs is most often associated with activation of CD4 T cell responses resulting from proteolysis of the biotherapeutic and loading of drug-specific peptides into major histocompatibility complex (MHC) class II on professional antigen-presenting cells. Recently, readouts from MHC-associated peptide proteomics (MAPPs) assays have been shown to correlate with the presence of CD4 T cell epitopes. However, the limited sensitivity of MAPPs challenges its use as an immunogenicity biomarker. In this work, MAPPs data was used to construct an artificial neural network (ANN) model for MHC class II antigen presentation. Using Infliximab and Rituximab as showcase stories, the model demonstrated an unprecedented performance for predicting MAPPs and CD4 T cell epitopes in the context of protein-drug immunogenicity, complementing results from MAPPs assays and outperforming conventional prediction models trained on binding affinity data.

Improving T-cell mediated immunogenic epitope identification via machine learning: the neoIM model

bioRxiv (Cold Spring Harbor Laboratory), 2022

The identification of immunogenic peptides that will elicit a CD8+ T cell-specific immune response is a critical step for various immunotherapeutic strategies such as cancer vaccines. Significant research effort has been directed towards predicting whether a peptide is presented on class I major histocompatibility complex (MHC I) molecules. However, only a small fraction of the peptides predicted to bind to MHC I turn out to be immunogenic. Prediction of immunogenicity, i.e. the likelihood for CD8+ T cells to recognize and react to a peptide presented on MHC I, is of high interest to reduce validation costs, de-risk clinical studies and increase therapeutic efficacy especially in a personalized setting where in vitro immunogenicity pre-screening is not possible. To address this, we present neoIM, a random forest classifier specifically trained to classify short peptides as immunogenic or non-immunogenic. This first-in-class algorithm was trained using a positive dataset of more than 8000 non-self immunogenic peptide sequences, and a negative dataset consisting of MHC I-presented peptides with one or two mismatches to the human proteome for a closer resemblance to a background of mutated but non-immunogenic peptides. Peptide features were constructed by performing principal component analysis on amino acid physicochemical properties and stringing together the values of the ten main principal components for each amino acid in the peptide, combined with a set of peptide-wide properties. The neoIM algorithm outperforms the currently publicly available methods and is able to predict peptide immunogenicity with high accuracy (AUC=0.88). 
neoIM is MHC-allele agnostic, and in vitro validation through ELISPOT experiments on 33 cancer-derived neoantigens has confirmed its predictive power, showing that 71% of all immunogenic peptides are contained within the top 30% of neoIM predictions and that all immunogenic peptides were included when selecting the top 55% of peptides with the highest neoIM score. Finally, neoIM results can help to better predict the response to checkpoint inhibition therapy, especially in low TMB tumors, by focusing on the number of immunogenic variants in a tumor. Overall, neoIM enables significantly improved identification of immunogenic peptides, allowing the development of more potent vaccines and providing new insights into the characteristics of immunogenic peptides.
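The feature construction described in the neoIM abstract (principal component analysis on amino acid physicochemical properties, then stringing together the component values for each residue of the peptide) can be sketched roughly as below. This is not the neoIM implementation: the property table `TOY_PROPERTIES`, its values, the helper names, and the use of power iteration are all assumptions for illustration. The toy table has only three descriptors, so the sketch keeps two components rather than the ten mentioned in the abstract, and it omits the peptide-wide properties.

```python
import math

# Toy table: three made-up physicochemical descriptors per amino acid.
# The values are illustrative placeholders, not a curated property set.
TOY_PROPERTIES = {
    "A": [1.8, 0.3, 0.0],
    "D": [-3.5, 0.5, -1.0],
    "G": [-0.4, 0.0, 0.0],
    "K": [-3.9, 0.9, 1.0],
    "L": [3.8, 0.8, 0.0],
}


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def fit_components(rows, k, iters=500):
    """Return (column means, top-k principal axes) using power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    axes = []
    for c in range(k):
        v = [1.0 / (i + c + 1) for i in range(d)]  # deterministic start vector
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))
        axes.append(v)
        # Deflate: remove the component just found before finding the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return means, axes


def encode_peptide(peptide, table, k=2):
    """Concatenate the k principal-component scores of each residue."""
    residues = sorted(table)
    means, axes = fit_components([table[a] for a in residues], k)
    scores = {
        a: [sum((x - m) * w for x, m, w in zip(table[a], means, axis))
            for axis in axes]
        for a in residues
    }
    features = []
    for residue in peptide:
        features.extend(scores[residue])
    return features


example_features = encode_peptide("GALAK", TOY_PROPERTIES, k=2)
```

A peptide of length n yields n times k values under this scheme. The sign of each principal axis is arbitrary, so encoded values are only comparable when the same fitted axes are reused for every peptide.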

neoMS: Attention-based Prediction of MHC-I Epitope Presentation

bioRxiv (Cold Spring Harbor Laboratory), 2022

Personalised immunotherapy aims to (re-)activate the immune system of a given patient against its tumour. It relies extensively on the ability of tumour-derived neoantigens to trigger a T-cell immune reaction able to recognise and kill the tumour cells expressing them. Since only peptides presented on the cell surface can be immunogenic, the prediction of neoantigen presentation is a crucial step of any discovery pipeline. Limiting neoantigen presentation to MHC binding fails to take into account all other steps of the presentation machinery and therefore to assess the true potential clinical benefit of a given epitope. Indeed, research has uncovered that merely 5% of predicted tumour-derived MHC-bound peptides are actually presented on the cell surface, demonstrating that affinity-based approaches fall short of isolating truly actionable neoantigens. Here, we present neoMS, an MHC-I presentation prediction algorithm leveraging mass spectrometry-derived MHC ligandomic data to better isolate presented antigens from potentially very large sets. The neoMS model is a transformer-based, peptide-sequence-to-HLA-sequence neural network algorithm, trained on 386,647 epitopes detected in the ligandomes of 92 HLA-monoallelic datasets and 66 patient-derived HLA-multiallelic datasets. It leverages attention mechanisms in which the most relevant parts of both putative epitope and HLA alleles are isolated. This results in a positive predictive value of 0.61 at a recall of 40% on its patient-derived test dataset, considerably outperforming current alternatives. Predictions made by neoMS correlate with peptide identification confidence in mass spectrometry experiments and reliably identify binding motif preferences of individual HLA alleles, thereby further consolidating the biological relevance of the model. Additionally, neoMS displays extrapolation capabilities, showing good predictive power for presentation by HLA alleles not present in its training dataset.
Finally, it was found that neoMS results can help refine predictions of response to immune checkpoint inhibitor treatment in certain cancer indications. Taken together, these results establish neoMS as a considerable step forward in high-specificity isolation of clinically actionable antigens for immunotherapies.
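The headline metric in the neoMS abstract, positive predictive value (PPV) at a fixed recall, can be computed along the following lines. This is a minimal sketch under stated assumptions, not the neoMS evaluation code: the function name `ppv_at_recall`, the tie handling (ties are ranked in input order), and the example scores and labels are all hypothetical.

```python
def ppv_at_recall(scores, labels, target_recall):
    """PPV of the shortest top-ranked list whose recall reaches target_recall.

    scores: model scores, higher meaning more likely presented.
    labels: 1 for a truly presented peptide, 0 otherwise.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    # Rank peptides from highest to lowest score (stable for tied scores).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / total_positives >= target_recall:
            # PPV = true positives among the k peptides selected so far.
            return true_positives / k
    raise ValueError("target recall cannot be reached (must be <= 1)")


# Hypothetical scores and labels for ten peptides (illustrative only).
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
example_ppv = ppv_at_recall(example_scores, example_labels, 0.4)
```

In the toy example, recovering 40% of the five positives requires taking the top three peptides, two of which are positives. A PPV of 0.61 at 40% recall, as reported in the abstract, would mean that 61% of the peptides in such a top-ranked list are truly presented.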

ACP-MHCNN: An Accurate Multi-Headed Deep-Convolutional Neural Network to Predict Anticancer peptides

2020

Although advancing therapeutic alternatives for treating deadly cancers has gained much attention globally, primary methods such as chemotherapy still have significant downsides and low specificity. Most recently, anticancer peptides (ACPs) have emerged as a potential therapeutic alternative with far fewer negative side effects. However, the identification of ACPs through wet-lab experiments is expensive and time-consuming. Hence, computational methods have emerged as viable alternatives. During the past few years, several computational ACP identification techniques using hand-engineered features have been proposed to solve this problem. In this study, we propose a new multi-headed deep convolutional neural network model called ACP-MHCNN for extracting and combining discriminative features from different information sources in an interactive way. Our model extracts sequence, physicochemical, and evolutionary based features for ACP identification through sim...