Predicting protein-protein interactions through sequence-based deep learning

Somaye Hashemifar et al. Bioinformatics. 2018.

Abstract

Motivation: High-throughput experimental techniques have produced a large amount of protein-protein interaction (PPI) data, but their coverage is still low and the PPI data is also very noisy. Computational prediction of PPIs can be used to discover new PPIs and identify errors in the experimental PPI data.

Results: We present a novel deep learning framework, DPPI, to model and predict PPIs from sequence information alone. Our model efficiently applies a deep, Siamese-like convolutional neural network combined with random projection and data augmentation to predict PPIs, leveraging existing high-quality experimental PPI data and evolutionary information of a protein pair under prediction. Our experimental results show that DPPI outperforms the state-of-the-art methods on several benchmarks in terms of area under precision-recall curve (auPR), and computationally is more efficient. We also show that DPPI is able to predict homodimeric interactions where other methods fail to work accurately, and the effectiveness of DPPI in specific applications such as predicting cytokine-receptor binding affinities.

Availability and implementation: https://github.com/hashemifar/DPPI/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.

Illustration of the DPPI model for predicting binary protein–protein interactions. (a) DPPI takes as input a pair of protein sequences and learns to predict an interaction score. (b) Each input protein is represented as a probabilistic sequence profile generated by PSI-BLAST. (c) The convolutional module consists of several convolutional layers that learn a set of filters, each of which is responsible for detecting various patterns in a sequence. (d) The random projection module maps each sequence to a representation useful for modelling paired sequences.
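The pipeline in panels (a) to (d) can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a single convolutional layer with ReLU and global max pooling, one fixed random projection shared by both proteins, and a sigmoid over the summed element-wise product as the pair score. These choices, the layer sizes, and all function names are hypothetical. The filters are random rather than trained, and the profiles are synthetic stand-ins for real sequence profiles.

```python
import math
import random

NUM_AMINO_ACIDS = 20  # columns in a sequence profile, one per amino acid


def random_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small Gaussian values."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def encode(profile, filters, projection):
    """Encode one protein profile (length x 20) into a fixed-size vector.

    Steps: 1-D convolution with ReLU, global max pooling over positions,
    then a fixed (untrained) random projection.
    """
    pooled = []
    for filt in filters:  # each filter is a width x 20 matrix
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(profile) - width + 1):
            act = sum(
                filt[k][a] * profile[start + k][a]
                for k in range(width)
                for a in range(NUM_AMINO_ACIDS)
            )
            best = max(best, act)
        pooled.append(best)
    return [sum(w * x for w, x in zip(row, pooled)) for row in projection]


def interaction_score(profile_a, profile_b, filters, projection):
    """Score a pair: both proteins pass through the same encoder (Siamese)."""
    vec_a = encode(profile_a, filters, projection)
    vec_b = encode(profile_b, filters, projection)
    # Assumption: combine by element-wise product, sum, then sigmoid.
    logit = sum(x * y for x, y in zip(vec_a, vec_b))
    return 1.0 / (1.0 + math.exp(-logit))


def random_profile(length, rng):
    """Hypothetical stand-in for a real profile: each row sums to 1."""
    rows = []
    for _ in range(length):
        raw = [rng.random() for _ in range(NUM_AMINO_ACIDS)]
        total = sum(raw)
        rows.append([v / total for v in raw])
    return rows


rng = random.Random(0)
filters = [random_matrix(5, NUM_AMINO_ACIDS, rng) for _ in range(8)]
projection = random_matrix(16, 8, rng)  # fixed weights, never updated
protein_a = random_profile(30, rng)
protein_b = random_profile(45, rng)
score_ab = interaction_score(protein_a, protein_b, filters, projection)
score_ba = interaction_score(protein_b, protein_a, filters, projection)
```

Because both proteins share the same encoder and the combination step is symmetric, `score_ab` and `score_ba` agree, and proteins of different lengths map to vectors of the same size.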

Fig. 2.

DPPI accurately and efficiently predicts protein–protein interactions. (a) Performance comparison of DPPI with competing sequence-based methods on human and yeast benchmark data. Values were taken from Hamp and Rost (2015) and show the average precision and recall rate of 10-fold cross-validation. The numbers in parentheses indicate the area under the precision-recall curve (auPR). (b) The mean auPR of DPPI and Profppikernel with respect to different training sizes. We discontinued training Profppikernel with 10k examples after 17 days before it finished. The trend line using the first 3 training sizes is plotted for each method and extended over the last 3 sizes with the 95% interval illustrated. (c) The wall time for training DPPI and Profppikernel. We trained Profppikernel using 96 threads by running it on 32 AMD Opteron(TM) Processor 6272 CPUs.
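As a rough illustration of the auPR metric used in this comparison, the sketch below computes average precision, a common step-wise estimate of the area under the precision-recall curve. The source does not state how its auPR values were computed, so this estimator is an assumption. The function name and the toy labels and scores are hypothetical, and tied scores are not given special treatment.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: 1 for an interacting pair, 0 for a non-interacting pair.
    scores: predicted interaction scores, higher means more confident.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank pairs from the most to the least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Recall rises by 1/total_pos at each positive; weight that
            # step by the precision at this rank.
            area += (true_pos / rank) / total_pos
    return area


# Hypothetical toy data: three interacting pairs among six candidates.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
aupr = average_precision(labels, scores)
```

A ranking that places every interacting pair above every non-interacting pair yields a value of 1.0, while positives ranked low pull the value down.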

Fig. 3.

DPPI performance on test sets generated by more stringent criteria. Probability values of negative interactions on the new test sets and the original test sets are consistent with each other. The black and blue lines show the density of negative interactions for the new (25% sequence identity) and original (40% sequence identity) test sets, respectively. The red line shows the density of positive interactions for the original test set. The top and bottom panels are for human and yeast, respectively.

Fig. 4.

Performance comparison of the DPPI model (i.e. the model with a fixed-weight RP module) with two alternative architectures: one without an RP module (no-RP model) and the other with a variant RP module.

Fig. 5.

Performance of the DPPI software at 25% sequence similarity.

Fig. 6.

Scatter plot of the ranking percentile calculated from DPPI’s prediction scores versus the log dissociation constant (Kd) of IL-2 and IL-13 engineered variants. A trend line highlighting the linear correlation is shown. IL-2: Pearson correlation = 0.98, P-value < 0.018. IL-13: Pearson correlation = 0.69, P-value < 0.001.
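The correlation reported in this caption can be sketched as follows. The source does not say how the ranking percentile is derived, so the sketch assumes it is the rank from the top within the scored variants, meaning the highest score gets the smallest percentile. The scores and Kd values are invented for illustration and are not data from the paper, and the function names are hypothetical.

```python
import math


def ranking_percentiles(scores):
    """Rank-from-top percentile of each score (highest score is smallest)."""
    n = len(scores)
    return [100.0 * sum(other >= s for other in scores) / n for s in scores]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical variants: predicted scores and measured Kd in nM.
scores = [0.9, 0.7, 0.5, 0.2]
kd_nm = [1.0, 10.0, 100.0, 1000.0]

percentiles = ranking_percentiles(scores)
log_kd = [math.log10(k) for k in kd_nm]
r = pearson(percentiles, log_kd)
```

With this orientation, variants that score higher and bind more tightly (lower Kd) produce a positive correlation. Reversing the percentile definition would flip the sign without changing its magnitude.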

References

    1. Ben-Hur A., Noble W.S. (2005) Kernel methods for predicting protein–protein interactions. Bioinformatics, 21, i38–i46.
    2. Bengio Y. (2012) Neural Networks: Tricks of the Trade. Springer-Verlag, Berlin, Heidelberg, pp. 437–478.
    3. Bromley J., et al. (1993) Signature verification using a “Siamese” time delay neural network. IJPRAI, 07, 669–688.
    4. Cooijmans T., et al. (2016) Recurrent batch normalization. arXiv Preprint arXiv:1603.09025.
    5. Das J., Yu H. (2012) HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol., 6, 92.