Additivity in protein-DNA interactions: how good an approximation is it?
Comparative Study
Nucleic Acids Res. 2002 Oct 15;30(20):4442-51.
doi: 10.1093/nar/gkf578.
- PMID: 12384591
- PMCID: PMC137142
Panayiotis V Benos et al. Nucleic Acids Res. 2002.
Abstract
Man and Stormo and Bulyk et al. recently reported studies of the DNA binding affinity of proteins. The main conclusion of both studies is that the additivity assumption, usually applied in methods that search for binding sites, does not hold. In the first study, analysis of binding affinity data for the Mnt repressor protein bound to all possible DNA (sub)targets at positions 16 and 17 of the binding site showed that those positions are not independent. In the second study, the authors analysed DNA binding affinity data for the wild-type mouse EGR1 protein and four variants differing in the middle finger. The binding affinity of these proteins to all 64 possible trinucleotide (sub)targets of the middle finger was measured using microarray technology. Analysis of these measurements also showed interdependence among the positions in the DNA target. In the present report, we review the data of both studies and re-analyse them using various statistical methods, including a comparison with a multiple regression approach. We conclude that although the additivity assumption does not fit the data perfectly, in most cases it provides a very good approximation of the true nature of the specific protein-DNA interactions. Therefore, additive models can be very useful for the discovery and prediction of binding sites in genomic DNA.
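As a rough illustration of how an additive model is used to search for binding sites, the sketch below scores every window of a sequence by summing independent per-position log-odds terms. The matrix `PWM`, the uniform `BACKGROUND` and the function names are hypothetical and chosen only for illustration; none of the values come from the paper.

```python
import math

# Hypothetical 3-position probability matrix (illustrative values only,
# not taken from the paper).
PWM = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.15, "G": 0.10, "T": 0.70},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
]
BACKGROUND = 0.25  # assumed uniform base composition


def additive_score(site):
    """Sum of per-position log-odds terms.

    Additivity means each position contributes independently of its
    neighbours, so the site score is a plain sum.
    """
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, site))


def scan(sequence):
    """Score every window of len(PWM) bases; return (start, site, score) tuples."""
    width = len(PWM)
    return [
        (i, sequence[i:i + width], additive_score(sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]


hits = scan("ACGTGCA")
best = max(hits, key=lambda h: h[2])
print(best)
```

Any interdependence between positions, such as that reported for Mnt positions 16 and 17, is invisible to a score of this form, which is why the quality of the additive approximation matters for site prediction.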
Figures
Figure 1
Probability plots. The probability distributions of the measured data (abscissas) and the BAM predictions (ordinates) are plotted for the EGR DNA-binding proteins. The predictions are based on additive models under different levels of additivity: blue, red and green marks correspond to the 1*2*3, 12*3 and 1*23 models, respectively. Each scatter plot contains all 64 data points, although many data points may coincide. The grey diagonal line represents a perfect fit of the predictions to the measurements. The scatter plot of protein KASN shows an example where additive models fail to represent the real data for a non-specific binding protein (note that all probability values, measured and predicted, are <0.05).
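The three levels of additivity named in the caption can be sketched as follows, under the assumption that each model is built as a product of marginal distributions over blocks of positions. This is one common construction and may differ from how the BAM in the paper is actually computed. The trinucleotide distribution `probs` is synthetic: position 1 is independent, while positions 2 and 3 are deliberately coupled.

```python
from itertools import product

BASES = "ACGT"


def marginal(probs, positions):
    """Sum the joint distribution over all positions not listed in `positions`."""
    out = {}
    for site, p in probs.items():
        key = "".join(site[i] for i in positions)
        out[key] = out.get(key, 0.0) + p
    return out


def additive_prediction(probs, blocks):
    """Predict each trinucleotide probability as a product of block marginals."""
    margs = [marginal(probs, b) for b in blocks]
    pred = {}
    for site in probs:
        value = 1.0
        for block, m in zip(blocks, margs):
            value *= m["".join(site[i] for i in block)]
        pred[site] = value
    return pred


# Synthetic data (not from the paper): position 1 is independent,
# positions 2 and 3 are coupled through extra weight on "GG" and "TT".
P1 = {"A": 0.1, "C": 0.2, "G": 0.6, "T": 0.1}
raw23 = {b + c: 1.0 for b, c in product(BASES, repeat=2)}
raw23["GG"] = 5.0
raw23["TT"] = 3.0
total = sum(raw23.values())
probs = {a + bc: P1[a] * w / total for a in BASES for bc, w in raw23.items()}

MODELS = {
    "1*2*3": [(0,), (1,), (2,)],  # all three positions independent
    "12*3": [(0, 1), (2,)],       # positions 1-2 treated as one unit
    "1*23": [(0,), (1, 2)],       # positions 2-3 treated as one unit
}

predictions = {name: additive_prediction(probs, blocks) for name, blocks in MODELS.items()}
max_err = {
    name: max(abs(pred[s] - probs[s]) for s in probs)
    for name, pred in predictions.items()
}
for name, err in max_err.items():
    print(f"{name}: largest absolute error = {err:.4f}")
```

Because the coupling in this toy distribution is confined to positions 2 and 3, the 1*23 model reproduces it up to rounding, while 1*2*3 and 12*3 do not. Real measurements would not separate this cleanly.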
Figure 2
A graphical representation of the non-independent effect of positions 16 and 17 of the Mnt DNA binding site. In the left graph, the probabilities based on the measured K_A values are plotted against the 1*2 additive model. In the case of Mnt, the deviation from additivity in the high probability states is larger than in Figure 1. However, the right graph plots the two probability distributions by dinucleotide target and shows that the additive model is in fairly good agreement with the measured data. These graphs are based on the data reported in the study of Man and Stormo (3).
Figure 3
Probability and log-probability plots. Scatter plots of the negative logarithms (A) and the predicted binding probabilities (B) for the mononucleotide models that provide ‘best fit’ to the data, according to different criteria. The BAM that we calculate in this paper minimises the squared difference between the predicted and the measured probabilities in the data. The regression model (RM) minimises the squared difference between the predicted and the measured log-probabilities of the data (equivalent to energies). This model was calculated using the BLSS package (42) on the normalised average K_A values of the wild-type EGR protein. Methods for calculating such regression models also exist in the literature (16,17). The two plots show that BAM is better than RM at predicting the high probability targets, whereas RM better fits the high log-probability ones (equivalent to the high energies). The diagonals (straight lines) correspond to the measured values.
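The difference between the two fitting criteria can be illustrated on a small two-position example. The sketch below is not the paper's procedure: the log-space fit uses the closed-form least-squares solution for a complete two-way table as a stand-in for the BLSS regression, and the probability-space fit uses alternating least squares on a product form `u[i] * v[j]`, which is an assumed stand-in for the BAM. The table `RAW` is made up for illustration.

```python
import math

# Hypothetical 4x4 table of binding weights (rows: base at position 1,
# columns: base at position 2). Illustrative values, not data from the paper.
RAW = [[1, 1, 2, 1],
       [1, 2, 4, 1],
       [2, 4, 40, 3],
       [1, 1, 6, 1]]
TOTAL = sum(map(sum, RAW))
P = [[x / TOTAL for x in row] for row in RAW]
N = 4


def fit_log_space(p):
    """Least-squares additive fit to log p (closed form for a complete table)."""
    y = [[math.log(x) for x in row] for row in p]
    row_mean = [sum(r) / N for r in y]
    col_mean = [sum(y[i][j] for i in range(N)) / N for j in range(N)]
    grand = sum(row_mean) / N
    # Fitted log value = row mean + column mean - grand mean.
    return [[math.exp(row_mean[i] + col_mean[j] - grand) for j in range(N)]
            for i in range(N)]


def fit_prob_space(p, iterations=200):
    """Least-squares fit of p[i][j] ~ u[i] * v[j] by alternating updates."""
    v = [1.0] * N
    for _ in range(iterations):
        vv = sum(x * x for x in v)
        u = [sum(p[i][j] * v[j] for j in range(N)) / vv for i in range(N)]
        uu = sum(x * x for x in u)
        v = [sum(p[i][j] * u[i] for i in range(N)) / uu for j in range(N)]
    return [[u[i] * v[j] for j in range(N)] for i in range(N)]


def sse(a, b, transform=lambda x: x):
    """Sum of squared differences between two tables after `transform`."""
    return sum((transform(a[i][j]) - transform(b[i][j])) ** 2
               for i in range(N) for j in range(N))


log_fit = fit_log_space(P)
prob_fit = fit_prob_space(P)

print("squared error in probabilities:", sse(P, prob_fit), "(prob fit)", sse(P, log_fit), "(log fit)")
print("squared error in log-probabilities:", sse(P, prob_fit, math.log), "(prob fit)", sse(P, log_fit, math.log), "(log fit)")
```

Each fit is at least as good as the other under its own criterion, which mirrors the caption's point: squared error in probabilities is dominated by the few high-probability targets, whereas squared error in log-probabilities weights all targets equally. Note that the log-space predictions are not renormalised here, so they need not sum to one.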
Similar articles
- Quantitative modeling of DNA-protein interactions: effects of amino acid substitutions on binding specificity of the Mnt repressor.
Man TK, Yang JS, Stormo GD. Nucleic Acids Res. 2004 Aug 2;32(13):4026-32. doi: 10.1093/nar/gkh729. PMID: 15289576 Free PMC article.
- Analysis of the Max-binding protein MNT in human medulloblastomas.
Sommer A, Waha A, Tonn J, Sörensen N, Hurlin PJ, Eisenman RN, Lüscher B, Pietsch T. Int J Cancer. 1999 Sep 9;82(6):810-6. doi: 10.1002/(sici)1097-0215(19990909)82:6<810::aid-ijc7>3.0.co;2-v. PMID: 10446446
- Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay.
Man TK, Stormo GD. Nucleic Acids Res. 2001 Jun 15;29(12):2471-8. doi: 10.1093/nar/29.12.2471. PMID: 11410653 Free PMC article.
- Evidence of mnt-myc antagonism revealed by mnt gene deletion.
Hurlin PJ, Zhou ZQ, Toyo-Oka K, Ota S, Walker WL, Hirotsune S, Wynshaw-Boris A. Cell Cycle. 2004 Feb;3(2):97-9. PMID: 14712062 Review.
- A biochemical and biological analysis of Myc superfamily interactions.
Schreiber-Agus N, Alland L, Muhle R, Goltz J, Chen K, Stevens L, Stein D, DePinho RA. Curr Top Microbiol Immunol. 1997;224:159-68. doi: 10.1007/978-3-642-60801-8_16. PMID: 9308239 Review.
Cited by
- Divergent DNA-binding specificities of a group of ETHYLENE RESPONSE FACTOR transcription factors involved in plant defense.
Shoji T, Mishima M, Hashimoto T. Plant Physiol. 2013 Jun;162(2):977-90. doi: 10.1104/pp.113.217455. Epub 2013 Apr 29. PMID: 23629834 Free PMC article.
- Screening of potential pseudo att sites of Streptomyces phage ΦC31 integrase in the human genome.
Hu ZP, Chen LS, Jia CY, Zhu HZ, Wang W, Zhong J. Acta Pharmacol Sin. 2013 Apr;34(4):561-9. doi: 10.1038/aps.2012.173. Epub 2013 Feb 18. PMID: 23416928 Free PMC article.
- Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels.
Wang X, Kuwahara H, Gao X. BMC Syst Biol. 2014;8 Suppl 5(Suppl 5):S5. doi: 10.1186/1752-0509-8-S5-S5. Epub 2014 Dec 12. PMID: 25605483 Free PMC article.
- Tuning promoter strength through RNA polymerase binding site design in Escherichia coli.
Brewster RC, Jones DL, Phillips R. PLoS Comput Biol. 2012;8(12):e1002811. doi: 10.1371/journal.pcbi.1002811. Epub 2012 Dec 13. PMID: 23271961 Free PMC article.
- DNA Motif Recognition Modeling from Protein Sequences.
Wong KC. iScience. 2018 Sep 28;7:198-211. doi: 10.1016/j.isci.2018.09.003. Epub 2018 Sep 10. PMID: 30267681 Free PMC article.
References
- Stormo G. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.
- Stormo G. (1990) Consensus patterns in DNA. Methods Enzymol., 183, 211–221.