Additivity in protein-DNA interactions: how good an approximation is it?

Comparative Study

Nucleic Acids Res. 2002 Oct 15;30(20):4442-51.

doi: 10.1093/nar/gkf578.

Panayiotis V Benos et al. Nucleic Acids Res. 2002.

Abstract

Man and Stormo, and Bulyk et al., recently presented their studies of the DNA binding affinity of proteins. The main conclusion of both studies is that the additivity assumption, usually applied in methods that search for binding sites, does not hold. In the first study, analysis of binding affinity data for the Mnt repressor protein bound to all possible DNA (sub)targets at positions 16 and 17 of the binding site showed that those positions are not independent. In the second study, the authors analysed DNA binding affinity data for the wild-type mouse EGR1 protein and four variants differing in the middle finger. The binding affinity of these proteins to all 64 possible trinucleotide (sub)targets of the middle finger was measured using microarray technology. Analysis of these measurements also showed interdependence among the positions in the DNA target. In the present report, we review the data of both studies and re-analyse them using various statistical methods, including a comparison with a multiple regression approach. We conclude that although the additivity assumption does not fit the data perfectly, in most cases it provides a very good approximation of the true nature of the specific protein-DNA interactions. Therefore, additive models can be very useful for the discovery and prediction of binding sites in genomic DNA.
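
For readers unfamiliar with the term, the additivity assumption discussed in the abstract can be written compactly as follows. The notation here is an assumption made for illustration and is not taken from the paper: b_i is the base at position i of a target of length L, \varepsilon_i is a per-position energy contribution, and p_i is a per-position probability. The equivalence of the two forms further assumes that binding probability is proportional to the Boltzmann factor of the binding energy and that the per-position terms are normalised.

```latex
\Delta G(b_1 b_2 \dots b_L) \;\approx\; \sum_{i=1}^{L} \varepsilon_i(b_i)
\qquad\Longleftrightarrow\qquad
P(b_1 b_2 \dots b_L) \;\approx\; \prod_{i=1}^{L} p_i(b_i)
```

Under this reading, non-additivity means that some positions contribute jointly rather than independently, so that no choice of per-position terms reproduces the measured values exactly.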

Figures

Figure 1

Probability plots. The probability distributions of the measured data (abscissas) and the BAM predictions (ordinates) are plotted for the EGR DNA-binding proteins. The predictions are based on additive models under different levels of additivity: blue, red and green marks correspond to the 1*2*3, 12*3 and 1*23 models, respectively. Each scatter plot contains all 64 data points, although many data points may coincide. The grey diagonal line represents a perfect fit of the predictions to the measurements. The scatter plot of protein KASN shows an example of the failure of additive models to represent the real data of a non-specific binding protein (note that all probability values, measured and predicted, are <0.05).
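
The three levels of additivity named in the caption can be illustrated with a minimal sketch. It assumes that 1*2*3 treats all three positions independently, that 12*3 treats positions 1 and 2 as a joint dinucleotide, and that 1*23 does the same for positions 2 and 3. The prediction used here is a simple product of marginal probabilities, which is a stand-in and not necessarily how the paper's BAM is fitted. All function names and the toy data are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Position groupings (0-based) for each model label used in the caption.
MODELS = {
    "1*2*3": ((0,), (1,), (2,)),
    "12*3": ((0, 1), (2,)),
    "1*23": ((0,), (1, 2)),
}


def grouped_additive_prediction(measured, groups):
    """Predict each target's probability as a product of group marginals.

    measured: dict mapping each trinucleotide to a probability (sums to 1).
    groups:   tuple of tuples of positions that are treated jointly.
    """
    marginals = []
    for group in groups:
        marginal = {}
        for target, p in measured.items():
            key = "".join(target[i] for i in group)
            marginal[key] = marginal.get(key, 0.0) + p
        marginals.append(marginal)

    predicted = {}
    for target in measured:
        value = 1.0
        for group, marginal in zip(groups, marginals):
            value *= marginal["".join(target[i] for i in group)]
        predicted[target] = value
    return predicted


def sum_squared_error(measured, predicted):
    """Sum of squared differences between measured and predicted values."""
    return sum((measured[t] - predicted[t]) ** 2 for t in measured)


def toy_distribution():
    """Hypothetical data: positions 1 and 2 are coupled, position 3 is not."""
    weights = {}
    for b1, b2, b3 in product(BASES, repeat=3):
        pair = 4.0 if b1 == b2 else 1.0   # coupling between positions 1 and 2
        third = 3.0 if b3 == "G" else 1.0  # independent preference at position 3
        weights[b1 + b2 + b3] = pair * third
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


if __name__ == "__main__":
    data = toy_distribution()
    for name, groups in MODELS.items():
        err = sum_squared_error(data, grouped_additive_prediction(data, groups))
        print(name, err)
```

Because the toy data couple only positions 1 and 2, the 12*3 grouping reproduces them up to rounding, while the other two groupings leave a residual error. Real measurements would not factorise this cleanly.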

Figure 2

A graphical representation of the non-independent effect of positions 16 and 17 of the Mnt DNA binding site. In the left graph, the probabilities based on the measured K_A values are plotted against the 1*2 additive model. In the case of Mnt, the deviation from additivity in the high-probability states is larger than in Figure 1. However, the right graph plots the two probability distributions by dinucleotide target and shows that the additive model is in good agreement with the measured data. These graphs are based on the data reported in the study of Man and Stormo (3).

Figure 3

Probability and log-probability plots. Scatter plots of the negative logarithms (A) and the predicted binding probabilities (B) for the mononucleotide models that provide the ‘best fit’ to the data according to different criteria. The BAM that we calculate in this paper minimises the squared difference between the predicted and the measured probabilities. The regression model (RM) minimises the squared difference between the predicted and the measured log-probabilities (equivalent to energies). This model was calculated using the BLSS package (42) on the normalised average K_A values of the wild-type EGR protein. Methods for calculating such regression models also exist in the literature (16,17). The two plots show that BAM is better than RM at predicting the high-probability targets, whereas RM fits the high log-probability targets (equivalent to high energies) better. The diagonals (straight lines) correspond to the measured values.
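
The two fitting criteria contrasted in this caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the probability-space fit uses alternating least squares on unconstrained per-position factors, and the log-space fit uses the closed-form least-squares solution that holds for a complete, balanced set of all 64 targets. The BLSS package is not used, and all names and the toy measurements are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"
TARGETS = ["".join(t) for t in product(BASES, repeat=3)]


def fit_log_space(measured):
    """Additive least-squares fit to -log P (regression-style criterion).

    For a complete, balanced design the fitted value is the sum of the
    per-position means minus twice the grand mean.
    """
    energy = {t: -math.log(measured[t]) for t in TARGETS}
    grand = sum(energy.values()) / len(TARGETS)
    means = [
        {b: sum(energy[t] for t in TARGETS if t[i] == b) / 16 for b in BASES}
        for i in range(3)
    ]
    return {
        t: math.exp(-(sum(means[i][t[i]] for i in range(3)) - 2 * grand))
        for t in TARGETS
    }


def fit_probability_space(measured, sweeps=50):
    """Minimise sum (P - f1*f2*f3)^2 by alternating least squares.

    Factors start at the per-position marginals; each update is the exact
    least-squares optimum for one factor entry with the others held fixed.
    """
    factors = [
        {b: sum(measured[t] for t in TARGETS if t[i] == b) for b in BASES}
        for i in range(3)
    ]
    for _ in range(sweeps):
        for i in range(3):
            j, k = [p for p in range(3) if p != i]
            for b in BASES:
                num = den = 0.0
                for t in TARGETS:
                    if t[i] != b:
                        continue
                    rest = factors[j][t[j]] * factors[k][t[k]]
                    num += measured[t] * rest
                    den += rest * rest
                factors[i][b] = num / den
    return {t: factors[0][t[0]] * factors[1][t[1]] * factors[2][t[2]] for t in TARGETS}


def squared_error(measured, predicted, transform=lambda p: p):
    """Squared error after applying `transform` (identity or math.log)."""
    return sum((transform(measured[t]) - transform(predicted[t])) ** 2 for t in TARGETS)


def toy_measurements():
    """Hypothetical, strictly positive data with two non-additive targets."""
    pref = [
        {"A": 1, "C": 2, "G": 6, "T": 1},
        {"A": 1, "C": 5, "G": 1, "T": 2},
        {"A": 1, "C": 1, "G": 7, "T": 2},
    ]
    w = {t: float(pref[0][t[0]] * pref[1][t[1]] * pref[2][t[2]]) for t in TARGETS}
    w["GCG"] *= 1.5  # stronger than additive
    w["GTG"] *= 0.5  # weaker than additive
    total = sum(w.values())
    return {t: v / total for t, v in w.items()}


if __name__ == "__main__":
    data = toy_measurements()
    for name, fit in (("prob-space", fit_probability_space(data)),
                      ("log-space", fit_log_space(data))):
        print(name, squared_error(data, fit), squared_error(data, fit, math.log))
```

The squared error is dominated by whichever values are largest on the scale being fitted, which is one way to read the caption's observation that each model does better at a different end of the range. Note that the log-space fit is not renormalised here, so its predicted probabilities need not sum to one.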

References

    1. Stormo, G. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.
    2. Stormo, G. (1990) Consensus patterns in DNA. Methods Enzymol., 183, 211–221.
    3. Man, T.-K. and Stormo, G. (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res., 15, 2471–2478.
    4. Bulyk, M., Johnson, P. and Church, G. (2002) Nucleotides of transcription factor binding sites exert inter-dependent effects on the binding affinities of transcription factors. Nucleic Acids Res., 30, 1255–1261.
    5. Bulyk, M., Huang, X., Choo, Y. and Church, G. (2001) Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc. Natl Acad. Sci. USA, 98, 7158–7163.
