Connecting protein structure with predictions of regulatory sites - PubMed

Connecting protein structure with predictions of regulatory sites

Alexandre V Morozov et al. Proc Natl Acad Sci U S A. 2007.

Abstract

A common task posed by microarray experiments is to infer the binding site preferences for a known transcription factor from a collection of genes that it regulates and to ascertain whether the factor acts alone or in a complex. The converse problem can also be posed: Given a collection of binding sites, can the regulatory factor or complex of factors be inferred? Both tasks are substantially facilitated by using relatively simple homology models for protein-DNA interactions, as well as the rapidly expanding protein structure database. For budding yeast, we are able to construct reliable structural models for 67 transcription factors and with them redetermine factor binding sites by using a Bayesian Gibbs sampling algorithm and an extensive protein localization data set. For 49 factors in common with a prior analysis of this data set (based largely on phylogenetic conservation), we find that half of the previously predicted binding motifs are in need of some revision. We also solve the inverse problem of ascertaining the factors from the binding sites by assigning a correct protein fold to 25 of the 49 cases from a previous study. Our approach is easily extended to other organisms, including higher eukaryotes. Our study highlights the utility of enlarging current structural genomics projects that exhaustively sample fold structure space to include all factors with significantly different DNA-binding specificities.
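
The Gibbs sampling step in the abstract can be illustrated with a generic one-site-per-sequence Gibbs site sampler. This is a minimal sketch, not the authors' Bayesian algorithm, whose details are not given here. It assumes a uniform 0.25 background and a hypothetical prior format: one dict of strictly positive pseudocounts per motif column. An informative, structure-based prior simply puts larger pseudocounts on the expected consensus bases. Sequences are assumed to be uppercase ACGT with no ambiguity codes, each at least as long as the motif.

```python
import math
import random

BASES = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background base frequencies


def profile(sites, prior):
    """Column-wise base probabilities from aligned sites plus prior pseudocounts.

    prior: one dict per motif column mapping A/C/G/T to a positive pseudocount
    (hypothetical format; an informative prior favors the consensus bases).
    """
    pfm = []
    for j, pseudo in enumerate(prior):
        counts = dict(pseudo)  # copy so the prior itself is never modified
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pfm.append({b: counts[b] / total for b in BASES})
    return pfm


def gibbs_sites(seqs, prior, iters=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    width = len(prior)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Motif model from the current sites in all *other* sequences.
            others = [seqs[k][starts[k]:starts[k] + width]
                      for k in range(len(seqs)) if k != i]
            pfm = profile(others, prior)
            # Motif-vs-background likelihood ratio for every window of seq.
            weights = []
            for p in range(len(seq) - width + 1):
                window = seq[p:p + width]
                log_ratio = sum(math.log(pfm[j][window[j]] / BACKGROUND)
                                for j in range(width))
                weights.append(math.exp(log_ratio))
            # Resample this sequence's site in proportion to the ratios.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

For example, a prior favoring a CGG half-site could be written as `[{b: (5.0 if b == c else 1.0) for b in BASES} for c in "CGG"]` and passed to `gibbs_sites` together with the bound intergenic sequences. Using pseudocounts keeps every base probability nonzero, so the logarithm is always defined.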

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

PWM predictions for five TFs in the Zn2-Cys6 binuclear cluster family, with co-crystal structures showing extensive spacing and orientation variability. (Left) Structure-based priors. (Right) PWMs refined with Gibbs sampling (see Materials and Methods). Arrows show the relative orientation of two monomeric half-sites in the dimeric site from the crystal structure. The monomeric half-sites can be arranged in direct (tail-to-head; HAP1), inverted (head-to-head; GAL4, PPR1, PUT3), and everted (tail-to-tail; LEU3) orientations.
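
The three half-site arrangements in Fig. 1 can be made concrete by assembling a dimeric site from a single monomeric half-site and its reverse complement. This is a minimal sketch. It assumes the half-site as written points 5'->3' on the top strand ("->"). The spacer is a free parameter filled with N; the actual spacings for HAP1, GAL4, PPR1, PUT3 and LEU3 come from the co-crystal structures and are not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def dimeric_site(half: str, spacer: int, orientation: str) -> str:
    """Assemble a dimeric site from one monomeric half-site.

    Assumption: `half` points "->"; its reverse complement points "<-".
      direct   (tail-to-head): ->  ->
      inverted (head-to-head): ->  <-
      everted  (tail-to-tail): <-  ->
    """
    fwd, rev = half, revcomp(half)
    gap = "N" * spacer
    layouts = {
        "direct": fwd + gap + fwd,
        "inverted": fwd + gap + rev,
        "everted": rev + gap + fwd,
    }
    if orientation not in layouts:
        raise ValueError(f"unknown orientation: {orientation!r}")
    return layouts[orientation]
```

With a CGG half-site and a 3-bp spacer, `dimeric_site("CGG", 3, "inverted")` gives `CGGNNNCCG`, while `"everted"` gives `CCGNNNCGG` and `"direct"` gives `CGGNNNCGG`.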

Fig. 2.

Illustration of how structural and sequence data are mined in the case of ARG81. A DNA-binding domain of the Zn2-Cys6 binuclear cluster type is found in the ARG81 protein sequence. The HAP1 homodimer (PDB code 1hwt) is identified as the homolog with the highest interface scores S_hm (93.5 for chain C, 88.7 for chain D). The interface scores reflect the similarity of the HAP1 and ARG81 DNA-binding interfaces on the basis of their protein sequence alignments. Interface amino acids are labeled “b” for DNA phosphate backbone contacts and “s” for DNA base contacts. The amino acid substitutions observed at the interface are sufficiently conservative that they are assumed not to change the binding specificity significantly. However, to approximate previously characterized ARG81 binding sites (26), columns 4–6 are removed from the HAP1 PWM, and the CGC half-sites are replaced by the more common CGG half-sites. The 1hwt-based PWM modified in this way is used as the informative prior for the Gibbs sampling algorithm, which is run on the intergenic sequences found to be bound by ARG81 in the ChIP-chip experiment (2). After the ARG81 sites are identified, their alignment is used to compile the ARG81 PWM. Each site in the alignment is weighted by its posterior probability p(s, c); only sites with p(s, c) > 0.05 are included.
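
The final step in Fig. 2, compiling a PWM from the aligned sites weighted by their posterior probabilities, can be sketched as follows. This assumes the "(>0.05)" in the caption is a cutoff below which sites are excluded. It adds no pseudocounts, so a base never seen in a retained site gets probability zero; a practical implementation would likely add them. The function name and argument layout are hypothetical.

```python
BASES = "ACGT"


def weighted_pwm(sites, posteriors, threshold=0.05):
    """Column-wise base probabilities from aligned sites weighted by p(s, c).

    sites:      equal-length aligned binding sites (strings over ACGT)
    posteriors: posterior probability p(s, c) of each site
    threshold:  sites with p(s, c) <= threshold are dropped (assumed cutoff)
    """
    kept = [(s, p) for s, p in zip(sites, posteriors) if p > threshold]
    if not kept:
        raise ValueError("no sites above the posterior threshold")
    width = len(kept[0][0])
    pwm = []
    for j in range(width):
        col = {b: 0.0 for b in BASES}
        for site, p in kept:
            col[site[j]] += p  # each site contributes its posterior weight
        total = sum(col.values())
        pwm.append({b: col[b] / total for b in BASES})
    return pwm
```

For instance, with sites `CGG`, `CGA`, `TGG` and posteriors 0.9, 0.5, 0.01, the third site falls below the cutoff, and the last column gives G a probability of 0.9/1.4 (about 0.64) and A 0.5/1.4.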

Fig. 3.

Prediction of the informative prior for the phosphatase system regulator PHO4. (A) Crystal structure of the PHO4 helix–loop–helix dimer bound to its consensus site (PDB code 1a0a). (B) Atomic profile: the number of heavy atoms, N_i, within 4.5 Å of base pair i in the binding site. (C) Consensus base probability profile: the probability w_i^α(N_i) of the consensus base α at position i in the binding site (cf. Eq. 1). (D) Structure-based PWM prediction.
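
The atomic profile in panel B, N_i, can be computed by brute force from atomic coordinates. This is a minimal sketch under stated assumptions. Parsing coordinates from the PDB entry 1a0a is not shown. The exact counting rule is also an assumption: a protein heavy atom is counted once for base pair i if it lies within 4.5 Å of any heavy atom of that base pair, and the paper may instead restrict which nucleotide atoms count. The mapping from N_i to w_i^α(N_i) via Eq. 1 is not reproduced, since Eq. 1 is not part of this excerpt.

```python
import math


def atomic_profile(protein_atoms, base_pairs, cutoff=4.5):
    """N_i: number of protein heavy atoms within `cutoff` Å of base pair i.

    protein_atoms: (x, y, z) coordinates of protein heavy atoms (no hydrogens)
    base_pairs:    for each base pair i, a list of (x, y, z) coordinates of
                   its heavy atoms
    Assumption: a protein atom counts once for base pair i if it lies within
    the cutoff of any atom of that base pair.
    """
    profile = []
    for bp_atoms in base_pairs:
        n = sum(
            1
            for a in protein_atoms
            if any(math.dist(a, b) <= cutoff for b in bp_atoms)
        )
        profile.append(n)
    return profile
```

On a toy input with protein atoms at (0, 0, 0), (10, 0, 0) and (3, 0, 0) and two one-atom "base pairs" at (1, 0, 0) and (20, 0, 0), the profile is `[2, 0]`. `math.dist` requires Python 3.8 or later.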

Cited by

Less-is-more: selecting transcription factor binding regions informative for motif inference.
Xu J, Gao J, Ni P, Gerstein M. Nucleic Acids Res. 2024 Feb 28;52(4):e20. doi: 10.1093/nar/gkad1240. PMID: 38214231.
Comparative Analysis of the IclR-Family of Bacterial Transcription Factors and Their DNA-Binding Motifs: Structure, Positioning, Co-Evolution, Regulon Content.
Suvorova IA, Gelfand MS. Front Microbiol. 2021 Jun 10;12:675815. doi: 10.3389/fmicb.2021.675815. eCollection 2021. PMID: 34177859.
An Exploration Into Improving DNA Motif Inference by Looking for Highly Conserved Core Regions.
Thompson JA, Congdon CB. IEEE Symp Comput Intell Bioinforma Comput Biol Proc. 2013 Apr;2013:60-67. doi: 10.1109/CIBCB.2013.6595389. Epub 2013 Sep 12. PMID: 31008453.
Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding.
Le DD, Shimko TC, Aditham AK, Keys AM, Longwell SA, Orenstein Y, Fordyce PM. Proc Natl Acad Sci U S A. 2018 Apr 17;115(16):E3702-E3711. doi: 10.1073/pnas.1715888115. Epub 2018 Mar 27. PMID: 29588420.

References

1. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, et al. Science. 2000;290:2306–2309.
2. Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, MacIsaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, et al. Nature. 2004;431:99–104.
3. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA, Bulyk ML. Nat Genet. 2004;36:1331–1339.
4. Liu X, Noll DM, Lieb JD, Clarke ND. Genome Res. 2005;15:421–427.
5. MacIsaac KD, Wang T, Gordon DB, Gifford DK, Stormo GD, Fraenkel E. BMC Bioinformatics. 2006;7:113.