Identification of the binding sites of regulatory proteins in bacterial genomes
Hao Li et al. Proc Natl Acad Sci U S A. 2002.
Abstract
We present an algorithm that extracts the binding sites (represented by position-specific weight matrices) for many different transcription factors from the regulatory regions of a genome, without the need for delineating groups of coregulated genes. The algorithm uses the fact that many DNA-binding proteins in bacteria bind to a bipartite motif with two short segments more conserved than the intervening region. It identifies all statistically significant patterns of the form W(1)N(x)W(2), where W(1) and W(2) are two short oligonucleotides separated by x arbitrary bases, and groups them into clusters of similar patterns. These clusters are then used to derive quantitative recognition profiles of putative regulatory proteins. For a given cluster, the algorithm finds the matching sequences plus the flanking regions in the genome and performs a multiple sequence alignment to derive position-specific weight matrices. We have analyzed the Escherichia coli genome with this algorithm and found approximately 1,500 significant patterns, which give rise to approximately 160 distinct position-specific weight matrices. A fraction of these matrices match the binding sites of one-third of the approximately 60 characterized transcription factors with high statistical significance. Many of the remaining matrices are likely to describe binding sites and regulons of uncharacterized transcription factors. The significance of these matrices was evaluated by their specificity, the location of the predicted sites, and the biological functions of the corresponding regulons, allowing us to suggest putative regulatory functions. The algorithm is efficient for analyzing newly sequenced bacterial genomes for which little is known about transcriptional regulation.
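The pattern search described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only counts raw occurrences of W1-N(x)-W2 patterns, collects the matching sites with their flanks, and turns them into a position-specific frequency matrix. The significance test, the clustering of similar patterns, and the multiple sequence alignment are omitted; sites are treated as already aligned by the pattern match. The function names (`count_dimers`, `matching_sites`, `frequency_matrix`), the word length, the gap range, the pseudocount, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def count_dimers(sequences, w=4, min_gap=0, max_gap=4):
    """Count every pattern W1 N(x) W2 with two words of length w and gap x."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for x in range(min_gap, max_gap + 1):
            span = 2 * w + x
            for i in range(len(seq) - span + 1):
                w1 = seq[i:i + w]
                w2 = seq[i + w + x:i + span]
                counts[(w1, x, w2)] += 1
    return counts


def matching_sites(sequences, pattern, flank=0):
    """Return each match of the pattern, extended by `flank` bases per side."""
    w1, x, w2 = pattern
    span = len(w1) + x + len(w2)
    sites = []
    for seq in sequences:
        seq = seq.upper()
        # Start at `flank` so every reported site has full-length flanks.
        for i in range(flank, len(seq) - span - flank + 1):
            if seq[i:i + len(w1)] == w1 and seq[i + len(w1) + x:i + span] == w2:
                sites.append(seq[i - flank:i + span + flank])
    return sites


def frequency_matrix(sites, pseudocount=1.0):
    """Per-position base frequencies from equal-length sites (A, C, G, T only)."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(len(sites[0])):
        column = Counter(site[pos] for site in sites)
        matrix.append({b: (column[b] + pseudocount) / total for b in "ACGT"})
    return matrix


# Toy regulatory regions (invented), each containing TGAC NN GTCA.
regions = ["AATGACCCGTCATT", "GGTGACTTGTCAGG", "CATGACAAGTCACA"]
counts = count_dimers(regions, w=4, min_gap=0, max_gap=4)
pattern = ("TGAC", 2, "GTCA")
sites = matching_sites(regions, pattern, flank=1)
pwm = frequency_matrix(sites)
```

In a real analysis the raw counts would be compared against a background expectation before a pattern is called significant; that step is left out here because the abstract does not specify the statistic.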
Figures
Figure 1
Distribution of the center positions of the predicted binding sites (relative to the transcriptional start point) for four weight matrices. For the two matrices matching CRP and LexA, positional distribution of the known binding sites is also shown.
Similar articles
- Probabilistic clustering of sequences: inferring new bacterial regulons by comparative genomics.
van Nimwegen E, Zavolan M, Rajewsky N, Siggia ED. Proc Natl Acad Sci U S A. 2002 May 28;99(11):7323-8. doi: 10.1073/pnas.112690399. PMID: 12032281. Free PMC article.
- Reconstruction of novel transcription factor regulons through inference of their binding sites.
Elmas A, Wang X, Samoilov MS. BMC Bioinformatics. 2015 Sep 21;16:299. doi: 10.1186/s12859-015-0685-y. PMID: 26388177. Free PMC article.
- PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny.
Siddharthan R, Siggia ED, van Nimwegen E. PLoS Comput Biol. 2005 Dec;1(7):e67. doi: 10.1371/journal.pcbi.0010067. Epub 2005 Dec 9. PMID: 16477324. Free PMC article.
- Identifying target sites for cooperatively binding factors.
GuhaThakurta D, Stormo GD. Bioinformatics. 2001 Jul;17(7):608-21. doi: 10.1093/bioinformatics/17.7.608. PMID: 11448879.
- Comparative analysis of regulatory patterns in bacterial genomes.
Gelfand MS, Novichkov PS, Novichkova ES, Mironov AA. Brief Bioinform. 2000 Nov;1(4):357-71. doi: 10.1093/bib/1.4.357. PMID: 11465053. Review.
Cited by
- GeF-seq: A Simple Procedure for Base-Pair Resolution ChIP-seq.
Chumsakul O, Nakamura K, Fukamachi K, Ishikawa S, Oshima T. Methods Mol Biol. 2024;2819:39-53. doi: 10.1007/978-1-0716-3930-6_3. PMID: 39028501.
- Auxotrophic and prototrophic conditional genetic networks reveal the rewiring of transcription factors in Escherichia coli.
Gagarinova A, Hosseinnia A, Rahmatbakhsh M, Istace Z, Phanse S, Moutaoufik MT, Zilocchi M, Zhang Q, Aoki H, Jessulat M, Kim S, Aly KA, Babu M. Nat Commun. 2022 Jul 14;13(1):4085. doi: 10.1038/s41467-022-31819-x. PMID: 35835781. Free PMC article.
- Modular discovery of monomeric and dimeric transcription factor binding motifs for large data sets.
Toivonen J, Kivioja T, Jolma A, Yin Y, Taipale J, Ukkonen E. Nucleic Acids Res. 2018 May 4;46(8):e44. doi: 10.1093/nar/gky027. PMID: 29385521. Free PMC article.
- Efficient inference for sparse latent variable models of transcriptional regulation.
Dai Z, Iqbal M, Lawrence ND, Rattray M. Bioinformatics. 2017 Dec 1;33(23):3776-3783. doi: 10.1093/bioinformatics/btx508. PMID: 28961802. Free PMC article.
- Transcription-coupled changes to chromatin underpin gene silencing by transcriptional interference.
Ard R, Allshire RC. Nucleic Acids Res. 2016 Dec 15;44(22):10619-10630. doi: 10.1093/nar/gkw801. Epub 2016 Sep 8. PMID: 27613421. Free PMC article.