The evolution of DNA regulatory regions for proteo-gamma bacteria by interspecies comparisons - PubMed
Comparative Study
Genome Res. 2002 Feb;12(2):298-308.
doi: 10.1101/gr.207502.
- PMID: 11827949
- PMCID: PMC155268
The evolution of DNA regulatory regions for proteo-gamma bacteria by interspecies comparisons
Nikolaus Rajewsky et al. Genome Res. 2002 Feb.
Abstract
The comparison of homologous noncoding DNA for organisms a suitable evolutionary distance apart is a powerful tool for the identification of cis regulatory elements for transcription and translation and for the study of how they assemble into functional modules. We have fit the three parameters of an affine global probabilistic alignment algorithm to establish the background mutation rate of noncoding sequence between E. coli and a series of gamma proteobacteria ranging from Salmonella to Vibrio. The lower bound we find to the neutral mutation rate is sufficiently high, even for Salmonella, that most of the conservation of noncoding sequence is indicative of selective pressures rather than of insufficient time to evolve. We then use a local version of the alignment algorithm combined with our inferred background mutation rate to assign a significance to the degree of local sequence conservation between orthologous genes, and thereby deduce a probability profile for the upstream regulatory region of all E. coli protein-coding genes. We recover 75%-85% (depending on significance level) of all regulatory sites from a standard compilation for E. coli, and 66%-85% of sigma sites. We also trace the evolution of known regulatory sites and the groups associated with a given transcription factor. Furthermore, we find that approximately one-third of paralogous gene pairs in E. coli have a significant degree of correlation in their regulatory sequence. Finally, we demonstrate an inverse correlation between the rate of evolution of transcription factors and the number of genes they regulate. Our predictions are available at http://www.physics.rockefeller.edu/~siggia.
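As a rough illustration of what "affine global alignment" means, the sketch below computes the best global alignment score of two sequences under affine gap costs (a Gotoh-style dynamic program). It is a score-based analogue, not the authors' probabilistic algorithm: the abstract does not name its three fitted parameters, so the function name and the `match`, `mismatch`, `gap_open`, and `gap_extend` values here are hypothetical placeholders.

```python
def affine_global_score(a, b, match=1.0, mismatch=-1.0,
                        gap_open=-3.0, gap_extend=-0.5):
    """Best global alignment score of a and b with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All scoring values are illustrative assumptions, not fitted values.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Either open a new gap or extend an existing one.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

For example, `affine_global_score("ACGT", "AGT")` aligns three matching bases and opens one single-base gap. A local variant, as used in the abstract for scoring conserved stretches, would additionally allow the alignment to start and end anywhere in either sequence.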
Figures
Figure 1
Phylogeny of relevant bacterial species. The three-letter abbreviations are as follows: eco, Escherichia coli K12 (GenBank entry NC_000913); stm, Salmonella typhimurium LT2 (genome.wustl.edu/gsc/bacterial/salmonella.shtml); kpn, Klebsiella pneumoniae MGH78578 (genome.wustl.edu/gsc/Projects/bacterial/klebsiella.shtml); ype, Yersinia pestis CO-92 (www.sanger.ac.uk/Projects/Y_pestis/); vch, Vibrio cholerae N16961 (GenBank NC_002505 and NC_002506); hin, Haemophilus influenzae Rd (GenBank NC_000907). The phylogenetic tree is based on 16S ribosomal RNA sequences. H. influenzae is shown only for comparative purposes and was not analyzed in our study.
Figure 2
The probability profiles for the orthologous region upstream of the gene lpdA (lipoamide dehydrogenase (NADH)). The abscissa is in bp units, and the start codon for lpdA begins at position 325. In (a), κ = 0 for all species, whereas in (b) it is optimized separately in each case (as explained in the text), which yields κ = 0.006, 0.003, 0.01, and 0.06 for kpn, stm, vch, and ype, respectively. The two known factor binding sites for sigma 70 (rpoD17) and an anaerobic factor arcA are marked. In (b), the predictions of McCue et al. (2001) are marked with “W” and the remaining bars are our predictions from the summed profiles.
Figure 3
The probability profiles for the intergenic region between the conserved divergently transcribed pair of E. coli genes, yfhD to the left and purL to the right, whose 5′ end begins at position 396. Optimal values of κ = 0.006, 0.001, 0, and 0.003 were determined for kpn, stm, vch, and ype, respectively. There is only one documented binding site for purine repressor (purR). The predictions of McCue et al. (2001) for both genes are combined without distinction and labeled with “W”.
Figure 4
Normalized score histograms of genes with known function and genes with unknown function.
Figure 5
Protein conservation and DNA binding specificity. The plot shows (1-PID) versus DNA binding specificity x, Eq. (5). Each data point corresponds to one of the 51 E. coli transcription factors that has an ortholog in Vibrio cholerae. The straight line shown is a linear fit with slope 0.086 ± 0.005. Note that there is an upper cutoff of 0.7 in (1-PID) since, by definition, all orthologs have a PID of at least 0.3. The two obvious outliers at x = 3.4 and (1-PID) ∼ 0.7 are FarR and SoxS. Note that for some of the factors (e.g., FarR), only very few binding sites are known; that is, our estimate of the binding specificity has a large error.
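The quantities plotted in Figure 5 can be illustrated with a minimal sketch, assuming PID is the fraction of identical residues over aligned, non-gap columns (the page does not state the exact definition) and that the straight line is an ordinary least-squares fit. The function names and the toy inputs are hypothetical; no data from the paper are reproduced.

```python
def percent_identity(aln_a, aln_b):
    """Fraction of identical residues over columns where neither row has a gap.

    Assumes the two strings are rows of one pairwise alignment, with '-'
    marking gaps. This definition of PID is an assumption.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy alignment: 3 identical residues out of 4 ungapped columns.
divergence = 1.0 - percent_identity("MKT-A", "MRTLA")
```

With one (x, 1-PID) pair per transcription factor, `least_squares_line` would give a slope comparable in kind to the fitted value quoted in the caption. The 0.3 PID threshold for calling orthologs is what caps (1-PID) at 0.7.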
Similar articles
- Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes.
McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Nucleic Acids Res. 2001 Feb 1;29(3):774-82. doi: 10.1093/nar/29.3.774. PMID: 11160901 Free PMC article.
- A model of evolution with constant selective pressure for regulatory DNA sites.
Enikeeva FN, Kotelnikova EA, Gelfand MS, Makeev VJ. BMC Evol Biol. 2007 Jul 27;7:125. doi: 10.1186/1471-2148-7-125. PMID: 17662135 Free PMC article.
- PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny.
Siddharthan R, Siggia ED, van Nimwegen E. PLoS Comput Biol. 2005 Dec;1(7):e67. doi: 10.1371/journal.pcbi.0010067. Epub 2005 Dec 9. PMID: 16477324 Free PMC article.
- Evolution of cis-regulatory sequences in Drosophila.
He X, Sinha S. Methods Mol Biol. 2010;674:283-96. doi: 10.1007/978-1-60761-854-6_18. PMID: 20827599 Review.
- Current status and future perspectives on the evolution of cis-regulatory elements in plants.
Yocca AE, Edger PP. Curr Opin Plant Biol. 2022 Feb;65:102139. doi: 10.1016/j.pbi.2021.102139. Epub 2021 Nov 24. PMID: 34837823 Review.
Cited by
- Dynamics of genetic variation in transcription factors and its implications for the evolution of regulatory networks in Bacteria.
Ali F, Seshasayee ASN. Nucleic Acids Res. 2020 May 7;48(8):4100-4114. doi: 10.1093/nar/gkaa162. PMID: 32182360 Free PMC article.
- Comparative analysis of LytS/LytTR-type histidine kinase/response regulator systems in γ-proteobacteria.
Behr S, Brameyer S, Witting M, Schmitt-Kopplin P, Jung K. PLoS One. 2017 Aug 10;12(8):e0182993. doi: 10.1371/journal.pone.0182993. eCollection 2017. PMID: 28796832 Free PMC article.
- Control of MarRAB Operon in Escherichia coli via Autoactivation and Autorepression.
Prajapat MK, Jain K, Saini S. Biophys J. 2015 Oct 6;109(7):1497-508. doi: 10.1016/j.bpj.2015.08.017. PMID: 26445450 Free PMC article.
- Following the Footsteps of Chlamydial Gene Regulation.
Domman D, Horn M. Mol Biol Evol. 2015 Dec;32(12):3035-46. doi: 10.1093/molbev/msv193. Epub 2015 Sep 30. PMID: 26424812 Free PMC article.
- An Arabidopsis Transcriptional Regulatory Map Reveals Distinct Functional and Evolutionary Features of Novel Transcription Factors.
Jin J, He K, Tang X, Li Z, Lv L, Zhao Y, Luo J, Gao G. Mol Biol Evol. 2015 Jul;32(7):1767-73. doi: 10.1093/molbev/msv058. Epub 2015 Mar 6. PMID: 25750178 Free PMC article.