Genome-wide prediction of G4 DNA as regulatory motifs: role in Escherichia coli global regulation

Comparative Study

Pooja Rawal et al. Genome Res. 2006 May.

Abstract

The role of nonlinear DNA in replication, recombination, and transcription has become evident in recent years. Although several studies have predicted and characterized regulatory elements at the sequence level, very few have investigated DNA structure as regulatory motifs. Here, using G-quadruplex or G4 DNA motifs as a model, we have investigated the role of DNA structure in transcription on a genome-wide scale. Analyses of >61,000 open reading frames (ORFs) across 18 prokaryotes show enrichment of G4 motifs in regulatory regions and indicate their predominance within promoters of genes pertaining to transcription, secondary metabolite biosynthesis, and signal transduction. Based on this, we predict that G4 DNA may present regulatory signals. This is supported by conserved G4 motifs in promoters of orthologous genes across phylogenetically distant organisms. We hypothesized a regulatory role of G4 DNA during supercoiling stress, when duplex destabilization may result in G4 formation. This is in line with our observations from target site analysis for 55 DNA-binding proteins in Escherichia coli, which reveals significant (P < 0.001) association of G4 motifs with target sites of global regulators FIS and Lrp and the sigma factor RpoD (sigma70). These factors together control >1000 genes in the early growth phase and are believed to be induced by supercoiled DNA. We also predict G4 motif-induced supercoiling sensitivity for >30 operons in E. coli, and our findings implicate G4 DNA in DNA-topology-mediated global gene regulation in E. coli.

Figures

Figure 1.

Schematic representation of G4 motif. (A) Hydrogen-bonded G-tetrad with K+ (Na+ also stabilizes a G-tetrad); each guanine in this planar array is contributed from different G-runs, which are separated by intervening loops in an intramolecular motif. (B,C) Intramolecular folding pattern showing stem and loop organization in an antiparallel (B) and parallel (C) conformation of a G4 motif, where the planes represent each tetrad unit and are stacked to form the stem of the motif.
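The organization described in this caption (four G-runs separated by intervening loops) lends itself to a simple pattern scan. The following is a minimal sketch, not the authors' search procedure: the thresholds (G-runs of at least 3 guanines, loops of 1 to 7 nucleotides) and the names `G4_PATTERN`, `find_pg4`, and `revcomp` are assumptions chosen for illustration, since the abstract and captions do not state the exact motif definition.

```python
import re

# ASSUMPTION: four runs of >=3 guanines separated by loops of 1-7 nt.
# These thresholds are illustrative and are not taken from the paper.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement, used to scan the opposite strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_pg4(seq):
    """Return (start, end, motif) for non-overlapping putative G4 motifs.

    Coordinates are 0-based and half-open, on the strand that was passed in.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]
```

Calling `find_pg4(revcomp(seq))` would scan the opposite strand under the same assumed pattern; the returned coordinates then refer to the reverse-complemented sequence.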

Figure 2.

Putative regulatory regions in prokaryotes are enriched in PG4 motifs. (A) Genome-wide distribution of PG4 motifs within the + strand in 18 prokaryotes showing frequency of the bases forming PG4 motifs in each region expressed as a ratio of the GC frequency of the respective region (R_PG4/GC) for each organism. (Inset) Median ratio (R_PG4/GC) for each region calculated from the distribution in the respective regions across all organisms. (Supplemental Table S5 shows the mean and standard deviation, and Supplemental Fig. S3 shows a similar distribution for the − strand.) The intergenic (beyond −200 bp) region includes all intergenic regions except the downstream region between two convergently oriented genes. (B) GC-rich organisms have selected for PG4 motifs in their immediate upstream regions. Ratio of the frequency of PG4 motifs (after controlling for GC% in the respective regions) in the −100-bp region versus beyond −100 bp within the intergenic region shows a high correlation with the GC% of the intergenic region for respective organisms. (C) The motif frequency of intergenic versus intragenic regions does not depend on the GC% of the genome. The ratio-plot for intergenic versus intragenic regions against overall (genome-wide) GC% of the organism shows very low correlation. M. genitalium shows a high ratio (>5.0) because of a very low intergenic basepair length (correlation on excluding M. genitalium was 0.24). (D) The number of PG4 motifs decreases sharply on moving upstream of genes relative to the intragenic regions. Data were plotted from all 61,355 ORFs in 18 organisms within the flanking 500 bases of the start codon of all ORFs. The center of each motif sequence was used for mapping with respect to the start codon (i.e., for a sequence of length n, the n/2-th base was used as its coordinate). (E) Promoter-rich regions have a higher density of PG4 motifs. Intergenic regions separating divergently (promoter-rich) and convergently (possibly promoter-less) oriented gene pairs were mapped in all 18 organisms for comparison. The median of PG4 density (number of bases involved in motif pattern normalized for sequence length of the respective region) is shown along with the density in the intergenic regions (beyond −200 bp, as in A). The difference between the divergent and convergent (P < 0.007) and the divergent and intergenic (P < 0.025) regions was significant, while the difference between the convergent and intergenic regions was not significant (P = 0.199). All statistical comparisons were done in a pairwise mode for the different genomic regions, and significance was estimated using the two-tailed nonparametric Wilcoxon signed-rank test. The organism acronyms are as obtained from KEGG and are mentioned in Methods.
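Two quantities in this caption can be written out concretely: the motif coordinate used for mapping (the n/2-th base) and the ratio R_PG4/GC (frequency of bases in PG4 motifs divided by the GC frequency of the same region). The sketch below is one reading of those definitions under stated assumptions: coordinates are 0-based with half-open spans, n/2 is taken as integer division, and the function names are hypothetical.

```python
def center_offset(motif_start, motif_len, start_codon_index):
    """Offset of the motif's n/2-th base relative to the start codon.

    ASSUMPTION: 0-based indices and integer division for n/2; the caption
    does not say how odd lengths are rounded.
    """
    center = motif_start + motif_len // 2 - 1  # index of the n/2-th base
    return center - start_codon_index


def pg4_gc_ratio(region, motif_spans):
    """R_PG4/GC: fraction of bases in PG4 motifs over the GC fraction."""
    region = region.upper()
    covered = set()
    for start, end in motif_spans:  # half-open spans within the region
        covered.update(range(start, end))
    gc_count = region.count("G") + region.count("C")
    if gc_count == 0:
        raise ValueError("region has no G or C bases; ratio is undefined")
    pg4_freq = len(covered) / len(region)
    gc_freq = gc_count / len(region)
    return pg4_freq / gc_freq
```

A negative offset from `center_offset` places the motif upstream of the start codon, which matches the sign convention used for the −100-bp and −200-bp regions in the caption.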

Figure 3.

Genes harboring PG4 motifs in their regulatory region show distinct functional distribution in a comparative analysis comprising 37,974 ORFs from 18 organisms. (A) The distribution of genes with at least one PG4 motif within the −200-bp region is shown as the percentage of total genes in the respective function class; secondary metabolite biosynthesis, transcription, and translation-related genes show a significant difference (P < 0.004). (B) The intragenic PG4 motif density indicates that the distribution is not significantly different across the functional classes (P = 0.108). The PG4 motif density was calculated as the number of bases involved in motif formation per kilobase of gene length. Two classes, chromatin structure and dynamics and RNA processing and modification, which constitute only 0.054% and 0.09% of the distribution, were not included in the plots. Extracellular structure, nuclear structure, and cytoskeleton genes do not have any motifs in their regulatory regions. Undefined classes like function unknown and general function prediction have been excluded from analysis along with genes not found in the COGS database. All function information was obtained from the COGS database. A plot showing distribution across the functional classes with respect to the total ORFs (5574) with PG4 motifs in −200-bp regions is shown in Supplemental Figure S5.
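The intragenic density in panel B (bases involved in motif formation per kilobase of gene length) reduces to a one-line calculation. The sketch below is illustrative only; the function name is hypothetical, spans are assumed to be 0-based and half-open, and overlapping motifs are assumed to be counted once per base, which the caption does not specify.

```python
def pg4_density_per_kb(gene_length, motif_spans):
    """Bases involved in PG4 motifs per kilobase of gene length.

    ASSUMPTION: a base covered by overlapping motifs is counted once.
    """
    covered = set()
    for start, end in motif_spans:
        covered.update(range(start, end))
    return 1000 * len(covered) / gene_length
```

For a 1500-bp gene with motifs covering 21 and 30 bases, this gives 51 bases per 1.5 kb, or 34 bases per kilobase.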

Figure 4.

Global regulators Lrp, FIS, and GlpR and sigma factors σ70 and σS are predominantly associated with PG4 motifs in Escherichia coli. We computationally mapped target sites for 55 DNA-binding proteins in the region flanking (100 bp) PG4 motifs present within −200 bp of start codons in the + strand (118 motifs) and − strand (96 motifs). Sites were also mapped to 445 promoter regions (within −200 bp of start codon) devoid of PG4 motifs as a control set. (A) Overall representation of sites (for nine factors with >1% sites) as a percentage of total sites for 55 DNA-binding proteins is shown for the respective regions. (B) Frequency distribution of TFBS. Motifs or promoters (%) were plotted against the number of sites found either flanking the motifs or within the promoter (in case of control set); representative plots for three factors are shown (for others, see Supplemental Fig. S6). Distributions were observed to be significantly (P < 0.001) different for Lrp, RpoD, FIS, RpoS, and GlpR when compared between the + or − strand and the control set, while SoxS, TyrR, Crp, and OmpR did not show a statistically significant difference (P > 0.05). (C) Target sites (median) per motif (+/− strand) or promoter (control set) are shown for five factors with significantly different distribution. Nonparametric comparisons were done using the Mann-Whitney U-test; the P-values for respective comparisons are shown in Supplemental Table S8.
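The comparison of site counts per motif against site counts per control promoter can be illustrated with a small Mann-Whitney U implementation. This is a sketch under assumptions, not the authors' procedure: it uses the normal approximation with a tie correction and no continuity correction, the function name is hypothetical, and the caption does not say whether an exact or approximate P-value was computed.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate two-sided P-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mid_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation tied: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))
```

In use, `x` would hold the number of predicted sites flanking each PG4 motif and `y` the number within each control promoter. For small samples or heavily tied counts an exact test may be preferable to this approximation.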
