Comparative Study
Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers
Dmitri A Papatsenko et al. Genome Res. 2002 Mar.
Abstract
The early developmental enhancers of Drosophila melanogaster comprise one of the most sophisticated regulatory systems in higher eukaryotes. An elaborate code in their DNA sequence translates both maternal and early embryonic regulatory signals into the spatial distribution of transcription factors. One of the most striking features of this code is the redundancy of binding sites for these transcription factors (BSTF). Using this redundancy, we explored the possibility of predicting functional binding sites in a single enhancer region without any prior consensus/matrix description or evolutionary sequence comparisons. We developed a conceptually simple algorithm, Scanseq, that uses an original statistical evaluation to identify the most redundant motifs and to locate potential BSTF in a given regulatory region. To estimate the biological relevance of our predictions, we built thorough literature-based annotations for the best-known Drosophila developmental enhancers and generated detailed distribution maps for the most robust binding sites. The high statistical correlation between the location of BSTF in these experiment-based maps and the locations predicted in silico by Scanseq confirmed the relevance of our approach. We also discuss the definition of true binding sites and the possible biological principles that govern the patterning of regulatory regions and the distribution of transcriptional signals.
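The abstract reports a high statistical correlation between the experiment-based BSTF maps and the Scanseq predictions but does not name the statistic here. As a minimal sketch, assuming each map is a list of half-open (start, end) intervals on the enhancer sequence, two maps can be compared base by base with the phi coefficient (the Pearson correlation of two 0/1 tracks). The function names and the interval coordinates in the example are hypothetical.

```python
import math


def intervals_to_mask(intervals, length):
    """Per-base 0/1 mask from half-open (start, end) intervals, clipped to the sequence."""
    mask = [0] * length
    for start, end in intervals:
        for i in range(max(0, start), min(length, end)):
            mask[i] = 1
    return mask


def phi_coefficient(a, b):
    """Pearson correlation of two equal-length 0/1 masks (the phi coefficient)."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if y and not x)
    n00 = len(a) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # An all-0 or all-1 mask has no variance; report 0 instead of dividing by zero.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


# Hypothetical predicted and experimental maps on a 100-bp toy region.
predicted = intervals_to_mask([(10, 20), (50, 60)], 100)
experimental = intervals_to_mask([(12, 22), (50, 58)], 100)
print(round(phi_coefficient(predicted, experimental), 3))  # 0.807
```

A per-base comparison rewards overlap in both position and extent; a site-level comparison (counting predicted segments that hit an annotated site) is an equally reasonable alternative.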
Figures
Figure 1
Strategies for BSTF map construction. Both strategies for constructing maps of binding sites for transcription factors (BSTF) rely on a matrix search for experimentally defined binding sites. The first strategy (refined map path) is used to verify the exact location and size of the experimental sites. The second strategy (consistent map path) takes into account both the presence of the experimentally verified sites and the matrix scores of the matches found. The initial map is the raw footprint data from a literature source.
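The matrix search used in both map-construction strategies can be illustrated with a generic log-odds PWM scan. This is a sketch under stated assumptions (a uniform 0.25 background, a pseudocount of 0.5, and scoring of both strands); the toy sites below are invented for illustration and are not experimental data from the paper, and the helper names are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=0.5, background=None):
    """Log2-odds PWM from equal-length, aligned binding-site sequences."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def score(pwm, word):
    """Sum of per-position log-odds for a word as long as the PWM."""
    return sum(col[base] for col, base in zip(pwm, word))


def scan(pwm, seq, threshold):
    """(start, strand, score) for every window scoring >= threshold on either strand."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) - set(BASES):
            continue  # skip windows containing N or other non-ACGT symbols
        reverse = window.translate(COMPLEMENT)[::-1]
        for strand, word in (("+", window), ("-", reverse)):
            s = score(pwm, word)
            if s >= threshold:
                hits.append((start, strand, s))
    return hits


# Toy aligned sites (hypothetical) and a short query sequence.
pwm = build_pwm(["TAATCC", "TAATCC", "TAATCT", "TAATCC"])
hits = scan(pwm, "GGGGTAATCCGGGG", threshold=6.0)
print([(start, strand) for start, strand, _ in hits])  # [(4, '+')]
```

Reverse-strand hits are reported at the start coordinate of the forward-strand window, which keeps both strands on one coordinate system for map building.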
Figure 2
Scanseq algorithm. The initial search is performed with words of length m allowing 0–k mismatches. For each word found in the sequence, the corresponding motif (word set) is refined by a positional weight matrix (PWM) and statistically evaluated with a Z score. In the final stage, Z scores for motifs across a range of m and k are compared and a predicted map is generated. Note that the PWM in the Scanseq algorithm is not the same as the one used in the strategy of BSTF map construction, and it does not include any a priori information about binding motifs.
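A simplified version of the initial search step can be sketched as follows: for every m-mer in the sequence, count how many windows lie within k mismatches of it (its word set) and compare that count with its expectation under a uniform, independent-base background using a binomial Z score. This is not the paper's own statistic: the PWM refinement stage is omitted, only the forward strand is scanned, overlapping windows are treated as independent, and all names are hypothetical. The per-base profile takes, at each position, the largest Z among word-set matches covering it.

```python
import math


def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def match_prob(m, k):
    """P(a uniform random m-mer lies within k mismatches of a fixed word); assumes k < m."""
    return sum(math.comb(m, j) * 3 ** j for j in range(k + 1)) / 4 ** m


def word_z_scores(seq, m, k):
    """Binomial Z score of the k-mismatch word-set count for each distinct m-mer."""
    windows = [seq[i:i + m] for i in range(len(seq) - m + 1)]
    n = len(windows)
    p = match_prob(m, k)
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    scores = {}
    for word in set(windows):
        observed = sum(1 for w in windows if hamming(word, w) <= k)
        scores[word] = (observed - expected) / sd
    return scores


def z_profile(seq, m, k):
    """Per-base maximum Z over every word-set match that covers the base."""
    scores = word_z_scores(seq, m, k)
    windows = [seq[i:i + m] for i in range(len(seq) - m + 1)]
    profile = [float("-inf")] * len(seq)
    for word, z in scores.items():
        for i, w in enumerate(windows):
            if hamming(word, w) <= k:
                for j in range(i, i + m):
                    profile[j] = max(profile[j], z)
    return profile


# Toy enhancer-like sequence with four copies of one hypothetical 6-mer.
seq = "GCATG" + "TAATCC" + "AGTC" + "TAATCC" + "GTAC" + "TAATCC" + "TGAG" + "TAATCC" + "A"
scores = word_z_scores(seq, 6, 0)
print(max(scores, key=scores.get))  # TAATCC
```

The quadratic word-by-window comparison is fine for a few-hundred-base enhancer; a longer sequence would call for indexing the mismatch neighbourhoods instead.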
Figure 3
Sensitivity of Scanseq to the parameters of the initial search. A Z-score profile plot (X-axis is the position in the sequence) is shown for the even-skipped stripe 2 enhancer using a range of word lengths (m) and divergences (k). Each horizontal line corresponds to a combination of m (7–10 bp) and k (1–3 mismatches), shown on the left side. Z-score values are represented by the color scale (bottom left). The bottom bar shows the distribution of binding sites for transcription factors (BSTF; consistent map) in the even-skipped stripe 2 enhancer. The best statistical correlation with the consistent map for eve stripe 2 was observed at the following parameters: {m = 7; k = 1}, {m = 8; k = 1}, and {m = 9; k = 2}.
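The parameter comparison can be sketched as a small grid search, assuming one per-base Z-score profile has already been computed for each (m, k) combination: threshold each profile into a predicted mask and rank the combinations by agreement with a reference (for example, consistent) map. The Z cutoff and the use of the phi coefficient are assumptions, since the caption does not say how the correlation was measured; the function names and the profile values in the example are hypothetical toy data.

```python
import math


def phi_coefficient(a, b):
    """Pearson correlation of two equal-length 0/1 masks (the phi coefficient)."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if y and not x)
    n00 = len(a) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def rank_parameters(profiles, reference_mask, z_cutoff):
    """Rank (m, k) keys by phi between the thresholded profile and the reference mask."""
    ranked = []
    for params, profile in profiles.items():
        predicted = [1 if z >= z_cutoff else 0 for z in profile]
        ranked.append((params, phi_coefficient(predicted, reference_mask)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy profiles keyed by (m, k); the reference marks bases 2-4 as binding sites.
reference = [0, 0, 1, 1, 1, 0, 0, 0]
profiles = {
    (7, 1): [0, 0, 5, 5, 5, 0, 0, 0],
    (8, 2): [5, 5, 0, 0, 0, 5, 0, 0],
    (9, 3): [0, 0, 5, 5, 0, 0, 0, 0],
}
for params, phi in rank_parameters(profiles, reference, z_cutoff=3):
    print(params, round(phi, 3))
```

Keeping the cutoff fixed across the grid makes the ranking reflect the parameters rather than the threshold; sweeping both at once risks overfitting a single enhancer.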
Figure 4
Scanseq predictions. Z-score profile plots and maps of predictions are shown for even-skipped stripe 2 (panels A, B), hairy stripe 7 (panels C, D), even-skipped stripe 4+6 (panels E, F), and runt stripe 5 (panels G, H). The plots show the maximum observed Z score (Y-axis) for each position in the sequence (X-axis) using a selected parameter range (mmin, mmax, kmax, and c). Panels A, C, E, and G (see parameters and statistics in Table 3) show the results after training on the group-of-10 enhancers. The results of individual trainings (see Table 4) are shown in panels B, D, F, and H. The predicted map is shown below each Z-score profile plot. The blue bars represent the most redundant segments (predicted by Scanseq); the red bars represent the established distribution of binding sites for transcription factors (BSTF). Shown are the consistent maps for even-skipped stripe 2 (Giant sites were not used in the training) and hairy stripe 6, and the virtual maps for even-skipped stripe 4+6 and runt stripe 5.
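Turning the profile plots into a predicted map can be sketched as taking the per-base maximum over the profiles for the selected parameter range and reporting contiguous runs at or above a cutoff. The parameter c is not defined in this excerpt, so the `cutoff` and `min_length` arguments below are hypothetical and should not be read as the paper's c; the profile values are toy numbers.

```python
def max_profile(profiles):
    """Element-wise maximum over equal-length per-base Z-score profiles."""
    return [max(values) for values in zip(*profiles)]


def predicted_segments(profile, cutoff, min_length=1):
    """Half-open (start, end) runs where the profile stays at or above cutoff."""
    segments, start = [], None
    for i, z in enumerate(profile):
        if z >= cutoff and start is None:
            start = i  # a run above the cutoff begins
        elif z < cutoff and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(profile) - start >= min_length:
        segments.append((start, len(profile)))
    return segments


# Two toy profiles, e.g. from two (m, k) combinations in the selected range.
combined = max_profile([[0, 4, 4, 0, 0, 5], [1, 0, 3, 3, 0, 0]])
print(combined)                          # [1, 4, 4, 3, 0, 5]
print(predicted_segments(combined, 3))   # [(1, 4), (5, 6)]
```

A minimum segment length is one simple way to suppress isolated high-scoring bases; whether Scanseq applies such a filter is not stated here.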
Figure 5
Detailed map of predictions for even-skipped stripe 2. The comparison between the Scanseq predictions (in red) and the consistent map (in green) shows the efficiency of individual training (panel B) versus training on a group of 10 (panel A). In both cases, periodic sequences (ATCCC)n generated very high statistical scores.
Figure 6
Structure and conservation of tandem repeats. Periodic structures in ∼100-bp regions from even-skipped stripe 2 (A, D), even-skipped stripe 4+6 (B), and the fushi-tarazu proximal enhancer (C) are revealed by matrix search for Bicoid, Knirps, and Tramtrack, respectively (see also Table 5). The red arrows indicate sites that produce a positional weight matrix score in the 4–6 range (shadow sites). Evolutionary conservation of the eve stripe 2 (ATCCC)n repeat in four species of Drosophila is shown in D.
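A quick way to flag short tandem repeats such as the (ATCCC)n tract, which the Figure 5 caption notes can dominate the statistical scores, is to check how often each base matches the base one period downstream. This is a generic periodicity check, not the matrix-based analysis shown in the figure; the function names and the default period range are assumptions.

```python
def self_match_fraction(seq, period):
    """Fraction of positions i where seq[i] equals seq[i + period]."""
    pairs = len(seq) - period
    if pairs <= 0:
        return 0.0
    return sum(seq[i] == seq[i + period] for i in range(pairs)) / pairs


def best_period(seq, min_period=2, max_period=12):
    """Period with the highest self-match fraction; ties go to the shortest period."""
    return max(range(min_period, max_period + 1),
               key=lambda p: (self_match_fraction(seq, p), -p))


repeat = "ATCCC" * 6
print(best_period(repeat))                 # 5
print(self_match_fraction(repeat, 5))      # 1.0
```

Long periods are compared over fewer base pairs, so a large `max_period` relative to the window length can make a spurious long period look perfect; keeping the range well below the window length avoids that.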
Similar articles
- Molecular dissection of cis-regulatory modules at the Drosophila bithorax complex reveals critical transcription factor signature motifs.
  Starr MO, Ho MC, Gunther EJ, Tu YK, Shur AS, Goetz SE, Borok MJ, Kang V, Drewell RA. Dev Biol. 2011 Nov 15;359(2):290-302. doi: 10.1016/j.ydbio.2011.07.028. Epub 2011 Jul 28. PMID: 21821017.
- Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura.
  Berman BP, Pfeiffer BD, Laverty TR, Salzberg SL, Rubin GM, Eisen MB, Celniker SE. Genome Biol. 2004;5(9):R61. doi: 10.1186/gb-2004-5-9-r61. Epub 2004 Aug 20. PMID: 15345045.
- Evolution of developmental genes: molecular microevolution of enhancer sequences at the Ubx locus in Drosophila and its impact on developmental phenotypes.
  Phinchongsakuldit J, MacArthur S, Brookfield JF. Mol Biol Evol. 2004 Feb;21(2):348-63. doi: 10.1093/molbev/msh025. Epub 2003 Dec 5. PMID: 14660693.
- Transcriptional silencing of homeotic genes in Drosophila.
  Bienz M, Müller J. Bioessays. 1995 Sep;17(9):775-84. doi: 10.1002/bies.950170907. PMID: 8763830. Review.
Cited by
- Recent computational approaches to understand gene regulation: mining gene regulation in silico.
  Abnizova I, Subhankulova T, Gilks W. Curr Genomics. 2007 Apr;8(2):79-91. doi: 10.2174/138920207780368150. PMID: 18660846.
- Contrasting patterns of transposable element insertions in Drosophila heat-shock promoters.
  Haney RA, Feder ME. PLoS One. 2009 Dec 29;4(12):e8486. doi: 10.1371/journal.pone.0008486. PMID: 20041194.
- Functional evolution of cis-regulatory modules at a homeotic gene in Drosophila.
  Ho MC, Johnsen H, Goetz SE, Schiller BJ, Bae E, Tran DA, Shur AS, Allen JM, Rau C, Bender W, Fisher WW, Celniker SE, Drewell RA. PLoS Genet. 2009 Nov;5(11):e1000709. doi: 10.1371/journal.pgen.1000709. Epub 2009 Nov 6. PMID: 19893611.
- Statistical detection of cooperative transcription factors with similarity adjustment.
  Pape UJ, Klein H, Vingron M. Bioinformatics. 2009 Aug 15;25(16):2103-9. doi: 10.1093/bioinformatics/btp143. Epub 2009 Mar 13. PMID: 19286833.
- Tcf12 and NeuroD1 cooperatively drive neuronal migration during cortical development.
  Singh A, Mahesh A, Noack F, Cardoso de Toledo B, Calegari F, Tiwari VK. Development. 2022 Feb 1;149(3):dev200250. doi: 10.1242/dev.200250. Epub 2022 Feb 11. PMID: 35147187.