Detection of functional DNA motifs via statistical over-representation

Comparative Study

Nucleic Acids Res. 2004 Feb 26;32(4):1372-81.

doi: 10.1093/nar/gkh299. Print 2004.

Martin C Frith et al. Nucleic Acids Res. 2004.

Abstract

The interaction of proteins with DNA recognition motifs regulates a number of fundamental biological processes, including transcription. To understand these processes, we need to know which motifs are present in a sequence and which factors bind to them. We describe a method to screen a set of DNA sequences against a precompiled library of motifs, and assess which, if any, of the motifs are statistically over- or under-represented in the sequences. Over-represented motifs are good candidates for playing a functional role in the sequences, while under-representation hints that if the motif were present, it would have a harmful dysregulatory effect. We apply our method (implemented as a computer program called Clover) to dopamine-responsive promoters, sequences flanking binding sites for the transcription factor LSF, sequences that direct transcription in muscle and liver, and Drosophila segmentation enhancers. In each case Clover successfully detects motifs known to function in the sequences, and intriguing and testable hypotheses are made concerning additional motifs. Clover compares favorably with an ab initio motif discovery algorithm based on sequence alignment, when the motif library includes only a homolog of the factor that actually regulates the sequences. It also demonstrates superior performance over two contingency table based over-representation methods. In conclusion, Clover has the potential to greatly accelerate characterization of signals that regulate transcription.

Figures

Figure 1

A 2×2 contingency table.

Figure 2

Pictogram representations of the ERE (3) and the Jaspar PPARγ motif (C Burge and F White, http://genes.mit.edu/pictogram.html).

Figure 3

Detection by Clover of ERE motifs embedded in random DNA sequences of varying length. In all panels, the _P_-values of the 108 Jaspar motifs are plotted as dots. _P_-values of zero were increased to 0.001 to fit on the log scale. Crosses indicate the PPARγ motif, and circles indicate the six other ERE-like nuclear receptor motifs. (A) Results for 15 ERE-containing sequences with no decoy sequences. (B) Results for 15 ERE-containing sequences with five decoy sequences. (C) Results for 15 ERE-containing sequences with 15 decoy sequences.

Figure 4

Detection by contingency table based methods of EREs embedded in random DNA sequences of varying length. In all panels, the _P_-values of the 108 Jaspar motifs are plotted as dots. Crosses indicate the PPARγ motif, and circles indicate the six other ERE-like nuclear receptor motifs. (A, B, C) Motif counting method. Length 50 sequences were not analyzed because the number of possible locations is <1000 for some motifs, making the 0.1% threshold criterion impossible. (D, E, F) Sequence counting method. (A, D) Results for 15 ERE-containing sequences with no decoy sequences. (B, E) Results for 15 ERE-containing sequences with five decoy sequences. (C, F) Results for 15 ERE-containing sequences with 15 decoy sequences.
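To illustrate the kind of calculation behind a sequence-counting contingency table, here is a minimal sketch. It assumes a one-sided Fisher's exact test on a 2×2 table of sequences with and without a motif hit in a target set versus a background set. The source does not state which test the compared methods use, so the choice of test, the function name `fisher_one_sided`, and the counts in the usage lines are illustrative assumptions. This is not Clover's own scoring procedure.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    Hypothetical helper (not from the paper). The 2x2 table is:
        a = target sequences containing the motif
        b = target sequences lacking the motif
        c = background sequences containing the motif
        d = background sequences lacking the motif
    Returns P(X >= a) under the hypergeometric null model.
    """
    n_target = a + b          # size of the target set
    n_with_motif = a + c      # sequences with a motif hit, both sets
    n_total = a + b + c + d
    denom = comb(n_total, n_target)
    upper = min(n_target, n_with_motif)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    tail = sum(
        comb(n_with_motif, k) * comb(n_total - n_with_motif, n_target - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# Made-up counts: 12 of 15 target sequences and 20 of 100 background
# sequences contain at least one match to the motif.
p_value = fisher_one_sided(12, 3, 20, 80)
print(f"one-sided P-value: {p_value:.3g}")
```

A small P-value here only says that motif-containing sequences are more frequent in the target set than expected by chance under this null model. Whether a sequence "contains" the motif depends on a match-score threshold, which this sketch takes as given.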

References

    1. Stormo,G.D. (2000). DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.
    2. Pennacchio,L.A. and Rubin,E.M. (2001). Genomic strategies to identify mammalian regulatory sequences. Nature Rev. Genet., 2, 100–109.
    3. Frith,M.C., Hansen,U., Spouge,J.L. and Weng,Z. (2004). Finding functional sequence elements by multiple local alignment. Nucleic Acids Res., 32, 189–200.
    4. Liu,R., McEachin,R.C. and States,D.J. (2003). Computationally identifying novel NF-kappa B-regulated immune genes in the human genome. Genome Res., 13, 654–661.
    5. Aerts,S., Thijs,G., Coessens,B., Staes,M., Moreau,Y. and De Moor,B. (2003). Toucan: deciphering the cis-regulatory logic of coregulated genes. Nucleic Acids Res., 31, 1753–1764.
