CONFAC: automated application of comparative genomic promoter analysis to DNA microarray datasets

Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W475-84.

doi: 10.1093/nar/gkh353.

Suresh Karanam et al. Nucleic Acids Res. 2004.

Abstract

The advent of DNA microarray technology and the sequencing of multiple vertebrate genomes have provided a unique opportunity for the integration of comparative genomics with high-throughput gene expression analysis. Here we describe the conserved transcription factor binding site (CONFAC) software that enables the high-throughput identification of conserved transcription factor binding sites (TFBSs) in the regulatory regions of hundreds of genes at a time (http://morenolab.whitehead.emory.edu/cgi-bin/confac/login.pl). The CONFAC software compares non-coding regulatory sequences between human and mouse genomes to enable identification of conserved TFBSs that are significantly enriched in promoters of gene clusters from microarray analyses compared to sets of unchanging control genes using a Mann–Whitney _U_-test. Analysis of random gene sets demonstrated that, using our approach, over 98% of TFBSs had false positive rates below 5%. As a proof-of-principle, we have validated the CONFAC software using gene sets from four separate microarray studies and identified TFBSs known to be functionally important for regulation of each of the four gene sets.
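The enrichment test named in the abstract can be illustrated with a small, self-contained sketch. It is not the CONFAC implementation: the function names are hypothetical, the inputs are assumed to be per-gene counts of one conserved TFBS in the affected and control sets, and the P-value uses the two-sided normal approximation with a tie correction and no continuity correction.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(affected, control):
    """Return (U for the affected set, two-sided P-value).

    Assumption: normal approximation with tie correction; an exact test
    would be preferable for very small gene sets.
    """
    pooled = list(affected) + list(control)
    n1, n2 = len(affected), len(control)
    n = n1 + n2
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction term: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every value identical: no evidence of a difference
        return u1, 1.0

    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_two_sided
```

Calling `mann_whitney_u(counts_in_affected, counts_in_control)` once per TFBS yields one P-value per site, which is the quantity the abstract describes comparing against a significance cutoff.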

Figures

Figure 1

(A) Schematic of data flow in CONFAC software. The user input is a tab-delimited list of genes of interest. The CONFAC software interfaces with the human and mouse genomes, local pairwise BLAST and local MATCH software to identify TFBSs that are conserved between human and mouse promoter regions. The output is a table of TFBS occurrences for each gene that has at least one conserved TFBS. (B) Identification of significantly enriched TFBSs. Two CONFAC output tables for affected and control gene sets are submitted to a Mann–Whitney _U_-test to identify sites that are significantly overrepresented in the affected gene list compared to controls.
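The idea of a binding site that is conserved between human and mouse promoters can be sketched in simplified form. The snippet below is an illustration under strong assumptions, not the CONFAC pipeline: it takes two promoter sequences that are already aligned (equal length, `-` for gaps), uses an IUPAC consensus string in place of a position weight matrix, scans the forward strand only, and calls a site conserved when both species match the consensus over the same gap-free alignment columns. The function names and the example sequences are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(window, consensus):
    """True if every base in window is allowed by the consensus code.

    A gap character '-' is not in any IUPAC set, so gapped windows fail.
    """
    return len(window) == len(consensus) and all(
        base in IUPAC[code] for base, code in zip(window, consensus)
    )


def conserved_sites(human_aln, mouse_aln, consensus):
    """Return alignment start columns where both species match the consensus."""
    if len(human_aln) != len(mouse_aln):
        raise ValueError("aligned sequences must have equal length")
    k = len(consensus)
    hits = []
    for start in range(len(human_aln) - k + 1):
        human_window = human_aln[start:start + k].upper()
        mouse_window = mouse_aln[start:start + k].upper()
        if matches(human_window, consensus) and matches(mouse_window, consensus):
            hits.append(start)
    return hits


# Toy aligned promoter fragments (made up for illustration).
example_hits = conserved_sites("ttGGGACTTTCCaa", "ttGGGAATTCCCaa", "GGGRNNYYCC")
```

Counting the hits per gene for each TFBS would give a table of the kind described in panel (A), with one row per gene that has at least one conserved site.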

Figure 2

(A) A screenshot of the CONFAC user interface for uploading gene lists. The user can specify core and matrix similarities, and sets of PWMs. (B) A screenshot of the user interface for the Mann–Whitney test for statistical significance. The user can upload their own control datasets or choose from several default control sets. The user also specifies the _P_-value and mean-difference cutoffs for the analysis. (C) A screenshot of the output of the Mann–Whitney test, which lists significant TFBSs, the average frequencies for both sets, the mean difference and the _P_-values.
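The output described in panel (C) can be mimicked with a short filtering step. This is a hedged sketch rather than the tool's code: it assumes per-gene site counts and precomputed P-values are already available as dictionaries, defines the mean difference as the affected average minus the control average, and uses cutoffs of P < 0.05 and mean difference > 0.25 as illustrative defaults. The function name and site labels are hypothetical.

```python
from statistics import mean


def significant_tfbs(affected, control, p_values, p_cutoff=0.05, diff_cutoff=0.25):
    """Return rows (site, avg_affected, avg_control, mean_diff, p) passing both cutoffs.

    affected, control: dict mapping site name -> list of per-gene counts.
    p_values: dict mapping site name -> P-value from the enrichment test.
    """
    rows = []
    for site, p in p_values.items():
        avg_affected = mean(affected[site])
        avg_control = mean(control[site])
        mean_diff = avg_affected - avg_control  # assumption: affected minus control
        if p < p_cutoff and mean_diff > diff_cutoff:
            rows.append((site, avg_affected, avg_control, mean_diff, p))
    rows.sort(key=lambda row: row[4])  # most significant first
    return rows
```

Requiring both a P-value cutoff and a minimum mean difference filters out sites that are statistically distinguishable but differ too little in average frequency to be of practical interest.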

Figure 3

(A) The average frequencies of TFBSs that are significantly enriched in 20 FKHR Class I target genes relative to 41 control genes are graphed for both Class I target genes and control genes. Four FOX sites and the FKHR-responsive IRS site were significantly overrepresented in this DBD-dependent gene set. Error bars represent the standard error for each conserved TFBS in this and all subsequent figures. (B) FOX sites were significantly more frequent in promoters of 20 FKHR Class I activation target genes than in promoters of 24 FKHR Class III repression target genes.

Figure 4

(A) The average frequency of TFBSs that are highly significantly enriched (_P_ < 0.001) in 21 TNF-inducible genes relative to 41 control genes is shown. Five of the most significant (_P_ < 0.001) TFBSs were NF-κB sites. (B) The five most significantly enriched TFBSs in promoters of 20 NF-κB target genes from HRS tumor cells were NF-κB sites. STAT sites were also significantly enriched relative to control genes using _P_-value <0.05 and mean difference >0.25.

Figure 5

The average frequency of TFBSs that are significantly enriched in 33 genes strongly upregulated in prostate cancer (23) relative to 41 control genes is shown. Three of the four significant TFBSs were homeobox family sites.

References

1. Quandt,K., Frech,K., Karas,H., Wingender,E. and Werner,T. (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res., 23, 4878–4884.
2. Kel,A.E., Gossling,E., Reuter,I., Cheremushkin,E., Kel-Margoulis,O.V. and Wingender,E. (2003) MATCH: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res., 31, 3576–3579.
3. Oeltjen,J.C., Malley,T.M., Muzny,D.M., Miller,W., Gibbs,R.A. and Belmont,J.W. (1997) Large-scale comparative sequence analysis of the human and murine Bruton's tyrosine kinase loci reveals conserved regulatory domains. Genome Res., 7, 315–329.
4. Wasserman,W.W. and Fickett,J.W. (1998) Identification of regulatory regions which confer muscle-specific gene expression. J. Mol. Biol., 278, 167–181.
5. Fickett,J.W. and Wasserman,W.W. (2000) Discovery and modeling of transcriptional regulatory regions. Curr. Opin. Biotechnol., 11, 19–24.
