Identification of regulatory elements using a feature selection method
Sündüz Keleş et al. Bioinformatics. 2002 Sep.
Abstract
Motivation: Many methods have been described to identify regulatory motifs in the transcription control regions of genes that exhibit similar patterns of gene expression across a variety of experimental conditions. Here we focus on a single experimental condition, and use gene expression data to identify sequence motifs associated with genes that are activated under this condition. We use a linear model with two-way interactions to model gene expression as a function of sequence features (words) present in presumptive transcription control regions. The most relevant features are selected by a feature selection method called stepwise selection with Monte Carlo cross-validation. We apply this method to a publicly available dataset of the yeast Saccharomyces cerevisiae, focusing on the 800 base pairs immediately upstream of each gene's translation start site, the upstream control region (UCR).
Results: We successfully identify regulatory motifs that are known to be active under the experimental conditions analyzed, and find additional significant sequences that may represent novel regulatory motifs. We also discuss a complementary method that utilizes gene expression data from a single microarray experiment and allows averaging over a variety of experimental conditions as an alternative to motif-finding methods that act on clusters of co-expressed genes.
Availability: The software is available upon request from the first author or may be downloaded from http://www.stat.berkeley.edu/~sunduz.
Contact: keles@stat.berkeley.edu
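The approach outlined in the Motivation paragraph can be illustrated with a minimal Python sketch. It is not the authors' software: all function names (`count_word`, `design_matrix`, `fit_predict`, `mccv_error`, `forward_path`, `select_model`) are hypothetical, and the defaults (50 random splits, 30% of genes held out, greedy forward selection by training residual sum of squares) are assumptions chosen for illustration, since the abstract does not specify them. The sketch counts word occurrences in each upstream sequence, adds two-way interaction columns as products of counts, builds a forward stepwise path of nested models, and picks the model size with the lowest Monte Carlo cross-validation error.

```python
import itertools
import numpy as np

def count_word(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))

def design_matrix(seqs, words):
    """Word counts (main effects) followed by all two-way interaction products."""
    main = np.array([[count_word(s, w) for w in words] for s in seqs], dtype=float)
    cols = [main[:, j] for j in range(len(words))]
    names = list(words)
    for a, b in itertools.combinations(range(len(words)), 2):
        cols.append(main[:, a] * main[:, b])
        names.append(f"{words[a]}:{words[b]}")
    return np.column_stack(cols), names

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept; returns predictions for X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def mccv_error(X, y, n_splits=50, test_frac=0.3, seed=0):
    """Monte Carlo cross-validation: mean squared error over random train/test splits."""
    rng = np.random.default_rng(seed)  # fixed seed: every model sees the same splits
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        pred = fit_predict(X[train], y[train], X[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

def forward_path(X, y, max_size):
    """Greedy forward selection: each step adds the column that most lowers training RSS."""
    chosen, path = [], []
    for _ in range(min(max_size, X.shape[1])):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            resid = y - fit_predict(X[:, cols], y, X[:, cols])
            rss = float(resid @ resid)
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
        path.append(list(chosen))
    return path

def select_model(X, y, max_size=5, **mccv_kwargs):
    """Score each model on the forward path by MCCV and return the best one."""
    path = forward_path(X, y, max_size)
    scores = [mccv_error(X[:, cols], y, **mccv_kwargs) for cols in path]
    return path[int(np.argmin(scores))], scores
```

In use, `design_matrix(upstream_seqs, candidate_words)` would produce the feature matrix and `select_model(X, expression)` would return the indices of the selected columns. For brevity the forward path is built once on all genes and only the model size is cross-validated; a stricter variant would rerun the selection inside every training split.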