Decoding Human Regulatory Circuits
- William Thompson1,5,
- Michael J. Palumbo1,
- Wyeth W. Wasserman2,
- Jun S. Liu3, and
- Charles E. Lawrence1,4
- 1_Center for Bioinformatics, The Wadsworth Center, New York State Department of Health, Albany, New York 12208, USA_
- 2_Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V5Z 4H4, Canada_
- 3_Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, USA_
- 4_Computer Science Department, Rensselaer Polytechnic Institute, Troy, New York 12180, USA_
Abstract
Clusters of transcription factor binding sites (TFBSs) that direct gene expression constitute _cis_-regulatory modules (CRMs). We present a novel algorithm, based on Gibbs sampling, that locates, de novo, the _cis_ features of these CRMs, their component TFBSs, and the properties of their spatial distribution. The algorithm finds 69% of experimentally reported TFBSs and 85% of the CRMs in a reference data set of regions upstream of genes differentially expressed in skeletal muscle cells. A discriminant procedure based on the output of the model specifically discriminated regulatory sequences in muscle-specific genes in an independent test set. Applying the method to 2710 10-kb fragments upstream of annotated human genes identified 17 novel candidate modules with a false discovery rate ≤0.05, demonstrating the applicability of the method to genome-scale data.
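The abstract reports candidate modules at a false discovery rate ≤0.05 but does not say how that rate was controlled. As a minimal sketch only, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not the authors' stated method), the selection of candidates from per-fragment p-values could look like this. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[int]:
    """Return indices of hypotheses rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up procedure; the abstract does
    not specify which FDR procedure the authors used.
    """
    m = len(p_values)
    # Rank hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff_rank])


# Hypothetical p-values, one per upstream fragment (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(example_p, alpha=0.05)
```

Here `selected` holds the indices of fragments that would be reported as candidates under this assumed procedure. The step-up rule keeps every hypothesis ranked below the largest passing rank, even if its own p-value exceeds its individual threshold.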
Footnotes
[Supplemental material is available online at www.genome.org.]
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2589004.
5 Corresponding author. E-MAIL thompson@wadsworth.org; FAX (518) 402-4623.
- Accepted July 22, 2004.
- Received March 17, 2004.
Cold Spring Harbor Laboratory Press