Decoding Human Regulatory Circuits

William Thompson1,5, Michael J. Palumbo1, Wyeth W. Wasserman2, Jun S. Liu3, and Charles E. Lawrence1,4

  1. _Center for Bioinformatics, The Wadsworth Center, New York State Department of Health, Albany, New York 12208, USA_
  2. _Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V5Z 4H4, Canada_
  3. _Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, USA_
  4. _Computer Science Department, Rensselaer Polytechnic Institute, Troy, New York 12180, USA_

Abstract

Clusters of transcription factor binding sites (TFBSs) that direct gene expression constitute _cis_-regulatory modules (CRMs). We present a novel algorithm, based on Gibbs sampling, that locates, de novo, the _cis_ features of these CRMs, their component TFBSs, and the properties of their spatial distribution. The algorithm finds 69% of experimentally reported TFBSs and 85% of the CRMs in a reference data set of regions upstream of genes differentially expressed in skeletal muscle cells. A discriminant procedure based on the output of the model specifically discriminated regulatory sequences in muscle-specific genes in an independent test set. Application of the method to the analysis of 2710 10-kb fragments upstream of annotated human genes identified 17 novel candidate modules with a false discovery rate ≤0.05, demonstrating the applicability of the method to genome-scale data.

Footnotes