DBD: a transcription factor prediction database
Sarah K Kummerfeld et al. Nucleic Acids Res. 2006.
Abstract
Regulation of gene expression influences almost all biological processes in an organism; sequence-specific DNA-binding transcription factors are critical to this control. For most genomes, the repertoire of transcription factors is only partially known. Hitherto, transcription factor identification has been largely based on genome annotation pipelines that use pairwise sequence comparisons, which detect only those factors similar to known genes, or on functional classification schemes that amalgamate many types of proteins into the category of 'transcription factor'. Using a novel transcription factor identification method, the DBD transcription factor database fills this void, providing genome-wide transcription factor predictions for organisms from across the tree of life. The prediction method behind DBD identifies sequence-specific DNA-binding transcription factors through homology using profile hidden Markov models (HMMs) of domains. Thus, it is limited to factors that are homologous to those HMMs. The collection of HMMs is taken from two existing databases (Pfam and SUPERFAMILY), and is limited to models that exclusively detect transcription factors that specifically recognize DNA sequences. It does not include basal transcription factors or chromatin-associated proteins, for instance. Based on comparison with experimentally verified annotation, the prediction procedure is between 95% and 99% accurate. Between one-quarter and one-half of our genome-wide predicted transcription factors represent previously uncharacterized proteins. The DBD (www.transcriptionfactor.org) consists of predicted transcription factor repertoires for 150 completely sequenced genomes, their domain assignments and the hand-curated list of DNA-binding domain HMMs. Users can browse, search or download the predictions by genome, domain family or sequence identifier, view families of transcription factors based on domain architecture and receive predictions for a protein sequence.
Figures
Figure 1
Transcription factor prediction procedure. We begin with a set of proteins, shown as horizontal lines. For example, the initial set of proteins may be a whole proteome. Each sequence is searched against the SUPERFAMILY and Pfam HMM libraries. A domain is assigned to a particular protein when one of the HMMs matches a region of sequence with an _E_-value less than or equal to 0.001 for SUPERFAMILY, or with a score greater than or equal to the trusted cutoff for Pfam. Assigned domains are shown as coloured boxes where the colour indicates the family. For example, the small dark-blue boxes represent the Zinc finger C2H2 type DNA-binding domains. Proteins with at least one DNA-binding domain assigned are selected as putative transcription factors. The designation of DNA-binding is based on our manual curation of Pfam and SUPERFAMILY models.
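The selection rule described in this caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `DomainHit` record, its field names, the model identifiers and all numbers in the toy example are hypothetical. The sketch assumes the HMM searches have already been run and that the Pfam trusted cutoff is compared against a hit's bit score.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

SUPERFAMILY_MAX_EVALUE = 0.001  # E-value threshold stated in the caption


@dataclass(frozen=True)
class DomainHit:
    """One HMM match against a protein (hypothetical record layout)."""
    protein_id: str
    model_id: str                           # identifier of the matching HMM
    library: str                            # "SUPERFAMILY" or "Pfam"
    evalue: Optional[float] = None          # used for SUPERFAMILY hits
    bit_score: Optional[float] = None       # used for Pfam hits (assumption)
    trusted_cutoff: Optional[float] = None  # per-model Pfam threshold


def is_assigned(hit: DomainHit) -> bool:
    """Apply the library-specific threshold to decide whether a hit counts."""
    if hit.library == "SUPERFAMILY":
        return hit.evalue is not None and hit.evalue <= SUPERFAMILY_MAX_EVALUE
    if hit.library == "Pfam":
        return (
            hit.bit_score is not None
            and hit.trusted_cutoff is not None
            and hit.bit_score >= hit.trusted_cutoff
        )
    return False  # unknown library: never assign


def predict_transcription_factors(
    hits: Iterable[DomainHit], dna_binding_models: Set[str]
) -> Set[str]:
    """Return IDs of proteins with at least one assigned DNA-binding domain."""
    return {
        hit.protein_id
        for hit in hits
        if hit.model_id in dna_binding_models and is_assigned(hit)
    }


# Toy example with made-up identifiers and scores.
curated = {"sf_model_A", "pfam_model_B"}  # stands in for the curated HMM list
hits = [
    DomainHit("prot1", "sf_model_A", "SUPERFAMILY", evalue=1e-5),
    DomainHit("prot2", "sf_model_A", "SUPERFAMILY", evalue=0.05),  # too weak
    DomainHit("prot3", "pfam_model_B", "Pfam", bit_score=30.0, trusted_cutoff=25.0),
    # Significant hit, but the model is not in the curated DNA-binding list.
    DomainHit("prot4", "pfam_model_C", "Pfam", bit_score=40.0, trusted_cutoff=25.0),
]
predicted = predict_transcription_factors(hits, curated)
print(sorted(predicted))  # ['prot1', 'prot3']
```

Keeping the curated model list as a plain set separates the manual curation step from the thresholding step, so either can be changed without touching the other.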
Figure 2
DBD: Yeast predictions screenshot. Each predicted transcription factor is listed with two rows for the SUPERFAMILY and Pfam domain architectures. Domains are represented as rectangles, coloured according to their family and horizontally located based on their position in the amino acid sequence. Clicking on a domain takes the user directly to that family in the relevant domain database. Proteins are ordered based on their domain architecture. For ease of navigation (in particular for large genomes), the list of transcription factors is split into pages with 50 entries per page by default. Users can navigate between pages using the previous/next links or by clicking on a page number.
Figure 3
Number of genes in each of 151 genomes versus transcription factor predictions. The number of genes (_x_-axis, log-scale) is plotted against the number of predicted transcription factors (_y_-axis, log-scale). Each splice variant is counted independently. (See database website for a list of genomes considered.)