JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update
1 Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway, 2 The Bioinformatics Centre, Department of Molecular Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK-2100 København Ø, 3 Informatics and Mathematical Modeling, Building 321, Technical University of Denmark, DK-2800 Kgs. Lyngby, 4 Center for Comparative Genomics, Institute of Biology, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark and 5 Sars Centre for Marine Molecular Biology, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway
*To whom correspondence should be addressed. Boris Lenhard: Tel: +47 55584362; Fax: +47 55584295; Email: boris.lenhard@bccs.uib.no. Albin Sandelin: Tel: +45 52321285; Fax: +45 35325669; Email: albin@binf.ku.dk
The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.
Received: 14 September 2007
Revision received: 15 October 2007
Accepted: 16 October 2007
Published: 15 November 2007
Jan Christian Bryne, Eivind Valen, Man-Hung Eric Tang, Troels Marstrand, Ole Winther, Isabelle da Piedade, Anders Krogh, Boris Lenhard, Albin Sandelin, JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update, Nucleic Acids Research, Volume 36, Issue suppl_1, 1 January 2008, Pages D102–D106, https://doi.org/10.1093/nar/gkm955
Abstract
JASPAR is a popular open-access database for matrix models describing DNA-binding preferences for transcription factors and other DNA patterns. With its third major release, JASPAR has been expanded and equipped with additional functions aimed at both casual and power users. The heart of the JASPAR database—the JASPAR CORE sub-database—has increased by 12% in size, and three new specialized sub-databases have been added. New functions include clustering of matrix models by similarity, generation of random matrices by sampling from selected sets of existing models and a language-independent Web Service applications programming interface for matrix retrieval. JASPAR is available at http://jaspar.genereg.net .
INTRODUCTION
Computational analysis of regulatory properties of DNA is most often based on the use of matrix models describing binding preferences of transcription factors, or other DNA patterns. Such matrices are based on sets of known or inferred sites for a DNA-binding protein, and can be scanned over genomic sequences to predict novel binding sites ( 1 , 2 ). JASPAR is the most comprehensive open-access database holding such models. The heart of JASPAR is the JASPAR CORE sub-database, holding curated, non-redundant matrix models from multi-cellular eukaryotes. The methodology for JASPAR CORE curation has been described previously ( 3 ). JASPAR CORE is now a standard resource in gene regulation bioinformatics and is used as a matrix set in a wide variety of other services [for instance ( 4–9 )] and in large-scale projects ( 10 , 11 ). Besides JASPAR CORE, the database contains several sub-databases (JASPAR Collections) holding matrix models produced by different methods and for different purposes ( Table 1 ).
Database | Number of models | Scope | Species coverage | When to use |
---|---|---|---|---|
JASPAR CORE | 138 | Curated, non-redundant matrix models | Multi-cellular eukaryotes | ‘Standard’ promoter analysis |
JASPAR FAM | 11 | Familial ‘consensus’ patterns for major structural families of transcription factors | Multi-cellular eukaryotes | Matrix-to-matrix comparison and classification, or as prior knowledge for pattern finders |
JASPAR PHYLOFACTS | 174 | Evolutionary conserved patterns in 5′ promoter regions | Multi-cellular eukaryotes | As a complement to JASPAR CORE for large-scale studies |
JASPAR POLII | 13 | Core promoter element models | Multi-cellular eukaryotes | Core promoter analysis |
JASPAR CNE | 233 | Motifs overrepresented in vertebrate highly conserved non-coding elements | Human | Analysis of regulatory content of long-range enhancers |
JASPAR SPLICE | 6 a | Splice sites | Human a | Splice site analysis |
Here we present the recent JASPAR expansion, which includes a significant increase of the JASPAR CORE content and the addition of three new sub-databases focusing on core promoter patterns, splice sites and motifs detected in vertebrate highly conserved non-coding elements, respectively. In addition, we present several unique functional features in the web interface aimed at both casual and power users. These include statistics on the expected number of predictions each matrix will yield at several different thresholds in sequences drawn from three commonly encountered sequence background models, dynamic clustering of matrices by similarity, and generation of random matrices using a selected set of matrices as the background model.
RESULTS
Here we briefly describe the new data and functional features; more detailed descriptions are available in the documentation on the web site.
Expansion of JASPAR CORE
The JASPAR CORE database holds a curated set of transcription factor-binding profiles from multi-cellular eukaryotes: this is a unique feature with respect to databases of similar scope. We have extended JASPAR CORE with 15 new, high-quality profiles from recent experimental literature, increasing the total number of JASPAR CORE models to 138 ( Table 1 ). In addition, annotation for all models in the database has been updated [e.g. to standard gene symbols from Entrez Gene ( 12 )] and expanded. Prompted by user feedback, several existing matrices have been updated or corrected.
New sub-databases
Existing and new sub-databases within JASPAR and their specific features are described in Table 1 . Since the last update, we have added three new sub-databases, which are briefly described below (see the web documentation for details):
JASPAR POLII
The large body of novel data pertaining to transcription start sites ( 13 , 14 ) has triggered a new interest in computational studies of core promoters. The JASPAR POLII sub-database holds 13 known DNA patterns linked to RNA polymerase II core promoters, such as the Inr and BRE elements, each based on experimental evidence: each model must be constructed using five or more experimentally verified sites. An important difference from the transcription factor profiles in JASPAR CORE is that patterns here do not necessarily have a specified protein that binds them [See Ref. ( 15 ) for a review on core promoter patterns]. When possible, profiles were extended by 2 nt beyond the core motif. We consistently report positions relative to the TSS as the positions of the 5′ and 3′ edges of the matrix.
JASPAR CNE
Highly conserved non-coding elements (CNEs) are a distinctive feature of metazoan genomes. Many of them can be shown to act as long-range enhancers that drive expression of genes that are themselves regulators of core aspects of metazoan development and differentiation. Since they act as regulatory inputs, attempts at deciphering the regulatory content of these elements have started ( 16–18 ). JASPAR CNE is a collection of 233 matrix profiles derived by Xie et al. ( 19 ) by clustering of overrepresented motifs from human conserved non-coding elements. While the biochemical and biological role of most of these patterns is still unknown, Xie et al. have shown that the most abundant ones correspond to known DNA-binding proteins, among them the insulator-binding protein CTCF. These matrix profiles will be useful for further characterization of regulatory inputs in long-range developmental gene regulation in vertebrates.
JASPAR SPLICE
This small collection contains matrix profiles of human canonical and non-canonical splice sites, as matching donor:acceptor pairs. It currently contains only six highly reliable profiles (two canonical and four non-canonical) obtained from the human genome ( 20 ). In the future, we shall include additional eukaryotic species, as well as new models for exonic splicing enhancers (ESE) and inhibitors (ESI).
Extended functionality
In addition to data extension, we have implemented a number of functional improvements in the web interface of the JASPAR database. These range from static statistics, such as the expected number of hits on typical DNA sequences for any factor, to dynamic tools for similarity-based profile clustering and for generating random profiles based on a subset of known profiles.
Web service interface
The JASPAR database can now be reached remotely through a new Web Service interface. Current functionality includes retrieval of profiles by name, by identifier and by searching profile annotations. The purpose of providing an external application programming interface (API) is to simplify the utilization of JASPAR in distributed applications and in scientific workflows created in workflow editors like Triana ( 21 ), BPEL ( http://www.bpelsource.com/ ) or Taverna ( 22 ). Other benefits include platform- and language-independent access, as well as access to a database that is always up to date. The API is implemented as a WS-I compliant Web Service, identical to the technology used for the services made available through the EMBRACE Network of Excellence ( www.embracegrid.info ), and the Web Service technology chosen by the European Bioinformatics Institute (EBI) ( 23 ). Its basic usage is described in tutorials at the JASPAR web site. The WSDL describing this service can be found at: http://api.bioinfo.no/wsdl/JasparDB.wsdl . Further information about the Web Service, including example clients in Java and Python, is available on the JASPAR web site and in the WSDL file.
Expected predictions/base-pair statistics for all models
An important problem with genome-wide scanning with matrix models is the limited information content in a typical matrix, resulting in numerous spurious hits just due to sequence background ( 1 , 2 ). The number of false positives varies considerably between factors and also depends on the type of sequences the models are applied to, on user-defined cutoffs and, to a more limited extent, on the type of scoring scheme used. For a first-glance assessment of the rate of spurious predictions of a given model, we apply the model to three distinct sequence sets: known promoters from the EPD database ( 24 ), CpG islands and randomly selected genomic DNA, respectively. For different score thresholds, we plot the mean number of hits per 1000 nt for each sequence set. The resulting bar plots are available for each JASPAR matrix ( Figure 1 ).
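The kind of statistic described above can be illustrated with a minimal Python sketch. It is not the JASPAR implementation: the toy count matrix, the pseudocount, the uniform background, the relative-score thresholds and the function names (`pfm_to_pwm`, `hits_per_kb`) are all assumptions made for illustration, and a uniformly random sequence stands in for the EPD, CpG island and genomic sequence sets. Only one strand is scanned.

```python
import math
import random

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert per-column count dicts into log2-odds weights (uniform background assumed)."""
    pwm = []
    for col in pfm:
        total = sum(col[b] for b in BASES) + pseudocount
        pwm.append({
            b: math.log2((col[b] + pseudocount * background) / total / background)
            for b in BASES
        })
    return pwm

def score_bounds(pwm):
    """Lowest and highest score any sequence of matrix width can reach."""
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    return lo, hi

def hits_per_kb(pwm, seq, rel_threshold):
    """Windows scoring at or above the relative threshold, per 1000 scanned positions."""
    lo, hi = score_bounds(pwm)
    cutoff = lo + rel_threshold * (hi - lo)
    width = len(pwm)
    windows = len(seq) - width + 1
    if windows <= 0:
        return 0.0
    hits = sum(
        1 for i in range(windows)
        if sum(pwm[j][seq[i + j]] for j in range(width)) >= cutoff
    )
    return 1000.0 * hits / windows

# Hypothetical 4-column count matrix (not a real JASPAR profile).
TOY_PFM = [
    {"A": 18, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 19, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 18, "T": 1},
    {"A": 0, "C": 1, "G": 0, "T": 19},
]

rng = random.Random(0)
background_seq = "".join(rng.choice(BASES) for _ in range(20000))
pwm = pfm_to_pwm(TOY_PFM)
rates = {t: hits_per_kb(pwm, background_seq, t) for t in (0.7, 0.8, 0.9)}
for t, r in rates.items():
    print(f"relative threshold {t:.1f}: {r:.2f} hits per 1000 nt")
```

Raising the threshold can only remove hits, so the reported rate never increases with the threshold; short, low-information matrices such as this toy example still match random sequence often.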
Figure 1.
New features in the JASPAR database web interface. (A) A listing of matrices in the JASPAR CORE database resulting from selection of MADS and bHLH-type factors. These models are used in the clustering analysis in panel C. (B) A pop-up window showing detailed information on the MA0001 model, with expected predictions/bp statistics. (C) Dynamic clustering of selected profiles. At the top, a dendrogram describing the similarities of the input profiles is shown. Clusters of similar models are merged into familial binding profiles, shown below. In this case, two larger clusters are produced, corresponding to bHLH and MADS type matrices. Two smaller clusters correspond to outliers in both groups.
Dynamic clustering by similarity and creation of familial binding profiles from a given profile subset
Many transcription factors bind similar targets and it is often helpful to cluster similar binding profiles to generate familial binding profiles, that is, models describing a set of matrices ( 25 ). Part of this problem is matrix profile comparison and alignment, explored by several researchers ( 25–30 ). Recently, Mahony et al. ( 27 , 28 ) carried out a comprehensive study of matrix alignment and construction of familial binding profiles, resulting in the STAMP tool, which is now used within JASPAR to cluster matrix models. Hierarchical clustering is performed on a selected set of matrices using the UPGMA algorithm with a Pearson Correlation Coefficient distance metric. Then the optimal number of clusters is selected using a log variant of the Calinski and Harabasz statistic [See Ref. ( 27 ) for details]. Finally, the clusters are partitioned and a familial binding profile is created for each cluster using iterative refinement (a multiple alignment method). An example is shown in Figure 1 .
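The hierarchical clustering step can be sketched in a few lines of Python. This is an illustration only, not STAMP: it assumes all matrices have the same width and are already aligned, takes distance as 1 minus the Pearson correlation of the flattened column frequencies, and omits both the Calinski and Harabasz cluster-number selection and the construction of familial profiles. The matrices and the names `toy_matrix`, `flatten` and `upgma` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def flatten(matrix):
    """Concatenate per-column frequencies of a matrix given as [A, C, G, T] count columns."""
    vec = []
    for col in matrix:
        total = sum(col)
        vec.extend(c / total for c in col)
    return vec

def upgma(names, matrices):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples of names."""
    vecs = [flatten(m) for m in matrices]
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    dist = {
        (i, j): 1.0 - pearson(vecs[i], vecs[j])
        for i in range(len(names)) for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(dist.items(), key=lambda kv: kv[1])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        merged = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # Size-weighted mean keeps the distance an average over all member pairs.
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(merged)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

def toy_matrix(consensus, strong=17):
    """Hypothetical counts: `strong` on the consensus base, 1 on each other base."""
    return [[strong if b == base else 1 for b in "ACGT"] for base in consensus]

names = ["ebox_a", "ebox_b", "at_a", "at_b"]
matrices = [toy_matrix(s) for s in ("CACGTG", "CATGTG", "TAAATA", "TATATA")]
tree = upgma(names, matrices)
print(tree)
```

On this toy input the two E-box-like matrices and the two AT-rich matrices are each joined first, and the two pairs are merged last, mirroring the two-family outcome described for the bHLH and MADS example.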
Dynamic random profile generation
In many computational studies, it is helpful to have a set of ‘random’ matrices. This is particularly true for assessment of distances between putative sites and reference points such as transcription start sites, and also for matrix-to-matrix comparisons. In these cases, it is desirable that the randomized matrices share properties with the true matrix set, for instance having the same nucleotide content and/or the same general information content.
Within any JASPAR sub-database, users can select a subset of matrices, which will then be used to generate random matrices using one of two methods: (i) Permutations: Columns of the selected matrices are shuffled: either constrained to shuffling of columns within each matrix or between all selected matrices. (ii) Probabilistic sampling: This enables the users to generate random Position Frequency Matrices from selected profiles. In our model, each random column is sampled from a posterior distribution—a 4D Dirichlet mixture distribution. The posterior distribution has two contributions: a multinomial with counts of columns selected as in (i), and a Dirichlet mixture prior trained from all observed nucleotides in the JASPAR database. We assume that column positions are independent.
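Both approaches can be sketched in Python under simplifying assumptions. The permutation functions follow the description directly. The sampling function is a simplification: it uses a single symmetric Dirichlet prior (the hypothetical `prior` argument) instead of a Dirichlet mixture trained on JASPAR, and it returns frequency columns rather than counts. All function names and the toy matrices are illustrative and are not part of JASPAR.

```python
import random

def shuffle_within(matrix, rng):
    """Permute the columns of one matrix."""
    cols = [list(c) for c in matrix]
    rng.shuffle(cols)
    return cols

def shuffle_between(matrices, rng):
    """Pool the columns of all matrices, shuffle, and refill each matrix to its original width."""
    pool = [list(c) for m in matrices for c in m]
    rng.shuffle(pool)
    out, start = [], 0
    for m in matrices:
        out.append(pool[start:start + len(m)])
        start += len(m)
    return out

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_matrix(matrices, width, rng, prior=(0.5, 0.5, 0.5, 0.5)):
    """Build a random matrix of frequency columns.

    For each column: pick an observed count column at random, add the prior
    pseudocounts to obtain Dirichlet posterior parameters, and draw once.
    Columns are treated as independent.
    """
    pool = [c for m in matrices for c in m]
    columns = []
    for _ in range(width):
        counts = rng.choice(pool)
        posterior = [n + a for n, a in zip(counts, prior)]
        columns.append(sample_dirichlet(posterior, rng))
    return columns

# Hypothetical count matrices, columns ordered [A, C, G, T].
m1 = [[18, 1, 0, 1], [0, 19, 1, 0], [1, 0, 18, 1]]
m2 = [[2, 2, 14, 2], [10, 0, 0, 10]]

rng = random.Random(42)
print(shuffle_within(m1, rng))
print(shuffle_between([m1, m2], rng))
for col in sample_matrix([m1, m2], width=4, rng=rng):
    print([round(p, 3) for p in col])
```

Permutation preserves the exact column set, and therefore the nucleotide and information content of the selection, whereas sampling yields new columns that only resemble the selected ones.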
DISCUSSION
We have presented a significant update to the JASPAR database, including an expansion of the core database, three new sub-databases and many new utilities. The new web service interface enables easy interaction with scientific workflows and an increasing number of programming languages that support this technology. We project that the new features, together with the open-access policy, will further consolidate the JASPAR database as a standard resource in the field of gene regulation bioinformatics.
Towards a comprehensive set of models for most known transcription factors
The lack of models for the binding specificity of most transcription factors is a significant bottleneck for comprehensive computational analysis of genomes. Only a fraction of transcription factors have been characterized in enough detail to allow the construction of adequate models of their binding specificity. This problem is being addressed in two principally different ways. First, tiling array approaches for measuring binding preferences en masse are being developed ( 31 ); these technologies show great promise and are expected to make their mark on the field in the near future. Second, a wealth of cis -regulatory elements, characterized in painstaking detail, is hidden in experimental literature; many of these sites are not included in any database. There is a growing awareness of this problem in the field, resulting in online open-access databases such as ORegAnno ( 32 ) and PAZAR ( 33 ), where one of the goals is to house expert-curated binding sites. We are currently developing services to enable cross-talk with these databases, so that matrix models can be built on curated sites that exceed a certain quality threshold. JASPAR, ORegAnno and PAZAR face the same challenge: to build models or sites, it is necessary to mine the literature, which inevitably means that the curators will miss many important studies. The only long-term solution would be a requirement by scientific journals for researchers to deposit protein–DNA interactions in public databases prior to publication, much in the same way as mRNAs must be submitted to GenBank ( 34 ). Part of such a system will be to establish a minimal standard for reporting these interactions, much like the MIAME standard ( 35 ) for microarray data. As before, the JASPAR team is always prepared to incorporate new matrices and matrix sets provided by external contributors.
Data availability
All the data in JASPAR are available without any restrictions, either from the web interface, as flat files or through the Web service interface.
ACKNOWLEDGEMENTS
Thanks to Katsuya Shigesada for pointing out errors in matrix MA0002, Shaun Mahony and Panayiotis V. Benos for generously sharing the STAMP code and general helpfulness and Vladimir B. Bajic for kindly providing the frequency matrices for JASPAR SPLICE. E.V., M.-H.E.T., T.M., O.W., A.K. and A.S. were supported by a grant from the Novo Nordisk foundation to the Bioinformatics Center. I.P. was supported by a grant from Carlsberg Foundation (21-00-0680). J.C.B. was supported by EMBRACE—an EU Sixth Framework Network of Excellence. B.L. was supported by the Functional Genomics Programme (FUGE) of the Research Council of Norway, and a core grant from the Sars Centre. Funding to pay the Open Access publication charges for this article was provided by a grant from the Novo Nordisk Foundation and the Functional Genomics Programme of the Research Council of Norway.
Conflict of interest statement . None declared.
REFERENCES
1. DNA binding sites: representation and discovery, Bioinformatics, 2000, vol. 16 (pg. 16–23)
2. Applied bioinformatics for the identification of regulatory elements, Nat. Rev. Genet., 2004, vol. 5 (pg. 276–287)
3. JASPAR: an open-access database for eukaryotic transcription factor binding profiles, Nucleic Acids Res., 2004, vol. 32 (pg. D91–D94)
4. Identification of conserved regulatory elements by comparative genome analysis, J. Biol., 2003, vol. 2 pg. 13
5. et al. Sockeye: a 3D environment for comparative genomics, Genome Res., 2004, vol. 14 (pg. 956–962)
6. Detection of functional DNA motifs via statistical over-representation, Nucleic Acids Res., 2004, vol. 32 (pg. 1372–1381)
7. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes, Nucleic Acids Res., 2005, vol. 33 (pg. 3154–3164)
8. MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes, BMC Bioinformatics, 2005, vol. 6 pg. 79
9. TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis, Nucleic Acids Res., 2005, vol. 33 (pg. W393–W396)
10. The ENCODE Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project, Nature, 2007, vol. 447 (pg. 799–816)
11. et al. The transcriptional landscape of the mammalian genome, Science, 2005, vol. 309 (pg. 1559–1563)
12. Entrez Gene: gene-centered information at NCBI, Nucleic Acids Res., 2007, vol. 35 (pg. D26–D31)
13. New problems in RNA polymerase II transcription initiation: matching the diversity of core promoters with a variety of promoter recognition factors, J. Biol. Chem., 2007, vol. 282 (pg. 14685–14689)
14. Mammalian RNA polymerase II core promoters: insights from genome-wide studies, Nat. Rev. Genet., 2007, vol. 8 (pg. 424–436)
15. The RNA polymerase II core promoter, Annu. Rev. Biochem., 2003, vol. 72 (pg. 449–479)
16. et al. A global genomic transcriptional code associated with CNS-expressed genes, Exp. Cell Res., 2006, vol. 312 (pg. 3108–3119)
17. et al. In vivo enhancer analysis of human conserved non-coding sequences, Nature, 2006, vol. 444 (pg. 499–502)
18. Predicting tissue-specific enhancers in the human genome, Genome Res., 2007, vol. 17 (pg. 201–211)
19. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites, Proc. Natl Acad. Sci. USA, 2007, vol. 104 (pg. 7145–7150)
20. Information for the coordinates of exons (ICE): a human splice sites database, Genomics, 2004, vol. 84 (pg. 762–766)
21. Triana: a graphical web service composition and execution toolkit, 2004, Proceedings of the IEEE International Conference on Web Services (pg. 514–524)
22. Taverna: a tool for building and running workflows of services, Nucleic Acids Res., 2006, vol. 34 (pg. W729–W732)
23. Web services at the European bioinformatics institute, Nucleic Acids Res., 2007, vol. 35 (pg. W6–W11)
24. EPD in its twentieth year: towards complete promoter coverage of selected model organisms, Nucleic Acids Res., 2006, vol. 34 (pg. D82–D85)
25. Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics, J. Mol. Biol., 2004, vol. 338 (pg. 207–215)
26. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae, J. Mol. Biol., 2000, vol. 296 (pg. 1205–1214)
27. DNA familial binding profiles made easy: comparison of various motif alignment and clustering strategies, PLoS Comput. Biol., 2007, vol. 3 pg. e61
28. STAMP: a web tool for exploring DNA-binding motif similarities, Nucleic Acids Res., 2007, vol. 35 (pg. W253–W258)
29. Searching databases of conserved sequence regions by aligning protein multiple-alignments, Nucleic Acids Res., 1996, vol. 24 (pg. 3836–3845)
30. Similarity of position frequency matrices for transcription factor binding sites, Bioinformatics, 2005, vol. 21 (pg. 307–313)
31. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities, Nat. Biotechnol., 2006, vol. 24 (pg. 1429–1435)
32. ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation, Bioinformatics, 2006, vol. 22 (pg. 637–640)
33. PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation, Genome Biol., 2007, vol. 8 pg. R10
34. GenBank, Nucleic Acids Res., 2007, vol. 35 (pg. D21–D25)
35. et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data, Nat. Genet., 2001, vol. 29 (pg. 365–371)
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.