ORegAnno: an open-access community-driven resource for regulatory annotation

Journal Article

1 Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada, 2 Wellcome Trust Sanger Institute, CB10 1SA Hinxton, UK, 3 VIB Department of Molecular and Developmental Genetics, Katholieke Universiteit Leuven, 3000 Leuven, Belgium, 4 Department of Computational Biology, School of Medicine, 3501 Fifth Avenue, University of Pittsburgh, Pittsburgh, PA 15213, USA, 5 DEPSN, Institut Alfred Fessard, CNRS, 91198 Gif-sur-Yvette, France, 6 New York State Center of Excellence in Bioinformatics and the Life Sciences, Buffalo, NY 14203, 7 Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA, 8 VIB Department for Molecular Biomedical Research, Ghent University, 9052 Ghent, Belgium, 9 Bioinformatics and Genomics Program, Centre de Regulació Genòmica. Dr Aiguader 88, 08003 Barcelona, Catalonia, Spain, 10 Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada, 11 Faculty of Life Sciences, University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK and 12 Department of genetics and pathology, Uppsala University, SE-75185 Uppsala, Sweden

*To whom correspondence should be addressed. Tel: +1 604 707 5900 x. 5401; Fax: +1 604 876 3561; Email: obig@bcgsc.ca. Correspondence may also be addressed to Stephen Montgomery. Tel: +44 1223 834244 (ext 7297); Fax: +44 1223 494919; Email: sm8@sanger.ac.uk; Steven J.M. Jones. Tel: +1 604 877 6083; Fax: +1 604 876 3561; Email: sjones@bcgsc.ca

† The complete list of The Open Regulatory Annotation Consortium members has been listed at the end of the article.

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Received: 15 September 2007; Revision received: 16 October 2007; Accepted: 17 October 2007; Published: 15 November 2007

Obi L. Griffith, Stephen B. Montgomery, Bridget Bernier, Bryan Chu, Katayoon Kasaian, Stein Aerts, Shaun Mahony, Monica C. Sleumer, Mikhail Bilenky, Maximilian Haeussler, Malachi Griffith, Steven M. Gallo, Belinda Giardine, Bart Hooghe, Peter Van Loo, Enrique Blanco, Amy Ticoll, Stuart Lithwick, Elodie Portales-Casamar, Ian J. Donaldson, Gordon Robertson, Claes Wadelius, Pieter De Bleser, Dominique Vlieghe, Marc S. Halfon, Wyeth Wasserman, Ross Hardison, Casey M. Bergman, Steven J.M. Jones, The Open Regulatory Annotation Consortium, ORegAnno: an open-access community-driven resource for regulatory annotation, Nucleic Acids Research, Volume 36, Issue suppl_1, 1 January 2008, Pages D107–D113, https://doi.org/10.1093/nar/gkm967
Abstract

ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the ‘publication queue’ allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54 351 identified by text-mining methods. Users can enter or ‘check out’ papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services at: http://www.oreganno.org .

BACKGROUND

A consequence of the escalating pace of genomic sequencing has been the requirement for novel methodology and large-scale efforts to interpret and annotate sequence function. Initial efforts were primarily focused on identifying protein-coding genes, RNA genes and repetitive DNA, since the rules governing their presence are generally tractable. Gene regulatory sequences remain less well annotated because of their small size and variability, yet they are widely regarded as at least as important to our understanding of biological systems. To aid in their identification, computational techniques such as phylogenetic footprinting, transcription factor (TF)-binding matrices and motif clustering have been developed (1–3). Unfortunately, the predictive ability of such methods has been difficult to assess without large, well-described and comprehensive collections of biologically validated regulatory sequences (3). Sets of cis-regulatory sequences have been annotated by curation from the primary literature, and several databases have been developed to collect and disseminate these sets (4–11). However, these databases are often species- or process-specific, do not provide sufficient details about the experiments or conditions under which function was demonstrated, and in some cases require payment for access. Data access is generally limited to web-based search pages without any option for the programmatic interaction essential to most bioinformatics studies. Finally, they are typically ‘closed systems’ in that they do not allow continued addition or annotation by the research community, and as such are not maintainable over the long term without vast resources. We have developed the Open Regulatory Annotation database (ORegAnno) to overcome these challenges and support research in regulatory biology (12). ORegAnno provides standardized technologies for the long-term, community-driven, open-access curation of cis-regulatory data. Here we provide an update of developments on the ORegAnno database and progress in the field of open regulatory annotation.

OVERVIEW

ORegAnno (http://www.oreganno.org) is a database and literature curation system for community-based annotation of experimentally proven DNA regulatory regions, transcription factor binding sites (TFBS) and regulatory variants. A ‘publication queue’ allows papers of interest to be added to the system for future curation. Thus both regulatory papers and their regulatory sequences are managed in the system. ORegAnno is based on open-source technology and comprises a MySQL database with a Java-based web application that indexes new annotations using the Lucene search engine (http://lucene.apache.org/) and provides programmatic access to the underlying data using Hibernate (http://www.hibernate.org/) and SOAP Web Services. Figure 1 outlines the annotation process and information flow. Users in the gene regulation community can enter or ‘check out’ papers from the publication queue for detailed manual curation, using a series of annotation pages. A typical record entry consists of species, sequence type, sequence (plus sufficient flanking sequence for genome alignment), target gene, binding factor, experimental outcome and one or more detailed lines of experimental evidence demonstrating function of the sequence. Records are cross-referenced to Ensembl or Entrez Gene identifiers, PubMed and dbSNP (for regulatory polymorphisms). Before committing a record to the database, ORegAnno performs a number of error checks (e.g. that the sequence has not been entered previously) and asks the user to verify its contents. A BLAST-based mapping agent then assigns genome coordinates to each sequence, allowing it to be viewed as a track in the Ensembl or UCSC genome browsers. Once finished with a paper, a user ‘closes’ it in the queue and assigns an annotation result (success, neutral or failure). Existing records can be updated, commented on and scored (positive if verified as correct; negative if a problem is identified) by any registered user, or deprecated and replaced by a ‘Validator’ user. The complete database or any subset can be searched or downloaded in a number of formats or accessed programmatically.
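The record fields and the pre-commit duplicate check described above can be sketched in Python. This is a minimal illustration only, not ORegAnno's actual schema or code: the class `RegulatoryRecord`, its field names and the function `is_duplicate` are hypothetical, and the duplicate test (same species and same whitespace- and case-normalized sequence) is an assumed simplification of whatever checks the system really performs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegulatoryRecord:
    """Hypothetical container mirroring the fields of a typical record entry."""
    species: str
    sequence_type: str      # e.g. "TFBS", "regulatory region", "regulatory polymorphism"
    sequence: str           # the regulatory sequence itself
    target_gene: str        # e.g. an Ensembl or Entrez Gene identifier
    binding_factor: str
    outcome: str            # "positive", "neutral" or "negative"
    evidence: List[str] = field(default_factory=list)  # one or more lines of evidence


def normalize(seq: str) -> str:
    """Upper-case and strip whitespace so trivially different inputs compare equal."""
    return "".join(seq.split()).upper()


def is_duplicate(new: RegulatoryRecord, existing: List[RegulatoryRecord]) -> bool:
    """Assumed check: same species and same normalized sequence already present."""
    key = (new.species, normalize(new.sequence))
    return any((r.species, normalize(r.sequence)) == key for r in existing)
```

A curation front end built along these lines would call `is_duplicate` before asking the user to verify and commit the record.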

Figure 1.

Information flow for ORegAnno annotation process. ( A ) Data input. A publication queue allows papers from scientific literature to be added to the system for future curation. Users in the gene regulation community can enter or ‘check out’ papers from the queue for detailed manual curation using a series of user-friendly annotation pages. It is also possible to ‘batch upload’ complete datasets (e.g. external databases) using the ORegAnno XML data exchange format. ( B ) Data storage and processing. All functionality of the ORegAnno web application depends on storage and retrieval of data from an underlying MySQL relational database. Records are cross-referenced to PubMed, Entrez, Ensembl, dbSNP and eVOC where appropriate. A BLAST-based mapping agent assigns genome coordinates to each sequence. ( C ) Visualization. All mapped ORegAnno records can be viewed as custom tracks in the Ensembl or UCSC genome browsers. Most records are also available as official tracks in UCSC. ( D ) Data access. The web application provides an advanced search page for the entire record set. Each record page represents a complete summary of the data for a verified regulatory sequence. Nightly data dumps are posted in XML format. Programmatic interaction with ORegAnno is available through web services using the Perl SOAP modules.

RECENT DEVELOPMENTS

New entries

Since ORegAnno was first released, the collection has grown ∼10-fold from 2691 to 30 145 records. This total includes 15 738 regulatory regions, 14 229 TFBSs and 178 regulatory variants (polymorphisms and haplotypes) from 19 species (Table 1). A total of 29 433 records have been mapped to one of 14 species, representing a mapping success rate of ∼98%. New additions were incorporated from external datasets including a large set of human promoters (13), the REDfly resource (9), HBB and erythroid modules (14, 15), the Vista Enhancer dataset (11), ChIP–chip sites for CTCF (16) and multiple yeast TFs (17, 18), and ChIP-Seq sites for STAT1 (19) and REST (20). Apart from the 11 external datasets currently in ORegAnno, extensive manual curation of the literature has produced an additional 1293 original sequence records. A large number of annotations were entered during the RegCreative Jamboree (http://www.dmbr.ugent.be/bioit/contents/regcreative/), at which 130 scientific articles were examined in depth, with 96 papers meeting the criteria for annotation and resulting in 501 new regulatory sequence records. In total, 922 publications have been curated by 45 contributing users (from >300 registered users). The complete set of records contains regulatory sequences for over 3853 genes and 465 TFs, describes 41 856 experimental sources of evidence referencing 31 different cell types and is further annotated by 49 807 user comments. The majority of records (98.9%) had positive experimental outcomes (i.e. the experiments demonstrated the sequence to be functional), but a small set of negative or neutral results has also been catalogued.
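The headline counts in this paragraph can be cross-checked with a few lines of arithmetic. The short Python sketch below uses only numbers quoted in the text; the variable names are ours. It confirms that the three record types sum to the stated total, and it computes the mapping rate and growth factor that the text rounds to ∼98% and ∼10-fold.

```python
# Counts quoted in the text (thin spaces in the numbers removed).
regions, tfbs, variants = 15_738, 14_229, 178
total_records = 30_145
mapped = 29_433
initial_records = 2_691

# The three record types should account for every record.
assert regions + tfbs + variants == total_records

mapping_rate = mapped / total_records          # fraction of records with genome coordinates
fold_growth = total_records / initial_records  # growth since the first release

print(f"mapping success: {mapping_rate:.1%}")
print(f"growth: {fold_growth:.1f}-fold")
```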

Table 1.

Current content of ORegAnno database

Species	Regulatory haplotype	Regulatory polymorphism	Regulatory region	Transcription factor binding site	Totals
Bos taurus	1	1
Caenorhabditis briggsae	21	21
Caenorhabditis elegans	13	194	207
Ciona intestinalis	7	17	24
Ciona savignyi	1	1	2
Cricetinae	3	3
Danio rerio	2	2	4
Drosophila melanogaster	680	1415	2095
Gallus gallus	8	29	37
Halocynthia roretzi	6	6
Homo sapiens	6	171	14 948	7834	22 959
HIV 1	2	2
Mus musculus	1	55	215	271
Oryctolagus cuniculus	1	1
Rattus norvegicus	15	99	114
Saccharomyces cerevisiae	1	4392	4393
Takifugu rubripes	2	2
Xenopus laevis	1	1	2
Xenopus tropicalis	1	1
Totals (19 species)	7	171	15 738	14 229	30 145

Recent applications

The ORegAnno resource has proven useful for the development of both computational and experimental methods for the identification of novel TFBSs and regulatory polymorphisms. One such approach, called cisRED (http://www.cisred.org), uses multiple motif discovery methods applied to sequence sets that include up to 42 orthologous sequence regions from vertebrates (21). The collection of known binding sites in ORegAnno has proven an invaluable resource for parameter optimization and for estimating the accuracy of cisRED. In another study, the set of known regulatory SNPs (rSNPs) in ORegAnno was used to investigate and prioritize various properties that may be important for identifying novel regulatory polymorphisms (22). The discriminatory potential of 23 properties related to gene regulation and population genetics was assessed by comparing these known rSNPs to a set of SNPs of unknown function (ufSNPs). A support vector machine classifier using these properties was able to discriminate rSNPs from ufSNPs with a sensitivity and specificity of 82% and 71%, respectively (22). Finally, ORegAnno has also served a critical role in the development of new experimental approaches such as ChIP-Seq. ChIP-Seq is similar to the well-described ChIP–chip method (23) except that DNA fragments isolated from the protein–DNA complex are identified by DNA sequencing instead of hybridization to a tiling microarray. The approach was first demonstrated for the STAT1 TF in interferon-γ-stimulated HeLa S3 cells (19). A set of 41 experimentally verified sites representing 34 genomic loci for STAT1 binding was first collected from the literature and entered into ORegAnno (ORegAnno dataset: OREGDS00006). Stimulated ChIP-Seq peaks were found to overlap 24 of these 34 loci, suggesting a sensitivity of ∼71%. For the ORegAnno STAT1 sites shown to be functional in HeLa cells specifically, sensitivity was 100%. The collection of known STAT1 sites and binding matrices derived from them also allowed a set of high-confidence novel STAT1-binding sites to be determined and entered into ORegAnno as their own dataset (OREGDS00007). This iterative process, whereby existing data drive the creation of new data, demonstrates the utility and flexibility of the ORegAnno system.
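The sensitivity estimate quoted for ChIP-Seq (24 of 34 known loci overlapped by a peak) is a simple interval-overlap calculation. The Python sketch below shows one way such a figure could be computed; it is not the analysis code used in the STAT1 study. The function names, the half-open coordinate convention and the toy coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def sensitivity(known: List[Interval], peaks: List[Interval]) -> float:
    """Fraction of known loci overlapped by at least one peak."""
    if not known:
        raise ValueError("no known loci supplied")
    hits = sum(1 for locus in known if any(overlaps(locus, p) for p in peaks))
    return hits / len(known)


# Toy coordinates, invented for illustration only.
known_loci = [("chr1", 100, 150), ("chr1", 900, 950), ("chr2", 10, 60)]
chip_peaks = [("chr1", 120, 400), ("chr2", 55, 300)]
print(sensitivity(known_loci, chip_peaks))  # two of the three toy loci are overlapped

# The figure reported in the text: 24 of 34 loci.
print(round(24 / 34, 3))
```

The nested scan is quadratic in the number of intervals, which is fine for a few dozen curated loci; a genome-wide comparison would normally sort the intervals or use an interval tree.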

Publication queue

An important new feature of ORegAnno called the ‘publication queue’ was created as a literature management system to allow registered users to input relevant papers from the scientific literature as targets for annotation. All that is required to enter a publication is a valid PubMed identifier. Optionally, a TF can be specified, allowing users to later search the queue for papers related to TFs of interest. Normally, publications are added to the queue with an entry type of ‘expert entry’, indicating that a human expert reviewed the paper and found it to be relevant. However, it is also possible to enter ‘text-mining entry’ papers (see below). A publication enters the queue with an initial state of ‘pending’. Any user can then ‘open’ the publication and begin the annotation process. Once annotated, the paper is either ‘closed’ or reset to ‘pending’ if annotation work remains. Free-form comment fields are optional for each change of state. However, when a publication is closed, one of several standardized closure comments must be chosen (success – addition of new records, failure – did not describe regulatory element, etc.). These allow the overall success rate and failure causes to be tracked. The queue can be queried on a number of fields including user, PubMed id, title, abstract, author, publication date and journal. Search results can be optionally filtered by state (pending, open or closed), TF, entry type (expert or text mining) or text-mining score. Each queue record contains a history of all state changes and comments as well as links to the publication's PubMed abstract. The current set of ‘expert entry’ papers in the queue was obtained from existing sources of curated publications, including the Drosophila DNase I Footprint Database (8), REDfly (9), a catalog of regulatory elements for muscle-specific regulation of transcription (24, 25), ABS (4), TRED (7), ooTFD (26) and DBTGR (10), or was added manually by individual ORegAnno users from literature searches and review articles. The expert entry queue currently contains 4438 gene regulation papers, of which 3478 are open or pending and 960 are closed.
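The queue life cycle described above (pending, open, then closed or back to pending, with a mandatory standardized comment on closure) behaves like a small state machine. The following Python sketch models it under stated assumptions: the class and attribute names are hypothetical, only the two closure comments named in the text are included, and treating ‘closed’ as a terminal state is our assumption, since the text does not say whether closed papers can be reopened.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Allowed transitions as described in the text ("closed" as terminal is assumed).
TRANSITIONS = {
    "pending": {"open"},
    "open": {"closed", "pending"},
    "closed": set(),
}

# Only two closure comments are named in the text; the real list is longer.
CLOSURE_COMMENTS = {
    "success - addition of new records",
    "failure - did not describe regulatory element",
}


@dataclass
class QueueEntry:
    pubmed_id: str
    entry_type: str = "expert entry"  # or "text-mining entry"
    state: str = "pending"
    # Each history item is (old state, new state, optional comment).
    history: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)

    def change_state(self, new_state: str, comment: Optional[str] = None) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state!r} to {new_state!r}")
        if new_state == "closed" and comment not in CLOSURE_COMMENTS:
            raise ValueError("closing requires a standardized closure comment")
        self.history.append((self.state, new_state, comment))
        self.state = new_state
```

Recording every transition in `history` mirrors the statement that each queue record keeps all state changes and comments, which is what makes success rates and failure causes trackable.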

Development of text-mining strategies and the ‘text-mining queue’

The publication queue represents an unprecedented resource for researchers interested in developing text-mining approaches to identify papers involved in gene regulation and/or extract regulatory data from these papers. We used both the ‘success’ and the ‘failure’ papers from the ‘expert-entry’ queue to validate and compare different vector space models (27) for cis-regulatory document retrieval (Aerts and coworkers, manuscript in preparation). The model with the best performance in terms of sensitivity and specificity was chosen to rank the entire corpus of PubMed abstracts. By manually curating uniformly distributed samples from the top 100 000 scoring abstracts, a cut-off was set at ∼58 000 so that the positive predictive value (PPV) of top-scoring abstracts reached 50%, a success rate similar to that achieved during the RegCreative Jamboree (54%) and judged satisfactory by the Jamboree participants. These 58 000 papers, containing an estimated 29 000 papers that will result in regulatory annotations, have been added to the ORegAnno queue (54 351 new additions after removing duplicates). We estimate that this large cis-regulatory text corpus will require around 15–30 person-years to be fully curated. Therefore, the Open Regulatory Annotation Consortium is actively pursuing research in text-mining techniques to identify the actual cis-regulatory sequences, the species and the target gene automatically from full-text papers. In a pilot study, sequences were extracted from full-text articles for papers in the ORegAnno expert-based queue and the top 4501 papers from the text-mining-based queue. When comparing the automatically extracted data with the collection of manual ORegAnno annotations, this study achieved a reasonably high PPV (62%) at the sequence level, showing that automatic draft annotation of cis-regulatory elements is indeed feasible by text-mining (Aerts and coworkers, manuscript in preparation). Such draft annotations should help accelerate manual curation and can also serve directly as benchmark data to validate cis-element prediction algorithms.
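To make the idea of vector space document retrieval concrete, the sketch below ranks candidate abstracts by cosine similarity between their TF-IDF vectors and the centroid of known relevant abstracts. This is a generic textbook formulation in plain Python, offered as an assumption-laden illustration; it is not one of the specific models compared in the unpublished study cited above, and the example sentences are invented.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lower-case whitespace tokenizer that keeps purely alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha()]


def idf_table(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector for one tokenized document."""
    tf = Counter(doc)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Dict[str, float]]) -> Dict[str, float]:
    """Average of several sparse vectors."""
    total: Dict[str, float] = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


# Invented toy texts standing in for curated 'success' abstracts and unseen candidates.
positives = [
    "promoter deletion reporter assay identifies enhancer binding site",
    "transcription factor binding site mutation abolishes promoter activity",
]
candidates = [
    "enhancer binding site mapped by reporter assay",
    "crystal structure of a membrane transport protein",
]

idf = idf_table([tokenize(t) for t in positives + candidates])
query = centroid([tfidf(tokenize(t), idf) for t in positives])
scores = [cosine(tfidf(tokenize(t), idf), query) for t in candidates]
ranked = sorted(zip(scores, candidates), reverse=True)
```

In a setting like the one described, the ranked list would then be sampled and manually curated to find the score cut-off at which the PPV reaches the desired level.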

Ontologies in ORegAnno

The ORegAnno evidence ontology is a simple ontology of evidence classes, types and subtypes for describing experiments that demonstrate the identity and/or function of regulatory sequences and their factors. These lines of evidence capture critical details from primary experiments and allow end users to filter the ORegAnno sequence set based on their own criteria for experimental credibility. The ontology has been considerably extended since last published, and currently consists of six classes (e.g. Transcription regulator site), 14 evidence types (e.g. Reporter gene assay) and 72 evidence subtypes (e.g. Transient transfection luciferase assay). This ontology has been adopted by the PAZAR resource (28) and is being developed in collaboration with that group using Protégé (http://protege.stanford.edu/). The complete evidence ontology can be obtained in XML format (http://www.oreganno.org/oregano/evidence.xml) or as a Protégé project file (http://www.pazar.info/ontologies/newevidence.pprj). Within each line of evidence, a user can also specify the cell type in which experiments were conducted using the eVOC cell type ontology (29). We are working to incorporate additional ontologies such as the BRENDA Tissue Ontology, and improvements to the Sequence Ontology are currently being developed for the cis-regulatory domain.
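The class, type and subtype levels make it straightforward to filter records by experimental credibility. The Python sketch below illustrates such a filter. The data classes, record identifiers and the second evidence line are hypothetical, and nesting the three example terms from the text under one another is an assumption made for illustration: the text gives them as examples of each level, not as a parent-child chain.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Evidence:
    evidence_class: str
    evidence_type: str
    evidence_subtype: str


@dataclass
class Record:
    record_id: str
    evidence: List[Evidence]


def filter_by_type(records: Iterable[Record], accepted_types: Set[str]) -> List[Record]:
    """Keep records supported by at least one line of evidence of an accepted type."""
    return [
        r for r in records
        if any(e.evidence_type in accepted_types for e in r.evidence)
    ]


# Terms in the first line come from the text; their nesting is assumed.
reporter = Evidence(
    "Transcription regulator site",
    "Reporter gene assay",
    "Transient transfection luciferase assay",
)
# Entirely made-up second line of evidence, for contrast.
other = Evidence("Hypothetical class", "Hypothetical binding assay", "Hypothetical subtype")

records = [Record("REC_A", [reporter]), Record("REC_B", [other])]
kept = filter_by_type(records, {"Reporter gene assay"})
print([r.record_id for r in kept])
```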

Other improvements

The ORegAnno website has been updated to use Ajax technology, improving the ease of annotation. Ajax improves a web page's usability by exchanging small amounts of data with the server behind the scenes, so that the entire web page does not have to be reloaded each time the user requests a change (http://www.xul.fr/en-xml-ajax.html). A detailed case study has been added to the help pages to guide users through the entire process of annotating a paper. Annotation pages have been improved so that individual ‘help bubbles’ are available next to each field. Additional web services methods have been created to allow programmatic access to the publication queue and genome mappings.

DATA ACCESS

The website (http://www.oreganno.org) provides access to an advanced search page for the entire record set, the publication queue, simple tools for scanning or extracting sequences, database dumps and extensive help documentation. Each record page represents a complete summary of the data for a verified regulatory sequence along with links to external sources such as UCSC, Ensembl and PubMed. All data are freely available in a number of formats without any user registration. Users are required to register and log in only if they wish to add records, comments or scores. Nightly data dumps of the database are posted in XML format on the website. Human (hg18) and fly (dm3) records are available through the UCSC genome browser (http://genome.ucsc.edu/) as a standard track under the ‘Expression and Regulation’ tab. Mouse (mm8), worm (ce4) and rat (rn4) records are available through the UCSC ‘genome-test’ browser (http://genome-test.cse.ucsc.edu/). The ORegAnno dataset is also in the process of being incorporated into the PAZAR database (760 records to date). Programmatic interaction with ORegAnno is available through web services using the Perl SOAP modules (see ‘Dump’ page for examples). Requests for the entire database (e.g. a MySQL dump) or other formats can be addressed to the authors. ORegAnno records are typically mapped to only the most current genome build for each species as provided by UCSC (e.g. hg18 for human). However, mapping can easily be performed for any other genome build upon request. A mailing list exists for updates and user assistance (oreganno@bcgsc.ca). The ORegAnno web application is available open-source under the GNU Lesser General Public License at https://oreganno.dev.java.net/.
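For readers who want to view their own subset of mapped records as a browser custom track, one common route is to write the coordinates in BED format. The Python sketch below assumes the coordinates have already been obtained (for example, parsed from a data dump) as 1-based, inclusive positions; the input tuple layout, the function name `to_bed`, the track name and the example coordinates are hypothetical and are not part of ORegAnno's own tooling.

```python
from typing import Iterable, List, Tuple

# Assumed layout: (record id, chromosome, 1-based start, 1-based inclusive end, strand)
Mapped = Tuple[str, str, int, int, str]


def to_bed(records: Iterable[Mapped], track_name: str = "ORegAnno_subset") -> str:
    """Render mapped records as a BED custom track (six columns per feature)."""
    lines: List[str] = [
        f'track name="{track_name}" description="Regulatory annotations"'
    ]
    for rec_id, chrom, start, end, strand in records:
        if start < 1 or end < start:
            raise ValueError(f"bad coordinates for {rec_id}")
        # BED uses a 0-based start and an exclusive end, so only the start shifts.
        lines.append("\t".join([chrom, str(start - 1), str(end), rec_id, "0", strand]))
    return "\n".join(lines) + "\n"


# Invented example record and coordinates.
print(to_bed([("REC_A", "chr2", 101, 110, "+")]), end="")
```

The resulting text can be pasted into a genome browser's custom track form, provided the coordinates refer to the same genome build that the browser is displaying.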

ACKNOWLEDGEMENTS

We thank the Open Regulatory Annotation Consortium for their continuing efforts to improve this resource through manual curation and record validation. We also thank the owners of regulatory sequence databases that made their data available for inclusion in ORegAnno. This work was funded by British Columbia Cancer Foundation; Genome Canada; Genome British Columbia; European Network of Excellence (ENFIN); BioSapiens Network of Excellence; Research Foundation – Flanders (FWO); The Pleiades Promoter Project; Michael Smith Foundation for Health Research to O.L.G., M.C.S., M.G. and S.J.M.J.; Canadian Institutes of Health Research to O.L.G.; European Molecular Biology Laboratory to S.B.M.; Marie Curie Early Stage Research Training Fellowship (MEST-CT-2004-504854) to M.H.; Natural Sciences and Engineering Research Council to S.B.M., and M.G.; Research Foundation – Flanders (FWO) to P.V.L.; Swedish Research Council to C.W. Funding to pay the Open Access publication charges for this article was provided by Genome Canada and Genome British Columbia.

Conflict of interest statement . None declared.

REFERENCES

1. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 2004, pg. 276–287.
2. Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome Res. 2006, pg. 1455–1464.
3. et al. Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 2005, pg. 137–144.
4. ABS: a database of Annotated regulatory Binding Sites from orthologous promoters. Nucleic Acids Res. 2006, pg. D63–D67.
5. A new generation of JASPAR, the open-access repository for transcription factor binding site profiles. Nucleic Acids Res. 2006, pg. D95–D97.
6. et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006, pg. D108–D110.
7. TRED: a transcriptional regulatory element database, new entries and other development. Nucleic Acids Res. 2007, pg. D137–D140.
8. Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics 2005, pg. 1747–1749.
9. REDfly: a regulatory element database for Drosophila. Bioinformatics 2006, pg. 381–383.
10. DBTGR: a database of tunicate promoters and their regulatory elements. Nucleic Acids Res. 2006, pg. D552–D555.
11. VISTA Enhancer Browser–a database of tissue-specific human enhancers. Nucleic Acids Res. 2007, pg. D88–D92.
12. ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics 2006, pg. 637–640.
13. Identification and functional analysis of human transcriptional promoters. Genome Res. 2003, pg. 308–312.
14. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res. 2005, pg. 1051–1060.
15. et al. Experimental validation of predicted mammalian erythroid cis-regulatory modules. Genome Res. 2006, pg. 1480–1492.
16. Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 2007, vol. 128, pg. 1231–1245.
17. et al. Transcriptional regulatory code of a eukaryotic genome. Nature 2004, vol. 431, pg. 104.
18. An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics 2006, pg. 113.
19. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 2007, pg. 651–657.
20. Genome-wide mapping of in vivo protein-DNA interactions. Science 2007, vol. 316, pg. 1497–1502.
21. et al. cisRED: a database system for genome-scale computational discovery of regulatory elements. Nucleic Acids Res. 2006, pg. D68–D73.
22. A survey of genomic properties for the detection of regulatory polymorphisms. PLoS Comput. Biol. 2007, pg. e106.
23. et al. Genome-wide location and function of DNA binding proteins. Science 2000, vol. 290, pg. 2306–2309.
24. Identification of regulatory regions which confer muscle-specific gene expression. J. Mol. Biol. 1998, vol. 278, pg. 167–181.
25. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res. 2005, pg. 3154–3164.
26. Object-oriented transcription factors database (ooTFD). Nucleic Acids Res. 2000, pg. 308–310.
27. TXTGate: profiling gene groups with text-based information. Genome Biol. 2004, pg. R43.
28. PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation. Genome Biol. 2007, pg. R207.
29. et al. eVOC: a controlled vocabulary for unifying gene expression data. Genome Res. 2003, pg. 1222–1230.

THE OPEN REGULATORY ANNOTATION CONSORTIUM MEMBERS

Amy Ticoll, Andy Schroeder, Arun Ramani, Bart Hooghe, Belinda Giardine, Boris Adryan, Bridget Bernier, Casey Bergman, Claes Wadelius, Daniel Sobral, Debra Fulton, Denis Thieffry, Dominique Vlieghe, Elodie Portales-Casamar, Enrique Blanco, Erin D. Pleasance, Florian Leitner, Gordon Robertson, Hedi Peterson, Helge Roider, Ian J. Donaldson, Ildefonso Cases, Jean Imbert, Jean-Valery Turatsinze, Jonathan Mudge, Katayoon Kasaian, Maggie Zhang, Malachi Griffith, Marc Halfon, Maximilian Haeussler, Misha Bilenky, Monica Sleumer, Nathalie Theret, Nikiforos Karamanis, Obi Griffith, Paco Hulpiau, Peter Van Loo, Pieter De Bleser, Priit Adler, Ross Hardison, Shaun Mahony, Stein Aerts, Stephen Montgomery, Steven J.M. Jones, Steven M. Gallo, Wyeth Wasserman, Yves Moreau.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
