CircaDB: a database of mammalian circadian gene expression profiles

Abstract

CircaDB (http://circadb.org) is a new database of circadian transcriptional profiles from time course expression experiments from mice and humans. Each transcript’s expression was evaluated by three separate algorithms: JTK_CYCLE, Lomb Scargle and de Lichtenberg. Users can query the gene annotations using simple and powerful full text search terms, restrict results to specific data sets and provide probability thresholds for each algorithm. The data are visualized as intuitive charts that convey profile information more effectively than a table of probabilities. The CircaDB web application is open source and available at http://github.com/itmat/circadb.

INTRODUCTION

Circadian rhythms are biological rhythms of ∼24 h in many physiological and behavioral processes (1,2). These rhythms are generated by a cell-autonomous circadian clock, present in most mammalian cells. This clock is composed of interlocked transcriptional/translational feedback loops, in which transactivators activate repressors that later feed back on the activators (3). Components of the required E-box loop include the bHLH-PAS transactivators Bmal1, Bmal2, Clock and Npas2; the PAS domain-containing repressors Per1, Per2 and Per3; and Cry1 and Cry2 (4), transcriptional repressors related to cryptochromes from plants and insects. An important secondary loop also exists, the ROR loop, which comprises the transcriptional repressors Rev-erb-alpha and Rev-erb-beta as well as the transcriptional activators Rorα, Rorβ and Rorγ (5–7). Factors in this loop regulate transcript levels of several of the E-box components including Bmal1, Cry1, Npas2 and Per2. The cAMP Responsive Element Binding Protein (CREB) pathway (8,9) and the D-box binding factors Dbp, Hlf, Tef and Nfil3 also regulate clock function (10,11). Thus, transcription factors play a major role in the functioning of the core clock.

In addition to regulating transcription of each other, clock factors also impart circadian rhythms in expression of many ‘output’ genes. First-order clock-control genes are those directly regulated by clock factors (e.g. Clock/Bmal1), while second-order output genes could be regulated by a first-order clock-control gene, but not by clock components (12–14). Because of this, the research community has spent more than a decade cataloging genes under clock control (12,13,15–17). These include many disease genes, drug targets and important components of various biological pathways (1,18–20). For example, HMG-CoA reductase, the rate-limiting enzyme of cholesterol biosynthesis and target of statins, is under clock control in liver (21). Several factors have catalysed a more complete description of circadian rhythms, including the advent of DNA arrays (16) and now RNA sequencing (22), powerful statistical approaches to find rhythmic genes (23) and appropriate experimental design.

The goal of CircaDB is to systematically collect, analyse and visualize circadian expression profiles for bench researchers in a simple and straightforward fashion. CircaDB supports both direct lookups of expression profiles and compound queries that search keywords in the gene annotation across multiple tissues, with the ability to restrict results by probability of cycling.

MATERIALS AND METHODS

Various publicly available microarray time course studies (23–26) were collected (Table 1). References and links to download the expression data sets are outlined on the website. Data from each study were re-analysed using three circadian rhythm detection algorithms: JTK_CYCLE, Lomb Scargle and de Lichtenberg (23,27,28). Table 2 lists the runtime parameters of the algorithms on each data set. The reported expression values from each study were not filtered, as each algorithm accounts for technical replicates. The significance calls and other results reported by each algorithm were entered into a MySQL database.

Table 1.

Expression data sets in CircaDB

Name | Time points | Species/tissue
Panda 2002 | 12 | Mouse suprachiasmatic nuclei (SCN) of the hypothalamus, and liver
Hughes 2009 | 48 | Mouse liver, NIH3T3 cells, pituitary gland and human U2OS cells
Miller 2007 and Andrews 2010 | 12 (WT) | Wild type mouse liver, SCN and skeletal muscle
Miller 2007 and Andrews 2010 | 7 (KO) | Clock mutant mouse liver, SCN and skeletal muscle
Rudic 2004 | 12 | Mouse aorta, kidney

Table 2.

Runtime parameters for each data set and algorithm

Data set | JTK_CYCLE | Lomb Scargle | De Lichtenberg
Panda 2002 | Periods: 16–32 h | minFrequency = 1/32, maxFrequency = 1/18 (periods = 18–32 h); #test frequencies: 4*N | Period = 24 h; #Permutations = 10 000
Hughes 2009 (mouse) | Periods: 6–42 h | minFrequency = 1/6, maxFrequency = 1/42 (periods = 6–42 h); #test frequencies: 4*N | Period = 24 h; #Permutations = 10 000
Hughes 2009 (human) | Periods: 6–42 h | minFrequency = 1/6, maxFrequency = 1/42 (periods = 6–42 h); #test frequencies: 4*N | Period = 24 h; #Permutations = 10 000
Miller 2007 | Periods: 16–32 h | minFrequency = 1/32, maxFrequency = 1/18 (periods = 18–32 h); #test frequencies: 4*N | Period = 24 h; #Permutations = 10 000
Andrews 2010 | Periods: 20–28 h | minFrequency = 1/6, maxFrequency = 1/42 (periods = 6–42 h); #test frequencies: 4*N | Period = 24 h; #Permutations = 10 000
Rudic 2004 | Periods: 16–32 h | minFrequency = 1/32, maxFrequency = 1/18 (periods = 18–32 h); #test frequencies: 4*N | Period = 24 h; #Permutations = 10 000
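To make the Lomb Scargle settings in Table 2 concrete, the following is a minimal sketch of how a period range and a count of 4*N test frequencies could translate into a periodogram scan. It is not the implementation used for CircaDB (27): it uses the classical normalized Lomb Scargle formula, assumes that N is the number of time points, and assumes an evenly spaced frequency grid. The function names and the synthetic 24 h cosine are hypothetical and serve only as illustration.

```python
import math

def frequency_grid(min_period, max_period, n_points, oversample=4):
    """Evenly spaced test frequencies (cycles per hour) spanning the period range.

    Assumption: the '4*N' in Table 2 means oversample * number of time points.
    """
    f_lo, f_hi = 1.0 / max_period, 1.0 / min_period
    n_freq = oversample * n_points
    step = (f_hi - f_lo) / (n_freq - 1)
    return [f_lo + i * step for i in range(n_freq)]

def lomb_scargle(times, values, freqs):
    """Classical normalized Lomb Scargle power at each test frequency."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    if var == 0:
        return [0.0] * len(freqs)  # a flat profile has no periodic power
    centered = [v - mean for v in values]
    power = []
    for f in freqs:
        w = 2 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        cc = sum(x * x for x in c)
        ss = sum(x * x for x in s)
        yc = sum(y * x for y, x in zip(centered, c))
        ys = sum(y * x for y, x in zip(centered, s))
        total = 0.0
        if cc > 1e-12:
            total += yc * yc / cc
        if ss > 1e-12:
            total += ys * ys / ss
        power.append(total / (2 * var))
    return power

# Synthetic example: a noise-free 24 h cosine sampled hourly for 48 h.
demo_times = list(range(48))
demo_values = [math.cos(2 * math.pi * t / 24) for t in demo_times]
demo_freqs = frequency_grid(18, 32, len(demo_times))
demo_power = lomb_scargle(demo_times, demo_values, demo_freqs)
best_period = 1.0 / demo_freqs[demo_power.index(max(demo_power))]
print(f"strongest period: {best_period:.2f} h")
```

In this sketch a narrower period range concentrates the same number of test frequencies on a smaller interval, which is one reason the range is reported per data set.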

Gene annotation data were downloaded from the Affymetrix NetAffx resource (http://www.affymetrix.com/analysis/index.affx). Annotations were then entered into the database alongside the unfiltered experimental values and the results of the circadian rhythm detection algorithms. Transcript information was supplemented with links to the GeneWiki project (29,30) and Homologene (http://www.ncbi.nlm.nih.gov/homologene). The data model for the database is described in Figure 1.

Figure 1.

The database schema. Boxes represent tables, and edges represent foreign key relationships. Further documentation is available at http://github.com/itmat/circadb.
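As an illustration of the kind of layout such a schema implies, the sketch below builds a small relational model in which per-algorithm statistics reference both a probe annotation and a data set through foreign keys. It is an assumption-laden stand-in: the table and column names are hypothetical and are not taken from the CircaDB schema, and SQLite is used in place of MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical three-table layout; names are illustrative, not CircaDB's schema.
SCHEMA = """
CREATE TABLE datasets (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE probes (
    id          INTEGER PRIMARY KEY,
    probe_name  TEXT NOT NULL,
    gene_symbol TEXT,
    description TEXT
);
CREATE TABLE probe_stats (
    id         INTEGER PRIMARY KEY,
    probe_id   INTEGER NOT NULL REFERENCES probes(id),
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    algorithm  TEXT NOT NULL,
    p_value    REAL,
    period     REAL,
    phase      REAL
);
"""

def build_demo_db():
    """Create an in-memory database with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def significant_probes(conn, algorithm, max_p):
    """Return (gene_symbol, data set name, p_value) rows at or below max_p."""
    return conn.execute(
        """SELECT p.gene_symbol, d.name, s.p_value
           FROM probe_stats s
           JOIN probes p ON p.id = s.probe_id
           JOIN datasets d ON d.id = s.dataset_id
           WHERE s.algorithm = ? AND s.p_value <= ?
           ORDER BY s.p_value""",
        (algorithm, max_p),
    ).fetchall()
```

Keeping one statistics row per probe, data set and algorithm is one way to let a single query compare algorithms or restrict results to a data set.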

The transcript annotation and the statistical results were indexed with the Sphinx full text search system (http://sphinxsearch.com/). Data visualizations are created using pre-formatted URI requests to the Google Charts API (https://developers.google.com/chart/). The web application was coded using the Ruby on Rails framework (http://rubyonrails.org/).
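The idea of a pre-formatted URI request can be sketched as follows: the expression values of one profile are encoded into the query string of a chart URL, and the chart service returns an image. This is only an illustration. The endpoint and the parameters used here (cht, chs, chd, chds, chxt, chtt) belong to the legacy Google Image Charts interface, and it is an assumption that they resemble what CircaDB sends; the function name and example values are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the legacy Google Image Charts service.
CHART_ENDPOINT = "https://chart.googleapis.com/chart"

def expression_chart_url(values, title, width=400, height=200):
    """Build a line chart URL for one expression profile."""
    lo, hi = min(values), max(values)
    params = {
        "cht": "lc",                                          # line chart
        "chs": f"{width}x{height}",                           # size in pixels
        "chd": "t:" + ",".join(f"{v:.1f}" for v in values),   # data, text format
        "chds": f"{lo:.1f},{hi:.1f}",                         # scale to data range
        "chxt": "x,y",                                        # show both axes
        "chtt": title,                                        # chart title
    }
    return CHART_ENDPOINT + "?" + urlencode(params)

# Made-up intensities for a single probe, for illustration only.
print(expression_chart_url([1.0, 2.0, 3.0, 2.0, 1.0], "demo probe"))
```

Because the whole chart is described by the URL, the web application only has to emit an image link and does not need to render graphics itself.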

All source code for data loading and the web application is licensed under the GNU General Public License (GPL-2.0) and available at http://github.com/itmat/circadb.

RESULTS AND DISCUSSION

In creating CircaDB, we have provided the research community with a clear, concise and powerful interface for querying genes within the context of circadian expression profile data. Another circadian expression database, Diurnal 2.0 (31), provides a similar resource but focuses on plant data. It also restricts its initial search to transcript accessions, whereas CircaDB provides advanced keyword search of the gene annotation, including the ability to search by phrases, boolean conditions and combinations thereof. Queries can also be restricted by a given experiment’s data set, phase of expression and significance under a particular algorithm (Figure 2).

Figure 2.

(a) The query interface for CircaDB. The interface consists of a simple and powerful full-text search capability, with possible restrictions on the data sets, phase information and a significance threshold for a given algorithm. (b) The set of available threshold categories for the circadian classification algorithms.
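The combination of restrictions offered by the query interface can be sketched as a simple filter over profile records. This is a simplified stand-in and not the Sphinx-backed search used by CircaDB: keyword matching is reduced to a case-insensitive test that every term occurs in the annotation, and the record fields, function name and demo values are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Profile:
    symbol: str
    description: str
    dataset: str
    phase: float                                   # peak time in hours
    p_values: dict = field(default_factory=dict)   # algorithm name -> p-value

def search(profiles, terms=(), datasets=None, algorithm=None,
           max_p=None, phase_range=None):
    """Return profiles that satisfy every restriction that was supplied."""
    hits = []
    for p in profiles:
        text = f"{p.symbol} {p.description}".lower()
        if not all(t.lower() in text for t in terms):
            continue
        if datasets is not None and p.dataset not in datasets:
            continue
        if algorithm is not None and max_p is not None:
            # A profile with no result for the algorithm is treated as not significant.
            if p.p_values.get(algorithm, 1.0) > max_p:
                continue
        if phase_range is not None:
            lo, hi = phase_range
            if not lo <= p.phase <= hi:
                continue
        hits.append(p)
    return hits

# Made-up records for illustration; these are not values from CircaDB.
DEMO = [
    Profile("GeneA", "cholesterol biosynthesis enzyme", "liver demo", 10.0, {"JTK_CYCLE": 0.001}),
    Profile("GeneB", "cholesterol transport", "liver demo", 22.0, {"JTK_CYCLE": 0.2}),
    Profile("GeneC", "cholesterol biosynthesis enzyme", "muscle demo", 9.0, {"JTK_CYCLE": 0.001}),
]
liver_hits = search(DEMO, terms=["cholesterol"], datasets={"liver demo"},
                    algorithm="JTK_CYCLE", max_p=0.05)
print([p.symbol for p in liver_hits])
```

Each restriction is optional, so the same function covers a plain keyword lookup as well as a compound query across data sets, phase and significance.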

The Database of Circadian Gene Expression (24), part of the Gene Atlas Project (32), contains a subset of the data sets in CircaDB, but uses a single circadian expression algorithm. CircaDB contains all of these data, re-analysed with a newer and more robust set of algorithms (23,27,28). Three algorithms were used to allow for the inspection of the differences between each algorithm’s results (Figure 3). CircaDB is actively maintained, and new features and data sets will continue to be added as they become available. Requests for integration of data sets are handled through the project site at GitHub. CircaDB also provides expression profiles for integration within BioGPS (33).

Figure 3.

Expression profile report. A simple visualization of the data accompanies the main annotation of the gene probe, probability values from various circadian rhythm detection algorithms and other circadian information.

Finally, to facilitate use of this database framework by other research groups, we have made the source code for the application freely available under the GPL 2.0 open source license. The project has recently been used to visualize circadian experiments for Anopheles gambiae (34). Together, these features make CircaDB a unique and valuable resource for the circadian research community.

FUNDING

The National Institutes of Health, the National Center for Advancing Translational Sciences [8UL1TR000003] (to Garret FitzGerald, University of Pennsylvania); National Heart, Lung, and Blood Institute [1R01HL097800-04 to J.B.H.]; the Defense Advanced Research Projects Agency [BAA-11-65] (to John Harer, Duke University). Funding for open access charge: Departmental Funds.

Conflict of interest statement. None declared.

REFERENCES