Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

Abstract

Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

INTRODUCTION

The Gene Ontology (GO), which was first published in the year 2000 (1), introduced the concept of associating a collection of genes with a functional biological term in a systematic way. GO was needed because methods such as cDNA microarrays that measure mRNA expression at a global genome-wide scale produce lists of differentially expressed genes that are difficult to interpret. The creation of GO enabled the analysis of gene lists in the context of prior knowledge. Early tools such as FatiGO (2), BiNGO (3) and TermFinder (4) first realized this concept. Initially, most enrichment analyses of sets of differentially expressed genes, integrated with prior knowledge, were limited to either GO terms, or gene sets were projected onto known protein–protein interaction networks and signaling pathways. These include, for example, membership of genes in pathway databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) (5). Later on, other types of annotated gene sets for enrichment analysis emerged; for example, chromosome location of genes, computationally predicted targets of microRNAs and transcription factors, and gene modules identified computationally from large collections of gene expression data (6). Subsequently, improved enrichment analysis algorithms (7,8) and enrichment analysis tools (9–13) emerged. Here we present a major update to the enrichment analysis tool Enrichr, which was first published in 2013. Since its initial publication, we have added many new features and data sets to Enrichr. The new gene set libraries include differentially expressed genes after drug, gene, disease and pathogen perturbations extracted from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) through a crowdsourcing project. Furthermore, we have implemented the ability to submit fuzzy sets and upload BED files, a calendar that shows the number of lists submitted each day, an improved application programming interface (API), enhanced help documentation, an improved Find a Gene feature, and visualization of the results as clustergrams. In this manuscript, we also provide updated benchmarking results of the different scoring schemes implemented in Enrichr and visualize the overlap between the data sets currently within Enrichr and those of comparable web-server tools and resources that serve gene set libraries.

ENHANCEMENTS AND UPDATES

New gene set libraries

Since the original publication of Enrichr in 2013 (14), we have systematically added new gene set libraries (Table 1). We created gene set libraries from HumanCyc (15), a metabolic pathway resource stored in BioPAX format (16); gene and small-molecule perturbations from the LINCS L1000 data set; NCI-Nature pathways (17); protein complexes from the NURSA project (18); pathways from the PANTHER resource (19); targets of phosphatases from DEPOD (20); human phenotypes from the Human Phenotype Ontology (HPO) (21); genes associated with grants using NIH RePORTER and GeneRIF (22); transcription factor targets computed from the ChIP-seq data from the ENCODE project (23); differentially expressed genes from the Allen Brain Atlas (24); tissue expression extracted from the Genotype-Tissue Expression (GTEx) project (25); protein expression in tissues and cell types from ProteomicsDB (26) and the Human Proteome Map (HPM) (27); genes associated with cell survival from the Achilles Project (28); and more. More details about constructing these new libraries are available as supporting online materials. These libraries are open source, freely available for download from the libraries page of Enrichr. In the updated version of Enrichr, we added a new category of gene set libraries called ‘Crowd’. These libraries were created by an independent crowdsourcing project where participants extracted gene expression signatures for six specific themes as described below.

Table 1.

Details of the new gene set libraries added to Enrichr since its original publication

PMID stands for PubMed identifiers.

Differentially expressed genes after drug, gene, disease, ligand and pathogen perturbations extracted from GEO by the crowd

To extract gene sets from gene expression data deposited in the GEO (29), we established a crowdsourcing microtask project that asks participants to extract gene sets from GEO for the following categories: (1) single-gene perturbations in mammalian cells; (2) comparison of diseased versus normal tissues; (3) single-drug perturbations in mammalian cells; (4) perturbations applied to MCF7 cells; (5) comparison between young and old mammalian tissues; (6) endogenous ligand perturbations of mammalian cells; and (7) comparison of before and after pathogen infection of human cells. Participants of the microtasks were recruited via two Coursera massive open online courses (MOOCs) and worked voluntarily on finding relevant studies from the GEO database. Participants were instructed to identify control and perturbation samples (GSM files), and to add metadata such as the cell line or tissue used in each study, as well as IDs for genes, diseases and small molecules. Participants were also instructed to use the browser extension GEO2Enrichr (30) to extract differentially expressed gene sets from GEO. The metadata and gene sets were submitted to our crowdsourcing database and then converted to gene set libraries for Enrichr.

To ensure the quality of these crowd-generated gene set libraries, we performed both automatic and manual sanitization. We first programmatically re-processed all the entries submitted by the participants, recalculating the differentially expressed gene sets from the submitted metadata with the Characteristic Direction method (31). Incorrect entries where samples did not belong to the particular study were automatically filtered out. We also automatically filtered out entries with invalid gene symbols and mismatched organisms. Entries from curators who submitted more than 10% invalid entries were removed entirely. Entries that passed these filters were randomly sampled for manual inspection to ensure that the metadata were accurate, for example that the genes listed as perturbed were in fact perturbed in the study, and that control and perturbation samples were correctly selected. As a result, approximately 20% of the submitted entries were removed for each microtask.
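The automatic filtering rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entry fields (`curator`, `genes`, `samples_in_study`, `organism_matches`) and the function name are hypothetical and are not taken from the Enrichr code base; only the 10% threshold comes from the text.

```python
from collections import Counter

def filter_entries(entries, valid_symbols, max_invalid_fraction=0.10):
    """Apply the automatic filters to crowd-submitted entries.

    entries: list of dicts with hypothetical keys
        'curator', 'genes', 'samples_in_study', 'organism_matches'.
    valid_symbols: set of accepted gene symbols.
    """
    def is_valid(entry):
        return (entry["samples_in_study"]
                and entry["organism_matches"]
                and all(g in valid_symbols for g in entry["genes"]))

    total = Counter(e["curator"] for e in entries)
    invalid = Counter(e["curator"] for e in entries if not is_valid(e))

    # Curators with more than the allowed fraction of invalid entries
    # have all of their entries removed.
    dropped = {c for c in total if invalid[c] / total[c] > max_invalid_fraction}

    return [e for e in entries if is_valid(e) and e["curator"] not in dropped]
```

Entries that survive this step would still go through the random manual inspection described in the text, which is not modeled here.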

In addition, to encourage Enrichr users to contribute their own lists to the crowd category, we added a checkbox on the submission page that enables user-submitted lists to be added to a collection that can then be searched by other users. The checkbox is unchecked by default so that users do not expose their lists by accident. So far, ∼600 lists have been contributed by users of Enrichr. In the future, we plan to make these contributed lists available for search by the community.

Benchmarking enrichment methods

To benchmark the enrichment analysis methods implemented within Enrichr, namely the proportion test, the Z-score and the combined score, we compared them with other similar published methods, for example the over-representation analysis (ORA) method (11), and with simple methods such as the Jaccard distance or the number of overlapping genes. For this, we processed 489 experiments from 293 studies available from GEO in which transcription factors (TFs) were genetically perturbed (knockdown, knockout or overexpression). We identified the differentially expressed genes from these studies using the Characteristic Direction (CD) method (31). We then performed enrichment analysis against the ChIP-X enrichment analysis (ChEA) gene set library (32), ranking TFs with the different scoring methods. The hypothesis behind this benchmark is that genes that are differentially expressed after genetic perturbation of a TF are enriched for the targets of that TF as determined by ChIP-seq, regardless of cell type, mammalian organism or microarray platform. We then find the rank of the perturbed TF under each enrichment analysis scoring method and plot the cumulative distributions of these ranks. Our results demonstrate that the combined score and the Z-score methods recover more of the ‘correct’ terms compared with the other methods we tested (Figure 1A). This result is consistent with our results from 2013, presented in the original Enrichr publication.
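The rank-based evaluation described above can be illustrated with a short Python sketch. It assumes that each scoring method returns one score per library term and that higher scores are better; the function names and the binning used to approximate the area are illustrative choices, not the authors' implementation.

```python
def scaled_rank(scores, true_term, higher_is_better=True):
    """Rank of the perturbed TF among all terms, scaled to (0, 1].

    scores: dict mapping term name -> enrichment score for one experiment.
    """
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return (ordered.index(true_term) + 1) / len(ordered)

def deviation_auc(scaled_ranks, n_bins=100):
    """Area between the empirical CDF of the scaled ranks and the uniform CDF.

    A method that ranks the perturbed TF near the top more often than chance
    yields a positive area; random ranking yields an area close to zero.
    """
    n = len(scaled_ranks)
    area = 0.0
    for i in range(1, n_bins + 1):
        x = i / n_bins
        ecdf = sum(r <= x for r in scaled_ranks) / n
        area += (ecdf - x) / n_bins
    return area
```

For example, collecting `scaled_rank` over all experiments for one scoring method and passing the list to `deviation_auc` gives a single number per method that can be compared across methods, in the spirit of the AUC values reported in the figure legend.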

Figure 1.

Benchmarking different enrichment analysis methods. (A) Deviation of the cumulative distribution from uniform of the scaled ranks of TFs derived from different enrichment analysis methods; (B) Comparison between crisp and fuzzy version of the proportion test. The ranking distribution of randomly ordered ChEA terms is plotted in gray dashed line. The area under the curve (AUC) is indicated in the legend as a measure of the degree of deviation from uniform.

Fuzzy enrichment analysis

A fuzzy set is composed of a pair $\{S, m\}$, where S is a set and m is a membership function defined over the members of the set: $m: S \to [0,1]$. For each $x \in S$, the value m(x) is the grade of membership of x, such that if $m(x) = 0$ then x is termed ‘not in the set’ and if $m(x) = 1$ then x is termed ‘completely in the set’; elements with intermediate values of m(x) are considered to have intermediate fuzzy membership. In these terms, the simple gene sets referred to above are called ‘crisp sets’ because all the genes in these sets have a membership value of 1. Another common representation for fuzzy sets is

\begin{equation*}\left\{ m(x_1)/x_1,\ m(x_2)/x_2,\ \ldots \right\}\end{equation*}

To perform enrichment analysis with fuzzy sets, we require the fuzzy equivalent of set intersection. For the fuzzy sets $\{S, m_a\}$ and $\{S, m_b\}$, this is defined such that

\begin{equation*}\{S, m_a\} \cap \{S, m_b\} = \left\{ \min\left(m_a(x_1), m_b(x_1)\right)/x_1,\ \min\left(m_a(x_2), m_b(x_2)\right)/x_2,\ \ldots \right\}\end{equation*}

In addition, we need the cardinality of a fuzzy set, which is defined as the sum of the grades of membership of each element,

\begin{equation*}|S| = \sum_{x \in S} m(x).\end{equation*}

The fuzzy P-value enrichment score can be calculated by decomposing the null distribution into two parts. First, we denote by Z the number of non-zero grades of membership in the fuzzy intersection between two null fuzzy sets: $\{S, m_{a,\,null}\} \cap \{S, m_{b,\,null}\}$. Then Z is a random variable that is distributed by the hypergeometric distribution:

\begin{equation*}P(Z = z) = \frac{\binom{N_a}{z}\binom{|S| - N_a}{N_b - z}}{\binom{|S|}{N_b}}\end{equation*}
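The definitions above translate directly into code. The following Python sketch represents a fuzzy set as a dictionary from gene symbol to grade of membership, with absent genes treated as having grade 0. It assumes that $N_a$ and $N_b$ are the numbers of non-zero grades in the two sets and that `universe_size` plays the role of $|S|$ in the hypergeometric formula; the text does not define these symbols explicitly, so this reading is an assumption, as are the function names.

```python
from math import comb

def fuzzy_intersection(m_a, m_b):
    """Element-wise minimum of two membership functions.

    m_a, m_b: dicts mapping gene -> grade of membership in [0, 1].
    Genes with a resulting grade of 0 are omitted from the result.
    """
    result = {}
    for gene in m_a.keys() & m_b.keys():
        grade = min(m_a[gene], m_b[gene])
        if grade > 0:
            result[gene] = grade
    return result

def fuzzy_cardinality(m):
    """Sum of the grades of membership of all elements."""
    return sum(m.values())

def hypergeom_pmf(z, universe_size, n_a, n_b):
    """P(Z = z) for the number of non-zero grades in a null intersection."""
    if z < 0 or z > min(n_a, n_b):
        return 0.0
    return comb(n_a, z) * comb(universe_size - n_a, n_b - z) / comb(universe_size, n_b)
```

Note that this covers only the first part of the null distribution described in the text, namely the count of non-zero grades in the intersection.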

While intuitively fuzzy enrichment analysis should be more accurate than ‘crisp’ enrichment analysis, because ‘fuzzy’ enrichment considers the ranks and magnitude of genes in both the input set and the library sets, our initial results so far only show a marginal enhancement, utilizing the same TF-centered benchmark presented above (Figure 1B). In the future, we plan to further explore ways to improve the performance of fuzzy set enrichment analysis. It is also important to note that the results of fuzzy set enrichment analysis depend on the scaling method used to convert typical values, for example the level of differential expression, into membership values between 0 and 1. Overall, effective use of fuzzy enrichment analysis currently requires advanced computational expertise. However, in the near future, we plan to make such transformations easier and more transparent.

Uploading BED files

The introduction of ChIP-seq and ChIP-chip technologies enables the detection of de novo transcription factor binding sites and changes in histone modifications in mammalian genomes. Efforts such as the ENCODE project supply a large compendium of this type of data. To identify the exact location of protein–DNA binding, genomic regions with statistically enriched reads, called peaks, are detected. The final step in such analyses is to associate peaks with genes. The updated version of Enrichr offers functionality similar to that of the popular Genomic Regions Enrichment of Annotations Tool (GREAT) (33) by allowing users to upload BED files describing genomic region peaks. A Java module in Enrichr maps the chromosome coordinates listed in input BED files to their nearest coding mouse or human genes. User options allow the specification of whether the input is for human or mouse, and the number of genes to return based on distance to the transcription start site (TSS). The identified nearest genes are automatically uploaded to Enrichr for enrichment analysis. Enrichr now has a new button that enables users to view, cut and paste the uploaded lists. This feature can also be used to obtain the nearest genes for any input BED file containing peaks, so that they can be analyzed with other tools.
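The peak-to-gene mapping can be illustrated with a minimal Python sketch. Enrichr performs this step in a Java module whose details are not given here, so everything below is an assumption made for illustration: the peak midpoint is used as the peak position, each peak contributes only its single nearest TSS, and `max_genes` is read as a cap on the total number of genes returned, ordered by distance. The TSS table layout is hypothetical.

```python
def parse_bed(lines):
    """Yield (chrom, start, end) from BED lines, skipping headers and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield chrom, int(start), int(end)

def nearest_genes(peaks, tss_table, max_genes=500):
    """Map peaks to their nearest genes by distance to the TSS.

    tss_table: dict mapping chromosome -> list of (tss_position, gene_symbol).
    Returns up to max_genes unique gene symbols, closest first.
    """
    best = {}  # gene -> smallest distance from any peak that selected it
    for chrom, start, end in peaks:
        tss_list = tss_table.get(chrom)
        if not tss_list:
            continue  # no annotated genes on this chromosome
        midpoint = (start + end) // 2
        tss, gene = min(tss_list, key=lambda t: abs(t[0] - midpoint))
        distance = abs(tss - midpoint)
        if gene not in best or distance < best[gene]:
            best[gene] = distance
    return sorted(best, key=best.get)[:max_genes]
```

A call such as `nearest_genes(parse_bed(open("peaks.bed")), tss_table)` would then produce a crisp gene list suitable for submission to any enrichment tool; a linear scan over the TSS list is used for clarity, whereas a sorted index would be preferable for genome-scale annotation.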

Visualization of the results with clustergrams

One of the new features of Enrichr is the visualization of the enrichment results as clustergrams. This is achieved using Clustergrammer (https://github.com/MaayanLab/clustergrammer), an independent data visualization module we developed for multiple projects. Clustergrammer provides dynamic visualizations of Enrichr's enrichment analysis results. It enables a user to visualize the associations between their input genes and the overlapping genes of the top enriched terms. Clustergrammer visualizes these associations using a heat map in which the columns are the top enriched terms, and the rows are the input genes. The cells in the heat map indicate whether a gene from the input list overlaps with genes that belong to an enriched term. The enriched terms in the columns of the heat map are ranked based on their enrichment score. This score is indicated by the length of a transparent red bar that is displayed above the column labels. The input genes are hierarchically clustered based on their associations with the top enriched terms. Clustering is calculated using the Jaccard distance and average linkage. The heat map is interactive; a user can zoom and pan using scroll and drag functions. The rows and columns can be toggled between different orderings. The heat map can be re-ordered based on a single row or column by double-clicking on a label. The matrix is initialized to show the top 20 input genes that are associated with the top 10 enriched terms; however, these numbers can be adjusted with sliders, for example to show more of the user's input genes. If the heat map contains many rows, users can find a gene of interest using the search box. Users can also save an image of the clustergram using the camera icon, or share the interactive visualization using the permanent link available by clicking the share icon.
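The row clustering described above, Jaccard distance with average linkage, can be sketched in plain Python. This is a naive illustration rather than Clustergrammer's code: each gene is represented by the set of enriched terms it overlaps with, and the function names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets of enriched terms."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage_order(memberships):
    """Agglomerative clustering of genes; returns the genes in cluster order.

    memberships: dict mapping gene -> set of enriched terms containing it.
    At each step the two clusters with the smallest average pairwise
    Jaccard distance are merged.
    """
    clusters = [[gene] for gene in memberships]

    def cluster_distance(c1, c2):
        total = sum(jaccard_distance(memberships[g1], memberships[g2])
                    for g1 in c1 for g2 in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return clusters[0] if clusters else []
```

The returned order places genes with similar term associations next to each other, which is what makes blocks visible in the heat map. The repeated distance evaluation makes this sketch slow for large inputs, but it is adequate for a view limited to a few dozen genes.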

Deployment with Docker, Mesos and Marathon

The Enrichr hosting and deployment process has changed drastically since its original publication. To account for the increased traffic through both Enrichr's web interface and API, the application and its dependencies are now packaged into a Docker container (34) running the Debian 8.0 operating system with Java 8 installed. Once packaged, the Docker container is deployed onto a 16-node cluster managed using Apache Mesos (35). To maximize uptime, Mesosphere's Marathon software is used on top of Apache Mesos as a cluster-wide initialization and control system (36). The Marathon software automatically controls restarting Enrichr and moving resources across cluster nodes.

Libraries management

One of the challenges related to enrichment analysis tools such as Enrichr is provenance: the ability to repeat enrichment results, even after libraries and computational methods for computing enrichment have been updated. To address this issue, we created a ‘Legacy’ category in which we place older libraries so that these can be accessed by users who wish to repeat their own results, or repeat a published result conducted by others. The Legacy category has gene set libraries with a year label. We plan to update libraries once a year to balance consistency of results with timely content. Initially, the gene set libraries of Enrichr were not made available for download. Since 2015, we have made all libraries accessible for direct download. This enables other computational biologists to explore the deep relationships between genes in annotated gene sets, and to develop new tools using these libraries.

Overview of Enrichr statistics

Enrichr currently contains 102 gene set libraries belonging to eight categories. In total, there are currently 180 184 annotated gene sets within Enrichr. So far, 1 050 236 gene sets have been uploaded for analysis with Enrichr. While most (∼65%) users submit only 1–3 lists to Enrichr, there are also many heavy users, and the distribution of lists submitted per user fits a well-behaved power law (Figure 2A). The size of the submitted lists also follows a power-law distribution, but contains a peak at ∼250 genes per list (Figure 2B). This peak is likely an artifact of submissions that arrive from the tool GEO2Enrichr, which has a default setting of posting the top 500 genes, separated into up-regulated and down-regulated genes, from signatures processed from GEO. Examining the occurrence of individual genes in the submitted gene sets, we observe a log-normal distribution (Figure 2C), with the most popular genes being EGR1, FOS, TXNIP, DDIT4 and SGK1. EGR1 and FOS are well-known immediate early genes (IEGs), and their high presence likely confirms that these genes are among those most commonly found as differentially expressed. The appearance of TXNIP, DDIT4 and SGK1 as common genes is interesting since these genes are less well known for being highly responsive. The identification of the common occurrence of genes in submitted lists and annotated gene sets can potentially be applied to correct for biases, and as a result improve knowledge extraction. More extensive analysis of gene occurrence and co-occurrence in submitted lists demonstrates that such collective knowledge can be used to discover gene functions and predict protein interactions (37). Finally, we plot the distribution of the lengths of the 180 184 annotated gene sets provided for search by Enrichr (Figure 2D). Overall, this distribution also fits a power law, with a few inflections that likely represent specific libraries with hard cut-offs for gene set size. What thresholds on gene set length are optimal for enrichment analysis remains an open question. This is likely because the answer is context dependent, but more investigation can be done with appropriate benchmarks.

Figure 2.

Statistics of Enrichr. (A) Histogram of gene lists submitted per user. (B) Histogram of uploaded list lengths. (C) Histogram of appearance of genes in uploaded list. (D) Histogram of annotated gene set sizes within Enrichr.

COMPARISON TO OTHER SIMILAR TOOLS

Comparing libraries and resources in other tools

Next we aim to compare the resources and libraries offered for search by Enrichr with other similar tools. For this, we compared Enrichr with GO-Elite (38) and MSigDB (6), two leading resources that contain a comprehensive collection of gene set libraries. We summarized all the sources of gene set libraries for the three resources and plotted a Venn diagram to show the overlap among these resources (Figure 3A). Enrichr contains a large portion of MSigDB but is more comprehensive than both resources. MSigDB (6) contains eight collections of gene set libraries, two of which are also included in Enrichr (Computational and Oncogenic signatures). Many of the other collections of gene set libraries in MSigDB share the same sources with other gene set libraries currently present in Enrichr. These include, for example, the GO, pathway databases such as KEGG, Biocarta and Reactome, microRNAs/gene targets and gene sets created from position weight matrices. In addition, we note that MSigDB contains chemical and genetic perturbation gene sets manually curated from supporting materials of publications, whereas Enrichr contains differentially expressed genes after chemical and genetic perturbation curated from GEO. We compared the GEO data sets covered by Enrichr and MSigDB and found that there is some overlap while Enrichr has coverage of more data sets (Figure 3B).

Figure 3.

Comparing Enrichr resources with MSigDB and GO-Elite. (A) Venn diagram summarizing the various resources processed and served by Enrichr, MSigDB and GO-Elite. (B) Venn diagram to compare the number of processed gene sets of genetic and chemical perturbations curated from publications in Enrichr and MSigDB.

User interface pros and cons

There are many other gene set enrichment analysis tools that could be compared with Enrichr; for example, some leading tools are Fidea (39), DAVID (13), WebGestalt (12), g:Profiler (12) and GSEA (40). The advantages of Enrichr over some of these tools are its comprehensiveness, ease of use and interactive visualization of the results. However, Enrichr lacks some of the flexibility available with those other tools. For example, Enrichr merges human, mouse and rat genes, which has advantages and disadvantages. Enrichr does not have an ID conversion tool, which is highly desired by many users. Enrichr also does not have the ability to upload a background list, and it does not implement parametric tests such as Gene Set Enrichment Analysis (GSEA) (40), Parametric Analysis of Gene set Enrichment (PAGE) (9), and our own Principal Angle Enrichment Analysis (PAEA) (41). These features are planned.

FUTURE DIRECTIONS

As more genomics, transcriptomics and proteomics data accumulate, we plan to continue adding new gene set libraries to Enrichr. We also plan to continually improve the visualization of the enrichment results. It might be useful for users to view results across libraries, and to have a report of the most interesting enrichment results across all libraries. Enrichr currently supports only input from mammalian genes; in the future, we plan to add versions of Enrichr for yeast, worm and fly. The collection of terms for genes can be used to identify similarity between genes across resources, and this will improve the Find a Gene feature by suggesting similar genes. By examining the lists submitted to Enrichr, we noticed that approximately 10% of the submitted lists do not contain valid gene names. Users submit probe IDs, protein IDs, genes from other organisms, complete tables from spreadsheets with special characters, and other non-standard gene names. To accommodate these users, Enrichr needs to provide methods to convert these inputs into usable gene sets. The enrichment analysis concept can be expanded into new directions. For example, drug-set enrichment analysis (42) can be used to identify common functions for collections of drugs. In addition, enrichment analysis tools are increasingly becoming network-aware. The edge set enrichment analysis (43) method is one example of how network information can be incorporated into enrichment analysis. The collective analysis of the over one million gene sets submitted to Enrichr can be viewed as a potential resource for biological discovery. Each list can be classified into an attractor of similar lists, and classified not only by method of data acquisition but also by biological regulatory layer, i.e. mRNA/proteins/SNPs, as well as by biological role. While we are committed to keeping user lists completely private, we also aim to explore the collective knowledge that is accumulating from all user submissions to Enrichr (37).

FUNDING

NIH [R01GM098316, U54HL127624 and U54CA189201 to A.M.]. Funding for open access charge: Institutional funds.

Conflict of interest statement. None declared.

REFERENCES

1. Gene Ontology: tool for the unification of biology. Nat. Genet. 2000; 25:25–29.

2. FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 2004; 20:578–580.

3. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 2005; 21:3448–3449.

4. GO::TermFinder—open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 2004; 20:3710–3715.

5. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000; 28:27–30.

6. Molecular signatures database (MSigDB) 3.0. Bioinformatics 2011; 27:1739–1740.

7. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009; 37:1–13.

8. Gene set enrichment analysis: performance evaluation and usage guidelines. Brief. Bioinform. 2011; 13:281–291.

9. PAGE: parametric analysis of gene set enrichment. BMC Bioinformatics 2005; 6:144.

10. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009; 37:W305–W311.

11. GeneTrail—advanced gene set enrichment analysis. Nucleic Acids Res. 2007; 35:W186–W192.

12. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 2005; 33:W741–W748.

13. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 2003; 4:P3.

14. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 2013; 14:128.

15. The Handbook of Metabolomics. 2012; NY: Humana Press; 419–438.

16. The BioPAX community standard for pathway data sharing. Nat. Biotechnol. 2010; 28:935–942.

17. PID: the pathway interaction database. Nucleic Acids Res. 2009; 37:D674–D679.

18. Analysis of the human endogenous coregulator complexome. Cell 2011; 145:787–799.

19. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res. 2013; 41:D377–D386.

20. The human DEPhOsphorylation database DEPOD: a 2015 update. Nucleic Acids Res. 2015; 43:D531–D535.

21. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014; 42:D966–D974.

22. Gene: a gene-centered information resource at NCBI. Nucleic Acids Res. 2015; 43:D36–D42.

23. ENCODE data at the ENCODE portal. Nucleic Acids Res. 2016; 44:D726–D732.

24. Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system. Nucleic Acids Res. 2013; 41:D996–D1008.

25. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 2015; 348:648–660.

26. Mass-spectrometry-based draft of the human proteome. Nature 2014; 509:582–587.

27. A draft map of the human proteome. Nature 2014; 509:575–581.

28. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Sci. Data 2014; 1:140035.

29. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 2013; 41:D991–D995.

30. GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions. Bioinformatics 2015; 31:3060–3062.

31. The characteristic direction: a geometrical approach to identify differentially expressed genes. BMC Bioinformatics 2014; 15:79.

32. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 2010; 26:2438–2444.

33. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 2010; 28:495–501.

34. Docker: lightweight linux containers for consistent development and deployment. Linux J. 2014; 239:2.

35. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. NSDI 2011; 11:22.

36. Integrating Apache Airavata with Docker, Marathon, and Mesos. Concurrency and Computation: Practice and Experience 2015; 28:1952–1959.

37. Large collection of diverse gene set search queries recapitulate known protein-protein interactions and gene-gene functional associations. 2016; arXiv:1601.01653.

38. GO-Elite: a flexible solution for pathway and ontology over-representation. Bioinformatics 2012; 28:2209–2210.

39. FIDEA: a server for the functional interpretation of differential expression analysis. Nucleic Acids Res. 2013; 41:W84–W88.

40. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 2005; 102:15545–15550.

41. Principle Angle Enrichment Analysis (PAEA): Dimensionally reduced multivariate gene set enrichment analysis tool. 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2015; 256–262.

42. Drug-set enrichment analysis: a novel tool to investigate drug mode of action. Bioinformatics 2016; 32:235–241.

43. ESEA: discovering the dysregulated pathways based on edge set enrichment analysis. Sci. Rep. 2015; 5:13044.


© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.