Centrality-based pathway enrichment: a systematic approach for finding significant pathways dominated by key genes - PubMed (original) (raw)

Centrality-based pathway enrichment: a systematic approach for finding significant pathways dominated by key genes

Zuguang Gu et al. BMC Syst Biol. 2012.

Abstract

Background: Biological pathways are important for understanding biological mechanisms. Thus, finding important pathways that underlie biological problems helps researchers to focus on the most relevant sets of genes. Pathways resemble networks with complicated structures, but most of the existing pathway enrichment tools ignore topological information embedded within pathways, which limits their applicability.

Results: A systematic and extensible pathway enrichment method in which nodes are weighted by network centrality was proposed. We demonstrate how choice of pathway structure and centrality measurement, as well as the presence of key genes, affects pathway significance. We emphasize two improvements of our method over current methods. First, allowing for the diversity of genes' characters and the difficulty of covering gene importance from all aspects, we set centrality as an optional parameter in the model. Second, nodes rather than genes form the basic unit of pathways, such that one node can be composed of several genes and one gene may reside in different nodes. By comparing our methodology to the original enrichment method using both simulation data and real-world data, we demonstrate the efficacy of our method in finding new pathways from biological perspective.

Conclusions: Our method can benefit the systematic analysis of biological pathways and help to extract more meaningful information from gene expression data. The algorithm has been implemented as an R package CePa, and also a web-based version of CePa is provided.

PubMed Disclaimer

Figures

Figure 1

Figure 1

Meta-analysis of the pathway catalogue. A) Distribution of the number of member genes in each node; B) Distribution of the number of nodes in which a single gene resides; C) Relationship between node count and gene count in biological pathways. The pathways are derived from Pathway Interaction Database, NCI-Nature catalogue. For figure A and B, points are log-scaled on the Y-axis.

Figure 2

Figure 2

P -values and centrality distributions of pathways with different random network structures under different centrality measurements. Pathway topologies are generated from (A) Erdös-Rényi model and (B) Barabási-Albert model. Comparisons are made between in-degree, out-degree, betweenness, in-largest reach, out-largest reach centralities, as well as the equal weight condition. Each plot represents the distribution of differential nodes centralities in each simulation, assessed by maximum value, the 75th quartile, median value and minimum value. All data are ordered by _p_-values on the X-axis. Points in the figure are randomly shifted by small intervals for ease of visualization.

Figure 3

Figure 3

Comparison of p -values influenced by key nodes. Differential nodes, weighted by degree, are selected in two ways: from high to low degree and from low to high degree. Also, traditional ORA was applied for comparison.

Figure 4

Figure 4

Heatmap of FDRs in pathways. A) Pathways evaluated as significant by both traditional ORA and our method for at least one centrality measure; B) Pathways for which our method disagrees with traditional ORA. In each heatmap, columns are sorted by FDRs calculated from traditional ORA and rows are sorted through hierarchical clustering. Green and red denote insignificant and highly significant, respectively.

Figure 5

Figure 5

Summary of MAPK-TRK pathway generated under in-largest reach centrality. A) Distribution of in-largest reach centrality of differential nodes in the simulated pathway. The distribution of differential nodes centralities in each simulation is assessed by maximum value, the 75th quartile, median value and minimum value; B) Distribution of in-largest reach centrality of all nodes in the real pathway; C) Histogram of simulated scores in the pathway; D) Graph view of the pathway where the size of a node is proportional to its centrality value and nodes in red represent differential nodes. In figures A and B, dots are randomly shifted by small intervals for ease of visualization. In figures A and C, the real pathway score is marked with a red line.

Figure 6

Figure 6

Workflow of the centrality-based pathway enrichment analysis. A typical figure on the left illustrates the corresponding step on the right side. The essential steps are: 1) Obtain a differentially expressed gene list. This list can be compiled using a variety of methods and sources; 2) Map genes to nodes; 3) Select several centrality measurements and calculate their values; 4) Weighting nodes by centrality, calculate the pathway-level score; 5) In simulations, repeat steps 1 to 4 for a user-specified number of cycles (1000 cycles were used in the current study) and generate a null distribution of pathway-level scores; 6) Calculate _p_-values and display the results summary.

Similar articles

Cited by

References

    1. Kitano H. Systems biology: a brief overview. Science. 2002;295:1662–1664. doi: 10.1126/science.1069492. - DOI - PubMed
    1. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006;7:55–65. doi: 10.1038/nrg1749. - DOI - PubMed
    1. Cary MP, Bader GD, Sander C. Pathway information for systems biology. FEBS lett. 2005;579:1815–1820. doi: 10.1016/j.febslet.2005.02.005. - DOI - PubMed
    1. Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37:1–13. doi: 10.1093/nar/gkn923. - DOI - PMC - PubMed
    1. Khatri P, Drăghici S. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics. 2005;21:3587–3595. doi: 10.1093/bioinformatics/bti565. - DOI - PMC - PubMed

Publication types

MeSH terms

LinkOut - more resources