Centrality-based pathway enrichment: a systematic approach for finding significant pathways dominated by key genes

Zuguang Gu et al. BMC Syst Biol. 2012.

Abstract

Background: Biological pathways are important for understanding biological mechanisms. Thus, finding important pathways that underlie biological problems helps researchers to focus on the most relevant sets of genes. Pathways resemble networks with complicated structures, but most of the existing pathway enrichment tools ignore topological information embedded within pathways, which limits their applicability.

Results: We propose a systematic and extensible pathway enrichment method in which nodes are weighted by network centrality. We demonstrate how the choice of pathway structure and centrality measurement, as well as the presence of key genes, affects pathway significance. We emphasize two improvements of our method over current methods. First, because genes differ in character and no single measure captures gene importance from all aspects, we make the centrality measure an optional parameter of the model. Second, nodes rather than genes form the basic unit of pathways, such that one node can comprise several genes and one gene may reside in several nodes. By comparing our method with the original enrichment method on both simulated and real-world data, we demonstrate its efficacy in finding new pathways from a biological perspective.

Conclusions: Our method can benefit the systematic analysis of biological pathways and help to extract more meaningful information from gene expression data. The algorithm is implemented as the R package CePa, and a web-based version of CePa is also provided.

Figures

Figure 1

Meta-analysis of the pathway catalogue. A) Distribution of the number of member genes in each node; B) Distribution of the number of nodes in which a single gene resides; C) Relationship between node count and gene count in biological pathways. The pathways are derived from the Pathway Interaction Database (NCI-Nature catalogue). In figures A and B, points are log-scaled on the Y-axis.

Figure 2

_P_-values and centrality distributions of pathways with different random network structures under different centrality measurements. Pathway topologies are generated from (A) the Erdős-Rényi model and (B) the Barabási-Albert model. Comparisons are made among in-degree, out-degree, betweenness, in-largest reach, and out-largest reach centralities, as well as the equal-weight condition. Each plot shows the distribution of differential-node centralities in each simulation, summarized by the maximum, 75th percentile, median, and minimum. All data are ordered by _p_-values on the X-axis. Points in the figure are randomly shifted by small intervals for ease of visualization.
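The centrality measures named in this caption can be illustrated with a minimal Python sketch. It is not taken from the paper or from CePa: the function names are hypothetical, the random graph is a simple directed Erdős-Rényi-style construction, and "largest reach" is assumed here to mean the length of the longest shortest path ending at (in) or starting from (out) a node. Betweenness and the Barabási-Albert model are omitted for brevity.

```python
from collections import deque
import random


def random_digraph(n, p, seed=0):
    """Directed Erdős-Rényi-style graph: each ordered pair (u, v), u != v,
    becomes an edge with probability p. Returns {node: set(successors)}."""
    rng = random.Random(seed)
    return {u: {v for v in range(n) if v != u and rng.random() < p}
            for u in range(n)}


def reverse(graph):
    """Return the graph with every edge direction flipped."""
    rev = {u: set() for u in graph}
    for u, targets in graph.items():
        for v in targets:
            rev[v].add(u)
    return rev


def largest_reach(graph, source):
    """Length of the longest shortest path starting at `source` (BFS).
    Assumed definition of 'largest reach'; 0 if nothing is reachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def centralities(graph):
    """In/out degree and in/out largest reach for every node."""
    rev = reverse(graph)
    return {
        u: {
            "in_degree": len(rev[u]),
            "out_degree": len(graph[u]),
            "in_reach": largest_reach(rev, u),    # walk edges backwards
            "out_reach": largest_reach(graph, u),
        }
        for u in graph
    }


# Toy pathway: a -> b -> c -> d, plus a shortcut a -> c
toy = {"a": {"b", "c"}, "b": {"c"}, "c": {"d"}, "d": set()}
toy_centrality = centralities(toy)
```

In the toy pathway, the upstream node `a` has the largest out-reach and the downstream node `d` the largest in-reach, which is why the choice of measure changes which nodes count as "key".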

Figure 3

Comparison of _p_-values influenced by key nodes. Differential nodes, weighted by degree, are selected in two ways: from high to low degree and from low to high degree. Traditional ORA was also applied for comparison.
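For reference, traditional ORA is commonly computed as an upper-tail hypergeometric probability, which treats every pathway member equally regardless of its position in the network. The sketch below shows that standard calculation; the function and parameter names are illustrative, and the exact ORA variant used for the figure is not specified in this text.

```python
from math import comb


def ora_p_value(n_background, n_pathway, n_diff, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of genes (or nodes) in the background
    n_pathway:    number of those that belong to the pathway
    n_diff:       number of differential genes drawn from the background
    n_overlap:    number of differential genes observed in the pathway
    """
    upper = min(n_pathway, n_diff)
    total = comb(n_background, n_diff)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_diff - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 4 background genes, 2 in the pathway,
# 2 differential genes, both of which fall in the pathway.
p_small = ora_p_value(4, 2, 2, 2)
```

Because only the overlap count enters the formula, selecting high-degree or low-degree nodes as differential gives the same ORA result, whereas a centrality-weighted score distinguishes the two cases.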

Figure 4

Heatmap of FDRs in pathways. A) Pathways evaluated as significant by both traditional ORA and our method for at least one centrality measure; B) Pathways for which our method disagrees with traditional ORA. In each heatmap, columns are sorted by FDRs calculated from traditional ORA and rows are sorted through hierarchical clustering. Green and red denote insignificant and highly significant, respectively.

Figure 5

Summary of the MAPK-TRK pathway generated under in-largest reach centrality. A) Distribution of in-largest reach centrality of differential nodes in the simulated pathway. The distribution of differential-node centralities in each simulation is summarized by the maximum, 75th percentile, median, and minimum; B) Distribution of in-largest reach centrality of all nodes in the real pathway; C) Histogram of simulated scores in the pathway; D) Graph view of the pathway, where the size of a node is proportional to its centrality value and nodes in red represent differential nodes. In figures A and B, dots are randomly shifted by small intervals for ease of visualization. In figures A and C, the real pathway score is marked with a red line.

Figure 6

Workflow of the centrality-based pathway enrichment analysis. A typical figure on the left illustrates the corresponding step on the right side. The essential steps are: 1) Obtain a differentially expressed gene list. This list can be compiled using a variety of methods and sources; 2) Map genes to nodes; 3) Select several centrality measurements and calculate their values; 4) Weight nodes by centrality and calculate the pathway-level score; 5) In simulations, repeat steps 1 to 4 for a user-specified number of cycles (1000 cycles were used in the current study) and generate a null distribution of pathway-level scores; 6) Calculate _p_-values and display the results summary.
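The workflow above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the CePa implementation: the pathway-level score is assumed to be the sum of centrality weights over differential nodes, a node is assumed to be differential if any of its member genes is, and the null distribution is assumed to come from random gene lists of the same size drawn from a background. All names and the toy data are hypothetical.

```python
import random


def node_is_differential(node_genes, diff_genes):
    """Step 2: a node counts as differential if any member gene is."""
    return any(g in diff_genes for g in node_genes)


def pathway_score(node_to_genes, weights, diff_genes):
    """Step 4: sum the centrality weights of differential nodes
    (assumed form of the pathway-level score)."""
    return sum(
        weights[node]
        for node, genes in node_to_genes.items()
        if node_is_differential(genes, diff_genes)
    )


def centrality_weighted_p_value(node_to_genes, weights, diff_genes,
                                background, n_sim=1000, seed=0):
    """Steps 5 and 6: simulate random gene lists, build a null
    distribution of scores, and return (observed score, p-value)."""
    rng = random.Random(seed)
    observed = pathway_score(node_to_genes, weights, diff_genes)
    pool = sorted(background)  # fixed order keeps the seed reproducible
    hits = 0
    for _ in range(n_sim):
        simulated = set(rng.sample(pool, len(diff_genes)))
        if pathway_score(node_to_genes, weights, simulated) >= observed:
            hits += 1
    return observed, hits / n_sim


# Toy pathway: one gene (G3) sits in two nodes, one node holds two genes.
node_to_genes = {
    "receptor": {"G1"},
    "complex": {"G2", "G3"},
    "tf": {"G3", "G4"},
}
weights = {"receptor": 0, "complex": 1, "tf": 2}  # e.g. in-degree (step 3)
background = {f"G{i}" for i in range(1, 101)}
diff_genes = {"G3", "G4"}                          # step 1

observed, p_value = centrality_weighted_p_value(
    node_to_genes, weights, diff_genes, background
)
```

Setting every weight to 1 reduces the score to a plain count of differential nodes, which corresponds to the equal-weight condition used for comparison.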

Cited by

Distribution of centrality measures on undirected random networks via the cavity method.
Bartolucci S, Caccioli F, Caravelli F, Vivo P. Proc Natl Acad Sci U S A. 2024 Oct;121(40):e2403682121. doi: 10.1073/pnas.2403682121. Epub 2024 Sep 25. PMID: 39320915.
Ant colony optimization for the identification of dysregulated gene subnetworks from expression data.
Hanna EM, El Hasbani G, Azar D. BMC Bioinformatics. 2024 Aug 1;25(1):254. doi: 10.1186/s12859-024-05871-x. PMID: 39090538.
RCPA: An Open-Source R Package for Data Processing, Differential Analysis, Consensus Pathway Analysis, and Visualization.
Nguyen H, Nguyen H, Maghsoudi Z, Tran B, Draghici S, Nguyen T. Curr Protoc. 2024 May;4(5):e1036. doi: 10.1002/cpz1.1036. PMID: 38713133.
MicroRNAs in the Pathogenesis of Preeclampsia-A Case-Control In Silico Analysis.
Kasimanickam R, Kasimanickam V. Curr Issues Mol Biol. 2024 Apr 17;46(4):3438-3459. doi: 10.3390/cimb46040216. PMID: 38666946.
Gene set correlation enrichment analysis for interpreting and annotating gene expression profiles.
Chang LY, Lee MZ, Wu Y, Lee WK, Ma CL, Chang JM, Chen CW, Huang TC, Lee CH, Lee JC, Tseng YY, Lin CY. Nucleic Acids Res. 2024 Feb 9;52(3):e17. doi: 10.1093/nar/gkad1187. PMID: 38096046.
