LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data - PubMed

LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data

Maureen A Sartor et al. Bioinformatics. 2009.

Abstract

Motivation: The elucidation of biological pathways enriched with differentially expressed genes has become an integral part of the analysis and interpretation of microarray data. Several statistical methods are commonly used in this context, but the question of the optimal approach has still not been resolved.

Results: We present a logistic regression-based method (LRpath) for identifying predefined sets of biologically related genes enriched with (or depleted of) differentially expressed transcripts in microarray experiments. We functionally relate the odds of gene set membership with the significance of differential expression, and calculate adjusted P-values as a measure of statistical significance. The new approach is compared with Fisher's exact test and other relevant methods in a simulation study and in the analysis of two breast cancer datasets. Overall results were concordant between the simulation study and the experimental data analysis, and provide useful information to investigators seeking to choose the appropriate method. LRpath displayed robust behavior and improved statistical power compared with tested alternatives. It is applicable in experiments involving two or more sample types, and accepts significance statistics of the investigator's choice as input.
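The core idea of relating the odds of gene set membership to the significance of differential expression can be illustrated with a minimal sketch. It assumes the predictor is –log10(P) per gene, the response is a 0/1 membership flag for a single gene set, and significance is assessed with a Wald test on the slope; the function name `fit_logistic`, the toy data and the choice of Wald test are illustrative assumptions, not details taken from the abstract.

```python
import math

def fit_logistic(x, y, iters=50, tol=1e-10):
    """Fit logit(P(y=1)) = a + b*x by Newton-Raphson.

    Returns (a, b, se_b), where se_b is the standard error of the slope
    taken from the inverse information matrix at the last iteration.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    se_b = math.sqrt(h00 / det)
    return a, b, se_b

# Hypothetical toy data: one value per gene.
# x = -log10(P) of differential expression, y = 1 if the gene is in the set.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1]

a, b, se_b = fit_logistic(x, y)
odds_ratio = math.exp(b)                      # change in odds per unit of -log10(P)
z = b / se_b
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald P-value

print(f"slope={b:.3f} odds ratio={odds_ratio:.3f} P={p_value:.3f}")
```

A positive slope (odds ratio above 1) indicates a set enriched with differentially expressed genes, and a negative slope indicates depletion. Because every gene contributes its own significance value, no cutoff for calling a gene differentially expressed is needed in this sketch.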

Figures

Fig. 1.

Simulation results: ability to rank enriched GO terms. The log10-rankings of enriched GO terms were calculated to compare the ability of methods to rank these categories at the top of the list; lower ranking scores are therefore better. Methods are LRpath, FE with the following five criteria for detecting DEGs (P<0.001, P<0.01, P<0.05, P<0.10 and P<0.50), BayGO, sigPathway (sigPath), allez and ProbCD. The initial four parameter sets (A) used 90%, 75%, 50% and 25% enrichment with DEGs, 500 total DEGs, normally distributed fold changes, two enriched categories and three replicates for treated and control groups. Subsequent groups had the following differences: (B) 1000 DEGs, (C) DEGs with higher fold changes, (D) five enriched GO terms, (E) five replicates. Data shown are averages from 30 simulation runs for each parameter set. LRpath performed significantly better than the next best methods (_P_=2.2×10^−4 compared with FE P<0.05 and _P_=1.5×10^−4 compared with FE P<0.10) using a Wilcoxon rank test.
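For comparison, the FE approach in this caption depends on a P-value cutoff for calling DEGs. The following is a minimal sketch of a one-sided Fisher's exact (hypergeometric) enrichment test evaluated at two cutoffs; the helper names `fisher_enrichment_p` and `enrichment_at_cutoff` and the ten-gene toy data are hypothetical, and the one-sided form is an assumption rather than a detail stated in the source.

```python
from math import comb

def fisher_enrichment_p(k, K, n, N):
    """One-sided P(X >= k) for X ~ Hypergeometric.

    N: genes tested, K: genes in the set,
    n: genes called DEG, k: DEGs that are in the set.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def enrichment_at_cutoff(pvals, in_set, cutoff):
    """Call DEGs at the given cutoff, then test the set for enrichment."""
    is_deg = [p < cutoff for p in pvals]
    N = len(pvals)
    K = sum(in_set)
    n = sum(is_deg)
    k = sum(1 for d, s in zip(is_deg, in_set) if d and s)
    return fisher_enrichment_p(k, K, n, N)

# Hypothetical toy data: per-gene P-values and set membership flags.
pvals = [0.0005, 0.004, 0.008, 0.03, 0.06, 0.2, 0.3, 0.5, 0.7, 0.9]
in_set = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

p_strict = enrichment_at_cutoff(pvals, in_set, 0.01)  # 3 DEGs, all in the set
p_loose = enrichment_at_cutoff(pvals, in_set, 0.05)   # 4 DEGs, 3 in the set

print(p_strict, p_loose)
```

In this toy example the same gene set receives a different enrichment P-value at each cutoff, which is why the caption reports FE separately for each criterion.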

Fig. 2.

Effect of input statistics on LRpath and allez. Graphed is the average increase in log-rank of enriched GO terms relative to LRpath with –log(_P_-values) as input, which ranked best. LRpath and allez produced very similar results when given the same input. *P<0.05 from Wilcoxon rank test between allez and LRpath using –log(P). **P<0.05 from Wilcoxon rank test between allez and LRpath using _z_-transformed gene ranks.

Fig. 3.

Concordance of methods between two independent breast cancer datasets. Reproducibility of the methods (LRpath, FE with cutoffs of 0.50, 0.10, 0.01 and 0.001 for DEGs, BayGO, GSEA, sigPathway, allez and ProbCD, respectively) was tested by measuring the consistency of results across two datasets, both comparing grade 3 to grade 1 tumors. (A) Correlation between datasets for each method. As a measure of significance, the –log(_P_-values) of GO term enrichment were calculated for each method and dataset separately, and the Pearson correlation coefficients between datasets were calculated. (B) Overlapping enriched GO terms by rank. Ranked lists of GO terms were generated for each method and each dataset separately. The number of overlapping GO terms was calculated between datasets for each method for increasing length of ranked lists.
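The two concordance measures described in this caption can be sketched as follows for a single method. The term labels, P-values and function names are hypothetical, and base-10 logarithms are assumed because the caption does not state the base.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def top_k_overlap(p1, p2, k):
    """Number of terms shared by the k best-ranked terms of each dataset."""
    top1 = set(sorted(p1, key=p1.get)[:k])
    top2 = set(sorted(p2, key=p2.get)[:k])
    return len(top1 & top2)

# Hypothetical enrichment P-values for the same terms in two datasets.
p1 = {"term_a": 0.001, "term_b": 0.01, "term_c": 0.2, "term_d": 0.5, "term_e": 0.03}
p2 = {"term_a": 0.002, "term_b": 0.2, "term_c": 0.3, "term_d": 0.6, "term_e": 0.01}

terms = sorted(p1)
# (A) correlation of -log10(P) between datasets
r = pearson([-math.log10(p1[t]) for t in terms],
            [-math.log10(p2[t]) for t in terms])

# (B) overlap of ranked lists for increasing list length
overlaps = [top_k_overlap(p1, p2, k) for k in range(1, len(terms) + 1)]

print(r, overlaps)
```

Plotting `overlaps` against list length gives a curve of the kind described for panel (B); a more reproducible method would show overlap rising closer to the list length itself.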

Cited by

Current approaches and outstanding challenges of functional annotation of metabolites: a comprehensive review.
Nguyen QH, Nguyen H, Oh EC, Nguyen T. Brief Bioinform. 2024 Sep 23;25(6):bbae498. doi: 10.1093/bib/bbae498. PMID: 39397425. Free PMC article. Review.
danRerLib: a Python package for zebrafish transcriptomics.
Schwartz AV, Sant KE, George UZ. Bioinform Adv. 2024 May 6;4(1):vbae065. doi: 10.1093/bioadv/vbae065. eCollection 2024. PMID: 38770229. Free PMC article.
GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership.
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. Genome Biol. 2023 Oct 19;24(1):236. doi: 10.1186/s13059-023-03067-9. PMID: 37858253. Free PMC article.
Coherent pathway enrichment estimation by modeling inter-pathway dependencies using regularized regression.
Jablonski KP, Beerenwinkel N. Bioinformatics. 2023 Aug 1;39(8):btad522. doi: 10.1093/bioinformatics/btad522. PMID: 37610338. Free PMC article.
Molecular differences in brain regional vulnerability to aging between males and females.
Zhou X, Cao J, Zhu L, Farrell K, Wang M, Guo L, Yang J, McKenzie A, Crary JF, Cai D, Tu Z, Zhang B. Front Aging Neurosci. 2023 May 22;15:1153251. doi: 10.3389/fnagi.2023.1153251. eCollection 2023. PMID: 37284017. Free PMC article.
