ChEA3: transcription factor enrichment analysis by orthogonal omics integration

Alexandra B Keenan et al. Nucleic Acids Res. 2019.

Abstract

Identifying the transcription factors (TFs) responsible for observed changes in gene expression is an important step in understanding gene regulatory networks. ChIP-X Enrichment Analysis 3 (ChEA3) is a transcription factor enrichment analysis tool that ranks TFs associated with user-submitted gene sets. The ChEA3 background database contains a collection of gene set libraries generated from multiple sources including TF-gene co-expression from RNA-seq studies, TF-target associations from ChIP-seq experiments, and TF-gene co-occurrence computed from crowd-submitted gene lists. Enrichment results from these distinct sources are integrated to generate a composite rank that improves the prediction of the correct upstream TF compared to ranks produced by individual libraries. We compare ChEA3 with existing TF prediction tools and show that ChEA3 performs better. By integrating the ChEA3 libraries, we illuminate general transcription factor properties such as whether the TF behaves as an activator or a repressor. The ChEA3 web-server is available from https://amp.pharm.mssm.edu/ChEA3.
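The ranking-and-integration idea in the abstract can be illustrated with a minimal sketch. It makes three simplifying assumptions that do not come from the ChEA3 code: each library scores TFs with a one-sided Fisher's exact test on the overlap between the query and each TF's gene set; MeanRank averages a TF's per-library ranks; and TopRank keeps the TF's best rank after scaling by library size. The function names and toy libraries are hypothetical.

```python
from math import comb


def fisher_right_tail(overlap: int, query_size: int, set_size: int, background: int) -> float:
    """One-sided Fisher's exact test: P(X >= overlap) for a hypergeometric X
    (query_size draws from `background` genes, set_size of them in the TF's set)."""
    upper = min(query_size, set_size)
    tail = sum(comb(set_size, k) * comb(background - set_size, query_size - k)
               for k in range(overlap, upper + 1))
    return tail / comb(background, query_size)


def rank_library(query: set[str], library: dict[str, set[str]], background: int) -> dict[str, int]:
    """Rank the TFs of one library by p-value (1 = most enriched); ties broken by name."""
    pvals = {tf: fisher_right_tail(len(query & targets), len(query), len(targets), background)
             for tf, targets in library.items()}
    ordered = sorted(pvals, key=lambda tf: (pvals[tf], tf))
    return {tf: i + 1 for i, tf in enumerate(ordered)}


def mean_rank(per_library: list[dict[str, int]]) -> list[tuple[str, float]]:
    """Assumed MeanRank: average rank over the libraries that contain the TF."""
    ranks: dict[str, list[int]] = {}
    for lib in per_library:
        for tf, r in lib.items():
            ranks.setdefault(tf, []).append(r)
    scores = {tf: sum(rs) / len(rs) for tf, rs in ranks.items()}
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))


def top_rank(per_library: list[dict[str, int]]) -> list[tuple[str, float]]:
    """Assumed TopRank: best rank scaled by library size (lower is better)."""
    best: dict[str, float] = {}
    for lib in per_library:
        n = len(lib)
        for tf, r in lib.items():
            best[tf] = min(best.get(tf, 1.0), r / n)
    return sorted(best.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    # Toy libraries standing in for, e.g., a ChIP-seq and a co-expression library.
    libs = [
        {"TF1": {"G1", "G2", "G3"}, "TF2": {"G7", "G8", "G9"}},
        {"TF1": {"G1", "G4", "G5"}, "TF2": {"G2", "G9", "G10"}, "TF3": {"G11", "G12"}},
    ]
    query = {"G1", "G2", "G3", "G4"}
    per_lib = [rank_library(query, lib, background=100) for lib in libs]
    print(mean_rank(per_lib))
    print(top_rank(per_lib))
```

Averaging ranks rather than p-values keeps libraries of very different sizes and depths on a comparable scale, which is one plausible motivation for rank-based integration.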


Figures

Figure 1.

Performance of the ChEA3 libraries and integration techniques in recovering the perturbed TFs from 946 TF LOF and GOF experiments in the TFpertGEOupdn benchmark dataset. (A) Mean ROC AUC and mean PR AUC over 5000 bootstrapped ROC and PR curves; (B) composite ROC curves generated from 5000 bootstrapped curves; (C) composite PR curves generated from 5000 bootstrapped curves; (D) the deviation of the cumulative distribution from uniform of the scaled rankings of each perturbed TF in the benchmarking dataset. Anderson–Darling test of uniformity: MeanRank P = 6.34 × 10⁻⁷; TopRank P = 6.34 × 10⁻⁷; ARCHS4 P = 6.34 × 10⁻⁷; ENCODE P = 2.06 × 10⁻⁶; Enrichr Queries P = 6.83 × 10⁻⁷; GTEx P = 6.45 × 10⁻⁷; Literature ChIP-seq P = 1.28 × 10⁻⁶; ReMap P = 1.02 × 10⁻⁶.
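The benchmark behind panels A–C treats, for each experiment, the perturbed TF as the positive and the other ranked TFs as negatives. A minimal sketch of the bootstrapped AUCs follows. It assumes scores of the form 1 − scaled rank (higher means more confident), that each bootstrap draw resamples with replacement as many negatives as there are positives, and that PR AUC is approximated by average precision. These choices and the function names are assumptions, not the authors' exact procedure.

```python
import random
from statistics import mean


def roc_auc(pos: list[float], neg: list[float]) -> float:
    """ROC AUC as P(positive score > negative score); ties count one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(pos: list[float], neg: list[float]) -> float:
    """Step-wise area under the PR curve: mean precision at each positive's rank."""
    # Sort by descending score; on ties, negatives go first (a conservative choice).
    labelled = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg],
                      key=lambda x: (-x[0], x[1]))
    hits, precisions = 0, []
    for i, (_, is_pos) in enumerate(labelled, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / i)
    return mean(precisions)


def bootstrap_aucs(pos: list[float], neg: list[float],
                   n_boot: int = 5000, seed: int = 0) -> tuple[float, float]:
    """Mean ROC AUC and mean PR AUC over n_boot balanced bootstrap draws."""
    rng = random.Random(seed)
    rocs, prs = [], []
    for _ in range(n_boot):
        neg_draw = rng.choices(neg, k=len(pos))  # resample negatives with replacement
        rocs.append(roc_auc(pos, neg_draw))
        prs.append(average_precision(pos, neg_draw))
    return mean(rocs), mean(prs)
```

Balancing negatives against positives in each draw keeps the PR AUC from being dominated by the very large number of non-perturbed TFs; without balancing, its baseline would sit near the positive fraction rather than 0.5.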

Figure 2.

Fraction of the TFpertGEOupdn benchmarking dataset subset recovered in the top one percentile of rankings compared to the library TF coverage. (A) A heatmap visualizing transcription factor coverage for the ChEA3 libraries. (B) The fraction of the TFpertGEOupdn subset TFs recovered in the top percentile of ranks for each ChEA3 library. Only the TFpertGEOupdn gene sets where the perturbed TF was covered by the library were considered when computing the ‘Percent Subset Recovered’.

Figure 3.

Effect of input type on ChEA3 performance. The deviation of the cumulative distribution from uniform of the scaled rankings of perturbed TFs in the benchmarking dataset for: (A) TF overexpression or chemical activation experiments from TFpertGEOup; (B) TF overexpression or chemical activation experiments from TFpertGEOdn; (C) TF knockdown, knockout or chemical inactivation experiments from TFpertGEOup; and (D) TF knockdown, knockout or chemical inactivation experiments from TFpertGEOdn.

Figure 4.

Comparison of available TF prediction tools with ChEA3 using the hsTFpertGEO benchmarking dataset. (A) Composite ROC curves generated from 5000 bootstrapped curves; (B) composite PR curves generated from 5000 bootstrapped curves; (C) the deviation of the cumulative distribution from uniform of the scaled rankings of each perturbed TF in the benchmarking dataset (Anderson–Darling test of uniformity: VIPER GTEx Regulon P = 1.39 × 10⁻⁶, MAGICACT P = 6.58 × 10⁻⁵, TFEA.ChIP P = 2.47 × 10⁻⁶, BART P = 2.34 × 10⁻⁶, DoRothEA Regulon A P = 2.39 × 10⁻⁶, DoRothEA Regulon B P = 2.22 × 10⁻⁶, DoRothEA Regulon C P = 1.92 × 10⁻⁶, DoRothEA Regulon D P = 1.71 × 10⁻⁶, DoRothEA Regulon E P = 1.46 × 10⁻⁶, DoRothEA Regulon TOP10score P = 1.46 × 10⁻⁶); (D) mean ROC AUC and mean PR AUC over 5000 bootstrapped ROC and PR curves for the available TF prediction tools compared with ChEA3 on hsTFpertGEO.
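Panel C plots how far the empirical CDF of the scaled ranks departs from the uniform CDF, and the Anderson–Darling statistic summarises that departure with extra weight on the tails. A minimal sketch follows, assuming a scaled rank is the perturbed TF's rank divided by the number of ranked TFs. It computes only the A² statistic and the deviation curve; converting A² to the reported P values needs the statistic's null distribution, which is not shown. Function names are hypothetical.

```python
import math


def anderson_darling_uniform(u: list[float], eps: float = 1e-12) -> float:
    """A^2 statistic of sample u against Uniform(0, 1); larger means less uniform."""
    # Clip away from 0 and 1 so the logarithms stay finite (a scaled rank can equal 1).
    xs = sorted(min(max(x, eps), 1 - eps) for x in u)
    n = len(xs)
    s = sum((2 * i - 1) * (math.log(xs[i - 1]) + math.log(1 - xs[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def ecdf_deviation(u: list[float], grid: list[float]) -> list[float]:
    """Empirical CDF of u minus the uniform CDF, evaluated at each grid point."""
    return [sum(x <= g for x in u) / len(u) - g for g in grid]
```

A useful method concentrates perturbed TFs at small scaled ranks, so its deviation curve rises well above zero near the origin and its A² is large; a method that ranks at random gives a curve hugging zero.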

Figure 5.

Comparison of available TF prediction tools with ChEA3. (A) The percentage of perturbed TFs recovered by each tool in the top one percentile of ranks, compared with the tool's TF coverage. For the ‘Percent Subset Recovered’ metric, we consider only the subset of hsTFpertGEO TF perturbation experiments in which the TF is covered by the tool. (B) The percentage of perturbed TFs recovered by each tool in the top one percentile of ranks, compared with the tool's TF coverage. For the ‘Percent Total Recovered’ metric, we consider all 443 TF perturbation experiments in the hsTFpertGEO benchmarking dataset. (C) Mean AUROC over 5000 bootstrapped curves compared with tool TF coverage. (D) Mean AUPR over 5000 bootstrapped curves compared with tool TF coverage.
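The two recovery metrics differ only in their denominator. A minimal sketch follows, assuming that "top one percentile" means a scaled rank (rank divided by the number of TFs the tool ranked) of at most 0.01 and that a TF the tool does not cover counts as a miss in the total. The `Experiment` record and `percent_recovered` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    perturbed_tf: str
    rank: Optional[int]  # rank of the perturbed TF; None if the tool does not cover it
    n_ranked: int        # number of TFs the tool ranked for this experiment


def percent_recovered(experiments: list[Experiment],
                      top_fraction: float = 0.01) -> tuple[float, float]:
    """Return (Percent Subset Recovered, Percent Total Recovered)."""
    covered = [e for e in experiments if e.rank is not None]
    hits = sum(e.rank / e.n_ranked <= top_fraction for e in covered)
    subset = 100 * hits / len(covered) if covered else 0.0  # covered TFs only
    total = 100 * hits / len(experiments) if experiments else 0.0  # all experiments
    return subset, total
```

A tool with narrow TF coverage can score well on the subset metric while scoring poorly on the total metric, which is why the caption sets both against coverage.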

Figure 6.

Scatterplots showing activating/repressing activity across TFs. Significant ORs (P < 0.05) are plotted. For uniformity, when examining loss-of-function TF perturbations we consider −log(OR), which is positive if the TF acts as an activator of its targets and negative if it acts as a repressor. Conversely, for gain-of-function perturbations we consider log(OR), which is positive if the TF acts as an activator and negative if it acts as a repressor. Red arrows indicate TFs discussed in the Results. (A) ORs from gain-of-function TF perturbations; (B) ORs from loss-of-function TF perturbations; (C) TF–target interactions from the TRRUST v2 database: for each TF, the percentage of activating (red) or repressive (blue) TF–target interactions among the TRRUST v2 interactions for which directionality is available.
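The sign convention in this caption can be made concrete with a small sketch. It assumes the OR comes from a 2×2 table of the TF's library targets versus non-targets against genes up- versus down-regulated after the perturbation, with 0.5 added to every cell (a Haldane–Anscombe-style correction) to avoid division by zero. The table layout, the pseudocount and both function names are assumptions, and the significance test behind P < 0.05 is not shown.

```python
import math


def odds_ratio(targets_up: int, targets_down: int,
               other_up: int, other_down: int, pseudocount: float = 0.5) -> float:
    """Odds that a target is up- rather than down-regulated, relative to non-targets."""
    a, b, c, d = (x + pseudocount for x in (targets_up, targets_down, other_up, other_down))
    return (a * d) / (b * c)


def signed_activity(or_value: float, perturbation: str) -> float:
    """Positive suggests an activator, negative a repressor (caption's convention)."""
    if perturbation == "GOF":  # overexpression: an activator's targets go up, OR > 1
        return math.log(or_value)
    if perturbation == "LOF":  # knockdown: an activator's targets go down, OR < 1
        return -math.log(or_value)
    raise ValueError("perturbation must be 'GOF' or 'LOF'")
```

Flipping the sign for loss-of-function experiments is what lets panels A and B share one axis: in both, a point above zero is read as activating behaviour.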


