ChEA3: transcription factor enrichment analysis by orthogonal omics integration

Alexandra B Keenan et al. Nucleic Acids Res. 2019.

Abstract

Identifying the transcription factors (TFs) responsible for observed changes in gene expression is an important step in understanding gene regulatory networks. ChIP-X Enrichment Analysis 3 (ChEA3) is a transcription factor enrichment analysis tool that ranks TFs associated with user-submitted gene sets. The ChEA3 background database contains a collection of gene set libraries generated from multiple sources including TF-gene co-expression from RNA-seq studies, TF-target associations from ChIP-seq experiments, and TF-gene co-occurrence computed from crowd-submitted gene lists. Enrichment results from these distinct sources are integrated to generate a composite rank that improves the prediction of the correct upstream TF compared to ranks produced by individual libraries. We compare ChEA3 with existing TF prediction tools and show that ChEA3 performs better. By integrating the ChEA3 libraries, we illuminate general transcription factor properties such as whether the TF behaves as an activator or a repressor. The ChEA3 web-server is available from https://amp.pharm.mssm.edu/ChEA3.
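The rank integration described in the abstract can be illustrated with a minimal sketch. It assumes that each library yields a rank per TF (1 is best) and that the composite score is the mean of a TF's ranks over the libraries that cover it. The function name `mean_rank`, the library names, and the toy ranks are hypothetical and are not taken from the ChEA3 code or data.

```python
from statistics import mean


def mean_rank(library_ranks):
    """Order TFs by their average rank across libraries.

    library_ranks maps a library name to a dict of TF -> rank (1 = best).
    A TF that is missing from a library is averaged only over the
    libraries that contain it (an assumption made for this sketch).
    """
    collected = {}
    for ranks in library_ranks.values():
        for tf, rank in ranks.items():
            collected.setdefault(tf, []).append(rank)
    scores = {tf: mean(ranks) for tf, ranks in collected.items()}
    # Sort by mean rank; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda tf: (scores[tf], tf))


# Hypothetical per-library ranks for three TFs.
toy_ranks = {
    "lib_a": {"TF1": 1, "TF2": 2, "TF3": 3},
    "lib_b": {"TF2": 1, "TF1": 2},
    "lib_c": {"TF3": 1, "TF1": 2, "TF2": 3},
}
composite = mean_rank(toy_ranks)
print(composite)
```

In this toy input no single library agrees with the others on the top TF, yet averaging places TF1 first because it ranks consistently high everywhere.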

© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

Figures

Figure 1.

Performance of the ChEA3 libraries and integration techniques in recovering the perturbed TFs from 946 TF LOF and GOF experiments from the TFpertGEOupdn benchmark dataset. (A) Mean ROC AUC and mean PR AUC over 5000 bootstrapped ROC and PR curves; (B) composite ROC curves generated from 5000 bootstrapped curves; (C) composite PR curves generated from 5000 bootstrapped curves; (D) the deviation of the cumulative distribution from uniform of the scaled rankings of each perturbed TF in the benchmarking dataset. Anderson-Darling test of uniformity: MeanRank P = 6.34 × 10^-7; TopRank P = 6.34 × 10^-7; ARCHS4 P = 6.34 × 10^-7; ENCODE P = 2.06 × 10^-6; Enrichr Queries P = 6.83 × 10^-7; GTEx P = 6.45 × 10^-7; Literature ChIP-seq P = 1.28 × 10^-6; ReMap P = 1.02 × 10^-6.
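Panel D can be read as an empirical CDF of scaled ranks minus the CDF of a uniform distribution. The sketch below is one way to compute such a curve, assuming a scaled rank is the perturbed TF's rank divided by the number of ranked TFs, so that values lie in (0, 1]. The function name `ecdf_deviation` and the toy values are hypothetical; the Anderson-Darling test itself is not reproduced here.

```python
def ecdf_deviation(scaled_ranks, grid_size=100):
    """Return (r, ECDF(r) - r) pairs on an evenly spaced grid over [0, 1].

    Under random ranking the scaled ranks would be uniform, so the
    deviation would stay near zero. Positive values at small r mean the
    perturbed TFs are concentrated near the top of the rankings.
    """
    n = len(scaled_ranks)
    curve = []
    for i in range(grid_size + 1):
        r = i / grid_size
        fraction_at_or_below = sum(1 for s in scaled_ranks if s <= r) / n
        curve.append((r, fraction_at_or_below - r))
    return curve


# Hypothetical scaled ranks of the perturbed TF in five experiments.
toy_scaled_ranks = [0.01, 0.02, 0.05, 0.10, 0.50]
curve = ecdf_deviation(toy_scaled_ranks, grid_size=10)
for r, deviation in curve:
    print(f"{r:.1f}\t{deviation:+.2f}")
```

With these toy values, four of the five experiments fall at or below r = 0.1, so the deviation there is about 0.7 and the curve returns to zero at r = 1.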

Figure 2.

Fraction of the TFpertGEOupdn benchmarking dataset subset recovered in the top one percentile of rankings compared to the library TF coverage. (A) A heatmap visualizing transcription factor coverage for the ChEA3 libraries. (B) The fraction of the TFpertGEOupdn subset TFs recovered in the top percentile of ranks for each ChEA3 library. Only the TFpertGEOupdn gene sets where the perturbed TF was covered by the library were considered when computing the ‘Percent Subset Recovered’.

Figure 3.

Effect of input type on ChEA3 performance. The deviation of the cumulative distribution from uniform of the scaled rankings of perturbed TFs in the benchmarking dataset for: (A) TF overexpression or chemical activation experiments from TFpertGEOup; (B) TF overexpression or chemical activation experiments from TFpertGEOdn; (C) TF knockdown, knockout or chemical inactivation experiments from TFpertGEOup; and (D) TF knockdown, knockout or chemical inactivation experiments from TFpertGEOdn.

Figure 4.

Comparison of available TF prediction tools with ChEA3 using the hsTFpertGEO benchmarking dataset. (A) Composite ROC curves generated from 5000 bootstrapped curves; (B) composite PR curves generated from 5000 bootstrapped curves; (C) the deviation of the cumulative distribution from uniform of the scaled rankings of each perturbed TF in the benchmarking dataset. Anderson–Darling test of uniformity: VIPER GTEx Regulon P = 1.39 × 10^-6; MAGICACT P = 6.58 × 10^-5; TFEA.ChIP P = 2.47 × 10^-6; BART P = 2.34 × 10^-6; DoRothEA Regulon A P = 2.39 × 10^-6; DoRothEA Regulon B P = 2.22 × 10^-6; DoRothEA Regulon C P = 1.92 × 10^-6; DoRothEA Regulon D P = 1.71 × 10^-6; DoRothEA Regulon E P = 1.46 × 10^-6; DoRothEA Regulon TOP10score P = 1.46 × 10^-6. (D) Mean ROC AUC and mean PR AUC over 5000 bootstrapped ROC and PR curves for available TF prediction tools as compared with ChEA3 benchmarked with hsTFpertGEO.

Figure 5.

Comparison of available TF prediction tools with ChEA3. (A) The percent of the perturbed TFs recovered by the tool in the top one percentile of ranks as compared to TF coverage of the tool. For the ‘Percent Subset Recovered’ metric, we consider only the subset of the hsTFpertGEO TF perturbation experiments where the TF is covered by the tool. (B) The percent of the perturbed TFs recovered by the tool in the top one percentile of ranks as compared to TF coverage of the tool. For the ‘Percent Total Recovered’ metric, we consider all 443 TF perturbation experiments in the hsTFpertGEO benchmarking dataset. (C) Mean AUROC over 5000 bootstrapped curves compared to tool TF coverage. (D) Mean AUPR over 5000 bootstrapped curves compared to tool TF coverage.
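The two recovery metrics in panels A and B differ only in their denominator, which a short sketch makes concrete. It assumes each experiment is recorded as a pair of the perturbed TF and its scaled rank, with `None` when the tool does not cover that TF, and that "top one percentile" means a scaled rank of at most 0.01. The function name `recovery_percentages` and the toy records are hypothetical.

```python
def recovery_percentages(experiments, cutoff=0.01):
    """Return (percent_subset_recovered, percent_total_recovered).

    experiments is a list of (perturbed_tf, scaled_rank) pairs, where
    scaled_rank is None if the tool has no entry for that TF.
    """
    covered = [rank for _, rank in experiments if rank is not None]
    hits = sum(1 for rank in covered if rank <= cutoff)
    # Subset metric: only experiments whose TF the tool covers.
    percent_subset = 100.0 * hits / len(covered) if covered else 0.0
    # Total metric: every experiment, so missing coverage counts against the tool.
    percent_total = 100.0 * hits / len(experiments) if experiments else 0.0
    return percent_subset, percent_total


# Hypothetical benchmark records; TF_C is not covered by the tool.
toy_experiments = [
    ("TF_A", 0.005),
    ("TF_B", 0.20),
    ("TF_C", None),
    ("TF_D", 0.01),
]
subset_pct, total_pct = recovery_percentages(toy_experiments)
print(f"subset: {subset_pct:.1f}%  total: {total_pct:.1f}%")
```

A tool with narrow TF coverage can score well on the subset metric while scoring poorly on the total metric, which is why the figure plots both against coverage.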

Figure 6.

Scatterplots showing activating/repressing activity across TFs. Significant ORs (P < 0.05) are plotted. For uniformity, when examining loss-of-function TF perturbations, we consider –log(OR), as this will be positive if the TF acts as an activator of its targets and negative if it acts as a repressor. Conversely, we consider log(OR) for gain-of-function perturbations, which will be positive if the TF is an activator and negative if the TF acts as a repressor. Red arrows indicate TFs discussed in the results. (A) ORs from gain-of-function TF perturbations; (B) ORs from loss-of-function TF perturbations; (C) TF–target interactions from the TRRUST v2 database. For each TF, the percent of activating TF–target interactions (red) or repressive TF–target interactions (blue) from the subset of TF–target interactions in TRRUST v2 for which directionality is available.
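The sign convention in this caption can be written as a small helper. The sketch assumes only what the caption states: log(OR) is used for gain-of-function experiments and –log(OR) for loss-of-function experiments, so a positive score suggests an activator in both cases. How the OR itself is computed is not given here and is not modeled; the function name `activity_score` and the labels "gain" and "loss" are hypothetical.

```python
import math


def activity_score(odds_ratio, perturbation):
    """Map an odds ratio to a signed score where positive means activator.

    perturbation is "gain" (overexpression or activation) or "loss"
    (knockdown, knockout or inactivation). The natural log is used here;
    the choice of base does not affect the sign.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    log_or = math.log(odds_ratio)
    if perturbation == "gain":
        return log_or
    if perturbation == "loss":
        return -log_or
    raise ValueError("perturbation must be 'gain' or 'loss'")


# Hypothetical odds ratios: both examples come out positive (activator-like).
gain_example = activity_score(2.0, "gain")
loss_example = activity_score(0.5, "loss")
print(gain_example, loss_example)
```

Flipping the sign for loss-of-function experiments lets panels A and B share one axis interpretation.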
