MAGIC: A tool for predicting transcription factors and cofactors driving gene sets using ENCODE data
Avtar Roopra. PLoS Comput Biol. 2020.
Abstract
Transcriptomic profiling is an immensely powerful hypothesis-generating tool. However, accurately predicting the transcription factors (TFs) and cofactors that drive transcriptomic differences between samples is challenging. A number of algorithms draw on ChIP-seq tracks to define the TFs and cofactors behind gene changes. These approaches assign TFs and cofactors to genes via a binary designation of 'target' or 'non-target', followed by Fisher's exact tests to assess enrichment of TFs and cofactors. ENCODE archives 2314 ChIP-seq tracks of 684 TFs and cofactors assayed across 117 human cell lines under a multitude of growth and maintenance conditions. The algorithm presented herein, Mining Algorithm for GenetIc Controllers (MAGIC), uses ENCODE ChIP-seq data to look for statistical enrichment of TFs and cofactors in gene bodies and flanking regions in gene lists, without an a priori binary classification of genes as targets or non-targets. When compared to other TF mining resources, MAGIC displayed favourable performance in predicting the TFs and cofactors that drive gene changes in four settings: 1) a cell line expressing or lacking a single TF; 2) breast tumors divided along PAM50 designations; 3) whole-brain samples from WT mice or mice lacking a single TF in a particular neuronal subtype; and 4) single-cell RNA-seq analysis of neurons divided by immediate early gene expression levels. In summary, MAGIC is a standalone application that produces meaningful predictions of TFs and cofactors in transcriptomic experiments.
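For contrast with MAGIC, the binary 'target/non-target' enrichment step that the abstract attributes to other tools can be sketched as a one-sided Fisher's exact test, which for over-representation equals the upper tail of a hypergeometric distribution. This is a minimal illustration of that conventional approach, not MAGIC's own method; the function name `fisher_enrichment_p` and the gene identifiers are hypothetical.

```python
from math import comb


def fisher_enrichment_p(query_genes, target_genes, background_genes):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Hypothetical helper: tests whether genes labelled 'target' of one factor
    are over-represented in a query gene list relative to a background list.
    """
    universe = set(background_genes)
    query = set(query_genes) & universe
    targets = set(target_genes) & universe

    n_universe = len(universe)       # N: all genes considered
    n_targets = len(targets)         # K: genes called 'target' of the factor
    n_query = len(query)             # n: genes in the query list
    overlap = len(query & targets)   # k: query genes that are also targets

    # P(X >= k) for X ~ Hypergeometric(N, K, n); math.comb returns 0 when k > n
    upper_tail = sum(
        comb(n_targets, i) * comb(n_universe - n_targets, n_query - i)
        for i in range(overlap, min(n_targets, n_query) + 1)
    )
    return upper_tail / comb(n_universe, n_query)
```

For example, three query genes that are all among five 'targets' in a ten-gene background give p = C(5,3)/C(10,3) = 1/12. Every gene counts only as in or out of the target set, which is the binary classification MAGIC is described as avoiding.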
Conflict of interest statement
The author has declared that no competing interests exist.
Figures
Fig 1. Comparing ranks of manipulated factors predicted by MAGIC, CHEA3, TFEA and Enrichr.
MCF7(shCon_vs_shREST), TCGA(Lum_vs_Basal), Brain(WT_vs_CTCFko) and DGC(Quiet_vs_Reactive) datasets were analyzed by MAGIC, CHEA3 (using all available libraries: ARCHS4 co-expression, ENCODE ChIP-Seq, Enrichr Queries, GTEx co-expression, Literature mining, ReMAP ChIP-seq, Mean Rank and Top Rank), TFEA and Enrichr. The reciprocal integer ranks of REST, ESR1, CTCF and FOS (the factors manipulated in MCF7(shCon_vs_shREST), TCGA(Lum_vs_Basal), Brain(WT_vs_CTCFko) and DGC(Quiet_vs_Reactive), respectively) are plotted: the top rank scores 1, the second rank 0.5, and so on. ND, not determined (factor not present in the library).
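The reciprocal-rank score described in this caption can be sketched as below. The function name and the toy tool outputs are hypothetical and are not results from the paper; the sketch also assumes each tool returns a strict ordering with no tied ranks.

```python
def reciprocal_rank(ranked_factors, factor):
    """Return 1 / integer rank of `factor` in a ranked list of factor names.

    The top-ranked factor scores 1.0, the second 0.5, the third 1/3, and so on.
    Returns None (plotted as 'ND') when the factor is absent from the list,
    e.g. because the library contains no track for it.
    """
    try:
        return 1.0 / (ranked_factors.index(factor) + 1)
    except ValueError:  # factor not present in this library's output
        return None


# Hypothetical outputs from two tools for one dataset (toy data)
outputs = {
    "tool_a": ["ESR1", "FOXA1", "GATA3"],
    "tool_b": ["FOXA1", "GATA3"],
}
for tool, ranking in outputs.items():
    rr = reciprocal_rank(ranking, "ESR1")
    print(tool, "ND" if rr is None else rr)
```

Using 1/rank rather than the raw rank compresses differences far down the list, so a factor at rank 1 versus rank 2 matters much more than rank 50 versus rank 51.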
Fig 2. Manipulated transcription factors and associated cofactors are preferentially ranked by MAGIC compared to CHEA3, TFEA and Enrichr.
(A) Empirical cumulative distributions of fractional ranks (1/integer rank) were generated for the manipulated factors and associated cofactors in the 4 datasets, using all algorithms and libraries as in Fig 1. The difference between the cumulative distribution of all scaled fractional ranks for the manipulated factor and associated cofactors and a uniform distribution, D(r)-r, is plotted against the fractional rank r. Kolmogorov-Smirnov tests of each distribution against a uniform distribution yield p < 10⁻¹⁰ for all tests. (B) Area under the curve (AUC) for the D(r)-r versus r curves in panel A. (C) The D(r)-r curves in panel A scaled for the rank of the manipulated factor: for each algorithm, D(r)-r was multiplied by the fractional rank (FR) of the manipulated factor. (D) AUCs for the curves in panel C.
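The caption does not spell out exactly how D(r) is built, so the following is only a sketch under one reading: D(r) is taken as the empirical CDF of a set of fractional ranks in [0, 1], the uniform reference CDF on [0, 1] is simply r, the AUC is a trapezoidal integral over a fixed grid, and the panel C scaling multiplies the curve by a supplied FR. All function names and the grid size are hypothetical choices.

```python
def ecdf(values, r):
    """Empirical cumulative distribution: fraction of values <= r."""
    return sum(v <= r for v in values) / len(values)


def d_minus_r_auc(fractional_ranks, grid_points=1001, scale=1.0):
    """Trapezoidal area under scale * (D(r) - r) for r on [0, 1].

    `scale` stands in for the manipulated factor's fractional rank (FR)
    when reproducing the panel C style scaling; 1.0 gives the unscaled curve.
    """
    rs = [i / (grid_points - 1) for i in range(grid_points)]
    ys = [scale * (ecdf(fractional_ranks, r) - r) for r in rs]
    h = rs[1] - rs[0]
    return sum((ys[i] + ys[i + 1]) * h / 2 for i in range(len(ys) - 1))


def ks_vs_uniform(values):
    """One-sample Kolmogorov-Smirnov statistic against Uniform(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    # Largest gap above (i+1)/n - x or below x - i/n the uniform CDF
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


fr = [1.0, 0.5, 0.25, 0.1]  # toy fractional ranks, not data from the paper
print(d_minus_r_auc(fr), d_minus_r_auc(fr, scale=0.5), ks_vs_uniform(fr))
```

The sketch returns only the KS statistic; `scipy.stats.kstest(values, "uniform")` would also give a p-value if SciPy is available.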
Fig 3. MAGIC demonstrates skill at calling manipulated factors as assessed by Precision Recall and Receiver Operating Characteristic analyses.
(A) Precision Recall (PR) curves for the four datasets and all algorithms and libraries. (B) Receiver Operating Characteristic (ROC) curves for the four datasets and all algorithms and libraries. As in panel A, data were not balanced prior to graphing. (C) ROC versus PR AUCs for all algorithms and libraries. (D) PR AUCs were scaled for the fractional rank of the manipulated factor by multiplying the PR AUC by FR.
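A minimal sketch of the two summary metrics in this figure, assuming each algorithm yields a strict ranking of factors and that the positives are the manipulated factor and its associated cofactors. ROC AUC is computed as the fraction of (positive, negative) pairs in which the positive ranks higher; PR AUC is estimated here by average precision, which is one common estimator and may differ from the interpolation used in the paper. Function names are hypothetical.

```python
def roc_auc(labels):
    """ROC AUC for a ranked list (best first); 1 = positive, 0 = negative.

    Counts, for each positive, the negatives ranked below it. Assumes no ties
    and at least one positive and one negative.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    negatives_above = 0
    wins = 0
    for y in labels:
        if y:
            wins += n_neg - negatives_above  # negatives ranked below this positive
        else:
            negatives_above += 1
    return wins / (n_pos * n_neg)


def average_precision(labels):
    """Mean precision at the rank of each positive (a PR AUC estimate)."""
    hits = 0
    total = 0.0
    for k, y in enumerate(labels, start=1):
        if y:
            hits += 1
            total += hits / k  # precision at cut-off k
    return total / sum(labels)


def scaled_pr_auc(labels, manipulated_rank):
    """PR AUC multiplied by FR = 1 / integer rank of the manipulated factor."""
    return average_precision(labels) * (1.0 / manipulated_rank)


ranked = [1, 0, 1, 0]  # toy labels: positives at ranks 1 and 3
print(roc_auc(ranked), average_precision(ranked), scaled_pr_auc(ranked, 1))
```

Because the labels are left unbalanced (few positives, many negatives), ROC AUC and PR AUC can diverge, which is what panel C compares.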
Fig 4. MAGIC requires a valid background gene list for optimal performance.
(A) D(r)-r curves for the 4 datasets generated from MAGIC outputs in the presence or absence of a background list. Kolmogorov-Smirnov statistics for the 2 curves: MCF7(shCon_vs_shREST), D = 0.10, p = 8.6×10⁻⁵; TCGA(Lum_vs_Basal), D = 0.15, p = 4.6×10⁻¹¹; Brain(WT_vs_CTCFko), D = 0.26, p = 1.1×10⁻¹⁶; DGC(Quiet_vs_Reactive), D = 0.11, p = 3.9×10⁻⁹. (B) Precision Recall curves for MAGIC outputs in the presence or absence of a background list: MCF7(shCon_vs_shREST), 0.84 vs 0.78; TCGA(Lum_vs_Basal), 0.83 vs 0.81; Brain(WT_vs_CTCFko), 0.91 vs 0.71; DGC(Quiet_vs_Reactive), 0.72 vs 0.67. The black vertical line denotes 80% recall. (C) Receiver Operating Characteristic curves for MAGIC outputs in the presence or absence of a background list (unbalanced). (D) Empirical cumulative distributions of the False Discovery Rates associated with MAGIC outputs in the presence or absence of a background list. For all datasets, Kolmogorov-Smirnov p < 10⁻⁴. (E) The integer ranks of the top 50 factors called by MAGIC in the presence of a background list were compared with their ranks in the absence of a background list.
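The with/without-background comparisons in panels A and E can be sketched as a two-sample Kolmogorov-Smirnov statistic and a rank lookup. This is an illustration under stated assumptions, not MAGIC's code: the two inputs are plain lists of values or ranked factor names, and `ks_two_sample` and `rank_changes` are hypothetical names.

```python
from bisect import bisect_right


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|.

    Both empirical CDFs are right-continuous step functions, so the maximum
    gap is attained at one of the pooled sample points.
    """
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def rank_changes(with_bg, without_bg, top_n=50):
    """Pair each of the top_n factors from the with-background run with its
    1-based rank in the no-background run (None if it is absent there)."""
    rank_without = {f: i + 1 for i, f in enumerate(without_bg)}
    return [(f, i + 1, rank_without.get(f)) for i, f in enumerate(with_bg[:top_n])]


print(ks_two_sample([0.1, 0.4, 0.9], [0.2, 0.3, 0.35]))
print(rank_changes(["CTCF", "RAD21", "SMC3"], ["RAD21", "CTCF"], top_n=3))
```

The factor names in the usage lines are toy inputs. `scipy.stats.ks_2samp` returns the same statistic together with a p-value.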
Similar articles
- Properly defining the targets of a transcription factor significantly improves the computational identification of cooperative transcription factor pairs in yeast. Wu WS, Lai FJ. BMC Genomics. 2015;16 Suppl 12(Suppl 12):S10. doi: 10.1186/1471-2164-16-S12-S10. Epub 2015 Dec 9. PMID: 26679776.
- Revealing transcription factor and histone modification co-localization and dynamics across cell lines by integrating ChIP-seq and RNA-seq data. Zhang L, Xue G, Liu J, Li Q, Wang Y. BMC Genomics. 2018 Dec 31;19(Suppl 10):914. doi: 10.1186/s12864-018-5278-5. PMID: 30598100.
- KnockTF: a comprehensive human gene expression profile database with knockdown/knockout of transcription factors. Feng C, Song C, Liu Y, Qian F, Gao Y, Ning Z, Wang Q, Jiang Y, Li Y, Li M, Chen J, Zhang J, Li C. Nucleic Acids Res. 2020 Jan 8;48(D1):D93-D100. doi: 10.1093/nar/gkz881. PMID: 31598675.
- FisherMP: fully parallel algorithm for detecting combinatorial motifs from large ChIP-seq datasets. Zhang S, Liang Y, Wang X, Su Z, Chen Y. DNA Res. 2019 Jun 1;26(3):231-242. doi: 10.1093/dnares/dsz004. PMID: 30957858.
- ChIPXpress: using publicly available gene expression data to improve ChIP-seq and ChIP-chip target gene ranking. Wu G, Ji H. BMC Bioinformatics. 2013 Jun 10;14:188. doi: 10.1186/1471-2105-14-188. PMID: 23758851.
Cited by
- Assessing next-generation sequencing-based computational methods for predicting transcriptional regulators with query gene sets. Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Brief Bioinform. 2024 Jul 25;25(5):bbae366. doi: 10.1093/bib/bbae366. PMID: 39082650. Review.
- BIT: Bayesian Identification of Transcriptional Regulators. Lu Z, Xu L, Wang X. bioRxiv [Preprint]. 2024 Jun 3:2024.06.02.597061. doi: 10.1101/2024.06.02.597061. PMID: 38895220.
- Ki-67 is necessary during DNA replication for fork protection and genome stability. Stamatiou K, Huguet F, Serapinas LV, Spanos C, Rappsilber J, Vagnarelli P. Genome Biol. 2024 Apr 22;25(1):105. doi: 10.1186/s13059-024-03243-5. PMID: 38649976.
- Silencing Apoe with divalent-siRNAs improves amyloid burden and activates immune response pathways in Alzheimer's disease. Ferguson CM, Hildebrand S, Godinho BMDC, Buchwald J, Echeverria D, Coles A, Grigorenko A, Vangjeli L, Sousa J, McHugh N, Hassler M, Santarelli F, Heneka MT, Rogaev E, Khvorova A. Alzheimers Dement. 2024 Apr;20(4):2632-2652. doi: 10.1002/alz.13703. Epub 2024 Feb 20. PMID: 38375983.
- The calcium channel TRPC6 promotes chemotherapy-induced persistence by regulating integrin α6 mRNA splicing. Mukhopadhyay D, Goel HL, Xiong C, Goel S, Kumar A, Li R, Zhu LJ, Clark JL, Brehm MA, Mercurio AM. Cell Rep. 2023 Nov 28;42(11):113347. doi: 10.1016/j.celrep.2023.113347. Epub 2023 Nov 1. PMID: 37910503.