TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data
doi: 10.1093/nar/gkv1507. Epub 2015 Dec 23.
Antonio Colaprico, Tiago C Silva, Catharina Olsen, Luciano Garofano, Claudia Cava, Davide Garolini, Thais S Sabedot, Tathiane M Malta, Stefano M Pagnotta, Isabella Castiglioni, Michele Ceccarelli, Gianluca Bontempi, Houtan Noushmehr
- PMID: 26704973
- PMCID: PMC4856967
- DOI: 10.1093/nar/gkv1507
Antonio Colaprico et al. Nucleic Acids Res. 2016.
Abstract
The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges; it offers bioinformatics solutions through a guided workflow that allows users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies that illustrate reproducibility, integrative analysis and the use of different Bioconductor packages to advance and accelerate novel discoveries.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Figures
Figure 1.
TCGA data overview. (A) Bars represent the number of patients by disease; bubbles represent the available data size (TB) by disease. (B) Number of samples by platform and by level, grouped by type: genomic, transcriptomic and epigenomic. (C) Barplot: number of citations for TCGA papers. Bubble plot: number of TCGA papers; the number in parentheses is the number of papers published by the TCGA Research Network. Source: Scopus search for 'TCGA', supplemented with TCGA Research Network papers that were not found by this search.
Figure 2.
Overview of TCGAbiolinks functions. TCGAbiolinks is organized into three categories. The first category (Data) provides functions to query the TCGA database and to download and prepare the data. The second category (Analysis) contains functions for different types of analyses, including clustering (TCGAanalyze_Clustering), differential expression analysis (TCGAanalyze_DEA) and enrichment analysis (TCGAanalyze_EA). Finally, the results can be visualized with the functions in the third category (Visualization), including principal component analysis (TCGAvisualize_PCA), starburst plots (TCGAvisualize_starburst) and survival curves (TCGAvisualize_SurvivalCoxNET). Dependencies on other R/Bioconductor packages are listed in the last row of the figure.
Figure 3.
Integrative analysis of BRCA data using TCGA clinical data and subtypes. Case study no. 1: integrative (or downstream) analysis of gene expression and clinical data from BRCA, with univariate and multivariate survival analysis using the DNET package. (A–D) Top 20 GO Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) terms and pathways, respectively, enriched among the differentially expressed genes (DEGs). Gene annotation from the DAVID database. (E) Genes significant in univariate Kaplan-Meier and multivariate Cox regression analyses, shown as a network of five communities with the same _P_-values using the DNET package; interactions among genes are from the STRING database.
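Panels A–D rank terms and pathways that are over-represented among the DEGs. One common way to score such over-representation is a one-sided hypergeometric test followed by Benjamini-Hochberg correction, and the sketch below assumes that approach. It is not DAVID's EASE score (a modified Fisher's exact test) and not necessarily what TCGAanalyze_EA does internally; the function names and the `N`, `K`, `n`, `k` parameters are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided P(X >= k) for X ~ Hypergeometric.

    N: genes in the background, K: background genes annotated to the term,
    n: DEGs tested, k: DEGs annotated to the term (all hypothetical inputs).
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of observing k or more annotated genes among the DEGs.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `hypergeom_enrichment_p(20000, 200, 500, 15)` scores a term that annotates 200 of 20 000 background genes and 15 of 500 DEGs; the per-term p-values are then passed together to `benjamini_hochberg` before ranking.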
Figure 4.
Case study no. 2: integrative (or downstream) analysis of gene expression and clinical data from LGG with unsupervised clustering, cross-referencing the expression clusters with clinical and molecular information. (A) Heatmap of the 1187 most variable genes, clustered (tree cut at k = 4) into EC1, EC2, EC3 and EC4. (B) Kaplan-Meier survival curves for the EC clusters. (C and D) Distribution of the DNA methylation clusters and of ATRX mutation status within the EC clusters.
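Panel B compares survival across the EC clusters with Kaplan-Meier curves. The product-limit estimator behind such curves is S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of deaths at t_i and n_i the number of patients still at risk. A minimal Python sketch follows; in the package itself this is done in R, and the function name `kaplan_meier` and its input layout (parallel lists of follow-up times and 0/1 event flags) are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing this time; censored patients at time t
        # still count as at risk for deaths at t (standard convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve
```

To reproduce a panel like B, one curve would be computed per cluster by filtering the patients of EC1 to EC4 before calling the function.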
Figure 5.
Case study no. 3: integrative analysis of gene expression and DNA methylation data from COAD, comparing the CIMP.L and CIMP.H groups. (A) Expression volcano plot: fold change of expression versus significance. (B) DNA methylation volcano plot: difference in DNA methylation versus significance. (C) Starburst plot: DNA methylation significance versus gene expression significance.
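Panels A–C combine per-gene effect sizes and significance values. A minimal sketch of how genes could be classified for such plots, assuming each starburst axis is -log10(FDR) signed by the direction of change (methylation difference for x, expression log fold change for y) and a single FDR cutoff. The exact transform and cutoffs used by TCGAvisualize_starburst may differ, and all function names and thresholds below are hypothetical.

```python
import math


def volcano_call(effect, fdr, effect_cut, fdr_cut=0.01):
    """Label one gene on a volcano plot (hypothetical cutoffs)."""
    if fdr < fdr_cut and abs(effect) >= effect_cut:
        return "increased" if effect > 0 else "decreased"
    return "ns"


def signed_significance(effect, fdr):
    """-log10(FDR) carrying the sign of the effect (assumed axis convention)."""
    return math.copysign(-math.log10(fdr), effect)


def starburst_quadrant(diff_meth, meth_fdr, log_fc, expr_fdr, fdr_cut=0.01):
    """Assign a gene to a starburst quadrant, or 'not significant'."""
    x = signed_significance(diff_meth, meth_fdr)  # methylation axis
    y = signed_significance(log_fc, expr_fdr)     # expression axis
    threshold = -math.log10(fdr_cut)
    if abs(x) < threshold or abs(y) < threshold:
        return "not significant"
    meth = "hyper" if x > 0 else "hypo"
    expr = "up" if y > 0 else "down"
    return f"{meth}methylated & {expr}regulated"
```

Genes labelled "hypermethylated & downregulated" form the quadrant usually inspected for candidate epigenetic silencing when comparing groups such as CIMP.H and CIMP.L.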
Figure 6.
Case study no. 4: TCGAbiolinks integration, integrative analysis using ELMER. (A) Scatter plots of the average DNA methylation level at sites with the AP1 motif in all KIRC samples against the expression of the transcription factors CEBPB and GFI1, respectively. (B) Schematic showing the probe (blue) and the locations of the 20 nearby genes; genes significantly linked to the probe are shown in red. (C) Odds ratio (x axis) for the selected motifs, i.e. those with an OR above 1.1 and a lower bound of the 95% confidence interval above 1.1. The range shows the 95% confidence interval for each odds ratio.
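Panel C keeps motifs whose odds ratio and lower 95% confidence bound both exceed 1.1. A sketch of that selection rule using the Woolf (log) approximation for the interval; ELMER may compute the OR and its interval differently (for example, from Fisher's exact test), and the 2x2 layout (selected probes with/without the motif versus background probes with/without it) and the function names are assumptions.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf 95% CI for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a/b = selected probes with/without the motif,
    c/d = background probes with/without the motif.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf interval undefined without a continuity correction")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)


def motif_selected(a, b, c, d, min_or=1.1, min_lower=1.1):
    """Apply the caption's rule: OR > 1.1 and lower CI bound > 1.1."""
    or_, lower, _ = odds_ratio_ci(a, b, c, d)
    return or_ > min_or and lower > min_lower
```

Requiring the lower bound, not just the point estimate, to exceed 1.1 filters out motifs whose enrichment is large but poorly supported by few probes.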
Similar articles
- TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages. Silva TC, Colaprico A, Olsen C, D'Angelo F, Bontempi G, Ceccarelli M, Noushmehr H. F1000Res. 2016 Jun 29;5:1542. doi: 10.12688/f1000research.8923.2. eCollection 2016. PMID: 28232861. Free PMC article.
- New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx. Mounir M, Lucchetta M, Silva TC, Olsen C, Bontempi G, Chen X, Noushmehr H, Colaprico A, Papaleo E. PLoS Comput Biol. 2019 Mar 5;15(3):e1006701. doi: 10.1371/journal.pcbi.1006701. eCollection 2019 Mar. PMID: 30835723. Free PMC article.
- InterSIM: Simulation tool for multiple integrative 'omic datasets'. Chalise P, Raghavan R, Fridley BL. Comput Methods Programs Biomed. 2016 May;128:69-74. doi: 10.1016/j.cmpb.2016.02.011. Epub 2016 Feb 27. PMID: 27040832. Free PMC article.
- PanCancer insights from The Cancer Genome Atlas: the pathologist's perspective. Cooper LA, Demicco EG, Saltz JH, Powell RT, Rao A, Lazar AJ. J Pathol. 2018 Apr;244(5):512-524. doi: 10.1002/path.5028. Epub 2018 Feb 22. PMID: 29288495. Free PMC article. Review.
- A survey and evaluation of Web-based tools/databases for variant analysis of TCGA data. Zhang Z, Li H, Jiang S, Li R, Li W, Chen H, Bo X. Brief Bioinform. 2019 Jul 19;20(4):1524-1541. doi: 10.1093/bib/bby023. PMID: 29617727. Free PMC article. Review.
Cited by
- AKT and EZH2 inhibitors kill TNBCs by hijacking mechanisms of involution. Schade AE, Perurena N, Yang Y, Rodriguez CL, Krishnan A, Gardner A, Loi P, Xu Y, Nguyen VTM, Mastellone GM, Pilla NF, Watanabe M, Ota K, Davis RA, Mattioli K, Xiang D, Zoeller JJ, Lin JR, Morganti S, Garrido-Castro AC, Tolaney SM, Li Z, Barbie DA, Sorger PK, Helin K, Santagata S, Knott SRV, Cichowski K. Nature. 2024 Oct 9. doi: 10.1038/s41586-024-08031-6. Online ahead of print. PMID: 39385030.
- Prognosis of colorectal cancer, prognostic index of immunogenic cell death associated genes in response to immunotherapy, and potential therapeutic effects of ferroptosis inducers. Lei M, Xiao M, Long Z, Lin T, Ding R, Quan Q. Front Immunol. 2024 Sep 20;15:1458270. doi: 10.3389/fimmu.2024.1458270. eCollection 2024. PMID: 39372411. Free PMC article.
- Actin-related protein 2/3 complex subunit 1B promotes ovarian cancer progression by regulating the AKT/PI3K/mTOR signaling pathway. Ke M, Zhu H, Lin Y, Zhang Y, Tang T, Xie Y, Chen ZS, Wang X, Shen Y. J Transl Int Med. 2024 Oct 1;12(4):406-423. doi: 10.2478/jtim-2024-0025. eCollection 2024 Sep. PMID: 39360160. Free PMC article.
- Cancer cell-specific PD-L1 expression is a predictor of poor outcome in patients with locally advanced oral cavity squamous cell carcinoma. Wang M, Qin L, Thia K, Nguyen T, MacDonald S, Belobrov S, Kranz S, Goode D, Trapani JA, Wiesenfeld D, Neeson PJ. J Immunother Cancer. 2024 Oct 2;12(10):e009617. doi: 10.1136/jitc-2024-009617. PMID: 39357980. Free PMC article.
- Comprehensive analysis of bulk, single-cell RNA sequencing, and spatial transcriptomics revealed IER3 for predicting malignant progression and immunotherapy efficacy in glioma. Wang Q, Zhang C, Pang Y, Cheng M, Wang R, Chen X, Ji T, Yang Y, Zhang J, Zhong C. Cancer Cell Int. 2024 Oct 1;24(1):332. doi: 10.1186/s12935-024-03511-1. PMID: 39354533. Free PMC article.