TCGA-Assembler: open-source software for retrieving and processing TCGA data
- Correspondence
- Published: 29 May 2014
Nature Methods volume 11, pages 599–600 (2014)
To the Editor:
The Cancer Genome Atlas (TCGA, https://tcga-data.nci.nih.gov/tcga/) has been generating multimodal genomics, epigenomics and proteomics data for thousands of tumor samples across more than 20 types of cancer. TCGA data are divided into three levels. Much of the level 1 and level 2 data contains protected information, such as raw DNA-sequencing data and individual germline variants, whereas level 3 data are public and contain high-level summaries such as expression quantifications of genes. Some de-identified level 2 clinical data (e.g., survival and drug treatments) are also publicly available. The public data include genome-wide measurements of different genetic characterizations, such as DNA copy number, DNA methylation and mRNA expression, for the same genes; this provides unprecedented opportunities for systematic investigation of cancer mechanisms at multiple molecular and regulatory layers [1,2]. Few tools for integrative mining of TCGA data exist, partly owing to the lack of tools to acquire and assemble the large-scale TCGA data. Specifically, the level 3 TCGA data are stored as hundreds of thousands of sample- and platform-specific files accessible through the open-access HTTP directory on the servers of TCGA's Data Coordinating Center (DCC, https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/). Navigating through all of these files manually is impossible. To meet these challenges, we introduce TCGA-Assembler, a software package that automates and streamlines the retrieval, assembly and processing of public TCGA data using open-source and freely available programs.
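To make the retrieval and assembly problem concrete, the following is a minimal Python sketch of the general idea: walk an HTTP directory tree by parsing its index pages, then merge per-sample measurements into a single gene-by-sample table. It is not the TCGA-Assembler implementation. The function names (`list_directory`, `crawl`, `assemble_matrix`), the injected `fetch` callable and the assumed index-page layout (plain anchor tags, subdirectories ending in `/`) are all illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect href values from the anchor tags of a directory index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_directory(index_html, base_url):
    """Split the entries of one index page into subdirectory and file URLs."""
    parser = _LinkCollector()
    parser.feed(index_html)
    dirs, files = [], []
    for href in parser.hrefs:
        # Assumption: sort links start with "?", parent links with "/" or "..".
        if href.startswith(("?", "/", "..")):
            continue
        url = urljoin(base_url, href)
        (dirs if href.endswith("/") else files).append(url)
    return dirs, files


def crawl(root_url, fetch):
    """Walk the directory tree; `fetch(url)` must return the index HTML."""
    pending, seen, found = [root_url], set(), []
    while pending:
        url = pending.pop()
        if url in seen:
            continue
        seen.add(url)
        dirs, files = list_directory(fetch(url), url)
        found.extend(files)
        pending.extend(dirs)
    return sorted(found)


def assemble_matrix(per_sample):
    """Merge {sample: {gene: value}} into (genes, samples, rows).

    Each row holds one gene across all samples; missing values become None.
    """
    samples = sorted(per_sample)
    genes = sorted({g for values in per_sample.values() for g in values})
    rows = [[per_sample[s].get(g) for s in samples] for g in genes]
    return genes, samples, rows
```

In practice `fetch` could wrap `urllib.request.urlopen`; passing it in as an argument keeps the crawler testable without network access.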
Figure 1: TCGA-Assembler as a tool for acquiring, assembling and processing public TCGA data.

References
- Cancer Genome Atlas Network. Nature 490, 61–70 (2012).
- Cancer Genome Atlas Research Network. Nature 455, 1061–1068 (2008).
- Xu, Y. et al. in IEEE International Workshop on Genomic Signal Processing and Statistics 135–138 (2012).
- Robbins, D.E., Grüneberg, A., Deus, H.F., Tanik, M.M. & Almeida, J.S. Bioinformatics 29, 1333–1340 (2013).
- Gao, J. et al. Sci. Signal. 6, pl1 (2013).
Acknowledgements
This work was supported by grants from the US National Cancer Institute (R01 CA132897 to Y.J. and R01 CA163481 to P.Q.).
Author information
Authors and Affiliations
- Center for Biomedical Research Informatics, NorthShore University HealthSystem, Evanston, Illinois, USA
Yitan Zhu & Yuan Ji
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA
Peng Qiu
- Department of Health Studies, University of Chicago, Chicago, Illinois, USA
Yuan Ji
Authors
- Yitan Zhu
- Peng Qiu
- Yuan Ji
Corresponding author
Correspondence to Yuan Ji.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
About this article
Cite this article
Zhu, Y., Qiu, P. & Ji, Y. TCGA-Assembler: open-source software for retrieving and processing TCGA data. Nat Methods 11, 599–600 (2014). https://doi.org/10.1038/nmeth.2956
- Issue date: June 2014
- DOI: https://doi.org/10.1038/nmeth.2956