TCGA-Assembler: open-source software for retrieving and processing TCGA data

Nature Methods volume 11, pages 599–600 (2014)

To the Editor:

The Cancer Genome Atlas (TCGA, https://tcga-data.nci.nih.gov/tcga/) has been generating multimodal genomics, epigenomics and proteomics data for thousands of tumor samples across more than 20 types of cancer. TCGA data are divided into three levels. Many of the level 1 and level 2 data contain protected information such as raw DNA-sequencing data and individual germline variants, whereas level 3 data are public and contain high-level summaries such as expression quantifications of genes. Some de-identified level 2 clinical data (e.g., survival and drug treatments) are also publicly available. The public data include genome-wide measurements of different genetic characterizations, such as DNA copy number, DNA methylation and mRNA expression, for the same genes; this provides unprecedented opportunities for systematic investigation of cancer mechanisms at multiple molecular and regulatory layers (refs. 1,2). Few tools for integrative mining of TCGA data exist, partly owing to the lack of tools to acquire and assemble the large-scale TCGA data. Specifically, the level 3 TCGA data are stored as hundreds of thousands of sample- and platform-specific files accessible through the open-access HTTP directory on the servers of TCGA's Data Coordinating Center (DCC, https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/). Navigating through all of these files manually is impossible. To meet these challenges, we introduce TCGA-Assembler, a software package that automates and streamlines the retrieval, assembly and processing of public TCGA data. TCGA-Assembler equips users with the ability to acquire and process public TCGA data with open-source and freely available programs.
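
To make the retrieval and assembly problem concrete, the following Python sketch illustrates the two generic steps involved: listing data files from an HTML directory page and merging per-sample files into a single gene-by-sample matrix. It is an illustration only and does not reproduce TCGA-Assembler's interface or implementation. The function names, the `.txt` suffix and the two-column tab-delimited file layout are assumptions introduced for this example.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href targets from an HTML directory-listing page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def list_data_files(listing_html, suffix=".txt"):
    """Return links in a directory listing that end with `suffix`.

    The suffix is a hypothetical filter; real listings may need
    platform-specific file-name patterns instead.
    """
    parser = LinkCollector()
    parser.feed(listing_html)
    return [link for link in parser.links if link.endswith(suffix)]


def assemble_matrix(sample_tables):
    """Merge per-sample (gene, value) tables into one matrix.

    `sample_tables` maps a sample ID to tab-delimited text with one
    header row (an assumed layout). Returns (genes, samples, matrix),
    where matrix[i][j] is the value of genes[i] in samples[j], or None
    if that gene is absent from that sample's file.
    """
    values = {}
    for sample_id, text in sample_tables.items():
        reader = csv.reader(io.StringIO(text), delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed lines
            values.setdefault(row[0], {})[sample_id] = float(row[1])
    genes = sorted(values)
    samples = sorted(sample_tables)
    matrix = [[values[g].get(s) for s in samples] for g in genes]
    return genes, samples, matrix
```

In practice, the listing HTML would be fetched from the directory URL and each returned link downloaded before being passed to `assemble_matrix`; those network steps are omitted here so that the sketch runs offline.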

Figure 1: TCGA-Assembler as a tool for acquiring, assembling and processing public TCGA data.

References

  1. Cancer Genome Atlas Network. Nature 490, 61–70 (2012).
  2. Cancer Genome Atlas Research Network. Nature 455, 1061–1068 (2008).
  3. Xu, Y. et al. in IEEE International Workshop on Genomic Signal Processing and Statistics 135–138 (2012).
  4. Robbins, D.E., Grüneberg, A., Deus, H.F., Tanik, M.M. & Almeida, J.S. Bioinformatics 29, 1333–1340 (2013).
  5. Gao, J. et al. Sci. Signal. 6, pl1 (2013).

Acknowledgements

This work was supported by grants from the US National Cancer Institute (R01 CA132897 to Y.J. and R01 CA163481 to P.Q.).

Author information

Authors and Affiliations

  1. Center for Biomedical Research Informatics, NorthShore University HealthSystem, Evanston, Illinois, USA
    Yitan Zhu & Yuan Ji
  2. Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA
    Peng Qiu
  3. Department of Health Studies, University of Chicago, Chicago, Illinois, USA
    Yuan Ji

Authors

  1. Yitan Zhu
  2. Peng Qiu
  3. Yuan Ji

Corresponding author

Correspondence to Yuan Ji.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Zhu, Y., Qiu, P. & Ji, Y. TCGA-Assembler: open-source software for retrieving and processing TCGA data. Nat. Methods 11, 599–600 (2014). https://doi.org/10.1038/nmeth.2956
