The Progenetix oncogenomic resource in 2021

Meta-Analysis

Qingyao Huang et al. Database (Oxford). 2021.

Abstract

In cancer, copy number aberrations (CNAs) are a nearly ubiquitous and frequently extensive type of structural genome variation. To disentangle the molecular mechanisms underlying tumorigenesis and to identify and characterize molecular subtypes, comparative and meta-analysis of large genomic variant collections can be of immense importance. Over the last decades, cancer genomic profiling projects have produced a large number of somatic genome variation profiles, which are, however, scattered across a multitude of individual studies and datasets. The Progenetix project, initiated in 2001, curates individual cancer CNA profiles and associated metadata from published oncogenomic studies and data repositories with the aim of empowering integrative analyses spanning all cancer biologies. During the last few years, genomics and cancer research have seen significant advances in molecular genetics technology, disease concepts, data standard harmonization and data availability, in an increasingly structured and systematic manner. For the Progenetix resource, continuous data integration, curation and maintenance have resulted in the most comprehensive representation of cancer genome CNA profiling data, with 138 663 (including 115 357 tumor) copy number variation (CNV) profiles. In this article, we report a 4.5-fold increase in sample number since 2013; improvements in data quality and ontology representation, with a CNV landscape summary over 51 distinctive National Cancer Institute Thesaurus (NCIt) cancer terms; and updates in database schemas and data access, including a new web front-end and programmatic data access. Database URL: progenetix.org.

© The Author(s) 2021. Published by Oxford University Press.

Figures

Figure 1.

The currently available CNA data points in Progenetix and TCGA. The Progenetix database contains 115 357 cancer samples, of which 92 307 map to the 51 defined critical nodes in the NCIt ontology tree and 23 050 do not (black), whereas the TCGA repository contains 11 090 samples, of which 9103 map to the tree and 1987 do not (black). Colors of the stacked bar plot (left) match the branch colors of the NCIt ontology tree (right).

Figure 4.

Demonstration of further functionality pages: A. Publication search; B. NCIt hierarchical tree navigation. A: Cancer-genomics-associated publications are recorded with the number of samples stratified by the technology used, and can be filtered by keyword; B: Part of the sample subsets contained in Progenetix under the hierarchical NCIt classification tree, which allows selection of sample subsets at different levels; C: Users can upload custom segment files for data visualization.

Figure 2.

The genomic CNV fraction across 51 NCIt umbrella nodes. Each dot represents one sample's CNV fraction, ranging from 0 to 1, and the red horizontal line indicates the median CNV fraction of the respective cancer type. Each cancer type contains between 104 and 11 804 CNV profiles (median 904; see Supplementary Table S1).
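
The caption does not state how the CNV fraction is computed. As a minimal sketch, one plausible reading is the share of genome bases covered by copy-number-altered segments, with the per-type median taken over samples. The function names, the segment representation as (start, end) pairs and the assumption that segments do not overlap are all illustrative and not taken from the Progenetix code base.

```python
from statistics import median


def cnv_fraction(cnv_segments, genome_length):
    """Fraction of the genome covered by CNV segments (assumed definition).

    cnv_segments: iterable of (start, end) pairs, end exclusive,
    assumed non-overlapping. genome_length: total bases considered.
    """
    covered = sum(end - start for start, end in cnv_segments)
    return covered / genome_length


def median_fraction_by_type(samples):
    """Median CNV fraction per cancer type.

    samples: iterable of (cancer_type, fraction) pairs.
    """
    grouped = {}
    for cancer_type, fraction in samples:
        grouped.setdefault(cancer_type, []).append(fraction)
    return {t: median(values) for t, values in grouped.items()}
```

Under these assumptions, each dot in the figure would correspond to one `cnv_fraction` value and each red line to one entry of `median_fraction_by_type`.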

Figure 3.

Beacon-style query using fuzzy ranges to identify biosamples with variants matching the CNA range. This example queries for a continuous, focal duplication that covers the complete coding region of the MYC gene and is ≤ 6 Mb in size. A: Filter for dataset; B: filter for cancer classification (NCIt and ICD-O-3 ontology terms available); C: additional filter, e.g. Cellosaurus; D: additional filter for geographic location; E: external link to the UCSC browser to view the alignment of matched variants; F: cancer type classification sorted by frequency of the matched biosamples present in the subset; G: list of matched biosamples with description, statistics and reference. More detailed biosample information can be viewed through the ‘id’ link to the sample detail page; H: matched variants with reference to biosamples can be downloaded in JSON or CSV format.
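
The matching rule described in this caption can be sketched as a simple predicate: a variant qualifies if it is a duplication, spans the whole target region and stays within a maximum size. This is an illustration only and does not call the Progenetix or Beacon API; the `Variant` class, the function names and the toy coordinates are hypothetical, and real MYC coordinates are deliberately not filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    biosample_id: str
    chrom: str
    start: int          # 0-based, inclusive (assumed convention)
    end: int            # exclusive
    variant_type: str   # e.g. "DUP" or "DEL"


def is_focal_dup_covering(v, chrom, region_start, region_end, max_size):
    """True if v is a duplication fully covering the region, at most max_size long."""
    return (
        v.variant_type == "DUP"
        and v.chrom == chrom
        and v.start <= region_start
        and v.end >= region_end
        and (v.end - v.start) <= max_size
    )


def matching_biosamples(variants, chrom, region_start, region_end, max_size):
    """Sorted, de-duplicated biosample ids with at least one matching variant."""
    return sorted(
        {
            v.biosample_id
            for v in variants
            if is_focal_dup_covering(v, chrom, region_start, region_end, max_size)
        }
    )


# Toy data: the region is 100-200 on a hypothetical chromosome "8", size cap 1000.
toy_variants = [
    Variant("s1", "8", 50, 250, "DUP"),    # covers region, small enough
    Variant("s2", "8", 120, 250, "DUP"),   # starts inside the region
    Variant("s3", "8", 0, 5000, "DUP"),    # covers region but too large
    Variant("s4", "8", 50, 250, "DEL"),    # wrong variant type
]
hits = matching_biosamples(toy_variants, "8", 100, 200, 1000)  # -> ['s1']
```

The size cap is what makes the match "focal": without it, whole-arm or whole-chromosome gains that happen to include the gene would also be returned.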
