Systematic annotation of orphan RNAs reveals blood-accessible molecular barcodes of cancer identity and cancer-emergent oncogenic drivers
[Preprint]. 2024 Mar 21:2024.03.19.585748.
doi: 10.1101/2024.03.19.585748.
Jeffrey Wang, Jung Min Suh, Brian J Woo, Albertas Navickas, Kristle Garcia, Keyi Yin, Lisa Fish, Benjamin Hänisch, Daniel Markett, Shaorong Yu, Gillian Hirst, Lamorna Brown-Swigart, Laura J Esserman, Laura J van 't Veer, Hani Goodarzi
- PMID: 38562907
- PMCID: PMC10983903
- DOI: 10.1101/2024.03.19.585748
Jeffrey Wang et al. bioRxiv. 2024.
Update in
- Systematic annotation of orphan RNAs reveals blood-accessible molecular barcodes of cancer identity and cancer-emergent oncogenic drivers.
Wang J, Suh JM, Woo BJ, Navickas A, Garcia K, Yin K, Fish L, Cavazos T, Hänisch B, Markett D, Hirst GL, Brown-Swigart L, Esserman LJ, van 't Veer LJ, Goodarzi H. Cell Rep Med. 2026 Feb 17;7(2):102577. doi: 10.1016/j.xcrm.2025.102577. Epub 2026 Jan 23. PMID: 41579861. Free PMC article.
Abstract
From extrachromosomal DNA to neo-peptides, the broad reprogramming of the cancer genome leads to the emergence of molecules that are specific to the cancer state. We recently described orphan non-coding RNAs (oncRNAs) as a class of cancer-specific small RNAs with the potential to play functional roles in breast cancer progression [1]. Here, we report a systematic and comprehensive search to identify, annotate, and characterize cancer-emergent oncRNAs across 32 tumor types. We also leverage large-scale in vivo genetic screens in xenografted mice to functionally identify driver oncRNAs in multiple tumor types. We have not only discovered a large repertoire of oncRNAs, but also found that their presence and absence represent a digital molecular barcode that faithfully captures the types and subtypes of cancer. Importantly, we discovered that this molecular barcode is partially accessible from the cell-free space, as some oncRNAs are secreted by cancer cells. In a large retrospective study across 192 breast cancer patients, we showed that oncRNAs can be reliably detected in the blood and that changes in the cell-free oncRNA burden capture both short-term and long-term clinical outcomes upon completion of a neoadjuvant chemotherapy regimen. Together, our findings establish oncRNAs as an emergent class of cancer-specific non-coding RNAs with potential roles in tumor progression and clinical utility in liquid biopsies and disease monitoring.
Conflict of interest statement
Disclosure of Potential Competing Interest: H.G. is a co-founder and shareholder of Exai Bio. J.W., L.F., and T.C. are employees and shareholders of Exai Bio. L.J.E. reports funding from Merck & Co.; participation on an advisory board for Blue Cross Blue Shield; and personal fees from UpToDate. L.J.v.V. is a founding advisor and shareholder of Exai Bio, and is a part-time employee of and owns stock in Agendia. All other authors declare no competing interests.
Figures
Figure 1. Systematic annotation of oncRNA loci across human cancers using small RNA sequencing data from TCGA and exRNA atlas.
(A) A binary heatmap representing the presence and absence of oncRNA species across human cancers. Here we show a subset of 2,808 of the top significant oncRNAs. The subset was created by selecting 100 of the most significant oncRNAs for each cancer type as determined by the Fisher exact test and collapsing oncRNAs selected multiple times. Each column represents an annotated oncRNA, and each row represents one TCGA sample. Rows were grouped based on their tumor type (TCGA code) and columns were clustered based on their patterns. (B) Number of oncRNAs associated with the major human cancers, namely lung, breast, and gastrointestinal cancers, depicted as an UpSet plot. The vertical blue bars represent the oncRNA counts across one or more cancers with the exact numbers included at the top. (C) A 2D UMAP projection summarizing the oncRNA profiles across TCGA cancer samples. Samples are colored by tumor type. (D) The confusion matrix for tissue-of-origin classification based on oncRNA presence and absence in each sample. The matrix was row-normalized. (E) A volcano plot representing the relationship between chromatin accessibility and oncRNA detection. The x-axis represents, for each oncRNA, the log2 median difference in chromatin accessibility between samples in which the oncRNA was present versus absent. The y-axis shows the significance of the observed differences based on FDR corrected P values calculated using a one-sided Mann-Whitney test. A total of 10,290 oncRNA loci were considered for this analysis based on the coverage of ATAC data. Of these, 3,255 showed a positive association between oncRNA presence and increased chromatin accessibility; of these, 1,989 were also statistically significant at an FDR of 1%. (F) Chromatin accessibility signal of four exemplary oncRNA loci from (E), grouped by the detection of the cognate oncRNA in the small RNA dataset of each sample. Values are shown as violin plots and boxplots. 
The boxplots show the distribution quartiles, and the whiskers show the quartiles ± IQR (interquartile range). Also reported are the number of samples in which the oncRNAs were detected as well as their associated corrected P values.
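The Fisher exact test used in panel (A) to rank oncRNAs can be illustrated with a minimal sketch. It assumes a one-sided (enrichment) test on a 2x2 presence/absence table; the caption does not state the sidedness, and the function name and example counts below are hypothetical, not taken from the study.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    a: samples of the cancer type in which the oncRNA is present
    b: samples of the cancer type in which the oncRNA is absent
    c: all other samples in which the oncRNA is present
    d: all other samples in which the oncRNA is absent
    (The table layout is an assumption made for illustration.)
    """
    row1 = a + b          # number of samples of the cancer type
    col1 = a + c          # number of samples with the oncRNA present
    n = a + b + c + d     # total number of samples
    k_max = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as enriched as observed.
    favorable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, k_max + 1)
    )
    return favorable / comb(n, col1)


# Toy counts only: 3 of 4 tumors of one type carry the oncRNA, 1 of 4 others do.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(f"one-sided P = {p_value:.4f}")
```

Smaller P values indicate stronger enrichment of the oncRNA in the cancer type, which is the ordering the caption's "most significant" selection relies on.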
Figure 2. Annotation of subtype-associated oncRNAs across breast and colorectal cancer samples.
(A–B) Binary heatmaps of oncRNAs associated with breast cancer subtypes (A) and colorectal cancer CMS labels (B). One-way ANOVA tests followed by FDR correction were used to identify oncRNAs with significant associations. (C–D) Exemplary subtype-associated oncRNA loci along with their expression patterns for breast cancer subtypes (C) or colon cancer CMS labels (D). The expression values are natural log transformed and P values were calculated using a one-way ANOVA test. (E–F) The number of oncRNAs that were detected in one or more breast cancer subtypes (E) or colorectal cancer CMS labels (F) shown as UpSet plots. (G–H) ROC curves for XGBoost multiclass classifiers that predict the breast cancer subtype or colon cancer CMS label based on oncRNA presence/absence fingerprints averaged across held-out validation sets in a 5-fold cross validation setup. 946 and 514 samples were tested in breast and colorectal cancer respectively and the resulting mean and standard deviation of AUCs were calculated for each subtype across the 5 folds.
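The FDR correction applied after the per-oncRNA tests can be sketched as follows. The caption does not name the procedure, so the Benjamini-Hochberg method is assumed here purely for illustration; the function name and the input P values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values, e.g. one per oncRNA from a one-way ANOVA across subtypes.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"P = {p:.3f} -> adjusted = {q:.3f}")
```

An oncRNA would then be called subtype-associated when its adjusted value falls below the chosen FDR threshold.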
Figure 3. Systematic annotation of driver oncRNAs using a scalable in vivo genetic screening approach.
(A) Workflow schematic of oncRNA cancer and oncRNA TuD functional screens. (B–C) Volcano plots of oncRNA functional screen results for breast cancer (MDA-MB-231) and colorectal cancer (SW480), respectively. In vivo growth phenotypic score refers to enriched representation of cancer cells transduced with cognate oncRNA upon tumor growth in the xenograft model. (D) Expression levels of two example oncRNAs with significant tumor growth phenotype from the functional screen in TCGA-BRCA tumor and tumor-adjacent normal tissues. P values were calculated using a one-tailed Mann-Whitney test. (E) Survival of TCGA-BRCA patients stratified by expression level of cognate driver oncRNA. P values were calculated using a log-rank test. (F) Informative iPAGE pathways associated with TCGA-BRCA cancer samples expressing cognate oncRNAs compared to TCGA-BRCA cancer samples with no detectable respective oncRNAs. Top panel shows gene expression differences in discrete expression bins. Genes that are up-regulated in oncRNA-expressing cancer samples are in the right bins, whereas bins to the left contain genes with lower expression. The heatmap shows the corresponding pathway in relation to the expression bins. Red entries indicate enrichment of pathway genes in a given expression bin whereas blue entries indicate depletion. Enrichment and depletion are measured using log-transformed hypergeometric P values.
Figure 4. In vivo validation of functional oncRNAs in xenograft models of breast cancer.
(A) Left: Growth of MDA-MB-231 tumors overexpressing oncRNA.ch7.29 or oncRNA.ch17.67 relative to controls in the mammary fat-pad of NSG mice. 2 tumors per mouse and n=4 mice for each cohort. P values were calculated using two-way ANOVA. Right: Ex vivo tumor measurements after tumor excision. P values were calculated using a one-tailed Mann-Whitney test. Tumors overexpressing oncRNA.ch7.29 were 2.6-fold larger than controls. Tumors overexpressing oncRNA.ch17.67 were 1.7-fold larger than controls. (B) Bioluminescence imaging plot of lung colonization by MDA-MB-231 cells overexpressing oncRNA.ch7.29 or oncRNA.ch17.67 compared to control. n = 5 per cohort. P values were calculated using two-way ANOVA. (C) Left: Growth of HCC-LM2 cells overexpressing oncRNA.ch7.29 or oncRNA.ch17.67 and HCC-LM2 controls in the mammary fat-pad of NSG mice. n=4 for each cohort. P values were calculated using two-way ANOVA. Right: Ex vivo tumor measurements after tumor excision. P values were calculated using a one-tailed Mann-Whitney test. Tumors overexpressing oncRNA.ch7.29 were 1.6-fold larger than controls. Tumors overexpressing oncRNA.ch17.67 were 1.8-fold larger than controls. (D) Bioluminescence imaging plot of lung colonization by HCC-LM2 cells overexpressing oncRNA.ch7.29 or oncRNA.ch17.67 compared to control. n = 5 per cohort. P values were calculated using two-way ANOVA. (E) Volcano plots of differentially expressed genes in HCC-LM2 cells overexpressing oncRNA.ch7.29 or oncRNA.ch17.67 compared to HCC-LM2 controls. The P value cut-off corresponds to a 10% FDR. (F) Representative pathways associated with HCC-LM2 overexpressing oncRNA.ch7.29 or oncRNA.ch17.67 compared to controls generated using iPAGE. Top panel shows gene expression differences in discrete expression bins. Genes that are up-regulated in oncRNA over-expressing cells are in the rightmost bins, whereas bins to the left contain genes with lower expression in oncRNA over-expressing cells. 
The heatmap shows the enrichment or depletion of the corresponding pathway in each expression bin. Red entries indicate enrichment of pathway genes in a given expression bin whereas blue entries indicate depletion.
Figure 5. Analysis of cell-free RNA content across a large panel of cancer cell lines.
(A) Pair-wise correlation heatmap for small RNA abundance in the cell-free RNA extracted from conditioned media. The counts for annotated small RNAs, such as miRNAs, tRNA fragments, and snoRNAs, were used to generate this heatmap. (B) A 2D UMAP plot summarizing the abundance of small RNAs in the cell-free space across the cell line models we have profiled (in biological replicates). The points are colored based on the tissue-of-origin. (C) Contribution of each annotated family of small RNA species to their cell-free RNA content relative to annotated RNAs, omitting cell-free RNA with no known annotations. The values are normalized across cell lines and oncRNAs are shown in blue. (D) An UpSet plot of oncRNA counts detected in the cell-free RNA fraction of cell lines from each tissue-of-origin. Cell-free oncRNAs show tumor-specific patterns of expression. (E) A 2D UMAP summary of oncRNA profiles across the cell-free RNA samples collected.
Figure 6. Changes in circulating oncRNA content over the course of neoadjuvant chemotherapy are informative of short-term and long-term clinical outcomes.
(A) Overview of patient and tumor characteristics tabulated based on changes in oncRNA burden (ΔoncRNA). (B) Normalized oncRNA burden (counts per million) before (T0) and after (T3) neoadjuvant chemotherapy. P value was calculated using a one-tailed Wilcoxon test. (C) Forest plots for logistic regression models predicting pathologic complete response (pCR) or high residual cancer burden (RCB III) as a function of ΔoncRNA after neoadjuvant chemotherapy. One-tailed P values are also included. (D) Survival in patients grouped based on their oncRNA burden (ΔoncRNA). Reported are the hazard ratio and P value based on a log-rank test. (E) A forest plot for a multivariate Cox proportional hazard model including both ΔoncRNA and pCR as covariates.
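The normalized burden in panel (B) and the ΔoncRNA quantity used in panels (C) to (E) can be illustrated with a small sketch. The caption does not define ΔoncRNA or the denominator used for counts-per-million normalization, so the version below assumes normalization by total mapped reads and a simple difference between the post-treatment (T3) and baseline (T0) burden. Both choices, the function names, and the read counts are hypothetical.

```python
def counts_per_million(oncrna_reads: int, total_reads: int) -> float:
    """Scale oncRNA read counts to counts per million (CPM).

    total_reads is assumed to be the number of mapped reads in the
    sample; the source does not specify the denominator.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return oncrna_reads * 1_000_000 / total_reads


def delta_oncrna(cpm_t0: float, cpm_t3: float) -> float:
    """Change in oncRNA burden, assumed here to be T3 minus T0."""
    return cpm_t3 - cpm_t0


# Toy read counts for one hypothetical patient.
burden_t0 = counts_per_million(oncrna_reads=240, total_reads=2_000_000)
burden_t3 = counts_per_million(oncrna_reads=45, total_reads=1_500_000)
print(f"T0 = {burden_t0:.1f} CPM, T3 = {burden_t3:.1f} CPM, "
      f"delta = {delta_oncrna(burden_t0, burden_t3):.1f} CPM")
```

Under this assumed definition, a negative value corresponds to a drop in circulating oncRNA burden over the course of treatment.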
References
- Knezevich S. R., McFadden D. E., Tao W., Lim J. F. & Sorensen P. H. A novel ETV6-NTRK3 gene fusion in congenital fibrosarcoma. Nat. Genet. 18, 184–187 (1998).
- Larson R. A. et al. Evidence for a 15;17 translocation in every patient with acute promyelocytic leukemia. Am. J. Med. 76, 827–841 (1984).
- Rowley J. D. Letter: A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature 243, 290–293 (1973).
Grants and funding
- T32 GM136547/GM/NIGMS NIH HHS/United States
- T32 AI007334/AI/NIAID NIH HHS/United States
- S10 OD028511/OD/NIH HHS/United States
- P01 CA210961/CA/NCI NIH HHS/United States
- R01 CA244634/CA/NCI NIH HHS/United States
- R01 CA240984/CA/NCI NIH HHS/United States