Integrative identification of non-coding regulatory regions driving metastatic prostate cancer
Cell Rep. 2024 Sep 24;43(9):114764.
doi: 10.1016/j.celrep.2024.114764. Epub 2024 Sep 13.
Brian J Woo, Ruhollah Moussavi-Baygi, Heather Karner, Mehran Karimzadeh, Hassan Yousefi, Sean Lee, Kristle Garcia, Tanvi Joshi, Keyi Yin, Albertas Navickas, Luke A Gilbert, Bo Wang, Hosseinali Asgharian, Felix Y Feng, Hani Goodarzi
- PMID: 39276353
- PMCID: PMC11466230
- DOI: 10.1016/j.celrep.2024.114764
Brian J Woo et al. Cell Rep. 2024.
Abstract
Large-scale sequencing efforts have been undertaken to understand the mutational landscape of the coding genome. However, the vast majority of variants occur within non-coding genomic regions. We designed an integrative computational and experimental framework to identify recurrently mutated non-coding regulatory regions that drive tumor progression. Applying this framework to sequencing data from a large prostate cancer patient cohort revealed a large set of candidate drivers. We used (1) in silico analyses, (2) massively parallel reporter assays, and (3) in vivo CRISPR interference screens to systematically validate metastatic castration-resistant prostate cancer (mCRPC) drivers. One identified enhancer region, GH22I030351, acts on a bidirectional promoter to simultaneously modulate expression of the U2-associated splicing factor SF3A1 and chromosomal protein CCDC157. SF3A1 and CCDC157 promote tumor growth in vivo. We nominated a number of transcription factors, notably SOX6, to regulate expression of SF3A1 and CCDC157. Our integrative approach enables the systematic detection of non-coding regulatory regions that drive human cancers.
Keywords: CP: Cancer; CP: Molecular biology; bidirectional; genomics; modeling; noncoding.
Copyright © 2024 The Author(s). Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of interests: The authors declare no competing interests.
Figures
Figure 1. Regression and deep learning models effectively predict the background mutational density in regulatory regions
(A) Genomic regions have a background mutation rate that is a function of their sequence context, functional annotation classes, and underlying epigenetic features. We developed an outlier detection model based on a generalized linear regression model (GLM), termed MutSpotterCV, to use such features to estimate the expected mutational density in a given region. (B) The scatterplot of observed vs. predicted mutational density values (normalized) generated by the MutSpotterCV achieved a Pearson correlation of 0.55. We used the predictions of this model to perform an outlier analysis to identify regulatory regions that are mutated at a substantially higher rate than expected by chance. The resulting outlier regions are marked in red. (C) We also tested the ability of models with increased complexity to perform this prediction task. One of our best-performing models was a deep convolutional neural network (CNN). The input to this model is a multilayered encoding of sequence and epigenetic signals. (D) This model, named DM2D, achieved a Pearson correlation of 0.85, far exceeding that of MutSpotterCV. Nevertheless, the identities of final outliers identified by both models were virtually the same. Therefore, we deemed these regions regulatory elements that are hypermutated in mCRPC samples. The same outliers are colored in (B) and (D). See also Figure S1 and Tables S1-S5.
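The outlier analysis in panels (B) and (D) compares observed mutational density with a model's prediction and flags regions that sit far above expectation. The following is a minimal sketch of that idea only, not the MutSpotterCV or DM2D implementation: the function names, the z-score cutoff, and the toy values are all assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def flag_hypermutated(predicted, observed, z_cutoff=2.0):
    """Return indices of regions whose observed density exceeds the
    prediction by more than z_cutoff standard deviations of the residuals.
    The cutoff is a hypothetical choice, not taken from the paper."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_r = sum(residuals) / len(residuals)
    var_r = sum((r - mean_r) ** 2 for r in residuals) / (len(residuals) - 1)
    sd_r = math.sqrt(var_r)
    return [i for i, r in enumerate(residuals) if (r - mean_r) / sd_r > z_cutoff]

# Toy, made-up densities: region 5 is observed far above its prediction.
predicted = [0.10, 0.20, 0.30, 0.40, 0.50, 0.30, 0.60, 0.70]
observed = [0.12, 0.18, 0.33, 0.38, 0.52, 1.50, 0.61, 0.69]

outliers = flag_hypermutated(predicted, observed)
r = pearson(predicted, observed)
```

Only positive residuals are flagged here, because the caption describes regions mutated at a higher rate than expected.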
Figure 2. Regulatory and fitness consequences of mCRPC-associated non-coding regulatory regions
(A) Schematic of the MPRA used to assess the enhancer activity of regulatory sequences hypermutated in mCRPC and their scrambled control as background. (B) A volcano plot showing the measured enhancer activity for each regulatory segment (wild-type sequence) relative to its scrambled control. (C) Schematic of our in vivo CRISPRi strategy designed to identify regulatory regions that contribute to subcutaneous tumor growth in xenografted mice. (D) In vivo fitness consequences of expressing sgRNAs targeting mCRPC hypermutated regulatory regions. The x axis shows the calculated fitness scores (Rho), where positive values denote increased tumor growth upon sgRNA expression, and negative values denote the opposite. The y axis represents −log10 of the p value associated with each enrichment. See also Figure S2.
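The volcano plots in panels (B) and (D) place an effect size on the x axis and −log10 of the p value on the y axis. As a rough illustration of that coordinate mapping, the sketch below converts an (effect, p value) pair into plot coordinates and labels it by direction. The cutoffs and labels are hypothetical; the caption does not state the thresholds used, and the computation of Rho itself is not shown here.

```python
import math

def volcano_point(effect, p_value):
    """Map an (effect, p value) pair to volcano-plot coordinates."""
    return effect, -math.log10(p_value)

def classify(effect, p_value, effect_cutoff=1.0, p_cutoff=0.05):
    """Label a region by direction of effect if it passes both cutoffs.
    Both cutoffs are illustrative assumptions."""
    if p_value >= p_cutoff or abs(effect) < effect_cutoff:
        return "not significant"
    return "increased" if effect > 0 else "decreased"

x, y = volcano_point(1.5, 0.001)
label = classify(1.5, 0.001)
```

Under the caption's convention for panel (D), an "increased" label would correspond to greater tumor growth upon sgRNA expression.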
Figure 3. Base-resolution in vitro and in silico assays reveal the functional consequences of mCRPC-associated mutations
(A) A volcano plot demonstrating the impact of individual mutations relative to their reference allele on enhancer activity. (B) The overall performance of our Blue Heeler (BH) model in predicting gene expression for held-out instances. (C) Comparison of mutational impact on the expression of downstream genes and the overall impact of the mutated regulatory regions based on our in vivo screen. A previously annotated enhancer (geneHancer: GH22I030351) shows a strong phenotype in xenografted mice, and patients with mutations in it show generally increased expression in downstream genes. (D) Comparing the expression of genes associated with GH22I030351 in mCRPC patient samples with and without mutations in this enhancer. The combined p value shows the overall effect of mutations across all these genes. (E) In four of five cases, measuring the impact of mutations observed in our cohort shows a general increase in regulatory activity of GH22I030351 in our MPRA measurements. p value calculated comparing mutation to WT sequence. (F) The CCDC157 (ENSG00000187860) promoter sequence, which is immediately downstream of GH22I030351, was used to dissect the impact of mutations in silico based on feature attribution scores from our BH model. Top: the results of an in silico saturation mutagenesis experiment, in which the impact of every mutation upstream of CCDC157 on its expression was measured. We observed both gain-of-function and loss-of-function mutations. The regulatory region of interest is shown as a box, and the mutations observed in patients are marked by dashed lines. We have also reported saliency scores for this promoter. We further zoomed in on saturation mutagenesis results for our regulatory region of interest to show (1) the distribution of impact scores for types of mutations, (2) the importance score for loci mutated in patients with the exact mutation shown as a bounded box, and (3) the saliency score associated with each mutated locus. See also Figure S3.
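In silico saturation mutagenesis, as described for panel (F), scores every possible single-nucleotide substitution in a sequence against the reference. The sketch below shows the enumeration step only. The scoring function is a hypothetical stand-in (GC fraction) and has nothing to do with the Blue Heeler model, whose interface is not described in this record.

```python
def saturation_mutagenesis(seq, score):
    """Score every single-nucleotide substitution of seq relative to the
    reference. Returns {(position, alt_base): score(mutant) - score(seq)}."""
    ref_score = score(seq)
    effects = {}
    for i, ref_base in enumerate(seq):
        for alt in "ACGT":
            if alt == ref_base:
                continue  # skip the reference allele itself
            mutant = seq[:i] + alt + seq[i + 1:]
            effects[(i, alt)] = score(mutant) - ref_score
    return effects

def toy_score(seq):
    """Hypothetical stand-in for a trained expression model: GC fraction."""
    return sum(base in "GC" for base in seq) / len(seq)

effects = saturation_mutagenesis("ACGTAC", toy_score)
gain = [key for key, value in effects.items() if value > 0]
loss = [key for key, value in effects.items() if value < 0]
```

A sequence of length L yields 3L scored mutants, which can then be split into gain-of-function and loss-of-function candidates by the sign of the score difference.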
Figure 4. GH22I030351 promotes prostate cancer growth through modulation of SF3A1 and CCDC157 expression
(A) Subcutaneous tumor growth in CRISPRi-ready C4-2B cells expressing a non-targeting control or sgRNAs targeting GH22I030351. Two-way ANOVA was used to calculate the reported p value. Also shown is the size of extracted tumors at the conclusion of the experiment (day 18 post injection); the p values were calculated using one-tailed t test (n = 8 and 7, respectively). Data are represented as mean ± SEM. (B) SF3A1 and CCDC157 mRNA levels, measured using qPCR, in control and GH22I030351-silenced C4-2B cells (n = 3). The p values are based on a one-tailed Mann-Whitney U test. (C) Comparison of proliferation rates, as measured by the slope of log-cell count measured over 3 days, for control as well as SF3A1 and CCDC157 knockdown cells (n = 6 per shRNA condition). Hairpin RNAs were induced at day 0, and cell viability was measured at days 1, 2, and 3. The p values were calculated using least-square models comparing the slope of each knockdown to the control wells. (D) Colony formation assay for SF3A1 and CCDC157 knockdown cells in the C4-2B background. Hairpin RNAs were induced at day 0, and colonies were counted at day 8. The p values were calculated using one-tailed Mann-Whitney U tests. (E) Subcutaneous tumor growth in C4-2B cells overexpressing SF3A1 and CCDC157 ORFs in a lentiviral construct. Tumors were measured using calipers at ~3 weeks post injection, and p values were calculated using a one-tailed Student’s t test. (F) Size of extracted tumors in subcutaneous tumor growth in CRISPRa-ready C4-2B cells expressing a non-targeting control or sgRNAs targeting GH22I030351 at the conclusion of the experiment (day 22 post injection); the p values were calculated using one-tailed t test (n = 8 and 8, respectively). (G) Subcutaneous tumor growth in CRISPRi-ready C4-2B cells expressing non-targeting (CTRL) sgRNAs, C4-2B cells expressing shRNAs against SF3A1 and CCDC157 (DKD), or CRISPRi-ready C4-2B cells expressing sgRNAs targeting GH22I030351, and the DKD lentiviral construct (sgGH22I030351 + DKD). Tumors were measured using calipers at ~3 weeks post injection, and p values were calculated using a one-tailed Student’s t test. See also Figure S4.
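Panel (C) defines proliferation rate as the slope of log cell count over time. A minimal sketch of that quantity is given below, assuming a natural logarithm and an ordinary least-squares fit; the caption does not specify the log base, and the cell counts are made-up values rather than data from the study.

```python
import math

def growth_rate(days, counts):
    """Least-squares slope of ln(cell count) against time in days."""
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    sxx = sum((t - mean_t) ** 2 for t in days)
    return sxy / sxx

# Toy counts at days 1, 2, and 3 (illustrative only).
control = growth_rate([1, 2, 3], [1000, 2000, 4000])     # doubles each day
knockdown = growth_rate([1, 2, 3], [1000, 1500, 2250])   # grows 1.5x each day
```

With these toy inputs the control slope equals ln 2 per day and the knockdown slope equals ln 1.5 per day, so a slower-growing line shows up as a smaller slope.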
Figure 5. SF3A1 upregulation results in splicing alterations similar to those observed in GH22I030351-mutated tumors
(A) A volcano plot comparing cassette exon usage (percent spliced in [Ψ]) between tumors with mutations in GH22I030351 relative to other samples in our cohort. Marked are cassette exons with a larger than 10% change in Ψ (ranging between −1 and 1) and p < 0.01. (B) SF3A1 CLIP-seq in C4-2B lines allowed us to identify, at base resolution, high-confidence binding sites of SF3A1 by mapping crosslinking-induced deletions. We used FIRE to discover the most significant sequence motif, and here we report its associated mutual information (MI) and Z score. (C) The enrichment of cassette exons bound by SF3A1 among those with higher Ψ in samples with mutations in GH22I030351. For this analysis, we ordered all annotated cassette exons based on their ΔΨ values from −1 (left) to +1 (right). We then grouped them into equally populated bins and assessed the non-random distribution of SF3A1-bound cassette exons across these measurements using MI. Individual bins are colored based on their hypergeometric p value as well. (D) Comparison of changes in Ψ values in GH22I030351-mutant and SF3A1 overexpression samples. We observed a significant enrichment of SF3A1 binding among cassette exons that are simultaneously upregulated in both GH22I030351-mutant and SF3A1 overexpression samples. It should be noted that unbound cassette exons do not show a correlation between these two sets of comparisons. See also Figure S5.
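The enrichment analysis in panel (C) orders cassette exons by ΔΨ, groups them into equally populated bins, and assigns each bin a hypergeometric p value for its share of SF3A1-bound exons. The sketch below illustrates only that binning and the upper-tail hypergeometric test. The mutual information step is omitted, the function names and toy inputs are assumptions, and the last bin absorbs any remainder when the exon count is not divisible by the number of bins.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from N items,
    K of which are marked (here: bound by the protein of interest)."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)

def bin_enrichment(delta_psi, bound, n_bins):
    """Sort exons by delta-psi, split them into bins, and test each bin
    for an excess of bound exons. Returns a list of (bound_count, p_value)."""
    order = sorted(range(len(delta_psi)), key=lambda i: delta_psi[i])
    N, K = len(order), sum(bound)
    size = N // n_bins
    results = []
    for b in range(n_bins):
        start = b * size
        end = N if b == n_bins - 1 else start + size
        members = order[start:end]
        k = sum(bound[i] for i in members)
        results.append((k, hypergeom_sf(k, N, K, len(members))))
    return results

# Toy data: the three bound exons all have positive delta-psi.
delta_psi = [-0.4, -0.3, -0.2, -0.1, 0.1, 0.2, 0.3, 0.4]
bound = [0, 0, 0, 0, 0, 1, 1, 1]
results = bin_enrichment(delta_psi, bound, n_bins=2)
```

In this toy case all bound exons fall in the high-ΔΨ bin, which gives that bin a p value of 5/70.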
Figure 6. Putative transcription factors that regulate gene expression through GH22I030351
(A) Mutations in GH22I030351 alter transcription factor binding. Left: sequence motif of SOX6. Shown is the mutation observed in DTB_176_BL compared to the reference genome. Center: bar plot showing the FIMO enrichment score of the SOX6 motif for the reference genome (green) and the patient’s sequence (red). Right: bar plot showing the difference in motif score (red) and difference in −log10 p value (blue) of motif enrichment in the patient harboring the mutation with respect to the reference genome. (B and C) Similarly, shown for a SMAD2–4 and TEAD1 motif. (D) In vivo MPRA ChIP-seq assay for TEAD1, SOX6, and SMAD2. The x axis shows the log2 relative enrichment of the mutant allele with respect to the reference allele. (E) Changes in the expression of SF3A1 and CCDC157 in response to silencing transcription factors we hypothesized to regulate their expression. The p values were calculated using a one-tailed Welch’s t test. (F) Subcutaneous tumor growth in SOX6 knockdown and control cells in xenografted mice (n = 8). The p values were calculated using two-way ANOVA using time as a covariate. Data are represented as mean ± SEM. See also Figure S6.
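Panels (A) to (C) compare motif scores between the reference sequence and a patient's mutated sequence. The sketch below illustrates the general idea with a log-odds scan of a position weight matrix. It is not FIMO and does not reproduce FIMO's p values: the matrix is a hypothetical 4-position motif rather than the real SOX6, SMAD, or TEAD1 matrix, the sequences are invented, and a uniform background of 0.25 per base is assumed.

```python
import math

# Hypothetical motif: per-position base probabilities (consensus ACAT).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency

def log_odds(window, pwm):
    """Sum of log2(motif probability / background) over a window."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, window))

def best_match(seq, pwm):
    """Highest log-odds score over all windows of motif width in seq."""
    width = len(pwm)
    return max(log_odds(seq[i:i + width], pwm) for i in range(len(seq) - width + 1))

reference = "GGACGTGG"  # invented sequence with a near-match, ACGT
mutant = "GGACATGG"     # single G>A change creates the consensus ACAT
delta = best_match(mutant, TOY_PWM) - best_match(reference, TOY_PWM)
```

A positive difference means the substitution strengthens the best motif match; here the change swaps a disfavored base (probability 0.1) for the favored one (0.7), so the difference is log2(7).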
Update of
- Integrative identification of non-coding regulatory regions driving metastatic prostate cancer.
Woo BJ, Moussavi-Baygi R, Karner H, Karimzadeh M, Garcia K, Joshi T, Yin K, Navickas A, Gilbert LA, Wang B, Asgharian H, Feng FY, Goodarzi H. bioRxiv [Preprint]. 2023 Jun 2:2023.04.14.535921. doi: 10.1101/2023.04.14.535921. PMID: 37398273. Free PMC article. Preprint.
Grants and funding
- T32 AI007334/AI/NIAID NIH HHS/United States
- S10 OD028511/OD/NIH HHS/United States
- R01 CA244634/CA/NCI NIH HHS/United States
- R01 CA240984/CA/NCI NIH HHS/United States
- T32 CA108462/CA/NCI NIH HHS/United States
- DP2 CA239597/CA/NCI NIH HHS/United States