A Quantitative and Predictive Model for RNA Binding by Human Pumilio Proteins

Mol Cell. 2019 Jun 6;74(5):966-981.e18.

doi: 10.1016/j.molcel.2019.04.012. Epub 2019 May 8.

Inga Jarmoskaite, Sarah K Denny, Pavanapuresan P Vaidyanathan, Winston R Becker, Johan O L Andreasson, Curtis J Layton, Kalli Kappel, Varun Shivashankar, Raashi Sreenivasan, Rhiju Das, William J Greenleaf, Daniel Herschlag

Inga Jarmoskaite et al. Mol Cell. 2019.

Abstract

High-throughput methodologies have enabled routine generation of RNA target sets and sequence motifs for RNA-binding proteins (RBPs). Nevertheless, quantitative approaches are needed to capture the landscape of RNA-RBP interactions responsible for cellular regulation. We have used the RNA-MaP platform to directly measure equilibrium binding for thousands of designed RNAs and to construct a predictive model for RNA recognition by the human Pumilio proteins PUM1 and PUM2. Despite prior findings of linear sequence motifs, our measurements revealed widespread residue flipping and instances of positional coupling. Application of our thermodynamic model to published in vivo crosslinking data reveals quantitative agreement between predicted affinities and in vivo occupancies. Our analyses suggest a thermodynamically driven, continuous Pumilio-binding landscape that is negligibly affected by RNA structure or kinetic factors, such as displacement by ribosomes. This work provides a quantitative foundation for dissecting the cellular behavior of RBPs and cellular features that impact their occupancies.

Keywords: PUF proteins; Pumilio; RNA binding proteins; eCLIP; high-throughput biophysics; post-transcriptional regulation; thermodynamics.

Copyright © 2019. Published by Elsevier Inc.

Conflict of interest statement

DECLARATION OF INTERESTS

The authors declare no competing interests.

Figures

Figure 1. Quantitative High-Throughput Measurements of RNA Binding to PUM2

(A) Top: crystal structure of the RNA-binding domain of human PUM2 bound to UGUAAAUA RNA (PDB: 3Q0Q; Lu and Hall, 2011). For simplicity, the eight RNA-binding sites (R1–R8) are numbered in the 5′ to 3′ order of bound RNA residues, the reverse of the order in protein primary sequence. Center: representative PUM2 sequence motif (based on Hafner et al., 2010). Bottom: schematic representation of PUM2 residues involved in base-specific interactions (Wang et al., 2002). (B) Scaffolds for studying RNA sequence specificity. Yellow circles indicate the variable region (see Figure S1B). (C) Left: schematic representation of an RNA-MaP experiment (Buenrostro et al., 2014). Right: representative images of a subset of RNA clusters after incubation with increasing PUM2 concentrations. Asterisk at 58.4 nM indicates adjusted contrast relative to other images, due to increased background fluorescence. (D) Representative binding curves for the consensus sequence (UGUAUAUA, S2b scaffold) and a mutated sequence (UGUAGCGC, S1a scaffold). The number of clusters containing the indicated sequence (n) is noted. Circles indicate the fluorescence in the protein channel normalized by the fluorescence in the RNA channel. Medians and 95% confidence intervals (CIs) across the clusters are shown. Blue lines indicate the fits to the binding model, which includes a nonspecific term for PUM2 binding to the PUM2-RNA complex, and the gray area indicates the 95% CI of the fit (K_D(consensus) = 0.17 nM, 95% CI = (0.10; 0.35); K_D(mutant) > 340 nM, corresponding to the upper limit for binding affinities that could be confidently distinguished from background). (E) Comparison of technical replicates performed on two different flow cells. Data with at least five clusters per experiment and with ΔG error less than 1 kcal/mol (95% CI) are shown. Transparent tiles correspond to ΔG values above the limit that is reliably distinguishable from background (STAR Methods); n corresponds to the number of variants within the high-confidence affinity range, with the total number indicated in parentheses. The black dashed line indicates a slope of 1, and the red line is offset by the mean difference between replicates 1 and 2 (0.32 kcal/mol), which accounts for small differences in protein activity and/or dilution. The RMSE value was calculated after accounting for this offset (RMSE = 0.42 kcal/mol without accounting for the offset). See also Figure S1.
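The quantities in panels D and E can be related with a short calculation. The sketch below is illustrative only and is not the authors' analysis code: it assumes the standard relation ΔG = RT ln(K_D) with a 1 M standard state at 25°C, and it computes a replicate offset and an offset-corrected RMSE in the way the caption describes. The function names and the example replicate values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar, temp_c=25.0):
    """Binding free energy (kcal/mol) from a dissociation constant given in M.

    Assumes dG = RT ln(K_D) with a 1 M standard state.
    """
    rt = R_KCAL * (temp_c + 273.15)
    return rt * math.log(kd_molar)


def rmse_after_offset(rep1, rep2):
    """Mean offset between two replicates and the RMSE left after removing it."""
    diffs = [b - a for a, b in zip(rep1, rep2)]
    offset = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum((d - offset) ** 2 for d in diffs) / len(diffs))
    return offset, rmse


# K_D values quoted in the caption for panel D.
dg_consensus = kd_to_dg(0.17e-9)  # consensus sequence
dg_limit = kd_to_dg(340e-9)       # weakest affinity distinguishable from background
measurable_window = dg_limit - dg_consensus

# Hypothetical replicate dG values (kcal/mol), for illustration only.
offset, rmse = rmse_after_offset([-12.0, -11.0, -10.0], [-11.7, -10.7, -9.7])
```

Here `measurable_window` is the span in kcal/mol between the consensus affinity and the detection limit, which is the range over which sequence effects can be quantified in this kind of experiment.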

Figure 2. Analysis of Single-Mutant Variant Binding to PUM2

(A) Top: color code for the scaffolds in Figure 1B; the arrow points to affinities for each position 1 sequence variant. Bottom: K_D values of PUM2 for single mutants at each position of the UGUAUAUA consensus. Bars indicate weighted means of two replicate measurements and error bars indicate weighted replicate errors. The dashed line indicates the average affinity for the consensus sequence across the four scaffolds, and the consensus residues are circled. Asterisks indicate variants with significant differences between scaffolds (10% FDR). (B) Scaffold variance before and after accounting for RNA secondary structure and after excluding sequences with predicted structure. The bars indicate standard deviations of the distribution of differences between each measured value (part A) and the scaffold mean for the respective sequence variant; see also Figure S2A and STAR Methods. Dashed lines indicate the standard deviation of measurement error. The experimental standard deviation was higher at 37°C than 25°C because of weaker binding and the absence of an independent duplicate experiment. (C) Model for RNA structure effects on PUM2 binding. Occluded RNA molecules increase the observed dissociation constant (weaken binding) by stabilizing the unbound state (see also Figure S2B and STAR Methods). (D) Single-mutant affinities after accounting for structure effects predicted by RNAfold (solid bars; Lorenz et al., 2011); the transparent region indicates the structure correction. Error bars indicate weighted replicate errors. Asterisks indicate variants with significant scaffold differences after accounting for structure effects. (E) Median effects of each single mutation (residues 1–8) across scaffolds and across 5A/C/U backgrounds at 25°C, after excluding variants with alternative binding registers and after accounting for structure. Error bars indicate 95% CIs of the median. Mutational effects were calculated relative to the weighted mean affinity for the UGUA[A/C/U]AUA consensus across scaffolds. Position 9 specificity was derived as shown in Figure S2F and the mutational effect was calculated relative to the most tightly bound residue (G). (F) Comparison of single-mutant affinities measured by RNA-MaP (Figure 2E) and by gel shift. 1C, purple; 2A, yellow; 2C, green; 3A, white; 3G, red; 4G, orange; 4U, blue; 5G, wheat; 7C, brown; 7G, magenta; 9A, lime; 9C, cyan; 9U, gray. The gel-shift values are averages and 95% CIs from two to four measurements. See also Figures S2 and S3.
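The structure model in panel C can be illustrated with a two-state sketch. This is an assumption-laden simplification, not the procedure in STAR Methods: the RNA is taken to be either accessible or occluded, the occluded state cannot bind, and `dg_fold` (a hypothetical input, for example from a structure-prediction tool) is the free energy of the occluded state relative to the accessible one. Under those assumptions the observed K_D is the intrinsic K_D multiplied by (1 + exp(−ΔG_fold/RT)).

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def structure_penalty(dg_fold, temp_c=25.0):
    """Apparent binding penalty (kcal/mol) caused by an occluding structure.

    dg_fold: free energy of the occluded state relative to the accessible
    state (negative when the occluding structure is stable). Two-state
    assumption; only the accessible state is taken to bind protein.
    """
    rt = R_KCAL * (temp_c + 273.15)
    return rt * math.log1p(math.exp(-dg_fold / rt))


def observed_kd(intrinsic_kd, dg_fold, temp_c=25.0):
    """Observed K_D after weighting by the accessible fraction of RNA."""
    rt = R_KCAL * (temp_c + 273.15)
    return intrinsic_kd * math.exp(structure_penalty(dg_fold, temp_c) / rt)


stable = structure_penalty(-5.0)    # stable occluding structure
unstable = structure_penalty(5.0)   # occluding structure is rarely populated
```

For a stable occluding structure the penalty approaches the magnitude of `dg_fold`, and for an unstable one it approaches zero, which matches the qualitative statement that occlusion weakens observed binding by stabilizing the unbound state.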

Figure 3. Development of a Predictive Model for PUM2 Specificity

(A) Top: schematic representation and test of the additive consecutive model. b is the position of the bound base, and X is the base at position b. ΔΔG_b,X values correspond to the measured single mutation penalties at 25°C (Figure 2E; Table S2). Bottom: predicted versus observed ΔΔG values relative to the UGUAUAUAU consensus sequence for all unstructured variants in the library. Predicted ΔΔG values account for the ensemble of all possible registers along the RNA sequence (STAR Methods). Transparent symbols indicate variants bound more weakly than the threshold for high-confidence affinity determination; these variants were excluded from determining the R² and RMSE values and from global fitting in parts E and F. Points are colored based on the deviation from predicted affinity, divided by the uncertainty of the measurement (z = |ΔG_obs − ΔG_pred|/σ_ΔG; capped at z = 3 for visualization). The black dashed line is the unity line and the dashed gray lines denote 1 kcal/mol deviation from the predicted value. (B) C-insertion library for base-flipping analysis. (C) Example of an insertion that gives binding tighter than predicted by the additive consecutive binding model and provides evidence for base flipping. X indicates a mismatch. ΔΔG_pred corresponds to the prediction from the additive consecutive model (Figure 3A). With flipping, ΔΔG_pred indicates the prediction accounting for bound positions only, which is 0 as the consensus residues are in each site. (D) Summary of observed and predicted ΔΔG values for each of the C insertions in part B. Green box indicates positions at which the observed ΔΔG values are smaller than predicted, suggesting base flipping. Arrows indicate that the observed affinities are lower limits for base flipping penalties. Averages and standard errors for library variants containing the consensus sequence with the indicated insertion and lacking stable alternative registers are shown (Table S3). (E) Additive nonconsecutive model. Y indicates the residue(s) flipped at position f. Numbering of flipped residues is based on the flanking bound residues; 3/4–6/7. The dashed orange outline indicates a cluster of outliers with residue coupling. (F) Final model including binding, flipping, and coupling terms. c indicates the positions of coupled residues, and Z is the identity of coupled residues. Final model parameters are provided in Table 1. See also Figures S4–S6 and Tables S2, S3, S4, and S5.
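The additive consecutive model in panel A amounts to summing one penalty per bound position. The sketch below shows that bookkeeping for a single 9-nt register. The penalty table is a placeholder: every mismatch is assigned the same hypothetical value of 1.5 kcal/mol, whereas the measured position- and base-specific penalties are in the paper's tables and are not reproduced here. Flipping and coupling terms (panels E and F) are not included.

```python
CONSENSUS = "UGUAUAUAU"
BASES = "ACGU"
PLACEHOLDER_PENALTY = 1.5  # kcal/mol; hypothetical, NOT a measured value


def build_placeholder_table(consensus=CONSENSUS, penalty=PLACEHOLDER_PENALTY):
    """Penalty table indexed as table[position][base], positions numbered from 1.

    Consensus bases cost 0; every other base gets the same placeholder penalty.
    """
    return {
        pos: {base: (0.0 if base == ref else penalty) for base in BASES}
        for pos, ref in enumerate(consensus, start=1)
    }


def additive_consecutive_ddg(site, table):
    """Sum the per-position penalties for one consecutively bound site."""
    if len(site) != len(table):
        raise ValueError("site length must match the number of bound positions")
    return sum(table[pos][base] for pos, base in enumerate(site, start=1))


table = build_placeholder_table()
single = additive_consecutive_ddg("UGUAGAUAU", table)  # one mismatch (position 5)
double = additive_consecutive_ddg("CGUAGAUAU", table)  # mismatches at positions 1 and 5
```

Because the model is additive, the double mutant's predicted penalty is the sum of the two single-mutant penalties; deviations from that sum are what the coupling terms in panel F are meant to capture.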

Figure 4. Thermodynamic Model for PUM2 Binding Integrating Binding Modes and Registers

(A) An RNA sequence of length n can be bound in a series of 9- to 11-mer registers (r), within which the RNA residues are variably distributed between bound and flipped positions. Representative subsets of binding registers and base arrangements are shown for each of the four binding modes included in the model: consecutive, 1-nt, and 2-nt flips (at a single position) and two flips at different positions. The equations indicate integration of predicted ΔΔG values for all possible binding sites to obtain the final affinity. The ΔΔG values for predicting individual binding site configurations are given in Table 1. ΔG_WT is the affinity for the consensus sequence. (B) Schematic representation of a predictive model of PUM2 occupancies on an mRNA target (see STAR Methods).
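Combining registers into one affinity can be sketched as a Boltzmann-weighted sum, ΔΔG_total = −RT ln Σ_r exp(−ΔΔG_r/RT). The example below is a simplified illustration under stated assumptions: only consecutive 9-nt registers are enumerated (the 10- and 11-mer registers with flipped residues are omitted), and the per-register score is a hypothetical uniform mismatch penalty rather than the parameters in Table 1.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
CONSENSUS = "UGUAUAUAU"


def mismatch_score(site, penalty=1.5):
    """Hypothetical per-register ddG: a uniform penalty per non-consensus base."""
    return penalty * sum(a != b for a, b in zip(site, CONSENSUS))


def ensemble_ddg(sequence, score=mismatch_score, width=9, temp_c=25.0):
    """Combine all consecutive registers: ddG = -RT ln(sum_r exp(-ddG_r / RT))."""
    rt = R_KCAL * (temp_c + 273.15)
    ddgs = [score(sequence[i:i + width]) for i in range(len(sequence) - width + 1)]
    if not ddgs:
        raise ValueError("sequence is shorter than one binding register")
    best = min(ddgs)  # shift by the best register for numerical stability
    total = sum(math.exp(-(d - best) / rt) for d in ddgs)
    return best - rt * math.log(total)


one_site = ensemble_ddg("UGUAUAUAU")
two_sites = ensemble_ddg("UGUAUAUAU" * 2)
```

With two consensus sites on one RNA, the combined value is slightly more favorable than for a single site, by about RT ln 2, because either site can be occupied.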

Figure 5. Comparison of RNA-Binding Specificities of PUM2 and Wild-Type and Engineered PUM1 Proteins

(A) Correlation between PUM1 and PUM2 affinities across the library. The red line has a slope of 1 with an offset of 1.07 kcal/mol, corresponding to weaker observed binding for PUM1 than PUM2; the RMSE value was calculated after accounting for the constant offset. (B) Predicting PUM1 binding with the PUM2-based model. Inset shows the distribution of deviations from predicted values. (C) Schematic representation of the single amino-acid change in repeat “R6” of engineered PUM1. (D) Differences between the single mutant specificities of wild-type and mutant PUM1. Differences between weighted means of single mutant penalties across scaffolds in the UGUAUAUA background are shown, and the error bars indicate propagated weighted errors. N.A. indicates lack of detectable binding by mutant PUM1. (E) Predicted mutant PUM1 affinities (based on the PUM2 model) versus observed affinities; the ΔΔG values are relative to the UGUAUAUAU consensus. Despite accurate predictions for most variants, 18% of variants deviated by >1 kcal/mol, consistent with altered specificity of mutant PUM1. (F) Predicted versus observed mutant PUM1 affinities with the altered 6U penalty.

Figure 6. Testing the Thermodynamic Model in Vivo

(A) Thermodynamic affinity predictions compared to eCLIP enrichment in K562 cells (Van Nostrand et al., 2016). Median eCLIP enrichments across sites within bins of predicted relative affinities are shown, and error bars indicate 95% CIs on the median. Only sites lacking adjacent UGUA-containing sites (within 100 nt) are shown due to inflation of eCLIP signal observed in the presence of nearby sites (Figure S7A). Black dashed line indicates the predicted change in eCLIP signal with increasing predicted ΔΔG values, relative to the eCLIP signal in the lowest ΔΔG bin. eCLIP (closed circles) and input (open circles) correspond to crosslinked samples that were or were not treated with anti-PUM2 antibody, respectively (Van Nostrand et al., 2016). The gray dashed line indicates the eCLIP enrichment for sites with predicted ΔΔG values greater than 4.5 kcal/mol (expressed transcripts); since eCLIP signal and input were each normalized to this value, this expected enrichment is equal to 1. Numbers of sites per bin range from 97 to 14,787 and are provided in Table S6. (B) Median eCLIP enrichment and 95% CIs across bins of predicted ΔΔG, using either the full thermodynamic model (left) or a model that does not take into account flipped residues (right). Only bins with at least 25 sites are shown. (C) Comparison of eCLIP enrichment for sites within 3′ UTR (orange) or CDS (gray) regions of expressed genes in K562 cells. Medians and 95% CIs are shown. Black and gray lines are as in A. (D) Fractions of sites annotated as 3′ UTR, CDS, or 5′ UTR within bins of predicted ΔΔG values. (E) Fold difference (log2) of the observed fraction of sites with the given annotation (5′ UTR, CDS, and 3′ UTR) versus the expected fraction (based on randomly selected sites). (F) Median eCLIP enrichment of consensus sites across bins of predicted secondary structure stabilities for structures blocking the PUM2 consensus site (Figure 2C; STAR Methods). Colors indicate the number of flanking nucleotides (nt) included in the stability calculations. Dashed line indicates the predicted change in eCLIP signal for increasing secondary structure stability at 37°C. Medians and 95% CIs for bins with at least 20 sites are shown. (G) Example of thermodynamic occupancy predictions for the 3′ UTR region of the human cyclin-dependent kinase inhibitor 1B (CDKN1B) mRNA, a known target of human Pumilio proteins (Kedde et al., 2010). The left axis indicates predicted relative occupancies with respect to the UGUAUAUAU consensus; the right axis indicates predicted fractional occupancies (i.e., fraction of bound versus total CDKN1B mRNA) after accounting for cellular PUM2 and RNA abundances (see STAR Methods). (H) PUM2-binding landscape across the human transcriptome, predicted by our thermodynamic model using in vivo PUM2 and mRNA levels (see STAR Methods). Bars indicate the number of bound PUM2 molecules across RNA binding sites with zero to eight nonconsensus residues without flipped residues (blue) or with up to two flipped residues (green). The consensus was defined as UGUA[ACU]AUAN. See Table S6 for numbers of sites of each type. See also Figure S7 and Table S6.
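The two axes in panel G correspond to two different occupancy measures, which the sketch below distinguishes. It is a simplified illustration, not the calculation in STAR Methods: relative occupancy is taken as the Boltzmann factor exp(−ΔΔG/RT) at 37°C, and fractional occupancy uses a single-site binding isotherm with a free protein concentration supplied by the caller. The paper's treatment accounts for cellular PUM2 and RNA abundances, which this sketch does not attempt; the concentrations used in the example are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def relative_occupancy(ddg, temp_c=37.0):
    """Occupancy relative to the consensus site, assuming sub-saturating protein."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp(-ddg / rt)


def fractional_occupancy(ddg, kd_consensus, free_protein, temp_c=37.0):
    """Fraction of sites bound at a given free protein concentration (same units as K_D)."""
    rt = R_KCAL * (temp_c + 273.15)
    kd_site = kd_consensus * math.exp(ddg / rt)  # weaker sites have a larger K_D
    return free_protein / (free_protein + kd_site)


# Hypothetical concentrations in M, for illustration only.
consensus_bound = fractional_occupancy(0.0, kd_consensus=1e-9, free_protein=1e-9)
weak_site_bound = fractional_occupancy(3.0, kd_consensus=1e-9, free_protein=1e-9)
one_kcal_ratio = relative_occupancy(1.0)
```

In this picture occupancy falls off smoothly with predicted ΔΔG rather than switching between bound and unbound classes, which is the sense in which the binding landscape is described as continuous.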

References

    1. Becker WR, Jarmoskaite I, Kappel K, Vaidyanathan PP, Denny SK, Das R, Greenleaf WJ, and Herschlag D (2019a). Quantitative high-throughput tests of ubiquitous RNA secondary structure prediction algorithms via RNA/protein binding. bioRxiv. 10.1101/571588.
    2. Becker WR, Jarmoskaite I, Vaidyanathan PP, Greenleaf WJ, and Herschlag D (2019b). Demonstration of Protein Cooperativity Mediated by RNA Structure Using the Human Protein PUM2. RNA 118, 10.1261/rna.068585.118.
    3. Berg JM, and Stryer L (2002). Biochemistry, 5th edition (W H Freeman).
    4. Bohn JA, Van Etten JL, Schagat TL, Bowman BM, McEachin RC, Freddolino PL, and Goldstrohm AC (2018). Identification of diverse target RNAs that are functionally regulated by human Pumilio proteins. Nucleic Acids Res. 46, 362–386.
    5. Buenrostro JD, Araya CL, Chircus LM, Layton CJ, Chang HY, Snyder MP, and Greenleaf WJ (2014). Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat. Biotechnol. 32, 562–568.
