Prediction of potent shRNAs with a sequential classification algorithm

Accession codes

Accessions

Gene Expression Omnibus

References

  1. Fellmann, C. & Lowe, S.W. Nat. Cell Biol. 16, 10–18 (2014).
  2. Guda, S. et al. Mol. Ther. 23, 1465–1474 (2015).
  3. Grimm, D. et al. Nature 441, 537–541 (2006).
  4. McBride, J.L. et al. Proc. Natl. Acad. Sci. USA 105, 5868–5873 (2008).
  5. Baek, S.T. et al. Neuron 82, 1255–1262 (2014).
  6. Zuber, J. et al. Nat. Biotechnol. 29, 79–83 (2011).
  7. Fellmann, C. et al. Cell Rep. 5, 1704–1713 (2013).
  8. Gu, S. et al. Cell 151, 900–911 (2012).
  9. Watanabe, C., Cuellar, T.L. & Haley, B. RNA Biol. 13, 25–33 (2016).
  10. Fellmann, C. et al. Mol. Cell 41, 733–746 (2011).
  11. Yuan, T.L. et al. Cancer Discov. 4, 1182–1197 (2014).
  12. Knott, S.R.V. et al. Mol. Cell 56, 796–807 (2014).
  13. Auyeung, V.C., Ulitsky, I., McGeary, S.E. & Bartel, D.P. Cell 152, 844–858 (2013).
  14. Viola, P. & Jones, M. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 1, 511–518 (2001).
  15. Pelossof, R. Learning with Stochastic Focus of Attention. PhD thesis, Columbia Univ. (2011).
  16. Leslie, C., Eskin, E. & Noble, W.S. Pac. Symp. Biocomput. 7, 564–575 (2002).
  17. Sonnenburg, S., Rätsch, G. & Rieck, K. Large scale learning with string kernels. Large-scale Kernel Machines. (eds. Bottou, L., Chapelle, O., DeCoste, D. & Weston, J.) 73–104 (MIT Press, Cambridge, MA 2007).
  18. Vert, J.P., Foveau, N., Lajaunie, C. & Vandenbrouck, Y. BMC Bioinformatics 7, 520 (2006).
  19. Kampmann, M. et al. Proc. Natl. Acad. Sci. USA 112, E3384–E3391 (2015).
  20. Matveeva, O.V., Nazipova, N.N., Ogurtsov, A.Y. & Shabalina, S.A. Front. Genet. 3, 163 (2012).
  21. Morgens, D.W., Deans, R.M., Li, A. & Bassik, M.C. Nat. Biotechnol. 34, 634–636 (2016).
  22. Kampmann, M., Bassik, M.C. & Weissman, J.S. Proc. Natl. Acad. Sci. USA 110, E2317–E2326 (2013).
  23. Hart, T., Brown, K.R., Sircoulomb, F., Rottapel, R. & Moffat, J. Mol. Syst. Biol. 10, 733 (2014).
  24. Spies, N., Burge, C.B. & Bartel, D.P. Genome Res. 23, 2078–2090 (2013).
  25. Derti, A. et al. Genome Res. 22, 1173–1183 (2012).
  26. Lianoglou, S., Garg, V., Yang, J.L., Leslie, C.S. & Mayr, C. Genes Dev. 27, 2380–2396 (2013).
  27. Yi, R., Doehle, B.P., Qin, Y., Macara, I.G. & Cullen, B.R. RNA 11, 220–226 (2005).
  28. Boudreau, R.L., Martins, I. & Davidson, B.L. Mol. Ther. 17, 169–175 (2009).
  29. Sigoillot, F.D. et al. Nat. Methods 9, 363–366 (2012).
  30. Khvorova, A., Reynolds, A. & Jayasena, S.D. Cell 115, 209–216 (2003).
  31. Reynolds, A. et al. Nat. Biotechnol. 22, 326–330 (2004).
  32. Schwarz, D.S. et al. Cell 115, 199–208 (2003).
  33. Huesken, D. et al. Nat. Biotechnol. 23, 995–1001 (2005).
  34. Saetrom, P. & Snøve, O. Biochem. Biophys. Res. Commun. 321, 247–253 (2004).
  35. Filhol, O. et al. PLoS One 7, e48057 (2012).
  36. Taxman, D.J. et al. BMC Biotechnol. 6, 7 (2006).
  37. Sonnenburg, S. et al. J. Mach. Learn. Res. 11, 1799–1802 (2010).
  38. Huber, W. et al. Nat. Methods 12, 115–121 (2015).
  39. Lawrence, M. et al. PLoS Comput. Biol. 9, e1003118 (2013).
  40. Dow, L.E. et al. Nat. Protoc. 7, 374–393 (2012).
  41. Platt, R.J. et al. Cell 159, 440–455 (2014).
  42. Hochedlinger, K., Yamada, Y., Beard, C. & Jaenisch, R. Cell 121, 465–477 (2005).

Acknowledgements

We thank J.A. Doudna, G.J. Hannon, L.E. Dow and S.N. Floor for continuous support and valuable discussions. We gratefully acknowledge assistance and support from A. Banito, V. Sridhar, L. Faletti, C.C. Chen and S. Tian. C.F. was supported in part by a K99/R00 Pathway to Independence Award (K99GM118909) from the National Institutes of Health (NIH), National Institute of General Medical Sciences (NIGMS). C.F. is a founder of Mirimus Inc., a company that develops RNAi-based reagents and transgenic mice. This work was also supported in part by grant CA013106 (S.W.L.). S.W.L. is a founder and member of the scientific advisory board of Mirimus Inc., the Geoffrey Beene Chair of Cancer Biology at MSKCC and an investigator of the Howard Hughes Medical Institute. J.Z. is a member of the scientific advisory board, and P.K.P. is a founder and employee of Mirimus Inc. C.S.L. was supported in part by NHGRI U01 grants HG007033 and HG007893 and NCI U01 grant CA164190. A375 cells were a kind gift from Neal Rosen, MSKCC.

Author information

Author notes

  1. Raphael Pelossof and Lauren Fairchild: These authors contributed equally to this work.

Authors and Affiliations

  1. Computational Biology Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
    Raphael Pelossof, Lauren Fairchild, Christian Widmer, Vipin T Sreedharan, Gunnar Rätsch & Christina S Leslie
  2. Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York, USA
    Lauren Fairchild
  3. Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
    Chun-Hao Huang, Darjus F Tschaharganeh, Vishal Thapar & Scott W Lowe
  4. Cell and Developmental Biology Program, Weill Graduate School of Medical Sciences, Cornell University, New York, New York, USA
    Chun-Hao Huang & Scott W Lowe
  5. Department of Computer Science, Machine Learning Group, Berlin Institute of Technology, Berlin, Germany
    Christian Widmer
  6. Mirimus Inc., Woodbury, New York, USA
    Nishi Sinha, Dan-Yu Lai, Yuanzhe Guan, Prem K Premsrirut & Christof Fellmann
  7. Research Institute of Molecular Pathology, Vienna Biocenter, Vienna, Austria
    Thomas Hoffmann & Johannes Zuber
  8. RNAi Core, Memorial Sloan Kettering Cancer Center, New York, New York, USA
    Qing Xiang & Ralph J Garippa
  9. Department of Computer Science, ETH Zurich, Zurich, Switzerland
    Gunnar Rätsch
  10. Howard Hughes Medical Institute and Memorial Sloan Kettering Cancer Center, New York, New York, USA
    Scott W Lowe
  11. Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
    Christof Fellmann

Authors

  1. Raphael Pelossof
  2. Lauren Fairchild
  3. Chun-Hao Huang
  4. Christian Widmer
  5. Vipin T Sreedharan
  6. Nishi Sinha
  7. Dan-Yu Lai
  8. Yuanzhe Guan
  9. Prem K Premsrirut
  10. Darjus F Tschaharganeh
  11. Thomas Hoffmann
  12. Vishal Thapar
  13. Qing Xiang
  14. Ralph J Garippa
  15. Gunnar Rätsch
  16. Johannes Zuber
  17. Scott W Lowe
  18. Christina S Leslie
  19. Christof Fellmann

Contributions

R.P., L.F., C.S.L. and C.F. conceived and designed the study, and developed the data integration framework. R.P., L.F., and C.W. built the algorithm, and carried out the model training and computational validation. C.-H.H., N.S., D.-Y.L., Y.G., P.K.P., D.F.T., T.H., J.Z., S.W.L. and C.F. generated the biological data sets and validated knockdown potency. R.P., L.F., C.W. and V.T.S. built the web page. V.T. and G.R. assisted with study design and advised on algorithmic development. Q.X. and R.J.G. helped with validation of predictions. R.P., L.F., C.-H.H., T.H., J.Z., S.W.L., C.S.L. and C.F. analyzed data and wrote the manuscript.

Corresponding authors

Correspondence to Christina S Leslie or Christof Fellmann.

Ethics declarations

Competing interests

C.F. is a founder of Mirimus Inc., a company that develops RNAi-based reagents and transgenic mice. S.W.L. is a founder and member of the scientific advisory board of Mirimus Inc. J.Z. is a member of the scientific advisory board of Mirimus Inc. P.K.P. is a founder and employee of Mirimus Inc. R.P. and L.F. have filed intellectual property on SplashRNA.

Integrated supplementary information

Supplementary Figure 1 Data set generation.

(a-f) Generation of the M1 (miR-30, 20,400 shRNAs) Sensor assay data set (Supplementary Table 2, Online Methods).

(a) Schematic of our previously published Sensor assay that enables large-scale functional assessment of shRNA potency (Online Methods).

(b) Library complexity over Sensor assay sort cycles. Shown are normalized read numbers (parts per million, ppm) in both replicates for each shRNA represented within the initial libraries (Vector) and the pools after the indicated sorts (Sort 3, 5).

(c) Correlation of reads per shRNA between the two replicates before sorting (left panel), after Sort 5 (middle panel) and between the initial and endpoint population (right panel; shown for one representative replicate). r, Pearson correlation coefficient.
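The ppm normalization in panel b and the replicate correlation in panel c can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: it assumes raw read counts are held in a dict keyed by shRNA name, and the helper names (to_ppm, pearson, replicate_correlation) are hypothetical.

```python
import math


def to_ppm(counts):
    """Scale raw read counts per shRNA to parts per million (ppm) of the library."""
    total = sum(counts.values())
    return {name: 1e6 * c / total for name, c in counts.items()}


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(rep1_counts, rep2_counts):
    """Pearson r of ppm values over shRNAs detected in both replicates."""
    p1, p2 = to_ppm(rep1_counts), to_ppm(rep2_counts)
    shared = sorted(set(p1) & set(p2))
    return pearson([p1[s] for s in shared], [p2[s] for s in shared])
```

Normalizing to ppm first makes replicates sequenced to different depths directly comparable; Pearson's r itself is unaffected by a uniform rescaling of one replicate.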

(d) Correlation of Sensor score and reads per shRNA in the vector libraries, showing that the score is independent of the initial shRNA representation. r, Pearson correlation coefficient.

(e) Enrichment or depletion of 17 control shRNAs after Sort 5. All controls have been used in previous Sensor assays (e.g. TILE, mRas + hRAS) and are classified into a strong, intermediate and weak class according to their knockdown potency assessed by immunoblotting.

(f) Rank correlation of 325 performance control shRNAs. 65 shRNAs per gene targeting mouse Bcl2, Kras, Mcl1, Myc and Trp53 that had previously been tested as part of the TILE data set were chosen as supplemental controls to assess Sensor assay performance for weak, intermediate and strong shRNAs. The individual shRNA ranks between TILE and M1 were highly correlated (325 shRNAs, Spearman rank correlation coefficient rho: 0.63; gene-specific correlation coefficients are also reported), even though the TILE and M1 data sets were generated several years apart, using mostly different equipment, reagents and operators.
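Spearman's rho, as reported in panel f, is Pearson's r computed on ranks rather than raw values, so it only asks whether the two data sets order the shRNAs the same way. A minimal sketch, assuming tied values receive their average rank (a common convention; the tie handling used for the figure is not stated). The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r between the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```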

(g) Generation of the miR-E reporter assay data set (Supplementary Table 2, Online Methods). Normalized reporter knockdown values of miR-E shRNAs assessed one-by-one in an RNAi reporter assay. The shRNAs were tested in 42 individual batches, each including several control shRNAs for data scaling (miR-E Ren.713, miR-30 Pten.1524) and quality control (miR-E Pten.1523, miR-E Pten.1524). Background fluorescence of the parental chicken cell line (ERC) and maximal fluorescence of the batch-specific reporter cell line (ERC cells expressing the shRNA target reporter) were also measured. All shRNAs were grouped into either a positive or negative class. A threshold value of 80 was chosen as a cutoff, based on the performance of miR-30 Pten.1524 and miR-E Ren.713.

(h) Nucleotide representation of positive shRNAs from the indicated data sets. Shown are nucleotides one to eight of the guide strand (starting in the center), including the entire seed region. Unbiased TILE (miR-30) set, showing a diversified nucleotide composition (left panel). Preselected M1 (miR-30, DSIR + Sensor rules selected) set, showing a biased nucleotide representation (middle panel). Preselected miR-E + UltramiR set, showing a different nucleotide bias due to the altered shRNA backbone (right panel). More shRNAs starting with a C were found to be potent (compared to TILE, p = 0.002, Fisher’s exact test), indicating less restrictive sequence requirements when using the miR-E backbone.
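The enrichment test in panel h can be illustrated with a self-contained two-sided Fisher's exact test. The 2x2 layout assumed here (rows: miR-E + UltramiR versus TILE; columns: potent shRNAs whose guide starts with C versus other potent shRNAs) is an assumption, since the caption does not spell out the table, and the helper names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p value sums the probabilities of every table that is
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are counted
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def count_first_nt(guides, nt="C"):
    """Return (guides starting with nt, all other guides)."""
    hits = sum(g.upper().startswith(nt) for g in guides)
    return hits, len(guides) - hits
```

Under this layout, count_first_nt would supply one row of the table per data set before calling fisher_exact_two_sided.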

Supplementary Figure 2 Kernel selection and data integration.

(a) Schematic of the first support vector machine (SVM) classifier that serves to eliminate non-functional sequences and prioritize shRNAs that are likely to be potent.

(b) Schematic of the kernel representation used by SplashRNA. A weighted degree kernel is calculated across the entire guide sequence, while two spectrum kernels are calculated across nucleotides 1-15 and 16-22, respectively.
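The kernel layout in panel b can be sketched as follows. This assumes the standard weighted degree kernel with weights beta_d = 2(D - d + 1)/(D(D + 1)) and unnormalized spectrum kernels; the degree D = 8, the k-mer length k = 3 and the equal-weight sum of the three kernels are placeholder assumptions, not the SplashRNA settings.

```python
from collections import Counter


def wd_kernel(x, y, degree=8):
    """Weighted degree kernel: position-specific matches of substrings of
    length 1..degree, where length d is weighted by 2(D-d+1)/(D(D+1))."""
    if len(x) != len(y):
        raise ValueError("WD kernel compares aligned sequences of equal length")
    D = degree
    total = 0.0
    for d in range(1, D + 1):
        beta = 2.0 * (D - d + 1) / (D * (D + 1))
        matches = sum(x[l:l + d] == y[l:l + d] for l in range(len(x) - d + 1))
        total += beta * matches
    return total


def spectrum_kernel(x, y, k=3):
    """Spectrum kernel: dot product of k-mer count vectors, ignoring position."""
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    return float(sum(n * cy[kmer] for kmer, n in cx.items()))


def combined_kernel(x, y):
    """WD kernel over guide positions 1-22 plus spectrum kernels over
    positions 1-15 and 16-22 (0-based slices [0:15] and [15:22])."""
    return (wd_kernel(x[:22], y[:22])
            + spectrum_kernel(x[:15], y[:15])
            + spectrum_kernel(x[15:22], y[15:22]))
```

The design splits the work: the weighted degree kernel rewards matches at the same guide position (for example, in the seed), while the two spectrum kernels only count shared k-mers within each region, regardless of where they occur.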

(c) TILE score distribution (Online Methods). We set a potency threshold separating the negative from the positive class at the minimum between the two modes of the distribution (green line; for thresholds, see Supplementary Table 1).
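One way to place a threshold at the minimum between two modes is to smooth the score distribution with a Gaussian kernel density estimate and take the lowest density point between its two highest peaks. This is a sketch under assumptions: the caption does not say how the distribution was smoothed, and the bandwidth, grid size and function names here are arbitrary or hypothetical.

```python
import math


def kde(data, bandwidth):
    """Return a Gaussian kernel density estimate as a function of x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data)


def valley_threshold(scores, bandwidth=0.5, grid_size=512):
    """Score at the lowest density point between the two highest KDE peaks."""
    lo, hi = min(scores), max(scores)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    density = kde(scores, bandwidth)
    dens = [density(g) for g in grid]
    # interior local maxima of the smoothed distribution
    peaks = [i for i in range(1, grid_size - 1) if dens[i - 1] < dens[i] >= dens[i + 1]]
    if len(peaks) < 2:
        raise ValueError("score distribution does not look bimodal")
    left, right = sorted(sorted(peaks, key=lambda i: dens[i])[-2:])
    valley = min(range(left, right + 1), key=lambda i: dens[i])
    return grid[valley]
```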

(d) Testing of multiple kernel combinations in a leave-one-gene-out nested cross-validation setting on the TILE data set found that the combination of a weighted degree kernel over positions 1-22 and two spectrum kernels at positions 1-15 and 16-22 (allKernels) yields the best performance. Spec1 is a spectrum kernel over positions 1-15. Spec2 is a spectrum kernel over positions 16-22. Spec1_spec2 is a combination of spec1 and spec2. Wdk is a weighted degree kernel over positions 1-22. Wdk_spec1 is a combination of wdk and spec1. Wdk_spec2 is a combination of wdk and spec2. All_kernels is a combination of wdk, spec1 and spec2.
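A leave-one-gene-out nested cross-validation can be sketched as below: each outer fold holds out all shRNAs of one gene, and model selection (for example, choosing among kernel combinations) uses inner leave-one-gene-out folds built only from the remaining genes. The evaluate callback and all function names are hypothetical placeholders for training an SVM and computing auPR.

```python
def leave_one_gene_out(genes):
    """Yield (held_out_gene, train_idx, test_idx), one fold per distinct gene."""
    for g in sorted(set(genes)):
        test = [i for i, x in enumerate(genes) if x == g]
        train = [i for i, x in enumerate(genes) if x != g]
        yield g, train, test


def nested_cv(genes, candidates, evaluate):
    """Outer LOGO folds estimate performance; inner LOGO folds, built from the
    outer training genes only, pick the candidate (e.g. a kernel combination).

    evaluate(candidate, train_idx, test_idx) -> score is a hypothetical callback
    that would train an SVM on train_idx and return, say, auPR on test_idx.
    Requires at least three distinct genes so every inner split is non-empty."""
    results = {}
    for g, train, test in leave_one_gene_out(genes):
        inner_genes = [genes[i] for i in train]
        # map inner fold indices back to positions in the full data set
        inner = [([train[i] for i in tr], [train[i] for i in te])
                 for _, tr, te in leave_one_gene_out(inner_genes)]

        def inner_mean(c):
            return sum(evaluate(c, tr, te) for tr, te in inner) / len(inner)

        best = max(candidates, key=inner_mean)
        results[g] = (best, evaluate(best, train, test))
    return results
```

Holding out whole genes, rather than random shRNAs, keeps near-duplicate target sites from the same transcript out of both training and test sets, so the estimate reflects performance on unseen genes.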

(e) M1 score distribution (Supplementary Table 1, Online Methods). Cutoffs (green lines) were calculated by fitting Gaussian distributions to the modes and setting thresholds at 5% false positive rate (FPR) and 5% false negative rate (FNR).
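The two cutoffs in panel e can be read as quantiles of the fitted Gaussians: calling shRNAs positive above the 95th percentile of the negative mode gives a 5% FPR, and calling them negative below the 5th percentile of the positive mode gives a 5% FNR. The sketch below assumes higher scores mean more potent shRNAs and that scores falling between the two cutoffs are left unlabeled; both are assumptions, the Gaussian fitting itself is not shown, and the function names are hypothetical.

```python
from statistics import NormalDist


def mode_cutoffs(neg_mean, neg_sd, pos_mean, pos_sd, rate=0.05):
    """Cutoffs from Gaussians fitted to the negative and positive modes."""
    fpr_cutoff = NormalDist(neg_mean, neg_sd).inv_cdf(1 - rate)  # `rate` of negatives lie above
    fnr_cutoff = NormalDist(pos_mean, pos_sd).inv_cdf(rate)      # `rate` of positives lie below
    return fpr_cutoff, fnr_cutoff


def label(score, fpr_cutoff, fnr_cutoff):
    """1 = positive, 0 = negative, None = between the cutoffs (left unlabeled)."""
    if score > max(fpr_cutoff, fnr_cutoff):
        return 1
    if score < min(fpr_cutoff, fnr_cutoff):
        return 0
    return None
```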

(f) Incorporation of M1 positives, negatives or both into the TILE training set was tested in a nested leave-one-gene-out cross-validation setting. Inclusion of M1 negatives deteriorated performance on the TILE data set, whereas inclusion of the M1 positives alone improved performance. Note: TILE+M1pos = SplashmiR-30, the miR-30 classifier.

(g) Score distribution for the shERWOOD miR-30 set (Supplementary Table 1, Online Methods). We set the threshold at an arbitrary cutoff of zero (green line).

(h) Incorporation of M1 positives into the TILE training set improved performance on the external shERWOOD data set. Note: TILE+M1pos = SplashmiR-30, the miR-30 classifier.

Supplementary Figure 3 Calibration of the sequential SVM classifier SplashRNA.

(a) Precision-recall trade-off between the two classifiers SplashmiR-30 and SplashmiR-E. Selection of alpha (α) and theta (θ) hyperparameters leads to varied performance (area under the precision-recall curve, auPR) on the TILE miR-30 (x-axis) and miR-E + UltramiR (y-axis) sets. Each line represents a setting of alpha; points on the line represent distinct theta values. The circle indicates the alpha and theta choices for the final sequential classifier (SplashRNA: α = 0.6, θ = 1.1). The dashed line represents the performance of the convex linear classifier without a threshold at every alpha. Note that the performance of a sequential classifier equals or exceeds that of a linear combination since one can set the threshold (θ) to a small enough value such that all examples are evaluated by both classifiers.
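The sequential scheme in panel a can be sketched as a gated convex combination. The defaults α = 0.6 and θ = 1.1 come from the caption, but the rest is a hypothetical reconstruction: it assumes SplashmiR-30 is the first-stage gate, that α weights the SplashmiR-30 score, and that rejected sequences are ranked below all accepted ones. The function names are hypothetical.

```python
def sequential_score(s_mir30, s_mire, alpha=0.6, theta=1.1, reject_score=float("-inf")):
    """Two-stage score for one candidate shRNA (hypothetical reconstruction).

    Stage 1: the SplashmiR-30 score gates the candidate; scores below theta are
    rejected and placed below every accepted candidate.
    Stage 2: accepted candidates receive the convex combination
    alpha * s_mir30 + (1 - alpha) * s_mire."""
    if s_mir30 < theta:
        return reject_score
    return alpha * s_mir30 + (1 - alpha) * s_mire


def rank_candidates(cands, **kw):
    """cands: dict name -> (s_mir30, s_mire); returns names, best first."""
    return sorted(cands, key=lambda n: sequential_score(*cands[n], **kw), reverse=True)
```

Setting theta to float("-inf") sends every candidate through both stages, which recovers the plain convex combination; this is why, as the caption notes, the sequential classifier can always match the linear one.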

(b) Performance on the TILE set, varying the value of theta with alpha set to 0.6. The inset shows a zoom of the first 15% of the precision-recall curve.

(c) Performance on the miR-E + UltramiR set, varying the value for theta with alpha set to 0.6.

Supplementary Figure 4 Prediction performance of SplashRNA.

(a) Precision-recall curves on the TILE data set, comparing leave-one-gene-out nested cross-validation predictions from SplashRNA (auPR: 0.696) and SplashmiR-30 (auPR: 0.699) against the alternative prediction tools DSIR (auPR: 0.594), seqScore (auPR: 0.526) and miR_Scan (auPR: 0.449).
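auPR values such as those in panel a are commonly estimated as average precision: the mean of the precision values at each rank where a positive is retrieved. The sketch below uses that estimator, which is an assumption since the exact auPR computation is not stated here; ties in score are broken by input order for simplicity, and the function name is hypothetical.

```python
def average_precision(scores, labels):
    """Average precision: mean precision at the rank of each positive,
    with shRNAs ranked by decreasing score (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos
```

For example, ranking a potent, a weak, a potent and a weak shRNA in that order gives (1/1 + 2/3)/2, about 0.83.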

(b) Score distribution of the mRas + hRAS set (DSIR + Sensor rules selected). The green line indicates the threshold (Online Methods, Supplementary Table 1).

(c) Prediction performance comparison of the indicated algorithms on the external mRas + hRAS Sensor data set (Supplementary Table 1). SplashRNA outperformed the other algorithms.

(d) Score distributions of the miR-E and UltramiR data sets. For the miR-E set, the threshold was set to 80 (green line, Online Methods). The UltramiR set represents the distribution of log depletion scores of shRNAs tested in a cell-viability screen (Supplementary Table 1).

(e) SplashRNA- and DSIR-based re-ranking of shERWOOD-selected UltramiR shRNAs targeting essential genes that were tested in a cell-viability screen. X-axis: mean SplashRNA or DSIR score for 20 equally sized groups of 39 shRNAs each (purple and blue dots). Y-axis: percentage of shRNAs in each group that were potent (Online Methods). SplashRNA and DSIR were compared against the published minimum (Min), median (Med) and maximum (Max) shERWOOD algorithm performance on the same data set (green-brown dots).
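The grouping in panel e (20 equally sized groups of 39 shRNAs, 780 in total) can be sketched as sorting by predicted score and cutting the ranked list into consecutive blocks. The potency labels are assumed to be precomputed booleans, and the function name is hypothetical.

```python
def grouped_potency(scores, potent, n_groups=20):
    """Sort shRNAs by predicted score, cut into equally sized groups, and
    return (mean score, percent potent) for each group, lowest scores first."""
    n = len(scores)
    if n % n_groups:
        raise ValueError("expects len(scores) to be divisible by n_groups")
    size = n // n_groups
    order = sorted(range(n), key=lambda i: scores[i])
    out = []
    for g in range(n_groups):
        idx = order[g * size:(g + 1) * size]
        mean_score = sum(scores[i] for i in idx) / size
        pct_potent = 100.0 * sum(potent[i] for i in idx) / size
        out.append((mean_score, pct_potent))
    return out
```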

(f) Retrospective potency prediction of shRNAs from a large-scale essential-gene RNAi screen. The biological screen used 20-25 miR-E-like shRNAs per gene to identify essential genes. shRNA potency was quantified from their log fold changes (Online Methods). For each of the top 50 essential genes, all tested algorithms selected their top and bottom five sequences by prediction score. Log fold changes for all selected shRNAs across the 50 genes were compared. SplashRNA achieved the most significant discrimination between top and bottom predictions (p = 1.8e-11, one-sided Wilcoxon rank sum test). seqScore, which was used to generate the initial library of approximately 25 shRNAs per gene, reached p = 2.3e-5.
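The top-versus-bottom comparison in panels f and g can be sketched as follows: for each gene, take the five highest- and five lowest-scoring shRNAs, pool their log fold changes, and apply a one-sided Wilcoxon rank-sum test. This sketch assumes that potent shRNAs against essential genes are more depleted (more negative log fold change) and uses the normal approximation without tie correction, so its p values will differ somewhat from an exact test. Names are hypothetical.

```python
import math
from statistics import NormalDist


def top_bottom_lfc(table, k=5):
    """table: gene -> list of (prediction_score, log_fold_change).
    Returns pooled LFCs of each gene's k highest- and k lowest-scoring shRNAs."""
    top, bottom = [], []
    for rows in table.values():
        ranked = sorted(rows, key=lambda r: r[0], reverse=True)
        top += [lfc for _, lfc in ranked[:k]]
        bottom += [lfc for _, lfc in ranked[-k:]]
    return top, bottom


def rank_sum_less(x, y):
    """One-sided Wilcoxon rank-sum (Mann-Whitney) test of H1: x tends to be
    smaller than y, using the normal approximation without tie correction."""
    n1, n2 = len(x), len(y)
    u = sum((a < b) + 0.5 * (a == b) for a in x for b in y)  # pairs with x below y
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 1 - NormalDist().cdf((u - mu) / sigma)
```

Usage under these assumptions: top, bottom = top_bottom_lfc(screen); p = rank_sum_less(top, bottom), where screen is a hypothetical per-gene table of scores and log fold changes.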

(g) Retrospective potency prediction of shRNAs from a large-scale toxin resistance and sensitivity RNAi screen. The biological screen used 25 miR-E-like shRNAs per gene to identify resistance and sensitivity genes. shRNA potency was quantified from their log fold changes (Online Methods). For each of the top 20 sensitivity genes, all tested algorithms selected their top and bottom five sequences by prediction score. Log fold changes for all selected shRNAs across the 20 genes were compared. SplashRNA was the only algorithm to achieve significant discrimination between the top and bottom predictions at p < 0.01 (p = 4.8e-4, one-sided Wilcoxon rank sum test). Of note, SplashRNA also outperformed the other algorithms when selecting smaller or larger numbers of top sensitivity genes from the biological screen (data not shown). seqScore was used to generate the initial library of approximately 25 shRNAs per gene.

Supplementary Figure 5 Transcript selection.

(a) Distribution of shRNA potency in functionally distinct transcript regions. Shown is the potency distribution of shRNAs in the unbiased TILE data set that target the 5’UTR, CDS or 3’UTR. Since these shRNAs were evaluated using the Sensor assay, their targets are not subject to alternative cleavage and polyadenylation (ApA) and/or splicing events.

(b) AU content of potent and weak miR-30 shRNAs from the unbiased TILE set. Potent shRNAs tend to have a higher proportion of A/U nucleotides (p < 2.2e-16, two-sided Kolmogorov-Smirnov test).
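The AU comparison in panel b can be sketched with a per-sequence AU fraction and the two-sample Kolmogorov-Smirnov statistic D (the largest gap between the two empirical CDFs). Only the statistic is computed here; obtaining the p value would need the Kolmogorov distribution, for example from a statistics package. Function names are hypothetical.

```python
from bisect import bisect_right


def au_fraction(seq):
    """Fraction of A and U nucleotides (T is read as U)."""
    s = seq.upper().replace("T", "U")
    return sum(nt in "AU" for nt in s) / len(s)


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    # both ECDFs are step functions that only jump at observed values
    return max(abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
               for v in xs + ys)
```

Under this sketch, D would be computed between the AU fractions of potent and weak TILE guides.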

(c) AU content of functionally distinct transcript regions in the human genome. Shown are the AU densities in 5’UTR, CDS and 3’UTR.

(d) AU content in mouse transcripts.

(e) Alternative cleavage and polyadenylation (ApA) prevents potent shRNAs from inhibiting their putative target gene. Immunoblotting of Pten in NIH/3T3s transduced at single-copy with LEPG expressing the indicated shRNAs. Nine top predictions targeting the CDS or the 3’UTR after early ApA sites were compared alongside controls for their ability to suppress mouse Pten. Actb was used as a loading control.

(f) Comparison of knockdown efficiency and annotation of ApA sites. Shown are potent Pten shRNA predictions and their position (start, end) on the mouse genome (mm9). KD indicates the qualitative degree of knockdown observed in immunoblotting analyses of NIH/3T3s (e). ApA indicates previously published positions on the mouse genome (mm9) of ApA sites (alternative 3’ ends) identified in NIH/3T3 and mouse ES cells by 3P-Seq. 2P-Seq shows transcript expression levels quantified by 2P-Seq. All shRNAs and ApA sites are ordered according to their position along the mouse genome.
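The logic behind panels e and f can be sketched as a simple calculation: an shRNA can only silence the 3' end isoforms that still contain its target site. The sketch assumes genomic coordinates on one chromosome, a single upstream exon structure shared by all isoforms, and per-site expression values such as 2P-Seq counts; it is illustrative only, and the function name is hypothetical.

```python
def targetable_fraction(sh_start, sh_end, apa_sites, strand="+"):
    """Share of expressed 3'-end isoforms that still contain the shRNA target.

    apa_sites: list of (cleavage_position, expression), genomic coordinates.
    On the + strand an isoform ending at position p contains targets with
    sh_end <= p; on the - strand the transcript runs toward lower coordinates,
    so it contains targets with sh_start >= p."""
    total = sum(expr for _, expr in apa_sites)
    if strand == "+":
        kept = sum(expr for pos, expr in apa_sites if sh_end <= pos)
    else:
        kept = sum(expr for pos, expr in apa_sites if sh_start >= pos)
    return kept / total
```

A potent shRNA placed downstream of an early, highly used ApA site would score low here, which matches the pattern of weak Pten knockdown described for panel e.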

Supplementary Figure 6 Extensive validation of de novo SplashRNA predictions.

(a-f) Western blot validation of de novo SplashRNA predictions. All shRNAs were expressed using LEPG at single-copy conditions. β-Actin (Actb, ACTB) was used for normalization.

(a) Immunoblotting of Pbrm1 in NIH/3T3s (median KD: 97%, median SplashRNA score: 1.7).

(b) Immunoblotting of Rela in NIH/3T3s (median KD: 90%, median SplashRNA score: 1.1).

(c) Immunoblotting of Bcl2l11 in NIH/3T3s (median KD: 97%, median SplashRNA score: 0.7).

(d) Immunoblotting of Axin1 in NIH/3T3s (median KD: 95%, median SplashRNA score: 1.3).

(e) Schematic of the multiple human NF2 transcript variants. NF2 has nine variants with an intersection of only 198 nucleotides, excluding the 5’UTR, rendering the prediction task especially difficult due to limited sequence space.
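Finding the sequence shared by all transcript variants, as in panel e, amounts to intersecting their exon intervals. In the sketch below, exons are half-open genomic intervals, and only 22-nt windows lying inside a single shared block are enumerated, so targets spanning a shared exon junction are ignored (a simplification). For a single contiguous 198-nt region there would be at most 198 - 22 + 1 = 177 such windows. Function names are hypothetical.

```python
def intersect_intervals(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals [start, end)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def shared_exonic_region(variants):
    """Blocks covered by the exons of every variant (each a list of intervals)."""
    shared = sorted(variants[0])
    for exons in variants[1:]:
        shared = intersect_intervals(shared, sorted(exons))
    return shared


def candidate_target_sites(shared, site_len=22):
    """Start coordinates of site_len windows lying entirely inside one shared block."""
    return [s for lo, hi in shared for s in range(lo, hi - site_len + 1)]
```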

(f) Predicting miR-E shRNAs for extremely short transcripts. Immunoblotting of NF2 in A375s transduced with the indicated shRNAs targeting all nine NF2 variants (median KD: 89%, median SplashRNA score: 0.6).

(g) Comparison of SplashRNA and DSIR predictions against CRISPR-Cas9-mediated suppression of Cd9 in mouse embryonic fibroblasts (MEFs). Shown are normalized (relative to the indicated controls) median anti-Cd9-APC fluorescence intensities of RRT-MEFs and CRT-MEFs expressing the indicated shRNAs or sgRNAs (Online Methods). The six top-scoring predictions from DSIR + Sensor rules (DSIR) or SplashRNA (ordered according to their respective scores) were compared to six sgRNA sequences (Supplementary Table 2). *, Cd9.1137 is the top prediction from both algorithms and was plotted twice for clarity. While DSIR predictions triggered Cd9 knockdown with variable efficacy, SplashRNA predictions consistently induced strong Cd9 suppression, closely approaching knockout conditions.

(h) Transfer function of SplashRNA score versus protein knockdown for all 62 de novo predicted shRNAs validated by immunofluorescence (Supplementary Table 2). Green triangles indicate the minimum knockdown for 80% of the predictions for a given SplashRNA score bin. Bins were defined to have a width of 0.5 with the leftmost bin starting at 0.25. For the bin centered on SplashRNA score = 1, 80% of predictions showed at least 86% protein knockdown. The expected knockdown for the top 80% of predictions (e.g. 4/5 shRNAs) increases with the SplashRNA score. Overall, 91% of predictions with a SplashRNA score >1 showed more than 85% protein knockdown.
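The binning in panel h can be sketched as follows: bins of width 0.5 starting at 0.25 (so centered on 0.5, 1.0, 1.5, ...), and for each bin the largest knockdown value that at least 80% of its predictions reach. Treating the green triangles as this order statistic is an assumption about how they were computed; function names are hypothetical.

```python
import math


def bin_center(score, start=0.25, width=0.5):
    """Center of the score bin containing `score`; None below the first bin."""
    if score < start:
        return None
    k = math.floor((score - start) / width)
    return start + width * k + width / 2


def knockdown_reached_by_80pct(knockdowns):
    """Largest knockdown value that at least 80% of predictions reach or exceed."""
    ranked = sorted(knockdowns, reverse=True)
    k = (4 * len(ranked) + 4) // 5  # ceil(0.8 * n) using integer arithmetic
    return ranked[k - 1]


def transfer_function(scores, knockdowns):
    """Map each bin center to the knockdown reached by 80% of its predictions."""
    bins = {}
    for s, kd in zip(scores, knockdowns):
        c = bin_center(s)
        if c is not None:
            bins.setdefault(c, []).append(kd)
    return {c: knockdown_reached_by_80pct(v) for c, v in sorted(bins.items())}
```

With five predictions in a bin, this returns the fourth-highest knockdown, matching the caption's "4/5 shRNAs" reading.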

(i) Uncropped images of Pten (Figure 2d) and Bap1 (Figure 2e) western blots, and their respective β-Actin controls. Pten predicted molecular weight (MW): 47 kDa; MW validated by Cell Signaling Technology: 54 kDa. Bap1 predicted MW: 80 kDa; MW validated by Bethyl Laboratories: 80-95 kDa. β-Actin MW validated by Sigma-Aldrich: 42 kDa.

Supplementary information

About this article

Cite this article

Pelossof, R., Fairchild, L., Huang, CH. et al. Prediction of potent shRNAs with a sequential classification algorithm. Nat Biotechnol 35, 350–353 (2017). https://doi.org/10.1038/nbt.3807
