Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach

Accession codes

Primary accessions

Sequence Read Archive

Referenced accessions

Sequence Read Archive

References

  1. Mali, P. et al. Science 339, 823–826 (2013).
  2. Cong, L. et al. Science 339, 819–823 (2013).
  3. Doench, J.G. et al. Nat. Biotechnol. 32, 1262–1267 (2014).
  4. Gagnon, J.A. et al. PLoS ONE 9, e98186 (2014).
  5. Certo, M.T. et al. Nat. Methods 9, 973–975 (2012).
  6. Kuscu, C., Arslan, S., Singh, R., Thorpe, J. & Adli, M. Nat. Biotechnol. 32, 677–683 (2014).
  7. Wu, X. et al. Nat. Biotechnol. 32, 670–676 (2014).
  8. ENCODE Project Consortium. Nature 489, 57–74 (2012).
  9. Koch, C.M. et al. Genome Res. 17, 691–707 (2007).
  10. Ran, F.A. et al. Cell 154, 1380–1389 (2013).
  11. Mali, P. et al. Nat. Biotechnol. 31, 833–838 (2013).
  12. Tsai, S.Q. et al. Nat. Biotechnol. 32, 569–576 (2014).
  13. Fu, Y., Sander, J.D., Reyon, D., Cascio, V.M. & Joung, J.K. Nat. Biotechnol. 32, 279–284 (2014).
  14. Fu, Y. et al. Nat. Biotechnol. 31, 822–826 (2013).
  15. Guilinger, J.P., Thompson, D.B. & Liu, D.R. Nat. Biotechnol. 32, 577–582 (2014).
  16. Pattanayak, V. et al. Nat. Biotechnol. 31, 839–843 (2013).
  17. Tsai, S.Q. et al. Nat. Biotechnol. 33, 187–197 (2015).
  18. Aach, J., Mali, P. & Church, G.M. Preprint at bioRxiv 10.1101/005074 (2014).
  19. Futreal, P.A. et al. Nat. Rev. Cancer 4, 177–183 (2004).
  20. Karolchik, D. et al. Nucleic Acids Res. 32, D493–D496 (2004).
  21. Jiang, H. & Wong, W.H. Bioinformatics 24, 2395–2396 (2008).
  22. Xu, Q., Schlabach, M.R., Hannon, G.J. & Elledge, S.J. Proc. Natl. Acad. Sci. USA 106, 2289–2294 (2009).
  23. Esvelt, K.M. et al. Nat. Methods 10, 1116–1121 (2013).
  24. Harris, D.R. et al. PLoS Biol. 2, e304 (2004).
  25. Magoč, T. & Salzberg, S.L. Bioinformatics 27, 2957–2963 (2011).
  26. Li, H. & Durbin, R. Bioinformatics 25, 1754–1760 (2009).
  27. Li, H. et al. Bioinformatics 25, 2078–2079 (2009).
  28. Droettboom, M. et al. Matplotlib version 1.4.0. 10.5281/zenodo.11451 (2014).
  29. Schölkopf, B., Burges, C.J.C. & Smola, A.J. Advances in Kernel Methods: Support Vector Learning (MIT Press, 1999).
  30. Karolchik, D. et al. Nucleic Acids Res. 42, D764–D770 (2014).
  31. Kent, W.J., Zweig, A.S., Barber, G., Hinrichs, A.S. & Karolchik, D. Bioinformatics 26, 2204–2207 (2010).

Acknowledgements

We acknowledge J. Aach for help with CasFinder and useful discussion, B. Turczyk for help with custom array oligonucleotide synthesis, S. Byrne (Harvard Medical School) for providing PGP1 induced pluripotent stem cells, and A. Chavez for useful discussion. This work was supported by US National Institutes of Health grant P50 HG005550. R.C. was supported by a Banting Fellowship from the Canadian Institutes of Health Research. P.M. is supported by University of California, San Diego, startup funds and a Burroughs Wellcome Career Award at the Scientific Interface.

Author information

Author notes

  1. Raj Chari and Prashant Mali: These authors contributed equally to this work.

Authors and Affiliations

  1. Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
    Raj Chari & George M Church
  2. Department of Bioengineering, University of California, San Diego, La Jolla, California, USA
    Prashant Mali
  3. Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California, USA
    Mark Moosburner
  4. Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, Massachusetts, USA
    George M Church

Authors

  1. Raj Chari
  2. Prashant Mali
  3. Mark Moosburner
  4. George M Church

Contributions

R.C. and P.M. designed the study and performed the experiments. R.C. implemented custom Python software and performed data analysis. M.M. provided technical assistance. G.M.C. supervised the project. R.C. and P.M. wrote and edited the manuscript. All authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Prashant Mali or George M Church.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Technical and biological replicates of the library-on-library experiments.

Two separate transfections of the sgRNA library and Cas9Sp nuclease were performed (biological replicates BR1 and BR2). From each transfection, two sequencing libraries were prepared (technical replicates TR1 and TR2), giving four samples in total. (A) Comparison of technical replicates within each biological replicate. (B) Scatter plots of each sample from one biological replicate against each sample from the other. Since there were two samples per transfection, a total of 4 comparisons are shown. Pearson correlation coefficients ranged from 0.819 to 0.853.

Source data
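
With four samples, the replicate design above yields six pairwise comparisons. Two are technical, between samples from the same transfection, and four cross the two transfections. The sketch below shows one way to compute these correlations. It assumes each sample is a list of per-site NHEJ mutation rates in a shared site order. The function names and the `(BR, TR)` keys are illustrative and are not taken from the authors' software.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def replicate_correlations(samples):
    """Split all pairwise correlations into technical and cross-transfection sets.

    `samples` maps (biological_replicate, technical_replicate) -> per-site rates,
    e.g. ("BR1", "TR2") -> [0.12, 0.30, ...], all lists in the same site order.
    """
    technical, cross = {}, {}
    for (k1, v1), (k2, v2) in combinations(sorted(samples.items()), 2):
        r = pearson(v1, v2)
        # Same biological replicate -> technical comparison; otherwise cross-BR.
        target = technical if k1[0] == k2[0] else cross
        target[(k1, k2)] = r
    return technical, cross
```

For four samples this returns two technical and four cross-transfection correlations. These correspond to panels (A) and (B).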

Supplementary Figure 2 Distribution of observed NHEJ mutation rates across different Cas9s and end-processing enzymes.

Total observed NHEJ rates in experiments transfected with (A) Cas9Sp nuclease, (B) Cas9Sp nickase and (C) Cas9St1 nuclease. For each set of experiments, the respective sgRNA library was transfected alone and in four further transfections, each together with a different end-processing enzyme. Control samples are cells carrying the integrated target library that were transfected with Cas9 alone. Of the enzymes tested, TREX2 and Artemis increased the NHEJ-associated mutation rate the most.

Supplementary Figure 3 Distribution of observed NHEJ-mediated insertion and deletion rates across different Cas9s and end-processing enzymes.

Box plots of NHEJ-mediated insertion and deletion rates for Cas9Sp nuclease (top), Cas9Sp nickase (middle) and Cas9St1 nuclease (bottom). In both nuclease sets, TREX2 biases NHEJ outcomes toward deletions and away from insertions. The insertion rate with Cas9Sp nuclease is also modestly higher in the TdT and DdrA samples.

Supplementary Figure 4 Impact of TREX2 and Artemis across the target-site library.

(A) Histograms of the fold increase in NHEJ-associated mutagenesis across all sites upon addition of end-processing enzymes: TREX2 (top) and Artemis (bottom). The effect varies considerably between sites, suggesting that sequence context may also be important. (B) Box plots of the range of fold increase. At some sites the effect is negligible, whereas at others the enhancement reaches 10–15-fold. (C) Of the sites that showed no mutagenesis with the sgRNA library alone, the percentage that became mutated upon addition of TREX2 or Artemis. Over half of these sites did so.
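
The per-site fold increase and the "rescued" fraction in panel (C) can be sketched as follows. This is only an illustration. It assumes per-site NHEJ rates are stored in dicts keyed by a hypothetical site ID and treats "no mutagenesis" as a rate at or below `threshold` (0 by default). It also adds a small pseudocount, which the study does not describe, so that sites with zero baseline activity do not cause division by zero.

```python
def fold_increase(base_rates, enzyme_rates, pseudocount=1e-4):
    """Per-site fold change in NHEJ rate when an end-processing enzyme is added.

    `pseudocount` is an illustrative assumption that avoids division by zero
    at sites with no baseline mutagenesis.
    """
    return {
        site: (enzyme_rates[site] + pseudocount) / (base_rates[site] + pseudocount)
        for site in base_rates
    }


def rescued_fraction(base_rates, enzyme_rates, threshold=0.0):
    """Among sites unmutated with the sgRNA library alone (rate <= threshold),
    return the fraction that show mutagenesis (rate > threshold) with the enzyme."""
    silent = [site for site, rate in base_rates.items() if rate <= threshold]
    if not silent:
        return 0.0
    return sum(enzyme_rates[site] > threshold for site in silent) / len(silent)
```

For example, a site whose rate rises from 0.02 to 0.25 with TREX2 gives a fold increase of roughly 12 under this pseudocount.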

Supplementary Figure 5 Impact of end-processing enzymes on the distribution of net deletion sizes observed.

Without end-processing enzymes (dashed line), the observed distribution is strongly biased toward smaller deletions. Intriguingly, although adding either Artemis or TREX2 increases the rate of observed deletions, Artemis tends to favor smaller deletions (< 5 bp), whereas TREX2 tends to produce larger deletions (> 5 bp). This pattern is consistent across both Cas9 systems (Cas9Sp left, Cas9St1 right).

Source data
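
Net deletion size is the number of deleted bases minus inserted bases in a read. The sketch below computes it from SAM-style CIGAR strings, assuming reads have already been aligned to their reference target sites. The caption does not state how indels were called. This sketch also does not restrict indels to a window around the cut site, and it assigns deletions of exactly 5 bp to the "large" bin, which is an arbitrary choice.

```python
import re
from collections import Counter

# One CIGAR operation: a run length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def net_indel(cigar):
    """Return (inserted_bases, deleted_bases, net_size) for one aligned read.

    net_size = insertions - deletions, so a negative value is a net deletion.
    """
    inserted = deleted = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "I":
            inserted += int(length)
        elif op == "D":
            deleted += int(length)
    return inserted, deleted, inserted - deleted


def net_deletion_histogram(cigars):
    """Count reads by net deletion size in bp; reads without a net deletion are skipped."""
    sizes = (-net_indel(cigar)[2] for cigar in cigars)
    return Counter(size for size in sizes if size > 0)


def split_by_size(histogram, cutoff=5):
    """Fractions of net-deletion reads below vs. at or above `cutoff` bp."""
    total = sum(histogram.values())
    if not total:
        return 0.0, 0.0
    small = sum(n for size, n in histogram.items() if size < cutoff)
    return small / total, (total - small) / total
```

Comparing `split_by_size` between samples with and without an enzyme gives a numerical counterpart to the shift shown in the plot.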

Supplementary Figure 6 Impact of TREX2 on both on-target and off-target mutagenesis rates.

Three previously published target sites (each with three known off-target sites) were assessed for NHEJ-induced mutation in the absence and presence of the exonuclease TREX2. For each site, an individual sgRNA and Cas9 were co-transfected with or without TREX2, and cells were assessed for mutations 72 h after transfection. TREX2 increased mutation rates at both on- and off-target sites, to varying degrees. This suggests that choosing sgRNA sites with minimal off-target potential is imperative, even more so in the presence of TREX2.

Source data

Supplementary Figure 7 Comparison of high- and low-activity St1 sgRNA sequences at both the integrated target site and endogenous loci in 293T cells.

(A) Position-by-position comparison of base distributions between the 82 high-activity and 69 low-activity sgRNAs. A 37-bp window was used, comprising the 27-bp target site and 5 bp of flanking sequence on each side. At each position, nucleotide distributions were compared with a 2 × 4 Fisher's exact test, and p-values were corrected for multiple hypothesis testing using the Benjamini-Hochberg method. Position 20 shows the largest peak, with a strong preference for guanine. (B) Comparison of DNase I hypersensitivity between the top and bottom quartiles of sites ranked by activity at endogenous loci. Data were generated by ENCODE and downloaded from the UCSC Genome Browser. Each region was defined as 225 bp upstream and downstream of the target site, for a total size of 427 bp. (C) Comparison of H3K4 trimethylation (H3K4me3) for the same sites as in (B).
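
Panel (A) can be reproduced in outline as follows. To stay dependency-free, this sketch enumerates every 2 × 4 table with the observed margins, which is the Freeman-Halton extension of Fisher's exact test. It then applies the Benjamini-Hochberg step-up correction. It assumes equal-length, upper-case A/C/G/T windows, and the helper names are hypothetical.

```python
from itertools import product
from math import comb

BASES = "ACGT"


def position_tables(high_seqs, low_seqs):
    """One 2 x 4 table per position: rows = high/low activity, columns = A, C, G, T."""
    return [
        [[sum(seq[i] == base for seq in group) for base in BASES]
         for group in (high_seqs, low_seqs)]
        for i in range(len(high_seqs[0]))
    ]


def fisher_2xk(table):
    """Exact (Freeman-Halton) p-value for a 2 x k contingency table.

    Enumerates every table with the observed row and column totals and sums
    the probabilities of those no more likely than the observed table.
    Integer weights keep the comparison exact.
    """
    top, bottom = table
    cols = [a + b for a, b in zip(top, bottom)]
    row1, n = sum(top), sum(cols)

    def weight(row):
        # Proportional to the multivariate hypergeometric probability of `row`.
        w = 1
        for a, c in zip(row, cols):
            w *= comb(c, a)
        return w

    observed = weight(top)
    total = 0
    for head in product(*(range(c + 1) for c in cols[:-1])):
        last = row1 - sum(head)
        if 0 <= last <= cols[-1]:
            w = weight(head + (last,))
            if w <= observed:
                total += w
    return total / comb(n, row1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Adjusted values for every position in a window would then be `benjamini_hochberg([fisher_2xk(t) for t in position_tables(high, low)])`. With about 150 guides, the enumeration covers at most tens of thousands of tables per position.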

Supplementary Figure 8 Validation of SVM-predicted activity of sgRNAs for Cas9 S. pyogenes (Cas9Sp).

Five sgRNAs predicted to have high activity (blue) and five predicted to have low activity (orange) were each transfected individually into seven different cell lines, representing a diverse set of tissue types and lineages. Because of cell-intrinsic factors, DNA repair efficiency and differences in transfection efficiency, the cell lines show different overall levels of mutagenesis. By and large, the sgRNAs predicted to have high activity outperformed those predicted to have low activity.

Source data
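
The captions do not describe the SVM's features or training procedure. As an illustration only, the sketch below trains a linear SVM on one-hot, position-specific nucleotide features, which is an assumed encoding. It uses the Pegasos stochastic sub-gradient method, chosen here to keep the example dependency-free; a library solver such as scikit-learn's would normally be used instead. Labels are +1 for high-activity and -1 for low-activity guides.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a sequence as 4 * len(seq) binary features (A, C, G, T per position)."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def train_linear_svm(seqs, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos sub-gradient training of a linear SVM (hinge loss + L2 penalty).

    `labels` are +1 (high activity) or -1 (low activity). A constant 1.0 feature
    is appended to act as a bias term (it is regularized like the other weights).
    """
    data = [(one_hot(s) + [1.0], y) for s, y in zip(seqs, labels)]
    w = [0.0] * len(data[0][0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (L2 penalty)
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w, seq):
    """Linear decision value; larger means higher predicted activity."""
    x = one_hot(seq) + [1.0]
    return sum(wi * xi for wi, xi in zip(w, x))
```

Candidate guides could then be ranked by `svm_score`. The top and bottom of that ranking would give predicted high- and low-activity sets.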

Supplementary Figure 9 Validation of SVM-predicted activity of sgRNAs for Cas9 S. thermophilus (Cas9St1).

Five sgRNAs predicted to have high activity (blue) and five predicted to have low activity (orange) were each transfected individually into seven different cell lines, representing a diverse set of tissue types and lineages. Because of cell-intrinsic factors, DNA repair efficiency and differences in transfection efficiency, the cell lines show different overall levels of mutagenesis. Across all cell types, the Cas9St1 guides predicted to have high activity consistently outperformed those predicted to have low activity.

Source data

Supplementary Figure 10 Comparison of predicted SVM scores from our model with a previously published data set of 1,841 sgRNA sequences.

(A) Gene percent rank vs. SVM prediction score. (B) Score from a previously published classifier vs. the SVM prediction score from our model. Relationships were quantified with Spearman correlation coefficients. Although the two models show some variability, there is a positive relationship between the two classifiers.

Source data
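
Spearman's coefficient is the Pearson correlation of ranks. It therefore captures monotonic rather than strictly linear agreement, which suits comparing a percent rank with a classifier score. Below is a minimal dependency-free sketch in which tied values receive their average rank.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5
```

Because only ranks matter, any monotonic transformation of either score leaves rho unchanged.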

Supplementary Figure 11 Comparison of the mutation rates observed at the integrated target site with that observed at the endogenous locus.

Because chromatin effects are controlled for at the integrated site and multiple target sites were incorporated into each cell, the mutagenesis rates are, as expected, higher than those observed at the endogenous loci. This held for both Cas9Sp and Cas9St1.

Source data

Supplementary Figure 12 Correlation analysis of observed mutation rates at lentiviral target sites and endogenous sites.

A moderate but highly significant correlation is observed when all interrogated sites are compared between the two experiments (r = 0.42; p = 1.6e-53). However, the endogenous sites span a range of chromatin accessibility, whereas the lentiviral sites are expected to be broadly accessible. The most accessible endogenous sites should therefore most closely resemble the lentiviral target sites, and this is indeed the case. Restricting the analysis to progressively smaller subsets of the most accessible sites markedly improves the correlation: for the 100 most accessible sites, r = 0.69 (p = 2.6e-15).

Source data
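
The subset analysis can be sketched by ranking sites by an accessibility measure and recomputing the correlation within the top k sites. This sketch assumes dict inputs keyed by a hypothetical site ID and uses Pearson's r. The caption does not specify which accessibility measure defined the ranking, and p-values are not computed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_by_accessibility(lenti, endo, accessibility, subset_sizes):
    """Correlate lentiviral vs. endogenous rates within the k most accessible sites.

    All three inputs are dicts keyed by site ID. `accessibility` could be, for
    example, a DNase I hypersensitivity signal (an assumed ranking metric).
    """
    ranked = sorted(accessibility, key=accessibility.get, reverse=True)
    result = {}
    for k in subset_sizes:
        if k < 3:
            raise ValueError("need at least 3 sites for a meaningful correlation")
        top = ranked[:k]
        result[k] = pearson([lenti[s] for s in top], [endo[s] for s in top])
    return result
```

Passing decreasing subset sizes, e.g. all sites down to 100, traces how the correlation changes as less accessible sites are excluded.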

Supplementary Figure 13 Representation analysis of the four plasmid libraries that were used in this study.

Plasmid libraries were sequenced on an Illumina MiSeq. The y-axis shows the percentage of total reads, and the x-axis lists all sequences in alphabetical order. The target libraries had fairly even representation across all sites, with the Cas9Sp target library slightly more uniform. The sgRNA libraries were fairly uniform for both Cas9s.
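
The y-axis of these plots is each library member's share of total reads. The small sketch below assumes read counts per library sequence have already been tallied. The 90th/10th percentile fold spread is an illustrative uniformity summary chosen here, not a metric reported in the study.

```python
from math import ceil


def representation(read_counts):
    """Percent of total reads per library member, ordered alphabetically by sequence."""
    total = sum(read_counts.values())
    return [(seq, 100.0 * read_counts[seq] / total) for seq in sorted(read_counts)]


def fold_spread(read_counts, low_pct=10, high_pct=90):
    """Ratio of high- to low-percentile read counts (nearest-rank percentiles).

    1.0 means perfectly even representation; larger values mean more skew.
    """
    counts = sorted(read_counts.values())
    n = len(counts)

    def nearest_rank(p):
        return counts[max(0, ceil(p / 100 * n) - 1)]

    low, high = nearest_rank(low_pct), nearest_rank(high_pct)
    return float("inf") if low == 0 else high / low
```

A library member with zero reads makes the spread infinite. Such dropouts would then need to be examined separately.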

Supplementary Figure 14 Design of oligonucleotides for the target sites and sgRNA sequences for both Cas9Sp and Cas9St1.

Oligonucleotides of 170 bases were synthesized using the Custom Array platform. Outer primer sequences were designed for selective amplification, and 10-base barcodes were included as additional sequence to aid in mapping target sites after Cas9-mediated mutagenesis.
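
Barcode-based mapping of reads to target sites can be sketched as a lookup table. This example assumes the 10-base barcode starts at a known, fixed offset in each read, given by the hypothetical `start` argument. It tolerates one mismatch, an error-tolerance choice not stated in the caption. Exact matches always take priority, and single-mismatch variants that could come from two different barcodes are discarded.

```python
def build_barcode_index(barcode_to_site, alphabet="ACGT"):
    """Map exact barcodes and unambiguous single-mismatch variants to target sites."""
    index = dict(barcode_to_site)  # exact matches always win
    owners = {}  # variant -> set of sites whose barcode is one mismatch away
    for barcode, site in barcode_to_site.items():
        for i, base in enumerate(barcode):
            for alt in alphabet:
                if alt != base:
                    variant = barcode[:i] + alt + barcode[i + 1:]
                    owners.setdefault(variant, set()).add(site)
    for variant, sites in owners.items():
        # Keep a variant only if it is not itself a barcode and has one possible source.
        if variant not in index and len(sites) == 1:
            index[variant] = next(iter(sites))
    return index


def assign_read(read, index, start, length=10):
    """Return the target site for the barcode at read[start:start + length], or None."""
    return index.get(read[start:start + length])
```

The index holds each barcode plus 30 single-mismatch variants for 10-base barcodes. Assigning a read therefore costs a single dictionary lookup.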

Supplementary information

Source data


About this article

Cite this article

Chari, R., Mali, P., Moosburner, M. et al. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat Methods 12, 823–826 (2015). https://doi.org/10.1038/nmeth.3473
