Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation - PubMed

Nat Biotechnol. 2014 Dec;32(12):1262-7.

doi: 10.1038/nbt.3026. Epub 2014 Sep 3.

John G Doench et al. Nat Biotechnol. 2014 Dec.

Abstract

Components of the prokaryotic clustered, regularly interspaced, short palindromic repeats (CRISPR) loci have recently been repurposed for use in mammalian cells. The CRISPR-associated (Cas)9 can be programmed with a single guide RNA (sgRNA) to generate site-specific DNA breaks, but there are few known rules governing on-target efficacy of this system. We created a pool of sgRNAs, tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. We discovered sequence features that improved activity, including a further optimization of the protospacer-adjacent motif (PAM) of Streptococcus pyogenes Cas9. The results from 1,841 sgRNAs were used to construct a predictive model of sgRNA activity to improve sgRNA design for gene editing and genetic screens. We provide an online tool for the design of highly active sgRNAs for any gene of interest.
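To make "tiling across all possible target sites" concrete, here is a minimal sketch that enumerates candidate S. pyogenes Cas9 sites in a DNA string. It assumes a 20-nt protospacer followed by the canonical NGG PAM and scans both strands; the function names and the tuple layout are illustrative choices, not taken from the paper or its online tool.

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_target_sites(seq, guide_len=20):
    """List (strand, start, protospacer, pam) for every 20-nt + NGG site.

    Assumption: only the canonical NGG PAM is considered. For the "-"
    strand, `start` is an offset into the reverse-complemented sequence.
    """
    # The lookahead lets overlapping sites be reported individually.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites


if __name__ == "__main__":
    # Toy sequence, not a real gene.
    for site in find_target_sites("A" * 20 + "TGG"):
        print(site)
```

A real design pipeline would also map minus-strand offsets back to genomic coordinates and annotate each site by region (CDS, intron, boundary); that bookkeeping is omitted here.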

Figures

Figure 1

sgRNA activity screens in mouse and human cells. (a) Representation of the sgRNA libraries. Colors represent genes assayed by FACS; light gray indicates genes either poorly expressed or not assayed; dark gray indicates targets not found in the mouse or human genomes. (b) Top: Antibody staining in cells (red) compared to unstained cells (black). Bottom: FACS plots indicating the negative population isolated for each cell surface marker after library transduction. (c) Percent of sgRNAs enriched >10-fold (mouse) or >2-fold (human) in the marker-negative population that were on-target.

Figure 2

Features of sgRNA activity. (a) sgRNA concordance across cell lines. Pairwise comparison between cell lines of sgRNA percent-rank (see Methods for percent-rank calculation) for sgRNAs targeting CD13 or CD33; Spearman rank correlation of 0.87 and 0.80, respectively. (b) Activity maps of sgRNA by cut site position. Exons and 100 nts of flanking intron are represented as lines on the x-axis with gaps marking the remaining intronic sequence. sgRNAs excluded from activity modeling are indicated in gray. Boundary sgRNAs (green) are those where the cut site, between nts 17 and 18, falls between annotated regions (e.g. CDS/intron). All sgRNAs with fold enrichment ≤ 0.25 are grouped at the bottom of the y-axis. Scale bar indicates 500 nt of sequence. (c) Activity as a function of G/C content for the 1,841 CDS-targeting sgRNAs analyzed. The top, middle and bottom lines of the box represent the 25th, 50th, and 75th percentiles, respectively; the whiskers represent the 10th and 90th percentiles. p* = 0.0003, p** = 3 × 10^−11, Kolmogorov-Smirnov test.
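The two per-guide quantities in this caption, G/C content and percent-rank, can be sketched as follows. The percent-rank formula here (fraction of guides with strictly lower activity, scaled by n − 1) is an assumption chosen for illustration; the paper's Methods define the calculation actually used. The function names are hypothetical.

```python
def gc_count(guide):
    """Number of G or C bases in a guide sequence."""
    s = guide.upper()
    return s.count("G") + s.count("C")


def percent_rank(activities):
    """Scale each activity to [0, 1] by its rank within the list.

    Assumed definition: (number of values strictly lower) / (n - 1),
    so the least active guide scores 0.0 and the most active 1.0.
    """
    n = len(activities)
    if n < 2:
        return [0.0] * n
    return [sum(other < a for other in activities) / (n - 1)
            for a in activities]


if __name__ == "__main__":
    # Made-up enrichment values for three guides against one gene.
    print(percent_rank([0.1, 0.5, 0.9]))
    print(gc_count("GGCCAATT"))
```

Ranking within a gene rather than across genes keeps guides comparable when knockout of one marker is easier to detect than another, which is presumably why the caption reports percent-rank and not raw enrichment.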

Figure 3

Model of sgRNA activity. (a) p-values of observing the conditional probability of a guide with a percent-rank activity of >0.8 under the null distribution examined at every position including the 4 nt upstream of the sgRNA target site, the 20 nt of sgRNA complementarity, the PAM, and the 3 nt downstream of the sgRNA target sequence. p-values were calculated from the binomial distribution with a baseline probability of 0.2 using 1,841 CDS-targeting guides. (b) Performance evaluation of sgRNA activity prediction scores based on nucleotide features. Scores for 1,841 sgRNAs are divided by quintile (x-axis) and experimentally-determined activity within each prediction group is assessed by sgRNA percent rank, and also binned by quintile (y-axis). (c) Performance validation of sgRNA prediction algorithm. The model was trained on all possible combinations of 8 genes and tested individually on the remaining held-out gene. Each gray line indicates the ROC curve for a held-out gene. The black line is the mean ROC curve. The bar graph inset indicates the Area Under the Curve (AUC) for each gene. (d) Distribution of 1,841 sgRNAs across predicted score quintiles. (e) Simulation of the fraction of most-active sgRNAs, arbitrarily defined as the top 20% of sgRNAs for a gene, in hypothetical libraries with 6 sgRNAs per gene. For a library designed with no on-target criteria (null, in red) the values are simply the binomial expansion of 0.2. For the hypothetical library that incorporates sgRNA scoring rules to enrich for highly-active sgRNAs (blue), the model predicts that the top two quintiles of scores (0.6–1.0) contain 66.3% of most-active sgRNAs, and thus the values are the binomial expansion of 0.663.
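The binomial expansions named in panel (e) can be reproduced with a short sketch. It takes the per-guide success probabilities exactly as the caption states them (0.2 for the null library, 0.663 for the scored library) with 6 sgRNAs per gene. The upper-tail helper shows one way a binomial p-value like those in panel (a) could be computed; treating the test as one-sided is an assumption, since the caption does not specify.

```python
from math import comb


def binomial_pmf(n, p):
    """Probability of exactly k successes in n trials, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def binomial_upper_tail(k, n, p):
    """P(at least k successes in n trials); a one-sided p-value (assumed)."""
    return sum(binomial_pmf(n, p)[k:])


GUIDES_PER_GENE = 6  # library size stated in the caption

# Per-guide probabilities as given in the caption.
null_library = binomial_pmf(GUIDES_PER_GENE, 0.2)
scored_library = binomial_pmf(GUIDES_PER_GENE, 0.663)

if __name__ == "__main__":
    for k, (p_null, p_scored) in enumerate(zip(null_library, scored_library)):
        print(f"{k} most-active sgRNAs: null={p_null:.4f} scored={p_scored:.4f}")
```

Under the null library, the chance that a gene gets no most-active sgRNA at all is 0.8^6 ≈ 0.262, which is the k = 0 term of the first expansion.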
