Sequence determinants of improved CRISPR sgRNA design

Genome Res. 2015 Aug;25(8):1147-57.

doi: 10.1101/gr.191452.115. Epub 2015 Jun 10.

Han Xu, Tengfei Xiao 2, Chen-Hao Chen 3, Wei Li 1, Clifford A Meyer 1, Qiu Wu 4, Di Wu 5, Le Cong 6, Feng Zhang 6, Jun S Liu 5, Myles Brown 7, X Shirley Liu 1

Han Xu et al. Genome Res. 2015 Aug.

Abstract

The CRISPR/Cas9 system has revolutionized mammalian somatic cell genetics. Genome-wide functional screens using CRISPR/Cas9-mediated knockout or dCas9 fusion-mediated inhibition/activation (CRISPRi/a) are powerful techniques for discovering phenotype-associated gene function. We systematically assessed the DNA sequence features that contribute to single guide RNA (sgRNA) efficiency in CRISPR-based screens. Leveraging the information from multiple designs, we derived a new sequence model for predicting sgRNA efficiency in CRISPR/Cas9 knockout experiments. Our model confirmed known features and suggested new features including a preference for cytosine at the cleavage site. The model was experimentally validated for sgRNA-mediated mutation rate and protein knockout efficiency. Tested on independent data sets, the model achieved significant results in both positive and negative selection conditions and outperformed existing models. We also found that the sequence preference for CRISPRi/a is substantially different from that for CRISPR/Cas9 knockout and propose a new model for predicting sgRNA efficiency in CRISPRi/a experiments. These results facilitate the genome-wide design of improved sgRNA for both knockout and CRISPRi/a studies.

© 2015 Xu et al.; Published by Cold Spring Harbor Laboratory Press.

Figures

Figure 1.

A schematic view of procedures for sgRNA selection and categorization. (A,B) Venn diagrams showing the overlap of essential genes between human HL-60 and KBM-7 cells (A) and two biological replicates in mouse ESC JM8 cells (B). (C–E) Scatter plots showing the log2 fold-change of sgRNA abundance in negative selection upon cell growth. (C) sgRNAs targeting essential ribosomal genes in Wang data. (D) sgRNAs targeting essential nonribosomal genes in Wang data. (E) sgRNAs targeting essential genes in Koike-Yusa data. The dashed lines represent the threshold chosen to determine efficient and inefficient sgRNAs.

Figure 2.

Preference of nucleotide sequences that impact sgRNA efficiency. (A–C) Logos showing the sequence preference of the three sgRNA sets defined in Figure 1. The height of the nucleotides represents the log odds ratio of nucleotide frequency between efficient and inefficient sgRNAs. (D) A logo showing the selected features that reproducibly impact sgRNA efficiency in the three sgRNA sets. The height of the nucleotides represents the coefficients computed from the Elastic-Net. (E,F) Scatter plots showing the correlation of sequence preference for sgRNAs targeting ribosomal versus nonribosomal genes in Wang data (E) and sgRNAs in Wang data versus Koike-Yusa data (F). Each dot represents a nucleotide in a 40-bp region centered on the spacer. The sequence preference is measured as the log2 odds ratio of nucleotide frequency between efficient and inefficient sgRNAs.
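
The per-position log2 odds ratio named in this caption can be illustrated with a minimal sketch. This is not the authors' code: the function name `position_log2_odds`, the pseudocount, and the toy sequences are assumptions made for illustration, and the page does not give the paper's exact formula. The sketch takes the odds of each nucleotide at each position in the efficient set and divides by the odds in the inefficient set.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"


def position_log2_odds(efficient, inefficient, pseudocount=1.0):
    """Return one dict per position mapping nucleotide -> log2 odds ratio.

    The pseudocount (an assumption, not taken from the paper) keeps every
    frequency strictly between 0 and 1 so the odds are always defined.
    """
    length = len(efficient[0])
    if any(len(seq) != length for seq in list(efficient) + list(inefficient)):
        raise ValueError("all sequences must have the same length")

    result = []
    for pos in range(length):
        eff_counts = Counter(seq[pos] for seq in efficient)
        ineff_counts = Counter(seq[pos] for seq in inefficient)
        scores = {}
        for nt in NUCLEOTIDES:
            # Smoothed frequency of this nucleotide in each set.
            p_eff = (eff_counts[nt] + pseudocount) / (len(efficient) + 4 * pseudocount)
            p_ineff = (ineff_counts[nt] + pseudocount) / (len(inefficient) + 4 * pseudocount)
            # Odds ratio between the two sets, on a log2 scale.
            odds_ratio = (p_eff / (1 - p_eff)) / (p_ineff / (1 - p_ineff))
            scores[nt] = math.log2(odds_ratio)
        result.append(scores)
    return result


# Toy sequences, invented for illustration only.
toy_efficient = ["ACGT", "ACGA"]
toy_inefficient = ["TCGT", "TCGA"]
toy_scores = position_log2_odds(toy_efficient, toy_inefficient)
```

A positive value means the nucleotide is enriched among efficient sgRNAs at that position; a negative value means it is depleted.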

Figure 3.

Experimental validation of the sequence model in predicting sgRNA efficiency. (A) A SURVEYOR gel picture (top) and a bar chart (bottom) showing the indel rates of the sgRNAs predicted to be inefficient (low sequence score) or efficient (high sequence score). The sgRNAs were selected to target the AAVS1 locus. The experiment was conducted in 293T cells. (B) A scatter plot showing the correlation of the predicted sequence scores and the protein knockout efficiency for sgRNAs targeting AR and FOXA1 in LNCaP-abl cells. The knockout efficiency is measured as the percentage of reduction in protein level upon sgRNA infection.

Figure 4.

Predicting sgRNA efficiency from sequence context in CRISPR/Cas9 knockout screens. (A) ROC curves showing the predictive power of the proposed model. (Red) Threefold cross-validation on the sgRNAs targeting ribosomal genes in Wang data; (blue) trained on ribosomal genes, and tested on nonribosomal genes in Wang data; (green) trained on Wang data, and tested on Koike-Yusa data. The black error bars on the red curve represent standard deviations computed from 10 iterations of random sampling in cross-validation. (B) ROC curves comparing the performance of the proposed model and the Doench et al. (2014) model in predicting sgRNA efficiency in Shalem data. (C) Scatter plot showing the correlation between the predicted sequence score and the relative sgRNA abundance for ABL1 and BCR in KBM-7 cells. The _P_-values were computed based on the Pearson correlation test. (D) Box plot showing the distributions of correlations between sequence scores and relative sgRNA abundances for essential and nonessential genes in KBM-7. The distribution of random background was computed by permuting the sequence scores within each gene in the data set. (E) Distributions of relative sgRNA abundances in KBM-7 cells, where the sgRNAs were categorized based on the predicted efficiency and the essentiality of their targeted genes.
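
The random background in panel D is described as permuting sequence scores within each gene. A minimal sketch of that idea for a single gene follows, assuming Pearson correlation as the statistic (as in panel C). The function names, the number of permutations, and the toy values are hypothetical and not taken from the paper.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def permuted_correlations(scores, abundances, n_permutations=200, seed=0):
    """Background correlations from shuffling scores within one gene.

    Each shuffle breaks the pairing between an sgRNA's score and its
    abundance while keeping both sets of values unchanged.
    """
    rng = random.Random(seed)
    shuffled = list(scores)
    background = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        background.append(pearson(shuffled, abundances))
    return background


# Toy values for the sgRNAs of one hypothetical gene.
toy_scores = [0.1, 0.4, 0.5, 0.9, 1.2]
toy_abundances = [-0.2, -0.9, -1.1, -2.0, -2.4]
observed = pearson(toy_scores, toy_abundances)
background = permuted_correlations(toy_scores, toy_abundances)
```

Repeating this per gene and pooling the shuffled correlations gives a background distribution against which the observed per-gene correlations can be compared.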

Figure 5.

Assessment of the sequence models in predicting sgRNA efficiency in positive selection experiments. (A–E) Bar charts showing the capability of selection and the experimental reproducibility for predicted efficient and inefficient sgRNAs. The tested sgRNAs target the genes known to be involved in the resistance to different drug treatments or external stimuli. (F) ROC curves comparing the performance of the proposed model and the Doench et al. (2014) model in predicting sgRNA efficiency in positive selection experiments. In the evaluation, the positive test set consists of the sgRNAs selected in all replicates in B–E; and the negative test set consists of those not selected in B–E.
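
An ROC comparison like the one in panel F is often summarized by the area under the curve (AUC). As a hedged illustration, the sketch below computes AUC directly from the scores of a positive and a negative test set, using the standard equivalence between AUC and the probability that a random positive outscores a random negative. The function name and the toy scores are assumptions; the page does not say how the authors computed or summarized their curves.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve from two lists of predicted scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both test sets must be non-empty")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy scores, invented for illustration only.
toy_positive = [0.9, 0.8, 0.4]   # sgRNAs selected in all replicates
toy_negative = [0.5, 0.3, 0.2]   # sgRNAs not selected
toy_auc = roc_auc(toy_positive, toy_negative)
```

An AUC of 0.5 corresponds to a model that ranks the two sets no better than chance, and 1.0 to perfect separation. The pairwise loop is quadratic in the number of sgRNAs, which is acceptable for a sketch but not for large libraries.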

Figure 6.

Preference of the length and sequence context of spacers in CRISPR/dCas9 inhibition (CRISPRi) and activation (CRISPRa) screens. (A) Distribution of phenotype scores (Gilbert et al. 2014) for sgRNAs targeting the top 500 essential genes and the control sgRNAs in CRISPRi experiments. The dashed line represents the threshold chosen to determine efficient and inefficient sgRNAs. (B) A bar chart showing the effect of spacer length on sgRNA efficiency. (C) Logos showing the sequence preference of spacers. The height of the nucleotides represents the log odds ratio of nucleotide frequency between efficient and inefficient sgRNAs. The nucleotide at the 5′ end of the spacers is fixed as guanine in the library design and is excluded from the logos. (D) Bar charts comparing the performance of the CRISPRi model and the CRISPR/Cas9 KO model in predicting sgRNA efficiency in CRISPRi negative selection, CRISPRi positive selection upon CTx-DTA treatment, and CRISPRa negative selections in Gilbert data and Konermann data. The length of spacers is 20 nt. Cross-validation was used to assess the performance of the CRISPRi model in the CRISPRi negative selection experiment. The error bars represent the standard deviations in 10 iterations of threefold cross-validation. The _P_-value was computed using a paired _t_-test.
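
The paired t-test mentioned in panel D can be sketched with the standard library alone. The example assumes the pairs are per-iteration performance values for two models, which the caption does not state explicitly. The function name and the toy numbers are hypothetical. Only the t statistic and degrees of freedom are computed here; converting them to a P-value requires a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t statistic, degrees of freedom) for paired samples.

    The test works on the per-pair differences: their mean divided by
    the standard error of that mean.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("t statistic is undefined when all differences are equal")
    t_value = mean(diffs) / (spread / sqrt(len(diffs)))
    return t_value, len(diffs) - 1


# Toy performance values for two hypothetical models over four iterations.
toy_model_a = [0.70, 0.72, 0.68, 0.71]
toy_model_b = [0.60, 0.63, 0.61, 0.60]
t_value, dof = paired_t_statistic(toy_model_a, toy_model_b)
```

Pairing matters because both models are evaluated on the same splits, so the test compares within-iteration differences rather than two independent groups.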

References

1. Andersson BS, Collins VP, Kurzrock R, Larkin DW, Childs C, Ost A, Cork A, Trujillo JM, Freireich EJ, Siciliano MJ. 1995. KBM-7, a human myeloid leukemia cell line with double Philadelphia chromosomes lacking normal c-ABL and BCR transcripts. Leukemia 9: 2100–2108.
2. Berns K, Hijmans EM, Mullenders J, Brummelkamp TR, Velds A, Heimerikx M, Kerkhoven RM, Madiredjo M, Nijkamp W, Weigelt B, et al. 2004. A large-scale RNAi screen in human cells identifies new components of the p53 pathway. Nature 428: 431–437.
3. Chen B, Gilbert LA, Cimini BA, Schnitzbauer J, Zhang W, Li GW, Park J, Blackburn EH, Weissman JS, Qi LS, et al. 2013. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas9 system. Cell 155: 1479–1491.
4. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA, et al. 2013. Multiplex genome engineering using CRISPR/Cas systems. Science 339: 819–823.
5. Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I, Sullender M, Ebert BL, Xavier RJ, Root DE. 2014. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat Biotechnol 32: 1262–1267.
