Sequence determinants of improved CRISPR sgRNA design
Genome Res. 2015 Aug;25(8):1147-57.
doi: 10.1101/gr.191452.115. Epub 2015 Jun 10.
Han Xu, Tengfei Xiao, Chen-Hao Chen, Wei Li, Clifford A Meyer, Qiu Wu, Di Wu, Le Cong, Feng Zhang, Jun S Liu, Myles Brown, X Shirley Liu
- PMID: 26063738
- PMCID: PMC4509999
- DOI: 10.1101/gr.191452.115
Han Xu et al. Genome Res. 2015 Aug.
Abstract
The CRISPR/Cas9 system has revolutionized mammalian somatic cell genetics. Genome-wide functional screens using CRISPR/Cas9-mediated knockout or dCas9 fusion-mediated inhibition/activation (CRISPRi/a) are powerful techniques for discovering phenotype-associated gene function. We systematically assessed the DNA sequence features that contribute to single guide RNA (sgRNA) efficiency in CRISPR-based screens. Leveraging the information from multiple designs, we derived a new sequence model for predicting sgRNA efficiency in CRISPR/Cas9 knockout experiments. Our model confirmed known features and suggested new features including a preference for cytosine at the cleavage site. The model was experimentally validated for sgRNA-mediated mutation rate and protein knockout efficiency. Tested on independent data sets, the model achieved significant results in both positive and negative selection conditions and outperformed existing models. We also found that the sequence preference for CRISPRi/a is substantially different from that for CRISPR/Cas9 knockout and propose a new model for predicting sgRNA efficiency in CRISPRi/a experiments. These results facilitate the genome-wide design of improved sgRNA for both knockout and CRISPRi/a studies.
© 2015 Xu et al.; Published by Cold Spring Harbor Laboratory Press.
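The abstract describes a sequence model that scores sgRNA efficiency, and Figure 2D indicates that its features carry coefficients computed from an Elastic-Net. The following is a minimal Python sketch of how a model of that general kind can be applied, assuming a logistic-regression form over position-specific one-hot nucleotide features. The functions `one_hot_features` and `efficiency_score` and the `toy_weights` values are hypothetical illustrations, not the published model or its coefficients.

```python
import math
from typing import Dict, Tuple

NUCLEOTIDES = "ACGT"
Feature = Tuple[int, str]  # (0-based position in the sequence, nucleotide)


def one_hot_features(seq: str) -> Dict[Feature, float]:
    """Encode a DNA sequence as position-specific one-hot features."""
    seq = seq.upper()
    unexpected = set(seq) - set(NUCLEOTIDES)
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return {(i, base): 1.0 for i, base in enumerate(seq)}


def efficiency_score(seq: str, weights: Dict[Feature, float], intercept: float = 0.0) -> float:
    """Logistic sequence score in (0, 1).

    The linear predictor is the intercept plus the weight of every
    (position, nucleotide) feature present in `seq`. Features without a
    weight contribute 0, as dropped features do in a sparse (e.g. Elastic-Net) fit.
    """
    features = one_hot_features(seq)
    linear = intercept + sum(weights.get(f, 0.0) * value for f, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical toy weights for illustration only; NOT the published coefficients.
toy_weights = {(16, "C"): 0.8, (19, "T"): -1.2}
print(efficiency_score("GACGCATAAAGATGAGACGC", toy_weights))
```

A sparse dictionary of weights keeps the sketch close to how a penalized fit reports only the retained features; a real application would also need the exact sequence window and encoding used for training.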
Figures
Figure 1.
A schematic view of procedures for sgRNA selection and categorization. (A,B) Venn diagrams showing the overlap of essential genes between human HL-60 and KBM-7 cells (A) and between two biological replicates in mouse ESC JM8 cells (B). (C–E) Scatter plots showing the log2 fold-change of sgRNA abundance in negative selection upon cell growth. (C) sgRNAs targeting essential ribosomal genes in Wang data. (D) sgRNAs targeting essential nonribosomal genes in Wang data. (E) sgRNAs targeting essential genes in Koike-Yusa data. The dashed lines represent the thresholds chosen to determine efficient and inefficient sgRNAs.
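Figure 1 classifies sgRNAs by thresholding the log2 fold-change of their abundance in a negative-selection screen. A minimal Python sketch of that step follows, assuming library-size normalization with a pseudocount and two cutoffs with an excluded middle band. The function names, the pseudocount, and any cutoff values are hypothetical; the caption does not state the exact normalization or thresholds.

```python
import math
from typing import Dict


def log2_fold_changes(
    before: Dict[str, int], after: Dict[str, int], pseudocount: float = 0.5
) -> Dict[str, float]:
    """Per-sgRNA log2 fold-change of normalized abundance (after vs. before selection).

    Each raw count gets `pseudocount` added and is divided by its sample's total
    raw count, so samples sequenced to different depths are comparable. Only
    sgRNAs present in both samples are reported. With pseudocount=0, a zero
    count raises an error, so keep the pseudocount positive for real data.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    changes = {}
    for sgrna in before.keys() & after.keys():
        freq_before = (before[sgrna] + pseudocount) / total_before
        freq_after = (after[sgrna] + pseudocount) / total_after
        changes[sgrna] = math.log2(freq_after / freq_before)
    return changes


def categorize(lfc: float, efficient_max: float, inefficient_min: float) -> str:
    """Label an sgRNA that targets an essential gene in a negative-selection screen.

    Strong depletion (lfc <= efficient_max) counts as efficient, little or no
    depletion (lfc >= inefficient_min) as inefficient, and anything in between
    is left ambiguous. Setting both cutoffs equal gives a single threshold.
    """
    if efficient_max > inefficient_min:
        raise ValueError("efficient_max must not exceed inefficient_min")
    if lfc <= efficient_max:
        return "efficient"
    if lfc >= inefficient_min:
        return "inefficient"
    return "ambiguous"
```

Depletion marks efficiency here because, for an essential gene, a guide that cuts well removes its cells from the population during growth.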
Figure 2.
Preference of nucleotide sequences that impact sgRNA efficiency. (A–C) Logos showing the sequence preference of the three sgRNA sets defined in Figure 1. The height of each nucleotide represents the log odds ratio of nucleotide frequency between efficient and inefficient sgRNAs. (D) A logo showing the selected features that reproducibly impact sgRNA efficiency in the three sgRNA sets. The height of each nucleotide represents the coefficient computed from the Elastic-Net. (E,F) Scatter plots showing the correlation of sequence preference for sgRNAs targeting ribosomal versus nonribosomal genes in Wang data (E) and for sgRNAs in Wang data versus Koike-Yusa data (F). Each dot represents a nucleotide in a 40-bp region centered on the spacer. The sequence preference is measured as the log2 odds ratio of nucleotide frequency between efficient and inefficient sgRNAs.
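The logos in Figure 2 plot, for each position and nucleotide, a log2 odds ratio of nucleotide frequency between efficient and inefficient sgRNAs. A minimal Python sketch of that computation is below, assuming equal-length aligned sequences and an additive pseudocount. The function name and the pseudocount are hypothetical, and the caption does not specify how zero counts were handled.

```python
import math
from typing import Dict, List, Tuple

NUCLEOTIDES = "ACGT"


def log2_odds_ratios(
    efficient: List[str], inefficient: List[str], pseudocount: float = 1.0
) -> Dict[Tuple[int, str], float]:
    """log2 odds ratio of each nucleotide at each position, efficient vs. inefficient.

    For position i and nucleotide b, the odds within one set are
    (count of b at i + pseudocount) / (count of other bases at i + pseudocount).
    The pseudocount keeps the ratio finite when a nucleotide is absent from a set.
    """
    if not efficient or not inefficient:
        raise ValueError("both sets must be non-empty")
    sequences = efficient + inefficient
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must be aligned to the same length")

    def odds(group: List[str], i: int, base: str) -> float:
        hits = sum(1 for s in group if s[i].upper() == base)
        return (hits + pseudocount) / (len(group) - hits + pseudocount)

    return {
        (i, base): math.log2(odds(efficient, i, base) / odds(inefficient, i, base))
        for i in range(length)
        for base in NUCLEOTIDES
    }
```

Positive values mark nucleotides enriched among efficient sgRNAs at that position; negative values mark depletion.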
Figure 3.
Experimental validation of the sequence model in predicting sgRNA efficiency. (A) A SURVEYOR gel image (top) and a bar chart (bottom) showing the indel rates of sgRNAs predicted to be inefficient (low sequence score) or efficient (high sequence score). The sgRNAs were selected to target the AAVS1 locus, and the experiment was conducted in 293T cells. (B) A scatter plot showing the correlation between the predicted sequence scores and the protein knockout efficiency of sgRNAs targeting AR and FOXA1 in LNCaP-abl cells. Knockout efficiency is measured as the percentage reduction in protein level upon sgRNA infection.
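Figure 3B relates predicted sequence scores to knockout efficiency, defined as the percentage reduction in protein level. A minimal Python sketch of both quantities follows, assuming protein levels have already been quantified on a common scale for a control sample and each knockout sample (the quantification method is an assumption). The function names are hypothetical; if a p-value is needed, `scipy.stats.pearsonr` returns one alongside the coefficient.

```python
import math
from typing import Sequence


def knockout_efficiency(control_level: float, knockout_level: float) -> float:
    """Percentage reduction in protein level relative to a control sample."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return 100.0 * (control_level - knockout_level) / control_level


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In use, one would compute `knockout_efficiency` for each sgRNA and pass those values, together with the matching predicted scores, to `pearson_r`.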
Figure 4.
Predicting sgRNA efficiency from sequence context in CRISPR/Cas9 knockout screens. (A) ROC curves showing the predictive power of the proposed model. (Red) Threefold cross-validation on the sgRNAs targeting ribosomal genes in Wang data; (blue) trained on ribosomal genes and tested on nonribosomal genes in Wang data; (green) trained on Wang data and tested on Koike-Yusa data. The black error bars on the red curve represent standard deviations computed from 10 iterations of random sampling in cross-validation. (B) ROC curves comparing the performance of the proposed model and the Doench et al. (2014) model in predicting sgRNA efficiency in Shalem data. (C) Scatter plot showing the correlation between the predicted sequence score and the relative sgRNA abundance for ABL1 and BCR in KBM-7 cells. The P-values were computed using the Pearson correlation test. (D) Box plot showing the distributions of correlations between sequence scores and relative sgRNA abundances for essential and nonessential genes in KBM-7 cells. The random background distribution was computed by permuting the sequence scores within each gene in the data set. (E) Distributions of relative sgRNA abundances in KBM-7 cells, where the sgRNAs were categorized based on their predicted efficiency and the essentiality of their targeted genes.
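Figure 4 evaluates the model with ROC curves and builds a random background by permuting sequence scores within each gene. A minimal Python sketch of two related pieces is below: the area under the ROC curve, computed with the Mann–Whitney formulation (the caption shows curves and does not say whether an AUC was reported), and a within-gene permutation. Both function names and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def roc_auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive (efficient) sgRNA
    scores higher than a randomly chosen negative one, counting ties as 1/2.
    This pairwise loop is O(n * m), which is fine for modest set sizes.
    """
    if len(positive_scores) == 0 or len(negative_scores) == 0:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def permute_within_gene(
    genes: Sequence[str], scores: Sequence[float], seed: int = 0
) -> List[float]:
    """Shuffle scores among the sgRNAs of each gene, leaving gene membership intact."""
    if len(genes) != len(scores):
        raise ValueError("genes and scores must have the same length")
    rng = random.Random(seed)
    indices_by_gene: Dict[str, List[int]] = defaultdict(list)
    for index, gene in enumerate(genes):
        indices_by_gene[gene].append(index)
    permuted = list(scores)
    for indices in indices_by_gene.values():
        values = [scores[i] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            permuted[i] = value
    return permuted
```

Permuting within each gene, rather than across the whole library, keeps per-gene score distributions fixed, so the background reflects only the loss of sgRNA-level pairing between score and abundance.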
Figure 5.
Assessment of the sequence models in predicting sgRNA efficiency in positive selection experiments. (A–E) Bar charts showing the selection capability and experimental reproducibility of predicted efficient and inefficient sgRNAs. The tested sgRNAs target genes known to be involved in resistance to different drug treatments or external stimuli. (F) ROC curves comparing the performance of the proposed model and the Doench et al. (2014) model in predicting sgRNA efficiency in positive selection experiments. In this evaluation, the positive test set consists of the sgRNAs selected in all replicates in B–E, and the negative test set consists of those not selected in B–E.
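Figure 5F builds its test sets from replicate selection calls: sgRNAs selected in all replicates are positives, and sgRNAs not selected are negatives. A minimal Python sketch of that split follows, assuming each experiment provides one boolean selection flag per replicate and that sgRNAs selected in only some replicates are excluded from both sets (the caption does not say how such partial cases were handled). The function name and data layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

SgRNAKey = Tuple[str, str]  # (experiment, sgRNA id)


def build_test_sets(
    selections: Dict[str, Dict[str, List[bool]]]
) -> Tuple[Set[SgRNAKey], Set[SgRNAKey]]:
    """Split sgRNAs into positive and negative test sets.

    `selections` maps experiment -> sgRNA id -> one flag per replicate
    (True = selected in that replicate). Positives are selected in every
    replicate, negatives in none; sgRNAs selected in only some replicates,
    or with no replicate flags, are left out of both sets.
    """
    positives: Set[SgRNAKey] = set()
    negatives: Set[SgRNAKey] = set()
    for experiment, per_sgrna in selections.items():
        for sgrna, flags in per_sgrna.items():
            if not flags:
                continue  # no replicate information; skip
            if all(flags):
                positives.add((experiment, sgrna))
            elif not any(flags):
                negatives.add((experiment, sgrna))
    return positives, negatives
```

Keying each sgRNA by its experiment keeps the same guide from being counted twice if it appears in more than one screen.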
Figure 6.
Preference of the length and sequence context of spacers in CRISPR/dCas9 inhibition (CRISPRi) and activation (CRISPRa) screens. (A) Distribution of phenotype scores (Gilbert et al. 2014) for sgRNAs targeting the top 500 essential genes and for the control sgRNAs in CRISPRi experiments. The dashed line represents the threshold chosen to determine efficient and inefficient sgRNAs. (B) A bar chart showing the effect of spacer length on sgRNA efficiency. (C) Logos showing the sequence preference of spacers. The height of each nucleotide represents the log odds ratio of nucleotide frequency between efficient and inefficient sgRNAs. The nucleotide at the 5′ end of the spacers is fixed as guanine in the library design and is excluded from the logos. (D) Bar charts comparing the performance of the CRISPRi model and the CRISPR/Cas9 KO model in predicting sgRNA efficiency in CRISPRi negative selection, CRISPRi positive selection upon CTx-DTA treatment, and CRISPRa negative selection in Gilbert data and Konermann data. The spacer length is 20 nt. Cross-validation was used to assess the performance of the CRISPRi model in the CRISPRi negative selection experiment. The error bars represent the standard deviations over 10 iterations of threefold cross-validation. The P-value was computed using a paired t-test.
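Figure 6D compares two models across 10 iterations of threefold cross-validation with a paired t-test. A minimal Python sketch of the paired t statistic is below, assuming each iteration yields one performance value per model on the same data split. The function name is hypothetical; `scipy.stats.ttest_rel` should give the same statistic together with a p-value.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched measurements.

    a[k] and b[k] are the performance of two models on the same
    cross-validation iteration k, so the test is applied to their differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

Pairing by iteration removes the variation shared by both models on a given split, which is why a paired test suits repeated cross-validation better than an unpaired one.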
Similar articles
- CRISPR multitargeter: a web tool to find common and unique CRISPR single guide RNA targets in a set of similar sequences. Prykhozhij SV, Rajan V, Gaston D, Berman JN. PLoS One. 2015 Mar 5;10(3):e0119372. doi: 10.1371/journal.pone.0119372. eCollection 2015. PMID: 25742428. Free PMC article.
- A novel sgRNA selection system for CRISPR-Cas9 in mammalian cells. Zhang H, Zhang X, Fan C, Xie Q, Xu C, Zhao Q, Liu Y, Wu X, Zhang H. Biochem Biophys Res Commun. 2016 Mar 18;471(4):528-32. doi: 10.1016/j.bbrc.2016.02.041. Epub 2016 Feb 12. PMID: 26879140.
- Efficient CRISPR/Cas9-mediated biallelic gene disruption and site-specific knockin after rapid selection of highly active sgRNAs in pigs. Wang X, Zhou J, Cao C, Huang J, Hai T, Wang Y, Zheng Q, Zhang H, Qin G, Miao X, Wang H, Cao S, Zhou Q, Zhao J. Sci Rep. 2015 Aug 21;5:13348. doi: 10.1038/srep13348. PMID: 26293209. Free PMC article.
- Review of CRISPR/Cas9 sgRNA Design Tools. Cui Y, Xu J, Cheng M, Liao X, Peng S. Interdiscip Sci. 2018 Jun;10(2):455-465. doi: 10.1007/s12539-018-0298-z. Epub 2018 Apr 11. PMID: 29644494. Review.
- Guide RNA engineering for versatile Cas9 functionality. Nowak CM, Lawson S, Zerez M, Bleris L. Nucleic Acids Res. 2016 Nov 16;44(20):9555-9564. doi: 10.1093/nar/gkw908. Epub 2016 Oct 12. PMID: 27733506. Free PMC article. Review.
Cited by
- Tissue-specific knockout in the Drosophila neuromuscular system reveals ESCRT's role in formation of synapse-derived extracellular vesicles. Chen X, Perry S, Fan Z, Wang B, Loxterkamp E, Wang S, Hu J, Dickman D, Han C. PLoS Genet. 2024 Oct 10;20(10):e1011438. doi: 10.1371/journal.pgen.1011438. eCollection 2024 Oct. PMID: 39388480. Free PMC article.
- Making gene editing accessible in resource limited environments: recommendations to guide a first-time user. Goolab S, Scholefield J. Front Genome Ed. 2024 Sep 25;6:1464531. doi: 10.3389/fgeed.2024.1464531. eCollection 2024. PMID: 39386178. Free PMC article. Review.
- Learning to quantify uncertainty in off-target activity for CRISPR guide RNAs. Özden F, Minary P. Nucleic Acids Res. 2024 Oct 14;52(18):e87. doi: 10.1093/nar/gkae759. PMID: 39275984. Free PMC article.
- Codon usage and expression-based features significantly improve prediction of CRISPR efficiency. Bergman S, Tuller T. NPJ Syst Biol Appl. 2024 Sep 3;10(1):100. doi: 10.1038/s41540-024-00431-8. PMID: 39227603. Free PMC article.
- A structural peculiarity of Antarctic fish IgM drives the generation of an engineered mAb by CRISPR/Cas9. Ametrano A, Miranda B, Moretta R, Dardano P, De Stefano L, Oreste U, Coscia MR. Front Bioeng Biotechnol. 2024 Jul 25;12:1315633. doi: 10.3389/fbioe.2024.1315633. eCollection 2024. PMID: 39119272. Free PMC article.
Grants and funding
- U01 CA180980/CA/NCI NIH HHS/United States
- P50 CA090381/CA/NCI NIH HHS/United States
- R01 GM113242/GM/NIGMS NIH HHS/United States
- R01 MH110049/MH/NIMH NIH HHS/United States
- R01 HG008728/HG/NHGRI NIH HHS/United States
- R01 GM113242-01/GM/NIGMS NIH HHS/United States
- DP1 MH100706/MH/NIMH NIH HHS/United States