Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases
Journal Article
Sangsu Bae, Jeongbin Park and Jin-Soo Kim
1National Creative Research Initiatives Center for Genome Engineering, 2Department of Chemistry and 3Department of Physics and Astronomy, Seoul National University, 599 Gwanak-ro, Seoul 151-742, South Korea
*To whom correspondence should be addressed.
Associate Editor: John Hancock
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
Received: 13 November 2013
Revision received: 27 December 2013
Accepted: 21 January 2014
Published: 24 January 2014
Sangsu Bae, Jeongbin Park, Jin-Soo Kim, Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases, Bioinformatics, Volume 30, Issue 10, May 2014, Pages 1473–1475, https://doi.org/10.1093/bioinformatics/btu048
Abstract
Summary: The Type II clustered regularly interspaced short palindromic repeats (CRISPR)/Cas system is an adaptive immune response in prokaryotes, protecting host cells against invading phages or plasmids by cleaving these foreign DNA species in a targeted manner. CRISPR/Cas-derived RNA-guided engineered nucleases (RGENs) enable genome editing in cultured cells, animals and plants, but are limited by off-target mutations. Here, we present a novel algorithm termed Cas-OFFinder that searches for potential off-target sites in a given genome or user-defined sequences. Unlike other algorithms currently available for identification of RGEN off-target sites, Cas-OFFinder is not limited by the number of mismatches and allows variations in protospacer-adjacent motif sequences recognized by Cas9, the essential protein component in RGENs. Cas-OFFinder is available as a command-line program or accessible via our website.
Availability and implementation: Cas-OFFinder is freely accessible at http://www.rgenome.net/cas-offinder.
Contact: baesau@snu.ac.kr or jskim01@snu.ac.kr
1 INTRODUCTION
Genome editing with engineered nucleases is broadly useful for biomedical research, biotechnology and medicine. Engineered nucleases cleave chromosomal DNA in a targeted manner, and the repair of the resulting double-strand breaks by endogenous systems gives rise to targeted genome modifications in cultured cells, animals and plants. We and others have developed three different types of engineered nucleases: zinc finger nucleases (ZFNs) (Bibikova et al., 2003; Kim et al., 2009), transcription activator-like effector nucleases (TALENs) (Kim et al., 2013; Miller et al., 2011) and RNA-guided engineered nucleases (RGENs) (Cho et al., 2013; Cong et al., 2013; Jinek et al., 2013; Mali et al., 2013) derived from the Type II clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system, an adaptive immune response in bacteria and archaea.
Unlike ZFNs and TALENs whose DNA specificities are determined by DNA-binding proteins, RGENs use complementary base pairing to recognize target sites. RGENs consist of (i) dual RNA components comprising sequence-invariant tracrRNA and sequence-variable guide RNA termed crRNA [or single-chain guide RNA (sgRNA) constructed by linking essential portions of tracrRNA and crRNA (Jinek et al., 2012)] and (ii) a fixed protein component, Cas9, that recognizes the protospacer-adjacent motif (PAM) downstream of target DNA sequences corresponding to guide RNA. Custom-designed RGENs are produced simply by replacing guide RNAs, making this system easy to access.
Unfortunately, RGENs cleave not only on-target sites but also off-target sites that differ by up to several nucleotides from the on-target sites (Cho et al., 2014; Fu et al., 2013; Hsu et al., 2013), causing unwanted off-target mutations and chromosomal rearrangements. These undesired off-target effects raise significant concerns for using RGENs as genome editing tools in diverse applications. To address this issue, researchers must be able to search for potential off-target sites in the genome. Sequence alignment tools such as TagScan (Cradick et al., 2011; Iseli et al., 2007), Bowtie (Langmead et al., 2009) or GPGPU-enabled CUSHAW (Liu et al., 2012) can be used to find potential off-target sites, but are limited by the number of mismatched bases allowed and a requirement for a fixed PAM sequence.
Here we introduce a fast and highly versatile off-target searching tool, Cas-OFFinder. Importantly, Cas-OFFinder is written in OpenCL, an open standard language for parallel programming in heterogeneous environments, enabling operation in diverse platforms such as central processing units (CPUs), graphics processing units (GPUs) and digital signal processors (DSPs).
2 METHODS
2.1 Concept of Cas-OFFinder
Versions of Cas9 derived from three different species have been exploited to edit genes in human cells. These Cas9 proteins recognize different PAM sequences. Cas9 derived from Streptococcus pyogenes (SpCas9) recognizes 5′-NGG-3′ PAM sequences and, to a lesser extent, 5′-NAG-3′. Cas9 from Streptococcus thermophilus (StCas9) (Cong et al., 2013) and that from Neisseria meningitidis (NmCas9) (Hou et al., 2013) recognize 5′-NNAGAAW-3′ (W = A or T) and 5′-NNNNGMTT-3′ (M = A or C), respectively. The degeneracy in PAM recognition by Cas9 must be accounted for when searching for potential off-target sites. In the case of SpCas9, Cas-OFFinder first compiles all the 23-bp DNA sequences composed of a 20-bp sequence corresponding to the sgRNA sequence of interest followed by a 5′-NRG-3′ (R = A or G) PAM sequence (Fig. 1A). Cas-OFFinder then compares all the compiled sequences with the query sequence and counts the number of mismatched bases in the 20-bp sgRNA sequence.
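The two steps described above (select sites that carry a PAM, then count mismatches in the guide portion) can be illustrated with a short Python sketch. This is not the Cas-OFFinder implementation, which is written in OpenCL; the function names `matches`, `count_mismatches` and `find_candidates`, the default threshold and the forward-strand-only scan are assumptions made for illustration. The IUPAC table is the standard nucleotide code.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern, seq):
    """True if every base of seq is allowed by the IUPAC code at that position."""
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern, seq)
    )


def count_mismatches(guide, site):
    """Number of positions at which the site differs from the guide."""
    return sum(g != s for g, s in zip(guide, site))


def find_candidates(genome, guide, pam="NRG", max_mismatches=3):
    """Scan the forward strand only (a simplification of this sketch).

    Returns (position, site, mismatches) for every guide-length site that is
    followed by a sequence matching the degenerate PAM and that has at most
    max_mismatches mismatches (hypothetical threshold).
    """
    glen, plen = len(guide), len(pam)
    hits = []
    for i in range(len(genome) - glen - plen + 1):
        # Step 1: keep only sites followed by a matching PAM.
        if not matches(pam, genome[i + glen:i + glen + plen]):
            continue
        # Step 2: count mismatches in the guide portion only.
        site = genome[i:i + glen]
        mm = count_mismatches(guide, site)
        if mm <= max_mismatches:
            hits.append((i, site, mm))
    return hits
```

Because the PAM is passed as an IUPAC pattern, the same sketch accepts `"NNAGAAW"` or `"NNNNGMTT"` without any change to the code.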
Fig. 1. (A) The scheme of Cas-OFFinder. (B) The workflow of Cas-OFFinder. (C) Running time per target site as a function of the number of input target sites via CPU (black squares) and GPU (red circles).
2.2 Workflow of Cas-OFFinder
Cas-OFFinder is composed of two different OpenCL kernels (a searching kernel and a comparing kernel) and C++ (wrapper) parts (Fig. 1B). First, Cas-OFFinder reads genome sequence data files in single or multi-sequence FASTA formats. To read and parse FASTA files, an open-source FASTA/FASTQ parser library is used. Although OpenCL supports various processors, the memory of the devices is not always large enough for big data analysis. To overcome the memory limitation of OpenCL devices, wrapper1 divides the genome data into units of the largest possible size allowed by the device memory. These divided chunks are then loaded into the searching kernel that compiles all sites that include a PAM sequence in the entire genome. To search for and select these specific sites rapidly and effectively, the searching kernel runs independently on every calculation unit of a processor, i.e. all searching processes on the calculation units are accomplished simultaneously. After this step, wrapper2 collects the information about the specific sites containing PAM sequences and delivers these sequences to the comparing kernel, which counts the number of mismatched bases. Similar to the searching kernel, all comparing processes on the calculation units are accomplished simultaneously. Finally, wrapper3 selects potential off-target sites that have fewer mismatched bases than a given threshold, and writes the results into an output file with the following information: chromosome number, position, direction, number of mismatched bases and potential off-target DNA sequences with mismatched bases noted in lowercase letters. These processes are repeated until all the divided chunks are loaded.
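The workflow above can be approximated in plain, sequential Python as follows. This is only a sketch of the control flow, not the OpenCL kernels themselves. The names `chunks`, `search`, `compare` and `off_targets`, the `chunk_size` parameter, the overlap between consecutive chunks, the use of reverse complements to cover both strands and the "at most" threshold are all assumptions introduced here for illustration.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "M": "AC", "K": "GT", "S": "CG", "W": "AT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def chunks(sequence, chunk_size, site_len):
    """Yield (offset, piece) pairs that fit a hypothetical device memory limit.

    Consecutive pieces overlap by site_len - 1 bases so that a site spanning a
    boundary is seen whole, and exactly once (an assumption of this sketch).
    Requires chunk_size >= site_len.
    """
    step = chunk_size - (site_len - 1)
    for offset in range(0, max(len(sequence) - site_len + 1, 1), step):
        yield offset, sequence[offset:offset + chunk_size]


def search(piece, pam, guide_len):
    """'Searching' step: collect every window whose 3' end matches the PAM."""
    site_len = guide_len + len(pam)
    found = []
    for i in range(len(piece) - site_len + 1):
        window = piece[i:i + site_len]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            if all(b in IUPAC[c] for c, b in zip(pam, site[guide_len:])):
                found.append((i, strand, site))
    return found


def compare(site, guide):
    """'Comparing' step: mismatch count and the site with mismatches in lowercase."""
    marked = "".join(s if s == g else s.lower() for g, s in zip(guide, site))
    return sum(ch.islower() for ch in marked), marked + site[len(guide):]


def off_targets(name, sequence, guides, pam="NRG", max_mismatches=3, chunk_size=1000):
    """Return (guide, sequence name, position, direction, mismatches, site) tuples."""
    guide_len = len(guides[0])
    site_len = guide_len + len(pam)
    results = []
    for offset, piece in chunks(sequence.upper(), chunk_size, site_len):
        # The PAM search runs once per chunk and is shared by all guides.
        for start, strand, site in search(piece, pam, guide_len):
            for guide in guides:
                mm, marked = compare(site, guide)
                if mm <= max_mismatches:
                    results.append((guide, name, offset + start, strand, mm, marked))
    return results
```

In this sketch the PAM search is independent of the guide sequences, so its cost is paid once per chunk however many guides are supplied; only the comparing step scales with the number of guides.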
3 RESULTS AND DISCUSSION
To evaluate the performance of Cas-OFFinder, we first chose arbitrary SpCas9 target sites in the human genome and ran Cas-OFFinder with query sequences via CPU (Intel i7 3770K) or GPU (AMD Radeon HD 7870). Notably, running time per target site decreased as the number of target sites increased (Fig. 1C). This result is expected because the searching kernel runs only once regardless of how many input targets are supplied. When 1000 target sites were analyzed, Cas-OFFinder was 20× faster on the GPU (3.0 s) than on the CPU (60.0 s). We also used Cas-OFFinder to search human and other genomes for potential off-target sites of NmCas9, which recognizes 5′-NNNNGMTT-3′ (where M is A or C) PAM sequences in addition to a 24-bp target DNA sequence specific to guide RNA (Table 1). Note that Cas-OFFinder allows mixed bases to account for the degeneracy in PAM sequences.
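The trend in Fig. 1C is consistent with a simple amortized-cost picture, sketched below in Python. The model and its parameter names (`search_cost`, `compare_cost`) are illustrative assumptions, not quantities reported in the paper; only the 60.0 s and 3.0 s figures come from the text.

```python
def time_per_target(n_targets, search_cost, compare_cost):
    """Per-target running time under an assumed two-term cost model.

    search_cost: one-off cost of the PAM search, shared by all targets.
    compare_cost: cost of the mismatch comparison for a single target.
    """
    return search_cost / n_targets + compare_cost


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU to GPU running time."""
    return cpu_seconds / gpu_seconds


# Reported timings for 1000 target sites: 60.0 s (CPU) versus 3.0 s (GPU).
reported_speedup = speedup(60.0, 3.0)
```

Under this model the shared search term shrinks as 1/n, so the per-target time falls toward `compare_cost` as more targets are submitted in one run.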
Table 1.
Running time of Cas-OFFinder via GPU to search for NmCas9 potential off-target sites
Data set (size) | Number of mismatches | Time for 100 targets |
---|---|---|
H. sapiens genome (3.01 Gb) | 1 | 76.4 ± 2.0 s |
H. sapiens genome (3.01 Gb) | 5 | 79.9 ± 1.6 s |
H. sapiens genome (3.01 Gb) | 10 | 114.4 ± 3.0 s |
M. musculus genome (2.65 Gb) | 5 | 62.6 ± 2.4 s |
D. rerio genome (1.32 Gb) | 5 | 37.7 ± 3.5 s |
A. thaliana genome (116 Mb) | 5 | 4.8 ± 0.8 s |
In conclusion, Cas-OFFinder enables searching for potential off-target sites in any sequenced genome rapidly without limiting the PAM sequence or the number of mismatched bases. These features make Cas-OFFinder applicable to ZFNs, TALENs and transcription factors that are prone to off-target DNA recognition.
Funding: National Research Foundation of Korea (2013000718 to J.-S.K.) and the Plant Molecular Breeding Center of Next-Generation BioGreen 21 Program (PJ009081), the National Research Foundation of Korea (2013065262), TJ Park Science Fellowship (to S.B.).
Conflict of Interest: none declared.
REFERENCES
Bibikova et al. Enhancing gene targeting with designed zinc finger nucleases, Science, 2003, vol. 300 (pg. 764)
Cho et al. Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease, Nat. Biotechnol., 2013, vol. 31 (pg. 230-232)
Cho et al. Analysis of off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases, Genome Res., 2014, vol. 24 (pg. 132-141)
Cong et al. Multiplex genome engineering using CRISPR/Cas systems, Science, 2013, vol. 339 (pg. 819-823)
Cradick et al. ZFN-site searches genomes for zinc finger nuclease target sites and off-target sites, BMC Bioinformatics, 2011, vol. 12 (pg. 152)
Fu et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells, Nat. Biotechnol., 2013, vol. 31 (pg. 822-826)
Hou et al. Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis, Proc. Natl Acad. Sci. USA, 2013, vol. 110 (pg. 15644-15649)
Hsu et al. DNA targeting specificity of RNA-guided Cas9 nucleases, Nat. Biotechnol., 2013, vol. 31 (pg. 827-832)
Iseli et al. Indexing strategies for rapid searches of short words in genome sequences, PLoS One, 2007, vol. 2 (pg. e579)
Jinek et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity, Science, 2012, vol. 337 (pg. 816-821)
Jinek et al. RNA-programmed genome editing in human cells, eLife, 2013, vol. 2 (pg. e00471)
Kim et al. Targeted genome editing in human cells with zinc finger nucleases constructed via modular assembly, Genome Res., 2009, vol. 19 (pg. 1279-1288)
Kim et al. A library of TAL effector nucleases spanning the human genome, Nat. Biotechnol., 2013, vol. 31 (pg. 251-258)
Langmead et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biol., 2009, vol. 10 (pg. R25)
Liu et al. CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform, Bioinformatics, 2012, vol. 28 (pg. 1830-1837)
Mali et al. RNA-guided human genome engineering via Cas9, Science, 2013, vol. 339 (pg. 823-826)
Miller et al. A TALE nuclease architecture for efficient genome editing, Nat. Biotechnol., 2011, vol. 29 (pg. 143-148)
© The Author 2014. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com