Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases

Journal Article

Sangsu Bae, Jeongbin Park, Jin-Soo Kim

¹National Creative Research Initiatives Center for Genome Engineering, ²Department of Chemistry and ³Department of Physics and Astronomy, Seoul National University, 599 Gwanak-ro, Seoul 151-742, South Korea

*To whom correspondence should be addressed.

Associate Editor: John Hancock

†The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

Received: 13 November 2013; Revision received: 27 December 2013; Accepted: 21 January 2014; Published: 24 January 2014

Sangsu Bae, Jeongbin Park, Jin-Soo Kim, Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases, Bioinformatics, Volume 30, Issue 10, May 2014, Pages 1473–1475, https://doi.org/10.1093/bioinformatics/btu048

Abstract

Summary: The Type II clustered regularly interspaced short palindromic repeats (CRISPR)/Cas system is an adaptive immune response in prokaryotes, protecting host cells against invading phages or plasmids by cleaving these foreign DNA species in a targeted manner. CRISPR/Cas-derived RNA-guided engineered nucleases (RGENs) enable genome editing in cultured cells, animals and plants, but are limited by off-target mutations. Here, we present a novel algorithm termed Cas-OFFinder that searches for potential off-target sites in a given genome or user-defined sequences. Unlike other algorithms currently available for identification of RGEN off-target sites, Cas-OFFinder is not limited by the number of mismatches and allows variations in protospacer-adjacent motif sequences recognized by Cas9, the essential protein component in RGENs. Cas-OFFinder is available as a command-line program or accessible via our website.

Availability and implementation: Cas-OFFinder is freely available at http://www.rgenome.net/cas-offinder.

Contact: baesau@snu.ac.kr or jskim01@snu.ac.kr

1 INTRODUCTION

Genome editing with engineered nucleases is broadly useful for biomedical research, biotechnology and medicine. Engineered nucleases cleave chromosomal DNA in a targeted manner, and the repair of the resulting double-strand breaks by endogenous systems gives rise to targeted genome modifications in cultured cells, animals and plants. We and others have developed three different types of engineered nucleases: zinc finger nucleases (ZFNs) (Bibikova et al., 2003; Kim et al., 2009), transcription activator-like effector nucleases (TALENs) (Kim et al., 2013; Miller et al., 2011) and RNA-guided engineered nucleases (RGENs) (Cho et al., 2013; Cong et al., 2013; Jinek et al., 2013; Mali et al., 2013) derived from the Type II clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system, an adaptive immune response in bacteria and archaea.

Unlike ZFNs and TALENs whose DNA specificities are determined by DNA-binding proteins, RGENs use complementary base pairing to recognize target sites. RGENs consist of (i) dual RNA components comprising sequence-invariant tracrRNA and sequence-variable guide RNA termed crRNA [or single-chain guide RNA (sgRNA) constructed by linking essential portions of tracrRNA and crRNA (Jinek et al., 2012)] and (ii) a fixed protein component, Cas9, that recognizes the protospacer-adjacent motif (PAM) downstream of target DNA sequences corresponding to guide RNA. Custom-designed RGENs are produced simply by replacing guide RNAs, making this system easy to access.

Unfortunately, RGENs cleave not only on-target sites but also off-target sites that differ by up to several nucleotides from the on-target sites (Cho et al., 2014; Fu et al., 2013; Hsu et al., 2013), causing unwanted off-target mutations and chromosomal rearrangements. These undesired off-target effects raise significant concerns for using RGENs as genome editing tools in diverse applications. To address this issue, researchers must be able to search for potential off-target sites in the genome. Sequence alignment tools such as TagScan (Cradick et al., 2011; Iseli et al., 2007), Bowtie (Langmead et al., 2009) or GPGPU-enabled CUSHAW (Liu et al., 2012) can be used to find potential off-target sites, but are limited by the number of mismatched bases allowed and a requirement for a fixed PAM sequence.

Here we introduce a fast and highly versatile off-target searching tool, Cas-OFFinder. Importantly, Cas-OFFinder is written in OpenCL, an open standard language for parallel programming in heterogeneous environments, enabling operation in diverse platforms such as central processing units (CPUs), graphics processing units (GPUs) and digital signal processors (DSPs).

2 METHODS

2.1 Concept of Cas-OFFinder

Versions of Cas9 derived from three different species have been exploited to edit genes in human cells. These Cas9 proteins recognize different PAM sequences. Cas9 derived from Streptococcus pyogenes (SpCas9) recognizes 5′-NGG-3′ PAM sequences and, to a lesser extent, 5′-NAG-3′. Cas9 from Streptococcus thermophilus (StCas9) (Cong et al., 2013) and Cas9 from Neisseria meningitidis (NmCas9) (Hou et al., 2013) recognize 5′-NNAGAAW-3′ (W = A or T) and 5′-NNNNGMTT-3′ (M = A or C), respectively. This degeneracy in PAM recognition by Cas9 must be accounted for when searching for potential off-target sites. In the case of SpCas9, Cas-OFFinder first compiles all 23-bp DNA sequences composed of a 20-bp sequence corresponding to the sgRNA sequence of interest and a 5′-NRG-3′ (R = A or G) PAM sequence (Fig. 1A). Cas-OFFinder then compares each compiled sequence with the query sequence and counts the number of mismatched bases in the 20-bp sgRNA sequence.
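The two steps described above (select sites whose PAM fits a degenerate pattern, then count mismatches in the guide-matching portion) can be illustrated with a minimal Python sketch. This is not the Cas-OFFinder OpenCL code: the function names `matches_pattern`, `count_mismatches` and `scan` are hypothetical, only the forward strand is scanned, and accepting sites with at most `max_mismatches` mismatches is an assumed convention.

```python
# IUPAC nucleotide codes: each symbol maps to the set of bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "W": "AT", "S": "CG", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_pattern(site, pattern):
    """Return True if every base of `site` is allowed by the IUPAC `pattern`."""
    return len(site) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(site, pattern)
    )


def count_mismatches(site, guide):
    """Count positions where `site` differs from `guide` (equal lengths assumed)."""
    return sum(1 for s, g in zip(site, guide) if s != g)


def scan(genome, guide, pam, max_mismatches):
    """Yield (position, site, mismatches) for forward-strand candidate sites."""
    width = len(guide) + len(pam)
    for pos in range(len(genome) - width + 1):
        window = genome[pos:pos + width]
        protospacer, pam_site = window[:len(guide)], window[len(guide):]
        if not matches_pattern(pam_site, pam):
            continue  # PAM does not fit the degenerate pattern
        mismatches = count_mismatches(protospacer, guide)
        if mismatches <= max_mismatches:
            yield pos, window, mismatches
```

With this sketch, an SpCas9 search would pass `"NRG"` as `pam`, while the NmCas9 motif would be passed as `"NNNNGMTT"`; the same matching routine handles both because the degeneracy lives in the `IUPAC` table rather than in the search loop.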

Fig. 1. (A) The scheme of Cas-OFFinder. (B) The workflow of Cas-OFFinder. (C) Running time per target site as a function of the number of input target sites via CPU (black squares) and GPU (red circles)

2.2 Workflow of Cas-OFFinder

Cas-OFFinder is composed of two different OpenCL kernels (a searching kernel and a comparing kernel) and C++ (wrapper) parts (Fig. 1B). First, Cas-OFFinder reads genome sequence data files in single or multi-sequence FASTA formats. To read and parse FASTA files, an open-source FASTA/FASTQ parser library is used. Although OpenCL supports various processors, the memory of the devices is not always large enough for big data analysis. To overcome the memory limitation of OpenCL devices, wrapper1 divides the genome data into units of the largest possible size allowed by the device memory. These divided chunks are then loaded into the searching kernel that compiles all sites that include a PAM sequence in the entire genome. To search for and select these specific sites rapidly and effectively, the searching kernel runs independently on every calculation unit of a processor, i.e. all searching processes on the calculation units are accomplished simultaneously. After this step, wrapper2 collects the information about the specific sites containing PAM sequences and delivers these sequences to the comparing kernel, which counts the number of mismatched bases. Similar to the searching kernel, all comparing processes on the calculation units are accomplished simultaneously. Finally, wrapper3 selects potential off-target sites that have fewer mismatched bases than a given threshold, and writes the results into an output file with the following information: chromosome number, position, direction, number of mismatched bases and potential off-target DNA sequences with mismatched bases noted in lowercase letters. These processes are repeated until all the divided chunks are loaded.
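The workflow above can be mimicked serially in Python to make the division of labour concrete. The sketch below is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, plain loops stand in for the parallel OpenCL kernels, the chunk size stands in for the device memory limit, chunks are assumed to overlap by one base less than the site width so that sites spanning a boundary are not lost, and the reported position is assumed to be the forward-strand offset of the site.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "W": "AT", "S": "CG", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYMKWSBDHVN", "TGCAYRKMWSVHDBN")


def revcomp(seq):
    """Reverse complement of an upper-case (possibly degenerate) sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def chunks(sequence, chunk_size, overlap):
    """Yield (start, piece); pieces overlap so boundary-spanning sites survive."""
    step = chunk_size - overlap  # requires chunk_size > overlap
    for start in range(0, len(sequence) - overlap, step):
        yield start, sequence[start:start + chunk_size]


def search_stage(chunk, guide_len, pam):
    """Stage 1 (searching): list (offset, strand) of windows with a valid PAM."""
    width, rc_pam, found = guide_len + len(pam), revcomp(pam), []
    for i in range(len(chunk) - width + 1):
        window = chunk[i:i + width]
        if all(b in IUPAC[c] for b, c in zip(window[guide_len:], pam)):
            found.append((i, "+"))
        if all(b in IUPAC[c] for b, c in zip(window[:len(pam)], rc_pam)):
            found.append((i, "-"))
    return found


def compare_stage(chunk, candidates, guide, pam_len):
    """Stage 2 (comparing): count mismatches and write them in lowercase."""
    width = len(guide) + pam_len
    for i, strand in candidates:
        window = chunk[i:i + width]
        site = window if strand == "+" else revcomp(window)
        proto, pam_site = site[:len(guide)], site[len(guide):]
        mismatches = sum(1 for s, g in zip(proto, guide) if s != g)
        marked = "".join(s if s == g else s.lower() for s, g in zip(proto, guide))
        yield i, strand, mismatches, marked + pam_site


def find_off_targets(chromosomes, guide, pam, max_mismatches, chunk_size=1000):
    """Run both stages chunk by chunk and keep sites within the threshold."""
    width = len(guide) + len(pam)
    results = []
    for name, sequence in chromosomes.items():
        for start, chunk in chunks(sequence.upper(), chunk_size, width - 1):
            candidates = search_stage(chunk, len(guide), pam)
            for i, strand, mm, site in compare_stage(chunk, candidates, guide, len(pam)):
                if mm <= max_mismatches:
                    results.append((name, start + i, strand, mm, site))
    return results
```

Each result tuple carries the fields listed above (sequence name, position, direction, number of mismatches and the site with mismatches in lowercase). Because the PAM search does not depend on the guide, `search_stage` could be run once per chunk and its candidates reused for many guides, which is the property the two-kernel design exploits.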

3 RESULTS AND DISCUSSION

To evaluate the performance of Cas-OFFinder, we first chose arbitrary SpCas9 target sites in the human genome and ran Cas-OFFinder with these query sequences on a CPU (Intel i7 3770K) or a GPU (AMD Radeon HD 7870). Notably, the running time per target site decreased as the number of target sites increased (Fig. 1C). This result is expected because the searching kernel runs only once, however many input targets are supplied. When 1000 target sites were analyzed, Cas-OFFinder was 20× faster on the GPU (3.0 s) than on the CPU (60.0 s). We also used Cas-OFFinder to search human and other genomes for potential off-target sites of NmCas9, which recognizes a 5′-NNNNGMTT-3′ (where M is A or C) PAM sequence in addition to a 24-bp target DNA sequence specified by the guide RNA (Table 1). Note that Cas-OFFinder allows mixed bases to account for the degeneracy in PAM sequences.
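The trend in Fig. 1C follows from simple amortization: if the PAM search is a fixed cost paid once and the comparison cost grows with the number of targets, the average time per target falls toward the per-target comparison cost. The sketch below expresses this reasoning in Python. The function `time_per_target` and the timing values are hypothetical, chosen only to show the shape of the curve; they are not measurements from this work.

```python
def time_per_target(n_targets, search_time, compare_time_per_target):
    """Average time per target when one search pass serves all targets."""
    if n_targets < 1:
        raise ValueError("n_targets must be at least 1")
    # The fixed search cost is shared; the comparison cost is paid per target.
    return search_time / n_targets + compare_time_per_target


# Hypothetical timings in seconds, for illustration only.
for n in (1, 10, 100, 1000):
    average = time_per_target(n, search_time=50.0, compare_time_per_target=2.0)
    print(n, round(average, 2))
```

Under this assumed model the benefit of batching saturates once the shared search cost per target becomes small relative to the comparison cost.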

Table 1. Running time of Cas-OFFinder via GPU to search for NmCas9 potential off-target sites

| Data set (size) | Number of mismatches | Time for 100 targets (s) |
| --- | --- | --- |
| H. sapiens genome (3.01 Gb) | 1 | 76.4 ± 2.0 |
| H. sapiens genome (3.01 Gb) | 5 | 79.9 ± 1.6 |
| H. sapiens genome (3.01 Gb) | 10 | 114.4 ± 3.0 |
| M. musculus genome (2.65 Gb) | 5 | 62.6 ± 2.4 |
| D. rerio genome (1.32 Gb) | 5 | 37.7 ± 3.5 |
| A. thaliana genome (116 Mb) | 5 | 4.8 ± 0.8 |

In conclusion, Cas-OFFinder enables searching for potential off-target sites in any sequenced genome rapidly without limiting the PAM sequence or the number of mismatched bases. These features make Cas-OFFinder applicable to ZFNs, TALENs and transcription factors that are prone to off-target DNA recognition.

Funding: National Research Foundation of Korea (2013000718 to J.-S.K.) and the Plant Molecular Breeding Center of Next-Generation BioGreen 21 Program (PJ009081), the National Research Foundation of Korea (2013065262), TJ Park Science Fellowship (to S.B.).

Conflict of Interest: none declared.

REFERENCES

Bibikova et al. (2003) Enhancing gene targeting with designed zinc finger nucleases. Science, 300, 764.

Cho et al. (2013) Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat. Biotechnol., 31, 230–232.

Cho et al. (2014) Analysis of off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases. Genome Res., 24, 132–141.

Cong et al. (2013) Multiplex genome engineering using CRISPR/Cas systems. Science, 339, 819–823.

Cradick et al. (2011) ZFN-site searches genomes for zinc finger nuclease target sites and off-target sites. BMC Bioinformatics, 12, 152.

Fu et al. (2013) High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat. Biotechnol., 31, 822–826.

Hou et al. (2013) Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis. Proc. Natl Acad. Sci. USA, 110, 15644–15649.

Hsu et al. (2013) DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol., 31, 827–832.

Iseli et al. (2007) Indexing strategies for rapid searches of short words in genome sequences. PLoS One, 2, e579.

Jinek et al. (2012) A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science, 337, 816–821.

Jinek et al. (2013) RNA-programmed genome editing in human cells. eLife, 2, e00471.

Kim et al. (2009) Targeted genome editing in human cells with zinc finger nucleases constructed via modular assembly. Genome Res., 19, 1279–1288.

Kim et al. (2013) A library of TAL effector nucleases spanning the human genome. Nat. Biotechnol., 31, 251–258.

Langmead et al. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol., 10, R25.

Liu et al. (2012) CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform. Bioinformatics, 28, 1830–1837.

Mali et al. (2013) RNA-guided human genome engineering via Cas9. Science, 339, 823–826.

Miller et al. (2011) A TALE nuclease architecture for efficient genome editing. Nat. Biotechnol., 29, 143–148.

© The Author 2014. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
