Broad specificity profiling of TALENs results in engineered nucleases with improved DNA-cleavage specificity

John P Guilinger et al. Nat Methods. 2014 Apr.

Abstract

Although transcription activator-like effector nucleases (TALENs) can be designed to cleave chosen DNA sequences, TALENs have activity against related off-target sequences. To better understand TALEN specificity, we profiled 30 unique TALENs with different target sites, array length and domain sequences for their abilities to cleave any of 10¹² potential off-target DNA sequences using in vitro selection and high-throughput sequencing. Computational analysis of the selection results predicted 76 off-target substrates in the human genome, 16 of which were accessible and modified by TALENs in human cells. The results suggest that (i) TALE repeats bind DNA relatively independently; (ii) longer TALENs are more tolerant of mismatches yet are more specific in a genomic context; and (iii) excessive DNA-binding energy can lead to reduced TALEN specificity in cells. Based on these findings, we engineered a TALEN variant that exhibits equal on-target cleavage activity but tenfold lower average off-target activity in human cells.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

D.R.L. and J.K.J. have filed a provisional patent related to this work and are consultants for Editas Medicine, a company that applies genome engineering technologies. J.K.J. has financial interests in Editas Medicine and Transposagen Biopharmaceuticals. J.K.J.’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.

Figures

Figure 1

Figure 1. TALEN architecture and selection scheme

(A) Architecture of a TALEN. A TALEN monomer contains an N-terminal domain (blue) followed by an array of TALE repeats (brown), a C-terminal domain (green) and a _Fok_I nuclease cleavage domain (purple). The 12th and 13th amino acids of each TALE repeat (the repeat-variable diresidue, RVD; red) recognize a specific DNA base pair. Two different TALENs bind their corresponding half-sites, allowing _Fok_I dimerization and DNA cleavage. The C-terminal domain variants used in this study are shown in green. (B) A single-stranded library of DNA oligonucleotides containing a partially randomized left half-site (L), spacer (S), right half-site (R) and constant region (thick black line) was circularized, then concatemerized by rolling-circle amplification. The concatemerized double-stranded DNA (double arrows) contained repeated target sites, with L′ S′ R′ representing the reverse complement of L S R. (C) The concatemerized DNA libraries of mutant target sites were incubated with an _in vitro_-translated TALEN of interest. Cleaved library members were blunted and ligated to adapter #1. The ligation products were amplified by PCR using one primer corresponding to adapter #1 and a second primer consisting of adapter #2 fused to a constant-region sequence, which anneals to the constant regions. From the resulting ladder of amplicons, each containing a half-site plus an integral number (n) of target-site repeats (represented by brackets), amplicons 1.5 target sites in length were isolated by gel purification and subjected to high-throughput DNA sequencing and computational analysis (Supplementary Algorithms).
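The sequence bookkeeping in panel B (a monomer of L, S, R and a constant region, repeated by rolling-circle amplification, with the complementary strand carrying L′ S′ R′) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and the placeholder sequences are hypothetical, and rolling-circle amplification is modeled only as ideal string repetition, not as an enzymatic process.

```python
from typing import Tuple

# Watson-Crick complement table; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def library_monomer(left: str, spacer: str, right: str, constant: str) -> str:
    """One hypothetical library member written 5'->3': left half-site (L),
    spacer (S), right half-site (R) and the constant region."""
    return left + spacer + right + constant


def concatemer(monomer: str, copies: int) -> Tuple[str, str]:
    """Idealized rolling-circle product: `copies` tandem repeats of the
    circularized monomer. Returns (top strand, bottom strand), both 5'->3';
    the bottom strand carries the reverse-complement sites (L' S' R')."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    top = monomer * copies
    return top, reverse_complement(top)


if __name__ == "__main__":
    # Placeholder sequences for illustration only, not the study's targets.
    unit = library_monomer(left="TTACGA", spacer="CCGTAA",
                           right="GATCCA", constant="AGGCTT")
    top, bottom = concatemer(unit, copies=3)
    print(top)
    print(bottom)
```

Because every repeat in the concatemer is identical, any stretch longer than one full repeat contains the complete L S R sequence of that library member, which is consistent with sequencing amplicons of 1.5 target sites.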

Figure 2

Figure 2. In vitro selection results

The fractions of sequences surviving selection (green) and present before selection (black) are shown for CCR5A TALENs (A) and ATM TALENs (B) with EL/KK _Fok_I domains as a function of the number of mutations in both half-sites (left and right half-sites combined, excluding the spacer). (C) Specificity scores for the CCR5A TALENs at all positions in the target half-sites plus a single flanking position. Colors range from dark blue (maximum specificity score, 1.0) through white (no specificity, score of 0) to dark red (maximum negative score, −1.0); see the main text for details. Boxed bases represent the intended target base. For the right half-site (R18 TALENs), the sense strand is shown. (D) Same as (C) for the ATM TALENs. For (A), (B), (C) and (D), sample statistics (sample sizes, means, standard deviations and P-values) are given in Supplementary Tables S2 and S3. (E) Enrichment values from the selection of the L13+R13 CCR5B TALENs for 16 mutant DNA sequences (mutations in red) relative to the on-target DNA (OnB). (F) Relationship between discrete in vitro TALEN cleavage efficiency (cleaved DNA as a fraction of total DNA) for the sequences listed in (E), normalized to on-target cleavage (= 1), and their enrichment values in the selection, normalized to the on-target enrichment value (= 1). The Pearson correlation coefficient (r) between normalized cleavage efficiency and normalized enrichment value is 0.90. (G) Discrete assays of the on-target and off-target sequences used in (F), analyzed by PAGE.
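The quantities in Figure 2 (mutation counts over both half-sites with the spacer excluded, enrichment relative to on-target, and the Pearson correlation between normalized enrichment and normalized cleavage) can be sketched as below. This is a hypothetical illustration, not the authors' Supplementary Algorithms: it assumes reads are already aligned to the intended site without indels, and it assumes enrichment is a sequence's frequency after selection divided by its frequency before selection, since the caption does not give the formula. All function names are illustrative.

```python
import math
from typing import Dict, Sequence


def count_mutations(site: str, intended: str, left_len: int, right_len: int) -> int:
    """Mismatches in the left and right half-sites combined, ignoring the
    spacer. Assumes `site` is aligned to `intended` with no indels."""
    if len(site) != len(intended):
        raise ValueError("site and intended must be the same length")
    if left_len + right_len > len(intended):
        raise ValueError("half-sites longer than the target site")
    n = len(intended)
    half_site_positions = list(range(left_len)) + list(range(n - right_len, n))
    return sum(site[i] != intended[i] for i in half_site_positions)


def enrichment_values(post: Dict[str, int], pre: Dict[str, int]) -> Dict[str, float]:
    """Assumed definition: frequency after selection divided by frequency
    before selection. Only sequences observed before selection are scored."""
    post_total = sum(post.values())
    pre_total = sum(pre.values())
    return {
        seq: (post.get(seq, 0) / post_total) / (count / pre_total)
        for seq, count in pre.items()
        if count > 0
    }


def normalize_to_on_target(values: Dict[str, float], on_target: str) -> Dict[str, float]:
    """Rescale so the on-target value equals 1, as in Figure 2F."""
    ref = values[on_target]
    return {k: v / ref for k, v in values.items()}


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with >= 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For panel F, one would call `normalize_to_on_target` on both the enrichment values and the discrete cleavage efficiencies, then pass the two normalized lists (in the same sequence order) to `pearson_r`.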

Figure 3

Figure 3. Cellular modification induced by TALENs at on-target and predicted off-target genomic sites

(A) For cells treated with no TALEN or with CCR5A TALENs containing the heterodimeric EL/KK, heterodimeric ELD/KKR or homodimeric (Homo) _Fok_I cleavage domain variants, cellular modification rates are shown as the percentage of sequences with insertions or deletions (indels) consistent with TALEN cleavage, out of the total number of sequences, for the on-target (On) and predicted off-target (Off) sites. See the main text for details. ND, no data: cellular modification of off-target sites OffC-38, OffC-49, OffC-69 and OffC-76 was not assayed for CCR5A TALENs containing the EL/KK and ELD/KKR _Fok_I domains. (B) Same as (A) for ATM TALENs. For (A) and (B), sample sizes and P-values are given in Supplementary Tables S7 and S9.
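The modification rate plotted in Figure 3 is a simple proportion: sequences with TALEN-consistent indels divided by total sequences, times 100, with unassayed sites reported as ND. A minimal sketch follows; the function names are hypothetical, and deciding which indels are "consistent with TALEN cleavage" is assumed to have happened upstream.

```python
from typing import Optional


def modification_rate(indel_reads: Optional[int], total_reads: Optional[int]) -> Optional[float]:
    """Percentage of sequences carrying TALEN-consistent indels.
    Returns None when the site was not assayed (reported as ND)."""
    if indel_reads is None or total_reads is None:
        return None
    if total_reads <= 0 or not 0 <= indel_reads <= total_reads:
        raise ValueError("need 0 <= indel_reads <= total_reads and total_reads > 0")
    return 100.0 * indel_reads / total_reads


def format_rate(rate: Optional[float]) -> str:
    """Render a rate for a table, using 'ND' for sites without data."""
    return "ND" if rate is None else f"{rate:.3f}%"
```

Keeping "not assayed" as `None` rather than 0 preserves the distinction the caption draws between a site with no detected modification and a site with no data.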

Figure 4

Figure 4. In vitro specificity as a function of TALEN length

Enrichment values of on-target (zero-mutation) sequences and of off-target sequences containing one to six mutations are shown for CCR5B TALENs of varying TALE repeat array lengths with EL/KK _Fok_I domains. The TALEN pairs targeted DNA sites of 32 bp (L16+R16), 29 bp (L16+R13 or L13+R16), 26 bp (L16+R10, L13+R13 or L10+R16), 23 bp (L13+R10 or L10+R13) or 20 bp (L10+R10) in length.
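Each length in the Figure 4 caption equals the sum of the left and right repeat counts, consistent with each TALE repeat recognizing one base pair (Figure 1A) and the spacer not being counted. The sketch below groups the nine CCR5B pair labels by that sum; the `L<n>+R<m>` parser and function names are hypothetical conveniences.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical label format "L<left repeats>+R<right repeats>", as in the caption.
_PAIR = re.compile(r"^L(\d+)\+R(\d+)$")


def combined_half_site_length(pair: str) -> int:
    """Base pairs bound by a TALEN pair: one base pair per TALE repeat,
    summed over the left and right arrays (spacer not included)."""
    match = _PAIR.match(pair)
    if match is None:
        raise ValueError(f"unrecognized TALEN pair label: {pair!r}")
    return int(match.group(1)) + int(match.group(2))


def group_by_length(pairs: Iterable[str]) -> Dict[int, List[str]]:
    """Group TALEN pair labels by combined half-site length."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for pair in pairs:
        groups[combined_half_site_length(pair)].append(pair)
    return dict(groups)


CCR5B_PAIRS = ["L16+R16", "L16+R13", "L13+R16", "L16+R10", "L13+R13",
               "L10+R16", "L13+R10", "L10+R13", "L10+R10"]

if __name__ == "__main__":
    for length, labels in sorted(group_by_length(CCR5B_PAIRS).items(), reverse=True):
        print(f"{length} bp: {', '.join(labels)}")
```

Running it prints the same five length classes listed in the caption (32, 29, 26, 23 and 20 bp) with the same pair labels in each.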

Figure 5

Figure 5. In vitro specificity and discrete cleavage efficiencies of TALENs containing canonical or engineered C-terminal domains

(A and B) On-target enrichment values for selections of (A) CCR5A TALENs containing canonical, Q3, Q7 or 28-aa C-terminal domains with EL/KK _Fok_I cleavage domains, or (B) ATM TALENs containing canonical, Q3 or Q7 C-terminal domains with EL/KK _Fok_I cleavage domains. (C) CCR5A on-target sequence (OnC) and double-mutant sequences, with mutations in red. For CCR5A, sequences containing two mutations were assayed because one-mutation and zero-mutation sequences were similarly enriched (Supplementary Table S4A). (D) ATM on-target sequence (OnA), single-mutant sequences and double-mutant sequences, with mutations in red. (E) Discrete in vitro cleavage efficiency of the DNA sequences listed in (C) by CCR5A TALENs containing either the canonical or the engineered Q7 C-terminal domain with EL/KK _Fok_I domains. Error bars reflect s.d. from three biological replicates (two replicates for C4). Pairwise P-values were calculated between the cleavage efficiency of the on-target sequence (OnC) and that of each mutant sequence (C1, C2, …, C8), separately for CCR5A TALENs containing the canonical C-terminal domain and for those containing the Q7 C-terminal domain. With the canonical C-terminal domain, cleavage of mutant sequences C1, C2, C6, C7 and C8 differed significantly (P < 0.025) from cleavage of OnC. With the Q7 C-terminal domain, cleavage of mutant sequences C1, C3, C4, C6, C7 and C8 differed significantly (P < 0.025) from cleavage of OnC. (F) Same as (E) for ATM TALENs. Cleavage of mutant sequences A1, A2, A3, A5, A6, A7 and A8 by ATM TALENs containing the Q7 C-terminal domain differed significantly (P < 0.025) from cleavage of the on-target sequence (OnA) by ATM TALENs containing the canonical C-terminal domain.

Figure 6

Figure 6. Specificity of engineered TALENs in human cells

(A) The cellular modification efficiency of canonical and engineered TALENs, expressed as the percentage of sequences with indels consistent with TALEN-induced modification out of total sequences, is shown for the on-target CCR5A site (OnCCR5A) and for CCR5A off-target site #5 (OffC5), the most highly cleaved off-target substrate tested. Pairwise P-values comparing the number of sequences containing TALEN-consistent insertions or deletions with the total number of sequences were calculated between samples using Fisher's exact test (see Supplementary Table S7). P-values are < 0.005 for all comparisons among canonical, Q3 and Q7 TALENs in the same _Fok_I background, at both the on-target and off-target sites, with the exception of off-target site #5 modified by Q3 vs. Q7 TALENs in the EL/KK _Fok_I background (P < 0.087). On:off-target activity, defined as the ratio of on-target to off-target modification, is shown above each pair of bars. (B) The on:off-target activities of the canonical, Q3 and Q7 TALENs for each detected genomic off-target substrate of the CCR5A TALENs with the ELD/KKR _Fok_I domain. The absolute genomic modification frequency at the on-target site is given in parentheses. (C) Same as (B) for the ATM TALENs and their off-target sites. (D) The on:off-target activities of the canonical, Q3 and Q7 TALENs for each detected genomic off-target substrate of the PMS2, SDHD and HDAC1 TALENs with the ELD/KKR _Fok_I domain. The absolute genomic modification frequency at the on-target site is given in parentheses.
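The two calculations named in the Figure 6 caption, the on:off-target ratio and a Fisher's exact test on indel counts, can be sketched with the standard library alone. This assumes each sample contributes one row of a 2×2 table, (sequences with indels, total sequences minus those with indels); the function names are hypothetical, and the two-sided p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def on_off_ratio(on_rate: float, off_rate: float) -> float:
    """On:off-target activity: on-target modification divided by off-target
    modification. Returns infinity when no off-target modification is seen."""
    if off_rate < 0 or on_rate < 0:
        raise ValueError("rates must be non-negative")
    return float("inf") if off_rate == 0 else on_rate / off_rate


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. rows = two TALEN samples, columns = (indel sequences, other sequences).
    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table."""
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Relative tolerance so tables tied with the observed one are counted.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_observed * (1 + 1e-7))
    return min(1.0, p)
```

The exact enumeration becomes slow for the very large read counts typical of deep sequencing; in practice `scipy.stats.fisher_exact(table, alternative="two-sided")` computes the same test.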
