Comprehensive analysis of the specificity of transcription activator-like effector nucleases

Nucleic Acids Res. 2014 Apr;42(8):5390-402.

doi: 10.1093/nar/gku155. Epub 2014 Feb 24.

Alexandre Juillerat, Gwendoline Dubois, Julien Valton, Séverine Thomas, Stefano Stella, Alan Maréchal, Stéphanie Langevin, Nassima Benomari, Claudia Bertonati, George H Silva, Fayza Daboussi, Jean-Charles Epinat, Guillermo Montoya, Aymeric Duclert, Philippe Duchateau

Alexandre Juillerat et al. Nucleic Acids Res. 2014 Apr.

Abstract

A key issue when designing and using DNA-targeting nucleases is specificity. Ideally, an optimal DNA-targeting tool has only one recognition site within a genomic sequence. In practice, however, almost all designer nucleases available today can accommodate one to several mutations within their target site. The ability to predict the specificity of targeting is thus highly desirable. Here, we describe the first comprehensive experimental study focused on the specificity of the four commonly used repeat variable diresidues (RVDs; NI:A, HD:C, NN:G and NG:T) incorporated in transcription activator-like effector nucleases (TALEN). The analysis of >15 500 unique TALEN/DNA cleavage profiles allowed us to monitor the specificity gradient of the RVDs along a TALEN/DNA binding array and to present a specificity scoring matrix for RVD/nucleotide association. Furthermore, we report that TALEN can only accommodate a relatively small number of position-dependent mismatches while maintaining a detectable activity at endogenous loci in vivo, demonstrating the high specificity of these molecular tools. We thus envision that the results we provide will allow for more deliberate choices of DNA binding arrays and/or DNA targets, extending our engineering capabilities.
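
The idea of a specificity scoring matrix for RVD/nucleotide association can be illustrated with a minimal sketch. Everything numeric here is an assumption: `PLACEHOLDER_MATRIX` holds made-up values (1.0 for the canonical pairing, 0.1 otherwise), not the measured specificities, and multiplying per-position values is one simple way to combine them, not necessarily the scoring scheme used in the study. The sketch also ignores the position dependence that the abstract reports.

```python
# Canonical RVD -> nucleotide code stated in the abstract.
CANONICAL = {"NI": "A", "HD": "C", "NN": "G", "NG": "T"}

# PLACEHOLDER relative activities, NOT published values: 1.0 for the
# canonical pairing and an arbitrary 0.1 for every other nucleotide.
PLACEHOLDER_MATRIX = {
    rvd: {base: (1.0 if base == match else 0.1) for base in "ACGT"}
    for rvd, match in CANONICAL.items()
}


def score_target(rvds, target, matrix=PLACEHOLDER_MATRIX):
    """Combine per-position RVD/nucleotide values into a single score.

    Assumption: positions are treated as independent and their values are
    multiplied. The function name and signature are hypothetical.
    """
    if len(rvds) != len(target):
        raise ValueError("RVD array and target must have the same length")
    score = 1.0
    for rvd, base in zip(rvds, target):
        score *= matrix[rvd][base]
    return score
```

With these placeholder values, a perfectly matched target scores 1.0 and each mismatch scales the score down by a factor of ten; substituting measured values would change the numbers but not the structure of the calculation.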

Figures

Figure 1.

Relative gene disruption activities of TALEN collections at the integrated GFP locus. (A) Schematic representation of the WT GFP TALEN on the chromosomal target. (B) Collections of TALE used at the chromosomal GFP locus, derived by mutation of the right DNA binding domain. X represents any of the four RVDs (NI, HD, NN and NG). Positions are numbered relative to the first thymine of the target (T0). (C) Influence of the number of mismatches on GFP disruption. The activity ratio between the mismatched and the WT TALEN is shown as a boxplot indicating the median (thick bar), quartiles (box) and extreme values (r = −0.68, P = 4e−16). Mismatches are defined relative to the NI:A, HD:C, NN:G and NG:T code. The sample size is indicated in brackets. (D) Boxplot representation, including the median (thick bar), quartiles (box) and extreme values, of the activity ratio between the mismatched and the WT TALEN as a function of the collection, for one mismatch. P = 0.08 (Kruskal–Wallis test). (E) Same as (D) but for two mismatches. P = 4e−4. (F) Same as (D) but for three mismatches. P = 2e−4.
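
Because mismatches in panel (C) are defined relative to the NI:A, HD:C, NN:G and NG:T code, counting them for a given RVD array and target reduces to a position-by-position comparison. The following is a minimal sketch; the function name `count_mismatches` and the list-of-strings representation of an RVD array are assumptions made for illustration.

```python
# Canonical RVD -> nucleotide code used to define a mismatch.
CANONICAL = {"NI": "A", "HD": "C", "NN": "G", "NG": "T"}


def count_mismatches(rvds, target):
    """Count positions where the target base differs from the canonical
    base of the RVD at that position (hypothetical helper)."""
    if len(rvds) != len(target):
        raise ValueError("RVD array and target must have the same length")
    return sum(CANONICAL[rvd] != base for rvd, base in zip(rvds, target))
```

For example, `count_mismatches(["NI", "HD", "NN", "NG"], "ACGA")` reports one mismatch, at the last position, where NG faces A instead of T.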

Figure 2.

Design of the experimental setup and optimal TALEN array length for the nuclease activity screening. (A) Activities of 52 TALEN pairs measured in vivo as a function of the repeat array length. Error bars represent standard errors. (B) Activities of 15 TALEN pairs on targets randomized at positions N and N-1. Error bars represent standard errors. An analysis of variance demonstrates a significant effect of TALEN length on specificity (position N: r = 0.68, P = 4e−6 [Kruskal–Wallis test]; position N-1: r = 0.7, P = 7e−6). The yeast activity assay is based on the single-strand annealing (SSA) pathway used after the creation of a DSB by the TALEN in the target sequence. Target sites were designed to allow TALEN use in the homodimer format. All TALEN pairs showed significant activity. The number of TALEN/target pairs for each class is indicated in brackets.

Figure 3.

Setup of the collections used in the large-scale experiments and graphical representation of activity results from Collection 6. (A) Collections of TALE and targets used for the study in yeast, where X represents any of the four RVDs (NI, HD, NN and NG) and N any of the four bases (A, C, G and T). Collections 6–8 are composed of arrays containing 9.5 repeats and Collection 9 is composed of arrays containing 18.5 repeats. The TALEN collections were used in the homodimer format. (B) Heatmap showing the activity of the 64 TALEN of Collection 6 on the 64 corresponding targets. The outer line of text of the target (abscissa) represents the first nucleotide of the NNN triplet, the middle line represents the second nucleotide and the innermost line represents the third nucleotide. The same applies to the RVD array (XXX) on the ordinate. Red corresponds to maximum activity, whereas blue corresponds to no activity. The diagonal with framed squares represents the NI:A, HD:C, NN:G and NG:T pairings. In the case of a perfect one-to-one RVD/nucleotide association code, activity should be recovered on the diagonal only.
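
The 64 × 64 layout of panel (B) follows from enumerating every XXX RVD triplet and every NNN nucleotide triplet. The sketch below reproduces that enumeration and identifies the diagonal of canonical pairings; the variable names and the ordering of RVDs and bases are assumptions chosen for illustration, not the ordering used in the published heatmap.

```python
from itertools import product

# Assumed ordering: RVDS[i] pairs canonically with BASES[i].
RVDS = ("NI", "HD", "NN", "NG")
BASES = "ACGT"
CANONICAL = dict(zip(RVDS, BASES))

# All XXX RVD triplets and all NNN target triplets (4^3 = 64 each).
arrays = list(product(RVDS, repeat=3))
targets = ["".join(t) for t in product(BASES, repeat=3)]


def canonical_target(array):
    """Return the target triplet an RVD triplet recognizes under the
    NI:A, HD:C, NN:G and NG:T code (hypothetical helper)."""
    return "".join(CANONICAL[rvd] for rvd in array)


# Diagonal cells of the heatmap: each array paired with its canonical target.
diagonal = [(array, canonical_target(array)) for array in arrays]
```

Under a perfect one-to-one code, only these 64 diagonal cells out of the 4096 array/target combinations would show activity.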

Figure 4.

Graphical representation of RVD specificities. (A) Specificity measured for positions 1–7. The level of gray (black = 1, white = 0) represents the relative activity compared with the HD:C, NG:T, NI:A and NN:G RVD/nucleotide pairing code. (B) Average specificity of the four RVDs (HD, NG, NI and NN) over the first 7 positions. (C) Logo representation of the global specificity matrix. The logo was generated using WebLogo (http://weblogo.berkeley.edu/logo.cgi). Values for relative specificities are presented in Supplementary Table S4.

Figure 5.

Effect of target mutations on TALEN activity in mammalian cells. (A) Correlation between experimental relative activities, represented by the percentage of GFP-negative cells in the mammalian gene-targeting assay, and scoring using the matrix presented in Figure 4B (Collections 1, 2 and 3 are represented in red: r = 0.81, P = 4e−16; Collections 4 and 5 are represented in blue: r = 0.84, P = 3e−12). Linear regressions are presented for both subsets and 95% confidence intervals are represented by dashed lines. (B) Pie chart representation of the percentage of TALEN composed of 15.5 RVDs that will have, in the human genome, potential off-site targets containing no, one, two, three or four and more mismatches when considering a test set of 15 000 putative TALEN. All possible combinations of half TALEN (left + right, left + left, right + right) with a spacer length ranging from 9 to 30 bp were taken into account. (C) A collection of 33 targets, each comprising two AvrBs3 target sequences facing each other on both DNA strands with a spacer length (between the two targets) ranging from 5 to 40 bp, was designed and assayed in the CHO-K1 SSA assay to determine the optimal cleavage conditions. Targets containing a spacer of 21 and 35 bp were absent from the study.
