Decoding non-random mutational signatures at Cas9 targeted sites

Nucleic Acids Res. 2018 Sep 19;46(16):8417-8434.

doi: 10.1093/nar/gky653.

Benjamin J M Taylor 2, Roberto Nitsch 1 3, Anders Lundin 1, Anna-Lina Cavallo 1, Katja Madeyski-Bengtson 1, Fredrik Karlsson 4, Maryam Clausen 1, Ryan Hicks 1, Lorenz M Mayr 1 5, Mohammad Bohlooly-Y 1, Marcello Maresca 1

Amir Taheri-Ghahfarokhi et al. Nucleic Acids Res. 2018.

Abstract

The mutation patterns at Cas9 targeted sites contain unique information regarding the nuclease activity and repair mechanisms in mammalian cells. However, analytical frameworks for extracting such information are lacking. Here, we present a novel computational platform called Rational InDel Meta-Analysis (RIMA) that enables an in-depth, comprehensive analysis of Cas9-induced genetic alterations, especially InDel mutations. RIMA can be used to quantitate the contribution of the classical microhomology-mediated end joining (c-MMEJ) pathway to the formation of mutations at Cas9 target sites. We used RIMA to compare mutational signatures at 15 independent Cas9 target sites in human A549 wild-type and A549-POLQ knockout cells to elucidate the role of DNA polymerase θ in c-MMEJ. Moreover, the single nucleotide insertions at the Cas9 target sites represent duplications of the preceding nucleotides, suggesting that the flexibility of the Cas9 nuclease domains results in both blunt- and staggered-end cuts. A thymine at the fourth nucleotide before the protospacer adjacent motif (PAM) results in a two-fold higher occurrence of single nucleotide InDels compared to a guanine at the same position. This study provides a novel approach for the characterization of Cas9 nucleases with improved accuracy in predicting genome editing outcomes, and a potential strategy for homology-independent targeted genomic integration.
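To make the idea of a microhomology-associated deletion concrete, the following Python sketch measures the microhomology flanking a deletion junction. It is an illustration only and not RIMA's implementation: the function names, the coordinate convention (0-based start plus length on the reference), the example sequence and the 2-bp cutoff for calling a deletion c-MMEJ-like are all assumptions made for this example.

```python
def microhomology_length(ref: str, start: int, length: int) -> int:
    """Count bases shared across the junction of a deletion of ref[start:start+length]."""
    end = start + length
    # Bases at the start of the deleted segment that repeat just after it.
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    # Bases just before the deletion that repeat at the end of the deleted segment.
    left = 0
    while start - left - 1 >= 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    return left + right


def is_mmej_like(ref: str, start: int, length: int, min_mh: int = 2) -> bool:
    # min_mh = 2 is an assumed cutoff for illustration, not a value from the study.
    return microhomology_length(ref, start, length) >= min_mh


ref = "TTACGTGGGACGTCC"  # hypothetical reference containing two ACGT copies
print(microhomology_length(ref, 2, 7))  # deletion that joins the two ACGT copies
print(microhomology_length(ref, 0, 2))  # deletion with no flanking homology
```

Summing the left and right extensions makes the result independent of how the aligner happened to place an ambiguous deletion: the same event reported at position 2 or position 6 yields the same microhomology length.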

Figures

Figure 1.

Analysis of DNA repair profiles following Cas9 cleavage of genomic sites in mammalian cells. (A) Overview of double-strand break (DSB) repair pathways: Non-Homologous End Joining (NHEJ), alternative End Joining (alt-EJ), Single Strand Annealing (SSA) and Homologous Recombination (HR), and their InDel footprints in mammalian cells. (B) Schematic of the experimental procedures used to detect and analyse the InDel footprints after Cas9 cuts a genomic locus. (C) Flowchart and overview of the algorithm used in RIMA to adjust and classify the mutations. sgRNA and PAM locations are highlighted on the reference sequence (RefSeq) in purple and yellow, respectively. Complex InDels (e.g. multiple nucleotide variations and replacements) were categorized as 'Other type'. (D) RIMA generates a colour-coded alignment of the mutations detected in the NGS data. The wild-type sequence is shown at the top. (E) A graphical representation of the alignment shown in (D), generated using RIMA. The orientations of the PAM and the sgRNA are shown beneath the scale bar. The length of all deletions is represented by the scale bar at the top. The deletions associated with microhomologies are visualized according to the bars shown in the legend. For single and double nucleotide insertions or duplications, the corresponding nucleotides are shown under the symbol indicating their position. For insertions and duplications longer than two nucleotides, only the length is indicated. The vertical black line indicates the cut site. (F) Classification of the InDels was based on their attributes. The frequency of each class was calculated as a fraction of the mutant reads or as a fraction of its parental category. For example, the frequency of single nucleotide duplications is calculated as the fraction of single nucleotide insertions that are identical to the preceding nucleotide.
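The parental-category calculation described in (F) can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs, not code from RIMA: the function name, the tuple layout (inserted base, preceding reference base, read count) and the example counts are hypothetical.

```python
def duplication_fraction(insertions):
    """Fraction of single nucleotide insertion reads that duplicate the preceding base.

    insertions: iterable of (inserted_base, preceding_base, read_count) tuples.
    """
    records = list(insertions)  # allow generators to be passed in
    total = sum(count for _, _, count in records)
    if total == 0:
        return 0.0
    duplicated = sum(count for ins, prev, count in records if ins == prev)
    return duplicated / total


# Hypothetical read counts for single nucleotide insertions at one target site.
example = [("T", "T", 60), ("A", "T", 25), ("G", "G", 15)]
print(duplication_fraction(example))
```

The denominator here is the parental category (all single nucleotide insertions) rather than all mutant reads, which is the distinction the caption draws between the two ways of reporting a class frequency.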

Figure 2.

Validation of RIMA using publicly available datasets (15,18). (A) Workflow used in this study to download and reanalyse data from previous studies. (B) Percentages of modified reads and (C) fraction of modified reads with out-of-frame InDels calculated by Bae et al. (y-axis) plotted against RIMA calculations (x-axis). (D) Microhomology scores (MHscore) reported by Bae et al. (y-axis) compared to the RIMA-generated percentage of c-MMEJ (x-axis). The correlation between datasets in B, C and D was calculated by linear regression (solid line), with 95% confidence intervals indicated (dashed line) and Pearson correlation coefficients (r) and P-values displayed. Time-dependent changes in the mutation patterns after Cas9 cleavage from the van Overbeek et al. dataset were analysed for the (E) c-MMEJ/other-EJ ratio, (F) InDel size and frequency of in-frame and (G) out-of-frame mutations. (H) Comparison of other-EJ/c-MMEJ ratios in InDels with different lengths 48 h after transfection. The corresponding Spearman correlation coefficients (ρ) and P-values are indicated within the graphs. All error bars indicate the standard error of the mean (S.E.M.) for N_HEK293 = 94, N_HCT116 = 94, N_K562 = 94. (I) Histogram of the c-MMEJ ratio distribution in different cell lines. The number of target sites with dominant MMEJ repair is highlighted in yellow. Data shown were obtained 48 h after transfection (15,18).
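The out-of-frame fraction compared in (C) follows from the net length change of each InDel. The Python sketch below shows one way to compute it; the function name, the input layout (net length change with deletions as negative values, plus a read count) and the example numbers are assumptions for illustration, not details taken from RIMA or from Bae et al.

```python
def out_of_frame_fraction(indels):
    """Fraction of modified reads whose net length change is not a multiple of 3.

    indels: iterable of (net_length_change, read_count); deletions are negative.
    """
    records = list(indels)
    total = sum(count for _, count in records)
    if total == 0:
        return 0.0
    # Python's % returns a non-negative remainder, so -3 % 3 == 0 and -1 % 3 == 2.
    shifted = sum(count for change, count in records if change % 3 != 0)
    return shifted / total


# Hypothetical InDel spectrum: 1-bp deletion, 1-bp insertion, 3-bp and 6-bp deletions.
spectrum = [(-1, 40), (1, 30), (-3, 20), (-6, 10)]
print(out_of_frame_fraction(spectrum))
```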

Figure 3.

c-MMEJ-associated mutations are insensitive to the inhibition of the MRN complex. (A) Schematic of experimental procedures. Mirin was added to the cells 1 h before transfection. Plasmids expressing the Cas9 gene and sgRNA were transiently co-transfected into HEK293 or HCT116 cells. Genomic DNA was harvested 72 h after transfection for NGS and RIMA analysis. The percentages of (B) modified reads and (C and D) c-MMEJ were determined for different Mirin doses (5, 10, 20 and 40 μM), DMSO vehicle or untreated control samples at three target sites. Data shown were obtained from three independent biological replicates. The TGAT nucleotides shown on the sgRNA are the second to fifth nucleotides before the PAM. All error bars indicate the S.E.M. Dunnett's method was used for statistical comparison to the DMSO group (**P < 0.01; ***P < 0.001). Relative frequencies of each repair event are presented as the mean ± standard deviation (S.D.).

Figure 4.

Polθ contributes to the formation of c-MMEJ-associated deletions. (A) Schematic of experimental procedures. Three cell lines were transfected with one of 15 sgRNA-expressing plasmids. Genomic DNA was harvested at 12, 24 and 60 h after transfection and was subjected to NGS and RIMA analysis. (B) Overall mutagenesis and (C) c-MMEJ rates are shown for all sgRNAs at the indicated time points. (D) Summary of B (top) and C (bottom) for data obtained 60 h after transfection. Significance was determined by Student's t-test: non-significant (ns), ***P < 0.001. Error bars on all graphs indicate the S.E.M. for three independent biological replicates. (E) Mutation patterns were visualized using RIMA for two sgRNAs (GFAP-sg1 and BCL6-sg1) in A549-Clone and A549-Clone-POLQ-KO cells 60 h after transfection. Relative frequencies of each repair event are presented as the mean ± standard deviation (S.D.).

Figure 5.

Analysis of single nucleotide insertions/deletions (InDels) at the Cas9 target site in different human cell lines. (A) Schematic of RNA-guided Cas9 targeting DNA. Cas9 (yellow) is complexed with sgRNA (red) and bound to DNA (blue). The RuvC and HNH nuclease domains cut the non-target and target strands, respectively. (B) Schematic of frequent single nucleotide insertions and deletions at different positions. For simplicity, the nucleotides were numbered according to their distance from the PAM. (C) Comparison of single nucleotide InDel frequencies at Cas9 target sites; nt, nucleotide. N_HeLa = 67, N_HEK293 = 94, N_HCT116 = 94 and N_K562 = 94, ***P < 0.001. (D) Frequencies of single nucleotide insertions observed at position three (light green) or position four (dark green) relative to the PAM. To avoid ambiguity in the mutation locations, only target sites with different nucleotides at positions three and four were analysed. (E) Percentage of inserted single nucleotides identical to the preceding 5′ nucleotide, compared to baseline. The red dashed line denotes the 25% chance that a randomly inserted nucleotide matches the adjacent 5′ nucleotide. ***P < 0.001 for comparisons between the means and the baseline. (F) Observed frequencies of each single nucleotide deletion at the Cas9 target sites. Only target sites with different nucleotides at the cut site (sgRNAs with different nucleotides at positions 3 and 4) were selected, so that the InDels could be located precisely. All error bars represent the S.E.M.; the numbers of target sites shown in D, E and F were as follows: N_HeLa = 50, N_HEK293 = 58, N_HCT116 = 58 and N_K562 = 58. Student's t-test (one-tailed) was performed for the statistical analysis (***P < 0.001). (G) The fraction of single nucleotide InDels at target sites with different nucleotides at positions from 20 nucleotides before to one nucleotide after the PAM. The minimum (blue) and maximum (red) observed InDel rates are highlighted. (H) The fraction of single nucleotide InDels plotted against the target site nucleotides. (I) Association between deletions and different nucleotides at position 4. N_HeLa = 67, N_HEK293 = 94, N_HCT116 = 94 and N_K562 = 94. All results illustrated in this figure were obtained from an analysis of the mutation patterns after 48 h (for HEK293, HCT116 and K562 cells) (18) and 72 h (for HeLa cells) (15).
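The site selection used for panels (D) to (F) can be expressed as a simple filter. The Python sketch below assumes that the protospacer is given 5′ to 3′ as a 20-nt string ending immediately before the PAM, so that position 1 is the last character; this numbering, the function name and the example sequences are assumptions for illustration rather than details confirmed by the source.

```python
def has_unambiguous_cut_site(protospacer: str) -> bool:
    """True when the nucleotides at positions 3 and 4 upstream of the PAM differ.

    Assumes the protospacer is written 5'->3' and ends right before the PAM,
    so position 1 is protospacer[-1] and position 4 is protospacer[-4].
    """
    pos3 = protospacer[-3]
    pos4 = protospacer[-4]
    return pos3 != pos4


# Hypothetical protospacers: the first has T (pos 4) and G (pos 3), the second G and G.
sites = ["ACGTACGTACGTACGATGAC", "ACGTACGTACGTACGAGGAC"]
kept = [s for s in sites if has_unambiguous_cut_site(s)]
print(kept)
```

When positions 3 and 4 carry the same base, an insertion or deletion of that base aligns equally well to either position, which is why such sites are excluded before assigning events to a position.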

Figure 6.

Cas9 endonuclease activity generates frequent 5′ overhangs. (A) Model explaining the interplay between DNA repair and the catalytic activity of the Cas9 nuclease domains: (i) shows the cut positions of the RuvC and HNH domains that generate blunt ends, followed by NHEJ-mediated precise repair; (ii) shows how insertions arise from staggered cuts that create single nucleotide 5′ overhangs, which are filled in by DNA polymerases and then joined by NHEJ, leading to duplication of the preceding 5′ nucleotide. For simplicity, the nucleotides upstream of the PAM are numbered. (B) Schematic of the competitive oligo-duplex incorporation assay. Oligo-duplexes were co-transfected with Cas9- and sgRNA-expressing plasmids. Genomic DNA was extracted from cells 72 h after transfection for barcode deconvolution via NGS analysis. (C) The pattern of the oligo-duplexes captured at seven target sites. The PAM is shown in green. The expected position of the Cas9-induced cuts that result in a single nucleotide 5′ overhang is shown by a red line on the protospacer sequence. The staggered nucleotide is indicated by a yellow star. Heatmaps are generated from normalised values, ranging from high (red) to low (blue) detection frequency. Each cell in the heatmap represents the mean of three independent biological replicates in one experiment.
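The two outcomes in model (A) can be simulated on the non-target strand to show why a staggered cut yields a duplication. The Python sketch below is a simplified illustration under stated assumptions: the protospacer is written 5′ to 3′ and ends right before the PAM, the HNH domain cuts between positions 3 and 4, the staggered RuvC cut falls one nucleotide further from the PAM (between positions 4 and 5), and fill-in plus ligation is perfect. The function name and the example sequence are hypothetical.

```python
def repair_outcome(protospacer: str, staggered: bool) -> str:
    """Return the repaired non-target strand after a blunt or 1-nt staggered cut."""
    blunt_cut = len(protospacer) - 3  # boundary between position 4 and position 3
    if not staggered:
        # (i) Blunt ends rejoined precisely: the sequence is unchanged.
        return protospacer[:blunt_cut] + protospacer[blunt_cut:]
    # (ii) Assumed RuvC cut one nucleotide upstream, between positions 5 and 4.
    ruvc_cut = blunt_cut - 1
    left_top = protospacer[:ruvc_cut]    # top strand of the PAM-distal end
    right_top = protospacer[ruvc_cut:]   # starts with the 1-nt 5' overhang (position 4)
    # Fill-in extends the left top strand using the other strand's overhang as template,
    # which adds a copy of the position-4 nucleotide before the ends are ligated.
    filled_left = left_top + protospacer[ruvc_cut]
    return filled_left + right_top


site = "ACGTACGTACGTACGATGAC"  # hypothetical protospacer with T at position 4
print(repair_outcome(site, staggered=False))
print(repair_outcome(site, staggered=True))
```

In this toy model the staggered outcome is one nucleotide longer than the input and carries two copies of the position-4 base, which matches the caption's description of a duplication of the preceding 5′ nucleotide.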

Figure 7.

Delineation of mutational mechanisms by RIMA: c-MMEJ is unperturbed by exonuclease activity at Cas9-induced breakpoints, whereas exonucleases increase overall mutations but decrease insertions. (A) Schematic of the experimental design used to investigate the effect of TREX2 and DNTT on the SpCas9 and FnCas9 mutagenesis rates and mutation patterns. Genetic modifications were identified by deep sequencing 72 h after transfection. (B) Percentage of modified reads and (C) percentage of insertions from cells co-transfected with mock (grey), DNTT (orange) or TREX2 (blue) together with SpCas9 or FnCas9. (D) Quantification of c-MMEJ at all target sites in the analysed cells. All error bars represent the S.E.M. of three independent biological replicates in one experiment. (E) Visualized mutation patterns at one genomic locus targeted using SpCas9 and FnCas9 with and without overexpression of TREX2. S.D.: standard deviation.

References

    1. Lillestol R.K., Redder P., Garrett R.A., Brugger K. A putative viral defence mechanism in archaeal cells. Archaea. 2006; 2:59–72.
    2. Makarova K.S., Grishin N.V., Shabalina S.A., Wolf Y.I., Koonin E.V. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol. Direct. 2006; 1:7.
    3. Marraffini L.A., Sontheimer E.J. CRISPR interference: RNA-directed adaptive immunity in bacteria and archaea. Nat. Rev. Genet. 2010; 11:181–190.
    4. Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 2012; 337:816–821.
    5. Mali P., Yang L., Esvelt K.M., Aach J., Guell M., DiCarlo J.E., Norville J.E., Church G.M. RNA-guided human genome engineering via Cas9. Science. 2013; 339:823–826.
