CasX enzymes comprise a distinct family of RNA-guided genome editors

Nature. 2019 Feb;566(7743):218-223.

doi: 10.1038/s41586-019-0908-x. Epub 2019 Feb 4.

Jun-Jie Liu, Benjamin L Oakes, Enbo Ma, Hannah B Spinner, Katherine L M Baney, Jonathan Chuck, Dan Tan, Gavin J Knott, Lucas B Harrington, Basem Al-Shayeb, Alexander Wagner, Julian Brötzmann, Brett T Staahl, Kian L Taylor, John Desmarais, Eva Nogales, Jennifer A Doudna

Jun-Jie Liu et al. Nature. 2019 Feb.

Abstract

The RNA-guided CRISPR-associated (Cas) proteins Cas9 and Cas12a provide adaptive immunity against invading nucleic acids, and function as powerful tools for genome editing in a wide range of organisms. Here we reveal the underlying mechanisms of a third, fundamentally distinct RNA-guided genome-editing platform named CRISPR-CasX, which uses unique structures for programmable double-stranded DNA binding and cleavage. Biochemical and in vivo data demonstrate that CasX is active for Escherichia coli and human genome modification. Eight cryo-electron microscopy structures of CasX in different states of assembly with its guide RNA and double-stranded DNA substrates reveal an extensive RNA scaffold and a domain required for DNA unwinding. These data demonstrate how CasX activity arose through convergent evolution to establish an enzyme family that is functionally separate from both Cas9 and Cas12a.

Figures

Extended Data Figure 1. CasX purification and substrate cleavage

a, RAxML maximum-likelihood phylogenetic tree of type V effector proteins with TnpB nucleases. Triangle denotes collapsed branches. Bootstrap values are indicated as percentage points; values above 88 are shown between the major branches. b, Pairwise percent sequence identity between the conserved RuvC domains of Class 2 effectors Cas9 (type II-A), Cpf1 (type V-A), and CasX (type V-E), inferred from MAFFT alignment and depicted in an all-vs-all fashion. High identity is shown in blue and low identity in red. Histograms representing interfamily and intrafamily sequence identity value distributions are shown along the edge. c, DNA cleavage site comparison among Cas12a, Cas12b and CasX. 5 repeats with consistent results. d, DNA cleavage activity of DpbCasX mutations (n = 3, mean ± s.d.). e, Schematic cartoon of the GFP gene. Target regions for guides 1 to 9 are marked along the gene. CasX guide screening by GFP disruption (n > 2, mean ± s.d.). f, CRISPRi efficiency for CasX active site mutations. The Cas proteins and guide RNAs used in each assay are marked. Cas9-ng indicates the non-targeting RNA guide of Streptococcus pyogenes Cas9 (SpyCas9); Cas9-g indicates the targeting RNA guide of SpyCas9. CasX-ng indicates the non-targeting RNA guide of DpbCasX; CasX-g indicates the targeting RNA guide of DpbCasX. GFP disruption efficiency of the targeting guide is shown by GFP signal/OD compared to the non-targeting guide control (n = 4, mean ± s.d.). g, Purification of apoCasX, the CasX-gRNA binary complex and the CasX-gRNA-DNA ternary complex with three DNA designs by size exclusion chromatography. Representative S200 size exclusion traces by UV280 absorbance are shown. Samples were taken from the labeled peaks and analyzed by urea-PAGE with SYBR Gold. sgRNA indicates the single-guide RNA. NTS indicates the non-target strand of the target DNA. TS indicates the target strand of the target DNA. All the reconstitutions have been repeated more than 3 times with consistent results.

Extended Data Figure 2. Mammalian cell editing by CasX

a, PlmCasX T7E1 gene editing validation of the mammalian cell GFP disruption assay from Figure 2g. b, PlmCasX T7E1 quantification of a (n = 3, mean ± s.d.). c, PlmCasX GFP disruption dose response (n = 3, mean ± s.d.). The Cas proteins and guide RNAs used in each assay are marked. Cas9-ng indicates the non-targeting RNA guide of Streptococcus pyogenes Cas9 (SpyCas9); Cas9-g indicates the targeting RNA guide of SpyCas9. CasX-ng indicates the non-targeting RNA guide of DpbCasX; CasX-g indicates the targeting RNA guide of DpbCasX. In the human assays, CasX-g2 & CasX-g3 are GFP targeting guides to the template and non-template strand, respectively, and the GFP targeting guide of Cas9 (Cas9-g), which is not expected to direct CasX activity, is used as the negative control for CasX-g2 and CasX-g3. d, EGFP disruption of clonal EGFP HEK293T cell lines with PlmCasX & various doses of plasmid (n = 3, mean ± s.d.). Raw FACS data are plotted with GFP on the x axis and FSC on the y axis, with gates drawn to demonstrate how GFP-negative cells are gated. e, Indels of GFP generated by PlmCasX cleavage as analyzed by sub-cloning and Sanger sequencing of 20 clones. 3 repeats with consistent results. f, Map depicting the target sites for each of the CasX & Cas9 guides on the EGFP coding sequence for Figure 2h.

Extended Data Figure 3. EM analysis of CasX-gRNA-DNA ternary complex with a 30bp target DNA

a, Target DNA sequence in this complex. b, EM analysis pipeline. 1,698,815 particles were picked from 7,500 drift-corrected micrographs and then used for 2D classification. By 2D-based manual screening, 713,219 good particles were selected for 3D classification into 4 classes. 363,431 particles from the class that shows the most intact architecture were further used for heterogeneous refinement, which generated two reconstructions, State I and State II, with 71% and 29% of the particles, respectively. State I and State II were then independently refined to 3.8 Å and 4.2 Å. c, Euler angle distribution of the refined particles belonging to State I and State II. d, Fourier shell correlation (FSC) curve calculated using two independent half maps. e, The density maps for both states, colored by local resolution as calculated in cryoSPARC. Resolution ranges from 3 Å to 7 Å. Panels c and d are taken directly from the standard output of cryoSPARC.
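The particle counts quoted in panel b can be turned into per-stage retention fractions and approximate per-state particle numbers. The following is a minimal sketch of that arithmetic only; the names `PIPELINE`, `STATE_SHARES`, `retention` and `particles_per_state` are hypothetical, and the per-state counts are estimates derived from the rounded percentages in the caption, not values reported by the authors.

```python
# Particle counts quoted in the caption for the 30 bp ternary complex dataset.
PIPELINE = [
    ("picked", 1_698_815),
    ("after 2D screening", 713_219),
    ("best 3D class", 363_431),
]

# Rounded shares of the final particle set assigned to each state (from the caption).
STATE_SHARES = {"State I": 0.71, "State II": 0.29}


def retention(stages):
    """Return (label, count, fraction of the previous stage) for each stage."""
    rows = []
    previous = None
    for label, count in stages:
        fraction = 1.0 if previous is None else count / previous
        rows.append((label, count, fraction))
        previous = count
    return rows


def particles_per_state(total, shares):
    """Approximate particle count per state from the rounded percentages."""
    return {state: round(total * share) for state, share in shares.items()}


if __name__ == "__main__":
    for label, count, fraction in retention(PIPELINE):
        print(f"{label:>20}: {count:>9,} ({fraction:.1%} of previous stage)")
    print(particles_per_state(PIPELINE[-1][1], STATE_SHARES))
```

The same two helpers apply unchanged to the other datasets if their counts and state shares are substituted.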

Extended Data Figure 4. EM analysis of CasX-gRNA-DNA ternary complex with full R-loop (45bp target DNA)

a, Target DNA sequence in this complex. b, Cryo-EM analysis pipeline. 1,135,443 particles were picked from 5,000 drift-corrected micrographs and then used for 2D classification. By 2D-based manual screening, 485,163 good particles were selected for 3D classification into 4 classes. 222,927 particles from the class showing better structure preservation were further used for heterogeneous refinement, which generated two models, State I and State II, with 67% and 33% of the particles, respectively. State I and State II were then independently refined to 3.2 Å and 5.2 Å. c, The Euler angle distribution for State I and State II. d, FSC curve calculated using two independent half maps. e, Cryo-EM structures of State I and State II colored by local resolution as calculated in cryoSPARC. Resolution ranges from 3 Å to 7 Å. Panels c and d are standard outputs of cryoSPARC.

Extended Data Figure 5. Atomic model building of CasX ternary complexes for State I and State II.

Atomic models and cryo-EM maps (shown with a threshold of 8σ or 9σ) for the CasX ternary complex with 30bp DNA in State I (a) and State II (b), and for State I of the CasX ternary complex with full R-loop (45bp DNA). c, Representative regions of the cryo-EM density for different secondary structure regions are shown. d, Map against model FSCs. e and f, Zoomed views of atomic models fitted in EM densities. GLY917/GLN920 and the DNA residues within a distance of 4 Å are linked by dashed lines.

Extended Data Figure 6. Structural comparison of CRISPR effectors

a, OBD (WED) domains are shown in aquamarine, Helical-I (REC1) domains in yellow, Helical-II (REC2) domains in orange, RuvC domains in green, Nuc (TSL) domains in pink, and bridge helices in blue. The NTSB domain in CasX is shown in red, and the PI domain of LbCas12a in purple. Guide RNA and target DNA are shown in gray. Two orientations are presented for each model. b, The overall structure and individual domains of CasX were analyzed using the Dali server against the full PDB. The protein hit with the highest Z-score for each target is shown in the left panel. The hits are marked with protein name and PDB code. The similarity scores between CasX overall structure/domains and AscCas12b are pulled out from the Dali full PDB analysis and shown in the middle panel. The similarity scores between CasX overall structure/domains and AscCas12a are pulled out from the Dali full PDB analysis and shown in the right panel. A Z-score above 8 indicates a high degree of similarity. A Z-score below 8 but above 2 indicates moderate similarity (usually an irrelevant random match). A Z-score below 2 indicates noise. c, TSL domain and full R-loop structures are subtracted from the ternary complex. Zinc ribbon residues are colored in blue. The primary sequence across the TSL-loop is shown. Tyrosines are marked with teal circles. Positively charged residues are marked with red circles. d, Zinc finger validation by X-ray fluorescence elemental analysis. Bovine erythrocyte carbonic anhydrase, which contains zinc in the active site, was used as a positive control. Representative zinc peaks appeared in the purified CasX sample but not in the purified Cas9 sample. e, Atomic models of DpbCasX, AacCas12b, LbCas12a and SpyCas9 binary complexes are shown in surface representation. Protein parts are colored in cyan, and nucleic acid in dark gray. CasX, AacCas12b and SpyCas9 require both crRNA and tracrRNA (or a fused single guide RNA), while LbCas12a uses only crRNA. Guide RNAs are subtracted out from the complexes and shown independently as ribbons in the bottom panels. The mass ratio of protein and guide RNA is shown on the right. Values of relative mass occupancy for protein and guide RNA within the three binary complexes (protein+guide RNA) are shown. Protein mass occupancies are colored in cyan, and guide RNA in dark gray. f, CRISPRi efficiency by guide RNA mutation (n = 3, mean ± s.d.). The sequence of the fused single guide RNA is shown. The tracrRNA, the joint loop, the crRNA and the spacer region are marked. The sequences of the mutated guide RNAs are shown aligned with the original guide RNA sequence. Cas9 is used as a positive control. (+) indicates a targeting guide; (−) indicates a non-targeting guide used as a negative control. NC indicates the non-complementary CasX guide. WT indicates the complementary wild-type guide for CasX. GFP disruption efficiency of the targeting guide is shown by GFP signal/OD compared to the non-targeting guide control.
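The Z-score bands stated in panel b amount to a simple threshold rule. The sketch below encodes it as a hypothetical helper, `classify_dali_z`; it is not part of the Dali server. The caption does not say how scores of exactly 8 or exactly 2 are treated, so assigning them to the lower band is an assumption made here, and the example scores are illustrative values rather than results from the paper.

```python
def classify_dali_z(z_score: float) -> str:
    """Map a Dali Z-score to the similarity band described in the caption.

    Assumption: a score of exactly 8 counts as moderate and exactly 2 as noise,
    because the caption only defines "above" and "below" each threshold.
    """
    if z_score > 8:
        return "high similarity"
    if z_score > 2:
        return "moderate similarity"
    return "noise"


if __name__ == "__main__":
    # Illustrative scores only; these are not values from the paper.
    for score in (14.2, 5.0, 1.3):
        print(score, classify_dali_z(score))
```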

Extended Data Figure 7. Structural comparison of apo, binary and ternary CasX samples

a, Drift-corrected image of apoCasX obtained with a 70° phase shift and defocus of 0.5 μm. The scale bar is 50 nm. b, Drift-corrected image of CasX-gRNA complex with a defocus of −1.5 μm. c, Drift-corrected image of CasX-gRNA-DNA complex with a defocus of −1.5 μm. Representative reference-free 2D class averages are shown in the bottom panels for the three samples. The scale bar is 20 nm. d, Cryo-EM reconstruction of apoCasX. 3 representative orientations are shown with colored domains: OBD in aquamarine, NTSB in red, Helical-I in yellow, Helical-II in orange, RuvC in dark green, TSL in light pink and the bridge helix in blue. e, BS3 cross-linking signals revealed by mass spectrometry for the apoCasX sample. The two lysines within a cross-linked pair are connected by a purple curve. f, g, As d and e for the CasX-gRNA binary complex. h, i, As d and e for the CasX-gRNA-DNA ternary complex. j, k, Accessibility of target strand DNA by the RuvC domain in State I and State II. The distance between the TS DNA cleavage region and the RuvC active site, as calculated using PyMOL, is 43.8 Å for State I (j) and 10.9 Å for State II (k).

Extended Data Figure 8. EM analysis of CasX-gRNA-DNA ternary complex with shortened NTS (20nt NTS and 45nt TS).

a, Target DNA sequence in this complex. b, Cryo-EM analysis pipeline. 801,927 particles were picked from 3,500 drift-corrected micrographs and then used for 2D classification. By 2D-based manual screening, 369,430 good particles were selected for 3D classification into 4 classes. 181,009 particles from the class showing better structure preservation were further used for heterogeneous refinement, which generated two models, State I and State II, with 33.6% and 66.4% of the particles, respectively. State I and State II were then independently refined to 4.5 Å and 4.4 Å by homogeneous reconstruction. c, The Euler angle distribution of refined particles belonging to State I and State II. d, FSC curve calculated using two independent half maps, indicating an overall resolution of 4.5 Å for State I and 4.4 Å for State II. e, Cryo-EM structures of State I and State II colored by local resolution as calculated in cryoSPARC. Resolution ranges from 3 Å to 7 Å. Panels c and d are taken directly from the standard outputs of cryoSPARC.

Extended Data Figure 9. CasX ΔNTSBD purification and substrate cleavage.

a, Representative S200 size exclusion traces by UV280 absorbance for wt CasX and for CasX with NTSB domain truncation. SDS-PAGE of wt CasX protein and CasX protein with NTSB domain truncation, stained with Coomassie brilliant blue, is shown in the upper-right panel. b, Comparison of the cleavage activities of wt CasX and CasX with NTSB domain truncation on an unwound probe (only the PAM region is base-paired, the rest of the probe is mismatched) and on just a single target DNA strand. All the assays have been repeated 3 times with consistent results.

Extended Data Figure 10. Proposed model for sequential CasX activation for DNA cleavage.

a, Proposed overall architecture of apoCasX. The different protein domains are colored as in Figure 3. b, Cryo-EM map of the gRNA-bound CasX. Upon gRNA binding, CasX undergoes a domain rearrangement (gRNA is shown as a gray solid surface). c, Cryo-EM map of the CasX ternary complex in the NTS-loading state (State I). Upon target dsDNA recognition and unwinding by the CasX-gRNA complex, the non-target strand is preferentially positioned into the RuvC active site for cleavage. d, Cryo-EM map of the CasX ternary complex in the TS-loading state (State II). After non-target strand cleavage, the entire RNA-DNA duplex is bent by the TSL domain, thus positioning the target strand into the RuvC active site. e, Cryo-EM map of the CasX ternary complex mimicking a hypothetical trans-active state. After target strand DNA cleavage, the tension within the bent RNA-DNA duplex favors the return of the CasX ternary complex to State I, thus enabling the RuvC domain to cut any accessible single-stranded DNA. The model shown here corresponds to the CasX ternary complex with a short NTS DNA in State I to mimic the trans-ssDNA cleavage state (the 5′ overhang of TS DNA, which folds back to the RuvC domain, is colored in blue).

Figure 1. CasX cuts double stranded DNA with single guide RNA in vitro

a, A schematic of the CRISPR-CasX locus with the CasX gene (RuvC domain highlighted) in orange, Cas4, Cas1 and Cas2 in blue, tracrRNA in gray, and the CRISPR array in teal. Cartoons are scaled according to gene size. A schematic of the CasX dual-guide RNA and single-guide RNA is shown in the bottom panel: tracrRNA in gray, crRNA in teal and the target DNA in black. TS and NTS indicate the target strand and non-target strand, respectively. The RNA loop fusing tracrRNA and crRNA is in red. b, DNA cleavage efficiency by DpbCasX. P indicates the cleavage product. The cleavage fraction is calculated based on the NTS band density compared to the input NTS band density at a reaction time of 0 min. c, Conservation of cleavage specificity of DpbCasX with Cas12a. Lane M shows labeled ladders. d, The cleavage sites for NTS and TS (marked with black arrows). e, Cleavage activity of DpbCasX on trans ssDNA. The cleavage fraction is calculated based on the trans-ssDNA band density compared to the input trans-ssDNA band density at a reaction time of 0 min. 4 biological repeats for all the assays showed consistent results.
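The caption states only that the cleavage fraction compares the substrate band density at each time point with the input band density at 0 min. One common way to express such a comparison is 1 − I(t)/I(0); the sketch below assumes that form, which the source does not spell out. The function name `cleavage_fraction` and the densitometry readings are hypothetical.

```python
def cleavage_fraction(band_density_t: float, band_density_t0: float) -> float:
    """Fraction of substrate cleaved, ASSUMING it equals 1 - I(t) / I(0).

    band_density_t:  substrate band density at the time point of interest
    band_density_t0: input substrate band density at 0 min
    """
    if band_density_t0 <= 0:
        raise ValueError("input band density at 0 min must be positive")
    fraction = 1.0 - band_density_t / band_density_t0
    # Clamp to [0, 1]: measurement noise can push the ratio slightly outside.
    return min(1.0, max(0.0, fraction))


if __name__ == "__main__":
    # Hypothetical densitometry readings (arbitrary units), not data from the paper.
    time_course = {0: 1000.0, 5: 620.0, 30: 250.0, 120: 90.0}
    for minutes, density in time_course.items():
        print(minutes, round(cleavage_fraction(density, time_course[0]), 3))
```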

Figure 2. CasX effectively manipulates genomes in vivo

a, Genomic cleavage assay in E. coli (n = 3, mean ± s.d.). b, Schematic of E. coli CRISPRi. c, E. coli GFP repression as visualized on plates on a dark reader. This assay has been repeated more than 3 times with consistent results. d, Quantitative analysis of E. coli CRISPRi-based GFP repression at 12 h (n = 4, mean ± s.d.). e, Schematic of the CasX human cell assay and readout. f, DpbCasX (Deltaproteobacteria CasX) and PlmCasX (Planctomycetes CasX) GFP disruption in mammalian cell (HEK293T) assays at two doses of plasmid. g, Sustained GFP disruption of the high-dosage mammalian cell GFP disruption assay from f. h, PlmCasX & SpyCas9 GFP disruption at 10 guide sites throughout EGFP (n = 3, mean ± s.d.). The average GFP disruption across all EGFP guides for CasX & Cas9 is shown. Cas9-g and Cas9-ng indicate the targeting and non-targeting RNA guides of Streptococcus pyogenes Cas9 (SpyCas9), respectively. CasX-g and CasX-ng indicate the targeting and non-targeting RNA guides of DpbCasX. SpyCas9 or inactive SpyCas9 (dSpyCas9) was used as a positive control.

Figure 3. Overall structure of the CasX ternary complex

a, Domain composition of CasX. CasX contains: NTSB (non-target strand binding, red), Helical-I (yellow), Helical-II (orange), OBD (oligo binding domain, aquamarine), RuvC (green) and TSL (target-strand loading, pink) domains, and a BH (bridge helix, blue). b, Model of the CasX ternary complex with 30bp target DNA in State I, shown in side and top views. Domains are colored as in a, and the sgRNA is in teal. For the target DNA, the NTS is in magenta and the TS is in purple. c and d, Models of the CasX ternary complex with 30bp target DNA in State II and State I, shown in top view. Residues Arg917 and Gln920 are shown as red sticks. The TSL-loop is shown as a red ribbon. The positions of the RuvC active site residues are shown as red sticks to illustrate the distance to the active site from the TSL domain elements. The right panels show zoomed-in views of the TSL domain. e, Schematic of the single guide RNA fold with tracrRNA sequence in gray, crRNA sequence in teal, and the joint loop in red. f, Molecular interactions between CasX and gRNA. Protein residues involved in gRNA recognition are shown as magenta sticks. Helical-II and OBD are colored in orange and aquamarine, respectively. g, Models of the CasX ternary complex in State I and II are aligned and superimposed. CasX is shown as a transparent grey cartoon, and the residues responsible for cleavage activity are shown in red. The nucleic acids are shown as ribbons to emphasize the rotation of the RNA-DNA duplex required for the transition between the two states.

Figure 4. Distinct CasX conformational states

a, Conformational states with 30bp target DNA; b, with a DNA target forming the full R-loop; and c, with the short NTS (20nt) and the 45nt TS. The schematic of the DNA probe used for each data collection is shown on the left, with cleavage sites shown by arrowheads. The top views of the cryo-EM maps for the CasX ternary complex in States I and II are shown in the center panels. The TS density is colored purple, the NTS density magenta, and the sgRNA density teal. The RuvC domain is indicated in each map. All the EM maps are low-pass filtered to 4.5 Å. The relative percentage of particles belonging to each state revealed by cryo-EM analysis is shown in the right panel.

Figure 5. Novel domains for target DNA unwinding and loading

a, Electron density map showing the presence of a domain that directly interacts with the NTS, with models for the CasX ternary complex in State I and II within the cryo-EM map (shown as mesh surface, low-pass filtered to 4.5 Å). CasX is shown in grey with the NTSB domain highlighted in red, the TS in purple and the NTS in magenta. b, Comparison of the cleavage activity of wild-type CasX and the NTSB domain deletion (CasXΔ101–191). The reactions were analyzed at time points from 0 to 120 minutes. A completely base-paired probe and a bubbled probe were used to test the on-target activity, and a random 50nt oligo was used to test the trans-cleavage activity. P indicates the cleavage product. 3 biological repeats for the assays showed consistent results.
