Cas9 specifies functional viral targets during CRISPR-Cas adaptation

Nature. Author manuscript; available in PMC 2015 Sep 12.

Published in final edited form as:

PMCID: PMC4385744

NIHMSID: NIHMS657330

Robert Heler

1Laboratory of Bacteriology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA

Poulami Samai

1Laboratory of Bacteriology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA

Joshua W. Modell

1Laboratory of Bacteriology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA

Catherine Weiner

1Laboratory of Bacteriology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA

Gregory W. Goldberg

1Laboratory of Bacteriology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA

David Bikard

1Laboratory of Bacteriology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA

Luciano A. Marraffini

1Laboratory of Bacteriology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA

*Equal contribution.

#Synthetic Biology Group, Institut Pasteur, 28 Rue du Dr. Roux, 75015 Paris, France.

Abstract

Clustered, regularly interspaced, short palindromic repeat (CRISPR) loci and their associated (Cas) proteins provide adaptive immunity against viral infection in prokaryotes. Upon infection, short phage sequences known as spacers integrate between CRISPR repeats and are transcribed into small RNA guides that identify the viral targets (protospacers) of the Cas9 nuclease. Streptococcus pyogenes Cas9 cleavage of the viral genome requires the presence of an NGG protospacer adjacent motif (PAM) sequence immediately downstream of the target. It is not known if and how viral sequences with the correct PAM are chosen as new spacers. Here we show that Cas9 specifies functional PAM sequences during spacer acquisition. The replacement of cas9 with alleles that lack the PAM recognition motif or recognize an NGGNG PAM eliminated or changed PAM specificity during spacer acquisition, respectively. Cas9 associates with other proteins of the acquisition machinery (Cas1, Cas2 and Csn2), presumably to provide PAM-specificity to this process. These results establish a new function for Cas9 in the genesis of the prokaryotic immunological memory.

Introduction

Clustered, regularly interspaced, short palindromic repeat (CRISPR) loci and their CRISPR associated (Cas) proteins provide adaptive immunity to bacteria and archaea against their viruses1. To adapt to highly dynamic viral populations, CRISPR-Cas loci evolve rapidly, acquiring short phage sequences, known as spacers, that integrate between CRISPR repeats and constitute a memory record of infection2. Spacers are transcribed into small CRISPR RNAs (crRNAs) that identify viral targets (defined as protospacers) by direct Watson-Crick pairing with invasive DNA3. Based on their cas gene content, CRISPR-Cas systems can be classified into three distinct types, I, II and III4. Each CRISPR-Cas type possesses different mechanisms of crRNA biogenesis, target destruction and prevention of autoimmunity. In the type II CRISPR-Cas system present in Streptococcus pyogenes the Cas9 nuclease inactivates infective phages using crRNAs as guides to introduce double-strand DNA breaks into the viral genome5. Cas9 cleavage requires the presence of a protospacer adjacent motif (PAM) sequence immediately downstream of the protospacer6,7. This requirement avoids the cleavage of the spacer sequence within the CRISPR array, i.e. autoimmunity, since the adjacent repeat lacks a PAM sequence. The importance of the PAM sequence for target recognition and cleavage6–9 suggests the presence of a mechanism to ensure that newly acquired spacer sequences match protospacers flanked by a proper PAM sequence. For the type I-E CRISPR-Cas system of Escherichia coli, over-expression of cas1 and cas2 is sufficient for the acquisition of new spacers in the absence of phage infection. Reports indicate that spacers acquired in this fashion match preferentially (25–70%, depending on the study) to protospacers with the correct PAM (AWG, W=A/T)10–13, suggesting that Cas1 and Cas2 are sufficient for spacer acquisition and have some intrinsic ability to recognize protospacers with the right PAM. 
In the type II system of S. pyogenes the PAM sequence is NGG (and also NAG at a much lower frequency)3,6,14, where N is any nucleotide, and it is recognized and bound by a domain within the Cas9 tracrRNA:crRNA-guided nuclease during target cleavage7,15. How spacers are acquired in this system, particularly how spacers with correct PAM sequences are selected during this process, is not known.

Cas9 is required for spacer acquisition

To investigate the mechanisms of recognition of PAM-adjacent protospacers during spacer acquisition, we cloned the type II-A CRISPR-Cas locus of S. pyogenes (Fig. 1a) into the staphylococcal vector pC194 (ref.16) and introduced the resulting plasmid [pWJ40 (ref.17)] into Staphylococcus aureus RN4220 (ref.18), a strain lacking CRISPR-Cas loci. We chose this experimental system because it facilitates the genetic manipulation of the S. pyogenes CRISPR-Cas system. We first tested the ability of the cells to mount adaptive CRISPR immunity by infecting them with the staphylococcal phage ϕNM4γ4, a lytic variant of ϕNM4 (ref.19) (see Methods for a description of ϕNM4γ4 isolation). Plate-based assays performed by mixing bacteria and phage in top agar allowed the selection of phage-resistant colonies that were checked by PCR to look for the expansion of the CRISPR array (Extended Data Fig. 1a). On average 50 % of the colonies acquired one or more spacers (8/13, 5/11 and 7/16 in three independent experiments), whereas the rest of the resistant colonies survived phage infection by a non-CRISPR mechanism, most likely including phage receptor mutations (Extended Data Fig. 2a). To maximize the capture of new spacer sequences, we performed the same assay in liquid and recovered surviving bacteria at the end of the phage challenge. These were analyzed by PCR of the CRISPR array and the amplification products of expanded loci were subjected to Illumina MiSeq sequencing to determine the extent of spacer acquisition. Analysis of 2.96 million reads detected protospacers adjacent to 2083 out of 2687 NGG sequences present in the viral genome, although with variation in the frequency of acquisition of each sequence (Extended Data Fig. 1b). The data revealed a prominent selection of spacers matching protospacers with downstream NGG PAM sequences (99.97 %, Extended Data Fig. 1c). 
The acquisition of new spacers by cells in liquid culture proved to be simple and highly efficient, providing the possibility to look at millions of new spacers in a single step. It was therefore implemented in the rest of our studies.
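The counts quoted above can be checked with a short Python sketch. The text does not state how NGG sites were enumerated in the phage genome, so `count_ngg_sites` below is only an assumption: it scans both strands and counts overlapping motifs separately. The function names and the toy sequence used to exercise them are illustrative and are not taken from the study.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_ngg_sites(genome: str) -> int:
    """Count NGG motifs on both strands of a linear sequence.

    Assumption: a site is counted at every position i where the next two
    bases are GG, so overlapping motifs (as in AGGG) count separately.
    """
    def count_one_strand(strand: str) -> int:
        return sum(1 for i in range(len(strand) - 2) if strand[i + 1:i + 3] == "GG")

    top = genome.upper()
    return count_one_strand(top) + count_one_strand(reverse_complement(top))


def pooled_fraction(counts):
    """Pool (adapted, total) colony counts from several experiments."""
    adapted = sum(a for a, _ in counts)
    total = sum(t for _, t in counts)
    return adapted / total


# Values reported in the text above.
colony_counts = [(8, 13), (5, 11), (7, 16)]
adapted_fraction = pooled_fraction(colony_counts)  # 20 of 40 colonies
pam_site_coverage = 2083 / 2687  # NGG sites with a detected protospacer
```

Pooling the three plate experiments gives 20 adapted colonies out of 40, consistent with the stated average of about 50 %, and the sequencing data cover roughly 77.5 % of the NGG sites in the viral genome.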

Figure 1. Cas9 is required for spacer acquisition

a, Organization of the S. pyogenes type II CRISPR-Cas locus. Arrows indicate the annealing position of the primers used to check for the expansion of the CRISPR array. b, PCR-based analysis of liquid cultures to check for the acquisition of new spacer sequences in the presence or the absence of phage ϕNM4γ4 infection. Wild-type (WT) as well as different cas mutants were analyzed. Image is representative of three technical replicates. MOI; multiplicity of infection. c, Cultures over-expressing Cas1, Cas2 and Csn2 under the control of a tetracycline-inducible promoter were analyzed using PCR for spacer acquisition in the absence of phage infection. The strain was complemented with plasmids carrying either St or Sp Cas9 (see Extended Data Fig. 3), in the last case with or without the tracrRNA gene (Δ_tracr_). Image is representative of three technical replicates. aTc; anhydrotetracycline.

To determine the genetic requirements for spacer acquisition we made individual deletions of cas1, cas2 or csn2 and challenged the mutant strains with phage ϕNM4γ4. Spacer acquisition was decreased to levels below our limit of detection in each of these mutants (Fig. 1b), corroborating previous experiments12,20. Therefore while Cas1, Cas2 and Csn2 are dispensable for anti-phage immunity in the presence of a pre-existing spacer (Extended Data Fig. 2b and c), they are required for spacer acquisition. To determine whether these genes are also sufficient for this process, we over-expressed cas1, cas2 and csn2 in the absence of cas9 using a tetracycline-inducible promoter in plasmid pRH223 and looked for the integration of new spacers in the absence of phage infection using a highly sensitive PCR assay (Extended Data Fig. 3). We were unable to detect new spacers even in the presence of the inducer (Fig. 1c). However, the addition of a second plasmid expressing tracrRNA (see below) and Cas9 from their native promoters (Fig. S4a) enabled spacer acquisition only in the presence of the inducer, with all the new spacers matching chromosomal or plasmid sequences (Fig. 1c and Extended Data Table 1). Although most likely the acquisition of such spacers causes cell death or plasmid curing, respectively, the acquisition event can still be detected in liquid culture using our highly sensitive PCR assay (Extended Data Fig. 3b and c). The tracrRNA (Fig. 1a) is a small RNA bound by Cas9 that is required for crRNA processing3 and Cas9 nuclease activity6. We wondered if Cas9 involvement in spacer acquisition also required the presence of the tracrRNA. Deletion of the tracrRNA prevented spacer acquisition in the absence of phage infection (Fig. 1c), suggesting that apo-Cas9 is not sufficient to promote spacer acquisition and that association with its cofactor is also required. 
Altogether these data indicate that Cas1, Cas2 and Csn2 are necessary but not sufficient for the incorporation of new spacers and that tracrRNA/Cas9 is also required. This is in contrast to the type I-E CRISPR-Cas system of E. coli, where over-expression of Cas1 and Cas2 alone is sufficient for spacer acquisition10–13. It is important to note that the CRISPR array used in this assay consists of a single repeat, without pre-existing spacers (Extended Data Fig. 3). Therefore the Cas9 requirement is not a consequence of the phenomenon known as “primed” spacer acquisition. This refers to an increase in the frequency of spacer acquisition observed in type I CRISPR-Cas systems that relies on the presence of a pre-existing spacer with a partial match to the phage genome as well as the full targeting complex (Cascade)12,21,22.

Cas9 specifies the PAM sequence of newly acquired spacers

Given this newfound requirement in the CRISPR adaptation process and the well-established PAM recognition function of Cas9 during the surveillance and destruction of viral target sequences, we hypothesized that this nuclease could participate in the selection of PAM sequences during spacer acquisition. To test this we exchanged the cas9 genes of S. pyogenes (Sp) and S. thermophilus (St) CRISPR-Cas systems to create two chimeric CRISPR loci: _tracrRNA_Sp-_cas9_St-_cas1_Sp-_cas2_Sp-_csn2_Sp and _tracrRNA_St-_cas9_Sp-_cas1_St-_cas2_St-_csn2_St (Fig. 2a). We chose the type II-A CRISPR-Cas system of S. thermophilus (also known as CRISPR3; ref.23) because it is an ortholog of the S. pyogenes system24. While the PAM sequence for the Sp CRISPR-Cas system is NGG, the PAM sequence for the St system is NGGNG23 (Fig. 2b and Extended Data Table 1). We infected each naïve strain with phage ϕNM4γ4, sequenced the newly acquired spacers, and obtained the PAM of the matching protospacers using WebLogo25. We found that each chimeric system acquired spacers with PAMs that correlated with the cas9, but not the tracrRNA, cas1, cas2 or csn2, allele present (Fig. 2b and Extended Data Table 1). To rule out the possibility that non-functional spacers are negatively selected during phage infection, i.e. they are acquired randomly and only those cells containing spacers with a correct PAM for Cas9 cleavage provide immunity and allow cell survival, we sequenced the PAMs of spacers acquired in the absence of phage infection (Figs 1c and 2c). Either Cas9Sp or Cas9St was produced in cells overexpressing Cas1Sp, Cas2Sp and Csn2Sp. In this experiment, as explained above, spacers matching chromosomal or plasmid sequences will be acquired. The PCR products containing new spacers were cloned into a commercial vector from which they were sequenced (Extended Data Table 1). 
Expression of Cas9Sp led to the incorporation of spacers matching protospacers with an NGG PAM sequence, whereas the expression of Cas9St in the same cells shifted the composition of the PAM to NGGNG (Fig. 2d). These results demonstrate that Cas9 specifies PAM sequences to ensure the acquisition of functional spacers during CRISPR adaptation.
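The PAM of a newly acquired spacer is read from the nucleotides immediately 3' of its matching protospacer. The Python sketch below illustrates that lookup under simplifying assumptions that are not taken from the study: exact string matching, a linear target sequence, and use of the first match only. The function names and the toy sequences in the usage note are hypothetical.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def downstream_flank(genome: str, spacer: str, flank_len: int = 5):
    """Return the flank_len bases immediately 3' of the protospacer.

    Both strands are searched for an exact match to the spacer. Returns
    None if there is no match or the match lies too close to the end of
    the sequence to yield a full-length flank.
    """
    spacer = spacer.upper()
    for strand in (genome.upper(), reverse_complement(genome)):
        start = strand.find(spacer)
        if start != -1:
            end = start + len(spacer)
            flank = strand[end:end + flank_len]
            if len(flank) == flank_len:
                return flank
    return None


def position_counts(flanks):
    """Tally the nucleotides seen at each position of equal-length flanks."""
    length = len(flanks[0])
    return [Counter(flank[i] for flank in flanks) for i in range(length)]
```

For the toy target `AAACCCTTTGACTGGTGAACGT`, the spacer `CCCTTTGAC` matches the top strand and is followed by `TGGTG`, which fits an NGGNG pattern, whereas `GTTCACC` matches only the reverse strand and is followed by `AGTCA`. Per-position tallies from `position_counts` are the kind of input from which a sequence logo is drawn.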

Figure 2. Cas9 determines the PAM sequence of acquired spacers

a, c, Genetic composition of the CRISPR-Cas loci tested for spacer acquisition during phage infection (a), or in the absence of infection, with the experimental set up shown in Fig. S4 (c). b, d, Sequence logos obtained after the alignment of the 3’ flanking sequences of the protospacers matched by the newly acquired spacers in panels a and c, respectively. Numbers indicate the positions of the flanking nucleotides downstream from the spacer. n; number of sequences used in each alignment.

Cas9 associates with other Cas proteins involved in spacer acquisition

In type I CRISPR-Cas systems, Cas1 and Cas2 form a complex13 and the dsDNA nuclease activity of Cas1 has been implicated in the initial cleavage of the invading viral DNA to generate a new spacer26. The genetic analyses presented above suggest that in the type II S. pyogenes CRISPR-Cas system, the PAM-binding function of Cas9 observed in vitro7 could specify a PAM-adjacent site of cleavage for Cas1, or other members of the spacer acquisition machinery. This would guarantee that newly acquired spacers have the correct PAM needed for Cas9 activity later in this immune pathway. This hypothesis predicts an interaction between Cas9 and Cas1, Cas2 and/or Csn2. To test this we expressed the type II Cas operon in E. coli, using a histidyl tagged version of Cas9, and looked for other proteins that co-purify. We observed an abundant co-purifying protein with an apparent molecular weight close to 33 kDa, the expected size of Cas1 (Extended Data Fig. 4a). Mass spectrometry confirmed the identity of both of these proteins as well as the presence of Cas2 and Csn2 co-purifying with Cas9 (Extended Data Table 2). This result suggested the formation of a Cas9-Cas1-Cas2-Csn2 complex and therefore we explored other purification strategies to unequivocally determine its existence. We were able to isolate a Cas9-Cas1-Cas2-Csn2 complex when the histidyl tag was added to Csn2 (Fig. 3a and b). The identity of the purified proteins was confirmed by mass spectrometry (Extended Data Table 3). This demonstrates a biochemical link between the Cas9 nuclease and the other Cas proteins that function exclusively to acquire new spacers, supporting the role of Cas9 as a PAM specificity factor in the adaptation phase of CRISPR immunity.

Figure 3. Cas9Sp PAM recognition domain is required for the acquisition of spacers with an NGG PAM sequence

a, Separation of the Cas9-Cas1-Cas2-Csn2 complex by ion exchange chromatography. b, SDS-PAGE of fraction 19 (peak) from the complex elution shown in panel a, representative of five technical replicates. The four proteins of the complex were individually purified and run alongside the purified fraction to identify each protein in the complex. c, Spacer acquisition was tested as in Fig. 1c in the presence or absence of different Cas1 or Cas9 activities. Image is representative of eight technical replicates. dCas1, nuclease-dead Cas1 (E220A mutation); dCas9, nuclease-dead Cas9 (D10A, H840A mutations); Cas9PAM lacks the PAM recognition function (R1333Q, R1335Q mutations). d, Sequence logos obtained after the alignment of the 3’ flanking sequences of the protospacers matched by the newly acquired spacers in panel c. Numbers indicate the positions of the flanking nucleotides downstream from the spacer. n; number of sequences used in each alignment.

The PAM binding motif of Cas9 is required for PAM selection

Within this complex the PAM-binding domain of Cas9 would specify a functional spacer (one adjacent to a correct PAM) and the nuclease activity of Cas1 and/or Cas9 would cleave the invading DNA to extract the spacer sequence. To test this we performed adaptation studies in the absence of phage selection as described in Extended Data Fig. 3 but using different combinations of wild-type Cas1, Cas1E220A (catalytically dead or dCas1; ref.26), wild-type Cas9, Cas9D10A,H840A (catalytically dead or dCas9; ref.6) and Cas9R1333Q,R1335Q (Cas9PAM, containing mutations in the PAM-binding motif that substantially reduce binding to target DNA sequences with NGG PAMs in vitro; ref.15). We observed that the nuclease activity of Cas1 is necessary for spacer acquisition (Fig. 3c). In contrast, the nuclease activity and PAM-binding function of Cas9 are dispensable for this process. Next we determined the PAM of the acquired spacers in the presence of mutated Cas9 (Fig. 3d). We found that whereas spacers acquired in the presence of dCas9 displayed correct PAMs, those acquired in the presence of Cas9PAM matched DNA regions without a conserved flanking sequence, i.e. without a PAM sequence. The same result was obtained with St dCas9 (Extended Data Fig. 5). Altogether these results indicate that Cas1 and Cas9 are part of a complex dedicated to spacer acquisition which requires Cas1 nuclease activity and Cas9 PAM-binding properties for the selection of new spacer sequences.
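Whether a set of flanking sequences shows a conserved motif can be expressed numerically as information content per position, which is the quantity sequence logos conventionally plot. The Python sketch below is an illustration only and not the analysis used in the study: it computes 2 minus the Shannon entropy (in bits) at each position and omits the small-sample correction that logo software may apply. The function name and the input flanks in the usage note are hypothetical.

```python
import math
from collections import Counter


def information_content(flanks):
    """Return the information content (bits) at each aligned position.

    For DNA the maximum is log2(4) = 2 bits (fully conserved base) and
    the minimum is 0 bits (all four bases equally frequent). No
    small-sample correction is applied in this sketch.
    """
    length = len(flanks[0])
    bits = []
    for i in range(length):
        counts = Counter(flank[i] for flank in flanks)
        total = sum(counts.values())
        entropy = -sum(
            (count / total) * math.log2(count / total)
            for count in counts.values()
        )
        bits.append(2.0 - entropy)
    return bits
```

With the toy flanks `AGG`, `CGG`, `GGG` and `TGG`, the first position carries 0 bits and the two G positions carry 2 bits each, the pattern expected for an NGG PAM. A set of flanks with no conserved sequence would score close to 0 bits at every position.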

Discussion

The selection of new spacers with a correct PAM is fundamental for the survival of the infected host during CRISPR-Cas immunity. In the simplest scenario there is no active selection of PAM-flanked protospacers; any spacer sequence can be acquired but only those with the correct PAM allow Cas9 cleavage of the invader and survival. Bacteria that acquire spacers with ineffective flanking sequences are killed by the virus and as a consequence PAM-flanking spacers are enriched in the population. Here we show that even in the absence of phage selection, the type II CRISPR-Cas system acquires new spacers with correct PAMs, a result that rules out the possibility of random spacer selection with subsequent selection for functional spacers. How are PAM-flanked protospacers selected during type II CRISPR-Cas immunity? One possibility is that the proteins exclusively dedicated to spacer acquisition perform the PAM-selection function. The inability of cells over-expressing only cas1, cas2 and csn2 to expand the CRISPR array strongly suggests that none of the proteins encoded by these genes can recognize and select correct PAMs. Another possibility is that the known PAM recognition function of Cas9 (refs 15,27), essential for destroying the invading virus, could also be used during spacer acquisition to recognize PAM-flanking viral sequences. Experiments showing that the cas9 allele, but not the cas1-cas2-csn2 alleles, determines the PAM sequence of the newly acquired spacers demonstrate that this scenario is likely correct. How does Cas9 select new spacers with the correct PAMs? Our experiments demonstrate that Cas9 forms a stable complex with Cas1, Cas2 and Csn2 that presumably participates in the selection of new spacers. The nuclease activity of Cas1, but not of Cas9, is required for spacer acquisition. 
The tracrRNA is also required, suggesting that the apo-Cas9 structure (ref.27), very different from that of holo-Cas9 (ref.15), does not have the correct conformation to participate in spacer acquisition. The key residues involved in Cas9 PAM recognition are not required for spacer acquisition, but they are necessary for the incorporation of new spacers with the correct PAM sequence. This suggests that the reported non-specific DNA binding property of Cas9 (refs 6,7) is sufficient for spacer acquisition, but not for the selection of functional spacers. There are currently two models for the incorporation of new spacers into the CRISPR array, one where the future spacer sequence is cut from the invading viral DNA, the “cut and paste” model, and another where this sequence is copied from the viral genome, the “copy and paste” model28. In the context of the first model, our data suggest that, at a low frequency that may reflect the dynamics of spacer acquisition, Cas1 cleaves the invading genome to extract a new spacer sequence. However, on its own, Cas1 nuclease activity is non-specific26. Therefore we propose that through the formation of the Cas9-Cas1-Cas2-Csn2 complex, Cas9 binding to PAM-adjacent sequences provides specificity to Cas1 endonuclease activity. In the “copy and paste” model, Cas1 nuclease activity is most likely necessary for downstream events, such as the cleavage of the repeat sequence that precedes spacer insertion, and Cas9 is required to “mark” sequences adjacent to GG motifs to be copied into the CRISPR array. In any case, following as yet unknown processing and integration events, the selected DNA becomes a new functional spacer, i.e. its matching protospacer will have the correct PAM to license Cas9 cleavage (Extended Data Fig. 6). The molecular steps that take place after protospacer selection to incorporate it as a new spacer in the CRISPR array are still unknown. 
All genes of the type II-A CRISPR-Cas locus (tracrRNA, cas9, cas1, cas2 and csn2) are required for spacer acquisition, therefore most likely all the members of the Cas9-Cas1-Cas2-Csn2 complex participate in the process. Future work will address this and other aspects of the mechanisms of spacer integration in different CRISPR-Cas systems. The present work reveals a new function for Cas9 in CRISPR immunity. This nuclease is fundamental for both the execution of immunity, participating in the surveillance and destruction of infectious target viruses, and the generation of immunological memory, selecting the viral sequences that allow adaptation and resistance to viral predators.

Methods

Bacterial strains and growth conditions

Cultivation of S. aureus RN4220 (ref.18) was carried out in brain-heart infusion (BHI) or heart infusion (HI) media (BD) at 37 °C. Whenever applicable, media were supplemented with chloramphenicol at 10 µg/ml or erythromycin at 5 µg/ml to ensure pC194- (ref.16) and pE194-derived (ref.29) plasmid maintenance, respectively.

On-plate spacer acquisition assay

To detect individual adapted colonies on a plate, cells from overnight cultures were mixed with phage at MOI = 1 in top agar containing appropriate antibiotic and 5 mM CaCl2. The mixture was poured on BHI plates with antibiotic and incubated at 37°C overnight. Subsequently, colonies that survived phage infection were restreaked on fresh BHI plates in order to remove contaminating virus and dead cells. Plates were incubated at 37°C overnight. To check for spacer acquisition, individual colonies were resuspended in lysis buffer (250 mM KCl, 5 mM MgCl2, 50 mM Tris-HCl at pH 9.0, 0.5% Triton X-100), treated with 50 ng/µl lysostaphin and incubated at 37°C for 5 minutes, then 98°C for 5 minutes. Following centrifugation (13,200 rpm), a sample of the supernatant was used as template for TopTaq PCR amplification with primers L400 and H050. The PCR reactions were analyzed on 2% agarose gels (Fig. 1a).

In-liquid spacer acquisition assay

Overnight cultures launched from single colonies were diluted 1:1000 into a fresh 10-ml culture of BHI containing appropriate antibiotic and 5 mM CaCl2. When the cultures reached OD600 of 0.4, depending on the experiment, they were either infected with phage at MOI = 1 (Fig. 1b) or induced with 1 µg/ml anhydrotetracycline (Fig. 1c). After 16 hours, plasmids carrying the CRISPR systems were extracted using a slightly modified QIAprep Spin Miniprep Kit protocol: the pelleted bacterial cells were resuspended in 250 µl buffer P1 containing 50 ng/µl lysostaphin and incubated at 37°C for 1 h, followed by the standard QIAprep protocol. 100 ng of plasmid DNA was used to amplify the CRISPR locus using Phusion DNA Polymerase (New England Biolabs) with the following primer mix: 3 parts JW8 and 1 part each of JW3, JW4 and JW5. The following cycling conditions were used: (1) 98 °C for 30 s; (2, 30 times) 98 °C for 10 s, 64 °C for 20 s, 72 °C for 10 s; (3) 72 °C for 5 min. The PCR reactions were analyzed on 2% agarose gels. To sequence individual spacers, the adapted bands were extracted, gel-purified and cloned via Zero Blunt TOPO PCR Cloning Kit (Invitrogen). CRISPR loci of individual clones were checked for expansion of the arrays by PCR using the primers listed above and sent for sequencing.

Phage Adsorption Assay

The phage adsorption assay was performed as described previously30 with minor modifications. Cells were grown in BHI and 10 mM CaCl2 to an OD600 of 0.4. The phage solution was prepared at 10^6 pfu/ml and 100 µl of this was added to 900 µl of cells. The mixture was incubated for 10 min at 37°C to allow adsorption of the phage to the cellular membrane. The mixture was centrifuged for 1 min at 13,600 rpm and the number of phage particles left in the supernatant was determined by phage titer assay.
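The arithmetic behind such an assay can be sketched in Python. The text does not give the formula used to express adsorption, so the conventional calculation (the fraction of input phage no longer free in the supernatant) is assumed here; the function names and the residual titer in the usage note are hypothetical.

```python
def input_titer_in_mix(stock_pfu_per_ml: float, phage_ul: float, cells_ul: float) -> float:
    """Titer of phage in the adsorption mixture after dilution into the cells."""
    return stock_pfu_per_ml * phage_ul / (phage_ul + cells_ul)


def percent_adsorbed(input_pfu_per_ml: float, free_pfu_per_ml: float) -> float:
    """Percentage of input phage removed from the supernatant (assumed formula)."""
    return 100.0 * (1.0 - free_pfu_per_ml / input_pfu_per_ml)


# Volumes and stock titer as stated in the assay description above.
mix_titer = input_titer_in_mix(1e6, phage_ul=100, cells_ul=900)
```

Adding 100 µl of phage to 900 µl of cells dilutes the stock tenfold, so `mix_titer` corresponds to 10^5 pfu/ml. If, as a made-up example, 2 x 10^4 pfu/ml remained free in the supernatant, `percent_adsorbed(mix_titer, 2e4)` would report about 80 % adsorption.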

Phage titer assay

Serial dilutions of the phage stock were prepared in triplicate and spotted on fresh top agar lawns of RN4220 in HI agar supplemented with the appropriate antibiotic and 5 mM CaCl2. Plates were incubated at 37°C overnight (Fig. S2).
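A titer is usually back-calculated from plaque counts at a countable dilution. The Python sketch below shows that conversion under stated assumptions: the spotted volume, the dilution and the plaque counts in the usage note are hypothetical, since the text does not report them, and `titer_pfu_per_ml` is an illustrative name.

```python
from statistics import mean


def titer_pfu_per_ml(plaque_counts, dilution: float, spotted_volume_ml: float) -> float:
    """Estimate the stock titer from replicate plaque counts.

    plaque_counts: plaques counted in each replicate spot
    dilution: fraction of the stock in the spotted sample (e.g. 1e-5)
    spotted_volume_ml: volume of each spot in millilitres
    """
    return mean(plaque_counts) / (dilution * spotted_volume_ml)
```

For instance, triplicate counts of 11, 12 and 13 plaques from 5 µl spots of a 10^-5 dilution would give `titer_pfu_per_ml([11, 12, 13], 1e-5, 0.005)`, or about 2.4 x 10^8 pfu/ml.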

High-throughput sequencing

Plasmid DNA was extracted from adapted cultures using the in-liquid spacer acquisition assay described above. 100 ng of plasmid DNA was used as template for Phusion PCR to amplify the CRISPR locus with primers H182 and H183. Following gel extraction and purification of the adapted bands, samples were subject to Illumina MiSeq Sequencing.
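Reads from expanded CRISPR loci contain new spacers bracketed by repeat sequences. The Python sketch below shows one simple way to pull them out; it is an assumption about the analysis, not the pipeline used in the study. It relies on exact repeat matches, and the short repeat in the usage note is a made-up stand-in for the real CRISPR repeat.

```python
def extract_spacers(read: str, repeat: str):
    """Return the sequences found between consecutive copies of the repeat.

    Only exact, non-overlapping repeat matches are considered; sequence
    before the first repeat or after the last one is ignored.
    """
    spacers = []
    start = read.find(repeat)
    while start != -1:
        spacer_start = start + len(repeat)
        next_repeat = read.find(repeat, spacer_start)
        if next_repeat == -1:
            break
        spacers.append(read[spacer_start:next_repeat])
        start = next_repeat
    return spacers
```

With the placeholder repeat `GTTTAC`, the read `GTTTACAAAAGTTTACCCCCGTTTAC` yields the two spacers `AAAA` and `CCCC`. Real data would need tolerance for sequencing errors in the repeat, which this sketch leaves out.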

Plasmid construction

Construction of pWJ40 was described elsewhere17. For the construction of pC194- and pE194-derived plasmids, cloning was performed using chemically competent S. aureus cells, as described previously17. The Δ_cas1_ (pRH059), Δ_cas2_ (pRH061) and Δ_csn2_ (pRH063) mutants were constructed by one-piece Gibson assembly31 from pWJ40 using the pairs of primers H016-H017, H018-H019, H020-H021, respectively. Plasmid pRH087 containing the wild type cas genes of S. pyogenes was obtained by inserting the first spacer of S. pyogenes (annealed primers H049 and H050 containing compatible BsaI overhangs) in pDB184 using BsaI cloning32. BsaI cloning was also used to construct pRH079 and pRH233 by inserting a φNM4γ4 targeting spacer (annealed primers H029 and H030) into pDB114 and pDB184, respectively. Plasmid pRH200 harbors the wild type CRISPR3 system from S. thermophilus LMD-9 amplified with H168 and H169 from genomic DNA. The fragment was inserted on pE194 via Gibson assembly using H166 and H167. pRH213 was constructed by replacing Cas9Sp on pRH087 with Cas9St from pRH200 using the primer pairs H232-H233 and H231-H234, respectively. pRH214 was constructed by replacing Cas9St on pRH200 with Cas9Sp from pRH087 using the primer pairs H227-H230 and H228-H229, respectively. pGG32 was created by reducing the CRISPR locus of pWJ40 to a single repeat. This was accomplished by 'round the horn PCR33 using primers oGG82 and oGG83, followed by blunt ligation. pRH228 was constructed by replacing Cas9Sp on pGG32 with Cas9St from pRH200 using the primer pairs H232-H233 and H231-H234, respectively. pRH223 was constructed as a three-piece Gibson assembly combining TetR+ptet from pKL55-iTet (primers B534 and B616), pE194 (primers B532 and B617) and the cas1, cas2, csn2 genes and the array from pGG32 (primers H176-H177). pRH231 was constructed from pGG32 by one piece Gibson assembly with primers H289-H290. 
pRH234 contains Cas1 E220A and was constructed via one-piece Gibson assembly from pRH223 using the primer pair H312-H313. pRH227 was constructed from pGG32 via two sequential single-piece Gibson assemblies: first, D10A was introduced with B337-B338 and second, H840A was introduced with B339-B340. pRH229 was constructed via one-piece Gibson assembly from pGG32 using the primer pair H276-H277. Plasmids pRH240, pRH241, pRH242, pRH243 and pRH244 were constructed by one-piece Gibson assembly with primers H237-H238 from pGG32, pRH228, pRH227, pRH229 and pRH231, respectively. pRH245 was constructed from pRH241 via two sequential single-piece Gibson assemblies: first, D10A was introduced with H336-H337 and second, H847A was introduced with H338-H339.

Isolation and sequencing of φNM4γ4

For the initial isolation of φNM4, supernatants from overnight cultures of S. aureus Newman were filtered and used to infect soft agar lawns of TB4::φNM1,2 double lysogens19. A single plaque was picked and then plaque-purified in two additional rounds of infection using TB4 soft agar lawns, and subsequently used to lysogenize TB4. For the resultant lysogen, specific primers were used to verify the presence of φNM4 and the absence of φNM1,2 by colony PCR. High titer lysates of φNM4 (~10^11 p.f.u. ml^−1) were then prepared from this lineage and used for infection of TB4/pGG9 soft agar lawns harboring spacer 2B (ref.17). An escaper plaque was picked and then plaque-purified in two additional rounds of infection using TB4/pGG9 soft agar lawns. The resultant φNM4γ4 phage exhibited a clear plaque phenotype and was used to prepare a high titer lysate from which DNA was purified, deep sequenced, and assembled as described previously17. The full sequence of ϕNM4γ4 has been deposited in GenBank under accession number KP209285, and includes a 2784 bp deletion encompassing the C-terminal 80% of the φNM4 _cI_-like repressor gene.

Protein purification

Cas9

pMJ806 (WT Cas9) plasmid was obtained from Addgene. Both the proteins were purified as described before6 with minor modifications as follows. The proteins were expressed in E. coli BL21 Rosetta 2(DE3) codon plus cells (EMD Millipore). Cultures (2 L) were grown at 37°C in Terrific Broth medium containing 50 µg/ml kanamycin and 34 µg/ml chloramphenicol until the _A_600 reached 0.6. The cultures were supplemented with 0.2 mM isopropyl-1-thio-β-d-galactopyranoside and incubation was continued for 16 h at 16 °C with constant shaking. The cells were harvested by centrifugation and the pellets stored at −80 °C. All subsequent steps were performed at 4 °C. Thawed bacteria were resuspended in 30 ml of buffer A (50 mM Tris–HCl pH 7.5, 500 mM NaCl, 200 mM Li2SO4, 10% sucrose, 15 mM Imidazole) supplemented with complete EDTA free protease inhibitor tablet (Roche). Triton X-100 and lysozyme were added to final concentrations of 0.1 % and 0.1 mg/ml, respectively. After 30 min, the lysate was sonicated to reduce viscosity. Insoluble material was removed by centrifugation for 1 hr at 15,000 rpm in a Beckman JA-3050 rotor. The soluble extract was mixed in batch for 1 hr with 5 ml of Ni2+-Nitrilotriacetic acid-agarose resin (Qiagen) that had been pre-equilibrated with buffer A. The resin was recovered by centrifugation, and then washed extensively with buffer A. The bound protein was eluted step-wise with aliquots of IMAC buffer (50 mM Tris-HCl pH 7.5, 250 mM NaCl, 10% glycerol) containing increasing concentrations of imidazole. The 200 mM imidazole eluates containing the His6-MBP tagged Cas9 polypeptide were pooled together. The His6-MBP affinity tag was removed by cleavage with TEV protease during overnight dialysis against 20 mM Tris-HCl pH 7.5, 150 mM KCl, 1 mM TCEP and 10% glycerol. The tagless Cas9 protein was separated from the fusion tag by using a 5 ml SP Sepharose HiTrap column (GE Life Sciences). 
The protein was further purified by size exclusion chromatography using a Superdex 200 10/300 GL in 20 mM Tris HCl pH 7.5, 150 mM KCl, 1 mM TCEP, and 5% glycerol. The elution peak from the size exclusion was aliquoted, frozen and kept at −80 °C.

Cas1

Plasmid pKW01 (Cas1-WT) was constructed by PCR amplification of cas1 from pWJ40 with primers PS192 and PS193 and cloning of the product into pET28b-His10Smt3. Full sequencing of the cloned DNA fragment confirmed a perfect match to the original sequence. The pKW01 plasmid was transformed into E. coli BL21 (DE3) Rosetta 2 cells (EMD Millipore). Cultures were grown and protein was purified by Ni-affinity chromatography as described above for Cas9. The 200 mM imidazole eluates containing the His10-Smt3-tagged Cas1 polypeptide were pooled. The His10-Smt3 affinity tag was removed by cleavage with SUMO protease during overnight dialysis against 50 mM Tris-HCl pH 7.5, 250 mM NaCl, 20 mM and 10% glycerol. The tagless Cas1 protein was separated from the fusion tag by a second Ni-NTA affinity step. The protein was further purified by size exclusion chromatography using a Superdex 200 10/300 GL column in 20 mM Tris-HCl pH 7.5, 500 mM KCl, 1 mM TCEP, and 5% glycerol. The elution peak from the size exclusion step was aliquoted, frozen and kept at −80 °C.

Cas2

The sequence encoding Cas2 was PCR amplified from pWJ40 with primers PS334 and PS335 and inserted into a pET-His6 MBP TEV cloning vector (Addgene Plasmid # 29656) using ligation-independent cloning (LIC). Sequencing of the resultant plasmid (pPS059) confirmed a match to the wild-type sequence. The protein was expressed and purified following the same procedure as for Cas9.

Csn2

Plasmid pPS060 was constructed by PCR amplification of csn2 from pWJ40 with primers PS336 and PS337 and cloning of the product into pET28b-His10Smt3. Full sequencing of the cloned DNA fragment confirmed a perfect match to the original sequence. Csn2 was expressed and purified following the same method as for Cas1. Protein concentrations for all purifications were determined using the Bradford dye reagent with BSA as the standard. Csn2 was previously shown to form a tetramer34.

Cas9-Cas1-Cas2-Csn2 complex

pKW07 (His10-Cas9-Cas1-Cas2-Csn2) was constructed by amplification of pWJ40 with primers PS199/PS202 and pET16b (Novagen) with primers PS200/PS203, followed by Gibson assembly of the fragments. Plasmid pPS061 (His10-Cas9-Cas1) was created by amplification of pWJ40 with primers PS202/PS355 and pET16b (Novagen) with primers PS203/PS354, followed by Gibson assembly of the fragments. Full sequencing of the cloned DNA fragments confirmed perfect matches to the original sequence. The proteins were expressed in E. coli BL21 Rosetta 2(DE3) codon plus cells (EMD Millipore). Cultures were grown and protein was purified by Ni-affinity chromatography as described above for Cas9, with minor modifications. The 200 mM imidazole eluates were dialyzed overnight against 20 mM Tris-HCl pH 7.5, 150 mM KCl, 1 mM TCEP and 10% glycerol and subjected to mass spectrometry for the identification of the co-purifying proteins.

pKW06 (Cas9-Cas1-Cas2-Csn2-His6) was constructed by amplification of pWJ40 with primers PS204/PS205 and pET23a (Novagen) with primers PS206/PS207, followed by Gibson assembly of the fragments. Full sequencing of the cloned DNA fragment confirmed a perfect match to the original sequence. The proteins were expressed in E. coli BL21 Rosetta 2(DE3) codon plus cells (EMD Millipore). Cultures were grown and protein was purified by Ni-affinity chromatography as described above for Cas9, with minor modifications. The 200 mM imidazole eluates were dialyzed overnight against 20 mM Tris-HCl pH 7.5, 150 mM KCl, 1 mM TCEP and 10% glycerol. The proteins were further purified using a 5 ml SP Sepharose HiTrap column (GE Life Sciences), eluting with a linear gradient of 150 mM to 1 M KCl.

Extended Data

Extended Data Figure 1

The S. pyogenes type II CRISPR-Cas system displays a strong bias for the acquisition of spacers matching viral protospacers with NGG PAMs

a, Analysis of bacteriophage-insensitive mutant colonies using PCR and agarose gel electrophoresis, representative of five technical replicates. Bacteria and phage were mixed in top agar and incubated overnight. DNA was isolated from individual colonies resistant to phage infection and used as template for a PCR reaction with primers (arrows) H182 and H183 (Extended Data Table 2), which amplify the 5’ end of the S. pyogenes CRISPR array. The size of the PCR band indicates the number of new spacers (shown at the top of the gel). Cells without additional spacers resist infection by a CRISPR-independent mechanism, presumably envelope resistance. b, Analysis of spacers acquired during phage infection of a population of bacteria carrying the S. pyogenes type II CRISPR-Cas system. Liquid cultures of bacteria were infected with phage, surviving cells were collected at the end of the infection, and DNA was extracted and used as template for a PCR reaction as described above. Amplification products were separated by agarose gel electrophoresis, and the DNA of the bands corresponding to products with additional spacers was extracted and sent for MiSeq next-generation sequencing. Reads corresponding to newly acquired spacers were plotted according to their position in the phage ϕNM4γ4 genome (x-axis) and their abundance (y-axis). Each dot represents a unique spacer sequence; blue and red dots indicate a corresponding protospacer with an NGG or non-NGG PAM, respectively. Top and bottom plots indicate protospacers in the top and bottom strands of the ϕNM4γ4 DNA. The map and the functions of the phage genes are indicated between the plots. The raw data used to make this graph are in the Supplementary file. c, Weblogo showing the conservation of the 5’ flanking sequences of 10,000 protospacers randomly selected from the experiment shown in b. Absolute conservation of the NGG PAM was observed.
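The NGG versus non-NGG classification described for panel b amounts to locating each spacer in the phage genome on either strand and inspecting the adjacent nucleotides. The following Python sketch illustrates one way this could be done; it is not the authors' analysis pipeline. The function names and the toy genome string are hypothetical, and the flank is taken immediately downstream of the protospacer, as stated in the Abstract. The PAM strings used in the usage note are copied from the spacer table in the Extended Data.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (lower-case output)."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a", "n": "n"}
    return "".join(comp[base] for base in reversed(seq.lower()))


def has_pam(flank: str, pam: str = "NGG") -> bool:
    """True if the flank starts with the PAM; 'N' in the PAM matches any base."""
    flank, pam = flank.upper(), pam.upper()
    if len(flank) < len(pam):
        return False
    return all(p == "N" or p == f for p, f in zip(pam, flank))


def downstream_flank(genome: str, spacer: str, n: int = 5):
    """Find the first match of the spacer on the top or bottom strand.

    Returns (strand, flank), where flank is up to n nucleotides immediately
    downstream of the match on that strand, or None if there is no match.
    """
    genome, spacer = genome.lower(), spacer.lower()
    for strand, seq in (("top", genome), ("bottom", revcomp(genome))):
        start = seq.find(spacer)
        if start != -1:
            end = start + len(spacer)
            return strand, seq[end:end + n]
    return None
```

For example, `has_pam("aGGgt")` is true and `has_pam("caaaa")` is false, while `has_pam("tGGgG", "NGGNG")` is true and `has_pam("aGGgt", "NGGNG")` is false. With the toy genome `"catgaaactgttggaa"` and spacer `"gaaactgt"`, `downstream_flank` returns `("top", "tggaa")`.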

Extended Data Figure 2

a, Analysis of bacteriophage-resistant mutants that do not acquire a new spacer. Three colonies that survived phage infection in our in-plate adaptation assay (Fig. S1a) were subjected to a phage adsorption assay. Briefly, surviving colonies as well as the wild-type S. aureus RN4220 control were grown in liquid and mixed with bacteriophage. After a brief incubation, cells were pelleted by centrifugation and the phages remaining in the supernatant (unable to bind and infect cells) were counted on a lawn of sensitive cells. The number of plaque-forming units (pfu) in a control experiment without host cells was used to determine the 100% free-phage, or 0% adsorption, value. No plaques were observed in the control experiment using wild-type cells, and this value was used to set the 100% adsorption limit. The three CRISPR-independent, bacteriophage-resistant mutants displayed a marked defect in phage adsorption (about 50%), indicating that they most likely carry envelope resistance mutations. Error bars: mean ± s.d. (n=3). b, cas1, cas2 and csn2 are not required for the execution of immunity using previously acquired spacers. Position within the phage ϕNM4γ4 genome of the type II CRISPR-Cas target used in this experiment. The protospacer sequence is in the bottom strand (shown in 3’–5’ direction) and flanked by a TGG PAM (in green). c, Comparison of immunity provided by a type II CRISPR-Cas system programmed to target the sequence shown in panel a in the presence (wild-type, wt) or absence (Δcas1, Δcas2, Δcsn2) of cas1, cas2 and csn2. Immunity is measured as the plaque-forming units (pfu) of a φNM4γ4 phage lysate spotted on top agar lawns of S. aureus RN4220 cells containing no CRISPR system (−), a wild-type S. pyogenes CRISPR-Cas type II system (wt, pRH233), or the same CRISPR-Cas system with a deletion of the cas1, cas2 and csn2 genes (Δcas1, Δcas2, Δcsn2, pRH079). Error bars: mean ± s.d. (n=3).
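The normalization described in panel a can be written as a one-line calculation: the free-phage count from the no-cell control defines 0% adsorption, and a supernatant with no plaques corresponds to 100% adsorption. The Python sketch below is illustrative only. The function names and all plaque counts are hypothetical and are not taken from the paper's data, and the use of the sample standard deviation for the error bars is an assumption.

```python
import statistics


def percent_adsorption(free_pfu: float, no_cell_control_pfu: float) -> float:
    """Percent of phage adsorbed, relative to a control without host cells.

    free_pfu: plaques counted from the supernatant after pelleting the cells.
    no_cell_control_pfu: plaques from the same phage input with no cells
    (defines 100% free phage, i.e. 0% adsorption).
    """
    if no_cell_control_pfu <= 0:
        raise ValueError("control count must be positive")
    return 100.0 * (1.0 - free_pfu / no_cell_control_pfu)


def mean_sd(values):
    """Mean and sample standard deviation of replicate measurements."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical counts for three replicates of one resistant mutant,
# each paired with a no-cell control count of 200 plaques.
replicates = [percent_adsorption(pfu, 200) for pfu in (104, 100, 96)]
mean, sd = mean_sd(replicates)
```

With these made-up counts, a supernatant giving 0 plaques maps to 100% adsorption and one giving 100 plaques against a control of 200 maps to 50% adsorption.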

Extended Data Figure 3

Generation of an experimental system for the overexpression of cas1, cas2 and csn2 and the detection of spacer acquisition in the absence of phage infection

a, Plasmids used in the spacer acquisition experiments presented in Fig. 1c and Fig. 2c–d. pRH223 contains cas1, cas2 and csn2 from S. pyogenes under a tetracycline-inducible promoter. Cells containing this plasmid acquired spacers only when a second plasmid expressing cas9 was introduced: pRH240 or pRH241, which contain the tracrRNA gene, the leader and first repeat from the S. pyogenes type II CRISPR-Cas system, and cas9 from S. pyogenes (cas9Sp) or S. thermophilus (cas9St), respectively. The leader is a short, AT-rich sequence immediately upstream of the first repeat that contains the promoter for the transcription of the CRISPR array. b, Highly sensitive PCR assay to enrich for amplification products of adapted CRISPR loci. Arrows indicate primer annealing position and direction. The forward primer (JW8) anneals on the leader. For the reverse primer, a cocktail of JW3, JW4 and JW5 was used. The three reverse primers anneal on the repeat and differ only in their 3’-end nucleotide, which never matches the last nucleotide of the leader (red arrowhead). Because this nucleotide is critical for the annealing of the primers, loci that acquire spacers ending in A, C or T are preferentially amplified over unadapted loci. c, To quantify the sensitivity of this technique, we mixed pGG32 (one repeat, unadapted) with pRH087 (repeat-spacer-repeat, adapted) in known ratios. Amplification of the adapted plasmid was detected even when it represented 0.01% (10^−4) of the total plasmid template, representative of three technical replicates. This highly sensitive PCR assay is not required to detect acquisition during phage infection, as in this case adapted cells survive and are enriched within the population, making their detection much easier.
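The primer logic in panel b can be checked directly against the JW3, JW4 and JW5 sequences listed in the oligonucleotide table. The Python sketch below is an illustration rather than part of the published methods, and its function names are hypothetical. It confirms that the three reverse primers are identical except for their 3’-terminal base, and it lists the template bases those 3’ ends can pair with under simple Watson-Crick complementarity.

```python
# Reverse primer sequences as listed in the oligonucleotide table (5' to 3').
REVERSE_PRIMERS = {
    "JW3": "aaaacagcatagctctaaaacg",
    "JW4": "aaaacagcatagctctaaaaca",
    "JW5": "aaaacagcatagctctaaaact",
}

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def differ_only_at_3prime(primers: dict) -> bool:
    """True if all primers share the same sequence up to the last base
    and carry distinct 3'-terminal bases."""
    seqs = list(primers.values())
    same_body = len({s[:-1] for s in seqs}) == 1
    distinct_ends = len({s[-1] for s in seqs}) == len(seqs)
    return same_body and distinct_ends


def paired_template_bases(primers: dict) -> list:
    """Template bases that can pair with the primers' 3'-terminal bases."""
    return sorted({COMPLEMENT[s[-1]] for s in primers.values()})


def excluded_template_bases(primers: dict) -> list:
    """Template bases with no matching primer 3' end in the cocktail."""
    return sorted(set(COMPLEMENT) - set(paired_template_bases(primers)))
```

Here `paired_template_bases(REVERSE_PRIMERS)` evaluates to `['a', 'c', 't']`, in agreement with the legend's statement that loci with spacers ending in A, C or T are preferentially amplified, and `excluded_template_bases(REVERSE_PRIMERS)` evaluates to `['g']`.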

Extended Data Figure 4

Purification of Cas9-Cas1-Cas2-Csn2 complexes

a, The cas9-cas1-cas2-csn2 operon of S. pyogenes SF370 was cloned into the pET16b vector (generating pKW07) to add an N-terminal histidyl tag to Cas9 and express all proteins in E. coli. Purification was performed using Ni-NTA affinity chromatography. SDS-PAGE followed by Coomassie staining of the purified proteins revealed a co-purifying protein that was identified as Cas1 by mass spectrometry, representative of five technical replicates. Mass spectrometry identification of all the eluted proteins co-purifying with Cas9 is shown in Extended Data Table 2. b, The cas9-cas1-cas2-csn2 operon of S. pyogenes SF370 was cloned into the pET23a vector (generating pKW06) to add a C-terminal histidyl tag to Csn2 and express all proteins in E. coli. Purification was performed using Ni-NTA affinity chromatography followed by ion exchange chromatography. The elution fractions that constituted the peak containing the complex (Fig. 3a) were separated by SDS-PAGE and visualized by Coomassie staining, representative of three technical replicates.

Extended Data Figure 5

dCas9St can also support spacer acquisition

A plasmid derived from pRH241 containing mutations in the active site of St Cas9 (D10A, H847A; dCas9St) was used to characterize spacer acquisition in the absence of phage infection. Upon overexpression of Cas1, Cas2 and Csn2 using anhydrotetracycline (aTc), we were able to detect spacer acquisition. Sequencing of spacers and alignment of the protospacer flanking sequences demonstrated the selection of an NGGNG PAM. Image is representative of three technical replicates.

Extended Data Figure 6

A model for the selection of PAM-flanking spacers by Cas9

After injection of the phage DNA, an adaptation complex formed by Cas9, Cas1, Cas2 and Csn2 uses the Cas9 PAM-binding domain to specify functional protospacers, i.e., those that are followed by the correct PAM. It is not known how the protospacer sequence is extracted from the viral DNA to become a spacer. In the “cut and paste” model, a nuclease, possibly Cas1, cuts the viral DNA to generate the spacer. In the “copy and paste” model, the protospacer sequence is copied first. Once loaded with the selected protospacer sequence, the complex promotes its integration into the CRISPR array, where it becomes a new spacer. Previous studies demonstrated that Cas1 dimerizes and interacts with Cas2 (ref. 13); Csn2 has been determined to form a tetramer34.

Extended Data Table 1

Primer Sequence
B337 gacgctatttgtgccgatagctaagcctattgagtatttc
B338 gaaatactcaataggcttagctatcggcacaaatagcgtc
B339 ggaaactttgtggaacaatggcatcgacatcataatcact
B340 agtgattatgatgtcgatgccattgttccacaaagtttcc
B532 ctttttccgtgatggtaactgttcatatttatcagagctcgtg
B534 gagctctgataaatatgaacagttaccatcacggaaaaaggttatg
B616 ttattttaattatgctctatcaa
B617 gagtgatcgttaaatttatactgc
H016 aggaggtgactgatgggagttcctgaatttaggatatgag
H017 taaattcaggaactcccatcagtcacctcctagctgactc
H018 ttaggatatgagtgaggcttttgatgaatcttaatttttc
H019 ttcatcaaaagcctcactcatatcctaaattcaggaactc
H020 tttgatgaatcttaataaaaatatggtataatactcttaa
H021 ttataccatatttttattaagattcatcaaaagcctcccc
H029 aaacaaaaatgttttaacacctattaacgtagtatg
H030 aaaacatactacgttaataggtgttaaaacattttt
H049 aaactgcgctggttgatttcttcttgcgctttttg
H050 aaaacaaaaagcgcaagaagaaatcaaccagcgca
H166 gaaatgtgagaagggacctctgataaatatgaacatgatgagtgatcg
H167 ggactcttttatctctactcgtgctataattatactaattttataaggagg
H168 agtataattatagcacgagtagagataaaagagtcctttggatgattcc
H169 tgttcatatttatcagaggtcccttctcacatttcaatactagactc
H176 ttgatagagcataattaaaataagatgccactcttatccatcaatcc
H177 gcagtataaatttaacgatcactctaaaacctctccaactacctccc
H182 nnnnncagcaaaattttttagacaaaaatagtc
H183 nnnnncagaagaagaaatcaaccagcgc
H227 taatggcaggttggagaacagtagtc
H228 actactgttctccaacctgccattagtcacctcctagctgactc
H229 agatttttcaaataaggagaaatgtttgaaatcatcaaactcattatggatttaatttaaactttttattttagg
H230 acatttctccttatttgaaaaatctaaatttatagaaattattatacgc
H231 aactttttattttaggaggcaaaaagcgtataataatttctataaatttagatttttcaaataagg
H232 ttttgcctcctaaaataaaaagtttaaattaaatccataatgag
H233 tgatggctggttggcgtac
H234 caacagtacgccaaccagccatcaaccctctcctagtttggc
H237 ggcgtactgatgaagattatttcttaataactaaaaatatgg
H238 tttagttattaagaaataatcttcatcagtacgccaaccagcc
H276 ttgatcaaaaacaatatacgtctacaaaagaag
H277 tagacgtatattgtttttgatcaattgttgtatcaa
H289 agcgcttgggagaaattcaaagaaatttatcagcc
H290 tttctttgaatttctcccaagcgctttcaaaacgc
H312 gatattatggcaccatttaggcctttagtgg
H313 aaaggcctaaatggtgccataatatcgctagc
H336 catactcaattggacttgctattggaacgaatagtgttgg
H337 cgttccaatagcaagtccaattgagtatggcttagtc
H338 gtaattatgatattgatgctattattcctcaagc
H339 gaggaataatagcatcaatatcataattacttaatc
JW3 aaaacagcatagctctaaaacg
JW4 aaaacagcatagctctaaaaca
JW5 aaaacagcatagctctaaaact
JW8 ggcttttcaagactgaagtctag
L400 cgaaattttttagacaaaaatagtc
oGG82 aacattgccgatgataacttgag
oGG83 gttttgggaccattcaaaacagcatagctctaaaacctcgtag
PS192 CGCGGATCCATGGCTGGTTGGCGTACTGTTGTGG
PS193 CGCCTCGAGTCATATCCTAAATTCAGGAACTCC
PS199 CGAGCATATGACGACCTTCGATATGATCGGCAATGTTGAATGGAGACCATTC
PS200 GAATGGTCTCCATTCAACATTGCCGATCATATCGAAGGTCGTCATATGCTCG
PS202 CATCATCATCATCATCACAGCAGCGGCATGGATAAGAAATACTCAATAGG
PS203 CCTATTGAGTATTTCTTATCCATGCCGCTGCTGTGATGATGATGATGATG
PS204 CGACAAGCTTGCGGCCGCACTCGAGCTTTTTATTTTAGGAGGCAAAAATG
PS205 GGATCTCAGTGGTGGTGGTGGTGGTGTACCATATTTTTAGTTATTAAGAAATAATC
PS206 GATTATTTCTTAATAACTAAAAATATGGTACACCACCACCACCACCACTGAGATCC
PS207 CATTTTTGCCTCCTAAAATAAAAAGCTCGAGTGCGGCCGCAAGCTTGTCG
PS284 GCTAGCGATATTATGGcACCATTTAGGCCTTTAG
PS285 CTAAAGGCCTAAATGGTgCCATAATATCGCTAGC
PS334 TACTTCCAATCCAATGCAATGAGCTATCGCTATATG
PS335 TTATCCACTTCCAATGTTATTATTAGCTTTCATCAAAGGC
PS336 CGCGGATCCATGAACCTGAACTTTAGCCTGCTGG
PS337 CGCCTCGAGTTACACCATATTTTTGGTAATCAG
PS354 GTTCCTGAATTTAGGATATGAAACATTGCCGATCATATCGAAGG
PS355 CCTTCGATATGATCGGCAATGTTTCATATCCTAAATTCAGGAAC

Extended Data Table 2

Mass spectrometry analysis of proteins purified through Ni-NTA shown in Extended Data Fig. 4a.

Figure Spacer Sequence PAM Target
Fig. 2b, 1st logo
1 gcaacaatgggaaccaagctatgttgatag aGGgt phage
2 gagaacaaaaccatcctacccggtaataaa tGGta phage
3 aatagagatactttatctaacatgatacac gGGag phage
4 ccattttagatttcaaaagtttagtatctat aGGca phage
5 agtattggaatctgatgaatattcatctct cGGta phage
6 agaaaatttatacattgattattcaccaac aGGca phage
7 acatactccaaacaattgatggatttgtgt aGGtg phage
8 gctaagactgtgaagcataatactgctact aGGta phage
9 ttttaagctattcattttaaaaggtcatat gGGca phage
10 acttatgccgtttctatacttcactacagca tGGtc phage
11 atgaatggattgaagagaacacagacgaac aGGac phage
12 ccacaaatagaaatagagctagggagtttaa cGGta phage
13 attagttactccacaaatagaaatagagct aGGga phage
14 ggagtaactaatatctgaattgttatcagt tGGtt phage
15 tagttttttgagtatgcttactttttcttg tGGtt phage
16 tgaacgaattgtcagtatgtacagattaat aGGaa phage
17 cattacggacgtagtagaagcaattagaaa tGGaa phage
18 tggatatgacgaccaagatttagcgtttta aGGtg phage
19 cgacataacgctaatacatgtttgtcatag tGGtt phage
20 acaaacttaacaatagtggttttttcaaga gGGag phage
Fig. 2b, 2nd logo
1 agagtacaatattgtcctcattggagacac tGGgG phage
2 tgtttgggaaaccgcagtagccatgattaa gGGtG phage
3 ctcatattcgttagttgcttttgtcataaa aGGtG phage
4 tttatgtctatatactcaaagtaatcatttt cGGaG phage
5 taatatcaacggtatgtgggtgtctggtga cGGtG phage
6 aataagtctaaaaaaccaacgtttaatgat tGGgG phage
7 gttgatattacgttcatagaacatacctga tGGtG phage
8 tcaatgtttggtacaagttggtcacagata tGGaG phage
9 ttagttactccacaaatagaaatagagcta gGGaG phage
10 caattgtttttcttggaaatcatatttata cGGcG phage
11 tatctaagtttgccaattattacattaaagc tGGtG phage
12 taggacatagagatgaaaaaacgactataa aGGtG phage
13 tgaagaaatgattcaagaaacacaaaagag tGGcG phage
14 tcggactgttagggtacgcgaagggcaaaa aGGaG phage
15 aatactttcttctaaaaaacctaagtcaac aGGaG phage
16 taatccaattacaacattaaaaattaatga cGGaG phage
17 acaatgttaagcaaccagcacattacacata cGGcG phage
18 ggattttaaaataaaagtaaatgttgatac tGGcG phage
19 caggcaatgttattttatcggattttaaaaa cGGcG phage
20 agaatctttattattagctgacttacaaga aGGtG phage
21 aaaaccccaatatcttttaaaaataaagtt aGGtG phage
22 tagggcaatgattgaagaatttgatgataa cGGaG phage
Fig. 2b, 3rd logo
1 aaaggcaacatatttgaatcatcacatttat tGGaG phage
2 ttggaatggaattaaacaataaaactttta tGGaG phage
3 atattcatcagattccaatactacgttaat aGGtG phage
4 acaatttaaaaattagaaatgtaaatgtag aGGtG phage
5 cagaatgaactatgaaacaggggtccaact aGGtG phage
6 acataacatcaaaaccctttctgaagaaat tGGtG phage
7 taagttgtttgaaatgtacgagatggaagg aGGaG phage
8 atacgtgtaaagacatattagatcgagtca aGGaG phage
9 tgtgcaggagctacgttcaataaatgtgaa aGGaG phage
10 ttaagaaagttattgtcatcgagcttaaat tGGtG phage
11 acacacatactaaacctgaacgattaagga gGGgG phage
12 tttaccaacatccttagttgatagattttt aGGcG phage
13 gtttgaatacgttccgtttctgatacccagt aGGcG phage
14 aagttaaaaagaatttaaagtcaagaagta tGGgG phage
15 attctcagaagatagcgaagatgggagaaa aGGaG phage
16 tgagcgactgctgggtgtgcttcgaatagtt tGGcG phage
17 taatatatgctcatacttaattgaattgtc tGGtG phage
18 atcttcttttttaatacgtccatcaacaag cGGtG phage
19 cgatattggcggtgtgaataataactttaa aGGaG phage
20 caacgagctggcaacaacataagatgacag aGGcG phage
Fig. 2b, 4th logo
1 taaactactacgacttaagcaggtgccata tGGca phage
2 gacaaatgctattcaacattcagttaaaga aGGta phage
3 acaattattaattgaacaagcgcaagctaa cGGct phage
4 cacatcaattagtaagacgccaaaagtaac aGGta phage
5 aaacgatgagtacacaaaatacaaaatcta cGGca phage
6 gtaataatatttttaataacctcaacatct tGGtc phage
7 tcatgaaaaagtgaattgctagtagtgtgt tGGtc phage
8 tacgctatcgcaaaagcagtcaaagctaaa gGGca phage
9 agggaatcttacagttattaaataactatt tGGat phage
10 aaaacgagcaaattaagtggtacgtagaca aGGgt phage
11 ctaaatgttgccatttcgttatctcctttc tGGta phage
12 actggatgacattgaacaaagcaccgaata tGGcc phage
13 taaatatttgataacaacattatacacgaa aGGag phage
14 cacatcaattagtaagacgccaaaagtaac aGGta phage
15 aaggtgatgacggcgaatggtacacaacata tGGtc phage
16 taacgacggtacttattccgtcgttgctac tGGtg phage
17 ataaataaaaaagttactactcacacacta aGGca phage
18 tctaggttcgaactcttctttaaatttaat aGGca phage
19 ctcatcaatatcattctgattggttatttt gGGat phage
20 tctctttgataaataactttatccacataa aGGtg phage
21 ttagacttttactttccattacttaaatca tGGtc phage
22 aatttgttcttgcgcttcaatagtgatagt aGGgt phage
23 ataagtctaaaaaaccaacgtttaatgatt gGGga phage
Fig. 2d, 1st logo
1 acatgttatgcatatcgtaagtgaagtcac aGGta chromosome
2 agatcaaattgtaacaactaatcctattgc aGGta chromosome
3 gtttcagcaatatatctcttagtgcatcac cGGtt chromosome
4 tacaatgtaggctgctctacacctagcttc tGGgc pRH223
5 tttgattacaatggcacatgtacttatgcc tGGat chromosome
6 catttgtcttagcacatgaattaggtcatgc aGGtc chromosome
7 catgattgcacccattgttgcacctagtac aGGtt chromosome
8 taccaataacttaagggtaactagcctcgc cGGca chromosome
9 gagtatgtttgcgcgtgaagtggttgtgtc tGGat pRH223
10 agaatggttagatttatggcgtgatgtaac gGGca chromosome
11 ctgcttccatgataactggaccatcagcaac cGGat chromosome
12 gattgaagctacaatacctgatgttgctgc gGGaa chromosome
13 cgaaatacttggctaagcacgacgaggcct tGGtg pRH223/240
14 gctctacacctagcttctgggcgagtttac gGGtt pRH223
15 atatggaagttacattttttggaacgagtgc aGGtt chromosome
16 ttatgaagcgttacgtcaacaagattttcc aGGat chromosome
17 ttatcgaagtatacgagttcacagaagaac aGGct chromosome
18 ccagttcttgttgttttggtgctttagtca aGGtt chromosome
19 tggatgatcttgtctttcatgtgtacctgt tGGaa chromosome
20 caggatttagttttcctagcggtcatgcta tGGga chromosome
Fig. 2d, 2nd logo
1 ttgactatcaaatgtctttttcaatgtttc gGGtG chromosome
2 atccgttctgcagaagagattgtttcttgc aGGcG pRH223
3 tgaacatttcgattatgtattaatgagtgc tGGtG chromosome
4 catctttaggacgaatgccagcacgttctgc tGGaG chromosome
5 caccatgttaaaaatacctccatcatcacc aGGaG pRH223/240
6 tcgtgagacagttcggtccctatccgtcgt gGGcG chromosome
7 tttgcgcagtcggcttaaaccagttttcgc tGGtG pRH223
8 aaagaagtcataagtaccatgacttgagtt tGGtG chromosome
9 ctaatttttcttcttcaacaccatctatggc tGGcG chromosome
10 ccaagtattcaaagttggaacgggtggtct aGGtG chromosome
11 atccgttctgcagaagagattgtttcttgc aGGcG pRH223
12 tttgcgcagtcggcttaaaccagttttcgc tGGtG pRH223
13 aacgcgtatacatagcaagcgttctcatgt tGGaG chromosome
14 agtttgggagtcaattatcggctttttaac tGGcG chromosome
Fig. 3d, 1st logo
1 tgacttctctgaagagccatctttttgcact tGGaa chromosome
2 ggtcagatgcaattcgacatgtggacggac tGGtt pRH223
3 atcttttctagcttttctccaagcacagac aGGac chromosome
4 gttggtctaattgtttcaatagttccacct tGGtc chromosome
5 tgccggttggggtggctgagacggcaccct aGGaa chromosome
6 tgagtatgtttgcgcgtgaagtggttgtgtc tGGat pRH223
7 ttgagttagaaaacggtcgtaaacggatgc tGGct pRH242
8 agtttgggagtcaattatcggctttttaac tGGcg chromosome
9 aattaagaaatcttctaaccaactgattgc tGGaa chromosome
10 aacagaaagaataggaaggtatccgactgc tGGta pRH223/240
11 tggtattgtaggcgttattttaggtattcc gGGat chromosome
12 aaatctcagcaggacaagctggtacaggtgc tGGtt chromosome
13 ctcaagagatttggagcatccaatcaatgc aGGtc pRH223
14 ctaaggtggcaccacggtaacgcgtccttac aGGta chromosome
15 tgattaaacttaaaaatgtattacctagtgc aGGta chromosome
16 atttgagtcagctaggaggtgactgatggc tGGtt pRH223
17 ataagagaagatgctagacgtataagttcac tGGtc chromosome
18 acgttttatctgtatttgcgacaatcgttg gGGta chromosome
19 ataacatacgccgagttatcacataaaagc gGGaa pRH223
20 gcattttaaacaaaaaaagatagacagcac tGGca pRH223
21 aatcccagttagaacaaacgctaaaatggc gGGcc chromosome
22 taccaataacttaagggtaactagcctcgc cGGca chromosome
Fig. 3d, 2nd logo
1 gaagtctagctgagacaaatagtgcgatta caaaa pRH223/240
2 agcatagctctaaaacctcgtagactattt ttgtc pRH223/240
3 aaattttttagacaaaaatagtctacgag gtttt pRH223/240
4 aagtcgaacttcataatcatcgctttcgg catat chromosome
5 ccaatttctacagacaatgcaagttggggt gtggg chromosome
6 gttatttctgaaatgcccgttacatcacgc cataa pRH243
7 tgtttgccctccaaatatgaaaacatggcc cggta chromosome
8 atgagatgaggcgataaaagaacgtcgcta aaacg chromosome
9 tactacttcaaggaattctatagaacctac tatat chromosome
10 gtaccacagtgccacatgttggcaattggc gagac chromosome
11 taaagctggtgaagcgattaacactgtacc aagta chromosome
12 atttcttcgttattagaaatataaaattgc gttgt chromosome
13 attttttatgattaagccatatggggttaa gcaag pRH223/240
14 aaagactgggatccaaaaaaatatggtggt tttga pRH243
15 attttcaaatgcataaaaactgtttctcaac gatat chromosome
16 ttttgtattggaatggcattttttgctatc aaggt chromosome
17 taaaacaggaccacttgtcatgtaagcttt aagtt pRH223/240
18 tgtatcttgtggtttcatctgtgctaacttt ggcag chromosome
19 agaggatgcagaacgtgcaatcttagctgc aagac chromosome
20 ttcaaacgagaataattatggcgttggttta ggtat chromosome
21 gcgaatacactcattaaaacaattgcatcc tgatt chromosome
22 tcttatcttgataataagggtaactattgc cgatg chromosome
23 caaataaaggtgcgttattaataacagtgc caggc chromosome
24 acaacagtacgccaaccagccatcagtcac ctcct pRH223
Fig. S6
1 cagctaacaatgccatgattggtcggctga gGGaG pRH223
2 tggtaaatttacagaagatgctgaagatgc tGGtG chromosome
3 gagtcagctaggaggtgactgatggctggt tGGcG pRH223
4 caaataagtctagacatattagctcgttatc aGGtG chromosome
5 acgaccttgttgcaacatagcgccccactc tGGtG chromosome
6 acatgttatgcatatcgtaagtgaagtcac aGGtG chromosome
7 atccgttctgcagaagagattgtttcttgc aGGcG pRH223
8 agatgcttgttgtgttgtttgtgttgatgc cGGtG chromosome
9 atccgttctgcagaagagattgtttcttgc aGGcG pRH223
10 agtttgggagtcaattatcggctttttaac tGGcG chromosome
11 aaaaagttatctcgtagacattacactggc tGGgG pRH245

Extended Data Table 3

Mass spectrometry analysis of protein bands from the purified Cas9-Cas1-Cas2-Csn2 complex shown in Extended Data Fig. 4b.

Accession Protein % Coverage Unique peptides Total peak area
Cas9 83.26 170 9.7×10^10
Cas1 91.35 40 1.9×10^10
Cas2 84.07 13 1.9×10^9
Csn2 91.82 18 2.9×10^9
P77398 Bifunctional polymyxin resistance protein ArnA (arnA) 85.76 43 8.2×10^8
P60422 50S ribosomal protein L2 (rplB) 67.40 24 1.9×10^9
P17169 Glucosamine--fructose-6-phosphate aminotransferase (glmS) 79.31 38 1.8×10^8
P0AA43 Ribosomal small subunit pseudouridine synthase A (rsuA) 85.71 17 8.9×10^8
P0A9K9 FKBP-type peptidyl-prolyl cis-trans isomerase (slyD) 68.88 7 3.7×10^9
P0ACJ8 Catabolite gene activator (crp) 82.86 18 5.4×10^8
P45395 Arabinose 5-phosphate isomerase (kdsD) 73.17 21 1.2×10^8
P0A6F5 60 kDa chaperonin (groL) 83.94 38 2.8×10^8
P0A9A9 Ferric uptake regulation protein (fur) 78.38 8 1.2×10^9
P08622 Chaperone protein DnaJ (dnaJ) 72.07 19 1.4×10^9
P00393 NADH dehydrogenase (ndh) 59.22 16 3.6×10^8

Extended Data Table 4

Oligonucleotides used in this study.

Protein % Coverage Unique peptides Total peak area
Cas1 67.82 26 3.4×10^8
Cas2 90.27 13 1.2×10^9
Cas9 68.49 111 4.1×10^8
Csn2 82.27 19 4.1×10^8

Acknowledgements

We thank members of the lab for critical discussion of the experiments and their results, Alexey Zaytsev for help with the deep sequencing data analysis and Andrew Sherlock for help with plasmid construction. R.H. is the recipient of a Howard Hughes International Student Research Fellowship. P.S. is supported by a Helmsley Postdoctoral Fellowship for Basic and Translational Research on Disorders of the Digestive System at The Rockefeller University. J.W.M. is a Fellow of The Jane Coffin Childs Memorial Fund for Medical Research. D.B. is supported by a Harvey L. Karp Discovery Award and the Bettencourt Schuller Foundation. L.A.M. is supported by the Rita Allen Scholars Program, an Irma T. Hirschl Award, a Sinsheimer Foundation Award and an NIH Director’s New Innovator Award (1DP2AI104556-01).

Footnotes

Author Contributions. RH, PS, DB and LAM conceived the study and designed experiments. RH and PS executed the experimental work with help from CW. JWM set up the experimental system to detect spacer acquisition in the absence of phage infection. GWG isolated and characterized phage ϕNM4γ4 and constructed the pGG32 plasmid. DB analyzed MiSeq data. LAM wrote the paper with the help of the rest of the authors.

The authors have no conflicting financial interests.

References

1. Barrangou R, Marraffini LA. CRISPR-Cas systems: prokaryotes upgrade to adaptive immunity. Mol. Cell. 2014;54:234–244. [PMC free article] [PubMed] [Google Scholar]

2. Barrangou R, et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007;315:1709–1712. [PubMed] [Google Scholar]

3. Deltcheva E, et al. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature. 2011;471:602–607. [PMC free article] [PubMed] [Google Scholar]

4. Makarova KS, et al. Evolution and classification of the CRISPR-Cas systems. Nat. Rev. Microbiol. 2011;9:467–477. [PMC free article] [PubMed] [Google Scholar]

5. Garneau JE, et al. The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature. 2010;468:67–71. [PubMed] [Google Scholar]

6. Jinek M, et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 2012;337:816–821. [PMC free article] [PubMed] [Google Scholar]

7. Sternberg SH, Redding S, Jinek M, Greene EC, Doudna JA. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature. 2014;507:62–67. [PMC free article] [PubMed] [Google Scholar]

8. Gasiunas G, Barrangou R, Horvath P, Siksnys V. Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc. Natl. Acad. Sci. U.S.A. 2012;109:E2579–E2586. [PMC free article] [PubMed] [Google Scholar]

9. Szczelkun MD, et al. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes. Proc. Natl. Acad. Sci. U.S.A. 2014;111:9798–9803. [PMC free article] [PubMed] [Google Scholar]

10. Diez-Villasenor C, Guzman NM, Almendros C, Garcia-Martinez J, Mojica FJ. CRISPR-spacer integration reporter plasmids reveal distinct genuine acquisition specificities among CRISPR-Cas I-E variants of Escherichia coli. RNA Biol. 2013;10:792–802. [PMC free article] [PubMed] [Google Scholar]

11. Yosef I, Goren MG, Qimron U. Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli. Nucleic Acids Res. 2012;40:5569–5576. [PMC free article] [PubMed] [Google Scholar]

12. Datsenko KA, et al. Molecular memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system. Nat. Commun. 2012;3:945. [PubMed] [Google Scholar]

13. Nunez JK, et al. Cas1-Cas2 complex formation mediates spacer acquisition during CRISPR-Cas adaptive immunity. Nat. Struct. Mol. Biol. 2014;21:528–534. [PMC free article] [PubMed] [Google Scholar]

14. Jiang W, Bikard D, Cox D, Zhang F, Marraffini LA. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat. Biotechnol. 2013;31:233–239. [PMC free article] [PubMed] [Google Scholar]

15. Anders C, Niewoehner O, Duerst A, Jinek M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature. 2014;513:569–573. [PMC free article] [PubMed] [Google Scholar]

16. Horinouchi S, Weisblum B. Nucleotide sequence and functional map of pC194, a plasmid that specifies inducible chloramphenicol resistance. J. Bacteriol. 1982;150:815–825. [PMC free article] [PubMed] [Google Scholar]

17. Goldberg GW, Jiang W, Bikard D, Marraffini LA. Conditional tolerance of temperate phages via transcription-dependent CRISPR-Cas targeting. Nature. 2014;514:633–637. [PMC free article] [PubMed] [Google Scholar]

18. Kreiswirth BN, et al. The toxic shock syndrome exotoxin structural gene is not detectably transmitted by a prophage. Nature. 1983;305:709–712. [PubMed] [Google Scholar]

19. Bae T, Baba T, Hiramatsu K, Schneewind O. Prophages of Staphylococcus aureus Newman and their contribution to virulence. Mol. Microbiol. 2006;62:1035–1047. [PubMed] [Google Scholar]

20. Sapranauskas R, et al. The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli. Nucleic Acids Res. 2011;39:9275–9282. [PMC free article] [PubMed] [Google Scholar]

21. Li M, Wang R, Zhao D, Xiang H. Adaptation of the Haloarcula hispanica CRISPR-Cas system to a purified virus strictly requires a priming process. Nucleic Acids Res. 2014;42:2483–2492. [PMC free article] [PubMed] [Google Scholar]

22. Richter C, et al. Priming in the Type I-F CRISPR-Cas system triggers strand-independent spacer acquisition, bi-directionally from the primed protospacer. Nucleic Acids Res. 2014;42:8516–8526. [PMC free article] [PubMed] [Google Scholar]

23. Horvath P, et al. Diversity, activity, and evolution of CRISPR loci in Streptococcus thermophilus. J. Bacteriol. 2008;190:1401–1412. [PMC free article] [PubMed] [Google Scholar]

24. Fonfara I, et al. Phylogeny of Cas9 determines functional exchangeability of dual-RNA and Cas9 among orthologous type II CRISPR-Cas systems. Nucleic Acids Res. 2014;42:2577–2590.

25. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190.

26. Wiedenheft B, et al. Structural basis for DNase activity of a conserved protein implicated in CRISPR-mediated genome defense. Structure. 2009;17:904–912.

27. Jinek M, et al. Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science. 2014;343:1247997.

28. Arslan Z, Hermanns V, Wurm R, Wagner R, Pul U. Detection and characterization of spacer integration intermediates in type I-E CRISPR-Cas system. Nucleic Acids Res. 2014;42:7884–7893.

Methods references

29. Horinouchi S, Weisblum B. Nucleotide sequence and functional map of pE194, a plasmid that specifies inducible resistance to macrolide, lincosamide, and streptogramin type B antibodies. J. Bacteriol. 1982;150:804–814.

30. Duplessis M, Moineau S. Identification of a genetic determinant responsible for host specificity in Streptococcus thermophilus bacteriophages. Mol. Microbiol. 2001;41:325–336.

31. Gibson DG, et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods. 2009;6:343–345.

32. Bikard D, et al. Exploiting CRISPR-Cas nucleases to produce sequence-specific antimicrobials. Nat. Biotechnol. 2014;32:1146–1150.

33. Moore SD, Prevelige PE., Jr A P22 scaffold protein mutation increases the robustness of head assembly in the presence of excess portal protein. J. Virol. 2002;76:10245–10255.

34. Arslan Z, et al. Double-strand DNA end-binding and sliding of the toroidal CRISPR-associated protein Csn2. Nucleic Acids Res. 2013;41:6347–6359.