Evolutionary Genomics of Defense Systems in Archaea and Bacteria

Author manuscript; available in PMC: 2018 Apr 13.

Abstract

Evolution of bacteria and archaea involves an incessant arms race against an enormous diversity of genetic parasites. Accordingly, a substantial fraction of the genes in most bacteria and archaea are dedicated to antiparasite defense. The functions of these defense systems follow several distinct strategies, including innate immunity; adaptive immunity; and dormancy induction, or programmed cell death. Recent comparative genomic studies taking advantage of the expanding database of microbial genomes and metagenomes, combined with direct experiments, resulted in the discovery of several previously unknown defense systems, including innate immunity centered on Argonaute proteins, bacteriophage exclusion, and new types of CRISPR-Cas systems of adaptive immunity. Some general principles of function and evolution of defense systems are starting to crystallize, in particular, extensive gain and loss of defense genes during the evolution of prokaryotes; formation of genomic defense islands; evolutionary connections between mobile genetic elements and defense, whereby genes of mobile elements are repeatedly recruited for defense functions; the partially selfish and addictive behavior of the defense systems; and coupling between immunity and dormancy induction/programmed cell death.

Keywords: antivirus defense, mobile genetic elements, innate immunity, adaptive immunity, dormancy, programmed cell death, restriction-modification, toxins-antitoxins, CRISPR-Cas

INTRODUCTION

It is well established that the most abundant biological entities in the biosphere are viruses, in particular, tailed bacteriophages. In most environments, the ratio of the number of virus particles to the number of cells is between 10 and 100 (140, 154, 155). Furthermore, genomes of most cellular life-forms, including the compact genomes of bacteria and archaea, contain numerous integrated mobile genetic elements (MGEs), such as transposons and proviruses, that make up the vast prokaryotic mobilome (47, 164). Strikingly, transposases are the most common genes detected in metagenomes (6). This enormous abundance is paralleled by the remarkable genetic diversity of viruses and other mobile elements that could be the principal reservoir of new genes on earth. Both theory and observation indicate that persistence of MGEs of different kinds is virtually inevitable throughout the evolution of life (68, 78, 160). Indeed, viruses and/or mobile elements have been found in association with virtually every cellular organism that has been studied in sufficient detail, with the possible exception of some intracellular parasitic and symbiotic bacteria. Thus, the entire history of life is the story of host-parasite coevolution, in which a key element is the perennial arms race (45, 85, 151). Driven by the arms race with parasites, cellular organisms have evolved extremely elaborate and diverse defense systems that function via several distinct strategies, and the great majority of organisms deploy more than one such strategy.

Based on their principles of action, the defense mechanisms of bacteria and archaea can be classified into three broad groups: (a) resistance based on variation of virus receptors, (b) immunity, and (c) dormancy induction and programmed cell death (101, 109). Resistance involves programmed mechanisms of receptor change, such as phase variation, and in some cases, physical masking of receptors, such that virus binding and penetration are precluded (25, 65, 94). The immunity systems’ function depends on the ability to distinguish genomes of invaders (nonself) from the host (self) genome and protect the latter while inactivating the former. The immune systems themselves are naturally divided into the relatively nonspecific innate immunity and the highly specific adaptive (acquired) immunity. The best-characterized innate immunity systems are the extremely numerous and diverse restriction-modification (RM) modules that employ methylation to label and thus protect the self genomic DNA while cleaving any unmodified nonself DNA (89, 126, 175). A distinct variant of RM is DNA phosphorothioation (known as the DND system), which labels self DNA by phosphorothioation instead of methylation and destroys unmodified, nonself DNA (38). Recently, two additional bacterial and archaeal innate immunity systems were characterized. One, bacteriophage exclusion, can be considered yet another major variation on the RM theme, in which, however, the target foreign DNA is not degraded (8, 52). The second newly discovered type of innate immunity, centered on the prokaryotic Argonaute proteins, generates guide RNA or DNA molecules from the invader genomes and utilizes them to recognize and destroy the targets (63, 158).

Unlike the innate immunity systems, which attack nonself invaders (more or less) indiscriminately, the adaptive immunity systems, represented by CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated genes), memorize the encounters with infectious agents by incorporating pieces of foreign genetic information into the host genome and attack invaders specifically upon new encounters using the cognate guide RNAs (7, 72, 105, 166, 174).

The second major type of defense system functions through induction of dormancy or programmed cell death in response to infection. Numerous and enormously diverse toxin-antitoxin (TA) systems belong in this category. In this type of defense, infection disrupts the balance between toxin and antitoxin, resulting in activation of the former and abrogation of central cellular functions, such as translation, hence precluding virus reproduction (50, 94, 108). In addition to the TA systems, the functionally similar abortive infection (ABI) systems often employ the mechanism of programmed cell death or dormancy (28).

Recent comparative genomic studies taking advantage of the rapidly growing amount and diversity of genomic and metagenomic sequences, and in some cases focused specifically on the search for new defense systems, have substantially expanded the range of identified defense functions. In addition, previously unknown connections between different types of defense systems have been revealed, and several general trends in the evolution of bacterial and archaeal defense have been elucidated.

In this review article, we present a current snapshot of the diversity of bacterial and archaeal immune and dormancy induction systems, highlight new discoveries, and discuss emerging generalizations on the evolution of antivirus defense in prokaryotes.

A COMPREHENSIVE CENSUS OF DEFENSE SYSTEMS IN BACTERIAL AND ARCHAEAL GENOMES

The fraction of bacterial and archaeal genes that encode defense systems varies broadly across microbial diversity, from complete absence, in intracellular parasitic bacteria with the smallest known genomes, to about 10% (Figure 1). All types of defense systems show bell-shaped distributions that are at first approximation compatible with random scatter and suggest that there are no distinct classes of defense-rich and defense-poor microbes. Compared to our previous similar census (109), the distributions have shifted to the left owing to the discovery of numerous genomes with few identifiable defense systems, namely, those of the so-called candidate phyla radiation (CPR) of bacteria and of novel phyla of archaea with small genomes (17, 62). Clearly, these distributions represent the lower bound for each type of defense system because, given the characteristic rapid evolution of defense genes, many additional variants most likely remain to be identified; moreover, completely new defense strategies are expected to be discovered, adding to the overall count.

Figure 1.

Abundance of the major types of defense systems in bacterial and archaeal genomes. Smoothed probability density for the distributions across 4,961 complete genomes of bacteria and archaea is shown. The number of genes in each category was calculated as described earlier (109). Abbreviations: ABI, abortive infection; CRISPR, clustered regularly interspaced short palindromic repeats; RM, restriction modification; TA, toxin-antitoxin.

The overall occurrence of defense systems shows nearly perfect linear scaling with the total number of genes in microbial genomes (109). However, this linear scaling results from the canceling out of significantly different trends for distinct types of defense. The number of TA genes scales superlinearly, as a power of ∼1.3 of the total number of genes, and ABI system genes take an approximately constant fraction of the genome (∼1 per 1,000 genes), whereas RM and cas genes scale sublinearly (exponents of 0.65 and 0.73, respectively) with the genome size (Figure 2). The superlinear scaling of TA genes has been observed previously (109) and remains consistent. In contrast, our previous analysis failed to show significant scaling with genome size for CRISPR-Cas systems (exponent indistinguishable from zero). The change in the observed trend, again, can be linked to the discovery of many small bacterial genomes, most of which conspicuously lack CRISPR-Cas systems (17, 18). The differential scaling of defense systems with genome size implies that the most informative approach for comparative analysis requires comparing the observed occurrence of defense genes relative to the expected occurrence given the genome size in the respective microbes. The biological underpinnings of the class-specific scaling factors for defense systems remain difficult to infer; some of these trends might have to do with additional, nondefense functions of defense genes (172).
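The power law fits described in this paragraph can be reproduced in outline with an ordinary least-squares regression in log-log space, followed by a comparison of observed and expected counts. The sketch below is illustrative only: the function names (`fit_power_law`, `log_excess`) and the synthetic gene counts are assumptions made for this example and do not reproduce the procedure of the earlier census (109). Genomes with zero defense genes are simply dropped here because their logarithm is undefined.

```python
import math

def fit_power_law(total_genes, defense_genes):
    """Fit defense = a * total**b by least squares on log-transformed data.

    Returns (a, b). Pairs with a non-positive value are skipped.
    """
    points = [(math.log(x), math.log(y))
              for x, y in zip(total_genes, defense_genes)
              if x > 0 and y > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx                       # scaling exponent
    a = math.exp(mean_y - b * mean_x)   # prefactor
    return a, b

def log_excess(observed, total, a, b):
    """Natural log of observed / expected defense genes for one genome."""
    expected = a * total ** b
    return math.log(observed / expected)

# Synthetic data (hypothetical): counts generated with an exponent of 1.3,
# mimicking the superlinear trend reported for TA genes.
sizes = [500, 1000, 2000, 4000, 8000]
ta_counts = [0.002 * g ** 1.3 for g in sizes]
a, b = fit_power_law(sizes, ta_counts)

# A genome carrying twice the expected number of TA genes.
excess = log_excess(2 * 0.002 * 1000 ** 1.3, 1000, a, b)
```

A positive value of `log_excess` indicates more defense genes than expected for a genome of that size, and a negative value indicates a deficit.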

Figure 2.

Scaling of the major types of defense systems with the total number of genes. Tan data points represent the total abundance of the respective defense systems in individual genomes; lines show the power law scaling relative to the genome size. The number of genes in each category and the power law scaling parameters were calculated as described earlier (109). Abbreviations: ABI, abortive infection; CRISPR, clustered regularly interspaced short palindromic repeats; RM, restriction modification; TA, toxin-antitoxin.

In our previous survey of prokaryotic defense systems, principal component analysis led to the delineation of four seemingly discrete classes that apparently corresponded to distinct defense strategies (109). This distinct structure disappeared from the current analysis of the substantially expanded genome collection, indicating that there is neither strong association nor antagonism between the representation of different classes of defense systems in bacterial and archaeal genomes (Figure 3). Nevertheless, the observed smoothing of the distribution does not imply absence of significant biases in the representation of defense systems in particular bacterial or archaeal taxa, or in microbes with a specific lifestyle. The main trends reported previously remain in place, namely, the pronounced enrichment of defense systems in archaea compared to bacteria and in thermophiles (especially hyperthermophiles) compared to mesophiles and psychrophiles (109). These two trends, the connections of defense system representation with domain-level taxonomy and growth temperature, seem to be independent of each other. Temperature dependence is particularly dramatic in the case of the CRISPR-Cas systems that are virtually ubiquitous among hyperthermophiles but are only found in about one-third of mesophiles (105, 106).

Figure 3.

Principal component analysis of bacterial and archaeal defense systems (abortive infection, CRISPR, restriction modification, toxin-antitoxin). Excess or deficit (in natural log scale) of the five groups of defense systems relative to the expectations, derived from the genome size, was projected into the space of the first two principal components, as described previously (109).

At the next level of granularity, significant enrichment or depletion of defense systems (relative to the expectation derived from the genome size and the trends in Figure 2) is noticeable in several bacterial and archaeal phyla (Table 1). Observations that stand out include the substantial enrichment of all classes of defense systems in Chlorobi, dramatic enrichment of CRISPR-Cas in Thermotogae in contrast to a paucity of other defense mechanisms, and major enrichment of CRISPR-Cas in Crenarchaeota. In contrast, the apparent depletion of all classes of defense systems in Thaumarchaeota implies the existence of unrecognized defense machineries awaiting discovery; the same most likely applies to the CPR bacteria (not included in Table 1 because only a few genomes are currently complete).

Table 1.

Over- and underrepresentation of defense systems in bacterial and archaeal phyla

Phylum                      ABI   CRISPR       RM       TA  Unknown
Archaea
Crenarchaeota            −0.640    1.106   −0.413    0.459   −0.900
Euryarchaeota            −0.286    0.343    0.103   −0.069   −0.185
Thaumarchaeota           −0.815   −0.944   −0.319   −1.643   −0.975
Bacteria
Acidobacteria            −0.038   −0.678   −0.385    0.326   −0.387
Actinobacteria           −0.249   −0.179   −0.286   −0.394   −0.103
Aquificae                −1.025    0.604   −0.719   −0.192   −0.078
Bacteroidetes             0.165   −0.449   −0.086   −0.642   −0.056
Chlamydiae               −0.079   −0.802   −1.563   −1.093    0.174
Chlorobi                  0.745    0.696    0.348    0.978    0.488
Chloroflexi              −0.464    0.298    0.307   −0.094   −0.065
Cyanobacteria             0.114   −0.150    0.730   −0.169   −0.172
Deinococcus-Thermus      −0.821    0.192   −0.193    0.026   −0.147
Firmicutes                0.333   −0.233   −0.349   −0.336   −0.144
Fusobacteria             −0.046    0.083    0.048    0.122    0.350
Nitrospirae              −0.593    0.160    0.398    0.785   −0.030
Peregrinibacteria        −0.982   −0.778   −0.562   −0.326   −2.405
Planctomycetes            0.125    0.105    0.247   −0.923   −0.193
Proteobacteria           −0.783   −0.682   −0.440   −0.309   −0.132
Spirochaetes             −0.161   −0.345   −0.193   −0.204    0.133
Tenericutes              −0.201   −1.313    0.352   −1.219   −0.209
Thermotogae              −0.105    0.839   −0.596   −0.564   −0.722
Verrucomicrobia          −0.261   −0.632   −0.241   −0.830   −0.341
Unclassified             −0.189   −0.604    0.080   −0.132   −0.319

INNATE IMMUNITY

Restriction Modification

RM systems are the first type of bacterial defense against foreign DNA that were discovered and studied in great detail, thanks primarily to the enormous utility of restriction endonucleases (REases) (89, 126, 132, 175).

The latest classification structure divides RM systems into four types (I–IV) on the basis of the subunit composition, ATP (GTP) requirement, and cleavage mechanism (132, 138, 139). Type II RM systems are the simplest and by far the most common and are mostly used for experimental applications because they cleave target DNA at highly specific sites. Subtypes of the type II systems are primarily based on cleavage specificity (138). The type II systems consist solely of the methyltransferase-REase pair that is typically encoded within the same operon, although cases of apparent disjointed localization of the two genes have been reported (40). The most complex, ATP-dependent type I RM systems encompass three genes that encode the R (restriction), M (modification), and S (specificity) subunits of the RMS complex; the R subunit, in addition to REase, also contains a distinct ATPase domain that belongs to helicase superfamily II (15, 99, 163). Type III RM systems resemble type II systems in that they consist of only R and M subunits, but they are similar to type I systems in that the R subunit also contains the helicase domain and the reaction is ATP dependent (19, 136). Type IV RM systems are distinct two-subunit complexes that consist of a AAA+ family GTPase and an endonuclease (15, 100). Their mode of action is fundamentally different from the other three types in that they nonspecifically cleave modified phage DNA containing 5-hydroxymethylcytosine or 5-hydroxymethyluracil. In fact, type IV systems are best denoted R, because modification enzymes in this case are not parts of the defense machinery. Such base modifications protect phage DNA from cleavage by the conventional REases of types I, II, and III, and apparently type IV R systems have evolved on multiple occasions during the bacterial-phage arms race, as a counter-counterdefense mechanism that overcomes such protection (15, 100).
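The self versus nonself logic of a type II RM pair can be illustrated with a toy model in which the methyltransferase marks every recognition site in the host DNA and the REase cuts only unmarked sites. This is a deliberately simplified sketch: it treats DNA as a single-stranded string, uses the well-known EcoRI site (GAATTC, cut between G and A) purely as a familiar example, and all function names and sequences are hypothetical.

```python
SITE = "GAATTC"   # EcoRI recognition site, chosen only as a familiar example
CUT_OFFSET = 1    # EcoRI cuts between G and A on the strand shown

def find_sites(dna, site=SITE):
    """Return the start position of every occurrence of the recognition site."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions

def methylate(dna, site=SITE):
    """Modification step: mark every recognition site as methylated (self)."""
    return set(find_sites(dna, site))

def restrict(dna, methylated_positions, site=SITE, cut_offset=CUT_OFFSET):
    """Restriction step: cut at each unmethylated site, return the fragments."""
    cuts = [p + cut_offset for p in find_sites(dna, site)
            if p not in methylated_positions]
    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments

# Hypothetical toy sequences.
host_genome = "AAGAATTCTTGGAATTCAA"
phage_genome = "CCGAATTCGG"

# Host DNA is fully methylated and stays intact; unmodified phage DNA is cut.
host_fragments = restrict(host_genome, methylate(host_genome))
phage_fragments = restrict(phage_genome, set())
```

In this model the host genome is returned as a single intact fragment, whereas the unmethylated phage genome is split at its recognition site.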

In a distinct variation on the RM theme, more recently discovered defense systems function through site-specific DNA backbone modification, namely, phosphorothioation, and cleavage of unmodified DNA. The genes required for modification (dndABCDE; after DNA degradation) and restriction (dndFGH) have been identified in several hundred bacterial and archaeal genomes (59). The structures and biochemical activities of the DndA (cysteine desulfurase), DndC (phosphoadenosine phosphosulfate reductase), and DndE (DNA-binding protein with a distinct fold) that are directly involved in phosphorothioation have been thoroughly characterized (26, 61, 178, 184). Additionally, it has been shown that DndB is a transcriptional regulator of the dnd operons (57), but the functions of the other genes associated with this system are less clear (177).

Coevolution of the RM systems with the prokaryotic genomes is an intriguing yet understudied subject. An early genomic analysis has shown that RM recognition sites are avoided in some bacterial genomes at a statistically significant level (48). More recently, it has been demonstrated that type II recognition sites are avoided in about half of the bacterial and archaeal genomes whereas sites for other RM types are not avoided (141). Avoidance of type II sites apparently depends on the lifespan of RM systems in the host genome; i.e., it evolves over relatively long periods of coevolution (141). Recent extensive genomic analysis has revealed a strong connection between the abundance of RM modules and MGEs in microbial genomes, suggesting that although RM systems are rarely encoded by plasmids or prophages, they nevertheless disseminate with the help of MGEs (123). Moreover, the abundance of RM modules strongly positively correlates with the estimated rate of horizontal gene transfer (HGT), suggesting a major role of RM in this process (124). These findings indicate that the roles of RM in prokaryotes go beyond defense functions (41).
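Recognition site avoidance is typically expressed as a ratio of the observed number of sites to the number expected under some null model. The sketch below uses the simplest possible null model, independent bases with genome-wide frequencies, which is an assumption made for this illustration; the cited studies (48, 141) may well rely on more refined models. The function names and toy sequences are hypothetical, and only the strand shown is scanned.

```python
from collections import Counter

def count_site(genome, site):
    """Count (possibly overlapping) occurrences of the site on the given strand."""
    k = len(site)
    return sum(1 for i in range(len(genome) - k + 1) if genome[i:i + k] == site)

def expected_count(genome, site):
    """Expected occurrences if bases were independent with genome-wide frequencies."""
    total = len(genome)
    if total == 0:
        return 0.0
    freq = Counter(genome)
    probability = 1.0
    for base in site:
        probability *= freq[base] / total
    return probability * (total - len(site) + 1)

def observed_to_expected(genome, site):
    """Ratio below 1 suggests avoidance; returns None if nothing is expected."""
    expected = expected_count(genome, site)
    if expected <= 0:
        return None
    return count_site(genome, site) / expected

# Hypothetical toy genomes: one lacking the site entirely, one saturated with it.
avoiding = "ACGT" * 100
enriched = "GAATTC" * 50

ratio_avoiding = observed_to_expected(avoiding, "GAATTC")
ratio_enriched = observed_to_expected(enriched, "GAATTC")
```

For a real genome, the significance of a ratio below 1 would still need to be assessed against an appropriate background, which this sketch does not attempt.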

Phage Growth Limitation and Bacteriophage Exclusion

An early study has reported a unique bacteriophage-resistant phenotype, denoted Pgl (phage growth limitation), in Streptomyces coelicolor (27). Strains that carried the Pgl locus supported one cycle of phage growth, but the released phage was not infectious (27, 153). The Pgl system consists of the adenine-specific DNA methylase PglX, serine-threonine kinase PglW, predicted P-loop ATPase PglY, and predicted alkaline phosphatase PglZ (186). Comparative analysis of the neighborhoods of the pglZ gene (the hallmark of the Pgl systems found in all its variants but not in other genomic contexts) revealed substantial complexity of genetic organization of this system. Judging by the presence of the pglZ gene, different variants of this system are encoded in about 10% of the analyzed complete genomes that represent most of the major bacterial lineages as well as several methanogenic and halophilic archaea. The unusual properties of the Pgl system prompted the hypothesis that it functions via a reverse RM mechanism, i.e., by methylating the DNA of the phage progeny rather than the host DNA, so that upon reinfection, the surviving cells in the same colony activate the Pgl system and prevent phage growth (27, 153). Recent work has validated the prediction of activities of the Pgl proteins and additionally demonstrated cellular toxicity of PglX that is mitigated by PglZ (60) (see below for discussion of the connections between immunity and toxicity in bacteria and archaea). The new findings appear compatible with the reverse RM mechanism, although direct evidence remains to be garnered.

The recently discovered BREX (bacteriophage exclusion) system shares two genes with the Pgl systems (pglX and pglZ) but appears to employ a different defense strategy (8, 52). The BREX system protects bacteria from even a single-cycle phage burst. The abrogation of the phage DNA replication by BREX involves methylation of the host DNA at nonpalindromic sites, whereby only one strand is methylated, but not cleavage of the phage DNA. The exact mechanism of action remains unknown, but it has been shown that DNA methylation by PglX is essential for protection (52).
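Why methylation at a nonpalindromic site marks only one strand can be seen from a simple reverse-complement check: a palindromic motif reads identically on both strands, whereas a nonpalindromic motif appears on only one strand at any given locus. The following sketch is illustrative; the helper names are hypothetical, and the hexamer `TAGGAG` is used only as an example of a nonpalindromic motif, not as a site taken from the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def is_palindromic(site):
    """True if the site reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)

def strands_carrying_site(site):
    """Number of strands (1 or 2) on which the motif is present at one locus."""
    return 2 if is_palindromic(site) else 1

palindromic_example = "GAATTC"      # a classic palindromic type II site
nonpalindromic_example = "TAGGAG"   # illustrative nonpalindromic hexamer
```

A motif-specific methylase acting on the nonpalindromic example would find its motif on one strand only, consistent with the single-strand methylation described for BREX.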

Phylogenetic analysis of PglZ identified six major branches in which PglZ is associated with different, although overlapping, sets of additional genes (52). Accordingly, it has been proposed that these defense systems be denoted BREX types 1 to 6. The two most common types are BREX, to be renamed BREX type 1, and the originally discovered Pgl, which will become BREX type 2.

Although key details remain to be elucidated, the BREX/Pgl class of defense mechanisms seems to follow the general principle of discrimination between self and nonself that is characteristic of the RM systems; i.e., discrimination between modified and unmodified DNA. However, downstream of the discrimination step, the mechanisms of RM and BREX/Pgl diverge, as the latter do not cleave the foreign DNA but rather inactivate it via unknown mechanisms.

Argonaute-Centered RNA/DNA-Guided Defense

The Argonaute-centered defense systems represent the branch of innate immunity that relies on guide RNA or DNA molecules for self versus nonself discrimination. This principle of nonself recognition fundamentally differentiates Argonaute-based defense from RM-type systems, which function by recognizing modifications that tag self or, less commonly, nonself DNA. The Argonaute proteins (Ago) were initially characterized as the key players in eukaryotic RNA interference (RNAi). The first function of Ago characterized at the molecular level was that of a slicer; i.e., the RNase that cleaves the target RNA base paired with a small interfering (si)RNA (96, 97, 150). Shortly thereafter, it was shown that enzymatically inactive members of the Ago family complexed with microRNAs (miRNAs) reversibly suppress the translation of the target mRNAs without cleaving them (64).

Argonautes are large proteins of about 800 to 1,200 amino acids that contain, in addition to the PIWI endonuclease domain (RNase H fold), noncatalytic domains, namely, the PAZ (PIWI-Argonaute-Zwille), MID (middle), and N domains, as well as two linkers, L1 and L2 (21, 22, 129, 130, 157). The MID domain is essential for binding the 5′ end of the guide and is present in all Ago proteins. The PAZ domain, which adopts an oligosaccharide-binding (OB)-fold core typical of diverse nucleic acid–binding proteins, is not essential for guide binding but stabilizes the guide from the 3′ end. The N domain is not required for guide loading but contributes to the dissociation of the second, passenger strand of the loaded double-stranded RNA and to the target cleavage. Only the PIWI and MID domains are present throughout the Ago family, whereas the PAZ and N domains are missing in some family members.

Initially, Argonautes were described as highly conserved eukaryote-specific proteins (14, 161), but within a short time, prokaryotic homologs of eukaryotic Ago (henceforth, pAgo and eAgo, respectively) were discovered in many bacteria and archaea (5). However, the spread of pAgo across the diversity of prokaryotes is limited, with about one-third of the archaeal genomes and about 10% of the bacterial genomes encoding a member of this family (158). The structures of several pAgos have been solved, establishing the identities of the PIWI, PAZ, and MID domains and unexpectedly demonstrating that at least some pAgos preferentially bind guide DNA rather than RNA molecules (170, 185). For several years after the discovery of pAgos, their biological functions remained obscure. However, comparative analysis of the genomic neighborhoods of the pAgo genes has strongly suggested a role in defense (112). Many of the pAgo genes are embedded in defense islands, the regions of bacterial and archaeal genomes that are enriched for genes involved in various defense functions. Thus, it has been proposed that pAgos are DNA- and/or RNA-guided nucleases that recognize and cleave cognate (foreign) nucleic acids. Furthermore, genes encoding pAgo variants with inactivated PIWI domains are often adjacent to genes encoding other nucleases, leading to the hypothesis that these enzymatically inactive pAgos mediate target recognition via the guide DNA or RNA, after which the target is cleaved by the associated active nuclease.

The hypothesis on the defense function of pAgo has been experimentally validated, although the scope of the experiments has been limited. In vitro guide-dependent endonuclease activity has been demonstrated for pAgos from the bacteria Aquifex aeolicus (185) and Thermus thermophilus (144) and the archaea Methanocaldococcus jannaschii (176) and Pyrococcus furiosus (156). All four catalytically active pAgos employ single-stranded (ss)DNA guides but differ in their ability to cleave RNA or DNA. In contrast, the RNA-specific pAgo of the bacterium Rhodobacter sphaeroides, in which the catalytic center of the PIWI domain is disrupted (125), displayed no nuclease activity.

Defense functions have been demonstrated for the pAgo from R. sphaeroides (125) and T. thermophilus (157). The T. thermophilus pAgo restricts plasmid replication by cleaving the plasmid DNA using plasmid-derived small ssDNA guides. The mechanism of the guide generation is not understood in detail, but the involvement of the catalytic residues of the PIWI domain has been demonstrated (157). Thus, pAgo probably first shreds the plasmid DNA in a guide-independent (and, presumably, sequence-independent) manner and then becomes a target-specific nuclease after acquiring the guides. It is unclear what determines self/nonself discrimination at this first stage of the pAgo defense. For the R. sphaeroides pAgo, association with short RNAs that represent much of the bacterial transcriptome has been demonstrated (125). In addition, this pAgo is associated with ssDNA molecules complementary to the small RNAs, and this DNA population is enriched in foreign sequences, those from plasmids as well as mobile elements integrated into the bacterial chromosome. Apparently, in R. sphaeroides, pAgo samples degradation products of the bacterial transcriptome and then, via unknown mechanisms, preferentially generates complementary DNAs for foreign sequences that are used to repress the expression of the cognate elements. Whether or not the function of this catalytically inactive pAgo requires other nucleases remains to be determined. Nevertheless, the presence of pAgo within evolutionarily conserved operons with genes for nucleases and helicases (112, 158) implies complex organization of the pAgo-centered defense systems that remains to be investigated. In particular, such experiments should clarify the mechanisms employed by the pAgo-centered defense systems to generate the guide RNA and DNA molecules and discriminate between self and nonself sequences.

ADAPTIVE IMMUNITY: THE CRISPR-CAS SYSTEM

The CRISPR-Cas system employs a unique defense mechanism that involves incorporation of foreign DNA fragments into CRISPR arrays and subsequent utilization of processed transcripts of these inserts (spacers) as guide RNAs to cleave the cognate genome (54, 69, 83, 104, 114). Thus, CRISPR-Cas is a bona fide adaptive (acquired) immunity system, a function that had not been known to exist in prokaryotes. Moreover, CRISPR-Cas mediates inheritance of acquired characters, i.e., resistance to a virus or plasmid, and hence appears to be the most compelling demonstrated case of Lamarckian evolution (84). Apart from their role in antiviral defense that by now has been demonstrated in numerous experiments, CRISPR-Cas systems gave rise to the new generation of tools for genome editing and regulation. Thanks to this enormous practical utility, CRISPR research has turned into a burgeoning, highly dynamic field of microbiology and biotechnology that is covered, from different angles, in numerous recent reviews (11, 12, 20, 42, 43, 135, 173, 174); here we only briefly outline the functional and architectural diversity and comparative genomics of CRISPR-Cas and discuss likely scenarios for the evolution of the different types of CRISPR-Cas.
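The two stages outlined in this paragraph, incorporation of a foreign DNA fragment as a spacer and subsequent spacer-guided recognition of the cognate genome, can be sketched with a toy data structure. This is a minimal illustration under strong simplifying assumptions: exact string matching stands in for guide RNA base pairing, and the repeat sequence, spacer length, class name, and method names are all hypothetical placeholders.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

class CrisprArray:
    """Toy CRISPR array: a repeat sequence plus a list of acquired spacers."""

    def __init__(self, repeat):
        self.repeat = repeat
        self.spacers = []

    def acquire(self, invader_dna, start, length=20):
        """Adaptation: copy a fragment of invader DNA into the array as a spacer."""
        spacer = invader_dna[start:start + length]
        if len(spacer) < length:
            raise ValueError("fragment is shorter than the spacer length")
        if spacer not in self.spacers:
            self.spacers.append(spacer)
        return spacer

    def locus(self):
        """Repeat-spacer-repeat layout of the array as a single string."""
        return self.repeat + "".join(s + self.repeat for s in self.spacers)

    def recognizes(self, dna):
        """Interference: True if any spacer matches the DNA on either strand."""
        other_strand = reverse_complement(dna)
        return any(s in dna or s in other_strand for s in self.spacers)

# Hypothetical toy sequences.
array = CrisprArray(repeat="GTTCCAATAAGACT")
phage = "ATGCGTACGTTAGCCTAGGCTAACGTTAGCAATCGGATCC"

naive = array.recognizes(phage)          # no memory of the phage yet
spacer = array.acquire(phage, start=5)   # first encounter: store a spacer
immune = array.recognizes(phage)         # second encounter: target recognized
```

Because the spacer is stored in the array itself, the acquired resistance in this model would be passed on with the locus, which mirrors the heritable character of CRISPR-Cas immunity noted above.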

CRISPR-Cas systems show remarkable diversity of gene composition, genomic loci organization, and Cas protein sequences (106). Nevertheless, comprehensive comparative analysis has revealed major unifying themes in their evolution. These common trends include multiple, major contributions of MGEs; duplications of cas genes yielding functionally versatile effector complexes; and modular organization, with frequent recombination of the modules (106, 110, 113). The two main modules of the CRISPR-Cas systems comprise the suites of genes encoding proteins involved in adaptation (spacer acquisition) and effector functions, i.e., pre–CRISPR RNA (pre-crRNA) processing, and target recognition and cleavage. Additionally, various proteins involved in ancillary roles, such as regulation of the CRISPR response and probably CRISPR-associated programmed cell death, can be assigned to a third, accessory module.

The CRISPR-Cas systems comprise two classes that differ with respect to the composition and complexity of the effector modules: Class 1 systems possess multisubunit effector complexes, whereas the effector modules of class 2 consist of a single, large protein, such as Cas9, Cas12, and Cas13 (106, 147). In contrast to the effector module, the composition of the adaptation module is nearly uniform across the diverse CRISPR-Cas systems. The adaptation module consists of Cas1 and Cas2, although in some CRISPR-Cas variants, additional proteins, such as the effectors themselves (e.g., Cas9) and accessory proteins (e.g., Cas4), interact or even form fusions with Cas1 or Cas2 and are also required for adaptation (2). Cas1 is the active integrase that catalyzes the protospacer excision from the target DNA and insertion into the CRISPR array, whereas Cas2 forms the structural scaffold of the adaptation complex (120, 121).

Comparative genomic analysis has revealed the likely ancestry of Cas1. Examination of the genomic context of cas1 homologs that are not associated with CRISPR-cas loci led to the discovery of a novel superfamily of self-synthesizing transposons, the casposons, so named because the Cas1 homolog they encode was predicted to function as the transposase (integrase) (91, 92). The integrase activity of the casposon-encoded Cas1 (dubbed the casposase) subsequently has been validated experimentally (58), and similar target site specificities of casposon integration and CRISPR spacer incorporation have been demonstrated (9). Although the currently identified casposons do not encode Cas2, some encode Cas4 and additional nucleases (92). It seems likely that the entire adaptation module and perhaps even additional Cas proteins have been contributed to the emerging CRISPR-Cas system by the ancestral casposon (82). Furthermore, the prototype CRISPR repeats and the leader sequence could have originated from a duplicated target site of the casposon (90). The origin of the CRISPR-Cas adaptation module from the integration machinery of a transposon indicates that the adaptive immunity systems in prokaryotes and eukaryotes evolved along parallel trajectories, through recruitment of unrelated MGEs (81).

The ancestry of the effector module is far less clear. Given that class 1 CRISPR-Cas systems are almost universally present in archaea and common in bacteria, whereas class 2 systems are an order of magnitude less abundant, the multisubunit effector complexes of class 1 are the most likely ancestral form (146). Despite the high diversity of Cas proteins, the core subunits of the class 1 effector complexes largely consist of multiple variants of the same domain, the RNA recognition motif (RRM) (110). Some of the RRM domains possess nuclease activity, whereas others are nonenzymatic RNA-binding proteins. This construction of the effector complexes from ultimately homologous, even if highly diverged, building blocks implies evolution by gene duplication, with subsequent extensive diversification driven by the host-parasite arms race. Conceivably, the ultimate ancestor of the core Cas proteins could have been an RRM domain–containing nuclease, such as Cas10, that gave rise to the extant multitude of active and inactivated versions (110). Subsequent evolution of the CRISPR-Cas systems also involved recruitment of additional proteins, such as the helicase-nuclease Cas3 in the type I systems. What was the function of the original effector CRISPR-Cas module, before the fusion with the adaptation module, supposedly brought about by a casposon? The previously proposed possibility is that the effector module evolved from an ancestral innate immunity system that acquired the adaptation capability following the integration of a casposon next to the innate immunity locus (82). So far, however, innate immunity systems homologous to CRISPR-Cas effector complexes have not been identified. Therefore, an alternative scenario would derive the class 1 effector module from within the ancestral casposon, which in this case would be postulated to have encoded at least one RRM domain–containing protein endowed with nuclease activity.

The provenance of class 2 effector modules has been established with much greater confidence (146, 147). The type II and type V effectors (Cas9 and Cas12, respectively) appear to derive from the extremely abundant but poorly characterized transposon genes known as tnpB, which encode nucleases that belong to the RuvC-like family of RNase H fold nucleases. The role of TnpB in the transposon life cycle is unclear, as it is not required for transposition (131). In the type II and type V-A effectors, this nuclease cleaves the nontarget DNA strand, whereas the target strand (complementary to the crRNA) is cleaved by an additional nuclease, the identity of which differs between Cas9 and Cas12a (34, 181). Remarkably, in the case of the type V-B effectors, both target and nontarget DNA strands are cleaved by the RuvC-like endonuclease domain, which undergoes a major conformational change triggered by the initial, nontarget strand cleavage (98, 182). Notably, the effector nucleases of type V-A and type VI are also responsible for the processing of pre-crRNA yielding mature crRNA guides; the catalytic domains and sites responsible for crRNA maturation remain poorly characterized but are clearly distinct from those involved in target cleavage (37, 44, 98). These findings are in sharp contrast to processing in type II, which involves RNase III, a ubiquitous bacterial enzyme (24), and emphasize the striking diversity of CRISPR-Cas molecular mechanisms.

For the type II CRISPR-Cas effectors, Cas9, a distinct family of TnpB homologs, denoted IscB, has been identified as the direct ancestor, as indicated by the high level of sequence similarity and the presence of an HNH nuclease domain inserted into the RuvC-like domain (71). For the type V effectors, the direct ancestors are difficult to identify, but different subfamilies of TnpB appear to have given rise to different subtypes, as indicated by sequence similarity and phylogenetic analysis (146, 147). The type VI effectors, Cas13, are unrelated to those in other CRISPR-Cas types and contain two HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domains, which cleave RNA targets (1, 3, 149). As with type V effectors, the exact ancestors of these proteins are difficult to pinpoint, but it appears likely that either HEPN domain–containing Cas proteins of class 1 CRISPR-Cas systems, such as Csm6 and Csx1, or a distinct HEPN domain–containing toxin could be implicated (147). The current evolutionary scenario posits that class 2 CRISPR-Cas systems evolved when mobile elements encoding ancestors of class 2 effectors, most commonly TnpB nucleases, integrated near orphan CRISPR arrays or displaced class 1 effector operons (147). Type II, type V, and type VI systems (and most likely different subtypes of type V) evolved independently on several occasions, as indicated by their distinct evolutionary affinities with different groups of TnpB or HEPN domain–containing proteins (147). Thus, the evolutionary history of class 2 systems is largely a story of the second major contribution (after the casposons) of mobile elements to the evolution of CRISPR-Cas adaptive immunity.

CRISPR-Cas systems are present in nearly all archaeal genomes but only in about 30–40% of bacterial genomes (18, 106). The fate of CRISPR-Cas systems, i.e., maintenance or loss from a microbial population, appears to be determined by the balance between the benefits of adaptive immunity for efficient host defense against viruses and other MGEs, and the cost incurred by these systems that is thought to be associated with both autoimmunity and abrogation of HGT by CRISPR-Cas (86). Mathematical models of virus-host coevolution suggest that this balance hinges on the diversity of viruses encountered by a host population, with the greatest benefit of adaptive immunity associated with moderate diversity (171) as well as the host population size, because CRISPR-Cas systems are predicted to be more efficacious in smaller populations (67). It appears likely that both conditions are met in microbial communities that thrive under extreme (in particular, hyperthermal) conditions, resulting in the ubiquitous presence of CRISPR-Cas in archaeal hyperthermophiles (67, 171). However, definitive study of virus-host coevolution requires detailed analysis of the dynamics of both populations in nature.

DORMANCY INDUCTION AND PROGRAMMED CELL DEATH: TOXINS-ANTITOXINS AND ABORTIVE INFECTION

Prokaryotic TA systems were originally described as addictive modules that are carried by plasmids and ensure their persistence in microbial cell lineages (49, 50). TA systems currently are partitioned into six types, with type I and type II systems the most abundant; the latter are by far the best characterized (23). The toxin component of all TA systems is a protein that kills cells if expressed above a certain level; the antitoxin reversibly inactivates the toxin and/or regulates its expression, thereby preventing cell killing (127). In type I and type III TA systems, the antitoxin is a small RNA that downregulates expression of the respective toxin gene (16), whereas in type II TA systems, the antitoxin is a protein that forms a complex with the toxin, in which the toxin is reversibly inactivated. Unlike the toxins, the antitoxins typically are metabolically unstable; unless the antitoxin is continuously expressed, the free toxin can accumulate in amounts sufficient to kill a cell (50, 51, 55, 168). After division, a daughter cell that fails to receive a copy of a plasmid carrying a TA gene module will die because the antitoxin will be depleted before the toxin (115). Given that tight coregulation of the toxin and antitoxin resulting in stoichiometric production of the two components of the TA systems is required for cell survival, these systems most often form two-gene operons; typically, the toxins (and in type II TA systems, the antitoxins as well) are small, highly compact proteins, apparently of the minimum size required to carry out the respective biochemical activities (108) (see below).
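The depletion argument in the preceding paragraph can be illustrated numerically. The following is a minimal Python sketch, assuming first-order decay of both proteins after plasmid loss, no new synthesis, and 1:1 neutralization of toxin by antitoxin. The function names, half-lives, starting amounts, and lethal threshold are all hypothetical values chosen for illustration, not measurements from the cited studies.

```python
import math


def free_toxin(t, toxin0=100.0, antitoxin0=100.0,
               toxin_half_life=60.0, antitoxin_half_life=15.0):
    """Free (unneutralized) toxin at time t (minutes) after plasmid loss.

    Toy assumptions (not from the source): both proteins decay with
    first-order kinetics, nothing is newly synthesized, and each
    antitoxin molecule neutralizes exactly one toxin molecule.
    """
    toxin = toxin0 * math.exp(-math.log(2) * t / toxin_half_life)
    antitoxin = antitoxin0 * math.exp(-math.log(2) * t / antitoxin_half_life)
    return max(0.0, toxin - antitoxin)


def time_to_lethal(threshold=25.0, step=1.0, horizon=600.0, **params):
    """First sampled time (minutes) at which free toxin exceeds the
    hypothetical lethal threshold, or None if it never does."""
    t = 0.0
    while t <= horizon:
        if free_toxin(t, **params) > threshold:
            return t
        t += step
    return None
```

Under these assumptions, a plasmid-free cell accumulates free toxin only because the antitoxin decays faster than the toxin. Setting `antitoxin_half_life` equal to `toxin_half_life` keeps free toxin at zero, which is the qualitative point of the addiction mechanism: it is the differential stability of the two components, not the toxin alone, that penalizes plasmid loss.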

Analysis of multiple bacterial and archaeal genomes has shown that numerous TA systems are encoded not only on plasmids but also on chromosomes (50, 159). This surprising finding triggered a lively debate on potential cellular functions of the chromosomal TA systems and prompted comparative genomic and experimental studies that resulted in the discovery of numerous new TA systems. These discoveries and the current understanding of the biological roles of TA systems are summarized in several recent reviews (49, 55, 159, 180). Although the biology of the TA systems is far from fully understood, the prevailing view is that they provide a mechanism for cell persistence to cope with various stress conditions, in particular virus infection (128, 167, 168). A more contentious issue is whether or how often the TA systems cause programmed cell death, or altruistic cell suicide (see below).

The type I toxins are small membrane proteins that permeabilize cell membranes (35, 46). In contrast, the majority of type II toxins target different components of the translation systems, especially mRNA that is cleaved by toxin nucleases, known as interferases (29, 179). However, other targets of type II toxins have been identified as well, such as DNA gyrase (30) and the cell division GTPase FtsZ (162). Because type I toxins have never been implicated in virus resistance or other defense functions and are not frequently observed in defense islands, we do not discuss them further in this review. Instead, we focus on type II TA systems and discuss the results of the recent efforts to identify new TA families using comparative genomic approaches.

Three computational approaches for prediction of new TA systems have been developed: (a) guilt by association, that is, prediction of new toxins or antitoxins based on linkage, in bacterial and archaeal genomes, to genes that belong to known antitoxin or toxin families (95, 108); (b) identification of gene pairs with characteristic features of TA systems, such as tight linkage of genes encoding small proteins, propensity for HGT, and presence on plasmids or within genomic islands with other defense genes (108, 111); and (c) statistical analysis of whole-genome sequencing clones aimed at identification of genes that are unclonable (toxic) in E. coli (74).
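Approaches (a) and (b) lend themselves to a simple illustration. The sketch below is a hedged toy version, not the pipeline used in the cited studies: the `Gene` record, the function names, and the thresholds (`max_gap`, `max_len`) are hypothetical. It first collects adjacent, co-oriented pairs of genes that encode small proteins, and then applies guilt by association by reporting the unannotated partner of any pair in which exactly one gene belongs to a known toxin or antitoxin family.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Gene:
    gene_id: str
    contig: str
    start: int            # nucleotide coordinates on the contig
    end: int
    strand: str           # "+" or "-"
    protein_len: int      # length in amino acids
    family: Optional[str] = None  # known protein family, if any


def candidate_ta_pairs(genes: List[Gene], max_gap: int = 30,
                       max_len: int = 200) -> List[Tuple[Gene, Gene]]:
    """Adjacent, same-strand pairs of small genes (approach b).

    max_gap (nt) and max_len (aa) are illustrative thresholds only.
    """
    ordered = sorted(genes, key=lambda g: (g.contig, g.start))
    pairs = []
    for a, b in zip(ordered, ordered[1:]):
        linked = (a.contig == b.contig and a.strand == b.strand
                  and b.start - a.end <= max_gap)
        small = a.protein_len <= max_len and b.protein_len <= max_len
        if linked and small:
            pairs.append((a, b))
    return pairs


def guilt_by_association(pairs: List[Tuple[Gene, Gene]],
                         known_families: Set[str]) -> List[str]:
    """IDs of unannotated partners of known toxins/antitoxins (approach a)."""
    hits = []
    for a, b in pairs:
        a_known = a.family in known_families
        b_known = b.family in known_families
        if a_known != b_known:  # exactly one partner is already known
            hits.append(b.gene_id if a_known else a.gene_id)
    return hits
```

A real analysis would add the other signals listed above, such as propensity for HGT and location on plasmids or in defense islands, and, as the following paragraphs stress, would still require experimental validation.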

Recent comprehensive studies revealed numerous genes that are unclonable in E. coli but do not meet the definition of TA systems, including many metabolic enzymes and informational genes, such as ribosomal proteins (74, 142). Although not all of these genes form two-gene operons that are typical of TA systems, these findings imply that dosage imbalance or toxicity of an intermediate metabolite can result in apparent toxicity of a gene that can be offset by tight regulation or coexpression of an enzyme utilizing the toxic product, thus masquerading as a TA system. Thus, gene toxicity is a phenomenon that extends beyond the typical TA systems, so that identification of the latter requires additional experimentation and/or analysis of the domain architectures of the respective proteins. Conversely, searches of genome sequences for novel TA systems identified numerous stand-alone homologs of toxins (108, 109). Such solo toxins might belong to still uncharacterized TA systems in which the toxin and the antitoxin are tightly coregulated despite the lack of the typical operon organization, or are tightly regulated to minimize expression, or perhaps lack toxicity owing to some subtle structural changes.

Systematic computational and experimental search for TA systems resulted in the identification of many abundant but experimentally uncharacterized variants (74, 108, 109, 142). One of these consists of a pair of genes that are among the most common genes in hyperthermophilic archaea and encode a HEPN domain–containing protein and a small, so-called minimal nucleotidyltransferase (MNT) (4). The HEPN domain–containing protein is predicted to function as an RNA-cleaving toxin, whereas the MNT is the predicted antitoxin (108). The HEPN-MNT module shows all the typical features of TA systems (108), and the toxic effect of the HEPN protein along with the antitoxin activity of the MNT indeed have been experimentally demonstrated in the bacterium Shewanella oneidensis (183). Nevertheless, the molecular mechanism of this system—particularly, the role of the nucleotidyltransferase activity of the antitoxin—remains unclear.

ABI involves a distinct variety of TA systems. The ABI mechanisms abrogate virus infection at different stages, often by causing death of infected cells and thus precluding virus spread (28, 94). Some of the ABI systems are two-component modules that display the typical features of TA systems, in particular those of type III systems. As with other TA systems, although numerous ABI systems have been identified by genetic approaches, molecular mechanisms for only a small minority of these are understood (94). Many of the ABI systems share domains with TA systems, and the HEPN domain in particular is among the most prominent (109). These ABI systems are predicted to function as TA systems that target translation. Notably, several other, membrane-associated, ABI systems cause membrane leakage similar to that caused by type I TA systems (33, 36). Several ABI systems, including AbiU1, AbiL, and AbiR, are often associated and might interact with RM modules (109, 111). Another frequent component of ABI systems is reverse transcriptase, a hallmark protein of MGEs. In the case of ABI, the reverse transcriptase catalyzes nontemplated synthesis of random sequence DNA that remains covalently attached to the protein and contributes to abortive infection (169).

In summary, TA and related ABI modules are among the most common and versatile defense systems in bacteria and archaea. These modules are often carried by plasmids and themselves represent a distinct type of mobile element (see below). TA systems are nearly ubiquitous in bacteria and archaea, but they have not been detected in most bacterial endosymbionts or, among archaea, in Thermoplasmatales; several methanotrophs with small genomes; and the only known group of archaeal symbionts, the Nanoarchaeota (95, 108). The distribution of TA systems across phyla is clearly nonuniform, with many systems significantly over- or underrepresented in various taxa (95, 108) (Table 1). Unlike other defense systems, the occurrence of TA systems in genomes scales superlinearly with genome size (Figure 2), possibly reflecting the comparatively high rate of plasmid assimilation by large genomes. Genomic occurrence of TA systems shows exceptional variability even in closely related genomes, presumably owing to plasmid-mediated transfer (95, 108). Because of the combinatorial reassortment of toxins and antitoxins, the TA systems form a strongly connected network in which the main hubs, i.e., common domains linked to many partners, are the RNase toxins PIN and RelE and antitoxins containing the ribbon-helix-helix (RHH) and helix-turn-helix (HTH) DNA-binding domains (109). Genome analysis has led to the identification of numerous stand-alone toxin and antitoxin genes that account for over 50% of the genes in the largest families. These findings suggest that in many cases, the interaction between toxins and antitoxins occurs in trans and the required tight coregulation of the respective genes is secured by still unknown mechanisms (95, 108).
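The statement that TA occurrence scales superlinearly with genome size corresponds to a power law, count ∝ size^k with k > 1. One common way to estimate k, shown here as a hedged sketch rather than the method actually used for Figure 2, is an ordinary least-squares fit on log-transformed values. The function name is hypothetical, and genomes with zero counts are simply dropped, which is a simplification.

```python
import math


def scaling_exponent(genome_sizes, counts):
    """Slope of log(count) versus log(genome size).

    A slope above 1 indicates superlinear scaling. Pairs with a
    non-positive size or count are skipped (a simplification).
    """
    points = [(math.log(s), math.log(c))
              for s, c in zip(genome_sizes, counts) if s > 0 and c > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two genomes with nonzero counts")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("genome sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

Applied to per-genome counts of TA genes, an estimated exponent clearly above 1 would be consistent with the superlinear trend described above; any input data used with this function would have to come from an actual genome survey.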

Virtually all currently characterized ABI systems come from one group of model organisms, the lactococci (28). In all likelihood, this is only the tip of the proverbial iceberg, whereas the true diversity of this type of defense module in bacteria and archaea remains to be revealed. Indeed, analysis of defense islands leads to the identification of numerous uncharacterized gene families that could be new ABI-like defense systems (111).

Genome-wide analysis of defense systems reveals notable general trends. Reconstruction of gene loss and gain history in many groups of closely related bacterial and archaeal genomes has shown that, after the MGEs, the defense genes are the most evolutionarily dynamic functional class of genes (133, 134). Both gain and loss rates of defense genes are significantly higher than the respective mean rates across all gene categories, although they generally follow the overall patterns of microbial gene dynamics. Thus, gene loss rates are typically two to three times greater than gain rates, most likely because genes are lost in a clock-like manner, whereas gene gain appears to occur in spurts. There are, however, notable exceptions, such as bacteria in the genus Shewanella that appear to be actively gaining defense systems (134). A detailed analysis of individual cases of gain and loss of defense systems has shown that such events are often associated with and possibly mediated by concomitant loss or gain of MGEs (134).

A second, perhaps related major trend in the evolution of defense systems is their frequent clustering in distinct genomic regions that have been denoted defense islands, by analogy with pathogenicity and symbiotic islands (Figures 4, 5) (111). The trend for island formation has been found to be statistically highly significant for all classes of defense systems, with the exception of CRISPR-Cas. The majority of the defense islands are small and only include a few genes, but some reach over 100 (Figure 4). Many islands combine diverse defense systems, such as various RM and TA modules, and might also include genes involved in novel mechanisms, as illustrated in Figure 5 by the gene pair encoding a predicted ATPase and an HNH family nuclease.

Figure 4. Length distributions of defense islands in bacterial and archaeal genomes.

Figure 5. Defense islands as a potential source of novel defense systems. For each island, the genome name, the respective nucleotide sequence ID, and genomic coordinates are provided. Block arrows indicate the direction of transcription, roughly to scale. PIN, HNH, and PD-D/ExK are distinct nuclease families. Abbreviations: HTH, helix-turn-helix; HTH-MP, HTH domain fused to Zn-dependent metalloprotease.

The clustering of defense genes most likely reflects two evolutionary factors that are quite different in character but have similar effects (109). The first is a garbage pile effect, whereby defense genes and MGEs are often attracted to the same regions in genomes simply because these genes are nonessential, such that gain or loss of genetic material there is unlikely to be strongly deleterious. The second, and quite different, factor behind the emergence and especially persistence of defense islands is likely to be selection for colocalization of functionally interacting defense systems, as discussed in the next section. Regardless of the relative contributions of different evolutionary forces, the wide spread of defense islands in microbial genomes creates potential for prediction of novel defense systems via the guilt by association principle.
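The use of defense islands for guilt-by-association prediction can be sketched as follows. This is an illustrative toy procedure, not the island definition used in the cited work (111): the function name and the parameters `max_gap` and `min_defense` are assumptions. Annotated defense genes that lie within a few genes of each other along the chromosome are merged into an island, and the unannotated genes embedded in each island are reported as candidates for new defense functions.

```python
def find_defense_islands(gene_order, defense_genes, max_gap=2, min_defense=2):
    """Group nearby defense genes into islands.

    gene_order    : list of gene IDs in chromosomal order
    defense_genes : set of gene IDs annotated as defense genes
    max_gap       : max number of intervening non-defense genes allowed
                    between consecutive defense genes (hypothetical cutoff)
    min_defense   : min number of defense genes for a run to count as an island

    Returns a list of (island_gene_ids, candidate_gene_ids) tuples, where
    candidates are the non-defense genes embedded in the island.
    """
    positions = [i for i, g in enumerate(gene_order) if g in defense_genes]
    runs, current = [], []
    for pos in positions:
        # Start a new run when too many non-defense genes intervene.
        if current and pos - current[-1] - 1 > max_gap:
            runs.append(current)
            current = []
        current.append(pos)
    if current:
        runs.append(current)

    islands = []
    for run in runs:
        if len(run) >= min_defense:
            span = gene_order[run[0]:run[-1] + 1]
            candidates = [g for g in span if g not in defense_genes]
            islands.append((span, candidates))
    return islands
```

Candidates produced this way are only hypotheses. Because of the garbage pile effect described above, an embedded gene may equally well be an unrelated nonessential gene or a fragment of an MGE.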

INTERACTIONS BETWEEN DEFENSE STRATEGIES: COUPLING BETWEEN IMMUNITY AND DORMANCY INDUCTION/PROGRAMMED CELL DEATH

The genomic loci encoding immune systems often also include dedicated programmed cell death modules, such as TA modules, and some proteins are shared by the two types of defense systems. The CRISPR-Cas systems, the most complex form of prokaryotic defense, present the most remarkable cases. One of the key proteins in the first, adaptation, phase of the CRISPR response, Cas2, is homologous to the toxins of the VapD family of mRNA interferases (103, 104). The primary role of Cas2 in CRISPR-Cas is that of a structural scaffold of the adaptation complex, in which Cas1 is the endonuclease component (2, 120, 121, 152). The interferase catalytic site is conserved in some, though not all, Cas2 proteins but is not required for adaptation (120). Thus, at least in some CRISPR-Cas systems, Cas2 might play a secondary role as an RNase, possibly a toxin (101). Indeed, non-sequence-specific nuclease activity of several Cas2 proteins against both DNA and RNA, but typically with a preference for RNA substrates, has been demonstrated although catalytically active Cas2 proteins do not appear to be toxic when overexpressed in E. coli (10, 32, 53, 70, 118). The conservation of the catalytic site of Cas2 implies that the RNase activity of this protein is functional in a subset of CRISPR-Cas systems, and interferase activity remains a distinct possibility.

Many CRISPR-Cas systems also encompass additional nucleases, in particular (predicted) RNases of the HEPN superfamily (3, 102). The RNase activity of two of these proteins, Csm6 and Csx1, has been demonstrated experimentally (39, 119, 145). Most of the HEPN-containing Cas proteins additionally contain the CARF (CRISPR-associated Rossmann fold) that is predicted to bind ligands, most likely nucleotides, and perform signaling functions (102). Notably, the Csm6 protein, which consists of a CARF and a HEPN domain, is not required for the type III-B CRISPR-Cas interference (39), suggesting a different, accessory function for this protein. As pointed out above, the HEPN domain–containing RNases are the most abundant among the toxin components of TA modules in archaea and are common in bacteria as well (3, 108). Thus, the HEPN domain–containing Cas proteins also might possess toxin activity that could be masked by another domain of the same protein or by a distinct Cas protein serving as the antitoxin. In some CRISPR-Cas systems, the CARF domain is fused to predicted nucleases not containing the HEPN domain, in particular, Cas4 homologs (102). An intriguing possibility is that these diverse CARF-linked nucleases are toxins regulated by the CARF domain through ligand binding.

CRISPR-associated toxin activity has been experimentally demonstrated for the Csa5 protein of the type I-A CRISPR-Cas system from the archaeon Sulfolobus solfataricus. Infection of S. solfataricus with the SIRV2 virus induced expression of Csa5 to the toxic level and resulted in cell death, suggesting that the toxicity of this protein represents a programmed cell death response to virus infection (56). The Csa5 protein is the α-helical small subunit of the Cascade CRISPR RNA-processing complex of type I-A and appears to lack nuclease activity (31). Thus, as noted above for the TA modules in general, CRISPR-associated toxicity might involve different, still uncharacterized mechanisms.

Apart from the CRISPR-Cas systems, genomic analysis demonstrates association of TA modules with innate immunity (in particular, RM) loci (101, 109) (Figure 5). These observations have prompted the hypothesis on functional coupling between immunity and programmed cell death/dormancy (101). Two versions of such coupling were considered. First, programmed cell death can be viewed as the strategy of last resort, to which an infected cell turns when it senses the impending failure of defense systems to stop virus reproduction. An alternative, albeit arguably less plausible, scenario is that, faced with intense virus reproduction, the immune system turns on the dormancy induction machinery, not only protecting the surrounding cells but also giving the infected cell itself a chance to recover once the virus clears. The two strategies might merge, considering that there is never a guarantee that a cell will reemerge from dormancy. The common presence of genes encoding CARF fused with diverse nucleases in CRISPR-cas loci (102) (Figure 2) implies that the CARF domain functions as a sensor of the imminent defeat of the immune system in the battle against the virus, in response to a hypothetical alarmone (most likely, a nucleotide derivative) that remains to be identified (88). Although the immunity-suicide coupling hypothesis was conjured largely on indirect evidence, an experimentally well-characterized case of such coupling is presented by a bacterial antiphage system which includes two HEPN domain–containing RNases, RloC and PrrC, that exert a toxic effect through tRNA cleavage (13, 73, 75, 165).

The recent comprehensive search for genomic loci that encode large proteins containing putative nuclease domains that could function as class 2 CRISPR-Cas effectors has revealed, arguably, the most direct links between microbial immunity and programmed cell death so far discovered (1, 146, 147, 149). Type VI effector proteins contain two HEPN domains predicted to possess RNase activity (146, 147). An RNase activity that requires both HEPN domains indeed has been demonstrated for the type VI-A and VI-B effectors (Cas13a, Cas13b) (1, 149). The type VI effectors provide efficient protection against the RNA bacteriophage MS2, but in addition, when primed with a cognate RNA they turn into a promiscuous RNase that cleaves any RNA molecules present in the reaction mix with little sequence specificity (1, 149). A significant decrease in bacterial viability was observed when Cas13a was coexpressed with the cognate RNA, most likely indicative of dormancy induction (1). Given that RNA bacteriophages are minor components of the bacterial virosphere (80), the principal function of type VI CRISPR-Cas systems is most likely defense against DNA phages through the toxic effect triggered by the recognition of a cognate phage transcript and resulting in dormancy or programmed cell death.

Whether the cell that turns on the self-afflicting program kills itself or goes into dormancy, the decision it faces is the same: The cell has to employ a built-in sensor to predict the outcome of infection and act accordingly (88). If the sensor module predicts that the viral onslaught is manageable, the immune system is mobilized to full capacity. In contrast, a dire prognosis activates self-destruction. The sensors and the signals they register differ between defense systems. The antiphage HEPN-containing RNases RloC and PrrC sense cell damage directly, through double-stranded DNA break, and the concentration of dTTP, which serves as an alarmone during phage infection (75, 76, 93). The ligand(s) sensed by the CARF domain in the case of CRISPR-Cas systems remain to be identified, but it is highly likely that CARF domains (102) function as toggles between the immune and self-afflicting responses. The nature and modes of the switching signals, their threshold values and what determines these, and whether these features specifically depend on the character of virus-host interaction are all intriguing directions for further study.

Type VI CRISPR-Cas systems are a special and so far the most obvious case of immunity-suicide coupling. These systems appear to short-circuit the typical defense relay by effectively skipping the damage-sensing step and employing the main immune effector as the suicide weapon as well (Figure 3, Table 1). For the Cas13 proteins to switch to the promiscuous RNA cleavage mode, the only signal required seems to be the target recognition (1, 149). Still, sensing the target RNA concentration, which would reflect multiplicity of infection and/or the expression rate of the virus genome, by the Cas13 proteins themselves could play a role even in this case. Conceivably, more elaborate defense strategies that involve dedicated sensors, such as class 1 CRISPR-Cas, outcompete the simple ones where the self-destruction program is activated at the first alarm signal, perhaps explaining the relative rarity of type VI systems among bacteria (147).

MOBILE GENETIC ELEMENTS AND DEFENSE SYSTEMS IN BACTERIA AND ARCHAEA

A major theme emerging in the studies on the evolution of defense systems is their close link to MGEs. This connection might at first seem paradoxical in that defense machineries share components and have similar properties with the very elements against which they defend the host. However, there seems to be a clear evolutionary logic behind these relationships. First, major classes of bacterial and archaeal defense systems, in particular RM and TA modules, themselves possess properties of MGEs (77, 116, 117). These defense systems do not encode proteins involved in their own replication or transposition and therefore do not qualify as aggressive MGEs, such as viruses or transposons. Nevertheless, the extensive horizontal mobility of these modules is immediately apparent from phylogenetic analysis. Furthermore, the RM and especially TA modules often reside on plasmids and thus employ a piggyback mode of dissemination. Most importantly, although these defense modules lack means for active replication and/or transposition, they do possess the addiction mechanisms that ensure their own persistence (66, 122).

Such mechanisms are outlined above and involve cell killing by the toxin when a cell does not receive a copy or in other situations when the antitoxin gene is lost or inactivated. Given the tight coregulation of the toxin and antitoxin genes, along with the specific properties of the proteins themselves, such as the instability of type II antitoxins, it appears clear that addiction mechanisms are adaptations for a distinct type of persistence strategy, and accordingly, these defense systems qualify as selfish MGEs. From this perspective, the capacity of the RM and TA modules to kill virus-infected cells can be viewed as a kin selection strategy that rescues neighboring host cells in the population that are likely to carry the same or a closely related MGE. The host defense, then, comes across as a by-product of MGE evolution. Importantly, although RM systems are traditionally discussed in the context of defense whereas TA systems are more often viewed as MGEs, the mechanisms and lifestyles of the two classes of elements are very similar, to the extent that they can be justifiably analyzed within the same toxin-antitoxin framework (88, 115).

The second aspect of the defense-MGE connection is the recurrent recruitment of MGEs, or more typically, individual MGE genes, for defense functions. As discussed above, CRISPR-Cas systems are the best case in point. Casposons apparently gave rise to the adaptation module, whereas TnpB-encoding transposons are the ancestors of the effector genes of most of class 2 systems (82, 90, 147). Additionally, some type III CRISPR-Cas loci have recruited reverse transcriptases, most likely from group II introns, providing for spacer acquisition by reverse transcription of RNA molecules (148). The recruitment of MGEs by defense systems seems to stem from two factors. First, the very mobility of these elements, combined with the existence of defense islands that are usually tolerant to insertion of genetic material, facilitates utilization of MGEs. Second, enzymes that are central to the lifestyle of MGEs, such as transposases and various other nucleases, are readily utilizable for defense function. Thus, such enzymes serve as “guns for hire” that are alternately employed for offense, defense, and counterdefense, depending on the highest bid (81). Indeed, plasmid-encoded TA modules are effectively counterdefense devices that prevent the host population from purging the plasmid, and counterdefense functions of CRISPR-Cas systems recruited by phages have been demonstrated as well (143).

CONCLUDING REMARKS

Defense mechanisms in bacteria, in particular RM systems, have been studied for decades. However, the expanding comparative genomic analysis followed by intensive experimental testing of predictions has led to both quantitative and qualitative expansion of the prokaryotic defense repertoire. It has become clear that, in addition to systems that function via protein–nucleic acid recognition, such as RM and TA systems, bacteria and archaea widely employ small guide RNAs and DNAs, as demonstrated by the discovery of pAgo-based and CRISPR-based immune mechanisms (79).

Discovery of the CRISPR-Cas mechanism led to the fundamental realization that adaptive (acquired) immunity is not a prerogative of “higher” animals, such as vertebrates, but rather an ancient mode of defense that is widespread in prokaryotes. Moreover, CRISPR-Cas presents the best known case for heritable acquired immunity that appears to qualify as a Lamarckian mode of evolution (84, 87). Thus, it has been speculated that all organisms possess some forms of both innate and acquired immunity (137).

Certain general principles of defense evolution are becoming apparent. The defense systems can be naturally partitioned on a plane bounded by the principle of target recognition on one axis and the existence of immune memory (innate versus acquired immunity) on the other (Table 2). An intriguing question, then, is whether acquired immunity systems based on protein–nucleic acid or protein-protein recognition remain to be discovered. Taken together, the findings discussed here show that the versatility of prokaryotic defense mechanisms matches that in eukaryotes. The analysis of defense islands, and more generally, of the dark matter in bacterial and archaeal genomes (8, 107), has led to the identification of many gene clusters, some of which contain proteins or domains that likely participate in defense. However, the mechanisms of such defense cannot be easily predicted by analogy with the well-characterized defense systems, suggesting that novel principles or at least major variations on known ones remain to be discovered.

Table 2. The prokaryotic defense space

Interaction defining specificity:

| | RNA/DNA–DNA/RNA | Protein–RNA/DNA | Protein–protein |
|---|---|---|---|
| Innate immunity | pAgo-centered defense | RM | Receptor escape |
| Adaptive immunity | CRISPR-Cas | ? | Receptor phase variation? |
| Dormancy/PCD | ? | TA/ABI | TA with protease, kinase toxins |

The defense strategy adopted by infected cells hinges on a life-or-death decision that is made on the basis of information gathered and processed by dedicated sensors. Depending on the damage level measured by such sensors, either the immunity program or dormancy/programmed cell death is turned on (88). Admittedly, this perspective on antivirus defense is speculative, and the study of damage sensors and the signals they recognize, in particular alarmones, can be expected to develop into a rich research program.

Another major theme in the evolution of defense is the close and multifaceted relationship between defense systems and MGEs. Some defense systems, such as RM and TA systems, possess defining features of MGEs, whereas others, such as CRISPR-Cas, evolved via recruitment of MGEs. The notion of guns for hire that can be employed for either offense (in MGEs) or defense (81) implies that additional cases of such recruitment exist and merit systematic investigation.

Finally, the very nature of the enzymatic activities (such as those of transposases and other sequence-specific nucleases) employed by both MGEs and defense systems makes them top candidates for the development of genome editing tools. Two generations of such tools, based first on RM and then on CRISPR-Cas, have sequentially transformed laboratory practice. Further study of microbial defense systems might eventually lead to new breakthroughs in biotechnology.

SUMMARY POINTS

  1. Bacteria and archaea are engaged in an incessant arms race with parasitic genetic elements, and nearly all encode multiple defense systems.
  2. Prokaryotic defense systems can be classified as innate immunity, adaptive immunity, and dormancy or programmed cell death induction.
  3. Defense systems are often encoded in genomic islands.
  4. Some defense systems show features of mobile genetic elements, whereas others recruit genes from mobile elements for defense functions.
  5. There seems to be a coupling between immunity and dormancy/programmed cell death, so that a cell chooses one of the two defense strategies by sensing the degree of DNA damage caused by infection.

Acknowledgments

The authors thank Sergey Shmakov for expert technical assistance. The authors’ research is supported by intramural funds of the US Department of Health and Human Services (to the National Library of Medicine).

Footnotes

* This is a work of the U.S. Government and is not subject to copyright protection in the United States.

DISCLOSURE STATEMENT

The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

LITERATURE CITED