Regulation of genetic flux between bacteria by restriction-modification systems

Pedro H Oliveira et al. Proc Natl Acad Sci U S A. 2016.

Abstract

Restriction-modification (R-M) systems are often regarded as bacteria's innate immune systems, protecting cells from infection by mobile genetic elements (MGEs). Their diversification has been recently associated with the emergence of particularly virulent lineages. However, we have previously found more R-M systems in genomes carrying more MGEs. Furthermore, it has been suggested that R-M systems might favor genetic transfer by producing recombinogenic double-stranded DNA ends. To test whether R-M systems favor or disfavor genetic exchanges, we analyzed their frequency with respect to the inferred events of homologous recombination and horizontal gene transfer within 79 bacterial species. Genetic exchanges were more frequent in bacteria with larger genomes and in those encoding more R-M systems. We created a recognition target motif predictor for Type II R-M systems that identifies genomes encoding systems with similar restriction sites. We found more genetic exchanges between these genomes, independently of their evolutionary distance. Our results reconcile previous studies by showing that R-M systems are more abundant in promiscuous species, wherein they establish preferential paths of genetic exchange within and between lineages with cognate R-M systems. Because the repertoire and/or specificity of R-M systems in bacterial lineages vary quickly, the preferential fluxes of genetic transfer within species are expected to constantly change, producing time-dependent networks of gene transfer.

Keywords: bacterial evolution; homologous recombination; horizontal gene transfer.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

Analysis of HR and HGT events. (A) 16S rRNA phylogenetic tree of the 79 bacterial species. The tree was drawn using the iTOL server (itol.embl.de/index.shtml) (40). The innermost circle layer indicates the species and associated clade. The six subsequent layers correspond, moving outward, to the average number of HGT events per genome computed using Count; the number of recombined genes per genome given by NSS, MaxChi, and PHI; and the number of recombination events per genome given by Geneconv and CFML (outermost layer). These values are given in Dataset S1. (B) Distribution of the average number of horizontal gene transfer (HGT) events and homologous recombination (HR) events (inferred by Geneconv) per clade according to genome size (G_S). Spearman's ρ_HGT = 0.65, P_HGT < 10^−4; Spearman's ρ_Geneconv = 0.32, P_Geneconv < 10^−2. Data obtained with the remaining recombination inference tools are shown in Fig. S1.
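The rank correlations quoted in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the toy values are hypothetical, and the code only shows how a Spearman ρ with average ranks for ties is obtained from paired per-clade values such as genome size and average HGT events.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-clade values, for illustration only (not from Dataset S1).
genome_size_mb = [1.8, 2.1, 4.6, 5.0, 6.3]
avg_hgt_events = [3.0, 9.0, 7.0, 20.0, 31.0]
rho = spearman_rho(genome_size_mb, avg_hgt_events)
```

Because the statistic depends only on ranks, it is insensitive to the strongly skewed distributions that event counts typically show.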

Fig. S1.

Association between genetic flux and genome size. Distribution of the average homologous recombination (HR) events per clade computed using NSS (A), MaxChi (B), PHI (C), and CFML (D) as a function of genome size (G_S, given in megabases). As with Geneconv (Fig. 1B), HR + 1 is positively associated with G_S (Spearman's ρ_NSS = 0.42, P_NSS = 10^−4; Spearman's ρ_MaxChi = 0.48, P_MaxChi < 10^−4; Spearman's ρ_PHI = 0.40, P_PHI < 10^−3; Spearman's ρ_CFML = 0.48, P_CFML < 10^−4).

Fig. 2.

Association between gene transfer and R-M systems. Distribution of the average HGT events (A) and homologous recombination (HR) events inferred by Geneconv (C) per clade according to the total number of R-M systems. Spearman's ρ_HGT = 0.43, Spearman's ρ_Geneconv = 0.62; both P < 10^−4. Distribution of the average HGT (B) and Geneconv HR events (D) per clade according to the presence (Yes)/absence (No) of Type II R-M systems (both P < 10^−4; Mann–Whitney–Wilcoxon test). We obtained qualitatively similar results with the remaining recombination inference tools (Fig. S2).
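The presence/absence comparison in panels B and D relies on a Mann–Whitney–Wilcoxon test. The sketch below shows the idea in plain Python under stated assumptions: the function names and sample values are hypothetical, and the P value uses the large-sample normal approximation without tie or continuity correction, which may differ from the exact procedure used in the paper.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs where a > b count 1, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def two_sided_p_normal(u, n_a, n_b):
    """Two-sided P from the normal approximation (no tie/continuity correction)."""
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical average HGT events per clade, for illustration only.
with_type_ii = [12.0, 18.5, 9.0, 25.0, 14.0]
without_type_ii = [2.0, 5.5, 3.0, 8.0]
u_stat = mann_whitney_u(with_type_ii, without_type_ii)
p_value = two_sided_p_normal(u_stat, len(with_type_ii), len(without_type_ii))
```

With samples as small as in this toy example an exact test would be preferable; the approximation is shown only because it is short and self-contained.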

Fig. S2.

Association between gene transfer and R-M systems. Distribution of the average HR per clade computed using NSS (A), MaxChi (B), PHI (C), and CFML (D) as a function of the total number of R-M systems. Positive associations were observed in all cases (Spearman's ρ_NSS = 0.50, Spearman's ρ_MaxChi = 0.55, Spearman's ρ_PHI = 0.53, Spearman's ρ_CFML = 0.60; all P < 10^−4). Also shown are the average HR per clade computed using NSS (E), MaxChi (F), PHI (G), and CFML (H) as a function of the presence (Yes) or absence (No) of Type II R-M systems (all P < 10^−4; Mann–Whitney–Wilcoxon test).

Fig. 3.

Relation between target specificity and protein similarity in R-M components. Percentage of equal target motifs recognized by Types I, II, and III MTases (A) and REases (B) according to their pairwise protein sequence similarity. (C) Plot of all pairwise similarities of Type II MTases versus the cognate Type II REases of the REBASE gold standard. Blue dots correspond to equal target motifs, red dots to unequal target motifs, and green dots to nested motifs. The dashed horizontal and vertical lines indicate the threshold similarity limits for MTases and REases. (D) The same dataset was used to plot the corresponding receiver operating characteristic (ROC) curves. These curves depict the Sensitivity (true-positive rate) versus 1-Specificity (false-positive rate) for several values of percentage similarity of Type II MTases and REases. We selected the cutoff values of similarity that maximized the true-positive rate and minimized the false-positive rate. Details on the number of R-M proteins of each type can be found in Table S2. ROC data including curve-fitting equations can be found in Table S3.
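The cutoff selection described for panel D can be sketched as follows. This is an illustration under assumptions, not the published predictor: the caption states only that the chosen cutoffs maximized the true-positive rate and minimized the false-positive rate, so the use of Youden's J (TPR minus FPR) as the selection rule, the function names, and the toy similarity values are all hypothetical.

```python
def roc_points(similarities, same_motif, cutoffs):
    """For each cutoff, call a pair 'same motif' when similarity >= cutoff.

    Returns a list of (cutoff, true_positive_rate, false_positive_rate).
    """
    positives = sum(1 for y in same_motif if y)
    negatives = len(same_motif) - positives
    points = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(similarities, same_motif) if y and s >= c)
        fp = sum(1 for s, y in zip(similarities, same_motif) if not y and s >= c)
        points.append((c, tp / positives, fp / negatives))
    return points


def best_cutoff(points):
    """Pick the cutoff with the largest Youden's J = TPR - FPR (assumed rule)."""
    return max(points, key=lambda p: p[1] - p[2])[0]


# Hypothetical pairwise protein similarities (%) and whether each pair
# recognizes the same target motif; illustrative values only.
similarities = [95, 88, 72, 60, 45, 40, 30, 20]
same_motif = [True, True, True, False, True, False, False, False]
points = roc_points(similarities, same_motif, cutoffs=[30, 50, 70, 90])
cutoff = best_cutoff(points)
```

In this toy data a 70% cutoff retains three of the four same-motif pairs while admitting no different-motif pair, which is the kind of trade-off the ROC curves in panel D make visible.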

Fig. S3.

Assessing the robustness of the recognition target motif predictor. (A) Percentage of equal target motifs recognized by Type I Specificity domains and Type III TRDs according to their pairwise sequence similarity. (B–E) Reanalysis of the data from Fig. 3 including PacBio data. Percentage of equal target motifs recognized by Types I, II, and III MTases (B) and REases (C) according to their pairwise protein sequence similarity. (D) Plot of all pairwise similarities of Type II MTases versus the cognate Type II REases of the gold standard of REBASE. Blue dots correspond to equal target motifs, red dots to unequal target motifs, and green dots to nested motifs. The dashed horizontal and vertical lines indicate the threshold similarity limits for MTases and REases. (E) The same dataset was used to plot the corresponding ROC curves. These curves depict the Sensitivity (true-positive rate) versus 1-Specificity (false-positive rate) for several values of percentage similarity of Type II MTases and REases.

Fig. S4.

Association between gene transfer and R-M systems excluding all Type II R-M systems (IIC included). Distribution of the average HGT events per clade computed with Count (A) and homologous recombination (HR) events per clade computed with NSS (B), MaxChi (C), PHI (D), Geneconv (E), and CFML (F) according to the total number of Type I and Type III R-M systems and Type IV REases. Positive associations were observed in all cases (Spearman's ρ_HGT = 0.41, Spearman's ρ_NSS = 0.43, Spearman's ρ_MaxChi = 0.51, Spearman's ρ_PHI = 0.46, Spearman's ρ_CFML = 0.54; all P < 10^−4 with the exception of HGT, for which P < 10^−3).

Fig. S5.

Analysis of the rate of turnover of R-M systems in the clades and how it relates to the length of the tips of the tree. (A) Schema of the analysis. We calculated the frequency of Type II R-M systems shared by the genomes of two taxa (R). For this, we counted the systems in the genomes, grouping into one family those that belong to the same family of the pangenome (e.g., duplicated systems X and X″ are put together with X′ when they are all more than 80% identical in protein sequence). We then divided the number of families with members in both genomes (one in the example: X, X′, and X″) by the total number of families with members in at least one of the two genomes (three in the example: the family X, X′, and X″ and the families W and Z). The values of R are generally small: in more than 50% of the comparisons, R < 0.1. Note that two R-M systems can be cognate without being placed in the same pangenome family (if they are not sufficiently similar, e.g., because they were acquired independently from another species). The Count model can be used to analyze the evolution of orthologous families, but not of cognate families, because the dataset is not large enough to parameterize the model. (B) Distribution of the patristic distances (d) between genomes with R < 1 (i.e., at least one R-M system not in common). (C) Distribution of the sizes of tips. The comparison between B and C shows that the tips are, on average, shorter than the patristic distances between genomes with different R-M systems. Therefore, the R-M system found in the tip is likely to have been in the lineage for most if not all of the time since the split with the closest neighbor of the taxon in the tree. The comparison also shows that the length of the longest tips is close to the patristic distances at which one starts finding noncognate genomes. Hence, one cannot reliably assume that a given R-M system is present in most of the internal branches, because the trait evolves fast.
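The ratio R described in panel A is the number of families present in both genomes divided by the number of families present in at least one of them. A minimal sketch, assuming each R-M system has already been assigned a pangenome family label (the function name, the labels, and the way W and Z are split between the two genomes are hypothetical):

```python
def shared_family_fraction(families_a, families_b):
    """R = families with members in both genomes / families in at least one.

    Inputs are iterables of family labels, one per R-M system; duplicated
    systems of the same family collapse because sets are used.
    """
    a, b = set(families_a), set(families_b)
    union = a | b
    if not union:
        # R is undefined when neither genome encodes any system.
        raise ValueError("no R-M families in either genome")
    return len(a & b) / len(union)


# Genome 1 carries X and its duplicate X'' (both family "X") plus W;
# genome 2 carries X' (family "X") plus Z. This split is assumed.
genome_1 = ["X", "X", "W"]
genome_2 = ["X", "Z"]
r = shared_family_fraction(genome_1, genome_2)  # one shared family out of three
```

Collapsing duplicates into a single family keeps R from being inflated by recent duplications within one genome.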

Fig. S6.

Gene flux in bacteria encoding R-M systems. In contrast to Fig. 4 in the main text, no clade was filtered out in this analysis: all 79 clades are represented. (A) Histogram of patristic distances (colored by quartiles) between bacteria with Type II R-M systems. (B) Median values of HGT and recombination events for each quartile (Q) and for the full dataset (All) between terminal branches of bacteria with Type II R-M systems recognizing (or not) the same target motif. (C) Correlation between Wagner parsimony gene family gains and maximum likelihood (ML) gains for values of posterior probability (PP) between 0.2 and 0.9. Spearman ρ values are indicated in each graph, and in all cases, P < 10^−4; *P < 0.05; **P < 0.01; and ***P < 0.001.

Fig. 4.

Gene flux in bacteria encoding R-M systems. (A) We analyzed the patterns of HR and HGT in the tree of each clade, comparing the flux between tips ending in cognate (similar recognition motifs) or noncognate (different motifs) extant taxa. (B) Histogram of patristic distances (colored by quartiles) between bacteria with Type II R-M systems. (C) Median values of HGT and recombination events for each quartile (Q) and for the full dataset (All) between terminal branches of bacteria with Type II R-M systems recognizing (or not) the same target motif. We analyzed Bacillus amyloliquefaciens, Bifidobacterium longum, Escherichia coli, Haemophilus influenzae, Listeria monocytogenes, Neisseria meningitidis, Salmonella enterica, and Streptococcus pneumoniae. *P < 0.05; **P < 0.01; ***P < 0.001 (see Fig. S6 A and B for the data including all clades). (D) Genetic flux as a function of time and the presence of R-M systems. As lineages diverge and R-M systems change (circles indicate such changes), the lineages with cognate R-M systems (same color) share more genetic material than the other lineages. For example, lineage B changes R-M systems twice since the last common ancestor (LCA). Initially transfer is favored with all lineages, then with the sister lineage A, and finally with the distantly related lineage C.

Fig. S7.

Distribution of Δmedian_HGT in the 100 bootstrap experiments (boxplot on top and histogram on bottom). The red dashed line indicates the (null) expectation if the flux between R-M cognate genomes were similar to that between noncognate ones.
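A bootstrap distribution of a difference in medians can be sketched as below. The caption does not specify the resampling scheme, so everything here is an assumption made for illustration: resampling each group with replacement, defining Δmedian as the cognate median minus the noncognate median, the function name, and the toy values.

```python
import random
from statistics import median


def bootstrap_delta_median(cognate, noncognate, n_boot=100, seed=0):
    """Resample both groups with replacement and record median differences.

    Returns n_boot values of median(cognate*) - median(noncognate*).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    deltas = []
    for _ in range(n_boot):
        c_sample = [rng.choice(cognate) for _ in cognate]
        n_sample = [rng.choice(noncognate) for _ in noncognate]
        deltas.append(median(c_sample) - median(n_sample))
    return deltas


# Hypothetical HGT counts between pairs of tips, for illustration only.
cognate_pairs = [5, 6, 7, 8, 9]
noncognate_pairs = [1, 2, 3, 4, 2]
deltas = bootstrap_delta_median(cognate_pairs, noncognate_pairs)
```

A distribution that sits clearly above zero, the null expectation marked by the dashed line in the figure, would indicate more flux between cognate than between noncognate genomes.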

References

Frost LS, Leplae R, Summers AO, Toussaint A. Mobile genetic elements: The agents of open source evolution. Nat Rev Microbiol. 2005;3(9):722–732.
Vulić M, Dionisio F, Taddei F, Radman M. Molecular keys to speciation: DNA polymorphism and the control of genetic exchange in enterobacteria. Proc Natl Acad Sci USA. 1997;94(18):9763–9767.
Didelot X, Wilson DJ. ClonalFrameML: Efficient inference of recombination in whole bacterial genomes. PLoS Comput Biol. 2015;11(2):e1004041.
Oliveira PH, Touchon M, Rocha EP. The interplay of restriction-modification systems with mobile genetic elements and their prokaryotic hosts. Nucleic Acids Res. 2014;42(16):10618–10631.
Mruk I, Kobayashi I. To be or not to be: Regulation of restriction-modification systems and other toxin-antitoxin systems. Nucleic Acids Res. 2014;42(1):70–86.
