Inference of homologous recombination in bacteria using whole-genome sequences

Xavier Didelot et al. Genetics. 2010 Dec.

Abstract

Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces a homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model is a simplification of the previously described coalescent with gene conversion. We implement a Markov chain Monte Carlo algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We apply our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available from http://code.google.com/p/clonalorigin/.

Figures

Figure 1.

Illustration of our model for a single region of 300 bp and a sample of four isolates. (A) The full graph of ancestry, with the clonal genealogy shown in thick black lines and two recombination events shown in red and blue. The red event, for example, affected positions 50–200 of an ancestor of the first isolate at the point _b_1, and the donor last shared clonal ancestry with the sample at the point _a_1. (B) The local trees for each site. Points _a_i are the “departures” of recombinant edges from the tree and points _b_i are the “arrivals,” with each _b_i occurring closer to the observed sequences at the tips.
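The relationship between recombination events and per-site local trees in this caption can be illustrated with a small sketch. It splits a 300 bp region into runs of sites that share the same set of active recombinant edges, so that each run corresponds to one local tree. This is not ClonalOrigin code: the `RecombinationEvent` class and `local_tree_segments` function are hypothetical names, and the blue event's coordinates (150–250) are an assumption made for illustration, since the caption only gives positions for the red event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecombinationEvent:
    name: str
    start: int  # first affected site (1-based, inclusive)
    end: int    # last affected site (inclusive)


def local_tree_segments(region_length, events):
    """Split sites 1..region_length into runs sharing the same active events."""
    # A new run begins at site 1, at each event start, and just after each event end.
    starts = {1}
    for ev in events:
        starts.add(ev.start)
        if ev.end + 1 <= region_length:
            starts.add(ev.end + 1)
    ordered = sorted(starts)

    segments = []
    for i, seg_start in enumerate(ordered):
        seg_end = ordered[i + 1] - 1 if i + 1 < len(ordered) else region_length
        # An event is active on the run if it covers the whole run.
        active = tuple(
            ev.name for ev in events if ev.start <= seg_start and seg_end <= ev.end
        )
        segments.append((seg_start, seg_end, active))
    return segments


# Red event from the caption; blue event coordinates are assumed, not from the source.
events = [
    RecombinationEvent("red", 50, 200),
    RecombinationEvent("blue", 150, 250),
]
segments = local_tree_segments(300, events)
for seg in segments:
    print(seg)
```

Sites outside every event follow the clonal genealogy alone. Sites inside one or more events have a local tree in which the recipient lineage is re-attached at the corresponding departure point.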

Figure 2.

Results on simulated data for a single simulation. The clonal genealogy is shown on the left, and each node is given a color. Each horizontal row on the right represents the arrival of recombination on the branch of the clonal genealogy it is aligned with. For each row, the _x_-axis represents the sequence measured in base pairs and the _y_-axis represents the probability of recombination on a scale from 0 (where the magenta line is most of the time) to 1 (just below the light gray line). ClonalFrame inference is represented by a thin magenta line. ClonalOrigin inference is shown in solid colors according to their reconstructed origin. Small bars above each row correspond to the true recombined regions in the ARG and are colored according to their origin (or in very light gray to represent absence of recombination). For example, on the branch above genome 9, two real events have occurred, both from an “orange” origin. The first one (around position 900) was fairly short and therefore stayed undetected. The second one (around position 5200) was detected by ClonalFrame with posterior probability close to 100% and by ClonalOrigin with posterior probability ∼50% and an origin very likely to be orange but that could also be brown or red.
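The per-site probabilities plotted on the _y_-axis can be thought of as the fraction of posterior samples in which a site is covered by a recombination event arriving on a given branch. The sketch below shows that calculation under an assumed input format: each sample is a list of `(start, end)` intervals for one branch. That format and the function name `per_site_posterior` are hypothetical and are not the output format of ClonalFrame or ClonalOrigin.

```python
def per_site_posterior(samples, region_length):
    """Fraction of samples in which each site (1-based) lies inside an imported interval."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    counts = [0] * region_length
    for intervals in samples:
        covered = set()
        for start, end in intervals:
            # Clip to the region and count each site at most once per sample.
            covered.update(range(max(start, 1), min(end, region_length) + 1))
        for site in covered:
            counts[site - 1] += 1
    return [c / len(samples) for c in counts]


# Four made-up samples for one branch of a 10 bp region.
samples = [[(2, 4)], [(3, 5)], [], [(3, 3), (8, 9)]]
probs = per_site_posterior(samples, region_length=10)
print(probs)
```

Coloring by origin, as in the figure, would additionally require tallying the donor branch of each sampled event, which this sketch leaves out.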

Figure 3.

Inferred values of ρ relative to true values for many simulated data sets across various parameter values. Shown are values for ClonalFrame (magenta) and ClonalOrigin (blue). For each of the six values of θ we plot the median (thick line) and interquartile range (thin line) of the ratio of inferred ρ/true ρ, considering the combined results for 10 different instances of the ARG. Lines are labeled by the order they appear at ρ = 400. The true value of 1 is shown as a horizontal line for comparison.
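The summary statistic in this figure, the median and interquartile range of inferred ρ divided by true ρ, can be computed as in the following sketch. The numbers are invented for illustration and are not results from the paper. The function name `summarize_ratio` and the choice of the "inclusive" quartile method are assumptions, since the caption does not say how quartiles were computed.

```python
import statistics


def summarize_ratio(inferred_rhos, true_rho):
    """Median and quartiles of inferred rho / true rho across replicates."""
    ratios = [r / true_rho for r in inferred_rhos]
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    return {"median": statistics.median(ratios), "q1": q1, "q3": q3}


# Made-up inferred values for 10 simulated ARGs with true rho = 400.
inferred = [380.0, 410.0, 395.0, 420.0, 360.0, 405.0, 390.0, 430.0, 400.0, 415.0]
summary = summarize_ratio(inferred, true_rho=400.0)
print(summary)
```

A median close to 1 with a narrow interquartile range would indicate that the estimator is roughly unbiased and stable for that parameter combination.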

Figure 4.

Scatterplots for all blocks of the stage 2 analysis of Bacillus, showing the inferred values of the log-average tract length (log(δ)), the mutation rate per site (θ_s/2), and the recombination rate per site (ρ_s/2). A density plot of the scatterplot is shown using gray shading. The median for all blocks is shown in red.

Figure 5.

Heat map for the Bacillus stage 3 analysis showing, for each donor/recipient pair of branches, the number of recombination events inferred relative to its expectation under our prior model given the stage 2 inferred recombination rate. Cells in very light gray are those for which the ratio would be meaningless because there are fewer than three observed and expected events.
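The quantity shown in each cell, and the rule for graying out cells, can be sketched as follows. The branch labels and counts are invented and are not values from the Bacillus analysis. The function name `relative_recombination` is hypothetical, and the masking rule is one reading of the caption: a cell is masked when both the observed and the expected counts are below three.

```python
def relative_recombination(observed, expected, min_events=3):
    """Observed/expected ratio, or None when both counts are too small to interpret."""
    if observed < min_events and expected < min_events:
        return None  # drawn in very light gray in the heat map
    if expected == 0:
        # Not expected to occur under a proper prior; guarded to avoid dividing by zero.
        return float("inf")
    return observed / expected


# Made-up (donor, recipient) -> (observed, expected) event counts.
counts = {
    ("branch_A", "branch_B"): (12.0, 4.0),
    ("branch_A", "branch_C"): (1.0, 2.5),
    ("branch_B", "branch_C"): (2.0, 8.0),
}
heatmap = {pair: relative_recombination(o, e) for pair, (o, e) in counts.items()}
for pair, value in heatmap.items():
    print(pair, value)
```

Ratios above 1 mark donor/recipient pairs with more inferred imports than the prior predicts, and ratios below 1 mark pairs with fewer.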

Figure 6.

Scatterplot of the stage 2 analysis of Bacillus showing the number of recombination event boundaries per site for each block in the alignment. The two blocks marked by a blue and a green dot are shown in detail in Figure 7.

Figure 7.

Results of our stage 3 analysis for two example regions of the Bacillus alignment. The representation is the same as in Figure 2. The two regions are shown by a blue and a green dot, respectively, in Figure 6.
