Genome-wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli

Bob Mau et al. Genome Biol. 2006.

Abstract

Background: Comparisons of complete bacterial genomes reveal evidence of lateral transfer of DNA across otherwise clonally diverging lineages. Some lateral transfer events result in acquisition of novel genomic segments and are easily detected through genome comparison. Other more subtle lateral transfers involve homologous recombination events that result in substitution of alleles within conserved genomic regions. This type of event is observed infrequently among distantly related organisms. It is reported to be more common within species, but the frequency has been difficult to quantify since the sequences under comparison tend to have relatively few polymorphic sites.

Results: Here we report a genome-wide assessment of homologous recombination among a collection of six complete Escherichia coli and Shigella flexneri genome sequences. We construct a whole-genome multiple alignment and identify clusters of polymorphic sites that exhibit atypical patterns of nucleotide substitution using a random walk-based method. The analysis reveals one large segment (approximately 100 kb) and 186 smaller clusters of single base pair differences that suggest lateral exchange between lineages. These clusters include portions of 10% of the 3,100 genes conserved in six genomes. Statistical analysis of the functional roles of these genes reveals that several classes of genes are over-represented, including those involved in recombination, transport and motility.

Conclusion: We demonstrate that intraspecific recombination in E. coli is much more common than previously appreciated and may show a bias for certain types of genes. The described method provides high-specificity, conservative inference of past recombination events.

Figures

Figure 1

A multiple whole-genome alignment of six strains consists of 34 rearranged pieces larger than 1 kb. Each genome is laid out horizontally with homologous segments (LCBs) outlined as colored rectangles. Regions inverted relative to E. coli K-12 are set below those that match in the forward orientation. Lines connect aligned segments between genomes. Average sequence similarities within an LCB, measured in sliding windows, are proportional to the heights of interior colored bars. Large sections of white within blocks and gaps between blocks indicate lineage-specific sequence.

Figure 2

Small sample segment of the alignment spanning the start of the mutS gene (denoted in blue). Location of a mismatch is indicated by the integer '1' along the bottom row. Five columns contain SNDs: TTTCTT, AAAGAA, AAATAA, GGGAGG, and GAAAAA. The first four share the same bipartition pattern (111211) and are deemed equivalent, even though one of them results from a transversion. The other SND is considered distinct despite having the same mutation (A to G) found in the second SND.
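The grouping of SND columns by bipartition pattern described in the Figure 2 caption can be illustrated with a minimal Python sketch. The function name `bipartition_pattern` is hypothetical, and the labeling convention (majority base labeled '1', minority base '2', ties resolved in favor of the first genome) is an assumption chosen to reproduce the pattern 111211 quoted in the caption; it is not taken from the authors' software.

```python
from collections import Counter


def bipartition_pattern(column: str) -> str:
    """Map one alignment column with exactly two bases to a string of 1s and 2s.

    Assumed convention: genomes carrying the majority base get '1', the rest
    get '2'. On a tie, the base of the first genome is labeled '1'.
    """
    counts = Counter(column)
    if len(counts) != 2:
        raise ValueError("an SND column must contain exactly two distinct bases")
    (first_base, first_n), (_, second_n) = counts.most_common(2)
    majority = first_base if first_n > second_n else column[0]
    return "".join("1" if base == majority else "2" for base in column)


# The five SND columns listed in the Figure 2 caption.
columns = ["TTTCTT", "AAAGAA", "AAATAA", "GGGAGG", "GAAAAA"]
patterns = [bipartition_pattern(c) for c in columns]
```

Under this convention the first four columns collapse to the same pattern regardless of which bases are involved, while the fifth column yields a different pattern even though it shares the A/G difference with the second.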

Figure 3

Three excursions (KS, KO, and KC) spanning the alignment with K-12 MG1655 as reference genome. The KS random walk plot, representing the dominant clonal topology, decreases more gradually than do the two other plots. Excursions for the discordant topologies (patterns KO and KC) run parallel to one another, except in a 100 kb region at 2 Mb where KO abruptly increases. Parallel flat gaps common to all three plots reflect K-12 lineage-specific sequence.

Figure 4

The KS local random walk plot showing homologous recombination in the tryptophan (trp) operon. Genes are rectangular boxes positioned above or below the axis based on transcribed strand. KS SNDs form two non-overlapping MSCs with significant local scores exceeding 170. Both MSCs, with a combined length under 2 kb, are contained in a single 6.5 kb HSS covering most of the trp operon. The positions of each KO, KC, and KS SND in E. coli K-12 are shown above the KS excursion. Random walk values below 50 are not plotted, resulting in the absence of visible KC or KO excursions.
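A local random walk of the kind plotted in Figure 4 can be sketched as a running sum that rises at SNDs supporting the pattern of interest, falls at other SNDs, and is reset to zero whenever it would become negative. The Python sketch below is an illustration only: the step values (+10 and -3), the toy site string, and the function names are assumptions, and the scoring scheme actually used in the paper is not given in this abstract.

```python
def local_excursion(scores):
    """Running sum of scores that is reset to zero when it would go negative."""
    walk, height = [], 0
    for s in scores:
        height = max(0, height + s)
        walk.append(height)
    return walk


def maximal_segment(scores):
    """Return (best_score, start, end) of the top-scoring contiguous run.

    Indices are 0-based and `end` is inclusive. Returns (0, None, None)
    if no positive-scoring run exists.
    """
    best, best_start, best_end = 0, None, None
    height, start = 0, 0
    for i, s in enumerate(scores):
        if height == 0:
            start = i  # a new excursion can only begin from the zero level
        height += s
        if height <= 0:
            height = 0
            continue
        if height > best:
            best, best_start, best_end = height, start, i
    return best, best_start, best_end


# Toy data: 'S' marks an SND supporting the target pattern, 'x' any other SND.
sites = "xxSSSxSSxxxxS"
scores = [10 if c == "S" else -3 for c in sites]  # hypothetical step values
walk = local_excursion(scores)
best = maximal_segment(scores)
```

A cluster would then be reported only if its maximal local score exceeds a pattern-specific threshold, such as the value of 170 quoted for KS.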

Figure 5

Mosaic operons and genes. Three of six rha genes (rhaB, rhaA, and rhaD) belong to an operon on the reverse strand. This operon is unusual because well-defined recombination events clearly fall within gene boundaries; rhaD contains two dense KC clusters, whereas rhaA and rhaB contain predominantly KS and KO SNDs, respectively. In a nearby operon consisting of fdoG, fdoH, fdoI, and fdhE, a KC intragenic recombination event has occurred; fdoG is a mosaic resulting from two recombination events, one of which is shared with fdoH.

Figure 6

Random walk plots for positive local scores in the vicinity of the speF gene. SpeF is a mosaic gene by virtue of its KS and KO clusters. Note that the small cluster of KC SNDs appears to divide a large KS segment near coordinate 718,600. This short KC spike, though not statistically significant on a whole-genome scale, would undoubtedly pass a single-gene test based on the distribution of substitutions.

Figure 7

Percentage of SNDs supporting each of three topologies in a phylogenetic network for six E. coli genomes (four OTUs). Black lines describe the 'species' topology. Green, blue, and orange lines indicate the alternative pairings of sister taxa that result from KS, KO, and KC recombinations, respectively. Also shown is the percentage of SNDs supporting each bipartition in Table 1.

Figure 8

The location of all SNDs in a 5 kb region. In clusters demarcated by colored lines, note the corresponding absence of the two more common types of SNDs. Three diamonds in lighter shades of blue, green, and red are compatible tri-partitions (see Additional data file 1). Colored lines demarcate regions where the absence of lineage-specific SNDs is offset by an increase in the corresponding recombinant pattern (for example, yiaA contains no K-12-only or S. flexneri-only SNDs).

Figure 9

Statistical justification of threshold values - 100, 100, and 170 for topologies KO, KC, and KS, respectively - used to identify recombination events. Values on the x-axis are maximal local scores. EVD probability densities for the maximum maximal local score attained by random walks of length M' appear as bell-shaped curves with a pronounced skew to the right. Threshold values, demarcated by vertical lines, correspond to conservative significance levels (α = 0.05) for these distributions.
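To illustrate how a threshold follows from an extreme value distribution at a chosen significance level, the sketch below uses the Gumbel form of the EVD. This is an assumption: the caption does not state which EVD family or which parameter values were fitted, so the location `mu` and scale `beta` here are hypothetical placeholders and the resulting number is not one of the thresholds reported in the paper.

```python
import math


def gumbel_cdf(x: float, mu: float, beta: float) -> float:
    """Cumulative distribution function of a Gumbel (type I extreme value) law."""
    return math.exp(-math.exp(-(x - mu) / beta))


def gumbel_threshold(alpha: float, mu: float, beta: float) -> float:
    """Score exceeded with probability alpha under the Gumbel distribution."""
    return mu - beta * math.log(-math.log(1.0 - alpha))


mu, beta = 60.0, 12.0  # hypothetical location and scale, not from the paper
threshold = gumbel_threshold(0.05, mu, beta)
```

Because the Gumbel density is skewed to the right, the 5% threshold lies well above the location parameter, which is consistent with the vertical lines sitting in the right tail of the curves described in the caption.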
