A global network of coexisting microbes from environmental and whole-genome sequence data

Samuel Chaffron et al. Genome Res. 2010 Jul.

Abstract

Microbes are the most abundant and diverse organisms on Earth. In contrast to macroscopic organisms, their environmental preferences and ecological interdependencies remain difficult to assess, requiring laborious molecular surveys at diverse sampling sites. Here, we present a global meta-analysis of previously sampled microbial lineages in the environment. We grouped publicly available 16S ribosomal RNA sequences into operational taxonomic units at various levels of resolution and systematically searched these for co-occurrence across environments. Naturally occurring microbes, indeed, exhibited numerous, significant interlineage associations. These ranged from relatively specific groupings encompassing only a few lineages, to larger assemblages of microbes with shared habitat preferences. Many of the coexisting lineages were phylogenetically closely related, but a significant number of distant associations were observed as well. The increased availability of completely sequenced genomes allowed us, for the first time, to search for genomic correlates of such ecological associations. Genomes from coexisting microbes tended to be more similar than expected by chance, both with respect to pathway content and genome size, and outliers from these trends are discussed. We hypothesize that groupings of lineages are often ancient, and that they may have significantly impacted on genome evolution.
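The OTU grouping step can be illustrated with a toy sketch. The abstract does not say which clustering method was used, so the greedy, representative-based grouping below is an assumption chosen for brevity. The function names, the pre-aligned equal-length toy sequences, and the position-wise identity measure are all hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, cutoff=0.97):
    """Assign each sequence to the first OTU whose representative it matches
    at or above `cutoff`; otherwise start a new OTU (assumed method)."""
    otus = []  # list of (representative_sequence, member_names)
    for name, seq in seqs.items():
        for rep_seq, members in otus:
            if identity(seq, rep_seq) >= cutoff:
                members.append(name)
                break
        else:
            otus.append((seq, [name]))
    return [members for _, members in otus]


# Hypothetical toy sequences, not real 16S data.
toy = {
    "s1": "AAAAAAAAAA",
    "s2": "AAAAAAAAAT",  # 90% identical to s1
    "s3": "CCCCCCCCCC",
}
otus_90 = greedy_otus(toy, cutoff=0.90)
otus_97 = greedy_otus(toy, cutoff=0.97)
```

Lowering the cutoff merges more sequences into each OTU, which is one way to obtain the "various levels of resolution" mentioned in the abstract.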

Figures

Figure 1.

Detection of coexisting microbial lineages. (A) Schematic description of the analysis procedure. Publicly available 16S ribosomal RNA sequences are first grouped into operational taxonomic units (OTUs), then annotated according to unique environmental sampling events, and finally searched for statistically significant co-occurrences. Where available, completely sequenced genomes are mapped onto the resulting network, which is then clustered and annotated. (B) Example of a specific lineage association. The two lineages (defined at a 97% 16S sequence identity cutoff) have been sampled relatively rarely overall, but they occurred together three times, at three distinct sites. (or) Odds ratio. Under “Sampling sites,” the entry “Baati H. et al., 2009” refers to unpublished work by R Amdouni, E Ammar, H Baati, N Gharsallah, and A Sghir.
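The caption reports an odds ratio for a pair of lineages but does not name the significance test. A minimal sketch, assuming a one-sided Fisher's exact test on a 2x2 presence/absence table over sampling events; the function name and the counts are hypothetical and are not taken from the figure.

```python
from math import comb


def cooccurrence_test(n_both, n_a, n_b, n_samples):
    """Return (one-sided p-value, odds ratio) for two lineages.

    n_both    : sampling events containing both lineages
    n_a, n_b  : sampling events containing lineage A / lineage B
    n_samples : total number of sampling events
    """
    a = n_both                          # both present
    b = n_a - n_both                    # only A present
    c = n_b - n_both                    # only B present
    d = n_samples - n_a - n_b + n_both  # neither present

    # Hypergeometric upper tail: probability of seeing at least `a`
    # joint occurrences if the two lineages were sampled independently.
    total = comb(n_samples, n_b)
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, k) * comb(n_samples - n_a, n_b - k)
               for k in range(a, upper + 1))
    p_value = tail / total

    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return p_value, odds_ratio


# Hypothetical counts: two rarely sampled lineages seen together 3 times.
p, odds = cooccurrence_test(n_both=3, n_a=4, n_b=5, n_samples=1000)
```

With rare lineages, even three joint occurrences give a very small tail probability, which matches the intuition behind panel B.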

Figure 2.

Global network of coexisting microbial lineages. (A) Overview of the network of lineage associations. Each node denotes a microbial lineage, and each line a significant co-occurrence relationship. Node size is proportional to the number of sequences in the lineage, and node color indicates the connectivity degree of a node (along a color gradient: blue, low connectivity; red, high connectivity). Throughout the figure, the OTU definition cutoff is at 97% sequence identity, and the _P_-value cutoff for an association is 0.001 (i.e., FDR after correction for multiple testing). (B) Connectivity degree distribution plot for the network in A. The distribution is coarsely compatible with a power law distribution. (C) Same network as in A, but partitioned using unsupervised Markov clustering, to reveal modules (clusters) of co-occurring lineages. Here, node color denotes taxonomic classification at the phylum level. Lineages suspected to contain potential laboratory contaminants (Tanner et al. 1998; Barton et al. 2006) are mainly observed in small clusters, and are marked with a small black X (17 such lineages in total).
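Panel C relies on unsupervised Markov clustering. The sketch below illustrates the general technique (alternating expansion and inflation of a column-stochastic matrix) in plain Python. The inflation value, the convergence tolerance, the function name, and the toy graph are assumptions; the caption does not give the software or parameters the authors used.

```python
def markov_cluster(nodes, edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering on an undirected, unweighted graph."""
    size = len(nodes)
    if size == 0:
        return set()
    index = {n: i for i, n in enumerate(nodes)}

    # Adjacency matrix with self-loops, as is customary for this method.
    m = [[0.0] * size for _ in range(size)]
    for a, b in edges:
        i, j = index[a], index[b]
        m[i][j] = m[j][i] = 1.0
    for i in range(size):
        m[i][i] = 1.0

    def normalize(mat):
        # Make every column sum to 1 (column-stochastic).
        for j in range(size):
            col = sum(mat[i][j] for i in range(size))
            for i in range(size):
                mat[i][j] /= col
        return mat

    m = normalize(m)
    for _ in range(max_iter):
        # Expansion: matrix squaring spreads flow along longer walks.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(size))
                     for j in range(size)] for i in range(size)]
        # Inflation: element-wise power, then renormalize, favors strong flow.
        inflated = normalize([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(size) for j in range(size))
        m = inflated
        if change < tol:
            break

    # Rows with a positive diagonal are attractors; each such row's
    # positive entries form one cluster.
    clusters = set()
    for i in range(size):
        if m[i][i] > tol:
            clusters.add(frozenset(nodes[j] for j in range(size) if m[i][j] > tol))
    return clusters


# Hypothetical toy network: two disconnected triangles of lineages.
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f")]
clusters = markov_cluster(toy_nodes, toy_edges)
```

A dense pure-Python matrix is only practical for small examples; a network of the size shown in the figure would need a sparse implementation.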

Figure 3.

Example of a novel, previously undescribed module of coexisting lineages. (A) Five distinct microbial lineages are shown; they belong to three different phyla and are defined at an OTU-clustering distance of 90% sequence identity at the 16S rRNA gene. The five lineages have been exclusively observed through environmentally sampled sequences and have not been named. (B) The table shows all occurrence counts of these lineages among our sampling data; the _P_-values indicated have been corrected for multiple testing, against the background of all lineages defined at 90%. Adjusted _P_-values (FDR; p) and odds ratios (or) are indicated. (*) The samples by Li et al. (2008) were collected at distinct sites, covering a distance of more than 600 miles; collection was at different water depths and sampling dates. Investigators involved in unpublished work are as follows: E Julies, V Bruechert, and BM Fuchs; B Orcutt, SB Joye, S Kleindienst, K Knittel, A Ramette, A Rietz, V Samarkin, T Treude, and A Boetius; A Postec, R Warthmann, C Vaconcelos, K Hanselmann, and J McKenzie; Z Zhang, H Xiao, and X Tang.
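The caption states that the _P_-values are FDR-adjusted but does not name the procedure. A minimal sketch, assuming the Benjamini-Hochberg step-up adjustment; the function name and the input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by rising p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four lineage pairs.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [q <= 0.03 for q in adj]
```

An association would then be kept only if its adjusted value falls at or below the chosen cutoff.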

Figure 4.

Coexisting lineages display similarities in genomic features. Here, we focus exclusively on co-occurring lineages for which completely sequenced genomes could be mapped to both partners (this genome mapping is globally visualized in Supplemental Fig. S6). Properties of such co-occurring genomes are compared and contrasted against randomly paired genomes. (A) The distribution of 16S sequence divergence scores is shifted to the left for co-occurring genome pairs (i.e., they tend to be phylogenetically related). In panels E, F, and G, we test for independence between phylogenetic relatedness and the observations shown in panels B, C, and D, respectively. Here, each dot denotes a pairwise genome comparison, and lines correspond to running medians.
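The comparison against randomly paired genomes can be sketched as a simple resampling baseline. This is an illustration under stated assumptions, not the authors' procedure: the function names, the number of draws, the fixed seed, and the genome sizes (in Mb) are all hypothetical.

```python
import random
from statistics import median


def observed_median_difference(values, pairs):
    """Median absolute difference of a genome property over given pairs."""
    return median(abs(values[a] - values[b]) for a, b in pairs)


def random_pair_medians(values, n_pairs, n_draws=1000, seed=0):
    """Median absolute difference for repeatedly drawn random genome pairs."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    names = list(values)
    medians = []
    for _ in range(n_draws):
        diffs = []
        for _ in range(n_pairs):
            a, b = rng.sample(names, 2)  # two distinct genomes
            diffs.append(abs(values[a] - values[b]))
        medians.append(median(diffs))
    return medians


# Hypothetical genome sizes in Mb and hypothetical co-occurring pairs.
genome_size = {"g1": 2.0, "g2": 2.2, "g3": 5.0, "g4": 5.3, "g5": 9.0}
cooccurring = [("g1", "g2"), ("g3", "g4")]

observed = observed_median_difference(genome_size, cooccurring)
baseline = random_pair_medians(genome_size, n_pairs=len(cooccurring))
# Fraction of random draws that look at least as similar as the observed pairs.
fraction_as_similar = sum(m <= observed for m in baseline) / len(baseline)
```

A small fraction would indicate that the co-occurring pairs are more similar in size than random pairing would suggest.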

Figure 5.

Functional similarities among co-occurring genomes. Each dot denotes a pair of genomes, which are either co-occurring in the environment (red to orange dots) or randomly paired (blue dots). The plot shows differences in functional genome content (_y_-axis), and in genome size (_x_-axis). Lines denote running medians. Note that, in general, the more divergent two genomes are in size, the more they are functionally distinct (blue line). In co-occurring genomes, this trend is strongly shifted toward similar functions, at all levels of phylogenetic relatedness (color-coded from red to orange). Examples of genome pairs that are discussed in the text are indicated.
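The running-median lines in this plot can be approximated with a sliding window over points sorted by the _x_-value. The window size, the function name, and the toy points below are assumptions; the caption does not specify how the running medians were computed.

```python
from statistics import median


def running_median(points, window=5):
    """Running median of y over a centered window of x-sorted points.

    points : iterable of (x, y) tuples
    window : number of neighboring points per window (truncated at the ends)
    """
    pts = sorted(points)
    half = window // 2
    result = []
    for i in range(len(pts)):
        lo = max(0, i - half)
        hi = min(len(pts), i + half + 1)
        result.append((pts[i][0], median(y for _, y in pts[lo:hi])))
    return result


# Hypothetical (genome size difference, functional difference) points.
toy_points = [(1, 10), (2, 20), (3, 100), (4, 40), (5, 50)]
smoothed = running_median(toy_points, window=3)
```

A median is less sensitive to outlying genome pairs than a running mean, which matters when a few pairs deviate strongly from the trend.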

References

    1. Ahmed N 2009. A flood of microbial genomes—do we need more? PLoS One 4: e5831 doi: 10.1371/journal.pone.0005831
    2. Alonso C, Warnecke F, Amann R, Pernthaler J 2007. High local and global diversity of Flavobacteria in marine plankton. Environ Microbiol 9: 1253–1266
    3. Angly FE, Willner D, Prieto-Davo A, Edwards RA, Schmieder R, Vega-Thurber R, Antonopoulos DA, Barott K, Cottrell MT, Desnues C, et al. 2009. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Comput Biol 5: e1000593 doi: 10.1371/journal.pcbi.1000593
    4. Baati H, Guermazi S, Amdouni R, Gharsallah N, Sghir A, Ammar E 2008. Prokaryotic diversity of a Tunisian multipond solar saltern. Extremophiles 12: 505–518
    5. Barabasi AL, Oltvai ZN 2004. Network biology: Understanding the cell's functional organization. Nat Rev Genet 5: 101–113
