Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes

Todd J Treangen et al. PLoS Genet. 2011.

Abstract

Gene duplication followed by neo- or sub-functionalization deeply impacts the evolution of protein families and is regarded as the main source of adaptive functional novelty in eukaryotes. While there is ample evidence of adaptive gene duplication in prokaryotes, it is not clear whether duplication outweighs the contribution of horizontal gene transfer to the expansion of protein families. To untangle the effects of duplication and horizontal transfer, we analyzed closely related prokaryotic strains or species with small genomes (Helicobacter, Neisseria, Streptococcus, Sulfolobus), average-sized genomes (Bacillus, Enterobacteriaceae), and large genomes (Pseudomonas, Bradyrhizobiaceae). After removing the effects of transposable elements and phages, we show that the vast majority of protein family expansions are due to transfer, even among large genomes. Transferred genes (xenologs) persist longer in prokaryotic lineages, possibly because they play a more important or longer-lasting adaptive role. Duplicated genes (paralogs), on the other hand, are more highly expressed and, when persistent, evolve more slowly. This suggests that gene transfer and gene duplication play very different roles in shaping the evolution of biological systems: transfer allows the acquisition of new functions, whereas duplication leads to higher gene dosage. Accordingly, we show that paralogs share most of their protein-protein interactions and genetic regulators, whereas xenologs share very few of them. Prokaryotes invented most of life's biochemical diversity. Therefore, the study of the evolution of biological systems should explicitly account for the predominant role of horizontal gene transfer in the diversification of protein families.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Histogram of the normalized size of gene families.

For each family, we count the genes in the family and subtract the number of genomes containing at least one member of the family.
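The normalization can be sketched in a few lines. This is a minimal illustration, assuming each family is given as a list of (genome, gene) membership pairs; the function name `normalized_family_size` and the toy family are hypothetical and not taken from the authors' pipeline.

```python
def normalized_family_size(members):
    """Return (#genes in the family) - (#genomes with at least one member).

    `members` is a list of (genome_id, gene_id) pairs (hypothetical format).
    A family with exactly one gene per genome scores 0; every extra copy
    in any genome adds 1, so positive values flag expanded families.
    """
    n_genes = len(members)
    n_genomes = len({genome for genome, _ in members})
    return n_genes - n_genomes


# Hypothetical family: genome "A" carries two copies, "B" and "C" one each.
family = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("C", "c1")]
print(normalized_family_size(family))
```

Here four genes spread over three genomes give a normalized size of 1, i.e. one extra copy beyond a single-copy family.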

Figure 2. Relative contribution of horizontal gene transfer in protein family expansions.

Figure 3. Abundance of insertion sequences (IS) and prophages, and increase in inferred IGD events when they are included in the analysis.

The bar plot (left y-axis) shows the percentage of gene family expansions of IS and phage origin. The line plot (right y-axis) shows the increase in the number of expansions assigned to duplications when the co-localization criterion is ignored and IS and prophages are included in the dataset.

Figure 4. Gene expression differs according to gene origin.

Paralogs have higher predicted expression levels than xenologs, as measured by the codon adaptation index (CAI). Xenologs, in turn, have higher predicted expression than genes with neither paralogs nor xenologs.
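The codon adaptation index scores a gene by the geometric mean of the relative adaptiveness w of its codons, where w is a codon's count in a reference set of highly expressed genes divided by the count of the most used synonymous codon. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the caller supplies the reference set, the standard genetic code applies, codons absent from the reference set receive a pseudocount of 0.5, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq):
    """Split a coding sequence into in-frame codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w(codon) = count(codon) / count(most used synonymous codon) in the reference set."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s) if c in CODE)
    max_by_aa = Counter()
    for codon, aa in CODE.items():
        max_by_aa[aa] = max(max_by_aa[aa], counts[codon])
    w = {}
    for codon, aa in CODE.items():
        # Stop codons and single-codon amino acids (Met, Trp) carry no codon-usage signal;
        # amino acids absent from the reference set are skipped as well.
        if aa in "*MW" or max_by_aa[aa] == 0:
            continue
        # Codons never seen in the reference set get a pseudocount to avoid log(0).
        w[codon] = max(counts[codon], pseudocount) / max_by_aa[aa]
    return w


def cai(seq, w):
    """Geometric mean of w over the codons of `seq` that have a defined w."""
    logs = [math.log(w[c]) for c in split_codons(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical reference set of "highly expressed" genes (alanine codons only).
reference = ["GCTGCTGCC"]
w = relative_adaptiveness(reference)
print(cai("GCTGCT", w))
print(cai("GCTGCC", w))
```

A gene built only from the preferred codon (GCT here) scores 1, and each less favored codon pulls the geometric mean down, which is why CAI is used as a proxy for expression level.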

Figure 5. Evolutionary rates differ between paralogs and xenologs.

Non-synonymous (dN) and synonymous (dS) substitution rates in paralogs (blue; dashed linear fit) and xenologs (red; solid linear fit) across all clades, computed with codeml from PAML (model = 1, fix_omega = 0).
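Each gene class in the figure is summarized by a linear fit of dN against dS, so a steeper slope means more non-synonymous change per unit of synonymous change, i.e. faster protein evolution. The sketch below is a minimal illustration that assumes the per-gene dN and dS estimates are already available (for example, parsed from codeml output) and fits ordinary least squares with an intercept; the caption does not say whether the fits were constrained through the origin, and the function name and numbers are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all dS values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical per-gene (dS, dN) estimates for each gene class.
paralogs_ds = [0.1, 0.2, 0.4]
paralogs_dn = [0.01, 0.02, 0.04]
xenologs_ds = [0.1, 0.2, 0.4]
xenologs_dn = [0.02, 0.04, 0.08]

para_slope, para_intercept = linear_fit(paralogs_ds, paralogs_dn)
xeno_slope, xeno_intercept = linear_fit(xenologs_ds, xenologs_dn)
print(f"paralogs: slope of dN on dS = {para_slope:.3f}")
print(f"xenologs: slope of dN on dS = {xeno_slope:.3f}")
```

Comparing the two slopes on the same dS range is what lets the figure contrast the evolutionary rates of the two gene classes.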

Figure 6. Protein family construction pipeline.

Starting with a databank of proteins, we first performed all pairwise similarity searches using BLASTP. Hits were filtered on alignment length (at least 70% of the query length) and on bitscore (at least 30% of the maximal bitscore, computed by aligning a protein against itself). To build the gene families, we ran MCL blastline and then removed all singletons, IS and phage elements. To build the core genome, we used OrthoMCL together with a synteny filter based on M-GCAT clusters. Finally, using presence/absence and phylogenetic information, we obtained the protein families with expansions.
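The two hit filters can be sketched as a simple predicate over BLASTP hits. This is a minimal illustration, not the authors' code: the `Hit` record, the `query_length` and `self_bitscore` lookups, and all sequence identifiers are hypothetical, and it assumes that "maximal bitscore" means the query's self-alignment score (the caption does not say whether the query's or the subject's self-score was used).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    aln_length: int   # alignment length reported by BLASTP
    bitscore: float


def keep_hit(hit, query_length, self_bitscore, min_cov=0.70, min_score_ratio=0.30):
    """Return True if a hit passes both thresholds (hypothetical helper).

    - the alignment must span at least `min_cov` of the query length;
    - the bitscore must reach at least `min_score_ratio` of the query's
      self-alignment bitscore (assumed to be the "maximal bitscore").
    """
    if hit.aln_length < min_cov * query_length[hit.query]:
        return False
    return hit.bitscore >= min_score_ratio * self_bitscore[hit.query]


def filter_hits(hits, query_length, self_bitscore):
    """Keep only the hits that pass both filters."""
    return [h for h in hits if keep_hit(h, query_length, self_bitscore)]


# Hypothetical inputs: protein lengths and self-alignment bitscores.
query_length = {"p1": 300, "p2": 250}
self_bitscore = {"p1": 600.0, "p2": 500.0}
hits = [
    Hit("p1", "p2", 240, 250.0),  # 80% coverage, score ratio ~0.42 -> kept
    Hit("p1", "p3", 150, 400.0),  # 50% coverage -> dropped
    Hit("p2", "p1", 200, 120.0),  # 80% coverage, score ratio 0.24 -> dropped
]
kept = filter_hits(hits, query_length, self_bitscore)
print([(h.query, h.subject) for h in kept])
```

Normalizing by the self-score makes the bitscore threshold comparable across proteins of different lengths and compositions; the surviving hits would then be clustered with MCL.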

Figure 7. Cumulative distribution function plot of protein similarity.

Colored lines show the CDFs of the similarity between orthologous core-genome proteins when E. coli K12 W3110 is compared with genomes at increasing phylogenetic distances. The gray line shows the similarity between homologous genes within the E. coli K12 W3110 genome.
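Each curve is an empirical cumulative distribution function: for a similarity threshold x, it gives the fraction of protein pairs whose similarity is at most x, so a curve that rises earlier reflects lower overall similarity. The sketch below is a minimal illustration that assumes pairwise similarity values (as fractions) have already been computed; the function name and the values are hypothetical.

```python
from bisect import bisect_right


def ecdf(values):
    """Return F(x) = fraction of observations <= x (empirical CDF)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")

    def F(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(data, x) / n

    return F


# Hypothetical similarities between orthologous proteins for one genome pair.
similarities = [0.95, 0.62, 0.99, 0.80]
cdf = ecdf(similarities)
for x in (0.5, 0.8, 1.0):
    print(x, cdf(x))
```

Evaluating one such function per genome comparison on a common grid of thresholds yields the family of curves plotted in the figure.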
