Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes
Todd J Treangen et al. PLoS Genet. 2011.
Abstract
Gene duplication followed by neo- or sub-functionalization deeply impacts the evolution of protein families and is regarded as the main source of adaptive functional novelty in eukaryotes. While there is ample evidence of adaptive gene duplication in prokaryotes, it is not clear whether duplication outweighs the contribution of horizontal gene transfer in the expansion of protein families. We analyzed closely related prokaryote strains or species with small genomes (Helicobacter, Neisseria, Streptococcus, Sulfolobus), average-sized genomes (Bacillus, Enterobacteriaceae), and large genomes (Pseudomonas, Bradyrhizobiaceae) to untangle the effects of duplication and horizontal transfer. After removing the effects of transposable elements and phages, we show that the vast majority of expansions of protein families are due to transfer, even among large genomes. Transferred genes (xenologs) persist longer in prokaryotic lineages, possibly because they play a stronger or longer-lasting adaptive role. Duplicated genes (paralogs), on the other hand, are more highly expressed and, when persistent, evolve more slowly. This suggests that gene transfer and gene duplication have very different roles in shaping the evolution of biological systems: transfer allows the acquisition of new functions, whereas duplication leads to higher gene dosage. Accordingly, we show that paralogs share most protein-protein interactions and genetic regulators, whereas xenologs share very few of them. Prokaryotes invented most of life's biochemical diversity. Therefore, the study of the evolution of biological systems should explicitly account for the predominant role of horizontal gene transfer in the diversification of protein families.
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
Figure 1. Histogram of the normalized size of gene families.
For each family, we count the genes in the family and subtract the number of genomes containing at least one member of the family.
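The normalization in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `normalized_family_sizes` and the input layout (one `(family_id, genome_id)` pair per gene) are assumptions made for the example.

```python
from collections import defaultdict


def normalized_family_sizes(genes):
    """Return {family_id: gene count minus number of genomes with a member}.

    `genes` is an iterable of (family_id, genome_id) pairs, one per gene.
    This input layout is an assumption made for the sketch.
    """
    gene_counts = defaultdict(int)   # family -> total number of genes
    genomes = defaultdict(set)       # family -> genomes holding at least one member
    for family, genome in genes:
        gene_counts[family] += 1
        genomes[family].add(genome)
    return {f: gene_counts[f] - len(genomes[f]) for f in gene_counts}


# Hypothetical toy data: famA has two copies in genome g1 and one in g2.
toy = [("famA", "g1"), ("famA", "g1"), ("famA", "g2"), ("famB", "g1")]
sizes = normalized_family_sizes(toy)
```

Under this definition, a family with exactly one copy in every genome that carries it scores 0, and each extra copy adds 1.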
Figure 2. Relative contribution of horizontal gene transfer in protein family expansions.
Figure 3. Abundance of IS and prophages and increased inference of IGD events when included in analysis.
The bar plot (left y-axis) shows the percentage of gene family expansions of IS and phage origin. The line plot (right y-axis) indicates the increase of the number of expansions assigned to duplications when the co-localization criterion is ignored and IS and prophages are included in the dataset.
Figure 4. Gene expression differs according to gene origin.
Paralogs are more highly expressed, as measured by the codon adaptation index, than xenologs. Xenologs, in turn, are more highly expressed than genes that have neither paralogs nor xenologs.
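For readers unfamiliar with the measure, the codon adaptation index of a gene is conventionally the geometric mean of the relative adaptiveness values of its codons. The sketch below shows that standard definition only. The page does not say which implementation or reference gene set the authors used, so the function name and the toy weights are hypothetical.

```python
import math


def codon_adaptation_index(cds, weights):
    """Geometric mean of relative adaptiveness values over the codons of `cds`.

    `weights` maps codon -> w in (0, 1], normally derived from a reference set
    of highly expressed genes. Codons absent from `weights` are skipped.
    """
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical weights for two alanine codons, for illustration only.
toy_weights = {"GCT": 1.0, "GCC": 0.25}
cai = codon_adaptation_index("GCTGCC", toy_weights)
```

Values close to 1 indicate that a gene mostly uses the codons preferred in the reference set, which is why the index serves as a proxy for expression level.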
Figure 5. Evolutionary rates differ between paralogs and xenologs.
Non-synonymous (dN) and synonymous (dS) substitution rates in paralogs (blue; dashed linear fit) and xenologs (red; solid linear fit) in all clades computed using Codeml from PAML (model = 1, fix_omega = 0).
Figure 6. Protein family construction pipeline.
Starting with a databank of proteins, we first performed all pairwise similarity searches using BLASTP. The hits were filtered by match length (70% of the length of the query) and by bitscore (30% of the maximal bitscore, calculated by aligning a protein against itself). To build the gene families, we ran MCL blastline and then removed all singletons, IS, and phage sequences. To build the core genome, we used OrthoMCL along with a synteny filter based on M-GCAT Clusters. Finally, using presence/absence and phylogenetic information, we obtained the protein families with expansions.
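The hit-filtering step of this pipeline can be illustrated as follows. This is a sketch under stated assumptions rather than the published implementation: the `Hit` record, the function name `filter_hits`, the use of the query's self-alignment bitscore as the reference, and the inclusive thresholds are all choices made for the example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLASTP hit; this record layout is hypothetical."""
    query: str
    subject: str
    match_length: int
    bitscore: float


def filter_hits(hits, query_lengths, self_bitscores,
                min_coverage=0.70, min_score_fraction=0.30):
    """Keep hits whose match covers enough of the query and scores well enough.

    query_lengths: protein id -> sequence length
    self_bitscores: protein id -> bitscore of the protein aligned against itself
    Treating both thresholds as inclusive, and using the query's own self-hit
    as the maximal bitscore, are assumptions.
    """
    kept = []
    for h in hits:
        long_enough = h.match_length >= min_coverage * query_lengths[h.query]
        strong_enough = h.bitscore >= min_score_fraction * self_bitscores[h.query]
        if long_enough and strong_enough:
            kept.append(h)
    return kept


# Hypothetical toy input: only the first hit passes both filters.
toy_hits = [
    Hit("p1", "p2", 80, 70.0),    # 80% coverage, 35% of self score
    Hit("p1", "p3", 60, 150.0),   # too short
    Hit("p1", "p4", 90, 50.0),    # score too low
]
kept = filter_hits(toy_hits, {"p1": 100}, {"p1": 200.0})
```

The surviving hits would then be passed to a clustering tool such as MCL to form families, a step not reproduced here.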
Figure 7. Cumulative distribution function plot of protein similarity.
Colored lines correspond to CDF plots of the similarity between orthologous proteins of the core genome for comparisons of E. coli K12 W3110 with genomes at increasing phylogenetic distances. The gray line corresponds to the similarity between homologous genes within the E. coli K12 W3110 genome.