Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins
Pingzhao Hu, Sarath Chandra Janga, Mohan Babu, J Javier Díaz-Mejía, Gareth Butland, Wenhong Yang, Oxana Pogoutse, Xinghua Guo, Sadhna Phanse, Peter Wong, Shamanta Chandran, Constantine Christopoulos, Anaies Nazarians-Armavil, Negin Karimi Nasseri, Gabriel Musso, Mehrab Ali, Nazila Nazemof, Veronika Eroukova, Ashkan Golshani, Alberto Paccanaro, Jack F Greenblatt, Gabriel Moreno-Hagelsieb, Andrew Emili
- PMID: 19402753
- PMCID: PMC2672614
- DOI: 10.1371/journal.pbio.1000096
Pingzhao Hu et al. PLoS Biol. 2009.
Abstract
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a "systems-wide" functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
Conflict of interest statement
Competing interests. The authors have declared that no competing interests exist.
Figures
Figure 1. Annotated and Functional Orphan Genes of the E. coli K-12 Reference Strain
(A) Frequency distribution of supporting publications per E. coli protein-coding gene. (B) Summary of existing annotations for E. coli, showing proteins of unknown function (orphans) lacking proper names and functional annotations in MultiFun [25] or EcoCyc [11]. (C–F) Although the functional orphans are encoded by transcripts with half-lives comparable to those of annotated genes, they tend to be expressed at lower levels based on (D) microarray analysis of mRNA and (E) CAI scores, and (F) have lower molecular weights on average. The _x_-axis in (D) represents the average of the log-scale mRNA expression level of each gene for all the arrays using the Robust Multi-Array Average normalized data obtained from the M3D database [103]. NS, not statistically significant; p, _p_-value; t, _t_-test. (G) Orthologs of orphans are also less prevalent in sequenced genomes than those of annotated genes. (H) However, examination of environmental metagenomic libraries indicates that the orphans are not necessarily exclusive to the Escherichia lineage. Anammox, anaerobic ammonium oxidation bacteria; AMO, methane oxidizing Archaea.
Figure 2. Generation and Integration of Physical and Functional Networks and Orphan Function Prediction
(A) Construction of a PI network based on protein copurification and detection by mass spectrometry. The confidence scoring of the LCMS and MALDI networks was conducted using a logistic regression with datasets consisting of PI from low-throughput studies curated in DIP, BIND, and IntAct (gold positives) and proteins in different subcellular localizations (gold negatives). The two networks were integrated using a probabilistic model [61] (Protocol S6). The resulting PI network, with edge weights corresponding to likelihood ratios, was clustered using MCL to delimit “multiprotein complexes.” (B) Integration of four GC methods into a single functional interaction network using the same probabilistic model [61] and resulting scores (edge weights) were input to MCL to delimit “functional modules.” (C) Orphan function prediction was conducted using a “guilt-by-association” procedure. After integration of PI and GC interactions into a single probabilistic network [61], a machine learning algorithm (StepPLR) newly developed for this study was used to assign functions based on the binary associations of orphans with annotated proteins, the respective interaction edge weights, and the overall network topology. Correlations between vectors of these function predictions (orphans), and the annotations were then used as input to delimit “functional neighborhoods” by clustering using MCL.
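As a rough illustration of the clustering step named in (A) through (C), the following minimal Python sketch runs the basic expansion-and-inflation loop of Markov clustering (MCL) on a small weighted graph. It is not the MCL software used in the study. The function name `mcl`, the toy graph, and the parameter values (inflation 2.0, 50 iterations, unit self-loops) are assumptions chosen only for illustration.

```python
def mcl(weights, n, inflation=2.0, iterations=50, self_loop=1.0):
    """Bare-bones Markov clustering on an undirected weighted graph.

    weights: dict mapping (i, j) node-index pairs to positive edge weights
             (hypothetical stand-ins for network edge scores).
    n:       number of nodes.
    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    # Symmetric adjacency matrix with self-loops on the diagonal.
    m = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        m[i][j] = w
        m[j][i] = w
    for i in range(n):
        m[i][i] = self_loop

    def normalize(mat):
        # Rescale every column so that it sums to 1 (column-stochastic).
        for c in range(n):
            s = sum(mat[r][c] for r in range(n))
            if s > 0:
                for r in range(n):
                    mat[r][c] /= s
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize columns,
        # which strengthens strong flows and suppresses weak ones.
        m = normalize([[x ** inflation for x in row] for row in m])

    # Rows that still carry mass are attractors; the columns with mass in
    # an attractor row form one cluster.
    clusters = set()
    for r in range(n):
        members = tuple(c for c in range(n) if m[r][c] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(list(c) for c in clusters)


# Toy graph: two triangles joined by one weak edge (invented weights).
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
         (2, 3): 0.1}
clusters = mcl(edges, 6)
```

In this toy case the weak bridge between the triangles is suppressed by inflation, so each triangle ends up as its own cluster. Higher inflation values generally give finer clusters.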
Figure 3. High-Confidence Physical Interactions and Putative Multiprotein Complexes
(A) Benchmarking of the experimentally derived PI network in E. coli against positive and negative gold standards by receiver operating characteristic (ROC)-curve analysis; cumulative area-under-the-curve (AUC) is shown as an overall performance measure. (B) Overlap of PI identified in this study with previous proteomic reports [1,4] and low-throughput PI obtained from DIP, BIND, and IntAct. (C and D) Putatively interacting proteins have highly correlated gene expression patterns (C) and similar phylogenetic profiles (D) based on mutual information as for low-throughput curated PI and in contrast to control protein pairs derived from different subcellular compartments. (E) Graphical schematic of putative stable, soluble multiprotein complexes, drawn using the GenePRO Cytoscape plugin [104] (see Table S7 for listing). Each node represents a complex, whose size reflects the number of contained proteins; edge widths reflect the number of interactions between subunits of different complexes. (F) Multiprotein complexes implicated in the bacterial translation apparatus; orphan and annotated genes mentioned in the main text are highlighted in bold. (G) Reduced rate of total protein synthesis in a strain lacking ybcJ relative to wild-type cells (WT). (H) Perturbed ribosome profiles in a yfgB deletion strain. (I) Elevated rates of frame-shifting and stop-codon readthrough in yfgB and ybcJ deletion strains relative to wild-type (WT). β-gal activity is only produced after the corresponding translational defect has occurred; error bars indicate standard deviation.
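The AUC mentioned in (A) can be read as the probability that a randomly chosen gold-positive pair receives a higher confidence score than a randomly chosen gold-negative pair. A minimal Python sketch of that calculation follows. The function name `roc_auc` and the score values are hypothetical and are not data from the study.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a gold-positive score exceeds a gold-negative score,
    with ties counted as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented confidence scores for illustration only.
gold_positive = [0.9, 0.8, 0.4]
gold_negative = [0.7, 0.3, 0.2]
auc = roc_auc(gold_positive, gold_negative)
```

Here eight of the nine positive-negative comparisons favor the positive pair, so the AUC is 8/9. A value of 0.5 would correspond to scores that do not separate the two gold standards at all.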
Figure 4. High-Confidence Genomic Context Associations and Putative Functional Modules
(A) Benchmarking of unified GC interactions in E. coli against positive and negative gold standards by ROC-curve analysis. (B) Overlap of high-confidence functional interactions predicted in this study with two other public GC databases. (C) Even after eliminating adjacent gene pairs to control for known and predicted E. coli operons, functionally linked genes have highly correlated patterns of mRNA expression comparable to components of the same curated EcoCyc pathways rather than different pathways. (D) Functionally linked genes are enriched for annotations to the same COG functional categories. (E) Graphical representation of putative E. coli functional modules (see Table S9 for listing); node size and colors are proportional to the number and fraction of orphan and annotated subunits, respectively, while lines represent interactions connecting modules. (F) Putative fimbriae-related module. (G) Defective motility of mutant strains deleted for orphans linked to fimbriae (from [F]); single dashes (-) indicate moderately impaired motility, while double dashes (- -) represent strong repression. Other mutants displaying a normal phenotype comparable to the wild-type strain BW25113 (WT) are not shown. (H) Defective biofilm formation by mutants deleted for fimbriae-related orphans (from [F]); significant differences (_t_-test) in cell adhesion (absorbance) between mutant and WT strains are denoted by asterisks (single asterisks [*], p < 0.01; double asterisks [**], p < 0.0001). Error bars indicate standard deviation of the mean. CFA, colonization factor antigen medium; LB, Luria Bertani medium. (I) Metabolic modules mentioned in main text. (J) Mutants auxotrophic for aromatic amino acids show defective growth on minimal media containing shikimic acid. A prototroph, the _aroL_Δ mutant strain (P), is shown for comparison.
Figure 5. The Functional Neighborhoods of E. coli
(A) A “clustergram” displaying existing annotations (orange) and the orphan predicted functions (this study; blue) for all the protein-coding genes of E. coli (_y_-axis) and their associated biological processes (_x_-axis) (complete terms are provided in Figure S5 and Table S16). Proteins were clustered using MCL based on the paired similarity of the functional annotations and predictions in this matrix to delimit “functional neighborhoods” (see Table S17 for listing). (B) Putative functional neighborhood showing high-confidence integrated functional interactions (combined PI and GC networks) of select orphans with the protein synthesis machinery. For clarity, individual names of ribosomal proteins and tRNA synthetases are not shown. (C) Heat map showing the differential sensitivity of orphan deletion strains to antibiotics targeting protein synthesis relative to the colony size in the absence of drug (see Protocol S11 for details). Mutants deleted for annotated proteins from this neighborhood are shown as positive controls, whereas deletion mutants lacking genes not contained within this neighborhood are shown as negative controls. (D) Neighborhood with three orphans putatively involved in flagellum assembly and motility. (E) Deletions of the corresponding components reduce swarming capability; single dash (-), moderately impaired motility; double dash (- -), strong repression. (F) Subnetwork of orphans associated with DNA enzymes. (G) Deletion of the orphan yhcG results in synthetic lethality when combined with hypomorphic alleles (as indicated by an asterisk [*]) of three essential DNA replication factors (parE, dnaN, and dnaB).
Figure 6. Functional Neighborhoods Involved in Cell Envelope Biogenesis
(A) Functional neighborhood showing high-confidence integrated functional interactions (combined PI and GC networks) between components of the complete peptidoglycan biogenesis pathway and putatively functionally related orphans. (B) Serial-dilution assay showing perturbed growth of E. coli strains deleted for putative cell wall–related components (from [A]) in the presence of antibiotic inhibitors of cell wall assembly; an asterisk (*) indicates a hypomorphic allele of the essential gene murA. (C) Functional neighborhood of metabolic pathways involved in biogenesis of the bacterial cell envelope. Vertical groupings correspond to known pathways, whereas orphan components are tentatively positioned according to their interaction patterns; brown lines indicate aggravating genetic interactions recorded between the orphan yfbJ and yadE and annotated pathway components. (D) Conjugation-based double-mutant growth assays (only test results for yfbJ are shown). (E) Heat map showing the differential sensitivity of strains deleted for the orphans shown in (C) to antibiotics targeting the cell envelope. Mutants deleted for relevant annotated proteins are shown as positive controls, while mutants lacking genes not contained within this neighborhood are shown as negative controls.
Figure 7. Evolutionary Conservation of Orphan Protein Function
(A) Evolutionary conservation of orphan and annotated E. coli protein interaction partners in the integrated PI-GC network based on the co-occurrence of putative orthologs across fully sequenced prokaryotic genomes. (B) Phylogenetic distribution of the components of 97 functional neighborhoods with at least one orphan; the proportion of genomes showing conservation is indicated. (C) Atypical neighborhood illustrating broader conservation of orphans than annotated components; node shading reflects average phylogenetic conservation. (D) Representative neighborhood involved in drug efflux exhibiting similar phylogenetic distributions among its orphan and annotated components; analogous neighborhoods shown in preceding figures are indicated below [B]. (E) Serial-dilution assay showing the drug hypersensitivity of deletion mutants of the orphans listed in [D]. (F) DNA-replication neighborhood exhibiting a tendency of annotated components to be more widely distributed than the orphans. (G) Serial-dilution assay showing the perturbed growth of strains deleted for the two orphan components shown in [F] in the presence of an antibiotic inhibitor of DNA-replication; an asterisk (*) indicates hypomorphic alleles. Mutants deleted for relevant annotated proteins are shown as positive controls, while mutants lacking genes not contained within this neighborhood are shown as negative controls. LB, Luria Bertani medium.
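Co-occurrence of orthologs across genomes, as in (A), is commonly quantified by comparing presence/absence profiles; the Figure 3 caption names mutual information as the measure used for phylogenetic profiles. The following minimal Python sketch computes mutual information between two binary profiles. The function name `mutual_information` and the example profiles are hypothetical, and the exact scoring used in the study may differ.

```python
from collections import Counter
from math import log2


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two phylogenetic profiles.

    Each profile is a sequence of 0/1 flags, one per genome, marking the
    absence or presence of a putative ortholog in that genome.
    """
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profile_a)
    joint = Counter(zip(profile_a, profile_b))   # joint counts of (a, b)
    count_a = Counter(profile_a)                 # marginal counts for a
    count_b = Counter(profile_b)                 # marginal counts for b
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Invented profiles over four genomes, for illustration only.
identical = mutual_information([1, 1, 0, 0], [1, 1, 0, 0])
unrelated = mutual_information([1, 1, 0, 0], [1, 0, 1, 0])
```

Two genes that are always present or absent together share one full bit of information in this example, whereas the statistically independent pair scores zero.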
Similar articles
- Sun J, Gunzer F, Westendorf AM, Buer J, Scharfe M, Jarek M, Gössling F, Blöcker H, Zeng AP. Genomic peculiarity of coding sequences and metabolic potential of probiotic Escherichia coli strain Nissle 1917 inferred from raw genome data. J Biotechnol. 2005 May 4;117(2):147-61. doi: 10.1016/j.jbiotec.2005.01.008. PMID: 15823404.
- Peregrín-Alvarez JM, Xiong X, Su C, Parkinson J. The Modular Organization of Protein Interactions in Escherichia coli. PLoS Comput Biol. 2009 Oct;5(10):e1000523. doi: 10.1371/journal.pcbi.1000523. Epub 2009 Oct 2. PMID: 19798435.
- Díaz-Mejía JJ, Babu M, Emili A. Computational and experimental approaches to chart the Escherichia coli cell-envelope-associated proteome and interactome. FEMS Microbiol Rev. 2009 Jan;33(1):66-97. doi: 10.1111/j.1574-6976.2008.00141.x. Epub 2008 Nov 27. PMID: 19054114. Review.
- Rajagopala SV, Sikorski P, Kumar A, Mosca R, Vlasblom J, Arnold R, Franca-Koh J, Pakala SB, Phanse S, Ceol A, Häuser R, Siszler G, Wuchty S, Emili A, Babu M, Aloy P, Pieper R, Uetz P. The binary protein-protein interaction landscape of Escherichia coli. Nat Biotechnol. 2014 Mar;32(3):285-290. doi: 10.1038/nbt.2831. Epub 2014 Feb 23. PMID: 24561554.
- Hemm MR, Weaver J, Storz G. Escherichia coli Small Proteome. EcoSal Plus. 2020 May;9(1):10.1128/ecosalplus.ESP-0031-2019. doi: 10.1128/ecosalplus.ESP-0031-2019. PMID: 32385980. Review.
Cited by
- Kim H, Shim JE, Shin J, Lee I. EcoliNet: a database of cofunctional gene network for Escherichia coli. Database (Oxford). 2015 Feb 2;2015:bav001. doi: 10.1093/database/bav001. Print 2015. PMID: 25650278.
- Liu M, Chen XW, Jothi R. Knowledge-guided inference of domain-domain interactions from incomplete protein-protein interaction networks. Bioinformatics. 2009 Oct 1;25(19):2492-9. doi: 10.1093/bioinformatics/btp480. Epub 2009 Aug 10. PMID: 19667081.
- James K, Muñoz-Muñoz J. Computational Network Inference for Bacterial Interactomics. mSystems. 2022 Apr 26;7(2):e0145621. doi: 10.1128/msystems.01456-21. Epub 2022 Mar 30. PMID: 35353009. Review.
- Carro L. Protein-protein interactions in bacteria: a promising and challenging avenue towards the discovery of new antibiotics. Beilstein J Org Chem. 2018 Nov 21;14:2881-2896. doi: 10.3762/bjoc.14.267. eCollection 2018. PMID: 30546472. Review.
- Fischbach MA, Krogan NJ. The next frontier of systems biology: higher-order and interspecies interactions. Genome Biol. 2010;11(5):208. doi: 10.1186/gb-2010-11-5-208. Epub 2010 May 5. PMID: 20441613.
References
- Butland G, Peregrin-Alvarez JM, Li J, Yang W, Yang X, et al. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature. 2005;433:531–537.