Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins

doi: 10.1371/journal.pbio.1000096.

Pingzhao Hu, Sarath Chandra Janga, Mohan Babu, J Javier Díaz-Mejía, Gareth Butland, Wenhong Yang, Oxana Pogoutse, Xinghua Guo, Sadhna Phanse, Peter Wong, Shamanta Chandran, Constantine Christopoulos, Anaies Nazarians-Armavil, Negin Karimi Nasseri, Gabriel Musso, Mehrab Ali, Nazila Nazemof, Veronika Eroukova, Ashkan Golshani, Alberto Paccanaro, Jack F Greenblatt, Gabriel Moreno-Hagelsieb, Andrew Emili

Pingzhao Hu et al. PLoS Biol. 2009.

Abstract

One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a "systems-wide" functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1

Figure 1. Annotated and Functional Orphan Genes of the E. coli K-12 Reference Strain

(A) Frequency distribution of supporting publications per E. coli protein-coding gene. (B) Summary of existing annotations for E. coli, showing proteins of unknown function (orphans) lacking proper names and functional annotations in MultiFun [25] or EcoCyc [11]. (C–F) Although the functional orphans are encoded by transcripts with half-lives comparable to those of annotated genes (C), they tend to be expressed at lower levels, based on (D) microarray analysis of mRNA and (E) codon adaptation index (CAI) scores, and (F) have lower molecular weights on average. The _x_-axis in (D) represents the log-scale mRNA expression level of each gene averaged across all arrays, using data normalized with the Robust Multi-Array Average method and obtained from the M3D database [103]. NS, not statistically significant; p, _p_-value; t, _t_-test. (G) Orthologs of orphans are also less prevalent in sequenced genomes than those of annotated genes. (H) However, examination of environmental metagenomic libraries indicates that the orphans are not necessarily exclusive to the Escherichia lineage. Anammox, anaerobic ammonium-oxidizing bacteria; AMO, methane-oxidizing Archaea.
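
Panel (E) compares codon adaptation index (CAI) scores. For readers unfamiliar with the score, a minimal Python sketch of its usual definition (the geometric mean of each codon's relative adaptiveness w) follows. The weight table is a hypothetical toy, not the reference set used for this figure, and the helper names are illustrative.

```python
import math

def split_codons(seq):
    """Split a coding sequence into consecutive codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def codon_adaptation_index(codons, weights):
    """Geometric mean of the relative adaptiveness w of each scorable codon.

    `weights` maps codon -> w, where w is the codon's frequency divided by that
    of the most-used synonymous codon in a reference set of highly expressed
    genes. Codons absent from `weights` (e.g. Met, Trp, stop) are skipped.
    """
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))

# Hypothetical toy weights, NOT a real E. coli reference table.
toy_weights = {"CTG": 1.0, "CTA": 0.25, "GCG": 1.0, "GCA": 0.6}
cai = codon_adaptation_index(split_codons("CTGCTAGCG"), toy_weights)
print(round(cai, 3))
```

Because CAI is a geometric mean, a single rarely used codon (here CTA, w = 0.25) pulls the score down more than it would an arithmetic mean.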

Figure 2

Figure 2. Generation and Integration of Physical and Functional Networks and Orphan Function Prediction

(A) Construction of a PI network based on protein copurification and detection by mass spectrometry. The confidence scoring of the LCMS and MALDI networks was conducted using logistic regression with datasets consisting of PI from low-throughput studies curated in DIP, BIND, and IntAct (gold positives) and pairs of proteins with different subcellular localizations (gold negatives). The two networks were integrated using a probabilistic model [61] (Protocol S6). The resulting PI network, with edge weights corresponding to likelihood ratios, was clustered using MCL to delimit “multiprotein complexes.” (B) Four GC methods were integrated into a single functional interaction network using the same probabilistic model [61], and the resulting scores (edge weights) were input to MCL to delimit “functional modules.” (C) Orphan function prediction was conducted using a “guilt-by-association” procedure. After integration of PI and GC interactions into a single probabilistic network [61], a machine learning algorithm (StepPLR) newly developed for this study was used to assign functions based on the binary associations of orphans with annotated proteins, the respective interaction edge weights, and the overall network topology. Correlations between the vectors of function predictions (orphans) and annotations were then clustered using MCL to delimit “functional neighborhoods.”
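
The legend does not spell out the probabilistic model of [61]. As a rough illustration only, assuming a naive-Bayes-style scheme in which each evidence source contributes a likelihood ratio (LR) estimated from gold positives and gold negatives, and sources are treated as conditionally independent, the integration step might look like the sketch below. The bins, source names, and helper functions are hypothetical, and the logistic-regression scoring step is not reproduced.

```python
import math
from collections import Counter

def bin_likelihood_ratios(pos_bins, neg_bins, pseudocount=1.0):
    """Estimate LR(bin) = P(bin | gold positive) / P(bin | gold negative)
    for one evidence source, with additive smoothing for empty bins."""
    bins = set(pos_bins) | set(neg_bins)
    pos, neg = Counter(pos_bins), Counter(neg_bins)
    n_pos = len(pos_bins) + pseudocount * len(bins)
    n_neg = len(neg_bins) + pseudocount * len(bins)
    return {b: ((pos[b] + pseudocount) / n_pos) / ((neg[b] + pseudocount) / n_neg)
            for b in bins}

def integrated_lr(evidence, lr_tables):
    """Combine the per-source LRs observed for one protein pair.

    Assumes conditional independence of sources, so the combined LR is the
    product of individual LRs (computed as a sum of logs). Sources with no
    observation for the pair are simply left out of `evidence`.
    """
    return math.exp(sum(math.log(lr_tables[src][b]) for src, b in evidence.items()))

# Hypothetical binned scores for gold-standard pairs in two networks.
tables = {
    "LCMS": bin_likelihood_ratios(["high", "high", "low"], ["low", "low", "low", "high"]),
    "MALDI": bin_likelihood_ratios(["yes", "yes"], ["no", "no", "yes"]),
}
score = integrated_lr({"LCMS": "high", "MALDI": "yes"}, tables)
print(round(score, 3))
```

Under these toy counts, a "high" LCMS bin has LR 1.8 and a MALDI detection has LR 1.875, so the pair's combined LR is 3.375; edges above some cutoff could then be passed to a clustering step such as MCL.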

Figure 3

Figure 3. High-Confidence Physical Interactions and Putative Multiprotein Complexes

(A) Benchmarking of the experimentally derived PI network in E. coli against positive and negative gold standards by receiver operating characteristic (ROC)-curve analysis; cumulative area-under-the-curve (AUC) is shown as an overall performance measure. (B) Overlap of PI identified in this study with previous proteomic reports [1,4] and low-throughput PI obtained from DIP, BIND, and IntAct. (C and D) Like low-throughput curated PI, and in contrast to control protein pairs derived from different subcellular compartments, putatively interacting proteins have highly correlated gene expression patterns (C) and similar phylogenetic profiles (D) based on mutual information. (E) Graphical schematic of putative stable, soluble multiprotein complexes, drawn using the GenePRO Cytoscape plugin [104] (see Table S7 for listing). Each node represents a complex, with node size reflecting the number of contained proteins; edge widths reflect the number of interactions between subunits of different complexes. (F) Multiprotein complexes implicated in the bacterial translation apparatus; orphan and annotated genes mentioned in the main text are highlighted in bold. (G) Reduced rate of total protein synthesis in a strain lacking ybcJ relative to wild-type cells (WT). (H) Perturbed ribosome profiles in a yfgB deletion strain. (I) Elevated rates of frame-shifting and stop-codon readthrough in yfgB and ybcJ deletion strains relative to wild-type (WT). β-gal activity is produced only when the corresponding translational error occurs; error bars indicate standard deviation.
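
The AUC in panel (A) summarizes how well edge weights separate gold positives from gold negatives. A minimal sketch follows, using the equivalent probabilistic reading of the AUC: the chance that a random gold-positive pair outscores a random gold-negative pair, with ties counted as one half. The scores are made up for illustration.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen gold-positive pair scores higher than a randomly chosen
    gold-negative pair (ties count one half)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical edge weights for gold-positive and gold-negative pairs.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
print(round(auc, 3))
```

The double loop is quadratic in the number of pairs; for genome-scale benchmarks, a rank-based (Mann-Whitney U) computation gives the same value in O(n log n).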

Figure 4

Figure 4. High-Confidence Genomic Context Associations and Putative Functional Modules

(A) Benchmarking of unified GC interactions in E. coli against positive and negative gold standards by ROC-curve analysis. (B) Overlap of high-confidence functional interactions predicted in this study with two other public GC databases. (C) Even after eliminating adjacent gene pairs to control for known and predicted E. coli operons, functionally linked genes have highly correlated mRNA expression patterns, comparable to those of components of the same curated EcoCyc pathway rather than of different pathways. (D) Functionally linked genes are enriched for annotations to the same COG functional categories. (E) Graphical representation of putative E. coli functional modules (see Table S9 for listing); node size and colors are proportional to the number and fraction of orphan and annotated subunits, respectively, while lines represent interactions connecting modules. (F) Putative fimbriae-related module. (G) Defective motility of mutant strains deleted for orphans linked to fimbriae (from [F]); single dashes (-) indicate moderately impaired motility, while double dashes (- -) represent strong repression. Mutants with a normal phenotype comparable to the wild-type strain BW25113 (WT) are not shown. (H) Defective biofilm formation by mutants deleted for fimbriae-related orphans (from [F]); significant differences (_t_-test) in cell adhesion (absorbance) between mutant and WT strains are denoted by asterisks (single asterisks [*], p < 0.01; double asterisks [**], p < 0.0001). Error bars indicate standard deviation of the mean. CFA, colonization factor antigen medium; LB, Luria Bertani medium. (I) Metabolic modules mentioned in the main text. (J) Mutants auxotrophic for aromatic amino acids show defective growth on minimal media containing shikimic acid. A prototrophic _aroL_Δ mutant strain (P) is shown for comparison.
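
Panel (D) compares how often functionally linked genes share a COG category against a background. One simple way to frame that comparison is sketched below. It assumes that "shared" means overlapping COG category sets and that random gene pairs serve as the background. The gene names and COG letters are hypothetical, and a formal significance test (e.g., hypergeometric) is not shown.

```python
import random

def shared_category_fraction(pairs, cog):
    """Fraction of gene pairs whose COG category sets overlap.

    Pairs in which either gene lacks a COG assignment are skipped.
    """
    scored = [(a, b) for a, b in pairs if cog.get(a) and cog.get(b)]
    if not scored:
        return float("nan")
    return sum(bool(cog[a] & cog[b]) for a, b in scored) / len(scored)

def random_pairs(genes, n, seed=0):
    """Draw n random pairs of distinct genes as a background set."""
    rng = random.Random(seed)
    return [tuple(rng.sample(genes, 2)) for _ in range(n)]

# Hypothetical COG assignments and predicted functional links.
cog = {"g1": {"J"}, "g2": {"J"}, "g3": {"E"}, "g4": {"E", "H"}, "g5": set(), "g6": {"M"}}
linked = [("g1", "g2"), ("g3", "g4"), ("g1", "g5")]
observed = shared_category_fraction(linked, cog)
background = shared_category_fraction(random_pairs(sorted(cog), 1000), cog)
print(observed, round(background, 3))
```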

Figure 5

Figure 5. The Functional Neighborhoods of E. coli

(A) A “clustergram” displaying existing annotations (orange) and predicted orphan functions (this study; blue) for all the protein-coding genes of E. coli (_y_-axis) and their associated biological processes (_x_-axis) (complete terms are provided in Figure S5 and Table S16). Proteins were clustered using MCL based on the pairwise similarity of their functional annotations and predictions in this matrix to delimit “functional neighborhoods” (see Table S17 for listing). (B) Putative functional neighborhood showing high-confidence integrated functional interactions (combined PI and GC networks) of select orphans with the protein synthesis machinery. For clarity, individual names of ribosomal proteins and tRNA synthetases are not shown. (C) Heat map showing the differential sensitivity of orphan deletion strains to antibiotics targeting protein synthesis relative to the colony size in the absence of drug (see Protocol S11 for details). Mutants deleted for annotated proteins from this neighborhood are shown as positive controls, whereas deletion mutants lacking genes not contained within this neighborhood are shown as negative controls. (D) Neighborhood with three orphans putatively involved in flagellum assembly and motility. (E) Deletions of the corresponding components reduce swarming capability; single dash (-), moderately impaired motility; double dash (- -), strong repression. (F) Subnetwork of orphans associated with DNA enzymes. (G) Deletion of the orphan yhcG results in synthetic lethality when combined with hypomorphic alleles (indicated by an asterisk [*]) of three essential DNA replication factors (parE, dnaN, and dnaB).
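
Figures 2 through 5 rely on MCL to turn weighted networks into complexes, modules, and neighborhoods. The sketch below is a compact NumPy version of the standard Markov Cluster loop: add self-loops, normalize columns, then alternate expansion and inflation until convergence. The inflation value, the convergence thresholds, and the rule of reading clusters off as connected components of the converged matrix are illustrative choices, not the parameters used in the study.

```python
import numpy as np

def mcl(weights, inflation=2.0, expansion=2, max_iter=100, tol=1e-6, threshold=1e-9):
    """Markov Cluster (MCL) algorithm on a symmetric, non-negative weight matrix.

    Returns clusters as sorted lists of node indices.
    """
    m = np.asarray(weights, dtype=float) + np.eye(len(weights))  # add self-loops
    m /= m.sum(axis=0, keepdims=True)                              # column-stochastic
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow along walks
        m = m ** inflation                         # inflation: strengthen strong flows
        m /= m.sum(axis=0, keepdims=True)          # renormalize columns
        if np.abs(m - prev).max() < tol:
            break
    # Nodes that still exchange flow end up in the same cluster; read clusters
    # off as connected components of the symmetrized converged support.
    support = (m > threshold) | (m.T > threshold)
    seen, clusters = set(), []
    for start in range(len(m)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in np.nonzero(support[i])[0].tolist():
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters

# Hypothetical similarity matrix: two tight triangles joined by one weak edge.
w = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    w[a, b] = w[b, a] = 1.0
w[2, 3] = w[3, 2] = 0.1
print(mcl(w))
```

Raising `inflation` generally yields smaller, tighter clusters. In this toy matrix the weak 0.1 bridge does not merge the two triangles.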

Figure 6

Figure 6. Functional Neighborhoods Involved in Cell Envelope Biogenesis

(A) Functional neighborhood showing high-confidence integrated functional interactions (combined PI and GC networks) between components of the complete peptidoglycan biogenesis pathway and putatively functionally related orphans. (B) Serial-dilution assay showing perturbed growth of E. coli strains deleted for putative cell wall–related components (from [A]) in the presence of antibiotic inhibitors of cell wall assembly; an asterisk (*) indicates a hypomorphic allele of the essential gene murA. (C) Functional neighborhood of metabolic pathways involved in biogenesis of the bacterial cell envelope. Vertical groupings correspond to known pathways, whereas orphan components are tentatively positioned according to their interaction patterns; brown lines indicate aggravating genetic interactions recorded between the orphans yfbJ and yadE and annotated pathway components. (D) Conjugation-based double-mutant growth assays (only test results for yfbJ are shown). (E) Heat map showing the differential sensitivity of strains deleted for the orphans shown in (C) to antibiotics targeting the cell envelope. Mutants deleted for relevant annotated proteins are shown as positive controls, while mutants lacking genes not contained within this neighborhood are shown as negative controls.
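
Panel (E), like Figure 5C, reports differential drug sensitivity relative to colony size without drug; the scoring in Protocol S11 is not reproduced in these legends. A minimal sketch follows. It assumes a log2 ratio of colony sizes with and without drug, re-centered on the wild-type strain. The strain names and colony sizes are hypothetical placeholders.

```python
import math

def sensitivity_scores(with_drug, without_drug, reference="WT"):
    """Per-strain log2 ratio of colony size with vs. without drug, re-centered
    on a reference strain so that negative values indicate hypersensitivity."""
    raw = {s: math.log2(with_drug[s] / without_drug[s]) for s in with_drug}
    return {s: v - raw[reference] for s, v in raw.items()}

# Hypothetical colony sizes (arbitrary units); strain names are placeholders.
with_drug = {"WT": 80, "orphanA": 20, "orphanB": 80}
without_drug = {"WT": 100, "orphanA": 100, "orphanB": 80}
scores = sensitivity_scores(with_drug, without_drug)
for strain, value in sorted(scores.items()):
    print(f"{strain}\t{value:+.2f}")
```

Normalizing each strain to its own no-drug colony size separates drug-specific effects from a general growth defect of the deletion.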

Figure 7

Figure 7. Evolutionary Conservation of Orphan Protein Function

(A) Evolutionary conservation of orphan and annotated E. coli protein interaction partners in the integrated PI-GC network, based on the co-occurrence of putative orthologs across fully sequenced prokaryotic genomes. (B) Phylogenetic distribution of the components of 97 functional neighborhoods with at least one orphan; the proportion of genomes showing conservation is indicated. (C) Atypical neighborhood illustrating broader conservation of orphans than of annotated components; node shading reflects average phylogenetic conservation. (D) Representative neighborhood involved in drug efflux, exhibiting similar phylogenetic distributions among its orphan and annotated components; analogous neighborhoods shown in preceding figures are indicated below [B]. (E) Serial-dilution assay showing the drug hypersensitivity of deletion mutants of the orphans listed in [D]. (F) DNA-replication neighborhood in which annotated components tend to be more widely distributed than the orphans. (G) Serial-dilution assay showing the perturbed growth of strains deleted for the two orphan components shown in [F] in the presence of an antibiotic inhibitor of DNA replication; an asterisk (*) indicates hypomorphic alleles. Mutants deleted for relevant annotated proteins are shown as positive controls, while mutants lacking genes not contained within this neighborhood are shown as negative controls. LB, Luria Bertani medium.
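
Panels (A) and (B), like Figure 3D, rest on phylogenetic profiles: presence/absence vectors of putative orthologs across genomes. A minimal sketch of two quantities these legends mention follows: the mutual information between two profiles, and the proportion of genomes in which an ortholog is present. The profiles are toy examples, and how orthologs were called in the study is not modeled.

```python
import math
from collections import Counter

def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two presence/absence profiles
    scored over the same ordered list of genomes."""
    if not profile_a or len(profile_a) != len(profile_b):
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profile_a)
    joint = Counter(zip(profile_a, profile_b))
    count_a, count_b = Counter(profile_a), Counter(profile_b)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi

def conservation(profile):
    """Proportion of genomes in which a putative ortholog is present."""
    return sum(profile) / len(profile)

# Toy profiles over four hypothetical genomes (1 = ortholog present).
orphan = [1, 1, 0, 0]
partner = [1, 1, 0, 0]
unrelated = [1, 0, 1, 0]
print(mutual_information(orphan, partner), mutual_information(orphan, unrelated))
print(conservation(orphan))
```

Identical profiles give the maximum of 1 bit for balanced binary vectors, while independent ones give 0; conservation alone cannot distinguish them, which is why co-occurrence rather than raw prevalence is used to link partners.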

References

    1. Arifuzzaman M, Maeda M, Itoh A, Nishikata K, Takita C, et al. Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res. 2006;16:686–691.
    2. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006;2:2006.0008.
    3. Barrett CL, Herring CD, Reed JL, Palsson BO. The global transcriptional regulatory network for metabolism in Escherichia coli exhibits few dominant functional states. Proc Natl Acad Sci U S A. 2005;102:19103–19108.
    4. Butland G, Peregrin-Alvarez JM, Li J, Yang W, Yang X, et al. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature. 2005;433:531–537.
    5. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007;5:e8. doi: 10.1371/journal.pbio.0050008.
