STRING v10: protein-protein interaction networks, integrated over the tree of life

Nucleic Acids Res. 2015 Jan;43(Database issue):D447-52.

doi: 10.1093/nar/gku1003. Epub 2014 Oct 28.

Damian Szklarczyk, Andrea Franceschini, Stefan Wyder, Kristoffer Forslund, Davide Heller, Jaime Huerta-Cepas, Milan Simonovic, Alexander Roth, Alberto Santos, Kalliopi P Tsafou, Michael Kuhn, Peer Bork, Lars J Jensen, Christian von Mering

Damian Szklarczyk et al. Nucleic Acids Res. 2015 Jan.

Abstract

The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein-protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

Figures

Figure 1.

The STRING network view. Combined screenshots from the STRING website, which has been queried with a subset of proteins belonging to two different protein complexes in yeast (the COP9 signalosome, as well as the proteasome). Colored lines between the proteins indicate the various types of interaction evidence. Protein nodes which are enlarged indicate the availability of 3D protein structure information. Inset top right: for each protein, accessory information is available which includes annotations, cross-links and domain structures. Inset bottom right: the same network is shown after the addition of a user-configurable ‘payload’-dataset (26). In this case, the payload corresponds to color-coded protein abundance information, and reveals systematic differences in the expression strength of both complexes.

Figure 2.

Improved Co-expression analysis. STRING v10 features a completely re-designed pipeline for accessing and processing gene expression information. Left: overview of the individual steps; note that redundant expression experiments are now detected and pruned automatically. Right: improved benchmark performance of the resulting co-expression links, relative to the previous version of STRING, in four model organisms (ROC curves). The benchmark is based on the KEGG pathway maps; predicted interactions are considered to be true positives when both interacting proteins are annotated to the same KEGG map.
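
The KEGG-based benchmark described in this caption can be illustrated with a small sketch. It is not the STRING pipeline itself: the protein identifiers, map names and scores below are invented for illustration, and skipping proteins that lack any KEGG annotation is an assumption rather than something the caption states. A predicted pair counts as a true positive when both proteins share at least one map, and the ROC curve is traced by sweeping the score threshold from high to low.

```python
def label_pairs(scored_pairs, kegg_maps):
    """Label each scored pair True when both proteins share a KEGG map.

    Pairs involving a protein without any annotation are skipped
    (an assumption made for this sketch).
    """
    labelled = []
    for (a, b), score in scored_pairs.items():
        maps_a = kegg_maps.get(a)
        maps_b = kegg_maps.get(b)
        if not maps_a or not maps_b:
            continue
        labelled.append((score, bool(maps_a & maps_b)))
    return labelled


def roc_points(labelled):
    """Return (false positive rate, true positive rate) points.

    Pairs are ranked by descending score; tied scores are added
    together so that they yield a single point on the curve.
    """
    pos = sum(1 for _, is_pos in labelled if is_pos)
    neg = len(labelled) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    ranked = sorted(labelled, key=lambda item: item[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_pos) in enumerate(ranked):
        if is_pos:
            tp += 1
        else:
            fp += 1
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie:
            points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical annotations: protein -> set of pathway maps.
kegg_maps = {
    "P1": {"mapA"},
    "P2": {"mapA"},
    "P3": {"mapB"},
    "P4": {"mapA", "mapB"},
}
# Hypothetical co-expression scores for predicted pairs.
scored_pairs = {
    ("P1", "P2"): 0.9,
    ("P1", "P3"): 0.8,
    ("P3", "P4"): 0.6,
    ("P2", "P3"): 0.2,
}

labelled = label_pairs(scored_pairs, kegg_maps)
points = roc_points(labelled)
area = area_under_curve(points)
print(points)
print(area)
```

Comparing two pipeline versions in this way would amount to computing such a curve for each set of scores against the same annotation; a curve that lies closer to the top-left corner indicates better agreement with the KEGG maps.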

Figure 3.

Access to STRING from R/Bioconductor. Left: example session describing how to initialize a human protein network from the STRING database backend, and how to map a set of gene names against it. A subset of the proteins is then plotted as a STRING network (right), complete with auxiliary numerical payload-information highlighting some nodes of interest (red color halos).

References

1. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29.
2. Lee D., Redfern O., Orengo C. Predicting protein function from sequence and structure. Nat. Rev. Mol. Cell Biol. 2007;8:995–1005.
3. Ouzounis C.A., Coulson R.M., Enright A.J., Kunin V., Pereira-Leal J.B. Classification schemes for protein structure and function. Nat. Rev. Genet. 2003;4:508–519.
4. Bairoch A., Boeckmann B. The SWISS-PROT protein sequence data bank: current status. Nucleic Acids Res. 1994;22:3578–3580.
5. Kanehisa M., Goto S., Sato Y., Kawashima M., Furumichi M., Tanabe M. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 2014;42:D199–D205.
