The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored


1 Faculty of Health Sciences, Novo Nordisk Foundation Centre for Protein Research, University of Copenhagen, Denmark, 2 Faculty of Science, Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, 3 Biotechnology Center, Technical University Dresden, 4 Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany, 5 Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, 6 Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France and 7 Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany


*To whom correspondence should be addressed. Tel: +49 6221 387 8526; Fax: +49 6221 387 8517; Email: bork@embl.de


The authors wish it to be known that, in their opinion, the first three authors should also be regarded as joint First Authors.

Received: 07 September 2010

Accepted: 03 October 2010

Published: 02 November 2010


Damian Szklarczyk, Andrea Franceschini, Michael Kuhn, Milan Simonovic, Alexander Roth, Pablo Minguez, Tobias Doerks, Manuel Stark, Jean Muller, Peer Bork, Lars J. Jensen, Christian von Mering, The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored, Nucleic Acids Research, Volume 39, Issue suppl_1, 1 January 2011, Pages D561–D568, https://doi.org/10.1093/nar/gkq973

Abstract

An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein–protein interaction information can be error-prone and can require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage of, and ease of access to, both experimental and predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org.

INTRODUCTION

Proteins can form a variety of functional connections with each other, including stable complexes, metabolic pathways and a bewildering array of direct and indirect regulatory interactions. These connections can be conceptualized as networks, and the size and complex organization of these networks present a unique opportunity to view a given genome as something more than just a static collection of distinct genetic functions. Indeed, the ‘network view’ of a genome is increasingly being taken in many areas of applied biology: protein networks are used to increase the statistical power in human genetics ( 1 , 2 ), to aid in drug discovery ( 3 , 4 ), to close gaps in metabolic enzyme knowledge ( 5 , 6 ) and to predict phenotypes and gene functions ( 7 , 8 ), to name just a few examples.

While clearly very useful, the annotation and storage of protein–protein associations in databases is less straightforward than for other types of data (such as genomic sequence data or taxonomy information). This is because functional interactions between proteins can span a wide spectrum of mechanisms and specificities, often have high error rates and may depend on biological context (such as environmental condition or tissue type). Consequently, considerable information is needed to describe the various aspects of a given protein–protein association and a number of standards have been developed for this purpose with distinct levels of expressivity and specialization ( 9–13 ). Likewise, the actual annotations and interaction records themselves are scattered over a number of public resources. Experimental data on physical protein–protein interactions are mostly stored in a group of dedicated databases that together form the International Molecular Exchange (IMEx) consortium ( 14–21 ). Annotated pathway knowledge is mostly kept in a separate set of resources ( 22–24 ) and yet other interactions can be found in various organism-specific databases ( 25 , 26 ) or text-mining resources ( 27 , 28 ). Furthermore, a number of algorithms have been devised that allow de novo prediction of functional links between proteins ( 29–32 ), albeit usually with considerable rates of false positives and without providing hints on the specificity and type of a predicted interaction.

Given all these distinct types and sources of protein–protein association information, it is highly desirable for users to have an integration and re-appraisal that can be easily searched and browsed at one single site. The Search Tool for the Retrieval of Interacting Genes (STRING) database resource aims to provide this service, by acting as a ‘one-stop shop’ for all information on functional links between proteins. It is by no means the only such site: related resources that are currently being actively maintained include VisANT ( 33 ), GeneMANIA ( 34 ), N-Browse ( 35 ), I2D ( 36 ), APID ( 37 ), bioPIXIE ( 38 ) and ConsensusPathDB ( 39 ). Each of these sites has unique features and distinct strengths, and users should carefully compare them for any specific task at hand. The main strengths of STRING lie in its unique comprehensiveness, its confidence scoring and its interactive and intuitive user interface. STRING is the only site to cover hundreds of organisms (soon more than 1100), ranging from Bacteria and Archaea to humans. This large number of organisms, represented by their fully sequenced genomes, also enables STRING to periodically execute interaction prediction algorithms that depend on exhaustive genome sequence information. The resource also transfers interaction information between organisms where applicable, thereby significantly increasing coverage, particularly for poorly studied organisms. The confidence scoring is another key feature of STRING, giving guidance to users who want to balance different levels of coverage and accuracy. Lastly, the unique and compact user interface enables fast and ad hoc use of the resource, with a short learning curve and no need for setup or installation.

Here, we briefly describe the content and procedures currently used in STRING and describe new features that have been added since our last update on the resource ( 40 ).

User experience and content

Users enter STRING via its web portal ( http://string-db.org ) and identify one or more proteins of interest. Various types of identifiers are recognized by the system and a full-text search on gene annotations is conducted in parallel to aid in the identification. Using the search results, STRING will either recognize the organism of interest automatically or ask the user to disambiguate it. The user is then presented with the input protein(s) in the context of a graphical network of interaction partners ( Figure 1 ). From this network, pop-up windows lead to detailed information on each node (or edge) in the network, providing accessory information on a protein or on the evidence behind a proposed connection. The network display can be modified by adding or removing proteins, changing the required confidence level and by selecting or de-selecting certain evidence types (for example, users might choose to filter out the results of computational predictions).
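The display filtering described above (a required confidence level combined with a selection of evidence types) can be illustrated with a minimal sketch. The `Edge` layout, the channel names and the rule of keeping an edge when its best selected channel reaches the cutoff are all assumptions made for illustration; they are not taken from the STRING implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """One association between two proteins (hypothetical layout)."""
    a: str
    b: str
    # Maps an evidence channel name to a confidence in [0, 1].
    scores: dict = field(default_factory=dict)


def filter_edges(edges, min_confidence, channels):
    """Keep edges whose best score among the selected channels meets the cutoff.

    Simplification: only the single best selected channel is considered;
    no combined score is recomputed here.
    """
    kept = []
    for edge in edges:
        best = max((edge.scores.get(c, 0.0) for c in channels), default=0.0)
        if best >= min_confidence:
            kept.append(edge)
    return kept


# Illustrative data; protein and channel names are made up.
edges = [
    Edge("P1", "P2", {"experiments": 0.9, "textmining": 0.3}),
    Edge("P1", "P3", {"neighborhood": 0.7}),
    Edge("P2", "P3", {"textmining": 0.5}),
]

# Deselect the computational prediction channel and require confidence >= 0.4.
no_predictions = filter_edges(edges, 0.4, {"experiments", "textmining"})
print([(e.a, e.b) for e in no_predictions])
```

In this toy network, the edge supported only by a prediction channel drops out once that channel is deselected, which mirrors the use case mentioned in the text.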


Figure 1.

Protein network visualization on the STRING website. The figure shows a composite of two screenshots, illustrating a typical user interaction with STRING (focused on a specific protein network in Saccharomyces cerevisiae ). Upon querying the database with four yeast proteins, the resource first reports a raw network consisting of the highest scoring interaction partners (upper left corner). This network can then be rearranged and clustered directly in the browser window revealing tightly connected functional modules (arrow). For each interaction (or protein), additional information is accessible via dedicated pop-up windows; the bottom part of the figure shows an exemplary pop-up with the information regarding a specific yeast protein.

The interactive network viewer in STRING has been re-designed extensively. It is now based on Adobe’s Flash Player (version 10 or later is recommended) and allows users to freely reposition nodes in the network. Optionally, this can be done while running a spring-embedded layout algorithm in real time. Upon switching to the ‘advanced’ mode of the viewer, users can also apply clustering algorithms to the network ( 41–43 ), which is then visually partitioned accordingly, in real time. All of this can be done in the context of a user-supplied background illustration; publication-ready, high-resolution image files can then be exported. Search results can also be saved in a number of abstract file formats for later use elsewhere, including the Proteomics Standards Initiative Molecular Interaction (PSI-MI) format ( 9 ). The protein information pop-up window ( Figure 1 , bottom) has also been re-designed using the Flash framework and now shows all available 3D structure information for a protein in the context of its domain architecture, which can be browsed interactively along the protein from N- to C-terminus. Apart from PDB entries, the structure information now also includes pre-computed homology models, made available via a collaboration with the SwissModel repository ( 44 ).

The current extent of protein–protein association information in STRING is summarized in Figure 2 . The majority of associations actually derive from predictions, either from prediction algorithms that are based on analyzing genomic information (‘genomic context’-methods) or from transferring associations/interactions between organisms (‘interolog’-transfer). Importantly, all associations in STRING are provided with a probabilistic confidence score, which is derived by separately benchmarking groups of associations against the manually curated functional classification scheme of the KEGG database ( 22 ). Each score represents a rough estimate of how likely it is that a given association describes a functional linkage between two proteins that is at least as specific as that between an average pair of proteins annotated on the same ‘map’ or ‘pathway’ in KEGG. The various major sources of interaction/association data in STRING are benchmarked independently; a combined score is then computed, which indicates higher confidence when more than one type of information supports a given association. All scores and association data in STRING are pre-computed and are also available for wholesale download (free for non-profit institutions). Fully sequenced genomes in STRING are imported from RefSeq ( 45 ) and Ensembl ( 46 ), as well as from a number of dedicated sites, and are hand-screened for completeness and non-redundancy. For this large space of complete genomes, STRING also stores the results of exhaustive cross-genome homology searches, in order to be able to transfer interactions among organisms. As of version 9.0, this extensive body of protein–protein similarity data is imported from and cross-linked with the Similarity Matrix of Proteins (SIMAP) project ( 47 ).
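To illustrate why a combined score rises when several evidence types agree, the sketch below integrates per-channel probabilities under the assumption that the channels are independent. This is a common textbook approach and is offered only as an illustration; the text above does not spell out the exact formula used by STRING, and the function name `combine_scores` is hypothetical.

```python
from math import prod


def combine_scores(channel_scores):
    """Combine per-channel confidences as 1 - prod(1 - s_i).

    Assumes independent channels (an assumption of this sketch): the
    association is considered wrong only if every channel is wrong.
    """
    for s in channel_scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
    return 1.0 - prod(1.0 - s for s in channel_scores)


# Two moderately confident channels together exceed either one alone.
single = combine_scores([0.6])
combined = combine_scores([0.6, 0.5])
print(single, combined)
```

Under this scheme, adding a supporting channel can never lower the combined score, which matches the qualitative behaviour described in the text.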


Figure 2.

Association counts and data sources. The table shows the number of pair-wise protein–protein associations processed for STRING (version 8.3), listed separately for three important model organisms as well as for the database as a whole. The associations are counted non-directionally, i.e. protein pairs A–B and B–A are counted only once. Identical associations reported by different sources are counted separately under each source, unless they can be traced to the very same publication record and have been imported from primary interaction databases (in case several such databases agree on an interaction, it is arbitrarily counted for only one of them).

It should be stressed that interactions in STRING are not limited to direct, physical interactions between two proteins. Instead, proteins may also be linked because, for example, they exhibit a genetic interaction or are known to catalyze subsequent steps in a metabolic pathway. Most associations, especially when derived from one of the prediction algorithms, currently can neither be specified with much precision in terms of their mode of interaction, nor in terms of the cellular conditions under which they occur (e.g. development time points, environmental conditions, specific cell types, etc.). Because of this, the fundamental unit stored in STRING is the ‘functional association’, i.e. the specific and biologically meaningful functional connection between two proteins. Within this definition, STRING aims to uncover the entire space of ‘possible’ interactions for any fully sequenced organism; it is likely that only a subset of these interactions will be realized in any given cell. The number of interactions stored in STRING has grown considerably over the years and is projected to grow further as more information becomes available. Previous versions of the resource are kept accessible online, such that studies that refer to a given version of STRING can later be reproduced.

Integration with other resources

One central aim of the STRING project is to achieve and maintain cross-connectivity and integration with other public resources in a user-friendly manner. Apart from making the entire SQL database back-end available for download (free for non-profit institutions), this is mainly achieved via the following routes:

First, the database maintains mutual HTML cross-references with a number of widely used websites, including UniProt ( 48 ), SMART ( 49 ), GeneCards ( 50 ) and SwissModelRepository ( 44 ). Notably, such cross-references do not have to be limited to simple text-based HTML links. Instead, partner websites can embed minimized icon-previews of STRING networks within their own web pages, using the capabilities of STRING’s API (as described in the last update) ( 40 ). The SMART and SwissModelRepository sites already use this option, requesting the network preview images at run time, when needed, based on pre-determined name-space mappings. Such embedded previews do not have to be limited to static images; external sites can also provide pop-up windows for any protein of interest, with the content of the pop-up then provided by STRING [variants of this mechanism are currently used by the resources Reflect ( 51 ) and ViralZone ( http://expasy.org/viralzone )]. As another new feature of the user interface, permanent URLs can now be retrieved for almost all pages served by STRING; this facilitates cross-linking and archiving, as well as indexing by search engines and meta-sites.

Second, partner websites can choose to embed the entire STRING website into their own pages ( 52 , 53 ), for example, using HTML inline frames (iframes). A notable example of this is the BioGPS Community Gene Portal System ( 53 ); this site provides ‘plugins’ through which users can connect any number of external websites into freely configurable screen layouts. A STRING plugin has been established at BioGPS; it is currently among the most frequently used plugins there.

Third, users can choose to work with STRING networks from inside the Cytoscape software. Cytoscape is a widely used open-source software framework for network visualization and manipulation ( 54 , 55 ); it can be very flexibly extended, with a rapidly growing number of network-centered manipulation and analysis tools. There are several options for loading STRING data into Cytoscape: users can save a given network from the STRING site to a local file, which can then be opened by Cytoscape (preferably using the PSI-MI format). Users can also query STRING directly from within Cytoscape; this is made possible via a dedicated plugin ‘StringWSClient’ that exposes much of the STRING query interface, including organism disambiguation. Lastly, perhaps the most important way to query STRING from within Cytoscape is via the ‘PSICQUIC’ query interface (‘PSICQUIC Web Service Universal Client’ in Cytoscape). PSICQUIC is a newly developed standard that allows interaction queries across a growing number of compliant database resources ( 56 ); STRING has implemented this standard as of version 8.3 and can thus now be queried directly alongside a number of other resources ( Figure 3 ).
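PSICQUIC services typically return results in the tab-delimited PSI-MI TAB (MITAB) 2.5 format, in which each line carries 15 columns. As a hedged illustration of how such output might be consumed outside Cytoscape, the sketch below parses a single MITAB 2.5 line. The sample line, its identifiers and the `Interaction` type are invented for this example and do not represent real STRING output.

```python
from typing import NamedTuple

# MITAB 2.5 defines 15 tab-separated columns per interaction.
MITAB25_COLUMNS = 15


class Interaction(NamedTuple):
    """Subset of MITAB 2.5 columns kept by this sketch (hypothetical type)."""
    id_a: str        # column 1: identifier of interactor A
    id_b: str        # column 2: identifier of interactor B
    source_db: str   # column 13: source database
    confidence: str  # column 15: confidence value(s), kept as raw text


def parse_mitab_line(line):
    """Parse one MITAB 2.5 line into an Interaction."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < MITAB25_COLUMNS:
        raise ValueError(f"expected {MITAB25_COLUMNS} columns, got {len(fields)}")
    return Interaction(fields[0], fields[1], fields[12], fields[14])


# Made-up example line; '-' marks an empty MITAB field.
sample = "\t".join(
    ["exampledb:protA", "exampledb:protB"]
    + ["-"] * 10
    + ["exampledb", "-", "score:0.85"]
)
record = parse_mitab_line(sample)
print(record)
```

Keeping the confidence column as raw text is a deliberate choice here, since different PSICQUIC providers may encode confidence values differently.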


Figure 3.

Accessing STRING data from within Cytoscape. Two proteins from Escherichia coli were used as queries for the ‘PSICQUIC Web Service Universal Client’ import-plugin of Cytoscape. Multiple databases have reported hits for these queries (upper left panel); in this case STRING has reported the largest number of hits. The resulting four networks are largely non-overlapping, both in terms of name-spaces as well as in terms of the actual interactors reported. The imported STRING network (right) is shown in detail; it can be used as the basis of further refinement, post-processing and analysis in Cytoscape.

Lastly, a new call-back interface allows STRING to be ‘branded’ by third-party resources, who may wish to project their own information onto the STRING name space and thereby onto the STRING network data ( Figure 4 ). This allows such resources to take advantage of the extensive user-interface features of STRING, as well as tapping into the existing user base, with very little additional coding effort of their own. This mechanism requires no specific setup on the STRING side—instead, our resource is simply instructed to query the third-party site at runtime, for any additional information that is to be displayed alongside the STRING network. Data updates at the STRING site are usually accommodated automatically, since the name space itself is changed only at the major release updates.


Figure 4.

Projecting third-party data onto the STRING web-surface. STRING provides a consistent name space that encompasses genes, genomes, protein and interaction networks, all of which can be easily searched and browsed. These features can now be employed by external web-resources, via a simple call-back mechanism. External resources can provide cross-links to STRING, together with a call-back address capable of serving a simple text-based interface protocol. At run-time, STRING will then automatically call the external site and project arbitrary ‘payload’ information onto the protein network that is being browsed. The figure shows a fictitious example scenario, served from an in-house test server. As of version 9.0, STRING will also be able to accept protein–protein connections as payload, showing them in a dedicated ‘evidence channel’ distinct from the seven built-in channels. Implementation details are available in the online documentation.

Published use cases

STRING has been used in projects of various scales, both in large, organism-wide studies and in focused projects that are restricted to a few proteins or to a single pathway only. Studies of the latter type often make use of STRING as a discovery tool, taking advantage of the pre-computed and confidence-scored association predictions that it provides. Examples include the discoveries of a missing enzyme in Bacillothiol biosynthesis in Bacilli ( 57 ), of a previously unknown chaperone subunit in Cytochrome C oxidase assembly ( 58 ) or of a missing enzyme in uric acid degradation in mammals ( 59 ).

Another way to use STRING is to download and extend its relational database schema; this can, for example, be useful for projects dedicated to additional types of information (e.g. small molecule interactors in the case of our partner project STITCH) ( 60 ) or for projects wishing to rely on a single source of completely sequenced genomes with associated homology data (e.g. in the case of the gene orthology resource eggNOG) ( 61 ). Users not wishing to download and install the entire database schema can instead download compact flat files; these contain only the actual interaction information or information regarding the interacting proteins themselves (sequences, identifiers, etc.).
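As a rough sketch of working with such a flat file, the code below loads interaction records into an adjacency map. The file layout (a whitespace-separated header `protein1 protein2 combined_score` followed by one association per line, with integer scores) is an assumption made for this example, as are the identifiers in the sample data; the actual download format should be checked against the STRING documentation.

```python
import io
from collections import defaultdict

EXPECTED_HEADER = ["protein1", "protein2", "combined_score"]  # assumed layout


def load_adjacency(handle):
    """Read an interaction flat file into {protein: {partner: score}}.

    Associations are treated as non-directional, so each record is
    stored under both proteins.
    """
    header = handle.readline().split()
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    neighbors = defaultdict(dict)
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        p1, p2, score = line.split()
        neighbors[p1][p2] = int(score)
        neighbors[p2][p1] = int(score)
    return neighbors


# Made-up sample content standing in for a downloaded file.
sample = io.StringIO(
    "protein1 protein2 combined_score\n"
    "4932.A 4932.B 950\n"
    "4932.A 4932.C 310\n"
)
adjacency = load_adjacency(sample)
print(dict(adjacency["4932.A"]))
```

For a real file, the `io.StringIO` object would be replaced by an opened file handle; very large downloads may call for a streaming or database-backed approach instead of an in-memory dictionary.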

A unique strength of STRING lies in its comprehensiveness, albeit at the expense of considerable false-positive rates. Because of this, organism-wide studies represent perhaps the most interesting use cases and they are probably best done when they involve integration of orthogonal data types (since this may allow the noise in both data sets to cancel out). Examples include the filtering and extension of results from large-scale genetic screens ( 62 , 63 ) or the annotation of large groups of proteins having a specific post-translational modification ( 64 ). Another intriguing application scenario is to use STRING for search-space reduction in epistasis screens. This is done under the assumption that gene loci showing genetic epistasis should also often show up as functionally linked in STRING. Indeed, this approach has been demonstrated to work on human association mapping data, providing the statistical power to link up loci that show a non-additive effect when mutated together ( 1 , 2 ). Approaches such as this are expected to gain further power, as the information in STRING becomes even more comprehensive and precise in future updates.
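The idea of search-space reduction in epistasis screens can be made concrete with a small sketch: instead of testing every pair of candidate loci, only pairs that are functionally linked in the network are tested. The gene names and links below are invented, and the function `candidate_pairs` is a hypothetical helper; this is not the procedure used in the cited studies ( 1 , 2 ).

```python
from itertools import combinations


def candidate_pairs(loci, linked_pairs):
    """Return only those locus pairs that are linked in the network.

    Links are treated as non-directional (A-B equals B-A).
    """
    linked = {frozenset(pair) for pair in linked_pairs}
    return [
        pair
        for pair in combinations(sorted(loci), 2)
        if frozenset(pair) in linked
    ]


# Made-up loci and network links for illustration.
loci = ["G1", "G2", "G3", "G4", "G5"]
links = [("G1", "G2"), ("G4", "G3"), ("G2", "G9")]

all_pairs = len(loci) * (len(loci) - 1) // 2
reduced = candidate_pairs(loci, links)
print(f"{all_pairs} possible pairs, {len(reduced)} tested: {reduced}")
```

Because the number of possible pairs grows quadratically with the number of loci, restricting tests to linked pairs reduces the multiple-testing burden, at the cost of missing epistatic pairs that are not connected in the network.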

FUNDING

STRING is funded by the Swiss Institute of Bioinformatics, by the Novo Nordisk Foundation Center for Protein Research and by the European Molecular Biology Laboratory (EMBL). Funding for open access charges: University of Zurich, through its Research Priority program ‘Systems Biology and Functional Genomics’.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors wish to thank the PSICQUIC consortium for early access to their standardization effort, and Dr Gary Bader for technical help with the Cytoscape plugin.

REFERENCES

1. Role for protein-protein interaction databases in human genetics. Expert Rev. Proteomics, 2009, vol. 6 (pg. 647-659)

2. Using biological networks to search for interacting loci in genome-wide association studies. Eur. J. Hum. Genet., 2009, vol. 17 (pg. 1231-1240)

3. Unveiling the role of network and systems biology in drug discovery. Trends Pharmacol. Sci., 2010, vol. 31 (pg. 115-123)

4. Biochemical network-based drug-target prediction. Curr. Opin. Biotechnol., 2010, vol. 21 (pg. 511-516)

5. Network-based function prediction and interactomics: The case for metabolic enzymes. Metab. Eng., 2010

6. Systematizing the generation of missing metabolic knowledge. Biotechnol. Bioeng., 2010, vol. 107 (pg. 403-412)

7. It's the machine that matters: Predicting gene function and phenotype from protein networks. J. Proteomics, 2010, vol. 73 (pg. 2277-2289)

8. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat. Biotechnol., 2007, vol. 25 (pg. 309-316)

9. The HUPO PSI's molecular interaction format–a community standard for the representation of protein interaction data. Nat. Biotechnol., 2004, vol. 22 (pg. 177-183)

10. PAX of mind for pathway researchers. Drug Discov. Today, 2005, vol. 10 (pg. 937-942)

11. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics, 2003, vol. 19 (pg. 524-531)

12. The Systems Biology Graphical Notation. Nat. Biotechnol., 2009, vol. 27 (pg. 735-741)

13. CellML: its future, present and past. Prog. Biophys. Mol. Biol., 2004, vol. 85 (pg. 433-450)

14. Submit your interaction data the IMEx way: a step by step guide to trouble-free deposition. Proteomics, 2007, vol. 7 Suppl 1 (pg. 28-34)

15. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res., 2004, vol. 32 (pg. D449-D451)

16. The IntAct molecular interaction database in 2010. Nucleic Acids Res., 2010, vol. 38 (pg. D525-D531)

17. MINT, the molecular interaction database: 2009 update. Nucleic Acids Res., 2010, vol. 38 (pg. D532-D539)

18. MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Res., 2006, vol. 34 (pg. D436-D441)

19. MatrixDB, a database focused on extracellular protein-protein and protein-carbohydrate interactions. Bioinformatics, 2009, vol. 25 (pg. 690-691)

20. MPIDB: the microbial protein interaction database. Bioinformatics, 2008, vol. 24 (pg. 1743-1744)

21. The BioGRID Interaction Database: 2008 update. Nucleic Acids Res., 2008, vol. 36 (pg. D637-D640)

22. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res., 2010, vol. 38 (pg. D355-D360)

23. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res., 2009, vol. 37 (pg. D619-D622)

24. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res., 2005, vol. 33 (pg. 6083-6089)

25. Human Protein Reference Database–2009 update. Nucleic Acids Res., 2009, vol. 37 (pg. D767-D772)

26. FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res., 2009, vol. 37 (pg. D555-D559)

27. Biomedical text mining and its applications. PLoS Comput. Biol., 2009, vol. 5 pg. e1000597

28. A gene network for navigating the literature. Nat. Genet., 2004, vol. 36 pg. 664

29. Predicting protein-protein interactions in the context of protein evolution. Mol. Biosyst., 2010, vol. 6 (pg. 55-64)

30. Computational prediction of protein-protein interactions. Mol. Biotechnol., 2008, vol. 38 (pg. 1-17)

31. Function prediction and protein networks. Curr. Opin. Cell Biol., 2003, vol. 15 (pg. 191-198)

32. Computational methods for the prediction of protein interactions. Curr. Opin. Struct. Biol., 2002, vol. 12 (pg. 368-373)

33. VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology. Nucleic Acids Res., 2009, vol. 37 (pg. W115-W121)

34. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res., 2010, vol. 38 Suppl (pg. W214-W220)

35. Browsing multidimensional molecular networks with the generic network browser (N-Browse). Curr. Protoc. Bioinformatics, 2008, Chapter 9, Unit 9.11

36. Evaluation of linguistic features useful in extraction of interactions from PubMed; application to annotating known, high-throughput and predicted interactions in I2D. Bioinformatics, 2010, vol. 26 (pg. 111-119)

37. APID: Agile Protein Interaction DataAnalyzer. Nucleic Acids Res., 2006, vol. 34 (pg. W298-W302)

38. Discovering biological networks from diverse functional genomic data. Methods Mol. Biol., 2009, vol. 563 (pg. 157-175)

39. ConsensusPathDB–a database for integrating human functional interaction networks. Nucleic Acids Res., 2009, vol. 37 (pg. D623-D628)

40. STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res., 2009, vol. 37 (pg. D412-D416)

41. jClust: a clustering and visualization toolbox. Bioinformatics, 2009, vol. 25 (pg. 1994-1996)

42. Open source clustering software. Bioinformatics, 2004, vol. 20 (pg. 1453-1454)

43. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res., 2002, vol. 30 (pg. 1575-1584)

44. The SWISS-MODEL Repository and associated resources. Nucleic Acids Res., 2009, vol. 37 (pg. D387-D392)

45. NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res., 2009, vol. 37 (pg. D32-D36)

46. Ensembl 2009. Nucleic Acids Res., 2009, vol. 37 (pg. D690-D697)

47. SIMAP–a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters. Nucleic Acids Res., 2010, vol. 38 (pg. D223-D226)

48. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res., 2010, vol. 38 (pg. D142-D148)

49. SMART 6: recent updates and new developments. Nucleic Acids Res., 2009, vol. 37 (pg. D229-D232)

50. GeneCards Version 3: the human gene integrator. Database, 2010, vol. 2010 pg. baq020

51. Reflect: augmented browsing for the life scientist. Nat. Biotechnol., 2009, vol. 27 (pg. 508-510)

52. Bioinformatic “Harvester”: a search engine for genome-wide human, mouse, and rat protein resources. Methods Enzymol., 2005, vol. 404 (pg. 19-26)

53. BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol., 2009, vol. 10 pg. R130

54. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res., 2003, vol. 13 (pg. 2498-2504)

55. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc., 2007, vol. 2 (pg. 2366-2382)

56. Implementing data standards: a report on the HUPOPSI workshop September 2009, Toronto, Canada. Proteomics, 2010, vol. 10 (pg. 1895-1898)

57. Biosynthesis and functions of bacillithiol, a major low-molecular-weight thiol in Bacilli. Proc. Natl Acad. Sci. USA, 2010, vol. 107 (pg. 6482-6486)

58. A copper(I) protein possibly involved in the assembly of CuA center of bacterial cytochrome c oxidase. Proc. Natl Acad. Sci. USA, 2005, vol. 102 (pg. 3994-3999)

59. Completing the uric acid degradation pathway through phylogenetic comparison of whole genomes. Nat. Chem. Biol., 2006, vol. 2 (pg. 144-148)

60. STITCH 2: an interaction network database for small molecules and proteins. Nucleic Acids Res., 2010, vol. 38 (pg. D552-D556)

61. eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res., 2010, vol. 38 (pg. D190-D195)

62. A network-based integrative approach to prioritize reliable hits from multiple genome-wide RNAi screens in Drosophila. BMC Genomics, 2009, vol. 10 pg. 220

63. Genome-wide analysis of Notch signalling in Drosophila by transgenic RNAi. Nature, 2009, vol. 458 (pg. 987-992)

64. Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science, 2009, vol. 325 (pg. 834-840)

Author notes

The authors wish it to be known that, in their opinion, the first three authors should also be regarded as joint First Authors.

© The Author(s) 2010. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
