STRING 7—recent developments in the integration and prediction of protein interactions
Journal Article
Christian von Mering (1,2,*), Lars J. Jensen (1), Michael Kuhn (1), Samuel Chaffron (1,2), Tobias Doerks (1), Beate Krüger (1), Berend Snel (3) and Peer Bork (1,4)
(1) European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
(2) University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
(3) Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
(4) Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Str. 10, 13092 Berlin, Germany
*To whom correspondence should be addressed. Tel: +41 44 6353147; Fax: +41 44 6356864; Email: mering@molbio.unizh.ch
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Received: 15 September 2006; Revision received: 05 October 2006; Accepted: 05 October 2006; Published: 10 November 2006
Christian von Mering, Lars J. Jensen, Michael Kuhn, Samuel Chaffron, Tobias Doerks, Beate Krüger, Berend Snel, Peer Bork, STRING 7—recent developments in the integration and prediction of protein interactions, Nucleic Acids Research, Volume 35, Issue suppl_1, 1 January 2007, Pages D358–D362, https://doi.org/10.1093/nar/gkl825
Abstract
Information on protein–protein interactions is still mostly limited to a small number of model organisms, and originates from a wide variety of experimental and computational techniques. The database and online resource STRING generalizes access to protein interaction data, by integrating known and predicted interactions from a variety of sources. The underlying infrastructure includes a consistent body of completely sequenced genomes and exhaustive orthology classifications, based on which interaction evidence is transferred between organisms. Although primarily developed for protein interaction analysis, the resource has also been successfully applied to comparative genomics, phylogenetics and network studies, which are all facilitated by programmatic access to the database backend and the availability of compact download files. As of release 7, STRING has almost doubled to 373 distinct organisms, and contains more than 1.5 million proteins for which associations have been pre-computed. Novel features include AJAX-based web-navigation, inclusion of additional resources such as BioGRID, and detailed protein domain annotation. STRING is available at Author Webpage
INTRODUCTION
A fully comprehensive view of all functionally relevant protein interactions is still not available for any species, not even for relatively simple, single-celled model organisms. However, this information is essential for a systems-level understanding of cellular behavior, and it is needed in order to place the molecular functions of individual proteins into their cellular context.
For detecting direct physical binding between proteins, numerous small-scale and high-throughput experiments have been undertaken, and most of their reported interactions are available from dedicated interaction databases (1–4), as well as from multipurpose databases centered on specific model organisms (5–7). However, the growth of interaction data is severely lagging behind the pace of genome sequencing, so that for most genomes and proteins known to date no interaction data are available. Furthermore, proteins do not only interact physically: indirect associations such as genetic interactions or shared pathway memberships are equally important for a complete understanding of cellular function, but are for the most part not stored in interaction databases. Instead, they are available from a variety of pathway databases (8,9) and from the scientific literature.
The database STRING (‘Search Tool for the Retrieval of Interacting Genes/Proteins’) aims to collect, predict and unify most types of protein–protein associations, including direct and indirect associations. In order to cover organisms not yet addressed experimentally, STRING runs a set of prediction algorithms (10), and transfers known interactions from model organisms to other species based on predicted orthology of the respective proteins (11). STRING has grown from a purely predictive resource covering mainly prokaryotes (12) to a comprehensive tool integrating protein association information from all domains of life (Figure 1). Each interaction in the database is annotated with a benchmarked numerical confidence score, which can be used to filter the interaction network at any desired stringency. All data in STRING are stored in relational database tables. The interaction information is freely available for download, but download of the entire database content requires a license agreement to prevent redistribution (free for academic users who only access the previous version number).
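To make the idea of filtering "at any desired stringency" concrete, here is a minimal Python sketch. It assumes each association carries a single combined confidence score scaled to the range 0 to 1; the `Association` type, the `filter_network` function and all identifiers and scores are hypothetical illustrations, not part of STRING's interface or data.

```python
from typing import List, NamedTuple


class Association(NamedTuple):
    """One protein-protein association with a confidence score in [0, 1]."""
    protein_a: str
    protein_b: str
    score: float


def filter_network(associations: List[Association], min_score: float) -> List[Association]:
    """Return only the associations whose confidence score is at least min_score."""
    return [a for a in associations if a.score >= min_score]


# Invented identifiers and scores, for illustration only.
network = [
    Association("protA", "protB", 0.99),
    Association("protA", "protC", 0.42),
    Association("protC", "protD", 0.95),
]

# A stricter cutoff yields a smaller, higher-confidence network.
high_confidence = filter_network(network, min_score=0.7)
for a in high_confidence:
    print(f"{a.protein_a}\t{a.protein_b}\t{a.score:.2f}")
```

Raising `min_score` trades coverage for precision; which cutoff suits a given analysis is a judgment call left to the user.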
Figure 1
Protein interaction network in STRING. Screenshot from STRING showing a network of Saccharomyces cerevisiae proteins [the exosome complex, upper right, is seen weakly associated with proteins from nuclear transport, lower left, see also Ref. (26)]. The inset shows the context menu available for all STRING proteins—in the context menu, annotation and domain architecture are shown directly, and links to other databases and tools are available (22,23). In the network, links between proteins signify the various interaction data supporting the network, colored by evidence type (see STRING website for color legend).
KNOWN AND PREDICTED INTERACTIONS
Known interactions in STRING are primarily imported from existing excellent interaction databases (1–5,8,9), and are complemented by automated text mining of PubMed abstracts and several other bodies of scientific text [such as from Ref. (6)]. As is the case for all interactions in STRING, imported interactions are mapped onto a consistent set of proteins and identifiers, thereby facilitating comparison between datasets. STRING does not store specific details regarding splicing isoforms or post-translational modifications, but instead reduces protein isoforms to a single protein per locus (usually as defined by the longest known protein-coding transcript). This level of resolution enables efficient storage and is compatible with most prediction/transfer algorithms, which usually operate only at the level of the gene locus.
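The reduction to one protein per locus, and the mapping of imported interactions onto that reduced set, can be sketched as follows. This is an illustrative Python sketch, not STRING's pipeline: it assumes each isoform is given as a (locus, protein, length) tuple and uses protein length as a stand-in for "longest known protein-coding transcript"; the function names and identifiers are hypothetical.

```python
def collapse_isoforms(transcripts):
    """Pick one representative protein per gene locus.

    transcripts: iterable of (locus_id, protein_id, protein_length) tuples.
    Returns {locus_id: protein_id}, keeping the longest protein per locus
    (on ties, the isoform seen first is kept).
    """
    best = {}
    for locus, protein, length in transcripts:
        if locus not in best or length > best[locus][1]:
            best[locus] = (protein, length)
    return {locus: protein for locus, (protein, _) in best.items()}


def map_interactions(pairs, isoform_to_locus, representative):
    """Map isoform-level interaction pairs onto the per-locus representatives.

    Pairs involving unknown isoforms, or two isoforms of the same locus, are dropped.
    Returns a set of unordered pairs stored as sorted tuples.
    """
    mapped = set()
    for a, b in pairs:
        locus_a, locus_b = isoform_to_locus.get(a), isoform_to_locus.get(b)
        if locus_a is None or locus_b is None or locus_a == locus_b:
            continue
        mapped.add(tuple(sorted((representative[locus_a], representative[locus_b]))))
    return mapped


# Invented example: locus1 has two isoforms, locus2 has one.
transcripts = [("locus1", "p1a", 300), ("locus1", "p1b", 450), ("locus2", "p2a", 120)]
representative = collapse_isoforms(transcripts)
isoform_to_locus = {protein: locus for locus, protein, _ in transcripts}
# Two reports that differ only in the isoform used collapse into a single pair.
print(map_interactions([("p1a", "p2a"), ("p2a", "p1b")], isoform_to_locus, representative))
```

Because both reports end up on the same representative pair, datasets that used different isoforms become directly comparable, which is the benefit the text describes.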
Known interactions are further complemented by de novo interaction predictions derived from several comparative genomics prediction algorithms that are mainly applicable to prokaryotes (13–19). These algorithms systematically compare genomes, searching for frequently observed gene neighborhoods, gene fusion events and similarities in gene occurrence across genomes. For each prediction algorithm, dedicated viewers of the genomic evidence are available in STRING.
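Of the three genomic-context signals named above, similar gene occurrence across genomes (phylogenetic profiling, see Ref. 15) is the simplest to sketch. The Python example below assumes binary presence/absence profiles over a fixed, ordered genome list and scores pairs by Hamming distance; the distance measure, the threshold and the toy data are assumptions, not STRING's benchmarked scoring.

```python
from itertools import combinations


def profile(genomes_present, genomes):
    """Binary presence/absence vector of a gene family over an ordered genome list."""
    return tuple(1 if g in genomes_present else 0 for g in genomes)


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(p, q))


def predict_by_profiles(occurrence, genomes, max_distance=0):
    """occurrence: {gene_family: set of genomes containing it}.

    Returns (family_a, family_b, distance) for every pair whose profiles
    differ in at most max_distance genomes.
    """
    profiles = {fam: profile(present, genomes) for fam, present in occurrence.items()}
    hits = []
    for a, b in combinations(sorted(profiles), 2):
        d = hamming(profiles[a], profiles[b])
        if d <= max_distance:
            hits.append((a, b, d))
    return hits


# Toy data: famA and famB always co-occur, famC follows a different pattern.
genomes = ["g1", "g2", "g3", "g4"]
occurrence = {
    "famA": {"g1", "g2"},
    "famB": {"g1", "g2"},
    "famC": {"g1", "g3", "g4"},
}
print(predict_by_profiles(occurrence, genomes))
```

A realistic implementation would also need to discount families present in nearly all genomes, whose identical profiles carry little information, and to correct for closely related, redundant genomes.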
Interaction evidence from model organisms is often useful for other organisms as well, especially when orthologs of interacting proteins can be clearly identified in the second organism. STRING systematically executes such orthology transfers, using both precomputed orthologs from the COG database (20), as well as a homology-based orthology scheme computed de novo (11). STRING can thus immediately predict a large number of interactions for any newly sequenced genome, as soon as it is included in the system. The combination of known, predicted and transferred interactions is unique, making STRING the most comprehensive interaction resource available to date, especially for organisms not addressed experimentally.
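The basic logic of an orthology-based transfer can be sketched as below. It assumes proteins from all organisms are already assigned to orthologous groups (COG-like group identifiers) and transfers every known pair to all target-organism members of the corresponding groups. STRING's actual transfer scheme (11) additionally weighs the evidence, which this hypothetical `transfer_interactions` function does not attempt; all identifiers are invented.

```python
def transfer_interactions(known, ortholog_group, target_proteins):
    """Predict interactions in a target organism from known pairs in a source organism.

    known: iterable of (protein_a, protein_b) pairs observed in the source organism.
    ortholog_group: {protein_id: group_id} covering proteins of both organisms.
    target_proteins: iterable of protein ids belonging to the target organism.
    Returns a set of predicted target pairs, stored as sorted tuples.
    """
    # Index the target organism's proteins by orthologous group.
    by_group = {}
    for p in target_proteins:
        g = ortholog_group.get(p)
        if g is not None:
            by_group.setdefault(g, []).append(p)

    predicted = set()
    for a, b in known:
        group_a, group_b = ortholog_group.get(a), ortholog_group.get(b)
        if group_a is None or group_b is None:
            continue  # no orthology assignment, nothing to transfer
        for ta in by_group.get(group_a, []):
            for tb in by_group.get(group_b, []):
                if ta != tb:
                    predicted.add(tuple(sorted((ta, tb))))
    return predicted


known = {("srcA", "srcB")}
groups = {"srcA": "G1", "srcB": "G2", "tgtA": "G1", "tgtB": "G2", "tgtC": "G3"}
print(transfer_interactions(known, groups, ["tgtA", "tgtB", "tgtC"]))
```

Transferring to every group member, as done here, also propagates interactions to paralogs; that is a deliberate simplification and one reason real schemes score transferred links rather than accept them outright.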
The homology data stored in STRING form the basis for the interaction transfers, and are the result of more than 7 × 10^11 pairwise protein comparisons using the sensitive Smith–Waterman dynamic programming algorithm. This dataset is a very useful asset in itself [see also (21)], and can be accessed independently of the protein interaction networks by locally installing the STRING database files. Users of the website can also browse all of the homologs detected for any protein of interest, and can inspect alignments with very fast response times (Figure 2).
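For readers unfamiliar with it, the Smith–Waterman recurrence finds the best-scoring local alignment by dynamic programming, clamping every cell at zero so that an alignment can start and end anywhere. The Python sketch below returns only the optimal score and assumes a simple match/mismatch scheme with a linear gap penalty; protein searches normally use a substitution matrix such as BLOSUM62 and affine gap costs, and the parameter values here are arbitrary.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the optimal local alignment score of sequences a and b.

    Uses a linear gap penalty; H[i][j] is the best score of a local
    alignment ending at a[i-1] and b[j-1] (or 0 for the empty alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman("XXXACGTYYY", "ACGT"))
```

Filling the table takes time proportional to the product of the two sequence lengths, which is why precomputing and storing the results once, rather than recomputing them per query, pays off at this scale.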
Figure 2
Precomputed homology relations and alignments. For most genomes contained in STRING, sensitive all-against-all homology searches using the Smith–Waterman algorithm are included. These form the basis for assigning orthologs and transferring interaction information, but are also available directly to the user. Because they are stored in a relational database, access to homologs and alignments for any protein of interest is possible without the usual waiting time.
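Why a relational store gives near-instant answers can be illustrated with Python's built-in `sqlite3` module. The `homology` table, its columns and the rows below are hypothetical and do not reflect STRING's actual schema; the point is only that a lookup on an indexed key replaces a fresh similarity search.

```python
import sqlite3

# Hypothetical schema: one row per precomputed hit, stored in both directions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE homology (
        protein_a TEXT NOT NULL,
        protein_b TEXT NOT NULL,
        sw_score  INTEGER NOT NULL,
        PRIMARY KEY (protein_a, protein_b)  -- indexed, so lookups by protein_a are fast
    )
    """
)

# Invented identifiers and scores, for illustration only.
hits = [
    ("protA", "protB", 512), ("protB", "protA", 512),
    ("protA", "protC", 87), ("protC", "protA", 87),
]
conn.executemany("INSERT INTO homology VALUES (?, ?, ?)", hits)


def homologs_of(conn, protein, min_score=0):
    """Return (homolog, score) rows for one protein, best hits first."""
    rows = conn.execute(
        "SELECT protein_b, sw_score FROM homology "
        "WHERE protein_a = ? AND sw_score >= ? ORDER BY sw_score DESC",
        (protein, min_score),
    )
    return rows.fetchall()


print(homologs_of(conn, "protA"))
```

Storing each hit in both directions doubles the row count but keeps every query a simple prefix lookup on the primary key.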
NEW FEATURES AND IMPROVEMENTS IN STRING 7
The network viewer in STRING (Figure 1) is the central information source and navigation hub for the user. It has been extended through a context-sensitive menu-box, which displays associated information for any protein in the network. This menu includes a graphical summary of protein domains and features, and allows the user to link out to other external resources such as the motif discovery tool DILIMOT (22). STRING is now also tightly integrated with the SMART protein architecture research tool (23). With the latter it shares a common set of genomes and proteins, for which consistent results are pre-computed and stored. This enables automatic interlinking between both resources (SMART includes interaction previews, and STRING includes domain architecture previews). The topology and evolution of interaction networks can thus be studied both at the level of proteins as well as at the level of individual domains.
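One simple way to move from the protein level to the domain level is to project each protein interaction onto the pairs of domains carried by the two partners. The Python sketch below is a hypothetical illustration of that idea, not how STRING or SMART compute their cross-links; the domain annotations are assumed to be given as plain lists of domain names, and all names are invented.

```python
from collections import Counter
from itertools import product


def domain_pair_counts(interactions, domains):
    """Project protein-level interactions onto unordered domain pairs.

    interactions: iterable of (protein_a, protein_b) pairs.
    domains: {protein_id: list of domain names}; proteins without an entry contribute nothing.
    Each protein pair is counted at most once per domain pair.
    """
    counts = Counter()
    for a, b in interactions:
        pairs = {
            tuple(sorted((da, db)))
            for da, db in product(set(domains.get(a, [])), set(domains.get(b, [])))
        }
        counts.update(pairs)
    return counts


interactions = [("p1", "p2"), ("p1", "p3")]
domains = {"p1": ["dX", "dY"], "p2": ["dZ"], "p3": ["dZ"]}
print(domain_pair_counts(interactions, domains))
```

Domain pairs that recur across many independent protein pairs are natural candidates for closer inspection; a single occurrence says little on its own.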
Since the last update (11), STRING has grown substantially both in terms of data sources and number of organisms covered. Five new databases are included [MINT, HPRD, BioGRID, DIP and Reactome (2–5,8)], as well as 194 new organisms. Largely because of this increase in completely sequenced organisms, the architecture of STRING had to be substantially upgraded to accommodate present and future growth. In the user interface, this required changes to the viewers for the genomic context data, which could no longer show all of the genomes simultaneously by default. Instead, STRING uses a phylogenetic tree of species to collapse redundant genomes; this tree has been derived from concatenated alignments of a small number of universal protein families (24). Users can navigate the tree by expanding or collapsing its sub-branches, thus choosing which organisms to focus on. AJAX technology (‘Asynchronous JavaScript and XML’) is then used to fetch the requested information into the existing, pre-loaded browser page, increasing usability and speed.
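The collapsible species tree can be modeled as a tree whose internal nodes carry an "expanded" flag: a collapsed clade is displayed as a single row, and expanding it reveals its members. The Python sketch below is an assumption about how such a viewer might decide what to show; the `TreeNode` class and the clade and genome names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """A node of a species tree; leaves are genomes, internal nodes are clades."""
    name: str
    children: list = field(default_factory=list)  # list of TreeNode
    expanded: bool = False

    def leaf_count(self):
        """Number of genomes below this node (a leaf counts as one)."""
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)

    def visible_labels(self):
        """Rows the viewer would display: leaves, or one summary row per collapsed clade."""
        if not self.children:
            return [self.name]
        if not self.expanded:
            return [f"{self.name} ({self.leaf_count()} genomes)"]
        labels = []
        for child in self.children:
            labels.extend(child.visible_labels())
        return labels


clade = TreeNode("cladeA", [TreeNode("genome1"), TreeNode("genome2"), TreeNode("genome3")])
root = TreeNode("root", [clade, TreeNode("genome4")], expanded=True)
print(root.visible_labels())
clade.expanded = True  # the user clicks to expand cladeA
print(root.visible_labels())
```

In a web viewer of this kind, only the rows that become visible after a click would need to be requested from the server, which is the role the text assigns to AJAX.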
With respect to the underlying database structure, changes were necessary in the way homology data and interaction transfers are stored. Neither can any longer be computed and stored in an ‘all-against-all’ fashion, because both scale quadratically with the number of genomes. Beginning with version 7, STRING therefore adopts a two-layered approach when accommodating fully sequenced genomes (Figure 3): important model organisms and those for which experimental data are available form the ‘core genomes’; all other genomes form the periphery. Within the core, homology searches and interaction transfers are still executed in an all-against-all fashion, whereas for peripheral genomes only searches against the core are included. These and other changes dramatically improve the scalability of the resource, allowing faster update cycles even if the number of sequenced genomes continues to increase as fast as currently projected. Together with future plans to increase the scope and specificity of the stored interaction information, STRING should thus continue to facilitate not only network research but also wider projects ranging from phylogenetics to metagenomics (24,25).
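The saving from the two-layer design can be estimated by counting genome-versus-genome searches. With N genomes searched all-against-all (self-comparisons included), there are N(N+1)/2 genome pairs; with C core genomes searched among themselves and P peripheral genomes searched only against the core, there are C(C+1)/2 + P·C. The Python sketch below evaluates both; the 100-genome core used in the example is a hypothetical figure, not the actual size of the STRING 7 core.

```python
def pairs_all_against_all(n_genomes):
    """Genome pairs searched when every genome is compared with every genome, itself included."""
    return n_genomes * (n_genomes + 1) // 2


def pairs_core_periphery(n_core, n_periphery):
    """Core genomes all-against-all; each peripheral genome only against the core."""
    return pairs_all_against_all(n_core) + n_periphery * n_core


total = 373  # organisms in STRING 7, from the text
core = 100   # hypothetical core size, for illustration only
print(pairs_all_against_all(total))              # 69751
print(pairs_core_periphery(core, total - core))  # 32350
```

Under this model, adding one more peripheral genome costs C additional searches instead of roughly N, so growth at the periphery stays linear in the number of new genomes.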
Figure 3
Organisms covered by STRING. STRING currently contains 373 fully sequenced organisms. These are divided into ‘Core Organisms’ and ‘Peripheral Organisms’. The former include all important model organisms for which experimental data are available, as well as selected representatives for cases of redundant genome sequencing (e.g. when several closely related strains of a bacterial species have been sequenced, only one strain is included). The ‘Peripheral Organisms’ form the remainder; they tend to be somewhat redundant, and usually have little more than genomic sequence information annotated. For the core organisms, homology relations and interaction transfers are fully computed, whereas the peripheral organisms are only connected to the core but not among themselves (the graphic shows only a small selection of organisms; lines indicate homology searches and interaction transfers). This architecture allows STRING to encompass all sequenced genomes, while still keeping database size and computation time within reasonable limits.
ACKNOWLEDGEMENTS
The authors wish to thank Dianna Fisk from the Saccharomyces Genome Database for access to the Gene Summary Paragraphs, and Toby Gibson, Martijn Huynen, Victor Neduva, Rune Linding and members of the Bork group for continued feedback and discussions. This work was supported in part by grants from the Bundesministerium für Bildung und Forschung, Germany, as well as through the ADIT Integrated Project, contract number LSHB-CT-2005-511065, and through the BioSapiens Network of Excellence, contract number LSHG-CT-2003-503265, both funded by the European Commission FP6 Programme. Funding to pay the Open Access publication charges for this article was provided by the University of Zurich, through its Research Priority Program ‘Systems Biology and Functional Genomics’.
Conflict of interest statement. None declared.
REFERENCES
1. The biomolecular interaction network database and related tools 2005 update. Nucleic Acids Res., 2005, 33, D418–D424.
2. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res., 2004, 32, D449–D451.
3. MINT: a Molecular INTeraction database. FEBS Lett., 2002, 513, 135–140.
4. BioGRID: a general repository for interaction datasets. Nucleic Acids Res., 2006, 34, D535–D539.
5. Human protein reference database—2006 update. Nucleic Acids Res., 2006, 34, D411–D414.
6. Genome Snapshot: a new resource at the Saccharomyces Genome Database (SGD) presenting an overview of the Saccharomyces cerevisiae genome. Nucleic Acids Res., 2006, 34, D442–D445.
7. WormBase: better software, richer content. Nucleic Acids Res., 2006, 34, D475–D478.
8. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res., 2005, 33, D428–D432.
9. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res., 2006, 34, D354–D357.
10. STRING: a database of predicted functional associations between proteins. Nucleic Acids Res., 2003, 31, 258–261.
11. STRING: known and predicted protein–protein associations, integrated and transferred across organisms. Nucleic Acids Res., 2005, 33, D433–D437.
12. STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res., 2000, 28, 3442–3444.
13. Computational methods for the prediction of protein interactions. Curr. Opin. Struct. Biol., 2002, 12, 368–373.
14. Measuring genome evolution. Proc. Natl Acad. Sci. USA, 1998, 95, 5849–5856.
15. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl Acad. Sci. USA, 1999, 96, 4285–4288.
16. Protein interaction maps for complete genomes based on gene fusion events. Nature, 1999, 402, 86–90.
17. Detecting protein function and protein–protein interactions from genome sequences. Science, 1999, 285, 751–753.
18. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci., 1998, 23, 324–328.
19. The use of gene clusters to infer functional coupling. Proc. Natl Acad. Sci. USA, 1999, 96, 2896–2901.
20. The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 2003, 4, 41.
21. SIMAP: the similarity matrix of proteins. Nucleic Acids Res., 2006, 34, D252–D256.
22. DILIMOT: discovery of linear motifs in proteins. Nucleic Acids Res., 2006, 34, W350–W355.
23. SMART 5: domains in the context of genomes and networks. Nucleic Acids Res., 2006, 34, D257–D260.
24. Toward automatic reconstruction of a highly resolved tree of life. Science, 2006, 311, 1283–1287.
25. Comparative metagenomics of microbial communities. Science, 2005, 308, 554–557.
26. Nucleocytoplasmic transport: integrating mRNA production and turnover with export through the nuclear pore. Mol. Cell. Biol., 2004, 24, 3069–3076.
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.