Linear motif atlas for phosphorylation-dependent signaling

doi: 10.1126/scisignal.1159433.

Martin Lee Miller, Lars Juhl Jensen, Francesca Diella, Claus Jørgensen, Michele Tinti, Lei Li, Marilyn Hsiung, Sirlester A Parker, Jennifer Bordeaux, Thomas Sicheritz-Ponten, Marina Olhovsky, Adrian Pasculescu, Jes Alexander, Stefan Knapp, Nikolaj Blom, Peer Bork, Shawn Li, Gianni Cesareni, Tony Pawson, Benjamin E Turk, Michael B Yaffe, Søren Brunak, Rune Linding

Martin Lee Miller et al. Sci Signal. 2008.

Abstract

Systematic and quantitative analysis of protein phosphorylation is revealing dynamic regulatory networks underlying cellular responses to environmental cues. However, matching these sites to the kinases that phosphorylate them and the phosphorylation-dependent binding domains that may subsequently bind to them remains a challenge. NetPhorest is an atlas of consensus sequence motifs that covers 179 kinases and 104 phosphorylation-dependent binding domains [Src homology 2 (SH2), phosphotyrosine binding (PTB), BRCA1 C-terminal (BRCT), WW, and 14-3-3]. The atlas reveals new aspects of signaling systems, including the observation that tyrosine kinases mutated in cancer have lower specificity than their non-oncogenic relatives. The resource is maintained by an automated pipeline, which uses phylogenetic trees to structure the currently available in vivo and in vitro data to derive probabilistic sequence models of linear motifs. The atlas is available as a community resource (http://netphorest.info).

Figures

Fig. 1

Tree-based organization, redundancy reduction, and partitioning of data. (A) All available data from in vivo and in vitro experiments for kinase, SH2, and PTB domains are organized by mapping them onto the phylogenetic domain trees. (B) The tree data structure enables us to automatically compile a data set of positive and negative examples for each domain or family of related domains. For a given domain (leaves in the tree) or domain family (branch points in the tree), we exclude phosphorylation sites that cannot be unambiguously designated as positive or negative examples, because they were annotated at a higher level in the tree. (C) Redundant phosphoproteins and phosphorylation sites are identified and eliminated on the basis of sequence similarity of the full-length protein sequence or the phosphorylation sites themselves. (D) Each redundancy-reduced data set is partitioned into four parts that are used for training, testing, and validation of artificial neural networks (ANNs). See fig. S1 for a flowchart of the pipeline, fig. S2 for an overview of the data coverage, and Methods for details.
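
The four-way partitioning in panel (D) can be illustrated with a minimal Python sketch. The caption states only that each data set is split into four parts used for training, testing, and validation; the random shuffling, the rotation scheme, and the names `partition_four` and `rotations` are assumptions made for illustration, not the pipeline's actual code.

```python
import random

def partition_four(examples, seed=0):
    """Shuffle the examples and deal them into four near-equal parts."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return [shuffled[i::4] for i in range(4)]

def rotations(parts):
    """Yield (train, test, validation) splits, rotating the role of each part.

    Assumption: one part is held out for testing, the next for validation,
    and the remaining two are pooled for training.
    """
    n = len(parts)
    for i in range(n):
        j = (i + 1) % n
        train = [x for k, part in enumerate(parts) if k not in (i, j) for x in part]
        yield train, parts[i], parts[j]

# Hypothetical site identifiers standing in for a redundancy-reduced data set.
sites = [f"site_{k}" for k in range(10)]
parts = partition_four(sites)
splits = list(rotations(parts))
```

Every site lands in exactly one part, so no example is shared between the training, test, and validation sets of a given rotation.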

Fig. 2

Selection of classifiers using the phosphoinositide 3-kinase-related kinase (PIKK) family of kinases as an example. (A) ANNs are trained for individual domains, subfamilies, and families of domains; by contrast, the position-specific scoring matrices (PSSMs) are initially assigned to the specific domain with which the in vitro assay was performed. (B) Because some PSSMs (for example, the one for ATM) may be better used as classifiers for a subfamily of closely related kinases (for example, ATM/ATR), we backtrack all PSSMs toward the root of the tree. (C) We eliminate families that contain domains that are highly dissimilar from each other (for example, the PIKK family and the ATM/ATR/mTOR subfamily), so that highly divergent domains are not described by the same ANNs and PSSMs (see Methods). (D) Whenever possible, we benchmark the ANNs and PSSMs and discard classifiers that do not perform significantly better than random expectation. (E) A nonredundant set of classifiers is selected that maximizes the average area under the receiver operating characteristic curve (AROC) across all kinases, SH2 domains, or PTB domains. (F) For the PIKK family of kinases, this procedure selects the ANNs for the ATM/ATR subfamily, mTOR, and DNA-dependent protein kinase (DNAPK) as the best combination of classifiers. See fig. S3 for an overview of the current selection of classifiers.
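
Steps (B) and (C) can be sketched in Python as a walk from each assayed domain toward the root of the tree, skipping nodes that were eliminated as too divergent. The tree topology in `PARENT` is inferred from the family names in the caption, and the function names are hypothetical; this is an illustration under those assumptions, not the authors' implementation.

```python
# Assumed child -> parent map for the PIKK example (None marks the root).
PARENT = {
    "ATM": "ATM/ATR",
    "ATR": "ATM/ATR",
    "ATM/ATR": "ATM/ATR/mTOR",
    "mTOR": "ATM/ATR/mTOR",
    "ATM/ATR/mTOR": "PIKK",
    "DNAPK": "PIKK",
    "PIKK": None,
}

def backtrack(node, parent=PARENT):
    """Return the node and all of its ancestors, ordered from node to root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def candidate_assignments(pssm_domains, excluded=frozenset()):
    """Map each allowed tree node to the PSSMs that could classify it."""
    candidates = {}
    for domain in pssm_domains:
        for node in backtrack(domain):
            if node not in excluded:
                candidates.setdefault(node, []).append(domain)
    return candidates

# The ATM PSSM becomes a candidate for ATM itself and for the ATM/ATR
# subfamily; the two divergent families named in the caption are excluded.
candidates = candidate_assignments(["ATM"], excluded={"PIKK", "ATM/ATR/mTOR"})
```

Benchmarking and the final selection of a nonredundant set, steps (D) and (E), would then operate on these candidate assignments.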

Fig. 3

Overview of the performance of the NetPhorest classifiers. The histogram shows the distribution of areas under the receiver operating characteristic curves (AROCs). More than 60% of the classifiers have AROC > 0.75 (see table S1 for the complete list of AROCs and fig. S8 or http://netphorest.info for the collection of ROCs).
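
For readers unfamiliar with the metric, the AROC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison; the function name `aroc` and the example scores are illustrative assumptions and are not taken from the paper.

```python
def aroc(positive_scores, negative_scores):
    """Area under the ROC curve from classifier scores.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example scores higher; tied pairs contribute 0.5.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Made-up scores: 8 of the 9 positive/negative pairs are ranked correctly.
example = aroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the caption treats AROC > 0.75 as a sign of a useful classifier.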

Fig. 4

Comparison of NetPhorest to other motif resources. We compared NetPhorest to Scansite (13) and the sequence patterns of ELM (14), PROSITE (19), and HPRD (18) using the entire compilation of phosphorylation sites. For NetPhosK (20), GPS (22), and KinasePhos (24), we used only the subset of sites that was dissimilar in sequence to those used to train classifiers of NetPhorest (see Methods for details). When at least five positive examples were left, the AROC was calculated. Subsequently, we tested how many of the predictors from each method performed no better than random, better than random but significantly poorer than NetPhorest, or comparable to NetPhorest. No predictor from any of the tested methods performed significantly better than the corresponding NetPhorest classifier. The number on each pie chart specifies how many predictors were tested from the method in question (see table S2 for details). Because classifiers from NetPhosK and Scansite were included in NetPhorest, those two resources are shown above the dotted line.

Fig. 5

Weak sequence specificity of oncogenic kinases and autophosphorylated sites. Using the AROC as a proxy for the degree of sequence specificity, we compared several subsets of kinases and SH2 domains. (A) Serine/threonine (S/T) kinases exhibit stronger sequence specificity (higher AROC) than tyrosine (Y) kinases (P < 10^-10). Tyrosine kinases with SH2 domains are less specific (lower AROC) than other tyrosine kinases (P < 10^-3). (B) Oncogenic tyrosine kinases, as defined by the Cancer Genome Project (56), have lower AROC than their non-oncogenic counterparts (P < 0.003). Error bars show the 90% confidence intervals and statistical significance was tested by Student’s t test. (C) The score distribution of serine/threonine autophosphorylation sites in 10 kinases is shifted toward low values, whereas the random expectation would be a uniform distribution (P < 0.04; see Methods). This shows that autophosphorylation sites typically have weaker sequence motifs than other sites phosphorylated by the same kinase.

Fig. 6

The role of NetPhorest in phosphoproteomics and modeling of phosphorylation-dependent signaling networks. The NetPhorest atlas of consensus linear motifs can be used for designing synthetic peptides for the development of kinase- or family-specific antibodies (for example, pS/T-Q), will replace Scansite (13) and NetPhosK (20) as the motif component of the NetworKIN resource (http://networkin.info) (12, 57), and can be used to detect biases arising from the enrichment procedures commonly used in phosphoproteomics [for example, phosphoramidate chemistry (PAC), immobilized metal affinity chromatography (IMAC), and titanium oxide (TiO2) (58)]. The NetPhorest Web site (http://netphorest.info) provides the means to classify phosphorylation sites on the basis of consensus sequence motifs.

References

    Linding R, Russell RB, Neduva V, Gibson TJ. GlobPlot: Exploring protein sequences for globularity and disorder. Nucleic Acids Res. 2003;31:3701–3708.
    Finn RD, Mistry J, Schuster-Böckler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A. Pfam: Clans, web tools and services. Nucleic Acids Res. 2006;34:D247–D251.
    Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, Bork P. SMART 5: Domains in the context of genomes and networks. Nucleic Acids Res. 2006;34:D257–D260.
    Yaffe MB. “Bits” and pieces. Sci STKE. 2006;2006:pe28.
    Jensen LJ, Jensen TS, de Lichtenberg U, Brunak S, Bork P. Co-evolution of transcriptional and posttranslational cell-cycle regulation. Nature. 2006;443:594–597.
