Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs

*To whom correspondence should be addressed. Tel: +1 6174522103; Fax: +1 6174524978; Email: myaffe@mit.edu

John C. Obenauer, Lewis C. Cantley, Michael B. Yaffe, Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs, Nucleic Acids Research, Volume 31, Issue 13, 1 July 2003, Pages 3635–3641, https://doi.org/10.1093/nar/gkg584

Abstract

Scansite identifies short protein sequence motifs that are recognized by modular signaling domains, phosphorylated by protein Ser/Thr- or Tyr-kinases or mediate specific interactions with protein or phospholipid ligands. Each sequence motif is represented as a position-specific scoring matrix (PSSM) based on results from oriented peptide library and phage display experiments. Predicted domain-motif interactions from Scansite can be sequentially combined, allowing segments of biological pathways to be constructed in silico. The current release of Scansite, version 2.0, includes 62 motifs characterizing the binding and/or substrate specificities of many families of Ser/Thr- or Tyr-kinases, SH2, SH3, PDZ, 14-3-3 and PTB domains, together with signature motifs for PtdIns(3,4,5)P3-specific PH domains. Scansite 2.0 contains significant improvements to its original interface, including a number of new generalized user features and significantly enhanced performance. Searches of all SWISS-PROT, TrEMBL, Genpept and Ensembl protein database entries are now possible with run times reduced by ∼60% when compared with Scansite version 1.0. Scansite 2.0 allows restricted searching of species-specific proteins, as well as isoelectric point and molecular weight sorting to facilitate comparison of predictions with results from two-dimensional gel electrophoresis experiments. Support for user-defined motifs has been increased, allowing easier input of user-defined matrices and permitting user-defined motifs to be combined with pre-compiled Scansite motifs for dual motif searching. In addition, a new series of Sequence Match programs for non-quantitative user-defined motifs has been implemented. Scansite is available via the World Wide Web at http://scansite.mit.edu.

Received February 12, 2003; Revised and Accepted April 7, 2003

INTRODUCTION

Characterizing protein interactions on a proteome-wide scale is required to catalyze the advance of systems biology. Online databases of protein sequences (1–5) and known protein–protein interactions (6–8) are the first steps taken in this direction, but finding new interactions will require new combinations of experimental and computational methods. Scansite (http://scansite.mit.edu) is a computational tool built on experimental binding and/or substrate information from oriented peptide library screening (9–13) and phage display experiments (14), together with detailed biochemical characterization to derive a weight matrix-based scoring algorithm that predicts protein–protein interactions and sites of phosphorylation (15).

DOMAINS AND MOTIFS

The accumulated molecular structures in the Protein Data Bank (PDB) make it clear that eukaryotic proteins are often built with a modular architecture, combining domains that fold and function independently into larger polypeptides. These domains often occur in multiple unrelated proteins, where they fulfill similar targeting functions. Identification of these domains within a protein can be a valuable indicator of the function of the protein as a whole and can assist in placing that protein within the correct cell signaling pathway. A number of modular domains such as WW, SH2, SH3, PTB, PDZ and 14-3-3 bind to their ligands through direct interactions with very short amino acid sequences (typically <10 amino acids), or in the case of protein kinases, phosphorylate a Ser-, Thr- or Tyr-containing sequence motif in their protein substrates. Modular binding domains are typically fairly long (60–300 residues) and can be identified using sequence comparison methods and Hidden Markov Models [e.g. Pfam (16) and SMART (17)]. In contrast, the corresponding motifs to which they bind are much shorter (3–10 residues) and have been more elusive to locate. The current release (version 7.8) of Pfam, for example, identifies 4941 protein domains and families, but only 18 motifs (16). Scansite was developed to address this need and to facilitate work in our own laboratories on signaling by protein kinases and modular phosphopeptide- and phospholipid-binding domains.

Many of the motifs in Scansite were determined using oriented peptide library experiments. In this technique, degenerate peptides with a single fixed (orienting) central residue are incubated with one type of domain (9–13). Because of our laboratories' research focus, this central residue was typically a Ser, Thr or Tyr for protein kinase domains, or a phosphoSer/Thr or phosphoTyr residue for phosphospecific binding domains (such as SH2, PTB or 14-3-3 domains). Peptides that were phosphorylated by the kinase or were bound by the binding domain were isolated and sequenced as an ensemble by Edman degradation. When sequenced in this manner, each Edman cycle reveals the relative amount of each amino acid residue occurring at that position. This information is then scaled and normalized to produce a scoring matrix (i.e. a PSSM) which quantitatively indicates the preference for each amino acid type at each position within the domain's recognition motif. These matrices can then be used to score entire databases of protein sequences to find a small number of proteins with high-ranking motif matches, indicating possible protein–protein interactions. As the number of motifs grew, the opposite search became practical as well: scanning a single protein sequence for matches to any of the motifs in our database.
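As an illustration of how a PSSM can be applied to a sequence, the sketch below slides a small matrix along a protein and sums the per-position preferences for every window centred on a Ser or Thr. The matrix values, the additive scoring rule, the default preference and the names TOY_PSSM, score_window and scan_sequence are all assumptions made for this example; they do not reproduce Scansite's actual scaling, normalization or score calculation (15).

```python
# Toy position-specific scoring matrix for a 5-residue motif.
# Keys are offsets from the fixed central residue; values map an amino
# acid to its preference. All numbers are invented for illustration.
TOY_PSSM = {
    -2: {"R": 3.0, "K": 2.0},
    -1: {},                      # no stated preference at this position
    0: {"S": 5.0, "T": 4.0},     # fixed (orienting) residue
    1: {},
    2: {"L": 2.5, "P": 1.5},
}
DEFAULT = 1.0  # preference assumed for residues not listed at a position


def score_window(sequence, center):
    """Sum the preferences for the window centred at index `center`."""
    total = 0.0
    for offset, prefs in TOY_PSSM.items():
        residue = sequence[center + offset]
        total += prefs.get(residue, DEFAULT)
    return total


def scan_sequence(sequence, central_residues="ST"):
    """Return (1-based position, window, score) for every scorable site."""
    flank = max(abs(offset) for offset in TOY_PSSM)
    hits = []
    # Skip sites too close to either terminus to fill the whole window.
    for i in range(flank, len(sequence) - flank):
        if sequence[i] in central_residues:
            window = sequence[i - flank:i + flank + 1]
            hits.append((i + 1, window, score_window(sequence, i)))
    # In this sketch a higher score means a better match.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    for position, window, score in scan_sequence("MARASALGGKTSPQ"):
        print(position, window, score)
```

Running the same function over every sequence in a database and keeping the best-scoring sites corresponds to the database-wide use described above; applying several matrices to one sequence corresponds to the single-protein scan.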

We have collected these programs to create a user-friendly web-based tool accessible to the entire scientific community that allows investigators to search for motifs recognized by commonly occurring domains within a protein sequence query of their choice or to search entire protein sequence databases for optimal motif matches. The Motif Scan ensemble of programs computationally identifies all motifs within a given user-specified protein, while the Database Search ensemble of programs finds all proteins in a protein database, such as SWISS-PROT, that match a given motif. By repeated queries using the results of one search to launch another, it is possible to infer several steps of a signaling pathway in silico. For example, if a newly discovered protein is predicted by Scansite to be phosphorylated by the kinase domain from Akt and the resulting phosphorylation is predicted to create a binding site for 14-3-3 proteins, then the newly discovered protein is likely to function in a signaling pathway involving these proteins. These types of analyses performed on protein sequence databases can functionally annotate a limited number of promising interactions that merit experimental investigation and may also suggest that other intermolecular interactions are unlikely, at least within the limits of sequence-based prediction.

STRINGENCY LEVELS

Threshold values need to be assigned when scanning query proteins with the Motif Scan programs to decide which scores are likely to suggest real interactions. Scansite incorporates three settings, labeled ‘high’, ‘medium’ and ‘low’ stringencies; the high stringency setting is the most restrictive and reports a ‘hit’ only if the score falls within the top 0.2% of scores when the motif matrix of interest was applied to the vertebrate subset of SWISS-PROT. This dataset was chosen as a reference because of the non-redundant nature of SWISS-PROT and the relevance of vertebrate proteins to the type of cell signaling events predicted by Scansite. These values were found to increase the reliability of prediction of true positive ‘hits’ while minimizing the number of predicted false negative interactions, based on a comparative analysis of mammalian and bacterial database subsets (15). The medium and low stringencies were then arbitrarily chosen at 1 and 5%, respectively.

Scoring percentiles in the Database Search programs, on the other hand, are calculated de novo, based solely on the protein database subset selected for the search. For example, a search among human proteins will yield sites whose percentiles are relative to all human proteins included in the search. The same site can thus have a different percentile for different database searches, but its score is always constant.

Users should always bear in mind that Scansite predictions are based solely on 1D sequence comparison and that all predicted interactions must be experimentally verified before they can be considered valid.

MATERIALS AND METHODS

Server

The public collection of Scansite programs runs on a Dell PowerEdge 8450 server, with 8 Intel Xeon 733 MHz CPUs and 4 GB of RAM. Two 32 GB hard drives are used in a RAID 1 array. The operating system is Red Hat Linux 7.3.

Development

All development for Scansite version 2.0 was performed using the GNU GCC compiler, the PHP 4.0 and Perl 5.5 interpreters, Mandrake Linux 8.0 through 9.0, Red Hat Linux 7.1 through 7.3, the Apache 1.3 web server, the MySQL 3.23 relational database and the KDE desktop environment.

SCANSITE PROGRAMS

A total of 10 programs are included in Scansite 2.0 and these are listed in Table 1. The Motif Scan programs can accept either a protein accession number or a sequence as input and can optionally accept a user-defined motif. The Database Search programs can operate on one or more Scansite motifs, one or more user-defined motifs or combinations of Scansite and user-defined motifs. The Quick Matrix Method allows users to construct a roughly quantitative matrix based on qualitative residue preferences for a sequence motif. The Sequence Match programs allow users to find occurrences of one or two specified consensus sequences in the protein databases and can also be used to find any MySQL-recognized regular expression. A brief description of using each of these programs follows. More detailed instructions can be found in the tutorial on our web site (http://scansite.mit.edu/tutorial/tutorial.html) (see also 18).

Motif Scan

To use the Motif Scan programs, users should go to the web site http://scansite.mit.edu. Under the heading ‘Motif Scan’, click ‘Scan a Protein by Accession Number or ID’ to use a protein from a public database or click ‘Scan a Protein by Input Sequence’ to enter a protein sequence directly. The required inputs are then displayed, which include the protein's accession number and database of origin (or with the input sequence version, the protein's sequence and an arbitrary name for it), followed by the list of motifs to scan for. The default setting is to search for occurrences of all motifs in the Scansite database. Alternatively, one or more individual motifs can be selected, or several motifs of similar type (i.e. a ‘motif group’) can be selected at once. The list of motifs currently available in Scansite is shown in Table 2. Users can search at high stringency (the default choice), which shows only the strongest motif matches or at medium or low stringency to see weaker sites. Finally, users can elect to identify domains in the protein sequence, which Scansite accomplishes by parsing the results from an external call to the Pfam server at Washington University, St Louis (16). This lengthens the time needed to generate results, but the domain information is often very informative. With all these settings selected, clicking the ‘Submit Request’ button initiates the scan. The result will show a schematic map of the protein with the predicted sites found (Fig. 1) and a detailed table showing the score and sequence of each one (Fig. 2).

Database Search

To use the Database Search program, users should click ‘Search Using a Scansite Motif’ under the ‘Database Search’ heading. A list of all the motifs in Scansite is shown. Users should select one of the motifs to search with and select the name of the protein database to search. The databases currently available are SWISS-PROT, TrEMBL, Genpept and Ensembl. Optionally, the search can be limited to proteins in just one species or a category of organisms, including mammals, vertebrates, invertebrates, plants, fungi, viruses and bacteria and archaea (grouped together). Other options allow searching within a specified range of molecular weights and isoelectric points, to facilitate comparison with two-dimensional gel electrophoresis experiments. Restricting the results by keywords in the protein description and/or by characteristic subsequences is also possible. The last user-specified parameter is the desired size of the search output, ranging from 50 to 2000 reported sites. Clicking ‘Submit Request’ starts the search. The resulting table (Fig. 3) lists all sites found, identifying the associated protein's name, description, sequence, molecular weight and isoelectric point. Any protein found from a database search can be rapidly submitted to the Motif Scan program by clicking the ‘Submit’ button on the far left of each output line.

In addition to the pre-compiled Scansite motifs listed, investigators can use their own motifs to search databases, using the program ‘Search Using an Input Motif’. A tab-delimited text file containing a weight matrix is uploaded into Scansite and the subsequent options and output are the same as described above. Instructions on how to create and upload a matrix are provided in the tutorial page on our web site (http://scansite.mit.edu/tutorials/tutorial.html).

One variation on the Database Search is the program ‘Search Using Quick Matrix Method’. This program allows users to define an approximate motif by specifying a short pattern of amino acids, where wildcards are allowed. For a motif such as RXSXL, this sequence can be entered in the row of positions labeled ‘Primary Preference’. Optionally, if it was known that proline can substitute for the leucine, a P can be entered in the ‘Secondary Preference’ row at that position. Scansite makes a crude weight matrix based on these inputs, assigning a score of 9.0 to residues in the primary preference row, a score of 4.5 to those in the secondary preference row and a score of 1.0 to all unspecified residues. The results of using the Quick Matrix Method will be less quantitative than a normal database search, but can yield useful results when only limited motif information is available.
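The construction of such a crude matrix can be sketched in a few lines. The scores 9.0, 4.5 and 1.0 come from the description above; the data layout (one dictionary per motif position), the restriction to one letter per row and position, and the function name quick_matrix are assumptions made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Scores stated in the text for the Quick Matrix Method.
PRIMARY, SECONDARY, UNSPECIFIED = 9.0, 4.5, 1.0


def quick_matrix(primary, secondary=None):
    """Build one {residue: score} row per motif position.

    `primary` and `secondary` are equal-length strings. Any character that
    is not a standard amino acid (such as the wildcard 'X') leaves that
    position without a stated preference.
    """
    secondary = secondary or "X" * len(primary)
    if len(secondary) != len(primary):
        raise ValueError("preference rows must have the same length")
    matrix = []
    for first, second in zip(primary.upper(), secondary.upper()):
        row = {aa: UNSPECIFIED for aa in AMINO_ACIDS}
        if second in row:
            row[second] = SECONDARY
        if first in row:
            row[first] = PRIMARY   # primary wins if both rows name a residue
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # RXSXL with proline allowed as a substitute for the final leucine.
    matrix = quick_matrix("RXSXL", "XXXXP")
    print(matrix[4]["L"], matrix[4]["P"], matrix[4]["A"])
```

For the RXSXL example, the last row scores L at 9.0, P at 4.5 and every other residue at 1.0, while the two wildcard positions score all residues equally.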

Sequence Match

The Sequence Match programs are new in the current release of Scansite. As with the Quick Matrix Method, these programs are useful when only partial motif information is available. Unlike the Quick Matrix Method, these programs do not provide quantitative match ranking, but they instead retrieve all proteins in a database that exactly match the sequence pattern specified, similar to the programs Patscan (Ross Overbeek and Alex Rodriguez, http://www-unix.mcs.anl.gov/compbio/PatScan/HTML/patscan.html) and ScanProsite (http://us.expasy.org/tools/scanprosite/). Unlike those two programs, Sequence Match will accept the widely used regular expression syntax common in Perl, PHP, MySQL and other programming environments. This kind of information can help an investigator decide how rare or specific a hypothetical motif is, how functionally similar the proteins containing the motif are, whether a motif occurs more commonly in one species or another and how many proteins may cross-react with an antibody made using the motif as an epitope. As with the Database Search, the proteins retrieved can be limited to the most relevant ones by specifying a single species, molecular weight range and values for the other options mentioned previously.

There are three Sequence Match programs. The first and simplest takes a single consensus sequence as input, which may contain wildcards. The second program looks for two different consensus sequences occurring simultaneously in the same protein. The third and most flexible program is ‘Search Databases for Regular Expression’. Unlike the first two programs, this program allows gaps of any length, alternative residues at any position and motifs at the N- or C-termini of proteins (such as signal sequences or antibody epitopes). Any regular expression recognized by MySQL can be used as the search term and our web site gives the full list of allowed symbols as well as several biologically useful examples.
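The kinds of pattern the third program accepts can be tried out locally. The sketch below uses Python's re module rather than MySQL, so it only illustrates syntax the two share (any residue, alternative residues, and anchors for the N- and C-termini); the protein IDs and sequences are invented, and the function name regex_search is hypothetical.

```python
import re

# Hypothetical mini-database: protein ID -> sequence (invented sequences).
PROTEINS = {
    "PROT_A": "MKRASALGG",
    "PROT_B": "MGGRTATPAAAV",
    "PROT_C": "AARQSGPKDEL",
}


def regex_search(pattern, proteins=PROTEINS):
    """Return {protein ID: [(1-based start, matched text), ...]} for hits.

    finditer reports non-overlapping matches only.
    """
    compiled = re.compile(pattern)
    results = {}
    for protein_id, sequence in proteins.items():
        matches = [(m.start() + 1, m.group())
                   for m in compiled.finditer(sequence)]
        if matches:
            results[protein_id] = matches
    return results


if __name__ == "__main__":
    print(regex_search("R.S.[LP]"))   # wildcards plus alternative residues
    print(regex_search("^M"))         # motif anchored at the N-terminus
    print(regex_search("KDEL$"))      # motif anchored at the C-terminus
```

Requiring two patterns in the same protein, as the second program does, amounts to keeping only the protein IDs returned for both searches.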

IMPROVEMENTS IN VERSION 2.0 OF SCANSITE

Speed

Program execution speed has been significantly improved for the Database Search programs. Storing protein sequence information in a relational database rather than in text files, in combination with rewriting the base code, shortened the time needed for a typical database search by approximately a factor of three compared with Scansite version 1.0. Our protein sequence databases are currently updated with each major release of Genpept, SWISS-PROT, TrEMBL and Ensembl. Between updates, very recent additions to these databases may not be present in Scansite.

Targeted searches

In addition to speed, the MySQL relational databases for protein sequences and motif PSSMs facilitate restricted database searches based on pre-annotated database entries. Scansite 2.0 gives researchers the ability to find motifs in proteins from a single species or genus, within a range of molecular weights and isoelectric points, or containing keywords, and/or a characteristic subsequence (which can lie outside the motif region). The Motif Scan programs similarly benefit: rather than searching for all motifs or individually selected ones, users can now search by motif ‘groups’, where functionally similar motifs have been grouped together (e.g. SH2 domains, SH3 domains, tyrosine kinases and others) (Table 2). One or more motif groups can also be combined with one or more individually selected motifs.

Graphics

The algorithm previously used to display sites and domains graphically along the protein sequence sometimes led to overlapping text, making annotations difficult to read. The new algorithm displays many more sites and domains without overlap. In response to numerous user requests, the generated graphic is now a single downloadable PNG image to facilitate publication of users' results.

Two-dimensional gel electrophoresis

Results from a Database Search can be sorted by molecular weight or isoelectric point and the search can be restricted to proteins within a narrow range of both parameters. As a result, Scansite can be used in conjunction with two-dimensional gel electrophoresis experiments to help identify spots in regions of a gel. For experiments involving primarily phosphoproteins, the expected number of phosphate groups can be specified in the Database Search options and mass and isoelectric point calculations correspondingly adjusted.
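The mass part of this adjustment can be sketched as below. The sketch uses standard average residue masses and adds the net mass of one HPO3 group per phosphorylation; it is an illustration under those assumptions rather than Scansite's own calculation, the isoelectric point is not computed, and the function names molecular_weight and within_mw_range are hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153       # one water per chain for the free termini
PHOSPHATE = 79.9799   # net mass added per phosphate group (HPO3)


def molecular_weight(sequence, phosphates=0):
    """Average molecular weight in daltons, with optional phosphate groups."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER + phosphates * PHOSPHATE


def within_mw_range(proteins, low_da, high_da, phosphates=0):
    """Return [(protein ID, weight)] inside the range, sorted by weight."""
    selected = []
    for protein_id, sequence in proteins.items():
        weight = molecular_weight(sequence, phosphates)
        if low_da <= weight <= high_da:
            selected.append((protein_id, weight))
    return sorted(selected, key=lambda item: item[1])


if __name__ == "__main__":
    toy = {"pep1": "GG", "pep2": "WWWW"}   # invented peptides
    print(within_mw_range(toy, 100.0, 300.0, phosphates=1))
```

Sorting the filtered list by weight corresponds to the molecular weight ordering offered in the Database Search output; a comparable filter on isoelectric point would additionally need a set of pKa values, which is outside the scope of this sketch.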

User-entered motifs

Users have always been able to enter their own motifs to perform Scansite searches. In version 2.0, we made three additions. First, we now allow use of matrices that lack values for one or more amino acid types by supplying default values for those positions. Second, researchers studying selenocysteine-containing proteins can now enter motifs giving a score for selenocysteine by labeling that column ‘U’, its accepted single-letter code. Third, motifs targeting the N-terminus of a protein sequence can now be specified, using a column labeled with the arbitrarily chosen character ‘$’ (dollar sign). The ability to use C-terminal-directed motifs has existed since version 1.0 by using the ‘*’ character and is currently used in PSSMs for PDZ domains.

Multiple motifs

Searching for proteins that contain motifs of more than one type can be a powerful way to increase the functional relevance of database searches (15). Version 1.0 allowed users to search for two Scansite motifs or two user-entered motifs. Version 2.0 allows users to search for proteins containing up to five different motifs, which can be any combination of Scansite motifs and user-entered motifs.

User-contributed motifs

The Database Search, Quick Matrix Method and Sequence Match programs allow users to temporarily upload one or more motifs. In Scansite 2.0, we now allow researchers to submit motifs directly into the Scansite database to make them available to other users. This should contribute favorably to the number and diversity of motif types that can be searched for in protein sequence queries. However, we cannot vouch for the accuracy of user-submitted motifs. To control for this, the web site allows users the option of including or rejecting user-submitted motifs in their scans. In addition, user-submitted motifs can be individually selected along with our standard Scansite motifs when using the Motif Scan programs. Interested users should contact us for information on adding motifs to the Scansite database.

Open source

Scansite 2.0 is a completely rewritten version of the original program, developed entirely at the Massachusetts Institute of Technology. We are releasing the source code for Scansite under the terms of the GNU General Public License, version 2 (Free Software Foundation, http://www.gnu.org/licenses/gpl.txt). Researchers interested in the fine details of our score calculations and other methodologies will thus have access to them and laboratories considering writing similar web applications can use our code to get started. The PSSMs for the 62 Scansite motifs, however, remain proprietary and are not included in the release. This policy is intended to prevent incorporation of the motifs into unauthorized commercial products. Use of the motifs on our public web site is permitted for all users, whether commercial or not. Anyone developing new features for Scansite is encouraged to submit changes back to us for inclusion in future public releases.

FUTURE DIRECTIONS

The revision of Scansite has produced a significantly faster and more efficient program for finding probable protein interactions. Focused searches enabled by incorporation of a relational database will help investigators target Scansite 2.0 to their own model organisms and experiments. New motifs will continue to be added to Scansite as they become available from oriented peptide library experiments. Researchers are encouraged to submit motifs of their own to our database for others to use. More specialized protein databases will be added over time, such as the RefSeq database and the mouse proteome. We are in the process of installing a second Scansite server for batch processing of long lists of sequences such as those obtained from DNA microarray experiments or genomic sequencing efforts. Future additions will include the ability to search among specific tissue types, the ability to adjust scores for predicted interactions based on their evolutionary conservation in orthologues and paralogues, the ability to restrict predicted interactions to proteins that co-localize in the same subcellular compartment, the ability to correlate predicted interactions with published data in the literature in an automated manner and the ability to automatically generate signaling network-style diagrams based on predicted interactions.

ACKNOWLEDGEMENTS

The authors wish to acknowledge the work done by developers who contributed to Scansite version 1.0, especially German Leparc and Stefano Volinia, as well as the members of the Yaffe and Cantley laboratories who provided the experimental data and beta-tested the programs. This work was funded by a Merck Genome Research Institute grant, the Merck/MIT Collaboration Program, NIH grants GM-60594 (M.B.Y.), GM-56203 (L.C.C.) and GM-52981 (M.B.Y.) and a Burroughs-Wellcome Career Development Award to M.B.Y.

Figure 1. Description of elements in Motif Scan graphical output. The protein query (in this case, the transcription factor FOXO1) is represented schematically as a line, with colored rectangles marking known domains. Labels above the protein indicate where motifs were found and identify the motif family. Labels below the protein indicate the name and range of each domain found. If the protein's annotation includes phosphorylation sites that have been experimentally mapped (generally true only for some SWISS-PROT entries), these are also indicated below the domains. The next line shows a plot of the predicted surface accessibility at each residue, calculated using a 6 amino acid running window (19). The ruler at the bottom marks numbered intervals along the protein sequence.

Figure 2. Motif Scan's output table. For each motif family with a site on the graphical output (cf. Fig. 1), details about the best matching domain motifs and the position of the site in the query are shown. The score, percentile and sequence of the site are indicated, as is the calculated surface accessibility for that site (labeled SA). Clicking on the score will display a histogram showing where this score ranks when compared with all potential sites for that motif in vertebrate SWISS-PROT; clicking on the sequence shows its position in the full protein and provides a link to BLAST for evaluating conservation of the motif in related protein homologues. For domains with an entry in the Weizmann Institute's GeneCard database (http://bioinformatics.weizmann.ac.il/cards), the name is listed as a hyperlink to its GeneCard reference.

Figure 3. Output table from a Database Search. The name of the motif used in the search (in this case 14-3-3) is displayed at the top, with any search restrictions specified immediately below (in this case, human proteins with Mw from 66 to 90 kDa). Each line in the table lists the score and sequence of a site found, together with its protein ID, description, molecular weight and isoelectric point. Clicking the Submit button at the left launches the Motif Scan program for that protein to facilitate further analysis. This table is sorted by score, but can alternatively be sorted by molecular weight or isoelectric point.

Table 1.

List of programs available on the Scansite web site

Program: Description

Motif Scan: Scans a protein sequence for motifs
    Scan a Protein by Accession Number or ID: Takes accession number or ID as input (e.g. RB_HUMAN, P06400)
    Scan a Protein by Input Sequence: Takes sequence as input
    Scan Input Sequence with an Input Motif: Takes sequence and a user-defined motif as input
Database Search: Searches a database for motifs
    Search Using a Scansite Motif: Searches for a single pre-compiled Scansite motif
    Search Using an Input Motif: Takes a user-defined motif as input
    Search Using Quick Matrix Method for Making a Motif: Takes a semi-quantitative user-defined motif as input
    Search Using Multiple Motifs: Searches for multiple pre-compiled or user-defined motifs
Sequence Match: Retrieves all sequences matching an input pattern exactly
    Search Databases for Sequence Pattern: Takes a single sequence pattern as input
    Search Databases for Two Sequence Patterns: Takes two sequence patterns as input
    Search Databases for Regular Expression: Takes a regular expression as input
Motif Scan Scans a protein sequence for motifs
Scan a Protein by Accession Number or ID Takes accession number or ID as input (e.g. RB_HUMAN, P06400)
Scan a Protein by Input Sequence Takes sequence as input
Scan Input Sequence with an Input Motif Takes sequence and a user-defined motif as input
Database Search Searches a database for motifs
Search Using a Scansite Motif Searches for a single pre-compiled Scansite motif
Search Using an Input Motif Takes a user-defined motif as input
Search Using Quick Matrix Method for Making a Motif Takes a semi-quantitative user-defined motif as input
Search Using Multiple Motifs Searches for multiple pre-compiled or user-defined motifs
Sequence Match Retrieves all sequences matching an input pattern exactly
Search Databases for Sequence Pattern Takes a single sequence pattern as input
Search Databases for Two Sequence Patterns Takes two sequence patterns as input
Search Databases for Regular Expression Takes a regular expression as input

Table 1.

List of programs available on the Scansite web site

Motif Scan Scans a protein sequence for motifs
Scan a Protein by Accession Number or ID Takes accession number or ID as input (e.g. RB_HUMAN, P06400)
Scan a Protein by Input Sequence Takes sequence as input
Scan Input Sequence with an Input Motif Takes sequence and a user-defined motif as input
Database Search Searches a database for motifs
Search Using a Scansite Motif Searches for a single pre-compiled Scansite motif
Search Using an Input Motif Takes a user-defined motif as input
Search Using Quick Matrix Method for Making a Motif Takes a semi-quantitative user-defined motif as input
Search Using Multiple Motifs Searches for multiple pre-compiled or user-defined motifs
Sequence Match Retrieves all sequences matching an input pattern exactly
Search Databases for Sequence Pattern Takes a single sequence pattern as input
Search Databases for Two Sequence Patterns Takes two sequence patterns as input
Search Databases for Regular Expression Takes a regular expression as input
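
The Sequence Match programs in Table 1 retrieve every sequence that matches an input pattern exactly, in contrast to the scored matrix searches. The idea can be illustrated with a minimal Python sketch. This is not Scansite's implementation: the function name `find_exact_matches`, the toy sequences, and the example pattern are all assumptions made here for illustration, and the pattern syntax is that of Python's `re` module rather than Scansite's own input format.

```python
import re

# Hypothetical toy "database": made-up identifiers and made-up sequences.
PROTEINS = {
    "toy_protein_1": "MKRSASEPLLARSTSLPGG",
    "toy_protein_2": "MAAPLPPKPPGGSY",
    "toy_protein_3": "MGGTTLLEE",
}


def find_exact_matches(pattern, proteins):
    """Return (protein id, 1-based start position, matched text) for every match.

    The pattern is wrapped in a lookahead so that overlapping matches
    are reported as well as non-overlapping ones.
    """
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for name, sequence in proteins.items():
        for match in regex.finditer(sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits


# Illustrative pattern: Arg, Ser, any residue, Ser or Thr, any residue, Pro.
for hit in find_exact_matches("RS.[ST].P", PROTEINS):
    print(hit)
```

Unlike a position-specific scoring matrix, a pattern of this kind is non-quantitative: a site either matches or it does not, so no score or ranking is produced.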

Table 2.

Current list of motifs included in Scansite

Phosphoserine/threonine binding domains: 14-3-3 mode 1
Tyrosine kinase domains: Abl, EGFR, FGFR, Insulin receptor, Itk, Lck, PDGFR, Src
Src homology 2 domains: Abl, Crk, FGFR, Fyn, Grb2, Itk, Lck, Nck, p85, PLC γ (C terminal SH2), PLC γ (N terminal SH2), Shc, SHIP, Src
Src homology 3 domains: Abl, Amphiphysin, Cbl-associated protein, Cortactin, Crk, Grb2, Intersectin, Itk, Nck, p85 α (mode 1), p85 α (mode 2), PLC γ, Src
Basophilic serine/threonine kinase domains: Akt, Calmodulin-dependent kinase 2, Clk2, Protein kinase A, PKC α/β/γ, PKC δ, PKC ε, PKC μ, PKC ζ
DNA damage kinase domains: ATM, DNA protein kinase
Acidophilic serine/threonine kinase domains: Casein kinase 1, Casein kinase 2, GSK3
Proline-dependent serine/threonine kinase domains: Cdc2, Cdk5, Erk1, p38 MAP kinase
Kinase binding domains: Erk1, PDK1
PDZ binding domains: PDZ class 1, PDZ class 2, PDZ (nNOS) class 1, PDZ (nNOS) class 3
Phosphotyrosine binding domains: Shc
Lipid binding domains: PIP3-binding PH

References

Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Wheeler,D.L. (2003) GenBank. Nucleic Acids Res., 31, 23–27.

Stoesser,G., Baker,W., van den Broek,A., Garcia-Pastor,M., Kanz,C., Kulikova,T., Leinonen,R., Lin,Q., Lombard,V., Lopez,R. et al. (2003) The EMBL Nucleotide Sequence Database: major new developments. Nucleic Acids Res., 31, 17–22.

Miyazaki,S., Sugawara,H., Gojobori,T. and Tateno,Y. (2003) DNA Data Bank of Japan (DDBJ) in XML. Nucleic Acids Res., 31, 13–16.

Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.

Clamp,M., Andrews,D., Barker,D., Bevan,P., Cameron,G., Chen,Y., Clark,L., Cox,T., Cuff,J., Curwen,V. et al. (2003) Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res., 31, 38–42.

Bader,G.D., Betel,D. and Hogue,C.W.V. (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res., 31, 248–250.

Xenarios,I., Fernandez,E., Salwinski,L., Duan,X.J., Thompson,M.J., Marcotte,E.M. and Eisenberg,D. (2001) DIP: The Database of Interacting Proteins: 2001 update. Nucleic Acids Res., 29, 239–241.

Zanzoni,A., Montecchi-Palazzi,L., Quondam,M., Ausiello,G., Helmer-Citterich,M. and Cesareni,G. (2002) MINT: a Molecular INTeraction database. FEBS Lett., 513, 135–140.

Songyang,Z., Shoelson,S.E., Chaudhuri,M., Gish,G., Pawson,T., Haser,W.G., King,F., Roberts,T., Ratnofsky,S. and Lechleider,R.J. (1993) SH2 domains recognize specific phosphopeptide sequences. Cell, 72, 767–778.

Songyang,Z., Blechner,S., Hoagland,N., Hoekstra,M.F., Piwnica-Worms,H. and Cantley,L.C. (1994) Use of an oriented peptide library to determine the optimal substrates of protein kinases. Curr. Biol., 4, 973–982.

Yaffe,M.B., Rittinger,K., Volinia,S., Caron,P.R., Aitken,A., Leffers,H., Gamblin,S.J., Smerdon,S.J. and Cantley,L.C. (1997) The structural basis for 14-3-3:phosphopeptide binding specificity. Cell, 91, 961–971.

Songyang,Z. and Cantley,L.C. (1998) The use of peptide library for the determination of kinase peptide substrates. Methods Mol. Biol., 87, 87–98.

Yaffe,M.B. and Cantley,L.C. (2000) Mapping specificity determinants for protein-protein association using protein fusions and random peptide libraries. Methods Enzymol., 328, 157–170.

Kay,B.K., Winter,J. and McCafferty,J. (1996) Phage Display of Peptides and Proteins: a Laboratory Manual. Academic Press, San Diego, CA.

Yaffe,M.B., Leparc,G.G., Lai,J., Obata,T., Volinia,S. and Cantley,L.C. (2001) A motif-based profile scanning approach for genome-wide prediction of signaling pathways. Nat. Biotechnol., 19, 348–353.

Bateman,A., Birney,E., Cerruti,L., Durbin,R., Etwiller,L., Eddy,S.R., Griffiths-Jones,S., Howe,K.L., Marshall,M. and Sonnhammer,E.L.L. (2002) The Pfam Protein Families Database. Nucleic Acids Res., 30, 276–280.

Letunic,I., Goodstadt,L., Dickens,N.J., Doerks,T., Schultz,J., Mott,R., Ciccarelli,F., Copley,R.R., Ponting,C.P. and Bork,P. (2002) Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res., 30, 242–244.

Obenauer,J.C. and Yaffe,M.B. (2003) Computational prediction of protein–protein interactions. In Fu,H. (ed.), Protein–Protein Interactions: Methods and Protocols. Humana Press, Totowa, NJ, in press.

Emini,E.A., Hughes,J.V., Perlow,D.S. and Boger,J. (1985) Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J. Virol., 55, 836–839.
