SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent

Norman E Davey et al. Nucleic Acids Res. 2006.

Abstract

Many important interactions of proteins are facilitated by short, linear motifs (SLiMs) within a protein's primary sequence. Our aim was to establish robust methods for discovering putative functional motifs. The strongest evidence for such motifs is obtained when the same motifs occur in unrelated proteins, evolving by convergence. In practice, searches for such motifs are often swamped by motifs shared in related proteins that are identical by descent. Predictions of motifs among sets of biologically related proteins, including those both with and without detectable similarity, were made using the TEIRESIAS algorithm. The number of motif occurrences arising through common evolutionary descent was normalized based on treatment of BLAST local alignments. Motifs were ranked according to a score derived from the product of the normalized number of occurrences and the information content. The method was shown to significantly outperform methods that do not discount evolutionary relatedness, when applied to known SLiMs from a subset of the eukaryotic linear motif (ELM) database. An implementation of Multiple Spanning Tree weighting outperformed two other weighting schemes, in a variety of settings.
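The ranking score described in the abstract (normalized number of occurrences multiplied by information content) can be illustrated with a minimal sketch. The abstract does not give the exact information content formula, so the version below is an assumption: each motif position contributes -log2 of the summed background frequency of its allowed residues. The uniform background and the function names `information_content` and `motif_score` are hypothetical and chosen only for illustration.

```python
import math

# Assumed background: uniform frequencies over the 20 standard amino acids.
# A real analysis would use frequencies measured from the dataset.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def information_content(positions, background=UNIFORM_BACKGROUND):
    """Sum -log2(p) over motif positions (an assumed formula).

    `positions` is a list of strings, each listing the residues allowed
    at that position, e.g. ["K", "D", "E", "L"] or ["KR", "DE", "E", "L"].
    """
    total = 0.0
    for allowed in positions:
        p = sum(background[aa] for aa in set(allowed))
        total += -math.log2(p)
    return total


def motif_score(normalized_support, positions, background=UNIFORM_BACKGROUND):
    """Product of normalized support and information content."""
    return normalized_support * information_content(positions, background)


# A fully fixed four-residue motif carries more information than one
# with ambiguous positions, so it scores higher at equal support.
fixed = motif_score(2.0, ["K", "D", "E", "L"])
ambiguous = motif_score(2.0, ["KR", "DE", "E", "L"])
```

Under this assumed formula, each ambiguous position that allows two residues loses exactly one bit relative to a fixed position when the background is uniform.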

Figures

Figure 1

Simplified graphical representation of the SLiMDisc method. The steps completed by SLiMDisc are in green, those which occur outside the program are in red. The input dataset is given to the TEIRESIAS algorithm for pattern discovery and the BLAST algorithm to establish the evolutionary relationships of the parent proteins. The returned motifs are then filtered according to a number of user defined criteria. Finally, the motifs are ranked using information content (based on amino acid frequencies) and evolutionary relatedness.
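The filtering and ranking stages in this caption can be sketched as follows. This is not the SLiMDisc implementation: the `Motif` record, the two filter criteria (`min_support`, `max_length`) and all values in the usage example are hypothetical stand-ins for the "user defined criteria" the caption mentions, and the score is taken as already computed.

```python
from dataclasses import dataclass


@dataclass
class Motif:
    pattern: str      # motif as returned by pattern discovery
    support: float    # normalized number of occurrences
    score: float      # precomputed ranking score


def filter_and_rank(motifs, min_support=2.0, max_length=10, top_n=None):
    """Drop motifs failing the (hypothetical) criteria, then sort by score."""
    kept = [
        m for m in motifs
        if m.support >= min_support and len(m.pattern) <= max_length
    ]
    ranked = sorted(kept, key=lambda m: m.score, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Illustrative values only.
candidates = [
    Motif("KDEL", 3.0, 40.0),
    Motif("A.A", 1.0, 50.0),   # high score but too little support
    Motif("L..L", 2.0, 10.0),
]
ranked = filter_and_rank(candidates)
```

Filtering before ranking means a motif with a high score but insufficient support never reaches the ranked list.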

Figure 2

Graphical representation of UHS and UP normalization techniques. Four proteins, labelled 1–4, are shown with annotated domains marked as coloured regions. Regions of homology as detected by BLAST are shown as grey boxes linking the sequences. Sequences 1 and 2 share a large homologous (orange) domain. Sequences 2 and 3 also share a homologous region but this is not annotated as a domain. Three other domains are specific to proteins 1 (green), 3 (blue) and 4 (purple). All motifs a–f have three occurrences in the dataset but have different support (shown in the table on the right) after filtering.

a. Motif a occurs in a shared region between 1 and 2, which is reduced by UHS to a single occurrence. The third occurrence, in sequence 3, is not in a region homologous to 1 or 2 and is treated as a separate occurrence by UHS. However, proteins 2 and 3 share a homologous region and so UP will cluster sequences 1, 2 and 3, reducing the number of occurrences to 1. Filtering domains reduces the support to 1 in either case.
b. Motif b occurs in a shared region between 2 and 3, which is reduced by both UHS and UP to a single occurrence. This time, the third occurrence lies in the totally unrelated protein 4 and is counted with either filter. Filtering domains removes the occurrence in 4, reducing the support to 1.
c. Motif c lies purely within a repeated domain in protein 3. This is reduced to a single occurrence by both UHS and UP (the protein is homologous with itself). Although whole-protein self-hits are ignored by UHS, the additional local BLAST hits between different domains (shown in grey) will still cause motif c to be filtered by UHS. Domain filtering removes it completely.
d. Motif d is the same as motif b, except that none of the occurrences lie in domains and so domain filtering makes no difference.
e. Motif e lies in non-homologous regions of proteins 1 and 4. UHS therefore keeps all three occurrences. Whole-protein self-hits are ignored during the UHS filtering, and so both occurrences of motif e in protein 4 are counted. In contrast, UP clusters sequence 4 with itself and reduces the support to 2. No occurrences lie in domains and so domain filtering makes no difference.
f. Motif f is found in proteins 1, 3 and 4. None of these regions are homologous and so UHS gives a support of 3. UP, however, will group proteins 1 and 3; even though they do not directly share homology, they both share homology with common protein 2. UP therefore reduces the support to 2.
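The UP behaviour described in this caption, where proteins linked directly or indirectly by BLAST hits collapse into one cluster, can be sketched as a connected-components computation. The union-find approach and the names `cluster_proteins` and `up_support` are assumptions made for illustration, not the SLiMDisc code. UHS is not sketched here because it needs the positions of the homologous regions, which the caption does not give.

```python
def cluster_proteins(proteins, blast_hits):
    """Map each protein to a cluster representative.

    Proteins joined by a chain of pairwise BLAST hits end up in the same
    cluster (single linkage), implemented here with union-find.
    """
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in blast_hits:
        parent[find(a)] = find(b)

    return {p: find(p) for p in proteins}


def up_support(occurrence_proteins, cluster_of):
    """Count distinct clusters that contain at least one occurrence."""
    return len({cluster_of[p] for p in occurrence_proteins})


# The four proteins of the figure: 1-2 and 2-3 share homology, 4 is unrelated.
cluster_of = cluster_proteins([1, 2, 3, 4], [(1, 2), (2, 3)])

support_a = up_support([1, 2, 3], cluster_of)  # motif a: one cluster
support_e = up_support([1, 4, 4], cluster_of)  # motif e: two occurrences in protein 4
support_f = up_support([1, 3, 4], cluster_of)  # motif f: 1 and 3 linked via 2
```

With these inputs the sketch reproduces the UP supports stated in the caption: 1 for motif a, and 2 for motifs e and f.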

Figure 3

Scattergram of information content versus score for the motifs returned from the KDEL dataset (see Table 2). Each blue point on the scattergram is a motif that has been considered by SLiMDisc. The points in green are the top three motifs ranked by the method. The actual SLiM for this dataset is the motif described by the regular expression [KRHQSAP][DENQT]EL.
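The regular expression in this caption can be applied directly with Python's `re` module. The helper name `find_motif` and the three sequences below are made up for illustration and are not real proteins from the dataset.

```python
import re

# Regular expression for the KDEL-type SLiM, as given in the caption.
KDEL_PATTERN = re.compile(r"[KRHQSAP][DENQT]EL")


def find_motif(sequence, pattern=KDEL_PATTERN):
    """Return (start, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical sequences, for illustration only.
hits_kdel = find_motif("MKAAGSKDEL")  # matches KDEL at position 6
hits_hdel = find_motif("MSTHDELQ")    # matches the HDEL variant at position 3
hits_none = find_motif("MKKDAL")      # no match
```

The pattern as written matches anywhere in a sequence. Restricting it to the C-terminus by appending `$` would be an additional assumption, motivated by the C-terminal location of this signal, and is not part of the expression in the caption.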
