Detection of homologous proteins by an intermediate sequence search

Comparative Study

Bino John et al. Protein Sci. 2004 Jan.

Abstract

We developed a variant of the intermediate sequence search method (ISS(new)) for detection and alignment of weakly similar pairs of protein sequences. ISS(new) relates two query sequences by an intermediate sequence that is potentially homologous to both queries. The improvement was achieved by a more robust overlap score for a match between the queries through an intermediate. The approach was benchmarked on a data set of 2369 sequences of known structure with insignificant sequence similarity to each other (BLAST E-value larger than 0.001); 2050 of these sequences had a related structure in the set. ISS(new) performed significantly better than both PSI-BLAST and a previously described intermediate sequence search method. PSI-BLAST could not detect correct homologs for 1619 of the 2369 sequences. In contrast, ISS(new) assigned a correct homolog as the top hit for 121 of these 1619 sequences, while incorrectly assigning homologs for only nine targets; it did not assign homologs for the remainder of the sequences. By estimate, ISS(new) may be able to assign the folds of domains in approximately 29,000 of the approximately 500,000 sequences unassigned by PSI-BLAST, with 90% specificity (1 - false positives fraction). In addition, we show that the 15 alignments with the most significant BLAST E-values include the nearly best alignments constructed by ISS(new).
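The benchmark counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below only re-derives ratios from the numbers stated above; the variable names are illustrative, and the ratio of correct to assigned targets on the benchmark is not necessarily the same quantity as the 90% specificity quoted for the genome-scale estimate.

```python
# Counts taken directly from the abstract; names are illustrative only.
missed_by_psiblast = 1619    # sequences with no correct homolog from PSI-BLAST
iss_correct_top_hit = 121    # of those, correct homolog assigned as top hit
iss_incorrect = 9            # of those, incorrectly assigned homologs

assigned = iss_correct_top_hit + iss_incorrect        # targets given any homolog
unassigned = missed_by_psiblast - assigned            # targets left without a homolog
correct_among_assigned = iss_correct_top_hit / assigned
recovered_fraction = iss_correct_top_hit / missed_by_psiblast

# Genome-scale estimate quoted in the abstract (approximate figures).
estimated_assignable = 29_000 / 500_000

print(assigned, unassigned)
print(round(correct_among_assigned, 3))
print(round(recovered_fraction, 3))
print(round(estimated_assignable, 3))
```

On these numbers, 130 of the 1619 targets receive an assignment and 1489 are left unassigned, which matches the statement that no homolog was assigned for the remainder.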


Figures

Figure 1.


An alignment of the target sequence with an intermediate and a putative homolog. The dotted and the dashed lines represent the intermediate aligned to the putative homolog and the target, respectively. The positions of the starting residues in the alignment of the intermediate with the putative homolog and the target are denoted by i_ts and i_qs, respectively. The positions of the ending residues are denoted by i_te and i_qe. The number of residues in the common overlap region of the intermediate is indicated by C.
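The common overlap described in this legend can be sketched as an interval intersection on the intermediate sequence. This is a minimal illustration, assuming 1-based inclusive residue positions; the `Span` type and `common_overlap` function are hypothetical names, and the sketch computes only the overlap length C, not the overlap score used by the method.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Aligned region on the intermediate (assumed 1-based, inclusive)."""
    start: int
    end: int


def common_overlap(to_homolog: Span, to_target: Span) -> int:
    """Number of intermediate residues covered by both alignments."""
    lo = max(to_homolog.start, to_target.start)  # later of the two starts
    hi = min(to_homolog.end, to_target.end)      # earlier of the two ends
    return max(0, hi - lo + 1)                   # 0 when the regions do not meet


# Intermediate aligned to the putative homolog over 10..120
# and to the target over 50..200: residues 50..120 are shared.
print(common_overlap(Span(10, 120), Span(50, 200)))
```

If the two alignments cover disjoint parts of the intermediate, the function returns 0, which corresponds to an intermediate that offers no shared region to relate the two sequences.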

Figure 2.


Accuracy of ISSnew, ISSold, and PSI-BLAST on SEQS-EASY. The accuracy is described by the ROC curves (Materials and Methods) for PSI-BLAST (dashed line), ISSold (dotted line), and ISSnew (solid line).
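For readers unfamiliar with ROC curves, the sketch below shows one common way to build such a curve from a ranked list of hits. It is a generic construction, not the procedure from the paper's Materials and Methods (which is not reproduced on this page); the function name and the `(score, is_correct)` input format are assumptions, and ties in score are not treated specially.

```python
def roc_points(hits):
    """Return cumulative (false_positives, true_positives) counts.

    hits: iterable of (score, is_correct) pairs, where a higher score
    means a more confident hit. For E-values, pass the negated value.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    points = [(0, 0)]
    fp = tp = 0
    for _score, is_correct in ranked:
        if is_correct:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))  # one point per hit, walking down the ranking
    return points


example = [(0.9, True), (0.8, False), (0.7, True)]
print(roc_points(example))
```

A method whose curve rises faster (more true positives for the same number of false positives) ranks correct homologs ahead of incorrect ones more consistently.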

Figure 3.


Accuracies of ISSnew, ISSold, and PSI-BLAST at different target-sequence lengths. (See the Figure 2 legend for a description of the different symbols used.) The sequence lengths are less than 100 residues (A), between 100 and 200 residues (B), and greater than 200 residues (C).

Figure 4.


Average alignment accuracy as a function of the thresholds on the overlap length (_x_-axis) and the ISSnew score (M). Error bars indicate the standard error of the mean; they are so small that they are almost invisible. Alignment accuracy is measured by the Cα RMSD between the compared structures (A) and coverage (B).
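As an illustration of the RMSD measure named in this legend, the sketch below computes the root-mean-square deviation between two equal-length lists of Cα coordinates. It assumes the two structures have already been superposed and that the coordinates are paired by alignment position; the superposition step and the paper's definition of coverage are not described on this page and are not modeled here. The function name is hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between paired (x, y, z) coordinates of pre-superposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two atoms, each displaced by exactly 1 unit along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(ca_rmsd(a, b))
```

Lower values indicate that the aligned residues occupy more similar positions in the two structures.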

Figure 5.


Average alignment accuracy as a function of the _E_-value. The _E_-values were calculated using the BLOSUM62 amino acid substitution matrix. The alignments of the pairs in SEQS-HARD are obtained by ISSnew. (See Figure 4 for details.)

Figure 6.


Average alignment accuracy of the top five alignments selected by _E_-value as a function of the thresholds on the overlap length (_x_-axis) and the ISSnew score (M). The _E_-values were calculated using the BLOSUM62 amino acid substitution matrix. (See Figure 4 for details.)

Figure 7.


Average alignment accuracy of the best alignments in the selected set of alignments. Accuracy of the alignments selected by _E_-value using BC0030, BLOSUM62, and OPTIMA residue-type substitution matrices. Alignment accuracy is measured by the Cα RMSD of the structures (A) and coverage (B).


