CHOP proteins into structural domain-like fragments
Proteins. 2004 May 15;55(3):678-88.
doi: 10.1002/prot.20095.
- PMID: 15103630
Jinfeng Liu et al. Proteins. 2004.
Abstract
We developed CHOP, a method that dissects proteins into domain-like fragments. The basic idea was to cut proteins beginning from very reliable experimental information (PDB), proceeding to expert annotations of domain-like regions (Pfam-A), and completing through cuts based on termini of known proteins. In this way, CHOP dissected more than two thirds of all proteins from 62 proteomes. Analysis of our structural domain-like fragments revealed four surprising results. First, >70% of all dissected proteins contained more than one fragment. Second, most domains spanned, on average, approximately 100 residues. This average was similar for eukaryotic and prokaryotic proteins, and it is also valid (although previously not described) for all proteins in the PDB. Third, single-domain proteins were significantly longer than most domains in multidomain proteins. Fourth, three fourths of all domains appeared shorter than 210 residues. We believe that our CHOP fragments constitute an important resource for functional and structural genomics. Nevertheless, our main motivation to develop CHOP was that the single-linkage clustering method failed to adequately group full-length proteins. In contrast, CLUP, the simple clustering scheme introduced here, largely succeeded in grouping the CHOP fragments from 62 proteomes such that all members of one cluster shared a basic structural core. CLUP found >63,000 multi- and >118,000 single-member clusters. Although most fragments were restricted to a particular cluster, approximately 24% of the fragments were duplicated in at least two clusters. Our thresholds for grouping two fragments into the same cluster were rather conservative. Nevertheless, our results suggested that structural genomics initiatives have to target >30,000 fragments to at least cover the multimember clusters in 62 proteomes.
Copyright 2004 Wiley-Liss, Inc.
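The abstract describes a hierarchy of evidence: cuts from PDB first, then Pfam-A, then termini of known proteins. The following is a minimal illustrative sketch of that prioritization only, not the authors' implementation. The names `Region`, `overlaps`, and `chop_sketch`, the rule that a lower-priority region is accepted only if it overlaps no region already accepted, and the example coordinates are all assumptions made for illustration; the abstract does not specify how conflicts between sources are resolved.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Region(NamedTuple):
    """A candidate domain-like fragment (hypothetical structure)."""
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    source: str  # evidence that proposed the region


def overlaps(a: Region, b: Region) -> bool:
    """True if the two inclusive residue ranges share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def chop_sketch(
    candidates_by_source: Sequence[Tuple[str, Sequence[Tuple[int, int]]]],
) -> List[Region]:
    """Accept candidate regions source by source, most reliable first.

    Assumption (not from the abstract): a region from a less reliable
    source is kept only if it does not overlap any region already kept.
    """
    accepted: List[Region] = []
    for source, spans in candidates_by_source:
        for start, end in sorted(spans):
            region = Region(start, end, source)
            if not any(overlaps(region, kept) for kept in accepted):
                accepted.append(region)
    # Report fragments in sequence order.
    return sorted(accepted)


# Made-up candidates for a hypothetical 400-residue protein,
# listed in decreasing order of reliability.
candidates = [
    ("PDB", [(1, 120)]),
    ("Pfam-A", [(100, 210), (230, 340)]),
    ("termini", [(121, 229), (341, 400)]),
]
fragments = chop_sketch(candidates)
for fragment in fragments:
    print(fragment.start, fragment.end, fragment.source)
```

In this toy input, the Pfam-A region 100-210 is rejected because it overlaps the PDB-derived region 1-120, while the lower-priority candidates fill the residues left uncovered.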