Non-globular domains in protein sequences: automated segmentation using complexity measures
J C Wootton. Comput Chem. 1994 Sep.
Abstract
Computational methods based on mathematically-defined measures of compositional complexity have been developed to distinguish globular and non-globular regions of protein sequences. Compact globular structures in protein molecules are shown to be determined by amino acid sequences of high informational complexity. Sequences of known crystal structure in the Brookhaven Protein Data Bank differ only slightly from randomly shuffled sequences in the distribution of statistical properties such as local compositional complexity. In contrast, in the much larger body of deduced sequences in the SWISS-PROT database, approximately one quarter of the residues occur in segments of non-randomly low complexity and approximately half of the entries contain at least one such segment. Sequences of proteins with known, physicochemically-defined non-globular regions have been analyzed, including collagens, different classes of coiled-coil proteins, elastins, histones, non-histone proteins, mucins, proteoglycan core proteins and proteins containing long single solvent-exposed alpha-helices. The SEG algorithm provides an effective general method for partitioning the globular and non-globular regions of these sequences fully automatically. This method is also facilitating the discovery of new classes of long, non-globular sequence segments, as illustrated by the example of the human CAN gene product involved in tumor induction.
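As a rough illustration of what a local compositional complexity measure can look like, the sketch below scores each fixed-length window of a sequence by the Shannon entropy of its residue composition and merges the low-scoring windows into segments. It is not the SEG algorithm described in the abstract, only a simplified stand-in. The function names, the window length of 12 and the 2.2-bit threshold are illustrative assumptions and are not taken from the abstract.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of one window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_segments(seq: str, window: int = 12, threshold: float = 2.2):
    """Return half-open (start, end) intervals covered by low-entropy windows.

    Hypothetical helper: `window` and `threshold` are illustrative
    assumptions, not values stated in the abstract.
    """
    flagged = [False] * len(seq)
    # Flag every residue that lies in at least one low-entropy window.
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) <= threshold:
            for j in range(i, i + window):
                flagged[j] = True

    # Merge runs of flagged residues into segments.
    segments = []
    start = None
    for i, is_low in enumerate(flagged):
        if is_low and start is None:
            start = i
        elif not is_low and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(seq)))
    return segments


if __name__ == "__main__":
    diverse = "ACDEFGHIKLMNPQRSTVWY"
    # A made-up test sequence: a poly-Q run flanked by compositionally diverse residues.
    example = diverse + "Q" * 20 + diverse
    print(low_complexity_segments(example))
```

A single fixed threshold keeps the sketch short. A fuller treatment would need a statistical basis for what counts as "non-randomly low" complexity, for example by comparison with shuffled sequences as the abstract describes.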