Comparative Study
Comparative analysis of protein domain organization
Yuzhen Ye et al. Genome Res. 2004 Mar.
Abstract
We have developed a set of graph theory-based tools, which we call Comparative Analysis of Protein Domain Organization (CADO), to survey and compare the protein domain organizations of different organisms. In the language of CADO, the organization of protein domains in a given organism is shown as a domain graph in which protein domains are represented as vertices, and domain combinations, defined as instances of two domains found in one protein, are represented as edges. CADO provides a new way to analyze and compare whole proteomes, including identifying the consensus and differences in domain organization between organisms. CADO was used to analyze and compare >50 bacterial, archaeal, and eukaryotic genomes. Examples and overviews presented here include the analysis of the modularity of domain graphs and the functional study of domains based on the graph topology. We also report the results of comparing the domain graphs of two organisms, Pyrococcus horikoshii (an extremophile) and Haemophilus influenzae (a parasite with a reduced genome), with those of other organisms. Our comparison provides new insights into the genome organization of these organisms. Finally, we report the specific domain combinations characterizing the three kingdoms of life, and the kingdom "signature" domain organizations derived from those specific domain combinations.
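The domain-graph definition above can be sketched in a few lines. This is a minimal illustration, not CADO's own code: following the abstract, every pair of distinct domains found in one protein becomes an edge, but the function name `build_domain_graph`, the input format (protein ID mapped to a list of domain IDs), ignoring repeats of a domain within one protein, and using the number of proteins as the edge weight are all assumptions.

```python
from itertools import combinations


def build_domain_graph(proteins):
    """Build a domain graph from {protein ID: [domain IDs in that protein]}.

    Vertices are domains. Two distinct domains are joined by an edge (a
    domain combination) if they occur together in at least one protein.
    Assumptions: repeats of the same domain within a protein are ignored,
    and an edge's weight is the number of proteins containing that pair.
    """
    vertices = set()
    edges = {}  # frozenset({domain1, domain2}) -> number of proteins
    for domains in proteins.values():
        distinct = sorted(set(domains))
        vertices.update(distinct)
        # Every unordered pair of distinct domains in one protein is a combination.
        for d1, d2 in combinations(distinct, 2):
            key = frozenset((d1, d2))
            edges[key] = edges.get(key, 0) + 1
    return vertices, edges


# Hypothetical proteins: p1 carries domains a, b, c; p2 carries c and d.
vertices, edges = build_domain_graph({"p1": ["a", "b", "c"], "p2": ["c", "d"]})
print(len(vertices), len(edges))  # 4 4
```

These two hypothetical proteins yield four vertices and four edges (a-b, a-c, b-c, c-d), the same counts as the schematic in Figure 8(A), although the proteins drawn in that figure may differ.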
Figures
Figure 1
A comparison of the number of ORFs, domains, domain combinations, and size of the giant component in domain graphs across genomes (see Supplemental Table A for details). The genomes are separated into archaeal, bacterial, and eukaryotic groups by two lines; within each kingdom, the genomes are ranked by number of ORFs. The last six eukaryotic genomes have many more ORFs than the others and are omitted from the graph for clarity.
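The quantities plotted in Figure 1 follow directly from a domain graph: the number of domains is the vertex count, the number of domain combinations is the edge count, and the giant component is the largest connected component. The sketch below is a minimal illustration rather than CADO's code; measuring the giant component in domains (vertices), the function name `giant_component_size`, and the input format are assumptions.

```python
from collections import defaultdict, deque


def giant_component_size(vertices, edges):
    """Number of domains in the largest connected component of a domain graph.

    `vertices` is a set of domain IDs; `edges` is an iterable of two-domain
    collections (tuples or frozensets). Isolated domains form components of size 1.
    """
    adjacency = defaultdict(set)
    for edge in edges:
        d1, d2 = tuple(edge)
        adjacency[d1].add(d2)
        adjacency[d2].add(d1)

    seen = set()
    largest = 0
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search over one connected component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        largest = max(largest, size)
    return largest


# Hypothetical genome: six domains, three combinations.
domains = {"a", "b", "c", "d", "e", "f"}
pairs = [("a", "b"), ("b", "c"), ("d", "e")]
print(len(domains), len(pairs), giant_component_size(domains, pairs))  # 6 3 3
```

Domains that never combine with another domain (here `f`) still count toward the domain total but form components of size 1.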
Figure 2
The correlation of functional homogeneity of the domain clusters with the cluster size in Saccharomyces cerevisiae. The functional homogeneity index (FHI) of domain combinations within the clusters and the FHI of domain combinations between the clusters are also shown for comparison. See text for details.
Figure 3
The clustering of domains from the giant component of the Pyrococcus horikoshii domain graph, based on topological overlap. The graph was drawn with TreeView (Page 1996).
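The caption does not define the overlap measure. This sketch assumes the topological overlap commonly used for hierarchical clustering of biological networks, in which two vertices overlap strongly when they share most of their neighbours; the function name `topological_overlap` and the adjacency-set input are hypothetical. A distance such as 1 - O(i, j) could then be passed to a hierarchical clustering routine to obtain a tree like the one drawn with TreeView; that clustering step is not shown.

```python
from itertools import combinations


def topological_overlap(adjacency):
    """Topological overlap O(i, j) for every unordered pair of vertices.

    `adjacency` maps each vertex to the set of its neighbours (no self-loops).
    Assumed definition: O(i, j) = (shared neighbours + a_ij) / min(k_i, k_j),
    where a_ij is 1 if i and j are directly linked and k is the degree.
    Values lie in [0, 1]; pairs involving an isolated vertex get 0.
    """
    overlap = {}
    for i, j in combinations(sorted(adjacency), 2):
        shared = len(adjacency[i] & adjacency[j])
        a_ij = 1 if j in adjacency[i] else 0
        k_min = min(len(adjacency[i]), len(adjacency[j]))
        overlap[(i, j)] = (shared + a_ij) / k_min if k_min else 0.0
    return overlap


# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = topological_overlap(path)
print(scores[("a", "c")], scores[("b", "c")], scores[("a", "d")])  # 1.0 0.5 0.0
```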
Figure 4
The comparison between the domain graph of Methanopyrus kandleri AV19 (mka) and that of Pyrococcus horikoshii (pho). Only the largest component of their “combined” domain graph is shown, with common and specific domains and domain combinations shown in different colors: common in red, mka-specific in yellow, and pho-specific in green. Edges of weight 1 are shown as dashed lines; for all other edges, the weight is shown next to the edge. See text for details.
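Figures 4 and 5 both color a combined graph by whether each domain and each combination occurs in one genome or in both. A minimal sketch of that labeling, assuming each domain graph is given as a set of domains plus a set of two-domain combinations; the function name and label strings are hypothetical, and edge weights are left out.

```python
def compare_domain_graphs(graph_x, graph_y):
    """Label each domain and each combination of a combined domain graph.

    Each graph is a (domains, combinations) pair: a set of domain IDs and a
    set of frozensets holding the two domains of each combination.
    Returns (domain_labels, combination_labels); every label is "common",
    "x-specific", or "y-specific".
    """
    (domains_x, combos_x), (domains_y, combos_y) = graph_x, graph_y

    def label(items_x, items_y):
        labels = {}
        for item in items_x | items_y:
            if item in items_x and item in items_y:
                labels[item] = "common"
            elif item in items_x:
                labels[item] = "x-specific"
            else:
                labels[item] = "y-specific"
        return labels

    return label(domains_x, domains_y), label(combos_x, combos_y)


# Hypothetical graphs: domain a occurs in both, but a-b combines only in x.
graph_x = ({"a", "b", "c"}, {frozenset(("a", "b")), frozenset(("b", "c"))})
graph_y = ({"a", "b", "c", "d"}, {frozenset(("b", "c")), frozenset(("c", "d"))})
domain_labels, combo_labels = compare_domain_graphs(graph_x, graph_y)
print(domain_labels["a"], combo_labels[frozenset(("a", "b"))])  # common x-specific
```

For the Figure 4 comparison, x would be mka and y pho. Because domains and combinations are labeled separately, a combination can be specific to one genome even when both of its domains are common; the combinations labeled "common" correspond to the shared organization sketched in Figure 8(B).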
Figure 5
The comparison between the domain graph of Haemophilus influenzae (hin) and that of Escherichia coli K12 (eco). Only the largest component of their “combined” domain graph is shown, with common and specific domains and domain combinations shown in different colors: common in red, hin-specific in yellow, and eco-specific in green. Edges of weight 1 are shown as dashed lines, all others as solid lines. See text for details.
Figure 6
The specific domains and combinations in the domain graph. Archaeal-specific domains and combinations are shown in yellow, bacterial-specific in blue, and eukaryotic-specific in green. The common domains and combinations in all genomes are shown in red. The remaining domains and combinations are shown in gray.
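Figure 6 separates combinations shared by all genomes from those confined to a single kingdom. The caption does not spell out the criteria, so this sketch assumes that "common" means present in every genome and "kingdom-specific" means present in at least one genome of that kingdom and in no genome of any other kingdom. The function name, the genome names, and the input layout are hypothetical; the same logic applies unchanged to domains.

```python
from collections import defaultdict


def classify_combinations(genomes):
    """Split domain combinations into universal and kingdom-specific sets.

    genomes: dict mapping genome name -> (kingdom, set of combinations),
    where each combination is a frozenset of two domain IDs.
    Returns (common, specific): `common` holds combinations found in every
    genome; `specific` maps each kingdom to combinations found only in
    genomes of that kingdom.
    """
    kingdoms_of = defaultdict(set)   # combination -> kingdoms it occurs in
    genome_count = defaultdict(int)  # combination -> number of genomes with it
    for kingdom, combos in genomes.values():
        for combo in combos:
            kingdoms_of[combo].add(kingdom)
            genome_count[combo] += 1
    common = {c for c, n in genome_count.items() if n == len(genomes)}
    specific = defaultdict(set)
    for combo, kingdoms in kingdoms_of.items():
        if len(kingdoms) == 1:
            specific[next(iter(kingdoms))].add(combo)
    return common, dict(specific)


# Hypothetical genomes, one per kingdom.
ab, cd, ef = frozenset(("a", "b")), frozenset(("c", "d")), frozenset(("e", "f"))
genomes = {
    "g1": ("Archaea", {ab, cd}),
    "g2": ("Bacteria", {ab}),
    "g3": ("Eukarya", {ab, ef}),
}
common, specific = classify_combinations(genomes)
print(len(common), sorted(specific))  # 1 ['Archaea', 'Eukarya']
```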
Figure 7
The largest eukaryotic “signature” domain organization. The graph was drawn manually for clarity based on a graphviz layout. See text for details.
Figure 8
(A) A schematic demonstration of the construction of a domain graph. Four domains (a, b, c, and d) are present in two given proteins; as a result, a domain graph with four vertices and four edges is formed. Each vertex has the same color as the domain it represents. (B) The process of Comparative Analysis of Protein Domain Organization (CADO). Two domain graphs are shown in blue and pink (top), and their common organization is shown in brown (bottom).
Cited by
- Wolbachia endosymbionts manipulate the self-renewal and differentiation of germline stem cells to reinforce fertility of their fruit fly host. Russell SL, Castillo JR, Sullivan WT. PLoS Biol. 2023 Oct 24;21(10):e3002335. doi: 10.1371/journal.pbio.3002335. eCollection 2023 Oct. PMID: 37874788.
- Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. J Mol Evol. 2023 Oct;91(5):598-615. doi: 10.1007/s00239-023-10129-w. Epub 2023 Aug 25. PMID: 37626222. Review.
- Genome-Wide Identification and Characterization of the Soybean Snf2 Gene Family and Expression Response to Rhizobia. Wang J, Sun Z, Liu H, Yue L, Wang F, Liu S, Su B, Liu B, Kong F, Fang C. Int J Mol Sci. 2023 Apr 14;24(8):7250. doi: 10.3390/ijms24087250. PMID: 37108411.
- A review of visualisations of protein fold networks and their relationship with sequence and function. Sykes J, Holland BR, Charleston MA. Biol Rev Camb Philos Soc. 2023 Feb;98(1):243-262. doi: 10.1111/brv.12905. Epub 2022 Oct 9. PMID: 36210328. Review.
- In silico structural and functional characterization of Antheraea mylitta cocoonase. Sneha S, Pandey DM. J Genet Eng Biotechnol. 2022 Jul 11;20(1):102. doi: 10.1186/s43141-022-00367-8. PMID: 35816268.
References
- Aasland, R., Gibson, T.J., and Stewart, A.F. 1995. The PHD finger: Implications for chromatin-mediated transcriptional regulation. Trends Biochem. Sci. 20: 56–59.
- Anantharaman, V., Koonin, E.V., and Aravind, L. 2001. TRAM, a predicted RNA-binding domain, common to tRNA uracil methylation and adenine thiolation enzymes. FEMS Microbiol. Lett. 197: 215–221.
- Apic, G., Gough, J., and Teichmann, S.A. 2001. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J. Mol. Biol. 310: 311–325.
- Aravind, L. and Koonin, E.V. 1998. The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends Biochem. Sci. 23: 469–472.
- Aravind, L. and Koonin, E.V. 2000. The U box is a modified RING finger—A common domain in ubiquitination. Curr. Biol. 10: R132–R134.
WEB SITE REFERENCES
- ftp://ftp.ncbi.nih.gov/; NCBI GenBank.
- http://genome.jgi-psf.org/ciona4/ciona4.download.ftp.html; Ciona intestinalis.
- http://genome.jgi-psf.org/fugu6/fugu6.download.ftp.html; Fugu rubripes sequence.