Comparative Study

Comparative analysis of protein domain organization

Yuzhen Ye et al. Genome Res. 2004 Mar.

Abstract

We have developed a set of graph theory-based tools, which we call Comparative Analysis of Protein Domain Organization (CADO), to survey and compare the protein domain organizations of different organisms. In CADO, the organization of protein domains in a given organism is represented as a domain graph in which protein domains are vertices and domain combinations, defined as instances of two domains found in one protein, are edges. CADO provides a new way to analyze and compare whole proteomes, including identifying the features of domain organization that organisms share and those in which they differ. CADO was used to analyze and compare more than 50 bacterial, archaeal, and eukaryotic genomes. Examples and overviews presented here include an analysis of the modularity of domain graphs and a functional study of domains based on graph topology. We also report the results of comparing the domain graphs of two organisms, Pyrococcus horikoshii (an extremophile) and Haemophilus influenzae (a parasite with a reduced genome), with those of other organisms. Our comparison provides new insights into the genome organization of these organisms. Finally, we report the specific domain combinations characterizing the three kingdoms of life, and the kingdom "signature" domain organizations derived from those specific domain combinations.
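The domain graph definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CADO's implementation: each protein is given as a list of domain names (a hypothetical input format), every pair of distinct domains within one protein counts as a combination, repeated copies of a domain are ignored, and an edge's weight is taken to be the number of proteins containing that pair (the edge weights in the figures are not defined in this summary).

```python
from collections import Counter
from itertools import combinations


def build_domain_graph(proteins):
    """Build a weighted domain graph from per-protein domain lists.

    proteins: dict mapping a protein ID to its list of domain names
              (hypothetical input format).
    Returns (vertices, edge_weights): the set of domains, and a Counter that maps
    each sorted domain pair to the number of proteins containing both domains
    (assumed weight definition).
    """
    vertices = set()
    edge_weights = Counter()
    for domains in proteins.values():
        # Repeated copies of a domain in one protein are counted once (assumption).
        unique = sorted(set(domains))
        vertices.update(unique)
        # Every pair of distinct domains found in one protein is a domain combination.
        for pair in combinations(unique, 2):
            edge_weights[pair] += 1
    return vertices, edge_weights
```

With two hypothetical proteins containing domains a, b, c and c, d, the sketch yields four vertices and four edges, the same counts as the schematic in Figure 8A (the proteins drawn in that figure may differ).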


Figures

Figure 1

A comparison of the number of ORFs, domains, and domain combinations, and of the size of the giant component of the domain graph, across genomes (see Supplemental Table A for details). Two lines divide the genomes into archaeal, bacterial, and eukaryotic groups; within each kingdom, the genomes are ranked by number of ORFs. The last six eukaryotic genomes, which have many more ORFs than the others, are omitted from the graph for clarity.
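The giant component in Figure 1 is the largest connected component of a domain graph. The sketch below finds it with a breadth-first search; it assumes the graph is given as a vertex set plus a list of domain pairs, and measuring the component's size as its number of domains (vertices) is an assumption, since the caption does not say whether size is counted in domains or combinations.

```python
from collections import defaultdict, deque


def giant_component(vertices, edges):
    """Return the set of vertices in the largest connected component.

    vertices: iterable of domain names (isolated domains form one-vertex components).
    edges: iterable of (u, v) domain pairs.
    Ties between equally large components are broken by search order.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    largest = set()
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects every domain reachable from `start`.
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(largest):
            largest = component
    return largest
```

The size plotted for each genome would then be `len(giant_component(vertices, edges))` under the vertex-count assumption.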

Figure 2

Correlation between the functional homogeneity of domain clusters and cluster size in Saccharomyces cerevisiae. For comparison, the functional homogeneity index (FHI) of domain combinations within clusters and that of domain combinations between clusters are also shown. See text for details.

Figure 3

Clustering of the domains in the giant component of the Pyrococcus horikoshii domain graph, based on topological overlap. The graph was drawn with TreeView (Page 1996).
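The caption does not give the topological overlap formula used. The sketch below assumes a common unweighted definition, TO(i, j) = (|N(i) ∩ N(j)| + a_ij) / (min(k_i, k_j) + 1 − a_ij), where N(i) is the neighbor set of domain i, k_i its degree, and a_ij is 1 if i and j form a combination. Two domains score high when they share most of their partners. Using 1 − TO as a distance for hierarchical clustering, as drawn in a TreeView dendrogram, is likewise an assumption here, and the clustering step itself is not shown.

```python
def topological_overlap(adjacency):
    """Topological overlap for every ordered pair of vertices in an unweighted graph.

    adjacency: dict mapping each domain to the set of its neighbors; it must be
    symmetric and contain no self-loops.
    Assumed formula: TO(i, j) = (|N(i) & N(j)| + a_ij) / (min(k_i, k_j) + 1 - a_ij).
    """
    nodes = sorted(adjacency)
    overlap = {}
    for i in nodes:
        for j in nodes:
            if i == j:
                overlap[(i, j)] = 1.0
                continue
            a_ij = 1 if j in adjacency[i] else 0
            shared = len(adjacency[i] & adjacency[j])  # partners common to i and j
            k_min = min(len(adjacency[i]), len(adjacency[j]))
            # The denominator is at least 1: if a_ij == 1, both degrees are >= 1.
            overlap[(i, j)] = (shared + a_ij) / (k_min + 1 - a_ij)
    return overlap
```

For a triangle a-b-c with an extra domain d attached to c, TO(a, b) is 1.0 while TO(a, d) is 0.5, because a and d share only the partner c.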

Figure 4

Comparison of the domain graph of Methanopyrus kandleri AV19 (mka) with that of Pyrococcus horikoshii (pho). Only the largest component of their “combined” domain graph is shown. Common and organism-specific domains and domain combinations are colored as follows: common in red, mka-specific in yellow, and pho-specific in green. Edges of weight 1 are drawn as dashed lines; for all other edges, the weight is shown along the edge. See text for details.

Figure 5

Comparison of the domain graph of Haemophilus influenzae (hin) with that of Escherichia coli K12 (eco). Only the largest component of their “combined” domain graph is shown. Common and organism-specific domains and domain combinations are colored as follows: common in red, hin-specific in yellow, and eco-specific in green. Edges of weight 1 are drawn as dashed lines, all others as solid lines. See text for details.
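The red/yellow/green coloring in Figures 4 and 5 amounts to set operations on the two domain graphs, whose union is the combined graph. The sketch below assumes each graph is given as a set of domains plus a set of combinations stored as sorted domain pairs; the function name and output keys are hypothetical. Extracting only the largest component of the combined graph, as the figures do, would be a separate connected-component step.

```python
def compare_domain_graphs(graph_a, graph_b):
    """Classify domains and combinations of two domain graphs as common or specific.

    graph_a, graph_b: (vertices, edges) tuples, where vertices is a set of domain
    names and edges is a set of sorted (domain, domain) tuples.
    """
    vertices_a, edges_a = graph_a
    vertices_b, edges_b = graph_b
    return {
        "common_domains": vertices_a & vertices_b,
        "a_specific_domains": vertices_a - vertices_b,
        "b_specific_domains": vertices_b - vertices_a,
        "common_combinations": edges_a & edges_b,
        "a_specific_combinations": edges_a - edges_b,
        "b_specific_combinations": edges_b - edges_a,
    }
```

Note that a combination can be specific to one organism even when both of its domains are common to the two, which is why domains and combinations are classified separately.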

Figure 6

Kingdom-specific domains and combinations in the domain graph. Archaeal-specific domains and combinations are shown in yellow, bacterial-specific in blue, and eukaryotic-specific in green. Domains and combinations common to all genomes are shown in red. The remaining domains and combinations are shown in gray.
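A minimal sketch of how combinations could be sorted into the red (common to all genomes) and kingdom-specific categories of Figure 6. The specificity rule is an assumption, not taken from the paper: a combination is treated here as specific to a kingdom if it occurs in at least one genome of that kingdom and in no genome of another kingdom. The input dictionaries and kingdom labels are hypothetical; the same logic applies to domains.

```python
def classify_by_kingdom(genome_edges, genome_kingdom):
    """Find combinations present in every genome and those confined to one kingdom.

    genome_edges: dict mapping a genome name to its set of domain combinations
                  (sorted domain-pair tuples).
    genome_kingdom: dict mapping a genome name to a kingdom label,
                    e.g. "archaea", "bacteria", or "eukaryota" (hypothetical labels).
    Returns (universal, specific), where specific maps kingdom -> set of combinations.
    """
    all_sets = list(genome_edges.values())
    universal = set.intersection(*all_sets) if all_sets else set()

    # Record which kingdoms each combination occurs in.
    kingdoms_of = {}
    for genome, edges in genome_edges.items():
        for edge in edges:
            kingdoms_of.setdefault(edge, set()).add(genome_kingdom[genome])

    # Assumed rule: specific = seen in exactly one kingdom.
    specific = {}
    for edge, kingdoms in kingdoms_of.items():
        if len(kingdoms) == 1:
            (kingdom,) = kingdoms
            specific.setdefault(kingdom, set()).add(edge)
    return universal, specific
```

A stricter rule, for example requiring a combination to appear in most genomes of its kingdom, would be a one-line change to the final loop.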

Figure 7

The largest eukaryotic “signature” domain organization. The graph was drawn manually for clarity based on a graphviz layout. See text for details.
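The caption ranks signature domain organizations by size, but this summary does not define them. As a hedged sketch, the code below assumes that a "signature" organization is a connected component of the subgraph formed by one kingdom's specific combinations, and groups those combinations with a union-find structure so the largest group comes first. The function name and input format (a set of sorted domain pairs) are hypothetical.

```python
def signature_organizations(specific_edges):
    """Group kingdom-specific combinations into connected domain sets, largest first.

    specific_edges: set of (domain, domain) pairs specific to one kingdom.
    Assumption: each connected component of these edges is one "signature" organization.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Union the two domains of every specific combination.
    for u, v in specific_edges:
        parent[find(u)] = find(v)

    # Collect the domains that share a root into one component.
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)
```

Under this assumption, the organization in Figure 7 would correspond to the first element returned for the eukaryotic-specific combinations.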

Figure 8

(A) A schematic illustration of domain graph construction. Four domains (a, b, c, and d) are present in two proteins, yielding a domain graph with four vertices and four edges. Each vertex represents the domain drawn in the same color. (B) The Comparative Analysis of Protein Domain Organization (CADO) process. Two domain graphs are shown in blue and pink (top), and their common organization is shown in brown (bottom).

References

    1. Aasland, R., Gibson, T.J., and Stewart, A.F. 1995. The PHD finger: Implications for chromatin-mediated transcriptional regulation. Trends Biochem. Sci. 20: 56–59.
    1. Anantharaman, V., Koonin, E.V., and Aravind, L. 2001. TRAM, a predicted RNA-binding domain, common to tRNA uracil methylation and adenine thiolation enzymes. FEMS Microbiol. Lett. 197: 215–221.
    1. Apic, G., Gough, J., and Teichmann, S.A. 2001. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J. Mol. Biol. 310: 311–325.
    1. Aravind, L. and Koonin, E.V. 1998. The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends Biochem. Sci. 23: 469–472.
    1. Aravind, L. and Koonin, E.V. 2000. The U box is a modified RING finger—A common domain in ubiquitination. Curr. Biol. 10: R132–R134.

WEB SITE REFERENCES

    1. ftp://ftp.ncbi.nih.gov/; NCBI GenBank.
    1. http://ffas.ljcrf.edu/DomainGraph; CADO.
    1. http://genome.jgi-psf.org/ciona4/ciona4.download.ftp.html; Ciona intestinalis.
    1. http://genome.jgi-psf.org/fugu6/fugu6.download.ftp.html; Fugu rubripes sequence.
    1. http://www.geneontology.org/; GO.