Relating tissue specialization to the differentiation of expression of singleton and duplicate mouse proteins

Shiri Freilich et al. Genome Biol. 2006.

Abstract

Background: Gene duplications have been hypothesized to be a major factor in enabling the evolution of tissue differentiation. Analyses of the expression profiles of duplicate genes in mammalian tissues have indicated that, with time, the expression patterns of duplicate genes diverge and become more tissue specific. We explored the relationship between duplication events, the time at which they took place, and both the expression breadth of the duplicated genes and the cumulative expression breadth of the gene family to which they belong.

Results: We show that only duplicates that arose through post-multicellularity duplication events show a tendency to become more specifically expressed, whereas such a tendency is not observed for duplicates that arose in a unicellular ancestor. Unlike the narrow expression profile of the duplicated genes, the overall expression of gene families tends to maintain a global expression pattern.

Conclusion: The work presented here supports the view suggested by the subfunctionalization model, namely that expression divergence in different tissues, following gene duplication, promotes the retention of a gene in the genome of multicellular species. The global expression profile of the gene families suggests division of expression between family members, whose expression becomes specialized. Because specialization of expression is coupled with an increased rate of sequence divergence, it can facilitate the evolution of new, tissue-specific functions.

Figures

Figure 1

A schematic illustration of concepts described in the text. (a) Proteins illustrating different aspects of phyletic age and time of duplication in the mouse proteome, with the transition from unicellularity to multicellularity as the calibration point. 'A' represents the appearance of a protein in the mouse proteome, and 'D' represents a duplication event leading to the retention of both copies in the mouse proteome. The appearance of a novel protein refers to events in which a protein contains a novel combination of domains, or in which its sequence has changed beyond recognition by traditional sequence search algorithms; in either case there is a high likelihood that the protein performs a new function. Pre-metazoan mouse proteins are proteins descended from a protein present in the unicellular ancestor of mouse; metazoan-specific proteins are proteins unique to the multicellular lineage of metazoa. Because all duplications of metazoan-specific proteins necessarily took place after the transition to multicellularity, proteins from this group are not classified by time of duplication (preMD/postMD). (b) Building a cumulative expression profile for protein families. The cumulative expression profile of each family was built by recording all tissues in which at least one family member is expressed. Singleton proteins are, by definition, single-member families, so their cumulative profile is identical to the protein's own profile. Family A is an example of complementary expression with no expression overlap between the duplicate proteins; family B is an example of identical expression; and family C is an example of complementary expression with partial expression overlap. The protein cartoons used in this figure are only illustrative. postMD, post-multicellularity duplicates; preMD, pre-multicellularity duplicates.
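
The cumulative profile in panel (b) is a set union over family members. The minimal Python sketch below assumes that each protein's expression is available as a set of tissue names; the dictionary layout, function names and tissue labels are hypothetical and are not the authors' code. It computes a family's cumulative profile and labels a duplicate pair with the three overlap patterns illustrated by families A, B and C.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical layout: protein name -> tissues in which it is expressed.
Family = Dict[str, FrozenSet[str]]


def cumulative_profile(family: Family) -> Set[str]:
    """All tissues in which at least one family member is expressed."""
    tissues: Set[str] = set()
    for member_tissues in family.values():
        tissues |= member_tissues
    return tissues


def overlap_pattern(a: FrozenSet[str], b: FrozenSet[str]) -> str:
    """Label a duplicate pair after families A, B and C of Figure 1b."""
    if a == b:
        return "identical"                      # family B
    if not (a & b):
        return "complementary, no overlap"      # family A
    return "complementary, partial overlap"     # family C (assumed to include subsets)


# Toy, hypothetical two-member family.
family_a = {"p1": frozenset({"liver", "kidney"}), "p2": frozenset({"brain"})}
print(sorted(cumulative_profile(family_a)))  # ['brain', 'kidney', 'liver']
print(overlap_pattern(family_a["p1"], family_a["p2"]))  # complementary, no overlap
```

For a singleton family the union reduces to the protein's own tissue set, matching the caption. Treating a pair in which one copy's tissues are a strict subset of the other's as "partial overlap" is an assumption, since the caption does not address that case.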

Figure 2

Expression breadth versus the number of duplicate pairs. Red dots indicate singleton proteins, and black dots indicate duplicate proteins. The tissues tested are the 13 cluster-representing tissues. (a) The size of the dots represents the number of proteins that have the same number of duplicate pairs and the same expression breadth. The blue dots represent the average expression breadth of proteins with the same number of duplicate pairs. Sample size = 2731 proteins; Kendall's tau = -0.20; P ≤ 2.2 × 10⁻¹⁶; 95% confidence interval = -0.22 to -0.17. (b) Proteins are ordered according to their number of duplicate pairs and collected into bins of at least 100 proteins. Each point represents a bin. Error bars indicate the standard deviation from the mean, obtained by bootstrapping.
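
Kendall's tau measures rank concordance between the number of duplicate pairs and expression breadth. Because both variables are small integers with many ties, a tie-corrected variant (tau-b) is a natural choice; the caption does not state which variant was used, so treat that as an assumption. The sketch below is a plain O(n²) Python implementation run on toy, hypothetical values; it does not reproduce the P value or confidence interval reported above. In practice, `scipy.stats.kendalltau` returns tau-b by default.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b: rank concordance corrected for ties in either variable.

    Pairwise O(n^2) version; adequate for a few thousand proteins.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    s = 0         # concordant minus discordant pairs
    untied_x = 0  # pairs that differ in x
    untied_y = 0  # pairs that differ in y
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sx = (x1 > x2) - (x1 < x2)  # sign of the x difference
        sy = (y1 > y2) - (y1 < y2)  # sign of the y difference
        s += sx * sy                # +1 concordant, -1 discordant, 0 tied
        untied_x += sx != 0
        untied_y += sy != 0
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when either variable is constant")
    return s / math.sqrt(untied_x * untied_y)


# Toy, hypothetical data: duplicate pairs per protein vs. number of tissues.
dup_pairs = [0, 0, 1, 2, 3, 5]
breadth = [10, 12, 9, 7, 7, 3]
print(f"{kendall_tau_b(dup_pairs, breadth):.3f}")  # -0.929
```

A negative value, as in the caption, means that proteins with more duplicate pairs tend to be expressed in fewer tissues.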

Figure 3

Expression breadth versus the number of duplicate pairs in the preMD, postMD and metazoan-specific subsets. Red dots indicate singleton proteins, and black dots indicate duplicate proteins. The tissues tested are the 13 cluster-representing tissues. (a,c,e) The size of the dots represents the number of proteins that have the same number of duplicate pairs and the same expression breadth. The blue dots represent the average expression breadth of proteins with the same number of duplicate pairs. preMD subset (panel a): sample size = 291 proteins, Kendall's tau = 0.001, P = 0.51, 95% confidence interval = -0.10 to +0.10; postMD subset (panel c): sample size = 431 proteins, Kendall's tau = -0.15, P = 1.2 × 10⁻⁶, 95% confidence interval = -0.23 to -0.08; metazoan-specific subset (panel e): sample size = 1060 proteins, Kendall's tau = -0.28, P = 9.7 × 10⁻⁴³, 95% confidence interval = -0.33 to -0.24. (b,d,f) Proteins are ordered according to their number of duplicate pairs and collected into bins of at least 10 proteins. Each point represents a bin. Error bars indicate the standard deviation from the mean, obtained by bootstrapping. postMD, post-multicellularity duplicates; preMD, pre-multicellularity duplicates.
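
The binned panels (b, d, f here, and panel b of Figure 2) order proteins by duplicate count, group them into bins of a minimum size and attach a bootstrap error bar to each bin's mean expression breadth. The caption does not say how ties at bin boundaries or a short final bin are handled, so the Python sketch below assumes that proteins with the same duplicate count are never split across bins and that a leftover bin is merged into its predecessor. The function names, the number of resamples and the seed are hypothetical, illustrative choices.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Record = Tuple[int, int]  # (number of duplicate pairs, expression breadth)


def bin_by_duplicates(records: Sequence[Record], min_size: int) -> List[List[Record]]:
    """Order records by duplicate count and group them into bins of >= min_size.

    Assumptions: a bin closes only when it is large enough AND the next record
    has a different duplicate count (ties stay together); a short trailing
    bin is merged into the previous bin.
    """
    ordered = sorted(records, key=lambda r: r[0])
    bins: List[List[Record]] = []
    current: List[Record] = []
    for i, rec in enumerate(ordered):
        current.append(rec)
        boundary = i == len(ordered) - 1 or ordered[i + 1][0] != rec[0]
        if len(current) >= min_size and boundary:
            bins.append(current)
            current = []
    if current:  # leftover records that never reached min_size
        if bins:
            bins[-1].extend(current)
        else:
            bins.append(current)
    return bins


def bootstrap_sd_of_mean(values: Sequence[float], n_boot: int = 1000, seed: int = 0) -> float:
    """Standard deviation of the mean across resamples drawn with replacement."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(values, k=len(values))) for _ in range(n_boot)]
    return statistics.stdev(means)


# Toy, hypothetical (duplicate pairs, breadth) records.
records = [(0, 13), (0, 12), (0, 11), (1, 9), (1, 8), (2, 6), (3, 5), (3, 2)]
for b in bin_by_duplicates(records, min_size=3):
    breadths = [e for _, e in b]
    print(len(b), statistics.fmean(breadths), round(bootstrap_sd_of_mean(breadths), 2))
```

Keeping tied duplicate counts together is one reading of "at least N proteins": bins can exceed the minimum rather than splitting proteins that would sit at the same x value.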

Figure 4

Average cumulative expression coverage in bins of protein families, ordered by family size. Protein families for which expression information is available for at least one member (black) are grouped into bins of at least 35 proteins (total number of families = 1249). Protein families for which expression information is available for at least 75% of the family members (green) are grouped into bins of at least 10 proteins (total number of families = 189). Each point represents a bin. Values on the x-axis give the size of families with any expression information (black dots); the size of families with at least 75% expression information (green dots) is the value shown above each green dot.
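
Figure 4 combines three steps: filtering families by how many members have expression data, computing each family's cumulative coverage, and averaging coverage in bins of families ordered by size. The Python sketch below is one possible reading under stated assumptions, not the authors' procedure. Coverage is taken as the fraction of `n_tissues` tested tissues in the family's cumulative profile; family size is the total number of members; bins are counted in families (the caption states bin sizes in proteins); and families of equal size are kept in the same bin, with a short trailing bin merged into its predecessor. All names are hypothetical.

```python
import statistics
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical layout: member name -> tissues where it is expressed,
# or None when no expression information exists for that member.
Family = Dict[str, Optional[FrozenSet[str]]]


def annotated_fraction(family: Family) -> float:
    """Fraction of family members with any expression information."""
    return sum(t is not None for t in family.values()) / len(family)


def coverage(family: Family, n_tissues: int) -> float:
    """Fraction of the tested tissues in the family's cumulative profile."""
    covered = set()
    for tissues in family.values():
        if tissues is not None:
            covered |= tissues
    return len(covered) / n_tissues


def binned_coverage(
    families: Sequence[Family], n_tissues: int, min_bin: int, min_annotated: float = 0.0
) -> List[Tuple[float, float]]:
    """(mean family size, mean coverage) for bins of families ordered by size."""
    kept = sorted(
        (f for f in families
         if annotated_fraction(f) > 0 and annotated_fraction(f) >= min_annotated),
        key=len,
    )
    bins: List[List[Family]] = []
    current: List[Family] = []
    for i, fam in enumerate(kept):
        current.append(fam)
        boundary = i == len(kept) - 1 or len(kept[i + 1]) != len(fam)
        if len(current) >= min_bin and boundary:
            bins.append(current)
            current = []
    if current:  # merge a short trailing bin into the previous one
        if bins:
            bins[-1].extend(current)
        else:
            bins.append(current)
    return [
        (statistics.fmean([len(f) for f in b]),
         statistics.fmean([coverage(f, n_tissues) for f in b]))
        for b in bins
    ]


# Toy, hypothetical families over 4 tissues.
fams = [
    {"a": frozenset({"t1"})},
    {"b": frozenset({"t2"})},
    {"c": frozenset({"t1"}), "d": frozenset({"t2", "t3"})},
    {"e": frozenset({"t4"}), "f": None},
    {"g": None},
]
print(binned_coverage(fams, n_tissues=4, min_bin=2))  # [(1.0, 0.25), (2.0, 0.5)]
print(binned_coverage(fams, n_tissues=4, min_bin=2, min_annotated=0.75))
```

Passing `min_annotated=0.75` mimics the green series (at least 75% of members with expression data); the default mimics the black series (at least one member).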
