Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes

Plant Cell. 2004 Jul;16(7):1667-78.

doi: 10.1105/tpc.021345. Epub 2004 Jun 18.

Guillaume Blanc et al. Plant Cell. 2004 Jul.

Abstract

It is often anticipated that many of today's diploid plant species are in fact paleopolyploids. Given that an ancient large-scale duplication will result in an excess of relatively old duplicated genes with similar ages, we analyzed the timing of duplication of pairs of paralogous genes in 14 model plant species. Using EST contigs (unigenes), we identified pairs of paralogous genes in each species and used the level of synonymous nucleotide substitution to estimate the relative ages of gene duplication. For nine of the investigated species (wheat [Triticum aestivum], maize [Zea mays], tetraploid cotton [Gossypium hirsutum], diploid cotton [G. arboreum], tomato [Lycopersicon esculentum], potato [Solanum tuberosum], soybean [Glycine max], barrel medic [Medicago truncatula], and Arabidopsis thaliana), the age distributions of duplicated genes contain peaks corresponding to short evolutionary periods during which large numbers of duplicated genes accumulated. Large-scale duplications (polyploidy or aneuploidy) are strongly suspected to be the cause of these temporal peaks of gene duplication. However, the unusual age profile of tandem gene duplications in Arabidopsis indicates that other scenarios, such as variation in the rate at which duplicated genes are deleted, must also be considered.
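The idea of reading duplication bursts off an age distribution can be sketched in a few lines. The following is only an illustration, not the procedure used in the paper: the function name `secondary_peaks`, the strict local-maximum rule, and the two example distributions are all assumptions made up for this sketch, and a real analysis would need a statistical test rather than a simple neighbor comparison.

```python
def secondary_peaks(fractions, skip_initial=1):
    """Return indices of bins that are strict local maxima.

    `fractions` is a binned age (Ks) distribution, youngest bin first.
    The first `skip_initial` bins are ignored because the youngest bin is
    expected to be high even without any large-scale duplication.
    (Hypothetical helper; the rule used here is an assumption.)
    """
    peaks = []
    for i in range(max(1, skip_initial), len(fractions) - 1):
        if fractions[i] > fractions[i - 1] and fractions[i] > fractions[i + 1]:
            peaks.append(i)
    return peaks


# Invented example values, for illustration only.
background = [0.30, 0.15, 0.08, 0.05, 0.04, 0.03, 0.03, 0.02]
with_event = [0.25, 0.12, 0.07, 0.05, 0.09, 0.14, 0.08, 0.03]

print(secondary_peaks(background))  # []
print(secondary_peaks(with_event))  # [5]
```

A steadily decaying distribution yields no secondary peak, whereas an excess of similarly aged duplicates shows up as a bin that rises above both of its neighbors.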

Figures

Figure 1.

Theoretical Age Distributions of Pairs of Duplicated Genes in a Genome. (A) Age distribution of pairs of paralogs expected under constant rates of gene duplication and duplicated gene deletion. Three main phases can be outlined. An initial peak accounts for the most recently duplicated genes. The distribution then decreases exponentially because of the deletion of duplicated genes that are not under selective constraints. A long tail corresponds to older pairs of duplicates, where both genes evolve under selective constraints. (B) Age distribution of pairs of paralogs expected for a species that sustained two ancient large-scale duplication events. The overrepresentation of duplicated genes at periods corresponding to the large-scale duplication events gives rise to two secondary peaks.
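The two theoretical shapes described in this caption can be mimicked with a toy model. This is a sketch under stated assumptions, not the authors' model: the function `expected_density`, the Gaussian shape of the duplication bursts, and every numeric value (loss rate, tail height, burst positions and widths) are invented purely to reproduce the qualitative picture.

```python
import math


def expected_density(age, loss_rate=8.0, tail=0.05, events=()):
    """Relative density of surviving duplicate pairs at a given age.

    Background: exponential decay (loss of unconstrained duplicates)
    plus a constant tail (old pairs retained under selection).
    Each event is (center, width, height) and adds a Gaussian bump.
    All parameter values are assumptions chosen for illustration.
    """
    density = math.exp(-loss_rate * age) + tail
    for center, width, height in events:
        density += height * math.exp(-0.5 * ((age - center) / width) ** 2)
    return density


ages = [i * 0.05 for i in range(41)]  # ages from 0 to 2.0 in arbitrary units

# Panel (A): constant duplication and deletion rates, no large-scale event.
panel_a = [expected_density(a) for a in ages]

# Panel (B): two hypothetical large-scale duplication events.
two_events = ((0.6, 0.08, 0.4), (1.4, 0.1, 0.3))
panel_b = [expected_density(a, events=two_events) for a in ages]
```

In this sketch `panel_a` falls monotonically from the initial peak into the tail, while `panel_b` shows two secondary peaks at the ages chosen for the hypothetical events.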

Figure 2.

Distributions of the Fraction of Duplication Events as a Function of Their Levels of Synonymous Substitution for 14 Model Plant Species. Data were grouped into bins of 0.05 Ks units for graphing, except for wheat (C), where bins represent intervals of 0.03 Ks units. For G. hirsutum and G. arboreum (B), the Ks distributions of duplication events are shown in blue and red, respectively. For potato and tomato (H) and for soybean and M. truncatula (I), the Ks distribution of orthologs compared between the species is plotted (green line) as well as the paralog distributions within each species (blue and red lines). For rice and Arabidopsis ([J] and [K]), the distributions of all pairs of duplicated genes and pairs of tandem duplicates are shown in blue and green, respectively. The distribution of Arabidopsis duplicate genes resulting from the most recent polyploidy event (Blanc et al., 2003) is shown in red in (K).
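The binning described in this caption (0.05 Ks units, or 0.03 for wheat) can be sketched as follows. The function name `ks_fractions`, the `max_ks` cutoff, and the small floating-point tolerance are assumptions of this sketch; the caption does not state an upper Ks limit or how bin edges were handled.

```python
import math


def ks_fractions(ks_values, bin_width=0.05, max_ks=2.0):
    """Group Ks values into fixed-width bins and return per-bin fractions.

    Values outside [0, max_ks) are dropped. `max_ks` is an assumed cutoff,
    not a value taken from the caption. The 1e-9 term guards against
    floating-point error when a value sits on a bin edge.
    """
    n_bins = math.ceil(max_ks / bin_width - 1e-9)
    counts = [0] * n_bins
    kept = 0
    for ks in ks_values:
        if 0 <= ks < max_ks:
            idx = min(int(ks / bin_width + 1e-9), n_bins - 1)
            counts[idx] += 1
            kept += 1
    if kept == 0:
        return [0.0] * n_bins
    return [c / kept for c in counts]


# Invented Ks values, for illustration only.
print(ks_fractions([0.01, 0.02, 0.07, 0.12], bin_width=0.05, max_ks=0.2))
# [0.5, 0.25, 0.25, 0.0]
```

Passing `bin_width=0.03` gives the narrower bins that the caption reports for wheat.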

Figure 3.

Frequency Distributions of Ks Values Obtained from Pairs of Duplicated Genes Identified in Unigene Data (Blue Line) and Complete Genome Sequence Data (Red Line) Are Essentially Identical. (A) Distributions of Ks values for Arabidopsis. (B) Distributions of Ks values for rice.

Figure 4.

Suspected Large-Scale Duplication Events Presented in Phylogenetic Context. The phylogenetic tree represents the currently accepted phylogeny of the plant species analyzed here (Soltis et al., 1999). Branch lengths are not to scale. The red ovals represent suspected large-scale duplication events (polyploidy or aneuploidy) recovered in this study. White ovals correspond to suspected polyploidy or aneuploidy events inferred in previous publications (see text for details) but not supported by our analysis. Estimated dates of duplication and speciation events are given in millions of years when available (from present or previous publications; see text). The question mark for the oldest large-scale duplication event of M. truncatula and soybean indicates that the phylogenetic position and the number of independent large-scale duplication events are still unclear.

References

    1. Adams, M.D., et al. (1991). Complementary DNA sequencing: Expressed sequence tags and human genome project. Science 252, 1651–1656.
    2. Arabidopsis Genome Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815.
    3. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
    4. Aoki, N., Whitfeld, P., Hoeren, F., Scofield, G., Newell, K., Patrick, J., Offler, C., Clarke, B., Rahman, S., and Furbank, R.T. (2002). Three sucrose transporter genes are expressed in the developing grain of hexaploid wheat. Plant Mol. Biol. 50, 453–462.
    5. Bennett, M.D. (1998). Plant genome values: How much do we know? Proc. Natl. Acad. Sci. USA 95, 2011–2016.
