Widespread genome duplications throughout the history of flowering plants

doi: 10.1101/gr.4825606. Epub 2006 May 15.

P Kerr Wall, James H Leebens-Mack, Bruce G Lindsay, Douglas E Soltis, Jeff J Doyle, Pamela S Soltis, John E Carlson, Kathiravetpilla Arumuganathan, Abdelali Barakat, Victor A Albert, Hong Ma, Claude W dePamphilis



Liying Cui et al. Genome Res. 2006 Jun.

Abstract

Genomic comparisons provide evidence for ancient genome-wide duplications in a diverse array of animals and plants. We developed a birth-death model to identify evidence for genome duplication in EST data, and applied a mixture model to estimate the age distribution of paralogous pairs identified in EST sets for species representing the basal-most extant flowering plant lineages. We found evidence for episodes of ancient genome-wide duplications in the basal angiosperm lineages including Nuphar advena (yellow water lily: Nymphaeaceae) and the magnoliids Persea americana (avocado: Lauraceae), Liriodendron tulipifera (tulip poplar: Magnoliaceae), and Saruma henryi (Aristolochiaceae). In addition, we detected independent genome duplications in the basal eudicot Eschscholzia californica (California poppy: Papaveraceae) and the basal monocot Acorus americanus (Acoraceae), both of which were distinct from duplications documented for ancestral grass (Poaceae) and core eudicot lineages. Among gymnosperms, we found equivocal evidence for ancient polyploidy in Welwitschia mirabilis (Gnetales) and no evidence for polyploidy in pine, although gymnosperms generally have much larger genomes than the angiosperms investigated. Cross-species sequence divergence estimates suggest that synonymous substitution rates in the basal angiosperms are less than half those previously reported for core eudicots and members of Poaceae. These lower substitution rates permit inference of older duplication events. We hypothesize that evidence of an ancient duplication observed in the Nuphar data may represent a genome duplication in the common ancestor of all or most extant angiosperms, except Amborella.


Figures

Figure 1.


Effect of gene death rate and time of genome duplication on the _K_s distribution for paralogs. A single genome duplication was simulated, where time since duplication (corresponding to _K_s = 0.5 in A, D, and G; 1.0 in B, E, and H; or 1.5 in C, F, and I) is indicated by a star. The death rate of duplicate pairs (δ) increases from the top row to the bottom row (δ = 0.67 for A, B, and C, as estimated from Arabidopsis data; 1.34 for D, E, and F; and 2.68 for G, H, and I). In each graph, the observed frequency of paralogs from background gene duplication is plotted with a dashed line, while the distribution deriving from genome duplication is plotted with a dotted line. The _K_s distribution of all paralogs is drawn with a solid line.
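The qualitative behavior described in this caption can be reproduced with a small simulation. The sketch below is not the authors' code. It assumes that, under constant birth and death rates, the ages of surviving background pairs are exponentially distributed with rate δ, that a pair created by a genome duplication at _K_s = t survives with probability exp(-δ·t), and that _K_s estimates for duplication-derived pairs scatter log-normally around t. The function name `simulate_ks` and the parameters `noise_sigma` and `seed` are hypothetical.

```python
import math
import random

def simulate_ks(n_background, n_wgd_pairs, delta, wgd_ks, noise_sigma=0.2, seed=0):
    """Return (background, wgd) lists of simulated Ks values (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: constant birth rate plus constant death rate delta gives
    # exponentially distributed ages among the surviving background pairs.
    background = [rng.expovariate(delta) for _ in range(n_background)]
    # Assumption: each pair born at the duplication survives to the present
    # with probability exp(-delta * wgd_ks).
    p_survive = math.exp(-delta * wgd_ks)
    wgd = []
    for _ in range(n_wgd_pairs):
        if rng.random() < p_survive:
            # Log-normal scatter keeps Ks positive; its median is wgd_ks.
            wgd.append(rng.lognormvariate(math.log(wgd_ks), noise_sigma))
    return background, wgd

# Compare the lowest and highest death rates from the caption at Ks = 1.0.
bg_low, wgd_low = simulate_ks(5000, 2000, delta=0.67, wgd_ks=1.0)
bg_high, wgd_high = simulate_ks(5000, 2000, delta=2.68, wgd_ks=1.0)
print(len(wgd_low), len(wgd_high))
```

Under these assumptions, a higher death rate or an older duplication leaves fewer surviving duplication-derived pairs, which is why the duplication peak becomes harder to see toward the bottom-right panels.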

Figure 2.


_K_s distribution from a sample of Arabidopsis unigenes and the diagnostic test according to the constant birth–death model (null model). (A) _K_s estimates from four methods show strong agreement. (ML) Maximum likelihood method by Goldman and Yang; (NG) Nei-Gojobori method; (mNG) modified Nei-Gojobori method; (YN) Yang and Nielsen method. (B) _K_s distributions for paralogs from four replicate unigene samples of 6000 sequences each. These sample sizes are comparable to the unigenes available for the species sequenced in this study. (C) The density plot of observed _K_s distribution and simulated data based on the null model with parameter δ = 0.67. (D) The Q-Q plot of observed versus expected _K_s values shows the poor fit of the null hypothesis that gene birth and death rates are constant (P ≪ 0.0001).
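A diagnostic in the spirit of panel D can be sketched as follows. This is an illustration, not the authors' procedure: it assumes the null model implies an exponential _K_s distribution with rate δ, and it uses a Kolmogorov-Smirnov distance as a stand-in for the fit test, since this excerpt does not say how the reported P value was computed. The names `exp_quantile`, `qq_points`, and `ks_statistic` are hypothetical.

```python
import math
import random

def exp_quantile(p, delta):
    """Quantile function of an exponential distribution with rate delta."""
    return -math.log(1.0 - p) / delta

def qq_points(sample, delta):
    """Pair each sorted observation with its expected quantile under the null."""
    xs = sorted(sample)
    n = len(xs)
    return [(exp_quantile((i + 0.5) / n, delta), x) for i, x in enumerate(xs)]

def ks_statistic(sample, delta):
    """Largest gap between the empirical CDF and the exponential null CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-delta * x)
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d

rng = random.Random(1)
delta = 0.67  # death rate quoted in the caption
null_sample = [rng.expovariate(delta) for _ in range(2000)]
# Synthetic "duplication" sample: 30% of pairs concentrated near Ks = 0.8.
bump_sample = null_sample[:1400] + [
    rng.lognormvariate(math.log(0.8), 0.2) for _ in range(600)
]
print(ks_statistic(null_sample, delta), ks_statistic(bump_sample, delta))
```

A sample drawn from the null stays close to the diagonal of the Q-Q plot and gives a small distance, while a sample with a concentrated burst of pairs departs from it visibly.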

Figure 3.


_K_s distributions of paralogs in selected angiosperm species, with fitted densities from mixture model analysis, suggest paleopolyploidy in eudicots and monocots. Each fitted line indicates a subpopulation in the mixture. The first (leftmost) component corresponds to paralogs from background gene duplications; other peaks indicate estimated median _K_s for ancient duplications. (A) Glycine max (soybean). (B, C) Solanum lycopersicum (tomato), data from floral tissue (B) and nonfloral tissue (C). (D) A basal eudicot, Eschscholzia californica (California poppy). (E) A basal monocot, Acorus americanus.
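The idea of reading a median _K_s off each mixture component can be illustrated with a minimal expectation-maximization fit. This excerpt does not state which component family or software the authors used, so the sketch assumes normal components on log(_K_s) and a simple quantile-based initialization. The function `fit_log_mixture` and its parameters are hypothetical, and the data are synthetic.

```python
import math
import random

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_log_mixture(ks_values, k=2, n_iter=100):
    """EM for a k-component normal mixture on log(Ks).

    Returns a list of (median Ks, weight) per component, sorted by median.
    """
    logs = sorted(math.log(v) for v in ks_values if v > 0)
    n = len(logs)
    chunk = n // k
    weights = [1.0 / k] * k
    mus, sds = [], []
    # Initialize each component from one contiguous slice of the sorted data.
    for j in range(k):
        part = logs[j * chunk:(j + 1) * chunk] if j < k - 1 else logs[j * chunk:]
        m = sum(part) / len(part)
        mus.append(m)
        sds.append(max(1e-3, math.sqrt(sum((x - m) ** 2 for x in part) / len(part))))
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in logs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weight, mean, and standard deviation per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, logs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, logs)) / nj
            sds[j] = max(1e-3, math.sqrt(var))
    # The median of a log-normal component is exp(mean of log Ks).
    return sorted((math.exp(m), w) for m, w in zip(mus, weights))

rng = random.Random(7)
# Synthetic data: a background component near Ks = 0.1 and an older peak near 1.0.
data = [rng.lognormvariate(math.log(0.1), 0.4) for _ in range(600)]
data += [rng.lognormvariate(math.log(1.0), 0.4) for _ in range(400)]
components = fit_log_mixture(data, k=2)
for median_ks, weight in components:
    print(round(median_ks, 2), round(weight, 2))
```

In this toy setup the leftmost fitted component plays the role of background duplications and the second marks the older concentrated duplication. Real EST-derived data would also need a principled choice of the number of components, which this sketch leaves to the caller.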

Figure 4.


_K_s distributions of paralogs and orthologs among magnoliids suggest independent duplications and possibly shared genome duplication events in Laurales (Persea) and Magnoliales (Liriodendron). (A, B, C) The _K_s distributions for (A) Liriodendron, (B) Persea, and (C) Saruma, with fitted lines based on the mixture model analysis. (D) The _K_s distribution for Liriodendron and Persea, without scaling for rate differences between lineages. (E) _K_s distribution for paralogs in Liriodendron after rate calibration (adj = adjusted), compared with that of Persea, suggesting recent independent duplication and older shared genome-scale duplications. (F) _K_s distribution for orthologs of two magnoliid species. (Ltu) Liriodendron; (Pam) Persea; (She) Saruma. (G) Phylogeny of one representative orthologous gene set used for relative rate estimates. The branch lengths show the estimated relative rates of synonymous evolution in respective species.

Figure 5.


_K_s distributions suggest possible genome duplications in basal angiosperms, and no evidence for genome duplication events in Amborella and some gymnosperm species. (A) _K_s distribution in Amborella, a basal-most angiosperm. No significant large-scale duplication is detected. (B) Three distinct components in the _K_s distribution for Nuphar, also a basal-most angiosperm, suggest at least two large-scale genome duplications. (C) _K_s distribution for putative orthologs between Amborella and Nuphar. (D) Pinus taeda (loblolly pine) paralogous pairs follow the null model (see Methods). (E) _K_s distribution for paralogs in a gymnosperm, Welwitschia.

Figure 6.


Phylogenetic summary of paleopolyploidy events estimated by the mixture model approach and their distribution among angiosperm and gymnosperm lineages. In the scaled graph at center, Xs mark the median _K_s of pairs from background gene duplications, while small ovals indicate the median _K_s of possible concentrated duplications in the history of particular lineages. The phylogenetic tree at left shows the likely placement of detected genome-scale duplications. Uncertainty in phylogenetic timing of what may be a single duplication event at the base of the angiosperms is indicated with a wide oval that covers possible branching points compatible with the _K_s evidence. Hollow ovals indicate duplications identified in previous studies using paralogous genes or genomic data from those lineages.


