Long branch attraction, taxon sampling, and the earliest angiosperms: Amborella or monocots?

Sasa Stefanović et al. BMC Evol Biol. 2004.

Abstract

Background: Numerous studies, using in aggregate some 28 genes, have achieved a consensus in recognizing three groups of plants, including Amborella, as comprising the basal-most grade of all other angiosperms. A major exception is the recent study by Goremykin et al. (2003; Mol. Biol. Evol. 20:1499-1505), whose analyses of 61 genes from 13 sequenced chloroplast genomes of land plants nearly always found 100% support for monocots as the deepest angiosperms relative to Amborella, Calycanthus, and eudicots. We hypothesized that this conflict reflects a misrooting of angiosperms resulting from inadequate taxon sampling, inappropriate phylogenetic methodology, and rapid evolution in the grass lineage used to represent monocots.

Results: We used two main approaches to test this hypothesis. First, we sequenced a large number of chloroplast genes from the monocot Acorus and added these plus previously sequenced Acorus genes to the Goremykin et al. (2003) dataset in order to explore the effects of altered monocot sampling under the same analytical conditions used in their study. With Acorus alone representing monocots, strongly supported Amborella-sister trees were obtained in all maximum likelihood and parsimony analyses, and in some distance-based analyses. Trees with both Acorus and grasses gave either a well-supported Amborella-sister topology or else a highly unlikely topology with 100% support for grasses-sister and paraphyly of monocots (i.e., Acorus sister to "dicots" rather than to grasses). Second, we reanalyzed the Goremykin et al. (2003) dataset focusing on methods designed to account for rate heterogeneity. These analyses supported an Amborella-sister hypothesis, with bootstrap support values often conflicting strongly with cognate analyses performed without allowing for rate heterogeneity. In addition, we carried out a limited set of analyses that included the chloroplast genome of Nymphaea, whose position as a basal angiosperm was also, and very recently, challenged.

Conclusions: These analyses show that Amborella (or Amborella plus Nymphaea), but not monocots, is the sister group of all other angiosperms among this limited set of taxa and that the grasses-sister topology is a long-branch-attraction artifact leading to incorrect rooting of angiosperms. These results highlight the danger of having lots of characters but too few and, especially, molecularly divergent taxa, a situation long recognized as potentially producing strongly misleading molecular trees. They also emphasize the importance in phylogenetic analysis of using appropriate evolutionary models.

Figures

Figure 1

Current consensus hypothesis of angiosperm relationships. Tree topology is based on [42, 91] and references in Table 1. Small asterisks indicate the general phylogenetic position of the ten angiosperms (generic names shown for all but the three grasses) examined by Goremykin et al. [19]. The large asterisk indicates the addition in this study of the early-arising monocot Acorus to the Goremykin et al. [19] dataset. The height of the triangles reflects the relative number of species in eudicots (~175,000 species), monocots (~70,000), and magnoliids (~9,000) as estimated by Judd et al. [18] and Walter Judd (personal communication). The other five angiosperm groups shown contain only between 1 and ~100 species.

Figure 2

The effect of changing sampling of monocots as a function of phylogenetic method. Analysis of the 61-gene data matrix using: Rows A-C, DNA parsimony; D-F, protein parsimony; G-I, DNA ML HKY85 with no rate categories; J-L, RY-coded DNA parsimony. The first column of trees is with the Goremykin et al. [19] taxon sampling (grasses, but not Acorus), the second is with Acorus but not grasses, and the third is with both grasses and Acorus. All analyses used the first- and second-position matrix, either with or without the addition of Acorus as explained in Methods. Trees J-L use the same matrices, but with the nucleotides RY-coded.
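
For readers unfamiliar with RY-coding, the recoding step mentioned for trees J-L can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `ry_code` is hypothetical, and the sketch assumes sequences are plain strings in which gaps and ambiguity codes should pass through unchanged.

```python
# Purines (A, G) are recoded as R; pyrimidines (C, T, and U) as Y.
# Any other symbol (gaps, N, ambiguity codes) is left as-is: an assumption
# made for this sketch, not a statement about the original analysis.
RY_MAP = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_code(seq: str) -> str:
    """Return the RY-coded version of a nucleotide sequence."""
    return seq.upper().translate(RY_MAP)


if __name__ == "__main__":
    print(ry_code("ATGCC-N"))
```

Recoding in this way discards transitions (A/G and C/T changes) and retains only transversions, which is why it is commonly used when transitions are suspected to be saturated.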

Figure 3

Neighbor joining analyses using different evolutionary models and/or taxon sampling. Distance matrices were calculated from the first- and second-position matrix of Goremykin et al. [19] using (A) the K2P model, (B) the ML HKY85 model with four gamma-distributed rate categories and parameters estimated from the corresponding ML analysis, and (C) the K2P model with Acorus added to the first- and second-position matrix as described in Methods.
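
As an illustration of the K2P (Kimura two-parameter) distance used in panels A and C, the standard formula d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), with P the proportion of transitions and Q the proportion of transversions, might be computed as below. The function name `k2p_distance` and the choice to skip sites with gaps or ambiguity codes are assumptions of this sketch, not details taken from the study.

```python
import math

PURINES = {"A", "G"}
VALID = {"A", "C", "G", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with gaps or ambiguity codes are ignored.
        if a not in VALID or b not in VALID:
            continue
        compared += 1
        if a == b:
            continue
        # Same chemical class (purine/purine or pyrimidine/pyrimidine)
        # means a transition; otherwise the change is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    arg1 = 1.0 - 2.0 * p - q
    arg2 = 1.0 - 2.0 * q
    if arg1 <= 0.0 or arg2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)
```

Note that K2P assumes a single substitution rate across all sites, so it does not account for the site-to-site rate heterogeneity that the gamma-based distances in panel B are meant to address.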

Figure 4

Maximum likelihood analyses using different evolutionary models. Trees A-C were calculated using the first- and second-position Goremykin et al. [19] matrix. Tree D was calculated using all three codon positions. All trees were built using ML with the HKY85 model and the following treatments of rate heterogeneity: A. No rate categories. B. Four gamma-distributed rate categories. C. Estimated proportion of invariant sites (no gamma rate categories). D. No rate categories (all three positions). Parameters were estimated separately for each analysis as described in Methods.
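
The distinction between the first- and second-position matrix (trees A-C) and the all-positions matrix (tree D) can be made concrete with a small sketch. The helper name `first_second_positions` is hypothetical, and the code assumes an in-frame coding sequence whose first character is codon position 1; it is not the procedure used to build the published matrices.

```python
def first_second_positions(cds: str) -> str:
    """Drop every third codon position from an in-frame coding sequence."""
    # Indices 0 and 1 of each codon are kept; index 2 (third position) is removed.
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)


if __name__ == "__main__":
    print(first_second_positions("ATGGCC"))
```

Third codon positions are often excluded in deep-level analyses because they tend to evolve fastest and saturate first.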

Figure 5

Bootstrap support and the SH-test p-value for the _Amborella_-sister or grasses-sister topologies as a function of (A) the gamma distribution α parameter value or (B) the proportion of invariable sites. The left vertical line in A and right line in B indicate the rate-heterogeneity parameter estimated from the data. The right vertical line in A and left line in B indicate the boundary where the topology of the best tree transitions between _Amborella_-sister and grasses-sister. All analyses were performed using the 61-gene first- and second-position matrix of Goremykin et al. [19] and the ML HKY85 model with the α parameter or proportion of invariant sites indicated on the X-axis. The transition-transversion parameter was estimated for each specified rate-heterogeneity parameter. $p(\Delta|L_{\mathrm{Amb}} - L_{\mathrm{grasses}}|)$ signifies the SH-test p-value for the difference between the likelihood scores of the two topologies. Bootstrap searches and SH-tests were performed as described in Methods.

Figure 6

Support for _Amborella_-sister or grasses-sister from the 61 chloroplast genes analyzed individually. A. ML HKY85 analyses with four gamma-distributed rate categories. Parameter estimates were calculated individually for each gene in a manner analogous to that performed on the concatenated dataset. B. MP analyses. All three codon positions are included in all analyses shown in both figures. Solid red lines correspond to _Amborella_-sister and dashed blue lines to grasses-sister topologies.

Figure 7

Inclusion of Nymphaea in analyses that account for rate heterogeneity. A. ML HKY85 with no rate categories (cf. Fig. 4A). B. ML HKY85 with four gamma-distributed rate categories (cf. Fig. 4B). C. ML with estimated proportion of invariant sites (no gamma rate categories; cf. Fig. 4C). D. NJ using a ML HKY85 model with four gamma-distributed rate categories to calculate distances (cf. Fig. 3B). All analyses used first and second codon positions only.

Figure 8

Competing hypotheses for the rooting of angiosperms showing the same underlying angiosperm topology when outgroups are excluded. A. Rooting within monocots (Mono), on the branch between grasses and all other angiosperms (see Fig. 2C, whose BS values are shown here, and also Fig. 2F; also see Goremykin et al. [19]). B. Unrooted network, with arrow showing alternative rootings as in A and C. C. Canonical rooting on the branch between Amborella and the rest of angiosperms (see Fig. 2I, whose BS values are shown here, and also Fig. 2L). We emphasize that 100% BS was obtained for _Amborella_-sister and for monocot monophyly (compared to 79% and 78% in C) using ML methods that allow for site-to-site rate heterogeneity (e.g., Additional files 1–3).

References

    1. Mathews S, Donoghue MJ. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science. 1999;286:947–950. doi: 10.1126/science.286.5441.947.
    2. Mathews S, Donoghue MJ. Basal angiosperm phylogeny inferred from duplicate phytochromes A and C. Int J Plant Sci. 2000;161:S41–S55. doi: 10.1086/317582.
    3. Parkinson CL, Adams KL, Palmer JD. Multigene analyses identify the three earliest lineages of extant flowering plants. Curr Biol. 1999;9:1485–1488. doi: 10.1016/S0960-9822(00)80119-0.
    4. Qiu Y-L, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis MJ, Zimmer EA, Chen Z, Savolainen V, Chase MW. The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature. 1999;402:404–407. doi: 10.1038/46536.
    5. Qiu Y-L, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis MJ, Zimmer EA, Chen Z, Savolainen V, Chase MW. Phylogeny of basal angiosperms: analyses of five genes from three genomes. Int J Plant Sci. 2000;161:S3–S27. doi: 10.1086/317584.
