Resolving difficult phylogenetic questions: why more sequences are not enough

Hervé Philippe et al. PLoS Biol. 2011 Mar.

No abstract available

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Simplified representation of the trees obtained in three recent phylogenomic analyses of early animal diversification.

(A) Schierwater et al. tree. (B) Dunn et al. tree. (C) Philippe et al. tree. Numbers in parentheses after taxon names indicate the number of species included in the dataset for the corresponding taxon. Bootstrap support values above 90% are indicated by a bullet (for nodes) or by underlining (for terminal taxa). The monophyly of Porifera is not unequivocally accepted; only the analysis of 30,000 positions with a rich taxon sampling and a complex model of evolution recovers it with significant statistical support. Although such a sparse phylogenetic signal will require the full potential of phylogenomics to be resolved confidently, this question is outside the scope of this study. Simplified drawings (redrawn from [74]) at the bottom illustrate the huge morphological disparity among the five terminal taxa. Porifera correspond to sponges; Cnidaria to sea anemones, jellyfishes, and allies; Ctenophora to comb jellies; and Bilateria to all other animals (characterized by their bilateral symmetry) except Trichoplax (Placozoa), which appears to be morphologically the most simply organized animal phylum.

Figure 2. Analysis of the revised Schierwater et al. dataset.

(A) Scheme of the original tree. (B) Scheme of the tree obtained with the revised dataset. Both trees were inferred using exactly the same probabilistic method and model (i.e., using RAxML with a GTR+Γ model for nucleotide sequences and an LG+F+Γ model for protein sequences). Numbers in the triangles indicate the number of species used for the corresponding clade. Bullets denote maximum bootstrap support values (BS = 100%); lower values are given. In the revised dataset, numerous discrepancies were corrected (Table S1), and a few genes were discarded because of dubious orthology; 14,112 unambiguously aligned positions were retained. Furthermore, the erroneous use of mitochondrial sequences of demosponge origin to represent both hexactinellids and calcareans (Figure S9) in the original study drastically, yet probably artifactually, strengthened the support for the monophyly of sponges (BS = 100%; [A]), whereas support appeared much weaker in our reanalysis (BS = 36%; [B]), in line with previous studies that failed to find significant support for or against sponge monophyly (but see [3]). See Figure S10 for the complete tree obtained with the revised dataset.

Figure 3. Reanalysis of the Philippe et al. dataset with a reduced taxon sampling.

(A) Scheme of the original tree. (B) Scheme of the tree obtained after reduction of the taxon sampling. Both trees were inferred using exactly the same probabilistic method and model (i.e., PhyloBayes using the CAT+Γ model [76]). Numbers in the triangles indicate the number of species used for the corresponding clade. Bullets denote maximum bootstrap support values (BS = 100%); lower values are given. See Figure S12 for the complete tree obtained after reduction of the taxon sampling.

Figure 4. Reanalysis of the Philippe et al. dataset with a less complex model.

(A) Scheme of the original tree obtained with the CAT+Γ model. (B) Scheme of the tree obtained with the less complex WAG+F+Γ model. Both trees were inferred using exactly the same dataset. The WAG+F+Γ model fits this alignment less well than the CAT+Γ model. Numbers in the triangles indicate the number of species used for the corresponding clade. Bullets denote maximum bootstrap support values (BS = 100%); lower values are given. See Figure S14 for the complete tree obtained with the less complex WAG+F+Γ model.

Figure 5. Saturation levels of datasets from Schierwater et al., Dunn et al., and Philippe et al.

(A) Schierwater et al. dataset. (B) Dunn et al. dataset. (C) Philippe et al. dataset. The revised alignments from Schierwater et al. and Dunn et al. were used (available as Datasets S1 and S2; see Text S1). The level of saturation was estimated for each dataset by computing the slope of the regression line of patristic distances (y-axis) versus uncorrected distances (x-axis), as previously described. Patristic distances between two species were computed from the branch lengths of the best maximum likelihood tree (using a GTR+Γ model for nucleotide sequences and an LG+F+Γ model for protein sequences).
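
The saturation estimate described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline. The function names, the gap characters, the use of ordinary least squares with an intercept, and the toy data are all assumptions made for illustration. Patristic distances are supplied as a precomputed lookup rather than derived from a tree.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Uncorrected distance: fraction of differing sites among the
    aligned sites where neither sequence has a gap or missing state."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_chars or b in gap_chars:
            continue  # skip sites that cannot be compared
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def regression_slope(xs, ys):
    """Slope of the ordinary least-squares line y = a + b*x (assumed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    return sxy / sxx


def saturation_slope(alignment, patristic):
    """Regress patristic distances (y) on uncorrected distances (x)
    over all species pairs, as worded in the caption.

    alignment: dict mapping species name -> aligned sequence
    patristic: dict mapping frozenset({name1, name2}) -> tree distance
               (hypothetical input; normally read off the ML tree)
    """
    pairs = list(combinations(sorted(alignment), 2))
    xs = [p_distance(alignment[a], alignment[b]) for a, b in pairs]
    ys = [patristic[frozenset((a, b))] for a, b in pairs]
    return regression_slope(xs, ys)


# Toy data (invented): three short sequences and made-up patristic distances.
toy_alignment = {
    "sp1": "AAAAAAAAAA",
    "sp2": "AAAAAAAAAC",
    "sp3": "AAAAAAAACC",
}
toy_patristic = {
    frozenset(("sp1", "sp2")): 0.2,
    frozenset(("sp1", "sp3")): 0.4,
    frozenset(("sp2", "sp3")): 0.2,
}
toy_slope = saturation_slope(toy_alignment, toy_patristic)
```

In practice the patristic lookup would be filled by summing branch lengths along the path between each pair of tips in the maximum likelihood tree, and the slope would be computed once per dataset so that the three panels can be compared.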

References

    1. Gee H. Evolution: ending incongruence. Nature. 2003;425:782.
    2. Dunn C. W, Hejnol A, Matus D. Q, Pang K, Browne W. E, et al. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature. 2008;452:745–749.
    3. Philippe H, Derelle R, Lopez P, Pick K, Borchiellini C, et al. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 2009;19:706–712.
    4. Schierwater B, Eitel M, Jakob W, Osigus H. J, Hadrys H, et al. Concatenated analysis sheds light on early metazoan evolution and fuels a modern “urmetazoon” hypothesis. PLoS Biol. 2009;7:e1000020. doi: 10.1371/journal.pbio.1000020.
    5. Philippe H, Chenuil A, Adoutte A. Can the Cambrian explosion be inferred through molecular phylogeny? Development. 1994;120:S15–S25.
