Bootstrap confidence levels for phylogenetic trees

B Efron et al. Proc Natl Acad Sci U S A. 1996.

Abstract

Evolutionary trees are often estimated from DNA or RNA sequence data. How much confidence should we have in the estimated trees? In 1985, Felsenstein [Felsenstein, J. (1985) Evolution 39, 783-791] suggested the use of the bootstrap to answer this question. Felsenstein's method, which in concept is a straightforward application of the bootstrap, is widely used, but has been criticized as biased in the genetics literature. This paper concerns the use of the bootstrap in the tree problem. We show that Felsenstein's method is not biased, but that it can be corrected to better agree with standard ideas of confidence levels and hypothesis testing. These corrections can be made by using the more elaborate bootstrap method presented here, at the expense of considerably more computation.
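As a rough illustration of the resampling idea behind Felsenstein's method, the sketch below resamples alignment columns with replacement and records how often a chosen pair of species comes out as the closest pair. The toy alignment, the function names, and the use of Hamming distance with a closest-pair criterion in place of a real tree-building algorithm are all assumptions made for illustration; none of them is taken from the paper.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def closest_pair(seqs):
    """Index pair (i, j), i < j, with the smallest Hamming distance.
    Stand-in for a real tree-building step (assumption); ties go to the first pair."""
    n = len(seqs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: hamming(seqs[p[0]], seqs[p[1]]))

def bootstrap_support(seqs, pair, B=200, seed=0):
    """Fraction of B column-resampled alignments in which `pair` is the closest pair."""
    rng = random.Random(seed)
    n_sites = len(seqs[0])
    hits = 0
    for _ in range(B):
        # Resample alignment columns (sites) with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = ["".join(s[c] for c in cols) for s in seqs]
        if closest_pair(resampled) == pair:
            hits += 1
    return hits / B

# Hypothetical toy alignment: 4 species, 10 sites.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAA",
    "TTGTACCTGC",
    "TTGAACCTGG",
]
support = bootstrap_support(toy_alignment, pair=(0, 1), B=200)
print(f"bootstrap support for pair (0, 1): {support:.2f}")
```

In a real analysis the closest-pair step would be replaced by the tree-building algorithm of choice, and support would be tallied per clade rather than for a single pair.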

Figures

Figure 1

Part of the data matrix of aligned nucleotide sequences for the malaria parasite Plasmodium. Shown are the first 20 columns of the 11 × 221 matrix x of polytypic sites used in most of the analyses below. The final analysis of the last section also uses the data from 1399 monotypic sites.

Figure 2

Phylogenetic tree based on the malaria data matrix; species are numbered as in Fig. 1. The numbers at the branches are confidence values based on Felsenstein’s bootstrap method, with B = 200 bootstrap replications.

Figure 3

Schematic diagram of tree estimation; the triangle represents the space of all possible $\tilde{\pi}$ vectors in the multinomial probability model; regions $\mathcal{R}_1, \mathcal{R}_2, \ldots$ correspond to the different possible trees. In the case shown, $\tilde{\pi}$ and $\hat{\tilde{\pi}}$ lie in the same region, so TREE = $\widehat{\text{TREE}}$, but $\hat{\tilde{\pi}}^*$ lies in a region where $\widehat{\text{TREE}}^*$ does not have the 9-10 clade.

Figure 4

Two cases of the simple normal model; in both we observe $\hat{\mu} = (4.5, 0) \in \mathcal{R}_1$, and wish to assign a confidence value to $\mu \in \mathcal{R}_1$. Case I, $\mathcal{R}_2$ is the region $\{\mu_1 \le 3\}$. Case II, $\mathcal{R}_2$ is the region $\{\|\mu\| < 3\}$. The dashed circles indicate bootstrap sampling $\hat{\mu}^* \sim N_2(\hat{\mu}, I)$.
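The bootstrap sampling in this caption can be imitated by simulation. The sketch below draws replications from a bivariate normal centred at the observed point (4.5, 0) with identity covariance and reports the fraction that fall in the region containing the observed point, for each of the two cases. The function and variable names, the number of replications, and the seed are assumptions chosen for illustration.

```python
import math
import random

def first_level_confidence(mu_hat, in_region_1, B=100_000, seed=0):
    """Fraction of bootstrap draws mu* ~ N2(mu_hat, I) that land in region R1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(B):
        draw = (rng.gauss(mu_hat[0], 1.0), rng.gauss(mu_hat[1], 1.0))
        hits += in_region_1(draw)
    return hits / B

mu_hat = (4.5, 0.0)

# Case I: R2 = {mu_1 <= 3}, so R1 is the half-plane mu_1 > 3.
def case_1(mu):
    return mu[0] > 3.0

# Case II: R2 = {||mu|| < 3}, so R1 is everything outside the open disk of radius 3.
def case_2(mu):
    return math.hypot(mu[0], mu[1]) >= 3.0

conf_case_1 = first_level_confidence(mu_hat, case_1)
conf_case_2 = first_level_confidence(mu_hat, case_2)

# In Case I only the first coordinate matters, so the exact value is Phi(1.5).
exact_case_1 = 0.5 * (1.0 + math.erf(1.5 / math.sqrt(2.0)))

print(f"Case I : simulated {conf_case_1:.3f}, exact {exact_case_1:.3f}")
print(f"Case II: simulated {conf_case_2:.3f}")
```

Because the disk of Case II lies entirely inside the half-plane of Case I, any draw counted in Case I is also counted in Case II, so the simulated Case II value can never be the smaller of the two.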

Figure 5

Confidence levels of the two cases in Fig. 4; $\hat{\mu}_0 = (3, 0)$ is the closest point to $\hat{\mu} = (4.5, 0)$ on the boundary separating $\mathcal{R}_1$ from $\mathcal{R}_2$; bootstrap vector $\hat{\mu}^{**} \sim N_2(\hat{\mu}_0, I)$. The confidence level $\hat{\alpha}$ is the probability that $\hat{\mu}^{**}$ is closer than $\hat{\mu}$ to the boundary.
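A simulation along the lines of this caption might look as follows. Draws are taken from a bivariate normal centred at the boundary point (3, 0) with identity covariance, and the confidence level is estimated as the fraction of draws that sit closer to the boundary than the observed point (4.5, 0). Reading "closer to the boundary" as a signed distance (negative on the far side of the boundary) is an assumption of this sketch, as are the function names, the number of replications, and the seed.

```python
import math
import random

def second_level_confidence(mu_hat, mu_0, signed_distance, B=100_000, seed=0):
    """Fraction of draws mu** ~ N2(mu_0, I) whose signed distance to the
    boundary is smaller than that of the observed point mu_hat."""
    rng = random.Random(seed)
    d_hat = signed_distance(mu_hat)
    closer = 0
    for _ in range(B):
        draw = (rng.gauss(mu_0[0], 1.0), rng.gauss(mu_0[1], 1.0))
        closer += signed_distance(draw) < d_hat
    return closer / B

mu_hat = (4.5, 0.0)   # observed point
mu_0 = (3.0, 0.0)     # closest boundary point, as given in the caption

# Case I: the boundary is the line mu_1 = 3.
def signed_distance_case_1(mu):
    return mu[0] - 3.0

# Case II: the boundary is the circle ||mu|| = 3.
def signed_distance_case_2(mu):
    return math.hypot(mu[0], mu[1]) - 3.0

alpha_case_1 = second_level_confidence(mu_hat, mu_0, signed_distance_case_1)
alpha_case_2 = second_level_confidence(mu_hat, mu_0, signed_distance_case_2)

# In Case I the event reduces to mu_1** < 4.5 with mu_1** ~ N(3, 1), i.e. Phi(1.5).
exact_case_1 = 0.5 * (1.0 + math.erf(1.5 / math.sqrt(2.0)))

print(f"Case I : simulated {alpha_case_1:.3f}, exact {exact_case_1:.3f}")
print(f"Case II: simulated {alpha_case_2:.3f}")
```

Under this reading, the curved boundary of Case II yields a value no larger than that of Case I, since the disk of radius 4.5 lies inside the half-plane with first coordinate below 4.5.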

Corrected and republished from

References

    1. Efron B, Tibshirani R. An Introduction to the Bootstrap. London: Chapman & Hall; 1993.
    2. Felsenstein J. Evolution. 1985;39:783–791.
    3. Hillis D, Bull J. Syst Biol. 1993;42:182–192.
    4. Felsenstein J, Kishino H. Syst Biol. 1993;42:193–200.
    5. Newton M A. Biometrika. 1996;83:315–328.
