The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions
Related papers
Molecular Phylogenetics and …, 2009
Phylogenetic analyses based on mitochondrial DNA have yielded widely differing relationships among members of the arthropod lineage Arachnida, depending on the nucleotide coding schemes and models of evolution used. We enhanced taxonomic coverage within the Arachnida greatly by sequencing seven new arachnid mitochondrial genomes from five orders. We then used all 13 mitochondrial protein-coding genes from these genomes to evaluate patterns of nucleotide and amino acid biases. Our data show that two of the six orders of arachnids (spiders and scorpions) have experienced shifts in both nucleotide and amino acid usage in all their protein-coding genes, and that these biases mislead phylogeny reconstruction. These biases are most striking for the hydrophobic amino acids isoleucine and valine, which appear to have evolved asymmetrical exchanges in response to shifts in nucleotide composition. To improve phylogenetic accuracy based on amino acid differences, we tested two recoding methods: (1) removing all isoleucine and valine sites and (2) recoding amino acids based on their physicochemical properties. We find that these methods yield phylogenetic trees that are consistent in their support of ancient intraordinal divergences within the major arachnid lineages. Further refinement of amino acid recoding methods may help us better delineate interordinal relationships among these diverse organisms.
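The two recoding methods named in this abstract can be illustrated with a minimal sketch. The abstract does not say how isoleucine and valine sites were identified or which physicochemical classes were used, so both choices below are assumptions: a column is dropped if any taxon carries I or V, and the classes are the widely used six-state Dayhoff grouping. The function names `drop_iv_columns` and `recode` are hypothetical.

```python
# Assumption: six-state Dayhoff grouping; the abstract does not name its classes.
DAYHOFF6 = {"AGPST": "0", "C": "1", "DENQ": "2", "FWY": "3", "HKR": "4", "ILMV": "5"}
RECODE = {aa: code for group, code in DAYHOFF6.items() for aa in group}


def drop_iv_columns(alignment):
    """Remove every alignment column in which any taxon has I or V (assumed rule)."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if not any(aa in "IV" for aa in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def recode(seq):
    """Map each amino acid to its class symbol; gaps and unknown symbols pass through."""
    return "".join(RECODE.get(aa, aa) for aa in seq)


toy = {"taxon_a": "MIVK", "taxon_b": "MLAK"}
print(drop_iv_columns(toy))
print({name: recode(seq) for name, seq in toy.items()})
```

Under this grouping isoleucine, leucine, methionine and valine collapse into one class, so exchanges among them no longer count as differences.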
Correct and incorrect vertebrate phylogenies obtained by the entire mitochondrial DNA sequences
Molecular Biology and Evolution, 1999
Concatenated sequences of all protein-coding genes in mitochondria recovered a known phylogeny of 11 vertebrate species correctly with statistical significance. However, when it was rooted by lampreys or sea urchins, the root of the vertebrate tree was placed between the mammal cluster and the chicken-frog-fish cluster or between the mammal-chicken cluster and the frog-fish cluster, depending on the tree-making method used. Although the frog-fish or chicken-frog-fish cluster was biologically incorrect, it was again supported with a significantly high bootstrap value. In this study, we investigated the reasons why this happened. It has been suggested that an incorrect phylogeny may be constructed due to a change of amino acid composition in different lineages or due to homoplasies at sites with hydrophobic amino acids. However, our results indicated that these were not the causes of the incorrect rooting of the vertebrate tree. Rather, it was important to take into account an extensive rate variation across sites and different probabilities of substitution among different amino acids. The substitution rates for mitochondrial sequences vary considerably for different vertebrate lineages. In such a case, it is known to be important to use the model that reflects the actual substitution probability to obtain a correct tree topology. The correct rooting of the vertebrate tree was recovered when rate variation across sites was properly accounted for.
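To show why rate variation across sites matters, the sketch below compares two standard amino acid distance corrections: the Poisson correction, which assumes equal rates at all sites, and the gamma correction, which assumes rates follow a gamma distribution with shape parameter alpha. This is an illustration only and is not claimed to be the procedure used in the paper; the function names and the example values of p and alpha are assumptions.

```python
import math


def poisson_distance(p):
    """Poisson-corrected distance from the proportion p of differing sites."""
    return -math.log(1.0 - p)


def gamma_distance(p, alpha):
    """Gamma-corrected distance; smaller alpha means stronger rate variation."""
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


p = 0.5  # hypothetical proportion of differing amino acid sites
for alpha in (0.5, 1.0, 2.0, 1e6):
    print(alpha, round(gamma_distance(p, alpha), 4), round(poisson_distance(p), 4))
```

For the same observed difference, the gamma-corrected distance exceeds the Poisson distance and approaches it as alpha grows. Ignoring rate variation therefore underestimates long branches, which is one way an incorrect rooting can arise.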
Annual Review of Ecology, Evolution, and Systematics, 2006
DNA data has been widely used in animal phylogenetic studies over the past 15 years. Here we review how these studies have used advances in knowledge of molecular evolutionary processes to create more realistic models of evolution, evaluate the information content of data, test phylogenetic hypotheses, attach time to phylogenies, and understand the relative usefulness of mitochondrial and nuclear genes. We also provide a new compilation of conserved polymerase chain reaction (PCR) primers for mitochondrial genes that complements our earlier compilation. Simon et al., Annu. Rev. Ecol. Evol. Syst. 2006.37:545-579. Glossary: mtDNA: mitochondrial DNA. Gene order rearrangement: an evolutionary change in the location and/or direction of transcription of a gene with respect to other genes. PCR: polymerase chain reaction. rRNA: ribosomal RNA. [Figure labels: allows for transition/transversion bias; allows base frequencies to vary; allows three substitution types; allows six substitution types.]
Cladistics, 2004
An analysis of the relationships of the major arthropod groups was undertaken using mitochondrial genome data to examine the hypotheses that Hexapoda is polyphyletic and that Collembola is more closely related to branchiopod crustaceans than insects. We sought to examine the sensitivity of this relationship to outgroup choice, data treatment, gene choice and optimality criteria used in the phylogenetic analysis of mitochondrial genome data. Additionally we sequenced the mitochondrial genome of an archaeognathan, Nesomachilis australica, to improve taxon selection in the apterygote insects, a group poorly represented in previous mitochondrial phylogenies. The sister group of the Collembola was rarely resolved in our analyses with a significant level of support. The use of different outgroups (myriapods, nematodes, or annelids + mollusks) resulted in many different placements of Collembola. The way in which the dataset was coded for analysis (DNA, DNA with the exclusion of third codon positions, and amino acids) also had marked effects on tree topology. We found that nodal support was spread evenly throughout the 13 mitochondrial genes and the exclusion of genes resulted in significantly less resolution in the inferred trees. Optimality criteria had a much lesser effect on topology than the preceding factors; parsimony and Bayesian trees for a given data set and treatment were quite similar. We therefore conclude that the relationships of the extant arthropod groups as inferred by mitochondrial genomes are highly vulnerable to outgroup choice, data treatment and gene choice, and no consistent alternative hypothesis of Collembola's relationships is supported. Pending the resolution of these identified problems with the application of mitogenomic data to basal arthropod relationships, it is difficult to justify the rejection of hexapod monophyly, which is well supported on morphological grounds.
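One of the data treatments mentioned here, DNA with third codon positions excluded, can be sketched as follows. The sketch assumes a protein-coding sequence that is already in frame, starting at the first position of a codon, and it discards any incomplete trailing codon. The function name `drop_third_positions` is hypothetical.

```python
def drop_third_positions(cds):
    """Keep codon positions 1 and 2 of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    return "".join(cds[i:i + 2] for i in range(0, usable, 3))


print(drop_third_positions("ATGGCATAA"))
```

Third positions are usually the fastest evolving and the most affected by compositional bias, which is a common reason for analysing the data with and without them.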
Systematic Entomology, 2010
The ability to generate large molecular datasets for phylogenetic studies benefits biologists, but such data expansion introduces numerous analytical problems. A typical molecular phylogenetic study implicitly assumes that sequences evolve under stationary, reversible and homogeneous conditions, but this assumption is often violated in real datasets. When an analysis of large molecular datasets results in unexpected relationships, it often reflects violation of phylogenetic assumptions, rather than a correct phylogeny. Molecular evolutionary phenomena such as base compositional heterogeneity and among-site rate variation are known to affect phylogenetic inference, resulting in incorrect phylogenetic relationships. The ability of methods to overcome such bias has not been measured on real and complex datasets. We investigated how base compositional heterogeneity and among-site rate variation affect phylogenetic inference in the context of a mitochondrial genome phylogeny of the insect order Coleoptera. We show statistically that our dataset is affected by base compositional heterogeneity regardless of how the data are partitioned or recoded. Among-site rate variation is shown by comparing topologies generated using models of evolution with and without a rate variation parameter in a Bayesian framework. When compared for their effectiveness in dealing with systematic bias, standard phylogenetic methods tend to perform poorly, and parsimony without any data transformation performs worst. Two methods designed specifically to overcome systematic bias, LogDet and a Bayesian method implementing variable composition vectors, can overcome some level of base compositional heterogeneity, but are still affected by among-site rate variation. A large degree of variation in both noise and phylogenetic signal among all three codon positions is observed. We caution and argue that more data exploration is imperative, especially when many genes are included in an analysis. 
[Figure residue: taxon labels with percentages, repeated across several panels. Unique labels: PO Crioceris (75.25%), PO Anoplophora (76.99%), PO Sphenophorus (74.96%), PO Naupactus (75.73%), PO Cucujus (74.82%), PO Priasilpha (73.70%), PO Tribolium (69.23%), PO Adelium (70.39%), PO Mordella (70.60%), PO Chaetosoma (77.37%), PO Necrophila (73.25%), PO Tropisternus (72.52%), PO Rhopaea (74.53%), PO Euspilotus (71.76%), PO Chauliognathus (76.17%), PO Pyrocoelia (76.29%), PO Rhagophthalmus (78.18%), AR Tetraphalerus (65.86%), PO Pyrophorus (67.56%), PO Cyphon (72.87%), AD Calosoma (76.73%), AD Trachypachus (78.48%), AD Macrogyrus (75.34%), MY Sphaerius (79.80%; one panel reads AD Sphaerius), Drosophila (76.71%), Anopheles (75.99%), Bombyx (79.57%), Ostrinia (79.42%), Vanhornia (78.21%), Melipona (86.35%), Philaenus (76.08%).]
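The Coleoptera abstract reports a statistical test for base compositional heterogeneity without naming it here. A common choice, assumed for this sketch, is a chi-square test of homogeneity on the taxon-by-base count table. The code computes only the statistic and its degrees of freedom; obtaining a p-value would need a chi-square distribution function, which is left out to keep the example free of third-party dependencies. The function name `composition_chi2` is hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def composition_chi2(seqs):
    """Chi-square statistic and degrees of freedom for base frequency homogeneity."""
    counts = {name: Counter(b for b in seq.upper() if b in BASES)
              for name, seq in seqs.items()}
    row_total = {name: sum(c[b] for b in BASES) for name, c in counts.items()}
    col_total = {b: sum(c[b] for c in counts.values()) for b in BASES}
    total = sum(row_total.values())
    chi2 = 0.0
    for name, c in counts.items():
        for b in BASES:
            expected = row_total[name] * col_total[b] / total
            if expected > 0:  # skip bases absent from every taxon
                chi2 += (c[b] - expected) ** 2 / expected
    df = (len(seqs) - 1) * (len(BASES) - 1)
    return chi2, df


print(composition_chi2({"t1": "AAAAAAAT", "t2": "GGGGCCCC"}))
```

Sites in an alignment are not independent observations, so a test of this kind is best read as a rough screen and not as a formal proof of heterogeneity.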
Molecular Phylogenetics and Evolution, 2007
Analysis of complete mitochondrial genome sequences is becoming increasingly common in genetic studies. The availability of full genome datasets enables an analysis of the information content distributed throughout the mitochondrial genome in order to optimize the research design of future evolutionary studies. The goal of our study was to identify informative regions of the human mitochondrial genome using two criteria: (1) accurate reconstruction of a phylogeny and (2) consistent estimates of time to most recent common ancestor (TMRCA). We created two series of datasets by deleting individual genes of varied length and by deleting 10 equal-size fragments throughout the coding region. Phylogenies were statistically compared to the full-coding-region tree, while coalescent methods were used to estimate the TMRCA and associated credible intervals. Individual fragments important for maintaining a phylogeny similar to the full-coding-region tree encompassed bp 577-2122 and 11,399-16,023, including all or part of 12S rRNA, 16S rRNA, ND4, ND5, ND6, and cytb. The control region only tree was the most poorly resolved with the majority of the tree manifest as an unresolved polytomy. Coalescent estimates of TMRCA were less sensitive to removal of any particular fragment(s) than reconstruction of a consistent phylogeny. Overall, we discovered that half the genome, i.e., bp 3669-11,398, could be removed with no significant change in the phylogeny (p_AU = 0.077) while still maintaining overlap of TMRCA 95% credible intervals. Thus, sequencing a contiguous fragment from bp 11,399 through the control region to bp 3668 would create a dataset that optimizes the information necessary for phylogenetic and coalescent analyses and also takes advantage of the wealth of data already available on the control region.
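The deletion datasets described in this abstract amount to excising a coordinate range from each sequence. A minimal sketch follows, assuming 1-based inclusive coordinates that index the sequence directly, with no alignment gaps shifting positions. The function name `delete_region` and the toy sequence are hypothetical; only the coordinates 3669 and 11,398 come from the abstract.

```python
def delete_region(seq, start, end):
    """Return seq with 1-based inclusive positions start..end removed."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates must satisfy 1 <= start <= end <= len(seq)")
    return seq[:start - 1] + seq[end:]


print(delete_region("ABCDEFGHIJ", 3, 5))

# Length of the removable span reported in the abstract (bp 3669-11,398).
print(11398 - 3669 + 1)
```

With real data the coordinates would first have to be mapped from the reference numbering onto alignment columns, a step this sketch leaves out.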