There is no universal molecular clock for invertebrates, but rate variation does not scale with body size

Abstract

The existence of a universal molecular clock has been called into question by observations that substitution rates vary widely between lineages. However, increasing empirical evidence for the systematic effects of different life history traits on the rate of molecular evolution has raised hopes that rate variation may be predictable, potentially allowing the “correction” of the molecular clock. One such example is the body size trend observed in vertebrates; smaller species tend to have faster rates of molecular evolution. This effect has led to the proposal of general predictive models correcting for rate heterogeneity and has also been invoked to explain discrepancies between molecular and paleontological dates for explosive radiations in the fossil record. Yet, there have been no tests of an effect in any nonvertebrate taxa. In this study, we have tested the generality of the body size effect by surveying a wide range of invertebrate metazoan lineages. DNA sequences and body size data were collected from the literature for 330 species across five phyla. Phylogenetic comparative methods were used to investigate a relationship between average body size and substitution rate at both interspecies and interfamily comparison levels. We demonstrate significant rate variation in all phyla and most genes examined, implying a strict molecular clock cannot be assumed for the Metazoa. Furthermore, we find no evidence of any influence of body size on invertebrate substitution rates. We conclude that the vertebrate body size effect is a special case, which cannot be simply extrapolated to the rest of the animal kingdom.

Keywords: comparative method, generation time, metabolic rate, Metazoa


The concept of a molecular clock (a relatively constant rate of molecular evolution across lineages) has been fundamental in evolutionary biology for two main reasons. First, the observation of surprisingly even rates of protein change over evolutionary time (1) was one of the key concepts on which the neutral theory was constructed (2). Second, the molecular clock has provided one of the most useful new tools in evolutionary biology. The assumption that genetic distance is related to lineage divergence time has provided a way of reconstructing the evolutionary history of life. Molecular clocks have been particularly valuable for lineages with little or no fossil record (e.g., origins of emerging diseases such as HIV; ref. 3) or for taxa or periods for which the fossil record may contain gaps (e.g., origin of the kingdoms; refs. 5 and 6).

There is growing evidence, however, that substitution rates can vary considerably between species for a wide range of taxa, including mammals (7–9), arthropods (10–12), and vascular plants (13). Such findings suggest that the molecular clock may not “tick” at a steady rate, even between closely related lineages. The false assumption of a molecular clock when reconstructing molecular phylogenies can result in incorrect topology (14, 15) and biased date estimation (16–18). The existence of widespread rate variation has therefore reduced confidence in the use of the molecular clock. This problem has been exacerbated by the relatively low power of the tests used to detect and exclude rate-variable sequences from such analyses (19). Although new “relaxed clock” methods that allow for rate variation have been developed, the assumptions on which these methods rely are unavoidably ad hoc and not closely based on empirical data. Consequently, the reliability of these newer molecular dating methods cannot be determined without improving our understanding of the determinants of rate heterogeneity (20).

Many of the recent molecular dating methods assume that variation in the rate of molecular evolution is essentially stochastic. These methods may produce seriously biased date estimates if rates also vary systematically among lineages, which may be the case if rates are at least partially determined by species characteristics (21). A range of correlates with the rate of molecular evolution have been reported, including life history traits such as metabolic rate (22), environmental factors such as temperature (23), and evolutionary processes such as speciation rate (24). The best characterized example of a systematic association between species characteristics and substitution rate is the body size effect observed in vertebrates: smaller bodied species tend to have faster rates of molecular evolution than their larger bodied relatives (22). This relationship has been observed for a number of mitochondrial and nuclear genes and for DNA–DNA hybridization distances in mammals (25), birds (26), and reptiles (27).

The negative correlation between body size and substitution rate in vertebrates is important for three reasons. First, it may help us to understand how and why rate variation occurs and, in doing so, may shed some light on the causes of molecular evolution. Second, the idea that patterns of lineage-specific rate variation may be predictable has prompted suggestions that the molecular clock could potentially be salvaged (28). Identifying correlates of rate variation may provide useful a priori information for the development of variable-rate molecular dating models (20), or it may even enable the production of a “corrected” metazoan clock, if substitution rates can be corrected for body size (29). Third, a body size relationship has been invoked to explain discrepancies between molecular and paleontological dates for several explosive radiations in the fossil record. For example, if mammalian lineages increased in average body size, then there may have been a concerted slowdown in molecular rates across multiple lineages, which could cause systematic errors in molecular date estimates (21).

Although the influence of body size on molecular evolution has been assumed to apply more widely (17, 30, 31), this association has only been reported in vertebrates, and to date, there have been no published studies systematically investigating the relationship in nonvertebrate taxa. Such an investigation is important for several reasons. First, empirical evidence of a body size effect in invertebrates is needed before size or related life history variables can be used to correct molecular clocks or to inform a priori assumptions in variable-rate dating methods for these taxa. Second, if there is no evidence of a body size effect in invertebrates, it cannot be invoked to explain disagreements between molecular and fossil date estimates for radiations involving nonvertebrate taxa (including, most notably, the “explosive” origin of the major metazoan phyla). Third, and perhaps most importantly, an investigation into the rates of molecular evolution in invertebrates may shed light on the cause of the body size effect in vertebrates. The identification of potential causes of this effect has been complicated by the strong covariation that exists between body size and other vertebrate life history traits. Such covariance has meant that several different hypotheses have been put forward to explain the body size effect. For instance, smaller vertebrate species tend to have shorter generation times than their larger relatives; so they may copy their genomes more often per unit time and, thereby, acquire more mutations due to replication errors (the generation time effect; refs. 32 and 33). Smaller vertebrates also tend to have higher mass-specific metabolic rates, and consequently, higher mutation rates may result from the increased production of mutagenic metabolites (22, 34). Investigating the body size effect in invertebrates may offer an opportunity to distinguish between such explanations; life history traits that tightly covary in vertebrates may be disassociated in some invertebrate taxa.

In this study, we tested for the existence of a body size effect in a wide range of metazoans. To perform this analysis, we assembled body size measurements and DNA sequence data for 330 metazoan species from five different phyla (Arthropoda, Annelida, Echinodermata, Mollusca, and Platyhelminthes). We analyzed seven different genes, including mitochondrial and nuclear genes and protein-coding and RNA-coding sequences. Importantly, we accounted for phylogenetic nonindependence by using nonoverlapping contrasts between related taxa. We found no evidence of a consistent influence of body size on rate of molecular evolution in nonvertebrate metazoans.

Results

Evidence of Rate Variation.

The free-rate model of molecular evolution was significantly preferred for 15 of 22 data sets and preferred, but not significantly, for one other (Hymenoptera COI) (see Appendix 1, which is published as supporting information on the PNAS web site). These fifteen included at least one gene for each of the family-level data sets and for four of the six species-level data sets.

Compared with the clock model, where there is only one rate of substitution applied to the whole tree, the free-rate model is the most highly parameterized model of rate variation. As such, our test is conservative insofar as a free-rate model is unlikely to be favored in the presence of limited variation. (Support for the clock could therefore indicate either truly clock-like evolution or a lack of power.) Conversely, however, we cannot tell if rejection of the clock is due to only a few lineages in the phylogeny having anomalous rates. Despite these limitations, we can conclude that significant rate variation is a common feature of metazoan phylogenies and that the molecular clock cannot be assumed for the invertebrates as a whole.

No Evidence of a Body Size Effect.

We found no evidence of an association between body size and substitution rate for any of the data sets included in the analysis (Table 1). There were no significant results for any of the sign tests or Spearman’s rank correlations. Two signed rank tests gave significant P values: the Lepidoptera COII (P = 0.008) and the Echinodermata 28S (P = 0.042), but correcting for multiple tests using the Bonferroni correction increased these P values above the significance threshold. This nonsignificance is supported by a visual inspection of the data: plotting the data points for the comparisons does not indicate any obvious relationship between the variables (see Fig. 1, which is published as supporting information on the PNAS web site). Furthermore, for these significant signed rank results, there was no evidence of a significant correlation in any of the other genes within the same taxon. It therefore seems likely that these two results were caused by statistical artifact rather than a biological relationship.
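The Bonferroni adjustment described above can be illustrated with a minimal sketch. It assumes that the family of tests corresponds to the 22 data sets listed in Table 1; the text does not state the number of tests used for the correction, so `N_TESTS` and the function name `bonferroni` are illustrative assumptions rather than the authors' procedure.

```python
def bonferroni(p_value, n_tests):
    """Return the Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Assumption: one test per data set in Table 1 (22 data sets).
N_TESTS = 22

# Raw signed rank P values reported in the text.
adjusted_echinodermata_28s = bonferroni(0.042, N_TESTS)  # about 0.924
adjusted_lepidoptera_coii = bonferroni(0.008, N_TESTS)   # about 0.176

print(round(adjusted_echinodermata_28s, 3))
print(round(adjusted_lepidoptera_coii, 3))
```

Under this assumption, the adjusted value for Echinodermata 28S agrees with the parenthesized value in Table 1, and the adjusted value for Lepidoptera COII is close to it (the small difference would be consistent with rounding of the raw P value). Both exceed 0.05.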

Table 1.

Results for statistical tests analyzing the relationship between body size and substitution rate

| Data set | Gene | No. of pairs | No. of +ve signs | Sign test P value | Signed rank P value | Spearman’s rank | Spearman P value |
|---|---|---|---|---|---|---|---|
| Species-level comparisons | | | | | | | |
| Lepidoptera | COI | 63 | 31 | 1.000 | 0.816 | −0.020 | 0.881 |
| | COII | 38 | 25 | 0.073 | 0.008* (0.178) | −0.021 | 0.901 |
| | ND5 | 28 | 16 | 0.572 | 0.891 | 0.064 | 0.750 |
| Arachnida | 16S | 22 | 11 | 0.832 | 0.783 | −0.243 | 0.274 |
| | COI | 17 | 8 | 1.000 | 0.463 | 0.113 | 0.666 |
| Cephalopoda | COI | 34 | 17 | 1.000 | 0.993 | 0.124 | 0.491 |
| | 16S | 26 | 13 | 1.000 | 0.829 | 0.085 | 0.714 |
| Gastropoda | 28S | 23 | 11 | 1.000 | 0.191 | | |
| | COI | 14 | 7 | 1.000 | 0.542 | | |
| Echinodermata | 18S | 11 | 4 | 0.774 | 0.667 | | |
| | 28S | 10 | 3 | 0.227 | 0.042* (0.924) | | |
| Platyhelminthes | 18S | 14 | 8 | 1.000 | 0.520 | −0.455 | 0.163 |
| | COI | 10 | 6 | 1.000 | 0.496 | 0.455 | 0.191 |
| Family-level comparisons | | | | | | | |
| Hymenoptera | 28S | 11 | 6 | 1.000 | 0.831 | −0.464 | 0.154 |
| | COI | 8 | 2 | 0.289 | 0.641 | −0.595 | 0.132 |
| Bivalvia | 18S | 13 | 7 | 1.000 | 0.685 | −0.154 | 0.617 |
| | 28S | 12 | 6 | 1.000 | 0.519 | −0.382 | 0.248 |
| | COI | 10 | 4 | 0.754 | 0.774 | −0.321 | 0.368 |
| Annelida | 18S | 11 | 4 | 0.549 | 0.365 | 0.091 | 0.797 |
| | H3 | 8 | 3 | 0.289 | 0.109 | 0.238 | 0.582 |
| Monogenea | 18S | 7 | 5 | 0.453 | 0.375 | 0.321 | 0.498 |
| | 28S | 7 | 2 | 0.453 | 0.156 | −0.464 | 0.302 |

Similarly, all but three of the statistical tests for the synonymous (dS) and nonsynonymous (dN) substitutions were found to be nonsignificant (Arachnida dN, sign test; Cephalopoda dS, signed rank test; and Gastropoda dS, Spearman’s rank correlations; see Appendix 2, which is published as supporting information on the PNAS web site). As before, although the number of data sets was smaller, Bonferroni correction rendered these results nonsignificant. Furthermore, no significant P values were returned for any more than one statistical test within the same taxon.

The metaanalysis of all comparisons did not reveal any significant association between body size and rate of substitution. Contrasts were almost evenly divided between those with a faster rate in the larger-bodied lineage (101 comparisons) and those with a faster rate in the smaller-bodied lineage (105 comparisons). Additionally, none of the within-phylum metaanalyses yielded a significant result (see Appendix 3, which is published as supporting information on the PNAS web site).

A total of fourteen David and Goliath comparison pairs were chosen, nine of which also showed evidence of significant rate variation (see Appendix 4, which is published as supporting information on the PNAS web site). Sign and signed rank tests were nonsignificant both for those pairs with observable rate variation (obtained P values were 0.5078 and 0.5703, respectively) and for the complete David and Goliath data set (with P values of 0.1796 and 0.5016). Additionally, results from the family-level comparison data sets, where divergences between comparison pairs were much deeper, were not obviously different to the results of the species-level analyses.

These results demonstrate that, although we cannot rule out a body size effect for taxa or genes not included in this study, we can clearly reject a universal body size effect on rate of molecular evolution in invertebrates, as there is no evidence of a significant effect of body size on substitution rate for these taxa and genes.

Discussion

This study shows that there is significant variation in the rate of molecular evolution between metazoan lineages. This rate variation is observable not only between deep divisions (e.g., among families) but also between closely related genera and species. We therefore conclude that it is unwise to assume a strict molecular clock for phylogenetic analyses without thorough testing of the sequence data in question. In addition, we demonstrate that this widespread rate variation is not explained by variation in average body size between lineages and therefore that the body size effect observed for rates of molecular evolution in vertebrates cannot simply be extrapolated to other metazoan lineages.

We are confident that our analysis has sufficient power to detect any universal body size effect for a number of reasons. First, we have a very large sample of phylogenetically independent comparisons between related species, genera, and families (see Appendix 5, which is published as supporting information on the PNAS web site). Second, our data set spans a broad range of taxa, including phyla from each of the three major metazoan groups (Ecdysozoa, Lophotrochozoa, and Deuterostomia). Third, we have included a range of different genes in this analysis, including mitochondrial and nuclear, protein coding and RNA, and have separately examined both synonymous and nonsynonymous substitutions: any general genome-wide effect should be detected in at least some of these sequences. Fourth, we have carried out this analysis for pairs of taxa ranging from recently diverged species to families of Palaeozoic age, so we are confident that our results are not obscured by comparisons being either too young (insufficient molecular change) or too old (saturated). In all cases, we included only resolved comparisons in our estimated phylogenies, i.e., those comparisons with a sufficient number of informative sites to determine differences in both topology and rate between species. Last, our comparisons span size differences of up to five orders of magnitude, so we believe that, if any pattern were present within the investigated taxonomic levels, we have included differences of sufficient size to detect it. Given these data, we can confidently reject a general, genome-wide influence of body size on the rate of molecular evolution in metazoans, even though such an effect may operate locally for particular taxa, such as vertebrates, or may be evident at deeper levels of divergence (E. Fontanillas, J.A.T., J.J.W., and L.B., unpublished work).

What Is the Cause of the Observed Rate Variation?

There are two broad ways for lineage-specific rates to arise. First, species may differ in their underlying mutation rate; for example, species with a higher metabolic rate may incur more DNA damage (22) or species with shorter generation times may accumulate more replication errors (35). Second, species may differ in the proportion of mutations that become fixed in a population (e.g., species with smaller effective population sizes are expected to have faster rates of molecular evolution) (36).

The lack of a general body size effect in invertebrates may be due to one of two causes: first, it is possible that in contrast to results from mammals (37), lineage effects are simply not an important component of rate variation in the invertebrate taxa studied, and instead, the rate variation detected in this study is due to gene-by-lineage effects (that is, rate variation is not a genome-wide phenomenon). This explanation could be the case if, for example, adaptive substitutions form a large component of overall substitution rate (38). Second, body size itself may not be the causal factor generating the vertebrate body size effect. Many other life history traits are known to scale with body size in vertebrates (39–43), which raises the possibility that the same underlying mechanism (e.g., a generation time or metabolic rate effect) could be operating in all metazoans, but if the responsible factor does not correlate with body size in invertebrates, a body size effect may not have been detected in our analysis. Partial correlation analyses have been used to determine the contribution of multiple life history traits to substitution rate variation in vertebrates (25, 26), but the relative paucity of life history data available for invertebrates means that we cannot currently test this question conclusively. The additional life history trait measurements we collected for three data sets showed no evidence of any significant association with substitution rate (results not shown). However, we can consider how other life history traits might have an effect on invertebrate substitution rates.

Could Metabolic Rate Cause the Rate Variation?

Large-scale studies of arthropods (44–48) and molluscs (49–51) support the claim that metabolic rate scales with body size in these lineages, as it does in vertebrates. If this is the case, the results from this study suggest that metabolic rate is unlikely to be a cause of systematic rate variation in invertebrates. However, some of these previous studies have failed to account for phylogenetic nonindependence (45) and others do not distinguish between within- and between-species measurements. In addition, the metabolic rate of ectotherms can be influenced by other aspects of their biology such as temperature (46, 52–54) or feeding strategy (45, 52), which may not scale with body size in a simple manner. Invertebrate metabolic rates are also likely to be more temporally variable than mammalian and avian rates even in comparison with the ectothermic vertebrates because greater fluctuations of core temperature may result from the typically smaller body size of invertebrates.

Invertebrates also tend to have more complex life cycles than vertebrates, and metabolic rate can vary between life cycle stages (55). Because germ line cells can be laid down during larval phases, substitution rate may be affected nonsystematically by changes in metabolic rate between different stages. The effects of metabolic rate fluctuation on substitution rate are not known; studies of the relationship between rate variation and life history traits in vertebrates with more complex life cycles, such as some amphibians and fish, may be informative. In summary, although our results do not allow us to conclude that metazoan substitution rate is not systematically affected by metabolic rate, it seems unlikely that any simple or general relationship between these two traits exists.

Could Generation Time Cause the Rate Variation?

It has been proposed that the rate of molecular evolution may be affected by number of DNA replications per unit time, which depends not only on the number of generations per unit time, but also on the number of germ-line cell divisions per generation (56). In invertebrates, it is unlikely that the number of generations will scale simply with body size, although there is some evidence of a positive correlation between generation time and body size for cephalopods (57) and annelids (data from ref. 58 and results not shown). The relationship between arthropod generation length and body size appears to be complex; both traits are separately affected by ecological factors such as season length, temperature (environmental and developmental), and resource availability (59–61). The relationship between body size and generation time may be additionally confounded by effects of latitude and metabolic rate, meaning these traits might not systematically covary across taxa (60).

The number of germ-line divisions per generation is also known to vary both between different invertebrate species and between sexes. Drosophila melanogaster, for example, has ≈25 cell divisions ancestral to sperm cells, whereas the number of cell divisions ancestral to eggs depends on the age of the female. Caenorhabditis elegans, however, has ≈8.2 cell divisions ancestral to sperm and ≈10 ancestral to eggs (62). However, much data needs to be collected before it can be established how important an effect the number of germ-line cell divisions may have in a generation time effect on substitution rate. Because invertebrate generation time does not appear to correlate generally or simply with body size, we may not see a body size effect in invertebrates if generation time is the real cause of the body size effect in vertebrates. Although our results cannot rule out the possibility that variation in generation time is contributing to rate variation across taxa, we have seen no evidence of a systematic effect of this trait.

What Other Factors Could Cause Rate Variation?

In addition to metabolic rate and generation time, there are a number of other possible factors that could influence lineage-specific substitution rate. DNA damage resulting from either metabolites or copy errors will affect substitution rate if these mutations are not repaired correctly, and species can differ in the relative effectiveness of their DNA damage repair and proof-reading correction pathways (63). Substitution rate can also be influenced by a species’ effective population size, which may vary between closely related lineages, due to location (e.g., island versus mainland populations; see ref. 64) or lifestyle (e.g., the degree of inbreeding; see ref. 65). A number of studies have proposed environmental correlates of substitution rate variation, such as temperature (23), salinity (12), or UV exposure (66). We do not currently have sufficient information to test the effects of these variables on the rate of molecular evolution.

Implications for the Molecular Clock.

We have shown that variation in the rate of DNA sequence evolution is widespread amongst metazoan lineages, and it is therefore unwise to assume a constant rate of molecular evolution for the purpose of dating divergences. Furthermore, we have not found any evidence of systematic variation in rate of molecular evolution that would allow the molecular clock to be “corrected.” Gillooly et al. (29) have suggested that, because metabolic rate scales with size in some taxa, it may be possible to correct genetic distances with measures of body size and temperature, but our results suggest that this proposal is unlikely to provide a general solution. The results of this study also do not provide any evidence that discrepancies between molecular and fossil dates for the timing of the metazoan radiation could be due to a general body size effect. However, further research is still needed before such a complex problem can be resolved.

Ascertaining the true causes of rate variation, either generally or specifically concerning variation caused by changes in body size, will need much more research. The interactions of many biological traits and environmental factors are likely to produce complex patterns of lineage-specific rate variation in the invertebrates. Substantial empirical data on life history and ecology in many invertebrate lineages will have to be collected before we can conduct covariance analyses to tease apart their effects. Once this information is available, we may be able to determine whether there is a causal factor, which scales with body size in vertebrates but not invertebrates, underlying variation in the rate of molecular evolution in metazoans. It is hoped that this may eventually put us on the path of establishing the possible causes of rate variation and potentially reliably correcting the metazoan molecular clock.

Methods

Phylogenetic Comparative Methods.

Most statistical methods assume that data points are independent; thus using these methods with nonindependent data points may yield spurious results. Despite this fact, many previous attempts to identify correlates of rates of molecular evolution have used nonindependent data points. The most common kind of nonindependence involves treating each individual lineage as a separate data point. This approach can lead to a single instance of correlated change between trait and rate being counted multiple times if the change occurred in the common ancestor of multiple lineages (67, 68). Such pseudoreplication can be avoided by considering as a single data point the difference in trait values between a pair of taxa (69, 70). Crucially, however, the chosen pairs must be phylogenetically independent, i.e., they must not overlap on the phylogeny. If pairs do overlap, then the portions of the lineages forming the overlap will be counted multiple times, again violating the assumption of independence (e.g., ref. 29).
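The non-overlap requirement can be made concrete with a small sketch. It assumes a rooted tree stored as a child-to-parent mapping and treats two comparison pairs as overlapping if the paths joining their tips share any branch. The tree, the node names, and the helper functions `path_edges` and `pairs_are_independent` are all hypothetical and are not taken from the study.

```python
def path_edges(parent, tip_a, tip_b):
    """Branches on the path joining two tips, each named by its child node."""
    def ancestors(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    chain_a, chain_b = ancestors(tip_a), ancestors(tip_b)
    shared = set(chain_a) & set(chain_b)
    edges = set()
    for chain in (chain_a, chain_b):
        for node in chain:
            if node in shared:  # reached the most recent common ancestor
                break
            edges.add(node)
    return edges


def pairs_are_independent(parent, pairs):
    """True if no branch is used by more than one comparison pair."""
    used = set()
    for tip_a, tip_b in pairs:
        edges = path_edges(parent, tip_a, tip_b)
        if used & edges:
            return False
        used |= edges
    return True


# Hypothetical four-taxon tree: ((A, B)n1, (C, D)n2)root
tree = {"A": "n1", "B": "n1", "C": "n2", "D": "n2", "n1": "root", "n2": "root"}

print(pairs_are_independent(tree, [("A", "B"), ("C", "D")]))  # sister pairs
print(pairs_are_independent(tree, [("A", "C"), ("B", "D")]))  # paths share branches
```

In this toy tree the two sister pairs use disjoint branches, whereas the pairs (A, C) and (B, D) both pass through the branches above n1 and n2 and would therefore count the same evolutionary change twice.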

Therefore, to test the relationship between body size and the rate of molecular evolution in metazoans, we sought taxonomic groups for which we could obtain three types of data: (i) a published compendium of comparable body size data; (ii) publicly available DNA sequences for these taxa; and (iii) independent phylogenetic and/or taxonomic information to allow us to choose a sufficient number of independent comparison pairs. Here, we use the term “invertebrates” to refer to all metazoans except those in subphylum Vertebrata. Although invertebrate is a paraphyletic taxonomic division, it is appropriate here because the body size effect has been shown for vertebrates but no other metazoans. Throughout this article, we will use the term “data set” to refer to a set of phylogenetically independent comparisons for sequences of the same gene made between lineages that differed in average body size within a particular taxonomic group. We had 22 such data sets (see Table 1).

Body Size Data.

We sought body size data from the literature, targeting studies with the same measure of body size for a large number of related taxa (e.g., wing length for Lepidoptera or mantle length for Cephalopoda). A total of 10 taxonomic groups with sufficient body size data and sequence data were obtained. For six of the data sets, Lepidoptera, Arachnida, Cephalopoda, Gastropoda, Echinodermata, and Platyhelminthes (excluding Monogenea), body size measurements were species averages. Family-level body size averages were used for the Hymenoptera, Bivalvia, Annelida, and Monogenea. Size measurements and sources of data are available in Appendix 6, which is published as supporting information on the PNAS web site. Where size ranges were given in the sources, the geometric mean was calculated for use in statistical analyses. Where multiple length measurements were given (e.g., shell length and shell width for Gastropoda), an appropriate compound measurement (e.g., area) was calculated. For the Bivalvia, average family body sizes were determined from fossil species data (obtained from ref. 71). For two groups, Gastropoda and Echinodermata, more than one type of body size measurement was available (e.g., gastropod size measurements included shell height and width for snails; body length for slugs; and shell height, width, and depth for limpets): in these cases, comparisons were only made between species with the same measurement type. We also sought data on other life history traits, but such data was only available for three taxa: maturation time, longevity, fecundity and progeny volume/egg size for the Annelida and Platyhelminthes, and egg size for the Lepidoptera (for references see Appendix 6). These traits were also investigated for a significant association with substitution rate.
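As an illustration of the size summaries described above, the sketch below computes a geometric mean from a reported size range and a simple compound measurement from two lengths. The numbers are invented, and using the product of length and width as the "area" is an assumption; the text does not specify how compound measurements were formed.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def compound_area(length, width):
    """Assumed compound measurement: length multiplied by width."""
    return length * width


# Hypothetical source entry: body length reported as a range of 4 to 9 mm.
size_from_range = geometric_mean([4.0, 9.0])  # close to 6.0, below the arithmetic mean of 6.5

# Hypothetical shell measured as 12 mm long and 5 mm wide.
shell_area = compound_area(12.0, 5.0)

print(size_from_range, shell_area)
```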

DNA Sequence Data.

Mitochondrial and nuclear genes were obtained from GenBank (www.ncbi.nlm.nih.gov). We used only those genes available for a sufficient number of taxa within each group and only sequences >300 base pairs in length. Sequences were aligned by eye using Se-Al (72), and regions of genes that could not be confidently aligned (e.g., hypervariable regions of RNAs) were excluded from the analysis (see Appendix 7, which is published as supporting information on the PNAS web site). We analyzed a total of 22 sequence alignments for seven different genes: three nuclear (histone H3 and 18S and 28S rRNA) and four mitochondrial (16S rRNA, NADH5, and cytochrome oxidases COI and COII).

Phylogenies were estimated for each of these alignments using maximum likelihood [as implemented in PAUP* 4.0 (73)]. We used the HKY85+Γ model of substitution, which includes variation in base composition, transition/transversion ratio, and between-site rate variation (modeled with a discretized gamma distribution with eight categories). All model parameters were estimated from the data for each alignment. HKY85+Γ encapsulates many important features of phylogeny reconstruction but is not overly parameter-rich. Because this study requires a model in which the rate of evolution is allowed to vary on every internal branch of each phylogeny, more parameter-rich models such as GTR may lead to overfitting. Tree topology was estimated using maximum likelihood methods, with the tree–bisection–reconnection search algorithm and a neighbor-joining starting tree.
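To show what the HKY85 part of the model parameterizes, here is a minimal sketch of its instantaneous rate matrix: the rate from base i to base j is proportional to the frequency of j, multiplied by κ when the change is a transition. The function name, the base ordering, and the scaling to one expected substitution per unit time are illustrative choices; this is not the implementation used by the phylogenetic software, and the gamma-distributed rate variation among sites is not included.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def hky85_rate_matrix(freqs, kappa):
    """HKY85 rate matrix as nested dicts, scaled to a mean rate of 1."""
    q = {a: {} for a in BASES}
    for a in BASES:
        for b in BASES:
            if a == b:
                continue
            # A<->G and C<->T are transitions; all other changes are transversions.
            is_transition = (a in PURINES) == (b in PURINES)
            q[a][b] = freqs[b] * (kappa if is_transition else 1.0)
        # Each row of a rate matrix sums to zero.
        q[a][a] = -sum(q[a][b] for b in BASES if b != a)
    # Expected number of substitutions per unit time at equilibrium.
    scale = -sum(freqs[a] * q[a][a] for a in BASES)
    return {a: {b: q[a][b] / scale for b in BASES} for a in BASES}


# Hypothetical parameter values, for illustration only.
example_freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
example_q = hky85_rate_matrix(example_freqs, kappa=4.0)

for base in BASES:
    print(base, [round(example_q[base][other], 3) for other in BASES])
```

With equal base frequencies and κ = 1 the matrix reduces to the Jukes-Cantor model, which is a convenient sanity check.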

Because we estimated topology from the gene sequence data, to avoid circularity, comparison pairs of lineages were chosen based on independent phylogenies from the literature or accepted taxonomic relationships (see Appendix 5). We used these phylogenies to ensure that the comparisons between lineages did not overlap with each other, thus maintaining the phylogenetic independence of data points in our analysis (69). The list of comparison pairs is in Appendix 5.

Testing for Rate Variation.

To determine whether our data sets demonstrated significant rate variation between lineages, we compared the likelihood scores obtained for each alignment under two different models of rate variation: (i) a fixed-rate molecular clock model (with a single substitution rate applying to all lineages) and (ii) a free-rate model (with a separate substitution rate estimated for each branch). Phylogenetic topology was allowed to vary freely in each case. We used the Akaike information criterion (AIC) to compare the likelihood of the two models because models were not nested. Because the free-rate model of sequence evolution contains many more parameters than the fixed-rate model, we chose to use the second-order AIC (AICc), taking the number of base pairs as the effective sample size (74, 75). The AICc is a model comparison statistic that penalizes parameter-rich models, and with its harsher penalties, it outperforms the standard AIC when the number of parameters is not much smaller than the sample size. Following convention, we took as our threshold of significance a difference in AICc score of 10 units (74, 75). As such, if the AICc score for the free-rate model was lower (i.e., better) by 10 or more units than the equivalent score for the fixed-rate model, it was taken as evidence of significant rate variation in the phylogeny.
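The comparison can be sketched as follows, assuming the usual definition AICc = −2 ln L + 2k + 2k(k + 1)/(n − k − 1), under which lower scores indicate better support. The log-likelihoods, parameter counts, and alignment length below are invented for illustration, and the function names are hypothetical.

```python
def aicc(log_likelihood, n_params, n_sites):
    """Second-order AIC, using the number of sites as the sample size."""
    if n_sites - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_sites <= n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n_sites - n_params - 1)


def free_rate_preferred(lnl_clock, k_clock, lnl_free, k_free, n_sites, threshold=10.0):
    """Return (decision, delta), where delta = AICc(clock) - AICc(free-rate)."""
    delta = aicc(lnl_clock, k_clock, n_sites) - aicc(lnl_free, k_free, n_sites)
    return delta >= threshold, delta


# Hypothetical alignment of 600 sites; parameter counts are illustrative.
strong, strong_delta = free_rate_preferred(-5200.0, 25, -5150.0, 43, 600)
weak, weak_delta = free_rate_preferred(-5200.0, 25, -5190.0, 43, 600)

print(strong, round(strong_delta, 2))
print(weak, round(weak_delta, 2))
```

In the first case the gain in likelihood outweighs the penalty for the extra branch-rate parameters by well over 10 units; in the second it does not, so the clock would be retained.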

Testing for a Body Size Effect.

Each data point in our analyses was a comparison between two taxa, contrasting the difference in their average body size with the relative difference in amount of molecular change accumulated since their common ancestor. The independent variable was calculated as ln(BSbig/BSlittle), where BSbig is the body size measurement of the bigger species and BSlittle that of the smaller species. The dependent variable was calculated as ln(λbig/λlittle), where λ was the branch length from each of the species to the last common ancestor of the pair. Branch lengths were taken from the free-rate maximum likelihood phylogeny estimated for each alignment (if the pair did not form a monophyletic group in the phylogeny any intervening taxa were deleted and the branch lengths reestimated using maximum likelihood estimates of the other parameters obtained in the phylogenetic estimation). To investigate whether body size might have different effects on silent and amino acid substitutions in the protein coding sequences, rates of molecular evolution for nonsynonymous and synonymous sites (dN and dS) were also calculated using the codeml program in paml v3.14 (4).
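As a worked illustration of how each data point is formed, the following Python sketch computes the two log ratios for a single comparison pair. The function name and the example measurements are hypothetical. The check for positive values is an assumption of the sketch, since a logarithm of a ratio is undefined for a zero-length branch.

```python
import math


def pair_contrast(size_a, branch_a, size_b, branch_b):
    """Return (ln(BSbig/BSlittle), ln(lambda_big/lambda_little)) for one pair.

    The larger-bodied species always goes in the numerator of both ratios.
    """
    if min(size_a, size_b, branch_a, branch_b) <= 0:
        raise ValueError("sizes and branch lengths must be positive")
    if size_a == size_b:
        raise ValueError("the two species must differ in body size")
    if size_a > size_b:
        big, little = (size_a, branch_a), (size_b, branch_b)
    else:
        big, little = (size_b, branch_b), (size_a, branch_a)
    size_ratio = math.log(big[0] / little[0])
    rate_ratio = math.log(big[1] / little[1])
    return size_ratio, rate_ratio


# Hypothetical pair: sizes in mm, branch lengths in substitutions per site.
x, y = pair_contrast(2.0, 0.05, 8.0, 0.10)
print(x, y)
```

By construction the size ratio is always positive, so the sign of the rate ratio alone shows whether the larger species has the longer (positive) or shorter (negative) branch.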

We tested for an association between branch length and body size separately for each data set, using three different nonparametric tests. (i) The sign test: if there is no association between body size and substitution rate, then the difference between the big and little species’ substitution rates should be randomly distributed around zero. An excess of positive or negative signs in this difference indicates a nonrandom association between variables. (ii) The signed ranks test additionally takes into account the magnitude of the branch length difference when assessing whether there is a significant association between size and rate. (iii) Spearman’s rank correlation accounts for the magnitudes of the differences in both body size and rate. If the substitution rate is systematically influenced by body size, then the greater the size difference between species, the greater the difference in species’ branch lengths. Because the sign and signed ranks tests do not take the magnitude of the body size difference into account, different measures of body size can be used within these tests, as long as they are the same within a comparison pair. For Spearman’s rank correlation, however, body size measurements must be of the same type across the whole data set. Thus, any data sets containing more than one type of body size measurement (e.g., Gastropoda and Echinodermata) were not included in this test. All tests were two-tailed; although the correlation between size and substitution rate in vertebrates is negative, we did not wish to assume the same direction of association in invertebrates.
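The simplest of the three tests can be sketched in a few lines of Python. This is a minimal exact two-tailed sign test under a binomial null with probability 0.5; the function name is hypothetical, and dropping pairs with a difference of exactly zero is an assumption of the sketch rather than a documented step of our analysis.

```python
from math import comb


def sign_test_two_tailed(differences):
    """Exact two-tailed sign test on a list of rate differences (big minus little).

    Zero differences carry no sign and are dropped before testing.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    n_positive = sum(1 for d in nonzero if d > 0)
    k = min(n_positive, n - n_positive)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical log rate ratios for ten comparison pairs.
print(sign_test_two_tailed([0.3, 0.1, 0.4, 0.2, 0.5, 0.1, 0.6, 0.2, 0.3, -0.1]))
```

For the other two tests, SciPy provides `scipy.stats.wilcoxon` (signed ranks) and `scipy.stats.spearmanr` (rank correlation), which could be applied to the same lists of log ratios.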

Additionally, it was necessary to address the problem of multiple tests, due to the number of data sets analyzed. Using a significance threshold of P = 0.05, we would expect around 1 in 20 tests to produce a type I statistical error (false positive) even if there were no true effect. We therefore used the Bonferroni correction to adjust our P values for the number of tests. A Bonferroni-adjusted P value is the single-test P value multiplied by the number of outcomes being tested. Any returned P value that is >0.05 after correction can be considered nonsignificant (if the adjusted P value ends up >1.0, it is rounded down to 1.0).
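The adjustment amounts to the following short Python sketch, in which the function name and the example P values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each P value by the number of tests and cap the result at 1.0."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical single-test P values from three data sets.
adjusted = bonferroni([0.01, 0.04, 0.5])
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```

In this example only the first test remains significant after correction, because 0.04 multiplied by three tests exceeds 0.05.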

Two further analyses were also carried out to increase the power of the tests used to detect a body size effect. First, we increased the number of data points by pooling all data sets into one metaanalysis. If the body size effect is relatively weak or if there are many confounding factors that add noise to measurements of body size or substitution rate, then an effect may only be detected for very large data sets. Because each data set was phylogenetically independent, we were able to combine comparisons from all data sets into one metaanalysis, using one gene from each taxon (selecting the gene with the greatest number of possible comparisons). We then tested for an association between size and rate using sign and signed rank tests. It is important when combining results across data sets to take into account the diversity of the invertebrate phyla; body size and substitution rates may be associated in different ways in different taxa. Just as it cannot be assumed that invertebrates will have the same body size effect as the vertebrates, it cannot be assumed that the direction of a body size effect will be the same for all invertebrate phyla. It is possible that opposing effects in different phyla (e.g., a negative effect in the Mollusca but a positive effect in the Arthropoda) may “cancel each other out” in a metaanalysis. Consequently, data sets were also combined for each phylum, generating within-phylum metaanalyses (see Appendix 3).
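The pooling rule (one gene per taxon, chosen as the gene with the most comparisons) can be sketched as follows. The nested-dictionary layout, the function name, and the taxon and gene labels are hypothetical; ties are broken by taking the first gene encountered, which is an assumption of the sketch.

```python
def pool_comparisons(datasets):
    """Pool log rate ratios across taxa, using one gene per taxon.

    datasets maps taxon -> {gene: [log rate ratio for each comparison pair]}.
    For each taxon, the gene with the most comparison pairs is used.
    """
    pooled = []
    for taxon, genes in datasets.items():
        if not genes:
            continue
        best_gene = max(genes, key=lambda gene: len(genes[gene]))
        pooled.extend(genes[best_gene])
    return pooled


# Hypothetical data: two taxa, each with two genes.
example = {
    "taxon_A": {"gene_1": [0.2, -0.1, 0.4], "gene_2": [0.3]},
    "taxon_B": {"gene_1": [-0.5], "gene_2": [0.1, -0.2]},
}
print(pool_comparisons(example))
```

Restricting the input dictionary to the taxa of a single phylum gives the corresponding within-phylum pool.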

Second, we selected a new set of comparisons to maximize the size differences between pairs. If a large number of data points have only relatively small differences in body size or rates, then any relationship may be masked by measurement error. To deal with this, we first selected one new comparison pair for each data set (or two where possible), between the very largest “Goliath” and very smallest “David” species, where a body size effect would most likely be observed if it existed. Any intervening taxa were pruned from the phylogeny and branch lengths reestimated for the pair. Tests of significant rate variation were then performed for these pairs using the baseml program in paml v3.14 (4). Choosing an outgroup for each pair from the phylogeny, we used a likelihood ratio test to compare the likelihoods obtained for each triplet of species between two models of molecular evolution: a free-rate model (with a separate rate applied to each lineage) and a local molecular clock model (with one rate applied to the comparison pair lineage and one to the outgroup). The relationship between the two variables was assessed by using sign and signed rank tests, both for the set of comparisons that differed significantly in rate and for all of the David and Goliath pairs (see Appendix 4).
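The likelihood ratio test for a single triplet can be sketched in Python as below. The sketch assumes that the two models differ by one free parameter (the local clock constrains the two comparison lineages to share a rate), so that the test statistic is compared with a chi-square distribution with one degree of freedom; that degree-of-freedom value, the function name, and the example log-likelihoods are assumptions of the sketch rather than settings reported here.

```python
import math


def lrt_one_df(lnl_local_clock, lnl_free_rate):
    """Likelihood ratio test of a local clock against a free-rate model (1 df).

    Returns (test statistic, P value). For one degree of freedom the
    chi-square survival function equals erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (lnl_free_rate - lnl_local_clock)
    # Guard against tiny negative values caused by optimizer rounding.
    statistic = max(statistic, 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one David and Goliath triplet.
stat, p = lrt_one_df(-2310.7, -2306.2)
print(stat, p)
```

A small P value indicates that the two lineages of the pair differ significantly in rate, which is the criterion for including the pair in the subset of comparisons that differ significantly in rate.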

Considered together, the analyses included comparison pairs at a range of taxonomic levels, for example between species, genera, and families, allowing the effect of comparison depth to be examined.

Supplementary Material

Supporting Information

Acknowledgments

We thank K. Roy, D. McHugh, P. Fong, and R. Poulin for sending body size data and independent phylogenies and C. Bleidorn and P. Colgan for providing annelid alignments. Special thanks also go to M. Broom, R. Lanfear, and E. Fontanillas for useful discussions. This research was supported by a Biotechnology and Biological Sciences Research Council research grant.

Footnotes

Conflict of interest statement: No conflicts declared.

This paper was submitted directly (Track II) to the PNAS office.

References
