HOW DO GEOLOGICAL SAMPLING BIASES AFFECT STUDIES OF MORPHOLOGICAL EVOLUTION IN DEEP TIME? A CASE STUDY OF PTEROSAUR (REPTILIA: ARCHOSAURIA) DISPARITY (original) (raw)

Abstract

A fundamental contribution of paleobiology to macroevolutionary theory has been the illumination of deep time patterns of diversification. However, recent work has suggested that taxonomic diversity counts taken from the fossil record may be strongly biased by uneven spatiotemporal sampling. Although morphological diversity (disparity) is also frequently used to examine evolutionary radiations, no empirical work has yet addressed how disparity might be affected by uneven fossil record sampling. Here, we use pterosaurs (Mesozoic flying reptiles) as an exemplar group to address this problem. We calculate multiple disparity metrics based upon a comprehensive anatomical dataset including a novel phylogenetic correction for missing data, statistically compare these metrics to four geological sampling proxies, and use multiple regression modeling to assess the importance of uneven sampling and exceptional fossil deposits (Lagerstätten). We find that range-based disparity metrics are strongly affected by uneven fossil record sampling, and should therefore be interpreted cautiously. The robustness of variance-based metrics to sample size and geological sampling suggests that they can be more confidently interpreted as reflecting true biological signals. In addition, our results highlight the problem of high levels of missing data for disparity analyses, indicating a pressing need for more theoretical and empirical work.

One of the fundamental contributions of analytical paleobiology to broader studies of evolution has been the illumination of trends in the diversification of life over geologic time scales (e.g., Sepkoski et al. 1981; Niklas et al. 1983; Benton 1985, 1995, 2010; Valentine 1985; Alroy et al. 2001, 2008; Benton and Emerson 2007; Alroy 2010a, b). However, since paleobiologists began to intensively compile databases documenting the history of biotic diversification, there have been concerns about the potential effects of uneven sampling on observed diversity patterns (e.g., Raup 1972, 1976; Sepkoski 1976). Uneven spatiotemporal sampling of the fossil record takes three major forms: geological biases in the amount of available fossiliferous rock, variation in the quality of fossil preservation through time, and anthropogenic biases in the extent to which the available rock record has been surveyed by paleobiologists.

A major recent focus in paleobiological research has been explicit quantification of the relationship between sampling and observed taxonomic diversity (e.g., Smith 2001; Peters and Foote 2001; Crampton et al. 2003; Peters 2005, 2006; Smith and McGowan 2007; Fröbisch 2008; McGowan and Smith 2008; Barrett et al. 2009; Butler et al. 2009, 2011; Wall et al. 2009; Benson et al. 2010; Mannion et al. 2011) and the development of techniques to ameliorate these sampling biases via statistical modeling or sampling standardization (Alroy et al. 2001, 2008; Smith and McGowan 2007; McGowan and Smith 2008; Alroy 2010a, b). This flurry of research has persuasively demonstrated that, in many cases, sampling biases are present and could have a confounding effect on observed diversity patterns. As a result, understanding genuine patterns of diversification in deep time requires knowledge of sampling.

Almost all such work carried out to date has focused on the relationship between sampling proxies and taxonomic diversity, usually global diversity. Taxonomic diversity, however, is only one possible measure of biotic diversification, and the degree to which sampling biases may affect other deep time biodiversity metrics remains largely unquantified. One of the most frequently assessed “additional” biotic diversification measures is morphological disparity: variability in shape, form, and body plan among organisms (Gould 1991: Foote 1993, 1997; Wills et al. 1994; Ciampaglio et al. 2001; Erwin 2007). Disparity is a morphological equivalent of taxonomic diversity, in that it measures the variety of anatomical features expressed by a clade, and therefore quantifies an aspect of diversification that is not captured by, or necessarily directly correlated with, other metrics such as evolutionary rates (Wagner 1997; Brusatte et al. 2008a, 2011a). Taxonomic diversity and morphological disparity are often both compiled for a group of interest, and temporal patterns in each compared side-by-side, an exercise that can give profound insight into the tempo and mode of evolutionary radiations through deep time (Foote 1993, 1997).

Because morphological disparity is conceptually similar to taxonomic diversity, and because both metrics are assessed by reference to the observed fossil record, it is natural to question whether disparity is subject to similar sampling biases as those documented for diversity estimates. It has already been established that some disparity metrics are sensitive to differences in sample size: range-based measures, for instance, tend to become inflated with larger sample size, and rarefaction is therefore often used to assess whether disparity trends (temporal, taxonomic, etc.) are robust with respect to sample size differences (e.g., Foote 1992; Wills et al. 1994; Wills 1998; Ciampaglio et al. 2001; Deline 2009). This statistical reality raises the theoretical possibility that heterogeneous geological or anthropogenic sampling might bias our perception of disparity through deep time. However, this has yet to be demonstrated empirically, resulting in unanswered questions on the interaction between the geological and fossil records. Are time intervals with greater amounts of accessible fossiliferous rock or higher intensities of sampling by paleobiologists likely to exhibit relatively higher measures of disparity due to sampling bias alone? If sampling biases do affect disparity, how prevalent and problematic is this effect? Are some disparity metrics more sensitive to incomplete sampling than others?

Here, we present the first attempt to address these questions with empirical data by using the fossil record of pterosaurs, the familiar flying reptiles of the Mesozoic, as a case study. We use an extensive new dataset of discrete morphological characters scored across 100 pterosaur species, ranging from the Late Triassic to the Late Cretaceous. From these data, we calculate four measures of disparity, which are plotted over time. These disparity curves are statistically compared to temporal plots of several sampling proxies, compiled using the extensive records in The Paleobiology Database (PBDB), which provide estimates of the potential to sample pterosaur species for each time interval. We use these comparisons to assess the relationship between sampling and disparity in an empirical case study. This is currently one of the few practical examples in which explicit measures of disparity and detailed sampling proxies are both available for a single clade.

Materials and Methods

CHOICE OF STUDY GROUP

Pterosaurs were a diverse clade of Mesozoic diapsid reptiles that were the first vertebrate group to evolve true powered flight (e.g., Padian 1985). The earliest unequivocal pterosaur material is from the middle–late Norian stage of the Late Triassic (c. 210 million years ago [Ma]), and pterosaur fossils are globally distributed from the Early Jurassic onwards (Barrett et al. 2008). Pterosaurs vanished in the end Cretaceous mass extinction event (c. 65.5 Ma). Pterosaurs achieved considerable taxonomic and morphological diversity in their 145+ million year evolutionary history—approximately 140 species are currently recognized, with mass estimates (Witton 2008; Henderson 2010) ranging from just 5–35 g (Anurognathus) up to 259–544 kg for the giant Quetzalcoatlus, whose wingspan is estimated at 10–12 m. Diverse skeletal morphology within the clade suggests the presence of a broad range of ecological adaptations (e.g., Bonaparte 1970; Bennett 2007; Witton and Naish 2008; Lü et al. 2010).The morphological disparity of pterosaurs has been the subject of two previous studies: Dyke et al. (2009) and Prentice et al. (2011). Dyke et al. (2009) analyzed disparity based upon measurements of fore- and hindlimb bones for 28 species, whereas Prentice et al. (2011) focused on discrete morphological characters from across the skeleton scored in 56 species (using the cladistic dataset of Lü and Ji 2006). Both studies found that pterosaur disparity peaked during the Late Jurassic–Early Cretaceous (more than 50 million years after the first appearance of the group in the fossil record), contrasting with many studies of disparity in other major clades that have found evidence for rapid expansions in disparity early on in a clade's evolutionary history (see references in Erwin 2007). Prentice et al. (2011) noted, however, that this apparent late peak in disparity could simply reflect poor sampling in other (particularly earlier) time intervals.

For the current study, we selected pterosaurs as a study group because they fulfill several basic requirements. First, a new, large, and comprehensive species-level morphological cladistic dataset has recently become available for the group (Andres 2010), providing the discrete character data essential for the most common types of disparity analyses. Second, comprehensive data on geological sampling of their fossil record exist (compiled by Barrett et al. 2008 and subsequently incorporated into the PBDB). Third, the relationship between taxonomic diversity and geological sampling has previously been examined for the group (Butler et al. 2009), with results suggesting that sampling biases have had a major impact on observed taxonomic diversity. Most other fossil groups do not currently fulfill these requirements.

ANATOMICAL CHARACTER DATASET

We used an updated version of the character dataset of Andres (2010; see Fig. S1), which includes 100 operational taxonomic units (OTUs), each representing a valid pterosaur species, plus a single outgroup (Euparkeria capensis). This represents 70.4% of 142 known pterosaur species. These were scored for 183 morphological characters, of which 152 are discrete and 31 continuous. This analysis includes all valid species and characters used in previous phylogenetic studies, incorporating data from the entire >140-year history of pterosaur phylogenetic inference.

Two alterations were necessary to convert the dataset into one ideal for a disparity study. First, because our disparity methods require discrete characters, we converted the continuous characters of Andres (2010) into discrete characters with three character states, using the gap-weighting method of Thiele (1993) (see Supporting information for further details).

Second, because of the fragmentary nature of many pterosaur species, a missing data reconstruction method was implemented. The phylogenetic dataset contains a high proportion of missing data (total percentage of missing or inapplicable data: 51.5%; average missing data per OTU: 41.8%; individual species range from 0% to 97.8% missing data). Missing data may sometimes be a problem in disparity studies, because it reduces the number of pairwise character comparisons between taxa during construction of the distance matrix (see below). Preliminary disparity analyses suggested that this high proportion of missing data may generate spurious results (see Results). To cope with this missing data problem, we reconstructed missing character state scores (“?” or “-”) by reference to character optimization on an Adams consensus of the two most parsimonious trees (this consensus tree is identical to one of the most parsimonious trees) generated by the analysis of the modified data matrix of Andres (2010). We used only unambiguously optimized character states in MacClade (Maddison and Maddison 2005). This resulted in a much lower proportion of missing data: approximately 6% of all character scores. Zero missing data could be obtained via ACCTRAN or DELTRAN optimization criteria, but this small improvement was not considered adequate to justify the arbitrary choice of an equivocal optimization regime.

TIMESCALE AND TIME BINS

Our geological timescale is based upon that presented by Gradstein et al. (2004), incorporating recent revisions to the Triassic timescale (see Walker and Geissman 2009). Many recent studies of diversity have used geological stages, or subdivisions thereof, as their temporal bins (e.g., Peters 2005; Smith and McGowan 2007; Barrett et al. 2009; Butler et al. 2009, 2011; Benson et al. 2010; Mannion et al. 2011). However, a potential problem with this approach is the uneven lengths of geological stages (e.g., Mesozoic stages range from 2.3–13 million years in length with a standard deviation of 3.38 million years), because counts of diversity and geological sampling are expected to accumulate in longer time bins. This could result in a spuriously strong correlation between the two metrics, driven merely by variation in bin length. Although recent work has suggested that uneven bins lengths may not be a major problem for studies of Mesozoic vertebrates (Benson et al. 2010; Butler et al. 2011; Mannion et al. 2011), we here adopt an alternative approach in which we divide our data series (both disparity metrics and sampling proxies) into a series of 15 equal-length bins, each 10 million years long, spanning 65–215 Ma. The choice of 10 million year bin lengths represents a compromise between shorter bins, which are problematic because each bin contains fewer species, and longer bins that reduce statistical power due to the smaller number of overall datapoints for pairwise comparisons (see Supporting information). Previous studies of pterosaur disparity (Dyke et al. 2009; Prentice et al. 2011) have used coarser epoch-level bins (e.g., Late Jurassic, Early Cretaceous), which range in length from 15.7 to 45.9 million years. To place each of our species into one or more 10 million year bins, we first obtained a “stratigraphic range” for each from the PBDB (plus some literature not yet incorporated into the PBDB), and then assigned a maximum and minimum age in millions of years to each species using data from Gradstein et al. (2004) and Walker and Geissman (2009). Following the assignment of stratigraphic ranges, species were assigned to all 10 million years bins to which they could possibly belong (see discussion of this approach in Upchurch and Barrett 2005). Forty-three percent of species could be assigned to a single bin, 50% to two bins, and only 7% were assigned to three or more bins. This approach counts single occurrences multiple times and is likely to overestimate the number of taxa (and thus perhaps disparity) in some time bins, but should not underestimate the taxon counts in any bin, unlike other possible approaches.

Because the 185–195 Ma bin (middle Sinemurian–earliest Toarcian; Lower Jurassic) included just a single taxon, it was excluded from all subsequent analyses; most analyses are therefore based upon 14 time bins, with the exception of analyses involving rarefied data, which include 12 bins (see below). Data on geological sampling (see below) were binned in an equivalent manner to species.

DISPARITY CALCULATIONS

Disparity metrics were calculated based on a protocol that is now well established in the literature (e.g., Foote 1993; Wills et al. 1994; Ciampaglio et al. 2001). Because there are many options for using discrete characters to measure disparity, we followed a set of procedures that have been employed in recent studies of disparity in extinct vertebrate clades (e.g., Brusatte et al. 2008a,b, 2011a, b; Ruta 2009; Cisneros and Ruta 2010; Young et al. 2010; Prentice et al. 2011). First, the discrete character dataset was used to generate an Euclidean distance matrix, using the freeware program MATRIX (M. Wills, pers. comm.), which quantified the pairwise differences between all taxa. The distance matrix was next subjected to principal coordinates analysis (PCO) with a negative eigenvalue correction (Calliez method), using the freeware program Gingko (Universitat de Barcelona, http://biodiver.bio.ub.es/ginkgo/). The first PCO axis represents those characters contributing most to the overall variability among species, and each additional axis represents characters of progressively less significance. The various axes, akin to the two-dimensional axes on a bivariate graph, allow each species to be plotted within a multidimensional morphospace. The PCO analysis returned a set of PCO scores for each species, which plots the species on each of the 101 morphospace axes (Fig. 1). Disparity metrics were calculated using the first 22 PCO axes, which encompass 80% of the Gower-transformed dissimilarity (roughly similar to variance). This cut-off was determined by identifying a substantial break in the slope of the scree plot (Wills et al. 1994). Four disparity metrics were then calculated for each time interval (Fig. 2): the sums and products of the ranges and variances on the 22 axes (Wills et al. 1994), using the software program RARE (Wills 1998). Multiplicative measures were normalized by taking the twenty-second root (corresponding to the number of PCO axes used). Range measures encompass the entire spread of morphological variation exhibited during a certain time interval (multidimensional morphospace volume), whereas variance measures quantify mean dissimilarity among forms (the spread of taxa in morphospace compared to its center).

Morphospace plots (PCO1 versus PCO2) for datasets prior to reconstruction of missing data (top) and following missing data (bottom).

Figure 1.

Morphospace plots (PCO1 versus PCO2) for datasets prior to reconstruction of missing data (top) and following missing data (bottom).

Raw (nonrarefied) disparity metrics and geological sampling proxies compared to geological time. Trend lines (including the r2 value) are shown in solid gray. Numbers in italics at the top of the range-based disparity plots indicate the numbering of the time bins used in the text.

Figure 2.

Raw (nonrarefied) disparity metrics and geological sampling proxies compared to geological time. Trend lines (including the _r_2 value) are shown in solid gray. Numbers in italics at the top of the range-based disparity plots indicate the numbering of the time bins used in the text.

Subsampled disparity curves were produced via rarefaction analysis implemented by RARE (Fig. 3). For the rarefied disparity curves that are discussed in this article, disparity was subsampled at a common sample size of five. Three time bins have lower sample sizes than five: 185–175 Mya (four pterosaur taxa), 85–75 Mya (four taxa), and 75–65 Mya (three taxa). Therefore, in the rarefied disparity curves, the 185–175 Mya time bin was deleted and the 85–75 and 75–65 Mya time bins were combined into a single bin with a sample size of seven. The statistical significance of an observed disparity difference between two bins, in both the raw (nonrarefied) and rarefied curves, was determined by the overlap or nonoverlap of 95% bootstrap confidence intervals, which were calculated by RARE (1000 replications). This exercise does not reperform a PCO analysis for each subsampled collection of taxa, but rather assumes that the PCO of the entire dataset represents a “true” morphospace.

Rarefied disparity metrics compared to geological time. Trend lines (including the r2 value) are shown in solid gray. Numbers in italics at the top of the range-based disparity plots indicate the numbering of the time bins used in the text.

Figure 3.

Rarefied disparity metrics compared to geological time. Trend lines (including the _r_2 value) are shown in solid gray. Numbers in italics at the top of the range-based disparity plots indicate the numbering of the time bins used in the text.

SAMPLING PROXIES

We used four separate data series as proxies for sampling of the rock record across the Mesozoic (Fig. 2). Two were based on counts of fossiliferous geological formations and two were based upon the number of distinct fossiliferous collections made by researchers. Formations are formally defined units of stratigraphy, representing lithologically distinct rock units that can be mapped by field geologists. Counts of formations per time bin have been widely used as a proxy for temporal variation in the amount of rock and the variety of paleoenvironments available for paleontologists to sample (e.g., Raup 1976; Peters and Foote 2001). Although formations vary in thickness, erosion rates, preservation potential of vertebrate fossils, amount of exploration by palaeontologists, and aerial extent of outcrop, it is plausible that this variation is randomly distributed. Thus, formation counts may represent an adequate proxy for geological sampling, as suggested by significant correlation with other measures of rock volume (Peters and Foote 2001; Peters 2005; although see Crampton et al. 2003).

Our first sampling proxy is the number of pterosaur-bearing formations (PBFs) through time, calculated from data in the PBDB (downloaded September 2, 2010). PBF counts include any geological formation that has ever yielded published pterosaur fossils, regardless of the completeness or taxonomic validity of that material, and were previously used for comparisons between rock record sampling and pterosaur taxonomic diversity (Butler et al. 2009). Counts of PBFs per bin range from 8 to 54 (average = 31; standard deviation = 14.3). We are confident that this data series is comprehensive as it was entered by one of us (RJB) for the purposes of this and related studies.

Second, we compiled data on tetrapod-bearing formations (TBFs) through time, also calculated from the PBDB (downloaded October 25, 2010). This includes all geological formations that have yielded tetrapod body fossils, although formations that have to date only yielded fully pelagic reptiles, such as plesiosaurs and ichthyosaurs, were excluded from these counts. It should be noted that the exclusion of these formations has a negligible impact on the shape of the TBF data series. Counts of TBFs per bin range from 63 to 226 (average = 134; standard deviation = 52.3). Although data in the PBDB are not yet comprehensive for tetrapods, it is approximately so for some major Mesozoic groups (e.g., dinosaurs and pterosaurs). Temporal variation in TBFs plausibly reflects temporal variation in the total volume of fossiliferous rock from which pterosaurs could potentially be collected. TBFs were chosen as a sampling proxy in addition to PBFs because this metric takes into account both successful sampling of pterosaurs, and sampling failure—that is, it incorporates formations that have yielded fossil tetrapod bones (and thus have the potential to yield pterosaurs) but have to date failed to yield pterosaurs. This mitigates the possibility that formation counts and taxic diversity may be redundant with respect to one another in some cases (Benton 2010). Although PBF counts may be valid as a sampling proxy when pterosaurs are abundant, cosmopolitan, and found in all environments, global decline in pterosaur abundance, or restriction to a subset of facies, may reduce PBF counts, even though genuine opportunities to sample pterosaurs (measured independently by TBFs) are not diminished. TBFs allow us to determine (by comparison) the extent of possible biases introduced by using PBFs and to test for biases in the fossil record of pterosaurs and thus provide a taphonomic control (sensu Bottjer and Jablonski 1988).

Collections-based sampling metrics count the number of fossil collections from each time bin, and have been extensively used in studies attempting to reconstruct accurate diversity patterns via subsampling-based procedures for standardization of sampling effort (e.g., rarefaction, and other methods applied to paleodiversity data by Alroy et al. 2001, 2008; Alroy 2010a, b). These counts are based upon PBDB “collections,” which ideally represent the fossils collected from a single stratigraphic horizon within a small geographical area (i.e., a single small quarry or locality). Although collections-based sampling metrics undoubtedly incorporate aspects of temporal variation in rock volume, they also more explicitly incorporate anthropogenic factors, such as the amount of research effort devoted by paleobiologists to particular time bins. We compiled counts of pterosaur-bearing collections (PBCs) and tetrapod-bearing collections (TBCs, excluding pelagic taxa). Counts of PBCs per bin range from 17 to 99 (average = 59; standard deviation = 28.8), whereas counts of TBCs range from 184 to 2398 (average = 861; standard deviation = 583.6). Bin 3 (late Pliensbachian–early Aalenian) is an outlier for PBCs when compared to other sampling proxies; this likely reflects a monographical bias, where every single specimen known from this time bin has been entered into the PBDB as the result of the recent publication of a comprehensive monograph on Toarcian pterosaurs (Padian 2008).

STATISTICAL COMPARISONS

Statistical analyses were carried out using PAST version 2.0 (Hammer et al. 2001) unless otherwise indicated. Data series were initially examined for evidence of trend, which results in extreme values at either end of data series, thereby artificially inflating correlation coefficients (e.g., McKinney 1990; Alroy 2000; Chatfield 2003; Butler et al. 2011). Values of each data series (disparity measures, both raw and rarefied, and sampling proxy measures) were individually regressed against the midpoint ages of the time bins. Trend was inferred when a statistically significant fit of the least squares regression line was detected. In general, all data series showed evidence for long-term positive trend, with the exception of the last four bins of the raw disparity and PBF/PBC data series, and the last two or three bins of the rarefied disparity data series, which represent outliers from the general pattern for these data series of increasing values through the Mesozoic.

Detrending was carried out by calculating the equation of the least squares regression line for the first 10 bins (for disparity, rarefied disparity and PBFs/PBCs; in which the terminal four values are trend outliers [see above]) or all bins (for TBFs/TBCs). This equation was used to calculate predicted values for each bin, which were subtracted from those actually observed. The Jarque–Bera test confirmed that most data series were normally distributed; TBCs were normally distributed following log10 transformation. Serial correlation was absent from detrended data, as confirmed by nonsignificant values of Pearson's correlation coefficient for each data series compared to itself at a time lag of one bin. Thus, additional data transformations, such as generalized differencing (McKinney 1990; Alroy 2000), were not required.

We made pairwise statistical comparisons between detrended data series using Pearson's product-moment, Spearman's rank, and Kendall's tau. Nondetrended data series were also compared, but the resulting coefficients must be treated with caution because of the confounding problem of trend.

MULTIPLE REGRESSION MODELS TO DETECT THE INFLUENCE OF LAGERSTÄTTEN

Much of our understanding of pterosaur evolution results from exceptional fossil-bearing deposits, often called “Lagerstätten” (Buffetaut 1995; Butler et al. 2009). These deposits yield more abundant pterosaur remains, more complete individuals, or complete, exceptionally preserved individuals. These deposits provide more detailed anatomical data, and more taxonomically diagnostic fossil material, than other deposits and may inflate our view of disparity. Lagerstätten, as described here, present a problem for studies of rock record sampling. They represent sampling opportunities that are not quantified by more general measures of rock and facies availability (e.g., formation counts), and can be difficult to subdivide into discrete “collections” based on any consistent rationale. They are a departure from the “background” pterosaur fossil record both in the number of specimens and quality of preservation.

To elucidate the relationship between disparity metrics, our sampling proxies, and Lagerstätten, we tested generalized least squares multiple regression models with underlying autoregressive models of orders 0–3 (see Supporting information; for other applications to paleontological time series see Hunt et al. 2005; Marx and Uhen 2010; R. B. J. Benson and R. J. Butler unpubl. ms.).

We identified nine exceptional pterosaur-bearing deposits (see Supporting information). A Lagerstätten variable was constructed by counting the number of these deposits in each time bin (cf., R. B. J. Benson and R. J. Butler unpubl. ms.). This attempt to quantify heterogeneity in the pterosaur record is necessarily coarse because the cut-off between Lagerstätten and non-Lagerstätten somewhat intangible and difficult to define. Regression models were assessed using an Akaike information criterion with correction for small sample sizes (AICc: Sugiura 1978) to establish whether sampling alone, a combination of sampling and Lagerstätten, or a null model including neither, provided the best explanation of pterosaur disparity. Good models are indicated by low values of the information criterion, which rewards goodness of fit and penalizes higher numbers of variables. The Jarque–Bera test and Breusch Pagan test confirmed that the residuals from our regression models were normally distributed and homoskedastic in most cases. Several models describing the product of ranges had heteroskedastic residuals that could not be rendered homoskedastic by either square-root transformation or log-transformation. The fit of these models is overestimated (e.g., Burnham and Anderson 2001) and their results should be treated with caution but do not differ qualitatively from those for the sum of ranges.

Results

MORPHOSPACE PLOTS

The morphospace plot (based upon the first two PCO axes) following reconstruction of missing data (Fig. 1B) is highly congruent with results from phylogenetic analysis of the dataset (Fig. S1). Most pterosaur clades recovered by phylogenetic analysis are represented by distinct clusters in morphospace, and the first two PCO axes largely reflect the branching order of these clades. This is not surprising, given that the first PCO axis (and often additional axes) is known to have a strong phylogenetic signal in disparity analyses (e.g., Brusatte et al. 2008b).

The order of clusters along PCO1 (from positive to negative values) is essentially identical to the branching order of the taxa they represent along the backbone of the phylogeny from the root to Ornithocheiroidea (Fig. S1). PCO1 therefore also represents, in part, time, with Triassic pterosaurs plotting with extreme positive values and most Cretaceous pterosaurs plotting with negative values. Variation in PCO2 is largely restricted to within Ornithocheiroidea (the constituent clades of which have approximately the same, negative, PCO1 values), and within this clade the Azhdarchoidea has extreme negative values whereas the Pteranodontoidea has extreme positive values. Whereas there is little correlation between phylogenetic relationships and PCO2 within Azhdarchoidea, branching order within Pteranodontoidea is highly congruent with increasing PCO2 values. This extreme disparity of the azhdarchoids and pteranodontoids relatively late in the evolutionary history of pterosaurs gives the morphospace plot an overall T-shape.

IMPACT OF RECONSTRUCTION OF MISSING DATA

The morphospace plot generated from the initial anatomical character dataset without the reconstruction of missing data (unreconstructed dataset) displays much more scatter of datapoints within clades (Fig. 1A), resulting in overlapping distributions of higher taxa. A Mantel Test, which assesses whether two matrices are significantly correlation with one another, rejects the null hypothesis of no correlation between the unreconstructed and reconstructed matrices (correlation coefficient = 0.4234; P = 0.000). Nevertheless, the low correlation coefficient demonstrates that the two matrices are far from identical.

PCO2 of the unreconstructed dataset is relatively congruent with phylogeny and generally reflects the branching order of pterosaur groups along the backbone of the phylogeny from the root (negative values) to Azhdarchidae (positive values). As such, PCO2 of the unreconstructed analysis (Fig. 1A) is similar to PCO1 of the reconstructed analysis (Fig. 1B), except that clades of pterosaurs do not cluster as tightly, greatly overlap one another in morphospace, and show a much greater range of values.

PCO1, which encapsulates the most variation in the unreconstructed dataset, largely reflects the relative amounts of cranial and postcranial character data available for individual species. Species known only from cranial material have very high positive values, species known only from postcranial material have very high negative values, and species that are very complete (or, at least, have similar amounts of missing data for the cranium and postcranium) have intermediate values. As such, preservation accounts for the greatest variability in the unreconstructed dataset, and distribution of taxa on PCO1 results mainly from similarities in the distribution and proportion of missing data. Clearly, the resulting plot is erroneous, and justifies our usage of the missing data correction. Indeed, the morphospace plot following reconstruction of missing data is much more intuitive in reconstructing all major pterosaur clades as discrete spatial clusters (see above).

Disparity metrics calculated from the unreconstructed dataset generated some intuitively unusual results: for example, variance metrics show a peak in the terminal bin (latest Cretaceous). This result is erroneous given that the bin contains just three taxa representing only two clades (Azhdarchidae, Nyctosauridae) that are also present in adjacent bins with supposedly lower disparity, and probably arises from the highly incomplete and nonoverlapping preservation of the taxa in that bin (i.e., Nyctosaurus lamegoi, 97.8% missing/inapplicable data, known from a humerus only; Arambourgiania philadelphiae, 95.1% missing/inapplicable data, known from a cervical vertebra only; Quetzalcoatlus northropi, 39.3% missing/inapplicable data). The metrics calculated from the reconstructed dataset, however, are more intuitive in recovering low variance in the terminal bin. Furthermore, the error bars on the variance metrics calculated from the unreconstructed dataset are enormous, and several times as large as those for the variance measures calculated from the reconstructed dataset. It is no surprise, therefore, that variance metrics for the unreconstructed and reconstructed datasets are not significantly correlated, or only marginally correlated, with each other (_r_2: 0.20–0.29; Table S1). The two sets of range metrics, on the other hand, are significantly correlated (_r_2: 0.81–0.88; Table S1). In sum, the distance matrix, morphospace plot, and disparity metrics calculated from the unreconstructed data all exhibit unusual and inexplicable properties that seem to be caused by high amounts of missing data. The reconstructed dataset, however, delivers much more intuitive and reasonable results in each case.

RAW DISPARITY RESULTS

The four raw (nonrarefied) disparity metrics are strongly congruent with one another when plotted against time (_r_2 for detrended pairwise comparisons: 0.84–0.99, P < 0.000005; Table S2; Fig. 2). All show low disparity in bins 1–3 (Late Triassic–Early Jurassic: 215–175 Ma) followed by an increase in disparity through bins 4–6 (Middle–Late Jurassic: 175–145 Ma). Although exact patterns differ, each disparity metric shows at least one statistically significant increase between the set of the first three bins and the set of the next three bins (Fig. 2). A significant drop in disparity is observed in bin 7 (earliest Cretaceous: 145–135 Ma) for three metrics (sum and product of ranges, product of variances), but sum of variances shows a slight increase from the value in bin 6. Bins 8–10 (Hauterivian–Albian of the Early Cretaceous: 135–105 Ma) represent a period of consistently high disparity measured by all metrics, and at least one of these time bins exhibits significantly higher disparity than in bin 7 according to range metrics. These high bins are followed by substantially, and generally significantly, lower disparity in bins 11–14 (late Albian–Maastrichtian of the late Early–Late Cretaceous: 105–65 Ma) for the sum and product of ranges and the product of variances. This drop in disparity occurs one bin later (i.e., it is most marked from bin 12, the late Cenomanian–early Santonian onwards) and is less pronounced for the sum of variances. All four metrics indicate that the Late Jurassic–late Early Cretaceous (bins 6–10; Kimmeridgian–Albian) is the time of highest disparity, although the exact timing of peak disparity varies between metrics. For the range metrics and product of variances, some or all of these five bins exhibit significantly greater disparity than that of all other bins that fall outside of this temporal range.

RAREFIED DISPARITY

Rarefied disparity curves are significantly positively correlated with one another (_r_2 for detrended pairwise comparisons: 0.65–0.98, P < 0.003; Table S3) and with raw (nonrarefied) disparity metrics (_r_2 for detrended pairwise comparisons: 0.80–0.99, P ≤ 0.0002, Table S4), although the latter correlations are notably stronger for variance metrics. They show the same general pattern through time as the raw metrics (Figs 2 and 3). First, disparity increases through bins 1–6 (Triassic–Jurassic), with the disparity of bin 6 significantly higher than that of bin 1 for range metrics and the product of variances. This is followed by a decrease in disparity in bin 7 for range metrics and the product of variances, uniformly high disparity in bins 8 and 9, and then a decline in disparity in the terminal bins. This decline begins in bin 10 for the rarefied sum of ranges, bin 11 for the rarefied product of ranges and product of variances, and bin 12 for the rarefied sum of variances. The Late Cretaceous decline in disparity (bins 11–14) is more gradual and less pronounced than portrayed in the raw curves, and disparity is not significantly lower in these bins than in the Late Jurassic–Early Cretaceous (bins 5–10).

RELATIONSHIP BETWEEN SPECIES SAMPLE SIZE AND DISPARITY METRICS

Species sample size (number of pterosaur species in a bin) and raw disparity metrics show a nonlinear relationship (approximating a logistic curve), with little increase in disparity observed above sample sizes of ∼15 species for range metrics and product of variance, and above a sample size of approximately eight species for sum of variance (Fig. S2). The relationship between sample size and raw disparity is strongest for range metrics (Spearman's rho [ρ]= 0.88–0.90, P < 0.00004), weaker for product of variance (ρ= 0.75, P = 0.002) and weakest for sum of variance (ρ= 0.61, P = 0.021; Table S5).

Disparity results based upon rarefaction to a common sample size of five species show a similar nonlinear relationship with original species sample sizes (Fig. S2). The relationship between original species sample size and rarefied disparity is strongest for range metrics and product of variances (ρ= 0.74–0.80, P ≤ 0.006), and weaker for the sum of variances (ρ= 0.60, P = 0.04). Correlation coefficients for rarefied range metrics are lower than those for raw range metrics, whereas those for the rarefied variance data are similar to those calculated for raw variance metrics.

RELATIONSHIP BETWEEN GEOLOGICAL SAMPLING PROXIES AND DISPARITY METRICS

Disparity data series show significant positive correlations with PBCs and PBFs prior to detrending (Table 1). This relationship is nonsignificant for TBCs and TBFs, primarily due to the last four time bins (bins 11–14), which exhibit high sampling for TBCs and TBFs and low disparity (Fig. 2). When these four bins are excluded, significant positive correlations are detected between TBFs and all disparity metrics, and between TBCs and the sum of ranges (but not other disparity metrics). However, because of the presence of trend in all data series (which inflates correlation coefficients) these correlations based upon untransformed data series must be treated with caution.

Table 1.

Results of pairwise comparisons (using Pearson's product-moment) between nonrarefied disparity metrics and geological sampling proxies, prior to detrending. Values in brackets exclude outliers (the last four time bins are excluded for comparisons involving TBFs, TBCs, and PBFs; the last four time bins plus time bin 3 are excluded for comparisons involving PBCs). Results in bold are statistically significant (alpha = 0.05).

TBFs Log10(TBCs) PBFs PBCs
Sum of ranges 0.09; P = 0.77 0.06; P = 0.85 0.61; P =0.02 0.53; P =0.049
(0.88; P =0.0007) (0.65; P =0.04) (0.93; P =7.6 ×10−05) (0.81; P =0.005)
Product of ranges 0.08; P = 0.79 0.02; P = 0.94 0.58; P =0.03 0.55; P =0.04
(0.87; P =0.0009) (0.61; P = 0.06) (0.94; P =7.1 × 10−05) (0.94; P =0.0001)
Sum of variances 0.31; P = 0.28 0.13; P = 0.67 0.71; P =0.005 0.61; P =0.02
(0.69; P =0.03) (0.38; P = 0.28) (0.69; P =0.027) (0.64; P =0.02)
Product of variances 0.11; P = 0.70 –0.01; P = 0.99 0.58; P =0.03 0.58; P =0.03
(0.72; P =0.02) (0.41; P = 0.24) (0.87; P =0.001) (0.59; P =0.03)
TBFs Log10(TBCs) PBFs PBCs
Sum of ranges 0.09; P = 0.77 0.06; P = 0.85 0.61; P =0.02 0.53; P =0.049
(0.88; P =0.0007) (0.65; P =0.04) (0.93; P =7.6 ×10−05) (0.81; P =0.005)
Product of ranges 0.08; P = 0.79 0.02; P = 0.94 0.58; P =0.03 0.55; P =0.04
(0.87; P =0.0009) (0.61; P = 0.06) (0.94; P =7.1 × 10−05) (0.94; P =0.0001)
Sum of variances 0.31; P = 0.28 0.13; P = 0.67 0.71; P =0.005 0.61; P =0.02
(0.69; P =0.03) (0.38; P = 0.28) (0.69; P =0.027) (0.64; P =0.02)
Product of variances 0.11; P = 0.70 –0.01; P = 0.99 0.58; P =0.03 0.58; P =0.03
(0.72; P =0.02) (0.41; P = 0.24) (0.87; P =0.001) (0.59; P =0.03)

Table 1.

Results of pairwise comparisons (using Pearson's product-moment) between nonrarefied disparity metrics and geological sampling proxies, prior to detrending. Values in brackets exclude outliers (the last four time bins are excluded for comparisons involving TBFs, TBCs, and PBFs; the last four time bins plus time bin 3 are excluded for comparisons involving PBCs). Results in bold are statistically significant (alpha = 0.05).

TBFs Log10(TBCs) PBFs PBCs
Sum of ranges 0.09; P = 0.77 0.06; P = 0.85 0.61; P =0.02 0.53; P =0.049
(0.88; P =0.0007) (0.65; P =0.04) (0.93; P =7.6 ×10−05) (0.81; P =0.005)
Product of ranges 0.08; P = 0.79 0.02; P = 0.94 0.58; P =0.03 0.55; P =0.04
(0.87; P =0.0009) (0.61; P = 0.06) (0.94; P =7.1 × 10−05) (0.94; P =0.0001)
Sum of variances 0.31; P = 0.28 0.13; P = 0.67 0.71; P =0.005 0.61; P =0.02
(0.69; P =0.03) (0.38; P = 0.28) (0.69; P =0.027) (0.64; P =0.02)
Product of variances 0.11; P = 0.70 –0.01; P = 0.99 0.58; P =0.03 0.58; P =0.03
(0.72; P =0.02) (0.41; P = 0.24) (0.87; P =0.001) (0.59; P =0.03)
TBFs Log10(TBCs) PBFs PBCs
Sum of ranges 0.09; P = 0.77 0.06; P = 0.85 0.61; P =0.02 0.53; P =0.049
(0.88; P =0.0007) (0.65; P =0.04) (0.93; P =7.6 ×10−05) (0.81; P =0.005)
Product of ranges 0.08; P = 0.79 0.02; P = 0.94 0.58; P =0.03 0.55; P =0.04
(0.87; P =0.0009) (0.61; P = 0.06) (0.94; P =7.1 × 10−05) (0.94; P =0.0001)
Sum of variances 0.31; P = 0.28 0.13; P = 0.67 0.71; P =0.005 0.61; P =0.02
(0.69; P =0.03) (0.38; P = 0.28) (0.69; P =0.027) (0.64; P =0.02)
Product of variances 0.11; P = 0.70 –0.01; P = 0.99 0.58; P =0.03 0.58; P =0.03
(0.72; P =0.02) (0.41; P = 0.24) (0.87; P =0.001) (0.59; P =0.03)

Following detrending, significant positive correlations between disparity and PBFs/PBCs were recovered for both range metrics (Table 2). Significant positive correlations were also recovered between PBFs/PBCs and sum and product of variances, but only using Pearson's correlation coefficient. Significant positive correlations between TBFs/TBCs and range metrics were recovered only when the last four time bins were excluded (see above). TBFs/TBCs and variance metrics were not significantly correlated, regardless of the exclusion of the last four time bins. In general, the detrended rarefied disparity series do not show significant statistical correlations with sampling proxies, although this may in some instances reflect low statistical power: moderately high and marginally nonsignificant to marginally significant correlation coefficients were recovered for all pairwise comparisons involving PBFs and PBCs (Table 3).

Table 2.

Results of pairwise comparisons (using Pearson's product-moment) between nonrarefied disparity metrics and geological sampling proxies, after detrending. Values in brackets exclude outliers (the last four time bins are excluded for comparisons involving TBFs, TBCs, and PBFs; the last four time bins plus time bin 3 are excluded for comparisons involving PBCs). Results in bold are statistically significant (alpha = 0.05).

DT TBFs DT Log10(TBCs) DT PBFs DT PBCs
DT sum of ranges 0.01; P = 0.97 –0.01; P = 0.96 0.80; P =0.0006 0.70; P =0.005
**(0.74; P**= 0.01) (0.80; P =0.006) (0.75; P =0.012) (0.78; P =0.01)
DT product –0.01; P = 0.99 –0.04; P = 0.91 0.79; P =0.0008 0.71; P =0.004
of ranges (0.70; P =0.03) (0.67; P =0.03) (0.76; P =0.01) (0.80; P =0.01)
DT sum of –0.12; P = 0.69 –0.12; P = 0.57 0.75; P =0.005 0.75; P =0.002
variances (0.06; P = 0.86) (0.16; P = 0.66) (0.21; P = 0.55) (0.21; P = 0.55)
DT product –0.12; P = 0.68 –0.15; P = 0.62 0.73; P =0.003 0.69; P =0.006
of variances (0.27; P = 0.45) (0.22; P = 0.55) (0.54; P = 0.11) (0.47; P = 0.20)
DT TBFs DT Log10(TBCs) DT PBFs DT PBCs
DT sum of ranges 0.01; P = 0.97 –0.01; P = 0.96 0.80; P =0.0006 0.70; P =0.005
**(0.74; P**= 0.01) (0.80; P =0.006) (0.75; P =0.012) (0.78; P =0.01)
DT product –0.01; P = 0.99 –0.04; P = 0.91 0.79; P =0.0008 0.71; P =0.004
of ranges (0.70; P =0.03) (0.67; P =0.03) (0.76; P =0.01) (0.80; P =0.01)
DT sum of –0.12; P = 0.69 –0.12; P = 0.57 0.75; P =0.005 0.75; P =0.002
variances (0.06; P = 0.86) (0.16; P = 0.66) (0.21; P = 0.55) (0.21; P = 0.55)
DT product –0.12; P = 0.68 –0.15; P = 0.62 0.73; P =0.003 0.69; P =0.006
of variances (0.27; P = 0.45) (0.22; P = 0.55) (0.54; P = 0.11) (0.47; P = 0.20)

Table 2.

Results of pairwise comparisons (using Pearson's product-moment) between nonrarefied disparity metrics and geological sampling proxies, after detrending. Values in brackets exclude outliers (the last four time bins are excluded for comparisons involving TBFs, TBCs, and PBFs; the last four time bins plus time bin 3 are excluded for comparisons involving PBCs). Results in bold are statistically significant (alpha = 0.05).

DT TBFs DT Log10(TBCs) DT PBFs DT PBCs
DT sum of ranges 0.01; P = 0.97 –0.01; P = 0.96 0.80; P =0.0006 0.70; P =0.005
**(0.74; P**= 0.01) (0.80; P =0.006) (0.75; P =0.012) (0.78; P =0.01)
DT product –0.01; P = 0.99 –0.04; P = 0.91 0.79; P =0.0008 0.71; P =0.004
of ranges (0.70; P =0.03) (0.67; P =0.03) (0.76; P =0.01) (0.80; P =0.01)
DT sum of –0.12; P = 0.69 –0.12; P = 0.57 0.75; P =0.005 0.75; P =0.002
variances (0.06; P = 0.86) (0.16; P = 0.66) (0.21; P = 0.55) (0.21; P = 0.55)
DT product –0.12; P = 0.68 –0.15; P = 0.62 0.73; P =0.003 0.69; P =0.006
of variances (0.27; P = 0.45) (0.22; P = 0.55) (0.54; P = 0.11) (0.47; P = 0.20)
DT TBFs DT Log10(TBCs) DT PBFs DT PBCs
DT sum of ranges 0.01; P = 0.97 –0.01; P = 0.96 0.80; P =0.0006 0.70; P =0.005
**(0.74; P**= 0.01) (0.80; P =0.006) (0.75; P =0.012) (0.78; P =0.01)
DT product –0.01; P = 0.99 –0.04; P = 0.91 0.79; P =0.0008 0.71; P =0.004
of ranges (0.70; P =0.03) (0.67; P =0.03) (0.76; P =0.01) (0.80; P =0.01)
DT sum of –0.12; P = 0.69 –0.12; P = 0.57 0.75; P =0.005 0.75; P =0.002
variances (0.06; P = 0.86) (0.16; P = 0.66) (0.21; P = 0.55) (0.21; P = 0.55)
DT product –0.12; P = 0.68 –0.15; P = 0.62 0.73; P =0.003 0.69; P =0.006
of variances (0.27; P = 0.45) (0.22; P = 0.55) (0.54; P = 0.11) (0.47; P = 0.20)

Table 3.

Results of pairwise comparisons (using Pearson's product-moment) between rarefied disparity data series and geological sampling proxies, after detrending. Results in bold are statistically significant (alpha = 0.05).

DT TBFs DT Log10(TBCs) DT PBFs DT PBCs
DT sum of ranges (rarefied) 0.10; P = 0.78 0.28; P = 0.40 0.57; P = 0.07 0.51; P = 0.11
DT product of ranges (rarefied) 0.13; P = 0.71 0.26; P = 0.44 0.60; P = 0.054 0.53; P = 0.09
DT sum of variances (rarefied) 0.08; P = 0.81 0.24; P = 0.48 0.60; P =0.049 0.51; P = 0.11
DT product of variances (rarefied) 0.17; P = 0.61 0.32; P = 0.34 0.65; P =0.03 0.61; P =0.049
DT TBFs DT Log10(TBCs) DT PBFs DT PBCs
DT sum of ranges (rarefied) 0.10; P = 0.78 0.28; P = 0.40 0.57; P = 0.07 0.51; P = 0.11
DT product of ranges (rarefied) 0.13; P = 0.71 0.26; P = 0.44 0.60; P = 0.054 0.53; P = 0.09
DT sum of variances (rarefied) 0.08; P = 0.81 0.24; P = 0.48 0.60; P =0.049 0.51; P = 0.11
DT product of variances (rarefied) 0.17; P = 0.61 0.32; P = 0.34 0.65; P =0.03 0.61; P =0.049

Table 3.

Results of pairwise comparisons (using Pearson's product-moment) between rarefied disparity data series and geological sampling proxies, after detrending. Results in bold are statistically significant (alpha = 0.05).

DT TBFs DT Log10(TBCs) DT PBFs DT PBCs
DT sum of ranges (rarefied) 0.10; P = 0.78 0.28; P = 0.40 0.57; P = 0.07 0.51; P = 0.11
DT product of ranges (rarefied) 0.13; P = 0.71 0.26; P = 0.44 0.60; P = 0.054 0.53; P = 0.09
DT sum of variances (rarefied) 0.08; P = 0.81 0.24; P = 0.48 0.60; P =0.049 0.51; P = 0.11
DT product of variances (rarefied) 0.17; P = 0.61 0.32; P = 0.34 0.65; P =0.03 0.61; P =0.049
DT TBFs DT Log10(TBCs) DT PBFs DT PBCs
DT sum of ranges (rarefied) 0.10; P = 0.78 0.28; P = 0.40 0.57; P = 0.07 0.51; P = 0.11
DT product of ranges (rarefied) 0.13; P = 0.71 0.26; P = 0.44 0.60; P = 0.054 0.53; P = 0.09
DT sum of variances (rarefied) 0.08; P = 0.81 0.24; P = 0.48 0.60; P =0.049 0.51; P = 0.11
DT product of variances (rarefied) 0.17; P = 0.61 0.32; P = 0.34 0.65; P =0.03 0.61; P =0.049

MULTIPLE REGRESSION MODELS

Range metrics are generally well explained by models combining a geological sampling proxy with Lagerstätten (Table 4). These models exhibit low AICc scores, _r_2 values around 0.60 or higher, and a statistically significant slope for all included variables (other than PBCs in models including PBCs and Lagerstätten). Product of variances is best explained by our Lagerstätten variable, which had a statistically significant slope, although _r_2 for this regression was lower (0.36). The sum of variances is best explained by the null model, consistent with correlation results suggesting only a weak relationship with sampling, and the robustness of this disparity metric to rarefaction. All these results are broadly congruent with the results of pairwise comparisons described above.

Table 4.

Comparison of a priori generalized least squares regression models attempting to explain observed pterosaur disparity in terms of background sampling and Lagerstätten. The _R_2 shown is the generalized _R_2 proposed by Madala (1983) and Nagelkerke (1991): negative values are associated with models that are worse than the null model. P is the probability that the model fit is better than the null hypothesis, this was impossible to estimate for the fit of Lagerstätten to product of variances (but is likely strongly significant). AICc is a version of the Akaike information criterion appropriate for small sample sizes, ΔAICc is the difference between the model and the best model for that data series (n.b. the best model has ΔAICc of zero). For sum of variances, the null model (not shown) provides the lowest AICc score (Supporting information). Parameter values within regression models are documented in the Supporting information. The best models (based on statistical significance and Akaike weights) are highlighted in bold.

Model _R_2 P (one-tailed) AICc ΔAICc wi _R_2 P (one-tailed) AICc ΔAICc wi
Sum of ranges (N = 14) Product of ranges (N = 14)
Lagerstätten 0.684 <0.0001* 102.39 0.10 0.41 0.678 <0.0001* 21.00 0.00 0.94
PBF 0.176 0.0500* 115.83 13.54 <0.01 –0.338 0.9782 40.93 19.93 <0.01
PBF +
Lagerstätten 0.765 <0.0001* 102.29 0.00 0.43 0.630 0.0005* 26.96 5.97 0.05
PBC 0.003 0.4148 118.49 16.20 <0.01 –0.499 0.9914 42.52 21.52 <0.01
PBC +
Lagerstätten 0.632 0.0005* 108.60 6.31 0.02 0.381 0.0048* 32.19 11.19 <0.01
TBF –0.104 0.8800 119.91 17.63 <0.01 –0.807 0.9980 45.14 24.14 <0.01
TBF +
Lagerstätten 0.759 0.0001* 105.25 2.34 0.13 0.642 0.0031* 30.52 9.52 0.01
TBC –0.650 0.9960 125.54 23.25 <0.01 –1.640 0.9999 50.44 29.44 <0.01
TBC +
Lagerstätten 0.588 0.0031* 113.69 9.89 <0.01 –0.026 0.7260 39.26 18.26 <0.01
Sum of variances (N = 14) Product of variances (N = 14)
Lagerstätten 0.060 0.1752 86.89 2.44 0.22 0.359 –7.49 0.00 0.92
PBF 0.285 0.9696 91.28 6.82 0.02 0.690 0.9967 8.08 15.57 <0.01
PBF +
Lagerstätten 0.190 0.8516 94.24 9.78 0.01 0.144 0.9153 4.66 12.15 <0.01
PBC 0.464 0.9896 93.10 8.64 0.01 0.701 0.9968 8.16 15.65 <0.01
PBC +
Lagerstätten 0.319 0.9278 95.68 11.22 <0.01 0.329 0.9770 6.75 14.24 <0.01
TBF 0.530 0.9927 93.72 9.27 0.01 1.268 0.9997 12.20 19.69 <0.01
TBF +
Lagerstätten 0.421 0.9572 96.73 12.27 <0.01 0.600 0.9949 9.35 16.84 <0.01
TBC 1.175 0.9995 98.64 14.19 <0.01 2.277 >0.9999 17.35 24.83 <0.01
TBC +
Lagerstätten 1.010 0.9963 101.58 17.13 <0.01 1.307 0.9997 14.48 21.97 <0.01
Model _R_2 P (one-tailed) AICc ΔAICc wi _R_2 P (one-tailed) AICc ΔAICc wi
Sum of ranges (N = 14) Product of ranges (N = 14)
Lagerstätten 0.684 <0.0001* 102.39 0.10 0.41 0.678 <0.0001* 21.00 0.00 0.94
PBF 0.176 0.0500* 115.83 13.54 <0.01 –0.338 0.9782 40.93 19.93 <0.01
PBF +
Lagerstätten 0.765 <0.0001* 102.29 0.00 0.43 0.630 0.0005* 26.96 5.97 0.05
PBC 0.003 0.4148 118.49 16.20 <0.01 –0.499 0.9914 42.52 21.52 <0.01
PBC +
Lagerstätten 0.632 0.0005* 108.60 6.31 0.02 0.381 0.0048* 32.19 11.19 <0.01
TBF –0.104 0.8800 119.91 17.63 <0.01 –0.807 0.9980 45.14 24.14 <0.01
TBF +
Lagerstätten 0.759 0.0001* 105.25 2.34 0.13 0.642 0.0031* 30.52 9.52 0.01
TBC –0.650 0.9960 125.54 23.25 <0.01 –1.640 0.9999 50.44 29.44 <0.01
TBC +
Lagerstätten 0.588 0.0031* 113.69 9.89 <0.01 –0.026 0.7260 39.26 18.26 <0.01
Sum of variances (N = 14) Product of variances (N = 14)
Lagerstätten 0.060 0.1752 86.89 2.44 0.22 0.359 –7.49 0.00 0.92
PBF 0.285 0.9696 91.28 6.82 0.02 0.690 0.9967 8.08 15.57 <0.01
PBF +
Lagerstätten 0.190 0.8516 94.24 9.78 0.01 0.144 0.9153 4.66 12.15 <0.01
PBC 0.464 0.9896 93.10 8.64 0.01 0.701 0.9968 8.16 15.65 <0.01
PBC +
Lagerstätten 0.319 0.9278 95.68 11.22 <0.01 0.329 0.9770 6.75 14.24 <0.01
TBF 0.530 0.9927 93.72 9.27 0.01 1.268 0.9997 12.20 19.69 <0.01
TBF +
Lagerstätten 0.421 0.9572 96.73 12.27 <0.01 0.600 0.9949 9.35 16.84 <0.01
TBC 1.175 0.9995 98.64 14.19 <0.01 2.277 >0.9999 17.35 24.83 <0.01
TBC +
Lagerstätten 1.010 0.9963 101.58 17.13 <0.01 1.307 0.9997 14.48 21.97 <0.01

Table 4.

Comparison of a priori generalized least squares regression models attempting to explain observed pterosaur disparity in terms of background sampling and Lagerstätten. The _R_2 shown is the generalized _R_2 proposed by Madala (1983) and Nagelkerke (1991): negative values are associated with models that are worse than the null model. P is the probability that the model fit is better than the null hypothesis, this was impossible to estimate for the fit of Lagerstätten to product of variances (but is likely strongly significant). AICc is a version of the Akaike information criterion appropriate for small sample sizes, ΔAICc is the difference between the model and the best model for that data series (n.b. the best model has ΔAICc of zero). For sum of variances, the null model (not shown) provides the lowest AICc score (Supporting information). Parameter values within regression models are documented in the Supporting information. The best models (based on statistical significance and Akaike weights) are highlighted in bold.

Model _R_2 P (one-tailed) AICc ΔAICc wi _R_2 P (one-tailed) AICc ΔAICc wi
Sum of ranges (N = 14) Product of ranges (N = 14)
Lagerstätten 0.684 <0.0001* 102.39 0.10 0.41 0.678 <0.0001* 21.00 0.00 0.94
PBF 0.176 0.0500* 115.83 13.54 <0.01 –0.338 0.9782 40.93 19.93 <0.01
PBF +
Lagerstätten 0.765 <0.0001* 102.29 0.00 0.43 0.630 0.0005* 26.96 5.97 0.05
PBC 0.003 0.4148 118.49 16.20 <0.01 –0.499 0.9914 42.52 21.52 <0.01
PBC +
Lagerstätten 0.632 0.0005* 108.60 6.31 0.02 0.381 0.0048* 32.19 11.19 <0.01
TBF –0.104 0.8800 119.91 17.63 <0.01 –0.807 0.9980 45.14 24.14 <0.01
TBF +
Lagerstätten 0.759 0.0001* 105.25 2.34 0.13 0.642 0.0031* 30.52 9.52 0.01
TBC –0.650 0.9960 125.54 23.25 <0.01 –1.640 0.9999 50.44 29.44 <0.01
TBC +
Lagerstätten 0.588 0.0031* 113.69 9.89 <0.01 –0.026 0.7260 39.26 18.26 <0.01
Sum of variances (N = 14) Product of variances (N = 14)
Lagerstätten 0.060 0.1752 86.89 2.44 0.22 0.359 –7.49 0.00 0.92
PBF 0.285 0.9696 91.28 6.82 0.02 0.690 0.9967 8.08 15.57 <0.01
PBF +
Lagerstätten 0.190 0.8516 94.24 9.78 0.01 0.144 0.9153 4.66 12.15 <0.01
PBC 0.464 0.9896 93.10 8.64 0.01 0.701 0.9968 8.16 15.65 <0.01
PBC +
Lagerstätten 0.319 0.9278 95.68 11.22 <0.01 0.329 0.9770 6.75 14.24 <0.01
TBF 0.530 0.9927 93.72 9.27 0.01 1.268 0.9997 12.20 19.69 <0.01
TBF +
Lagerstätten 0.421 0.9572 96.73 12.27 <0.01 0.600 0.9949 9.35 16.84 <0.01
TBC 1.175 0.9995 98.64 14.19 <0.01 2.277 >0.9999 17.35 24.83 <0.01
TBC +
Lagerstätten 1.010 0.9963 101.58 17.13 <0.01 1.307 0.9997 14.48 21.97 <0.01
Model _R_2 P (one-tailed) AICc ΔAICc wi _R_2 P (one-tailed) AICc ΔAICc wi
Sum of ranges (N = 14) Product of ranges (N = 14)
Lagerstätten 0.684 <0.0001* 102.39 0.10 0.41 0.678 <0.0001* 21.00 0.00 0.94
PBF 0.176 0.0500* 115.83 13.54 <0.01 –0.338 0.9782 40.93 19.93 <0.01
PBF +
Lagerstätten 0.765 <0.0001* 102.29 0.00 0.43 0.630 0.0005* 26.96 5.97 0.05
PBC 0.003 0.4148 118.49 16.20 <0.01 –0.499 0.9914 42.52 21.52 <0.01
PBC +
Lagerstätten 0.632 0.0005* 108.60 6.31 0.02 0.381 0.0048* 32.19 11.19 <0.01
TBF –0.104 0.8800 119.91 17.63 <0.01 –0.807 0.9980 45.14 24.14 <0.01
TBF +
Lagerstätten 0.759 0.0001* 105.25 2.34 0.13 0.642 0.0031* 30.52 9.52 0.01
TBC –0.650 0.9960 125.54 23.25 <0.01 –1.640 0.9999 50.44 29.44 <0.01
TBC +
Lagerstätten 0.588 0.0031* 113.69 9.89 <0.01 –0.026 0.7260 39.26 18.26 <0.01
Sum of variances (N = 14) Product of variances (N = 14)
Lagerstätten 0.060 0.1752 86.89 2.44 0.22 0.359 –7.49 0.00 0.92
PBF 0.285 0.9696 91.28 6.82 0.02 0.690 0.9967 8.08 15.57 <0.01
PBF +
Lagerstätten 0.190 0.8516 94.24 9.78 0.01 0.144 0.9153 4.66 12.15 <0.01
PBC 0.464 0.9896 93.10 8.64 0.01 0.701 0.9968 8.16 15.65 <0.01
PBC +
Lagerstätten 0.319 0.9278 95.68 11.22 <0.01 0.329 0.9770 6.75 14.24 <0.01
TBF 0.530 0.9927 93.72 9.27 0.01 1.268 0.9997 12.20 19.69 <0.01
TBF +
Lagerstätten 0.421 0.9572 96.73 12.27 <0.01 0.600 0.9949 9.35 16.84 <0.01
TBC 1.175 0.9995 98.64 14.19 <0.01 2.277 >0.9999 17.35 24.83 <0.01
TBC +
Lagerstätten 1.010 0.9963 101.58 17.13 <0.01 1.307 0.9997 14.48 21.97 <0.01

The disparity metrics that are most heavily influenced by sampling (range metrics) show higher (i.e., worse) AICc scores when compared to tetrapod-fossil-based sampling proxies than pterosaur-fossil-based proxies. This is expected, given the departure of the pterosaur record from that of other tetrapods in the last 40 million years of the Cretaceous (Fig. 2). PBCs show a worse fit to disparity than do PBFs. Unfortunately, the low number of datapoints (14 time bins) means that fitting multiple explanatory variables is difficult and preliminary analyses excluding problematic bins (e.g., bin 3, which contains high values of PBC due to monographic biases; see Methods) retrieved higher AICc scores. However, the general competitiveness of sampling plus Lagerstätten models for explaining range metrics suggests that these disparity measures are highly biased by rock record and other sampling factors.

The residuals from our regression line show the same broad temporal patterns as those derived from rarefaction (Supporting information): gradual ascent to high disparity values in bins 6–11 (Late Jurassic to earliest Late Cretaceous), followed by low disparity in the terminal three bins (Late Cretaceous).

Discussion

THE RELATIONSHIPS BETWEEN DISPARITY AND SPECIES AND GEOLOGICAL SAMPLING

Our results demonstrate a tight correlation between the number of pterosaur species within a time bin and the values for range-based disparity metrics derived from that sample. This result is expected given that range metrics essentially measure the entire volume of morphospace occupied by a cluster of taxa and are therefore highly sensitive to the addition of outliers (Wills et al. 1994; Ciampaglio et al. 2001). Sample sizes for pterosaurs exceeding approximately 15 species are required before the correlation between range metrics and species sample size levels off, and such sample sizes are only known from three time bins, each of which contains at least one Lagerstätte. Given that range metrics are strongly affected by species sample size, it is unsurprising that they are significantly correlated with geological sampling proxies and the presence or absence of Lagerstätten: increases in geological sampling from one time bin to the next are likely to lead to higher species sampling, and thus higher range metrics. Range-based measures of disparity are therefore likely to be substantially biased by uneven spatiotemporal sampling of the fossil record. This problem is likely to be less severe for groups that are abundant in the fossil record and thus have consistently large sample sizes, or for coarse time bins (which will generally increase sample sizes). Rarefaction to a common sample size partially ameliorates but does not remove this bias: rarefied range metrics remain strongly correlated with original sample size. The inclusion of Lagerstätten as an explanatory variable in the best multiple regression models for range metrics supports the hypothesis that exceptional deposits are likely to create biases in our understanding of pterosaur evolution, at least for some biodiversity measures (Buffetaut 1995; Butler et al. 2009; contra Dyke et al. 2009).

The relationship between the number of pterosaur species within a time bin and variance-based disparity metrics is much weaker than for range metrics, and is weakest for sum of variances. The weak correlation between species sample size and variance metrics is also predicted by theory and observed in model datasets (Foote 1992; Wills et al. 1994; Ciampaglio et al. 2001). The weak correlation between species sample size and sum of variances levels off at sample sizes exceeding approximately eight pterosaur species, a sample size that is achieved in approximately two-thirds of the Mesozoic time bins. Increased sampling of Early–Middle Jurassic and Late Cretaceous pterosaurs may in future nearly completely remove the influence of species sample size from the sum of variances. In most cases, variance metrics show weaker correlations with geological sampling proxies than do range metrics. Multiple regression models suggest that product of variances is influenced by the presence or absence of Lagerstätten, but that rock record biases have only a very weak impact on sum of variances. Rarefaction of variance metrics to a common sample size does not alter the (relatively weak) relationship between variance metrics and species sample size.

In summary, our empirical results confirm that range-based disparity metrics are more severely biased by variation in species sample size than variance-based metrics, and that range metrics are therefore more likely than variance metrics to be biased by uneven spatiotemporal sampling of the fossil record, with the effect likely to be greatest in groups with sparser fossil sampling and small sample sizes for at least some time bins (such as many fossil vertebrate groups), especially those with more heterogeneously sampled fossil records such as pterosaurs. This bias is comparable to that inferred for taxonomic diversity by previous studies (e.g., Peters and Foote 2001; Smith and McGowan 2007; Barrett et al. 2009; Butler et al. 2009, 2011; Benson et al. 2010), and analyses of changes in range-based disparity metrics through time should consider carefully the potential impact of geological sampling biases. Rarefaction may be helpful for reducing the influence of sample size upon range metrics, but should not be viewed as a panacea. Variance-based disparity metrics are likely to be much more robust to uneven spatiotemporal sampling of the fossil record than range metrics (and possibly more so than uncorrected taxonomic diversity counts), and they can be interpreted with greater confidence as reflecting genuine biological signals. More broadly, the robustness of the numerous alternative disparity metrics to uneven fossil record sampling is likely to be closely related to their respective robustness to species sample size variation.

IMPORTANCE OF MISSING DATA CORRECTIONS

Missing data are only seldom discussed in empirical and theoretical studies of disparity, which is surprising because the effects of missing anatomical information are a frequent subject of concern and study in phylogenetic analyses (e.g., Wiens 2003). Multivariate techniques such as PCO are often used in disparity studies largely because they can “accommodate missing data” (Wesley-Hunt 2005:38–39). However, although it is possible to compute PCO scores from incomplete datasets, this does not guarantee fidelity of the results. Ciampaglio et al. (2001) assessed the robustness of several disparity measurement protocols to varying amounts of missing data, simulating 0–25% missing data because “datasets are typically more than 75% complete” (Ciampaglio et al. 2001:701). This somewhat idyllic assumption, however, is not true of many paleontological datasets, especially those focusing on vertebrates, which have character-rich skeletons but are typically only partially preserved. The current dataset, for instance, has more missing data (including inapplicable data) than positive character scores, and high amounts of missing information also plague other published fossil vertebrate datasets (e.g., 40.6% missing data for the dataset of Brusatte et al. 2008a). Far from being a mere superficial nuisance, high amounts of missing data may be a confounding problem in some studies. This is well illustrated by the unusual nature of the distance matrix, PCO morphospace, and disparity metrics calculated using the original, highly incomplete, unreconstructed dataset in the current study. Ameliorating problems relating to extreme missing data are not straightforward, but we have presented a novel, phylogenetic approach for filling in missing character scores. We encourage future workers to think carefully about missing data, interpret any disparity results in light of potential missing data biases, and experiment with techniques for correcting such biases. Furthermore, although it is outside of the scope of this study, it would be interesting for future workers to expand the simulations of Ciampaglio et al. (2001) to include datasets with higher amounts of missing data (up to ∼60%).

PTEROSAUR DISPARITY THROUGH THE MESOZOIC: UNTANGLING A TRUE BIOLOGICAL SIGNAL

The general patterns of disparity recovered here are congruent with previous analyses that also indicated a Late Jurassic–Early Cretaceous peak in pterosaur disparity (Dyke et al. 2009; Prentice et al. 2011). The significant correlations recovered between range metrics and geological sampling proxies suggest that temporal fluctuations in these metrics should be interpreted with caution. By contrast, the weak correlation between variance metrics and geological sampling proxies suggests that these metrics are more robust to spatiotemporal variation in sampling of the fossil record, and may be more confidently interpreted as showing a true biological signal. Here, we focus our discussion of pterosaur disparity through time on the sum of variances plot (Fig. 2), because this metric is least influenced by species sample size and shows the weakest correlations with geological sampling proxies and the presence or absence of Lagerstätten.

Disparity appears to have been genuinely low in the Late Triassic (time bins 1–2): although taxon sampling is relatively low, all known Triassic pterosaurs are recovered as a monophyletic grouping by the cladistic analysis of Andres (2010) and form a tight cluster in our morphospace plot (Fig. 1B). Because so few earliest Jurassic pterosaurs are known, the 195–185 Ma time bin was excluded from our analysis, and thus insights into the impact of the end Triassic extinction events on the disparity of pterosaurs are limited. Disparity appears to have been similar in the late Early Jurassic (bin 3: late Pliensbachian–earliest Aalenian) to that of the Late Triassic. However, the inclusion of a Lagerstätten variable in multiple regression models suggests a disparity minimum in this bin (Supporting information), potentially indicating that the end Triassic extinction decimated pterosaur disparity but that this decline is partially masked by the presence of exceptional deposits in the Toarcian Posidonienschiefer of Germany (Padian 2008).

Disparity increased through the Middle and Late Jurassic (bins 4–6), with the origins and radiations of new clades, such as rhamphorhynchids, wukongopterids, anurognathids, and the basal pterodactyloids. Subsequent high disparity in the Early Cretaceous (bins 7–11) coincides with the radiations of multiple ornithocheiroid pterodactyloid clades (Istiodactylidae, “Ornithocheiridae,” Tapejaridae, Dsungaripteridae, Chaoyangopteridae, etc.). However, despite this major pterodactyloid radiation, Early Cretaceous disparity does not appear to have been significantly higher than that of the Late Jurassic, probably due to the extinctions of most nonpterodactyloid pterosaur clades. Some disparity metrics (range metrics, product of variances) show a marked decrease in the earliest Cretaceous (bin 7) that is coincident with a decline in observed taxonomic diversity (Butler et al. 2009). A number of authors have postulated a terminal Jurassic mass extinction event for other Mesozoic vertebrate groups (e.g., Bakker 1993; Bardet 1994; Upchurch and Barrett 2005; Benson et al. 2010; Mannion et al. 2011). However, the earliest Cretaceous decline in pterosaur disparity is not observed in the sum of variances plot, and a trough in geological sampling occurs within this bin for all proxies. Moreover, bin 7 is preceded and followed by bins with major Lagerstätten (bin 6: “Solnhofen”; bin 8: Yixian Formation). This apparent disparity decline may prove to be, at least in part, an artifact of poor sampling of the earliest Cretaceous record. However, it remains clear that the Jurassic–Cretaceous transition witnessed high rates of turnover between nonpterodactyloid and pterodactyloid clades and corresponding shift in morphospace occupation.

A Late Cretaceous decline in disparity occurs in all disparity metrics and in rarefied and data series, although this decline is less pronounced in variance metrics and in rarefied data than in nonrarefied range metrics, likely reflecting the sensitivity of the latter to low species sample sizes. This decline in disparity approximately coincides temporally with a decline in observed taxonomic diversity (Butler et al. 2009). In addition, although the shape of the curves for pterosaur-fossil-based geological sampling proxies (PBFs, PBCs) are highly similar in shape to those for tetrapod-fossil-based proxies across most of the Mesozoic, they diverge from one another in the Late Cretaceous: counts of PBFs and PBCs in the terminal two time bins do not exceed counts for the Late Jurassic and late Early Cretaceous, unlike the continuing upward trend present in TBF and TBC curves. The geological sampling proxies suggest therefore that although the terminal Cretaceous is well sampled for fossil vertebrates more broadly, pterosaur remains have been found from proportionally fewer formations and collections than in previous time bins. They may thus have been restricted to a smaller range of environments than those inhabited by other tetrapods, and by earlier pterosaur faunas (Unwin 2006). This is congruent with the restriction of terminal Cretaceous pterosaurs to giant body sizes (Unwin 2006) and a narrow range of ecotypes represented by only two clades (Nyctosauridae, Azhdarchidae).

All lines of evidence therefore suggest a genuine reduction in pterosaur disparity occurring within the Late Cretaceous, although the timing of this event is difficult to constrain precisely. The decline occurs within bin 11 (Albian–Cenomanian) in the range metrics and product of variances (i.e., it occurs somewhere around the boundary between the Early and Late Cretaceous), but is delayed by one bin (late Cenomanian–early Santonian) in the sum of variances. Residuals for multiple regression models also show that the terminal Cretaceous disparity crash is delayed to bin 12 when Lagerstätten are included as a regression variable. The apparent inception of this disparity decline in bin 11 in some disparity metrics (Fig. 2) likely arises from the absence of Lagerstätten in bin 11, which is preceded and followed by exceptional deposits in bins 10 (Crato, Santana and Jiufotang formations) and 12 (Niobrara Chalk). Improved sampling of bin 11 is critical to understanding the timing of Cretaceous pterosaur decimation.

Although all disparity metrics confirm the existence of a Late Cretaceous disparity decline, it is notably less marked in variance metrics and rarefied range metrics than in raw (nonrarefied) range metrics. This supports the arguments of Butler et al. (2009), who suggested that although a Late Cretaceous pterosaur decline in taxonomic diversity is likely to be genuine, the magnitude of this event is inflated by poor sampling of the pterosaur record through this time period, including the scarcity of Late Cretaceous Lagerstätten.

Associate Editor: D. Carrier

ACKNOWLEDGMENTS

This study is Paleobiology Database official publication 139 and benefited from data compiled by a number of individuals, particularly M. Carrano. We thank M. Ruta, M. Benton, G. Lloyd, P. Barrett, and M. Sakamoto for discussion and advice on data processing, and M. Ruta and P. Wagner for their supportive and helpful reviews. RJB is supported by an Alexander von Humboldt Research Fellowship. SLB is supported by a National Science Foundation Graduate Research Fellowship (Columbia University) and an NSF Doctoral Dissertation Improvement Grant (NSF DEB 1110357).

LITERATURE CITED

Alroy

,

J.

Successive approximations of diversity curves: ten more years in the library

.

Geology

28

:

1023

1026

.

Alroy

,

J.

2010a

.

The shifting balance of diversity among major marine animal groups

.

Science

329

:

1191

1194

.

Alroy

,

J.

2010b

.

Geographical, environmental and intrinsic biotic controls on Phanerozoic marine diversification

.

Palaeontology

53

:

1211

1235

.

Alroy

,

J.

,

C. R.

Marshall

,

R. K.

Bambach

,

K.

Bezusko

,

M.

Foote

,

F. T.

Fürsich

,

T. A.

Hansen

,

S. M.

Holland

,

L. C.

Ivany

,

D.

Jablonski

, et al.

2001

.

Effects of sampling standardization on estimates of Phanerozoic marine diversification

.

Proc. Natl. Acad. Sci. USA

98

:

6261

6266

.

Alroy

,

J.

,

M.

Aberhan

,

D. J.

Bottjer

,

M.

Foote

,

F. T.

Fürsich

,

P. J.

Harries

,

A. J. W.

Hendy

,

S. M.

Holland

,

L. C.

Ivany

,

W.

Kiessling

, et al.

2008

.

Phanerozoic trends in the global diversity of marine invertebrates

.

Science

321

:

97

100

.

Andres

,

B.

2010

.

Systematics of the pterosauria

.

Ph.D. thesis

,

Department of Geology and Geophysics, Yale University

, New Haven, CT .

347

pp.

Bakker

,

R. T.

1993

.

Plesiosaur extinction cycles—events that mark the beginning, middle and end of the Cretaceous

.

Geol. Ass. Can. Sp. Pap.

39

:

641

664

.

Bardet

,

N.

1994

.

Extinction events among Mesozoic marine reptiles

.

Hist. Biol.

7

:

313

324

.

Barrett

,

P. M.

,

R. J.

Butler

,

N. P.

Edwards

, and

A. R.

Milner

.

2008

.

Pterosaur distribution in time and space: an atlas

.

Zitteliana B

28

:

61

107

.

Barrett

,

P. M.

,

A. J.

McGowan

, and

V.

Page

.

2009

.

Dinosaur diversity and the rock record

.

Proc. R. Soc. Lond. B

276

:

2667

2674

.

Benson

,

R. B. J.

, and

R. J.

Butler

. In press.

Uncovering the diversification history of marine tetrapods: ecology influences the effect of geological sampling biases

.

Geol. Soc. London Spec. Paper

.

Benson

,

R. B. J.

,

R. J.

Butler

,

J.

Lindgren

, and

A. S.

Smith

.

2010

.

Mesozoic marine tetrapod diversity: mass extinctions and temporal heterogeneity in geological megabiases affecting vertebrates

.

Proc. R. Soc. Lond. B

277

:

829

834

.

Benton

,

M. J.

1985

.

Mass extinction among non-marine tetrapods

.

Nature

316

:

811

814

.

Benton

,

M. J.

1995

.

Diversification and extinction in the history of life

.

Science

268

:

52

58

.

Benton

,

M. J.

2010

.

The origins of modern biodiversity on land

.

Philos. Trans. R. Soc. B

365

:

3667

3679

.

Benton

,

M. J.

, and

B. C.

Emerson

.

2007

.

How did life become so diverse? The dynamics of diversification according to the fossil record and molecular phylogenetics

.

Palaeontology

50

:

23

40

.

Bonaparte

,

J. F.

1970

.

Pterodaustro guinazui gen. et sp. nov. Pterosaurio de al formacion Lagarcito, Provincia de SanLuis, Argentina y su significado en la geologia regional (Pterodactylidae)

.

Acta Geol. Lill.

10

:

207

226

.

Bottjer

,

D. J.

, and

D.

Jablonski

.

1988

.

Paleoenvironmental patterns in the evolution of post-Paleozoic benthic marine invertebrates

.

Palaios

3

:

540

560

.

Brusatte

,

S. L.

,

M. J.

Benton

,

M.

Ruta

, and

G. T.

Lloyd

.

2008a

.

Superiority, competition, and opportunism in the evolutionary radiation of dinosaurs

.

Science

321

:

1485

1488

.

Brusatte

,

S. L.

,

M. J.

Benton

,

M.

Ruta

, and

G. T.

Lloyd

.

2008b

.

The first 50 mya of dinosaur evolution: macroevolutionary pattern and morphological disparity

.

Biol. Lett.

4

:

733

736

.

Brusatte

,

S. L.

,

S.

Montanari

,

H.-Y.

Yi

, and

M. A.

Norell

.

2011a

.

Phylogenetic corrections for morphological disparity analysis: new methodology and case studies

.

Paleobiology

37

:

1

22

.

Brusatte

,

S. L.

,

M. J.

Benton

,

G. T.

Lloyd

,

M.

Ruta

, and

S. C.

Wang

.

2011b

.

Macroevolutionary patterns in the evolutionary radiation of archosaurs (Tetrapoda: Diapsida)

.

Earth Envir. Sci. Trans. R. Soc. Edin.

101

:

367

382

.

Buffetaut

,

E.

1995

. The importance of “Lagerstätten” for our understanding of the evolutionary history of certain groups of organisms: the case of pterosaurs. Pp.

49

52

in

Extended abstracts, II International Symposium on Lithographic Limestones (Lleida-Cuenca, 9th–16th July 1995)

. Ediciones de la Universidad Autonoma de Madrid.

Burnham

,

K. P.

, and

D.

Anderson

.

2001

.

Model selection and multi-model inference: a practical information-theoretic approach

. 2nd edn.

Springer

, New York.

Butler

,

R. J.

,

P. M.

Barrett

,

S.

Nowbath

, and

P.

Upchurch

.

2009

.

Estimating the effects of the rock record on pterosaur diversity patterns: implications for hypotheses of bird/pterosaur competitive replacement

.

Paleobiology

35

:

432

446

.

Butler

,

R. J.

,

R. B. J.

Benson

,

M. T.

Carrano

,

P. D.

Mannion

, and

P.

Upchurch

.

2011

.

Sea-level, dinosaur diversity, and sampling biases: investigating the ‘common cause’ hypothesis in the terrestrial realm

.

Proc. R. Soc. Lond. B

278

:

1165

1170

.

Chatfield

,

C.

2003

.

The analysis of time-series: an introduction

.

Chapman & Hall, Boca Raton

.

Ciampaglio

,

C. N.

,

M.

Kemp

, and

D. W.

McShea

.

2001

.

Detecting changes in morphospace occupation patterns in the fossil record: characterization and analysis of measures of disparity

.

Paleobiology

27

:

695

715

.

Cisneros

,

J. C.

, and

M.

Ruta

.

2010

.

Morphological diversity and biogeography of procolophonids (Amniota: Parareptilia)

.

J. Syst. Palaeontol.

8

:

607

625

.

Crampton

,

J. S.

,

A. G.

Beu

,

R. A.

Cooper

,

C. M.

Jones

,

B.

Marshall

, and

P. A.

Maxwell

.

2003

.

Estimating the rock volume bias in paleobiodiversity studies

.

Science

301

:

358

360

.

Deline

,

B.

2009

.

The effects of rarity and abundance distributions on measurements of local morphological disparity

.

Paleobiology

35

:

175

189

.

Dyke

,

G. J.

,

A. J.

McGowan

,

R. L.

Nudds

, and

D.

Smith

.

2009

.

The shape of pterosaur evolution: evidence from the fossil record

.

J. Evol. Biol.

22

:

890

898

.

Erwin

,

D. H.

2007

.

Disparity: morphological pattern and developmental context

.

Palaeontology

50

:

57

73

.

Foote

,

M.

1992

.

Rarefaction analysis of morphological and taxonomic diversity

.

Paleobiology

18

:

1

16

.

Foote

,

M.

1993

.

Discordance and concordance between morphological and taxonomic diversity

.

Paleobiology

20

:

185

204

.

Foote

,

M.

1997

.

The evolution of morphological diversity

.

Annu. Rev. Ecol. Syst.

28

:

129

152

.

Fröbisch

,

J.

2008

.

Global taxonomic diversity of anomodonts (Tetrapoda, Therapsida) and the terrestrial rock record across the Permian-Triassic boundary

.

PLoS ONE

3

:

1

14

.

Gould

,

S. J.

1991

.

The disparity of the Burgess Shale arthropod fauna and the limits of cladistic analysis: why we must strive to quantify morphospace

.

Paleobiology

17

:

411

423

.

Gradstein

,

F. M.

,

J. G.

Ogg

, and

A. G.

Smith

.

2004

.

A geologic time scale 2004

.

Cambridge Univ. Press

, Cambridge, U.K.

Hammer

,

Ø.

,

D. A. T.

Harper

, and

P. D.

Ryan

.

2001

.

PAST: Palaeontological Statistics software package for education and data analysis

.

Palaeontol. Electronica

4

:

1

9

.

Henderson

,

D. M.

2010

.

Pterosaur body mass estimates from three-dimensional mathematical slicing

.

J. Vert. Paleontol.

30

:

768

785

.

Hunt

,

G.

,

T. M.

Cronin

, and

K.

Roy

.

2005

.

Species-energy relationship in the deep sea: a test using the Quaternary fossil record

.

Ecol. Lett.

8

:

739

747

.

,

J.

, and

Q.

Ji

.

2006

.

Preliminary result of a phylogenetic analysis of the pterosaurs from western Liaoning and surrounding areas

.

J. Paleontol. Soc. Korea

22

:

239

261

.

,

J.

,

D. M.

Unwin

,

X.

Jin

,

Y.

Liu

, and

Q.

Ji

.

2010

.

Evidence for modular evolution in a long-tailed pterosaur with a pterodactyloid skull

.

Proc. R. Soc. Lond. B

277

:

383

389

.

Maddison

,

D. R.

, and

W. P.

Maddison

.

2005

.

MacClade 4: analysis of phylogeny and character evolution, version 4.07

. Sinauer Associates, Sunderland.

Mannion

,

P. D.

,

P.

Upchurch

,

M. T.

Carrano

, and

P. M.

Barrett

.

2011

.

Testing the effect of the rock record on diversity: a multidisciplinary approach to elucidating the generic richness of sauropodomorph dinosaurs through time

.

Biol. Rev.

86

:

157

181

.

Marx

,

F. G.

, and

M. D.

Uhen

.

2010

.

Climate, critters, and cetaceans: Cenozoic drivers of the evolution of modern whales

.

Science

327

:

993

996

.

McGowan

,

A. J.

, and

A. B.

Smith

.

2008

.

Are global Phanerozoic marine diversity curves truly global? A study of the relationship between regional rock records and global Phanerozoic marine diversity

.

Paleobiology

34

:

80

103

.

McKinney

,

M. L.

1990

Classifying and analysing evolutionary trends. Pp.

28

58

in

K. J.

McNamara

, ed.

Evolutionary trends

.

Univ. of Arizona Press, Tuscon, AZ

.

Niklas

,

K. J.

,

B. H.

Tiffney

, and

A. H.

Knoll

.

1983

.

Patterns in vascular land plant diversification

.

Nature

303

:

614

616

.

Padian

,

K.

1985

.

The origins and aerodynamics of flight in extinct vertebrates

.

Palaeontology

28

:

413

433

.

Padian

,

K.

2008

.

The early Jurassic pterosaur Dorygnathus banthensis (Theodori, 1830) and the early Jurassic pterosaur Campylognathoides strand, 1928

.

Spec. Papers Palaeontol.

80

:

1

107

.

Peters

,

S. E.

2005

.

Geologic constraints on the macroevolutionary history of marine animals

.

Proc. Natl. Acad. Sci. USA

102

:

12326

12331

.

Peters

,

S. E.

2006

.

Genus extinction, origination, and the durations of sedimentary hiatuses

.

Paleobiology

32

:

387

407

.

Peters

,

S. E.

, and

M.

Foote

.

2001

.

Biodiversity in the Phanerozoic: a reinterpretation

.

Paleobiology

27

:

583

601

.

Prentice

,

K. C.

,

M.

Ruta

, and

M. J.

Benton

.

2011

.

Evolution of morphological disparity in pterosaurs

.

J. Syst. Palaeontol.

, doi: 10.1080/14772019.2011.565081 [Epub ahead of print].

Raup

,

D. M.

1972

.

Taxonomic diversity during the Phanerozoic

.

Science

177

:

1065

1071

.

Raup

,

D. M.

1976

.

Species diversity in the Phanerozoic: an interpretation

.

Paleobiology

2

:

289

297

.

Ruta

,

M.

2009

.

Patterns of morphological evolution in major groups of Palaeozoic Temnospondyli (Amphibia: Tetrapoda)

.

Spec. Papers Palaeontol.

81

:

91

120

.

Sugiura

,

N.

1978

.

Further analysis of the data by Akaike's information criterion and the finite corrections

.

Comm. Stat. Theory Meth

.

A

:

13

26

.

Sepkoski

,

J. J.

1976

.

Species diversity in the Phanerozoic: species-area effects

.

Paleobiology

2

:

298

303

.

Sepkoski

,

J. J.

,

R. K.

Bambach

,

D. M.

Raup

, and

J. W.

Valentine

.

1981

.

Phanerozoic marine diversity and the fossil record

.

Nature

293

:

435

437

.

Smith

,

A. B.

2001

.

Large-scale heterogeneity of the fossil record: implications for Phanerozoic biodiversity studies

.

Philos. Trans. R. Soc. Lond. B

356

:

351

367

.

Smith

,

A. B.

, and

A. J.

McGowan

.

2007

.

The shape of the marine palaeodiversity curve: how much can be predicted from the sedimentary rock record of western Europe?

Palaeontology

50

:

765

774

.

Thiele

,

K.

1993

.

The holy grail of the perfect character: the cladistic treatment of morphometric data

.

Cladistics

9

:

275

304

.

Thiele

,

K.

2006

.

The pterosaurs from deep time

.

Pi Press

, New York.

Upchurch

,

P.

, and

P. M.

Barrett

.

2005

. Phylogenetic and taxic perspectives on sauropod diversity. Pp.

104

124

in

K. A.

Curry Rogers

, and

J. A.

Wilson

, eds.

The sauropods: evolution and paleobiology

.

Univ. of California Press

, Berkeley, CA.

Valentine

,

J. W.

1985

.

Phanerozoic diversity patterns: profiles in macroevolution

.

Princeton Univ. Press, Princeton

, NJ.

Wagner

,

P.

1997

.

Patterns of morphologic diversification among the Rostroconchia

.

Paleobiology

23

:

115

150

.

Walker

,

J.

, and

J.

Geissman

.

2009

.

2009 GSA geologic time scale

.

GSA Today

19

:

60

.

Wall

,

P. D.

,

L. C.

Ivany

, and

B. H.

Wilkinson

.

2009

.

Revisiting Raup: exploring the influence of outcrop area on diversity in light of modern sample-standardization techniques

.

Paleobiology

35

:

146

167

.

Wesley-Hunt

,

G. D.

2005

.

The morphological diversification of carnivores in North America.

Paleobiology

31

:

35

55

.

Wiens

,

J. J.

2003

.

Incomplete taxa, incomplete characters, and phylogenetic accuracy: is there a missing data problem?

J. Vert. Paleontol.

23

:

297

310

.

Wills

,

M. A.

1998

.

Crustacean disparity through the Phanerozoic: comparing morphological and stratigraphic data

.

Biol. J. Linn. Soc.

65

:

455

500

.

Wills

,

M. A.

,

D. E. G.

Briggs

, and

R. A.

Fortey

.

1994

.

Disparity as an evolutionary index: a comparison of Cambrian and Recent arthropods

.

Paleobiology

20

:

93

130

.

Witton

,

M. P.

2008

.

A new approach to determining pterosaur body mass and its implications for pterosaur flight

.

Zitteliana B

28

:

143

158

.

Witton

,

M. P.

, and

D.

Naish

.

2008

.

A reappraisal of azhdarchid pterosaur functional morphology and paleoecology

.

PLoS one

3

:

e2271

.

Young

,

M. T.

,

S. L.

Brusatte

,

M.

Ruta

, and

M. B.

Andrade

.

2010

.

The evolution of Metriorhynchoidea (Mesoeucrocodylia, Thalattosuchia): an integrated approach using geometric morphometrics, analysis of disparity and biomechanics

.

Zool. J. Linn. Soc.

158

:

801

859

.

© 2012, Society for the Study of Evolution