Origins and Genetic Legacy of Prehistoric Dogs

Author manuscript; available in PMC: 2021 Apr 30.

Published in final edited form as: Science. 2020 Oct 29;370(6516):557–564. doi: 10.1126/science.aba9572

Abstract

Dogs were the first domestic animal, but little is known about their population history and to what extent it was linked to humans. We sequenced 27 ancient dog genomes and found that all dogs share a common ancestry distinct from present-day wolves, with limited gene flow from wolves since domestication, but substantial dog-to-wolf gene flow. By 11,000 years ago, at least five major ancestry lineages had diversified, demonstrating a deep genetic history of dogs during the Paleolithic. Co-analysis with human genomes reveals aspects of dog population history that mirror humans, including Levant-related ancestry in Africa and early agricultural Europe. Other aspects differ, including the impacts of steppe pastoralist expansions in West- and East Eurasia, and a complete turnover of Neolithic European dog ancestry.


Wolves were the first animal with which humans formed a mutualistic relationship, eventually giving rise to dogs. While there is little consensus regarding when (1–9), where (2, 8–13), and how many times (1, 8, 9, 14) domestication took place, the archaeological record (9, 15) attests to a long-term and close relationship to humans (9, 16–18). Modern dog genomes have revealed a complex population structure (5, 8, 10, 12, 19, 20), but because only six ancient dog and wolf genomes are currently available (4, 9, 14, 21), the process by which this structure emerged remains largely unknown.

Previous mitochondrial DNA (22–29) and genomic (9, 14, 21) studies have suggested an association between the genetic signatures of dogs and their archaeological context. However, dog and human genomes have not been quantitatively co-analyzed to assess the degree to which the population history of dogs was linked to that of humans, or may have been decoupled as a result of trade, human preference for particular types of dogs, variation in infectious disease susceptibility, or dogs moving between human groups.

To reconstruct dog population history we sequenced 27 ancient dog genomes up to 10,900 years old from Europe, the Near East and Siberia (table S1) to a median of 1.5-fold coverage (range 0.1-11X) (Fig. 1A, table S2; (30)). To test the association with human population history, we compiled 17 sets of human genome-wide data (30) that matched the age, geographic location and cultural contexts of the ancient dogs (table S4), and directly compared genetic relationships within the two species.

Figure 1. Genomic structure of dogs dates to the Pleistocene.

A) Sampling locations of ancient dogs. B) Principal components analysis on all possible _f4_-statistics among ancient dogs (gray) and a selection of worldwide modern dogs. C) Outgroup _f3_-statistics reveal a cline of Levant versus Baikal (horizontal and vertical axes, respectively) related ancestry across ancient west Eurasian dogs, but not among modern European dogs. D) Coalescent simulations demonstrating that a diagonal _f3_-cline as in panel C is consistent with an admixture event, but less so with continuous gene flow and not with phylogenetic structure alone. E) An admixture graph that fits all _f4_-statistics between major dog lineages. The European dog was grafted onto the graph identified through exhaustive testing.

Global dog population structure has its origins in the Pleistocene

To characterize the global population structure of ancient and modern dogs, we applied principal component analysis (PCA) to a matrix of all possible _f4_-statistics (30), alleviating differences in error rates and missing data. This approach recapitulates a major east-west axis of dog ancestry (PC1) (8, 9, 12), in which the western extreme comprises modern and ancient western Eurasian dogs and modern African dogs (Fig. 1B). The eastern extreme is represented by pre-contact North American dogs (21), three 7 ky dogs from Lake Baikal in Siberia, and modern East Asian dogs including New Guinea Singing Dogs and Australian dingoes. Similar results were obtained through standard model-based clustering (fig. S2).
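As a minimal illustration of the statistic underlying this PCA, an _f4_-statistic can be written as the average over SNPs of the product of two allele-frequency differences, (a - b)(c - d). The sketch below assumes per-SNP allele frequencies are already available as plain lists; the function name and the toy frequencies are hypothetical, and real analyses work from genotype data with handling of missing sites.

```python
from statistics import mean


def f4(p_a, p_b, p_c, p_d):
    """Mean over SNPs of (a - b) * (c - d), given per-SNP allele frequencies."""
    if not (len(p_a) == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("all populations need the same number of SNPs")
    return mean((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))


# Toy allele frequencies at four SNPs (invented for illustration only).
pop_a = [0.0, 1.0, 0.5, 1.0]
pop_b = [0.0, 1.0, 0.5, 0.0]
pop_c = [1.0, 0.0, 0.5, 1.0]
pop_d = [1.0, 0.0, 0.5, 0.0]

# A and B differ only at the last SNP, where C and D differ in the same direction.
print(f4(pop_a, pop_b, pop_c, pop_d))
```

A value near zero indicates that the drift separating A from B is uncorrelated with the drift separating C from D. Describing each sample by its vector of such values against all other samples gives one row of the kind of matrix to which PCA is applied here.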

All ancient and modern European dogs have greater affinity to eastern dog ancestry than ancient Near Eastern dogs have in _f4_-tests (fig. S3), despite the overall east-west axis on PC1. Ancient European dogs are also distributed widely across a genetic cline between the East Eurasian and ancient Near Eastern dogs, which furthermore manifests as a linear cline along the diagonal when contrasting shared genetic drift with Baikal dogs and Levantine (Israel, 7kya) dogs using outgroup _f3_-statistics (Fig. 1C). Simulations indicate that this linear, diagonal cline is difficult to explain with long-standing continuous gene flow or a tree-like history, but instead suggest that the history of Mesolithic and Neolithic European dogs was marked by a major admixture episode (Fig. 1D) (30).
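The shared-drift measure used for this cline can be sketched in the same way. An outgroup _f3_-statistic, f3(O; X, Y), averages (o - x)(o - y) over SNPs, and grows with the amount of drift that X and Y share after separating from the outgroup O. The code below is a simplified sketch on hypothetical allele frequencies; it omits the finite-sample bias correction that standard estimators apply.

```python
from statistics import mean


def outgroup_f3(p_out, p_x, p_y):
    """Mean over SNPs of (o - x) * (o - y): drift shared by X and Y relative to O."""
    if not (len(p_out) == len(p_x) == len(p_y)):
        raise ValueError("all populations need the same number of SNPs")
    return mean((o - x) * (o - y) for o, x, y in zip(p_out, p_x, p_y))


# Toy allele frequencies (invented): "test" shares derived alleles with
# "ref_1" at two SNPs and with "ref_2" at none.
outgroup = [0.0, 0.0, 0.0, 0.0]
test = [1.0, 1.0, 0.0, 0.0]
ref_1 = [1.0, 1.0, 1.0, 0.0]
ref_2 = [0.0, 0.0, 1.0, 1.0]

shared_with_ref_1 = outgroup_f3(outgroup, test, ref_1)
shared_with_ref_2 = outgroup_f3(outgroup, test, ref_2)
print(shared_with_ref_1, shared_with_ref_2)
```

Plotting one such value against another for every ancient dog, with the two references standing in for the Levantine and Baikal dogs, gives a two-axis view of the kind shown in Fig. 1C.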

We modeled the genetic history underlying dog population structure for five populations that represent major ancestries, and tested all 135,285 possible admixture graph models with up to two admixture events (30). One model uniquely fits the data, and features the Mesolithic Karelian dog (10.9 kya) as having received part of its ancestry from a lineage related to eastern dogs, and part from the Levantine lineage (Fig. 1E) (two highly similar models nearly fit, fig. S4). The model can be extended to feature the earliest Neolithic European dog (7 kya) (14) as a mixture of the Karelian and the Levantine branches without loss of fit (fig. S5), supporting the dual ancestry model for European dogs suggested by the ancient ancestry cline (Fig. 1C). The observed phylogenetic structure implies that all five ancestry lineages (Neolithic Levant, Mesolithic Karelia, Mesolithic Baikal, ancient America, New Guinea Singing dog) must have existed by 10.9kya (the radiocarbon date of the Karelian dog), and thus most likely prior to the transition from the Pleistocene to the Holocene epoch ~11.6 kya.

No detectable evidence for multiple dog origins or extensive gene flow from wild canids

Studies have suggested that wolf populations in Europe (3, 11), the Middle East (12), Central Asia (10), Siberia (31), and East Asia (2, 8), or more than one of these (9), contributed to early dog diversity. One study, however, demonstrated that modern wolves and dogs are reciprocally monophyletic, and suggested bidirectional gene flow (5). We corroborated that gene flow must have occurred by identifying widespread asymmetries between dogs in their affinity to wolves (Fig. 2A,B, fig S7). However, the gene flow was likely largely unidirectional from dogs into wolves, since we also identified some gray wolves that are symmetrically related to all modern and ancient dogs (Fig. 2C). Past gene flow from wolves into specific dog populations would have manifested as an affinity to any member of the modern gray wolf lineage in these tests, so our results suggest that persistent gene flow into dogs has been so limited as to be undetectable at the current resolution of the data. Furthermore, this result is consistent with a scenario in which all dogs derive from a single ancient, now extinct wolf population, or possibly multiple closely related wolf populations. While it is still possible that other, thus far unsampled ancient wolf populations were independently involved in early domestication (3, 9, 31), our data indicate that they did not contribute substantially to later dogs.
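The asymmetry tests described above ask whether f4(Outgroup, Wolf; Dog1, Dog2) deviates from zero by more than its sampling error. A common way to obtain that error is a delete-one-block jackknife over contiguous blocks of SNPs. The sketch below assumes equal-sized blocks defined by SNP count, which is a simplification (blocks are often defined by genomic distance), and all function names and input values are hypothetical.

```python
from math import sqrt


def f4_terms(p_out, p_wolf, p_dog1, p_dog2):
    """Per-SNP contributions to f4(Outgroup, Wolf; Dog1, Dog2)."""
    return [(o - w) * (d1 - d2)
            for o, w, d1, d2 in zip(p_out, p_wolf, p_dog1, p_dog2)]


def block_jackknife_z(terms, n_blocks):
    """Return (estimate, standard error, Z) for the mean of per-SNP terms."""
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    size = len(terms) // n_blocks
    if size == 0:
        raise ValueError("more blocks than SNPs")
    used = terms[: size * n_blocks]  # drop any remainder so blocks are equal
    total, n = sum(used), len(used)
    estimate = total / n
    # Re-estimate the mean with each contiguous block removed in turn.
    leave_one_out = [
        (total - sum(used[b * size:(b + 1) * size])) / (n - size)
        for b in range(n_blocks)
    ]
    jk_mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (v - jk_mean) ** 2 for v in leave_one_out
    )
    se = sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


# Hypothetical per-SNP terms, split into two contiguous blocks.
estimate, se, z = block_jackknife_z([0.1, 0.2, 0.3, 0.4], n_blocks=2)
print(estimate, se, z)
```

Under the reasoning in the text, a wolf that received no dog ancestry and contributed none to any dog should give Z-scores scattered around zero for every pair of dogs, as reported for the Xinjiang wolf.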

Figure 2. All detectable gene flow is consistent with being unidirectional from dogs into wolf populations.

A) Illustration of asymmetry tests (_f4_-statistics) comparing 35 Eurasian gray wolves to all pairs of 66 ancient and modern dogs. B) Selected results using Coyote as outgroup. C) A wolf from Xinjiang, western China, is not closer to some dog populations than to others, as the test statistics are consistent with being normally distributed around 0 (the quantile-quantile plot includes all 66 dogs). If there were substantial gene flow from some wolf population into some dog population, we would expect all wolf individuals to display asymmetric relationships.

In contrast to the lack of wolf admixture into dogs, we identified dog admixture into almost all analyzed present-day wolves (Fig. 2B), with the strongest signals typically coming from dogs into geographically proximate wolf populations in Europe, the Near East and East Asia (fig. S7). We also replicated affinities between ancient American dogs and Coyotes (21), and between African dogs and African Golden Wolves (32), though the direction of gene flow in both cases is unclear, and the small magnitude is unlikely to impact most analyses of dog relationships (table S5). We did not find genome-wide evidence for gene flow from Tibetan wolves into Tibetan dogs, despite evidence for wolf ancestry locally around the EPAS1 gene associated with adaptation to altitude (33, 34). Dogs thus do not show evidence of wild introgression similar to that found in pigs, goats, horses, sheep and cattle (35–40).

Assessing the relationship between dog and human population histories

We next quantitatively compared the population relationships observed in dogs with those of humans. First, using Procrustes rotation to align _f4_-PCA results obtained on dog and human genomes matched in time and space (Fig. 3A; (30)), we find that the population structures of the two species resemble each other (Procrustes correlation = 0.48, p = 0.043). However, there are also several cases where the matched dogs and humans cluster in different parts of the PCA space. The greatest differences (Fig. 3B) are observed for Chalcolithic Iran, in which the human population is different from the Neolithic Levant (41, 42) but the dogs in the two regions are similar. In Neolithic Germany and Ireland, the humans are more shifted towards the Levant (43, 44) but the dogs are shifted towards Northern European hunter-gatherer contexts. In the Bronze Age Steppe and in Corded Ware Germany, the humans are shifted away from the Neolithic European cluster (45, 46) in a manner not seen in dogs.
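To make the Procrustes comparison concrete, the sketch below computes a symmetric Procrustes correlation between two matched two-dimensional configurations (for example, PC1 and PC2 coordinates of matched dog and human samples) and a permutation p-value. It assumes one common definition of the correlation, namely the sum of singular values of the cross-product matrix after each configuration has been centered and scaled to unit size; whether this matches the exact software and settings used for the reported value is an assumption. Function names and coordinates are hypothetical.

```python
from math import sqrt
import random


def _center_and_scale(points):
    """Center a 2-D configuration and scale it to unit sum of squares."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    norm = sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0:
        raise ValueError("configuration has no spread")
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_correlation(config_a, config_b):
    """Best achievable correlation after rotating/reflecting B onto A."""
    a = _center_and_scale(config_a)
    b = _center_and_scale(config_b)
    # Entries of the 2x2 cross-product matrix M = A^T B.
    m11 = sum(ax * bx for (ax, ay), (bx, by) in zip(a, b))
    m12 = sum(ax * by for (ax, ay), (bx, by) in zip(a, b))
    m21 = sum(ay * bx for (ax, ay), (bx, by) in zip(a, b))
    m22 = sum(ay * by for (ax, ay), (bx, by) in zip(a, b))
    frob_sq = m11 ** 2 + m12 ** 2 + m21 ** 2 + m22 ** 2
    det = m11 * m22 - m12 * m21
    # For a 2x2 matrix, (s1 + s2)^2 = ||M||_F^2 + 2 * |det M|.
    return sqrt(frob_sq + 2 * abs(det))


def permutation_p_value(config_a, config_b, n_perm=999, seed=0):
    """Share of random re-pairings that correlate at least as well as observed."""
    rng = random.Random(seed)
    observed = procrustes_correlation(config_a, config_b)
    shuffled = list(config_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if procrustes_correlation(config_a, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented coordinates: "humans" is "dogs" rotated by 90 degrees, scaled, shifted.
dogs = [(0, 0), (1, 0), (1, 2), (0, 3), (2, 1)]
humans = [(-2 * y + 5, 2 * x - 1) for x, y in dogs]
print(procrustes_correlation(dogs, humans))
print(permutation_p_value(dogs, humans))
```

Because the statistic is invariant to rotation, reflection, translation and uniform scaling, only the relative arrangement of the matched samples contributes to it. The permutation step breaks the dog-human pairing to build the null distribution.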

Figure 3. Quantitative comparisons between dog and human population genomic structure.

A) Principal components analysis on all possible _f4_-statistics on ancient dogs (blue), overlaid through Procrustes transformation by the corresponding analysis performed on ancient humans matched in time, space, and cultural context to the dogs (green). Dashed lines connect each matched pair. B) Euclidean residuals between the Procrustes-rotated human and dog coordinates. C) The three admixture graphs that fit for one species and provide the smallest error for the other. Scatter plots show absolute Z-scores for the difference between observed and predicted _f4_-statistics. D) Examples of _f4_-statistics that reveal similarities and differences between humans and dogs (far right text).

Second, we evaluated whether the admixture graph topologies that best fit the data for one species could also explain population relationships of the other. Though we found no graphs that fit the data perfectly for both species, graphs that fit or nearly fit dogs rank among the top-scoring 0.8-2.8% of graphs in the human search, and graphs that fit humans rank among the top-scoring 0.007-1.2% of graphs in the dog search (Fig. 3C, fig. S9). However, we note that this analysis does not take into account the different time depth of the two species’ population histories: the >40kya divergence of human East- and West Eurasian ancestries (47) is significantly older than the earliest appearance of dog morphology in the fossil record, conservatively dated to 14.5kya (48), though older (3, 31), disputed (49, 50) specimens have been claimed.

Third, we found that the sign (positive or negative) of _f4_-statistics in dogs matches the sign in humans in 71% of 31,878 tests (null expectation 50%) across 24 matched dog-human pairs, although this decreases to 58% when restricted to dogs and humans from Europe. We identified specific _f4_-statistics that exemplify both concordance and discrepancy between the species (Fig. 3D). While it is not known what degree of concordance would be expected between the histories of two species based on biogeographical factors alone, the results of these three analyses demonstrate that ancestry relationships in dogs and humans share overall features, but are not identical over space and time, and there are several cases where they must have been decoupled.
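The sign comparison in this third analysis amounts to counting, over matched statistics, how often the dog and human values fall on the same side of zero. A minimal sketch follows, with a hypothetical function name and invented values; excluding statistics that are exactly zero is a choice made here, not one stated in the text.

```python
def sign_concordance(dog_f4, human_f4):
    """Fraction of matched statistics whose signs agree (exact zeros excluded)."""
    pairs = [(d, h) for d, h in zip(dog_f4, human_f4) if d != 0 and h != 0]
    if not pairs:
        raise ValueError("no non-zero matched statistics to compare")
    matches = sum(1 for d, h in pairs if (d > 0) == (h > 0))
    return matches / len(pairs)


# Invented values for four matched tests: the third pair disagrees in sign.
dog_values = [0.002, -0.001, 0.004, -0.003]
human_values = [0.001, -0.002, -0.001, -0.004]
concordance = sign_concordance(dog_values, human_values)
print(concordance)
```

Statistics that share populations are correlated with one another, so the number of tests should probably not be treated as the number of independent observations when judging how far a given concordance lies from the 50% null expectation.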

Recurrent population histories

One notable example of concordance is that both humans and dogs in East Asia are closer to European than to Near Eastern populations, which in both humans (43) and our best-fitting graph (Fig. 1E) is best modelled by European ancestry being a mixture of ancestry related to the Near East and East Asia. However, the divergence of Near Eastern 'Basal Eurasian' ancestry in humans was likely >45 kya (43), suggesting that dog population dynamics may have mimicked earlier processes in humans. A second example is that all European dogs have a stronger affinity towards American and Siberian dogs than they have to New Guinea singing dogs, which likely represent a type of unadmixed East Asian dog ancestry, mirroring a circumpolar affinity between humans in Europe and the Americas (51) (Fig. 3D). Human groups at Lake Baikal 24-18kya had western Eurasian origins and contributed to Native American ancestry (51), but were largely replaced by the Holocene (52). Though the dogs at Lake Baikal dated to 7kya constitute a similar link between the Americas and Europe (Fig. 1C,E), they do so >10ky later (Fig. 3D). Thus, shared circumpolar ancestry through northern Eurasia is an important feature of both human and dog population structures, though this likely did not result from the same migration episodes.

Neolithic expansion into Europe

Ancient human genomes have revealed a major ancestry transformation associated with the expansion of Neolithic agriculturalists from the Near East into Europe (43, 45, 53), and a study of ancient dog mitochondria suggested they were accompanied by dogs (27). We hypothesized that the genomic ancestry cline we observe across ancient European dogs (Fig. 1C) could be, at least in part, due to admixture between dogs associated with Mesolithic hunter-gatherers and incoming Neolithic farmers. Three observations support this: first, the hypothesized hunter-gatherer end of the cline is occupied by the 10.9kBP Mesolithic Karelian dog, and dogs from a 4.8kBP hunter-gatherer Pitted Ware Culture site in Sweden. Second, relative to the Swedish hunter-gatherer dogs, a contemporaneous dog from a Swedish Neolithic agricultural context is shifted towards the Levantine end of the cline, mirroring humans at the same sites (41, 53, 54) (Fig. 3A,D; fig. S10D). Third, Neolithic Levantine affinity increases towards the south (p=0.0196, linear regression), consistent with a range expansion alongside Neolithic human groups. While dogs clearly associated with Mesolithic continental 'Western hunter-gatherer' (43) human groups have yet to be identified, our results suggest that such dogs would have strong affinity towards the Siberian end of the European cline. Overall, these results indicate that the Neolithic expansion of farmers into Europe was also associated with an ancestry transformation for dogs.

Increased copy number of the AMY2B gene, involved in starch digestion, has been linked to dietary adaptations of dogs during the agricultural transition (6, 55, 56). The paralogous AMY1 gene has been under adaptive evolution in humans (57), though this does not seem clearly linked to agriculture (58). We observe low copy numbers in dogs from human hunter-gatherer contexts (Fig. 4), although the Mesolithic Karelian dog may already have possessed an elevated number relative to wolves. Several Neolithic dogs have as many copies as present-day dogs, as early as in 5.8 ky old Iranian and 6 ky old Spanish dogs, but others display low numbers (14, 56), e.g. the 7 ky Levantine individual. These results suggest that selection for increased AMY2B copy number did not take place during the early stages of domestication, and in contrast to humans (58) was not advanced in Mesolithic hunter-gatherer contexts, but was variable in early agricultural populations and did not become widespread until several thousand years after the first appearance of starch-rich agricultural lifestyles.

Figure 4. Expansion of copy number in the AMY2B pancreatic amylase gene largely occurred after the transition to agriculture.

Ancient dogs are plotted against their age, with blue color indicating dogs from likely hunter-gatherer human contexts. Bars denote 95% binomial confidence intervals around the ratio of the number of reads mapping to the copy number variable region to those mapping to control regions throughout the genome.
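The error bars described in this caption can be illustrated as follows. The sketch treats each read as a binomial trial (it maps either to the copy-number-variable region or to a control region), builds a 95% interval on that proportion, and converts the bounds to a ratio of region reads to control reads. The Wilson score interval is used here as one standard choice of binomial interval; the caption does not state which method was used, so this is an assumption, as are the function names and read counts.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def read_ratio_with_ci(cnv_reads, control_reads):
    """Ratio of CNV-region reads to control reads, with an approximate 95% CI."""
    if control_reads <= 0:
        raise ValueError("need at least one control read")
    total = cnv_reads + control_reads
    low_p, high_p = wilson_interval(cnv_reads, total)

    def to_ratio(p):
        # Convert a proportion p = cnv / (cnv + control) to cnv / control.
        return p / (1 - p)

    return cnv_reads / control_reads, to_ratio(low_p), to_ratio(high_p)


# Hypothetical read counts for a single low-coverage sample.
ratio, low, high = read_ratio_with_ci(cnv_reads=40, control_reads=400)
print(ratio, low, high)
```

With low-coverage genomes the number of reads in the variable region is small, so intervals of this kind are wide. This is consistent with the cautious wording about the Mesolithic Karelian dog.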

Africa and the Near East

The clustering of modern African dogs with ancient dogs from the Levant and Iran, especially the oldest individual dating to 7 kya, suggests a Near Eastern origin (Fig. 1B,C, fig. S2). Western (Anatolia and the Levant) and eastern (Zagros mountains of Iran) human groups in the Fertile Crescent were highly genetically differentiated (41), and the western groups were the primary source of gene flow into Europe and Africa (41, 59) during the Neolithic. A source of African dog ancestry from the Levant (7kya) is a better fit than Iran (5.8kya) (Fig. 5A), mirroring the human history, as well as that of cattle (40). In contrast, we are unable to distinguish whether the Levant or Iran is the better source for Neolithic dog ancestry in Europe. Our results suggest a single origin of sub-Saharan African dogs from the Levant (Fig. 5B), with limited gene flow from outside the continent until the past few hundred years.

Figure 5. Ancestry of global dogs today.

A) For each present-day population, the ancestry proportions estimated by the best-fitting qpAdm model, restricted to models containing up to four of seven selected sources, are displayed. Populations for which a single component accounts for ≥98% of the ancestry are collapsed to smaller circles. Dog pictures were obtained from Wikimedia under the CC BY-SA 3.0 license (https://commons.wikimedia.org/wiki/Special:ListFiles/Desaix83). B) Illustrations of inferred population histories in three regions of the world.

In contrast to Africa, the 7kya Neolithic Levantine population does not appear to have contributed much, if any, ancestry to present-day dogs in the Near East. Instead, 2.3 ky old dogs in the Levant can be modelled as having 81% Iran-related and 19% Neolithic Europe-related ancestry (Data S1). By this time in the Levant, there was also human gene flow from Iran (41) and transient gene flow from Europe (60). However, our results suggest a more complete replacement of dog ancestry in the Levant by 2.3 kya (Fig. 5B). Later, modern Near Eastern dogs are best modelled as mixtures of the 2.3 ky Levantine and modern European sources (Data S1).

Steppe pastoralist expansions

Expansions of steppe pastoralists associated with the Yamnaya and Corded Ware cultures into Late Neolithic and Bronze Age Europe transformed the ancestry of human populations (43, 45, 46). To test if dog ancestry was similarly affected, we analyzed a 3.8 ky old dog from the eastern European steppe associated with the Bronze Age Srubnaya culture. While its ancestry resembles that of western European dogs (Fig. 1C, fig. S10), it is an outlier in the center of PC1-PC2 space (Fig. 1B). A Corded Ware-associated dog (4.7kya) from Germany, hypothesized to have steppe ancestry (14), can be modelled as deriving 51% of its ancestry from a source related to the Srubnaya steppe dog, and the rest from a Neolithic European source ((30); Data S1). We obtain similar results for a Bronze Age Swedish dog (45%; 3.1kya), but not a Bronze Age Italian dog (4kya).
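Ancestry proportions such as the 51% above rest on the idea that the _f4_-statistics relating an admixed target to a set of reference populations should be a weighted average of the corresponding statistics for its sources. The sketch below is a deliberately simplified two-source version that solves for the weight by ordinary least squares. It is not qpAdm itself, which additionally weights by the covariance of the statistics and tests model fit, and the function name and all numbers are hypothetical.

```python
def two_source_mixture(target, source_1, source_2):
    """Least-squares alpha with target ~= alpha*source_1 + (1 - alpha)*source_2.

    Each argument is a list of f4-type statistics computed against the same
    set of reference populations, in the same order.
    """
    if not (len(target) == len(source_1) == len(source_2)):
        raise ValueError("statistic vectors must have equal length")
    numerator = sum(
        (t - s2) * (s1 - s2) for t, s1, s2 in zip(target, source_1, source_2)
    )
    denominator = sum((s1 - s2) ** 2 for s1, s2 in zip(source_1, source_2))
    if denominator == 0:
        raise ValueError("the two sources are indistinguishable")
    return numerator / denominator


# Invented statistics; the target is constructed as an equal mixture.
steppe_like = [0.010, -0.004, 0.006]
neolithic_like = [0.002, 0.004, -0.002]
target_dog = [0.5 * s + 0.5 * n for s, n in zip(steppe_like, neolithic_like)]

alpha = two_source_mixture(target_dog, steppe_like, neolithic_like)
print(alpha)
```

An estimate outside the range 0 to 1 would suggest that the target is not well described as a mixture of the two chosen sources.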

Despite this potential link between the steppe and the Corded Ware dog, most later European dogs display no particular affinity to the Srubnaya dog. Modern European dogs instead cluster with Neolithic European dogs (Fig. 1B), and do not mirror the lasting ancestry shift seen in humans after the pastoralist expansion (Fig. 3A). While earlier and additional steppe dog genomes are needed to better understand this process, the relative continuity between Neolithic and present-day individuals suggests that the arrival of steppe pastoralists did not result in persistent large-scale shifts in the ancestry of European dogs.

Although steppe pastoralists also expanded east, they do not appear to have contributed much ancestry to present-day people in East Asia (46, 52). Many modern Chinese dogs display unambiguous evidence (negative _f3_ tests (30)) of being the product of admixture between a population related to the New Guinea Singing Dog (and the Australian Dingo) and a West Eurasian-related population (table S6). A recent study also found a mitochondrial turnover in Chinese dogs in the last few thousand years (61). The best-fitting models involve ancestry from modern European breeds, but also substantial contributions from the 3.8kBP Srubnaya steppe dog (Fig. 5A, Data S1). Some populations, especially those in Siberia, additionally require a fourth source related to the 7ky old Lake Baikal dogs, but no or minimal New Guinea Singing Dog-related ancestry. Our results thus raise the possibility that the eastward migrations of steppe pastoralists had a more substantial impact on the ancestry of dogs than humans in East Asia (Fig. 5B).

Later homogenization of dog ancestry in Europe

The extensive range of ancestry diversity among early European dogs is not preserved today, as modern European dogs are all symmetrically related to the ancient dogs in our dataset (Fig. 1C, fig. S13, Data S1, (30)). This suggests little to no contribution of most local Mesolithic and Neolithic populations to present-day diversity in Europe. Instead, we found that a single dog from a Neolithic megalithic context dated to 5 kya at the Frälsegården site in southwestern Sweden can be modelled as a single-source proxy for 90-100% of the ancestry of most modern European dogs, to the exclusion of all other ancient dogs (fig. S13, Data S1). This implies that a population with ancestry similar to this individual, but not necessarily originating in Scandinavia, replaced other populations and erased the continent-wide genetic cline (Fig. 5B). This ancestry was in the middle of the cline (Fig. 1C), such that present-day European dogs can be modelled as about equal proportions of Karelian and Levantine-related ancestries (54% and 46% respectively, for German Shepherd using the admixture graph (Fig. 1E)).

The Frälsegården dog is also favored as a partial ancestry source for a 4ky old Bronze Age dog from Italy, a 1.5ky old dog from Turkey, and Byzantine and Medieval, but not earlier, dogs in the Levant (Data S1), providing some constraints on the timing of this ancestry expansion. However, the circumstances that initiated or facilitated the homogenization of dog ancestry in Europe from a narrow subset of that present in the European Neolithic, including the phenomenal phenotypic diversity and genetic differentiation of modern breeds (12, 19, 20) (Fig. 1C), remain unknown.

More recently, this modern European ancestry has dispersed globally, and today is a major component of most dog populations worldwide (Fig. 5A). Our ancestry models, however, reveal that some pre-colonial ancestry does survive in breeds such as the Mexican Chihuahua (~4%) and Xoloitzcuintli (~3%), and the South African Rhodesian Ridgeback (~4%) (Data S1).

Discussion

The diversification of at least five dog ancestry lineages by the onset of the Holocene was followed by a dynamic population history that in many ways tracked that of humans, likely reflecting how dogs migrated alongside human groups. However, in several instances, these histories do not align, suggesting that humans also dispersed without dogs, dogs moved between human groups, or that dogs were cultural and/or economic trade commodities.

Certain aspects of genetic relationships between dog populations, such as an east-west Eurasian differentiation, circumpolar connections, and possible basal lineages in the Near East, resemble features of human population history that were established before the earliest estimated dates of dog domestication. This superficial mirroring between the species may therefore instead point to recurrent population dynamics, due to biogeographic or anthropological factors that remain to be understood. A key question is how dogs spread across Eurasia and the Americas by the Holocene, since no major human population movements have been identified after the initial out-of-Africa expansion that could have driven this global dispersal.

We find that the modern and ancient genomic data are consistent with a single origin for dogs, though a scenario involving multiple closely related wolf populations remains possible. However, in our view, the geographical origin of dogs remains unknown. Previously suggested points of origin based upon present-day patterns of genomic diversity (2, 8, 10) or affinities to modern wolf populations (12) are sensitive to the obscuring effects of more recent population dynamics and gene flow. Ultimately, integrating DNA from dogs and wolves even older than those analyzed here with archaeology, anthropology, ethology and other disciplines, is needed to determine where, and in which environmental and cultural context the first dogs originated.

Supplementary Material

Data S1

Supplementary material

One Sentence Summary.

Ancient dog genomes reveal no evidence for multiple origins but an early diversification, followed by a genetic history that both mirrors and differs from humans.

Acknowledgements

We thank S. Charlton, I. Lazaridis, A. Manin and I. Mathieson for comments on the manuscript, G.-D. Wang and C. Marsden for help with data access, and GORDAILUA (the Gipuzkoa Centre for Heritage Collections), S. San José, C. Olaetxea, M. Urteaga, A. Sampson, A.R. Sardari Zarchi and M. Abdollahi (ICHHTO, Iran) for facilitating sample access.

Funding

Ancient genome sequencing was supported by SciLifeLab National Projects and the Erik Philip Sörensen Foundation (to P.S.). A.B., T.D., and P.S. were supported by the Francis Crick Institute core funding (FC001595) from Cancer Research UK, the UK Medical Research Council, and the Wellcome Trust. P.S. was also supported by the European Research Council (grant no. 852558) and Wellcome Trust Investigator award (217223/Z/19/Z). R.L. was supported by the Social Sciences and Humanities Research Council of Canada (#SSHRC IG 435-2014-0075). Y. K. was supported by State Assignment of the Sobolev Institute of Geology and Mineralogy. M.S. was supported by ZIN RAS (state assignment no. АААА-А19-119032590102-7). A.T.L. was supported by the Smithsonian’s Peter Buck Postdoctoral Fellowship. Archaeological work in Serbia was supported by AHRC grant AH/J001406/1. Computations were supported by SNIC-UPPMAX (b2016004), and the UOXF ARC facility. L.A.F.F. was supported by the Wellcome Trust (Grant 210119/Z/18/Z) and by Wolfson College (University of Oxford). G.L. was supported by the ERC (Grant ERC-2013-StG-337574-UNDEAD). G.L. and K.D. were supported by the Natural Environmental Research Council (Grants NE/K005243/1 and NE/K003259/1). Dating was supported by the NERC Radiocarbon Facility (NF/2016/2/4).

Footnotes

Author contributions: GL and PS initiated the study. JS, K-GS, DA, EA, SA, GB-O, VIB, JB, DB, SF, IF, DF, MG, LH, LJ, JK-C, YK, RJL, DLD, MM, MN, VO, DO, MP, MR, DR, BR, MS, IS, AT, KT, IU, AV, PW, AG, and LD contributed material and archaeological information. RS, EE, OL, LG-F, JH, AJ, HR and AL did ancient DNA molecular work, supervised by AG, LD, RP, GL and PS. AB, LF, AC, TD, EKI-P and PS processed the genome sequence data, supervised by LF and PS. AB did population genomic analyses, supervised by PS. ATL did mtDNA analyses, supervised by GL. AB, LF, GL and PS wrote the paper with input from RP, KD and all other authors.

Competing interests: Authors declare no competing interests.

Data and materials availability

The generated DNA sequencing data will be made available in the European Nucleotide Archive (ENA) under study accession PRJEB38079.

References and notes
