The Dawn of Human Matrilineal Diversity

Abstract

The quest to explain demographic history during the early part of human evolution has been limited because of the scarce paleoanthropological record from the Middle Stone Age. To shed light on the structure of the mitochondrial DNA (mtDNA) phylogeny at the dawn of Homo sapiens, we constructed a matrilineal tree composed of 624 complete mtDNA genomes from sub-Saharan Hg L lineages. We paid particular attention to the Khoi and San (Khoisan) people of South Africa because they are considered to be a unique relic of hunter-gatherer lifestyle and to carry paternal and maternal lineages belonging to the deepest clades known among modern humans. Both the tree phylogeny and coalescence calculations suggest that Khoisan matrilineal ancestry diverged from the rest of the human mtDNA pool 90,000–150,000 years before present (ybp) and that at least five additional, currently extant maternal lineages existed during this period in parallel. Furthermore, we estimate that a minimum of 40 other evolutionarily successful lineages flourished in sub-Saharan Africa during the period of modern human dispersal out of Africa approximately 60,000–70,000 ybp. Only much later, at the beginning of the Late Stone Age, about 40,000 ybp, did introgression of additional lineages occur into the Khoisan mtDNA pool. This process was further accelerated during the recent Bantu expansions. Our results suggest that the early settlement of humans in Africa was already matrilineally structured and involved small, separately evolving isolated populations.

Introduction

Current genetic data support the hypothesis of a predominantly single origin for anatomically modern humans.1,2 The phylogeny of the maternally inherited mitochondrial DNA (mtDNA) has played a pivotal role in this model by anchoring our most recent maternal common ancestor to sub-Saharan Africa and suggesting a single dispersal wave out of that continent which populated the rest of the world much later.3–5 However, despite its importance as the cradle of humanity and the main location of anatomically modern humans for most of their existence, the initial Homo sapiens population dynamics and dispersal routes remain poorly understood.6,7 The potential to use present-day genetic patterns to detect the existence, or lack thereof, of matrilineal genetic structure among early Homo sapiens populations in sub-Saharan Africa is therefore of particular interest.

The human mtDNA phylogeny can be collapsed into two daughter branches, L0 and L1′2′3′4′5′6 (L1′5),5 located on opposite sides of its root (Figure 1).8,9 The L1′5 branch is far more widespread and has given rise to almost every mtDNA lineage found today, with two clades on this branch, (L3)M and (L3)N, forming the bulk of worldwide non-African genetic diversity and marking the out-of-Africa dispersal 50,000–65,000 years before present (ybp)4 (Figure 1). Current models, predating the recognition of L0 as sister to L1′5,9,10 suggest that the contemporary sub-Saharan mtDNA gene pool is the result of an early expansion of modern humans from their homeland, often suggested to be East Africa, to most of the African continent. According to these models, the early expansion carried exclusively Hg L1 clades, which were later overwhelmed by an expansion wave of L2 and L3 clades dated to 60,000–80,000 ybp.11,12 A more recent, geographically restricted enrichment of the African maternal gene pool was shown to have occurred during the early Upper Paleolithic, when populations carrying mtDNA clades M1 and U6 arrived in north and northeast Africa from Eurasia, hardly penetrating the sub-Saharan portion of the continent, except Ethiopia.13,14 Therefore, the current sub-Saharan mtDNA gene pool is overwhelmingly a rich mix of L0 and L1′5 clades, found at varying frequencies throughout the continent.15

Figure 1.

Simplified Human mtDNA Phylogeny

The L0 and L1′5 branches are highlighted in light green and tan, respectively. The branches are made up of haplogroups L0–L6, which in turn are divided into clades. Khoisan and non-Khoisan clades are shown in blue and purple, respectively. Clades involved in the African exodus are shown in pink. A time scale is given on the left. Approximate time periods for the beginning of African LSA modernization, the appearance of African LSA sites, and the consolidation of LSA throughout Africa are shown by increasing color densities. For a more detailed phylogeny, see Figure S1.

This entangled pattern of mtDNA variation gives an initial impression of a lack of internal maternal genetic structure within the continent. Alternatively, it might indicate the elimination of such an early structure by massive demographic shifts within the continent, the most dominant of which were certainly the recent Bantu expansions and the spread of an agriculturist lifestyle.15 However, some L(xM,N) clades do show significant phylogeographic structure in Africa, such as the localization of L1c1a to central Africa16 or the localization of L0d and L0k (previously L1d and L1k) to the Khoisan people,17–20 in which they account for over 60% of the contemporary mtDNA gene pool. Early studies based on mtDNA control region variation have suggested that Khoisan divergence dates to an early stage in the history of modern humans,18 whereas their anthropological and linguistic features show closer affinities to each other than to those of other populations in Africa.21,22 Their distinctiveness is also supported by phylogenetic studies of the male-specific Y chromosome that indicate that the most basal branch of the Y phylogeny is now common among the Khoisan but is rare or absent in other populations.18

To better understand the reason for the high prevalence of two basal mtDNA lineages L0d and L0k within Khoisan, and the possible implications that this pattern might have on our understanding of early maternal genetic structure within Homo sapiens populations, we studied, at the level of complete mtDNA sequences, the variation of 624 Hg L(xM,N) mtDNA genomes. Our findings enable the identification of different phylogenetic origins for L0d and L0k lineages versus all other contemporary mtDNA lineages found within the Khoisan and support a demographic model with extensive maternal genetic structure during the early evolutionary history of Homo sapiens. This maternal structure is likely the result of ancient population splits and movements and is not consistent with a homogenous distribution of modern humans throughout sub-Saharan Africa.

Material and Methods

Sampling

Table S1 available online details the information for each of the 624 samples included in this study. We evaluated all 315 Hg L(xM,N) complete mtDNA sequences reported in the literature.5,8–10,16,23–25 Next, we identified all Hg L(xM,N) samples in all population sample collections available in Haifa (D.M.B.), Family Tree DNA (D.M.B.), Johannesburg (H.S. and H.M.), National Geographic Society (R.S.W. and J.B.S.), Paris (L.Q.M.), Porto (L.P.), Rome (R.S.), and Tartu (E.M. and R.V.) and chose 309 for complete mtDNA sequencing. Samples were chosen to include the widest possible range of Hg L(xM,N) internal variation on the basis of the previously available sequence analysis of the mtDNA control region and are, therefore, biased toward rare variants. In addition, we attempted to focus on branches (e.g., L0d, L0k), populations (e.g., Khoisan), and geographic regions (e.g., Chad) for which the current data were scant. Last, we preferred to sequence variants that the current literature suggested to be rare or anecdotal in any given geographic region (e.g., L0k in the Near East). All samples reported herein were derived from blood, buccal swab, or blood cell samples that were collected with informed consent according to procedures approved by the Institutional Human Subjects Review Committees in their respective locations.

Complete mtDNA Sequencing

DNA was amplified with 18 primers to yield nine overlapping fragments as previously reported.26 After purification, the nine fragments were sequenced by means of 56 internal primers to obtain the complete mtDNA genome. Sequencing was performed on a 3730xl DNA Analyzer (Applied Biosystems), and the resulting sequences were analyzed with the Sequencher software (Gene Codes Corporation). Mutations were scored relative to the revised Cambridge Reference Sequence (rCRS).27 The 309 Hg L(xM,N) complete mtDNA sequences reported herein have been submitted to GenBank (accession numbers EU092658–EU092966). Sample quality control was assured as follows:

Nomenclature

The term African Hg L(xM,N) is used to describe all mtDNA haplogroups except (L3)M and (L3)N. We reserve the term branch to describe the two evolving sides of the root and have labeled them L0 and L1′2′3′4′5′6 (L1′5).5 The two major branches are each composed of one to several haplogroups.30 Note that the L0 branch is made of the L0 Hg alone, whereas the L1′5 branch includes haplogroups L1–L6. Haplogroups are composed of clades (e.g., L0d and L0k), which in turn are composed of lineages, which represent an evolving set of closely related haplotypes. The term haplotype describes the entire combination of substitutions retrieved from the complete sequence in any given sample and therefore indicates the tips of the phylogeny, whether a singleton or not. Numbers 1–16569 refer to the position of the substitution in the rCRS.27 We followed the consensus nomenclature scheme31 when possible. In many cases, we labeled previously unreported deep branches (e.g., L1c1c), understanding that these designations are meant to facilitate reading and future literature comparison and are prospective candidates of clades to be fully defined in the future, provided common ancestral substitution motifs could be identified in complete mtDNA sequences of other samples. Nomenclature within Hg L(xM,N) has been the subject of some ambiguity because of the relabeling of some of the clades. The clades L0d, L0f, L0k, and L5 were previously labeled L1d, L1f, L1k, and L1e, respectively. We followed the designations in refs. 5, 8, 15, and 32 for the definitions of the major branches, with a single exception. We have eliminated the label L7 coined in ref. 5 and reverted to the original label L4a, as suggested in ref. 13, for the following reasons: (1) a large number of samples (17) suggest position 16362 to be at the root of both clades, (2) both clades share a similar distribution in East Africa and in southern West Eurasia, and (3) coalescence ages and the observed subclade-type architecture appear to be similar.
We have not used the label L1c5 suggested by ref. 33 because our complete mtDNA-based analysis indicates it to be L1c1a1, as suggested by ref. 15. To avoid confusion, we have skipped this label and moved from L1c4 to L1c6. We added labeling for previously unlabeled bifurcations if they became relevant for our discussion.

The term Khoisan is used in reference to two major ethnic groups of Southern Africa, the Khoi and San, though several other names exist for either one or both of these groups, such as the Khoi, Khoe, Khoi-San, and Khoe-San.

African Hg L Phylogeny

We generated a maximum-parsimony tree of 624 complete mtDNA sequences belonging to Hg L(xM,N) (Figure S1). The tree was rooted according to ref. 8 and includes 309 samples reported herein and 315 previously reported samples: 21 sequences from ref. 23, six from ref. 10, five from ref. 34, ten from ref. 9, 93 from ref. 24, 126 from ref. 8, 23 from ref. 5, four from ref. 25, and 27 from ref. 16. The genotyping information from refs. 5 and 34 included herein corrects several inaccuracies that were identified during the establishment of the phylogeny. Sequence data from ref. 35 were not incorporated into our summary tree because we counted at least 25 missing root-defining substitutions in some of the reported complete mtDNA sequences. Until the reason(s) for such substantial differences can be identified, we preferred to omit this published database. Mutations are shown on the branches. Transitions are labeled in capital letters (e.g., 10420G). Transversions are labeled in lowercase letters (e.g., 2836a). Sequencing alignment always prefers 3′ gap placement for indels. Deletions are indicated by a “d” after the deleted nucleotide position (e.g., 15944d). Insertions are indicated by a dot followed by the number and type of inserted nucleotide(s) (e.g., 5899.1C). In cases where an insertion was expected according to the phylogeny but a reversion of the insertion was observed, we denoted it as in the following example: sample L263, 5899.1Cd. Underlined nucleotide positions occur at least twice in the tree. An exclamation mark (!) at the end of a labeled position denotes a reversion to the ancestral state in the relative pathway to the rCRS. Sample names are denoted by the letter L followed by a serial number.
The contemporary country in which the sample was collected (if known) is marked below the serial number, and the background is colored to broadly divide the samples into the Near East, Southwest Asia, the Mediterranean, Europe, and South, North, West, East, and sub-Saharan Africa, as denoted in the color index at the upper-left corner of the figure. The ethnicity (if known) of the individual who donated the sample is further marked below. When the country from which the sample was collected is unknown, the gross geographic region is inferred from the ethnicity information. The data included herein from ref. 8 cover the coding region alone (435–16023); these samples are denoted by the letter p at the end of the serial number.
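The labeling conventions described above (capital letter for a transition, lowercase for a transversion, "d" for a deletion, a dot plus count and base for an insertion, a trailing "!" for a reversion) can be read mechanically. The following is a minimal sketch of such a reader, not the tool used to build Figure S1; the names `Mutation` and `parse_label` are hypothetical, and underlining (recurrent positions) is not representable in plain text and is therefore ignored.

```python
import re
from typing import NamedTuple, Optional


class Mutation(NamedTuple):
    position: int          # position in the rCRS (1-16569)
    kind: str              # transition, transversion, deletion, insertion, insertion_reverted
    bases: Optional[str]   # derived or inserted base(s); None for a deletion
    reversion: bool        # True when the label ends with "!"


# One label = position, then exactly one event code, then an optional "!".
_LABEL = re.compile(
    r"^(?P<pos>\d+)"
    r"(?:\.(?P<count>\d+)(?P<ins>[ACGT]+)(?P<insrev>d?)"  # e.g. 5899.1C or 5899.1Cd
    r"|(?P<ts>[ACGT])"                                     # e.g. 10420G
    r"|(?P<tv>[acgt])"                                     # e.g. 2836a
    r"|(?P<deletion>d))"                                   # e.g. 15944d
    r"(?P<rev>!?)$"
)


def parse_label(label: str) -> Mutation:
    """Classify a single branch label written in the notation of the text."""
    m = _LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    pos = int(m["pos"])
    if not 1 <= pos <= 16569:
        raise ValueError(f"position outside the rCRS range: {pos}")
    rev = m["rev"] == "!"
    if m["ins"]:
        kind = "insertion_reverted" if m["insrev"] else "insertion"
        return Mutation(pos, kind, m["ins"], rev)
    if m["ts"]:
        return Mutation(pos, "transition", m["ts"], rev)
    if m["tv"]:
        # Stored in upper case; the lowercase form only encodes the event type.
        return Mutation(pos, "transversion", m["tv"].upper(), rev)
    return Mutation(pos, "deletion", None, rev)


for example in ["10420G", "2836a", "15944d", "5899.1C", "5899.1Cd"]:
    print(example, parse_label(example))
```

The five labels in the loop are the examples quoted in the text; any other label format would need the pattern to be extended.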

The tree was first drawn by hand, and its branches were validated by networks constructed with the program Network 4.2.0.1. We have applied the reduced median algorithm (r = 2), followed by the median-joining algorithm (epsilon = 2) as described at the Fluxus Engineering website. The hypervariable indels at positions 309, 315, and 16189 were excluded from the phylogeny. The information of the reported samples is presented in Table S1. Some caveats and possible genotyping or reading errors that might affect the accuracy of the phylogeny are detailed herein:

Age Estimates

For age estimation of ancestral nodes in our phylogenetic tree, we applied PAML36 to the coding-region polymorphisms of our samples, excluding indels, and by using the HKY85 substitution model. Each tip node of the phylogenetic tree was counted as one event if shared by a few samples. We eliminated from the coalescence analysis samples L025, L026, and L039, in which we observed three or more coding-region back mutations at haplogroup-defining positions. We used the rate of 5138 years per coding-region single-nucleotide polymorphism9 to translate the age estimates in mutations into ages in years. It is worth noting that age estimates in years should be cautiously interpreted because the actual mutation rate in years per mutation remains an open debate in the literature.8,37 The maximum-likelihood estimate of the transition to transversion rate on the basis of our data was 19.91, with a standard error of 1.02. It is important to consider the meaning of the age estimates given herein. Each estimate is a time to the most recent common ancestor of a set of mtDNA molecules. Thus the age of the L0d clade, defined by the available sequences, is 101,589 ± 10,318 ybp, but it started to diverge from its sister clade, L0abfk, 143,654 ± 11,111 ybp. Mutations defining the L0d clade could have occurred at any time between these two dates.
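As a worked illustration of the conversion described above, the sketch below translates between coding-region substitutions and years with the quoted rate of 5138 years per substitution, and computes the window in which the L0d-defining mutations could have arisen. The function names are hypothetical, and the sketch covers only the final unit conversion, not the PAML likelihood analysis itself.

```python
YEARS_PER_SUBSTITUTION = 5138  # coding-region rate quoted in the text


def substitutions_to_years(substitutions: float) -> float:
    """Convert an age expressed in coding-region substitutions into years."""
    return substitutions * YEARS_PER_SUBSTITUTION


def years_to_substitutions(years: float) -> float:
    """Inverse conversion, useful for checking a reported age."""
    return years / YEARS_PER_SUBSTITUTION


# Point estimates reported in the text for the L0d clade (ybp).
l0d_tmrca = 101_589   # coalescence of the sampled L0d sequences
l0d_split = 143_654   # divergence of L0d from its sister clade L0abfk

# The L0d-defining mutations arose somewhere inside this interval.
window_years = l0d_split - l0d_tmrca
print("L0d TMRCA in substitutions:", round(years_to_substitutions(l0d_tmrca), 2))
print("window for L0d-defining mutations (years):", window_years)
```

Because the rate itself is debated, any age in years scales linearly with whichever rate is substituted for `YEARS_PER_SUBSTITUTION`.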

Hypothesis Testing of the Time of Isolation of the Khoisan

Our goal here was to evaluate whether it is likely that the phylogenetic restriction of Khoisan to lineages in L0d and L0k could result from an isolation event starting from a single, homogeneous Homo sapiens population at different points in time. Given a time X (say, 100,000 ybp), we consider three elements:

We then perform a permutation test to assess whether a random selection of Z lineages out of Y (given the phylogenetic tree of the Y lineages) is likely to have created an isolation measure smaller than or equal to L. In other words, we count how many groups of Z lineages can be isolated from the rest of the tree by cutting L or fewer links and then divide this number by the total number of groups of Z lineages (the number of ways of choosing Z out of Y). For the example of 100,000 ybp, the seven triplets that can be isolated from the rest of the tree by cutting at most two links are the following:

This gives a permutation test p value of 7/(14 choose 3) = 0.019 to the event that an isolation of three lineages by drift would lead to this level of phylogenetic localization. We applied this test to the phylogenetic tree at various time points (Table 1). As can be seen, the isolation-and-drift hypothesis can be rejected for times later than 100,000 ybp, with p values of 0.019 and 0.0016 for 100,000 and 90,000 ybp, respectively. For later dates, the p values decrease dramatically further.
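The arithmetic behind the 100,000 ybp example can be checked directly. The sketch below assumes that the number of isolable groups (seven) is taken as given from the text; it does not recount them, because that depends on the tree topology at the chosen time point. The function name is hypothetical.

```python
from math import comb


def permutation_p_value(isolable_groups: int, y: int, z: int) -> float:
    """Fraction of all Z-out-of-Y lineage groups that are at least as localized."""
    return isolable_groups / comb(y, z)


# 100,000 ybp: Y = 14 lineages, Z = 3 Khoisan lineages, 7 isolable triplets.
total_triplets = comb(14, 3)
p_100k = permutation_p_value(7, 14, 3)
print(total_triplets, round(p_100k, 3))
```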

Table 1.

Estimated Odds for the Occurrence of L0d and L0k Clades in Khoisan by Drift

X (Time, ybp)  Y (Number of Lineages)  Z (Number of Khoisan Lineages)  L (Localization Measure)ᵃ  p Value  p Value Corrected by FDRᵃ
144,000        7                       2                               2                          0.24     0.24
120,000        9                       2                               2                          0.17     0.24
100,000        14                      3                               2                          0.019    0.057
90,000         22                      4                               2                          0.0016   0.0065
80,000         24                      4                               2                          0.0012   0.0061

In analyzing the results in Table 1, we may want to take into account the issues of multiple comparisons and false discovery rate (FDR).38 First, we observe that our testing procedure can be considered sequential, because the hypothesis we are testing is that the isolation occurred at or after time X. So, as soon as we reject the hypothesis for time X, we are implicitly rejecting the hypothesis for all later times. Thus, we can reject the hypothesis that the isolation happened at 144,000 ybp or later at significance level 0.24 (in which case our second model of an early split must be correct). For the 100,000 ybp test, the p value of 0.019 implies that we would reject the hypothesis of isolation at or after this date at a significance level of 0.019 × 3 = 0.057 or higher (3 is the FDR correction factor, in this case identical to a Bonferroni correction), after a multiple-comparison correction. For the 90,000 ybp test, the result is significant at level 0.0016 × 4 = 0.0065 or higher. It should be noted that, because the hypotheses we are testing are positively correlated (relating to the evolution of one tree over time), the FDR correction we perform here is overly conservative.39
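The two worked corrections in this paragraph multiply each p value by the position of its test in the sequence. The sketch below reproduces only that arithmetic; the rule "factor equals test position" is an assumption read off those two examples, and it is not claimed to regenerate the whole corrected column of Table 1. The function name is hypothetical.

```python
def sequential_correction(p_value: float, test_position: int) -> float:
    """Bonferroni-style correction: multiply by the test's position, capped at 1."""
    return min(1.0, p_value * test_position)


# 100,000 ybp is the third test in the sequence, 90,000 ybp the fourth.
corrected_100k = sequential_correction(0.019, 3)
corrected_90k = sequential_correction(0.0016, 4)
print(round(corrected_100k, 3), round(corrected_90k, 4))
```

With the rounded input 0.0016 the second product is 0.0064, whereas the text reports 0.0065; the difference is presumably due to the published p value having been rounded before printing.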

Results

Allocating the Khoisan mtDNA Lineages within the African Hg L Phylogeny

The contemporary composition of the Khoisan mtDNA gene pool shows that over 60% of Khoisan carry either L0d or L0k lineages, whereas the remaining 40% are a mixture of various non-L0d or L0k lineages found in sub-Saharan Africa.17–20 To survey contemporary Khoisan mtDNA diversity, we generated a maximum-parsimony tree composed of 309 previously unreported and 315 previously reported5,8–10,16,23,25,34,40 complete Hg L(xM,N) mtDNA genomes from populations located throughout the Hg L(xM,N) geographic range of distribution (Table S1) and including 38 Khoisan samples. In this instance, the detailed Hg L(xM,N) phylogeny served as a magnifying background for the accurate positioning of the 38 Khoisan mtDNA genomes. This in turn allowed us to focus on branches in which Khoisan and non-Khoisan samples were found in close phylogenetic proximity, in an attempt to understand the temporal origin and timing of their introduction into the Khoisan. To capture as many different lineages as possible within the Khoisan, sample selection was enriched for rare variants, both within and outside of the L0d and L0k clades. Given the reported structure of the Khoisan mtDNA gene pool17,18 it is likely that the 38 Khoisan complete mtDNA sequences cover most variation within Khoisan L0d and L0k clades but may incompletely represent Khoisan non-L0d and L0k clades.

Revealing the Remote and Recent Maternal Ancestors of Contemporary Khoisan

The observation of L0d and L0k lineages in non-Khoisan populations,13 as well as of various non-L0d or L0k lineages among Khoisan,15 implies that the correct interpretation of the Khoisan mtDNA gene pool depends on our ability to understand the phylogenetic origin, rather than just the frequencies,17,18 of the various lineages. L0d is represented by 30 samples, of which 20 are from the Khoisan, and L0k comprises seven samples, of which six are from the Khoisan and the other is from Yemen. Each of the ten non-Khoisan L0d samples was compared with its topologically closest Khoisan neighbor, and this yielded an average coalescence time estimate of 13,000 ybp, with the greatest time depth at 33,000 ± 9,000 ybp (Table 2, Figure S1). The single L0k Yemenite sample coalesced with the Khoisan L0k samples at 40,000 ± 9,000 ybp (Table 2, Figure S1). Similarly, all Khoisan sequences not belonging to L0d or L0k haplotypes were also assessed to determine coalescence age estimates with their nearest non-Khoisan topological neighbors. The L0abf clade and the L1′5 branch included one and 11 Khoisan samples, respectively, whose average coalescence with their respective closest topological non-Khoisan neighbors was 7,000 ybp, with the largest estimated age at 39,000 ± 6,000 ybp (Table 2). The observation of higher frequency and greater internal variation of L0d and L0k lineages within the Khoisan (Table 2, Figure S1) clearly points to this group as the initial source of these two haplogroups in non-Khoisan, whereas the higher frequency and internal variation of L1′5 lineages in non-Khoisan suggests that their presence in the Khoisan is the result of recent gene flow from elsewhere.

Table 2.

Coalescence Estimates for Nearest Neighboring Khoisan and Non-Khoisan Types

Haplogroup  Khoisan (Sample ID)                 Non-Khoisan (Sample ID)       Coalescence ± Standard Error (ybp)
L0d1aa      L490                                L219, L524                    18,254 ± 6,349
L0d1ba      L500                                L220                          16,667 ± 6,349
L0d1c1a     L226, L488                          L521                          0 ± 2,600
L0d1c1a     L226, L488                          L520                          2,381 ± 2,381
L0d2aa      L492, L035, L505, L498              L334                          8,730 ± 3,175
L0d2aa      L035, L505, L498                    L215                          6,349 ± 2,381
L0d2ca      L503, L504                          L209, L342                    20,635 ± 5,556
L0d3a       L501                                L583                          33,334 ± 8,730
L0ka        L496, L018, L019, L506, L513, L15   L441                          39,683 ± 8,730
L0a1b1      L519                                L125                          1,587 ± 794
L1b1a       L512                                L129                          0 ± 2,600
L1b1a4      L038                                L128, L349                    5,556 ± 2,381
L1c1a1a1b   L042                                L268, L272, L276              3,968 ± 2,381
L1c1d       L494                                L277                          7,143 ± 4,762
L2b1        L514                                L033                          0 ± 2,600
L2a2b       L041                                L238, L243                    3,175 ± 2,381
L2a1f       L227                                L324, L571, L214, L581, L339  7,937 ± 3,968
L4b2a2      L211, L497                          L428, L616, L386, L603        38,890 ± 5,556
L3e1b       L507                                L536                          0 ± 2,600
L3f1b1      L518                                L116, L118, L330              9,524 ± 2,381
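The averages quoted in the text (about 13,000 ybp for non-Khoisan L0d samples and about 7,000 ybp for Khoisan samples outside L0d and L0k) can be approximated from the point estimates in Table 2. The sketch below assumes an unweighted mean over table rows, which is one plausible reading of how the averages were formed rather than a documented procedure; the variable names are hypothetical.

```python
from statistics import mean

# Point estimates (ybp) from Table 2, one value per row.
l0d_rows = [18254, 16667, 0, 2381, 8730, 6349, 20635, 33334]            # L0d rows
non_l0dk_rows = [1587, 0, 5556, 3968, 7143, 0, 3175, 7937, 38890, 0, 9524]  # L0abf and L1'5 rows

mean_l0d = mean(l0d_rows)
mean_non_l0dk = mean(non_l0dk_rows)

print("L0d: mean", round(mean_l0d), "max", max(l0d_rows))
print("non-L0d/L0k: mean", round(mean_non_l0dk), "max", max(non_l0dk_rows))
```

The maxima correspond to the greatest time depths cited in the text (roughly 33,000 and 39,000 ybp); the single L0k row (39,683 ybp) is reported separately and is not part of either average.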

Taken together, the complete mtDNA coalescence analysis reveals two independent sources for the contemporary Khoisan mtDNA gene pool. The lesser of these appears to be the result of recent introgression from a variety of haplogroups existing elsewhere in Africa. Even the oldest age estimates for these exogenous lineages postdate the onset of the Late Stone Age (LSA) (Table 2) and the apparent increase in modern human migration associated with that period,3,15 and the majority of these lineages are concordant with the very recent (3000–5000 ybp) expansion of Bantu-speaking peoples from western Africa.41 When these apparently recent introgression events are eliminated, this finding suggests that apart from extinct clades, the mtDNA gene pool of the Middle Stone Age (MSA) Khoisan ancestors was probably limited to the clades L0d and L0k.

Dating the Khoisan Division and Isolation

The concomitant occurrence of the two adjacent basal mtDNA clades, L0d and L0k, within the Khoisan demands an explanation. In the following, we compare two alternative hypotheses (Figure 2).

Figure 2.

Maternal Gene Flow within Africa

The gradual maternal movements suggested by the first (A) and second (B) hypotheses are denoted by the ascending numerical labels. A gradient colorization system is used to illustrate the timing of the events. The temporal direction and timing of the arrows and expansion waves are general and should not be treated as firm migratory paths.

(A) An initial prolonged colonization (brown) by anatomically modern humans (1) is followed by a dispersal wave (green) of a fracture of the population (2) and the localization of L0d and L0k to southern Africa (3).

(B) An early Homo sapiens division in a hypothetical migration zone (1) resulted in two separately evolving populations (2) and the localization of L0 (green) in southern Africa and L1′5 (red) in eastern Africa. A subsequent dispersal event of the L0abf subset from the southern population and its mergence with the eastern population (brown) is suggested (3), resulting in the former population composed only of L0d and L0k and the latter composed of L1′5 and L0abf.

Later dispersal waves from the eastern African population parallel the beginning of African LSA approximately 70,000 ybp (4). Rapid migrations during the LSA (5) brought descendants of the eastern African population into repeated contact with the southern population, peaking during the Bantu expansion (6).

The first hypothesis has been previously explained as the existence of a single ancestral MSA Homo sapiens population probably existing in eastern or southern Africa.6,11,13,42 According to this hypothesis, both L0 and L1′5 clades would have coevolved within it, and the localization of L0d and L0k to the southern part of Africa is then considered the result of a population split followed by drift. This could result from a migration followed by isolation (Figure 2A) and would thus reveal the footprint of an early spread of the ancestral population across sub-Saharan Africa.12 In the context of this hypothesis, one must consider the likelihood that from a population rich in a joint variety of L0 and L1′5 lineages, only the two basal and topologically adjacent L0 clades, L0d and L0k, would be enriched by drift within the Khoisan while becoming extinct in all non-Khoisan. The lower time limit of such a separation can be inferred from how likely the separation is to have occurred, given the composition of L0 and L1′5 clades within and outside the Khoisan at different time frames, as evaluated by our hypothesis testing of the time of isolation of the Khoisan. On the basis of this testing, we conclude that it is unlikely that the genetic composition of modern Khoisan stemmed from a putative homogeneous L0 and L1′5 source population later than 90,000 ybp (FDR-corrected p = 0.0065) (Table 1). An upper time limit for the underlying drifting event can be inferred from the first time L0d and L0k existed together, corresponding to the L0abfk split around 140,000 ybp (Figure S1). Naturally, this hypothesis cannot be extended to time periods earlier than the L0abfk split and the emergence of the L0k clade (Figure 1).

Here, we propose an alternative hypothesis, which suggests that the deepest L0-L1′5 split observed in the human mtDNA tree might represent both a phylogenetic and an ancient Homo sapiens population split into two small populations. This division, occurring in an unknown early Homo sapiens migratory zone, is dated by our coalescence estimates to 140,000–210,000 ybp (Figure S1) and was possibly generated by drift due to the small population sizes of that period.6,11,13,42 This hypothesis therefore suggests the localization of these early L0 and L1′5 mtDNA branches to populations located in southern and eastern Africa, respectively (Figure 2B). The presence of L0d and L0k within the contemporary Khoisan may therefore result from their independent evolution within the early southern L0 population rather than occurring as a matter of chance. The observation of L0abf lineages found throughout the L1′5 range would then be explained by a dispersal event circa 144,000 ybp (L0abfk split, Figure 1) where the successful integration of a subset of L0 lineages into the L1′5 population was likely due to favorable environmental conditions in eastern Africa compared with those in southern Africa.

Discussion

Khoisan: The First Division

The phylogenetic analysis of complete mtDNA sequences found among contemporary Khoisan suggests that their division from other modern humans occurred not later than 90,000 ybp and therefore reveals strong evidence for the existence of maternal structure early in the history of Homo sapiens. This hypothesis closely parallels the pattern seen earlier in the fossil hominine record, where the “bushy” tree43 shows clear evidence of population divergence during the evolution of our ancestors over millions of years.44 With this information, we further attempted to track the possible mechanisms that shaped the foundation and evolution of Khoisan ancestors. Although it is impossible to validate empirically the two suggested hypotheses (or even more complex intermediate scenarios) on the basis of the genetic data alone, three important points deserve mention.

First, our results highlight the L0abfk split about 133,000–155,000 ybp (Figure 1) as marking a key point in Homo sapiens matrilineal population structuring. Though the archeological record from this period is too poor to reliably identify reasons for the split(s), recent studies show that the sporadic settlements of Homo sapiens in northwest Africa, the Near East, Chad, and southern Africa45–47 may have been caused by stressful climatic fluctuations known to have occurred throughout the MSA.47,48 Archeological evidence reveals the early existence of Homo sapiens in southern Africa (70,000 ybp),46 and studies of the mtDNA in contemporary populations demonstrate convincingly that very deep (50,000–60,000 ybp) autochthonous mtDNA lineages can survive locally both in isolated habitats49 and open surroundings.4 Although it is tempting to link these early southern African settlements to ancestors of the Khoisan, our data cannot prove it, nor can they suggest the cradle of Homo sapiens to be southern or eastern Africa.

Second, it is evident that since the L0abfk split, the expansion dynamics of the L0d and L0k clades and that of the L0abf and L1′5 clades have proceeded in the most uneven ways, with one localizing to southern Africa and giving rise to the matrilineal ancestry of the present-day Khoisan and the other spreading to all corners of the world and giving rise to all present-day non-Khoisan populations, including non-Africans.

Third, it seems that these southern and eastern populations remained isolated from each other, at least maternally, for an extremely long period of between 50,000 and 100,000 years until the development of LSA technologies47 which, coupled with more favorable environmental conditions, may have allowed behaviorally modern Homo sapiens to expand its range.6 This apparent sign of maternal isolation and structure in the early settlement dynamics of Africa implies the formation of small, independent human communities rather than a uniform early spread of anatomically modern humans as previously suggested.11,12

Early Maternal Genetic Structure among Modern Humans

The proposed matrilineal sequestration of African MSA mtDNA into isolated populations does not seem to be restricted to Khoisan. A recent study showed that ancestors of contemporary Pygmies diverged from an ancestral Central African population no more than 70,000 ybp and that isolation was breached throughout the LSA.16 Moreover, this matrilineal sequestration pattern also offers a simple explanation to the surprising finding that of the more than 40 mtDNA lineages in Africa at the time modern humans left Africa3 (Figure S1), only two of the variants, (L3)M and (L3)N,4 gave rise to the entire wealth of mtDNA diversity outside of Africa.5,8 Different approaches were taken in the attempt to estimate the sub-Saharan Homo sapiens population size in different time frames.7 The understanding of the minimum number of existing maternal lineages in different time periods, as far as can be estimated from their survival to the present day, might benefit our understanding of the magnitude of Homo sapiens expansion in these periods and shed light on the frequency of the loss of mtDNA lineages in long time periods.

In summary, the study of extant genetic variation in African populations with complete mtDNA sequences provides an insight into past Homo sapiens demographics, suggesting that small groups of early humans remained in geographic and genetic isolation until migrations during the LSA. Studies of additional genomic regions, particularly of unlinked autosomal regions with their greater effective population size, may reveal additional details about these early demographic events from a genome-wide perspective.

Supplemental Data

One figure and one table are available at http://www.ajhg.org/.

Figure S1. African Hg L Phylogeny

Table S1. Source, Demography, and Genotyping Parameters of the 624 Complete mtDNA Sequences

Web Resources

The URLs for data presented herein are as follows:

Accession Numbers

The 309 Hg L(xM,N) complete mtDNA sequences reported herein have been submitted to GenBank under accession numbers EU092658–EU092966.

Acknowledgments

We thank all individuals who voluntarily donated their DNA samples to the study. We also thank Ryan Sprissler and Heather M. Issar from the Arizona Research Labs, University of Arizona, and Concetta Bormans and Michal Bronstein from the Genomics Research Center, Family Tree DNA, for excellent laboratory services. This study was supported by the National Geographic Society, IBM, the Waitt Family Foundation, the Seaver Family Foundation, Family Tree DNA, and Arizona Research Labs. R.V. is grateful to the Swedish Collegium of Advanced Studies for a fellowship during the final preparation of the manuscript. S.R. is partially supported by European Union grant MIRG-CT-2007-208019. C.T.S. is supported by The Wellcome Trust. Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP) (L.P.) is supported by Programa Operacional Ciência, Tecnologia e Inovação (POCTI) and Quadro Comunitário de Apoio III.

The Genographic Consortium includes the following: Theodore G. Schurr, Department of Anthropology, University of Pennsylvania, Philadelphia, PA 19104-6398, USA; Fabricio R. Santos, Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-010, Brazil; Lluis Quintana-Murci, Unit of Human Evolutionary Genetics, CNRS URA3012, Institut Pasteur, 75724 Paris, France; Jaume Bertranpetit, Evolutionary Biology Unit, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona 08003, Catalonia, Spain; David Comas, Evolutionary Biology Unit, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona 08003, Catalonia, Spain; Chris Tyler-Smith, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, UK; Elena Balanovska, Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow 115478, Russia; Oleg Balanovsky, Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow 115478, Russia; Doron M. Behar, Molecular Medicine Laboratory, Rambam Health Care Campus, Haifa 31096, Israel and Genomics Research Center, Family Tree DNA, Houston, TX 77008, USA; R. John Mitchell, Department of Genetics, La Trobe University, Melbourne, Victoria, 3086, Australia; Li Jin, Fudan University, Shanghai, China; Himla Soodyall, Division of Human Genetics, National Health Laboratory Service, Johannesburg, 2000, South Africa; Ramasamy Pitchappan, Department of Immunology, Madurai Kamaraj University, Madurai 625021 Tamil Nadu, India; Alan Cooper, Division of Earth and Environmental Sciences, University of Adelaide, South Australia 5005, Australia; Ajay K. Royyuru, Computational Biology Center, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA; Saharon Rosset, Department of Statistics and Operations Research, School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel and Data Analytics Research Group, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA; Laxmi Parida, Computational Biology Center, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA; Jason Blue-Smith, Mission Programs, National Geographic Society, Washington, D.C. 20036, USA; David Soria Hernanz, Mission Programs, National Geographic Society, Washington, D.C. 20036, USA; and R. Spencer Wells, Mission Programs, National Geographic Society, Washington, D.C. 20036, USA.

References
