Alan Rogers - Academia.edu
Papers by Alan Rogers
bioRxiv (Cold Spring Harbor Laboratory), Apr 23, 2020
Recent studies have suggested that selection is widespread throughout the genome and largely uncompensated for in inferences of population history. To address this potential issue, we estimated site pattern frequencies for neutral and selection-associated areas of the genome. There are notable differences in these frequencies between neutral regions and those affected by selection. However, these differences have relatively small effects when inferring population history.
Proceedings of The Royal Society B: Biological Sciences, Jan 22, 2013
Males of many species help in the care and provisioning of offspring, and these investments often correlate with genetic relatedness. For example, many human males invest in the children of sisters, and this is especially so where men are less likely to share genes with children of wives. Although this makes qualitative sense, it has been difficult to support quantitatively. The prevailing model predicts investment in children of sisters only when paternity confidence falls below 0.268. This value is often seen as too low to be credible; so investment in sisters' children represents an unsolved problem. I show here that the prevailing model rests on a series of restrictive assumptions that underestimate relatedness to sisters' children. For this reason, it understates the fitness payoff to men who invest in these children. This effect can be substantial, especially in societies with low confidence in paternity. But this effect cannot be estimated solely from confidence in paternity. One must also estimate the probability that two siblings share the same father.
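The 0.268 threshold quoted in this abstract can be reproduced with a short sketch. It assumes one common textbook form of the prevailing model, which the abstract itself does not spell out: siblings always share a mother, they share a father with probability p squared (where p is paternity confidence), and a man is related to his wife's child by p/2. The function names are hypothetical and chosen only for illustration.

```python
import math


def r_wifes_child(p):
    """Expected relatedness of a man to his wife's child.

    He is the father with probability p (paternity confidence),
    in which case relatedness is 1/2.
    """
    return p / 2.0


def r_sisters_child(p):
    """Expected relatedness of a man to his sister's child.

    Assumption (not stated in the abstract): the siblings share a
    mother with certainty and share a father with probability p**2.
    """
    r_sibling = 0.25 + 0.25 * p ** 2  # maternal share + paternal share
    return r_sibling / 2.0            # the sister passes on half


def paternity_threshold():
    """Paternity confidence at which the two relatedness values are equal.

    Setting (1 + p**2) / 8 = p / 2 gives p**2 - 4*p + 1 = 0,
    whose root in [0, 1] is 2 - sqrt(3).
    """
    return 2.0 - math.sqrt(3.0)


if __name__ == "__main__":
    p_star = paternity_threshold()
    print(f"threshold paternity confidence: {p_star:.3f}")
    for p in (0.2, p_star, 0.5, 0.9):
        print(p, r_sisters_child(p), r_wifes_child(p))
```

Under these assumptions, relatedness to a sister's child exceeds relatedness to a wife's child only below the threshold. The abstract's argument is that the probability of siblings sharing a father need not equal p squared, so replacing that term in `r_sisters_child` with a separately estimated probability would shift the threshold.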
PLOS Genetics, Apr 27, 2017
The indigenous people of the Tibetan Plateau have been the subject of much recent interest because of their unique genetic adaptations to high altitude. Recent studies have demonstrated that the Tibetan EPAS1 haplotype is involved in high-altitude adaptation and originated in an archaic Denisovan-related population. We sequenced the whole genomes of 27 Tibetans and conducted analyses to infer a detailed history of demography and natural selection of this population. We detected evidence of population structure between the ancestral Han and Tibetan subpopulations as early as 44 to 58 thousand years ago, but with high rates of gene flow until approximately 9 thousand years ago. The CMS test ranked EPAS1 and EGLN1 as the top two positive selection candidates, and in addition identified PTGIS, VDR, and KCTD12 as new candidate genes. The advantageous Tibetan EPAS1 haplotype shared many variants with the Denisovan genome, with an ancient gene tree divergence between the Tibetan and Denisovan haplotypes of about 1 million years ago. With the exception of EPAS1, we observed no evidence of positive selection on Denisovan-like haplotypes.
BMC Genomics, 2010
Background: Though a variety of linkage disequilibrium tests have recently been introduced to measure the signal of recent positive selection, the statistical properties of the various methods have not been directly compared. While most applications of these tests have suggested that positive selection has played an important role in recent human history, the results of these tests have varied dramatically. Results: Here, we evaluate the performance of three statistics designed to detect incomplete selective sweeps: LRH, iHS, and ALnLH. To analyze the properties of these tests, we introduce a new computational method that can model complex population histories with migration and changing population sizes to simulate gene trees influenced by recent positive selection. We demonstrate that iHS performs substantially better than the other two statistics, with power of up to 0.74 at the 0.01 level for the variation best suited for full genome scans and a power of over 0.8 at the 0.01 level for the variation best suited for candidate gene tests. The performance of the iHS statistic was robust to complex demographic histories and variable recombination rates. Genome scans involving the other two statistics suffer from low power and high false positive rates, with false discovery rates of up to 0.96 for ALnLH. The difference in performance between iHS and ALnLH did not result from the properties of the statistics, but instead from the different methods for mitigating the multiple comparison problem inherent in full genome scans. Conclusions: We introduce a new method for simulating genealogies influenced by positive selection with complex demographic scenarios. In a power analysis based on this method, iHS outperformed LRH and ALnLH in detecting incomplete selective sweeps.
We also show that the single-site iHS statistic is more powerful in a candidate gene test than the multi-site statistic, but that the multi-site statistic maintains a low false discovery rate with only a minor loss of power when applied to a scan of the entire genome. Our results highlight the need for careful consideration of multiple comparison problems when evaluating and interpreting the results of full genome scans for positive selection.
Peer Community In Mathematical and Computational Biology
Soraggi et al. [2] describe HMMploidy, a statistical method that takes DNA sequencing data as input and uses a hidden Markov model to estimate ploidy. The method allows ploidy to vary not only between individuals, but also between and even within chromosomes. This allows the method to detect aneuploidy and also chromosomal regions in which multiple paralogous loci have been mistakenly assembled on top of one another. HMMploidy estimates genotypes and ploidy simultaneously, with a separate estimate for each genome. The genome is divided into a series of non-overlapping windows (typically 100), and HMMploidy provides a separate estimate of ploidy within each window of each genome. The method is thus estimating a large number of parameters, and one might assume that this would reduce its accuracy. However, it benefits from large samples of genomes. Large samples increase the accuracy of internal allele frequency estimates, and
bioRxiv, 2022
Introgression appears increasingly ubiquitous in the evolutionary history of various taxa, including humans. However, accurately estimating introgression is difficult, particularly when 1) there are many parameters, 2) multiple models fit the data well, and 3) parameters are not simultaneously estimated. Here, we use the software Legofit to investigate the evolutionary history of bonobos (Pan paniscus) and chimpanzees (P. troglodytes) using whole genome sequences. This approach 1) ignores within-population variation, reducing the number of parameters requiring estimation, 2) allows for model selection, and 3) simultaneously estimates all parameters. We tabulated site patterns from the autosomes of 71 bonobos and chimpanzees representing all five extant Pan lineages. We then compared previously proposed demographic models and estimated parameters using a deterministic approach. We further considered sex bias in Pan evolutionary history by analyzing the site patterns from the X chromo...
We estimate the strength of kin-structured migration in six human populations (five from New Guinea and one from Finland) and in one population of nonhuman primates. We also test the hypothesis that migration is not kin structured by generating a sampling distribution of the estimator under the null hypothesis of independent random migration. We are unable to detect a statistically significant level of kin-structured migration in any population. However, five of our six human populations were from Papua New Guinea, and we cannot dismiss the possibility that migration is kin structured in other parts of the world.
Molecular Biology and Evolution, 1992
In a recent paper, Henry Harpending and I (Rogers and Harpending 1992) interpret variation in human mitochondrial DNA by using a model by Li (1977). The model is unrealistic in several respects, one of which is its use of the "infinite sites" model (Kimura 1969) of molecular evolution. This model assumes that no nucleotide (or restriction) site mutates more than once, an assumption that is clearly violated in human data (Kocher and Wilson 1991). The infinite-sites model may nonetheless be useful as an approximation, provided that the error it introduces is small. This note evaluates the relative error of this approximation. Harpending and I make use of the infinite-sites assumption at only one point in our analysis: we assume that differences between a pair of individuals are introduced at a rate, $2u$ per generation, which is constant in time. With a finite number of sites, this assumption cannot hold exactly, because, after a nucleotide site has been struck once by mutation, later mutations need not add to the count of differences between our pair of individuals. The more sites there are with prior mutations, the lower will be the rate at which new differences accumulate. Thus, differences accumulate at a decreasing, rather than a constant, rate. Let $D(t)$ denote the expected number of differences over a finite number, $K$, of nucleotide sites between a pair of lineages that have been separated for $t$ generations. Under the infinite-sites model, $D(t) = 2K\mu t$, where $\mu$ is the mutation rate per nucleotide site, and where $K \to \infty$ while $\mu \to 0$, such that $K\mu$ remains finite. Here, I assume instead that $K$ is finite and that mutation at the $i$th site follows a Poisson process with rate $\mu_i$ per generation. Thus, the number of mutations at site $i$ since time zero is Poisson with mean $2\mu_i t$. In human data, transversions are rare. Thus, I assume that all mutations are transitions. Consider the comparison, between two lineages, at a single site, say site $i$. Initially, at time 0, the two lineages will be identical, since they share a common ancestor then. The first mutation along the path connecting the two lineages will make them different at site $i$, but the second will restore their identity, because of the assumption that all mutations are transitions. Each successive mutation at site $i$ toggles the two lineages back and forth between the states of identity and nonidentity. The probability that the two lineages differ at site $i$ is thus equal to the probability that the number of prior mutations there is odd. This probability is $(1 - e^{-4\mu_i t})/2$ (Haldane 1919). The expected sum of differences over all $K$ sites is $D(t) = K\,E[(1 - e^{-4\mu_i t})/2] = K[1 - \phi(4t)]/2$, (1) where $E$ denotes the expectation with respect to the distribution of mutation rates across sites, and where $\phi(z) = E\{e^{-z\mu}\}$ is the Laplace transform of the distribution of site-specific mutation rates.
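As a rough illustration of the comparison this abstract describes, the sketch below evaluates the expected number of differences under the infinite-sites model, 2Kμt, and under the finite-sites expression K[1 − φ(4t)]/2. It makes a simplifying assumption that is not in the abstract: every site has the same mutation rate μ, so the Laplace transform reduces to φ(z) = exp(−zμ). The function names, and the particular definition of relative error used here, are hypothetical.

```python
import math


def d_infinite_sites(K, mu, t):
    """Expected differences under the infinite-sites model: 2*K*mu*t."""
    return 2.0 * K * mu * t


def d_finite_sites(K, mu, t):
    """Expected differences with K finite sites, transitions only.

    Assumption: all sites share one rate mu, so phi(4t) = exp(-4*mu*t)
    and D(t) = K * (1 - exp(-4*mu*t)) / 2.
    """
    return K * (1.0 - math.exp(-4.0 * mu * t)) / 2.0


def relative_error(K, mu, t):
    """Overestimate of the infinite-sites value relative to the finite-sites
    value (one possible definition, chosen here for illustration)."""
    finite = d_finite_sites(K, mu, t)
    return (d_infinite_sites(K, mu, t) - finite) / finite


if __name__ == "__main__":
    K, mu = 400, 1e-6  # illustrative values only, not taken from the paper
    for t in (1e3, 1e4, 1e5, 1e6):
        print(t, d_infinite_sites(K, mu, t), d_finite_sites(K, mu, t),
              relative_error(K, mu, t))
```

With equal rates the finite-sites curve saturates at K/2 as t grows, while the infinite-sites line keeps increasing, which is the "decreasing, rather than constant, rate" of accumulation described in the text. Allowing rates to vary across sites would require substituting the appropriate Laplace transform for the exponential.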
The method of isonymy, developed by Crow and Mange for estimating inbreeding from surname frequencies, requires an assumption that has not been appreciated: It is necessary to assume that all males in some ancestral generation, the founding stock, had unique surnames. Because this assumption is seldom justified in real populations, the applicability of the isonymy method is extremely limited. Even worse, the estimates it provides refer to an unspecified founding stock, and this implies that these estimates are devoid of information.
Theoretical Population Biology, 1983
have emphasized the role of the segregation variance in models of assortative mating for continuous characters. This note examines its behavior in the context of a general additive model. Using known results concerning the effects of assortative mating and selection on genic variance and correlations among uniting gametes, it is shown that the effects of these processes on segregation variance will be small if the effective number of loci is large. Thus models in which the segregation variance remains constant are approximate descriptions of the behavior of characters determined by many loci.
Proceedings of the National Academy of Sciences, 1998
Patterns of gene differences among humans contain information about the demographic history of our species. Haploid loci like mitochondrial DNA and the nonrecombining part of the Y chromosome show a pattern indicating expansion from a population of only several thousand during the late middle or early upper Pleistocene. Nuclear short tandem repeat loci also show evidence of this expansion. Both mitochondrial DNA and the Y chromosome coalesce within the last several hundred thousand years, and they cannot provide information about the population before their coalescence. Several nuclear loci are informative about our ancestral population size during nearly the whole Pleistocene. They indicate a small effective size, on the order of 10,000 breeding individuals, throughout this time period. This genetic evidence denies any version of the multiregional model of modern human origins. It implies instead that our ancestors were effectively a separate species for most of the Pleistocene.
Proceedings of the National Academy of Sciences, 2002
Single-nucleotide polymorphisms (SNPs) constitute the great majority of variations in the human genome, and as heritable variable landmarks they are useful markers for disease mapping and resolving population structure. Redundant coverage in overlaps of large-insert genomic clones, sequenced as part of the Human Genome Project, comprises a quarter of the genome, and it is representative in terms of base compositional and functional sequence features. We mined these regions to produce 500,000 high-confidence SNP candidates as a uniform resource for describing nucleotide diversity and its regional variation within the genome. Distributions of marker density observed at different overlap length scales under a model of recombination and population size change show that the history of the population represented by the public genome sequence is one of collapse followed by a recent phase of mild size recovery. The inferred times of collapse and recovery are Upper Paleolithic, in agreement ...
Annals of Human Genetics, 1987
Models of genetic population structure generally assume that emigrants from each local group are drawn at random from the set of individuals born there. We show that small violations of this assumption can have disproportionately large effects on genetic population structure, and we introduce a statistical method for measuring this effect.
Annual Review of Genomics and Human Genetics, 2000
This is a review of genetic evidence about the ancient demography of the ancestors of our species and about the genesis of worldwide human diversity. The issue of whether or not a population size bottleneck occurred among our ancestors is under debate among geneticists as well as among anthropologists. The bottleneck, if it occurred, would confirm the Garden of Eden (GOE) model of the origin of modern humans. The competing model, multiregional evolution (MRE), posits that the number of human ancestors has been large, occupying much of the temperate Old World for the last two million years. While several classes of genetic marker seem to contain a strong signal of demographic recovery from a small number of ancestors, other nuclear loci show no such signal. The pattern at these loci is compatible with the existence of widespread balancing selection in humans. The study of human diversity at (putatively) neutral genetic marker loci has been hampered since the beginning by ascertainmen...
Evolutionary Ecology and Human Behavior, 2017
American Journal of Human Genetics, 1995
To test hypotheses about the origin of modern humans, we analyzed mtDNA sequences, 30 nuclear restriction-site polymorphisms (RSPs), and 30 tetranucleotide short tandem repeat (STR) polymorphisms in 243 Africans, Asians, and Europeans. An evolutionary tree based on mtDNA displays deep African branches, indicating greater genetic diversity for African populations. This finding, which is consistent with previous mtDNA analyses, has been interpreted as evidence for an African origin of modern humans. Both sets of nuclear polymorphisms, as well as a third set of trinucleotide polymorphisms, are highly consistent with one another but fail to show deep branches for African populations. These results, which represent the first direct comparison of mtDNA and nuclear genetic data in major continental populations, undermine the genetic evidence for an African origin of modern humans.
American Journal of Physical Anthropology, 1988
Statistical methods are introduced for analysis of the migration component of genetic drift, i.e., of the stochastic changes that affect allele frequencies during migration between local groups. Attention focuses on alpha M, a parameter that measures the extent to which this component of drift departs from the ideal of independent random sampling, and which can be interpreted as a measure of the extent to which migration is kin-structured. It is shown that alpha M can be estimated from genetic data, even in the absence of information about the genealogical relationships of migrants, and Monte-Carlo simulations are used to approximate the sampling distribution of the estimator under the null hypothesis of independent random sampling. Application of these methods to data from the Aland Islands, Finland, shows that the migration pattern there is consistent with the hypothesis of independent random sampling.
bioRxiv (Cold Spring Harbor Laboratory), Apr 23, 2020
Recent studies have suggested that selection is widespread throughout the genome and largely unco... more Recent studies have suggested that selection is widespread throughout the genome and largely uncompensated for in inferences of population history. To address this potential issue, we estimated site pattern frequencies for neutral and selection associated areas of the genome. There are notable differences in these frequencies between neutral regions and those affected by selection. However, these differences have relatively small effects when inferring population history. .
Proceedings of The Royal Society B: Biological Sciences, Jan 22, 2013
Males of many species help in the care and provisioning of offspring, and these investments often... more Males of many species help in the care and provisioning of offspring, and these investments often correlate with genetic relatedness. For example, many human males invest in the children of sisters, and this is especially so where men are less likely to share genes with children of wives. Although this makes qualitative sense, it has been difficult to support quantitatively. The prevailing model predicts investment in children of sisters only when paternity confidence falls below 0.268. This value is often seen as too low to be credible; so investment in sisters' children represents an unsolved problem. I show here that the prevailing model rests on a series of restrictive assumptions that underestimate relatedness to sisters' children. For this reason, it understates the fitness payoff to men who invest in these children. This effect can be substantial, especially in societies with low confidence in paternity. But this effect cannot be estimated solely from confidence in paternity. One must also estimate the probability that two siblings share the same father.
PLOS Genetics, Apr 27, 2017
The indigenous people of the Tibetan Plateau have been the subject of much recent interest becaus... more The indigenous people of the Tibetan Plateau have been the subject of much recent interest because of their unique genetic adaptations to high altitude. Recent studies have demonstrated that the Tibetan EPAS1 haplotype is involved in high altitude-adaptation and originated in an archaic Denisovan-related population. We sequenced the whole-genomes of 27 Tibetans and conducted analyses to infer a detailed history of demography and natural selection of this population. We detected evidence of population structure between the ancestral Han and Tibetan subpopulations as early as 44 to 58 thousand years ago, but with high rates of gene flow until approximately 9 thousand years ago. The CMS test ranked EPAS1 and EGLN1 as the top two positive selection candidates, and in addition identified PTGIS, VDR, and KCTD12 as new candidate genes. The advantageous Tibetan EPAS1 haplotype shared many variants with the Denisovan genome, with an ancient gene tree divergence between the Tibetan and Denisovan haplotypes of about 1 million years ago. With the exception of EPAS1, we observed no evidence of positive selection on Denisovan-like haplotypes.
BMC Genomics, 2010
Background: Though a variety of linkage disequilibrium tests have recently been introduced to mea... more Background: Though a variety of linkage disequilibrium tests have recently been introduced to measure the signal of recent positive selection, the statistical properties of the various methods have not been directly compared. While most applications of these tests have suggested that positive selection has played an important role in recent human history, the results of these tests have varied dramatically. Results: Here, we evaluate the performance of three statistics designed to detect incomplete selective sweeps, LRH and iHS, and ALnLH. To analyze the properties of these tests, we introduce a new computational method that can model complex population histories with migration and changing population sizes to simulate gene trees influenced by recent positive selection. We demonstrate that iHS performs substantially better than the other two statistics, with power of up to 0.74 at the 0.01 level for the variation best suited for full genome scans and a power of over 0.8 at the 0.01 level for the variation best suited for candidate gene tests. The performance of the iHS statistic was robust to complex demographic histories and variable recombination rates. Genome scans involving the other two statistics suffer from low power and high false positive rates, with false discovery rates of up to 0.96 for ALnLH. The difference in performance between iHS and ALnLH, did not result from the properties of the statistics, but instead from the different methods for mitigating the multiple comparison problem inherent in full genome scans. Conclusions: We introduce a new method for simulating genealogies influenced by positive selection with complex demographic scenarios. In a power analysis based on this method, iHS outperformed LRH and ALnLH in detecting incomplete selective sweeps. 
We also show that the single-site iHS statistic is more powerful in a candidate gene test than the multi-site statistic, but that the multi-site statistic maintains a low false discovery rate with only a minor loss of power when applied to a scan of the entire genome. Our results highlight the need for careful consideration of multiple comparison problems when evaluating and interpreting the results of full genome scans for positive selection.
Peer Community In Mathematical and Computational Biology
Soraggi et al. [2] describe HMMploidy, a statistical method that takes DNA sequencing data as inp... more Soraggi et al. [2] describe HMMploidy, a statistical method that takes DNA sequencing data as input and uses a hidden Markov model to estimate ploidy. The method allows ploidy to vary not only between individuals, but also between and even within chromosomes. This allows the method to detect aneuploidy and also chromosomal regions in which multiple paralogous loci have been mistakenly assembled on top of one another. HMMploidy estimates genotypes and ploidy simultaneously, with a separate estimate for each genome. The genome is divided into a series of non-overlapping windows (typically 100), and HMMploidy provides a separate estimate of ploidy within each window of each genome. The method is thus estimating a large number of parameters, and one might assume that this would reduce its accuracy. However, it benefits from large samples of genomes. Large samples increase the accuracy of internal allele frequency estimates, and
bioRxiv, 2022
Introgression appears increasingly ubiquitous in the evolutionary history of various taxa, includ... more Introgression appears increasingly ubiquitous in the evolutionary history of various taxa, including humans. However, accurately estimating introgression is difficult, particularly when 1) there are many parameters, 2) multiple models fit the data well, and 3) parameters are not simultaneously estimated. Here, we use the software Legofit to investigate the evolutionary history of bonobos (Pan paniscus) and chimpanzees (P. troglodytes) using whole genome sequences. This approach 1) ignores within-population variation, reducing the number of parameters requiring estimation, 2) allows for model selection, and 3) simultaneously estimates all parameters. We tabulated site patterns from the autosomes of 71 bonobos and chimpanzees representing all five extant Pan lineages. We then compared previously proposed demographic models and estimated parameters using a deterministic approach. We further considered sex bias in Pan evolutionary history by analyzing the site patterns from the X chromo...
We estimate the strength of kin-structured migration in six human populations (five from New Guin... more We estimate the strength of kin-structured migration in six human populations (five from New Guinea and one from Finland) and in one population of nonhuman primates. We also test the hypothesis that migration is not kin structured by generating a sampling distribution of the estimator under the null hypothesis of independent random migration. We are unable to detect a statistically significant level of kin-structured migration in any population. However, five of our six human populations were from Papua New Guinea, and we cannot dismiss the possibility that migration is kin structured in other parts of the world.
Molecular Biology and Evolution, 1992
In a recent paper, Henry Harpending and I (Rogers and Harpending 1992) interpret variation in hum... more In a recent paper, Henry Harpending and I (Rogers and Harpending 1992) interpret variation in human mitochondrial DNA by using a model by Li (1977). The model is unrealistic in several respects, one of which is its use of the "infinite sites" model (Kimura 1969) of molecular evolution. This model assumes that no nucleotide (or restriction) site mutates more than once, an assumption that is clearly violated in human data (Kocher and Wilson 199 1). The infinite-sites model may nonetheless be useful as an approximation, provided that the error it introduces is small. This note evaluates the relative error of this approximation. Harpending and I make use of the infinite-sites assumption at only one point in our analysis: we assume that differences between a pair of individuals are introduced at a rate, 2u per generation, which is constant in time. With a finite number of sites, this assumption cannot hold exactly, because, after a nucleotide site has been struck once by mutation, later mutations need not add to the count of differences between our pair of individuals. The more sites there are with prior mutations, the lower will be the rate at which new differences accumulate. Thus, differences accumulate at a decreasing, rather than a constant, rate. Let D(t) denote the expected number of differences over a finite number, K, of nucleotide sites between a pair of lineages that have been separated for t generations. Under the infinite-sites model, D(t) = 2Kyt, where l.t is the mutation rate per nucleotide site, and where K-* cc while p + 0, such that Kl.t remains finite. Here, I assume instead that K is finite and that mutation at the ith site follows a Poisson process with rate l.ti per generation. Thus, the number of mutations at site i since time zero is Poisson with mean 2pLit. In human data, transversions are rare. 
Thus, I assume that all mutations are transitions. Consider the comparison, between two lineages, at a single site, say site i. Initially, at time 0, the two lineages will be identical, since they share a common ancestor then. The first mutation along the path connecting the two lineages will make them different at site i, but the second will restore their identity, because of the assumption that all mutations are transitions. Each successive mutation at site i toggles the two lineages back and forth between the states of identity and nonidentity. The probability that the two lineages differ at site i is thus equal to the probability that the number of prior mutations there is odd. This probability is (1-e-4"1t)/2 (Haldane 1919). The expected sum of differences over all K sites is D(t) = KE[(1-e-4VL1')/2] = K[1+(4t)]/2, (1) where E denotes the expectation with respect to the distribution of mutation rates across sites, and where 4 (z) = E { epZp > is the Laplace transform of the distribution of site-specific mutation rates.
The method of isonymy, developed by Crow and Mange for estimating inbreeding from surname frequen... more The method of isonymy, developed by Crow and Mange for estimating inbreeding from surname frequencies, requires an assumption that has not been appreciated: It is necessary to assume that all males in some ancestral generation, the founding stock, had unique surnames. Because this assumption is seldom justified in real populations, the applicability of the isonymy method is extremely limited. Even worse, the estimates it provides refer to an unspecified founding stock, and this implies that these estimates are devoid of information.
Theoretical Population Biology, 1983
have emphasized the role of the segregation variance in models of assortative mating for continuo... more have emphasized the role of the segregation variance in models of assortative mating for continuous characters. This note examines its behavior in the context of a general additive model. Using known results concerning the effects of assortative mating and selection on genie variance and correlations among uniting gametes it is shown that the effects of these processes on segregation variance will be small if the effective number of loci is large. Thus models in which the segregation variance remains constant are approximate descriptions of the behavior of characters determined by many loci.
Proceedings of the National Academy of Sciences, 1998
Patterns of gene differences among humans contain information about the demographic history of our species. Haploid loci like mitochondrial DNA and the nonrecombining part of the Y chromosome show a pattern indicating expansion from a population of only several thousand during the late middle or early upper Pleistocene. Nuclear short tandem repeat loci also show evidence of this expansion. Both mitochondrial DNA and the Y chromosome coalesce within the last several hundred thousand years, and they cannot provide information about the population before their coalescence. Several nuclear loci are informative about our ancestral population size during nearly the whole Pleistocene. They indicate a small effective size, on the order of 10,000 breeding individuals, throughout this time period. This genetic evidence denies any version of the multiregional model of modern human origins. It implies instead that our ancestors were effectively a separate species for most of the Pleistocene.
Proceedings of the National Academy of Sciences, 2002
Single-nucleotide polymorphisms (SNPs) constitute the great majority of variations in the human genome, and as heritable variable landmarks they are useful markers for disease mapping and resolving population structure. Redundant coverage in overlaps of large-insert genomic clones, sequenced as part of the Human Genome Project, comprises a quarter of the genome, and it is representative in terms of base compositional and functional sequence features. We mined these regions to produce 500,000 high-confidence SNP candidates as a uniform resource for describing nucleotide diversity and its regional variation within the genome. Distributions of marker density observed at different overlap length scales under a model of recombination and population size change show that the history of the population represented by the public genome sequence is one of collapse followed by a recent phase of mild size recovery. The inferred times of collapse and recovery are Upper Paleolithic, in agreement ...
Annals of Human Genetics, 1987
Models of genetic population structure generally assume that emigrants from each local group are drawn at random from the set of individuals born there. We show that small violations of this assumption can have disproportionately large effects on genetic population structure, and we introduce a statistical method for measuring this effect.
Annual Review of Genomics and Human Genetics, 2000
This is a review of genetic evidence about the ancient demography of the ancestors of our species and about the genesis of worldwide human diversity. The issue of whether or not a population size bottleneck occurred among our ancestors is under debate among geneticists as well as among anthropologists. The bottleneck, if it occurred, would confirm the Garden of Eden (GOE) model of the origin of modern humans. The competing model, multiregional evolution (MRE), posits that the number of human ancestors has been large, occupying much of the temperate Old World for the last two million years. While several classes of genetic marker seem to contain a strong signal of demographic recovery from a small number of ancestors, other nuclear loci show no such signal. The pattern at these loci is compatible with the existence of widespread balancing selection in humans. The study of human diversity at (putatively) neutral genetic marker loci has been hampered since the beginning by ascertainmen...
Evolutionary Ecology and Human Behavior, 2017
American Journal of Human Genetics, 1995
To test hypotheses about the origin of modern humans, we analyzed mtDNA sequences, 30 nuclear restriction-site polymorphisms (RSPs), and 30 tetranucleotide short tandem repeat (STR) polymorphisms in 243 Africans, Asians, and Europeans. An evolutionary tree based on mtDNA displays deep African branches, indicating greater genetic diversity for African populations. This finding, which is consistent with previous mtDNA analyses, has been interpreted as evidence for an African origin of modern humans. Both sets of nuclear polymorphisms, as well as a third set of trinucleotide polymorphisms, are highly consistent with one another but fail to show deep branches for African populations. These results, which represent the first direct comparison of mtDNA and nuclear genetic data in major continental populations, undermine the genetic evidence for an African origin of modern humans.
American Journal of Physical Anthropology, 1988
Statistical methods are introduced for analysis of the migration component of genetic drift, i.e., of the stochastic changes that affect allele frequencies during migration between local groups. Attention focuses on alpha M, a parameter that measures the extent to which this component of drift departs from the ideal of independent random sampling, and which can be interpreted as a measure of the extent to which migration is kin-structured. It is shown that alpha M can be estimated from genetic data, even in the absence of information about the genealogical relationships of migrants, and Monte-Carlo simulations are used to approximate the sampling distribution of the estimator under the null hypothesis of independent random sampling. Application of these methods to data from the Aland Islands, Finland, shows that the migration pattern there is consistent with the hypothesis of independent random sampling.