Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations

Alicia R Martin et al. Am J Hum Genet. 2017.

Erratum in

Abstract

The vast majority of genome-wide association studies (GWASs) are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g., linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWASs, we used published summary statistics to calculate polygenic risk scores for eight well-studied phenotypes. We identify directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk are typically highest in the population from which summary statistics were derived. We demonstrate that scores inferred from European GWASs are biased by genetic drift in other populations even when choosing the same causal variants and that biases in any direction are possible and unpredictable. This work cautions that summarizing findings from large-scale GWASs may have limited portability to other populations using standard approaches and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.

Keywords: 1000 Genomes Project; GWAS; admixed populations; complex trait genetics; local ancestry; polygenic risk scores; population genetics; statistical genetics; summary statistics.

Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

Figures

Figure 1

Sub-continental Diversity and Origins of African, European, and Native American Components of Recently Admixed American Populations. (A) ADMIXTURE analysis at K = 3 focusing on admixed Americas samples, with the NAT, CEU, and YRI as reference populations. (B, D, and F) Local ancestry karyograms for representative PEL individual HG01893 with (B) African, (D) European, and (F) Native American components shown. (C, E, and G) Ancestry-specific PCA applied to admixed haploid genomes as well as ancestrally homogeneous continental reference populations from 1000 Genomes (where possible) for (C) African tracts, (E) European tracts, and (G) Native American tracts. A small number of admixed samples that constituted major outliers from the ancestry-specific PCA analysis were removed, including (C) one ASW sample (NA20314) and (E) eight samples, including three ACB, two ASW, one PEL, and two PUR samples.

Figure 2

Heterozygosity by Continental and Diploid Local Ancestry. Heterozygosity, estimated here as 2pq, is calculated in admixed populations stratified by diploid local ancestry in (A) the whole genome, (B) sites from the GWAS catalog, and (C) sites from ClinVar classified as “pathogenic” or “likely pathogenic.” The mean and 95% confidence intervals were calculated by bootstrapping 1,000 times. Populations not shown in a given panel have too few diploid ancestry tracts overlapping sites to calculate heterozygosity.
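The heterozygosity estimate and bootstrap interval described in this caption can be illustrated with a minimal sketch. It assumes that sites are the unit being resampled and that a percentile interval is used; the caption specifies neither, and the allele frequencies and function names below are hypothetical.

```python
import random
import statistics

def heterozygosity(freqs):
    """Mean expected heterozygosity 2pq over sites (p = allele frequency, q = 1 - p)."""
    return statistics.fmean(2 * p * (1 - p) for p in freqs)

def bootstrap_ci(freqs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean 2pq.

    Assumption: sites are resampled with replacement; the source does not
    state the resampling unit.
    """
    rng = random.Random(seed)
    n = len(freqs)
    boot = sorted(heterozygosity(rng.choices(freqs, k=n)) for _ in range(n_boot))
    lower = boot[int((alpha / 2) * n_boot)]
    upper = boot[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical allele frequencies at a handful of sites (illustrative only).
example_freqs = [0.05, 0.10, 0.25, 0.40, 0.50, 0.30, 0.15, 0.02]
mean_het = heterozygosity(example_freqs)
ci_low, ci_high = bootstrap_ci(example_freqs)
print(f"mean 2pq = {mean_het:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

In a stratified analysis like the one in the figure, the same calculation would be repeated separately for each population and diploid local ancestry class.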

Figure 3

Imputation Accuracy by Local Ancestry in the Americas. Accuracy was assessed via a leave-one-out strategy, stratified by diploid local ancestry on chromosome 9 for the Illumina OmniExpress genotyping array. Dashed lines indicate heterozygous diploid ancestry, and solid lines show homozygous diploid ancestry.

Figure 4

Biased Genetic Discoveries Influence Disease Risk Inferences. Inferred and standardized polygenic risk scores across all individuals and colored by population for (A) height based on summary statistics from Wood et al., (B) schizophrenia based on summary statistics from the Schizophrenia Working Group of the Psychiatric Genomics Consortium, (C) type II diabetes summary statistics derived from a European cohort from Gaulton et al., and (D) type II diabetes summary statistics derived from a multiethnic cohort from Mahajan et al.
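As a rough illustration of what an inferred, standardized polygenic risk score is, the sketch below computes the common weighted-sum form (effect size times allele dosage, summed over variants) and then z-scores the result across all individuals. The effect sizes, genotypes, and names are hypothetical, and the variant selection and clumping steps a real analysis needs are omitted.

```python
import statistics

def polygenic_score(dosages, betas):
    """Weighted sum of effect-allele dosages (0, 1, or 2) and per-variant effect sizes."""
    return sum(d * b for d, b in zip(dosages, betas))

def standardize(scores):
    """Z-score the scores across all individuals (mean 0, standard deviation 1)."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical GWAS effect sizes for three variants (illustrative only).
betas = [0.12, -0.05, 0.30]

# Hypothetical effect-allele dosages per individual, in the same variant order.
genotypes = {
    "ind1": [0, 1, 2],
    "ind2": [2, 0, 1],
    "ind3": [1, 1, 0],
}

raw_scores = {name: polygenic_score(d, betas) for name, d in genotypes.items()}
z_scores = dict(zip(raw_scores, standardize(list(raw_scores.values()))))
```

Because the standardization pools all individuals, differences in mean score between populations remain visible after scaling, which is the quantity the figure panels compare.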

Figure 5

Coalescent Simulation Framework to Generate True and Inferred Polygenic Risk Scores. Results of true and inferred polygenic risk scores, as well as their correlation, were computed via GWAS summary statistics from 10,000 simulated EUR case and control subjects modeling European, East Asian, and African population history (demographic parameters are from Gravel et al. (ref. 14)). (A) The distribution of mean true, unstandardized polygenic risk scores for each population across 500 simulations with m = 1,000 causal variants and h² = 0.67. (B) The distribution of mean inferred, unstandardized polygenic risk for the same simulation parameters as in (A) (center) and standardized true versus inferred polygenic risk scores for three different coalescent simulation replicates showing 10,000 randomly drawn samples from each population not included as case or control subjects (right). (C) Violin plots show Pearson’s correlation across 50 iterations per parameter set between true and inferred polygenic risk scores across differing genetic architectures, including m = 200, 500, and 1,000 causal variants and h² = 0.67.
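The comparison of true and inferred scores can be sketched with a toy simulation. This is not the coalescent framework described in the caption: it assumes independent variants, a single population, a quantitative rather than case/control phenotype, and Gaussian noise on the effect sizes as a stand-in for GWAS estimation error. All names and parameter values are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(m=200, n=500, h2=0.67, noise_sd=0.5, seed=1):
    """Toy model: m independent causal variants scored in n individuals."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.95) for _ in range(m)]
    true_beta = [rng.gauss(0, 1) for _ in range(m)]
    # Assumption: estimated effects are the true effects plus Gaussian noise.
    est_beta = [b + rng.gauss(0, noise_sd) for b in true_beta]

    true_prs, inferred_prs = [], []
    for _ in range(n):
        # Diploid dosage: two independent allele draws at frequency p.
        dosages = [(rng.random() < p) + (rng.random() < p) for p in freqs]
        true_prs.append(sum(d * b for d, b in zip(dosages, true_beta)))
        inferred_prs.append(sum(d * b for d, b in zip(dosages, est_beta)))

    # Environmental noise scaled so that var(genetic) / var(phenotype) is about h2.
    mean_g = sum(true_prs) / n
    var_g = sum((g - mean_g) ** 2 for g in true_prs) / n
    env_sd = math.sqrt(var_g * (1 - h2) / h2)
    phenotype = [g + rng.gauss(0, env_sd) for g in true_prs]
    return true_prs, inferred_prs, phenotype

true_prs, inferred_prs, phenotype = simulate()
r = pearson(true_prs, inferred_prs)
print(f"Pearson r between true and inferred scores: {r:.2f}")
```

Extending this toy model to several populations with diverged allele frequencies would require a demographic model, which is what the coalescent simulations in the figure supply.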

References

    1. Need A.C., Goldstein D.B. Next generation disparities in human genomics: concerns and remedies. Trends Genet. 2009;25:489–494.
    2. Bustamante C.D., Burchard E.G., De la Vega F.M. Genomics for the world. Nature. 2011;475:163–165.
    3. Petrovski S., Goldstein D.B. Unequal representation of genetic variation across ancestry groups creates healthcare inequality in the application of precision medicine. Genome Biol. 2016;17:157.
    4. Popejoy A.B., Fullerton S.M. Genomics is failing on diversity. Nature. 2016;538:161–164.
    5. Carlson C.S., Matise T.C., North K.E., Haiman C.A., Fesinmeyer M.D., Buyske S., Schumacher F.R., Peters U., Franceschini N., Ritchie M.D., PAGE Consortium. Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study. PLoS Biol. 2013;11:e1001661.