Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery

Acknowledgements

The authors thank S. Sunyaev and D. Reich for help with PolyPhen-2 and DAF corrections, M. Turchin for help with purging analysis, J. Pickrell for help with TreeMix, and V. Bafna, N. Schork, and S. Bonissone for suggestions. Work was supported by grants from the US National Institutes of Health (P01HD070494 and R01NS048453), the Qatari National Research Foundation (NPRP6-1463), the Simons Foundation Autism Research Initiative (175303 and 275275) to J.G.G., the Yale Center for Mendelian Disorders (U54HG006504), the Broad Institute (U54HG003067), the Rockefeller University CTSA (5UL1RR024143-04), the Howard Hughes Medical Institute (to J.G.G. and J.-L.C.), INSERM, the St. Giles Foundation, and the Candidoser Association and by grants R01AI088364, R37AI095983, P01AI061093, U01AI109697 (to J.-L.C.), U01AI088685 (to J.-L.C. and L.A.), R21AI107508 (to E. Jouanguy), the DHFMR Collaborative Research Grant, and KACST 13-BIO1113-20 (to F.S.A.).

Author information

Authors and Affiliations

  1. Howard Hughes Medical Institute, Rockefeller University, New York, New York, USA
    Eric M Scott, Emily G Spencer, Yupeng He, Mostafa Abdellateef Azab, Jean-Laurent Casanova & Joseph G Gleeson
  2. Department of Neurosciences, Rady Children's Institute for Genomic Medicine, University of California, San Diego, La Jolla, California, USA
    Eric M Scott, Emily G Spencer, Yupeng He, Mostafa Abdellateef Azab & Joseph G Gleeson
  3. Laboratory for Pediatric Brain Disease, Rockefeller University, New York, New York, USA
    Eric M Scott, Emily G Spencer, Yupeng He, Mostafa Abdellateef Azab & Joseph G Gleeson
  4. Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
    Anason Halees
  5. St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller University, New York, New York, USA
    Yuval Itan, Bertrand Boisson, Laurent Abel & Jean-Laurent Casanova
  6. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
    Stacey B Gabriel
  7. Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, INSERM, Paris, France
    Aziz Belkadi, Bertrand Boisson, Laurent Abel & Jean-Laurent Casanova
  8. Paris Descartes University, Imagine Institute, Paris, France
    Aziz Belkadi, Bertrand Boisson, Laurent Abel & Jean-Laurent Casanova
  9. Department of Molecular Biology and Genetics, Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
    Andrew G Clark
  10. Department of Genetics, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
    Fowzan S Alkuraya
  11. Department of Anatomy and Cell Biology, College of Medicine, Alfaisal University, Riyadh, Saudi Arabia
    Fowzan S Alkuraya
  12. Pediatric Hematology–Immunology Unit, Necker Hospital for Sick Children, Paris, France
    Jean-Laurent Casanova

Authors

  1. Eric M Scott
  2. Anason Halees
  3. Yuval Itan
  4. Emily G Spencer
  5. Yupeng He
  6. Mostafa Abdellateef Azab
  7. Stacey B Gabriel
  8. Aziz Belkadi
  9. Bertrand Boisson
  10. Laurent Abel
  11. Andrew G Clark
  12. Fowzan S Alkuraya
  13. Jean-Laurent Casanova
  14. Joseph G Gleeson

Consortia

Greater Middle East Variome Consortium

Contributions

E.M.S. performed analysis and generated all figures. A.H., Y.I., Y.H., and M.A.A. consulted on analysis. E.G.S., A.B., B.B., L.A., F.S.A., J.-L.C., and J.G.G. contributed subjects and jointly wrote and edited the manuscript. S.B.G. oversaw sequencing. A.G.C. consulted on population studies. The GME Variome Consortium identified subjects for study.

Corresponding author

Correspondence to Joseph G Gleeson.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Country distribution of GME samples and designation of geographical subregions.

GME samples were collected across 20 countries and territories of the GME. Pie size corresponds to the number of samples from each country, and each pie shows the proportion of samples filtered out because of quality control and relationship status (Online Methods). Geographical subregions are colored to show the sets of grouped countries. Some non-uniformity of sampling was inevitable owing to the inaccessibility of some populations. Map downloaded from https://www.presentationmagazine.com/ and then colored.

Supplementary Figure 2 Unbiased genetic clustering demonstrates shorter genetic distance between samples from proximal geographical subregions.

Dendrogram of unbiased genetic clustering correlated with geographical subregion designation. A total of 2,497 samples from the Greater Middle East Consortium underwent exome sequencing, including 1,111 GME samples as well as samples from Africa, East Asia, Europe, the Americas, Oceania, and unknown regions. Calculated identity-by-state (IBS) distances between samples represent the number of non-identical positions. Recruitment location and IBS clustering were concordant for all GME subregions. Some intermixing was evident, suggesting recent migration events.
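The IBS distance described in this legend, the number of non-identical positions between two samples, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the genotype encoding (0, 1, or 2 copies of the alternate allele, with `None` for a missing call), the handling of missing calls, and the function names `ibs_distance` and `pairwise_ibs` are all assumptions made for the example.

```python
from itertools import combinations


def ibs_distance(sample_a, sample_b):
    """Count positions at which two samples carry non-identical genotypes.

    Genotypes are assumed to be alternate-allele counts (0, 1, 2);
    positions with a missing call (None) in either sample are skipped.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must be genotyped at the same positions")
    return sum(
        1
        for a, b in zip(sample_a, sample_b)
        if a is not None and b is not None and a != b
    )


def pairwise_ibs(samples):
    """Return {(name_i, name_j): distance} for every unordered pair of samples."""
    return {
        (i, j): ibs_distance(samples[i], samples[j])
        for i, j in combinations(sorted(samples), 2)
    }


# Hypothetical toy genotypes, not data from the study.
toy = {
    "s1": [0, 1, 2, None, 1],
    "s2": [0, 2, 2, 1, 0],
    "s3": [0, 1, 2, 1, 1],
}
distances = pairwise_ibs(toy)
```

A matrix of such pairwise distances is the kind of input a hierarchical clustering routine would take to produce a dendrogram.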

Supplementary Figure 3 ADMIXTURE cross-validation.

(a) Cross-validation errors for the ADMIXTURE results shown in Supplementary Figure 1. Analysis with k = 6 gave the lowest cross-validation error. (b) Cross-validation errors for GME and 1000 Genomes Project samples.

Supplementary Figure 4 Unsupervised ADMIXTURE analysis of GME populations shows genetic history.

Results of ADMIXTURE analysis for LD-filtered variants for 1,111 GME samples across the six geographical subregions. Eleven values of k were run, from 2 to 12, to optimize clustering. Each vertical bar represents a single individual. The y axis shows the estimated proportion of the genome assigned to each ancestral cluster. Samples are grouped by subregion and organized from west (left) to east (right), showing trends of overlap. Substantial substructure was apparent throughout much of the GME, but three apparent ‘sources’ of ancestral populations stem from the NWA (yellow), AP (red), and PP (green) subregions.

Supplementary Figure 5 Introgression analysis of GME and 1000 Genomes Project exome samples shows consistent Neanderthal introgression in all GME, European, and East Asian samples except for NWA.

(a) Individuals from the 1000 Genomes Project reference populations and GME subregions were projected onto the first two principal components calculated from Neanderthal, chimpanzee, and Denisovan genomes. PC1 separates ancient human populations from chimpanzee, and PC2 separates the Neanderthal and Denisovan populations. When human samples were projected onto these principal components, they clustered near the center of these three species. Arrows are drawn from the center of the sub-Saharan African populations to each of the ancestral human and chimpanzee points. The sub-Saharan African populations represent a control group, where only limited Neanderthal and Denisovan introgression should be present. (b) Magnified view of (a) showing the dispersal of human populations within these two principal components. Samples are colored on the basis of continental origin, and subpopulations are labeled to indicate the center of each population. African populations were separate from the remaining populations, which lay along the Neanderthal vector from this adjusted origin. Most populations were tightly clustered, with only the TP and NWA populations showing clear separation, suggesting a common time point of introgression among the clustered populations. The NWA samples had less introgression than the other GME populations.

Supplementary Figure 6 Heat map of pairwise F_ST values among all 1000 Genomes Project and GME populations identifies three clusters with a low degree of differentiation.

Top right, Wright's fixation index; bottom left, standard error values. Populations are ordered on the basis of geographical location. Three distinct clusters of close populations (shown as a blue gradient) are evident: 1000 Genomes Project Africa (LWK and YRI); 1000 Genomes Project Europe (FIN, CEU, and TSI), and GME subregions (NWA, NEA, AP, SD, TP, and PP); and 1000 Genomes Project East Asia (JPT, CHS, and CHB). Among global populations, the GME and European populations were more closely related than any other two continental regions. The greatest distance between any two populations was estimated as 0.212 for YRI and JPT. As populations became more distant, standard error values increased but remained small for all comparisons.

Supplementary Figure 7 Principal-component analysis on GME and 1000 Genomes Project populations showed that PC3 and PC4 explained inter-GME variance.

Plots comparing all combinations of PC1, PC2, PC3, and PC4 and percentages of variance explained. GME populations are color-coded by geographical regions. PC1 (39.03%) and PC2 (31.38%) together accounted for the majority of variation in the data and were associated with separating Africans and East Asians from other samples, respectively. PC3 and PC4 separated GME and European populations along north–south and east–west axes, respectively. AP was the most distant cluster from the 1000 Genomes Project reference populations, showing the greatest separation along PC3. Both of the North African populations tended to cluster closer to the sub-Saharan African cluster, whereas PP and TP trended toward the East Asian cluster.

Supplementary Figure 8 Reported consanguineous marriage rates are manyfold higher in the GME than in other continental populations.

Clinical survey results aggregated to estimate regional averages of the consanguineous marriage rate. Weighted averages, taking sample size into account, were calculated across all studies falling within a given region. The highest rates of consanguineous marriage were documented in PP and AP.
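The sample-size weighting mentioned in this legend can be illustrated with a short sketch. The data layout (a list of `(rate, sample_size)` pairs), the function name `weighted_mean_rate`, and the numbers are hypothetical and are not taken from the surveys the authors aggregated.

```python
def weighted_mean_rate(studies):
    """Sample-size-weighted mean of per-study consanguineous marriage rates.

    `studies` is a list of (rate, sample_size) pairs; both the layout and
    the numbers used below are hypothetical.
    """
    total_n = sum(n for _, n in studies)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(rate * n for rate, n in studies) / total_n


# Two made-up studies from one region: the larger study dominates the average.
region_studies = [(0.50, 100), (0.25, 300)]
regional_rate = weighted_mean_rate(region_studies)  # (50 + 75) / 400 = 0.3125
```

Weighting by sample size keeps a small survey with an extreme rate from skewing the regional estimate as much as it would in an unweighted mean, which here would be 0.375.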

Supplementary Figure 9 GME samples carried longer and rarer runs of homozygosity than 1000 Genomes Project populations.

(a) Cumulative proportion of total ROH length by bin for African, East Asian, European, and GME populations. African populations had the shortest accumulation of ROH spans, whereas GME populations showed the longest despite the limited influence of bottlenecks. (b) Distribution of total ROH length (in Mb) for all 1000 Genomes Project and GME populations. Wider distributions were evident for the GME populations owing to heterogeneity in long ROHs. (c) The total number of exomic bases found in ROHs binned by frequency in each population. GME ROHs tended to be unique in comparison to 1000 Genomes Project populations.
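One way to read panel (a) is as the share of total ROH length contributed by runs up to a given size. The sketch below computes that quantity under stated assumptions: the bin edges, the inclusive upper bound on each bin, and the function name `cumulative_roh_proportion` are illustrative choices, since the legend does not give the bins the authors used.

```python
def cumulative_roh_proportion(roh_lengths_mb, bin_upper_edges_mb):
    """Cumulative fraction of total ROH length from runs no longer than each edge.

    `roh_lengths_mb` holds individual ROH lengths in Mb and
    `bin_upper_edges_mb` holds increasing upper bin edges in Mb.
    Both the edges and the inclusive comparison are assumptions.
    """
    total = sum(roh_lengths_mb)
    if total <= 0:
        raise ValueError("total ROH length must be positive")
    proportions = []
    for edge in bin_upper_edges_mb:
        covered = sum(length for length in roh_lengths_mb if length <= edge)
        proportions.append(covered / total)
    return proportions


# Hypothetical ROH lengths (Mb) for one population and made-up bin edges.
lengths = [1.0, 2.0, 5.0, 12.0]
edges = [2.0, 8.0, 16.0]
curve = cumulative_roh_proportion(lengths, edges)  # [0.15, 0.4, 1.0]
```

A population whose curve rises late, as in this toy case where one 12 Mb run accounts for 60% of the total, carries most of its homozygosity in long runs.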

Supplementary Figure 10 Identity-by-state distance comparing human and chimpanzee reference genomes showed burden bias associated with hg19 corrected using estimated ancestral alleles.

(a) Homozygous and heterozygous variant counts shown for samples using hg19 (left) and PanTro2 (right) as the reference genomes. PanTro2 alleles demonstrated a linear relationship between populations, arguing for no burden difference. (b) IBS distance to the reference for chimpanzee genomes PanTro2 and PanTro4 (x axis) versus human hg19 (y axis). Human populations stratify by IBS distance using the hg19 reference genome. With chimpanzee ancestral variants, populations were equidistant from the chimpanzee reference genome.

Supplementary Figure 11 Correction of PolyPhen-2 predictions for derived variants resolved missense burden bias.

(a) The proportions of derived (Der) and ancestral (Anc) variants falling into each PolyPhen-2 class (B, benign; P, possibly damaging; D, probably damaging), across 14 allele frequency bins. The bias was apparent in the absence of possibly damaging and probably damaging calls for derived variants across nearly all bins. This bias can misrepresent results when comparing populations. (b) The same proportions after correction of derived variant PolyPhen-2 classes (Online Methods). Derived variant classes reflect the distributions of the ancestral variants. The x axis shows derived allele frequency bins, with parentheses and square brackets designating exclusion and inclusion, respectively.

Supplementary Figure 12 Mean derived allele frequencies for GME and 1000 Genomes Project populations across seven functional and deleteriousness variant classes suggested equivalent selective pressure.

(a) Calculated mean DAFs and standard errors for GME and 1000 Genomes Project populations. Variants were separated by functional class (noncoding, synonymous, nonsynonymous, and LOF) and corrected PolyPhen-2 deleteriousness class (benign, possibly damaging, and probably damaging). Populations are ordered as indicated on the right. No significant difference between populations was found for any variant class. (b) Mean DAF comparison for the X chromosome. Large error bars for some classes reflect limited ascertainment of variants within those classes.

Supplementary Figure 13 Comparison of allele frequency estimates from Exome Variant Server European-American and African-American populations showed poor correlation.

Comparison of the distribution of estimated allele frequencies for shared variants from two populations, EA and AA, showed poor correlation (Pearson's r = 0.1147). Hexagonal bins are colored according to the abundance of variants falling within each region. The linear regression line (blue) and identity line (black) are shown.
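The Pearson correlation coefficient reported in this legend measures how linearly two sets of allele frequency estimates track each other across shared variants. A minimal sketch of the calculation follows; the function name `pearson_r` and the toy frequencies are hypothetical and are unrelated to the Exome Variant Server data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up allele frequencies for five shared variants in two populations.
freq_pop_a = [0.01, 0.05, 0.10, 0.30, 0.50]
freq_pop_b = [0.40, 0.02, 0.25, 0.05, 0.30]
r = pearson_r(freq_pop_a, freq_pop_b)
```

Values of r near 1 would place variants along the identity line, whereas values near 0, as reported in the legend, indicate that the frequency in one population says little about the frequency in the other.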

Cite this article

Scott, E., Halees, A., Itan, Y. et al. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nat Genet 48, 1071–1076 (2016). https://doi.org/10.1038/ng.3592