Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery

Acknowledgements

The authors thank S. Sunyaev and D. Reich for help with PolyPhen-2 and DAF corrections, M. Turchin for help with purging analysis, J. Pickrell for help with TreeMix, and V. Bafna, N. Schork, and S. Bonissone for suggestions. Work was supported by grants from the US National Institutes of Health (P01HD070494 and R01NS048453), the Qatari National Research Foundation (NPRP6-1463), the Simons Foundation Autism Research Initiative (175303 and 275275) to J.G.G., the Yale Center for Mendelian Disorders (U54HG006504), the Broad Institute (U54HG003067), the Rockefeller University CTSA (5UL1RR024143-04), the Howard Hughes Medical Institute (to J.G.G. and J.-L.C.), INSERM, the St. Giles Foundation, and the Candidoser Association and by grants R01AI088364, R37AI095983, P01AI061093, U01AI109697 (to J.-L.C.), U01AI088685 (to J.-L.C. and L.A.), R21AI107508 (to E. Jouanguy), the DHFMR Collaborative Research Grant, and KACST 13-BIO1113-20 (to F.S.A.).

Author information

Authors and Affiliations

Howard Hughes Medical Institute, Rockefeller University, New York, New York, USA
Eric M Scott, Emily G Spencer, Yupeng He, Mostafa Abdellateef Azab, Jean-Laurent Casanova & Joseph G Gleeson
Department of Neurosciences, Rady Children's Institute for Genomic Medicine, University of California, San Diego, La Jolla, California, USA
Eric M Scott, Emily G Spencer, Yupeng He, Mostafa Abdellateef Azab & Joseph G Gleeson
Laboratory for Pediatric Brain Disease, Rockefeller University, New York, New York, USA
Eric M Scott, Emily G Spencer, Yupeng He, Mostafa Abdellateef Azab & Joseph G Gleeson
Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
Anason Halees
St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller University, New York, New York, USA
Yuval Itan, Bertrand Boisson, Laurent Abel & Jean-Laurent Casanova
Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
Stacey B Gabriel
Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, INSERM, Paris, France
Aziz Belkadi, Bertrand Boisson, Laurent Abel & Jean-Laurent Casanova
Paris Descartes University, Imagine Institute, Paris, France
Aziz Belkadi, Bertrand Boisson, Laurent Abel & Jean-Laurent Casanova
Department of Molecular Biology and Genetics, Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
Andrew G Clark
Department of Genetics, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
Fowzan S Alkuraya
Department of Anatomy and Cell Biology, College of Medicine, Alfaisal University, Riyadh, Saudi Arabia
Fowzan S Alkuraya
Pediatric Hematology–Immunology Unit, Necker Hospital for Sick Children, Paris, France
Jean-Laurent Casanova

Authors

Eric M Scott
Anason Halees
Yuval Itan
Emily G Spencer
Yupeng He
Mostafa Abdellateef Azab
Stacey B Gabriel
Aziz Belkadi
Bertrand Boisson
Laurent Abel
Andrew G Clark
Fowzan S Alkuraya
Jean-Laurent Casanova
Joseph G Gleeson

Consortia

Greater Middle East Variome Consortium

Sohair Abdel Rahim, Sawsan Abdel-Hadi, Ghada Abdel-Salam, Ekram Abdel-Salam, Mohammed Abdou, Avinash Abhytankar, Parisa Adimi, Jamil Ahmad, Mustafa Akcakus, Guside Aksu, Sami Al Hajjar, Suliman Al Juamaah, Saleh Al Muhsen, Nouriya Al Sannaa, Salem Al Tameni, Jumana Al-Aama, Nasir Al-Allawi, Raidah Al-Baradie, Lihadh Al-Gazali, Amal Al-Hashem, Waleed Al-Herz, Deema Al-Jeaid, Asma Al-Tawari, Abdullah Alangari, Alexandre Alcais, Tariq S AlFawaz, Zobaida Alsum, Aomar Ammar-Khodja, Sepideh Amouian, Cigdem Arikan, Omid Aryani, Ayca Aslanger, Cigdem Aydogmus, Caner Aytekin, Matloob Azam, Boglarka Bansagi, Mohamed-Rhida Barbouche, Laila Bastaki, Tawfeg Ben-Omran, Parayil Sankaran Bindu, Lizbeth Blancas, Stéphanie Boisson-Dupuis, Damien Bonnet, Omar Boudghene Stambouli, Aziz Bousfiha, Lobna Boussafara, Jeannette Boutros, Jacinta Bustamante, Huseyin Caksen, Yildiz Camcioglu, Emilie Catherinot, Fatma C Celik, Michael Ciancanelli, Funda E Cipe, Gary Clark, Aurélie Cobat, Sinan Comu, Angela Condie, Antonio Condino-Neto, Mukesh Desai, William Dobyns, Figen Dogu, Mohamed Domaia, Meltem Dorum, Odul Egritas, Safa El Azbaoui, Jamila El Baghdadi, Mona El Ruby, Ashraf El-Harouni, Reem A Elfeky, Gehad Elghazali, Eissa Faqeih, Elif Fenerci, Claire Fieschi, Cipe Funda, Iman Gamal, Umit Gelik, Fetah Genel, Alper Gezdirici, Katta M Girisha, Amy Goldstein, Padraic Grattan-Smith, Neerja Gupta, Jin Hahn, Nevin Hatipoglu, Raoul Hennekam, Massoud Houshmand, Philippe Ichai, Aydan Ikinciogullari, Samira Ismail, Chaim Jalas, Emmanuelle Jouanguy, Madhulika Kabra, Göknur Kalkan, Majdi Kara, Neslihan Karaca, Kadri Karaer, Ariana Kariminejad, Hulya Kayserili, Melike Keser-Emiroglu, Sara S Kilic, Najib Kissani, Cristina Kokron, Roshan Koul, Necil Kutukculer, Fanny Lanternier, Alireza Mahdaviani, Nizar Mahlaoui, Lobna Mansour, Davood Mansouri, Lucia Margari, Enza Maria Valente, Naima Marzouki, Amira Masri, Amina Megahed, Hisham Megahed, Najla Mekki, Mehrnaz Mesdaghi, Mohd Mikati, Faezeh Mojahedi, John Mulley, Sheela Nampoothiri, Carmen Navarrete, Tarek Omar, Azza Oraby, Ayse Pandaluz, Nima Parvaneh, Turkan Patiroglu, Zeynep Peker Koc, Isabelle Pellier, Capucine Picard, Anne Puel, Annick Raas-Rothschild, Anna Rajab, Didier Raoult, Ismail Reisli, Nima Rezaei, Ayoub Sabri, Yasin Sahin, Laila Saleem, Fadia Salem, Najla Sameer AlSediq, Ozden Sanal, Terry Sanger, Hanan Shakankiry, Lei Shang, Nabil Shehata, Nuri Shembesh, Vared Shkalim, Ameen Softah, Sameera Sogaty, Neveen Soliman, Fatma Sonmez-Aunaci, Laszlo Sztriha, Lynda Taibi-Berrah, Samia Temtamy, Hasan Tonekaboni, Doris Trauner, Beyhan Tuysuz, Ali Varan, Guillaume Vogt, Christopher Walsh, Geoffrey Woods, Gozde Yesil, Alisan Yildiran, Basak Yildiz, Adnan Yuksel, Maha Zaki & Shen-Ying Zhang

Contributions

E.M.S. performed analysis and generated all figures. A.H., Y.I., Y.H., and M.A.A. consulted on analysis. E.G.S., A.B., B.B., L.A., F.S.A., J.-L.C., and J.G.G. contributed subjects and jointly wrote and edited the manuscript. S.B.G. oversaw sequencing. A.G.C. consulted on population studies. The GME Variome Consortium identified subjects for study.

Corresponding author

Correspondence to Joseph G Gleeson.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Country distribution of GME samples and designation of geographical subregions.

GME samples collected across 20 countries and territories from the GME. Pie size corresponds to the number of samples from each country, and each pie shows the proportion of samples filtered because of quality control and relationship status (Online Methods). Geographical subregions are colored to show the sets of grouped countries. Some non-uniformity of sampling was inevitable owing to the inaccessibility of some populations. Map downloaded from https://www.presentationmagazine.com/ then colored.

Supplementary Figure 2 Unbiased genetic clustering demonstrates shorter genetic distance between samples from proximal geographical subregions.

Dendrogram of unbiased genetic clustering correlated with geographical subregion designation. 2,497 samples underwent exome sequencing from the Greater Middle East Consortium, including 1,111 GME samples as well as samples from Africa, East Asia, Europe, the Americas, Oceania, and unknown regions. Calculated identity-by-state (IBS) distances between samples represent the number of non-identical positions. Concordance between recruitment location and IBS clustering for all GME subregions was observed. Some intermixing was evident, suggesting recent migration events.
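
The IBS distance described above, the number of non-identical positions between two samples, can be illustrated with a minimal sketch. The genotype coding (alternate-allele counts 0, 1, or 2), the function name, and the example data are assumptions made for illustration; the caption does not state which tool or coding the authors used, and missing genotypes are not handled here.

```python
from typing import Sequence


def ibs_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions at which two samples' genotypes are not identical."""
    if len(a) != len(b):
        raise ValueError("samples must be genotyped at the same positions")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical genotypes coded as alternate-allele counts (0, 1, or 2).
sample_a = [0, 1, 2, 0, 1]
sample_b = [0, 2, 2, 1, 1]
distance = ibs_distance(sample_a, sample_b)
```

A matrix of such pairwise distances is the kind of input a hierarchical clustering routine would need to draw a dendrogram.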

Supplementary Figure 3 ADMIXTURE cross-validation.

(a) Cross-validation errors for the ADMIXTURE results shown in Supplementary Figure 1. Analysis with k = 6 gave the lowest cross-validation error. (b) Cross-validation errors for GME and 1000 Genomes Project samples.

Supplementary Figure 4 Unsupervised ADMIXTURE analysis of GME populations shows genetic history.

Results of ADMIXTURE analysis for LD-filtered variants for 1,111 GME samples across the six geographical subregions. Eleven iterations of k were run, from 2 to 12, to optimize clustering. Each vertical bar represents a single individual. The y axis shows the estimated proportion of the genome assigned to each ancestral cluster. Samples are grouped by subregion and organized from west (left) to east (right), showing trends of overlap. Substantial substructure was apparent throughout much of the GME, but three apparent ‘sources’ of ancestral populations stem from the NWA (yellow), AP (red), and PP (green) subregions.

Supplementary Figure 5 Introgression analysis of GME and 1000 Genomes Project exome samples shows consistent Neanderthal introgression on all GME, European, and East Asian samples except for NWA.

(a) Individuals from the 1000 Genomes Project reference populations and GME subregions were projected onto the first two principal components calculated from Neanderthal, chimpanzee, and Denisovan genomes. PC1 separates ancient human populations from chimpanzee, and PC2 separates the Neanderthal and Denisovan populations. When human samples were projected onto these principal components, they clustered near the center of these three species. Arrows are drawn from the center of the sub-Saharan African populations to each of the ancestral human and chimpanzee points. The sub-Saharan African populations represent a control group, where only limited Neanderthal and Denisovan introgression should be present. (b) Magnified view of panel a showing the dispersal of human populations within these two principal components. Samples are colored on the basis of continental origin, and subpopulations are labeled to indicate the center of each population. African populations were separate from the remaining populations, which lay along the Neanderthal vector from this adjusted origin. Most populations were tightly clustered, with only the TP and NWA populations showing clear separation, suggesting a common time point of introgression among the clustered populations. The NWA samples had less introgression than the other GME populations.

Supplementary Figure 6 Heat map of pairwise FST values among all 1000 Genomes Project and GME populations identifies three clusters with a low degree of differentiation.

Top right, Wright's fixation index; bottom left, standard error values. Populations are ordered on the basis of geographical location. Three distinct clusters of close populations (shown as a blue gradient) are evident: 1000 Genomes Project Africa (LWK and YRI); 1000 Genomes Project Europe (FIN, CEU, and TSI) together with the GME subregions (NWA, NEA, AP, SD, TP, and PP); and 1000 Genomes Project East Asia (JPT, CHS, and CHB). Among global populations, the GME and European populations were more closely related than any other two continental regions. The greatest distance between any two populations was estimated as 0.212 for YRI and JPT. As populations became more distant, standard error values increased but remained small for all comparisons.
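
For readers unfamiliar with Wright's fixation index, the sketch below computes it for a single biallelic site from two population allele frequencies, using the textbook definition FST = (HT - HS) / HT with equal population weights. This is an illustrative assumption only: the caption does not say which estimator the authors used, genome-wide estimates combine many sites and usually correct for sample size, and the function name and inputs are hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's FST at one biallelic site, given two allele frequencies.

    Uses FST = (H_T - H_S) / H_T with equal weight for each population.
    """
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # the site is monomorphic in both populations
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two populations.
example_fst = fst_two_populations(0.2, 0.8)
```

Identical frequencies give a value of 0, and populations fixed for different alleles give 1, which brackets the 0.212 maximum reported in the caption.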

Supplementary Figure 7 Principal-component analysis on GME and 1000 Genomes Project populations showed that PC3 and PC4 explained inter-GME variance.

Plots comparing all combinations of PC1, PC2, PC3, and PC4 and percentages of variance explained. GME populations are color-coded by geographical regions. PC1 (39.03%) and PC2 (31.38%) together accounted for the majority of variation in the data and were associated with separating Africans and East Asians from other samples, respectively. PC3 and PC4 separated GME and European populations along north–south and east–west axes, respectively. AP was the most distant cluster from the 1000 Genomes Project reference populations, showing the greatest separation along PC3. Both of the North African populations tended to cluster closer to the sub-Saharan African cluster, whereas PP and TP trended toward the East Asian cluster.

Supplementary Figure 8 Reported consanguineous marriage rates were many fold higher in GME than in other continental populations.

Clinical survey results aggregated to estimate regional averages of the consanguineous marriage rate. Weighted averages, taking sample size into account, were calculated across all studies falling within a given region. The highest rates of consanguineous marriage were documented in PP and AP.
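
A sample-size-weighted average of the kind described above can be sketched as follows. The study values are invented for illustration and are not taken from the surveys the authors aggregated; the function name is likewise hypothetical.

```python
from typing import Iterable, Tuple


def weighted_mean_rate(studies: Iterable[Tuple[float, int]]) -> float:
    """Average consanguinity rates across studies, weighting by sample size.

    Each study is a (rate, sample_size) pair.
    """
    studies = list(studies)
    total_n = sum(n for _, n in studies)
    if total_n == 0:
        raise ValueError("total sample size must be positive")
    return sum(rate * n for rate, n in studies) / total_n


# Hypothetical studies from one region: (reported rate, sample size).
region_studies = [(0.20, 100), (0.40, 300)]
region_rate = weighted_mean_rate(region_studies)
```

Weighting this way lets the larger study dominate, so the regional estimate here sits closer to 0.40 than a plain mean of the two rates would.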

Supplementary Figure 9 GME samples carried longer and rarer runs of homozygosity than 1000 Genomes Project populations.

(a) Cumulative proportion of total ROH length by bin for African, East Asian, European, and GME populations. African populations had the shortest accumulation of ROH spans, whereas GME populations showed the longest despite the limited influence of bottlenecks. (b) Distribution of total ROH length (in Mb) for all 1000 Genomes Project and GME populations. Wider distributions were evident for the GME populations owing to heterogeneity in long ROHs. (c) The total number of exomic bases found in ROHs binned by frequency in each population. GME ROHs tended to be unique in comparison to 1000 Genomes Project populations.
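
The quantity plotted in panel a can be sketched as below: for each length bin, the share of total ROH length contributed by runs no longer than that bin's upper edge. The bin edges and ROH lengths are hypothetical, since the caption does not give the bins the authors used.

```python
from typing import List, Sequence


def cumulative_roh_proportion(
    roh_lengths_mb: Sequence[float], bin_upper_edges_mb: Sequence[float]
) -> List[float]:
    """Fraction of total ROH length in runs at or below each bin's upper edge."""
    total = sum(roh_lengths_mb)
    if total == 0:
        return [0.0 for _ in bin_upper_edges_mb]
    return [
        sum(length for length in roh_lengths_mb if length <= edge) / total
        for edge in bin_upper_edges_mb
    ]


# Hypothetical ROH lengths (Mb) for one population and hypothetical bin edges.
roh_lengths = [0.5, 1.5, 1.0, 7.0]
bin_edges = [1.0, 2.0, 5.0, 10.0]
cumulative = cumulative_roh_proportion(roh_lengths, bin_edges)
```

A population dominated by long runs reaches a proportion of 1 only in the last bins, which is the pattern the caption attributes to the GME populations.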

Supplementary Figure 10 Identity-by-state distance comparing human and chimpanzee reference genomes showed burden bias associated with hg19 corrected using estimated ancestral alleles.

(a) Homozygous and heterozygous variant counts shown for samples using hg19 (left) and PanTro2 (right) as the reference genomes. PanTro2 alleles demonstrated a linear relationship between populations, arguing for no burden difference. (b) IBS distance to the reference for chimpanzee genomes PanTro2 and PanTro4 (x axis) versus human hg19 (y axis). Human populations stratify by IBS distance using the hg19 reference genome. With chimpanzee ancestral variants, populations were equidistant from the chimpanzee reference genome.

Supplementary Figure 11 Correction of PolyPhen-2 predictions for derived variants resolved missense burden bias.

(a) The proportions of derived (Der) and ancestral (Anc) variants falling into each PolyPhen-2 class (B, benign; P, possibly damaging; D, probably damaging), across 14 allele frequency bins. The bias was apparent in the absence of possibly damaging and probably damaging calls for derived variants across nearly all bins. This bias can misrepresent results when comparing populations. (b) The same proportions after correction of derived variant PolyPhen-2 classes (Online Methods). Derived variant classes reflect the distributions of the ancestral variants. The x axis shows derived allele frequency bins, with parentheses and square brackets designating exclusion and inclusion, respectively.
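
The interval notation on the x axis, where a parenthesis excludes and a square bracket includes the boundary, corresponds to bins of the form (low, high]. A minimal sketch of assigning a derived allele frequency to such a bin follows; the bin edges are hypothetical and do not reproduce the 14 bins used in the figure.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def daf_bin(daf: float, edges: Sequence[float]) -> Tuple[float, float]:
    """Return the (low, high] bin containing daf, so that low < daf <= high.

    `edges` must be sorted in increasing order.
    """
    if not edges[0] < daf <= edges[-1]:
        raise ValueError("frequency falls outside the binned range")
    i = bisect_left(edges, daf)  # index of the first edge >= daf
    return edges[i - 1], edges[i]


# Hypothetical bin edges for illustration only.
bin_edges = [0.0, 0.01, 0.05, 0.5, 1.0]
boundary_bin = daf_bin(0.01, bin_edges)  # a value on an edge joins the lower bin
interior_bin = daf_bin(0.02, bin_edges)
```

Because the upper boundary is inclusive, a fixed derived allele (frequency 1.0) still falls in the last bin, while a frequency of exactly 0 belongs to no bin.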

Supplementary Figure 12 Mean derived allele frequencies for GME and 1000 Genomes Project populations across seven functional and deleteriousness variant classes suggested equivalent selective pressure.

(a) Calculated mean DAFs and standard errors for GME and 1000 Genomes Project populations. Variants were separated by functional class (noncoding, synonymous, nonsynonymous, and LOF) and corrected PolyPhen-2 deleteriousness class (benign, possibly damaging, and probably damaging). Populations are ordered as indicated on the right. No significant difference between populations was found for any variant class. (b) Mean DAF comparison for the X chromosome. Large error bars for some classes reflect limited ascertainment of variants within those classes.
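
A mean DAF with its standard error for one variant class in one population could be computed as in the sketch below. It assumes the simple standard error of the mean (sample standard deviation divided by the square root of the number of variants); the caption does not state how the authors obtained their standard errors, and the frequencies shown are invented.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_daf_with_se(dafs: Sequence[float]) -> Tuple[float, float]:
    """Mean derived allele frequency and standard error of the mean."""
    n = len(dafs)
    if n < 2:
        raise ValueError("at least two variants are needed for a standard error")
    return mean(dafs), stdev(dafs) / sqrt(n)


# Hypothetical DAFs for the variants of one class in one population.
class_dafs = [0.1, 0.2, 0.3]
class_mean, class_se = mean_daf_with_se(class_dafs)
```

The division by the square root of the variant count is why sparsely ascertained classes show the large error bars mentioned for panel b.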

Supplementary Figure 13 Comparison of allele frequency estimates from Exome Variant Server European-American and African-American populations showed poor correlation.

Comparison of the distribution of estimated allele frequencies for shared variants from two populations, EA and AA, showed poor correlation (Pearson's r = 0.1147). Hexagonal bins are colored according to the abundance of variants falling within each region. The linear regression line (blue) and identity line (black) are shown.
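
Pearson's r between two populations' allele frequency estimates for shared variants can be computed as in this sketch. The frequencies are hypothetical and chosen to be perfectly linear, so they do not reproduce the r = 0.1147 reported above.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical allele frequencies for the same variants in two populations.
freq_pop_a = [0.1, 0.2, 0.3, 0.4]
freq_pop_b = [0.2, 0.4, 0.6, 0.8]
r_value = pearson_r(freq_pop_a, freq_pop_b)
```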

About this article

Cite this article

Scott, E., Halees, A., Itan, Y. et al. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nat Genet 48, 1071–1076 (2016). https://doi.org/10.1038/ng.3592

Received: 12 January 2016
Accepted: 20 May 2016
Published: 18 July 2016
Issue date: September 2016
DOI: https://doi.org/10.1038/ng.3592