Genotyping, sequencing and analysis of 140,000 adults from the Mexico City Prospective Study

Nature, 2023

The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City 1. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.

Latin American populations harbour extensive genetic diversity that reflects a complex history of migration throughout the Americas, post-Colonial admixture between continents and more recent population growth 2,3. The distinct patterns of genomic variation that exist in these populations have led to key insights into the genetic architecture of rare and common diseases.
Founder populations are prevalent throughout Latin America, and analyses of deleterious variants that segregate at higher frequency in these populations have identified clinically relevant new variants 4,5. Moreover, Latin American populations include a significant proportion of Indigenous American subpopulations that have mostly remained genetically uncharacterized. Admixture among European, Indigenous American and African ancestry populations can result in allele frequency distributions that substantially diverge from ancestral populations. Variants that are rare in one ancestry population but common in another may therefore segregate at a higher frequency in an admixed population. This leads to opportunities for new discoveries in these populations that may be missed when studying single ancestry populations 6. For example, in a study of Mexican adults 7, a haplotype in the SLC16A11 locus that is common in Indigenous Americans but rare in Europeans was strongly

Nationwide genomic biobank in Mexico unravels demographic history and complex trait architecture from 6,057 individuals

Latin America continues to be severely underrepresented in genomics research, and fine-scale genetic histories as well as complex trait architectures remain hidden due to the lack of Big Data. To fill this gap, the Mexican Biobank project genotyped 1.8 million markers in 6,057 individuals from 32 states and 898 sampling localities across Mexico with linked complex trait and disease information creating a valuable nationwide genotype-phenotype database. Through a suite of state-of-the-art methods for ancestry deconvolution and inference of identity-by-descent (IBD) segments, we inferred detailed ancestral histories for the last 200 generations in different Mesoamerican regions, unraveling native and colonial/post-colonial demographic dynamics. We observed large variations in runs of homozygosity (ROH) among genomic regions with different ancestral origins reflecting their demographic histories, which also affect the distribution of rare deleterious variants across Mexico. We analyzed...

Whole genome variation in 27 Mexican indigenous populations, demographic and biomedical insights

PLOS ONE, 2021

There has been limited study of Native American whole genome diversity to date, which impairs effective implementation of personalized medicine and a detailed description of its demographic history. Here we report high coverage whole genome sequencing of 76 unrelated individuals, from 27 indigenous groups across Mexico, with more than 97% average Native American ancestry. On average, each individual has 3.26 million Single Nucleotide Variants and short indels, which together comprise a catalog of 9,737,152 variants, 44,118 of which are novel. We report 497 common Single Nucleotide Variants (with allele frequency > 5%) mapped to drug responses and 316,577 in enhancer or promoter elements; interestingly, we found some of these enhancer variants in PPARG, a nuclear receptor involved in highly prevalent health problems in the Mexican population, such as obesity, diabetes, and insulin resistance. By detecting signals of positive selection we report 24 enriched key pathways under selection, ...

Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits

eLife, 2020

People in the Americas represent a diverse continuum of populations with varying degrees of admixture among African, European, and Amerindigenous ancestries. In the United States, populations with non-European ancestry remain understudied, and thus little is known about the genetic architecture of phenotypic variation in these populations. Using genotype data from the Hispanic Community Health Study/Study of Latinos, we find that Amerindigenous ancestry increased by an average of ~20% from the 1940s to the 1990s in Mexican Americans. These patterns result from complex interactions between several population and cultural factors which shaped patterns of genetic variation and influenced the genetic architecture of complex traits in Mexican Americans. We show for height how polygenic risk scores based on summary statistics from a European-based genome-wide association study perform poorly in Mexican Americans. Our findings reveal temporal changes in population structure within Hispanics/Latinos that may influence biomedical traits, demonstrating a need to improve our understanding of admixed populations.

Population History and Gene Divergence in Native Mexicans Inferred from 76 Human Exomes

Molecular Biology and Evolution, 2019

Native American genetic variation remains underrepresented in most catalogs of human genome sequencing data. Previous genotyping efforts have revealed that Mexico's Indigenous population is highly differentiated and substructured, thus potentially harboring higher proportions of private genetic variants of functional and biomedical relevance. Here we have targeted the coding fraction of the genome and characterized its full site frequency spectrum by sequencing 76 exomes from five Indigenous populations across Mexico. Using diffusion approximations, we modeled the demographic history of Indigenous populations from Mexico with northern and southern ethnic groups splitting 7.2 KYA and subsequently diverging locally 6.5 and 5.7 KYA, respectively. Selection scans for positive selection revealed BCL2L13 and KBTBD8 genes as potential candidates for adaptive evolution in Rarámuris and Triquis, respectively. BCL2L13 is highly expressed in skeletal muscle and could be related to physical endurance, a well-known phenotype of the northern Mexico Rarámuri. The KBTBD8 gene has been associated with idiopathic short stature and we found it to be highly differentiated in Triqui, a southern Indigenous group from Oaxaca whose height is extremely low compared to other Native populations.

Demographic history and biologically relevant genetic variation of Native Mexicans inferred from whole-genome sequencing

Understanding the genetic structure of Native American populations is important to clarify their diversity, demographic history, and to identify genetic factors relevant for biomedical traits. Here, we show a demographic history reconstruction from 12 Native American whole genomes belonging to six distinct ethnic groups representing the three main described genetic clusters of Mexico (Northern, Southern, and Maya). Effective population size estimates of all Native American groups remained below 2,000 individuals for up to 10,000 years ago. The proportion of missense variants predicted as damaging is higher for undescribed (~30%) than for previously reported variants (~15%). Several variants previously associated with biological traits are highly frequent in the Native American genomes. These findings suggest that the demographic and adaptive processes that occurred in these groups shaped their genetic architecture and could have implications in biological processes of the Native Americans and Mestizos of today.

Genome-wide distribution of ancestry in Mexican Americans

Human Genetics, 2008

Migrations to the New World brought together individuals from at least three continents. These indigenous and migrant populations inter-mated and subsequently formed new admixed populations, such as African and Latino Americans. These unprecedented events brought together genomes that had evolved independently on different continents for tens of thousands of years and presented new environmental challenges for the indigenous and migrant populations, as well as their offspring. These circumstances provided novel opportunities for natural selection to occur that could be reflected in deviations from the genome-wide ancestry distribution at specific selected loci. Here we present an analysis examining European, Native American and African ancestry based on 284 microsatellite markers in a study of Mexican Americans from the Family Blood Pressure Program. We identified two genomic regions where there was a significant decrement in African ancestry (at 2p25.1, p < 10^−8 and 9p24.1, p < 2 × 10^−5) and one region with a significant increase in European ancestry (at 1p33, p < 2 × 10^−5). We show that these regions are not related to blood pressure. These locations may harbor genes that have been subjected to natural selection in the ancestral mixing of Mexicans.
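The idea of flagging loci whose ancestry deviates from the genome-wide distribution can be illustrated with a minimal sketch. It is not the authors' statistical test: it simply standardizes each locus's mean ancestry against the mean and standard deviation across all loci. The function name `ancestry_z_scores`, the locus labels and the toy ancestry values are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

def ancestry_z_scores(locus_ancestry):
    """Standardize per-locus mean ancestry against the genome-wide distribution.

    locus_ancestry maps a locus label to the mean proportion of one ancestry
    (for example African) at that locus, averaged over individuals.
    """
    values = list(locus_ancestry.values())
    mu = mean(values)      # genome-wide average ancestry
    sd = stdev(values)     # spread of locus-level ancestry around that average
    return {locus: (a - mu) / sd for locus, a in locus_ancestry.items()}

# Toy values (not from the paper): nine loci near 5% and one depleted locus.
toy = {
    "locus_A": 0.050, "locus_B": 0.052, "locus_C": 0.048, "locus_D": 0.051,
    "locus_E": 0.049, "locus_F": 0.050, "locus_G": 0.053, "locus_H": 0.047,
    "locus_I": 0.050, "locus_J": 0.010,
}
z = ancestry_z_scores(toy)
outlier = min(z, key=z.get)   # locus with the strongest ancestry decrement
print(outlier, round(z[outlier], 2))
```

A real analysis would also need to account for marker informativeness, sampling error in the ancestry estimates and multiple testing across loci, none of which this sketch attempts.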

A Genomewide Single-Nucleotide–Polymorphism Panel for Mexican American Admixture Mapping

American Journal of Human Genetics, 2007

For admixture mapping studies in Mexican Americans (MAM), we define a genomewide single-nucleotide–polymorphism (SNP) panel that can distinguish between chromosomal segments of Amerindian (AMI) or European (EUR) ancestry. These studies used genotypes for >400,000 SNPs, defined in EUR and both Pima and Mayan AMI, to define a set of ancestry-informative markers (AIMs). The use of two AMI populations was necessary to remove a subset of SNPs that distinguished genotypes of only one AMI subgroup from EUR genotypes. The AIMs set contained 8,144 SNPs separated by a minimum of 50 kb with only three intermarker intervals >1 Mb and had EUR/AMI FST values >0.30 (mean FST=0.48) and Mayan/Pima FST values <0.05 (mean FST<0.01). Analysis of a subset of these SNP AIMs suggested that this panel may also distinguish ancestry between EUR and other disparate AMI groups, including Quechuan from South America. We show, using realistic simulation parameters that are based on our analyses of MAM genotyping results, that this panel of SNP AIMs provides good power for detecting disease-associated chromosomal segments for genes with modest ethnicity risk ratios. A reduced set of 5,287 SNP AIMs captured almost the same admixture mapping information, but smaller SNP sets showed substantial drop-off in admixture mapping information and power. The results will enable studies of type 2 diabetes, rheumatoid arthritis, and other diseases among which epidemiological studies suggest differences in the distribution of ancestry-associated susceptibility.
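The selection criteria in this abstract (EUR/AMI FST above 0.30, Mayan/Pima FST below 0.05, at least 50 kb between markers) can be sketched as a simple filter. This is a sketch under stated assumptions, not the authors' pipeline: the `Snp` record, the `fst` and `select_aims` functions, the two-population heterozygosity-based FST estimator, the pooling of Pima and Mayan frequencies by a plain average, and the greedy left-to-right spacing rule are all illustrative choices, and the demo frequencies are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snp:
    chrom: str
    pos: int      # base-pair position
    eur: float    # allele frequency in Europeans
    pima: float   # allele frequency in Pima
    maya: float   # allele frequency in Mayan

def fst(p1, p2):
    """Two-population FST from allele frequencies: (H_T - H_S) / H_T."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                           # pooled heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2       # mean within-population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

def select_aims(snps, min_eur_ami=0.30, max_ami_ami=0.05, min_gap=50_000):
    """Keep SNPs that separate EUR from AMI, agree across AMI groups,
    and lie at least min_gap bp from the previously kept SNP."""
    kept, last_pos = [], {}
    for s in sorted(snps, key=lambda s: (s.chrom, s.pos)):
        ami = (s.pima + s.maya) / 2          # assumption: unweighted AMI pool
        if fst(s.eur, ami) <= min_eur_ami:
            continue                          # not informative for EUR vs AMI
        if fst(s.pima, s.maya) >= max_ami_ami:
            continue                          # differs between the AMI groups
        if s.chrom in last_pos and s.pos - last_pos[s.chrom] < min_gap:
            continue                          # too close to the last kept marker
        kept.append(s)
        last_pos[s.chrom] = s.pos
    return kept

# Made-up frequencies for illustration only.
demo = [
    Snp("1", 100_000, 0.90, 0.10, 0.12),   # informative, kept
    Snp("1", 120_000, 0.95, 0.05, 0.05),   # informative but only 20 kb away
    Snp("1", 200_000, 0.50, 0.45, 0.50),   # low EUR/AMI differentiation
    Snp("1", 300_000, 0.90, 0.10, 0.60),   # Pima and Mayan disagree
    Snp("2", 110_000, 0.85, 0.10, 0.10),   # informative, different chromosome
]
aims = select_aims(demo)
print([(s.chrom, s.pos) for s in aims])
```

The greedy rule keeps the first qualifying marker in each stretch; choosing the highest-FST marker per window instead would be an equally plausible design and may yield a more informative panel.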

Analysis of genomic diversity in Mexican Mestizo populations to develop genomic medicine in Mexico

Proceedings of the National Academy of Sciences, 2009

Mexico is developing the basis for genomic medicine to improve healthcare of its population. The extensive study of genetic diversity and linkage disequilibrium structure of different populations has made it possible to develop tagging and imputation strategies to comprehensively analyze common genetic variation in association studies of complex diseases. We assessed the benefit of a Mexican haplotype map to improve identification of genes related to common diseases in the Mexican population. We evaluated genetic diversity, linkage disequilibrium patterns, and extent of haplotype sharing using genomewide data from Mexican Mestizos from regions with different histories of admixture and particular population dynamics. Ancestry was evaluated by including 1 Mexican Amerindian group and data from the HapMap. Our results provide evidence of genetic differences between Mexican subpopulations that should be considered in the design and analysis of association studies of complex diseases. In addition, these results support the notion that a haplotype map of the Mexican Mestizo population can reduce the number of tag SNPs required to characterize common genetic variation in this population. This is one of the first genomewide genotyping efforts of a recently admixed population in Latin America.