Nationwide genomic biobank in Mexico unravels demographic history and complex trait architecture from 6,057 individuals

Genome-wide distribution of ancestry in Mexican Americans

Human Genetics, 2008

Migrations to the new world brought together individuals from at least three continents. These indigenous and migrant populations inter-mated and subsequently formed new admixed populations, such as African and Latino Americans. These unprecedented events brought together genomes that had evolved independently on different continents for tens of thousands of years and presented new environmental challenges for the indigenous and migrant populations, as well as their offspring. These circumstances provided novel opportunities for natural selection to occur that could be reflected in deviations from the genome-wide ancestry distribution at specific selected loci. Here we present an analysis examining European, Native American and African ancestry based on 284 microsatellite markers in a study of Mexican Americans from the Family Blood Pressure Program. We identified two genomic regions where there was a significant decrement in African ancestry (at 2p25.1, p < 10⁻⁸ and 9p24.1, p < 2 × 10⁻⁵) and one region with a significant increase in European ancestry (at 1p33, p < 2 × 10⁻⁵). We show that these regions are not related to blood pressure. These locations may harbor genes that have been subjected to natural selection in the ancestral mixing of Mexicans.

Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits

eLife, 2020

People in the Americas represent a diverse continuum of populations with varying degrees of admixture among African, European, and Amerindigenous ancestries. In the United States, populations with non-European ancestry remain understudied, and thus little is known about the genetic architecture of phenotypic variation in these populations. Using genotype data from the Hispanic Community Health Study/Study of Latinos, we find that Amerindigenous ancestry increased by an average of ~20% between the 1940s and the 1990s in Mexican Americans. These patterns result from complex interactions between several population and cultural factors which shaped patterns of genetic variation and influenced the genetic architecture of complex traits in Mexican Americans. We show for height how polygenic risk scores based on summary statistics from a European-based genome-wide association study perform poorly in Mexican Americans. Our findings reveal temporal changes in population structure within Hispanics/Latinos that may influence biomedical traits, demonstrating a need to improve our understanding of admixed populations.

Demographic history and biologically relevant genetic variation of Native Mexicans inferred from whole-genome sequencing

Understanding the genetic structure of Native American populations is important to clarify their diversity, demographic history, and to identify genetic factors relevant for biomedical traits. Here, we show a demographic history reconstruction from 12 Native American whole genomes belonging to six distinct ethnic groups representing the three main described genetic clusters of Mexico (Northern, Southern, and Maya). Effective population size estimates of all Native American groups remained below 2,000 individuals for up to 10,000 years ago. The proportion of missense variants predicted as damaging is higher for undescribed (~30%) than for previously reported variants (~15%). Several variants previously associated with biological traits are highly frequent in the Native American genomes. These findings suggest that the demographic and adaptive processes that occurred in these groups shaped their genetic architecture and could have implications in biological processes of the Native Americans and Mestizos of today.

Genotyping, sequencing and analysis of 140,000 adults from Mexico City

Nature, 2023

The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City [1]. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.

Latin American populations harbour extensive genetic diversity that reflects a complex history of migration throughout the Americas, post-Colonial admixture between continents and more recent population growth [2,3]. The distinct patterns of genomic variation that exist in these populations have led to key insights into the genetic architecture of rare and common diseases. Founder populations are prevalent throughout Latin America, and analyses of deleterious variants that segregate at higher frequency in these populations have identified clinically relevant new variants [4,5]. Moreover, Latin American populations include a significant proportion of Indigenous American subpopulations that have mostly remained genetically uncharacterized. Admixture among European, Indigenous American and African ancestry populations can result in allele frequency distributions that substantially diverge from ancestral populations. Variants that are rare in one ancestry population but common in another may therefore segregate at a higher frequency in an admixed population. This leads to opportunities for new discoveries in these populations that may be missed when studying single ancestry populations [6]. For example, in a study of Mexican adults [7], a haplotype in the SLC16A11 locus that is common in Indigenous Americans but rare in Europeans was strongly

Whole genome variation in 27 Mexican indigenous populations, demographic and biomedical insights

PLOS ONE, 2021

There has been limited study of Native American whole genome diversity to date, which impairs effective implementation of personalized medicine and a detailed description of its demographic history. Here we report high coverage whole genome sequencing of 76 unrelated individuals, from 27 indigenous groups across Mexico, with more than 97% average Native American ancestry. On average, each individual has 3.26 million Single Nucleotide Variants and short indels, that together comprise a catalog of 9,737,152 variants, 44,118 of which are novel. We report 497 common Single Nucleotide Variants (with allele frequency > 5%) mapped to drug responses and 316,577 in enhancer or promoter elements; interestingly, we found some of these enhancer variants in PPARG, a nuclear receptor involved in highly prevalent health problems in the Mexican population, such as obesity, diabetes, and insulin resistance. By detecting signals of positive selection we report 24 enriched key pathways under selection, ...

Genotyping, sequencing and analysis of 140,000 adults from the Mexico City Prospective Study

The Mexico City Prospective Study (MCPS) is a prospective cohort of over 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City. We generated genotype and exome sequencing data for all individuals, and whole genome sequencing for 10,000 selected individuals. We uncovered high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Native American, European and African ancestry, with extensive admixture from indigenous groups in Central, Southern and South Eastern Mexico. Native Mexican segments of the genome had lower levels of coding variation, but an excess of homozygous loss of function variants compared with segments of African and European origin. We estimated population specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Native Mexico at exome variants, all available via a public browser. Using ...

A Genomewide Single-Nucleotide–Polymorphism Panel for Mexican American Admixture Mapping

American Journal of Human Genetics, 2007

For admixture mapping studies in Mexican Americans (MAM), we define a genomewide single-nucleotide–polymorphism (SNP) panel that can distinguish between chromosomal segments of Amerindian (AMI) or European (EUR) ancestry. These studies used genotypes for >400,000 SNPs, defined in EUR and both Pima and Mayan AMI, to define a set of ancestry-informative markers (AIMs). The use of two AMI populations was necessary to remove a subset of SNPs that distinguished genotypes of only one AMI subgroup from EUR genotypes. The AIMs set contained 8,144 SNPs separated by a minimum of 50 kb with only three intermarker intervals >1 Mb and had EUR/AMI FST values >0.30 (mean FST=0.48) and Mayan/Pima FST values <0.05 (mean FST<0.01). Analysis of a subset of these SNP AIMs suggested that this panel may also distinguish ancestry between EUR and other disparate AMI groups, including Quechuan from South America. We show, using realistic simulation parameters that are based on our analyses of MAM genotyping results, that this panel of SNP AIMs provides good power for detecting disease-associated chromosomal segments for genes with modest ethnicity risk ratios. A reduced set of 5,287 SNP AIMs captured almost the same admixture mapping information, but smaller SNP sets showed substantial drop-off in admixture mapping information and power. The results will enable studies of type 2 diabetes, rheumatoid arthritis, and other diseases among which epidemiological studies suggest differences in the distribution of ancestry-associated susceptibility.
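The selection criteria named in this abstract (EUR/AMI FST above 0.30, Mayan/Pima FST below 0.05, and at least 50 kb between markers) can be illustrated with a short Python sketch. This is not the authors' pipeline. The Wright-style two-population FST estimator, the use of the mean of the Pima and Mayan frequencies as the AMI frequency, the greedy spacing rule, and all function and field names are assumptions made for illustration.

```python
def fst_two_pops(p1, p2):
    """Wright-style F_ST for one biallelic SNP, two equally weighted populations.

    p1, p2 are the frequencies of the same allele in each population.
    (Assumed estimator; the abstract does not say which one was used.)
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # monomorphic in both populations: no differentiation
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def select_aims(snps, min_between=0.30, max_within=0.05, min_spacing=50_000):
    """Greedy left-to-right filter for ancestry-informative markers.

    snps: list of dicts with hypothetical keys
          'chrom', 'pos', 'eur', 'pima', 'maya' (allele frequencies).
    """
    kept = []
    last_pos = {}  # chromosome -> position of the last SNP kept on it
    for snp in sorted(snps, key=lambda s: (s["chrom"], s["pos"])):
        ami = (snp["pima"] + snp["maya"]) / 2.0  # assumed pooled AMI frequency
        if fst_two_pops(snp["eur"], ami) <= min_between:
            continue  # not informative enough between EUR and AMI
        if fst_two_pops(snp["pima"], snp["maya"]) >= max_within:
            continue  # distinguishes only one AMI subgroup from EUR
        prev = last_pos.get(snp["chrom"])
        if prev is not None and snp["pos"] - prev < min_spacing:
            continue  # too close to the previously kept marker
        kept.append(snp)
        last_pos[snp["chrom"]] = snp["pos"]
    return kept
```

Calling `select_aims` on a list of per-SNP frequency records returns the subset that passes all three filters, in genomic order. A greedy pass is the simplest way to enforce spacing; it does not try to maximize the number or informativeness of retained markers.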

Differentiation of Hispanic biogeographic ancestry with 80 ancestry informative markers

Scientific Reports

Ancestry informative single nucleotide polymorphisms (SNPs) can identify biogeographic ancestry (BGA); however, population substructure and relatively recent admixture can make differentiation difficult in heterogeneous Hispanic populations. Utilizing unrelated individuals from the Genomic Origins and Admixture in Latinos dataset (GOAL, n = 160), we designed an 80 SNP panel (Setser80) that accurately depicts BGA through STRUCTURE and PCA. We compared our Setser80 to the Seldin and Kidd panels via resampling simulations, which model data based on allele frequencies. We incorporated Admixed American 1000 Genomes populations (1000G, n = 347) into a combined populations dataset to determine robustness. Using multinomial logistic regression (MLR), we compared the 3 panels on the combined dataset and found overall MLR classification accuracies: 93.2% Setser80, 87.9% Seldin panel, 71.4% Kidd panel. Naïve Bayesian classification had similar results on the combined dataset: 91.5% Setser80...

Genetic diversity in populations across Latin America: implications for population and medical genetic studies

Hispanic/Latino (H/L) populations, although linked by culture and aspects of shared history, reflect the complexity of history and migration influencing the Americas. The original settlement by indigenous Americans, followed by postcolonial admixture from multiple continents, has yielded localized genetic patterns. In addition, numerous H/L populations appear to have signatures of pre-colonization and post-colonization bottlenecks, indicating that tens of millions of H/Ls may harbor signatures of founder effects today. Based on both population and medical genetic findings we highlight the extreme differentiation across the Americas, providing evidence for why H/Ls should not be considered a single population in modern human genetics. We highlight the need for additional sampling of understudied H/L groups, and ramifications of these findings for genomic medicine in one-tenth of the world's population.

AFA: Computationally efficient Ancestral Frequency estimation in Admixed populations: the Hispanic Community Health Study/Study of Latinos

bioRxiv, 2021

We developed a computationally efficient method, Ancestral Frequency estimation in Admixed populations (AFA), to estimate the frequencies of bi-allelic variants in admixed populations with an unlimited number of ancestries. AFA uses maximum likelihood estimation by modeling the conditional probability of having an allele given proportions of genetic ancestries. It can be applied using either global or local proportions of genetic ancestries. Simulations mimicking admixture demonstrated the high accuracy of the method. We implemented the method on data from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), an admixed population with three predominant continental ancestries: Amerindian, European, and African. Comparison of the European and African estimated frequencies to the respective gnomAD frequencies demonstrated high correlations, with Pearson R² = 0.97-0.99. We provide a genome-wide dataset of the estimated three ancestral allele frequencies in HCHS/SOL for all ava...
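The idea of estimating ancestry-specific allele frequencies by maximum likelihood from ancestry proportions can be illustrated with a small Python sketch. It is not the AFA implementation. It assumes a simple mixture model in which each of an individual's two allele copies is the alternate allele with probability equal to the sum over ancestries of (ancestry proportion × ancestral frequency), and it maximizes that likelihood with an EM iteration, which is a swapped-in optimizer chosen for brevity. The function and parameter names are hypothetical.

```python
def estimate_ancestral_frequencies(genotypes, ancestry, n_iter=100):
    """EM estimate of per-ancestry alternate-allele frequencies at one variant.

    genotypes: list of alternate-allele counts (0, 1 or 2), one per individual.
    ancestry:  list of per-individual ancestry proportions; each entry is a
               list of K non-negative numbers summing to 1.
    Returns a list of K estimated frequencies.
    """
    k = len(ancestry[0])
    freqs = [0.5] * k  # neutral starting point
    for _ in range(n_iter):
        alt = [0.0] * k  # expected alternate alleles assigned to each ancestry
        tot = [0.0] * k  # expected total alleles assigned to each ancestry
        for g, q in zip(genotypes, ancestry):
            w_alt = [q[j] * freqs[j] for j in range(k)]
            w_ref = [q[j] * (1.0 - freqs[j]) for j in range(k)]
            s_alt = sum(w_alt)
            s_ref = sum(w_ref)
            for j in range(k):
                # E-step: split this individual's alleles across ancestries
                a = g * w_alt[j] / s_alt if s_alt > 0 else 0.0
                r = (2 - g) * w_ref[j] / s_ref if s_ref > 0 else 0.0
                alt[j] += a
                tot[j] += a + r
        # M-step: frequency = expected alternate alleles / expected alleles
        freqs = [alt[j] / tot[j] if tot[j] > 0 else freqs[j] for j in range(k)]
    return freqs
```

Passing global ancestry proportions gives one set of frequencies per variant. The same function could in principle be given local ancestry proportions at the variant instead, which mirrors the global/local distinction in the abstract. Estimates are only well determined when ancestry proportions vary across individuals.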