Genetic Adaptation and Neandertal Admixture Shaped the Immune System of Human Populations - PubMed (original) (raw)

. 2016 Oct 20;167(3):643-656.e17.

doi: 10.1016/j.cell.2016.09.024.

Maxime Rotival 1, Julien Pothlichet 1, Yong-Hwee Eddie Loh 1, Michael Dannemann 2, Nora Zidane 1, Guillaume Laval 1, Etienne Patin 1, Christine Harmant 1, Marie Lopez 3, Matthieu Deschamps 3, Nadia Naffakh 4, Darragh Duffy 5, Anja Coen 6, Geert Leroux-Roels 6, Frederic Clément 6, Anne Boland 7, Jean-François Deleuze 7, Janet Kelso 2, Matthew L Albert 8, Lluis Quintana-Murci 9

Affiliations

Genetic Adaptation and Neandertal Admixture Shaped the Immune System of Human Populations

Hélène Quach et al. Cell. 2016.

Abstract

Humans differ in the outcome that follows exposure to life-threatening pathogens, yet the extent of population differences in immune responses and their genetic and evolutionary determinants remain undefined. Here, we characterized, using RNA sequencing, the transcriptional response of primary monocytes from Africans and Europeans to bacterial and viral stimuli-ligands activating Toll-like receptor pathways (TLR1/2, TLR4, and TLR7/8) and influenza virus-and mapped expression quantitative trait loci (eQTLs). We identify numerous cis-eQTLs that contribute to the marked differences in immune responses detected within and between populations and a strong trans-eQTL hotspot at TLR1 that decreases expression of pro-inflammatory genes in Europeans only. We find that immune-responsive regulatory variants are enriched in population-specific signals of natural selection and show that admixture with Neandertals introduced regulatory variants into European genomes, affecting preferentially responses to viral challenges. Together, our study uncovers evolutionarily important determinants of differences in host immune responsiveness between human populations.

Keywords: Neandertal admixture; eQTL mapping; evolution; immune response; monocytes; natural selection; population genetics; transcriptional responses.

Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

PubMed Disclaimer

Figures

None

Graphical abstract

Figure 1

Figure 1

Transcriptional Response of Primary Monocytes to TLR Activation and Influenza A Virus Infection (A) PC analysis of adjusted RNA-sequencing expression profiles in the five conditions tested in Africans (AFB) and Europeans (EUB). (B) Weighted correlation network analysis. Relative size of the modules (left), expression patterns of genes in modules that are upregulated after stimulation (1–4) with boxplots representing relative expression based on PC1 (middle), and most associated transcription factor binding motifs in genes within modules (right). (C) Most significant GO biological process enrichments of genes in modules 1–4. See also Figure S4 and Table S1.

Figure 2

Figure 2

Genetic Determinants of Population Differences in Immune Response (A) Number of genes harboring reQTLs in single conditions or combinations of stimulations (∗∗∗p < 0.001, significance of overlap between stimulation conditions). (B) Enrichment of (r)eQTLs in transcription factor (TF) binding sites. The five TFs presenting the strongest enrichments are shown for basal state eQTLs and reQTLs in different sets of stimulated conditions. Dots show the estimated odds ratio for the presence of binding sites of the TFs under consideration among (r)eQTLs, and horizontal lines show the 95% confidence interval of the odds ratio. (C) Proportion (in percentage) of popDRGs harboring a local reQTL. Within each condition, popDRGs of different strengths (light color, |log2FCpop| > 0.2; dark color, |log2FCpop| > 0.5), as well as the proportion of reQTLs expected at the genome-wide level (in gray), are represented (∗∗∗p < 0.001, significance of enrichment). (D) Fraction of population differences attributable to (r)eQTLs among popDEGs and popDRGs. (r)eQTLs are sorted by increasing effect size and divided into quintiles. Dark color bars indicate the fraction of |log2FCpop| of popDRGs attributable to reQTLs for each stimulation, and the light color bars represent the fraction that is not explained by reQTLs. For the non-stimulated (NS) condition, the fraction of |log2FCpop| of popDEGs attributable to eQTLs is reported. (E) TLR-induced reQTL at NCF2 in both Africans and Europeans (left), and mean population expression of NCF2 (right). (F) European-specific reQTL at ST3GAL6 induced by R848 (left), and mean population expression of ST3GAL6 (right). See also Figure S6 and Table S2.

Figure 3

Figure 3

Allele-Specific Expression upon Immune Stimulation (A) Strongest allele-specific response QTLs (asrQTLs) for each stimulation condition and population. (B) asrQTL of ARL5B. Individual allelic ratios, in non-stimulated and IAV-infected conditions, are grouped by reQTL rs2130531 genotype and phase with exonic variants, with color-coding for population (dark and light colors for Africans and Europeans, respectively). Vertical bars show the 95% binomial confidence interval of the allelic ratio. (C) Distribution of allelic ratios across genes harboring aseQTLs. Ratios are grouped by eQTL genotype (HMZ, homozygous; HTZ, heterozygous) for basal state or stimulated conditions. (D) Percentage of ASE events observed for different categories of rare exonic variants. Vertical bars indicate 95% confidence intervals for the estimated percentage (∗∗∗p < 0.001). See also Figure S6 and Table S2.

Figure 4

Figure 4

Identification of Master Regulatory Loci of Immune Responses (A) Genome-wide distribution of _trans_-eQTLs. For each locus, the number of associated genes identified at a genome-wide FDR of 5% or using an SNP-based Bonferroni correction is represented by black and gray bars, respectively. (B) Enrichment of popDRGs in genes regulated by _trans_-eQTLs. Within each condition of stimulation, popDRGs of different strengths (light color, |log2FCpop| > 0.2; dark color, |log2FCpop| > 0.5) are represented (∗p < 0.05, ∗∗∗p < 0.001). (C) Fine mapping of the Pam3CSK4-induced _trans_-eQTL at TLR1. The significance of SNP associations with the expression patterns (PC1) of the 432 _trans_-regulated genes is shown for basal and Pam3CSK4 conditions (gray and green dots, respectively). Only the gene overlapping the strongest _trans_-eQTL signal is represented. (D) _TLR1 trans_-associated genes at pBonferroni < 0.05. The size of the circles reflects the proportion of the variance of gene expression explained by rs5743618, and colors indicate the direction of the change in expression associated with the derived allele. Only the 100 most significant genes are shown. (E) Fraction of population differences in gene expression (|log2FCpop|) attributable to rs5743618 among popDRGs regulated by the TLR1 locus (in dark green). Only the tail distribution of popDRGs with the largest population differences is represented. Genes involved in the GO biological process “response to molecule of bacterial origin” are reported. (F) Impact of the derived allele of TLR1 rs5743618 (C allele) on the expression patterns (PC1) of the 81 inflammatory genes from module 1. See also Table S3.

Figure 5

Figure 5

Natural Selection Driving Population Differences of Immune Response (A) Distribution of neutrality statistics among local (r)eQTLs. For each locus, the maximum _F_ST or |iHS| across all SNPs in high LD (r2 > 0.8) is considered, focusing on selection signals targeting the derived allele. The genome-wide distribution of these statistics (after pruning for LD, to avoid overweighting long haplotypes) is provided as a reference (all). Significance was assessed by resampling random SNPs from the genome, matched for MAF and LD (∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001). Top: Africans. Bottom: Europeans. (B) Fine mapping of the selection signal detected at the _CCR1_ reQTL in Africans. Top: SNP associations with _CCR1_ expression (-log10(peQTL)) are represented for the basal (gray) and Pam3CSK4 (green) conditions. SNPs located in the _CCR1_ reQTL peak region are in high LD (r2 > 0.8) with the reQTL peak-SNP. Middle and bottom: _F_ST and |iHS| values for SNPs with a DAF ≥ 0.2, respectively. For each SNP, the size of the dots is proportional to the association with CCR1 expression, with blue and red indicating selection on derived and ancestral alleles, respectively. The blue line represents the recombination rate at the locus. Only the gene for which the eQTL was detected is represented. See also Table S4.

Figure 6

Figure 6

Neandertal Introgression of Immune Regulatory Variants in Europeans (A) Enrichment of (r)eQTLs in archaic SNPs. The observed number of archaic eQTLs is presented for each condition (colored dots) in Europeans with respect to the expected distribution of archaic eQTLs under the assumption of independence (gray bars) (see STAR Methods) (∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001). (B) Fine mapping of the archaic reQTL at PNMA1 in Europeans. SNP associations with PNMA1 expression (−log10(peQTL)) for the basal (gray) and stimulated (in colors) conditions (top). European individuals, and their corresponding archaic and modern haplotypes at the PNMA1 locus, are sorted by increasing levels of PNMA1 expression. Red dots represent archaic SNPs, and red lines represent the largest consecutive stretch of archaic alleles associated with PNMA1 expression (middle). Frequency distribution of archaic SNPs at the locus is shown (bottom). Only the gene for which the eQTL was observed is represented. See also Figure S7 and Table S5.

Figure S1

Figure S1

Overview of the Experimental Setting, Related to STAR Methods The transcriptional response of primary monocytes from 200 healthy donors of European and African descent to various immune stimulations was dissected to pinpoint molecular phenotypes differing between populations and under genetic control.

Figure S2

Figure S2

Processing of RNA-Sequencing Data, Related to STAR Methods (A) Read counts across the 970 RNA samples (in millions of reads). (B) Alignment of reads with various genomic features. Reads aligning with splice junctions are counted as many times as the number of genomic features they overlap. TSS: transcription start site, UTR: untranslated region, CDS: coding sequence, TES: transcription end site, up: upstream, down: downstream. (C) Distribution of GC content across samples. (D) Correlations between gene expression and global GC content. The expression of a large proportion of high-GC content genes was positively correlated with GC content (3rd quartile of GC content, in red), whereas the expression of low-GC content genes tended to be negatively correlated to GC content (1st quartile of GC content, in blue). The correlation distribution for total genes is shown in gray. Expectations were calculated by permutation (black). (E) Illustration of the effect of global GC content on gene expression. UFM1 (GC content: 33.9%) expression is plotted for all RNA samples, ordered by global GC content, for each condition and population. (F) Effect of various technical batches and confounding factors on gene expression. The proportion of genes whose expression levels are associated, for different levels of significance, with each factor is presented before and after correction of the data (up and bottom panels, respectively). The association of each gene with each cofactor was assessed by determining the proportion of the variance of gene expression explained by the cofactor under consideration, after adjustment for all other cofactors. (G) Boxplots of coefficient of variation (CV) in technical and biological replicates across conditions. CV distributions of technical replicates are smaller in magnitude and less variable compared to distributions of pairwise biological replicates (Wilcoxon Rank-Sum Test, ∗∗∗p < 0.001). (H) Boxplots of correlation coefficient (r) between technical and biological replicates. r calculated between technical replicates (red circles) are significant outliers to the r distributions of pairwise biological replicates (boxplots).

Figure S3

Figure S3

Overview of the Pipeline Used for RNA-Seq Data Pre-processing, Related to STAR Methods We filtered out samples with irregular gene body coverage and used Cufflinks/CuffDiff to estimate transcript level FPKM. We removed all transcripts for which the total gene FPKM was less than 1 in all conditions. We then log-transformed the data and performed adjustments for GC content, 3′/5′ bias, date of experiment and library preparation batch. We carried out kNN imputation before ComBat, to handle missing values. Batch covariates were treated sequentially, as ComBat can only handle one batch variable at a time. The corrected FPKM values were then transformed back to the normal scale and forced to positive values to calculate gene-level FPKM. Gene expression was considered on the log scale for all subsequent analyses.

Figure S4

Figure S4

Weighted Correlation Network Analysis, Related to Figure 1 The relative expression of each gene module is based on the first principal component for the expression of genes present in the corresponding module.

Figure S5

Figure S5

Genetic Structure of the Population Samples, Related to STAR Methods (A) Genetic ancestry of African-descent Belgians (AFB) and European-descent Belgians (AFB), estimated by ADMIXTURE. Each vertical line represents an individual genome, which is partitioned into K different genetic clusters. This analysis was performed on 229,320 independent SNPs and 789 individuals from 22 populations, including EUB and AFB, together with a selection of representative populations from sub-Saharan Africa, North Africa, the Near East and Europe (Behar et al., 2010, Patin et al., 2014). We made K vary from 2 to 10, and ran five iterations with different random seeds for each K value. The run with the lowest cross-validation error rate for each K value is shown for K = 2 to 5. (B) Cross-validation (CV) error rates of ADMIXTURE results for 5 different iterations and K prior values. Minimum CV values for each K are in red. CV values start increasing at K = 6. (C) Local genetic sub-structure in the AFB population sample, estimated by principal component analysis (PCA). This analysis was performed on 341,593 independent SNPs and 511 individuals from 7 western and central African populations (Patin et al., 2014). (D) Local genetic sub-structure in the EUB population, estimated by PCA. This analysis was performed on 182,572 independent SNPs and 220 individuals from 13 European populations (Behar et al., 2010). (C-D) PC1 and PC2 are shown, together with the proportion of variance explained.

Figure S6

Figure S6

eQTL and aseQTL Mapping, Related to Figures 2 and 3 (A) Number of genes associated with an eQTL across conditions. (B) Enrichment of (r)eQTLs in regulatory elements. CD14+ cell regulatory elements from Ensembl Regulatory Build were considered for the analysis. Significance was assessed relative to the genome-wide distribution of SNPs overlapping regulatory elements within 1Mb of the gene transcription start site. ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001. (C) General rationale of allele-specific eQTL mapping. We identified eQTLs leading to allelic imbalance (i.e., aseQTLs) by determining the ratio of reads from the two alleles of a heterozygous exonic SNP, according to their phase with the genotypes of the regulatory SNP that affects a transcription factor binding site. (D) Allele-specific expression in the context of cell stimulation. The aseQTL is detectable only after stimulation (TLR/IAV) by the presence of a transcription factor that is not expressed at the basal state (NS). (E) Power to detect aseQTLs as a function of eQTL effect size, number of observations and number of reads per exonic SNP. (F-G) Effect of regulatory variants on allele-specific expression, with (F) distribution of aseQTLs among local eQTLs and (G) distribution of asrQTLs among local reQTLs.

Figure S7

Figure S7

Introgression of Regulatory Variants from Neandertals, Related to Figure 6 (A) Models of incomplete lineage sorting and introgression from Neandertals. Incomplete lineage sorting (ILS) scenario (left panel). An ancient variant predating the split between humans and Neandertals was retained in both lineages, but lost from the African population. Haplotypes carrying this ancient allele in Europeans are expected to be short because the time window allowed multiple recombination events to occur. Scenario of introgression from Neandertals (right panel). An archaic allele from Neandertals has been introgressed in Europeans and haplotypes containing this allele are expected to be longer than those resulting from ILS, due to the more recent occurrence of the admixture event. (B) Frequency of the archaic haplotype of PNMA1, tagged by SNP rs12436322, in different world-wide populations. Pie size is proportional to the number of individuals. The non-archaic allele is reported in violet (1000 Genomes data) and in dark blue (Simons Genome Diversity Project Dataset), and the archaic allele is presented in orange. (C) SNP associations, [-log10(peQTL)], with PNMA1 expression in Africans and Europeans are shown (upper panel). Archaic haplotypes at the PNMA1 locus in Europeans (lower panel) are ordered by increasing levels of PNMA1 expression, and archaic SNPs are represented in red for each haplotype. Red lines indicate the core archaic haplotype, defined as the haplotype within the eQTL carrying the largest number of archaic alleles. The eQTL identified in Europeans spans a region of 102 kb surrounding the gene, and overlaps an eQTL present in Africans. (D) Dissection of the PNMA1 core haplotype. Haplotypes at the PNMA1 locus are represented by horizontal lines showing, for SNPs with a MAF ≥ 5% in either population, the ancestral allele in white and the derived allele in color. The black horizontal line separates European haplotypes (top) from African haplotypes (bottom). Within each population, SNPs associated with PNMA1 expression are highlighted in blue for those specific to the Neandertal lineage (i.e. archaic SNPs) and in red for the others. In total, 12 archaic alleles at the locus tag the archaic core haplotype associated with an upregulation of PNMA1 expression, either in the basal condition or in response to R848 and IAV. This haplotype is longer than expected under the ILS scenario, suggesting that introgression occurred in Europeans through admixture with Neandertals. This archaic haplotype, tagged by the aSNP rs12436322, has re-introduced the ancestral allele of rs6574138 in Europeans (i.e. all individuals not carrying the archaic haplotypes have the derived allele), which is also present in Africans and associated with PNMA1 expression. SNP rs6574138 overlaps with ENCODE binding sites, consistent with its putative functional role in the regulation of PNMA1 expression in both Europeans and Africans. (E) Inferred history of the ancestral and derived alleles at rs6574138 in modern human populations, up to their most recent common ancestor. Our data suggest that rs6574138 predates the split between Neandertals and modern humans and that the derived allele of this variant was fixed in early Europeans. The ancestral allele of rs6574138 was then reintroduced in Europeans by introgression of the Neandertal haplotype tagged by rs12436322, and is responsible for the variability in PNMA1 expression in the contemporary European population.

References

    1. Abi-Rached L., Jobin M.J., Kulkarni S., McWhinnie A., Dalva K., Gragert L., Babrzadeh F., Gharizadeh B., Luo M., Plummer F.A. The shaping of modern human immune systems by multiregional admixture with archaic humans. Science. 2011;334:89–94. - PMC - PubMed
    1. Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. - PMC - PubMed
    1. Altshuler D.M., Gibbs R.A., Peltonen L., Altshuler D.M., Gibbs R.A., Peltonen L., Dermitzakis E., Schaffner S.F., Yu F., Peltonen L., International HapMap 3 Consortium Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. - PMC - PubMed
    1. Areschoug T., Gordon S. Scavenger receptors: role in innate immunity and microbial pathogenesis. Cell. Microbiol. 2009;11:1160–1169. - PubMed
    1. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., The Gene Ontology Consortium Gene ontology: tool for the unification of biology. Nat. Genet. 2000;25:25–29. - PMC - PubMed

Publication types

MeSH terms

Substances

LinkOut - more resources