Comparative Study
Patterns of positive selection in six Mammalian genomes
Carolin Kosiol et al. PLoS Genet. 2008.
Abstract
Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. However, previous scans have been limited in power and accuracy owing to small numbers of available genomes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of approximately 16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR<0.05), according to a standard likelihood ratio test. An additional 144 genes showed evidence of positive selection on particular lineages or clades. As in previous studies, the identified PSGs were enriched for roles in defense/immunity, chemosensory perception, and reproduction, but enrichments were also evident for more specific functions, such as complement-mediated immunity and taste perception. Several pathways were strongly enriched for PSGs, suggesting possible co-evolution of interacting genes. A novel Bayesian analysis of the possible "selection histories" of each gene indicated that most PSGs have switched multiple times between positive selection and nonselection, suggesting that positive selection is often episodic. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes. 
This study provides additional evidence for widespread positive selection in mammalian evolution and new genome-wide insights into the functional implications of positive selection.
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
Figure 1. The LRTs used to detect positive selection in the six mammalian genomes.
(A–I) Panel A shows the test for selection on any branch of the phylogeny, and panels B–I show the lineage- and clade-specific tests, with branches under positive selection highlighted. The numbers below each subfigure represent the number of positively selected genes identified by each LRT (FDR<0.05) and the total number of ortholog sets tested. In (A), branch lengths are drawn proportional to their estimates in substitutions per site, and each branch is labeled with the corresponding estimate of ω. All tests are based on an unrooted phylogeny; the trees are rooted for display purposes only. Nominal P-value thresholds for FDR<0.05 were: (A) 1.1×10⁻³, (B) 9.1×10⁻⁵, (C) 7.7×10⁻⁵, (D) 2.9×10⁻⁴, (E) 2.8×10⁻⁴, (F) 2.5×10⁻⁵, (G) 5.4×10⁻⁵, (H) 1.8×10⁻⁵, (I) 5.9×10⁻⁵.
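The caption reports a different nominal P-value threshold for each test at the same FDR level. The sketch below shows how such a threshold can arise under the Benjamini-Hochberg step-up procedure. That procedure is an assumption here, since this section does not name the FDR method used, and the function name `bh_threshold` and the example P-values are hypothetical.

```python
def bh_threshold(p_values, fdr=0.05):
    """Largest P-value declared significant by Benjamini-Hochberg.

    Returns None when no test passes. Assumes BH is the FDR procedure;
    the source text does not specify which one was used.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest p with p <= fdr * rank / m.
        if p <= fdr * rank / m:
            threshold = p
    return threshold


# Hypothetical P-values for illustration only (not taken from the paper).
example = [0.0002, 0.004, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9]
cutoff = bh_threshold(example)
significant = [p for p in example if cutoff is not None and p <= cutoff]
```

Because the cutoff depends on both the number of tests and the distribution of P-values, each LRT in the figure can end up with its own nominal threshold even though the target FDR is identical.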
Figure 2. Hierarchical clustering of 27 over-represented GO categories identified by the Mann-Whitney U test (“biological process” group only), based on the genes assigned to each category.
This dendrogram is derived from a dissimilarity matrix defined such that any two GO categories, X and Y, have dissimilarity 0 when all genes assigned to X are also assigned to Y (or vice-versa), and dissimilarity 1 when the sets of genes assigned to X and Y do not overlap. Specifically, X and Y have dissimilarity $1 - \frac{|G_X \cap G_Y|}{\min(|G_X|, |G_Y|)}$, where $G_C$ denotes the (nonempty) set of genes assigned to GO category C. Thus, GO categories associated with similar sets of genes group together in the dendrogram, even if these categories are not closely related in the GO hierarchy (such as “cytolysis” and “single fertilization”). Full names of abbreviated categories (*) are “humoral immune response mediated by circulating immunoglobulin,” “activation of plasma proteins during acute inflammatory response,” and “adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains.” (Dendrogram produced by the hclust function in R with method = “average”.)
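The dissimilarity behind this dendrogram can be computed directly from gene sets. The following is a minimal sketch, assuming the overlap-based form (one minus the shared fraction of the smaller set), which matches the two boundary cases stated in the caption. The category names and gene identifiers are hypothetical, and the sketch only builds the matrix; the clustering itself was done with hclust in R.

```python
def go_dissimilarity(genes_x, genes_y):
    """Overlap-based dissimilarity between two nonempty gene sets.

    0.0 when one set contains the other, 1.0 when they are disjoint.
    Assumed form, chosen to agree with the caption's boundary cases.
    """
    gx, gy = set(genes_x), set(genes_y)
    if not gx or not gy:
        raise ValueError("each GO category must have at least one gene")
    return 1.0 - len(gx & gy) / min(len(gx), len(gy))


# Hypothetical categories and genes, for illustration only.
categories = {
    "cat_A": {"g1", "g2", "g3"},
    "cat_B": {"g1", "g2"},
    "cat_C": {"g4", "g5"},
}
names = sorted(categories)
# Symmetric matrix that an average-linkage clustering routine could consume.
matrix = [
    [go_dissimilarity(categories[a], categories[b]) for b in names]
    for a in names
]
```

Dividing by the smaller set size is what lets a small category nested inside a large one sit at distance 0, so categories sharing genes cluster together regardless of their position in the GO hierarchy.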
Figure 3. Structural analysis of the HAVCR1 gene.
At top is a graph showing the domain structure of the gene and corresponding Bayes Empirical Bayes posterior probabilities (PP) of positive selection, based on our six-species alignments, with sites predicted to be under positive selection (PP>0.95) in red. At bottom right is a structural diagram (based on the structure of the IgV domain of the mouse gene) showing the interaction between two receptors that have been implicated in the regulation of HAVCR1's immune function. It is thought that clustering of receptors within the same cell surface might facilitate phosphorylation of the cytoplasmic tail, and that interaction between receptors from different cells might be a mechanism for B–T cell adhesion. Predicted residue 39 falls within the region of these receptors, very near residue 37, which directly interacts with the opposite receptor (according to the available mouse structure). In addition, predicted residues 54 and 56 are adjacent to the virus-binding surface (shown in pink), as defined by a polymorphism in macaque. Interestingly, the residue that falls between them (55) appears to be critical for virus binding at the homologous loop in the CEA coronavirus receptor. Residue 75 in the IgV domain also shows evidence of positive selection (PP>0.90, shown in orange), but its function is unknown.
Figure 4. Patterns of positive selection on the mammalian phylogeny.
(A) Probabilities that each gene gains (green) or loses (red) positive selection on each branch, under the Bayesian switching model. Switching events are allowed to occur early (near ancestor) or late (near descendant) on internal branches, and early on external branches. The prior probability of selection at the root of the tree is shown in parentheses. (The primate-rodent ancestor is treated as the root for this analysis; see Text S1.) The values shown are posterior means. The full posterior distributions are summarized in Figure S2. (B) Expected numbers of genes under positive selection on each branch (blue) and under positive selection only on each branch (red), out of the 544 PSGs examined, with 95% credible intervals in parentheses. Branch thicknesses are proportional to numbers in blue. Similar estimates are also shown for genes under positive selection on all branches of the primate and rodent clades (blue), on only the branches of these clades (red), and on all branches of the tree (blue). All estimates are based on 10,000 iterations of the Gibbs sampler, excluding a 100 iteration burn-in period. On each iteration, all switching parameters and the selection histories for all genes were sampled (see Text S1).
Figure 5. Distributions of expression levels in PSGs (red) and non-PSGs (blue) for three tissue types.
(A–C) Distributions as estimated from Affymetrix Human Exon 1.0 ST Array data by the RMA algorithm. The other eight tissue types showed similar differences between PSGs and non-PSGs (Figure S7). (D) Distribution of degree of tissue bias in expression levels for PSGs (red) and non-PSGs (blue), as measured by the statistic τ (Methods). An alternative measure of tissue bias (γ) showed a similar pattern.
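The caption defers the definition of τ to the Methods, which are not part of this section. As an illustration only, the sketch below uses a widely used tissue-specificity index of the same name (commonly attributed to Yanai and colleagues); whether the paper uses exactly this form is an assumption. The function name `tau` and the input values are hypothetical, and expression levels are assumed to be non-negative.

```python
def tau(expression):
    """Tissue-specificity index in [0, 1] for one gene.

    0.0 means equal expression in all tissues; 1.0 means expression in
    a single tissue. Assumed definition, not confirmed by the caption.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene must be expressed in at least one tissue")
    # Average shortfall of each tissue relative to the peak tissue.
    return sum(1.0 - x / peak for x in expression) / (n - 1)


# Hypothetical expression profiles across tissues.
broad = tau([5.0, 5.0, 5.0, 5.0])      # uniformly expressed gene
specific = tau([0.0, 0.0, 0.0, 8.0])   # gene expressed in one tissue
```

Under this definition, the shift of PSGs toward higher τ in panel D would correspond to expression concentrated in fewer tissues.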
Figure 6. Power of the LRT for selection on any branch of the phylogeny as a function of the nonsynonymous-synonymous rate ratio ω.
Power is defined as the fraction of tests resulting in nominal P<0.05. (The effect of controlling for multiple comparisons is shown in Figure S3.) When ω≤1, these fractions are estimates of the false positive rate. Each data point is based on 1000 data sets simulated with evolver under the assumption of a constant ω among lineages and among sites (model M0). All other parameters (including the transition-transversion ratio κ, the codon frequencies, and the branch lengths) were fixed at values estimated from the real data. Results are shown for short (200-codon) and long (500-codon) genes and three sets of species: hominids (human and chimpanzee), primates (human, chimpanzee, and macaque), and all six mammals. Details on the computation of P-values are given in Text S1. Note the logarithmic scale on the x-axis.
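The power estimate described in this caption is a simple proportion over simulated data sets. The sketch below illustrates the calculation, assuming for simplicity that the LRT statistic is compared against a chi-square distribution with one degree of freedom. The paper's actual P-value computation is described in Text S1 and may differ; the function names and log-likelihood values are hypothetical.

```python
import math


def lrt_p_value(lnl_null, lnl_alt):
    """P-value for a likelihood ratio test, assuming a chi-square null
    with 1 degree of freedom (an assumption, not the paper's method)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Chi-square (1 df) survival function: P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def power(p_values, alpha=0.05):
    """Fraction of tests with nominal P below alpha."""
    if not p_values:
        raise ValueError("need at least one P-value")
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical (null, alternative) log-likelihood pairs from simulated genes.
fits = [(-1000.0, -998.0), (-1000.0, -999.9), (-1000.0, -995.0), (-1000.0, -1000.0)]
p_values = [lrt_p_value(null, alt) for null, alt in fits]
estimated_power = power(p_values)
```

When the simulations are generated with ω≤1, the same proportion estimates the false positive rate rather than power, as the caption notes.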