Patterns of positive selection in six Mammalian genomes

Comparative Study

Carolin Kosiol et al. PLoS Genet. 2008.

Abstract

Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. However, previous scans have been limited in power and accuracy owing to small numbers of available genomes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of approximately 16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR<0.05), according to a standard likelihood ratio test. An additional 144 genes showed evidence of positive selection on particular lineages or clades. As in previous studies, the identified PSGs were enriched for roles in defense/immunity, chemosensory perception, and reproduction, but enrichments were also evident for more specific functions, such as complement-mediated immunity and taste perception. Several pathways were strongly enriched for PSGs, suggesting possible co-evolution of interacting genes. A novel Bayesian analysis of the possible "selection histories" of each gene indicated that most PSGs have switched multiple times between positive selection and nonselection, suggesting that positive selection is often episodic. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes. 
This study provides additional evidence for widespread positive selection in mammalian evolution and new genome-wide insights into the functional implications of positive selection.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. The LRTs used to detect positive selection in the six mammalian genomes.

(A–I) Panel A shows the test for selection on any branch of the phylogeny, and panels B–I show the lineage- and clade-specific tests, with branches under positive selection highlighted. The numbers below each subfigure represent the number of positively selected genes identified by each LRT (FDR<0.05) and the total number of ortholog sets tested. In (A), branch lengths are drawn proportional to their estimates in substitutions per site, and each branch is labeled with the corresponding estimate of ω. All tests are based on an unrooted phylogeny; the trees are rooted for display purposes only. Nominal _P_-value thresholds for FDR<0.05 were: (A) 1.1×10⁻³, (B) 9.1×10⁻⁵, (C) 7.7×10⁻⁵, (D) 2.9×10⁻⁴, (E) 2.8×10⁻⁴, (F) 2.5×10⁻⁵, (G) 5.4×10⁻⁵, (H) 1.8×10⁻⁵, (I) 5.9×10⁻⁵.
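Each nominal _P_-value threshold in this caption is the cutoff that an FDR procedure produces for one test. The caption does not name the procedure, so the following is only a minimal sketch that assumes Benjamini-Hochberg step-up control. The helper name `bh_threshold` and the toy _P_-values are hypothetical.

```python
def bh_threshold(p_values, fdr=0.05):
    """Return the largest P-value rejected by Benjamini-Hochberg, or None.

    Assumption: Benjamini-Hochberg step-up control. The caption does not
    state which FDR procedure was actually used.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Keep the largest p(k) that satisfies p(k) <= (k / m) * fdr.
        if p <= rank / m * fdr:
            threshold = p
    return threshold


# Toy P-values, invented for illustration only.
toy = [0.0001, 0.004, 0.019, 0.03, 0.2, 0.5, 0.7, 0.9]
print(bh_threshold(toy))
```

Every test with a _P_-value at or below the returned cutoff would be called significant. The cutoff depends on the number of tests and on the distribution of their _P_-values, which may be why the thresholds differ across panels.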

Figure 2. Hierarchical clustering of 27 over-represented GO categories identified by the Mann-Whitney U test (“biological process” group only), based on the genes assigned to each category.

This dendrogram is derived from a dissimilarity matrix defined such that any two GO categories, X and Y, have dissimilarity 0 when all genes assigned to X are also assigned to Y (or vice-versa), and dissimilarity 1 when the sets of genes assigned to X and Y do not overlap. Specifically, X and Y have dissimilarity $1 - |S_X \cap S_Y| / \min(|S_X|, |S_Y|)$, where $S_C$ denotes the (nonempty) set of genes assigned to GO category C. Thus, GO categories associated with similar sets of genes group together in the dendrogram, even if these categories are not closely related in the GO hierarchy (such as “cytolysis” and “single fertilization”). Full names of abbreviated categories (*) are “humoral immune response mediated by circulating immunoglobulin,” “activation of plasma proteins during acute inflammatory response,” and “adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains.” (Dendrogram produced by the hclust function in R with method = “average”.)
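The two boundary conditions in this caption (dissimilarity 0 for nested gene sets, 1 for disjoint ones) are met by an overlap-coefficient dissimilarity. The sketch below assumes that form. The function name `go_dissimilarity`, the category labels, and the gene identifiers are hypothetical.

```python
def go_dissimilarity(genes_x, genes_y):
    """Overlap-based dissimilarity between two nonempty gene sets.

    Returns 0.0 when one set contains the other and 1.0 when they are disjoint.
    """
    sx, sy = set(genes_x), set(genes_y)
    if not sx or not sy:
        raise ValueError("gene sets must be nonempty")
    return 1.0 - len(sx & sy) / min(len(sx), len(sy))


# Hypothetical category-to-gene assignments, for illustration only.
categories = {
    "cat_A": {"g1", "g2", "g3", "g4"},
    "cat_B": {"g1", "g2"},        # nested inside cat_A
    "cat_C": {"g5", "g6"},        # disjoint from cat_A
    "cat_D": {"g2", "g5", "g7"},  # partial overlap
}

names = sorted(categories)
# Full symmetric dissimilarity matrix as nested lists.
matrix = [[go_dissimilarity(categories[a], categories[b]) for b in names]
          for a in names]
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

A matrix built this way could then be passed to any average-linkage clustering routine, such as the R `hclust` call named in the caption.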

Figure 3. Structural analysis of the HAVCR1 gene.

At top is a graph showing the domain structure of the gene and corresponding Bayes Empirical Bayes posterior probabilities (PP) of positive selection, based on our six-species alignments, with sites predicted to be under positive selection (PP>0.95) in red. At bottom right is a structural diagram (based on the structure of the IgV domain of the mouse gene) showing the interaction between two receptors that have been implicated in the regulation of HAVCR1's immune function. It is thought that clustering of receptors within the same cell surface might facilitate phosphorylation of the cytoplasmic tail, and that interaction between receptors from different cells might be a mechanism for B–T cell adhesion. Predicted residue 39 falls within the region of these receptors, very near residue 37, which directly interacts with the opposite receptor (according to the available mouse structure). In addition, predicted residues 54 and 56 are adjacent to the virus-binding surface (shown in pink), as defined by a polymorphism in macaque. Interestingly, the residue that falls between them (55) appears to be critical for virus-binding at the homologous loop in the CEA coronavirus receptor. Residue 75 in the IgV domain also shows evidence of positive selection (PP>0.90, shown in orange) but its function is unknown.

Figure 4. Patterns of positive selection on the mammalian phylogeny.

(A) Probabilities that each gene gains (green) or loses (red) positive selection on each branch, under the Bayesian switching model. Switching events are allowed to occur early (near ancestor) or late (near descendant) on internal branches, and early on external branches. The prior probability of selection at the root of the tree is shown in parentheses. (The primate-rodent ancestor is treated as the root for this analysis; see Text S1.) The values shown are posterior means. The full posterior distributions are summarized in Figure S2. (B) Expected numbers of genes under positive selection on each branch (blue) and under positive selection only on each branch (red), out of the 544 PSGs examined, with 95% credible intervals in parentheses. Branch thicknesses are proportional to numbers in blue. Similar estimates are also shown for genes under positive selection on all branches of the primate and rodent clades (blue), on only the branches of these clades (red), and on all branches of the tree (blue). All estimates are based on 10,000 iterations of the Gibbs sampler, excluding a 100-iteration burn-in period. On each iteration, all switching parameters and the selection histories for all genes were sampled (see Text S1).

Figure 5. Distributions of expression levels in PSGs (red) and non-PSGs (blue) for three tissue types.

(A–C) Distributions as estimated from Affymetrix Human Exon 1.0 ST Array data by the RMA algorithm. The other eight tissue types showed similar differences between PSGs and non-PSGs (Figure S7). (D) Distribution of degree of tissue bias in expression levels for PSGs (red) and non-PSGs (blue), as measured by the statistic τ (Methods). An alternative measure of tissue bias (γ) showed a similar pattern.
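The caption defers the definition of τ to the Methods, so the exact statistic is not given here. As an illustration only, the sketch below assumes a widely used tissue-specificity index of the form τ = Σ(1 − xᵢ/x_max)/(n − 1), where xᵢ is a gene's expression level in tissue i and n is the number of tissues. The function name `tau` and the input values are hypothetical.

```python
def tau(expression):
    """Tissue-specificity index for one gene.

    Assumption: tau = sum(1 - x_i / x_max) / (n - 1); the caption itself
    does not define the statistic. Returns 0.0 for uniform expression and
    1.0 for expression confined to a single tissue.
    """
    n = len(expression)
    x_max = max(expression)
    if n < 2 or x_max <= 0:
        raise ValueError("need at least two tissues and a positive maximum")
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Invented expression levels across four tissues, for illustration only.
broadly_expressed = [8.0, 7.5, 8.2, 7.9]
tissue_specific = [0.5, 0.4, 9.0, 0.6]
print(round(tau(broadly_expressed), 3), round(tau(tissue_specific), 3))
```

Under this assumed definition, a gene with a higher τ is expressed in a more tissue-biased manner, which is the direction of the PSG versus non-PSG difference described in panel D.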

Figure 6. Power of the LRT for selection on any branch of the phylogeny as a function of the nonsynonymous-synonymous rate ratio ω.

Power is defined as the fraction of tests resulting in nominal _P_<0.05. (The effect of controlling for multiple comparisons is shown in Figure S3.) When _ω_≤1, these fractions are estimates of the false positive rate. Each data point is based on 1000 data sets simulated with evolver under the assumption of a constant ω among lineages and among sites (model M0). All other parameters (including the transition-transversion ratio κ, the codon frequencies, and the branch lengths) were fixed at values estimated from the real data. Results are shown for short (200-codon) and long (500-codon) genes and three sets of species: hominids (human and chimpanzee), primates (human, chimpanzee, and macaque), and all six mammals. Details on the computation of _P_-values are given in Text S1. Note the logarithmic scale on the _x_-axis.
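The power estimate described in this caption is the share of simulated data sets whose LRT reaches nominal significance. In the minimal sketch below, the _P_-value is computed against a chi-square distribution with one degree of freedom. That null distribution is an assumption made for illustration, since the actual computation is described in Text S1. The function names and log-likelihood values are hypothetical.

```python
import math


def lrt_p_value(lnl_null, lnl_alt):
    """Nominal P-value of a likelihood ratio test.

    Assumption: the statistic 2 * (lnL_alt - lnL_null) is compared with a
    chi-square distribution with 1 degree of freedom, whose survival
    function is erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def power(p_values, alpha=0.05):
    """Fraction of tests with nominal P below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Invented (null, alternative) log-likelihood pairs standing in for
# fits to simulated data sets.
fits = [(-1000.0, -996.0), (-1200.0, -1199.8), (-900.0, -897.5), (-1100.0, -1100.0)]
p_values = [lrt_p_value(null, alt) for null, alt in fits]
print(power(p_values))
```

When the data are simulated with _ω_≤1, the same fraction estimates the false positive rate instead of the power, as the caption notes.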
