Factors that Contribute to Variation in Evolutionary Rate among Arabidopsis Genes

Journal Article

Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine

Liang Yang, Brandon S. Gaut, Factors that Contribute to Variation in Evolutionary Rate among Arabidopsis Genes, Molecular Biology and Evolution, Volume 28, Issue 8, August 2011, Pages 2359–2369, https://doi.org/10.1093/molbev/msr058

Abstract

Surprisingly, few studies have described evolutionary rate variation among plant nuclear genes, with little investigation of the causes of rate variation. Here, we describe evolutionary rates for 11,492 ortholog pairs between Arabidopsis thaliana and A. lyrata and investigate possible contributors to rate variation among these genes. Rates of evolution at synonymous sites vary along chromosomes, suggesting that mutation rates vary on genomic scales, perhaps as a function of recombination rate. Rates of evolution at nonsynonymous sites correlate most strongly with expression patterns, but they also vary according to whether a gene was duplicated and retained after a whole-genome duplication (WGD) event. WGD genes evolve more slowly, on average, than nonduplicated genes and non-WGD duplicates. We hypothesize that levels and patterns of expression are not only the major determinants of nonsynonymous rate variation among genes but also critical determinants of gene retention after duplication.

Introduction

Evolutionary rates vary among genes, but the underlying causes of rate variation remain obscure. To identify potential causes, researchers have investigated correlations between evolutionary rates and genomic parameters. For example, rates of both nonsynonymous and synonymous evolution correlate with gene characteristics such as length (Marais and Duret 2001) and intron number (Larracuente et al. 2008), suggesting that rates are partly a property of gene organization. Rates also vary over chromosomal scales, suggesting that forces such as recombination (Pál et al. 2001a) and mutation contribute to rate variation among genes.

Historically, however, rate variation has been thought to reflect primarily variation in functional importance (Zuckerkandl 1976). Many studies have tested this idea by assessing the correlation between evolutionary rates and measures of protein function, such as “protein dispensability” (or overall importance) (Hirsh and Fraser 2001; Yang et al. 2003) and “protein stability” (Zeldovich et al. 2007; Lobkovsky et al. 2010). Although evolutionary rates do often correlate with these variables, rates seem to correlate most strongly with measures related to gene expression (Pál et al. 2006), such as codon bias (Sharp and Li 1987; Urrutia and Hurst 2001), expression level (Pál et al. 2001b; Subramanian and Kumar 2004; Drummond et al. 2006), and expression breadth (Duret and Mouchiroud 2000; Zhang and Li 2004). The reason for the correlation with gene expression is not entirely clear, but highly expressed genes may be under strong selective constraint for translational robustness (Drummond et al. 2005; Drummond and Wilke 2008), and broadly expressed genes may be constrained by the need to function in several biochemical environments (Duret and Mouchiroud 2000).

Recently, researchers have discovered another factor that correlates with evolutionary rate: duplication status. This correlation was discovered through ortholog comparisons that classify ortholog pairs into those that have paralogs (i.e., duplicates) and those that do not (singletons). This approach has revealed that slower evolution of duplicated genes is a common feature of eukaryotes (Yang et al. 2003; Davis and Petrov 2004; Jordan et al. 2004). Explanations for this observation include the ideas that conserved, functionally important genes are more likely to be retained as duplicates (Davis and Petrov 2004) and that retained duplicates are highly structurally constrained (Yang et al. 2003). In contrast to duplicates, singleton genes are more poorly annotated and may have less critical functions (Jordan et al. 2004).

Gene duplication is particularly common in plants due to the prevalence of whole-genome duplication (WGD) by polyploidy. WGD has occurred throughout the evolutionary history of flowering plants (Soltis PS and Soltis DE 2009), including a putative event at the base of the angiosperms (Jaillon et al. 2007; Edger and Pires 2009). In addition, many species, like Arabidopsis and maize, have experienced multiple WGD events over their evolutionary history (Vision et al. 2000; Gaut 2001). However, WGD is not the only mode of gene duplication. Genes may also be duplicated by segmental events that encompass large chromosomal regions, by dispersed duplication of single genes (Akhunov et al. 2003), and by tandem duplication. These alternate sources of gene duplication can be consequential; for example, in Arabidopsis thaliana tandem duplicates comprise almost as many genes (up to 18%) as those duplicated by WGD events (∼25%) (Lockton and Gaut 2005).

WGD and non-WGD genes differ in function. For example, genes retained as duplicates after WGD events are overrepresented for transcription factors and signal transducers (Blanc and Wolfe 2004; Maere et al. 2005). In contrast, tandemly duplicated genes, which represent one type of non-WGD duplicate, are biased toward membrane proteins and genes involved in stress response (Rizzon et al. 2006). Given these functional differences, it seems plausible that evolutionary rates vary not only between singletons and duplicates but also between WGD and non-WGD duplicates.

To date, there have been no genome-wide studies of evolutionary rates among plant nuclear genes. As a result, there has been no accurate description of rate variation among genes, little investigation as to whether rates vary along chromosomes, and few attempts to correlate evolutionary rate with duplication status or other important evolutionary characteristics. The dearth of rate studies stems from a lack of sequenced closely related genomes that permit accurate ortholog identification (Gaut and Ross-Ibarra 2008). The A. lyrata genome sequence (Hu et al. 2011) removes this obstacle with respect to identifying orthologs in A. thaliana. A. lyrata and A. thaliana diverged ∼13 million years ago (Ma) (Beilstein et al. 2010) and have ∼80% sequence identity over whole-genome alignments (Hu et al. 2011). Moreover, the two genomes have shared WGD events, including one or two events near the base of the angiosperms (Simillion et al. 2002; Bowers et al. 2003) and a third event that occurred ∼40 Ma, near the base of the Brassicaceae (Schranz and Mitchell-Olds 2006). As a result, the duplication status of individual genes should be well conserved between species.

In this study, we seek to characterize genome-wide patterns of rate variation among plant nuclear genes in the hope of inferring some of the evolutionary forces that shape this variation. We begin by estimating the number of synonymous substitutions per synonymous site (Ks), the number of nonsynonymous substitutions per nonsynonymous site (Ka), and their ratio Ka/Ks (= ω) between A. lyrata and A. thaliana orthologs. We then use these rate estimates to address the following three questions. First, what is the distribution of Ks, Ka, and ω among genes, and is this distribution clustered along the physical length of chromosomes? Second, what are the major correlates of evolutionary rates? Finally, do rates vary as a function of duplication status?

Materials and Methods

Orthologs, Duplicates, and Singletons

The orthologs and alignments for this study are the same as those used in Yang et al. (2011). Ks, Ka and ω were estimated in PAML (Yang 1997), using default parameters.
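The Ks, Ka, and ω estimates themselves come from PAML, whose estimators are model-based and more elaborate than anything shown here. As a rough illustration of what the three quantities measure, the Python sketch below applies a Jukes–Cantor correction to the proportions of differing synonymous and nonsynonymous sites, as in the final step of simple counting methods such as Nei–Gojobori. It assumes that sites and differences have already been counted; the function names and counts are hypothetical, and this is not the procedure PAML implements.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        # The correction is undefined at p >= 3/4 (saturation).
        raise ValueError(f"proportion of differences {p} is invalid or saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks_omega(syn_sites, syn_diffs, nonsyn_sites, nonsyn_diffs):
    """Return (Ka, Ks, omega) from pre-counted sites and differences.

    Illustrative only: assumes sites and differences were already counted
    (e.g., Nei-Gojobori style); PAML uses different, model-based estimators.
    """
    ks = jc_distance(syn_diffs / syn_sites)
    ka = jc_distance(nonsyn_diffs / nonsyn_sites)
    omega = ka / ks if ks > 0 else float("nan")  # omega undefined when Ks = 0
    return ka, ks, omega


# Hypothetical counts for one ortholog pair.
ka, ks, omega = ka_ks_omega(100.0, 10.0, 300.0, 6.0)
print(f"Ka = {ka:.4f}, Ks = {ks:.4f}, omega = {omega:.3f}")
```

The correction becomes undefined as the proportion of differing sites approaches 3/4, so the sketch raises an error there rather than returning a misleading value.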

To identify duplicated and singleton genes within species, we performed an all-against-all BlastP (Altschul et al. 1997) search with default parameters on all annotated protein sequences within A. thaliana and A. lyrata. Singleton genes were defined as those genes with no hit with an e-value ≤ 0.1 (Gu 2003). Duplicated genes were identified according to previous methods (Gu et al. 2002). Briefly, we first selected Blast alignments with e-values ≤ 10⁻¹⁰. Then two proteins were denoted as forming a link if 1) the alignable region length (L) was over 80% of the longer protein and 2) the identity (I) between them was ≥ 30% if the alignable region was longer than 150 amino acids, or $I \ge 0.06 + 4.8\,L^{-0.32\,[1 + \exp(-L/1{,}000)]}$ (Rost 1999) otherwise. Next, we submitted each protein as a query to search against the A. thaliana repetitive elements in Repbase (Jurka et al. 2005). Proteins were removed from further consideration if they formed a link due to their homology with the same repetitive element. Finally, a single-linkage algorithm was used to group proteins into families. Genes that were not positively identified as either a “singleton” or a “duplicate” according to the above definitions were not assigned a duplication status.
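The linking rule and family construction can be sketched in Python as follows. This is a minimal illustration, assuming BLAST hits have already been parsed into alignment length, fractional identity, both protein lengths, and e-value; the function names and hits are hypothetical, and the Repbase filtering step is omitted. Single linkage is implemented here with a union-find structure.

```python
import math


def identity_threshold(aln_len: int) -> float:
    """Minimum fractional identity I required to link two proteins.

    30% for alignments longer than 150 residues; otherwise the
    length-dependent curve I = 0.06 + 4.8 * L ** (-0.32 * (1 + exp(-L / 1000))).
    """
    if aln_len > 150:
        return 0.30
    exponent = -0.32 * (1.0 + math.exp(-aln_len / 1000.0))
    return 0.06 + 4.8 * aln_len ** exponent


def is_link(aln_len, identity, len_a, len_b, evalue):
    """Apply the e-value, coverage, and identity criteria to one BLAST hit."""
    if evalue > 1e-10:                       # keep only hits with e-value <= 1e-10
        return False
    if aln_len <= 0.8 * max(len_a, len_b):   # must cover > 80% of the longer protein
        return False
    return identity >= identity_threshold(aln_len)


def single_linkage(genes, links):
    """Group genes into families by single linkage, using union-find."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]    # path halving keeps trees shallow
            g = parent[g]
        return g

    for a, b in links:
        parent[find(a)] = find(b)
    families = {}
    for g in genes:
        families.setdefault(find(g), []).append(g)
    return sorted(sorted(fam) for fam in families.values())


# Hypothetical hits: (gene_a, gene_b, aln_len, identity, len_a, len_b, evalue)
hits = [("g1", "g2", 400, 0.45, 420, 450, 1e-80),
        ("g2", "g3", 120, 0.40, 140, 145, 1e-30),
        ("g3", "g4", 300, 0.35, 420, 480, 1e-40)]  # fails the coverage criterion
links = [(a, b) for a, b, *stats in hits if is_link(*stats)]
print(single_linkage(["g1", "g2", "g3", "g4", "g5"], links))
```

At L = 150 the length-dependent curve gives roughly 0.30, so the two identity criteria join almost continuously.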

We classified duplicated genes according to whether they were derived from WGD events, using the assignments of Blanc et al. (2003). The Blanc et al. (2003) data set contained 1,372 and 2,584 gene pairs representing an early and a more recent WGD event, respectively. Because some genes were found in both age classes, we restricted our data set to genes with only one age annotation. We also repeated all analyses using the WGD duplication definitions of Bowers et al. (2003). All results were qualitatively identical to those obtained with the Blanc et al. (2003) definitions, and so we report only the Blanc et al. (2003) results here.

Principal Component Regression Analysis

We used principal component regression (PCR) analysis to explore the potential contribution of evolutionary parameters to the total variance in evolutionary rate among genes (Jolliffe 2002; Drummond et al. 2006). PCR was performed with the R (Ihaka and Gentleman 1996) package “pls”, with Ka and Ks as response variables and 14 possible determinants of protein evolutionary rates as predictor variables (see below). We log-transformed a predictor variable if log transformation led to a higher correlation coefficient, added a constant (0.001) before log transformation if the variable included zero values, and scaled the predictors by dividing each variable by its sample standard deviation before the PCR.
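The preprocessing described above can be sketched in Python with NumPy. The text does not specify which correlation coefficient guided the log-transformation decision; because a rank correlation would be unaffected by a log transformation, this sketch assumes the absolute Pearson correlation between each predictor and the response, and it only considers log-transforming non-negative predictors. The function name and data are hypothetical; the actual analysis used the R package pls.

```python
import numpy as np


def preprocess_predictors(X, y, offset=0.001):
    """Optionally log-transform each predictor, then divide by its sample SD.

    A non-negative column is log-transformed (after adding `offset` if it
    contains zeros) only if that raises its absolute Pearson correlation
    with y. Constant columns are not handled (their SD is zero).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.empty_like(X)
    logged = []
    for j in range(X.shape[1]):
        x = X[:, j]
        candidate = x
        if np.all(x >= 0):
            shifted = x + offset if np.any(x == 0) else x
            log_x = np.log(shifted)
            r_raw = abs(np.corrcoef(x, y)[0, 1])
            r_log = abs(np.corrcoef(log_x, y)[0, 1])
            if r_log > r_raw:
                candidate = log_x
                logged.append(j)
        out[:, j] = candidate / candidate.std(ddof=1)  # scale by sample SD
    return out, logged


rng = np.random.default_rng(1)
z = rng.normal(size=500)
X = np.column_stack([np.exp(z), z])      # column 0 is log-normal; column 1 has negatives
y = z + rng.normal(scale=0.1, size=500)  # hypothetical response
X_scaled, logged = preprocess_predictors(X, y)
print("log-transformed columns:", logged)
```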

Potential Determinants of Protein Evolutionary Rates

We incorporated 14 gene characteristics into our PCR model based on availability and precedence in the literature. For each A. thaliana gene with an ortholog in A. lyrata, we calculated the following.

Gene Structure

We calculated statistics such as gene length, GC content, 5′ and 3′ UTR length, intron number, average intron length, and the frequency of optimal codons (Fop) (Ikemura 1985); Fop was estimated with CodonW (http://codonw.sourceforge.net/) using the preferred codons of A. thaliana from Wright et al. (2004).
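For readers unfamiliar with Fop, a simplified Python version of the calculation follows. It counts, among codons for amino acids that have at least one designated optimal codon (always excluding Met, Trp, and stop codons), the fraction that are optimal. The optimal-codon set in the example is a hypothetical placeholder, not the A. thaliana list of Wright et al. (2004), and CodonW's handling of edge cases may differ.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def fop(cds: str, optimal_codons) -> float:
    """Simplified frequency of optimal codons for one coding sequence.

    Only codons for amino acids with at least one designated optimal codon
    are counted; Met, Trp, and stop codons are always excluded.
    """
    optimal = {c.upper() for c in optimal_codons}
    informative_aas = {CODON_TABLE[c] for c in optimal} - set("MW*")
    n_optimal = n_other = 0
    seq = cds.upper()
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE.get(codon)       # None for codons with ambiguous bases
        if aa not in informative_aas:
            continue
        if codon in optimal:
            n_optimal += 1
        else:
            n_other += 1
    total = n_optimal + n_other
    return n_optimal / total if total else float("nan")


# Hypothetical optimal-codon set (placeholder, not the A. thaliana list).
OPTIMAL = {"GCT", "AAG"}
print(fop("ATGGCTGCCAAGAAATTTTAA", OPTIMAL))
```

In the example, ATG, TTT, and TAA are skipped, and two of the four remaining codons (GCT, AAG) are optimal.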

Local Recombination Rate

We obtained A. thaliana genetic markers and genetic map positions from Singer et al. (2006). Given these markers, local recombination rates were estimated using MareyMap (Rezvoy et al. 2007) with LOESS interpolation. The LOESS procedure depends on two parameters: the polynomial degree and the span, which determines the number of points used to fit the local polynomial around a marker. Here, we used second-degree polynomial fitting and a span of 25% of the total number of points.
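The LOESS step can be illustrated with a local quadratic fit. The Python sketch below is not the MareyMap implementation: it assumes that the local recombination rate is the slope of genetic position (cM) against physical position (Mb) on the fitted Marey map, weights the nearest 25% of markers with the tricube kernel, and reads the slope off the linear coefficient of a polynomial centred on each query position. The function name and marker data are hypothetical.

```python
import numpy as np


def loess_recombination_rate(phys_mb, gen_cm, query_mb, span=0.25, degree=2):
    """Local recombination rate (cM/Mb) as the slope of a LOESS fit to a Marey map.

    For each query position, the ceil(span * n) nearest markers are weighted
    with the tricube kernel and a polynomial of the given degree is fitted by
    weighted least squares; the linear coefficient is the local slope.
    """
    x = np.asarray(phys_mb, dtype=float)
    y = np.asarray(gen_cm, dtype=float)
    n = x.size
    # The farthest neighbour gets weight 0, so keep at least degree + 2 points.
    k = min(n, max(degree + 2, int(np.ceil(span * n))))
    rates = []
    for x0 in np.atleast_1d(query_mb):
        dist = np.abs(x - x0)
        idx = np.argsort(dist)[:k]                 # k nearest markers
        h = max(dist[idx].max(), 1e-12)            # neighbourhood radius
        w = (1.0 - (dist[idx] / h) ** 3) ** 3      # tricube weights
        # Columns 1, d, d^2 with d = x - x0, so beta[1] is the slope at x0.
        design = np.vander(x[idx] - x0, degree + 1, increasing=True)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        rates.append(beta[1])
    return np.array(rates)


# Hypothetical Marey map: genetic position quadratic in physical position.
phys = np.linspace(0.0, 30.0, 200)
gen = 0.01 * phys ** 2
print(loess_recombination_rate(phys, gen, [10.0, 20.0]))  # slope of 0.01*x^2 is 0.02*x
```

Because the local fit is quadratic, quadratic test data are reproduced exactly; real marker data would also require handling of non-monotone genetic positions, which this sketch ignores.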

Levels and Patterns of Gene Expression

The expression data were obtained from the Arabidopsis Development Atlas (ADA, available at ftp://ftp.arabidopsis.org/, ExpressionSet ME00319), which contains triplicate expression estimates for ∼80% of known Arabidopsis genes across 79 different tissues and developmental time points, using the Affymetrix ATH1 chip (Schmid et al. 2005). In order to minimize the effects of cross-hybridization, we matched each Affymetrix probe to the genome annotation and excluded any probe that matched multiple genes. The mean value of each triplicate was calculated for each probe under each condition. We used ADA data to estimate gene expression level and tissue specificity. The expression level of a gene was estimated by the average value of all the 79 samples. The tissue specificity was measured with the index τ (Yanai et al. 2005):

$$\tau = \frac{\sum_{j=1}^{n}\left(1 - \frac{S(i,j)}{S(i,\max)}\right)}{n - 1}$$

where n = 79 is the number of tissues and conditions, S(i,j) is the expression of gene i in tissue or condition j, and S(i,max) is the highest expression of gene i across the n tissues and conditions. The index τ ranges from 0 to 1, with higher values indicating higher specificity (or, equivalently, higher variation in expression across libraries). If a gene is expressed in only one library, τ approaches 1. In contrast, if a gene is expressed equally in all libraries, τ = 0. The advantage of using τ rather than expression breadth as a measure of specificity has been documented previously (Liao and Zhang 2006).
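The τ index translates directly into code. A minimal Python version for a single gene, assuming a vector of non-negative expression values across the n samples (the function name is hypothetical), is:

```python
import numpy as np


def tissue_specificity_tau(expr):
    """Tissue-specificity index tau for one gene.

    expr: non-negative expression values across n tissues/conditions.
    Returns nan for unexpressed genes or fewer than two samples.
    """
    s = np.asarray(expr, dtype=float)
    n = s.size
    s_max = s.max() if n else 0.0
    if n < 2 or s_max <= 0:
        return float("nan")
    return float(np.sum(1.0 - s / s_max) / (n - 1))


print(tissue_specificity_tau([0.0, 0.0, 5.0, 0.0]))  # expressed in one library only
print(tissue_specificity_tau([3.0, 3.0, 3.0]))       # expressed equally everywhere
```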

We also examined Massively Parallel Signature Sequencing (MPSS) expression data for A. thaliana (http://mpss.udel.edu/at/) (Meyers et al. 2004), using similar methods. The MPSS data yielded qualitatively similar results, and we thus focused on the ADA data throughout the manuscript.

Function

We assessed the multifunctionality of a gene by counting the number of biological processes in which it is involved (Salathe et al. 2006), according to Gene Ontology (GO) Slim annotations, which assign proteins to broad functional categories and thus provide a high-level view of function (Prachumwat and Li 2006). GO annotations were obtained from The Arabidopsis Information Resource (http://www.arabidopsis.org).

Promoter Divergence

Yang et al. (2011) measured divergence between the upstream sequences of each orthologous gene pair by the shared-motif method (Castillo-Davis et al. 2004). For each orthologous gene pair, we obtained the divergence score dSM, defined as the fraction of both sequences that does not contain a region of significant local similarity (Castillo-Davis et al. 2004). A dSM value of 0 indicates complete sharing of motifs between sequences, whereas a dSM value of 1 indicates an absence of shared motifs. Yang et al. (2011) analyzed sequences encompassing 500 bp upstream of the translation start site, but results were qualitatively similar with longer upstream sequences (data not shown).
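Given the regions of significant local similarity, dSM reduces to interval arithmetic. The Python sketch below assumes these regions have already been identified in each sequence as half-open coordinate intervals (the alignment search of Castillo-Davis et al. 2004 is not reproduced) and reads "the fraction of both sequences" as the pooled fraction across the two sequences; the function names and intervals are hypothetical.

```python
def covered_length(intervals):
    """Total length covered by half-open [start, end) intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start   # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)        # overlapping or adjacent: extend
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def d_sm(len_a, len_b, hits_a, hits_b):
    """Pooled fraction of two sequences not covered by regions of local similarity."""
    covered = covered_length(hits_a) + covered_length(hits_b)
    return 1.0 - covered / (len_a + len_b)


# Hypothetical 500-bp upstream regions with similarity blocks in each sequence.
print(d_sm(500, 500, [(0, 100), (50, 150)], [(200, 300)]))
```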

Duplication Mode

For the purposes of the PCR, genes were given discrete values to reflect duplication class: “1” for early WGD duplicates, “2” for recent WGD duplicates, “3” for non-WGD duplicates, and “4” for singletons.

Chromosomal Position

To include information about chromosomal location, we scaled the distance of each gene from the centromere. On each chromosomal arm, values ranged from 0 to 1, with higher values indicating greater physical distance from the centromere.
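A minimal Python sketch of this scaling follows, assuming that each arm runs from the centromere to the corresponding chromosome end (the text does not say whether arm ends were set at the chromosome ends or at the outermost genes); the function name and coordinates are hypothetical.

```python
def scaled_arm_position(pos, centromere, chrom_start, chrom_end):
    """Distance from the centromere scaled to [0, 1] within the gene's arm.

    0 = at the centromere, 1 = at the end of the arm.
    """
    if pos < centromere:
        arm_length = centromere - chrom_start   # left arm
    else:
        arm_length = chrom_end - centromere     # right arm
    return abs(pos - centromere) / arm_length


# Hypothetical chromosome spanning 0-30 Mb with the centromere at 10 Mb.
print(scaled_arm_position(5.0, 10.0, 0.0, 30.0))
print(scaled_arm_position(20.0, 10.0, 0.0, 30.0))
```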

Results

The Distribution of Rate Variation among Orthologs

We began with 19,119 orthologous pairs (Yang et al. 2011) and then filtered the data. First, we retained only the orthologous pairs that were defined as duplicated in both species or as singletons in both species. Second, we retained only duplicated genes that had a single unambiguous assignment with regard to early or recent WGD events. Finally, we discarded 191 genes at the extreme tail of the Ks distribution (Ks > 0.3), which could reflect either misalignment or sequence saturation. Our final data set consisted of 11,492 orthologous pairs, including 9,995 duplicated genes and 1,497 singletons.

Ka, Ks, and ω were calculated for the remaining 11,492 orthologs; their frequency distributions are provided in figure 1. The Ks distribution had a mean of 0.147 (table 1) and ranged from complete sequence identity (Ks = 0.0) for two genes (At2g07669 and At2g07772) to Ks = 0.3. The coefficient of variation (CV) of Ks was 0.30. The Ka estimates had a lower mean, at 0.028, and ranged from Ka = 0.0 to a high value of 0.29 (table 1). On average, Ks and Ka differed ∼5-fold, as reflected in the average ω estimate of 0.203 (table 1). The ω distribution also lacked a prominent tail of genes with values >1.0 that could be indicative of positive selection; only 0.7% (90 of 11,492) of orthologous pairs yielded ω estimates >1.0 (see supplementary table S1, Supplementary Material online, for details). Overall, Ka and Ks values were significantly, though weakly, positively correlated across genes (Spearman’s rank correlation ρ = 0.21, P < 10⁻¹⁶), suggesting that common evolutionary mechanisms may affect both synonymous and nonsynonymous sites.

Table 1.

Evolutionary Rates for Duplicates and Singletons.

Statistic | Early WGDs | Recent WGDs | Non-WGDs | Singletons | Total
Ka mean (SD) | 0.021 (0.017) | 0.024 (0.018) | 0.031 (0.029) | 0.032 (0.027) | 0.028 (0.026)
Ka CV | 0.81 | 0.75 | 0.94 | 0.84 | 0.93
Ka range | 0–0.16 | 0–0.15 | 0–0.28 | 0–0.29 | 0–0.29
Ka range (90%) | 0.0035–0.050 | 0.0039–0.060 | 0.0048–0.090 | 0.0076–0.063 | 0.0044–0.075
Ks mean (SD) | 0.147 (0.041) | 0.145 (0.042) | 0.150 (0.045) | 0.144 (0.049) | 0.147 (0.044)
Ks CV | 0.28 | 0.29 | 0.30 | 0.34 | 0.30
Ks range | 0.039–0.29 | 0.021–0.30 | 0.015–0.30 | 0–0.29 | 0–0.30
Ks range (90%) | 0.084–0.22 | 0.083–0.22 | 0.084–0.23 | 0.087–0.21 | 0.082–0.23
Ka/Ks mean (SD) | 0.147 (0.114) | 0.178 (0.136) | 0.216 (0.208) | 0.244 (0.260) | 0.203 (0.194)
Ka/Ks CV | 0.78 | 0.76 | 0.96 | 1.07 | 0.96
Ka/Ks range | 0.001–1.21 | 0.001–1.52 | 0.001–4.21 | 0.001–4.35 | 0.001–4.35
Ka/Ks range (90%) | 0.024–0.35 | 0.030–0.43 | 0.035–0.59 | 0.057–0.46 | 0.032–0.54
dSM mean (SD) | 0.183 (0.194) | 0.192 (0.206) | 0.244 (0.240) | 0.299 (0.267) | 0.229 (0.233)
dSM CV | 1.06 | 1.07 | 0.98 | 0.89 | 1.02
dSM range | 0–1.00 | 0–1.00 | 0–1.00 | 0–1.00 | 0–1.00
dSM range (90%) | 0.008–0.616 | 0.008–0.628 | 0.010–0.75 | 0.012–0.819 | 0.009–0.715

Note.—SD, standard deviation; CV, coefficient of variation.


FIG. 1. The frequency distributions of Ka, Ks, and Ka/Ks (ω).

To investigate the distribution of Ks, Ka, and ω along A. thaliana chromosomes, we plotted mean values for nonoverlapping windows of 0.5 Mb, corresponding to an average of 48.9 genes per window (fig. 2 for chromosome 1; supplementary fig. S1, Supplementary Material online, for chromosomes 2–5). In general, there were few marked peaks for Ks (fig. 2). To test whether divergence values within windows were higher or lower than expected, we randomly permuted Ks values among genes, holding gene location (and window definitions) constant. Over 10,000 permutations, we determined whether the observed Ks value for a window was extreme. Figure 2 provides an example: on chromosome 1, Ks values are elevated in some windows near the centromere and in the region spanning 24–27 Mb. Generally, when Ks values were extreme, they tended to be elevated in arm regions proximal to centromeres and reduced near telomeres (fig. 2; supplementary figs. S1, Supplementary Material online). We also performed permutation analyses for Ka and ω, which generally yielded fewer regions of significantly high or low rates than Ks (fig. 2; supplementary figs. S1, Supplementary Material online).
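The window permutation test can be sketched in Python with NumPy. Values are shuffled among genes while positions, and therefore window membership, stay fixed; one-sided P values are then the fraction of permutations whose window mean is at least as high (or at least as low) as the observed one. The add-one correction, the function name, the random seed, and the example data are assumptions of this sketch rather than details given in the text.

```python
import numpy as np


def window_permutation_test(positions_bp, values, window_bp=500_000,
                            n_perm=10_000, seed=0):
    """Per-window means of a rate with one-sided permutation P values.

    Returns (window_starts, observed_means, p_high, p_low).
    """
    positions = np.asarray(positions_bp, dtype=np.int64)
    values = np.asarray(values, dtype=float)
    window_ids = positions // window_bp                 # nonoverlapping windows
    labels, inverse = np.unique(window_ids, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse)

    def window_means(v):
        return np.bincount(inverse, weights=v) / counts

    observed = window_means(values)
    rng = np.random.default_rng(seed)
    n_high = np.zeros(labels.size)
    n_low = np.zeros(labels.size)
    for _ in range(n_perm):
        permuted = window_means(rng.permutation(values))  # positions stay fixed
        n_high += permuted >= observed
        n_low += permuted <= observed
    # Add-one correction keeps P values away from exactly zero.
    p_high = (n_high + 1) / (n_perm + 1)
    p_low = (n_low + 1) / (n_perm + 1)
    return labels * window_bp, observed, p_high, p_low


# Hypothetical data: 60 genes in six 0.5-Mb windows; the first window is elevated.
rng = np.random.default_rng(1)
positions = np.arange(0, 3_000_000, 50_000)
vals = rng.normal(0.15, 0.01, size=60)
vals[:10] += 0.1
starts, obs, p_high, p_low = window_permutation_test(positions, vals, n_perm=2000)
print(starts[0], round(obs[0], 3), p_high[0])
```

The sketch makes no correction for testing many windows at once.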

FIG. 2. The distributions of mean values of Ks, Ka, and Ka/Ks (ω) for 0.5 Mb nonoverlapping windows along chromosome 1 in Arabidopsis thaliana. To test whether divergence values within windows were higher or lower than expected, we randomly permuted the rate values among genes, holding gene location (and window definitions) constant. Over 10,000 permutations, we determined whether the observed value for a window was extreme. The top bar in each plot shows the P values for the test that the observed value is higher than expected; the bottom bar shows the P values for the test that the observed value is lower than expected. The dotted lines indicate the mean values of evolutionary rates for all genes on chromosome 1.

Determinants of Evolutionary Rates

To perform a general analysis of the factors that contribute to evolutionary rates, we selected 14 variables that might correlate with (or contribute to) evolutionary rates (table 2). All 14 variables were available for 5,439 orthologs. Most of these variables were correlated with evolutionary rates in pairwise fashion (table 2). For example, all variables except recombination rate were significantly correlated with Ka (Spearman’s rank correlation, P < 10⁻⁹), and all variables except recombination rate and chromosomal position were significantly correlated with ω at the same threshold. Most variables were correlated with Ks as well, but the pattern differed slightly from that for Ka (table 2); for example, duplication mode was correlated with Ka but not with Ks.

Table 2.

Pairwise Correlations of Evolutionary Rates with Potentially Contributing Factors.

Variable | Ka | Ks | Ka/Ks
Duplication mode | 0.13*** | 0.01 | 0.13***
Chromosomal position | −0.06*** | −0.17*** | 0.001
Recombination rate | 0.04* | 0.12*** | −0.002
Expression level | −0.42*** | −0.09*** | −0.39***
Tissue specificity (τ) | 0.28*** | 0.12*** | 0.24***
dSM | 0.18*** | 0.11*** | 0.14***
Fop | −0.12*** | 0.04* | −0.14***
Multifunctionality | −0.19*** | −0.03# | −0.18***
Gene length | −0.25*** | −0.20*** | −0.18***
5′ UTR length | −0.20*** | −0.13*** | −0.15***
3′ UTR length | −0.20*** | −0.10*** | −0.16***
Intron number | −0.19*** | −0.26*** | −0.10***
Average intron length | −0.11*** | −0.03* | −0.09***
G + C content | −0.20*** | −0.01 | −0.20***

NOTE.—The coefficients were calculated based on Spearman rank correlation.

#P < 0.05, *P < 10⁻³, **P < 10⁻⁶, ***P < 10⁻⁹.


Of course, many of these factors are intercorrelated, making it difficult to identify the contribution of individual factors to evolutionary rates. Although not without limitations (see Discussion), PCR is one approach to begin teasing apart the separate contribution of each predictor to the total variation in evolutionary rate among genes (Drummond et al. 2006). In this method, the 14 predictor variables are scaled and then transformed into orthogonal principal components, and rates are regressed on these components. By construction, the first principal component captures the largest share of variance among the predictors; examining how strongly each predictor contributes to the components that explain rate variation then allows the factors with the greatest contribution to rate variation to be inferred.
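One way to attribute explained variance to individual predictors in PCR is to weight each predictor's squared loading on a component by the variance in the response explained by that component. The Python sketch below implements that decomposition with NumPy's SVD; it is an assumption of this sketch that this matches the decomposition used here, it uses all components rather than only the significant ones, it requires predictors with full column rank, and the function name and data are hypothetical.

```python
import numpy as np


def pcr_variance_contributions(X, y):
    """Variance in y explained by each principal component, split among predictors.

    Returns (r2_by_component, contributions), where contributions[j, p] is the
    share of variance in y attributed to predictor p through component j.
    Assumes X has full column rank (no zero singular values).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)                 # PCA operates on centred predictors
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                          # principal component scores (n x k)
    # Components are orthogonal, so each coefficient can be fitted separately.
    gamma = scores.T @ yc / s ** 2
    r2_by_component = gamma ** 2 * s ** 2 / (yc @ yc)
    # Each row of Vt has unit length, so squared loadings sum to 1 per component.
    contributions = Vt ** 2 * r2_by_component[:, None]
    return r2_by_component, contributions


rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))                                   # hypothetical scaled predictors
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=300)  # hypothetical rate
r2, contrib = pcr_variance_contributions(X, y)
print("R^2 per component:", np.round(r2, 3))
print("total R^2 per predictor:", np.round(contrib.sum(axis=0), 3))
```

With all components retained, the per-component R² values sum to the ordinary least-squares R², which is a convenient sanity check.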

We applied the PCR method separately to Ks and Ka variation. For Ks, the key results are provided in figure 3, with variance estimates in supplementary table S2, Supplementary Material online. The first principal component represented 4.4% of rate variation, and in total 11.1% of the variation among genes was explained across significant principal components (supplementary table S2, Supplementary Material online). The two top-ranked factors in the first principal component were intron number (2.0% of variation) and gene length (1.9%), both of which clearly exceeded the remaining 12 factors in the percentage of variance explained (fig. 3). Notably, the combination of “chromosomal position” and “recombination rate” explained 1.7% of rate variation across principal components, suggesting that gene location is an important factor for describing Ks. In contrast, duplication status contributed very little (0.44%) to variation in Ks among genes.

FIG. 3. Principal components regression on Ks and Ka. See supplementary tables S2 and S3, Supplementary Material online, for numerical data.

For Ka, the first component explained 15.7% of rate variation, and all principal components together captured 21.4% of the variation (fig. 3; supplementary table S3, Supplementary Material online). Among the 14 predictors, those that contributed most heavily to Ka variation were expression level and tissue specificity (τ), indicating that gene expression best explains Ka variation among genes (see Discussion). Additional parameters of interest include G + C content, codon usage (Fop), and 3′ UTR length.

Evolutionary Rates between Duplicates and Singletons

Duplication status is not correlated with Ks variation among genes (table 2) and is at best a minor contributor to Ka variation based on PCR analysis (fig. 3). Yet, the dynamics of duplication and evolutionary rate are potentially interesting in their own right (Yang et al. 2003; Davis and Petrov 2004; Jordan et al. 2004). Accordingly, we contrasted the evolutionary rates of four groups that were categorized on the basis of their mode of duplication (see Materials and Methods): singletons (1,497 genes), early WGD duplicates (960 genes), recent WGD duplicates (3,351 genes), and non-WGD duplicates (5,684 genes). This last category (non-WGD duplicates) may be best described as a “catch-all” category of duplicates of uncertain origin, including duplicates that are due to WGD events but not detected as such, tandem events, and duplications by transposition (Freeling et al. 2008).

Among these four categories, there was no statistically significant difference in the Ks distributions (Mann–Whitney U test, P > 0.01) (fig. 4A and supplementary fig. S5, Supplementary Material online). This result is consistent with the pairwise correlations (table 2) and is expected if synonymous substitutions are approximately neutral and if genes have similar divergence times. However, there were clear differences in the distributions of Ka and ω among categories. As a group, duplicated genes had significantly lower Ka and ω values than singleton genes (Mann–Whitney U test, P < 10⁻¹⁰ for both; fig. 4B–C and supplementary figs. S6, Supplementary Material online). Within classes of duplicated genes, early WGD duplicates had lower Ka and ω values than recent WGD duplicates and non-WGD duplicates (Mann–Whitney U test, P < 10⁻⁹ for both). There is thus evidence for Ka differences among all four duplication categories.
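Because the comparisons above rely on the Mann–Whitney U test, a compact Python version may help make the statistic concrete. This sketch uses the normal approximation with a tie correction and no continuity correction, so its P values can differ slightly from those of statistical packages, which may use exact distributions or a continuity correction; the function names are hypothetical.

```python
import math

import numpy as np


def average_ranks(a):
    """Ranks 1..n, with tied values assigned their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(a.size)
    i = 0
    while i < a.size:
        j = i
        while j + 1 < a.size and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0   # average of ranks i+1..j+1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for the first sample, two-sided P value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    n = n1 + n2
    pooled = np.concatenate([x, y])
    ranks = average_ranks(pooled)
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    _, tie_counts = np.unique(pooled, return_counts=True)
    t = tie_counts.astype(float)
    tie_term = np.sum(t ** 3 - t) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term))
    z = (u1 - n1 * n2 / 2.0) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided P value
    return u1, p


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```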

FIG. 4. Comparisons of Ks (A), Ka (B), Ka/Ks (C), and dSM (D) among different types of duplicated and singleton genes. The bottom and top of each box are the first (lower) and third (upper) quartiles, and the band in the box is the median value. The ends of the whiskers lie 1.5 interquartile ranges below the lower quartile and above the upper quartile, respectively.

To test these inferences with independent data, we analyzed upstream sequences. The dSM values for duplicated genes were significantly lower than those for singletons (Mann–Whitney U test, P < 10⁻¹⁰; fig. 4D), mirroring the Ka results (fig. 4B). The dSM results also corroborated the difference in Ka between WGD and non-WGD duplicates but, importantly, yielded no significant difference between the early and recent WGD classes (Mann–Whitney U test, P = 0.56; fig. 4D). Overall, these dSM results (i) suggest that protein divergence is correlated with divergence in upstream regions, and indeed Ka and dSM were correlated over the entire data set (supplementary table S1, Supplementary Material online); (ii) imply that the distinction between duplicates and singletons is neither limited to amino acid replacements nor an artifact of coding-region alignments; but (iii) do not provide independent confirmation that early and recent WGD duplicates evolve at different rates.

Contrasts between Duplication Status and Gene Expression

Our preceding results reveal a potential inconsistency. On the one hand, PCR analyses attribute only a small proportion of rate variance to duplication status (fig. 3). On the other hand, direct contrasts reveal compelling Ka differences among some duplication categories (fig. 4B). Although there could be several reasons for this inconsistency, including shortcomings of the PCR method (see Discussion), one potential interpretation is that the orthogonal transformation in PCR masks an underlying correlation between gene expression and duplication: correlated predictors load onto the same principal components, so variance associated with duplication may be credited to components dominated by expression. To tease apart the potential relationship between expression and duplication, we contrasted levels and patterns of A. thaliana gene expression among duplication categories for the 10,021 genes with expression data.
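The sketch below illustrates, in a deliberately minimal two-predictor setting, why an orthogonal transformation can blur the contributions of correlated predictors. It is not the 14-variable PCR of figure 3: the data are simulated, `duplication` is a hypothetical continuous stand-in for duplication status, and Ka is constructed to depend on expression alone. For two standardized predictors with correlation r12, the principal components are (z1 + z2)/√2 and (z1 − z2)/√2, so both predictors load equally on both components, and the Ka variance captured by the first component cannot be assigned to either predictor from the components alone.

```python
import math
import random
from typing import List, Sequence, Tuple


def standardize(v: Sequence[float]) -> List[float]:
    """Center to mean 0 and scale to sample SD 1."""
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in v) / (n - 1))
    return [(x - mean) / sd for x in v]


def corr(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)


def two_predictor_pcr(y, x1, x2) -> Tuple[float, float, float]:
    """Return (R^2 of PC1, R^2 of PC2, r12) for a two-predictor PCR.

    The correlation matrix [[1, r12], [r12, 1]] has eigenvectors (1, 1)/sqrt(2)
    and (1, -1)/sqrt(2) with eigenvalues 1 + r12 and 1 - r12.
    """
    z1, z2 = standardize(x1), standardize(x2)
    r12 = corr(x1, x2)
    pc_sum = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]
    pc_diff = [(a - b) / math.sqrt(2) for a, b in zip(z1, z2)]
    # PC1 is the component with the larger eigenvalue.
    pc1, pc2 = (pc_sum, pc_diff) if r12 >= 0 else (pc_diff, pc_sum)
    # The components are uncorrelated in-sample, so their R^2 values add up.
    return corr(y, pc1) ** 2, corr(y, pc2) ** 2, r12


def ols_r2(y, x1, x2) -> float:
    """R^2 of the ordinary least-squares fit of y on x1 and x2."""
    ry1, ry2, r12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)


# Simulated (hypothetical) data: Ka depends only on expression, but the
# duplication stand-in is correlated with expression.
rng = random.Random(42)
expression = [rng.gauss(0, 1) for _ in range(500)]
duplication = [0.8 * e + 0.6 * rng.gauss(0, 1) for e in expression]
ka = [-e + rng.gauss(0, 1) for e in expression]

r2_pc1, r2_pc2, r12 = two_predictor_pcr(ka, expression, duplication)
print(f"r12={r12:.2f}  R2(PC1)={r2_pc1:.3f}  R2(PC2)={r2_pc2:.3f}")
```

Most of the Ka variance lands on PC1, which weights expression and the duplication stand-in equally even though only expression drives Ka in the simulation.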

For average expression level, non-WGD duplicates are the outliers among the four duplication categories, being expressed at lower levels than each of the other three classes (Mann–Whitney U test, P < 10⁻⁹ for all three comparisons; fig. 5A). The other three categories did not differ significantly in expression level, although singletons were expressed at slightly higher levels (fig. 5A). Analysis of MPSS data yielded similar results, except that singletons had significantly higher expression than duplicates (data not shown).

FIG. 5. Comparisons of total expression level (A) and specificity (τ) (B) among different types of duplicated and singleton genes. The bottom and top of each box are the first (lower) and third (upper) quartiles, and the band in the box is the median. The ends of the whiskers represent 1.5 times the interquartile range beyond the quartiles.

Tissue specificity, τ, differed more widely among groups: singletons were expressed more broadly than each class of duplicates (Mann–Whitney U test, P < 10⁻¹⁵ for all three comparisons; fig. 5B), and early WGD genes were expressed more specifically than either recent WGD genes (Mann–Whitney U test, P < 10⁻⁵) or non-WGD genes (Mann–Whitney U test, P < 10⁻⁶). Analysis of MPSS data confirmed that singletons are expressed more broadly than duplicates (data not shown).
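Figure 5B reports tissue specificity as τ. Assuming τ follows the widely used definition of Yanai et al. (2005), τ = Σᵢ (1 − xᵢ/x_max)/(n − 1) over n tissues, it ranges from 0 for uniform expression to 1 for expression in a single tissue. The sketch below computes it for one gene; whether the study applied it to raw or log-transformed expression values is not stated in this section, so the input scale is an assumption.

```python
from typing import Sequence


def tissue_specificity_tau(expression: Sequence[float]) -> float:
    """Tissue-specificity index tau for one gene, given its expression in n tissues.

    tau = sum_i (1 - x_i / max(x)) / (n - 1); 0 means uniform expression,
    1 means expression in a single tissue. Inputs must be nonnegative.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    if min(expression) < 0:
        raise ValueError("expression values must be nonnegative")
    x_max = max(expression)
    if x_max == 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical expression profiles across four tissues.
print(tissue_specificity_tau([8.0, 7.5, 8.2, 7.9]))  # broadly expressed: tau near 0
print(tissue_specificity_tau([9.0, 0.5, 0.2, 0.0]))  # tissue-specific: tau near 1
```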

Discussion

Surprisingly, few studies have described evolutionary rate variation among plant nuclear genes, and the exceptions have been based on relatively small sample sizes. For example, Zhang et al. (2002) examined rate variation among a group of 242 genes, using contemporaneously duplicated A. thaliana paralogs to measure divergence; Tiffin and Hahn (2002) studied 218 putative orthologs between A. thaliana and Brassica rapa; and Wright et al. (2004) analyzed 83 orthologs between A. thaliana and A. lyrata, the largest ortholog data set available for these two species at the time. Although there have been additional multigene studies of evolutionary rates among plant taxa (e.g., Wang et al. 2008), to our knowledge, none has approached the genome-wide scale of the analyses reported here, with >11,000 orthologous pairs.

Despite the limited numbers of genes they examined, previous studies revealed similar patterns of rate variation among plant nuclear genes. For example, Zhang et al. (2002) documented a ∼14-fold range of synonymous rate variation among genes, with 90% of genes falling within a narrower window of 2.6-fold rate variation. Our results likewise indicate that 90% of genes fall within a window of 2.6-fold rate variation (table 1). The consistency between studies is remarkable, especially given that Zhang et al. (2002) studied paralogs that were potentially subject to gene conversion and that diverged over a timescale roughly an order of magnitude longer than that of the orthologs studied here.
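As a rough illustration of the fold-range figures quoted above, the sketch below computes both the full range (maximum/minimum) and the fold range spanned by the central 90% of genes. It assumes that the 90% window is bounded by the 5th and 95th percentiles, with linear interpolation between order statistics; the exact definition behind table 1 is not given in this section.

```python
import math
from typing import Sequence, Tuple


def quantile(sorted_vals: Sequence[float], p: float) -> float:
    """Quantile by linear interpolation between order statistics (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def fold_ranges(rates: Sequence[float], coverage: float = 0.90) -> Tuple[float, float]:
    """Return (max/min fold range, fold range spanned by the central `coverage` of genes)."""
    vals = sorted(rates)
    if vals[0] <= 0:
        raise ValueError("rates must be positive to form fold ratios")
    tail = (1 - coverage) / 2  # e.g. 5th and 95th percentiles for 90% coverage
    central = quantile(vals, 1 - tail) / quantile(vals, tail)
    return vals[-1] / vals[0], central


# Hypothetical Ks values for a handful of genes.
full, central = fold_ranges([0.08, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.18, 0.21, 0.95])
print(f"full range: {full:.1f}-fold; central 90%: {central:.1f}-fold")
```

Genes with Ks = 0 would have to be excluded first, because a fold ratio against zero is undefined.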

Our results are also consistent with these previous studies both in estimating an average ω of ∼0.2, which signals strong constraint on amino acid replacements, and in identifying very few genes with ω > 1.0. Notably, the small number (90) of genes with ω > 1.0 show no obvious functional biases in GO analyses (data not shown). Although ortholog contrasts may lack the statistical power to detect ω > 1.0, the overarching impression is that the type of positive selection detected by ω analyses has not been a common feature of species divergence.
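The sketch below summarizes per-gene ω = Ka/Ks on hypothetical values. Two choices are assumptions: genes with Ks = 0 are excluded because ω is undefined for them, and the "average ω" is taken as the mean of per-gene ratios rather than the ratio of mean Ka to mean Ks; the two can differ noticeably for skewed rate distributions.

```python
from typing import Dict, Sequence


def omega_summary(ka: Sequence[float], ks: Sequence[float]) -> Dict[str, float]:
    """Summarize omega = Ka/Ks across genes, skipping genes with Ks = 0."""
    omegas = [a / s for a, s in zip(ka, ks) if s > 0]
    if not omegas:
        raise ValueError("no genes with Ks > 0")
    return {
        "n_genes": len(omegas),
        "mean_omega": sum(omegas) / len(omegas),
        "n_omega_above_1": sum(1 for w in omegas if w > 1.0),
    }


# Hypothetical Ka and Ks values for five genes (the last has Ks = 0 and is skipped).
print(omega_summary([0.010, 0.020, 0.030, 0.050, 0.004], [0.10, 0.12, 0.15, 0.04, 0.0]))
```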

Ka and Ks Are Correlated across Genes

Like many previous studies (Alvarez-Valin et al. 1998; Makalowski and Boguski 1998; Smith and Hurst 1999; Zhang et al. 2002; Castillo-Davis et al. 2004), we document a positive correlation between Ka and Ks across genes. The reason for this correlation is not well established, but at least three nonexclusive phenomena could cause it. First, selection for translational speed, efficiency, and accuracy may affect both Ka and Ks (Wright et al. 2004; Drummond and Wilke 2008) and thus drive the correlation. Second, variation in mutation rates along chromosomes may create genomic regions that covary in Ka and Ks. Third, even if mutation rates are uniform across chromosomes, deleterious mutations may be purged less effectively from regions of low recombination, again potentially creating genomic regions that covary in Ka and Ks.
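This section does not say which coefficient underlies the Ka–Ks correlation. As an assumption, the sketch below uses Spearman's rank correlation, a common choice for skewed rate data, computed as the Pearson correlation of midranks so that ties are handled correctly. The Ka and Ks values are hypothetical.

```python
import math
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on midranks."""
    return pearson(midranks(a), midranks(b))


# Hypothetical per-gene Ka and Ks values.
ka = [0.010, 0.025, 0.018, 0.040, 0.032]
ks = [0.12, 0.15, 0.14, 0.20, 0.13]
print(f"Spearman rho(Ka, Ks) = {spearman(ka, ks):.2f}")
```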

If the latter two phenomena contribute, we expect evolutionary rates to cluster along chromosomes. Indeed, we have identified several 0.5-Mb windows with elevated or depressed substitution rates (fig. 2; supplementary figs. S1Supplementary Data, Supplementary Material online). Perhaps the most interesting of these are on chromosome 1, which has a marked Ka peak at ∼21–24 Mb and a Ks peak from ∼24 to 27 Mb (fig. 2). This entire region coincides with a peak of high synonymous nucleotide diversity among A. thaliana accessions (Clark et al. 2007). Clark et al. (2007) noted that this region of chromosome 1 contains several disease resistance genes and hypothesized that it is particularly prone to balancing selection. However, the correspondence of high divergence and high polymorphism in the same genomic window strongly implies that mutation rates vary along the chromosome, as postulated for mammalian genomes (Lercher et al. 2001). We thus hypothesize that shared mutation rates are at least one cause of the correlation between Ka and Ks and that variation in mutation rates contributes to rate variation among genes. Interestingly, the region of high rates on chromosome 1 may be syntenic to a centromeric region in A. lyrata (Hu et al. 2011), suggesting that chromosomal rearrangements may affect mutation rates.
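The window analysis can be sketched as follows: genes are binned into 0.5-Mb windows by position, the mean rate is computed per window, and windows whose mean deviates strongly from the average window are flagged. Non-overlapping windows and the z-score cutoff are hypothetical choices for illustration; the study's own windowing and significance criteria are not specified in this section and may differ.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

WINDOW_BP = 500_000  # 0.5-Mb non-overlapping windows (hypothetical choice)

Gene = Tuple[str, int, float]  # (chromosome, start position in bp, rate such as Ks)
Window = Tuple[str, int]       # (chromosome, window index)


def window_means(genes: Sequence[Gene]) -> Dict[Window, float]:
    """Average the rate of all genes falling in each window."""
    bins: Dict[Window, List[float]] = defaultdict(list)
    for chrom, pos, rate in genes:
        bins[(chrom, pos // WINDOW_BP)].append(rate)
    return {key: mean(rates) for key, rates in bins.items()}


def flag_windows(means: Dict[Window, float], z_cutoff: float = 2.0) -> Dict[Window, float]:
    """Return windows whose mean lies more than z_cutoff SDs from the mean of all windows."""
    values = list(means.values())
    mu, sd = mean(values), pstdev(values)
    return {
        key: (m - mu) / sd
        for key, m in means.items()
        if sd > 0 and abs(m - mu) / sd > z_cutoff
    }


# Hypothetical data: 20 windows with Ks of 0.1 and one window with elevated Ks.
genes = [("1", i * WINDOW_BP + 1, 0.1) for i in range(20)]
genes.append(("1", 20 * WINDOW_BP + 1, 1.0))
flagged = flag_windows(window_means(genes))
print(flagged)  # window ("1", 20) is the only one flagged
```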

Gene Expression Is the Major Predictor of Rate Variation among Genes

It is unlikely that mutation rates alone predict rate variation among genes. We therefore examined 14 variables to assess their contributions to evolutionary rates. Typically, correlations between evolutionary rates and predictor variables have been measured by partial correlation or multiple regression. Drummond et al. (2006) demonstrated that these approaches can generate spurious but highly significant results when the predictor variables are measured with error (noise), and they introduced the PCR approach as an alternative. In turn, Plotkin and Fraser (2007) showed that PCR is robust neither to measurement noise nor to variation in the predictor variables. Unfortunately, then, there is as yet no ideal method to partition the factors that contribute to evolutionary rates. Some of our predictor variables, such as gene length and Fop, are virtually free of noise, but others, such as gene expression data, do contain noise. Here, we have employed PCR as an exploratory tool, recognizing that the approach can suffer from the same weaknesses as other approaches but has the advantage of easy interpretation.
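The concern raised by Drummond et al. (2006) can be reproduced with a few lines of simulation; this is an illustration in their spirit, not their analysis. In the hypothetical setup below, Ka depends only on true expression, and a second predictor is merely correlated with expression. Controlling for true expression leaves a partial correlation near zero, whereas controlling for a noisy measurement of expression leaves a substantial, spurious partial correlation with the second predictor. Variable names and effect sizes are illustrative assumptions.

```python
import math
import random
from typing import Sequence


def corr(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(y: Sequence[float], x: Sequence[float], control: Sequence[float]) -> float:
    """First-order partial correlation of y and x, controlling linearly for `control`."""
    r_yx, r_yc, r_xc = corr(y, x), corr(y, control), corr(x, control)
    return (r_yx - r_yc * r_xc) / math.sqrt((1 - r_yc ** 2) * (1 - r_xc ** 2))


rng = random.Random(7)
n = 2000
true_expr = [rng.gauss(0, 1) for _ in range(n)]
# A predictor correlated with expression but with no direct effect on Ka.
other = [0.7 * e + 0.7 * rng.gauss(0, 1) for e in true_expr]
# Ka depends on true expression only.
ka = [-e + 0.5 * rng.gauss(0, 1) for e in true_expr]
# Expression as measured, with substantial noise.
noisy_expr = [e + rng.gauss(0, 1) for e in true_expr]

p_true = partial_corr(ka, other, true_expr)    # expected to be near 0
p_noisy = partial_corr(ka, other, noisy_expr)  # expected to be clearly negative
print(f"controlling for true expression:  {p_true:+.3f}")
print(f"controlling for noisy expression: {p_noisy:+.3f}")
```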

One outcome of our PCR analysis is that the first principal component explains a small proportion of variation in Ks (4.4%) and Ka (15.7%) among genes. In contrast, expression level alone accounts for >25% of rate variation among genes in Drosophila (Lemos et al. 2005), bacteria, and Chlamydomonas (Rocha and Danchin 2004; Rocha 2006), and ∼50% of Ka variation among yeast genes (Drummond et al. 2006).

There may be at least two reasons why our PCR captures comparatively little variation. The first is that our model lacks predictor variables that assess functional importance directly, largely because such information is unavailable. Although there is some information about gene essentiality for a subset of A. thaliana genes (Hanada et al. 2009), to our knowledge, there is no genome-wide information on (for example) protein dispensability or abundance. To circumvent the lack of functional data, we included "multifunctionality" as a predictor based on GO annotations. Associations between GO functions and evolutionary rates have been detected for A. thaliana early WGD genes (Warren et al. 2010), but multifunctionality explains little of the variation across all genes (fig. 3; supplementary tables S2 and Supplementary Data, Supplementary Material online). We hope that future work will be able to incorporate critical functional parameters as they become available. The second reason is that PCR cannot capture all sources of variation. For example, A. lyrata and A. thaliana diverged recently on an evolutionary timescale (∼13 Ma; Beilstein et al. 2010), so polymorphisms that segregated in the ancestral species may contribute substantial stochastic variation in divergence among genes. Unfortunately, it is unclear how to incorporate such information into the PCR.

Even though PCR explains only a low percentage of variation, it provides insights into the forces that affect Ks and Ka. For Ks, some contributors, such as recombination rate, vary on chromosomal scales, but intron number and gene length are also major contributors. For Ka, the results are consistent with previous results from plants, yeast, and mammals in revealing a substantial relationship with both the level and the specificity of expression (Duret and Mouchiroud 2000; Pál et al. 2001b; Zhang et al. 2002; Wright et al. 2004; Drummond et al. 2005, 2006; Wall et al. 2005; Drummond and Wilke 2008). The favored explanation for a positive correlation between Ka and expression specificity is that broadly expressed genes are constrained, either because they face multiple selective environments, that is, multiple biochemical contexts in different cells (Duret and Mouchiroud 2000), or because of high functional densities. Similarly, the negative correlation between Ka and expression level may reflect strong purifying selection on abundant proteins, selection on the speed and accuracy of translation, or selection for robustness against mistranslation (Drummond et al. 2005; Drummond and Wilke 2008). We cannot differentiate among these explanations here, but our study establishes that gene expression is a major predictor of Ka variation on a genomic scale in a plant system.

What Drives the Evolutionary Rate of Duplicated Genes?

Previous studies have shown that singleton genes evolve more rapidly than duplicated genes (Nembaware et al. 2002; Yang et al. 2003; Davis and Petrov 2004; Jordan et al. 2004). By considering different classes of duplicated genes, we have uncovered a somewhat more complex pattern of nonsynonymous evolution. On average, singletons evolve more rapidly than duplicates; non-WGD genes evolve more rapidly than WGD genes; and recent WGD genes evolve more rapidly than early WGD genes (fig. 4B). This hierarchy of Ka divergence is corroborated by dSM (fig. 4D). The lone exception, for which we do not have an explanation, is that the promoters of recent WGD and early WGD genes evolve at statistically indistinguishable rates, whereas early WGD genes have significantly lower Ka values. Nonetheless, similar patterns of Ka and dSM indicate that protein divergence is correlated with divergence in upstream regions (Castillo-Davis et al. 2004; Chin et al. 2005), suggesting that some aspects of selective constraint are shared between proteins and upstream regions.

The four groups also vary with respect to levels and patterns of gene expression (fig. 5). Singletons are expressed more broadly than duplicated genes and tend to be expressed at higher levels; non-WGD genes are expressed at low levels, on average; and the slowly evolving early WGD genes are expressed with high specificity. Interestingly, these patterns of gene expression do not follow the expected correlation with evolutionary rate. Previous studies have found that genes with low Ka are expressed in more tissues and at higher levels (Duret and Mouchiroud 2000; Pál et al. 2001b; Subramanian and Kumar 2004; Zhang and Li 2004), but our groups do not meet this expectation. For example, the slowest-evolving group (early WGD duplicates) has neither the highest expression level nor the broadest expression. However, the expected patterns do hold within groups. For example, within the early WGD duplicates, Ka and expression level showed the expected negative correlation (supplementary table S4, Supplementary Material online). Overall, these observations suggest that the groups are distinct not only in evolutionary rates but also in some aspect of their expression dynamics.
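The contrast between among-group and within-group patterns can be illustrated with a toy example in which group means run opposite to the within-group trend. The records below are hypothetical and chosen only to reproduce the qualitative pattern described above: the slower group has lower mean expression, yet Ka declines with expression inside each group.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def group_summary(
    genes: Sequence[Tuple[str, float, float]]
) -> Dict[str, Tuple[float, float, float]]:
    """Per group: (mean expression, mean Ka, within-group Pearson r of Ka vs expression)."""
    by_group: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for group, expr, ka in genes:
        by_group[group].append((expr, ka))
    summary = {}
    for group, pairs in by_group.items():
        expr = [e for e, _ in pairs]
        ka = [k for _, k in pairs]
        summary[group] = (mean(expr), mean(ka), pearson(expr, ka))
    return summary


# Hypothetical (group, expression level, Ka) records.
genes = [
    ("early_wgd", 1.0, 0.040), ("early_wgd", 2.0, 0.030), ("early_wgd", 3.0, 0.020),
    ("singleton", 3.0, 0.090), ("singleton", 4.0, 0.080), ("singleton", 5.0, 0.070),
]
for group, (mean_expr, mean_ka, r) in group_summary(genes).items():
    print(f"{group}: mean expression {mean_expr:.1f}, mean Ka {mean_ka:.3f}, within-group r {r:+.2f}")
```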

What, then, drives these differences among groups? The short answer is that we do not know, but one possibility is that dosage balance plays a critical role. Dosage balance is thought to be a factor in the retention of WGD duplicates (Edger and Pires 2009) because the loss of an individual duplicate disrupts stoichiometric relationships in macromolecular complexes and signaling pathways. Gout et al. (2010) have extended this idea by postulating that patterns of gene expression drive the retention and relatively slow evolution of duplicated genes because perturbing gene expression carries a cost. In other words, there may be strong selection to retain WGD duplicates, largely through constraint on gene expression. Compared with WGD duplicates, both singletons and non-WGD duplicates may be less bound by dosage balance and are thus relatively free to diverge both functionally (Rizzon et al. 2006) and in gene expression (Ganko et al. 2007). As a result, one would predict different evolutionary rates among groups, partially as a function of gene expression, as observed here.

This work has benefited from the comments of O. Tenaillon, M. Tenaillon, S. Takuno, H. Sakai, B. von Holdt, and two anonymous reviewers. This work was supported by National Science Foundation grant DEB-0723860 to B.S.G.

References

et al. (33 co-authors). The organization and rate of evolution of wheat genomes are correlated with recombination rates along chromosome arms. Genome Res. 2003;13:753–763.

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.

Synonymous and nonsynonymous substitutions in mammalian genes: intragenic correlations. J Mol Evol. 1998;46:37–44.

Dated molecular phylogenies indicate a Miocene origin for Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2010;107:18724–18728.

A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 2003;13:137–144.

Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004;16:1679–1691.

Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003;422:433–438.

cis-Regulatory and protein evolution in orthologous and duplicate genes. Genome Res. 2004;14:1530–1536.

Genome-wide regulatory complexity in yeast promoters: separation of functionally conserved and neutral sequence. Genome Res. 2005;15:205–213.

et al. (18 co-authors). Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 2007;317:338–342.

Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol. 2004;2:E55.

Why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A. 2005;102:14338–14343.

A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 2006;23:327–337.

Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008;134:341–352.

Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol. 2000;17:68–74.

Gene and genome duplications: the impact of dosage-sensitivity on the fate of nuclear genes. Chromosome Res. 2009;17:699–717.

Many or most genes in Arabidopsis transposed after the origin of the order Brassicales. Genome Res. 2008;18:1924–1937.

Divergence in expression between duplicated genes in Arabidopsis. Mol Biol Evol. 2007;24:2298–2309.

Patterns of chromosomal duplication in maize and their implications for comparative maps of the grasses. Genome Res. 2001;11:55–66.

Selection on major components of angiosperm genomes. Science. 2008;320:484–486.

The relationship among gene expression, the evolution of gene dosage, and the rate of protein evolution. PLoS Genet. 2010;6:e1000944.

Evolution of duplicate genes versus genetic robustness against null mutations. Trends Genet. 2003;19:354–356.

Extent of gene duplication in the genomes of Drosophila, nematode, and yeast. Mol Biol Evol. 2002;19:256–262.

Evolutionary persistence of functional compensation by duplicate genes in Arabidopsis. Genome Biol Evol. 2009;1:409–414.

Protein dispensability and rate of evolution. Nature. 2001;411:1046–1049.

et al. (30 co-authors). The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet. 2011 (in press).

R: a language for data analysis and graphics. J Comput Graphical Stat. 1996;5:299–314.

Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 1985;2:13–34.

et al. (56 co-authors). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007;449:463–467.

Principal component analysis. New York: Springer; 2002.

Duplicated genes evolve slower than singletons despite the initial rate increase. BMC Evol Biol. 2004;4:22.

Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–467.

Evolution of protein-coding genes in Drosophila. Trends Genet. 2008;24:114–123.

Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein-protein interactions. Mol Biol Evol. 2005;22:1345–1354.

Lercher MJ, Williams EJ, Hurst LD. Local similarity in evolutionary rates extends over whole chromosomes in human-rodent and mouse-rat comparisons: implications for understanding the mechanistic basis of the male mutation bias. Mol Biol Evol. 2001;18:2023–2029.

Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol Biol Evol. 2006;23:1119–1128.

Universal distribution of protein evolution rates as a consequence of protein folding physics. Proc Natl Acad Sci U S A. 2010;107:2983–2988.

Plant conserved non-coding sequences and paralogue evolution. Trends Genet. 2005;21:60–65.

Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A. 2005;102:5454–5459.

Synonymous and nonsynonymous substitution distances are correlated in mouse and rat genes. J Mol Evol. 1998;47:119–121.

Synonymous codon usage, accuracy of translation, and gene length in Caenorhabditis elegans. J Mol Evol. 2001;52:275–280.

The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res. 2004;14:1641–1653.

Impact of the presence of paralogs on sequence divergence in a set of mouse-human orthologs. Genome Res. 2002;12:1370–1376.

Does the recombination rate affect the efficiency of purifying selection? The yeast genome provides a partial answer. Mol Biol Evol. 2001a;18:2323–2326.

Highly expressed genes in yeast evolve slowly. Genetics. 2001b;158:927–931.

An integrated view of protein evolution. Nat Rev Genet. 2006;7:337–348.

Assessing the determinants of evolutionary rates in the presence of noise. Mol Biol Evol. 2007;24:1113–1121.

Protein function, connectivity, and duplicability in yeast. Mol Biol Evol. 2006;23:30–39.

MareyMap: an R-based tool with graphical interface for estimating recombination rates. Bioinformatics. 2007;23:2188–2189.

Striking similarities in the genomic distribution of tandemly arrayed genes in Arabidopsis and rice. PLoS Comput Biol. 2006;2:e115.

The quest for the universals of protein evolution. Trends Genet. 2006;22:412–416.

An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol. 2004;21:108–116.

Twilight zone of protein sequence alignments. Protein Eng. 1999;12:85–94.

The effect of multifunctionality on the rate of evolution in yeast. Mol Biol Evol. 2006;23:721–722.

A gene expression map of Arabidopsis thaliana development. Nat Genet. 2005;37:501–506.

Independent ancient polyploidy events in the sister families Brassicaceae and Cleomaceae. Plant Cell. 2006;18:1152–1165.

The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol Biol Evol. 1987;4:222–230.

The hidden duplication past of Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2002;99:13627–13632.

A high-resolution map of Arabidopsis recombinant inbred lines by whole-genome exon array hybridization. PLoS Genet. 2006;2(9):e144.

The effect of tandem substitutions on the correlation between synonymous and nonsynonymous rates in rodents. Genetics. 1999;153:1395–1402.

The role of hybridization in plant speciation. Annu Rev Plant Biol. 2009;60:561–588.

Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics. 2004;168:373–381.

Coding sequence divergence between two closely related plant species: Arabidopsis thaliana and Brassica rapa ssp. pekinensis. J Mol Evol. 2002;54:746–753.

Codon usage bias covaries with expression breadth and the rate of synonymous evolution in humans, but this is not evidence for selection. Genetics. 2001;159:1191–1199.

The origins of genomic duplications in Arabidopsis. Science. 2000;290:2114–2117.

Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci U S A. 2005;102:5483–5488.

Sequencing and comparative analysis of a conserved syntenic segment in the Solanaceae. Genetics. 2008;180:391–408.

Functional bias in molecular evolution rate of Arabidopsis thaliana. BMC Evol Biol. 2010;10:125.

Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata. Mol Biol Evol. 2004;21:1719–1726.

et al. (12 co-authors). Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics. 2005;21:650–659.

Rate of protein evolution versus fitness effect of gene deletion. Mol Biol Evol. 2003;20:772–774.

Lowly-expressed genes in Arabidopsis thaliana bear the signature of possible pseudogenization by promoter degradation. Mol Biol Evol. 2011;28:1193–1203.

PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997;13:555–556.

Protein stability imposes limits on organism complexity and speed of molecular evolution. Proc Natl Acad Sci U S A. 2007;104:16152–16157.

Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol Biol Evol. 2004;21:236–239.

Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol Biol Evol. 2002;19:1464–1473.

Evolutionary processes and evolutionary noise at the molecular level. I. Functional density in proteins. J Mol Evol. 1976;7:167–183.

Author notes

Associate editor: Aoife McLysaght

© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
