Evolution of Synonymous Codon Usage Bias in West African and Central African Strains of Monkeypox Virus

Abstract

The evolution of bias in synonymous codon usage in chosen monkeypox viral genomes and the factors influencing its diversification have not been reported so far. In this study, various trends associated with synonymous codon usage in chosen monkeypox viral genomes were investigated, and the results are reported. Identification of factors that influence codon usage in chosen monkeypox viral genomes was done using various codon usage indices, such as the relative synonymous codon usage, the effective number of codons, and the codon adaptation index. The Spearman rank correlation analysis and a correspondence analysis were used for correlating various factors with codon usage. The results revealed that mutational pressure due to compositional constraints, gene expression level, and selection at the codon level for utilization of putative optimal codons are major factors influencing synonymous codon usage bias in monkeypox viral genomes. A cluster analysis of relative synonymous codon usage values revealed a grouping of more virulent strains as one major cluster (Central African strains) and a grouping of less virulent strains (West African strains) as another major cluster, indicating a relationship between virulence and synonymous codon usage bias. This study concluded that a balance between the mutational pressure acting at the base composition level and the selection pressure acting at the amino acid level frames synonymous codon usage bias in the chosen monkeypox viruses. The natural selection from the host does not seem to have influenced the synonymous codon usage bias in the analyzed monkeypox viral genomes.

Keywords: Monkeypox viruses (MPXV), synonymous codon usage bias (SCUB), mutational pressure, selection pressure

Introduction

Molecular evolution is a broad term reflecting changes in various genomic parameters due to alterations in the nucleotide and the dinucleotide compositions that lead to an accumulation of mutations over time.1 Because the genetic code is degenerate, more than one codon can encode a particular amino acid; however, the usage of these “synonymous codons” for a given amino acid is not uniform.2 For a given amino acid, a subset of codons may be used more frequently than the others, and such a subset is referred to as “preferred codons.”3 Synonymous codon usage bias (SCUB) is species specific and varies within and between genomes.4 This nonuniform usage of synonymous codons (ie, SCUB) can be significant in highly expressed genes.5 Thus, an understanding of SCUB is critical as it reveals the various forces that frame genomic evolution.6

The mutational pressure, which is due to base compositional constraints, and the selection pressure, which increases the translational speed and accuracy, have been identified as 2 important forces causing SCUB in various lineages, such as plants, mammals, macro-invertebrates, bacteria, fungi, and viruses.7-11 Selection pressure favors codons having abundant transfer RNAs, particularly in highly expressed genes.12-15 Furthermore, synonymous codon choices for protein formation have been found to affect secondary structure and protein folding,16 and messenger RNAs (mRNAs) and protein structures have been found to cause selection pressure.17-19 For instance, a significant species-specific correlation was noticed between the usage of AAC (asparagine) and the C-terminal regions of β-sheet segments in Escherichia coli as selection for translational efficiency favors downstream asparagine (AAC) residues that are essential for the formation of the β-sheet.19 Similarly, a significant correlation was found to exist between GAU (aspartic acid) and the N-termini of α-helices in humans as selection acts on co-translational protein folding in eukaryotes.19 In another important study, selection on synonymous codon usage (SCU) facilitated the optimization of the characteristics of mRNA secondary structures as a specific codon usage pattern was observed in the nucleotide sequence of repetitive units of silk fibroin mRNA.17 However, in another study, the mutational pressure was found to frame the overall nucleotide composition in genomes through GC ↔ AT changes,20 and intrinsic bias in dinucleotide frequencies may have had an influence on SCUB,6 as such bias can be extreme.20 For instance, the CpG (C-phosphate-G) content is underrepresented in many vertebrates owing to the methylation of cytosine residues,21 and the TpA (T-phosphate-A) content is restricted in many organisms due to the susceptibility of uracil in UpA to RNase22 and low thermal stability.23

The quantification of SCUB and the identification of its causative factors in zoonotic viral genomes are crucial in understanding viral evolution and ecology.6 Detailed analyses of trends and SCUB-associated factors are essential if the mechanisms of viral infection and immune response are to be revealed.20 Understanding the various factors that contribute to codon usage patterns is, therefore, more important than merely describing viral SCUB.24-28 The survival, fitness, and evolution of viruses depend strongly on SCUB coactions between viruses and hosts because replication and translation of viral genomes are host associated.20 Few studies have been undertaken to reveal the major forces and trends associated with viral DNA SCUB.20,29,30 Substantial differences between the SCUB in a virus and that in its host will have an effect on viral replication and protein synthesis,31 as evidenced in human papillomaviruses.32

Monkeypox viruses (MPXVs) belong to the genus Orthopoxvirus of the family Poxviridae.33 The family Poxviridae consists of large double-stranded DNA viruses capable of replicating in the cytoplasm of vertebrate and invertebrate cells.33,34 Monkeypox viruses cause human diseases similar to the eradicated smallpox caused by the variola virus (VAR).33 By 1977, smallpox was reported to have been eradicated and vaccination was stopped.35 As a result, closely related zoonotic viruses such as MPXVs infected unvaccinated human populations and caused a fatal illness (human monkeypox), but with a very low human-to-human transmission rate.36,37 Although human monkeypox is clinically similar to smallpox, regarding the case fatality rates (CFRs), smallpox was reported to be more severe than human monkeypox, with the former having a CFR of 30%38 and the latter a CFR of 10%.36 A recent outbreak investigation conducted in the Bokungu Health Zone of the Democratic Republic of the Congo (DRC) from July 1 to December 8, 2013, revealed a 600-fold increase in the number of human monkeypox cases.39

Rodents are the major animal reservoirs for MPXVs.40-42 The viral transmission to humans takes place through direct contact with animals.43 Wounds in the skin are the major route through which infection happens while handling infected animals.41 In some cases, respiratory transmission from animal to human and then from human to human has occurred.41,44 The incubation period is 10 to 14 days.43 After the incubation period, the prodromal period lasts for 2 days, and in this phase, the infected individual may experience fever, chills, malaise, headache, backache, sore throat, shortness of breath, and swollen lymph nodes.45,46 A clinical feature that can be used to differentiate between human monkeypox and human smallpox infections is the presence of enlarged lymph nodes in the submandibular, cervical, or inguinal regions in the former.35 The infected individual becomes most contagious subsequent to the development of a progressive maculopapular rash (0.2-1.0 cm) after the prodromal period.45,46 The spread of the lesions over the body follows a centrifugal pattern, and in certain cases, dyspigmented scars may develop from the lesions.43 In general, during a 2- to 4-week time period, the lesions over the body progressively undergo several changes from macules to papules, vesicles, and pustules, followed by scabbing and desquamation.35,43

Human monkeypox is endemic to the DRC, and infections take place throughout the Congo Basin.39 Different isolates of MPXVs from West Africa and the Congo Basin have been proven to be genetically distinct, and substantial differences in virulence between them have been reported.47 For instance, MPXV-ZAI-V79 isolated from the Congo Basin is thought to be more virulent than MPXV-COP-58 isolated from West Africa47 as no mortalities were reported during the West African isolate MPXV outbreaks in the United States in 2003.47 However, high virulence (>90%) and fatalities have been reported in the Congo Basin, and D10L, D14L, B10R, B14R, and B19R have been identified as possible candidate loci for virulence.47 Although genetic analyses revealed that MPXVs are not the immediate ancestors of the VAR because considerable differences were found between MPXVs and the VAR in the terminal genomic regions encoding virulence and host range factors, the possibility of an MPXV evolving into a highly virulent VAR-like virus with significant human-to-human transmission rates cannot be ignored.37

In this study, extensive analyses of SCUBs in 13 representative MPXV genomes isolated from different African regions were conducted to unravel patterns and factors associated with MPXV diversification. The size of the double-stranded DNA genome of an MPXV is ≈200 kb, comprising ≈190 nonoverlapping open reading frames (ORFs) that contain ≥180 nucleotides.48 A typical monkeypox genome contains a central conserved region (≈56 000 to 120 000 nucleotides long), with variable regions to the left and the right, as well as an inverted terminal region (ITR) with tandem repeats.33 The central conserved region contains genes that code for the replication machinery.48 The ITR in the MPXV genome represents a global repeat49,50 and accounts for almost 1% of the total genome size.50,51 At least 4 ORFs are included in the ITR of the MPXV genomes.52,53 The ORFs in the ITR take part in the virus-host interactions.48,54

As differences in virulence regarding location have been reported,47 an objective of this study was to reveal associations between virulence and various trends associated with SCUB in MPXV genomes. The results of this research should contribute to an understanding of the coaction between the genome-wide neutral mutational and selection pressures, which, in turn, increases our understanding of viral DNA evolution, as well as the interactions between the viruses and their hosts. Most importantly, the results of SCUB analyses of viral genomes should have important applications in studies related to the genetic engineering of viral genome sequences.20

Materials and Methods

Sequence data

The complete genomes of 13 representative MPXVs (Table 1) were retrieved from the National Center for Biotechnology Information. Details such as accession numbers, the region of isolation, the number of coding sequences (CDSs) selected, and the sizes of the genomes were also provided (Table 1). The integrity of full-length coding sequences without introns was confirmed by checking for the presence of proper initiation and termination codons.55 To avoid sampling errors and stochastic variations, we chose CDSs having more than 300 nucleotides for analysis (Table 1).8 Information regarding the ITRs of the MPXV genomes was obtained from GenBank, and for the calculation of the codon usage in an ITR, the orientation was changed in such a way as to maintain the corresponding amino acid sequences intact and thereby avoid any miscalculation of the codon usage.

Table 1.

Details of examined monkeypox virus strains.

S. no. Strain Accession no. Isolation No. of chosen coding sequences Length
1 Congo-2003-358 DQ011154.1 Congo 158 160 929.0
2 COP-58 AY753185.1 West Africa 152 156 321.0
3 DRC Yandongi-1985 KC257460.1 Congo: Yandongi 157 159 768.0
4 Liberia-1970-184 DQ011156.1 Liberia 161 161 544.0
5 MPXV-WRAIR7-61 AY603973.1 West Africa 152 156 414.0
6 Sierra Leone AY741551.1 West Africa 151 155 874.0
7 Sudan-2005-01 KC257459.1 Sudan: Nuria 169 171 372.0
8 USA-2003-039 DQ011157.1 USA 160 161 013.0
9 USA-2003-044 DQ011153.1 USA 160 161 013.0
10 V79-I-005 HQ857562.1 Zaire 159 160 967.0
11 Zaire-1979-005 (cidofovir resistant) HM172544.1 Zaire 157 156 474.0
12 Zaire-1979-005 DQ011155.1 Zaire 161 161 664.0
13 Zaire-96-I-16 NC_003310.1 Zaire 158 160 944.0

Measures of SCU

The effective number of codons (ENC) is a commonly employed index for measuring SCUB independently of the length of the CDS.56 The ENC values vary from 20 to 61. In any given gene, if only one codon is used to encode each amino acid, the ENC value will be 20 (extreme SCUB). If all synonymous codons of each amino acid are used equally, the ENC value will be 61 (almost no SCUB). The compositions of the G and the C nucleotides were calculated for the first, second, and third codon positions. Expected ENC values were calculated using the GC3 (GC composition at the third codon position) values.56 An ENC versus GC3 plot can be used to distinguish between the 2 major evolutionary forces, the mutational pressure and the translational selection, for the observed SCU patterns by displaying gene groupings along the expected ENC curve. This is true because these 2 major evolutionary forces are the ones that contribute to SCUB. Even though genetic drift can, in some cases, be considered a factor shaping codon usage, the ENC versus GC3 plot will only give an indication of the influences of the mutational pressure and the selection pressure. In this research, ENC values were calculated according to the following equation56:

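$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}$$

where $\bar{F}_2$, $\bar{F}_3$, $\bar{F}_4$, and $\bar{F}_6$ are the average homozygosity values for the 4 different synonymous family types (2-, 3-, 4-, and 6-fold degenerate amino acids) and were estimated using the squared codon frequencies. The average homozygosity for each amino acid was calculated according to the following equation56:

$$F = \frac{n \sum_{i=1}^{k} p_i^{2} - 1}{n - 1}$$

where $n$ is the total number of codons observed for that amino acid, $p_i$ is the frequency of the $i$th synonymous codon, and $k$ is the number of synonymous codons (alleles). The expected ENC versus GC3 curve was plotted using GC3 values ranging from 0% to 100% in intervals of 10% and their corresponding expected ENC values, which, under no selection, can be calculated using the following equation56: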

$$E(\mathrm{ENC}) = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$

where s = GC3.
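The ENC calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the CodonW implementation used in the study: the helper names (`homozygosity`, `enc`, `expected_enc`) are hypothetical, the standard genetic code is assumed, and a sequence lacking any degeneracy class is rejected rather than handled with the published special-case corrections.

```python
from collections import Counter

# Standard genetic code in the conventional TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def homozygosity(counts):
    """Homozygosity F for one amino acid; None when it cannot be estimated."""
    n = sum(counts)
    if n < 2:
        return None
    return (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)


def enc(cds):
    """Effective number of codons for one coding sequence (hypothetical helper)."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    f_values = {}  # degeneracy class -> list of F values
    for codons in families.values():
        if len(codons) == 1:
            continue  # Met and Trp carry no synonymous choice
        f = homozygosity([counts[c] for c in codons])
        if f is not None and f > 0:
            f_values.setdefault(len(codons), []).append(f)
    total = 2.0
    # 9 two-fold, 1 three-fold, 5 four-fold, and 3 six-fold families.
    for degeneracy, n_families in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if degeneracy not in f_values:
            raise ValueError(f"no usable {degeneracy}-fold family in this sequence")
        mean_f = sum(f_values[degeneracy]) / len(f_values[degeneracy])
        total += n_families / mean_f
    return min(total, 61.0)  # values above 61 are capped at 61


def expected_enc(gc3):
    """Expected ENC under no selection; gc3 is a fraction between 0 and 1."""
    return 2 + gc3 + 29 / (gc3 ** 2 + (1 - gc3) ** 2)


# Expected curve at GC3 = 0%, 10%, ..., 100%.
curve = [(g / 10, expected_enc(g / 10)) for g in range(11)]
```

A gene whose observed ENC falls well below `expected_enc` at its GC3 value is one for which forces other than GC composition are likely to matter.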

The relative SCU (RSCU) is another important index for measuring SCUB. It is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons of that particular amino acid were used uniformly.3,12 The RSCU values greater than 1 denote codons used more frequently than their synonymous counterparts, whereas the RSCU values less than 1 represent codons used less frequently; codons with an RSCU value of 1 denote no bias.3
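As an illustration of the RSCU definition, the following Python sketch computes RSCU values for a single coding sequence. It is not the DAMBE routine used in the study; the function name `rscu` is hypothetical, the standard genetic code is assumed, and each amino acid (including the 6-fold ones) is treated as a single synonymous family, which may differ from how a given program groups codons.

```python
from collections import Counter

# Standard genetic code in the conventional TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rscu(cds):
    """Return {codon: RSCU} for every codon of each amino acid seen in cds."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if len(codons) < 2 or total == 0:
            continue  # skip Met, Trp, and amino acids absent from the sequence
        expected = total / len(codons)  # count per codon under uniform usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy sequence: Asp encoded three times by GAT and once by GAC.
example = rscu("GATGATGATGAC")
```

In the toy sequence, GAT is used 3 times out of 4 Asp codons against a uniform expectation of 2, so its RSCU is 1.5 and that of GAC is 0.5.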

The codon adaptation index (CAI) assesses the significance of selection in shaping the observed patterns of the SCU of a gene5 using a reference set of highly expressed genes from a particular species. The CAI indicates the level of gene expression5,10,11 by calculating a score for each gene. The CAI values from 0.75 to 1.0 indicate a high level of gene expression.5 Although the CAI is independent of gene length, the CAI of short genes may be affected by sampling bias.5 We used the Homo sapiens general codon usage table as a reference set because the CAI is a good indicator of viral gene adaptation to the host.5
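The CAI is conventionally computed as the geometric mean of the relative adaptiveness of the codons in a gene, where the relative adaptiveness of a codon is its reference frequency divided by that of the most frequent synonymous codon. The sketch below follows that definition; it is not the ACUA implementation used in the study, the function names are hypothetical, and the reference counts are made-up toy numbers rather than the Homo sapiens codon usage table.

```python
import math


def relative_adaptiveness(reference_counts, families):
    """w = reference count of a codon / count of the best synonymous codon."""
    w = {}
    for codons in families:
        best = max(reference_counts.get(c, 0) for c in codons)
        if best == 0:
            continue  # family absent from the reference set
        for c in codons:
            w[c] = reference_counts.get(c, 0) / best
    return w


def cai(cds, w):
    """Geometric mean of w over the codons of cds that have a positive w."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    logs = [math.log(w[c]) for c in codons if w.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy reference set (illustrative numbers only, NOT real human codon usage).
toy_families = [("GAT", "GAC"), ("AAA", "AAG")]
toy_reference = {"GAT": 40, "GAC": 60, "AAA": 50, "AAG": 100}
toy_w = relative_adaptiveness(toy_reference, toy_families)

best_gene = cai("GACAAG", toy_w)   # uses only the most frequent codons
worse_gene = cai("GATAAA", toy_w)  # uses only the less frequent codons
```

A gene built entirely from the most frequent reference codons scores 1, and any use of less frequent synonyms lowers the score.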

Protein hydrophobicity and aromaticity (ie, frequency of aromatic amino acids such as Phe, Trp, and Tyr) were calculated.57 A correspondence analysis of RSCU (COA-RSCU) has been generally adopted to identify intragenomic variations while avoiding the influence of the amino acid composition.8,11 In a COA-RSCU, each CDS is represented as a 59-dimensional vector,58 wherein each dimension corresponds to the RSCU value of a particular codon.58 The COA-RSCU partitions the total variation in codon usage across 59 orthogonal axes with 41 degrees of freedom.8 The first axis of the COA-RSCU (axis 1) accounts for most of the variation, whereas subsequent axes capture decreasing amounts of variance.8

Putative optimal codons were identified by applying the χ2 test to a 2 × 2 matrix having 1 degree of freedom. We chose 10% of the genes lying on the left and the right extremes of axis 1 of the COA-RSCU to form 2 data sets as axis 1 of the COA-RSCU accounts for most of the variations in the RSCU. The first row of this matrix contains the observed codon frequencies from the 2 data sets, whereas the second row contains the total number of synonymous alternatives of that particular codon.8 Codons whose frequencies of usage were significantly higher (P < .05) in one data set than in the other data set were defined as putative optimal codons.
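The 2 × 2 test described above can be sketched as follows. This is a minimal illustration with hypothetical function names, assuming the uncorrected Pearson χ2 statistic (no continuity correction) and the tabulated critical value of 3.841 for 1 degree of freedom at P = .05. The direction of the comparison (which data set counts as the "high" one) is also an assumption of the sketch.

```python
CHI2_CRITICAL_DF1_P05 = 3.841  # tabulated critical value, 1 degree of freedom


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] without correction."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table sums to zero")
    return n * (a * d - b * c) ** 2 / denominator


def is_putative_optimal(codon_high, others_high, codon_low, others_low):
    """True if the codon is significantly more frequent in the 'high' data set.

    codon_*  : count of the codon in each data set (first row of the matrix)
    others_* : count of its synonymous alternatives (second row of the matrix)
    """
    chi2 = chi_square_2x2(codon_high, codon_low, others_high, others_low)
    share_high = codon_high / (codon_high + others_high)
    share_low = codon_low / (codon_low + others_low)
    return share_high > share_low and chi2 > CHI2_CRITICAL_DF1_P05


# Toy counts (illustrative only): 80 of 100 versus 40 of 100 uses of the codon.
flagged = is_putative_optimal(80, 20, 40, 60)
```

With the toy counts the statistic is about 33.3, well above the critical value, so the codon would be flagged.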

Cluster analysis

A cluster analysis of the RSCU values was performed to reveal the relationship between the SCUB and other factors based on groupings of the codon usage.7 In the cluster analysis, a 13 × 59 matrix, in which rows and columns corresponded to the 13 MPXV strains and the pooled RSCU values of the 59 codon species, respectively, was generated. The MPXVs were clustered on the basis of their RSCU values using unweighted pair-group average clustering and Euclidean distances.
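The clustering step can be illustrated with a small pure-Python sketch of unweighted pair-group average (UPGMA) clustering on Euclidean distances. The function names are hypothetical and the input vectors are toy 2-dimensional points standing in for the 59-dimensional RSCU rows; the software actually used for this step is not reproduced here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def upgma(labels, vectors):
    """Return the merge history as (cluster_a, cluster_b, average_distance)."""
    n = len(labels)
    dist = [[euclidean(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = {i: [i] for i in range(n)}  # cluster id -> member row indices
    names = {i: labels[i] for i in range(n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Average linkage: mean of all pairwise member distances.
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((names[a], names[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        names[next_id] = "(" + names[a] + "," + names[b] + ")"
        next_id += 1
    return merges


# Toy example: A and B are close to each other, C is far from both.
history = upgma(["A", "B", "C"], [[0, 0], [0, 1], [5, 0]])
```

In the toy example A and B merge first at distance 1, and C joins them at the average of its distances to A and B.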

Statistical analysis

The nonparametric Spearman rank correlation was adopted for all correlation analyses between the various codon usage indices and the other parameters as it does not hold any assumptions regarding the distribution of underlying data.8,55 The Mann-Whitney 2-sample test was used to analyze the intergenomic differences in the ENC values. PAST software version 2.12 was used for the Spearman rank correlation analysis.59 CodonW (http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::codonw) was used to compute the values of the ENC, hydrophobicity, and aromaticity.60 MEGA version 5.2.2 was used to calculate the compositions of the nucleotides.61 DAMBE version 5.3.31 was employed to determine the RSCU values,62 and the CAI values were computed using ACUA 1.0.63 The level of significance was taken as .05.
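For readers unfamiliar with the statistic, the Spearman coefficient is the Pearson correlation computed on ranks, with tied values given their average rank. The sketch below is an illustrative pure-Python version with hypothetical function names; it is not the PAST implementation used in the study and it does not compute P values.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation coefficient between two equally long lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("one of the lists is constant")
    return cov / (sx * sy)


# Toy data (illustrative only): two adjacent pairs are swapped in y.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

For the toy data the coefficient is 0.8, which matches the shortcut formula 1 − 6Σd²/[n(n² − 1)] that applies when there are no ties.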

Results

Effect of base composition on SCUB

The overall and the wobble base contents were estimated in all 13 examined MPXV genomes. Overall, the AT content was found to be higher than the GC content. Among the individual nucleotide compositions, the A content was higher than the T, G, and C contents and varied by 35.26 ± 0.053; thus, it was overrepresented in the protein-coding genes (PCGs) of all genomes. In all examined genomes, the C content was observed to be the lowest of all nucleotide contents and to vary by 15.52 ± 0.025; thus, it was underrepresented in the PCGs of all genomes. Moreover, the GC content was observed to vary by 33.74 ± 0.065 in all genomes. Because the base changes that occur at the third site of synonymous codons for a given amino acid are neutral, the third site of a codon is commonly known as “the silent site.” Interestingly, the T3 content was higher than the contents of other silent bases (A3, G3, and C3) and was found to vary by 38.23 ± 0.082; the GC composition at silent sites (GC3) was found to vary by 29.12 ± 0.080.
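The overall and third-position base contents reported above can be computed per coding sequence along the following lines. This is a simplified sketch with a hypothetical function name: it counts every codon of the sequence, whereas a given program may restrict the third-position counts to synonymous codons (excluding Met, Trp, and stop codons), so the values need not match those of MEGA exactly.

```python
def base_composition(cds):
    """Percent A, T, G, C, and GC overall and at the third codon position."""
    cds = cds.upper().replace("U", "T")
    cds = cds[:len(cds) - len(cds) % 3]  # drop an incomplete trailing codon
    if not cds:
        raise ValueError("sequence shorter than one codon")
    third = cds[2::3]  # every third base, starting at codon position 3

    def percent(seq):
        return {b: 100 * seq.count(b) / len(seq) for b in "ATGC"}

    overall, silent = percent(cds), percent(third)
    result = dict(overall)
    result["GC"] = overall["G"] + overall["C"]
    for base in "ATGC":
        result[base + "3"] = silent[base]
    result["GC3"] = silent["G"] + silent["C"]
    return result


# Toy sequence with codons ATG, GCC, TTA (illustrative only).
toy = base_composition("ATGGCCTTA")
```

For the toy sequence the third positions are G, C, and A, so GC3 is two-thirds (about 66.7%) while the overall GC content is four-ninths (about 44.4%).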

A Spearman rank correlation analysis revealed complex correlations between the overall and the silent base compositions, indicating the presence of compositional constraints in all genomes. The existence of positive correlations between homogeneous nucleotide contents and negative correlations between heterogeneous nucleotide contents implies that mutational pressure due to compositional constraints might play a crucial role in shaping the codon usage.64 In the case of viral genomes, the positively correlated heterogeneous contents and the negatively correlated homogeneous contents indicate natural selection by the host.24 In this study, significant positive correlations were found between A and A3, T and T3, G and G3, and C and C3. Most heterogeneous base contents showed significant negative correlations (Table 2). The G3, C3, and GC3 contents were found to have significant positive correlations with the overall GC content. No correlations were observed between G3 and C, T3 and A, and vice versa. These noncorrelations did not reveal any SCUB characteristics. The correlation analyses of nucleotide contents did not reveal the role of natural selection by the host. These results suggest that mutational pressure due to compositional constraints shapes the SCUB in MPXV genomes to a large extent.

Table 2.

Spearman rank correlation analysis between overall and silent base compositions.

Strains Bases A3 T3 G3 C3 GC3
Congo-2003-358 A 0.536** −0.147 −0.408** −0.155 −0.395**
T −0.150 0.537** −0.087 −0.344** −0.319**
G −0.283** −0.070 0.458** 0.074 0.358**
C −0.154 −0.371** 0.098 0.526** 0.458**
GC −0.298** −0.270** 0.381** 0.361** 0.526**
COP-58 A 0.538** −0.148 −0.409** −0.170* −0.396**
T −0.123 0.558** −0.149 −0.324** −0.335**
G −0.293** −0.089 0.505** 0.055 0.362**
C −0.159 −0.334** 0.097 0.508** 0.435**
GC −0.310** −0.267** 0.422** 0.343** 0.526**
DRC Yandongi-1985 A 0.556** −0.171* −0.409** −0.175* −0.392**
T −0.144 0.578** −0.134 −0.337** −0.351**
G −0.283** −0.091 0.487** 0.076 0.366**
C −0.157 −0.346** 0.089 0.538** 0.450**
GC −0.306** −0.278** 0.407** 0.371** 0.539**
Liberia-1970-184 A 0.544** −0.185* −0.356** −0.158 −0.365**
T −0.132 0.553** −0.147 −0.325** −0.338**
G −0.272** −0.093 0.470** 0.060 0.365**
C −0.152 −0.294** 0.067 0.483** 0.405**
GC −0.291** −0.253** 0.389** 0.325** 0.516**
MPXV-WRAIR7-61 A 0.538** −0.151 −0.410** −0.170* −0.394**
T −0.124 0.555** −0.153 −0.319** −0.333**
G −0.293** −0.087 0.508** 0.055 0.363**
C −0.158 −0.333** 0.101 0.506** 0.435**
GC −0.310** −0.264** 0.424** 0.343** 0.525**
Sierra Leone A 0.528** −0.159 −0.386** −0.182 −0.386**
T −0.120 0.584** −0.168* −0.340** −0.364**
G −0.286** −0.079 0.484** 0.059 0.356**
C −0.136 −0.327** 0.085 0.503** 0.426**
GC −0.295** −0.252** 0.401** 0.347** 0.519**
Sudan-2005-01 A 0.524** −0.190* −0.366** −0.132 −0.346**
T −0.107 0.570** −0.131 −0.361** −0.364**
G −0.277** −0.039 0.456** 0.047 0.327**
C −0.165* −0.338** 0.081 0.516** 0.439**
GC −0.318** −0.253** 0.374** 0.359** 0.521**
USA-2003-039 A 0.536** −0.184* −0.362** −0.161* −0.369**
T −0.132 0.569** −0.148 −0.327** −0.339**
G −0.278** −0.117 0.491** 0.059 0.386**
C −0.151 −0.287** 0.063 0.489** 0.398**
GC −0.292** −0.275** 0.407** 0.329** 0.532**
USA-2003-044 A 0.536** −0.184* −0.362** −0.161* −0.369**
T −0.132 0.569** −0.148 −0.327** −0.339**
G −0.278** −0.117 0.491** 0.059 0.386**
C −0.151 −0.287** 0.063 0.489** 0.398**
GC −0.292** −0.275** 0.407** 0.329** 0.532**
V79-I-005 A 0.529** −0.167* −0.391** −0.137 −0.368**
T −0.126 0.573** −0.128 −0.366** −0.362**
G −0.284** −0.089 0.476** 0.063 0.366**
C −0.133 −0.351** 0.073 0.519** 0.439**
GC −0.285** −0.280** 0.384** 0.347** 0.527**
Zaire-1979-005 (cr) A 0.534** −0.139 −0.405** −0.167 −0.390**
T −0.131 0.538** −0.098 −0.347** −0.334**
G −0.296** −0.081 0.465** 0.086 0.365**
C −0.148 −0.366** 0.096 0.522** 0.452**
GC −0.311** −0.281** 0.391** 0.374** 0.540**
Zaire-1979-005 A 0.534** −0.149 −0.407** −0.149 −0.385**
T −0.139 0.544** −0.105 −0.345** −0.334**
G −0.287** −0.092 0.478** 0.074 0.370**
C −0.149 −0.354** 0.088 0.520** 0.444**
GC −0.301** −0.283** 0.397** 0.359** 0.535**
Zaire-96-I-16 A 0.536** −0.167* −0.397** −0.146 −0.381**
T −0.131 0.571** −0.131 −0.356** −0.353**
G −0.293** −0.075 0.473** 0.070 0.362**
C −0.135 −0.351** 0.079 0.518** 0.440**
GC −0.290** −0.270** 0.387** 0.347** 0.524**

Quantification of SCUB

The ENC versus GC3 plots were developed to quantify the SCUB (Figure 1). The ENC values were found to vary by 47.00 ± 0.078. The calculated ENC values of all genes were found to be greater than 35, suggesting a weak codon bias in all examined MPXV genomes. The ENC values were approximately normally distributed, and the Mann-Whitney 2-sample test revealed no significant intergenomic differences in the ENC values (P > .05). In the plots, most of the genes were found to lie on or just below the expected ENC curve, suggesting that the SCUB was shaped mainly by GC compositional constraints. However, a considerable number of genes were grouped far below the expected ENC curve, suggesting that other factors also influenced the SCUB in the MPXV genomes.

Figure 1.

Mutational pressure versus selection pressure in MPXV genomes. ENC versus GC3 plots for (A) Congo-2003-358, (B) COP-58, (C) DRC Yandongi-1985, (D) Liberia-1970-184, (E) MPXV-WRAIR7-61, (F) Sierra Leone, (G) Sudan-2005-01, (H) USA-2003-039, (I) USA-2003-044, (J) V79-I-005, (K) Zaire-1979-005 (cr), (L) Zaire-1979-005, and (M) Zaire-96-I-16. ENC indicates effective number of codons; MPXV, monkeypox viruses.

Neutrality plots65 revealed no significant correlations between GC3 and GC12 (the G and the C contents at the first and the second codon positions) as the slope of the scatterplot approached 0, which is an indication that other major factors, such as selection, also have an influence on the SCUB in the MPXV genomes (Figure 2). The association between purines (A and G) and pyrimidines (C and T) was analyzed using a PR2 bias plot, and the A and the T contents were found to be used more than the C and the G contents (Figure 3). The PR2 bias plots clearly exhibited deviations from Chargaff’s second parity rule66 as most of the genes were localized far from the center of the plot, where A = T and G = C (Figure 3). The PCGs in all analyzed MPXV genomes (Table 1) had CAI values greater than 0.50; this indicated good host adaptation as the CAI values were calculated based on the Homo sapiens general codon usage. Significant positive correlations were found between the ENC and the CAI (P < .05), indicating that the level of gene expression had a large influence on the SCUB. The ENC was also positively correlated with the GC3 values (P < .01) and with the hydrophobicity scores (P < .05), revealing their crucial roles in shaping the SCUB in MPXV genomes.
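The quantities behind the neutrality and PR2 bias plots can be sketched as follows. This is an illustrative simplification with hypothetical function names: the PR2 coordinates are taken over all third codon positions, whereas PR2 analyses are often restricted to 4-fold degenerate codons, and the neutrality slope is an ordinary least-squares fit of GC12 on GC3.

```python
def gc12_and_gc3(cds):
    """Return (GC12, GC3) as fractions for one coding sequence."""
    cds = cds.upper().replace("U", "T")
    cds = cds[:len(cds) - len(cds) % 3]  # drop an incomplete trailing codon
    if not cds:
        raise ValueError("sequence shorter than one codon")

    def gc(seq):
        return (seq.count("G") + seq.count("C")) / len(seq)

    gc1, gc2, gc3 = gc(cds[0::3]), gc(cds[1::3]), gc(cds[2::3])
    return (gc1 + gc2) / 2, gc3


def pr2_point(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)); (0.5, 0.5) means no PR2 bias."""
    cds = cds.upper().replace("U", "T")
    third = cds[:len(cds) - len(cds) % 3][2::3]
    a, t = third.count("A"), third.count("T")
    g, c = third.count("G"), third.count("C")
    if a + t == 0 or g + c == 0:
        raise ValueError("third positions lack A/T or G/C bases")
    return g / (g + c), a / (a + t)


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (GC3 on x, GC12 on y for neutrality)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


# Toy sequences (illustrative only).
toy_gc = gc12_and_gc3("GCAGCT")         # codons GCA, GCT
toy_pr2 = pr2_point("GCAGCTGCGGCC")     # third positions A, T, G, C
toy_slope = ols_slope([0, 1, 2], [1, 3, 5])
```

A neutrality slope near 0 means that GC12 does not track GC3, which is the pattern the text interprets as evidence that selection, and not mutational pressure alone, shapes codon usage.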

Figure 2.

Influence of GC in shaping SCUB in MPXV genomes. Neutrality plots for (A) Congo-2003-358, (B) COP-58, (C) DRC Yandongi-1985, (D) Liberia-1970-184, (E) MPXV-WRAIR7-61, (F) Sierra Leone, (G) Sudan-2005-01, (H) USA-2003-039, (I) USA-2003-044, (J) V79-I-005, (K) Zaire-1979-005 (cr), (L) Zaire-1979-005, and (M) Zaire-96-I-16. MPXV indicates monkeypox viruses; SCUB, synonymous codon usage bias.

Figure 3.

Deviation from parity rule 2 in MPXV genomes. PR2 bias plots for (A) Congo-2003-358, (B) COP-58, (C) DRC Yandongi-1985, (D) Liberia-1970-184, (E) MPXV-WRAIR7-61, (F) Sierra Leone, (G) Sudan-2005-01, (H) USA-2003-039, (I) USA-2003-044, (J) V79-I-005, (K) Zaire-1979-005 (cr), (L) Zaire-1979-005, and (M) Zaire-96-I-16. MPXV indicates monkeypox viruses.

Qualitative evaluation of SCUB

Codons with RSCU values greater than 1.0 are considered to be preferred as such codons are used more often than those with RSCU values less than 1.0.3 In all synonymous amino acid families (6-fold, 4-fold, 3-fold, and 2-fold degenerate amino acids), A/T-ending codons were found to be used more frequently than G/C-ending codons (Table 3). In contrast, human (host) cells use G/C-ending codons more frequently than A/T-ending codons.67,68 AGA, which codes for Arg, is the only A-ending codon preferred in human cells.67,68 In MPXV genomes, GAC (D), GGG (G), GGC (G), CAC (H), ATC (I), AAG (K), CTC (L), CTG (L), AAC (N), CAG (Q), AGG (R), CGC (R), AGC (S), TCC (S), ACC (T), ACG (T), GTG (V), GTC (V), and TAC (Y) were noted to be rare (RSCU < 0.66). In the host genome, the rare codons were reported to be GCG, CGA, AAT, GAT, TGT, CAA, GAA, GGT, CAT, ATA, TTA, AAA, TTT, CCG, TCG, ACG, TAT, and GTA.67,68 Strand-specific codon biases were observed in all MPXV genomes for the amino acid Ile; ie, in positive strands, all strains preferred ATA, whereas in negative strands, all strains preferred ATT (Table 4). The amino acids Arg, Thr, and Val also exhibited strand-specific bias, but not in all strains (Table 4). Interestingly, positive strand–encoded genes preferentially used A-ending codons, whereas negative strand–encoded genes preferred T-ending codons. However, in the negative strand–encoded genes of the DRC Yandongi-1985 and the Sudan-2005-01 strains, the amino acid Val preferred both GTT and GTA.

Table 3.

Overall relative synonymous codon usage values of protein-coding genes in the examined monkeypox virus strains.

AA Codon Strains
Congo-2003-358 COP-58 DRC Yandongi-1985 Liberia-1970-184 MPXV-WRAIR7-61 Sierra Leone Sudan-2005-01 USA-2003-039 USA-2003-044 V79-I-005 Zaire-1979-005 (cr) Zaire-1979-005 Zaire-96-I-16
A GCT 1.405 1.391 1.400 1.403 1.391 1.396 1.420 1.404 1.404 1.418 1.408 1.411 1.411
A GCG 0.690 0.693 0.689 0.689 0.693 0.689 0.680 0.687 0.687 0.692 0.675 0.686 0.684
A GCC 0.634 0.650 0.632 0.641 0.650 0.648 0.630 0.648 0.648 0.625 0.637 0.631 0.631
A GCA 1.271 1.266 1.279 1.267 1.266 1.266 1.270 1.260 1.260 1.264 1.280 1.272 1.275
C TGT 1.588 1.603 1.591 1.597 1.603 1.603 1.580 1.603 1.603 1.589 1.590 1.591 1.588
C TGC 0.412 0.397 0.409 0.403 0.397 0.397 0.420 0.397 0.397 0.411 0.410 0.409 0.412
D GAT 1.530 1.527 1.524 1.524 1.526 1.524 1.520 1.526 1.526 1.522 1.528 1.527 1.524
D GAC 0.470 0.473 0.476 0.476 0.474 0.476 0.480 0.474 0.474 0.478 0.472 0.473 0.476
E GAG 0.544 0.540 0.545 0.543 0.541 0.540 0.550 0.543 0.543 0.546 0.542 0.543 0.546
E GAA 1.456 1.460 1.455 1.457 1.459 1.460 1.450 1.457 1.457 1.454 1.458 1.457 1.454
F TTT 1.425 1.427 1.429 1.422 1.426 1.425 1.420 1.424 1.424 1.426 1.426 1.423 1.423
F TTC 0.575 0.573 0.571 0.578 0.574 0.575 0.580 0.576 0.576 0.574 0.574 0.577 0.577
G GGT 1.303 1.340 1.302 1.341 1.340 1.337 1.300 1.340 1.340 1.288 1.321 1.306 1.304
G GGG 0.297 0.290 0.298 0.287 0.290 0.290 0.300 0.287 0.287 0.301 0.293 0.299 0.298
G GGC 0.329 0.307 0.328 0.303 0.307 0.304 0.330 0.304 0.304 0.337 0.331 0.336 0.337
G GGA 2.071 2.064 2.071 2.069 2.064 2.069 2.070 2.069 2.069 2.074 2.056 2.060 2.060
H CAC 0.512 0.517 0.515 0.513 0.517 0.514 0.530 0.509 0.509 0.513 0.516 0.516 0.514
H CAT 1.488 1.483 1.485 1.487 1.483 1.486 1.470 1.491 1.491 1.487 1.484 1.484 1.486
I ATT 1.217 1.217 1.220 1.217 1.218 1.213 1.200 1.216 1.216 1.219 1.221 1.217 1.219
I ATA 1.209 1.221 1.209 1.218 1.222 1.220 1.220 1.219 1.219 1.208 1.205 1.210 1.207
I ATC 0.573 0.562 0.572 0.565 0.560 0.567 0.580 0.566 0.566 0.573 0.573 0.573 0.574
K AAA 1.381 1.386 1.381 1.384 1.387 1.387 1.380 1.383 1.383 1.382 1.383 1.383 1.381
K AAG 0.619 0.614 0.619 0.616 0.613 0.613 0.620 0.617 0.617 0.618 0.617 0.617 0.619
L CTA 1.689 1.704 1.687 1.692 1.704 1.705 1.260 1.686 1.686 1.685 1.679 1.685 1.683
L CTC 0.557 0.553 0.563 0.558 0.553 0.550 0.420 0.561 0.561 0.557 0.562 0.556 0.557
L CTG 0.660 0.649 0.651 0.653 0.649 0.650 0.480 0.653 0.653 0.658 0.649 0.657 0.662
L CTT 1.094 1.093 1.099 1.097 1.093 1.095 0.800 1.099 1.099 1.100 1.111 1.102 1.098
L TTA 1.192 1.206 1.195 1.197 1.207 1.208 1.820 1.199 1.199 1.198 1.204 1.197 1.196
L TTG 0.808 0.794 0.805 0.803 0.793 0.792 1.210 0.801 0.801 0.802 0.796 0.803 0.804
N AAC 0.580 0.578 0.580 0.576 0.577 0.577 0.580 0.576 0.576 0.579 0.587 0.581 0.580
N AAT 1.420 1.422 1.420 1.424 1.423 1.423 1.420 1.424 1.424 1.421 1.413 1.419 1.420
P CCA 1.505 1.511 1.493 1.522 1.509 1.516 1.510 1.511 1.511 1.494 1.502 1.500 1.496
P CCC 0.488 0.471 0.487 0.474 0.471 0.464 0.490 0.478 0.478 0.488 0.488 0.488 0.489
P CCT 1.346 1.340 1.347 1.342 1.340 1.341 1.320 1.347 1.347 1.350 1.350 1.344 1.348
P CCG 0.662 0.678 0.673 0.662 0.680 0.678 0.680 0.664 0.664 0.669 0.660 0.668 0.667
Q CAA 1.458 1.450 1.452 1.450 1.450 1.451 1.450 1.457 1.457 1.456 1.454 1.458 1.460
Q CAG 0.542 0.550 0.548 0.550 0.550 0.549 0.550 0.543 0.543 0.544 0.546 0.542 0.540
R AGA 1.705 1.707 1.706 1.702 1.707 1.707 3.240 1.701 1.701 1.708 1.711 1.708 1.708
R AGG 0.295 0.293 0.294 0.298 0.293 0.293 0.590 0.299 0.299 0.292 0.289 0.292 0.292
R CGA 1.474 1.510 1.478 1.478 1.510 1.508 0.790 1.476 1.476 1.485 1.496 1.490 1.472
R CGC 0.477 0.480 0.483 0.483 0.480 0.479 0.260 0.473 0.473 0.474 0.459 0.474 0.484
R CGG 0.409 0.403 0.409 0.401 0.403 0.403 0.220 0.405 0.405 0.406 0.429 0.411 0.412
R CGT 1.640 1.607 1.631 1.638 1.607 1.610 0.900 1.645 1.645 1.635 1.616 1.625 1.632
S AGC 0.498 0.499 0.495 0.497 0.500 0.498 0.410 0.499 0.499 0.500 0.496 0.497 0.501
S AGT 1.502 1.501 1.505 1.503 1.500 1.502 1.230 1.501 1.501 1.500 1.504 1.503 1.499
S TCA 1.123 1.112 1.118 1.134 1.113 1.112 1.240 1.128 1.128 1.118 1.122 1.123 1.122
S TCC 0.627 0.629 0.631 0.621 0.628 0.629 0.680 0.617 0.617 0.624 0.630 0.625 0.622
S TCG 0.508 0.512 0.508 0.509 0.512 0.513 0.560 0.515 0.515 0.512 0.517 0.513 0.507
S TCT 1.741 1.747 1.743 1.736 1.747 1.746 1.870 1.740 1.740 1.746 1.731 1.740 1.749
T ACC 0.550 0.572 0.561 0.566 0.572 0.569 0.580 0.566 0.566 0.543 0.556 0.555 0.558
T ACA 1.456 1.428 1.431 1.434 1.428 1.424 1.450 1.434 1.434 1.429 1.452 1.448 1.434
T ACG 0.541 0.545 0.551 0.540 0.545 0.547 0.550 0.543 0.543 0.555 0.545 0.546 0.548
T ACT 1.453 1.455 1.457 1.460 1.455 1.460 1.420 1.458 1.458 1.473 1.447 1.452 1.460
V GTT 1.377 1.371 1.372 1.371 1.371 1.371 1.370 1.373 1.373 1.370 1.376 1.374 1.374
V GTG 0.589 0.592 0.589 0.594 0.592 0.596 0.600 0.596 0.596 0.588 0.584 0.587 0.589
V GTC 0.531 0.522 0.534 0.533 0.522 0.523 0.540 0.528 0.528 0.535 0.531 0.533 0.533
V GTA 1.502 1.515 1.505 1.503 1.515 1.510 1.490 1.504 1.504 1.507 1.509 1.505 1.504
Y TAC 0.555 0.571 0.554 0.573 0.576 0.561 0.570 0.571 0.571 0.556 0.555 0.553 0.556
Y TAT 1.445 1.429 1.446 1.427 1.424 1.439 1.430 1.429 1.429 1.444 1.445 1.447 1.444
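
The RSCU values tabulated above follow the usual definition: the observed count of a codon divided by the mean count of all codons in its synonymous family. This is why the AAA and AAG values in each column sum to 2. The sketch below illustrates that definition only; the function name, the input format, and the toy counts are assumptions for illustration and are not taken from the study.

```python
def rscu(codon_counts, families):
    """Relative synonymous codon usage for each codon.

    codon_counts: dict mapping codon -> observed count (hypothetical input).
    families: dict mapping amino acid -> list of its synonymous codons.
    """
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined, so skip it
        mean = total / len(codons)  # expected count under uniform usage
        for c in codons:
            values[c] = codon_counts.get(c, 0) / mean
    return values


# Toy counts for lysine (illustrative, not data from the paper).
example = rscu({"AAA": 69, "AAG": 31}, {"K": ["AAA", "AAG"]})
```

A value above 1 marks a codon used more often than expected under uniform usage, and a value below 1 marks one used less often.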

Table 4.

Codons exhibiting strand-specific bias in examined monkeypox virus genomes.

AA Strands Congo-2003-358 COP-58 DRC Yandongi-1985 Liberia-1970-184 MPXV-WRAIR7-61 Sierra Leone Sudan-2005-01 USA-2003-039 USA-2003-044 V79-I-005 Zaire-1979-005 Zaire-1979-005 Zaire-96-I-16
I All ATT ATA ATT ATA ATA ATA ATA ATA ATA ATT ATT ATT ATT
+ ATA ATA ATA ATA ATA ATA ATA ATA ATA ATA ATA ATA ATA
− ATT ATT ATT ATT ATT ATT ATT ATT ATT ATT ATT ATT ATT
R All AGA AGA AGA AGA AGA AGA AGA AGA AGA
+ AGA AGA AGA AGA AGA AGA AGA AGA AGA
− CGT CGT CGT CGT CGT CGT CGT CGT CGT
T All ACA ACT ACT ACA ACA ACT
+ ACA ACA ACA ACA ACA ACA
− ACT ACT ACT ACT ACT ACT
V All GTA GTA GTA GTA GTA GTA GTA GTA GTA
+ GTA GTA GTA GTA GTA GTA GTA GTA GTA
− GTT GTT and GTA GTT GTT and GTA GTT GTT GTT GTT GTT

The dinucleotide frequency analysis demonstrated that AT was overrepresented in all genomes, whereas GC was underrepresented. The ρ values of the dinucleotides were calculated as the ratio of the observed to the expected dinucleotide frequency and, for all dinucleotides except GC, were found to be close to 1 in all genomes. The most biased dinucleotides were AT, GA, and TC (ρAT, ρGA, and ρTC). The χ2 test revealed that the dinucleotide frequencies were not randomly distributed (P < .05).
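
The observed-to-expected ratio described above can be sketched as follows. This is a minimal illustration that assumes overlapping dinucleotides on a single strand, an expected frequency equal to the product of the two mononucleotide frequencies, and a sequence containing only A, C, G, and T. The function name is hypothetical, and the study may have used a different convention.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return rho_xy = f_xy / (f_x * f_y) for each dinucleotide present in seq."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)                                 # mononucleotide counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))    # overlapping pairs
    total_di = n - 1
    rho = {}
    for pair, count in di.items():
        x, y = pair
        expected = (mono[x] / n) * (mono[y] / n)        # f_x * f_y
        rho[pair] = (count / total_di) / expected       # observed / expected
    return rho


# Toy sequence (illustrative only).
ratios = dinucleotide_odds_ratios("ATATATAT")
```

Under this convention, a ratio well above 1 indicates overrepresentation and a ratio well below 1 indicates underrepresentation.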

Putative optimal codons were chosen based on the χ2 analysis of the 2 data sets formed by selecting 10% of the genes located at the 2 extremes of COA axis 1. All putative optimal codons were found to end in A/T (Table 5). The SCUBs of strains having threshold fitness or “good fitness”24 were hypothesized to be shaped by natural selection from the host.24 However, the presence of A/T-ending putative optimal codons in the MPXV genomes, as found in this study, can be explained largely by the high AT content of the respective genomes. Natural selection by the host, if it existed, would have resulted in codon usage patterns in which the preferred codons could end in any nucleotide rather than almost exclusively in A/T.24
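
One way to picture the χ2 step is a 2 × 2 contingency test per codon, comparing its count against its synonymous alternatives in the two extreme gene sets. The sketch below is an assumption-laden illustration: the labels "high" and "low" for the two extremes, the function names, and the 3.841 cutoff (the χ2 critical value for 1 degree of freedom at P = .05) are choices made here, not details confirmed by the study.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin carries no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def putative_optimal_codons(high_counts, low_counts, families, critical=3.841):
    """Codons significantly enriched in the 'high' set relative to the 'low' set."""
    optimal = []
    for aa, codons in families.items():
        high_total = sum(high_counts.get(c, 0) for c in codons)
        low_total = sum(low_counts.get(c, 0) for c in codons)
        if high_total == 0 or low_total == 0:
            continue
        for codon in codons:
            a = high_counts.get(codon, 0)   # this codon, high set
            c = low_counts.get(codon, 0)    # this codon, low set
            b = high_total - a              # synonymous alternatives, high set
            d = low_total - c               # synonymous alternatives, low set
            enriched = a / high_total > c / low_total
            if enriched and chi_square_2x2(a, b, c, d) > critical:
                optimal.append((aa, codon))
    return optimal


# Toy counts for lysine (illustrative, not data from the paper).
families = {"K": ["AAA", "AAG"]}
result = putative_optimal_codons({"AAA": 90, "AAG": 10},
                                 {"AAA": 50, "AAG": 50}, families)
```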

Table 5.

Identified putative optimal codons in examined monkeypox virus genomes.

S. no. Congo-2003-358 COP-58 DRC Yandongi-1985 Liberia-1970-184 MPXV-WRAIR7-61 Sierra Leone Sudan-2005-01 USA-2003-039 USA-2003-044 V79-I-005 Zaire-1979-005 Zaire-1979-005 Zaire-96-I-16
1 A (GCT) (GCA) C (TGT) A (GCA) F (TTT) C (TGT) C (TGT) A (GCT) (GCA) D (GAT) D (GAT) A (GCA) A (GCA) A (GCA) A (GCA)
2 C (TGT) D (GAT) C (TGT) G (GGA) D (GAT) D (GAT) F (TTT) F (TTT) F (TTT) F (TTT) G (GGA) F (TTT) C (TGT)
3 D (GAT) E (GAA) F (TTT) I (ATA) F (TTT) F (TTT) G (GGT) (GGA) G (GGA) G (GGA) G (GGA) I (ATA) G (GGA) F (TTT)
4 G (GGA) F (TTT) G (GGT) (GGA) L (CTA) G (GGT) (GGA) G (GGT) (GGA) I (ATA) I (ATA) I (ATA) I (ATA) L (CTA) I (ATA) G (GGT) (GGA)
5 Y (TAT) G (GGT) (GGA) I (ATA) P (CCT) T (ACA) P (CCT) L (CTA) L (CTA) L (CTA) L (CTA) Y (TAT) L (CTA) V (GTT)
6 T (ACA) L (CTA) T (ACA) V (GTT) T (ACA) T (ACA) T (ACA) Y (TAT) Y (TAT) Y (TAT)
7 V (GTT) Y (TAT) Y (TAT) Y (TAT) V (GTT) Y (TAT)
8 Y (TAT) Y (TAT)

Various factors influencing SCUB

The COA partitioned the total number of SCU variations into 59 axes. Among the 59 axes, axes 1 to 5 accounted for approximately 10.42%, 8.43%, 7.13%, 5.66%, and 4.55% of the total SCU variations, respectively (Supplementary Figure 1). In all the strains isolated from various regions of Central Africa, E3 and GC3 had a high positive correlation with axis 1 (P < .01). The index indicating the level of gene expression (ie, CAI) had a higher positive correlation with axis 1 (P < .01) in all strains than the other proposed gene expression index, ENC, did (Table 6). The lengths of the coding sequences were weakly correlated with axis 1 for Central African strains such as V79-I-005 and Zaire-1979-005 (cr) (P < .05). The T3 content exhibited a significant negative correlation with axis 1 (P < .01) in all Central African strains (Table 6).
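
The correlations reported here are Spearman rank correlations between per-gene axis coordinates and per-gene indices. A minimal, dependency-free sketch of the statistic is given below. It uses average ranks for ties and the Pearson formula on the ranks. The function names and the toy inputs are hypothetical, and no significance test is included.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over a run of tied values
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5       # undefined if either input is constant


# Toy example: axis 1 coordinates versus GC3 for four hypothetical genes.
rho = spearman_rho([0.1, 0.4, 0.2, 0.9], [10, 40, 20, 90])
```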

Table 6.

Spearman rank correlation analysis between various correspondence analysis axes and important codon usage indices.

Strains Axes A3 T3 G3 C3 GC GC3 ENC CAI Gravy Aromaticity Length
Congo-2003-358 Axis 1 −0.123 −0.337** 0.094 0.457** 0.210** 0.418** 0.167* 0.244** −0.100 −0.126 0.134
Axis 2 −0.343** 0.054 0.155 0.236** 0.244** 0.299** 0.041 0.351** −0.010 −0.063 0.052
Axis 3 −0.198** −0.142 0.258** 0.119 0.048 0.282** 0.120 0.165* −0.019 0.082 0.102
Axis 4 0.340** −0.093 −0.294** −0.145 −0.218** −0.257** −0.283** −0.117 0.018 −0.001 −0.062
Axis 5 −0.069 0.083 −0.027 0.136 0.071 0.033 −0.043 0.098 0.120 0.027 −0.060
COP-58 Axis 1 −0.257** −0.341* 0.214** 0.514** 0.291** 0.532** 0.237** 0.317** −0.087 −0.103 0.135
Axis 2 −0.326** 0.060 0.173* 0.201** 0.194* 0.269** 0.105 0.310** −0.001 −0.055 0.137
Axis 3 −0.119 −0.110 0.166* 0.053 −0.040 0.159 0.091 0.161* −0.072 0.091 0.170*
Axis 4 0.180* 0.093 −0.247** −0.040 −0.063 −0.200** −0.160* 0.026 0.077 −0.016 −0.002
Axis 5 −0.345** −0.118 0.244** 0.399** 0.356** 0.438** 0.318** 0.286** 0.175* −0.056 0.003
DRC Yandongi-1985 Axis 1 −0.120 −0.362* 0.109 0.447** 0.218** 0.431** 0.157 0.258** −0.105 −0.123 0.116
Axis 2 −0.334** 0.078 0.142 0.229** 0.238** 0.275** 0.035 0.365** −0.041 −0.072 0.101
Axis 3 −0.153 −0.087 0.165* 0.053 −0.001 0.183* 0.074 0.169* −0.048 0.124 0.150
Axis 4 0.398** −0.026 −0.364** −0.200** −0.282** −0.354** −0.293** −0.173* 0.021 −0.014 −0.033
Axis 5 0.094 −0.115 −0.040 −0.054 −0.070 −0.033 −0.006 −0.062 −0.126 −0.065 0.014
Liberia-1970-184 Axis 1 −0.213** −0.329* 0.214** 0.433** 0.248** 0.486** 0.179* 0.348** −0.165* −0.104 0.110
Axis 2 −0.322** 0.051 0.183* 0.191* 0.213** 0.285** 0.024 0.321** −0.002 −0.066 0.043
Axis 3 −0.025 −0.171* 0.127 0.025 −0.030 0.128 0.051 0.084 −0.097 0.054 0.107
Axis 4 0.199* −0.047 −0.022 −0.229** −0.204** −0.164* −0.090 −0.342** −0.092 0.011 0.103
Axis 5 0.433** 0.105 −0.283** −0.366** −0.382** −0.465** −0.298** −0.339** 0.038 0.056 0.036
MPXV-WRAIR7-61 Axis 1 −0.256** −0.341** 0.213** 0.515* 0.291* 0.532* 0.237* 0.318* −0.091 −0.103 0.135
Axis 2 −0.327** 0.058 0.168* 0.199* 0.195* 0.268* 0.106 0.311* −0.002 −0.054 0.137
Axis 3 −0.118 −0.108 0.161* 0.052 −0.040 0.156 0.090 0.163* −0.071 0.091 0.168*
Axis 4 0.173* 0.091 −0.247** −0.031 −0.052 −0.193* −0.154 0.034 0.073 −0.015 −0.003
Axis 5 −0.347** −0.122 0.244** 0.402* 0.355* 0.440* 0.321* 0.286* 0.170* −0.057 0.010
Sierra Leone Axis 1 −0.242* −0.334* 0.205** 0.509** 0.271** 0.528** 0.217** 0.319** −0.083 −0.093 0.128
Axis 2 −0.312* 0.067 0.176* 0.179* 0.198* 0.256** 0.083 0.305** −0.007 −0.056 0.117
Axis 3 −0.131* −0.096 0.176* 0.050 −0.049 0.161* 0.082 0.153 −0.044 0.107 0.161*
Axis 4 0.061 0.077 −0.158 0.058 0.026 −0.084 −0.072 0.092 0.098 0.007 −0.006
Axis 5 −0.403* −0.128 0.244** 0.463** 0.393** 0.490** 0.333** 0.363** 0.143 −0.099 −0.030
Sudan-2005-01 Axis 1 −0.132* −0.326** 0.115 0.426** 0.202** 0.414** 0.187* 0.281** −0.126 −0.076 0.125
Axis 2 −0.313** 0.073 0.116 0.226** 0.196* 0.266** 0.036 0.290** −0.042 −0.026 0.081
Axis 3 −0.178* −0.043 0.165* 0.065 0.014 0.169* 0.072 0.287** −0.090 0.077 0.084
Axis 4 −0.071 0.003 0.216** −0.063 −0.056 0.083 0.173* −0.250** 0.018 0.078 0.043
Axis 5 0.295** −0.022 −0.179* −0.250** −0.351** −0.272** −0.159 −0.334** −0.060 0.073 0.018
USA-2003-039 Axis 1 −0.213** −0.325** 0.203** 0.466** 0.272** 0.494** 0.199* 0.326** −0.142 −0.101 0.131
Axis 2 −0.341** 0.018 0.224** 0.213** 0.231** 0.333** 0.075 0.337** 0.022 −0.050 0.018
Axis 3 −0.039 −0.223** 0.159 0.055 0.028 0.168* 0.110 0.111 −0.105 0.047 0.114
Axis 4 0.333** 0.157 −0.252** −0.308** −0.345** −0.419** −0.309** −0.226** 0.050 0.064 0.024
Axis 5 −0.183* 0.026 0.015 0.218** 0.177* 0.153 0.112 0.305** 0.092 0.019 −0.093
USA-2003-044 Axis 1 −0.213** −0.325** 0.203** 0.466** 0.272** 0.494** 0.199* 0.326** −0.142 −0.101 0.131
Axis 2 −0.341** 0.018 0.224** 0.213** 0.231** 0.333** 0.075 0.337** 0.022 −0.050 0.018
Axis 3 −0.039 −0.223** 0.159 0.055 0.028 0.168* 0.110 0.111 −0.105 0.047 0.114
Axis 4 0.333** 0.157 −0.252** −0.308** −0.345** −0.419** −0.309** −0.226** 0.050 0.064 0.024
Axis 5 −0.183* 0.026 0.015 0.218** 0.177* 0.153 0.112 0.305** 0.092 0.019 −0.093
V79-I-005 Axis 1 −0.113 −0.344** 0.109 0.419** 0.193* 0.395** 0.190* 0.229** −0.110 −0.105 0.165*
Axis 2 −0.345** 0.022 0.168* 0.249** 0.256** 0.320** 0.019 0.402** −0.034 −0.092 0.032
Axis 3 −0.123 −0.108 0.158 0.038 −0.022 0.168* 0.054 0.159* −0.055 0.125 0.152
Axis 4 0.303** 0.134 −0.271** −0.242** −0.245** −0.370** −0.266** −0.201** 0.089 0.060 −0.005
Axis 5 −0.272** −0.106 0.185* 0.333** 0.331** 0.350** 0.194* 0.319** 0.146 −0.082 −0.099
Zaire-1979-005 Axis 1 −0.106 −0.365** 0.123 0.432** 0.223** 0.414** 0.224** 0.243** −0.117 −0.126 0.151
Axis 2 −0.371** 0.032 0.184* 0.268** 0.266** 0.336** 0.048 0.398** −0.033 −0.079 0.060
Axis 3 −0.153 −0.146 0.212** 0.082 0.010 0.220** 0.112 0.173* −0.054 0.125 0.144
Axis 4 0.351** −0.076 −0.296** −0.169* −0.220** −0.277** −0.287** −0.132 0.034 −0.003 −0.043
Axis 5 0.017 0.173* −0.055 −0.054 −0.022 −0.120 −0.090 −0.027 0.094 0.020 0.012
Zaire-1979-005 Axis 1 −0.130 −0.327** 0.116 0.416** 0.207** 0.405** 0.210* 0.245** −0.113 −0.114 0.166*
Axis 2 −0.368** 0.028 0.198* 0.258** 0.273** 0.338** 0.042 0.389** −0.023 −0.079 0.065
Axis 3 −0.134 −0.105 0.162* 0.053 −0.028 0.176* 0.064 0.162* −0.062 0.126 0.112
Axis 4 0.372** −0.083 −0.316** −0.170* −0.220** −0.290** −0.286** −0.153 0.026 −0.010 −0.038
Axis 5 −0.039 −0.184* 0.076 0.086 0.046 0.155 0.100 0.051 −0.105 −0.034 −0.018
Zaire-96-I-16 Axis 1 −0.133 −0.357** 0.123 0.462** 0.220** 0.437** 0.188* 0.248** −0.113 −0.106 0.152
Axis 2 −0.342** 0.057 0.157* 0.211** 0.214** 0.292** 0.033 0.400** −0.041 −0.049 0.070
Axis 3 −0.054 −0.157 0.135 0.018 −0.042 0.141 0.056 0.114 −0.093 0.123 0.154
Axis 4 −0.272** −0.107 0.262** 0.181* 0.222** 0.321** 0.241** 0.161* −0.105 −0.063 0.014
Axis 5 0.286** 0.149 −0.171* −0.402** −0.368** −0.394* −0.221** −0.342** −0.147 0.088 0.099

In strains isolated from West Africa, the A3 content was highly negatively correlated with axis 1, whereas it was not correlated with axis 1 in strains from Central Africa. A high positive correlation was observed between axis 1 and the G3 content (P < .01) for all West African strains. Similarly, G3 correlated positively with axis 1 in strains isolated from the United States. However, no correlation between G3 and axis 1 was observed in the Central African and North African strains (Table 6). The CAI and the ENC also correlated highly with axis 1 in strains from West Africa and the United States (Table 6). That GC3 was significantly correlated with the first principal axis (ie, axis 1) in all strains strongly suggests that nucleotide compositional constraints play an important role in shaping the SCUB across all MPXV genomes. Furthermore, the high positive correlation with CAI (P < .01) suggests that the level of gene expression might also influence the SCUB across the examined MPXV genomes.
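
The third-position indices discussed above (A3, T3, G3, C3, and GC3) can be computed per coding sequence along the following lines. This simplified sketch counts every complete codon in frame. Some tools restrict these indices to synonymous codons only, and the study does not state which convention it used here, so the function below is an assumption for illustration.

```python
def third_position_content(cds):
    """Fractions of A, T, G, C (and G+C) at the third position of each codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3            # drop any trailing partial codon
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    total = len(thirds)
    if total == 0:
        raise ValueError("sequence shorter than one codon")
    freq = {base + "3": thirds.count(base) / total for base in "ATGC"}
    freq["GC3"] = freq["G3"] + freq["C3"]
    return freq


# Toy sequence ATG AAA TTT GGC (illustrative only).
content = third_position_content("ATGAAATTTGGC")
```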

A correlation analysis between the dinucleotide content and the various COA axes did not reveal any true SCUB features, although some correlations did exist (Table 7). A cluster analysis of the pooled RSCU values of the PCG for each strain revealed 2 major clusters (Figure 4). More virulent Central African strains formed the upper cluster, and less virulent West African strains formed the lower cluster, indicating the presence of SCUB variations based on epidemic region and virulence.
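
To make the clustering step concrete, the sketch below groups strains by their RSCU profiles using agglomerative clustering. The distance measure (Euclidean), the linkage rule (average linkage), the function names, and the toy profiles are all assumptions made here; the method actually used for Figure 4 is not specified in this section.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length RSCU vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles, n_clusters=2):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [euclidean(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j]]
                d = sum(dists) / len(dists)     # mean pairwise distance
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical two-codon RSCU profiles for four made-up strains.
toy_profiles = {
    "s1": [1.70, 0.55],
    "s2": [1.69, 0.56],
    "s3": [1.20, 0.80],
    "s4": [1.22, 0.79],
}
groups = average_linkage(toy_profiles)
```

In practice each profile would hold the RSCU values of all 59 synonymous codons pooled over the protein-coding genes of a strain.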

Table 7.

Spearman correlation analysis between various correspondence analysis axes and dinucleotide contents.

Strains Axes AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
Congo-2003-358 Axis 1 0.117 0.164* 0.137 0.112 0.154 0.175* 0.183* 0.144 0.183* 0.176* 0.076 0.117 0.076 0.155 0.156 0.026
Axis 2 0.032 0.045 0.017 −0.013 0.074 0.079 0.007 0.148 0.050 0.141 0.081 0.069 −0.057 0.103 0.162* −0.028
Axis 3 0.088 0.086 0.041 0.094 0.057 0.102 0.053 0.132 0.088 0.033 0.142 0.107 0.078 0.113 0.139 0.088
Axis 4 −0.016 −0.010 −0.046 −0.027 −0.016 −0.077 −0.107 −0.099 −0.054 −0.168* −0.115 −0.069 −0.035 −0.083 −0.096 −0.051
Axis 5 −0.075 −0.038 −0.089 −0.053 −0.003 −0.104 −0.065 −0.090 −0.083 −0.018 −0.043 −0.025 −0.078 −0.069 −0.014 −0.092
COP-58 Axis 1 0.109 0.156 0.104 0.098 0.152 0.175* 0.186* 0.169* 0.177* 0.210** 0.082 0.128 0.058 0.164* 0.197* 0.013
Axis 2 0.105 0.136 0.076 0.076 0.161* 0.150 0.074 0.211* 0.126 0.176* 0.172* 0.149 0.034 0.175* 0.229** 0.042
Axis 3 0.172* 0.176* 0.106 0.175* 0.157 0.128 0.104 0.156 0.148 0.060 0.210** 0.181* 0.160* 0.161* 0.194* 0.144
Axis 4 0.009 0.043 0.015 0.010 0.068 −0.034 −0.036 −0.035 −0.007 −0.014 −0.040 0.004 0.004 −0.033 −0.002 −0.026
Axis 5 −0.081 0.038 −0.023 −0.028 0.088 0.088 0.048 0.008 0.007 0.136 0.086 0.048 −0.063 0.021 0.089 −0.088
DRC Yandongi-1985 Axis 1 −0.109 −0.118 −0.080 −0.167* −0.118 −0.123 −0.144 −0.093 −0.071 −0.055 −0.066 −0.116 −0.168* −0.128 −0.067 −0.173*
Axis 2 −0.017 −0.002 −0.018 −0.013 0.031 0.035 −0.024 −0.005 −0.005 0.039 0.035 0.041 −0.040 −0.005 0.068 −0.030
Axis 3 0.001 −0.020 0.008 −0.008 0.014 0.022 0.015 0.030 −0.011 0.038 0.046 0.027 −0.026 0.045 0.018 0.005
Axis 4 −0.117 −0.124 −0.117 −0.135 −0.132 −0.125 −0.138 −0.110 −0.107 −0.155 −0.128 −0.165* −0.142 −0.123 −0.156 −0.143
Axis 5 0.101 0.046 0.042 0.043 0.029 0.071 0.051 0.033 0.033 −0.003 0.031 0.070 0.064 0.050 0.013 0.008
Liberia-1970-184 Axis 1 0.100 0.129 0.102 0.088 0.135 0.144 0.166* 0.132 0.174* 0.188* 0.064 0.086 0.037 0.134 0.156 −0.011
Axis 2 0.022 0.053 −0.007 −0.014 0.077 0.071 −0.009 0.135 0.046 0.090 0.082 0.068 −0.057 0.101 0.155 −0.039
Axis 3 0.128 0.114 0.071 0.103 0.076 0.074 0.063 0.108 0.103 0.008 0.130 0.107 0.103 0.104 0.115 0.095
Axis 4 0.144 0.071 0.103 0.105 −0.012 0.065 0.114 0.124 0.104 0.012 0.042 0.050 0.145 0.092 0.021 0.150
Axis 5 0.094 0.044 0.032 0.088 0.015 −0.103 −0.045 −0.016 0.030 −0.113 −0.081 −0.006 0.110 −0.021 −0.038 0.070
MPXV-WRAIR7-61 Axis 1 0.109 0.157 0.105 0.099 0.153 0.176* 0.186* 0.169* 0.178* 0.211** 0.083 0.128 0.058 0.165* 0.197* 0.012
Axis 2 0.105 0.137 0.076 0.078 0.162* 0.150 0.074 0.211* 0.126 0.176* 0.172* 0.149 0.034 0.175* 0.230* 0.042
Axis 3 0.171* 0.175* 0.104 0.174 0.156 0.127 0.103 0.154 0.147 0.059 0.209 0.179* 0.160* 0.160* 0.193* 0.143
Axis 4 0.005 0.044 0.013 0.009 0.068 −0.033 −0.035 −0.037 −0.008 −0.011 −0.038 0.004 0.001 −0.036 −0.001 −0.031
Axis 5 −0.076 0.044 −0.016 −0.022 0.092 0.095 0.054 0.014 0.014 0.142 0.090 0.052 −0.056 0.027 0.094 −0.082
Sierra Leone Axis 1 0.102 0.147 0.098 0.095 0.145 0.158 0.179* 0.158 0.174* 0.198* 0.066 0.117 0.053 0.157 0.189* 0.011
Axis 2 0.084 0.117 0.066 0.055 0.135 0.139 0.057 0.196* 0.115 0.158 0.144 0.122 0.014 0.154 0.209** 0.028
Axis 3 0.162* 0.166* 0.091 0.168* 0.146 0.117 0.089 0.153 0.142 0.042 0.201** 0.171* 0.150 0.157 0.188* 0.142
Axis 4 −0.015 0.039 −0.003 0.001 0.069 −0.021 −0.022 −0.043 −0.011 0.013 −0.028 0.011 −0.015 −0.041 0.015 −0.046
Axis 5 −0.115 0.003 −0.040 −0.066 0.057 0.103 0.040 −0.014 −0.021 0.124 0.074 0.027 −0.104 0.007 0.066 −0.122
Sudan-2005-01 Axis 1 0.130 0.148 0.129 0.099 0.150 0.157 0.169* 0.149 0.180* 0.166* 0.077 0.108 0.049 0.167* 0.160* 0.026
Axis 2 0.048 0.088 0.048 0.048 0.107 0.077 0.028 0.140 0.081 0.131 0.097 0.105 0.013 0.101 0.180* −0.011
Axis 3 0.084 0.063 0.040 0.070 0.065 0.061 0.001 0.102 0.090 0.010 0.129 0.081 0.031 0.108 0.130 0.053
Axis 4 0.014 0.009 0.000 0.079 −0.035 −0.020 0.073 0.010 0.006 0.012 0.033 0.039 0.126 0.004 0.002 0.080
Axis 5 0.066 0.025 0.016 0.075 −0.048 0.002 0.008 −0.008 0.009 −0.128 −0.075 −0.021 0.129 −0.025 −0.076 0.086
USA-2003-039 Axis 1 0.115 0.156 0.116 0.103 0.143 0.176* 0.203** 0.156 0.193* 0.201** 0.088 0.113 0.062 0.157 0.175* 0.013
Axis 2 −0.008 0.033 −0.030 −0.031 0.055 0.050 −0.026 0.113 0.029 0.066 0.067 0.037 −0.077 0.081 0.132 −0.062
Axis 3 0.125 0.127 0.091 0.112 0.104 0.095 0.086 0.112 0.122 0.042 0.148 0.118 0.107 0.119 0.130 0.096
Axis 4 0.086 0.014 0.003 0.070 −0.008 −0.108 −0.081 −0.008 0.006 −0.122 −0.073 −0.014 0.085 −0.021 −0.018 0.063
Axis 5 −0.132 −0.061 −0.100 −0.093 0.001 −0.056 −0.095 −0.128 −0.094 −0.009 −0.037 −0.045 −0.129 −0.104 −0.019 −0.128
USA-2003-044 Axis 1 0.115 0.156 0.116 0.103 0.143 0.176* 0.203** 0.156 0.193* 0.201** 0.088 0.113 0.062 0.157 0.175* 0.013
Axis 2 −0.008 0.033 −0.030 −0.031 0.055 0.050 −0.026 0.113 0.029 0.066 0.067 0.037 −0.077 0.081 0.132 −0.062
Axis 3 0.125 0.127 0.091 0.112 0.104 0.095 0.086 0.112 0.122 0.042 0.148 0.118 0.107 0.119 0.130 0.096
Axis 4 0.086 0.014 0.003 0.070 −0.008 −0.108 −0.081 −0.008 0.006 −0.122 −0.073 −0.014 0.085 −0.021 −0.018 0.063
Axis 5 −0.132 −0.061 −0.100 −0.093 0.001 −0.056 −0.095 −0.128 −0.094 −0.009 −0.037 −0.045 −0.129 −0.104 −0.019 −0.128
V79-I-005 Axis 1 0.149 0.181* 0.152 0.145 0.187* 0.196* 0.214** 0.162* 0.205** 0.197* 0.108 0.142 0.106 0.192* 0.184* 0.064
Axis 2 0.018 0.043 0.010 −0.030 0.068 0.053 −0.017 0.127 0.046 0.111 0.066 0.050 −0.078 0.078 0.147 −0.063
Axis 3 0.143 0.141 0.092 0.160* 0.122 0.112 0.080 0.149 0.145 0.047 0.186* 0.149 0.134 0.146 0.181* 0.132
Axis 4 0.034 0.013 0.000 0.039 0.020 −0.112 −0.050 −0.057 −0.016 −0.077 −0.074 −0.010 0.045 −0.042 −0.036 0.013
Axis 5 −0.153 −0.069 −0.098 −0.125 0.012 −0.012 −0.074 −0.074 −0.084 0.048 −0.010 −0.065 −0.180* −0.046 −0.005 −0.157
Zaire-1979-005 Axis 1 0.136 0.175* 0.142 0.127 0.177* 0.197* 0.212** 0.151 0.208** 0.203** 0.099 0.125 0.086 0.176* 0.178* 0.048
Axis 2 0.039 0.066 0.033 −0.009 0.086 0.086 0.013 0.158 0.068 0.138 0.085 0.088 −0.056 0.107 0.178* −0.032
Axis 3 0.137 0.122 0.074 0.148 0.109 0.106 0.072 0.153 0.133 0.064 0.184* 0.134 0.120 0.151 0.182* 0.118
Axis 4 0.009 0.003 −0.019 −0.005 0.001 −0.074 −0.086 −0.086 −0.035 −0.140 −0.099 −0.045 −0.011 −0.069 −0.078 −0.038
Axis 5 0.010 0.010 −0.013 0.024 0.042 −0.073 −0.002 −0.033 −0.014 0.023 0.018 0.035 0.013 −0.011 0.031 0.002
Zaire-1979-005 Axis 1 0.146 0.185* 0.154 0.146 0.190* 0.208** 0.218** 0.168* 0.215** 0.205** 0.109 0.143 0.106 0.194* 0.192* 0.061
Axis 2 0.046 0.069 0.039 −0.005 0.090 0.087 0.017 0.161* 0.075 0.147 0.098 0.090 −0.052 0.110 0.183* −0.025
Axis 3 0.109 0.098 0.049 0.122 0.081 0.070 0.038 0.116 0.102 0.019 0.143 0.105 0.096 0.115 0.142 0.089
Axis 4 0.018 0.010 −0.011 −0.007 0.003 −0.063 −0.080 −0.074 −0.028 −0.137 −0.096 −0.045 −0.009 −0.062 −0.080 −0.029
Axis 5 −0.017 −0.013 0.007 −0.037 −0.045 0.070 −0.001 0.027 0.012 −0.025 −0.012 −0.038 −0.029 0.006 −0.032 −0.013
Zaire-96-I-16 Axis 1 0.137 0.176* 0.140 0.126 0.170* 0.185* 0.201** 0.166* 0.196* 0.186* 0.099 0.138 0.089 0.182* 0.180* 0.038
Axis 2 0.053 0.072 0.034 0.015 0.098 0.078 −0.003 0.159 0.074 0.130 0.103 0.084 −0.033 0.111 0.185* −0.020
Axis 3 0.149 0.155 0.107 0.167* 0.115 0.122 0.091 0.145 0.153 0.034 0.178* 0.152 0.152 0.137 0.170* 0.144
Axis 4 −0.014 −0.011 0.020 −0.030 −0.019 0.112 0.054 0.077 0.024 0.089 0.084 0.014 −0.029 0.056 0.036 0.004
Axis 5 0.163* 0.061 0.091 0.133 −0.020 −0.002 0.069 0.078 0.085 −0.054 0.007 0.051 0.184* 0.050 0.008 0.171*

Figure 4.

Relationship between synonymous codon usage bias and virulence. The cluster analysis grouped more virulent strains into one major cluster (upper cluster) and less virulent strains into another cluster (lower cluster). CAI indicates codon adaptation index; ENC, effective number of codons.

Discussion

In this study, trends associated with the SCUB and with various factors influencing its diversification in selected MPXV genomes were investigated in detail. Studies related to the evolution of MPXV genomes are highly important as MPXVs can be used as potential bioterrorism agents.69 The mean ENC values of all examined MPXV genomes were greater than 40, indicating weak SCUB. The weak bias may be attributed to the ability of an MPXV to suppress antiviral CD4+ and CD8+ T-cell responses by inhibiting antiviral T-cell activation and inflammatory cytokine production without involving major histocompatibility complex molecules. This mechanism would reduce competition between the virus and the host, leading to efficient dissemination in the host.70 Monkeypox virus infection also effectively inhibits the genes involved in stimulating innate immunity, thereby suppressing the expression of proteins such as TNF-α, IL-1α/β, CCL5, and IL-6.71 Thus, these findings form the basis for the observed weak SCUB of the PCG across all examined MPXV genomes.

The SCUBs of all mammalian genomes are comparable, and all human viruses share this pattern of codon usage with the human host.72 This sharing reveals the need for human viruses to adapt their codon usage to the host if the infection is to be successful, whereas in other mammalian viruses, adaptation is not a prerequisite for infecting the host.72 Two possible scenarios, which form the basis for developing this phenomenon, are coevolution of humans and viruses infecting humans and/or evolution of a human genome from a viral genome.73

Significant intragenomic variations in the ENC (SD > 4.0) and the GC3 (SD > 4.0) values were observed in all the MPXV genomes used in this research. This heterogeneity in the base composition suggests that base compositional constraints play an important role in shaping SCUBs in MPXV genomes. A similar heterogeneity in the base composition was reported in herpesviruses (family Herpesviridae).7 Strand-specific codon usage was observed in MPXV genomes, whereas in the host genome, tissue-specific codon usage was reported; that is, in humans, the SCUBs of brain-specific, liver-specific, uterus-specific, testis-specific, ovary-specific, and vulva-specific genes were different from one another.74 The SCUB in an MPXV may not be due to the GC composition as no correlations were observed between the GC3 and the cumulative GC values at the first and the second codon positions. However, AT richness is directly linked with SCUB as most preferred codons were A/T ending. Gene length was weakly correlated with different COA axes in some MPXV genomes, for example, the West African genomes COP-58, MPXV-WRAIR7-61, and Sierra Leone with axis 3, and the Central African genomes V79-I-005 and Zaire-1979-005 with axis 1. In addition, based on our analysis using axis 1 of the COA (the principal axis explaining most of the variations), we suggest that gene length may have a significant influence on SCUB only in Central African strains such as V79-I-005 and Zaire-1979-005.

All putative optimal codons were found to be A/T ending as MPXV genomes are AT rich and GC poor. In MPXV genomes, a genome-specific preference toward a certain subset of codons was observed. Four codons (GGA, GGT, TAT, and TTT) were used as optimal codons in most MPXV genomes, although some exceptions occurred. The overrepresentation of AT contents and the underrepresentation of GC contents in the MPXV genomes, rather than natural selection by the host, seem to be the reason behind the preference for A/T-ending codons. The weak codon bias of most genes across all examined MPXV genomes suggests that selection for translational accuracy and speed has less influence in dictating SCUB, revealing an inability to act as expression vectors, as reported in herpesviruses, another class of large double-stranded DNA viruses.7 However, the putative optimal codons identified in this study can be used for enhancing heterologous gene expression by increasing translational efficiency.7,75-78 Axis 1 of the COA and the CAI exhibited significant positive correlations in all examined MPXV genomes (P < .01), indicating that gene expression levels have profound influences on SCUB.

Although no dinucleotide contents were found to be highly correlated with axis 1 of the COA in any of the examined MPXV genomes, AT dinucleotides were overrepresented, whereas GC dinucleotides were underrepresented in all genomes; AT, GA, and TC dinucleotides were the most biased as their ρ values were greater than 1.10. Because GC dinucleotides possess the highest thermodynamic stacking energy,23,79,80 viral genomes are always under selection pressure to decrease the GC dinucleotide frequency20,79,81 to enhance viral genome replication and transcription.79 Unmethylated GC in viral genomes stimulates immune responses in the host.82 Hence, to reduce antiviral responses from the host, viral genomes contain fewer GC dinucleotides.20 The Spearman rank correlation analysis revealed high positive correlations of C3 and GC3 with the principal axis (axis 1) of the COA and a significant negative correlation between T3 and axis 1. These correlations suggest that base compositional constraints play a crucial role in dictating SCUB. Axis 1 was not correlated with aromaticity in any MPXV, indicating that aromatic amino acids do not have a special role in framing SCUB, which further reveals that all amino acids contribute to SCUB.

Protein hydrophobicity scores were weakly correlated with axis 1 in Liberia-1970-184. Moreover, Central African and West African MPXV genomes are genetically distinct.47 Cluster analysis grouped the Central African strains and one North African MPXV strain (Sudan-2005-01) into an upper cluster with similar SCUBs, whereas the other strains, isolated from West Africa and the United States, formed a lower cluster with similar SCUBs. Within the lower cluster, the US-isolated MPXVs possessed similar SCUBs and formed one clade close to Liberia-1970-184. Furthermore, Central African strains have been reported to be more virulent than West African strains.47 Based on these results, we postulate that a strong association exists between MPXV strain virulence and SCUB as more virulent strains formed one cluster exhibiting similar SCUBs, and less virulent strains formed another. Thus, we conclude that mutational pressure due to base compositional constraints, level of gene expression, and codon selection for utilization of putative optimal codons are major factors influencing the SCUB in MPXV genomes. Consequently, a balance exists between mutational pressure acting on nucleotide sequences and amino acid selection in MPXV genomes, which is similar to the finding in a report on hepatitis E viruses.1 Generally, to conserve the protein sequence, purifying selection eliminates transversions at the third codon positions in 2-fold degenerate amino acids. Among the 20 amino acids, most synonymous positions are in 2-fold degenerate amino acids. Hence, selection may act on an amino acid level to eliminate the possibility of nonsynonymous transversions in 2-fold degenerate amino acids. In addition, viral genomes have naturally evolved with a mechanism to tackle and escape host antiviral responses,28 and according to the evolution rhetoric theory,83 this mechanism may also act as a major selection pressure in framing the SCUB in MPXV genomes, as reported in hepatitis A viral genomes.28 In this context, the multifactorial codon usage bias in MPXV genomes might have evolved as the result of a need to increase the efficiency of communication from the genome to the cell in transitional environments by keeping the message unmodified.28,83

Supplemental Material

EVB761368_Supplementary_Material_REV1 – Supplemental material for Evolution of Synonymous Codon Usage Bias in West African and Central African Strains of Monkeypox Virus

Supplemental material, EVB761368_Supplementary_Material_REV1 for Evolution of Synonymous Codon Usage Bias in West African and Central African Strains of Monkeypox Virus by Sudeesh Karumathil, Nimal T Raveendran, Doss Ganesh, Sampath Kumar NS, Rahul R Nair and Vijaya R Dirisala in Evolutionary Bioinformatics

Acknowledgments

Language editing of this manuscript was provided by Edward J Button, PhD, CEO, Button and Associates, VA, USA. The first author (S.K.) would like to thank Dr TP Jayakrishnan (Director of Aushmath Biosciences) for providing support for the successful completion of this study.

Footnotes

**Funding:** The author(s) received no financial support for the research, authorship, and/or publication of this article.

**Declaration of conflicting interests:** The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Author Contributions: RRN conceived the idea and designed the methodology. SK, NTR, and GD performed the analyses. SK, RRN, VRD, GD and SKNS interpreted the results. RRN wrote the manuscript. GD, SKNS and VRD offered critical comments. RRN and VRD developed the final draft. All authors read and approved the final manuscript.
