Phylogenetic Network and Physicochemical Properties of Nonsynonymous Mutations in the Protein-Coding Genes of Human Mitochondrial DNA

Journal Article

Published: 01 August 2003

Jukka S. Moilanen, Kari Majamaa, Phylogenetic Network and Physicochemical Properties of Nonsynonymous Mutations in the Protein-Coding Genes of Human Mitochondrial DNA, Molecular Biology and Evolution, Volume 20, Issue 8, August 2003, Pages 1195–1210, https://doi.org/10.1093/molbev/msg121

Abstract

Theories on molecular evolution predict that phylogenetically recent nonsynonymous mutations should contain more non-neutral amino acid replacements than ancient mutations. We analyzed 840 complete coding-region human mitochondrial DNA (mtDNA) sequences for nonsynonymous mutations and evaluated the mutations in terms of the physicochemical properties of the amino acids involved. We identified 465 distinct missense and 6 nonsense mutations. 48% of the amino acid replacements changed polarity, 26% size, 8% charge, 32% aliphaticity, 13% aromaticity, and 44% hydropathy. The reduced-median networks of the amino acid changes revealed relatively few differences between the major continent-specific haplogroups, but a high variation and highly starlike phylogenies within the haplogroups. Some 56% of the mutations were private, and 25% were homoplasic. Nonconservative changes were more common than expected among the private mutations but less common among the homoplasic mutations. The asymptotic maximum of the number of nonsynonymous mutations in European mtDNA was estimated to be 1,081. The results suggested that amino acid replacements in the periphery of phylogenetic networks are more deleterious than those in the central parts, indicating that purifying selection prevents the fixation of some alleles.

Introduction

The human mitochondrial genome (mtDNA) has genes coding for 2 rRNAs, 22 tRNAs, and 13 subunits of the respiratory chain complexes (MTND1, MTND2, MTND3, MTND4, MTND4L, MTND5, MTND6, MTATP6, MTATP8, MTCO1, MTCO2, MTCO3, and MTCYB). The protein-coding genes occupy 68% of the genome, and therefore a random nucleotide substitution has a high probability of being nonsynonymous and of leading to amino acid replacement. The neutral (Kimura 1968) and the nearly neutral (Ohta 1992) theories of molecular evolution predict that a certain proportion of nonsynonymous mutations will be neutral in effect, whereas the rest will be more or less deleterious. Several studies have demonstrated an excess of nonsynonymous mutations within species as compared with variation between species (Nachman et al. 1996; Rand and Kann 1996; Hasegawa, Cao, and Yang 1998; Nachman 1998; Fry 1999), and this finding has been interpreted as suggesting selection against mildly deleterious mutations, which prevents their fixation. Furthermore, direct measurements of the intergenerational substitution rate in human mtDNA have yielded rates higher than the estimates derived from phylogenetic analyses, suggesting that a significant fraction of mutations is removed by selection (Parsons et al. 1997).

The effects of nonsynonymous mutations depend both on the position of the amino acid replacement in the protein sequence and on the physicochemical properties of the amino acids involved. The genetic code appears to have evolved toward minimizing changes in physicochemical properties, which also affect the rate of nonsynonymous substitutions (Xia and Li 1998), suggesting that amino acid replacements resulting in a dissimilar amino acid are generally more deleterious than replacements resulting in an amino acid with similar properties. If the hypothesis of selection against mildly deleterious mutations is correct, phylogenetically recent mutations should contain more deleterious mutations and more dissimilar amino acid replacements than the older ones.

On the one hand, there are many examples of pathogenic single-nucleotide mutations in mtDNA. In addition, there is evidence that certain combinations of otherwise harmless polymorphisms in mitochondrial lineages may be associated with susceptibility to complex diseases (Wallace, Brown, and Lott 1999; Chinnery et al. 2000; Ruiz-Pesini et al. 2000), or with successful aging (De Benedictis et al. 1999). Their effect is most likely due to changes in the amino acid sequences of the protein-coding genes. On the other hand, several studies have failed to make the distinction between a pathogenic mutation and a haplotype-associated neutral polymorphism (Herrnstadt et al. 2002a). For these reasons, knowledge of the nature and phylogenetic relationships of amino acid haplotypes in the human mitochondrial genome is also important in clinical practice.

Although the number of complete mtDNA sequences available has grown exponentially (Finnilä et al. 2000; Ingman et al. 2000; Elson et al. 2001; Finnilä, Lehtonen, and Majamaa 2001; Maca-Meyer et al. 2001; Herrnstadt et al. 2002a), marking the start of mitochondrial population genomics (Hedges 2000), the functional consequences of the numerous variations in these sequences have not yet received much attention. We report here on the characterization of the nonsynonymous mutations in 840 complete human mitochondrial coding region sequences in terms of their physicochemical properties, and on the construction of a phylogenetic network for the amino acid sequences of all 13 protein-coding genes. Furthermore, the physicochemical properties of the amino acid replacements were compared according to their positions in the network to assess the hypothesis of selection against mildly deleterious replacements.

Materials and Methods

Alignment of mtDNA Sequences

Human mtDNA sequences were obtained from public sources (table 1). Historical or current reference sequences (G34, G35, G95) were excluded from the analyses, and sequences G90–G92 were excluded as they only demonstrate variation at positions 3243, 3423, 3426, and 11447 and are otherwise identical to G95, including its errors. The CCL2 HeLa sequence (M104) was excluded because of an unusually high rate of divergence (Herrnstadt et al. 2002c). The remaining 840 complete coding region sequences were compared with the MITOMAP reference sequence using the diffseq utility of the EMBOSS software package (Rice, Longden, and Bleasby 2000). The sequences were aligned with the reference sequence, and the nucleotide data of all sequences with position information were stored in a relational SQL database. Comparison of the stored sequences with the original ones by means of the diffseq utility did not reveal any errors. The nucleotide sequences of the protein-coding genes were extracted according to the MITOMAP mtDNA function locations, and noncoding sections between the genes were ignored. The SQL query language and the programming language Perl were used for sequence alignment and subsequence extraction.

Identification of Nonsynonymous Mutations

Amino acid translations of protein-coding genes were obtained by the methods provided by the Bio::PrimarySeqI interface of Bioperl (available at http://www.bioperl.org), and nonsynonymous changes were subsequently identified. The neighboring changes of each mtDNA variant were examined in order to identify multiple nucleotide substitutions within a single codon. Observed nonsense mutations were verified manually from the original DNA sequences. Comparison of the amino acid translations of 325 mtDNA mutations with those found in MITOMAP led to the identification of six discrepancies, whereupon manual examination of the sequences indicated that all were errors in MITOMAP.
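The translation step can be illustrated with a minimal Python sketch. It is a stand-in for the Bioperl routines named above, not the authors' code: the function name `classify_substitution` is hypothetical, and the codon table is the vertebrate mitochondrial code (NCBI translation table 2), in which ATA encodes methionine, TGA tryptophan, and AGA/AGG are stop codons. The example codon ATA is the MTND1 initiator codon discussed in the Results.

```python
# Vertebrate mitochondrial genetic code (NCBI translation table 2),
# listed in the conventional T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(codon, offset, new_base):
    """Apply a single-base change to one codon.

    Returns (reference amino acid, altered amino acid, is_nonsynonymous).
    `offset` is the 0-based position of the changed base within the codon.
    """
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    return ref_aa, alt_aa, ref_aa != alt_aa


# 3308T>C changes the MTND1 initiator codon ATA to ACA (threonine);
# 3308T>A changes it to AAA (lysine).
print(classify_substitution("ATA", 1, "C"))
print(classify_substitution("ATA", 1, "A"))
```

The sketch handles one substitution per codon; two substitutions in the same codon, which the neighbor check above is meant to catch, would have to be applied together before translating.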

Construction of Reduced-Median Networks

Six reduced-median networks (Bandelt et al. 1995) were constructed from all the nonsynonymous mtDNA mutations in the 840 sequences to infer the protein-level phylogeny in African, Asian, and European haplogroup clusters. All the coding region variation (Finnilä, Lehtonen, and Majamaa 2001; Herrnstadt et al. 2002a) was used to assign each sequence to one of the six networks. Sequence assignment was verified by comparing the identified mutations against those displayed in the published networks. Because the actual content of the Finnish and the corrected MitoKor sequences was used, the comparison led to the identification of an unpublished error in the haplogroup H skeleton network (Herrnstadt et al. 2002a), in which two sequences had been marked with the wrong identifiers (45 and 530), which belonged to two haplogroup J sequences. Furthermore, we found that the transition at nucleotide position 14097 in two sequences (F162 and F163) was incorrectly shown as 14096 (Finnilä, Lehtonen, and Majamaa 2001). Ten GenBank sequences could not be unambiguously assigned to any of the major African, Asian, or European haplogroups and were included in the Asian haplogroup cluster network, since they were of Asian or Pacific origin. The sequences were converted to a binary data matrix by considering transitions and transversions as distinct entities (Bandelt et al. 1995). Reduced median networks were constructed from the binary data using Network 2.1 (available at http://fluxus-engineering.com). All binary characters were weighted equally, including transitions and transversions, and the default reduction threshold r = 2 (Bandelt, Macaulay, and Richards 2000) was used in the analysis.
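One way to picture the binary conversion is sketched below in Python. The encoding is an assumption made for illustration: each (position, substitution class) pair becomes its own 0/1 character, so that a transition and a transversion at the same site occupy separate columns. The function names are hypothetical, the input format is invented for the example, and the three substitutions are those named for sequences G20, M170, and M339 in the Results.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def substitution_class(ref, alt):
    """Label a base change as a transition or a transversion."""
    return "transition" if (ref, alt) in TRANSITIONS else "transversion"


def binary_matrix(variants_by_sequence):
    """Build a 0/1 matrix from {sequence_id: [(position, ref, alt), ...]}.

    Returns the sorted list of characters and a dict of rows, one per sequence.
    """
    characters = sorted({
        (pos, substitution_class(ref, alt))
        for variants in variants_by_sequence.values()
        for pos, ref, alt in variants
    })
    column = {char: i for i, char in enumerate(characters)}
    rows = {}
    for seq_id, variants in variants_by_sequence.items():
        row = [0] * len(characters)
        for pos, ref, alt in variants:
            row[column[(pos, substitution_class(ref, alt))]] = 1
        rows[seq_id] = row
    return characters, rows


characters, rows = binary_matrix({
    "G20": [(3308, "T", "C")],    # transition at 3308
    "M170": [(3308, "T", "A")],   # transversion at the same position
    "M339": [(12338, "T", "C")],
})
print(characters)
print(rows)
```

The resulting matrix is only the input to network construction; the reduction step itself was done with Network 2.1 and is not reproduced here.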

Characterization of Amino Acid Replacements

The amino acids involved in the nonsynonymous mutations were characterized in terms of six physicochemical properties relevant to protein evolution (Xia and Li 1998), namely polarity, size, isoelectric point, aliphatic and aromatic nature, and hydropathy. We defined amino acids with polarity ≥8.6 (Grantham 1974) as polar, amino acids with a side chain molecular volume ≤61 Å³ (Grantham 1974) as small, amino acids with an isoelectric point ≥7.59 as positively charged, amino acids with an isoelectric point ≤3.22 as negatively charged (Alff-Steinberger 1969), amino acids with aliphatic side chains (I, L, and V) as aliphatic, amino acids with aromatic rings (F, H, W, and Y) as aromatic, and amino acids with a negative Kyte-Doolittle hydropathy index (Kyte and Doolittle 1982) as hydrophilic, and those with a positive index as hydrophobic. Amino acid replacements were then assigned to categories according to changes in these physicochemical properties. Furthermore, each replacement was defined as conservative or nonconservative according to the BLOSUM62 matrix used for sequence comparisons (Henikoff and Henikoff 1992), nonconservative replacements having a negative value in the matrix (Cargill et al. 1999).
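A minimal Python sketch of three of these categories follows. It covers only hydropathy, aliphaticity, and aromaticity, because the Grantham polarity and volume values, the isoelectric points, and the BLOSUM62 scores are not listed in the text and are left out here. The function name `property_changes` is hypothetical; the hydropathy values are the published Kyte-Doolittle indices.

```python
# Kyte-Doolittle hydropathy indices (Kyte and Doolittle 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
ALIPHATIC = set("ILV")   # as defined in the text
AROMATIC = set("FHWY")   # as defined in the text


def property_changes(ref, alt):
    """Report which of three properties differ between two amino acids."""
    return {
        # Hydrophobic means a positive index, hydrophilic a negative one.
        "hydropathy": (KYTE_DOOLITTLE[ref] > 0) != (KYTE_DOOLITTLE[alt] > 0),
        "aliphaticity": (ref in ALIPHATIC) != (alt in ALIPHATIC),
        "aromaticity": (ref in AROMATIC) != (alt in AROMATIC),
    }


# MTND1:L289Q replaces an aliphatic, hydrophobic residue with a hydrophilic one.
print(property_changes("L", "Q"))
# An I-V replacement leaves all three categories unchanged.
print(property_changes("I", "V"))
```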

The distribution of mutations within genes was assessed by identifying hydrophobic and hydrophilic regions of genes. These regions were defined by comparing the average hydropathy of each 19-amino acid segment to the mean of all segments for the respective gene. The average Kyte-Doolittle hydropathy index of 19 neighboring amino acids was calculated for each amino acid position according to the MITOMAP reference sequence and by reference to the pepinfo utility of the EMBOSS package. We used this segment size because it has been shown to be a good value for identifying transmembrane regions (Kyte and Doolittle 1982).
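The region definition can be sketched as a sliding-window average in Python. This is an illustration under stated assumptions rather than the pepinfo implementation: the window is centered on each position, positions closer than nine residues to either end are left undefined, and the gene mean is taken over the defined window averages. The function names and the toy peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy indices (Kyte and Doolittle 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(protein, window=19):
    """Mean hydropathy of the window centered on each position.

    Positions without a full window on both sides are returned as None.
    """
    half = window // 2
    values = [KYTE_DOOLITTLE[aa] for aa in protein]
    means = [None] * len(values)
    for i in range(half, len(values) - half):
        means[i] = sum(values[i - half:i + half + 1]) / window
    return means


def label_regions(protein, window=19):
    """Label each position relative to the mean of all window averages."""
    means = windowed_hydropathy(protein, window)
    defined = [m for m in means if m is not None]
    gene_mean = sum(defined) / len(defined)
    return [
        None if m is None else ("hydrophobic" if m > gene_mean else "hydrophilic")
        for m in means
    ]


# Toy peptide (not a real mtDNA protein): 19 isoleucines followed by 19 lysines.
labels = label_regions("I" * 19 + "K" * 19)
print(labels[9], labels[28], labels[0])
```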

Contingency Table Analysis

The nonsynonymous mutations were counted as differences relative to the reference sequence and without correcting for multiple hits; that is, each mutation was counted once regardless of the number of its occurrences in the networks. This approach results in an underestimate of the true number of mutations that have occurred during human mtDNA evolution, but despite this disadvantage, the method was used here to avoid the confounding effect of the expected high degree of homoplasy. Private mutations occupying the peripheral tips of phylogeny were inferred from alleles that were present in only one sequence, whereas homoplasic mutations were inferred from the presence of a mutation in more than one lineage in the networks. Since each mutation was counted only once, it was possible to classify each amino acid replacement at a given sequence position unambiguously as private/nonprivate and homoplasic/nonhomoplasic. Alternatively, it could have been possible to infer each occurrence of a homoplasic mutation from the phylogeny and to count the occurrences separately, but with this approach the frequencies of mutation categories among homoplasic mutations would have been inflated by the subset of mutations that were highly homoplasic, and would also have depended on the method and parameters used in the phylogenetic reconstruction.
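The private/nonprivate split depends only on how many sequences carry each mutation, which a short Python sketch can show. The function name `split_private` and the input format are hypothetical; the toy input uses a subset of the carriers of 7444G>A and 12338T>C named in the Results. Homoplasy is not covered, since it has to be read from the networks.

```python
from collections import Counter


def split_private(mutations_by_sequence):
    """Split mutations into private (one carrier) and nonprivate (several).

    `mutations_by_sequence` maps a sequence identifier to its set of mutations.
    """
    carriers = Counter(
        mutation
        for mutations in mutations_by_sequence.values()
        for mutation in set(mutations)
    )
    private = {m for m, count in carriers.items() if count == 1}
    nonprivate = set(carriers) - private
    return private, nonprivate


private, nonprivate = split_private({
    "F45": {"7444G>A"},
    "F46": {"7444G>A"},
    "M339": {"12338T>C"},
})
print(private, nonprivate)
```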

The frequencies of the mutation categories among private amino acid replacements, homoplasic replacements, and replacements in hydrophobic regions were compared with those among the remaining ones using Fisher's exact test as implemented in R 1.4.1 (Ihaka and Gentleman 1996; available at http://cran.r-project.org/), which computes the exact value of P and the conditional maximum likelihood estimate of the odds ratio. This test was used because small cell frequencies were expected, and the two-tailed test was used because no particular direction of differences was assumed a priori. Sample estimates of the odds ratio were similar to the reported conditional maximum likelihood estimates as differences were observed only in second to fourth decimal positions. Inflated type I error rate due to multiple comparisons was assessed by obtaining the adjusted significance level ($a_c$) from $1 - (1 - a_c)^n = 0.05$, where $n$ is the number of comparisons and 0.05 is the significance level corresponding to 95% confidence limit.
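Both calculations can be sketched in plain Python. This is not the R routine used in the study: the two-sided P value below follows one common convention (summing the probabilities of all tables that are no more likely than the observed one), the conditional maximum likelihood odds ratio is not computed, and the function names are hypothetical. The example table uses the aliphaticity counts reported in the Results, 25 of 116 homoplasic and 126 of 352 non-homoplasic replacements, for which the paper reports P = 0.004.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = probability(a)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = probability(x)
        if p <= observed * (1 + 1e-9):  # small tolerance for float ties
            p_value += p
    return p_value


def adjusted_alpha(n_comparisons, alpha=0.05):
    """Solve 1 - (1 - a_c)^n = alpha for the per-comparison level a_c."""
    return 1 - (1 - alpha) ** (1 / n_comparisons)


# Rows: homoplasic, non-homoplasic; columns: aliphaticity changed, unchanged.
print(fisher_two_sided(25, 91, 126, 226))
print(adjusted_alpha(10))
```

Solving the adjustment formula for the per-comparison level gives $a_c = 1 - 0.95^{1/n}$, which is slightly less strict than dividing 0.05 by $n$.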

Rate of Detection of New Mutations in European Sequences

An estimate for the cumulative rate of discovery of new nonsynonymous mutations in the 647 European sequences was derived by taking 500 random permutations and examining the sequences contained in each consecutively, calculating for each sequence the cumulative sum of mutations that had not occurred in the previous sequences. The sequences were sampled without replacement. The arithmetic mean of the cumulative sums of the 500 permutations was plotted, and statistical models having an asymptotic maximum were fitted to this mean curve by the nonlinear least squares method to provide an estimate of the total number of nonsynonymous mutations in European mtDNA and to predict the number of sequences required for identifying most of the mutations.
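The permutation step can be sketched in Python as follows. The function name `mean_discovery_curve`, the seed, and the toy input are assumptions for illustration; the subsequent nonlinear least squares fit of an asymptotic model to the mean curve is not included.

```python
import random


def mean_discovery_curve(mutation_sets, n_permutations=500, seed=0):
    """Average cumulative count of distinct mutations over random orderings.

    `mutation_sets` holds one set of mutations per sequence. Each permutation
    visits every sequence once (sampling without replacement).
    """
    rng = random.Random(seed)
    order = list(mutation_sets)
    totals = [0] * len(order)
    for _ in range(n_permutations):
        rng.shuffle(order)
        seen = set()
        for i, mutations in enumerate(order):
            seen |= mutations       # add mutations not found in earlier sequences
            totals[i] += len(seen)  # cumulative number of distinct mutations
    return [total / n_permutations for total in totals]


# Toy input: four sequences carrying three distinct mutations in total.
curve = mean_discovery_curve([{"a"}, {"a", "b"}, {"c"}, set()])
print(curve)
```

Whatever the ordering, the last point of the curve equals the number of distinct mutations in the sample, so only the shape of the approach to that value carries information about the asymptote.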

Results

Mutations in the Protein-Coding Genes of mtDNA

A total of 988 synonymous, 465 nonsynonymous missense, and 6 nonsense mutations were identified in the protein-coding genes of 840 complete coding region mtDNA sequences, when the mutations were counted as differences relative to the reference sequence. One-third (32%) of all mutations were nonsynonymous (table 2). MTATP6 and MTATP8 had the highest proportion of nonsynonymous mutations among all mutations (52.5% and 51.3%, respectively), whereas MTND4L, MTCO2, and MTND3 had the lowest (19.2%, 22.2%, and 22.2%, respectively). Several sequences were detected in which two mutations co-occurred in one codon, including 4769A > G and 4767A > G in sequences M175, M222, M385, and M409; 8574C > T and 8572G > A in M533; 8703C > T and 8701A > G in G42; 10400C > T and 10398A > G in 53 sequences; and 14767T > C and 14766C > T in M455. All these pairs consisted of a nonsynonymous and a synonymous mutation. Furthermore, each amino acid replacement resulted from a specific nucleotide substitution, as we found no instances where two different nonsynonymous mutations had caused an identical amino acid change. The most common amino acid replacement was an A-T change in either direction, followed in decreasing frequency by I-V, I-T, and F-L (fig. 1).

Nonsense Mutations

Two mutations in the initiator codon of the MTND1 gene were identified. 3308T > C was present in 10 sequences (G20, G66, M158, M165, M192, M215, M293, M379, M386, and M514). This mutation has been identified in the chimpanzee (Arnason, Xu, and Gullberg 1996), but in humans it was originally reported in a patient with bilateral striatal necrosis and MELAS (Campos et al. 1997). However, experimental data have suggested that 3308T > C does not affect the synthesis of the MTND1 polypeptide and that any methionine codon close to the 5′ end of a mitochondrial mRNA may serve as a translation initiator (Fernandez-Moreno et al. 2000). Our phylogenetic analysis indicated that the mutation represents a polymorphism in the African haplogroup L1b, as suggested previously (Rocha et al. 1999), but it was also present in another branch of the African network, indicating that it has arisen more than once. 3308T > A, resulting in a codon for lysine, was found in the sequence M170, which also harbored 3312dupC, a single-nucleotide duplication in the second codon. The sequence of the first three codons was therefore AAACCCCATG instead of ATACCCATG. A third initiator codon mutation was observed in MTND5, where 12338T > C (M339) led to a codon for threonine. Methionine occupies position 3 in MTND1 and MTND5 and probably serves as a translation initiator in the presence of [3308T > A; 3312dupC] and 12338T > C.

Eight sequences (F45, F46, F47, F48, F49, G46, M6, and M426) harbored 7444G > A in the stop codon of MTCO1, leading to the translation of KQK, which has been suggested to increase the penetrance of primary mutations in Leber's hereditary optic neuropathy (Brown et al. 1995). A single-nucleotide deletion 6577delG in the middle of MTCO1 in G36 led to G225E and caused a premature termination of translation with an open reading frame for 28 amino acids (EETPFYTNTYSDFSVTLKFMFLSYQASE). The sequence G36 also harbored 12192G > A, which has been reported to be associated with cardiomyopathy (Shin et al. 2000; MIM 590040), although this variant is a polymorphism in the Finnish population (Finnilä, Lehtonen, and Majamaa 2001). Assuming that the frameshift mutation 6577delG (G225fsX28) is not an error in the published sequence, the mutation might provide an alternative explanation for cardiomyopathy in G36.

Reduced-Median Networks of Nonsynonymous Mutations

Reduced-median networks of Asian and African haplogroups and the European haplogroup clusters IWX, KU, JT, and HV were constructed using information on all nonsynonymous mutations in the 840 sequences and by placing the African sequence G37 as an outgroup. The African, Asian, and European major haplogroups were found to be closely related in their amino acid sequences (fig. 2). The center of the Asian network consisted of a reticulation formed by MTATP6:A20T, MTCYB:S172N, and MTATP6:T59A (fig. 3). The central node of haplogroup L (fig. 4) and the common root of haplogroups D, E, and M were found to belong to this reticulation and had an identical amino acid sequence. Only two amino acid changes separated haplogroups C and Z from L. The central nodes of the European haplogroup clusters IWX (fig. 5) and KU (fig. 6) and the Asian haplogroup B1 had an identical amino acid sequence, which was separated from haplogroup L by MTATP6:T59A and MTND3:T114A. Additional amino acid replacements separated the other European haplogroup clusters JT (fig. 7) and HV (fig. 8) and the Asian haplogroups A and B2 from this node. The major haplogroups in all the ethnic groups were clearly discernible. Amino acid sequences formed highly starlike phylogenies with major center nodes in all the haplogroup clusters. Thirteen of the 20 amino acid replacements that distinguished the major haplogroups were homoplasic (fig. 2) and 18 of 20 were conservative. MTND4:P140S, and MTCYB:T7I were homoplasic and nonconservative.

Characterization of Amino Acid Replacements

Nearly half of the amino acid replacements (48%) involved a change in polarity, and hydropathy was changed in 44% of the replacements. Only four replacements (MTND1:L289Q, MTATP6:V21E, MTND5:Q546L, and MTND5:L555Q) involved changes between the seven most hydrophilic amino acids and the seven most hydrophobic ones, defined as a change of at least –1.7 to +1.7 or vice versa on the Kyte-Doolittle scale. Changes in polarity and hydropathy were followed in decreasing frequency by changes in aliphaticity (32%), size (26%), aromaticity (13%), and charge (8.3%). Of the amino acid replacements, 133 (28%) were nonconservative according to the BLOSUM62 matrix (table 3).

The distribution of amino acid replacements among the 13 protein-coding genes suggested that the mutations were not distributed randomly across or between genes (fig. 9). The mutations were quite evenly distributed in MTATP6, MTATP8, MTCO3, MTND3, MTND4L, and MTND6, whereas each of the remaining genes had at least one region which appeared relatively conserved as compared to the other regions of the gene. Apparent mutational hotspots, or nonconstrained regions, were identified in both hydrophobic and hydrophilic regions. An excess of amino acid replacements in MTND6 (22/29, 76%) were private (P = 0.03, Fisher's exact test), but no comparable deviations from the expected proportion of 56% were identified among the other genes.

Contingency Table Analysis

Of the replacements, 261 (56%) were private, whereas 207 replacements (44%) were present in more than one sequence. Nonconservative changes were more common among the private replacements than among the nonprivate ones (P = 0.005, Fisher's exact test). Changes in size, charge, aliphaticity, and aromaticity were also more common among the private replacements than among the nonprivate ones, but these differences were not significant (table 4).

Of the 468 amino acid replacements, 116 (25%) were homoplasic, indicating that they had arisen multiple times during human evolution. Nonconservative changes were less common than expected among the homoplasic replacements (P = 0.002). A change from an aliphatic to a nonaliphatic amino acid or vice versa occurred in 25 homoplasic replacements (22%) and in 126 (36%) of the non-homoplasic ones (P = 0.004), while an aromatic amino acid was replaced by a nonaromatic one or vice versa in 8 homoplasic replacements (7%) and in 51 (14%) of the non-homoplasic ones (P = 0.04). Replacements between small and large amino acids were also less common in the homoplasic group (P = 0.04). The other types of changes did not differ in frequency between the homoplasic and non-homoplasic replacements (table 4).

The mean hydropathy indices were 1.006 for MTATP6, –0.401 for MTATP8, 0.725 for MTCO1, 0.432 for MTCO2, 0.411 for MTCO3, 0.673 for MTCYB, 0.662 for MTND1, 0.596 for MTND2, 1.075 for MTND3, 0.705 for MTND4, 1.376 for MTND4L, 0.563 for MTND5, and 1.036 for MTND6. The average hydropathy calculated for 19 neighboring amino acids was not defined for 37 amino acid replacements that were near either end of the subunit. 239 (55%) of the remaining 431 replacements were among the 1,843 positions located in regions that were more hydrophobic than the mean, whereas 192 (45%) were among the 1,712 positions located in the hydrophilic regions. The amino acid replacements in hydrophobic regions altered the amino acid charge less often than those in hydrophilic regions and were more often conservative, whereas replacements between aliphatic and nonaliphatic amino acids were more frequent among those in hydrophobic regions than among those in hydrophilic regions (table 4). Amino acid content between the hydrophobic and hydrophilic regions differed, because 103/381 (27%) of the charged amino acids (D, E, H, K, R) and 697/1,065 (65%) of the aliphatic amino acids (I, L, V) in the reference sequence were found to be located in hydrophobic regions of genes.

Rate of Detection of New Mutations in 647 European Sequences

Because private replacements were common among the 840 sequences, we set out to estimate the total number of nonsynonymous mutations that may be present in the population. The rate of detection of new mutations was calculated from 500 permutations of the 647 European sequences harboring 301 distinct nonsynonymous mutations. The Weibull growth curve provided the best fit with the mean of the cumulative sums (fig. 10). The asymptotic maximum of the number of nonsynonymous mutations in European mtDNA was estimated to be 1,081 (standard error 7.3). The 301 mutations detected in 647 European sequences therefore encompass approximately 28% of all nonsynonymous mutations that may be present in European populations. Assuming that mutation identification continues to follow the estimated model, 12,200 sequences will be required to identify 90% of the 1,081 mutations and 18,100 sequences to identify 95%. Similar predictions for non-European sequences were not feasible because of the small number of Asian and African sequences known.

Discussion

We found 1,459 distinct mutations in the protein-coding genes of 840 complete human mtDNA coding region sequences, when the mutations were counted as differences relative to the reference sequence. One-third of the mutations were nonsynonymous. The frequency of changes in the physicochemical properties of the respective amino acids was high, suggesting that such changes are quite common in human mtDNA and that evaluation of the pathogenicity of an amino acid replacement should not rely solely on these structural considerations.

The differences between the frequencies of the particular types of changes are inherent consequences of differences in the frequencies of individual amino acid replacements (fig. 1), which in turn depend on several factors, including sequence composition (Naylor, Collins, and Brown 1995), variable substitution rates and selective constraints among sites and substitutions (Xia 1998; Tourasse and Li 2000; McClellan and McCracken 2001), and the tendency of the genetic code to prefer substitutions between similar amino acids over dissimilar ones (Haig and Hurst 1991). The mitochondrial genome differs from nuclear genes in several properties, including amino acid composition (Naylor, Collins, and Brown 1995) and genetic code (Barrell, Bankier, and Drouin 1979; Knight, Landweber, and Yarus 2001). The proportion of nonconservative amino acid replacements out of all replacements (28.4%) was nevertheless not appreciably different from that in 106 nuclear genes (Cargill et al. 1999), where 36% were nonconservative (odds ratio 1.4, 95% confidence interval 0.95–2.03, P = 0.07; Fisher's exact test).

The reduced-median networks of the nonsynonymous mutations provided a comprehensive description of the intraspecies protein-level phylogeny in humans. The phylogenetic signal of synonymous mutations was lost, because only the nonsynonymous mutations were considered, but the various haplogroups were still discernible. Disregarding synonymous mutations may even improve the accuracy of a phylogenetic network (Naylor and Brown 1997). Many branches in the full networks (Finnilä, Lehtonen, and Majamaa 2001; Herrnstadt et al. 2002a) contain at least one nonsynonymous mutation, and the branches were also shown clearly in the present networks. Exceptions to this pattern included the root of haplogroups H and V, which was a single node, because all the nucleotide differences between these haplogroups were synonymous. Furthermore, the central nodes of several major haplogroups (U2 and B1; L and the root of D, E, and M) had identical amino acid sequences.

The major haplogroups were found to be closely related in their amino acid sequences, with relatively few replacements separating their center nodes, but the variation within haplogroups was high, resulting in starlike phylogenies. More than half of the observed amino acid changes were present in only one sequence, giving rise to rare amino acid haplotypes. This finding is analogous to earlier observations of an excess of nonsynonymous mutations within species, as compared with variation between species (Nachman et al. 1996; Rand and Kann 1996; Hasegawa, Cao, and Yang 1998; Nachman 1998; Fry 1999). This is usually assumed to result from purifying selection against slightly deleterious alleles, which prevents their fixation. Such mildly deleterious mutations should reside in the periphery of phylogenetic networks. This hypothesis was supported by the present comparison of private replacements and nonprivate ones, which revealed that nonconservative changes are more frequent among the private replacements.

The frequency of homoplasic mutations in human mtDNA has been found to be high (Finnilä, Lehtonen, and Majamaa 2001; Herrnstadt et al. 2002a). We found here that homoplasy among nonsynonymous mutations is also common, as one-fourth of all amino acid replacements were homoplasic. Interestingly, the homoplasic replacements included fewer nonconservative replacements and replacements involving small, aliphatic, and aromatic amino acids. This observation suggests that physicochemical properties determine, at least in part, whether amino acid replacements are removed by selection or whether they persist long enough to be observed in separate lineages in the phylogeny, that is, whether they become homoplasic. Homoplasic replacements are therefore not confined exclusively to nonconstrained amino acid positions. Most ancient amino acid replacements distinguishing the major haplogroups were observed in other parts of the phylogeny as well, and all but two were conservative, which is consistent with their neutrality.

Our findings support the assumption that amino acid replacements resulting in dissimilar amino acid properties are generally more deleterious than replacements resulting in similar properties. However, the effects of nonsynonymous mutations depend also on the position of the amino acid replacement in the protein sequence. Nonsynonymous mutations were found to occupy both hydrophobic and hydrophilic regions of genes, when the regions were defined according to the average hydropathy for the respective gene. Mutations in hydrophobic regions involved fewer changes in charge and more changes in aliphaticity than expected and were less often nonconservative than mutations in hydrophilic regions; but such differences are confounded by the differences in the amino acid composition of the respective regions. Even if it is accepted that the hydrophobic regions may be generally more conserved than hydrophilic regions (Naylor, Collins, and Brown 1995), the distribution of amino acid changes among genes (fig. 9) suggested that not all hydrophobic regions are alike. Several amino acid replacements were identified in the fifth, eleventh, and twelfth hydrophobic domains of MTCO1, for example, but none were identified in the seventh or eighth.

Although it may eventually be possible to determine the degree and nature of the constraints on each region, and perhaps even on each position in mtDNA, the distribution of nonsynonymous mutations along the genes is still relatively sparse, suggesting that even larger numbers of sequences and polymorphisms will be required for detailed identification and characterization of functionally constrained and nonconstrained regions in human mtDNA. The cumulative rate of detection of new nonsynonymous mutations in European sequences was found to follow the Weibull growth curve model, the estimated parameters suggesting that 19× the current number of mtDNA sequences will be required to identify 90% of the nonsynonymous mutations that may be present in European populations.

In conclusion, the results of this descriptive analysis of 471 nonsynonymous mutations showed that nonconservative changes were more common among private replacements and nonhomoplasic replacements than among nonprivate and homoplasic ones, and that a similar trend was evident in certain physicochemical characteristics of replacements, suggesting a role for selection against these in the evolution of the protein-coding genes of mtDNA. Selection presumably varies between genes, functional domains, and sites, however, and even more sequences will be required for reliable mapping of constrained and nonconstrained regions. Assessment of the pathogenicity of an amino acid change should not rely on single structural considerations, because changes in physicochemical properties such as hydropathy, size, charge, and polarity are common in the mtDNA-encoded proteins in humans. The entire mtDNA genome should be screened to exclude other mutations when a particular variant is suspected of being pathogenic, and a population-genetic approach should be adopted to recognize neutral variants that are present in populations. The reduced-median networks and the tabulation of physicochemical properties of amino acid changes presented here should therefore also have practical applications.

Supplementary Material

The complete table of nonsynonymous mutations in the 840 sequences, their amino acid translations, and their physicochemical properties is provided as online Supplementary Material. Links to updated versions of the table may appear at http://cc.oulu.fi/~jukkamoi/mtres/.

Wolfgang Stephan, Associate Editor

Fig. 1. Matrix of amino acid replacements for the 840 mtDNA sequences. The area of each circle is proportional to the frequency of distinct replacements between the respective amino acids. For reference, the number of replacements between T and A is 98, and that between K and N is 1.

Fig. 2. Collapsed network of the continent-specific major haplogroup clusters. The central haplotype from each cluster is shown. Each dashed rectangle indicates the figure containing the expanded network for the respective cluster. The mutations are shown as amino acid changes relative to the MITOMAP reference sequence (refseq). Outgroup, sequence G37. +, a homoplasic mutation.

Fig. 3. Reduced median network of nonsynonymous mutations in Asian haplogroup clusters. The mutations are shown as amino acid changes relative to the MITOMAP reference sequence (refseq). Outgroup, sequence G37. Squares, links to the networks of other haplogroup clusters. @, a back mutation; +, a homoplasic mutation; CM, cardiomyopathy. The weights of all the characters in the analysis were equal. Some branch lengths have been distorted to increase legibility. Sequence identifiers are shown inside the nodes. F, Finnish sequences; M, MitoKor sequences. The origin of each GenBank sequence, denoted with the letter G, is given next to the sequence identifier. PNG, Papua-New Guinea.

Fig. 4. Reduced median network of nonsynonymous mutations in the African haplogroup cluster. See the legend of figure 3 for explanation of symbols.

Fig. 5. Reduced median network of nonsynonymous mutations in the European haplogroup cluster IWX. See the legend of figure 3 for explanation of symbols.

Fig. 6. Reduced median network of nonsynonymous mutations in the European haplogroup cluster KU. See the legend of figure 3 for explanation of symbols.

Fig. 7. Reduced median network of nonsynonymous mutations in the European haplogroup cluster JT. See the legend of figure 3 for explanation of symbols.

Fig. 8. Reduced median network of nonsynonymous mutations in the European haplogroup cluster HV. See the legend of figure 3 for explanation of symbols. Inset, additional nodes with private amino acid changes and connecting only to the center of the network (“HV”).

Fig. 9. Distribution of amino acid replacements and hydropathic regions in the 13 mtDNA-encoded proteins. The x-axis shows the amino acid position, and the y-axis shows a common scale for hydropathy and amino acid dissimilarity. Curve, the average Kyte-Doolittle hydropathy index for 19 neighboring amino acids; positive values indicate hydrophobic regions. ×, private replacement; +, homoplasic replacement; °, other replacement. Negative values for amino acid replacements indicate nonconservative changes and positive values indicate conservative changes according to the BLOSUM62 matrix. Histogram, the number of distinct amino acid changes within a window of 50 amino acid positions plotted at the median position of the window. One unit on the y-axis scale corresponds to 10 amino acid changes.

Fig. 10. Identification of new nonsynonymous mutations in 647 European sequences. 500 permutations of the sequence order (Index) and the cumulative sum of mutations not observed in previous sequences in each permutation were obtained. Solid curve, the mean of the 500 cumulative sum curves. The largest and smallest numbers of nonsynonymous mutations observed at the corresponding index in any permutation are shown above and below the mean curve. Dashed curve, the Weibull growth curve $y = \alpha - \beta \exp[-\exp(\delta)\,x^{\epsilon}]$ fitted to the mean curve by the nonlinear least squares method and using all 647 data points. The fitted curve with parameters α = 1080.84 (SE 7.3), β = 1080.31 (SE 7.3), δ = −5.435 (SE 0.0033), and ϵ = 0.6664 (SE 0.00075) is superimposed almost perfectly on the mean curve (residual sum of squares = 16.86). α indicates the asymptotic maximum of the Weibull growth curve.
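The permutation procedure described in this legend can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: each sequence is represented as a set of mutation labels, the function name `accumulation_curves` and the toy labels are hypothetical, and the random seed is fixed only to make the run repeatable.

```python
import random


def accumulation_curves(sequences, n_permutations=500, seed=0):
    """Return (mean, lowest, highest) cumulative counts of distinct mutations.

    `sequences` is a list of sets; each set holds the mutations of one sequence.
    Entry i of each returned list refers to the first i + 1 sequences of a
    randomly permuted order.
    """
    rng = random.Random(seed)
    n = len(sequences)
    totals = [0] * n
    lowest = [None] * n
    highest = [None] * n
    order = list(range(n))
    for _ in range(n_permutations):
        rng.shuffle(order)
        seen = set()
        for index, seq_id in enumerate(order):
            seen |= sequences[seq_id]   # add mutations not observed before
            count = len(seen)
            totals[index] += count
            if lowest[index] is None or count < lowest[index]:
                lowest[index] = count
            if highest[index] is None or count > highest[index]:
                highest[index] = count
    mean = [total / n_permutations for total in totals]
    return mean, lowest, highest


# Toy data with hypothetical mutation labels (not taken from the study).
toy = [{"A10V"}, {"A10V", "T59A"}, {"I114T"}, set()]
mean_curve, low_curve, high_curve = accumulation_curves(toy, n_permutations=200)
print(mean_curve, low_curve, high_curve)
```

The mean curve produced this way is the quantity to which a growth model such as the Weibull curve would then be fitted.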

Table 1

Available Complete Mitochondrial DNA Coding Region Sequences.

a Sequence identifiers used in this study.

b Sequence identifiers in public files.

d Identical to G37, has 41 differences relative to mitomapRCRS.

e Sequences with 3243A > G, 3423T > G and 3426A > G but otherwise similar to G95.

Table 2

Synonymous and Nonsynonymous Mutations in the 840 mtDNA Sequences, by Genes.

| Gene | Length^a | Synonymous^b | Nonsynonymous^c | Total |
|---|---|---|---|---|
| MTND1 | 956 | 78 | 38 | 116 |
| MTND2 | 1,042 | 91 | 36 | 127 |
| MTCO1 | 1,542 | 122 | 40 | 162 |
| MTCO2 | 684 | 63 | 18 | 81 |
| MTATP8 | 207 | 19 | 20 | 39 |
| MTATP6 | 681 | 57 | 63 | 120 |
| MTCO3 | 784 | 67 | 32 | 99 |
| MTND3 | 346 | 35 | 10 | 45 |
| MTND4L | 297 | 21 | 5 | 26 |
| MTND4 | 1,378 | 124 | 38 | 162 |
| MTND5 | 1,812 | 158 | 75 | 233 |
| MTND6 | 525 | 54 | 29 | 83 |
| MTCYB | 1,141 | 105 | 70 | 175 |
| Total | 11,341 | 988 | 471 | 1,459 |

Note.—MTND, NADH dehydrogenase; MTCO, cytochrome c oxidase; MTATP, ATP synthase; MTCYB, cytochrome b.

a Gene length in nucleotides.

b Number of synonymous mutations.

c Number of nonsynonymous mutations. Mutations were counted as differences relative to the reference sequence and without correcting for multiple hits. Sums over all genes do not equal the totals due to overlapping regions between MTATP6 and MTATP8, MTATP6 and MTCO3, and MTND4 and MTND4L.

Table 3

Properties of the 468 Amino Acid Replacements Detected in the 840 mtDNA Sequences.

| Category of Change | N^a | Direction of Change^b | N^a |
|---|---|---|---|
| Polarity | 224 (.48) | Polar→nonpolar | 104 (.22) |
| | | Nonpolar→polar | 120 (.26) |
| Size | 123 (.26) | Small→large | 53 (.11) |
| | | Large→small | 70 (.15) |
| Hydropathy | 207 (.44) | Hydrophobic→hydrophilic | 112 (.24) |
| | | Hydrophilic→hydrophobic | 95 (.20) |
| Charge | 39 (.08) | Neutral→positive | 10 (.02) |
| | | Neutral→negative | 13 (.03) |
| | | Positive→neutral | 8 (.02) |
| | | Positive→negative | 0 (0) |
| | | Negative→neutral | 6 (.01) |
| | | Negative→positive | 2 (.004) |
| Aliphaticity | 151 (.32) | Aliphatic→nonaliphatic | 82 (.18) |
| | | Nonaliphatic→aliphatic | 69 (.15) |
| Aromaticity | 59 (.13) | Aromatic→nonaromatic | 36 (.08) |
| | | Nonaromatic→aromatic | 23 (.05) |
| Nonconservative^c | 133 (.28) | | |
| Private^d | 261 (.56) | | |
| Homoplasic^e | 116 (.25) | | |
| Hydrophobic location^f | 239 (.51) | | |
| Hydrophilic location^g | 192 (.41) | | |

a Number of mutations in the category. Proportion from the total number of mutations is shown in parentheses.

b Direction is shown relative to the reference sequence.

c Mutation with a negative value in the BLOSUM62 matrix.

d Mutation observed in only one sequence.

e Mutation observed in ≥2 lineages.

f Average hydropathy index of 19 neighboring amino acids is higher than the mean for the respective gene.

g Average hydropathy index is lower than the mean of the respective gene.
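The location categories of footnotes f and g can be illustrated with a short sketch. It is an assumption-laden outline rather than the procedure actually used: the hydropathy values are the standard Kyte-Doolittle indices, the gene mean is taken as the mean index over all residues of the protein, and positions without a full 19-residue window are left unclassified. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy indices for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein, position, window=19):
    """Mean hydropathy of the window centered on a 0-based position.

    Returns None when the full window does not fit inside the protein
    (an assumption of this sketch, not a rule stated in the source).
    """
    half = window // 2
    start, end = position - half, position + half + 1
    if start < 0 or end > len(protein):
        return None
    return sum(KYTE_DOOLITTLE[aa] for aa in protein[start:end]) / window


def classify_location(protein, position, window=19):
    """Label a position as a hydrophobic or hydrophilic location, or None."""
    local = window_hydropathy(protein, position, window)
    if local is None:
        return None
    # Assumption: the gene mean is the mean index over all residues.
    gene_mean = sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)
    if local > gene_mean:
        return "hydrophobic"
    if local < gene_mean:
        return "hydrophilic"
    return None


toy_protein = "I" * 19 + "D" * 19   # artificial sequence for illustration
print(classify_location(toy_protein, 9))    # window covers only isoleucines
print(classify_location(toy_protein, 28))   # window covers only aspartates
print(classify_location(toy_protein, 0))    # no full window at the terminus
```

Leaving some positions unclassified is one possible reading of why the two location counts in the table (239 and 192) add up to fewer than 468 replacements; the source does not state the reason.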

Table 4

Comparisons of Categories of the 468 Amino Acid Replacements.

| Category of Change^a | Private N = 261 | Nonprivate N = 207 | OR^c | 95% CI^d | P Value^e |
|---|---|---|---|---|---|
| Polarity | 122^b (.47) | 102^b (.49) | 0.90 | 0.62–1.32 | 0.64 |
| Size | 74 (.28) | 49 (.24) | 1.28 | 0.82–1.99 | 0.29 |
| Hydropathy | 110 (.42) | 97 (.47) | 0.83 | 0.56–1.21 | 0.35 |
| Charge | 25 (.10) | 14 (.07) | 1.46 | 0.71–3.13 | 0.31 |
| Aliphaticity | 93 (.36) | 58 (.28) | 1.42 | 0.94–2.16 | 0.09 |
| Aromaticity | 37 (.14) | 22 (.11) | 1.39 | 0.77–2.56 | 0.27 |
| Nonconservative | 88 (.34) | 45 (.22) | 1.83 | 1.18–2.85 | 0.005* |
| Hydrophobic location | 141 (.54) | 98 (.47) | 1.24 | 0.83–1.86 | 0.28 |

| Category of Change^a | Homoplasic N = 116 | Nonhomoplasic N = 352 | OR^c | 95% CI^d | P Value^e |
|---|---|---|---|---|---|
| Polarity | 61 (.53) | 163 (.46) | 1.29 | 0.83–2.00 | 0.28 |
| Size | 22 (.19) | 101 (.29) | 0.58 | 0.33–1.00 | 0.04* |
| Hydropathy | 57 (.49) | 150 (.43) | 1.30 | 0.83–2.03 | 0.24 |
| Charge | 11 (.09) | 28 (.08) | 1.21 | 0.53–2.62 | 0.57 |
| Aliphaticity | 25 (.22) | 126 (.36) | 0.49 | 0.29–0.82 | 0.004* |
| Aromaticity | 8 (.07) | 51 (.14) | 0.44 | 0.17–0.97 | 0.04* |
| Nonconservative | 20 (.17) | 113 (.32) | 0.44 | 0.25–0.76 | 0.002** |
| Hydrophobic location | 51 (.44) | 188 (.53) | 0.73 | 0.46–1.17 | 0.17 |

| Category of Change^a | Hydrophobic Location N = 239 | Hydrophilic Location N = 192 | OR^c | 95% CI^d | P Value^e |
|---|---|---|---|---|---|
| Polarity | 117 (.49) | 90 (.47) | 1.09 | 0.73–1.62 | 0.70 |
| Size | 64 (.27) | 50 (.26) | 1.04 | 0.66–1.64 | 0.91 |
| Hydropathy | 110 (.46) | 84 (.44) | 1.10 | 0.73–1.64 | 0.70 |
| Charge | 8 (.03) | 26 (.14) | 0.22 | 0.08–0.52 | 0.0001** |
| Aliphaticity | 94 (.39) | 46 (.24) | 2.05 | 1.32–3.21 | 0.0009** |
| Aromaticity | 29 (.12) | 27 (.14) | 0.84 | 0.46–1.55 | 0.57 |
| Nonconservative | 56 (.23) | 64 (.33) | 0.61 | 0.39–0.96 | 0.02* |

a See the footnote to table 3 for explanation of categories.

b Number of amino acid replacements of the respective type. Proportions are shown in parentheses.

c Odds ratio.

d 95% confidence interval for odds ratio.

e Probability of the null hypothesis that OR is 1 (Fisher's exact test).

* P < 0.05. ** P < 0.00223, which corresponds to the 95% significance level adjusted for multiple comparisons.
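As a worked example of how one row of Table 4 can be read, the sketch below recomputes the odds ratio and a two-sided Fisher's exact P value for nonconservative changes among private versus nonprivate replacements (88 of 261 versus 45 of 207). It is a plain-Python illustration under assumptions: the odds ratio is the simple sample ratio, which may differ slightly from the estimate produced by the authors' software, and the two-sided P value sums all tables that are no more probable than the observed one. The last lines show that the Šidák formula with 23 comparisons (the number of rows in Table 4) gives a threshold close to 0.00223; the source does not name the adjustment method, so this is an observation rather than a statement of what was done.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(
        p
        for p in (probability(x) for x in range(smallest, largest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Nonconservative row: private 88 of 261, nonprivate 45 of 207.
a, b = 88, 261 - 88
c, d = 45, 207 - 45
ratio = odds_ratio(a, b, c, d)
p_value = fisher_exact_two_sided(a, b, c, d)

# Sidak-adjusted threshold for 23 comparisons at an overall 5% level.
sidak_threshold = 1 - 0.95 ** (1 / 23)

print(f"odds ratio: {ratio:.2f}")
print(f"two-sided P value: {p_value:.4f}")
print(f"Sidak threshold for 23 comparisons: {sidak_threshold:.5f}")
```

In this reading, the row is significant at the unadjusted 5% level but not at the adjusted threshold, which matches its single asterisk in the table.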

This work was supported by grants from the Sigrid Juselius Foundation, the Maud Kuistila Memorial Foundation, and the Research Council for Health, Academy of Finland.

Society for Molecular Biology and Evolution
