HIV-1 Protease Mutations and Protease Inhibitor Cross-Resistance

Abstract

The effects of many protease inhibitor (PI)-selected mutations on the susceptibility to individual PIs are unknown. We analyzed in vitro susceptibility test results on 2,725 HIV-1 protease isolates. More than 2,400 isolates had been tested for susceptibility to fosamprenavir, indinavir, nelfinavir, and saquinavir; 2,130 isolates had been tested for susceptibility to lopinavir; 1,644 isolates had been tested for susceptibility to atazanavir; 1,265 isolates had been tested for susceptibility to tipranavir; and 642 isolates had been tested for susceptibility to darunavir. We applied least-angle regression (LARS) to the 200 most common mutations in the data set and identified a set of 46 mutations associated with decreased PI susceptibility of which 40 were not polymorphic in the eight most common HIV-1 group M subtypes. We then used least-squares regression to ascertain the relative contribution of each of these 46 mutations. The median number of mutations associated with decreased susceptibility to each PI was 28 (range, 19 to 32), and the median number of mutations associated with increased susceptibility to each PI was 2.5 (range, 1 to 8). Of the mutations with the greatest effect on PI susceptibility, I84AV was associated with decreased susceptibility to eight PIs; V32I, G48V, I54ALMSTV, V82F, and L90M were associated with decreased susceptibility to six to seven PIs; I47A, G48M, I50V, L76V, V82ST, and N88S were associated with decreased susceptibility to four to five PIs; and D30N, I50L, and V82AL were associated with decreased susceptibility to fewer than four PIs. This study underscores the greater impact of nonpolymorphic mutations compared with polymorphic mutations on decreased PI susceptibility and provides a comprehensive quantitative assessment of the effects of individual mutations on susceptibility to the eight clinically available PIs.


HIV-1 protease inhibitors (PIs) are the mainstays of salvage therapy. As the number of licensed PIs has increased, it has become important to identify whether and how each PI-selected mutation affects cross-resistance to each of the other PIs. In a previous study (41), we applied several data mining approaches to assess associations between HIV-1 protease genotype and phenotype test results for the first-generation PIs: amprenavir (APV), the active component of the prodrug fosamprenavir (FPV); atazanavir (ATV); indinavir (IDV); lopinavir (LPV); nelfinavir (NFV); and saquinavir (SQV). Specifically, we used a data set containing about 300 susceptibility results for ATV, 500 for LPV, and 800 for FPV, IDV, NFV, and SQV (41). To reduce the number of independent variables influencing PI susceptibility, that study used a predefined list of PI-selected mutations.

Here we analyze a data set that contains between 1,600 and 2,600 isolates tested for susceptibility to the first-generation PIs and about 600 and 1,200 isolates tested for susceptibility to darunavir (DRV) and tipranavir (TPV), respectively. We use two regression methods in tandem: one to identify genotypic predictors of decreased susceptibility and one to quantify the impact of specific mutations on decreased PI susceptibility. We then integrate our findings with previously published data linking protease mutations to decreases in susceptibility to specific PIs.

MATERIALS AND METHODS

HIV-1 isolates.

We analyzed HIV-1 isolates in the HIV Drug Resistance Database (HIVDB) (40) for which cell-based in vitro PI susceptibility testing had been performed by the PhenoSense (Monogram, South San Francisco, CA) (38) or Antivirogram (Virco, Mechelen, Belgium) (19) assays. Seventy percent of the test results were obtained from published studies, and 30% were obtained from virus samples collected at Stanford University or one of several collaborating clinics. Of the test results from published studies, about one-half were obtained in clinical trials performed by Boehringer Ingelheim (3), Tibotec (26), Gilead Sciences (30), and Abbott Laboratories (23). The Stanford University Human Subjects Committee approved this study.

Drug susceptibility results were expressed as the fold change in susceptibility, defined as the ratio of the 50% inhibitory concentration (IC50) of the tested isolate to that of a standard wild-type control isolate. We used the STAR method to assign an HIV-1 subtype to isolates for which nucleotide sequences were available (32). We obtained subtypes for the remaining isolates from the studies from which the phenotype data were obtained or, for the collaborating clinics, from the phenotype report. Mutations were defined as differences from the consensus subtype B amino acid sequence (http://hivdb.stanford.edu/pages/documentPage/consensus_amino_acid_sequences.html). Nonpolymorphic mutations were defined as mutations that occurred at a prevalence of ≤0.5% in the eight most common HIV-1 subtypes (4).
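The fold change definition above can be sketched in a few lines of Python. The function names are illustrative and not part of the study. The base-10 logarithm is an assumption, chosen because the Results back-transform errors as powers of 10.

```python
import math


def fold_change(ic50_isolate: float, ic50_wild_type: float) -> float:
    """Ratio of the tested isolate's IC50 to the wild-type control's IC50."""
    if ic50_isolate <= 0 or ic50_wild_type <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_isolate / ic50_wild_type


def log_fold_change(ic50_isolate: float, ic50_wild_type: float) -> float:
    """Log of the fold change (base 10 assumed), the regression response."""
    return math.log10(fold_change(ic50_isolate, ic50_wild_type))
```

Under this sketch, an isolate whose IC50 is 10 times that of the control has a fold change of 10 and a log fold change of 1.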

To minimize bias that would result from overrepresenting highly similar viruses, we excluded 200 redundant viruses—defined as viruses obtained from the same person with the same mutations at the following PI resistance positions: positions 30, 32, 46, 47, 54, 76, 82, 84, and 90 (20). Because the presence of mixtures at influential drug resistance positions may confound genotype-phenotype correlations, we excluded 472 viruses with sequences containing electrophoretic mixtures at these protease positions.
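The two exclusion rules above might be applied as in the following sketch. The record layout (`person_id`, and `residues` mapping a position to the amino acids observed there) is hypothetical, as is applying both filters in a single pass. A position absent from `residues` is assumed to carry the consensus amino acid, and more than one letter at a position is assumed to denote an electrophoretic mixture.

```python
# PI resistance positions used to define redundancy and mixtures (from the text).
KEY_POSITIONS = (30, 32, 46, 47, 54, 76, 82, 84, 90)


def filter_isolates(isolates):
    """Drop isolates with mixtures at key positions and redundant isolates.

    isolates: list of dicts with hypothetical keys 'person_id' and
    'residues' (dict: protease position -> string of observed amino acids).
    """
    seen = set()
    kept = []
    for isolate in isolates:
        residues = isolate["residues"]
        pattern = tuple(residues.get(pos, "") for pos in KEY_POSITIONS)
        if any(len(amino_acids) > 1 for amino_acids in pattern):
            continue  # mixture at an influential position
        signature = (isolate["person_id"], pattern)
        if signature in seen:
            continue  # same person, same mutations at the key positions
        seen.add(signature)
        kept.append(isolate)
    return kept
```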

Identification of mutations associated with decreased PI susceptibility.

We used two independent regression analyses in tandem: least-angle regression (LARS) was performed to identify protease mutations associated with reduced susceptibility to at least one PI, and least-squares regression was performed to quantify the contribution of protease mutations to reduced PI susceptibility. Protease mutations for which the mean regression coefficient determined by LARS was ≥0.5 or ≤−0.5 and more than 3 standard deviations above or below zero in 10 repeated runs of 5-fold cross-validation were considered significant predictors. All of the significant predictors, including the ones that were significant for only one PI, were retained for our second regression analysis using least-squares regression.
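The significance criterion for a single mutation and PI can be written as a short check over the coefficients from the repeated cross-validation runs. This is a minimal sketch: the function name is illustrative, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def is_significant(coefficients, min_abs_mean=0.5, n_sd=3.0):
    """Apply the two criteria described in the text to one mutation and PI.

    coefficients: regression coefficients for the mutation, one per
    cross-validation run (50 values for 10 repeats of 5-fold CV).
    """
    m = mean(coefficients)
    sd = stdev(coefficients)  # sample SD; an assumption
    # Mean at least 0.5 in magnitude and more than 3 SD away from zero.
    return abs(m) >= min_abs_mean and abs(m) > n_sd * sd
```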

LARS is a model selection algorithm that selects a parsimonious set of explanatory variables for the efficient prediction of a response variable from a large collection of possible explanatory variables (16). We applied LARS to the 200 most common mutations in our data set. Each of these mutations occurred in 20 or more sequences in the data set, although some of these mutations occurred in fewer than 20 sequences for TPV and DRV. Eight regression models, one for each PI, were created. In these models, each of the 200 mutations was an explanatory variable, and the log of the fold change in susceptibility was the response variable. For each 5-fold cross-validation, 80% of the data were used for learning regression coefficients and 20% were used for testing. While learning the regression coefficients, LARS used four-fifths of the learning data to select a model and one-fifth to validate the selected model, using the LASSO option. LARS constructs a model by first finding the mutation most correlated with the log fold change and then incrementally builds the model by following the equiangular vector until another variable is equally correlated with the residual. The validation set (the one-fifth of learning data) is used to decide when to stop adding variables to the model. The regularization parameter was chosen as the smallest parameter whose mean cross-validation error was less than or equal to the minimum cross-validation error plus 1 standard deviation of the LARS cross-validation error at the minimum.
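The rule for choosing the regularization parameter can be illustrated independently of the LARS path itself. The sketch below takes cross-validation errors that are assumed to have been computed already for each candidate parameter value. The function name and input layout are hypothetical, and the standard deviation is taken over the per-fold errors at the minimum, which is one reading of the description above.

```python
from statistics import mean, stdev


def choose_parameter(cv_errors):
    """Pick the smallest parameter within 1 SD of the minimum CV error.

    cv_errors: dict mapping each candidate parameter value to the list of
    cross-validation errors observed for it (one per fold).
    """
    mean_error = {p: mean(errors) for p, errors in cv_errors.items()}
    best = min(mean_error, key=mean_error.get)
    threshold = mean_error[best] + stdev(cv_errors[best])
    return min(p for p, err in mean_error.items() if err <= threshold)
```

For example, if the minimum mean error is 0.20 with a standard deviation of 0.02, any parameter with a mean error of at most 0.22 qualifies, and the smallest such parameter is returned.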

Quantification of the contributions of protease mutations to decreased PI susceptibility.

In the least-squares regression analyses, the mutations identified by LARS were explanatory variables and the log fold change in susceptibility was the response variable. We used weighted least-squares regression to minimize the potential influence of data obtained by two different phenotypic methods. Weighting was performed by fitting the complete data set (without cross-validation) and then calculating the fitted mean-squared error (MSEfitted) for each of the phenotypic methods (Antivirogram and PhenoSense). We then weighted the contribution of each sequence by 1/MSEfitted for each of the eight regression models. For each weighted regression model, we performed 5-fold cross-validation 10 times on different subdivisions of the complete data set. During each 5-fold cross-validation, 80% of the data were used to derive the coefficient of each genotypic predictor and 20% were used to test the prediction performance of the model.
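The assay weighting described above can be sketched as follows. Residuals are assumed to come from an initial fit to the complete data set. The function names and data layout are illustrative and not taken from the study's code.

```python
def assay_weights(residuals_by_assay):
    """Return the weight 1/MSE_fitted for each phenotypic assay.

    residuals_by_assay: dict mapping an assay name to the residuals
    (actual minus fitted log fold change) of its isolates.
    """
    weights = {}
    for assay, residuals in residuals_by_assay.items():
        mse_fitted = sum(r * r for r in residuals) / len(residuals)
        weights[assay] = 1.0 / mse_fitted
    return weights


def weighted_sse(actual, predicted, assays, weights):
    """Objective minimized by weighted least squares for a given fit."""
    return sum(
        weights[assay] * (y - y_hat) ** 2
        for y, y_hat, assay in zip(actual, predicted, assays)
    )
```

In this scheme the assay with the noisier fit contributes less to the objective, so neither phenotypic method dominates the estimated coefficients.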

The mutations for which the mean regression coefficients from the 10 repeated runs of 5-fold cross-validation exceeded 3 standard deviations above or below zero were considered statistically significant. Due to the disparity in the numbers of genotype-phenotype results available for each PI, we also required that the mean regression coefficient equal or exceed 0.5 to be considered significantly associated with decreased susceptibility to a specific PI. Without this additional criterion, there would be more mutations associated with reduced susceptibilities to older PIs simply because the greater numbers of available results for these PIs would make even very weak associations appear significant. We also standardized the mutation regression coefficients by dividing each coefficient by the square root of the sum of squares of the residuals obtained during the learning stages of each cycle of learning and cross-validation. Standardizing the regression coefficients made it possible to obtain unbiased comparisons of the magnitude of the coefficient for a protease mutation across PIs.
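The standardization step can be expressed compactly. This is a sketch under the assumption that the residuals passed in are those of the learning (training) portion of one cross-validation cycle; the function name is illustrative.

```python
import math


def standardize(coefficients, training_residuals):
    """Divide each coefficient by the root of the residual sum of squares.

    coefficients: dict mapping a mutation to its raw regression coefficient.
    training_residuals: residuals from the learning stage of the same cycle.
    """
    scale = math.sqrt(sum(r * r for r in training_residuals))
    if scale == 0:
        raise ValueError("residual sum of squares is zero")
    return {mutation: c / scale for mutation, c in coefficients.items()}
```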

Prediction performance was evaluated during cross-validation using continuous and categorical approaches on the 20% of the data held out for testing. The continuous approach involved calculating the mean-squared error (MSE) between the actual and predicted log fold change in susceptibility. The categorical approach involved determining how often the predicted phenotype fell into the same one of three predefined categories of susceptibility as the actual phenotype; we refer to these categories as susceptible, low/intermediate, and high-level resistance. The predefined susceptibility categories for each PI were derived in advance to approximate the geometric mean of the published estimated clinical cutoffs provided with the PhenoSense and VircoType reports (Table 1).

TABLE 1.

Phenotypic cutoffs used in this study to assess classification accuracy compared with the clinical cutoffs reported by the Antivirogram and PhenoSense assays

| Drug | Antivirogram low cutoff | Antivirogram high cutoff | PhenoSense low cutoff | PhenoSense high cutoff | Our analysis low cutoff | Our analysis high cutoff |
|---|---|---|---|---|---|---|
| ATV/r | 2.5 | 32.5 | NA | NA | 3.0 | 15 |
| DRV/r | 10.0 | 106.9 | 10.0 | 90 | 10.0 | 90 |
| FPV/r | 1.5 | 19.5 | 4.0 | 11 | 3.0 | 15 |
| IDV/r | 2.3 | 27.2 | NA | NA | 3.0 | 15 |
| LPV/r | 6.1 | 51.2 | 9.0 | 55 | 9.0 | 55 |
| NFV | 2.2 | 9.4 | NA | NA | 3.0 | 6.0 |
| SQV/r | 3.1 | 22.6 | 2.3 | 12 | 3.0 | 15 |
| TPV/r | 1.5 | 7.0 | 2.0 | 8.0 | 2.0 | 8.0 |
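The three-category classification can be sketched using the cutoffs from the last two columns of Table 1. How a fold change exactly equal to a cutoff is handled is not stated in the text; the sketch assumes such values fall into the higher category.

```python
# (low cutoff, high cutoff) used for classification accuracy, from Table 1.
CUTOFFS = {
    "ATV": (3.0, 15.0),
    "DRV": (10.0, 90.0),
    "FPV": (3.0, 15.0),
    "IDV": (3.0, 15.0),
    "LPV": (9.0, 55.0),
    "NFV": (3.0, 6.0),
    "SQV": (3.0, 15.0),
    "TPV": (2.0, 8.0),
}


def classify(drug: str, fold: float) -> str:
    """Map a fold change to one of the three susceptibility categories."""
    low, high = CUTOFFS[drug]
    if fold < low:
        return "susceptible"
    if fold < high:  # boundary handling is an assumption
        return "low/intermediate"
    return "high-level resistance"
```

The same fold change can fall into different categories for different PIs: a fold change of 5.0 is susceptible for LPV but low/intermediate for NFV.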

Corroboration with other sources of published data.

We compared our findings on PI susceptibility with three sources of published data: (i) previous publications showing that a protease mutation was selected in vitro or in vivo with a specific PI or directly decreased in vitro PI susceptibility in a well-characterized set of virus samples or in a site-directed mutagenesis study; (ii) the published genotypic susceptibility scores for darunavir and tipranavir developed by Tibotec and Boehringer Ingelheim, respectively (12, 13, 25); and (iii) a linear regression analysis of genotypes and Antivirogram phenotypes in the VircoLab database published in 2007 (46). The VircoLab linear regression analysis generated a table with 39 mutations at 22 protease positions showing the estimated contribution of each mutation to a decrease or increase in the fold change susceptibility (referred to as fold change factor) of either 1 to 1.5, 1.5 to 2, or >2. Associations in our analysis were considered to be corroborated by the VircoLab analysis for protease mutations having the highest possible fold change factor (>2) for a PI.

RESULTS

Summary of PI susceptibility results.

In vitro drug susceptibility results were available for 2,725 isolates, including 2,643 from 2,439 individuals and 82 mutant laboratory isolates. Fifty-six percent (1,525) of isolates had been tested by the PhenoSense assay and 44% (1,200) by the Antivirogram assay. Table 2 shows the number of isolates analyzed by each assay for susceptibility to each of the eight PIs. More than 2,400 genotype-phenotype correlations were available for FPV, IDV, NFV, and SQV. A total of 2,130 genotype-phenotype correlations were available for LPV, 1,644 for ATV, 1,265 for TPV, and 642 for DRV. Fifty-five percent of PhenoSense results were classified as susceptible, 18% were classified as low-intermediate, and 27% were classified as high level resistant. Comparable numbers for the Antivirogram assay were 41% susceptible, 21% low-intermediate, and 38% high level resistant. Ninety-two percent (2,500) of the isolates were categorized as subtype B; 8% (225) were categorized as a non-B subtype.

TABLE 2.

Total number of drug susceptibility results according to PI and susceptibility assays

| Drug | No. of isolates analyzed by Antivirogram | No. of isolates analyzed by PhenoSense | Total no. of isolates analyzed |
|---|---|---|---|
| ATV | 744 | 900 | 1,644 |
| DRV | 277 | 365 | 642 |
| FPV | 1,054 | 1,416 | 2,470 |
| IDV | 1,130 | 1,450 | 2,580 |
| LPV | 973 | 1,157 | 2,130 |
| NFV | 1,153 | 1,488 | 2,641 |
| SQV | 1,159 | 1,457 | 2,616 |
| TPV | 684 | 581 | 1,265 |
| Mean | 897 | 1,102 | 1,999 |

Mutations identified by least-angle regression as predictive of PI resistance.

Among the 200 mutations that occurred 20 or more times in our data set, LARS identified 46 mutations at 26 positions as statistically significant predictors of decreased susceptibility to one or more PIs. These mutations included L10FI, V11L, K20T, L24FI, D30N, V32I, L33F, E35GN, K43T, M46IL, I47AV, G48MV, I50LV, F53L, I54ALMSTV, Q58E, G73CST, T74PS, L76V, V82AFLST, N83D, I84AV, N88DS, L89V, and L90M. The median number of mutations per sample was 4.0. Figure S1 in the supplemental material shows the number of other mutations occurring with each mutation. Table S1 in the supplemental material lists the number of times each combination of mutations occurred in the data set.
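The grouped notation used in this list (consensus amino acid, position, then one or more variant amino acids) can be expanded mechanically. The following sketch shows that the 25 grouped entries above yield the 46 individual mutations; the helper name is illustrative.

```python
import re

# Grouped mutation notation exactly as listed in the text.
GROUPS = (
    "L10FI, V11L, K20T, L24FI, D30N, V32I, L33F, E35GN, K43T, M46IL, "
    "I47AV, G48MV, I50LV, F53L, I54ALMSTV, Q58E, G73CST, T74PS, L76V, "
    "V82AFLST, N83D, I84AV, N88DS, L89V, L90M"
)


def expand(group: str) -> list:
    """Expand e.g. 'I84AV' into ['I84A', 'I84V']."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z]+)", group)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {group!r}")
    consensus, position, variants = match.groups()
    return [f"{consensus}{position}{variant}" for variant in variants]


MUTATIONS = [m for group in GROUPS.split(", ") for m in expand(group)]
```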

Among the 46 significant mutations, 40 were nonpolymorphic mutations in the eight most common subtypes. In contrast, only 14 of the 154 nonsignificant mutations were not polymorphic in these eight subtypes. The polymorphic significant mutations included L10I (which occurs in 1.8% to 9.3% of all subtypes), L33F (0.7% in subtype A and 1.4% in subtype CRF01_AE), E35G (1.1% in subtype A, 1.2% in subtype G, and 2.4% in subtype CRF02_AG), K43T (0.6% in subtype F), Q58E (0.9% in subtype D), and T74S (0.6% to 8% in all subtypes except subtypes B and D) (40).

The prevalence of each of the 46 mutations in the susceptibility data set was highly correlated with the prevalence of these mutations in sequences from more than 15,000 PI-treated individuals in the Stanford HIV Drug Resistance Database (R² = 0.87; P < 0.001) (40) (see Fig. S2 in the supplemental material).

Least-squares regression prediction performance.

Table 3 summarizes the prediction performance of weighted least-squares regression using the 46 mutations identified by LARS. The MSE of 50 trials (5-fold cross-validation performed 10 times) per PI ranged from 0.10 to 0.19, with standard deviation of 0.01 to 0.02 (Table 3). For the PIs with the lowest MSE (FPV and LPV), the predicted fold value was on average 10^0.10 (1.3) times higher or lower than the actual fold value. For the PI with the highest MSE (SQV), the predicted fold value was on average 10^0.19 (1.5) times higher or lower than the actual fold value. The classification accuracy ranged from 0.73 for TPV to 0.90 for NFV. In general, PIs with a low MSE had high classification accuracy. However, this was not always the case. The classification accuracy depended on the proportion of phenotypic results that were close to the classification cutoffs and thus more prone to misclassification. Table 3 also shows the prediction performances using the results of each assay separately. With one exception (PhenoSense LPV results), the performance of the weighted least-squares regression was superior or equal to the results obtained solely using one assay.
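The back-transformation from the log scale to a fold multiplier quoted above can be reproduced directly. The sketch mirrors the text by raising 10 to the MSE value itself; the function name is illustrative.

```python
def fold_multiplier(mse_log10: float) -> float:
    """Back-transform a log10-scale error into a fold multiplier."""
    return 10 ** mse_log10


lowest = round(fold_multiplier(0.10), 1)   # FPV and LPV
highest = round(fold_multiplier(0.19), 1)  # SQV
```

Here `lowest` evaluates to 1.3 and `highest` to 1.5, matching the values given in parentheses in the text.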

TABLE 3.

Mean squared error (MSE) and classification accuracy of least-squares regression (LSR) using PhenoSense and Antivirogram data sets separately and in combination

| Drug | MSE: PhenoSense and LSR | MSE: Antivirogram and LSR | MSE: combined, weighted LSR | Classification accuracy: PhenoSense and LSR | Classification accuracy: Antivirogram and LSR | Classification accuracy: combined, weighted LSR |
|---|---|---|---|---|---|---|
| ATV | 0.14 ± 0.01 | 0.14 ± 0.01 | 0.13 ± 0.01 | 0.83 ± 0.02 | 0.82 ± 0.02 | 0.83 ± 0.02 |
| DRV | 0.15 ± 0.03 | 0.14 ± 0.03 | 0.11 ± 0.02 | 0.85 ± 0.03 | 0.86 ± 0.03 | 0.87 ± 0.03 |
| FPV | 0.11 ± 0.01 | 0.11 ± 0.01 | 0.10 ± 0.01 | 0.82 ± 0.01 | 0.81 ± 0.02 | 0.82 ± 0.02 |
| IDV | 0.12 ± 0.01 | 0.12 ± 0.01 | 0.11 ± 0.01 | 0.81 ± 0.02 | 0.82 ± 0.02 | 0.82 ± 0.02 |
| LPV | 0.12 ± 0.01 | 0.11 ± 0.01 | 0.10 ± 0.01 | 0.78 ± 0.02 | 0.75 ± 0.02 | 0.77 ± 0.02 |
| NFV | 0.14 ± 0.01 | 0.14 ± 0.01 | 0.13 ± 0.01 | 0.89 ± 0.01 | 0.90 ± 0.01 | 0.90 ± 0.01 |
| SQV | 0.23 ± 0.02 | 0.21 ± 0.02 | 0.19 ± 0.02 | 0.82 ± 0.02 | 0.80 ± 0.02 | 0.82 ± 0.02 |
| TPV | 0.19 ± 0.02 | 0.15 ± 0.02 | 0.15 ± 0.02 | 0.69 ± 0.03 | 0.73 ± 0.02 | 0.73 ± 0.02 |
| Avg | 0.15 | 0.14 | 0.13 | 0.81 | 0.81 | 0.82 |

Figure 1 illustrates the classification accuracy for each PI. It contains eight tables (also known as “confusion matrices”) in which each cell contains the mean proportion of results for which the actual and predicted phenotypes were concordant in 10× 5-fold cross-validation. The numbers shown beneath each table represent the mean number of results in each test set (i.e., one-fifth of the total number of genotype-phenotype correlations for each PI). For example, the test sets for ATV averaged 328.8 genotype-phenotype correlations of which 32% (sum of values in the first row) were susceptible, 18.9% (sum of values in the second row) were low-intermediate, and 49.1% (sum of values in the third row) were high level resistant. Of the 328.8 genotype-phenotype correlations, the weighted least-squares regression (LSR) model's predictions were concordant with the actual results for 83.4% (the values in the shaded diagonal cells); the predictions and actual results were completely discordant for 0.3% (i.e., susceptible versus high level resistant) and were partially discordant for 16.3% (i.e., intermediate versus susceptible or intermediate versus high-level). The classification accuracy shown in Table 3 was calculated as the sum of the values in the shaded diagonal.
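The quantities read off each confusion matrix can be computed as in the sketch below, which builds a 3 × 3 matrix of proportions from paired actual and predicted categories. The function names and category labels are illustrative; rows are actual categories and columns are predicted categories, as in Figure 1.

```python
CATEGORIES = ("susceptible", "low/intermediate", "high-level resistance")


def confusion_matrix(actual, predicted):
    """Proportion of results in each (actual, predicted) category pair."""
    index = {category: i for i, category in enumerate(CATEGORIES)}
    n = len(actual)
    matrix = [[0.0] * 3 for _ in range(3)]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1.0 / n
    return matrix


def accuracy(matrix):
    """Sum of the diagonal cells (concordant predictions)."""
    return sum(matrix[i][i] for i in range(3))


def complete_discordance(matrix):
    """Susceptible predicted as high-level resistant, or the reverse."""
    return matrix[0][2] + matrix[2][0]
```

Partial discordance is whatever remains: one minus the accuracy minus the complete discordance.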

FIG. 1.

Confusion matrices of six PIs containing the number of folds according to the classification of actual fold and the classification of predicted fold. The y axis indicates the susceptible (Susc), intermediate-low (Low-Inter), and high resistance (High) levels of the actual fold. The x axis indicates the susceptible (Susc), intermediate-low (Low-Inter), and high resistance (High) levels of the predicted fold. Each cell contains the average number of folds belonging to corresponding classifications of the actual and predicted fold from 50 test sets (5-fold cross-validation 10 times). The average number of genotype-phenotype correlations in 50 test sets for each drug is indicated (n). The classification accuracy for each test was calculated by the number of correctly predicted fold classification (which is the sum of the diagonal cells) divided by the total number of folds (which is the sum of all the cells). The average classification accuracy over 50 test sets is shown in Table 3. Drug abbreviations: ATV, atazanavir; DRV, darunavir; FPV, fosamprenavir; IDV, indinavir; LPV, lopinavir; NFV, nelfinavir; SQV, saquinavir; TPV, tipranavir.

Contributions of protease mutations to decreased PI susceptibility.

In the least-squares regression analysis, the 46 mutations at 26 positions had a regression coefficient of ≥0.5 for one or more PIs. The median number of mutations associated with decreased susceptibility to each PI was 28 (range, 19 to 32): 19 for DRV, 19 for TPV, 21 for SQV, 27 for FPV, 29 for NFV, 31 for LPV, 31 for IDV, and 32 for ATV. The median number of mutations associated with increased susceptibility to each PI was 2.5 (range, 1 to 8): one each for NFV, ATV, and IDV; two for FPV; three for LPV; four for DRV; five for SQV; and eight for TPV. Figure 2 shows the regression coefficients for each of the 46 mutations with each of the PIs. Mutations with corroborative data supporting an association with decreased susceptibility are indicated with red bars. Table S2 in the supplemental material contains the mean regression coefficients and the standard deviations for each of the 46 mutations for each of the eight drugs.
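As a quick arithmetic check, the medians quoted above follow directly from the per-PI counts. The dictionaries below simply restate the numbers given in the text.

```python
from statistics import median

# Number of mutations associated with decreased susceptibility, per PI.
DECREASED = {"DRV": 19, "TPV": 19, "SQV": 21, "FPV": 27,
             "NFV": 29, "LPV": 31, "IDV": 31, "ATV": 32}

# Number of mutations associated with increased susceptibility, per PI.
INCREASED = {"NFV": 1, "ATV": 1, "IDV": 1, "FPV": 2,
             "LPV": 3, "DRV": 4, "SQV": 5, "TPV": 8}

median_decreased = median(DECREASED.values())
median_increased = median(INCREASED.values())
```

With eight PIs, each median is the mean of the fourth and fifth ordered values: (27 + 29)/2 = 28 and (2 + 3)/2 = 2.5.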

FIG. 2.

Graphical representation of the regression coefficients of the weighted least-squares regression (LSR) model for predicting changes in protease inhibitor susceptibility using genotypic predictors. For each mutation, the y axis indicates the magnitude of the mean coefficient of 50 LSR runs (10 repetitions of 5-fold cross-validation), and the error bar indicates the standard deviation from the mean. Positive coefficients indicate mutations that decrease drug susceptibility. Negative coefficients indicate mutations that increase drug susceptibility. The y axis has no units because coefficients were normalized. The red bars indicate mutations for which we identified corroborative data supporting the association between the mutation and decreased PI susceptibility. Drug abbreviations: ATV, atazanavir; DRV, darunavir; FPV, fosamprenavir; IDV, indinavir; LPV, lopinavir; NFV, nelfinavir; SQV, saquinavir; TPV, tipranavir.

Eleven of the 32 mutations associated with decreased ATV susceptibility were previously reported to be selected by ATV in vivo or in vitro or to decrease ATV susceptibility: V32I, M46L, G48V, I50L, I54V, G73ST, I84AV, N88S, and L90M (5, 6, 18, 28, 31, 45, 51). Additionally, L24I, G48M, F53L, I54AMST, G73C, V82S, and N88D had a fold change factor of >2 in the VircoLab analysis (46).

Ten of the 19 mutations associated with decreased DRV susceptibility were previously reported to be selected in vivo by DRV or to decrease the virological response to DRV salvage therapy: V32I, L33F, I47V, I50V, I54LM, T74P, L76V, I84V, and L89V (11, 13). Two additional mutations (L10F and V82F) were shown to decrease DRV susceptibility (44). Additionally, I84A had a fold change factor of >2 in the VircoLab analysis (46).

Fourteen of the 27 mutations associated with decreased FPV susceptibility were previously reported to have been selected by APV in vivo or in vitro or to decrease APV susceptibility: L10F, V32I, L33F, M46IL, I47AV, I50V, I54LM, L76V, V82F, and I84AV (1, 10, 21, 27, 29, 31, 34, 36). Additionally, I54ST had a fold change factor of >2 in the VircoLab analysis (46).

Fifteen of the 31 mutations associated with decreased IDV susceptibility were previously reported to have been selected by IDV in vivo or in vitro or to decrease IDV susceptibility: L10F, L24I, V32I, M46IL, I54V, G73S, L76V, V82ATF, I84AV, N88S, and L90M (7, 31, 40, 43, 50). Additionally, I47A, G48M, I54ATS, G73CT, and V82S had a fold change factor of >2 in the VircoLab analysis (46).

Nineteen of the 31 mutations associated with decreased LPV susceptibility were previously reported to be selected by LPV in vitro or in vivo or to decrease LPV susceptibility: L10F, L24I, V32I, L33F, M46IL, I47AV, I50V, I54LMV, L76V, V82AFST, and I84AV (10, 14, 17, 21, 23, 31, 34, 35). Additionally, G48M and I54AST had a fold change factor of >2 in the VircoLab analysis (46).

Fourteen of the 30 mutations associated with decreased NFV susceptibility were previously reported to be selected by NFV in vitro or in vivo or to decrease NFV susceptibility: L10FI, D30N, M46IL, G48V, I54V, G73S, V82F, I84AV, N88DS, and L90M (2, 31, 37, 49, 50). Additionally, L24I, V32I, I54AST, G73T, and V82S had a fold change factor of >2 in the VircoLab analysis (46).

Eight of the 21 mutations associated with decreased SQV susceptibility were previously reported to be selected by SQV in vitro or in vivo or to decrease SQV susceptibility: G48V, F53L, I54TV, G73S, I84AV, and L90M (9, 31, 40, 47, 49). Additionally, G48M, I54AS, and G73T had a fold change factor of >2 in the VircoLab analysis (46).

Eleven of the 19 mutations associated with decreased TPV susceptibility were previously reported to be selected by TPV in vitro or in vivo or to decrease the virological response to TPV salvage therapy: L33F, K43T, I47V, I54AMV, T74P, V82LT, N83D, and I84V (3, 15, 25).

PI cross-resistance.

Of the mutations with a regression coefficient above 1.5, I84AV were associated with decreased susceptibility to eight PIs; V32I, G48V, I54ALMSTV, V82F, and L90M were associated with decreased susceptibility to six or seven PIs; I47A, G48M, I50V, L76V, V82ST, and N88S were associated with decreased susceptibility to four or five PIs; and D30N, I50L, and V82AL were associated with decreased susceptibility to fewer than four PIs.

At seven protease positions, positions 10, 47, 50, 54, 82, 84, and 88, different mutations at the same position had markedly different effects on PI susceptibility—defined here as a difference in regression coefficients of >1.0 for at least one PI. In some cases, one mutation was associated with resistance to a PI, while a different amino acid change at the same position was associated with increased susceptibility to that PI. Conversely, mutations at six protease positions—positions 24, 35, 46, 48, 73, and 74—conferred similar effects on PI susceptibility regardless of the specific amino acid change. At positions 11, 20, 30, 32, 33, 43, 53, 58, 76, 89, and 90, only a single mutation was associated with decreased PI susceptibility.
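The criterion for "markedly different effects" can be sketched as follows: a position qualifies if, for at least one PI, the regression coefficients of its mutations differ by more than 1.0. The function name and input layout are hypothetical, and every mutation is assumed to have a coefficient for the same set of PIs.

```python
import re
from collections import defaultdict


def divergent_positions(coefficients, threshold=1.0):
    """Positions whose mutations differ by > threshold for at least one PI.

    coefficients: dict mapping a mutation (e.g. 'I50L') to a dict of
    PI name -> regression coefficient.
    """
    by_position = defaultdict(list)
    for mutation, per_pi in coefficients.items():
        match = re.fullmatch(r"[A-Z](\d+)[A-Z]", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation!r}")
        by_position[int(match.group(1))].append(per_pi)
    result = []
    for position, rows in by_position.items():
        if len(rows) < 2:
            continue  # only one mutation at this position
        for pi in rows[0]:
            values = [row[pi] for row in rows]
            if max(values) - min(values) > threshold:
                result.append(position)
                break
    return sorted(result)
```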

Figure 3 shows the correlation between the regression coefficients for each pair of PIs. The regression coefficients of the mutations associated with decreased susceptibility to DRV and FPV (R² = 0.73; P < 0.001), IDV and LPV (R² = 0.57; P < 0.001), and ATV and SQV (R² = 0.53; P < 0.001) were most strongly correlated. The regression coefficients of the mutations associated with decreased susceptibility to DRV and TPV had the lowest correlation (R² = 0.01; not statistically significant).
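A pairwise comparison of this kind can be sketched as the squared Pearson correlation between two PIs' coefficient vectors, one entry per mutation. That the reported R² values were computed exactly this way is an assumption; the function name is illustrative.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)
```

In the setting of Figure 3, `x` and `y` would each hold the 46 regression coefficients for one PI, listed in the same mutation order.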

FIG. 3.

Correlation between the regression coefficients for 46 protease mutations for each pair of PIs. Each plot shows the regression coefficient for the 46 protease mutations. The x axis indicates the magnitude of regression coefficient of the PI denoted along the bottom of the figure. The y axis indicates the magnitude of regression coefficient of the PI denoted on the left side of the figure. The correlation coefficient (R²) for the values of the 46 regression coefficients between each pair of PIs is also shown on each plot. Drug abbreviations: ATV, atazanavir; DRV, darunavir; FPV, fosamprenavir; IDV, indinavir; LPV, lopinavir; NFV, nelfinavir; SQV, saquinavir; TPV, tipranavir.

DISCUSSION

We used two independent regression analyses to assess the contributions of individual HIV-1 protease mutations to PI susceptibility. The first regression analysis, LARS, employs a parsimonious feature selection algorithm. It identified a set of 46 mutations significantly associated with decreased PI susceptibility of which 40 were not polymorphic in the eight most common group M subtypes. The second analysis relied on least-squares regression to quantify the effects of these 46 mutations on susceptibility to each of the PIs. We used LARS for the first regression analysis because it is optimal for feature selection. We used least-squares regression for the second analysis because the theoretical basis for weighting data produced by different assays is established for least-squares regression but not for LARS. In addition, the use of one regression method for feature selection and another for model optimization reduces the likelihood of overfitting the regression coefficients.

This study differs in several ways from a study of genotype-phenotype correlations we published in 2006 (41). First, the current study includes nearly three times as much data for the first-generation PIs. Second, the current study contains more than 600 genotype-phenotype correlations for TPV and DRV—PIs that were not included in our previous analysis. Third, the current study uses an unbiased feature selection approach to identify genotypic predictors of resistance. In contrast, our 2006 study used a predefined list of nonpolymorphic treatment-selected mutations (39). Fifteen mutations in this study were not significantly independently associated with decreased susceptibility to one or more PIs in the 2006 study; these included L10I, V11L, L24F, E35GN, I47A, G48M, I54AST, G73C, V82SL, N83D, and I84A. L10I was not evaluated in the 2006 study because it did not include mutations that were polymorphic in subtype B viruses. The remaining additional mutations were not included in the 2006 study because of their infrequent occurrence in that data set.

Despite the increase in the number of genotype-phenotype correlations in this study over the 2006 study, predictions of fold decreased susceptibility (mean-squared error) and classification accuracies for the PIs ATV, FPV, IDV, LPV, NFV, and SQV were only slightly improved. The classification accuracies increased on average from 0.81 to 0.83. There are several likely explanations for what appears to be a plateau in prediction accuracy despite an increasing number of genotype-phenotype correlations. First, our analysis did not include three types of mutations that influence PI susceptibility: (i) rare mutations present too infrequently to be included in our regression analysis, including several highly unusual mutations at known PI resistance positions, such as G48AST, V82CM, and I84C (42); (ii) common polymorphic mutations, which alone have little if any effect on PI susceptibility but which influence susceptibility in combination with other PI resistance mutations; and (iii) gag and gag cleavage site mutations, which often have a significant effect on in vitro PI susceptibility (8, 24, 33). Second, our analysis did not use combinations of mutations (i.e., interaction terms) as predictors of reduced susceptibility because the use of interaction terms and of more complex models relating mutations to decreased susceptibility (e.g., neural networks or support vector machines) would have made it more difficult to understand the contributions of individual mutations.

The mutations identified in our analysis were consistent with data from other publications. An average of 13 mutations per PI had a regression coefficient of ≥1.0; 88% of these associations were reported in other publications. An average of 15 mutations per PI had a regression coefficient between 0.5 and 1.0; 43% of these associations were reported in other publications. The less frequent published support for mutations with lower regression coefficients may reflect the lesser effect of these mutations on decreased susceptibility and/or the lower frequency of these mutations. Collinearity may also inflate or distort the regression coefficients associated with some mutations. The most striking example in this data set was L24F which occurred with L90M in 26 of 27 instances. Similarly, I54S or T occurred with V82A in 69 of 81 instances and with G48V in 65 of 81 instances. Because of their relative infrequency and their linkage with other mutations, independent support for I54S or T was present for six PIs only in the VircoLab regression analysis (46).
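Linkage of the kind described for L24F and L90M can be screened for by counting co-occurrences. In the sketch below, each isolate is represented as a set of mutation names; this layout and the function name are assumptions.

```python
def co_occurrence(isolates, mutation, partner):
    """Return (isolates with both mutations, isolates with `mutation`).

    isolates: iterable of sets of mutation names, one set per isolate.
    """
    with_mutation = [muts for muts in isolates if mutation in muts]
    with_both = sum(1 for muts in with_mutation if partner in muts)
    return with_both, len(with_mutation)
```

When nearly every isolate carrying one mutation also carries the other, as in 26 of 27 instances, the regression has little information with which to separate their individual contributions.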

Our analysis found that several mutations originally identified because of their association with one PI were equally or more strongly associated with a different PI. L76V, which was originally recognized because of its association with decreased virological response to DRV (11, 13), was also strongly associated with decreased susceptibility to FPV, IDV, and LPV. F53L, originally identified because of its association with decreased virological response to LPV (22, 23), was more strongly associated with decreased susceptibility to ATV and SQV. N83D, which was identified because of its association with decreased virological response to TPV (25), was also associated with decreased susceptibility to ATV, NFV, and SQV. L89V, which was identified because of its association with decreased virological response to DRV (12, 13), was found to also be associated with decreased susceptibility to FPV.

In addition to identifying each of the previously reported associations between the protease mutations L24I, I47A, I50L, I50V, I54L, L76V, and N88S and increased PI susceptibility (21, 25, 48, 50), our analysis identified another eight mutations associated with increased PI susceptibility, including L10F and G48MV for TPV, V82AST for DRV, and V82FL for SQV. Two of these new associations, G48V increasing TPV susceptibility and V82F increasing SQV susceptibility, were also detected in the VircoLab analysis (46). PI “hypersusceptibility” mutations provide insight into the mechanisms of PI activity. However, the use of hypersusceptibility mutations to guide therapy has been approached warily out of concern that the potential benefit of the PIs made more active by these mutations would be negated by the emergence of residual wild-type variants within a patient.

Two types of data other than genotype-phenotype associations contribute to our understanding of the genetic basis for HIV-1 PI resistance: (i) associations between protease mutations and selective PI pressure such as occurs during in vitro passage experiments and in viruses from PI-treated individuals; and (ii) associations between protease mutations and the subsequent virological response to a PI-containing regimen. However, with the increasing number of antiretroviral (ARV) treatment options, fewer PI-containing salvage regimens are unsuccessful and fewer virus sequences from PI-treated individuals are available. Therefore, genotype-phenotype associations have become increasingly important for unraveling the genetic basis of HIV-1 PI resistance.

In conclusion, this study is the first comprehensive quantitative analysis of protease mutations associated with decreased susceptibility to each of the eight licensed PIs. It is also the first analysis in which previously published corroborative support for each genotype-phenotype association is provided. Our analysis shows that the protease mutations with the greatest impact on PI susceptibility are those that are not polymorphic in the absence of PI therapy. Our analysis also shows that protease mutations found in the course of investigating resistance to one PI should be evaluated for their effects on susceptibility to other PIs. Without such an evaluation, the extent of cross-resistance among PIs will be underestimated.

Supplementary Material

[Supplemental material]

Acknowledgments

S.-Y.R., J.T., and R.W.S. were supported in part by two NIH grants (AI06858 and 5P01GM066524-08).

Footnotes

Published ahead of print on 26 July 2010.

The authors have paid a fee to allow immediate free access to this article.

REFERENCES
