3. Random utility model and the multinomial logit model

Random utility models are fitted using the mlogit function. Only two arguments are mandatory, formula and data, if a dfidx object (and not an ordinary data.frame) is provided.

ModeCanada

We first use the ModeCanada data set, which was already coerced to a dfidx object (called MC) in the previous section. The model can then be estimated using this dfidx object as the data argument:

library("mlogit")
data("ModeCanada", package = "mlogit")
MC <- dfidx(ModeCanada, subset = noalt == 4)
ml.MC1 <- mlogit(choice ~ cost + freq + ovt | income | ivt, MC)

or an ordinary data.frame. In this latter case, further arguments that will be passed to dfidx should be indicated:

ml.MC1b <- mlogit(choice ~ cost + freq + ovt | income | ivt, ModeCanada,
    subset = noalt == 4, idx = c("case", "alt"))

mlogit provides two further useful arguments: reflevel indicates which alternative is the "reference" alternative, i.e., the one whose intercept and individual-specific coefficients are set to 0, and alt.subset indicates a subset of alternatives on which the estimation is to be performed.

We estimate the model on a subset of three alternatives (we exclude bus, whose market share is negligible in our sample) and we set car as the reference alternative. Moreover, we use a total transport time variable, computed as the sum of the in-vehicle and out-of-vehicle time variables.

MC$time <- with(MC, ivt + ovt)
ml.MC1 <- mlogit(choice ~ cost + freq | income | time, MC,
    alt.subset = c("car", "train", "air"), reflevel = "car")

The main results of the model are computed and displayed using the summary method:
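
summary(ml.MC1)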

## 
## Call:
## mlogit(formula = choice ~ cost + freq | income | time, data = MC, 
##     alt.subset = c("car", "train", "air"), reflevel = "car", 
##     method = "nr")
## 
## Frequencies of alternatives:choice
##     car   train     air 
## 0.45757 0.16721 0.37523 
## 
## nr method
## 6 iterations, 0h:0m:0s 
## g'(-H)^-1g = 6.94E-06 
## successive function values within tolerance limits 
## 
## Coefficients :
##                      Estimate  Std. Error  z-value  Pr(>|z|)    
## (Intercept):train -0.97034440  0.26513065  -3.6599 0.0002523 ***
## (Intercept):air   -1.89856552  0.68414300  -2.7751 0.0055185 ** 
## cost              -0.02849715  0.00655909  -4.3447 1.395e-05 ***
## freq               0.07402902  0.00473270  15.6420 < 2.2e-16 ***
## income:train      -0.00646892  0.00310366  -2.0843 0.0371342 *  
## income:air         0.02824632  0.00365435   7.7295 1.088e-14 ***
## time:car          -0.01402405  0.00138047 -10.1589 < 2.2e-16 ***
## time:train        -0.01096877  0.00081834 -13.4036 < 2.2e-16 ***
## time:air          -0.01755120  0.00399181  -4.3968 1.099e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1951.3
## McFadden R^2:  0.31221 
## Likelihood ratio test : chisq = 1771.6 (p.value = < 2.22e-16)

The frequencies of the different alternatives in the sample are indicated first. Next, some information about the optimization is displayed: the Newton-Raphson method (with analytic gradient and hessian) is used, as it is the most efficient method for this simple model, whose log-likelihood function is concave. Note that very few iterations and very little computing time are required to estimate this model. Then the usual table of coefficients is displayed, followed by some goodness-of-fit measures: the value of the log-likelihood function, which is compared to its value when only intercepts are included, leading to the computation of McFadden's \(R^2\) and of the likelihood ratio test.
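
To make this link explicit, the intercept-only model can be estimated directly and both statistics recomputed from the two log-likelihoods. This is a sketch that is not part of the original text; ml.MC0 is a name introduced here for illustration, and the results should match the values of about 0.312 and 1771.6 reported above.

ml.MC0 <- mlogit(choice ~ 1, MC, alt.subset = c("car", "train", "air"),
    reflevel = "car")
# McFadden R^2: one minus the ratio of the two log-likelihoods
1 - as.numeric(logLik(ml.MC1)) / as.numeric(logLik(ml.MC0))
# likelihood ratio statistic: twice the difference of the log-likelihoods
2 * (as.numeric(logLik(ml.MC1)) - as.numeric(logLik(ml.MC0)))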

The fitted method can be used either to obtain the probability of actual choices (type = "outcome") or the probabilities for all the alternatives (type = "probabilities").

head(fitted(ml.MC1, type = "outcome"))
##       109       110       111       112       113       114 
## 0.1909475 0.3399941 0.1470527 0.3399941 0.3399941 0.2440011
head(fitted(ml.MC1, type = "probabilities"), 4)
##           car     train       air
## 109 0.4206404 0.3884120 0.1909475
## 110 0.3696476 0.2903582 0.3399941
## 111 0.4296769 0.4232704 0.1470527
## 112 0.3696476 0.2903582 0.3399941

Note that the log-likelihood is the sum of the logarithms of the fitted outcome probabilities and that, as the model contains intercepts, the average fitted probability for every alternative equals the market share of that alternative in the sample.

sum(log(fitted(ml.MC1, type = "outcome")))
## [1] -1951.344
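logLik(ml.MC1)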
## 'log Lik.' -1951.344 (df=9)
apply(fitted(ml.MC1, type = "probabilities"), 2, mean)
##       car     train       air 
## 0.4575659 0.1672084 0.3752257

Predictions can be made using the predict method. If no data is provided, predictions are made for the sample mean values of the covariates.
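
predict(ml.MC1)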

##       car     train       air 
## 0.5066362 0.2116876 0.2816761

Assume, for example, that we wish to predict the effect of a 20% reduction of train transport time. We first create a new data.frame by multiplying train transport time by 0.8 and then use the predict method with this new data.frame.

NMC <- MC
NMC[idx(NMC)$alt == "train", "time"] <- 0.8 *
    NMC[idx(NMC)$alt == "train", "time"]
Oprob <- fitted(ml.MC1, type = "probabilities")
Nprob <- predict(ml.MC1, newdata = NMC)
rbind(old = apply(Oprob, 2, mean), new = apply(Nprob, 2, mean))
##           car     train       air
## old 0.4575659 0.1672084 0.3752257
## new 0.4044736 0.2635801 0.3319462

If, for the first individuals in the sample, we compute the ratio of the probabilities of the air and the car mode, we obtain:

head(Nprob[, "air"] / Nprob[, "car"])
##       109       110       111       112       113       114 
## 0.4539448 0.9197791 0.3422401 0.9197791 0.9197791 0.6021092
head(Oprob[, "air"] / Oprob[, "car"])
##       109       110       111       112       113       114 
## 0.4539448 0.9197791 0.3422401 0.9197791 0.9197791 0.6021092

which is an illustration of the IIA property. If train time changes, it changes the probabilities of choosing air and car, but not their ratio.
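
As a quick numerical check (this snippet is not part of the original text), the maximum absolute difference between the two ratio vectors over the whole sample can be computed and should be numerically zero:

max(abs(Nprob[, "air"] / Nprob[, "car"] - Oprob[, "air"] / Oprob[, "car"]))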

We next compute the surplus variation for the individuals of the sample induced by the train time reduction. This requires the computation of the log-sum term (also called inclusive value or inclusive utility) for every choice situation, which is:

\[ \mbox{iv}_i = \ln \sum_{j = 1} ^ J e^{\beta^\top x_{ij}}. \]

For this purpose, we use the logsum function, which works on a vector of coefficients and a model.matrix. The basic use of logsum consists in providing a mlogit object as its unique argument (called coef). In this case, the model.matrix and the coefficients are extracted from the same fitted model.

ivbefore <- logsum(ml.MC1)

To compute the log-sum after the train time reduction, we must provide a model.matrix which is not the one corresponding to the fitted model. This can be done using the X argument, which is a matrix or an object from which a model.matrix can be extracted. It can also be done by filling the data argument (a data.frame or an object from which a data.frame can be extracted using a model.frame method) and, optionally, the formula argument (a formula or an object for which the formula method can be applied). If no formula is provided but data is a dfidx object, the formula is extracted from it.

ivafter <- logsum(ml.MC1, data = NMC)

Surplus variation is then computed as the difference of the two log-sums divided by the opposite of the cost coefficient, which can be interpreted as the marginal utility of income:

surplus <- - (ivafter - ivbefore) / coef(ml.MC1)["cost"]
summary(surplus)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5852  2.8439  3.8998  4.6971  5.8437 31.3912

Consumers' surplus variation ranges from 0.6 to 31 Canadian $, with a median value of about 4 $.

Marginal effects are computed using the effects method. By default, they are computed at the sample mean, but a data argument can be provided. The variation of the probability and of the covariate can be either absolute or relative. This is indicated with the type argument, which is a combination of two characters: "a" (for absolute) and "r" (for relative). For example, type = "ar" means that an absolute variation of the probability is measured for a relative variation of the covariate.

effects(ml.MC1, covariate = "income", type = "ar")
##        car      train        air 
## -0.1822177 -0.1509079  0.3331256

The results indicate that, for a 100% increase in income, the probability of choosing air increases by 33 percentage points, while the probabilities of choosing car and train decrease by 18 and 15 percentage points, respectively.

For an alternative specific covariate, a matrix of marginal effects is displayed.

effects(ml.MC1, covariate = "cost", type = "rr")
##              car      train        air
## car   -0.9131273  0.9376923  0.9376923
## train  0.3358005 -1.2505014  0.3358005
## air    1.2316679  1.2316679 -3.1409703

The cell in the \(l^{\mbox{th}}\) row and the \(c^{\mbox{th}}\) column indicates the change of the probability of choosing alternative \(c\) when the cost of alternative \(l\) changes. As type = "rr", elasticities are computed. For example, a 10% increase in train cost increases the probabilities of choosing car and air by 3.36%. Note that the relative changes of the probabilities of choosing these two other modes are equal, which is a consequence of the IIA property.

Finally, in order to compute travel time valuation, we divide the coefficients of travel times (in minutes) by the coefficient of monetary cost (in $).

coef(ml.MC1)[grep("time", names(coef(ml.MC1)))] /
    coef(ml.MC1)["cost"] * 60 
##   time:car time:train   time:air 
##   29.52728   23.09447   36.95360

The value of travel time ranges from 23 Canadian $ per hour for train to 37 Canadian $ per hour for air.

NOx

The second example is a data set used by Fowlie (2010), called NOx. She analyzed the effect of an emissions trading program (the NOx budget program, which seeks to reduce emissions of nitrogen oxides) on the behavior of producers. More precisely, coal electricity plant managers may adopt one of fifteen different technologies in order to comply with the emission limits defined by the program. Some of them require a high investment (the capital cost is kcost) and are very efficient at reducing emissions; others require much less investment but are less efficient, and the operating cost (denoted vcost) is then higher, especially because pollution permits must be purchased to offset emissions exceeding their allocation.

The focus of the paper is on the effects of the regulatory environment on managers' behavior. Some firms are deregulated, whereas others are either regulated or public. Rate-of-return regulation is applied to regulated firms, which means that they receive a "fair" rate of return on their investment. Public firms also enjoy significant cost-of-capital advantages. The main hypothesis of the paper is therefore that public and regulated firms will adopt more capital-intensive technologies than deregulated ones, which means that the capital cost coefficient should take a more negative value for deregulated firms. Capital cost is interacted with the age of the plant (measured as a deviation from the sample mean age), as firms should weight capital costs more heavily for older plants, which leave less time to recover these costs.

Multinomial logit models are estimated for the three subsamples defined by the regulatory environment. The 15 technologies are not available for every plant; the sample is therefore restricted to available technologies, using the available covariate. Three technology dummies are introduced: post for post-combustion pollution control technology, cm for combustion modification technology and lnb for low NOx burners technology.

A last model is estimated on the whole sample, but the parameters are allowed to be proportional to each other. The scedasticity function is described in the fourth part of the formula; here it contains only one covariate, env. Note also that, for this last model, the author introduced a specific capital cost coefficient for deregulated firms.

data("NOx", package = "mlogit")
NOx$kdereg <- with(NOx, kcost * (env == "deregulated"))
NOxml <- dfidx(NOx, idx = list(c("chid", "id"), "alt"))
ml.pub <- mlogit(choice ~ post + cm + lnb + vcost + kcost + kcost:age |
    - 1, subset = available & env == "public", data = NOxml)
ml.reg <- update(ml.pub, subset = available & env == "regulated")
ml.dereg <- update(ml.pub, subset = available & env == "deregulated")
ml.pool <- mlogit(choice ~ post + cm + lnb + vcost + kcost + kcost:age +
    kdereg | - 1 | 0 | env, subset = available == 1, data = NOxml,
    method = "bhhh")
library("texreg")
htmlreg(list(Public = ml.pub, Deregulated = ml.dereg, Regulated = ml.reg,
             Pooled = ml.pool), caption = "Environmental compliance choices.",
        omit.coef = "(post)|(cm)|(lnb)", float.pos = "hbt", label = "tab:nox")

Environmental compliance choices.

|                    | Public   | Deregulated | Regulated | Pooled   |
| ------------------ | -------- | ----------- | --------- | -------- |
| vcost              | -1.56*** | -0.19***    | -0.28***  | -0.31*** |
|                    | (0.36)   | (0.06)      | (0.06)    | (0.04)   |
| kcost              | 0.04     | -0.06**     | 0.01      | 0.01     |
|                    | (0.11)   | (0.02)      | (0.03)    | (0.02)   |
| kcost:age          | -0.08    | -0.04**     | -0.02*    | -0.02*** |
|                    | (0.04)   | (0.01)      | (0.01)    | (0.01)   |
| kdereg             |          |             |           | -0.07*** |
|                    |          |             |           | (0.01)   |
| sig.envderegulated |          |             |           | 0.32**   |
|                    |          |             |           | (0.12)   |
| sig.envpublic      |          |             |           | -0.33*** |
|                    |          |             |           | (0.08)   |
| AIC                | 168.92   | 690.15      | 731.48    | 1634.22  |
| Log Likelihood     | -78.46   | -339.07     | -359.74   | -808.11  |
| Num. obs.          | 113      | 227         | 292       | 632      |
| K                  | 15       | 15          | 15        | 15       |
| ***p < 0.001; **p < 0.01; *p < 0.05 | | | |          |

Results are presented in the preceding table, using the texreg package (Leifeld 2013). Coefficients are very different across the subsamples defined by the regulatory environment. Note in particular that the capital cost coefficient is positive and insignificant for public and regulated firms, whereas it is significantly negative for deregulated firms. Errors seem to have a significantly larger variance for deregulated firms and a lower one for public firms, compared to regulated firms. The hypothesis that the coefficients (except the kcost one) are identical up to a multiplicative scalar can be tested using a likelihood ratio test:

stat <- 2 * (logLik(ml.dereg) + logLik(ml.reg) +
             logLik(ml.pub) - logLik(ml.pool))
stat
## 'log Lik.' 61.6718 (df=6)
pchisq(stat, df = 9, lower.tail = FALSE)
## 'log Lik.' 6.377283e-10 (df=6)
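
The nine degrees of freedom used in pchisq correspond to the difference between the number of parameters of the three separate models (six each) and of the pooled model (nine, including the two scedasticity parameters). As a check, not part of the original text, this count can be recovered from the fitted objects:

length(coef(ml.pub)) + length(coef(ml.reg)) + length(coef(ml.dereg)) -
    length(coef(ml.pool))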

The hypothesis is strongly rejected.