Exercise 3: Mixed logit model (original) (raw)
2025-04-26
A sample of residential electricity customers were asked a series of choice experiments. In each experiment, four hypothetical electricity suppliers were described. The person was asked which of the four suppliers he/she would choose. As many as 12 experiments were presented to each person. Some people stopped before answering all 12. There are 361 people in the sample, and a total of 4308 experiments. In the experiments, the characteristics of each supplier were stated. The price of the supplier was either :
- a fixed price at a stated cents per kWh, with the price varying over suppliers and experiments.
- a time-of-day (TOD) rate under which the price is 11 cents per kWh from 8am to 8pm and 5 cents per kWh from 8pm to 8am. These TOD prices did not vary over suppliers or experiments: whenever the supplier was said to offer TOD, the prices were stated as above.
- a seasonal rate under which the price is 10 cents per kWh in the summer, 8 cents per kWh in the winter, and 6 cents per kWh in the spring and fall. Like TOD rates, these prices did not vary. Note that the price is for the electricity only, not transmission and distribution, which is supplied by the local regulated utility.
The length of contract that the supplier offered was also stated, in years (such as 1 year or 5 years.) During this contract period, the supplier guaranteed the prices and the buyer would have to pay a penalty if he/she switched to another supplier. The supplier could offer no contract in which case either side could stop the agreement at any time. This is recorded as a contract length of 0.
Some suppliers were also described as being a local company or a “well-known” company. If the supplier was not local or well-known, then nothing was said about them in this regard .
- Run a mixed logit model without intercepts and a normal distribution for the 6 parameters of the model, using 100 draws, halton sequences and taking into account the panel data structure.
library("mlogit")
data("Electricity", package = "mlogit")
Electricity$chid <- 1:nrow(Electricity)
Electr <- dfidx(Electricity, idx = list(c("chid", "id")),
choice = "choice", varying = 3:26, sep = "")
Elec.mxl <- mlogit(choice ~ pf + cl + loc + wk + tod + seas | 0, Electr,
rpar=c(pf = 'n', cl = 'n', loc = 'n', wk = 'n',
tod = 'n', seas = 'n'),
R = 100, halton = NA, panel = TRUE)
##
## Call:
## mlogit(formula = choice ~ pf + cl + loc + wk + tod + seas | 0,
## data = Electr, start = strt, rpar = c(pf = "n", cl = "n",
## loc = "n", wk = "n", tod = "n", seas = "n"), R = 100,
## halton = NA, panel = TRUE)
##
## Frequencies of alternatives:choice
## 1 2 3 4
## 0.22702 0.26393 0.23816 0.27089
##
## bfgs method
## 1 iterations, 0h:0m:2s
## g'(-H)^-1g = 1.45E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## pf -0.973386 0.034324 -28.359 < 2.2e-16 ***
## cl -0.205557 0.013323 -15.428 < 2.2e-16 ***
## loc 2.075724 0.080430 25.808 < 2.2e-16 ***
## wk 1.475645 0.065168 22.644 < 2.2e-16 ***
## tod -9.052539 0.287218 -31.518 < 2.2e-16 ***
## seas -9.103759 0.289043 -31.496 < 2.2e-16 ***
## sd.pf 0.219943 0.010840 20.291 < 2.2e-16 ***
## sd.cl 0.378303 0.018489 20.461 < 2.2e-16 ***
## sd.loc 1.482974 0.081305 18.240 < 2.2e-16 ***
## sd.wk 1.000057 0.074182 13.481 < 2.2e-16 ***
## sd.tod 2.289477 0.110731 20.676 < 2.2e-16 ***
## sd.seas 1.180869 0.109007 10.833 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -3952.5
##
## random coefficients
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## pf -Inf -1.1217350 -0.9733857 -0.9733857 -0.82503650 Inf
## cl -Inf -0.4607187 -0.2055571 -0.2055571 0.04960449 Inf
## loc -Inf 1.0754736 2.0757241 2.0757241 3.07597467 Inf
## wk -Inf 0.8011174 1.4756454 1.4756454 2.15017342 Inf
## tod -Inf -10.5967678 -9.0525388 -9.0525388 -7.50830989 Inf
## seas -Inf -9.9002427 -9.1037589 -9.1037589 -8.30727517 Inf
- Using the estimated mean coefficients, determine the amount that a customer with average coefficients for price and length is willing to pay for an extra year of contract length.
coef(Elec.mxl)['cl'] / coef(Elec.mxl)['pf']
## cl
## 0.2111774
The mean coefficient of length is -0.20. The consumer with this average coefficient dislikes having a longer contract. So this person is willing to pay to reduce the length of the contract. The mean price coefficient is -0.97. A customer with these coefficients is willing to pay 0.20/0.97=0.21, or one-fifth a cent per kWh extra to have a contract that is one year shorter.
- Determine the share of the population who are estimated to dislike long term contracts (ie have a negative coefficient for the length.) \begin{answer}[5]
pnorm(- coef(Elec.mxl)['cl'] / coef(Elec.mxl)['sd.cl'])
## cl
## 0.7065611
The coefficient of length is normally distributed with mean -0.20 and standard deviation 0.35. The share of people with coefficients below zero is the cumulative probability of a standardized normal deviate evaluated at 0.20 / 0.3 5=0. 57. Looking 0.57 up in a table of the standard normal distribution, we find that the share below 0.57 is 0.72. About seventy percent of the population are estimated to dislike long-term contracts.
- The price coefficient is assumed to be normally distributed in these runs. This assumption means that some people are assumed to have positive price coefficients, since the normal distribution has support on both sides of zero. Using your estimates from exercise 1, determine the share of customers with positive price coefficients. As you can see, this is pretty small share and can probably be ignored. However, in some situations, a normal distribution for the price coefficient will give a fairly large share with the wrong sign. Revise the program to make the price coefficient fixed rather than random. A fixed price coefficient also makes it easier to calculate the distribution of willingness to pay (wtp) for each non-price attribute. If the price coefficient is fixed, the distribtion of wtp for an attribute has the same distribution as the attribute’s coefficient, simply scaled by the price coefficient. However, when the price coefficient is random, the distribution of wtp is the ratio of two distributions, which is harder to work with.
pnorm(- coef(Elec.mxl)['pf'] / coef(Elec.mxl)['sd.pf'])
## pf
## 0.9999952
The price coefficient is distributed normal with mean -0.97 and standard deviation 0.20. The cumulative standard normal distribution evaluated at 0.97 / 0.20 = 4.85 is more than 0.999, which means that more than 99.9% of the population are estimated to have negative price coefficients. Essentially no one is estimated to have a positive price coefficient.
Elec.mxl2 <- mlogit(choice ~ pf + cl + loc + wk + tod + seas | 0, Electr,
rpar = c(cl = 'n', loc = 'n', wk = 'n',
tod = 'n', seas = 'n'),
R = 100, halton = NA, panel = TRUE)
##
## Call:
## mlogit(formula = choice ~ pf + cl + loc + wk + tod + seas | 0,
## data = Electr, start = strt, rpar = c(cl = "n", loc = "n",
## wk = "n", tod = "n", seas = "n"), R = 100, halton = NA,
## panel = TRUE)
##
## Frequencies of alternatives:choice
## 1 2 3 4
## 0.22702 0.26393 0.23816 0.27089
##
## bfgs method
## 1 iterations, 0h:0m:2s
## g'(-H)^-1g = 4.6E-08
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## pf -0.879902 0.032759 -26.860 < 2.2e-16 ***
## cl -0.217059 0.013673 -15.875 < 2.2e-16 ***
## loc 2.092298 0.081067 25.809 < 2.2e-16 ***
## wk 1.490902 0.065230 22.856 < 2.2e-16 ***
## tod -8.581835 0.282912 -30.334 < 2.2e-16 ***
## seas -8.583281 0.280347 -30.617 < 2.2e-16 ***
## sd.cl 0.373477 0.018018 20.728 < 2.2e-16 ***
## sd.loc 1.558857 0.087696 17.776 < 2.2e-16 ***
## sd.wk 1.050810 0.078023 13.468 < 2.2e-16 ***
## sd.tod 2.694660 0.120798 22.307 < 2.2e-16 ***
## sd.seas 1.950728 0.104766 18.620 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -3961.7
##
## random coefficients
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## cl -Inf -0.4689658 -0.2170594 -0.2170594 0.03484691 Inf
## loc -Inf 1.0408650 2.0922983 2.0922983 3.14373157 Inf
## wk -Inf 0.7821415 1.4909018 1.4909018 2.19966212 Inf
## tod -Inf -10.3993553 -8.5818346 -8.5818346 -6.76431383 Inf
## seas -Inf -9.8990266 -8.5832805 -8.5832805 -7.26753448 Inf
- You think that everyone must like using a known company rather than an unknown one, and yet the normal distribution implies that some people dislike using a known company. Revise the program to give the coefficient of a uniform distribution between zero and b,where b is estimated (do this with the price coefficient fixed).
Elec.mxl3 <- update(Elec.mxl, rpar = c(cl = 'n', loc = 'n', wk = 'u',
tod = 'n', seas = 'n'))
The price coefficient is uniformly distributed with parameters 1.541 and 1.585.
##
## Call:
## mlogit(formula = choice ~ pf + cl + loc + wk + tod + seas | 0,
## data = Electr, start = strt, rpar = c(cl = "n", loc = "n",
## wk = "u", tod = "n", seas = "n"), R = 100, halton = NA,
## panel = TRUE)
##
## Frequencies of alternatives:choice
## 1 2 3 4
## 0.22702 0.26393 0.23816 0.27089
##
## bfgs method
## 1 iterations, 0h:0m:2s
## g'(-H)^-1g = 1.08E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## pf -0.882229 0.032818 -26.883 < 2.2e-16 ***
## cl -0.217128 0.013776 -15.761 < 2.2e-16 ***
## loc 2.099323 0.081056 25.900 < 2.2e-16 ***
## wk 1.509425 0.065496 23.046 < 2.2e-16 ***
## tod -8.606979 0.282983 -30.415 < 2.2e-16 ***
## seas -8.602396 0.280671 -30.649 < 2.2e-16 ***
## sd.cl 0.381070 0.018150 20.996 < 2.2e-16 ***
## sd.loc 1.593852 0.087802 18.153 < 2.2e-16 ***
## sd.wk 1.786373 0.125764 14.204 < 2.2e-16 ***
## sd.tod 2.719073 0.119356 22.781 < 2.2e-16 ***
## sd.seas 1.945381 0.103686 18.762 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -3956.7
##
## random coefficients
## Min. 1st Qu. Median Mean 3rd Qu.
## cl -Inf -0.4741561 -0.2171285 -0.2171285 0.0398992
## loc -Inf 1.0242863 2.0993231 2.0993231 3.1743600
## wk -0.2769485 0.6162382 1.5094248 1.5094248 2.4026115
## tod -Inf -10.4409656 -8.6069790 -8.6069790 -6.7729924
## seas -Inf -9.9145353 -8.6023958 -8.6023958 -7.2902563
## Max.
## cl Inf
## loc Inf
## wk 3.295798
## tod Inf
## seas Inf
## uniform distribution with parameters 1.509 (center) and 1.786 (span)
summary(rpar(Elec.mxl3, 'wk'))
## Min. 1st Qu. Median Mean 3rd Qu.
## -0.2769485 0.6162382 1.5094248 1.5094248 2.4026115
## Max.
## 3.2957982
plot(rpar(Elec.mxl3, 'wk'))
The upper bound is 3.13. The estimated price coefficient is -0.88 and so the willingness to pay for a known provided ranges uniformly from -0.05 to 3.55 cents per kWh.
- Rerun the model with a fixed coefficient for price and lognormal distributions for the coefficients of TOD and seasonal (since their coefficient should be negative for all people.) To do this, you need to reverse the sign of the TOD and seasonal variables, since the lognormal is always positive and you want the these coefficients to be always negative.
A lognormal is specified as \(\exp(b+se)\) where \(e\) is a standard normal deviate. The parameters of the lognormal are \(b\) and \(s\). The mean of the lognormal is \(\exp(b+0.5s^2)\) and the standard deviation is the mean times \(\sqrt{(\exp(s^2))-1}\).
Electr <- dfidx(Electricity, idx = list(c("chid", "id")), choice = "choice",
varying = 3:26, sep = "", opposite = c("tod", "seas"))
Elec.mxl4 <- mlogit(choice ~ pf + cl + loc + wk + tod + seas | 0, Electr,
rpar = c(cl = 'n', loc = 'n', wk = 'u', tod = 'ln', seas = 'ln'),
R = 100, halton = NA, panel = TRUE)
##
## Call:
## mlogit(formula = choice ~ pf + cl + loc + wk + tod + seas | 0,
## data = Electr, start = strt, rpar = c(cl = "n", loc = "n",
## wk = "u", tod = "ln", seas = "ln"), R = 100, halton = NA,
## panel = TRUE)
##
## Frequencies of alternatives:choice
## 1 2 3 4
## 0.22702 0.26393 0.23816 0.27089
##
## bfgs method
## 1 iterations, 0h:0m:2s
## g'(-H)^-1g = 6.24E-08
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## pf -0.868985 0.032350 -26.862 < 2.2e-16 ***
## cl -0.211334 0.013569 -15.575 < 2.2e-16 ***
## loc 2.023876 0.080102 25.266 < 2.2e-16 ***
## wk 1.479118 0.064957 22.771 < 2.2e-16 ***
## tod 2.112378 0.033769 62.554 < 2.2e-16 ***
## seas 2.124205 0.033342 63.709 < 2.2e-16 ***
## sd.cl 0.373120 0.017710 21.068 < 2.2e-16 ***
## sd.loc 1.548511 0.086400 17.922 < 2.2e-16 ***
## sd.wk 1.521790 0.119993 12.682 < 2.2e-16 ***
## sd.tod 0.367077 0.019997 18.357 < 2.2e-16 ***
## sd.seas 0.275352 0.016875 16.317 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -3976.5
##
## random coefficients
## Min. 1st Qu. Median Mean 3rd Qu.
## cl -Inf -0.4629994 -0.2113338 -0.2113338 0.04033179
## loc -Inf 0.9794208 2.0238757 2.0238757 3.06833059
## wk -0.0426715 0.7182234 1.4791184 1.4791184 2.24001328
## tod 0.0000000 6.4545718 8.2678801 8.8441019 10.59060830
## seas 0.0000000 6.9482054 8.3662477 8.6894950 10.07369478
## Max.
## cl Inf
## loc Inf
## wk 3.000908
## tod Inf
## seas Inf
plot(rpar(Elec.mxl4, 'seas'))
- Rerun the same model as previously, but allowing now the correlation between random parameters. Compute the correlation matrix of the random parameters. Test the hypothesis that the random parameters are uncorrelated.
Elec.mxl5 <- update(Elec.mxl4, correlation = TRUE)
##
## Call:
## mlogit(formula = choice ~ pf + cl + loc + wk + tod + seas | 0,
## data = Electr, start = strt, rpar = c(cl = "n", loc = "n",
## wk = "u", tod = "ln", seas = "ln"), R = 100, correlation = TRUE,
## halton = NA, panel = TRUE)
##
## Frequencies of alternatives:choice
## 1 2 3 4
## 0.22702 0.26393 0.23816 0.27089
##
## bfgs method
## 1 iterations, 0h:0m:2s
## g'(-H)^-1g = 3.73E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## pf -0.9177181 0.0340200 -26.9758 < 2.2e-16 ***
## cl -0.2158542 0.0138413 -15.5950 < 2.2e-16 ***
## loc 2.3925696 0.0869029 27.5315 < 2.2e-16 ***
## wk 1.7475365 0.0712087 24.5411 < 2.2e-16 ***
## tod 2.1554746 0.0337206 63.9217 < 2.2e-16 ***
## seas 2.1695605 0.0334577 64.8448 < 2.2e-16 ***
## chol.cl:cl 0.3962539 0.0187077 21.1814 < 2.2e-16 ***
## chol.cl:loc 0.6175136 0.0924281 6.6810 2.373e-11 ***
## chol.loc:loc -2.0717172 0.1063246 -19.4848 < 2.2e-16 ***
## chol.cl:wk 0.1952590 0.0731907 2.6678 0.007635 **
## chol.loc:wk -1.2366541 0.0866096 -14.2785 < 2.2e-16 ***
## chol.wk:wk 0.6431944 0.0742354 8.6643 < 2.2e-16 ***
## chol.cl:tod 0.0019817 0.0104181 0.1902 0.849141
## chol.loc:tod 0.0625074 0.0119608 5.2260 1.732e-07 ***
## chol.wk:tod 0.1606713 0.0138054 11.6383 < 2.2e-16 ***
## chol.tod:tod 0.3758504 0.0209474 17.9426 < 2.2e-16 ***
## chol.cl:seas 0.0259973 0.0098344 2.6435 0.008205 **
## chol.loc:seas -0.0012255 0.0098997 -0.1238 0.901483
## chol.wk:seas 0.1413800 0.0128750 10.9810 < 2.2e-16 ***
## chol.tod:seas 0.0899893 0.0109769 8.1981 2.220e-16 ***
## chol.seas:seas 0.2112423 0.0141902 14.8865 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -3851.4
##
## random coefficients
## Min. 1st Qu. Median Mean 3rd Qu.
## cl -Inf -0.4831234 -0.2158542 -0.2158542 0.05141502
## loc -Inf 0.9344645 2.3925696 2.3925696 3.85067469
## wk 0.3400072 1.0437718 1.7475365 1.7475365 2.45130110
## tod 0.0000000 6.5310440 8.6319857 9.4024428 11.40876968
## seas 0.0000000 7.2924594 8.7544353 9.0816328 10.50950485
## Max.
## cl Inf
## loc Inf
## wk 3.155066
## tod Inf
## seas Inf
## cl loc wk tod seas
## cl 1.000000000 0.28564925 0.13872467 0.004792336 0.09596640
## loc 0.285649252 1.00000000 0.88161832 -0.143495905 0.03174792
## wk 0.138724672 0.88161832 1.00000000 0.045410056 0.25577355
## tod 0.004792336 -0.14349591 0.04541006 1.000000000 0.50449234
## seas 0.095966400 0.03174792 0.25577355 0.504492337 1.00000000
lrtest(Elec.mxl5, Elec.mxl4)
## Likelihood ratio test
##
## Model 1: choice ~ pf + cl + loc + wk + tod + seas | 0
## Model 2: choice ~ pf + cl + loc + wk + tod + seas | 0
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 21 -3851.4
## 2 11 -3976.5 -10 250.18 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
waldtest(Elec.mxl5, correlation = FALSE)
##
## Wald test
##
## data: uncorrelated random effects
## chisq = 466.48, df = 10, p-value < 2.2e-16
scoretest(Elec.mxl4, correlation = TRUE)
##
## score test
##
## data: correlation = TRUE
## chisq = 157.35, df = 10, p-value < 2.2e-16
## alternative hypothesis: uncorrelated random effects
The three tests clearly reject the hypothesis that the random parameters are uncorrelated.