Censored and Hurdle Model Vignettes

About the Data

The USA sample in the online survey conducted by Smithson and Shou (2017), described earlier, included items from the social and economic conservatism scales created by Everett (2013). Each item asked respondents to rate their feelings about the issue described in the item on a scale from 0 to 100, according to this instruction: “Please indicate the extent to which you feel positive or negative towards each issue. Scores of 0 indicate greater negativity, and scores of 100 indicate greater positivity. Scores of 50 indicate that you feel neutral about the issue.”
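The model output later in this vignette refers to ratings of 0 and 1, which suggests that the 0 to 100 ratings were rescaled to the unit interval before fitting. A minimal sketch of that step, assuming a simple linear division by 100 (the vector `ratings` and the helper `rescale_unit()` are hypothetical, not the survey data or a package function):

```r
# Hypothetical ratings on the survey's 0-100 scale (illustrative values, not survey data)
ratings <- c(0, 12, 50, 73, 100, 100, 35)

# ASSUMPTION: linear rescaling to [0, 1]; 0 maps to 0 and 100 maps to 1 exactly,
# so boundary responses stay at the bounds rather than being squeezed inward
rescale_unit <- function(x, lower = 0, upper = 100) {
  (x - lower) / (upper - lower)
}

y <- rescale_unit(ratings)
y
```

Because the bounds are treated as true scores here, no transformation toward the interior is applied; exact 0s and 1s are left for the hurdle components to model.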

Model Fitting

The next figure shows a histogram of the ratings on the issue of “gun ownership”. This is clearly a strongly polarizing issue. There are reasonable arguments for treating the bounds of the gun ownership scale either as censored observations or as true scores. Here we treat them as true scores, so responses are modeled as a doubly bounded random variable, with exact 0s and 1s handled by separate hurdle components.
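Treating the bounds as true scores means each response falls into one of three parts: an exact 0, an exact 1, or a value strictly between them. A minimal base-R sketch of that partition, using a hypothetical vector `y` already on the unit interval and a hypothetical helper `hurdle_part()`:

```r
# Hypothetical responses already rescaled to [0, 1]
y <- c(0, 0.12, 0.5, 0.73, 1, 1, 0.35, 0)

# Label each response by the model component that accounts for it
hurdle_part <- function(y) {
  factor(ifelse(y == 0, "zero", ifelse(y == 1, "one", "interior")),
         levels = c("zero", "interior", "one"))
}

parts <- hurdle_part(y)
table(parts)
# Observed proportion of responses in each part
prop.table(table(parts))
```

In a zero-one hurdle model the proportions of exact 0s and 1s are modeled by their own components, while the continuous distribution only has to describe the interior values.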

Histograms of gun ownership ratings separated by political orientation show clear differences among the four orientations. As would be expected, the polarization comes primarily from Democrats and Republicans. An accurate model should capture these differences, and each of the four groups is large enough for such a model to detect sizable group differences.

# How many people occupy the political orientation groups in the sample?
table(gunowndata$political)

## 
##    Democrat Independent      NoPref  Republican 
##          98         109          46          68

# Histograms of gun ownership ratings for each political orientation (2 x 2 panel layout)
library(MASS)  # provides truehist()
par(mfrow = c(2,2), mar = c(4,4,1,1))
truehist(gunowndata$gunown[gunowndata$political == "Democrat"], nbins = 50, main = "Democrat", xlab = "gun ownership", ylab = "density", ylim = c(0,11), col = "red")
truehist(gunowndata$gunown[gunowndata$political == "Independent"], nbins = 50, main = "Independent", xlab = "gun ownership", ylab = "density", ylim = c(0,11), col = "red")
truehist(gunowndata$gunown[gunowndata$political == "NoPref"], nbins = 50, main = "No Preference", xlab = "gun ownership", ylab = "density", ylim = c(0,11), col = "red")
truehist(gunowndata$gunown[gunowndata$political == "Republican"], nbins = 50, main = "Republican", xlab = "gun ownership", ylab = "density", ylim = c(0,11), col = "red")

The first three models (mod0, mod1, and mod2) test for the effect of political orientation in the non-hurdle (continuous) component of the data, using the burr8-burr8 distribution. Including political orientation in the dispersion submodel (mod2) does not improve model fit, so subsequent models omit it. The fourth model (mod3) adds political orientation to the zero and one components.

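library(cdfquantreg)  # provides cdfquantregH()
# Null model: intercept only in the location, dispersion, zero, and one components
mod0 <- cdfquantregH(gunown ~ 1, zero.fo = ~1, one.fo = ~1, fd = 'burr8', sd = 'burr8', type = 'ZO', data = gunowndata)
# Political orientation in the location submodel
mod1 <- cdfquantregH(gunown ~ political, zero.fo = ~1, one.fo = ~1, fd = 'burr8', sd = 'burr8', type = 'ZO', data = gunowndata)
# Political orientation in both the location and dispersion submodels (location | dispersion)
mod2 <- cdfquantregH(gunown ~ political|political, zero.fo = ~1, one.fo = ~1, fd = 'burr8', sd = 'burr8', type = 'ZO', data = gunowndata)
# Political orientation in the location submodel and in the zero and one (hurdle) components
mod3 <- cdfquantregH(gunown ~ political, zero.fo = ~political, one.fo = ~political, fd = 'burr8', sd = 'burr8', type = 'ZO', data = gunowndata)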
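Political orientation enters each submodel through treatment contrasts with Democrats as the reference level (the alphabetically first level), so the four-level factor contributes an intercept plus three contrasts. The sketch below rebuilds a factor with the group sizes from the table above and counts the parameters implied by `mod1` and `mod3`, assuming one parameter per design-matrix column in each submodel (the variable names are hypothetical):

```r
# Rebuild a factor with the group sizes from table(gunowndata$political)
sizes <- c(Democrat = 98, Independent = 109, NoPref = 46, Republican = 68)
political <- factor(rep(names(sizes), times = sizes))
n <- length(political)

# Treatment-contrast design matrix: intercept + 3 contrasts vs. Democrat
X <- model.matrix(~ political)
k_group <- ncol(X)   # columns for a submodel that includes political
k_const <- 1         # an intercept-only submodel

# ASSUMPTION: parameters = location + dispersion + zero component + one component
k_mod1 <- k_group + k_const + k_const + k_const
k_mod3 <- k_group + k_const + k_group + k_group

c(n = n, k_mod1 = k_mod1, k_mod3 = k_mod3,
  resid_df_mod1 = n - k_mod1, resid_df_mod3 = n - k_mod3)
```

Under this counting, the residual degrees of freedom come out as 314 and 308, matching the `Resid. Df` column of the likelihood ratio table.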


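# Likelihood ratio test: does adding political orientation to the zero and one components improve fit?
anova(mod1, mod3)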

## Likelihood ratio tests 
## 
##   Resid. Df -2Loglik Df LR stat Pr(>Chi)    
## 1       314   353.83                        
## 2       308   322.05  6  31.777  1.8e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
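The likelihood ratio statistic is the drop in -2 log-likelihood between the nested models, referred to a chi-squared distribution whose degrees of freedom equal the number of added parameters (here the three political contrasts in each of the zero and one components). A base-R sketch that reproduces the table from its printed values:

```r
# -2 log-likelihoods and residual df as printed by anova(mod1, mod3)
m2ll_mod1 <- 353.83
m2ll_mod3 <- 322.05
df_mod1 <- 314
df_mod3 <- 308

lr_stat <- m2ll_mod1 - m2ll_mod3   # likelihood ratio statistic
lr_df   <- df_mod1 - df_mod3       # number of extra parameters in mod3
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(LR = lr_stat, df = lr_df, p = p_value)
```

Because the inputs are the rounded values from the printed table, the statistic differs from the reported 31.777 only in the third decimal place, and the p-value agrees at the reported precision.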

# Parameter estimates for the final model
summary(mod3)

## Family:  burr8 burr8 
## Call:  cdfquantregH(formula = gunown ~ political, data = gunowndata,  
##     fd = "burr8", sd = "burr8", zero.fo = ~political, one.fo = ~political,  
##     type = "ZO") 
## 
## Mu coefficients (Location submodel)
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -0.2785     0.1296  -2.148 0.031677 *  
## politicalIndependent   0.6523     0.1796   3.633 0.000281 ***
## politicalNoPref        0.2917     0.2340   1.247 0.212569    
## politicalRepublican    1.0886     0.2020   5.390 7.03e-08 ***
## 
## Sigma coefficients (Dispersion submodel)
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.19454    0.05573  -3.491 0.000482 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Zero component coefficients
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -1.7918     0.2887  -6.207 5.41e-10 ***
## politicalIndependent  -1.4759     0.5855  -2.521   0.0117 *  
## politicalNoPref        0.5108     0.4595   1.112   0.2662    
## politicalRepublican   -1.7047     0.7736  -2.204   0.0276 *  
## 
## One component coefficients
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -3.8712     0.7144  -5.419    6e-08 ***
## politicalIndependent   1.6841     0.7820   2.154  0.03126 *  
## politicalNoPref        1.5198     0.8855   1.716  0.08611 .  
## politicalRepublican    2.3308     0.7820   2.980  0.00288 ** 
## 
## Converge:  
## Log-Likelihood:  -161.0249
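Because the submodels use treatment contrasts, each group's location on the linear-predictor scale is the intercept plus that group's contrast, with Democrats represented by the intercept alone. A small sketch that assembles these values from the Mu coefficients printed above; it does not map them back to the rating scale, since that depends on the burr8-burr8 distribution and is not needed for the group comparisons:

```r
# Mu (location) coefficients from the mod3 output above
mu_coef <- c("(Intercept)"        = -0.2785,
             politicalIndependent =  0.6523,
             politicalNoPref      =  0.2917,
             politicalRepublican  =  1.0886)

# Group-level location = intercept + treatment contrast (0 for the reference group)
group_location <- mu_coef[["(Intercept)"]] + c(Democrat = 0, mu_coef[-1])
names(group_location) <- c("Democrat", "Independent", "NoPref", "Republican")
round(group_location, 4)
```

The ordering (Democrat lowest, Republican highest) follows directly from the signs and sizes of the contrasts.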

The final model (mod3) shows the expected effects of political orientation in all three model components. The location submodel yields higher ratings for Republicans and Independents than for Democrats, but finds no significant difference between the Democrat and No Preference groups. These differences are echoed in the zero and one components: the negative zero-component coefficients and positive one-component coefficients mean that Republicans and Independents are less likely than Democrats to give ratings of zero and more likely to give ratings of one. The No Preference group shows a marginally greater tendency than Democrats to give ratings of one (p = 0.086), but this does not reach significance.
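The zero and one components are easier to read as probabilities. Assuming each component uses a logit link for the probability of an exact 0 (or an exact 1), the sketch below converts the printed coefficients into implied group-level probabilities with `plogis()`. With a single categorical predictor, these fitted probabilities should reproduce each group's observed proportions of 0s and 1s. The helper `group_eta()` is hypothetical:

```r
# Zero- and one-component coefficients from the mod3 output above
# (order: intercept, Independent, NoPref, Republican)
zero_coef <- c(-1.7918, -1.4759, 0.5108, -1.7047)
one_coef  <- c(-3.8712,  1.6841, 1.5198,  2.3308)
groups <- c("Democrat", "Independent", "NoPref", "Republican")

# Linear predictor per group: intercept + contrast (reference group gets 0)
group_eta <- function(coef) coef[1] + c(0, coef[-1])

# ASSUMPTION: logit link, so probability = plogis(linear predictor)
p_zero <- setNames(plogis(group_eta(zero_coef)), groups)
p_one  <- setNames(plogis(group_eta(one_coef)),  groups)

round(rbind(p_zero, p_one), 3)
```

Under this reading, the implied probability of a zero rating is lower for Independents and Republicans than for Democrats, and the implied probability of a rating of one is higher.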