Binomial distribution with a truncated output
Hi There,
I’m keen to get some input on a problem I’m trying to solve.
I’m trying to model the conversion rate of searches to bookings when there is a finite resource, e.g., hotel rooms. A simplified way to do this is to group all searches on a given date and model them with a binomial distribution, where n is the number of searches and k is the number of bookings. The problem with this is that you can’t have more bookings than rooms (r), and when sampling from the posterior predictive it’s likely that some predictions will break this constraint.
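For concreteness, here is a minimal sketch of the simple version in PyMC (made-up numbers and variable names):

```python
import numpy as np
import pymc as pm

# made-up daily aggregates: searches (n) and bookings (k) per date
searches = np.array([120, 95, 140, 80])
bookings = np.array([14, 9, 18, 7])

with pm.Model() as simple_model:
    # conversion rate shared across dates
    p = pm.Beta("p", alpha=1, beta=1)
    # bookings as binomial draws from the searches
    pm.Binomial("bookings", n=searches, p=p, observed=bookings)
    idata = pm.sample()
```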
I’m sure this type of problem has been tackled before, so I’m keen to hear if anyone has experience with it. My thinking on a simple solution was to truncate the output of the binomial distribution to a maximum value of r, but it doesn’t feel very elegant. The more complex alternative would be to model searches individually and sequentially, but that feels like overkill and would rapidly expand the size of the dataset.
I’ll most likely try a few versions and see which gives the best fit for the use case, but if anyone else has any thoughts I’d be happy to hear them!
Thanks all!
Sounds like the general topic of counting processes? Counting process - Wikipedia.
For your specific case you may consider Bernoulli trials or a truncated likelihood (instead of artificially truncating the data).
If you are just starting, you may also think about how you would simulate the system; that may suggest an initial model.
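A rough, untested sketch of the truncated-likelihood idea, assuming pm.Truncated accepts a Binomial base distribution in your PyMC version (the data arrays are made up):

```python
import numpy as np
import pymc as pm

# made-up daily aggregates: searches, available rooms, and bookings per date
searches = np.array([120, 95, 140, 80])
rooms = np.array([20, 20, 15, 15])
bookings = np.array([14, 9, 15, 7])

with pm.Model() as truncated_model:
    p = pm.Beta("p", alpha=1, beta=1)
    # binomial likelihood truncated above at the room count,
    # so the model itself can never produce bookings > rooms
    pm.Truncated(
        "bookings",
        pm.Binomial.dist(n=searches, p=p),
        upper=rooms,
        observed=bookings,
    )
    idata = pm.sample()
```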
This kind of problem comes up when there’s a mismatch between the generative process you’re assuming in the model and reality. In this case, the problem is that conversions can’t be independent given a fixed number of hotel rooms. Trying to model them as independent with a post-hoc fix like truncating a binomial is probably not what you want in the longer run because the model won’t be generative and won’t be easy to modify. How much do you need to get the time series right? These searches can’t all be happening at once. And won’t prices change as supply diminishes, like with airline reservations?
Predictively, it might not get out of hand to model searches individually and sequentially if it makes sense that the search and decision can be treated as instantaneous (e.g., buying lunch from GrubHub, not buying an apartment from StreetEasy, to use a couple of New York examples). But if you have to estimate the model with the same data, that’s going to be much harder because then you have to marginalize out all the discrete decisions or build some kind of nested Monte Carlo inside of another MCMC process.
I’m afraid I don’t know any of the specifics of any of these counting processes, but it looks like a fun and deep area.
Thanks @ricardoV94 and @bob-carpenter!
This gives me lots of food for thought. I like the idea of including the time/sequence of searches from a modelling standpoint, but implementation with this approach will be far more complex - it needs to scale to 1000s of locations with many 1000s of searches per day. My inkling is to hold off from doing this unless absolutely necessary for the use case.
I had another thought that it would be great to get your perspective on. Given I have so many trials for each date, I could approximate the binomial with a normal distribution and truncate it with an upper bound of r - I think this may be what @ricardoV94 was suggesting?
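Roughly what I’m picturing (an untested sketch with made-up numbers, using the usual normal approximation with mean n*p and variance n*p*(1-p)):

```python
import numpy as np
import pymc as pm

# made-up daily aggregates with large n per date
searches = np.array([1200, 950, 1400, 800])
rooms = np.array([200, 200, 150, 150])
bookings = np.array([140.0, 90.0, 150.0, 70.0])

with pm.Model() as normal_approx_model:
    p = pm.Beta("p", alpha=1, beta=1)
    mu = searches * p
    sigma = pm.math.sqrt(searches * p * (1 - p))
    # normal approximation to the binomial, truncated above at the room count
    pm.TruncatedNormal("bookings", mu=mu, sigma=sigma, upper=rooms, observed=bookings)
    idata = pm.sample()
```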
All thoughts welcome, and thanks again for the initial comments.
Sorry, I got confused about the truncation; I thought you sometimes observed counts > n and that something else was going on.
Regarding scaling, it may still be useful to play with what you think is the real model, even if it only works for small data sizes. That may give you (and others here) a better idea of how to approximate it differently.
That seems sensible, I’ll keep my options open for as long as possible.
Thanks again!
What are your quantities of interest for inference? As always, the thing to do is set up a system where you can run posterior predictive checks to see if your model can estimate these well. It may be that you can make simplifying assumptions and still get the inferences you need without maintaining consistency. For example, regressions are typically about modeling expectations and errors. If you can write a regression that gets the expectations right, it might be OK even if the tails of the error are not well behaved (e.g., by allowing inconsistent purchasing decisions). As another example, treating the individuals as independent might work out just fine if what you care about are expectations and they’re bounded below the max possible.
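In PyMC terms, such a check might look roughly like this (a sketch that reuses the hypothetical simple_model and idata from the first post, plus ArviZ for plotting):

```python
import arviz as az
import pymc as pm

with simple_model:
    # draw replicated bookings from the fitted model
    idata.extend(pm.sample_posterior_predictive(idata))

# compare replicated bookings against the observed counts; also worth checking
# how often the replications exceed the room limit
az.plot_ppc(idata, var_names=["bookings"])
```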
Also, while it’s often possible for a model to be consistent with impossible things, the data won’t have impossible things in it (assuming it’s relatively clean—if not, you need to clean it or include a noisy measurement model). In this way, I can take something like a concentration parameter and give it a normal distribution even though it’s not consistent for it to be negative. As long as the data’s positive with a small enough standard deviation to not overlap with zero, it won’t cause inconsistencies. And it can actually give you a better fit to the data than a lognormal. Ideally, you’d just be using a half normal, but if the effect’s well separated from zero, that distinction won’t matter in practice.
Thanks @bob-carpenter !
That all makes lots of sense, and is the path I was leaning toward. At the moment I don’t have access to the real data - I’m setting the problem up with a synthetic sample - so until I get it I’m considering all of the potential modelling options.
With this synthesised data, even with lots of noise added, the posterior predictive checks from a simple binomial hold up pretty well against the data. I’m hopeful this will be the case with the real data - fingers crossed!
Note I did try using a truncated normal distribution (given the large number of trials), and while I got it to fit with some messing around with sampling parameters, the sampler is a lot more temperamental around the discontinuity (to be expected), so I’m leaning away from that.
Thanks for all the advice - I will no doubt be replying again to this thread once I have seen the real data!
You could probably reframe this as a censored demand problem as an alternative: there’s some latent demand for bookings that you’re modeling, but it’s censored by the number of hotel rooms available (censoring rather than truncation). You can then use the number of searches as a regressor, since more searches should be a leading indicator of demand.
This ignores a lot of effects in bookings data, however, such as spatial effects, price, and cross effects from those things in neighboring times/locations, but it’s best to start simple and then add complexity.
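One way this could be set up in PyMC (an untested sketch: pm.Censored around a Poisson demand whose log rate uses searches as a regressor; all numbers and names are made up):

```python
import numpy as np
import pymc as pm

# made-up data: searches, rooms available, and observed bookings per date
searches = np.array([120, 95, 140, 80])
rooms = np.array([20, 20, 15, 15])
bookings = np.array([14, 9, 15, 7])  # never exceeds rooms by construction

with pm.Model() as censored_demand_model:
    intercept = pm.Normal("intercept", 0, 1)
    beta = pm.Normal("beta", 0, 1)
    # latent booking demand, with (log) searches as a leading indicator
    demand_rate = pm.math.exp(intercept + beta * np.log(searches))
    # observed bookings are latent demand censored from above at the room count
    pm.Censored(
        "bookings",
        pm.Poisson.dist(mu=demand_rate),
        lower=None,
        upper=rooms,
        observed=bookings,
    )
    idata = pm.sample()
```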
Thanks @KyleJCaron - I haven’t come across this type of model before but I’ll do some research and will revert with any Qs.
Quick update - thanks for this suggestion @KyleJCaron, censoring captures the booking behaviour far better than truncation!