How to deal with missing values when modeling human reinforcement learning data

Dear all,

I’m new to PyMC and have limited experience in cognitive modeling.

Recently, I’ve been working with human reinforcement learning data. In some trials, participants may time out, resulting in missing data for those trials. I’m wondering how to handle such situations.

Below is a portion of the hierarchical model code I’m testing:

import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pytensor import scan

with pm.Model(coords=coords) as hbandit:
    # ----- hyper-priors -----
    mu_alpha = pm.Normal("mu_alpha", 0.0, 1.0)
    sigma_alpha = pm.HalfNormal("sigma_alpha", 1.0)
    mu_beta = pm.Normal("mu_beta", 0.0, 1.0)
    sigma_beta = pm.HalfNormal("sigma_beta", 1.0)

    # ----- individual parameters -----
    alpha_logit = pm.Normal(
        "alpha_logit", mu=mu_alpha, sigma=sigma_alpha, dims="subject"
    )
    beta_log = pm.Normal("beta_log", mu=mu_beta, sigma=sigma_beta, dims="subject")

    alpha = pm.Deterministic("alpha", pm.math.sigmoid(alpha_logit), dims="subject")
    beta = pm.Deterministic("beta", pm.math.exp(beta_log), dims="subject")

    # ----- observed data -----
    a_data = pm.Data("actions", actions.astype("int64"))
    r_data = pm.Data("rewards", rewards.astype("int64"))

    # ----- trial‑by‑trial scan -----
    def step(a_t, r_t, Q_prev, alpha, beta, n_subj):
        logits = beta[:, None] * Q_prev  # shape (N,2)
        p = pm.math.softmax(logits, axis=1)
        logp = pt.log(p[pt.arange(n_subj), a_t])  # (N,)

        # chosen Q update
        idx = pt.arange(n_subj)
        q_chosen = Q_prev[idx, a_t]
        q_new = q_chosen + alpha * (r_t - q_chosen)
        Q_next = pt.set_subtensor(Q_prev[idx, a_t], q_new)

        return Q_next, pt.sum(logp)  # scalar log‑likelihood at t

    Q0 = pt.zeros((n_subj, 2))  # initial Q-values: (subjects, arms)
    (_, ll_seq), _ = scan(
        fn=step,
        sequences=[a_data.T, r_data.T],
        outputs_info=[Q0, None],
        non_sequences=[alpha, beta, n_subj],
    )

    pm.Potential("likelihood", pt.sum(ll_seq))

    idata = pm.sample(
        1000,
        tune=1000,
        chains=4,
        target_accept=0.9,
        nuts_sampler="nutpie",
        progressbar=True,
        random_seed=42,
    )

Thank you!

Best,
Kun

What is the cause of the timeouts? That will inform how you want to model them.

Are they given a task to complete in a given time and fail to complete it in the given time? Would they have completed it given more time? If so, then you can use a censored model. This is common in survival models, where you measure when someone dies, but they may still be alive at the end of the study, so you only know their death will come some time after it ends.

Or is it something like a survey question where you ask them their income and they just fail to respond? In that case, you want to take the non-response as information in and of itself if non-response is related to the outcome of interest.

You may also run into identifiability issues with your parameterization of a simplex in terms of softmax of unnormalized log densities.
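To make the invariance behind that concern concrete: adding the same constant to every logit leaves the softmax probabilities unchanged, so the absolute level of the unnormalized log densities is not constrained by the likelihood. A quick NumPy check:

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; softmax is shift-invariant
    e = np.exp(x - np.max(x))
    return e / e.sum()

q = np.array([0.3, 1.1])
p_original = softmax(2.0 * q)          # beta = 2
p_shifted = softmax(2.0 * (q + 5.0))   # same Q-values plus a constant

print(np.allclose(p_original, p_shifted))  # True: the shift cancels out
```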

const June 17, 2025, 3:10am 3

Thanks for your helpful reply. In my task, participants must respond within 2 s; if they do not, that trial is recorded as a missing response. A few participants show some misses (typically 0–8 % of their trials, and most have almost none).

At the moment I handle these trials using pt.switch with a response mask, which skips both the Q-value update and the likelihood contribution on missing trials. The model runs, but I’m not sure whether this is theoretically/statistically sound.
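Concretely, the masking is roughly equivalent to this NumPy sketch (one subject, two arms; the `-1` coding for timed-out trials is just for illustration):

```python
import numpy as np

alpha, beta = 0.3, 2.0
actions = np.array([0, 1, -1, 0])    # -1 marks a timed-out trial (hypothetical coding)
rewards = np.array([1, 0, 0, 1])
mask = actions >= 0                  # True = response observed, False = missing

Q = np.zeros(2)
loglik = 0.0
for t in range(len(actions)):
    if mask[t]:  # plays the role of pt.switch(mask_t, ...) in the scan
        p = np.exp(beta * Q) / np.exp(beta * Q).sum()
        a = actions[t]
        loglik += np.log(p[a])
        Q[a] += alpha * (rewards[t] - Q[a])
    # on missing trials both Q and the likelihood are left untouched
```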

Your suggestions sound more advanced. Could you please walk me through them in a bit more detail? I’d also appreciate any pointers to introductory resources so I can learn more about these concepts.

Thanks again for your help!

It’s usually better to model how data came to be missing than to throw it out.

One standard way to do this is with survival analysis.

The basic idea: suppose you observe a reaction time t[n] for participant n and assume they would eventually respond if given more time, and suppose you model reaction times as, say, lognormal(mu, sigma). For a trial censored at the 2 s deadline, rather than taking t[n] ~ lognormal(mu, sigma), you increment the log density by the log of the complementary CDF at 2 s, i.e., the probability that t[n] > 2 under lognormal(mu, sigma). I’m not sure how easy that is to do in PyMC. An alternative is to define a new parameter t_true[n] constrained so that t_true[n] > 2 and then take t_true[n] ~ lognormal(mu, sigma). I also don’t know how easy that is. The tutorial should provide some guidance.
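A sketch of that arithmetic with made-up parameter values, using SciPy just to show the bookkeeping: observed reaction times contribute their log density, and each timed-out trial contributes the log complementary CDF (log survival function) at the 2 s deadline.

```python
import numpy as np
from scipy import stats

mu, sigma = 0.0, 0.5                      # hypothetical lognormal RT parameters
rt_dist = stats.lognorm(s=sigma, scale=np.exp(mu))

rts = np.array([0.8, 1.2, 1.6])           # observed reaction times (seconds)
n_timeouts = 2                            # trials censored at the 2 s deadline

# log p(data) = sum of log densities + n_censored * log P(T > 2)
loglik = rt_dist.logpdf(rts).sum() + n_timeouts * rt_dist.logsf(2.0)
```

In PyMC itself, pm.Censored wraps a base distribution and handles this for you, along the lines of `pm.Censored("rt", pm.LogNormal.dist(mu, sigma), lower=None, upper=2.0, observed=...)`.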

If, on the other hand, this isn’t a censoring problem and it’s a “refuse to answer” problem, then you want to build a model of the probability of refusing to answer given other covariates and/or the outcome. This lets non-answering be informative.
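A minimal sketch of that second idea (all names and numbers hypothetical): let the probability of responding depend on a trial-level quantity, here the absolute Q-value difference, so that non-response itself is informative about the latent state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# hypothetical response model: near-tied choices time out more often
gamma0, gamma1 = 1.5, 2.0
q_diff = np.array([0.8, 0.1, 0.05, 0.6])  # |Q_left - Q_right| per trial
responded = np.array([1, 0, 0, 1])        # 1 = responded, 0 = timed out

p_respond = sigmoid(gamma0 + gamma1 * q_diff)
# each trial contributes a Bernoulli term for responding vs. timing out,
# on top of the usual choice likelihood for responded trials
loglik_miss = np.sum(
    responded * np.log(p_respond) + (1 - responded) * np.log1p(-p_respond)
)
```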

const June 18, 2025, 2:25am 5

Thank you! I will check the tutorial and test it first.