Allow Minibatch of derived RVs and deprecate generators as data by ricardoV94 · Pull Request #7480 · pymc-devs/pymc
Description
This PR fixes issues related to Minibatch indexing reported in https://discourse.pymc.io/t/warning-using-minibatch-and-censored-together-rng-variable-has-shared-clients/14943 and extends the MinibatchRV functionality to derived RVs.
Minibatch value variables are uniquely tricky because they are random graphs that can share RNGs with other variables in the forward / logp graph. We therefore need to make sure they are not mutated, so that the default updates keep working. We tried some tricks in the past, but as the Discourse issue revealed, they were not enough. This PR solves the problem by encapsulating the random graph in an OpFromGraph, so that the inner graph is not touched by PyMC's logprob derivation routines. It will still be inlined in the final compiled functions to avoid overhead.
I also decided to deprecate generators as data, which showed up in some of the refactoring. GeneratorOp is not a true Op, since an Op should not have any side effects. It is also incompatible with non-default backends like Numba and JAX, which we are moving towards. If needed, the logic should be handled by the sampler, which can consume the generator and set the values before subsequent function calls.
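For anyone migrating away from generators as data, here is a minimal sketch of the "consume the generator and set the values" pattern. It is plain Python and not code from this PR: `SharedValue`, `batch_generator` and `run_steps` are hypothetical names, and `SharedValue` only stands in for a shared variable whose value the calling loop updates between function calls.

```python
class SharedValue:
    """Hypothetical stand-in for a shared variable with get/set semantics."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def batch_generator(data, batch_size):
    """Yield consecutive slices of `data`, wrapping around forever."""
    start = 0
    n = len(data)
    while True:
        yield data[start:start + batch_size]
        start = (start + batch_size) % n


def run_steps(step_fn, shared, batches, n_steps):
    """Driver loop: the caller, not the graph, consumes the generator.

    Before every call to `step_fn`, the next batch is pulled from the
    generator and written into the shared container, so `step_fn` itself
    stays free of side effects.
    """
    results = []
    for _ in range(n_steps):
        shared.set_value(next(batches))
        results.append(step_fn())
    return results


data = list(range(6))
shared_batch = SharedValue(data[:2])


def step():
    # Stand-in for a compiled function that reads the shared value.
    return sum(shared_batch.get_value())


results = run_steps(step, shared_batch, batch_generator(data, 2), n_steps=4)
print(results)
```

The point of the design is that the function being called only reads state; all iteration over the generator lives in the outer loop, which should make the same pattern usable regardless of the compilation backend.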