pymc.backends.zarr.ZarrTrace - PyMC 5.22.0 documentation
class pymc.backends.zarr.ZarrTrace(store=None, synchronizer=None, compressor=UNSET, draws_per_chunk=1, include_transformed=False)
Object that stores and enables access to MCMC draws stored in zarr.hierarchy.Group objects.
This class creates a zarr hierarchy to represent the sampling information, which is intended to mimic arviz.InferenceData. The hierarchy looks like this:
root
|–> constant_data
|–> observed_data
|–> posterior
|–> unconstrained_posterior
|–> sample_stats
|–> warmup_posterior
|–> warmup_unconstrained_posterior
|–> warmup_sample_stats
|–> _sampling_state
The root group is created when the ZarrTrace object is initialized. The rest of the groups are created once init_trace() is called, with a few exceptions: unconstrained_posterior is only created if include_transformed = True, and the groups prefixed with warmup_ are created only after calling split_warmup_groups().
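The group-creation rules above can be summarised in a small helper. This is only an illustrative sketch in plain Python: `expected_groups` is a hypothetical function, not part of PyMC, and it assumes that `warmup_unconstrained_posterior` appears only when `include_transformed` is true, which the text implies but does not state outright.

```python
def expected_groups(include_transformed=False, warmup_split=False):
    """Return the group names the description above says exist under root.

    Hypothetical helper for illustration only; it does not touch zarr or PyMC.
    """
    # Groups created by init_trace() regardless of options.
    groups = [
        "constant_data",
        "observed_data",
        "posterior",
        "sample_stats",
        "_sampling_state",
    ]
    # Only present when transformed (unconstrained) values are stored.
    if include_transformed:
        groups.append("unconstrained_posterior")
    # The warmup_ groups appear only after split_warmup_groups() is called.
    if warmup_split:
        groups += ["warmup_posterior", "warmup_sample_stats"]
        if include_transformed:  # assumption, see the lead-in
            groups.append("warmup_unconstrained_posterior")
    return groups


print(expected_groups())
print(expected_groups(include_transformed=True, warmup_split=True))
```

With both options enabled, the helper returns all nine groups shown in the hierarchy.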
Since ZarrTrace objects are intended to be as close to arviz.InferenceData objects as possible, the groups store the dimension and coordinate information following the xarray zarr standard.
Parameters:
store : zarr.storage.BaseStore | collections.abc.MutableMapping | None
The store object where the zarr groups and arrays will be stored and read from. Any zarr compatible storage object works. Keep in mind that if None is provided, a zarr.storage.MemoryStore will be used, which means that information won’t be visible to other processes and won’t persist after the ZarrTrace life-cycle ends. If you want persistent storage, use one of the disk-backed zarr storage options, e.g. DirectoryStore or ZipStore.
synchronizer : zarr.sync.Synchronizer | None
The synchronizer to use for the underlying zarr arrays.
compressor : numcodecs.abc.Codec | None | pymc.util.UNSET
The compressor to use for the underlying zarr arrays. If None, no compressor is used. If UNSET, zarr’s default compressor is used.
draws_per_chunk : int
The number of draws that make up a chunk in the variable’s posterior array. Each variable’s array shape is set to (n_chains, n_draws, *rv_shape), but the chunks are set to (1, draws_per_chunk, *rv_shape). This means that each chain has its own chunks to read from or write to, so concurrent writes from different chains do not interfere with each other, and that multiple draws can belong to the same chunk. The variable’s core dimensions, however, are never split across different chunks.
include_transformed : bool
If True, the transformed, unconstrained value variables are included in the storage group.
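To make the chunking rule for draws_per_chunk concrete, the following sketch computes the array shape, chunk shape, and chunk count for one variable, and finds the chunk that holds a given draw. It uses plain Python arithmetic only; `chunk_layout` and `chunk_index` are hypothetical helpers written for this illustration, not PyMC or zarr functions.

```python
import math


def chunk_layout(n_chains, n_draws, rv_shape, draws_per_chunk=1):
    """Return (array_shape, chunk_shape, n_chunks) for one variable."""
    array_shape = (n_chains, n_draws, *rv_shape)
    chunk_shape = (1, draws_per_chunk, *rv_shape)
    # One row of chunks per chain; draws are grouped along the second axis;
    # the variable's core dimensions always fit in a single chunk.
    n_chunks = n_chains * math.ceil(n_draws / draws_per_chunk)
    return array_shape, chunk_shape, n_chunks


def chunk_index(chain, draw, rv_ndim, draws_per_chunk=1):
    """Chunk-grid coordinates of the chunk holding a (chain, draw) sample."""
    return (chain, draw // draws_per_chunk, *([0] * rv_ndim))


shape, chunks, n_chunks = chunk_layout(4, 1000, (3, 2), draws_per_chunk=100)
print(shape, chunks, n_chunks)
print(chunk_index(2, 250, rv_ndim=2, draws_per_chunk=100))
```

Because the first chunk coordinate is always the chain index, two chains never write to the same chunk, which is the property the parameter description relies on for concurrent writes.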
Notes
ZarrTrace objects represent the storage information. If the underlying store persists on disk or over the network (e.g. with a zarr.storage.FSStore), multiple processes will be able to concurrently access the same storage and read from or write to it.
The intended division of labour is for ZarrTrace to handle the creation and management of the zarr group and storage objects and arrays, and for individual ZarrChain objects to handle recording MCMC samples to the trace. This division was chosen to stay close to the existing pymc.backends.base.MultiTrace and pymc.backends.ndarray.NDArray way of working with the existing samplers.
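The division of labour can be pictured with a toy model: one object owns and allocates the storage, and per-chain recorder objects write into their own slice of it. The classes below (`ToyTrace`, `ToyChain`) are hypothetical stand-ins built on nested lists; they are not the ZarrTrace or ZarrChain implementations and only mirror the split of responsibilities described above.

```python
class ToyTrace:
    """Owns the storage: allocates one row per chain up front."""

    def __init__(self, chains, draws):
        self.storage = [[None] * draws for _ in range(chains)]

    def chain_recorder(self, chain):
        # Hand out a recorder bound to a single chain's row.
        return ToyChain(self.storage, chain)


class ToyChain:
    """Records draws for one chain into storage owned by the trace."""

    def __init__(self, storage, chain):
        self._row = storage[chain]
        self._next = 0

    def record(self, value):
        self._row[self._next] = value
        self._next += 1


trace = ToyTrace(chains=2, draws=3)
recorders = [trace.chain_recorder(c) for c in range(2)]
for step in range(3):
    for c, recorder in enumerate(recorders):
        recorder.record(10 * c + step)
print(trace.storage)
```

Each recorder only ever touches its own chain's row, so the storage owner never has to coordinate individual writes.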
One extra feature of ZarrTrace is that it enables direct access to any array’s metadata. ZarrTrace takes advantage of this to tag arrays as deterministic or freeRV depending on what kind of variable they were in the defining model.
Methods
Method | Description |
---|---|
ZarrTrace.__init__([store, synchronizer, ...]) | |
ZarrTrace.create_group(name, data_dict) | |
ZarrTrace.groups() | |
ZarrTrace.init_group_with_empty(group, ...) | |
ZarrTrace.init_sampling_state_group(tune, chains) | |
ZarrTrace.init_trace(chains, draws, tune, step) | Initialize the trace groups and arrays. |
ZarrTrace.split_warmup(group_name[, ...]) | Split the arrays of a group into the warmup and regular groups. |
ZarrTrace.split_warmup_groups() | Split the warmup and standard groups. |
ZarrTrace.to_inferencedata([save_warmup]) | Convert ZarrTrace to InferenceData. |
Attributes
Attribute |
---|
constant_data |
observed_data |
posterior |
sample_stats |
sampling_time |
tuning_steps |
unconstrained_posterior |