pymc.backends.zarr.ZarrTrace — PyMC 5.23.0 documentation

class pymc.backends.zarr.ZarrTrace(store=None, synchronizer=None, compressor=UNSET, draws_per_chunk=1, include_transformed=False)

Object that stores, and enables access to, MCMC draws stored in zarr.hierarchy.Group objects.

This class creates a zarr hierarchy to represent the sampling information, which is intended to mimic arviz.InferenceData. The hierarchy looks like this:

root
|--> constant_data
|--> observed_data
|--> posterior
|--> unconstrained_posterior
|--> sample_stats
|--> warmup_posterior
|--> warmup_unconstrained_posterior
|--> warmup_sample_stats
|--> _sampling_state

The root group is created when the ZarrTrace object is initialized. The remaining groups are created once init_trace() is called, with two exceptions: unconstrained_posterior is only created if include_transformed=True, and the groups prefixed with warmup_ are created only after calling split_warmup_groups().
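The creation rules above can be summarised as a small function. This is a plain-Python sketch, not PyMC code: expected_groups and its boolean arguments are hypothetical, and it assumes (this page does not state it) that warmup_unconstrained_posterior only appears when include_transformed=True.

```python
# Illustrative model of which child groups of "root" exist at each stage.
# expected_groups() is hypothetical; only the group names and the creation
# rules come from the ZarrTrace documentation.

BASE_GROUPS = ["constant_data", "observed_data", "posterior", "sample_stats", "_sampling_state"]


def expected_groups(initialized: bool, include_transformed: bool, warmup_split: bool) -> list[str]:
    """Return the child groups of root implied by the documented rules."""
    if not initialized:
        # Before init_trace() only the root group itself exists.
        return []
    groups = list(BASE_GROUPS)
    if include_transformed:
        groups.append("unconstrained_posterior")
    if warmup_split:
        # Assumption: split_warmup_groups() adds a warmup_ twin for each
        # sampled group, including unconstrained_posterior only if it exists.
        sampled = ["posterior", "sample_stats"]
        if include_transformed:
            sampled.append("unconstrained_posterior")
        groups.extend(f"warmup_{name}" for name in sampled)
    return sorted(groups)


print(expected_groups(initialized=True, include_transformed=False, warmup_split=True))
```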

Since ZarrTrace objects are intended to be as close to arviz.InferenceData objects as possible, the groups store the dimension and coordinate information following the xarray zarr standard.
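In the xarray zarr convention, each array records its dimension names in an `_ARRAY_DIMENSIONS` attribute, and each coordinate is stored as a 1-D array named after its dimension. The sketch below builds those attributes as plain dicts for a hypothetical variable `mu` with a hypothetical `school` dimension. It assumes that `chain` and `draw` coordinates are stored as arrays, as xarray does, and it does not touch zarr itself.

```python
# Sketch of the xarray-on-zarr dimension convention using plain dicts.
# The helper names, "mu" and "school" are hypothetical examples.


def dims_attrs(dims: list[str]) -> dict:
    """Attributes xarray expects on a zarr array with the given dimensions."""
    return {"_ARRAY_DIMENSIONS": list(dims)}


def posterior_layout(var_name: str, core_dims: list[str]) -> dict:
    """Map array names in a posterior-like group to their attributes."""
    layout = {
        # Sampled variables are indexed by chain and draw before core dims.
        var_name: dims_attrs(["chain", "draw", *core_dims]),
        "chain": dims_attrs(["chain"]),
        "draw": dims_attrs(["draw"]),
    }
    # Each core dimension gets a 1-D coordinate array of the same name.
    for dim in core_dims:
        layout[dim] = dims_attrs([dim])
    return layout


layout = posterior_layout("mu", ["school"])
print(layout["mu"])  # {'_ARRAY_DIMENSIONS': ['chain', 'draw', 'school']}
```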

Parameters:

store : zarr.storage.BaseStore | collections.abc.MutableMapping | None

The store object where the zarr groups and arrays will be stored and read from. Any zarr-compatible storage object works. Keep in mind that if None is provided, a zarr.storage.MemoryStore will be used, which means that the information won’t be visible to other processes and won’t persist after the ZarrTrace life cycle ends. If you want persistent storage, use one of the disk-backed zarr storage options, e.g. DirectoryStore or ZipStore.
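Because zarr (v2) treats any `MutableMapping` from string keys to bytes as a store, a store can be as simple as a wrapped dict. The `LoggingStore` class below is a hypothetical example that records every key written; like `MemoryStore`, it keeps everything in this process's memory. Passing it as `store=` to ZarrTrace is an assumption not exercised here, and the `.zgroup` key is only illustrative.

```python
from collections.abc import MutableMapping


class LoggingStore(MutableMapping):
    """Hypothetical dict-backed store that logs every key written.

    Its contents live only in this process's memory, so, like
    zarr.storage.MemoryStore, nothing persists or is shared across processes.
    """

    def __init__(self):
        self._data: dict[str, bytes] = {}
        self.writes: list[str] = []

    def __getitem__(self, key: str) -> bytes:
        return self._data[key]

    def __setitem__(self, key: str, value: bytes) -> None:
        self.writes.append(key)  # record the write before storing it
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


store = LoggingStore()
# An illustrative zarr v2 group metadata entry.
store["posterior/.zgroup"] = b'{"zarr_format": 2}'
```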

synchronizer : zarr.sync.Synchronizer | None

The synchronizer to use for the underlying zarr arrays.

compressor : numcodecs.abc.Codec | None | pymc.util.UNSET

The compressor to use for the underlying zarr arrays. If None, no compressor is used. If UNSET, zarr’s default compressor is used.
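None already means "no compressor", so a separate sentinel is needed to tell "argument not given" apart from an explicit None. The sketch below shows that three-way choice; `_Unset`, `DEFAULT_COMPRESSOR` and `resolve_compressor` are hypothetical stand-ins, not `pymc.util.UNSET` or zarr's actual default codec.

```python
class _Unset:
    """Hypothetical stand-in for pymc.util.UNSET: a unique 'not provided' marker."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()
DEFAULT_COMPRESSOR = "zarr-default"  # placeholder for zarr's own default codec


def resolve_compressor(compressor=UNSET):
    """Mirror the documented three-way choice for the compressor argument."""
    if compressor is UNSET:
        # Argument omitted: defer to zarr's default compressor.
        return DEFAULT_COMPRESSOR
    # None means "store uncompressed"; any other value is used as the codec.
    return compressor
```

Checking identity (`is UNSET`) rather than equality keeps the sentinel distinct from any real codec value, including None.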

draws_per_chunk : int

The number of draws that make up a chunk in the variable’s posterior array. Each variable’s array shape is set to (n_chains, n_draws, *rv_shape), but the chunks are set to (1, draws_per_chunk, *rv_shape). This means that each chain has its own chunks to read from or write to, so that concurrent writes from different chains do not interfere with each other, and that multiple draws can belong to the same chunk. The variable’s core dimensions, however, are never split across different chunks.
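The chunking arithmetic can be checked by hand. This stdlib sketch (the helper names are hypothetical) computes the array shape, chunk shape and chunk grid for the layout described above, and which chunk holds a given draw.

```python
import math


def chunk_layout(n_chains: int, n_draws: int, rv_shape: tuple[int, ...], draws_per_chunk: int):
    """Return (array shape, chunk shape, number of chunks along each axis)."""
    shape = (n_chains, n_draws, *rv_shape)
    # One chain per chunk, draws_per_chunk draws per chunk, core dims never split.
    chunks = (1, draws_per_chunk, *rv_shape)
    grid = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
    return shape, chunks, grid


def chunk_of(chain: int, draw: int, draws_per_chunk: int) -> tuple[int, int]:
    """Chunk index of a given (chain, draw) along the first two axes."""
    return chain, draw // draws_per_chunk


shape, chunks, grid = chunk_layout(n_chains=4, n_draws=1000, rv_shape=(3,), draws_per_chunk=100)
print(shape, chunks, grid)  # (4, 1000, 3) (1, 100, 3) (4, 10, 1)
print(chunk_of(chain=2, draw=250, draws_per_chunk=100))  # (2, 2)
```

With draws_per_chunk=1 every draw of every chain is its own chunk; on a DirectoryStore, where zarr (v2) writes each chunk to a separate file, that means one file per draw per chain per variable. Larger values produce fewer, bigger chunks.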

include_transformed : bool

If True, the transformed, unconstrained value variables are also stored, in the unconstrained_posterior group.

Notes

ZarrTrace objects represent the storage information. If the underlying store persists on disk or over the network (e.g. with a zarr.storage.FSStore), multiple processes will be able to access the same storage concurrently and read from or write to it.
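Concurrent chains avoid collisions because they never share a chunk key. This sketch enumerates zarr v2-style chunk keys (dot-separated chunk indices, the v2 default) for a hypothetical `posterior/mu` array with one core dimension, and writes them from four threads standing in for separate processes. The helpers and the placeholder bytes are hypothetical; only the key layout is being illustrated.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_keys(array_path: str, chain: int, n_draws: int, draws_per_chunk: int, n_core_dims: int) -> list[str]:
    """Zarr v2-style chunk keys ("i.j.k") that belong to one chain."""
    n_draw_chunks = -(-n_draws // draws_per_chunk)  # ceiling division
    core = ".0" * n_core_dims  # core dims are never split, so their index is 0
    return [f"{array_path}/{chain}.{j}{core}" for j in range(n_draw_chunks)]


def write_chain(store: dict, chain: int) -> list[str]:
    """Write placeholder bytes to every chunk owned by one chain."""
    keys = chunk_keys("posterior/mu", chain, n_draws=10, draws_per_chunk=5, n_core_dims=1)
    for key in keys:
        store[key] = b"..."  # placeholder for the encoded chunk bytes
    return keys


store: dict[str, bytes] = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    written = list(pool.map(lambda c: write_chain(store, c), range(4)))
```

Because the chain index is the first component of every key, no two writers ever touch the same chunk, which is what makes the (1, draws_per_chunk, *rv_shape) chunking safe for parallel sampling.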

The intended division of labour is for ZarrTrace to handle the creation and management of the zarr groups, storage objects and arrays, and for individual ZarrChain objects to handle recording MCMC samples to the trace. This division was chosen to stay close to the way the existing pymc.backends.base.MultiTrace and pymc.backends.ndarray.NDArray backends work with the existing samplers.
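A toy version of this split, with no zarr involved: `ToyTrace` allocates per-variable storage once and hands out one `ToyChain` recorder per chain, and each recorder only ever writes to its own chain's row. Both classes and the `allocate` method are hypothetical and far simpler than ZarrTrace and ZarrChain; `allocate` only loosely plays the role of init_trace().

```python
class ToyChain:
    """Records draws for one chain into rows owned by the parent trace."""

    def __init__(self, arrays: dict[str, list[list]], chain: int):
        self._arrays = arrays
        self.chain = chain
        self.draw_idx = 0

    def record(self, draw: dict[str, float]) -> None:
        # Write each variable's value into this chain's row only.
        for name, value in draw.items():
            self._arrays[name][self.chain][self.draw_idx] = value
        self.draw_idx += 1


class ToyTrace:
    """Allocates storage up front and hands out one recorder per chain."""

    def __init__(self):
        self.arrays: dict[str, list[list]] = {}

    def allocate(self, var_names: list[str], chains: int, draws: int) -> list[ToyChain]:
        for name in var_names:
            self.arrays[name] = [[None] * draws for _ in range(chains)]
        return [ToyChain(self.arrays, chain) for chain in range(chains)]


trace = ToyTrace()
chain0, chain1 = trace.allocate(["mu"], chains=2, draws=3)
chain0.record({"mu": 0.5})
chain1.record({"mu": -1.2})
chain0.record({"mu": 0.7})
print(trace.arrays["mu"])  # [[0.5, 0.7, None], [-1.2, None, None]]
```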

One extra feature of ZarrTrace is that it enables direct access to any array’s metadata. ZarrTrace takes advantage of this to tag arrays as deterministic or freeRV depending on what kind of variable they were in the defining model.
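In zarr, each array exposes its metadata as a dict-like `.attrs`, so tagging amounts to storing a small value there and filtering on it later. The sketch below models the attributes as plain dicts; the `tag` key name and the variables are hypothetical, since this page does not say which key ZarrTrace uses. With a real zarr group one would read `group[name].attrs` instead.

```python
# Plain-dict stand-in for per-array zarr attributes.
# The "tag" key and the variable names are hypothetical.
array_attrs = {
    "mu": {"_ARRAY_DIMENSIONS": ["chain", "draw"], "tag": "freeRV"},
    "sigma": {"_ARRAY_DIMENSIONS": ["chain", "draw"], "tag": "freeRV"},
    "theta": {"_ARRAY_DIMENSIONS": ["chain", "draw", "school"], "tag": "deterministic"},
}


def vars_with_tag(attrs_by_name: dict[str, dict], tag: str) -> list[str]:
    """Names of arrays whose metadata carries the given tag."""
    return sorted(name for name, attrs in attrs_by_name.items() if attrs.get("tag") == tag)


print(vars_with_tag(array_attrs, "freeRV"))  # ['mu', 'sigma']
print(vars_with_tag(array_attrs, "deterministic"))  # ['theta']
```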
