GitHub - pymc-devs/nutpie: Python wrapper for nuts-rs (original) (raw)

nutpie: A fast sampler for Bayesian posteriors

The nutpie package provides a fast NUTS sampler for PyMC and Stan models.

See the documentation for more details.

Installation

nutpie can be installed using Conda or Mamba from conda-forge with

mamba install -c conda-forge nutpie

Or using pip:

To install it from source, install a Rust compiler and maturin and then

maturin develop --release

If you want to use the nightly SIMD implementation for some of the math functions, switch to Rust nightly and then install with the simd_support feature in then nutpie directory:

rustup override set nightly maturin develop --release --features=simd_support

Usage with PyMC

First, PyMC and Numba need to be installed, for example using

mamba install -c conda-forge pymc numba

We need to create a model:

import pymc as pm import numpy as np import nutpie import pandas as pd import seaborn as sns

Load the radon dataset

data = pd.read_csv(pm.get_data("radon.csv")) data["log_radon"] = data["log_radon"].astype(np.float64) county_idx, counties = pd.factorize(data.county) coords = {"county": counties, "obs_id": np.arange(len(county_idx))}

Create a simple hierarchical model for the radon dataset

with pm.Model(coords=coords, check_bounds=False) as pymc_model: intercept = pm.Normal("intercept", sigma=10)

# County effects
raw = pm.ZeroSumNormal("county_raw", dims="county")
sd = pm.HalfNormal("county_sd")
county_effect = pm.Deterministic("county_effect", raw * sd, dims="county")

# Global floor effect
floor_effect = pm.Normal("floor_effect", sigma=2)

# County:floor interaction
raw = pm.ZeroSumNormal("county_floor_raw", dims="county")
sd = pm.HalfNormal("county_floor_sd")
county_floor_effect = pm.Deterministic(
    "county_floor_effect", raw * sd, dims="county"
)

mu = (
    intercept
    + county_effect[county_idx]
    + floor_effect * data.floor.values
    + county_floor_effect[county_idx] * data.floor.values
)

sigma = pm.HalfNormal("sigma", sigma=1.5)
pm.Normal(
    "log_radon", mu=mu, sigma=sigma, observed=data.log_radon.values, dims="obs_id"
)

We then compile this model and sample form the posterior:

compiled_model = nutpie.compile_pymc_model(pymc_model) trace_pymc = nutpie.sample(compiled_model)

trace_pymc now contains an ArviZ InferenceData object, including sampling statistics and the posterior of the variables defined above.

We can also control the sampler in a non-blocking way:

The sampler will now run the the background

sampler = nutpie.sample(compiled_model, blocking=False)

Pause and resume the sampling

sampler.pause() sampler.resume()

Wait for the sampler to finish (up to timeout seconds)

sampler.wait(timeout=0.1)

Note that not passing any timeout to wait will

wait until the sampler finishes, then return the InferenceData object:

idata = sampler.wait()

or we can also abort the sampler (and return the incomplete trace)

incomplete_trace = sampler.abort()

or cancel and discard all progress:

sampler.cancel()

Usage with Stan

In order to sample from Stan model, bridgestan needs to be installed. A pip package is available, but right now this can not be installed using Conda.

When we install nutpie with pip, we can also specify that we want optional dependencies for Stan models using

pip install 'nutpie[stan]'

In addition, a C++ compiler needs to be available. For details seethe Stan docs.

We can then compile a Stan model, and sample using nutpie:

import nutpie

code = """ data { real mu; } parameters { real x; } model { x ~ normal(mu, 1); } """

compiled = nutpie.compile_stan_model(code=code)

Provide data

compiled = compiled.with_data(mu=3.) trace = nutpie.sample(compiled)

Advantages

nutpie uses nuts-rs, a library written in Rust, that implements NUTS as in PyMC and Stan, but with a slightly different mass matrix tuning method as those. It often produces a higher effective sample size per gradient evaluation, and tends to converge faster and with fewer gradient evaluation.