arviz.loo — ArviZ dev documentation
arviz.loo(data, pointwise=None, var_name=None, reff=None, scale=None)
Compute Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV).
Estimates the expected log pointwise predictive density (elpd) using Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV). Also calculates LOO's standard error and the effective number of parameters. Read more theory at https://arxiv.org/abs/1507.04544 and https://arxiv.org/abs/1507.02646
Parameters:
data: obj
Any object that can be converted to an arviz.InferenceData object. Refer to the documentation of arviz.convert_to_dataset() for details.
pointwise: bool, optional
If True, the pointwise predictive accuracy will be returned. Defaults to the stats.ic_pointwise rcParam.
var_name: str, optional
The name of the variable in log_likelihood groups storing the pointwise log likelihood data to use for loo computation.
reff: float, optional
Relative MCMC efficiency, ess / n, i.e. the number of effective samples divided by the number of actual samples. Computed from trace by default.
scale: str
Output scale for loo. Available options are:
log: (default) log-score
negative_log: -1 * log-score
deviance: -2 * log-score
A higher log-score (or a lower deviance or negative log-score) indicates a model with better predictive accuracy.
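The three scales differ only by a constant factor, so a value reported on one scale can be converted to another. The following is a minimal sketch in plain Python using the multipliers from the option list; the helper `convert_scale` is hypothetical and is not part of ArviZ.

```python
# Multipliers applied to the log-score for each output scale,
# taken from the option list for the `scale` parameter.
SCALE_FACTORS = {"log": 1.0, "negative_log": -1.0, "deviance": -2.0}


def convert_scale(value, from_scale, to_scale):
    """Convert an elpd value between output scales (hypothetical helper)."""
    # Undo the source scale to recover the log-score, then apply the target scale.
    log_score = value / SCALE_FACTORS[from_scale]
    return log_score * SCALE_FACTORS[to_scale]


elpd_log = -30.78  # log-scale elpd_loo reported in the Examples section
print(convert_scale(elpd_log, "log", "negative_log"))
print(convert_scale(elpd_log, "log", "deviance"))
```

Because the sign flips on the negative_log and deviance scales, "higher is better" on the log scale becomes "lower is better" on the other two.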
Returns:
ELPDData object (inherits from pandas.Series) with the following row/attributes:
elpd_loo: approximated expected log pointwise predictive density (elpd)
se: standard error of the elpd
p_loo: effective number of parameters
n_samples: number of samples
n_data_points: number of data points
warning: bool, True if the estimated shape parameter of the Pareto distribution is greater than good_k.
loo_i: DataArray with the pointwise predictive accuracy, only if pointwise=True
pareto_k: array of Pareto shape values, only if pointwise=True
scale: scale of the elpd
good_k: For a sample size S, the threshold is computed as min(1 - 1/log10(S), 0.7)
The returned object has a custom print method that overrides pd.Series method.
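The good_k threshold depends only on the sample size S, so it can be checked by hand. A small sketch of the formula min(1 - 1/log10(S), 0.7) in plain Python follows; the function name `good_k_threshold` is an illustrative assumption, not an ArviZ function.

```python
import math


def good_k_threshold(n_samples):
    """Threshold min(1 - 1/log10(S), 0.7) for a sample size S."""
    return min(1 - 1 / math.log10(n_samples), 0.7)


# The threshold grows with the sample size and is capped at 0.7.
for n_samples in (100, 2000, 10000):
    print(n_samples, round(good_k_threshold(n_samples), 3))
```

For the 2000 posterior samples used in the Examples section, this gives roughly 0.697, which is consistent with the 0.70 bin edge in the printed diagnostic table.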
See also
compare: Compare models based on PSIS-LOO loo or WAIC waic cross-validation.
waic: Compute the widely applicable information criterion.
plot_compare: Summary plot for model comparison.
plot_elpd: Plot pointwise elpd differences between two or more models.
plot_khat: Plot Pareto tail indices for diagnosing convergence.
Examples
Calculate LOO of a model:
In [1]: import arviz as az
   ...: data = az.load_arviz_data("centered_eight")
   ...: az.loo(data)
   ...:
Out[1]:
Computed from 2000 posterior samples and 8 observations log-likelihood matrix.

         Estimate       SE
elpd_loo   -30.78     1.35
p_loo        0.95        -

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.70]   (good)        8  100.0%
   (0.70, 1]   (bad)         0    0.0%
    (1, Inf)   (very bad)    0    0.0%
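The Pareto k diagnostic table groups the shape values into three bins. As a rough sketch of that grouping, assuming the bin edges shown in the output (a good_k threshold and 1), the counts could be reproduced as follows. The helper `count_pareto_k` and the k values are made up for illustration and do not come from ArviZ or from any fitted model.

```python
def count_pareto_k(k_values, good_k=0.7):
    """Count Pareto k values per diagnostic bin (hypothetical helper)."""
    counts = {"good": 0, "bad": 0, "very bad": 0}
    for k in k_values:
        if k <= good_k:
            counts["good"] += 1      # (-Inf, good_k]
        elif k <= 1:
            counts["bad"] += 1       # (good_k, 1]
        else:
            counts["very bad"] += 1  # (1, Inf)
    return counts


# Made-up shape values for illustration only.
k_values = [0.12, 0.45, 0.71, 0.93, 1.20]
counts = count_pareto_k(k_values)
for label, count in counts.items():
    print(f"{label:>8}: {count} ({100 * count / len(k_values):.1f}%)")
```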
Calculate LOO of a model and return the pointwise values:
In [2]: data_loo = az.loo(data, pointwise=True)
   ...: data_loo.loo_i
   ...:
Out[2]:
<xarray.DataArray 'loo_i' (school: 8)> Size: 64B
array([-4.8918424 , -3.41965169, -3.86732498, -3.46497133, -3.47794644,
       -3.49926442, -4.20043549, -3.959389  ])
Coordinates:
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
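The pointwise values relate directly to the summary in the first example: elpd_loo is their sum. The sketch below checks this in plain Python using the values printed above. The standard error line assumes the usual estimate, the square root of n times the population variance of the pointwise values; this page does not state that formula, so treat it as an assumption.

```python
import math
import statistics

# Pointwise values copied from the loo_i output above.
loo_i = [
    -4.8918424, -3.41965169, -3.86732498, -3.46497133,
    -3.47794644, -3.49926442, -4.20043549, -3.959389,
]

# elpd_loo is the sum of the pointwise contributions.
elpd_loo = sum(loo_i)

# Assumption: se = sqrt(n * population variance of the pointwise values).
se = math.sqrt(len(loo_i) * statistics.pvariance(loo_i))

print(f"elpd_loo = {elpd_loo:.2f}, se = {se:.2f}")
# prints: elpd_loo = -30.78, se = 1.35
```

Both numbers match the Estimate and SE columns reported for elpd_loo in the first example.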