arviz.compare
arviz.compare(compare_dict, ic=None, method='stacking', b_samples=1000, alpha=1, seed=None, scale=None, var_name=None)
Compare models based on their expected log pointwise predictive density (ELPD).
The ELPD is estimated either by Pareto smoothed importance sampling leave-one-out cross-validation (LOO) or using the widely applicable information criterion (WAIC). We recommend loo. For the underlying theory, see dx.doi.org/10.1111/1467-9868.00353
Parameters:
compare_dict: dict of {str: InferenceData or ELPDData}
A dictionary mapping model names to arviz.InferenceData or ELPDData objects.
ic: str, optional
Method to estimate the ELPD, available options are “loo” or “waic”. Defaults to rcParams["stats.information_criterion"].
method: str, optional
Method used to estimate the weights for each model. Available options are:
- ‘stacking’ : stacking of predictive distributions.
- ‘BB-pseudo-BMA’ : pseudo-Bayesian Model averaging using Akaike-type weighting. The weights are stabilized using the Bayesian bootstrap.
- ‘pseudo-BMA’ : pseudo-Bayesian Model averaging using Akaike-type weighting, without bootstrap stabilization (not recommended).
Defaults to rcParams["stats.ic_compare_method"]. For more information see https://arxiv.org/abs/1704.02030; a usage sketch follows.
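As a hedged illustration (not part of the original docstring; it assumes the “centered_eight” and “non_centered_eight” example datasets bundled with ArviZ), the following sketch computes bootstrap-stabilized pseudo-BMA weights:

    import arviz as az

    # Example datasets that ship with ArviZ.
    models = {
        "centered": az.load_arviz_data("centered_eight"),
        "non centered": az.load_arviz_data("non_centered_eight"),
    }

    # b_samples, alpha and seed only take effect for method='BB-pseudo-BMA'.
    cmp_df = az.compare(models, method="BB-pseudo-BMA", b_samples=1000, alpha=1, seed=42)
    print(cmp_df["weight"])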
b_samples: int, optional
Number of samples taken by the Bayesian bootstrap estimation. Only useful when method = ‘BB-pseudo-BMA’. Defaults to 1000.
alpha: float, optional
The shape parameter in the Dirichlet distribution used for the Bayesian bootstrap. Only useful when method = ‘BB-pseudo-BMA’. When alpha=1 (default), the distribution is uniform on the simplex. A smaller alpha keeps the final weights further away from 0 and 1.
seed: int or np.random.RandomState instance, optional
If int or RandomState, use it for seeding the Bayesian bootstrap. Only useful when method = ‘BB-pseudo-BMA’. If None (default), the global numpy.random state is used.
scale: str, optional
Output scale for IC. Available options are:
- ‘log’ : (default) log-score (after Vehtari et al. (2017))
- ‘negative_log’ : -1 * (log-score)
- ‘deviance’ : -2 * (log-score)
A higher log-score (or a lower deviance) indicates a model with better predictive accuracy.
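For instance (a sketch, assuming the same bundled eight-schools datasets used elsewhere on this page), the same comparison can be requested on the deviance scale, where lower is better:

    import arviz as az

    models = {
        "centered": az.load_arviz_data("centered_eight"),
        "non centered": az.load_arviz_data("non_centered_eight"),
    }

    print(az.compare(models, scale="log"))       # higher elpd is better
    print(az.compare(models, scale="deviance"))  # lower value is better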
var_name: str, optional
If there is more than a single observed variable in the InferenceData, the name of the one that should be used as the basis for comparison.
Returns:
A DataFrame, ordered from best to worst model (measured by the ELPD). The index reflects the key with which the models are passed to this function. The columns are:
rank: The rank-order of the models. 0 is the best.
elpd: ELPD estimated either using PSIS-LOO-CV (elpd_loo) or WAIC (elpd_waic). Higher ELPD indicates higher out-of-sample predictive fit (“better” model). If scale is deviance or negative_log, smaller values indicate higher out-of-sample predictive fit (“better” model).
pIC: Estimated effective number of parameters.
elpd_diff: The difference in ELPD between two models. If more than two models are compared, the difference is computed relative to the top-ranked model, which always has an elpd_diff of 0.
weight: Relative weight for each model. This can be loosely interpreted as the probability of each model (among the compared models) given the data. By default the uncertainty in the weight estimation is taken into account using the Bayesian bootstrap.
SE: Standard error of the ELPD estimate. If method = ‘BB-pseudo-BMA’ these values are estimated using the Bayesian bootstrap.
dSE: Standard error of the difference in ELPD between each model and the top-ranked model. It’s always 0 for the top-ranked model.
warning: A value of 1 indicates that the computation of the ELPD may not be reliable. This could be an indication of WAIC/LOO starting to fail; see http://arxiv.org/abs/1507.04544 for details.
scale: Scale used for the ELPD.
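A brief sketch of inspecting the returned DataFrame (an illustration, not from the docstring; exact column casing can vary between ArviZ versions):

    import arviz as az

    cmp_df = az.compare(
        {
            "centered": az.load_arviz_data("centered_eight"),
            "non centered": az.load_arviz_data("non_centered_eight"),
        }
    )

    # Rows are ordered best to worst, so the first index entry is the
    # top-ranked model; printing shows all documented columns.
    print("top-ranked model:", cmp_df.index[0])
    print(cmp_df)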
See also
loo : Compute the ELPD using the Pareto smoothed importance sampling leave-one-out cross-validation method.
waic : Compute the ELPD using the widely applicable information criterion.
plot_compare : Summary plot for model comparison.
References
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413-1432. https://arxiv.org/abs/1507.04544
Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3), 917-1007. https://arxiv.org/abs/1704.02030
Examples
Compare the centered and non-centered models of the eight schools problem:
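A minimal sketch (the rendered docs include an executable block here; this version assumes the bundled “centered_eight” and “non_centered_eight” example datasets):

    import arviz as az

    data1 = az.load_arviz_data("non_centered_eight")
    data2 = az.load_arviz_data("centered_eight")
    compare_dict = {"non centered": data1, "centered": data2}
    az.compare(compare_dict)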
Compare the models using PSIS-LOO-CV, returning the ELPD on the log scale and calculating the weights using the stacking method:
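Again a sketch under the same assumptions, this time passing the options explicitly:

    import arviz as az

    compare_dict = {
        "non centered": az.load_arviz_data("non_centered_eight"),
        "centered": az.load_arviz_data("centered_eight"),
    }
    az.compare(compare_dict, ic="loo", method="stacking", scale="log")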