arviz.plot_ecdf - ArviZ dev documentation

arviz.plot_ecdf(values, values2=None, eval_points=None, cdf=None, difference=False, confidence_bands=False, ci_prob=None, num_trials=500, rvs=None, random_state=None, figsize=None, fill_band=True, plot_kwargs=None, fill_kwargs=None, plot_outline_kwargs=None, ax=None, show=None, backend=None, backend_kwargs=None, npoints=100, pointwise=False, fpr=None, pit=False, **kwargs)

Plot ECDF or ECDF-Difference Plot with Confidence bands.

Plot the empirical cumulative distribution function (ECDF) of an array. Optionally, a cdf argument representing a reference CDF may be provided for comparison using a difference ECDF plot and/or confidence bands.

Alternatively, the probability integral transform (PIT) for a single dataset may be visualized.

Parameters:

values : array_like

Values to plot from an unknown continuous or discrete distribution.

values2 : array_like, optional

Values to compare to the original sample.

Deprecated since version 0.18.0: Instead use cdf=scipy.stats.ecdf(values2).cdf.evaluate.

cdf : callable, optional

Cumulative distribution function of the distribution to compare the original sample to. The function must take as input a numpy array of draws from the distribution.

difference : bool, default False

If True, plot the ECDF-difference plot; otherwise plot the ECDF.

confidence_bands : str or bool

False: No confidence bands are plotted (default).
True: Plot bands computed with the default algorithm (subject to change).
“pointwise”: Compute the pointwise (i.e. marginal) confidence band.
“optimized”: Use optimization to estimate a simultaneous confidence band.
“simulated”: Use Monte Carlo simulation to estimate a simultaneous confidence band.

For simultaneous confidence bands to be correctly calibrated, provide eval_points that are not dependent on the values.

ci_prob : float, default 0.94

The probability that the true ECDF lies within the confidence band. If confidence_bands is “pointwise”, this is the marginal probability instead of the joint probability.

eval_points : array_like, optional

The points at which to evaluate the ECDF. If None, npoints uniformly spaced points between the data bounds will be used.

rvs : callable, optional

A function that takes an integer ndraws and optionally the object passed to random_state and returns an array of ndraws samples from the same distribution as the original dataset. Required if confidence_bands is “simulated” and the variable is discrete.

random_state : int, numpy.random.Generator or numpy.random.RandomState, optional

num_trials : int, default 500

The number of random ECDFs to generate for constructing simultaneous confidence bands (if confidence_bands is “simulated”).

figsize : (float, float), optional

Figure size. If None it will be defined automatically.

fill_band : bool, default True

If True, fill the area between the band limits to mark the confidence interval. Otherwise, plot only the border lines.

plot_kwargs : dict, optional

Additional kwargs passed to matplotlib.pyplot.step() or bokeh.plotting.figure.step()

fill_kwargs : dict, optional

Additional kwargs passed to matplotlib.pyplot.fill_between() or bokeh.plotting.Figure.varea()

plot_outline_kwargs : dict, optional

Additional kwargs passed to matplotlib.axes.Axes.plot() or bokeh.plotting.Figure.line()

ax : axes, optional

Matplotlib axes or bokeh figures.

show : bool, optional

Call backend show function.

backend : {“matplotlib”, “bokeh”}, default “matplotlib”

Select plotting backend.

backend_kwargs : dict, optional

These are kwargs specific to the backend being used, passed to matplotlib.pyplot.subplots() or bokeh.plotting.figure. For additional documentation check the plotting method of the backend.

npoints : int, default 100

The number of evaluation points for the ECDF or ECDF-difference plots, if eval_points is not provided or pit is True.

Deprecated since version 0.18.0: Instead specify eval_points=np.linspace(np.min(values), np.max(values), npoints) unless pit is True.

pointwise : bool, default False

Deprecated since version 0.18.0: Instead use confidence_bands="pointwise".

fpr : float, optional

Deprecated since version 0.18.0: Instead use ci_prob=1-fpr.

pit : bool, default False

If True, plot the ECDF or ECDF-difference of the PIT of the sample.

Deprecated since version 0.18.0: See the PIT example below instead.

Returns:

axes : matplotlib Axes or Bokeh Figure

Notes

This plot computes the confidence bands with the simulation-based algorithm presented in [1].
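As a rough illustration of what a simulated simultaneous band involves, the sketch below estimates a constant-width (Kolmogorov-Smirnov-style) band by Monte Carlo. It is a deliberately simpler stand-in, not the algorithm of [1] and not ArviZ's implementation. The helper names ecdf_at and simulated_band are hypothetical, and the band is expressed on the probability scale, i.e. for PIT values compared against a uniform reference.

```python
import numpy as np


def ecdf_at(sample, eval_points):
    """Fraction of `sample` values <= each evaluation point."""
    sample = np.sort(sample)
    return np.searchsorted(sample, eval_points, side="right") / sample.size


def simulated_band(ndraws, eval_probs, ci_prob=0.94, num_trials=500, seed=0):
    """Constant-width simultaneous band around the uniform CDF (illustrative only)."""
    rng = np.random.default_rng(seed)
    max_dev = np.empty(num_trials)
    for i in range(num_trials):
        # One simulated dataset drawn under the reference (uniform PIT values).
        u = rng.uniform(size=ndraws)
        # Largest deviation of its ECDF from the reference CDF over all points.
        max_dev[i] = np.max(np.abs(ecdf_at(u, eval_probs) - eval_probs))
    # Width that contains the whole simulated ECDF in a fraction ci_prob of trials.
    width = np.quantile(max_dev, ci_prob)
    lower = np.clip(eval_probs - width, 0.0, 1.0)
    upper = np.clip(eval_probs + width, 0.0, 1.0)
    return lower, upper


eval_probs = np.linspace(0.01, 0.99, 50)
lower, upper = simulated_band(ndraws=200, eval_probs=eval_probs)
```

Because the width is chosen from the maximum deviation over all evaluation points, the band is simultaneous rather than pointwise. The evaluation points are fixed in advance and do not depend on the simulated data.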

References

[1] Säilynoja, T., Bürkner, P.C. and Vehtari, A. (2022). Graphical Test for Discrete Uniformity and its Applications in Goodness of Fit Evaluation and Multiple Sample Comparison. Statistics and Computing, 32(32).

Examples

In a future release, the default behaviour of plot_ecdf will change. To maintain the original behaviour you should do:

import arviz as az
import numpy as np
from scipy.stats import uniform, norm

sample = norm(0, 1).rvs(1000)
npoints = 100
az.plot_ecdf(sample, eval_points=np.linspace(sample.min(), sample.max(), npoints))


Seeing this warning does not indicate that anything is wrong. If you are happy to get different behaviour as ArviZ improves and adds new algorithms, you can ignore it like so:

import warnings

warnings.filterwarnings("ignore", category=az.utils.BehaviourChangeWarning)

Plot an ECDF plot for a given sample evaluated at the sample points. This will become the new behaviour when eval_points is not provided:

az.plot_ecdf(sample, eval_points=np.unique(sample))

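For intuition about what is drawn when the evaluation points are the sample values themselves, the following minimal sketch computes the ECDF at the unique sample points with NumPy only. It illustrates the definition of the ECDF and is not ArviZ's internal code. The seed and sample are chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
sample = rng.normal(size=1000)

# Evaluation points: the distinct observed values, in sorted order.
eval_points = np.unique(sample)

# ECDF(x) = fraction of observations <= x (a right-continuous step function).
ecdf = np.searchsorted(np.sort(sample), eval_points, side="right") / sample.size
```

Evaluated this way, the ECDF steps up at every observation and reaches 1 at the largest sample value.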

Plot an ECDF plot with confidence bands for comparing a given sample to a given distribution. We manually specify evaluation points independent of the values so that the confidence bands are correctly calibrated.

distribution = norm(0, 1)
eval_points = np.linspace(*distribution.ppf([0.001, 0.999]), 100)
az.plot_ecdf(
    sample, eval_points=eval_points, cdf=distribution.cdf, confidence_bands=True
)


Plot an ECDF-difference plot with confidence bands for comparing a given sample to a given distribution.

az.plot_ecdf(
    sample, cdf=distribution.cdf, confidence_bands=True, difference=True
)


Plot an ECDF plot with confidence bands for the probability integral transform (PIT) of a continuous sample. If drawn from the reference distribution, the PIT values should be uniformly distributed.

pit_vals = distribution.cdf(sample)
uniform_dist = uniform(0, 1)
az.plot_ecdf(
    pit_vals, cdf=uniform_dist.cdf, confidence_bands=True,
)

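The uniformity claim for PIT values can also be checked numerically without plotting. The sketch below is an illustration under stated assumptions: the helpers std_normal_cdf and max_ecdf_deviation are hypothetical names, and the standard normal CDF is computed from math.erf so that only NumPy and the standard library are needed. It compares a sample that matches the reference distribution with one whose mean is shifted.

```python
import math

import numpy as np


def std_normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def max_ecdf_deviation(pit_vals, npoints=100):
    """Largest |ECDF(p) - p| over a fixed grid of probabilities in (0, 1)."""
    grid = np.linspace(0.01, 0.99, npoints)
    ecdf = np.searchsorted(np.sort(pit_vals), grid, side="right") / pit_vals.size
    return np.max(np.abs(ecdf - grid))


rng = np.random.default_rng(1)  # seed chosen only for reproducibility
good = rng.normal(0.0, 1.0, size=1000)     # drawn from the reference N(0, 1)
shifted = rng.normal(0.5, 1.0, size=1000)  # drawn from N(0.5, 1) instead

dev_good = max_ecdf_deviation(std_normal_cdf(good))
dev_shifted = max_ecdf_deviation(std_normal_cdf(shifted))
```

The PIT values of the matching sample stay close to the uniform CDF, while the shifted sample deviates clearly. That deviation is the pattern an ECDF or ECDF-difference plot of PIT values makes visible.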

Plot an ECDF-difference plot of PIT values.

az.plot_ecdf(
    pit_vals, cdf=uniform_dist.cdf, confidence_bands=True, difference=True
)

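The rvs parameter expects a function that takes an integer ndraws, optionally accepts the object passed to random_state, and returns ndraws samples. A minimal sketch of such a function for a discrete distribution might look like the following. The name poisson_rvs and the Poisson rate of 3 are hypothetical, and the sketch assumes the function is called as rvs(ndraws, random_state=...); it handles None, an integer seed, or a numpy.random.Generator.

```python
import numpy as np


def poisson_rvs(ndraws, random_state=None):
    """Return `ndraws` Poisson(3) samples (hypothetical example distribution)."""
    # default_rng accepts None, an int seed, or an existing Generator
    # (an existing Generator is returned unchanged, so its state advances).
    rng = np.random.default_rng(random_state)
    return rng.poisson(lam=3.0, size=ndraws)


draws = poisson_rvs(5, random_state=42)
```

With this sketch an integer seed yields the same draws on every call, whereas a shared Generator yields fresh draws each time, so a Generator is likely the safer choice when many independent simulated samples are needed. Such a function could then be supplied as rvs together with confidence_bands="simulated".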