Plotting - uhi 0.5.1.dev33+g1da3778 documentation

This is a description of the PlottableProtocol. Any plotting library that accepts an object following the PlottableProtocol can plot objects that follow this protocol, and libraries whose histograms follow this protocol are compatible with such plotters. The Protocol is runtime checkable, though as usual, that only checks for the presence of the needed methods at runtime, not for the static types.
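To illustrate what a runtime check does and does not cover, here is a minimal sketch. It uses a small stand-in protocol rather than uhi's own classes, and the names HasValues, Good, WrongTypes, and Missing are hypothetical and exist only for this example.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class HasValues(Protocol):  # hypothetical stand-in, not part of uhi
    def values(self) -> Any: ...

    def variances(self) -> Any: ...


class Good:
    def values(self):
        return [1.0, 2.0]

    def variances(self):
        return None


class WrongTypes:
    # Both methods exist but return nonsense; isinstance cannot notice this.
    def values(self):
        return "not an array"

    def variances(self):
        return 42


class Missing:
    # No variances() method, so the runtime check fails.
    def values(self):
        return [1.0]


checks = [isinstance(obj, HasValues) for obj in (Good(), WrongTypes(), Missing())]
```

Only the object with a missing method is rejected at runtime; catching the wrong return types requires a static checker such as MyPy.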

Using the protocol:

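Plotters should only depend on the methods and attributes listed below. In short, they are: .kind, .values(), .variances(), .counts(), and .axes (a Sequence of axes), all defined in the full protocol at the end of this page.

Axes have: ax[i] (the pair of edges, or the bin label for a discrete axis), len(ax) (the number of bins, not counting flow bins), ax.traits.circular, and ax.traits.discrete.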

Plotters should check whether .counts() returns None; no boost-histogram objects currently return None, but a future storage or a different library could.

Also check .variances(); if it does not return None, the storage holds variance information and error bars should be included. Boost-histogram histograms will return something unless they know that this is an invalid assumption (a weighted fill was made on an unweighted histogram).
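The two checks above might look like the following on the plotter side. This is a sketch under stated assumptions: ToyHist and y_errors are hypothetical names, the error bar for a "COUNT" histogram is taken to be the square root of the variance, and the "MEAN" branch uses the sqrt(variances / counts) rule from the protocol docstrings further down.

```python
import numpy as np


class ToyHist:
    """Hypothetical stand-in exposing only what this sketch needs."""

    def __init__(self, values, variances=None, counts=None, kind="COUNT"):
        self.kind = kind
        self._values = np.asarray(values, dtype=float)
        self._variances = None if variances is None else np.asarray(variances, dtype=float)
        self._counts = None if counts is None else np.asarray(counts, dtype=float)

    def values(self):
        return self._values

    def variances(self):
        return self._variances

    def counts(self):
        return self._counts


def y_errors(h):
    """Return per-bin error bars, or None if they cannot be computed."""
    variances = h.variances()
    if variances is None:
        return None  # no variance information: draw no error bars
    if h.kind == "MEAN":
        counts = h.counts()
        if counts is None:
            return None  # error on the mean needs the counts
        # Bins with counts <= 1 have an undefined variance; mark them NaN.
        ratio = np.divide(
            variances, counts, out=np.full_like(variances, np.nan), where=counts > 1
        )
        return np.sqrt(ratio)
    return np.sqrt(variances)


count_hist = ToyHist([4, 0, 9], variances=[4, 0, 9], counts=[4, 0, 9])
no_info = ToyHist([4, 0, 9])
mean_hist = ToyHist([1.0, 2.0], variances=[4.0, 9.0], counts=[4.0, 1.0], kind="MEAN")
```

Returning None lets the calling plot routine skip error bars entirely instead of drawing misleading ones.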

To statically restrict yourself to valid API usage, use PlottableHistogram as the parameter type of your function (not needed at runtime).
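A consumer function annotated this way could look like the sketch below. The function name describe is hypothetical. The import is placed under TYPE_CHECKING and the annotation is written as a string, so uhi is only required in the type-checking environment and not at runtime.

```python
from typing import TYPE_CHECKING, List, Tuple

if TYPE_CHECKING:
    # Seen only by the type checker; requires uhi in the MyPy environment.
    from uhi.typing.plottable import PlottableHistogram


def describe(h: "PlottableHistogram") -> Tuple[str, List[int], float]:
    """Return the kind, the number of bins per axis, and the sum of all values."""
    shape = [len(axis) for axis in h.axes]
    total = float(h.values().sum())
    return h.kind, shape, total
```

With this annotation, MyPy should flag any use of an attribute that is not part of the Protocol inside describe, while the function itself still accepts any duck-typed histogram at runtime.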

Implementing the protocol:

Add UHI to your MyPy environment, for example by listing uhi in the additional_dependencies of the mypy hook in your .pre-commit-config.yaml file.

Then, check your library against the Protocol like this:

from typing import TYPE_CHECKING, cast

from uhi.typing.plottable import PlottableHistogram

if TYPE_CHECKING:
    _: PlottableHistogram = cast(MyHistogram, None)
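For orientation, a producer-side implementation could be sketched as follows. It is not taken from any real library: Kind follows the enum construct recommended in the protocol listing, while Traits, RegularAxis, MyHistogram, and the fill method are hypothetical. The histogram is weighted, so values() is the sum of weights, variances() is the sum of squared weights, and counts() uses the effective-counts formula suggested in the protocol docstrings.

```python
import enum
from typing import Any, Iterator, List, Tuple

import numpy as np


class Kind(str, enum.Enum):
    COUNT = "COUNT"
    MEAN = "MEAN"


class Traits:
    circular = False
    discrete = False


class RegularAxis:
    """Hypothetical continuous axis with equal-width bins."""

    def __init__(self, bins: int, start: float, stop: float) -> None:
        self.edges = np.linspace(start, stop, bins + 1)
        self.traits = Traits()

    def __getitem__(self, index: int) -> Tuple[float, float]:
        if not 0 <= index < len(self):
            raise IndexError(index)
        return float(self.edges[index]), float(self.edges[index + 1])

    def __len__(self) -> int:
        return len(self.edges) - 1

    def __eq__(self, other: Any) -> bool:
        return isinstance(other, RegularAxis) and np.array_equal(self.edges, other.edges)

    def __iter__(self) -> Iterator[Tuple[float, float]]:
        return (self[i] for i in range(len(self)))


class MyHistogram:
    """Hypothetical weighted 1D histogram."""

    kind = Kind.COUNT

    def __init__(self, axis: RegularAxis) -> None:
        self.axes: List[RegularAxis] = [axis]
        self._sum_w = np.zeros(len(axis), dtype=np.float64)
        self._sum_w2 = np.zeros(len(axis), dtype=np.float64)

    def fill(self, x: float, weight: float = 1.0) -> None:
        # Find the half-open bin [lower, upper) containing x; drop out-of-range values.
        index = int(np.searchsorted(self.axes[0].edges, x, side="right")) - 1
        if 0 <= index < len(self.axes[0]):
            self._sum_w[index] += weight
            self._sum_w2[index] += weight**2

    def values(self) -> np.ndarray:
        return self._sum_w.copy()

    def variances(self) -> np.ndarray:
        return self._sum_w2.copy()

    def counts(self) -> np.ndarray:
        # Effective number of entries: sum_of_weights ** 2 / sum_of_weights_squared.
        return np.divide(
            self._sum_w**2,
            self._sum_w2,
            out=np.zeros_like(self._sum_w),
            where=self._sum_w2 != 0,
        )


h = MyHistogram(RegularAxis(3, 0.0, 3.0))
for x, w in [(0.5, 1.0), (0.7, 1.0), (0.9, 1.0), (1.5, 1.0), (1.6, 3.0), (5.0, 1.0)]:
    h.fill(x, w)  # the last value is out of range and is dropped
```

The first bin receives three equal weights, so its effective count equals the number of fills; the second bin receives weights 1 and 3, so its effective count drops below 2; the empty bin reports 0.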

Help for plotters

The module uhi.numpy_plottable has a utility to simplify the common use case of accepting a PlottableProtocol or other common formats, primarily a NumPy histogram/histogram2d/histogramdd tuple. The ensure_plottable_histogram function takes a histogram, a NumPy tuple, or an object that implements .to_numpy() or .numpy(), and converts it to a NumPyPlottableHistogram, which is a minimal implementation of the Protocol. By calling this function on your input, you can write your plotting function knowing that you always have a PlottableProtocol object, which greatly simplifies your code.
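The idea of normalizing the input once, at the top of a plotting function, can be sketched without uhi as shown below. This is a deliberately simplified stand-in and not uhi's implementation: TupleHist, as_plottable, and bar_heights_and_widths are hypothetical names, only 1D (values, edges) tuples are handled, and the axis is a plain list of edge pairs without traits. In real code you would call ensure_plottable_histogram instead of as_plottable.

```python
import numpy as np


class TupleHist:
    """Hypothetical minimal wrapper around a 1D np.histogram tuple."""

    kind = "COUNT"

    def __init__(self, values, edges):
        self._values = np.asarray(values)
        edges = np.asarray(edges, dtype=float)
        # Store one (lower, upper) pair per bin; no traits in this stand-in.
        self.axes = [list(zip(edges[:-1].tolist(), edges[1:].tolist()))]

    def values(self):
        return self._values

    def variances(self):
        return None  # a bare NumPy tuple carries no variance information

    def counts(self):
        return self._values


def as_plottable(obj):
    """Pass through histogram-like objects; wrap a 1D (values, edges) tuple."""
    if hasattr(obj, "values") and hasattr(obj, "axes"):
        return obj
    if isinstance(obj, tuple) and len(obj) == 2:
        return TupleHist(*obj)
    raise TypeError(f"cannot interpret {type(obj).__name__} as a histogram")


def bar_heights_and_widths(obj):
    """Plotting-side helper that only ever deals with one interface."""
    h = as_plottable(obj)
    widths = [upper - lower for lower, upper in h.axes[0]]
    return h.values().tolist(), widths


heights, widths = bar_heights_and_widths(
    np.histogram([0.1, 0.4, 1.5, 2.2], bins=3, range=(0, 3))
)
```

After the single conversion call, the rest of the helper never needs to branch on the input type.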

The full protocol version 1.2 follows:

(Also available as uhi.typing.plottable.PlottableProtocol, for use in tests, etc.)

"""
Using the protocol:

Producers: use isinstance(myhist, PlottableHistogram) in your tests; part of
the protocol is checkable at runtime, though ideally you should use MyPy; if
your histogram class supports PlottableHistogram, this will pass.

Consumers: Make your functions accept the PlottableHistogram static type, and
MyPy will force you to only use items in the Protocol.
"""

from __future__ import annotations

from collections.abc import Iterator, Sequence
from typing import Any, Protocol, Tuple, TypeVar, Union, runtime_checkable

# NumPy 1.20+ will work much, much better than previous versions when type checking

import numpy as np

protocol_version = (1, 2)

# Known kinds of histograms. A Producer can add Kinds not defined here; a
# Consumer should check for known types if it matters. A simple plotter could
# just use .values() and .variances() if non-None and ignore .kind.
#
# Could have been Kind = Literal["COUNT", "MEAN"] - left as a generic string so
# it can be extendable.

Kind = str

# Implementations are highly encouraged to use the following construct:
# class Kind(str, enum.Enum):
#     COUNT = "COUNT"
#     MEAN = "MEAN"
# Then return and use Kind.COUNT or Kind.MEAN.

@runtime_checkable
class PlottableTraits(Protocol):
    @property
    def circular(self) -> bool:
        """
        True if the axis "wraps around"
        """

    @property
    def discrete(self) -> bool:
        """
        True if each bin is discrete - Integer, Boolean, or Category, for example
        """

T_co = TypeVar("T_co", covariant=True)

@runtime_checkable
class PlottableAxisGeneric(Protocol[T_co]):
    # name: str - Optional, not part of Protocol
    # label: str - Optional, not part of Protocol
    #
    # Plotters are encouraged to plot label if it exists and is not None, and
    # name otherwise if it exists and is not None, but these properties are not
    # available on all histograms and not part of the Protocol.

    @property
    def traits(self) -> PlottableTraits: ...

    def __getitem__(self, index: int) -> T_co:
        """
        Get the pair of edges (not discrete) or bin label (discrete).
        """

    def __len__(self) -> int:
        """
        Return the number of bins (not counting flow bins, which are ignored
        for this Protocol currently).
        """

    def __eq__(self, other: Any) -> bool:
        """
        Required to be sequence-like.
        """

    def __iter__(self) -> Iterator[T_co]:
        """
        Useful element of a Sequence to include.
        """

PlottableAxisContinuous = PlottableAxisGeneric[Tuple[float, float]]
PlottableAxisInt = PlottableAxisGeneric[int]
PlottableAxisStr = PlottableAxisGeneric[str]

PlottableAxis = Union[PlottableAxisContinuous, PlottableAxisInt, PlottableAxisStr]

@runtime_checkable
class PlottableHistogram(Protocol):
    @property
    def axes(self) -> Sequence[PlottableAxis]: ...

    @property
    def kind(self) -> Kind: ...

    # All methods can have a flow=False argument - not part of this Protocol.
    # If this is included, it should return an array with flow bins added,
    # normal ordering.

    def values(self) -> np.typing.NDArray[Any]:
        """
        Returns the accumulated values. The counts for simple histograms, the
        sum of weights for weighted histograms, the mean for profiles, etc.

        If counts is equal to 0, the value in that cell is undefined if
        kind == "MEAN".
        """

    def variances(self) -> np.typing.NDArray[Any] | None:
        """
        Returns the estimated variance of the accumulated values. The sum of squared
        weights for weighted histograms, the variance of samples for profiles, etc.
        For an unweighted histogram where kind == "COUNT", this should return the same
        as values if the histogram was not filled with weights, and None otherwise.

        If counts is equal to 1 or less, the variance in that cell is undefined if
        kind == "MEAN".

        If kind == "MEAN", the counts can be used to compute the error on the mean
        as sqrt(variances / counts). This works whether or not the entries are
        weighted, provided the weight variance was tracked by the implementation.
        """

    def counts(self) -> np.typing.NDArray[Any] | None:
        """
        Returns the number of entries in each bin for an unweighted
        histogram or profile and an effective number of entries (defined below)
        for a weighted histogram or profile. An exotic generalized histogram could
        have no sensible .counts, so this is Optional and should be checked by
        Consumers.

        If kind == "MEAN", counts (effective or not) can and should be used to
        determine whether the mean value and its variance should be displayed
        (see documentation of values and variances, respectively). The counts
        should also be used to compute the error on the mean (see documentation
        of variances).

        For a weighted histogram, counts is defined as sum_of_weights ** 2 /
        sum_of_weights_squared. It is less than or equal to the number of times
        the bin was filled; the equality holds when all filled weights are equal.
        The larger the spread in weights, the smaller it is, but it is always 0
        if filled 0 times, and 1 if filled once, and more than 1 otherwise.

        A suggested implementation is:

            return np.divide(
                sum_of_weights**2,
                sum_of_weights_squared,
                out=np.zeros_like(sum_of_weights, dtype=np.float64),
                where=sum_of_weights_squared != 0)
        """