tfp.distributions.GaussianProcess | TensorFlow Probability

Marginal distribution of a Gaussian process at finitely many points.

tfp.distributions.GaussianProcess(
    kernel,
    index_points=None,
    mean_fn=None,
    observation_noise_variance=0.0,
    marginal_fn=None,
    cholesky_fn=None,
    jitter=1e-06,
    validate_args=False,
    allow_nan_stats=False,
    parameters=None,
    name='GaussianProcess',
    _check_marginal_cholesky_fn=True
)

A Gaussian process (GP) is an indexed collection of random variables, any finite collection of which are jointly Gaussian. While this definition applies to finite index sets, it is typically implicit that the index set is infinite; in applications, it is often some finite-dimensional real or complex vector space. In such cases, the GP may be thought of as a distribution over (real- or complex-valued) functions defined over the index set.

Just as Gaussian distributions are fully specified by their first and second moments, a Gaussian process can be completely specified by a mean and covariance function. Let S denote the index set and K the space in which each indexed random variable takes its values (again, often R or C). The mean function is then a map m: S -> K, and the covariance function, or kernel, is a positive-definite function k: (S x S) -> K. The properties of functions drawn from a GP are entirely dictated (up to translation) by the form of the kernel function.

This Distribution represents the marginal joint distribution over function values at a given finite collection of points [x[1], ..., x[N]] from the index set S. By definition, this marginal distribution is just a multivariate normal distribution, whose mean is given by the vector [m(x[1]), ..., m(x[N])] and whose covariance matrix is constructed from pairwise applications of the kernel function to the given inputs:

    | k(x[1], x[1])    k(x[1], x[2])  ...  k(x[1], x[N]) |
    | k(x[2], x[1])    k(x[2], x[2])  ...  k(x[2], x[N]) |
    |      ...              ...                 ...      |
    | k(x[N], x[1])    k(x[N], x[2])  ...  k(x[N], x[N]) |

For this to be a valid covariance matrix, it must be symmetric and positive definite; hence the requirement that k be a positive definite function (which, by definition, says that the above procedure will yield PD matrices).
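The covariance construction above can be sketched in plain NumPy (this is not TFP code). The kernel is a hand-written squared exponential, k(x, x') = amplitude**2 * exp(-||x - x'||**2 / (2 * length_scale**2)), which is the form used by psd_kernels.ExponentiatedQuadratic; the helper names `squared_exponential` and `gram_matrix` are hypothetical.

```python
import numpy as np


def squared_exponential(x, y, amplitude=1.0, length_scale=1.0):
    """k(x, y) = amplitude**2 * exp(-||x - y||**2 / (2 * length_scale**2))."""
    sq_dist = np.sum((x - y) ** 2)
    return amplitude**2 * np.exp(-sq_dist / (2.0 * length_scale**2))


def gram_matrix(kernel_fn, points):
    """Pairwise kernel evaluations: entry (i, j) is k(x[i], x[j])."""
    n = points.shape[0]
    gram = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            gram[i, j] = kernel_fn(points[i], points[j])
    return gram


# Three 1-d index points, shape (N, F) = (3, 1).
x = np.array([[-1.0], [0.0], [1.0]])
K = gram_matrix(squared_exponential, x)

# With a zero mean function, the marginal at these points is MVN(0, K).
print(np.round(K, 4))
```

For these well-separated points the matrix is symmetric with strictly positive eigenvalues, as the positive-definiteness requirement demands.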

We also support the inclusion of zero-mean Gaussian noise in the model, via the observation_noise_variance parameter. This augments the generative model to

f ~ GP(m, k)
(y[i] | f, x[i]) ~ Normal(f(x[i]), s)

where

m is the mean function
k is the covariance kernel function
f is the function drawn from the GP
x[i] are the index points at which the function is observed
y[i] are the observed values at the index points
s is the scale (standard deviation) of the observation noise, so s**2 = observation_noise_variance.
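The noisy generative model can be checked with a short NumPy simulation (an illustrative sketch, not TFP code; the 3-by-3 matrix K stands in for a kernel matrix and is chosen arbitrarily). Drawing f first and then adding independent Normal noise with scale s = sqrt(observation_noise_variance) yields observations whose covariance is K + observation_noise_variance * I.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary positive definite covariance (e.g. a kernel Gram matrix) and zero mean.
K = np.array([[1.0, 0.6, 0.1],
              [0.6, 1.0, 0.6],
              [0.1, 0.6, 1.0]])
m = np.zeros(3)
observation_noise_variance = 0.05
s = np.sqrt(observation_noise_variance)  # the noise *scale*, not the variance

num_draws = 200_000
# Stage 1: f ~ GP(m, k), restricted to the index points.
f = rng.multivariate_normal(m, K, size=num_draws)
# Stage 2: y[i] | f ~ Normal(f(x[i]), s), independently for each i.
y = f + s * rng.standard_normal(size=f.shape)

# Marginally, y ~ MVN(m, K + s**2 * I).
empirical_cov = np.cov(y, rowvar=False)
expected_cov = K + observation_noise_variance * np.eye(3)
print(np.round(empirical_cov, 2))
```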

Note that this class represents an unconditional Gaussian process; it does not implement posterior inference conditional on observed function evaluations. This class is useful, for example, if one wishes to combine a GP prior with a non-conjugate likelihood using MCMC to sample from the posterior.

Mathematical Details

The probability density function (pdf) is a multivariate normal whose parameters are derived from the GP's properties:

pdf(x; index_points, mean_fn, kernel) = exp(-0.5 * y) / Z
K = (kernel.matrix(index_points, index_points) +
     observation_noise_variance * eye(N))
y = (x - mean_fn(index_points))^T @ inv(K) @ (x - mean_fn(index_points))
Z = (2 * pi)**(.5 * N) |det(K)|**(.5)

where:

index_points are points in the index set over which the GP is defined,
mean_fn is a callable mapping the index set to the GP's mean values,
kernel is PositiveSemidefiniteKernel-like and represents the covariance function of the GP,
observation_noise_variance is the (optional) variance of the observation noise,
eye(N) is an N-by-N identity matrix, and N is the number of index points.
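The density above can be evaluated without forming inv(K) explicitly. The NumPy sketch below (a hypothetical helper, not TFP's implementation) factors K = L @ L.T by Cholesky, so that y = ||solve(L, x - mean)||**2 and log|det(K)| = 2 * sum(log(diag(L))); the log density is then -0.5 * y - log(Z).

```python
import numpy as np


def gp_marginal_log_prob(x, mean, K, observation_noise_variance=0.0):
    """log pdf of MVN(mean, K + observation_noise_variance * I) evaluated at x."""
    n = x.shape[-1]
    K_noisy = K + observation_noise_variance * np.eye(n)
    L = np.linalg.cholesky(K_noisy)             # K_noisy == L @ L.T
    r = x - mean
    z = np.linalg.solve(L, r)                    # z = inv(L) @ r
    quad = z @ z                                 # r^T inv(K_noisy) r
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log|det(K_noisy)|
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)


K = np.array([[1.0, 0.5], [0.5, 1.0]])
x = np.array([0.3, -0.2])
lp = gp_marginal_log_prob(x, np.zeros(2), K, observation_noise_variance=0.1)
print(lp)
```

np.linalg.solve does not exploit the triangular structure of L; scipy.linalg.solve_triangular would, but the sketch stays NumPy-only.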

Examples

Draw joint samples from a GP prior

import numpy as np
import tensorflow.compat.v2 as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
psd_kernels = tfp.math.psd_kernels

num_points = 100
# Index points should be a collection (100, here) of feature vectors. In this
# example, we're using 1-d vectors, so we just need to reshape the output from
# np.linspace, to give a shape of (100, 1).
index_points = np.expand_dims(np.linspace(-1., 1., num_points), -1)

# Define a kernel with default parameters.
kernel = psd_kernels.ExponentiatedQuadratic()

gp = tfd.GaussianProcess(kernel, index_points)

samples = gp.sample(10)
# ==> 10 independently drawn, joint samples at `index_points`

noisy_gp = tfd.GaussianProcess(
    kernel=kernel,
    index_points=index_points,
    observation_noise_variance=.05)
noisy_samples = noisy_gp.sample(10)
# ==> 10 independently drawn, noisy joint samples at `index_points`

Optimize kernel parameters via maximum marginal likelihood.

# Suppose we have some data from a known function. Note the index points in
# general have shape `[b1, ..., bB, f1, ..., fF]` (here we assume `F == 1`),
# so we need to explicitly consume the feature dimensions (just the last one
# here).
f = lambda x: np.sin(10*x[..., 0]) * np.exp(-x[..., 0]**2)
observed_index_points = np.expand_dims(np.random.uniform(-1., 1., 50), -1)
# `f` consumes the feature dimension, taking the shape from [50, 1] to [50].
observed_values = f(observed_index_points)

# Define a kernel with trainable parameters.
kernel = psd_kernels.ExponentiatedQuadratic(
    amplitude=tf.Variable(1., dtype=np.float64, name='amplitude'),
    length_scale=tf.Variable(1., dtype=np.float64, name='length_scale'))

gp = tfd.GaussianProcess(kernel, observed_index_points)

optimizer = tf.optimizers.Adam()

@tf.function
def optimize():
  with tf.GradientTape() as tape:
    loss = -gp.log_prob(observed_values)
  grads = tape.gradient(loss, gp.trainable_variables)
  optimizer.apply_gradients(zip(grads, gp.trainable_variables))
  return loss

for i in range(1000):
  neg_log_likelihood = optimize()
  if i % 100 == 0:
    print("Step {}: NLL = {}".format(i, neg_log_likelihood))
print("Final NLL = {}".format(neg_log_likelihood))
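The Adam loop minimizes the negative log marginal likelihood, -gp.log_prob(observed_values). As a rough cross-check, the same quantity can be evaluated in plain NumPy over a small grid of length scales. This sketch rests on several assumptions and is not TFP code: the amplitude is fixed at 1, grid search replaces gradient-based optimization, the data is regenerated with the same recipe using a seeded NumPy generator, and `jitter` mirrors the default jitter=1e-6.

```python
import numpy as np


def neg_log_marginal_likelihood(x, y, amplitude, length_scale, jitter=1e-6):
    """-log N(y; 0, K + jitter * I) for a squared-exponential kernel matrix K."""
    sq_dist = (x[:, None, 0] - x[None, :, 0]) ** 2
    K = amplitude**2 * np.exp(-sq_dist / (2.0 * length_scale**2))
    K += jitter * np.eye(len(y))
    L = np.linalg.cholesky(K)
    z = np.linalg.solve(L, y)  # z^T z = y^T inv(K) y
    return 0.5 * z @ z + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2.0 * np.pi)


# Synthetic data built with the same recipe as the example above.
rng = np.random.default_rng(0)
observed_index_points = rng.uniform(-1.0, 1.0, size=(50, 1))
xs = observed_index_points[..., 0]
observed_values = np.sin(10 * xs) * np.exp(-xs**2)

length_scales = [0.05, 0.1, 0.2, 0.5, 1.0]
nlls = [neg_log_marginal_likelihood(observed_index_points, observed_values, 1.0, ls)
        for ls in length_scales]
best_length_scale = length_scales[int(np.argmin(nlls))]
print(best_length_scale)
```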

Args
kernel	PositiveSemidefiniteKernel-like instance representing the GP's covariance function.
index_points	(nested) Tensor representing finite (batch of) vector(s) of points in the index set over which the GP is defined. Shape (or shape of each nested component) has the form [b1, ..., bB, e, f1, ..., fF] where F is the number of feature dimensions and must equal kernel.feature_ndims (or its corresponding nested component) and e is the number (size) of index points in each batch. Ultimately, this distribution corresponds to an e-dimensional multivariate normal. The batch shape must be broadcastable with kernel.batch_shape and any batch dims yielded by mean_fn.
mean_fn	Python callable that acts on index_points to produce a (batch of) vector(s) of mean values at index_points. Takes a (nested) Tensor of shape [b1, ..., bB, e, f1, ..., fF] and returns a Tensor whose shape is broadcastable with [b1, ..., bB, e]. Default value: None implies constant zero function.
observation_noise_variance	float Tensor representing (batch of) scalar variance(s) of the noise in the Normal likelihood distribution of the model. If batched, the batch shape must be broadcastable with the shapes of all other batched parameters (kernel.batch_shape, index_points, etc.). Default value: 0.
marginal_fn	A Python callable that takes a location, covariance matrix, optional validate_args, allow_nan_stats and name arguments, and returns a multivariate normal subclass of tfd.Distribution. At most one of cholesky_fn and marginal_fn should be set. Default value: None, in which case a Cholesky-factorizing function is created using make_cholesky_factored_marginal_fn and the cholesky_fn argument.
cholesky_fn	Callable which takes a single (batch) matrix argument and returns a Cholesky-like lower triangular factor. Default value: None, in which case make_cholesky_with_jitter_fn is used with the jitter parameter. At most one of cholesky_fn and marginal_fn should be set.
jitter	float scalar Tensor added to the diagonal of the covariance matrix to ensure positive definiteness of the covariance matrix, when marginal_fn and cholesky_fn are both None. This argument is ignored if cholesky_fn is set. Default value: 1e-6.
validate_args	Python bool, default False. When True distribution parameters are checked for validity despite possibly degrading runtime performance. When False invalid inputs may silently render incorrect outputs. Default value: False.
allow_nan_stats	Python bool. When True, statistics (e.g., mean, mode, variance) use the value "NaN" to indicate the result is undefined. When False, an exception is raised if one or more of the statistic's batch members are undefined. Default value: False.
parameters	For subclasses, a dict of constructor arguments.
name	Python str name prefixed to Ops created by this class. Default value: "GaussianProcess".
_check_marginal_cholesky_fn	Internal parameter -- do not use.
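When neither marginal_fn nor cholesky_fn is supplied, jitter is added to the diagonal before the Cholesky factorization (via make_cholesky_with_jitter_fn). The NumPy sketch below illustrates why that matters: two identical index points give a kernel matrix that is positive semidefinite but singular, so a plain Cholesky factorization fails while the jittered one succeeds. `cholesky_with_jitter` is a hypothetical stand-in, not the TFP function.

```python
import numpy as np


def cholesky_with_jitter(matrix, jitter=1e-6):
    """Cholesky factor of matrix + jitter * I (hypothetical helper)."""
    n = matrix.shape[-1]
    return np.linalg.cholesky(matrix + jitter * np.eye(n))


# Two identical index points under a unit-amplitude kernel give an all-ones,
# rank-1 Gram matrix: PSD, but not positive definite.
K = np.ones((2, 2))

try:
    np.linalg.cholesky(K)
    plain_ok = True
except np.linalg.LinAlgError:
    plain_ok = False

L = cholesky_with_jitter(K)
print(plain_ok)                             # False
print(np.allclose(L @ L.T, K, atol=1e-5))   # True
```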

Raises
ValueError	if mean_fn is not None and is not callable.

Attributes
allow_nan_stats	Python bool describing behavior when a stat is undefined. Stats return +/- infinity when it makes sense. E.g., the variance of a Cauchy distribution is infinity. However, sometimes the statistic is undefined, e.g., if a distribution's pdf does not achieve a maximum within the support of the distribution, the mode is undefined. If the mean is undefined, then by definition the variance is undefined. E.g., the mean for Student's T for df = 1 is undefined (no clear way to say it is either + or - infinity), so the variance = E[(X - mean)**2] is also undefined.
batch_shape	Shape of a single sample from a single event index as a TensorShape. May be partially defined or unknown. The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.
cholesky_fn
dtype	The DType of Tensors handled by this Distribution.
event_shape	Shape of a single sample from a single batch as a TensorShape. May be partially defined or unknown.
experimental_shard_axis_names	The list or structure of lists of active shard axis names.
index_points
jitter	DEPRECATED FUNCTION
kernel
marginal_fn
mean_fn
name	Name prepended to all ops created by this Distribution.
name_scope	Returns a tf.name_scope instance for this class.
non_trainable_variables	Sequence of non-trainable variables owned by this module and its submodules.
observation_noise_variance
parameters	Dictionary of parameters used to instantiate this Distribution.
reparameterization_type	Describes how samples from the distribution are reparameterized. Currently this is one of the static instances tfd.FULLY_REPARAMETERIZED or tfd.NOT_REPARAMETERIZED.
submodules	Sequence of all sub-modules. Submodules are modules which are properties of this module, or found as properties of modules which are properties of this module (and so on).
    >>> a = tf.Module()
    >>> b = tf.Module()
    >>> c = tf.Module()
    >>> a.b = b
    >>> b.c = c
    >>> list(a.submodules) == [b, c]
    True
    >>> list(b.submodules) == [c]
    True
    >>> list(c.submodules) == []
    True
trainable_variables	Sequence of trainable variables owned by this module and its submodules.
validate_args	Python bool indicating possibly expensive checks are enabled.
variables	Sequence of variables owned by this module and its submodules.
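The index_points shape convention determines batch_shape and event_shape: an index_points Tensor of shape [b1, ..., bB, e, f1, ..., fF] yields an e-dimensional multivariate normal for each batch member. The pure-Python helper below (hypothetical, for illustration) splits a static shape accordingly. It ignores the batch dimensions contributed by kernel.batch_shape, mean_fn and observation_noise_variance, which the actual batch_shape also broadcasts against.

```python
def gp_shapes(index_points_shape, feature_ndims=1):
    """(batch_shape, event_shape) implied by index_points alone (hypothetical helper).

    index_points has shape [b1, ..., bB, e, f1, ..., fF] with F == feature_ndims.
    """
    shape = tuple(index_points_shape)
    if len(shape) < feature_ndims + 1:
        raise ValueError("index_points needs an example axis plus feature axes")
    example_axis = len(shape) - feature_ndims - 1
    batch_shape = shape[:example_axis]      # [b1, ..., bB]
    event_shape = (shape[example_axis],)    # [e]
    return batch_shape, event_shape


print(gp_shapes((100, 1)))        # ((), (100,))
print(gp_shapes((4, 3, 50, 1)))   # ((4, 3), (50,))
```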

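Args (cdf, log_cdf, log_prob, log_survival_function, prob, quantile, survival_function, unnormalized_log_prob)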
value	float or double Tensor.
name	Python str prepended to names of ops created by this function.
**kwargs	Named arguments forwarded to subclass implementation.

Args (cross_entropy, kl_divergence)
other	tfp.distributions.Distribution instance.
name	Python str prepended to names of ops created by this function.
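At a fixed set of index points a GaussianProcess is a multivariate normal, so mathematically cross_entropy and kl_divergence against another Gaussian over the same points reduce to the standard Gaussian formulas. The NumPy sketch below (hypothetical helper names; explicit inverses for brevity rather than numerical robustness) computes the entropy, KL(p || q), and the cross entropy H(p, q) = H(p) + KL(p || q).

```python
import numpy as np


def mvn_entropy(K):
    """Differential entropy of MVN(m, K): 0.5 * (N * log(2 * pi * e) + log|det(K)|)."""
    n = K.shape[0]
    _, log_det = np.linalg.slogdet(K)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + log_det)


def mvn_kl(m_p, K_p, m_q, K_q):
    """KL(p || q) for p = MVN(m_p, K_p) and q = MVN(m_q, K_q)."""
    n = K_p.shape[0]
    K_q_inv = np.linalg.inv(K_q)
    diff = m_q - m_p
    _, log_det_p = np.linalg.slogdet(K_p)
    _, log_det_q = np.linalg.slogdet(K_q)
    return 0.5 * (np.trace(K_q_inv @ K_p) + diff @ K_q_inv @ diff
                  - n + log_det_q - log_det_p)


def mvn_cross_entropy(m_p, K_p, m_q, K_q):
    """H(p, q) = H(p) + KL(p || q)."""
    return mvn_entropy(K_p) + mvn_kl(m_p, K_p, m_q, K_q)


m_p, K_p = np.zeros(2), np.array([[1.0, 0.3], [0.3, 1.0]])
m_q, K_q = np.array([0.5, -0.5]), np.array([[2.0, 0.0], [0.0, 0.5]])
print(mvn_kl(m_p, K_p, m_q, K_q), mvn_cross_entropy(m_p, K_p, m_q, K_q))
```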

Args (experimental_default_event_space_bijector)
*args	Passed to implementation _default_event_space_bijector.
**kwargs	Passed to implementation _default_event_space_bijector.

Args (experimental_fit)
value	a Tensor valid sample from this distribution family.
sample_ndims	Positive int Tensor number of leftmost dimensions of value that index i.i.d. samples. Default value: 1.
validate_args	Python bool, default False. When True, distribution parameters are checked for validity despite possibly degrading runtime performance. When False, invalid inputs may silently render incorrect outputs. Default value: False.
**init_kwargs	Additional keyword arguments passed through to cls.__init__. These take precedence in case of collision with the fitted parameters; for example, tfd.Normal.experimental_fit([1., 1.], scale=20.) returns a Normal distribution with scale=20. rather than the maximum likelihood estimate, scale=0.

Args (experimental_local_measure)
value	float or double Tensor.
backward_compat	bool specifying whether to fall back to returning FullSpace as the tangent space, and representing R^n with the standard basis.
**kwargs	Named arguments forwarded to subclass implementation.

Returns
log_prob	a Tensor representing the log probability density, of shape sample_shape(x) + self.batch_shape with values of type self.dtype.
tangent_space	a TangentSpace object (by default FullSpace) representing the tangent space to the manifold at value.

Args (experimental_sample_and_log_prob)
sample_shape	integer Tensor desired shape of samples to draw. Default value: ().
seed	PRNG seed; see tfp.random.sanitize_seed for details. Default value: None.
name	name to give to the op. Default value: 'sample_and_log_prob'.
**kwargs	Named arguments forwarded to subclass implementation.

Returns
samples	a Tensor, or structure of Tensors, with prepended dimensions sample_shape.
log_prob	a Tensor of shape sample_shape(x) + self.batch_shape with values of type self.dtype.
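Computing samples and their log-densities together lets both share one Cholesky factor: if x = mean + L z with z ~ N(0, I) and K = L @ L.T, then (x - mean)^T inv(K) (x - mean) = z^T z. The NumPy sketch below illustrates that identity for a single multivariate normal; the function name is hypothetical, and this is not a description of TFP's implementation.

```python
import numpy as np


def sample_and_log_prob(mean, K, num_samples, rng):
    """Draw x = mean + L z and return log N(x; mean, K), reusing L (hypothetical helper)."""
    n = mean.shape[0]
    L = np.linalg.cholesky(K)
    z = rng.standard_normal(size=(num_samples, n))
    samples = mean + z @ L.T                     # each row is mean + L @ z_i
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log|det(K)|
    # The quadratic form needs no solve: it is just ||z_i||**2.
    log_prob = -0.5 * np.sum(z**2, axis=-1) - 0.5 * log_det - 0.5 * n * np.log(2 * np.pi)
    return samples, log_prob


rng = np.random.default_rng(1)
mean = np.array([0.0, 1.0, -1.0])
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])
samples, log_prob = sample_and_log_prob(mean, K, num_samples=5, rng=rng)
```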

Args (param_shapes, param_static_shapes)
sample_shape	Tensor or python list/tuple. Desired shape of a call to sample().
name	name to prepend ops with.

Args (parameter_properties)
dtype	Optional float dtype to assume for continuous-valued parameters. Some constraining bijectors require advance knowledge of the dtype because certain constants (e.g., tfb.Softplus.low) must be instantiated with the same dtype as the values to be transformed.
num_classes	Optional int Tensor number of classes to assume when inferring the shape of parameters for categorical-like distributions. Otherwise ignored.

Args (posterior_predictive)
observations	float Tensor representing collection, or batch of collections, of observations corresponding to self.index_points. Shape has the form [b1, ..., bB, e], which must be broadcastable with the batch and example shapes of self.index_points. The batch shape [b1, ..., bB] must be broadcastable with the shapes of all other batched parameters.
predictive_index_points	(nested) Tensor representing finite collection, or batch of collections, of points in the index set over which the GP is defined. Shape (or shape of each nested component) has the form [b1, ..., bB, e, f1, ..., fF] where F is the number of feature dimensions and must equal kernel.feature_ndims (or its corresponding nested component) and e is the number (size) of predictive index points in each batch. The batch shape must be broadcastable with this distribution's batch_shape. Default value: None.
**kwargs	Any other keyword arguments to pass / override.
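posterior_predictive conditions this GP on observations at self.index_points. For a zero-mean prior, the underlying Gaussian conditioning formulas are mean = K_no @ inv(K_oo + s**2 I) @ y and cov = K_nn - K_no @ inv(K_oo + s**2 I) @ K_on, where o indexes observed points and n indexes new points. The NumPy sketch below implements these with a Cholesky factor; the kernel, its hyperparameters, and the function names are hypothetical. It returns the posterior over latent function values; the distribution TFP returns may also fold observation noise into its predictive covariance, so treat this only as a sketch of the conditioning step.

```python
import numpy as np


def se_kernel(a, b, amplitude=1.0, length_scale=0.3):
    """Squared-exponential kernel matrix between 1-d point sets a (n,) and b (m,)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return amplitude**2 * np.exp(-sq_dist / (2.0 * length_scale**2))


def posterior_predictive(x_obs, y_obs, x_new, noise_variance=1e-4):
    """Mean and covariance of f(x_new) given noisy y_obs, for a zero-mean GP prior."""
    K_oo = se_kernel(x_obs, x_obs) + noise_variance * np.eye(len(x_obs))
    K_no = se_kernel(x_new, x_obs)
    K_nn = se_kernel(x_new, x_new)
    L = np.linalg.cholesky(K_oo)
    # alpha = inv(K_oo) @ y_obs, via two solves with the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_no @ alpha
    V = np.linalg.solve(L, K_no.T)   # inv(L) @ K_on
    cov = K_nn - V.T @ V             # K_nn - K_no @ inv(K_oo) @ K_on
    return mean, cov


x_obs = np.array([-0.5, 0.0, 0.5])
y_obs = np.sin(3.0 * x_obs)
x_new = np.array([-0.5, 0.25, 0.5])
mean, cov = posterior_predictive(x_obs, y_obs, x_new)
```

With small noise, the predictive mean at an observed point nearly reproduces the observation and its variance nearly vanishes, while the variance between observations stays larger.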

Args (sample)
sample_shape	0D or 1D int32 Tensor. Shape of the generated samples.
seed	PRNG seed; see tfp.random.sanitize_seed for details.
name	name to give to the op.
**kwargs	Named arguments forwarded to subclass implementation.
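For a GP with scalar batch shape, sample(sample_shape) returns a Tensor of shape sample_shape + event_shape, i.e. sample_shape + [e]. The NumPy sketch below (hypothetical helper, not TFP code) reproduces that shape convention by drawing standard normals of shape sample_shape + [e] and mapping them through a jittered Cholesky factor.

```python
import numpy as np


def sample_mvn(mean, K, sample_shape=(), rng=None, jitter=1e-6):
    """Draw samples of shape sample_shape + mean.shape via a jittered Cholesky factor."""
    rng = np.random.default_rng() if rng is None else rng
    n = mean.shape[-1]
    L = np.linalg.cholesky(K + jitter * np.eye(n))
    z = rng.standard_normal(size=tuple(sample_shape) + (n,))
    return mean + z @ L.T   # applies L to the trailing (event) axis


rng = np.random.default_rng(2)
mean = np.zeros(4)
K = np.eye(4) + 0.5                  # 1.5 on the diagonal, 0.5 elsewhere
one = sample_mvn(mean, K, rng=rng)                            # shape (4,)
grid = sample_mvn(mean, K, sample_shape=(2, 3), rng=rng)      # shape (2, 3, 4)
many = sample_mvn(mean, K, sample_shape=(100_000,), rng=rng)  # for a covariance check
```

With many draws, the empirical covariance of `many` should be close to K.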

Methods

batch_shape_tensor
cdf
copy
covariance
cross_entropy
entropy
event_shape_tensor
experimental_default_event_space_bijector
experimental_fit
experimental_local_measure
experimental_sample_and_log_prob
get_marginal_distribution
is_scalar_batch
is_scalar_event
kl_divergence
log_cdf
log_prob
log_survival_function
mean
mode
param_shapes
param_static_shapes
parameter_properties
posterior_predictive
prob
quantile
sample
stddev
survival_function
unnormalized_log_prob
variance
with_name_scope
__getitem__
__iter__
