Probability distributions - torch.distributions — PyTorch 2.7 documentation

The distributions package contains parameterizable probability distributions and sampling functions. This allows the construction of stochastic computation graphs and stochastic gradient estimators for optimization. This package generally follows the design of the TensorFlow Distributions package.

It is not possible to directly backpropagate through random samples. However, there are two main methods for creating surrogate functions that can be backpropagated through. These are the score function estimator/likelihood ratio estimator/REINFORCE and the pathwise derivative estimator. REINFORCE is commonly seen as the basis for policy gradient methods in reinforcement learning, and the pathwise derivative estimator is commonly seen in the reparameterization trick in variational autoencoders. Whilst the score function only requires the value of samples f(x), the pathwise derivative requires the derivative f'(x). The next sections discuss these two in a reinforcement learning example. For more details see Gradient Estimation Using Stochastic Computation Graphs.

Score function

When the probability density function is differentiable with respect to its parameters, we only need sample() and log_prob() to implement REINFORCE:

\Delta\theta = \alpha r \frac{\partial \log p(a \mid \pi^\theta(s))}{\partial \theta}

where θ are the parameters, α is the learning rate, r is the reward, and p(a|π^θ(s)) is the probability of taking action a in state s given policy π^θ.

In practice we would sample an action from the output of a network, apply this action in an environment, and then use log_prob to construct an equivalent loss function. Note that we use a negative because optimizers use gradient descent, whilst the rule above assumes gradient ascent. With a categorical policy, the code for implementing REINFORCE would be as follows:

probs = policy_network(state)
# Note that this is equivalent to what used to be called multinomial
m = Categorical(probs)
action = m.sample()
next_state, reward = env.step(action)
loss = -m.log_prob(action) * reward
loss.backward()

Pathwise derivative

The other way to implement these stochastic/policy gradients would be to use the reparameterization trick from the rsample() method, where the parameterized random variable can be constructed via a parameterized deterministic function of a parameter-free random variable. The reparameterized sample therefore becomes differentiable. The code for implementing the pathwise derivative would be as follows:

params = policy_network(state)
m = Normal(*params)
# Any distribution with .has_rsample == True could work based on the application
action = m.rsample()
next_state, reward = env.step(action)  # Assuming that reward is differentiable
loss = -reward
loss.backward()

Distribution

class torch.distributions.distribution.Distribution(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)[source][source]

Bases: object

Distribution is the abstract base class for probability distributions.

property arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_

Returns a dictionary from argument names to Constraint objects that should be satisfied by each argument of this distribution. Args that are not tensors need not appear in this dict.

property batch_shape_: Size_

Returns the shape over which parameters are batched.

cdf(value)[source][source]

Returns the cumulative density/mass function evaluated at value.

Parameters

value (Tensor) –

Return type

Tensor

entropy()[source][source]

Returns entropy of distribution, batched over batch_shape.

Returns

Tensor of shape batch_shape.

Return type

Tensor

enumerate_support(expand=True)[source][source]

Returns tensor containing all values supported by a discrete distribution. The result will enumerate over dimension 0, so the shape of the result will be (cardinality,) + batch_shape + event_shape (where event_shape = () for univariate distributions).

Note that this enumerates over all batched tensors in lock-step [[0, 0], [1, 1], …]. With expand=False, enumeration happens along dim 0, but with the remaining batch dimensions being singleton dimensions, [[0], [1], …].

To iterate over the full Cartesian product use itertools.product(m.enumerate_support()).
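A minimal sketch (not part of the original reference) illustrating the lock-step enumeration described above on a batched Bernoulli:

import torch
from torch.distributions import Bernoulli

m = Bernoulli(torch.tensor([0.1, 0.9]))   # batch_shape = (2,)
m.enumerate_support()                     # shape (2, 2): [[0., 0.], [1., 1.]]
m.enumerate_support(expand=False)         # shape (2, 1): [[0.], [1.]]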

Parameters

expand (bool) – whether to expand the support over the batch dims to match the distribution’s batch_shape.

Returns

Tensor iterating over dimension 0.

Return type

Tensor

property event_shape_: Size_

Returns the shape of a single sample (without batching).

expand(batch_shape, _instance=None)[source][source]

Returns a new distribution instance (or populates an existing instance provided by a derived class) with batch dimensions expanded to batch_shape. This method calls expand on the distribution’s parameters. As such, this does not allocate new memory for the expanded distribution instance. Additionally, this does not repeat any args checking or parameter broadcasting in __init__.py when an instance is first created.

Parameters

Returns

New distribution instance with batch dimensions expanded to batch_shape.
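A brief sketch of expand() (assumed usage, not from the original page), expanding a scalar Normal to a batched one without copying parameters:

import torch
from torch.distributions import Normal

d = Normal(torch.tensor(0.0), torch.tensor(1.0))   # batch_shape = ()
d2 = d.expand(torch.Size([3, 2]))                   # batch_shape = (3, 2)
d2.sample().shape                                    # torch.Size([3, 2])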

icdf(value)[source][source]

Returns the inverse cumulative density/mass function evaluated at value.

Parameters

value (Tensor) –

Return type

Tensor

log_prob(value)[source][source]

Returns the log of the probability density/mass function evaluated at value.

Parameters

value (Tensor) –

Return type

Tensor

property mean_: Tensor_

Returns the mean of the distribution.

property mode_: Tensor_

Returns the mode of the distribution.

perplexity()[source][source]

Returns perplexity of distribution, batched over batch_shape.

Returns

Tensor of shape batch_shape.

Return type

Tensor

rsample(sample_shape=torch.Size([]))[source][source]

Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.

Return type

Tensor

sample(sample_shape=torch.Size([]))[source][source]

Generates a sample_shape shaped sample or sample_shape shaped batch of samples if the distribution parameters are batched.

Return type

Tensor

sample_n(n)[source][source]

Generates n samples or n batches of samples if the distribution parameters are batched.

Return type

Tensor

static set_default_validate_args(value)[source][source]

Sets whether validation is enabled or disabled.

The default behavior mimics Python’s assert statement: validation is on by default, but is disabled if Python is run in optimized mode (via python -O). Validation may be expensive, so you may want to disable it once a model is working.

Parameters

value (bool) – Whether to enable validation.
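A one-line illustration (assumed usage consistent with the signature above) of disabling validation globally once a model is working:

from torch.distributions import Distribution

# Validation can be expensive; turn it off for all distributions once debugging is done.
Distribution.set_default_validate_args(False)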

property stddev_: Tensor_

Returns the standard deviation of the distribution.

property support_: Optional[Constraint]_

Returns a Constraint object representing this distribution’s support.

property variance_: Tensor_

Returns the variance of the distribution.

ExponentialFamily

class torch.distributions.exp_family.ExponentialFamily(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)[source][source]

Bases: Distribution

ExponentialFamily is the abstract base class for probability distributions belonging to an exponential family, whose probability mass/density function has the form defined below

p_F(x; \theta) = \exp(\langle t(x), \theta \rangle - F(\theta) + k(x))

where θ denotes the natural parameters, t(x) denotes the sufficient statistic, F(θ) is the log normalizer function for a given family and k(x) is the carrier measure.

Note

This class is an intermediary between the Distribution class and distributions which belong to an exponential family mainly to check the correctness of the .entropy() and analytic KL divergence methods. We use this class to compute the entropy and KL divergence using the AD framework and Bregman divergences (courtesy of: Frank Nielsen and Richard Nock, Entropies and Cross-entropies of Exponential Families).

entropy()[source][source]

Method to compute the entropy using Bregman divergence of the log normalizer.

Bernoulli

class torch.distributions.bernoulli.Bernoulli(probs=None, logits=None, validate_args=None)[source][source]

Bases: ExponentialFamily

Creates a Bernoulli distribution parameterized by probs or logits (but not both).

Samples are binary (0 or 1). They take the value 1 with probability p and 0 with probability 1 - p.

Example:

m = Bernoulli(torch.tensor([0.3])) m.sample() # 30% chance 1; 70% chance 0 tensor([ 0.])

Parameters

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}

entropy()[source][source]

enumerate_support(expand=True)[source][source]

expand(batch_shape, _instance=None)[source][source]

has_enumerate_support = True

log_prob(value)[source][source]

property logits_: Tensor_

property mean_: Tensor_

property mode_: Tensor_

property param_shape_: Size_

property probs_: Tensor_

sample(sample_shape=torch.Size([]))[source][source]

support = Boolean()

property variance_: Tensor_

Beta

class torch.distributions.beta.Beta(concentration1, concentration0, validate_args=None)[source][source]

Bases: ExponentialFamily

Beta distribution parameterized by concentration1 and concentration0.

Example:

m = Beta(torch.tensor([0.5]), torch.tensor([0.5])) m.sample() # Beta distributed with concentration concentration1 and concentration0 tensor([ 0.1046])

Parameters

arg_constraints = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0)}

property concentration0_: Tensor_

property concentration1_: Tensor_

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=())[source][source]

Return type

Tensor

support = Interval(lower_bound=0.0, upper_bound=1.0)

property variance_: Tensor_

Binomial

class torch.distributions.binomial.Binomial(total_count=1, probs=None, logits=None, validate_args=None)[source][source]

Bases: Distribution

Creates a Binomial distribution parameterized by total_count and either probs or logits (but not both). total_count must be broadcastable with probs/logits.

Example:

>>> m = Binomial(100, torch.tensor([0 , .2, .8, 1]))
>>> x = m.sample()
tensor([   0.,   22.,   71.,  100.])

>>> m = Binomial(torch.tensor([[5.], [10.]]), torch.tensor([0.5, 0.8]))
>>> x = m.sample()
tensor([[ 4.,  5.],
        [ 7.,  6.]])

Parameters

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0), 'total_count': IntegerGreaterThan(lower_bound=0)}

entropy()[source][source]

enumerate_support(expand=True)[source][source]

expand(batch_shape, _instance=None)[source][source]

has_enumerate_support = True

log_prob(value)[source][source]

property logits_: Tensor_

property mean_: Tensor_

property mode_: Tensor_

property param_shape_: Size_

property probs_: Tensor_

sample(sample_shape=torch.Size([]))[source][source]

property support

Return type

_DependentProperty

property variance_: Tensor_

Categorical

class torch.distributions.categorical.Categorical(probs=None, logits=None, validate_args=None)[source][source]

Bases: Distribution

Creates a categorical distribution parameterized by either probs orlogits (but not both).

Samples are integers from {0, …, K−1} where K is probs.size(-1).

If probs is 1-dimensional with length-K, each element is the relative probability of sampling the class at that index.

If probs is N-dimensional, the first N-1 dimensions are treated as a batch of relative probability vectors.

Note

The probs argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. probs will return this normalized value. The logits argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. logits will return this normalized value.

See also: torch.multinomial()

Example:

m = Categorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ])) m.sample() # equal probability of 0, 1, 2, 3 tensor(3)
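A short sketch (not from the original docs) of the normalization described in the note above; unnormalized probs are rescaled to sum to 1 along the last dimension:

import torch
from torch.distributions import Categorical

m = Categorical(probs=torch.tensor([1.0, 1.0, 2.0]))
m.probs    # tensor([0.2500, 0.2500, 0.5000]) -- the normalized probabilities
m.logits   # log of the normalized probabilities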

Parameters

arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}

entropy()[source][source]

enumerate_support(expand=True)[source][source]

expand(batch_shape, _instance=None)[source][source]

has_enumerate_support = True

log_prob(value)[source][source]

property logits_: Tensor_

property mean_: Tensor_

property mode_: Tensor_

property param_shape_: Size_

property probs_: Tensor_

sample(sample_shape=torch.Size([]))[source][source]

property support

Return type

_DependentProperty

property variance_: Tensor_

Cauchy

class torch.distributions.cauchy.Cauchy(loc, scale, validate_args=None)[source][source]

Bases: Distribution

Samples from a Cauchy (Lorentz) distribution. The distribution of the ratio of independent normally distributed random variables with means 0 follows a Cauchy distribution.

Example:

m = Cauchy(torch.tensor([0.0]), torch.tensor([1.0])) m.sample() # sample from a Cauchy distribution with loc=0 and scale=1 tensor([ 2.3214])

Parameters

arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}

cdf(value)[source][source]

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

icdf(value)[source][source]

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

support = Real()

property variance_: Tensor_

Chi2

class torch.distributions.chi2.Chi2(df, validate_args=None)[source][source]

Bases: Gamma

Creates a Chi-squared distribution parameterized by shape parameter df. This is exactly equivalent to Gamma(alpha=0.5*df, beta=0.5)

Example:

m = Chi2(torch.tensor([1.0])) m.sample() # Chi2 distributed with shape df=1 tensor([ 0.1046])
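A hedged check (my own sketch, not from the page) of the equivalence with Gamma stated above:

import torch
from torch.distributions import Chi2, Gamma

df = torch.tensor([3.0])
x = torch.tensor([1.7])
Chi2(df).log_prob(x)                               # equals ...
Gamma(0.5 * df, torch.tensor([0.5])).log_prob(x)   # ... this value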

Parameters

df (float or Tensor) – shape parameter of the distribution

arg_constraints = {'df': GreaterThan(lower_bound=0.0)}

property df_: Tensor_

expand(batch_shape, _instance=None)[source][source]

ContinuousBernoulli

class torch.distributions.continuous_bernoulli.ContinuousBernoulli(probs=None, logits=None, lims=(0.499, 0.501), validate_args=None)[source][source]

Bases: ExponentialFamily

Creates a continuous Bernoulli distribution parameterized by probs or logits (but not both).

The distribution is supported in [0, 1] and parameterized by ‘probs’ (in (0,1)) or ‘logits’ (real-valued). Note that, unlike the Bernoulli, ‘probs’ does not correspond to a probability and ‘logits’ does not correspond to log-odds, but the same names are used due to the similarity with the Bernoulli. See [1] for more details.

Example:

m = ContinuousBernoulli(torch.tensor([0.3])) m.sample() tensor([ 0.2538])

Parameters

[1] The continuous Bernoulli: fixing a pervasive error in variational autoencoders, Loaiza-Ganem G and Cunningham JP, NeurIPS 2019. https://arxiv.org/abs/1907.06845

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}

cdf(value)[source][source]

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

icdf(value)[source][source]

log_prob(value)[source][source]

property logits_: Tensor_

property mean_: Tensor_

property param_shape_: Size_

property probs_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

sample(sample_shape=torch.Size([]))[source][source]

property stddev_: Tensor_

support = Interval(lower_bound=0.0, upper_bound=1.0)

property variance_: Tensor_

Dirichlet

class torch.distributions.dirichlet.Dirichlet(concentration, validate_args=None)[source][source]

Bases: ExponentialFamily

Creates a Dirichlet distribution parameterized by concentration concentration.

Example:

m = Dirichlet(torch.tensor([0.5, 0.5])) m.sample() # Dirichlet distributed with concentration [0.5, 0.5] tensor([ 0.1046, 0.8954])

Parameters

concentration (Tensor) – concentration parameter of the distribution (often referred to as alpha)

arg_constraints = {'concentration': IndependentConstraint(GreaterThan(lower_bound=0.0), 1)}

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=())[source][source]

Return type

Tensor

support = Simplex()

property variance_: Tensor_

Exponential

class torch.distributions.exponential.Exponential(rate, validate_args=None)[source][source]

Bases: ExponentialFamily

Creates an Exponential distribution parameterized by rate.

Example:

m = Exponential(torch.tensor([1.0])) m.sample() # Exponential distributed with rate=1 tensor([ 0.1046])

Parameters

rate (float or Tensor) – rate = 1 / scale of the distribution

arg_constraints = {'rate': GreaterThan(lower_bound=0.0)}

cdf(value)[source][source]

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

icdf(value)[source][source]

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

property stddev_: Tensor_

support = GreaterThanEq(lower_bound=0.0)

property variance_: Tensor_

FisherSnedecor

class torch.distributions.fishersnedecor.FisherSnedecor(df1, df2, validate_args=None)[source][source]

Bases: Distribution

Creates a Fisher-Snedecor distribution parameterized by df1 and df2.

Example:

m = FisherSnedecor(torch.tensor([1.0]), torch.tensor([2.0])) m.sample() # Fisher-Snedecor-distributed with df1=1 and df2=2 tensor([ 0.2453])

Parameters

arg_constraints = {'df1': GreaterThan(lower_bound=0.0), 'df2': GreaterThan(lower_bound=0.0)}

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

support = GreaterThan(lower_bound=0.0)

property variance_: Tensor_

Gamma

class torch.distributions.gamma.Gamma(concentration, rate, validate_args=None)[source][source]

Bases: ExponentialFamily

Creates a Gamma distribution parameterized by shape concentration and rate.

Example:

m = Gamma(torch.tensor([1.0]), torch.tensor([1.0])) m.sample() # Gamma distributed with concentration=1 and rate=1 tensor([ 0.1046])

Parameters

arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}

cdf(value)[source][source]

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

support = GreaterThanEq(lower_bound=0.0)

property variance_: Tensor_

Geometric

class torch.distributions.geometric.Geometric(probs=None, logits=None, validate_args=None)[source][source]

Bases: Distribution

Creates a Geometric distribution parameterized by probs, where probs is the probability of success of Bernoulli trials.

P(X = k) = (1 - p)^k p, k = 0, 1, ...

Note

torch.distributions.geometric.Geometric() models the case where the (k+1)-th trial is the first success, hence it draws samples in {0, 1, …}, whereas torch.Tensor.geometric_() models the case where the k-th trial is the first success, hence it draws samples in {1, 2, …}.

Example:

m = Geometric(torch.tensor([0.3])) m.sample() # underlying Bernoulli has 30% chance 1; 70% chance 0 tensor([ 2.])
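A small illustrative check (assumed, not from the docs) of the convention above: under this parameterization the event "first success on the very first trial" has index 0, so log_prob(0) equals log(p):

import torch
from torch.distributions import Geometric

p = torch.tensor([0.3])
Geometric(p).log_prob(torch.tensor([0.]))   # == torch.log(p), since P(X=0) = p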

Parameters

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

log_prob(value)[source][source]

property logits_: Tensor_

property mean_: Tensor_

property mode_: Tensor_

property probs_: Tensor_

sample(sample_shape=torch.Size([]))[source][source]

support = IntegerGreaterThan(lower_bound=0)

property variance_: Tensor_

Gumbel

class torch.distributions.gumbel.Gumbel(loc, scale, validate_args=None)[source][source]

Bases: TransformedDistribution

Samples from a Gumbel Distribution.

Examples:

m = Gumbel(torch.tensor([1.0]), torch.tensor([2.0])) m.sample() # sample from Gumbel distribution with loc=1, scale=2 tensor([ 1.0124])

Parameters

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

property stddev_: Tensor_

support = Real()

property variance_: Tensor_

HalfCauchy

class torch.distributions.half_cauchy.HalfCauchy(scale, validate_args=None)[source][source]

Bases: TransformedDistribution

Creates a half-Cauchy distribution parameterized by scale where:

X ~ Cauchy(0, scale) Y = |X| ~ HalfCauchy(scale)

Example:

m = HalfCauchy(torch.tensor([1.0])) m.sample() # half-cauchy distributed with scale=1 tensor([ 2.3214])

Parameters

scale (float or Tensor) – scale of the full Cauchy distribution

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {'scale': GreaterThan(lower_bound=0.0)}

cdf(value)[source][source]

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

icdf(prob)[source][source]

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

property scale_: Tensor_

support = GreaterThanEq(lower_bound=0.0)

property variance_: Tensor_

HalfNormal

class torch.distributions.half_normal.HalfNormal(scale, validate_args=None)[source][source]

Bases: TransformedDistribution

Creates a half-normal distribution parameterized by scale where:

X ~ Normal(0, scale) Y = |X| ~ HalfNormal(scale)

Example:

m = HalfNormal(torch.tensor([1.0])) m.sample() # half-normal distributed with scale=1 tensor([ 0.1046])

Parameters

scale (float or Tensor) – scale of the full Normal distribution

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {'scale': GreaterThan(lower_bound=0.0)}

cdf(value)[source][source]

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

icdf(prob)[source][source]

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

property scale_: Tensor_

support = GreaterThanEq(lower_bound=0.0)

property variance_: Tensor_

Independent

class torch.distributions.independent.Independent(base_distribution, reinterpreted_batch_ndims, validate_args=None)[source][source]

Bases: Distribution

Reinterprets some of the batch dims of a distribution as event dims.

This is mainly useful for changing the shape of the result oflog_prob(). For example to create a diagonal Normal distribution with the same shape as a Multivariate Normal distribution (so they are interchangeable), you can:

>>> from torch.distributions.multivariate_normal import MultivariateNormal
>>> from torch.distributions.normal import Normal
>>> loc = torch.zeros(3)
>>> scale = torch.ones(3)
>>> mvn = MultivariateNormal(loc, scale_tril=torch.diag(scale))
>>> [mvn.batch_shape, mvn.event_shape]
[torch.Size([]), torch.Size([3])]
>>> normal = Normal(loc, scale)
>>> [normal.batch_shape, normal.event_shape]
[torch.Size([3]), torch.Size([])]
>>> diagn = Independent(normal, 1)
>>> [diagn.batch_shape, diagn.event_shape]
[torch.Size([]), torch.Size([3])]
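A short follow-up sketch (my own, consistent with the example above) showing how Independent changes the shape of log_prob():

import torch
from torch.distributions import Normal, Independent

normal = Normal(torch.zeros(3), torch.ones(3))
diagn = Independent(normal, 1)
x = torch.zeros(3)
normal.log_prob(x).shape   # torch.Size([3]) -- one log-density per coordinate
diagn.log_prob(x).shape    # torch.Size([])  -- the three coordinates form one event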

Parameters

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {}

entropy()[source][source]

enumerate_support(expand=True)[source][source]

expand(batch_shape, _instance=None)[source][source]

property has_enumerate_support_: bool_

property has_rsample_: bool_

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

sample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

property support

Return type

_DependentProperty

property variance_: Tensor_

InverseGamma

class torch.distributions.inverse_gamma.InverseGamma(concentration, rate, validate_args=None)[source][source]

Bases: TransformedDistribution

Creates an inverse gamma distribution parameterized by concentration and rate where:

X ~ Gamma(concentration, rate) Y = 1 / X ~ InverseGamma(concentration, rate)

Example:

m = InverseGamma(torch.tensor([2.0]), torch.tensor([3.0])) m.sample() tensor([ 1.2953])

Parameters

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}

property concentration_: Tensor_

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

property mean_: Tensor_

property mode_: Tensor_

property rate_: Tensor_

support = GreaterThan(lower_bound=0.0)

property variance_: Tensor_

Kumaraswamy

class torch.distributions.kumaraswamy.Kumaraswamy(concentration1, concentration0, validate_args=None)[source][source]

Bases: TransformedDistribution

Samples from a Kumaraswamy distribution.

Example:

m = Kumaraswamy(torch.tensor([1.0]), torch.tensor([1.0])) m.sample() # sample from a Kumaraswamy distribution with concentration alpha=1 and beta=1 tensor([ 0.1729])

Parameters

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0)}

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

property mean_: Tensor_

property mode_: Tensor_

support = Interval(lower_bound=0.0, upper_bound=1.0)

property variance_: Tensor_

LKJCholesky

class torch.distributions.lkj_cholesky.LKJCholesky(dim, concentration=1.0, validate_args=None)[source][source]

Bases: Distribution

LKJ distribution for lower Cholesky factor of correlation matrices. The distribution is controlled by concentration parameter η to make the probability of the correlation matrix M generated from a Cholesky factor proportional to det(M)^(η − 1). Because of that, when concentration == 1, we have a uniform distribution over Cholesky factors of correlation matrices:

L ~ LKJCholesky(dim, concentration) X = L @ L' ~ LKJCorr(dim, concentration)

Note that this distribution samples the Cholesky factor of correlation matrices and not the correlation matrices themselves and thereby differs slightly from the derivations in [1] for the LKJCorr distribution. For sampling, this uses the Onion method from [1] Section 3.

Example:

>>> l = LKJCholesky(3, 0.5)
>>> l.sample()  # l @ l.T is a sample of a correlation 3x3 matrix
tensor([[ 1.0000,  0.0000,  0.0000],
        [ 0.3516,  0.9361,  0.0000],
        [-0.1899,  0.4748,  0.8593]])

Parameters

References

[1] Generating random correlation matrices based on vines and extended onion method (2009), Daniel Lewandowski, Dorota Kurowicka, Harry Joe. Journal of Multivariate Analysis. 100. 10.1016/j.jmva.2009.04.008

arg_constraints = {'concentration': GreaterThan(lower_bound=0.0)}

expand(batch_shape, _instance=None)[source][source]

log_prob(value)[source][source]

sample(sample_shape=torch.Size([]))[source][source]

support = CorrCholesky()

Laplace

class torch.distributions.laplace.Laplace(loc, scale, validate_args=None)[source][source]

Bases: Distribution

Creates a Laplace distribution parameterized by loc and scale.

Example:

m = Laplace(torch.tensor([0.0]), torch.tensor([1.0])) m.sample() # Laplace distributed with loc=0, scale=1 tensor([ 0.1046])

Parameters

arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}

cdf(value)[source][source]

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

icdf(value)[source][source]

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

property stddev_: Tensor_

support = Real()

property variance_: Tensor_

LogNormal

class torch.distributions.log_normal.LogNormal(loc, scale, validate_args=None)[source][source]

Bases: TransformedDistribution

Creates a log-normal distribution parameterized byloc and scale where:

X ~ Normal(loc, scale) Y = exp(X) ~ LogNormal(loc, scale)

Example:

m = LogNormal(torch.tensor([0.0]), torch.tensor([1.0])) m.sample() # log-normal distributed with mean=0 and stddev=1 tensor([ 0.1046])

Parameters

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

property loc_: Tensor_

property mean_: Tensor_

property mode_: Tensor_

property scale_: Tensor_

support = GreaterThan(lower_bound=0.0)

property variance_: Tensor_

LowRankMultivariateNormal

class torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal(loc, cov_factor, cov_diag, validate_args=None)[source][source]

Bases: Distribution

Creates a multivariate normal distribution with covariance matrix having a low-rank form parameterized by cov_factor and cov_diag:

covariance_matrix = cov_factor @ cov_factor.T + cov_diag

Example

>>> m = LowRankMultivariateNormal(
...     torch.zeros(2), torch.tensor([[1.0], [0.0]]), torch.ones(2)
... )
>>> m.sample()  # normally distributed with mean=[0,0], cov_factor=[[1],[0]], cov_diag=[1,1]
tensor([-0.2102, -0.5429])

Parameters

Note

The computation for determinant and inverse of covariance matrix is avoided when cov_factor.shape[1] << cov_factor.shape[0] thanks to the Woodbury matrix identity and the matrix determinant lemma. Thanks to these formulas, we just need to compute the determinant and inverse of the small size “capacitance” matrix:

capacitance = I + cov_factor.T @ inv(cov_diag) @ cov_factor

arg_constraints = {'cov_diag': IndependentConstraint(GreaterThan(lower_bound=0.0), 1), 'cov_factor': IndependentConstraint(Real(), 2), 'loc': IndependentConstraint(Real(), 1)}

property covariance_matrix_: Tensor_

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

property precision_matrix_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

property scale_tril_: Tensor_

support = IndependentConstraint(Real(), 1)

property variance_: Tensor_

MixtureSameFamily

class torch.distributions.mixture_same_family.MixtureSameFamily(mixture_distribution, component_distribution, validate_args=None)[source][source]

Bases: Distribution

The MixtureSameFamily distribution implements a (batch of) mixture distribution where all components are from different parameterizations of the same distribution type. It is parameterized by a Categorical “selecting distribution” (over k components) and a component distribution, i.e., a Distribution with a rightmost batch shape (equal to [k]) which indexes each (batch of) component.

Examples:

>>> # Construct Gaussian Mixture Model in 1D consisting of 5 equally
>>> # weighted normal distributions
>>> mix = D.Categorical(torch.ones(5,))
>>> comp = D.Normal(torch.randn(5,), torch.rand(5,))
>>> gmm = MixtureSameFamily(mix, comp)

>>> # Construct Gaussian Mixture Model in 2D consisting of 5 equally
>>> # weighted bivariate normal distributions
>>> mix = D.Categorical(torch.ones(5,))
>>> comp = D.Independent(D.Normal(
...           torch.randn(5,2), torch.rand(5,2)), 1)
>>> gmm = MixtureSameFamily(mix, comp)

>>> # Construct a batch of 3 Gaussian Mixture Models in 2D each
>>> # consisting of 5 random weighted bivariate normal distributions
>>> mix = D.Categorical(torch.rand(3,5))
>>> comp = D.Independent(D.Normal(
...           torch.randn(3,5,2), torch.rand(3,5,2)), 1)
>>> gmm = MixtureSameFamily(mix, comp)

Parameters

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {}

cdf(x)[source][source]

property component_distribution_: Distribution_

expand(batch_shape, _instance=None)[source][source]

has_rsample = False

log_prob(x)[source][source]

property mean_: Tensor_

property mixture_distribution_: Categorical_

sample(sample_shape=torch.Size([]))[source][source]

property support

Return type

_DependentProperty

property variance_: Tensor_

Multinomial

class torch.distributions.multinomial.Multinomial(total_count=1, probs=None, logits=None, validate_args=None)[source][source]

Bases: Distribution

Creates a Multinomial distribution parameterized by total_count and either probs or logits (but not both). The innermost dimension ofprobs indexes over categories. All other dimensions index over batches.

Note that total_count need not be specified if only log_prob() is called (see example below)

Note

The probs argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. probs will return this normalized value. The logits argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. logits will return this normalized value.

Example:

>>> m = Multinomial(100, torch.tensor([ 1., 1., 1., 1.]))
>>> x = m.sample()  # equal probability of 0, 1, 2, 3
tensor([ 21.,  24.,  30.,  25.])

>>> Multinomial(probs=torch.tensor([1., 1., 1., 1.])).log_prob(x)
tensor([-4.1338])

Parameters

arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

log_prob(value)[source][source]

property logits_: Tensor_

property mean_: Tensor_

property param_shape_: Size_

property probs_: Tensor_

sample(sample_shape=torch.Size([]))[source][source]

property support

Return type

_DependentProperty

total_count_: int_

property variance_: Tensor_

MultivariateNormal

class torch.distributions.multivariate_normal.MultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)[source][source]

Bases: Distribution

Creates a multivariate normal (also called Gaussian) distribution parameterized by a mean vector and a covariance matrix.

The multivariate normal distribution can be parameterized either in terms of a positive definite covariance matrix Σ, or a positive definite precision matrix Σ⁻¹, or a lower-triangular matrix L with positive-valued diagonal entries, such that Σ = L Lᵀ. This triangular matrix can be obtained via e.g. Cholesky decomposition of the covariance.

Example

m = MultivariateNormal(torch.zeros(2), torch.eye(2)) m.sample() # normally distributed with mean=[0,0] and covariance_matrix=I tensor([-0.2102, -0.5429])

Parameters

Note

Only one of covariance_matrix or precision_matrix or scale_tril can be specified.

Using scale_tril will be more efficient: all computations internally are based on scale_tril. If covariance_matrix or precision_matrix is passed instead, it is only used to compute the corresponding lower triangular matrices using a Cholesky decomposition.
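A hedged sketch (assumed usage, not from the page) that precomputes a Cholesky factor once and passes it as scale_tril:

import torch
from torch.distributions import MultivariateNormal

cov = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
L = torch.linalg.cholesky(cov)                    # lower-triangular factor of the covariance
m = MultivariateNormal(torch.zeros(2), scale_tril=L)
m.sample()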

arg_constraints = {'covariance_matrix': PositiveDefinite(), 'loc': IndependentConstraint(Real(), 1), 'precision_matrix': PositiveDefinite(), 'scale_tril': LowerCholesky()}

property covariance_matrix_: Tensor_

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

property precision_matrix_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

property scale_tril_: Tensor_

support = IndependentConstraint(Real(), 1)

property variance_: Tensor_

NegativeBinomial

class torch.distributions.negative_binomial.NegativeBinomial(total_count, probs=None, logits=None, validate_args=None)[source][source]

Bases: Distribution

Creates a Negative Binomial distribution, i.e. distribution of the number of successful independent and identical Bernoulli trials before total_count failures are achieved. The probability of success of each Bernoulli trial is probs.

Parameters

arg_constraints = {'logits': Real(), 'probs': HalfOpenInterval(lower_bound=0.0, upper_bound=1.0), 'total_count': GreaterThanEq(lower_bound=0)}

expand(batch_shape, _instance=None)[source][source]

log_prob(value)[source][source]

property logits_: Tensor_

property mean_: Tensor_

property mode_: Tensor_

property param_shape_: Size_

property probs_: Tensor_

sample(sample_shape=torch.Size([]))[source][source]

support = IntegerGreaterThan(lower_bound=0)

property variance_: Tensor_

Normal

class torch.distributions.normal.Normal(loc, scale, validate_args=None)[source][source]

Bases: ExponentialFamily

Creates a normal (also called Gaussian) distribution parameterized byloc and scale.

Example:

m = Normal(torch.tensor([0.0]), torch.tensor([1.0])) m.sample() # normally distributed with loc=0 and scale=1 tensor([ 0.1046])

Parameters

arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}

cdf(value)[source][source]

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

icdf(value)[source][source]

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

sample(sample_shape=torch.Size([]))[source][source]

property stddev_: Tensor_

support = Real()

property variance_: Tensor_

OneHotCategorical

class torch.distributions.one_hot_categorical.OneHotCategorical(probs=None, logits=None, validate_args=None)[source][source]

Bases: Distribution

Creates a one-hot categorical distribution parameterized by probs orlogits.

Samples are one-hot coded vectors of size probs.size(-1).

Note

The probs argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. probs will return this normalized value. The logits argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. logits will return this normalized value.

See also: torch.distributions.Categorical() for specifications of probs and logits.

Example:

m = OneHotCategorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ])) m.sample() # equal probability of 0, 1, 2, 3 tensor([ 0., 0., 0., 1.])

Parameters

arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}

entropy()[source][source]

enumerate_support(expand=True)[source][source]

expand(batch_shape, _instance=None)[source][source]

has_enumerate_support = True

log_prob(value)[source][source]

property logits_: Tensor_

property mean_: Tensor_

property mode_: Tensor_

property param_shape_: Size_

property probs_: Tensor_

sample(sample_shape=torch.Size([]))[source][source]

support = OneHot()

property variance_: Tensor_

Pareto

class torch.distributions.pareto.Pareto(scale, alpha, validate_args=None)[source][source]

Bases: TransformedDistribution

Samples from a Pareto Type 1 distribution.

Example:

m = Pareto(torch.tensor([1.0]), torch.tensor([1.0])) m.sample() # sample from a Pareto distribution with scale=1 and alpha=1 tensor([ 1.5623])

Parameters

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {'alpha': GreaterThan(lower_bound=0.0), 'scale': GreaterThan(lower_bound=0.0)}

entropy()[source][source]

Return type

Tensor

expand(batch_shape, _instance=None)[source][source]

Return type

Pareto

property mean_: Tensor_

property mode_: Tensor_

property support_: Constraint_

Return type

_DependentProperty

property variance_: Tensor_

Poisson

class torch.distributions.poisson.Poisson(rate, validate_args=None)[source][source]

Bases: ExponentialFamily

Creates a Poisson distribution parameterized by rate, the rate parameter.

Samples are nonnegative integers, with a pmf given by

\mathrm{rate}^k \frac{e^{-\mathrm{rate}}}{k!}

Example:

m = Poisson(torch.tensor([4])) m.sample() tensor([ 3.])

Parameters

rate (Number , Tensor) – the rate parameter

arg_constraints = {'rate': GreaterThanEq(lower_bound=0.0)}

expand(batch_shape, _instance=None)[source][source]

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

sample(sample_shape=torch.Size([]))[source][source]

support = IntegerGreaterThan(lower_bound=0)

property variance_: Tensor_

RelaxedBernoulli

class torch.distributions.relaxed_bernoulli.RelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)[source][source]

Bases: TransformedDistribution

Creates a RelaxedBernoulli distribution, parametrized by temperature, and either probs or logits (but not both). This is a relaxed version of the Bernoulli distribution, so the values are in (0, 1), and has reparametrizable samples.

Example:

m = RelaxedBernoulli(torch.tensor([2.2]), ... torch.tensor([0.1, 0.2, 0.3, 0.99])) m.sample() tensor([ 0.2951, 0.3442, 0.8918, 0.9021])

Parameters

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

property logits_: Tensor_

property probs_: Tensor_

support = Interval(lower_bound=0.0, upper_bound=1.0)

property temperature_: Tensor_

LogitRelaxedBernoulli

class torch.distributions.relaxed_bernoulli.LogitRelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)[source][source]

Bases: Distribution

Creates a LogitRelaxedBernoulli distribution parameterized by probs or logits (but not both), which is the logit of a RelaxedBernoulli distribution.

Samples are logits of values in (0, 1). See [1] for more details.

Parameters

[1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables (Maddison et al., 2017)

[2] Categorical Reparametrization with Gumbel-Softmax (Jang et al., 2017)

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}

expand(batch_shape, _instance=None)[source][source]

log_prob(value)[source][source]

property logits_: Tensor_

property param_shape_: Size_

property probs_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

support = Real()

RelaxedOneHotCategorical

class torch.distributions.relaxed_categorical.RelaxedOneHotCategorical(temperature, probs=None, logits=None, validate_args=None)[source][source]

Bases: TransformedDistribution

Creates a RelaxedOneHotCategorical distribution parametrized by temperature, and either probs or logits. This is a relaxed version of the OneHotCategorical distribution, so its samples are on the simplex, and are reparametrizable.

Example:

m = RelaxedOneHotCategorical(torch.tensor([2.2]), ... torch.tensor([0.1, 0.2, 0.3, 0.4])) m.sample() tensor([ 0.1294, 0.2324, 0.3859, 0.2523])

Parameters

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

property logits_: Tensor_

property probs_: Tensor_

support = Simplex()

property temperature_: Tensor_

StudentT

class torch.distributions.studentT.StudentT(df, loc=0.0, scale=1.0, validate_args=None)[source][source]

Bases: Distribution

Creates a Student’s t-distribution parameterized by degrees of freedom df, mean loc and scale scale.

Example:

m = StudentT(torch.tensor([2.0])) m.sample() # Student's t-distributed with degrees of freedom=2 tensor([ 0.1046])

Parameters

arg_constraints = {'df': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

support = Real()

property variance_: Tensor_

TransformedDistribution

class torch.distributions.transformed_distribution.TransformedDistribution(base_distribution, transforms, validate_args=None)[source][source]

Bases: Distribution

Extension of the Distribution class, which applies a sequence of Transforms to a base distribution. Let f be the composition of transforms applied:

X ~ BaseDistribution
Y = f(X) ~ TransformedDistribution(BaseDistribution, f)
log p(Y) = log p(X) + log |det (dX/dY)|

Note that the .event_shape of a TransformedDistribution is the maximum shape of its base distribution and its transforms, since transforms can introduce correlations among events.

An example for the usage of TransformedDistribution would be:

# Building a Logistic Distribution
# X ~ Uniform(0, 1)
# f = a + b * logit(X)
# Y ~ f(X) ~ Logistic(a, b)
base_distribution = Uniform(0, 1)
transforms = [SigmoidTransform().inv, AffineTransform(loc=a, scale=b)]
logistic = TransformedDistribution(base_distribution, transforms)

For more examples, please look at the implementations of Gumbel, HalfCauchy, HalfNormal, LogNormal, Pareto, Weibull, RelaxedBernoulli and RelaxedOneHotCategorical.

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {}

cdf(value)[source][source]

Computes the cumulative distribution function by inverting the transform(s) and computing the score of the base distribution.

expand(batch_shape, _instance=None)[source][source]

property has_rsample_: bool_

icdf(value)[source][source]

Computes the inverse cumulative distribution function using transform(s) and computing the score of the base distribution.

log_prob(value)[source][source]

Scores the sample by inverting the transform(s) and computing the score using the score of the base distribution and the log abs det jacobian.

rsample(sample_shape=torch.Size([]))[source][source]

Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched. Samples first from base distribution and appliestransform() for every transform in the list.

Return type

Tensor

sample(sample_shape=torch.Size([]))[source][source]

Generates a sample_shape shaped sample or sample_shape shaped batch of samples if the distribution parameters are batched. Samples first from base distribution and applies transform() for every transform in the list.

property support

Return type

_DependentProperty

Uniform

class torch.distributions.uniform.Uniform(low, high, validate_args=None)[source][source]

Bases: Distribution

Generates uniformly distributed random samples from the half-open interval[low, high).

Example:

m = Uniform(torch.tensor([0.0]), torch.tensor([5.0])) m.sample() # uniformly distributed in the range [0.0, 5.0) tensor([ 2.3418])

Parameters

arg_constraints = {'high': Dependent(), 'low': Dependent()}

cdf(value)[source][source]

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

icdf(value)[source][source]

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

rsample(sample_shape=torch.Size([]))[source][source]

Return type

Tensor

property stddev_: Tensor_

property support

Return type

_DependentProperty

property variance_: Tensor_

VonMises

class torch.distributions.von_mises.VonMises(loc, concentration, validate_args=None)[source][source]

Bases: Distribution

A circular von Mises distribution.

This implementation uses polar coordinates. The loc and value args can be any real number (to facilitate unconstrained optimization), but are interpreted as angles modulo 2 pi.

Example:

m = VonMises(torch.tensor([1.0]), torch.tensor([1.0])) m.sample() # von Mises distributed with loc=1 and concentration=1 tensor([1.9777])

Parameters

arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'loc': Real()}

expand(batch_shape, _instance=None)[source][source]

has_rsample = False

log_prob(value)[source][source]

property mean_: Tensor_

The provided mean is the circular one.

property mode_: Tensor_

sample(sample_shape=torch.Size([]))[source][source]

The sampling algorithm for the von Mises distribution is based on the following paper: D.J. Best and N.I. Fisher, “Efficient simulation of the von Mises distribution.” Applied Statistics (1979): 152-157.

Sampling is always done in double precision internally to avoid a hang in _rejection_sample() for small values of the concentration, which starts to happen for single precision around 1e-4 (see issue #88443).

support = Real()

property variance_: Tensor_

The provided variance is the circular one.

Weibull

class torch.distributions.weibull.Weibull(scale, concentration, validate_args=None)[source][source]

Bases: TransformedDistribution

Samples from a two-parameter Weibull distribution.

Example

m = Weibull(torch.tensor([1.0]), torch.tensor([1.0])) m.sample() # sample from a Weibull distribution with scale=1, concentration=1 tensor([ 0.4784])

Parameters

arg_constraints_: dict[str, torch.distributions.constraints.Constraint]_ = {'concentration': GreaterThan(lower_bound=0.0), 'scale': GreaterThan(lower_bound=0.0)}

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

property mean_: Tensor_

property mode_: Tensor_

support = GreaterThan(lower_bound=0.0)

property variance_: Tensor_

Wishart

class torch.distributions.wishart.Wishart(df, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)[source][source]

Bases: ExponentialFamily

Creates a Wishart distribution parameterized by a symmetric positive definite matrix Σ, or its Cholesky decomposition Σ = L Lᵀ.

Example

>>> m = Wishart(torch.Tensor([2]), covariance_matrix=torch.eye(2))
>>> m.sample()  # Wishart distributed with mean=df * I and
...             # variance(x_ij)=df for i != j and variance(x_ij)=2 * df for i == j

Parameters

Note

Only one of covariance_matrix or precision_matrix or scale_tril can be specified. Using scale_tril will be more efficient: all computations internally are based on scale_tril. If covariance_matrix or precision_matrix is passed instead, it is only used to compute the corresponding lower triangular matrices using a Cholesky decomposition. torch.distributions.LKJCholesky is a restricted Wishart distribution [1].

References

[1] Wang, Z., Wu, Y. and Chu, H., 2018. On equivalence of the LKJ distribution and the restricted Wishart distribution. [2] Sawyer, S., 2007. Wishart Distributions and Inverse-Wishart Sampling. [3] Anderson, T. W., 2003. An Introduction to Multivariate Statistical Analysis (3rd ed.). [4] Odell, P. L. & Feiveson, A. H., 1966. A Numerical Procedure to Generate a SampleCovariance Matrix. JASA, 61(313):199-203. [5] Ku, Y.-C. & Bloomfield, P., 2010. Generating Random Wishart Matrices with Fractional Degrees of Freedom in OX.

arg_constraints = {'covariance_matrix': PositiveDefinite(), 'df': GreaterThan(lower_bound=0), 'precision_matrix': PositiveDefinite(), 'scale_tril': LowerCholesky()}

property covariance_matrix_: Tensor_

entropy()[source][source]

expand(batch_shape, _instance=None)[source][source]

has_rsample = True

log_prob(value)[source][source]

property mean_: Tensor_

property mode_: Tensor_

property precision_matrix_: Tensor_

rsample(sample_shape=torch.Size([]), max_try_correction=None)[source][source]

Warning

In some cases, the sampling algorithm based on the Bartlett decomposition may return singular matrix samples. Several attempts to correct singular samples are made by default, but singular samples may still be returned. Singular samples may produce -inf values in .log_prob(). In those cases, the user should validate the samples and either fix the value of df or adjust the max_try_correction argument of .rsample() accordingly.

Return type

Tensor

property scale_tril_: Tensor_

support = PositiveDefinite()

property variance_: Tensor_

KL Divergence

torch.distributions.kl.kl_divergence(p, q)[source][source]

Compute Kullback-Leibler divergence KL(p ‖ q) between two distributions.

KL(p \| q) = \int p(x) \log\frac{p(x)}{q(x)} \,dx

Parameters

Returns

A batch of KL divergences of shape batch_shape.

Return type

Tensor

Raises

NotImplementedError – If the distribution types have not been registered viaregister_kl().

KL divergence is currently implemented for a fixed set of distribution pairs registered via register_kl().
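A minimal usage sketch (not from the original page) computing the KL divergence between two Normal distributions:

import torch
from torch.distributions import Normal
from torch.distributions.kl import kl_divergence

p = Normal(torch.tensor([0.0]), torch.tensor([1.0]))
q = Normal(torch.tensor([1.0]), torch.tensor([2.0]))
kl_divergence(p, q)   # tensor of shape batch_shape, here (1,)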

torch.distributions.kl.register_kl(type_p, type_q)[source][source]

Decorator to register a pairwise function with kl_divergence(). Usage:

@register_kl(Normal, Normal)
def kl_normal_normal(p, q):
    # insert implementation here

Lookup returns the most specific (type,type) match ordered by subclass. If the match is ambiguous, a RuntimeWarning is raised. For example to resolve the ambiguous situation:

@register_kl(BaseP, DerivedQ)
def kl_version1(p, q): ...
@register_kl(DerivedP, BaseQ)
def kl_version2(p, q): ...

you should register a third most-specific implementation, e.g.:

register_kl(DerivedP, DerivedQ)(kl_version1) # Break the tie.

Parameters

Transforms

class torch.distributions.transforms.AbsTransform(cache_size=0)[source][source]

Transform via the mapping y = |x|.

class torch.distributions.transforms.AffineTransform(loc, scale, event_dim=0, cache_size=0)[source][source]

Transform via the pointwise affine mapping y = loc + scale * x.

Parameters

class torch.distributions.transforms.CatTransform(tseq, dim=0, lengths=None, cache_size=0)[source][source]

Transform functor that applies a sequence of transforms tseq component-wise to each submatrix at dim, of length lengths[dim], in a way compatible with torch.cat().

Example:

x0 = torch.cat([torch.range(1, 10), torch.range(1, 10)], dim=0)
x = torch.cat([x0, x0], dim=0)
t0 = CatTransform([ExpTransform(), identity_transform], dim=0, lengths=[10, 10])
t = CatTransform([t0, t0], dim=0, lengths=[20, 20])
y = t(x)

class torch.distributions.transforms.ComposeTransform(parts, cache_size=0)[source][source]

Composes multiple transforms in a chain. The transforms being composed are responsible for caching.

Parameters

class torch.distributions.transforms.CorrCholeskyTransform(cache_size=0)[source][source]

Transforms an unconstrained real vector x with length D*(D-1)/2 into the Cholesky factor of a D-dimension correlation matrix. This Cholesky factor is a lower triangular matrix with positive diagonals and unit Euclidean norm for each row. The transform is processed as follows (a short usage sketch follows the steps below):

  1. First we convert x into a lower triangular matrix in row order.
  2. For each row X_i of the lower triangular part, we apply a signed version of class StickBreakingTransform to transform X_i into a unit Euclidean length vector using the following steps: - Scales into the interval (-1, 1) domain: r_i = tanh(X_i). - Transforms into an unsigned domain: z_i = r_i^2. - Applies s_i = StickBreakingTransform(z_i). - Transforms back into signed domain: y_i = sign(r_i) * sqrt(s_i).
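A short sketch (assumed shapes, not from the page) applying the transform: for D = 3 the input is an unconstrained vector of length D*(D-1)/2 = 3 and the output rows have unit Euclidean norm:

import torch
from torch.distributions.transforms import CorrCholeskyTransform

t = CorrCholeskyTransform()
x = torch.randn(3)   # unconstrained vector, length D*(D-1)/2 for D = 3
L = t(x)             # 3 x 3 lower-triangular Cholesky factor
L @ L.T              # a valid 3 x 3 correlation matrix (unit diagonal)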

class torch.distributions.transforms.CumulativeDistributionTransform(distribution, cache_size=0)[source][source]

Transform via the cumulative distribution function of a probability distribution.

Parameters

distribution (Distribution) – Distribution whose cumulative distribution function to use for the transformation.

Example:

# Construct a Gaussian copula from a multivariate normal.
base_dist = MultivariateNormal(
    loc=torch.zeros(2),
    scale_tril=LKJCholesky(2).sample(),
)
transform = CumulativeDistributionTransform(Normal(0, 1))
copula = TransformedDistribution(base_dist, [transform])

class torch.distributions.transforms.ExpTransform(cache_size=0)[source][source]

Transform via the mapping y = exp(x).

class torch.distributions.transforms.IndependentTransform(base_transform, reinterpreted_batch_ndims, cache_size=0)[source][source]

Wrapper around another transform to treat reinterpreted_batch_ndims-many extra of the rightmost dimensions as dependent. This has no effect on the forward or backward transforms, but does sum out reinterpreted_batch_ndims-many of the rightmost dimensions in log_abs_det_jacobian().

Parameters

class torch.distributions.transforms.LowerCholeskyTransform(cache_size=0)[source][source]

Transform from unconstrained matrices to lower-triangular matrices with nonnegative diagonal entries.

This is useful for parameterizing positive definite matrices in terms of their Cholesky factorization.
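A minimal illustrative sketch (not from the original documentation) of parameterizing a positive definite matrix through this transform:

import torch
from torch.distributions.transforms import LowerCholeskyTransform

t = LowerCholeskyTransform()
unconstrained = torch.randn(3, 3, requires_grad=True)
L = t(unconstrained)  # lower triangular with nonnegative diagonal
sigma = L @ L.T       # positive (semi-)definite, usable e.g. as a covariance matrix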

class torch.distributions.transforms.PositiveDefiniteTransform(cache_size=0)[source][source]

Transform from unconstrained matrices to positive-definite matrices.

class torch.distributions.transforms.PowerTransform(exponent, cache_size=0)[source][source]

Transform via the mapping y = x^exponent.

class torch.distributions.transforms.ReshapeTransform(in_shape, out_shape, cache_size=0)[source][source]

Unit Jacobian transform to reshape the rightmost part of a tensor.

Note that in_shape and out_shape must have the same number of elements, just as for torch.Tensor.reshape().

Parameters
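A brief illustrative sketch (added here, with arbitrary shapes) of reshaping the rightmost dimensions:

import torch
from torch.distributions.transforms import ReshapeTransform

t = ReshapeTransform(in_shape=(2, 3), out_shape=(6,))
x = torch.arange(6.0).reshape(2, 3)
y = t(x)           # shape (6,)
x_back = t.inv(y)  # back to shape (2, 3)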

class torch.distributions.transforms.SigmoidTransform(cache_size=0)[source][source]

Transform via the mapping y = 1 / (1 + exp(-x)) and x = logit(y).

class torch.distributions.transforms.SoftplusTransform(cache_size=0)[source][source]

Transform via the mapping Softplus(x) = log(1 + exp(x)). The implementation reverts to the linear function when x > 20.

class torch.distributions.transforms.TanhTransform(cache_size=0)[source][source]

Transform via the mapping y = tanh(x).

It is equivalent to

ComposeTransform(
    [
        AffineTransform(0.0, 2.0),
        SigmoidTransform(),
        AffineTransform(-1.0, 2.0),
    ]
)

However, this might not be numerically stable; it is therefore recommended to use TanhTransform instead.

Note that one should use cache_size=1 when NaN/Inf values can occur (for example, when inverting values at or near the boundary ±1).
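A hedged usage sketch (not from the original documentation) of a tanh-squashed Normal, a common pattern for bounded actions; the shapes and parameters are arbitrary:

import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform

base = Normal(torch.zeros(2), torch.ones(2))
# cache_size=1 reuses the pre-tanh value when log_prob is evaluated on a value
# just produced by rsample, avoiding an unstable atanh near the boundaries.
squashed = TransformedDistribution(base, [TanhTransform(cache_size=1)])

action = squashed.rsample()        # values in (-1, 1)
log_p = squashed.log_prob(action)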

class torch.distributions.transforms.SoftmaxTransform(cache_size=0)[source][source]

Transform from unconstrained space to the simplex via y = exp(x) then normalizing.

This is not bijective and cannot be used for HMC. However this acts mostly coordinate-wise (except for the final normalization), and thus is appropriate for coordinate-wise optimization algorithms.

class torch.distributions.transforms.StackTransform(tseq, dim=0, cache_size=0)[source][source]

Transform functor that applies a sequence of transforms tseq component-wise to each submatrix at dim in a way compatible with torch.stack().

Example:

x = torch.stack([torch.arange(1.0, 11.0), torch.arange(1.0, 11.0)], dim=1)  # torch.range is deprecated
t = StackTransform([ExpTransform(), identity_transform], dim=1)
y = t(x)

class torch.distributions.transforms.StickBreakingTransform(cache_size=0)[source][source]

Transform from unconstrained space to the simplex of one additional dimension via a stick-breaking process.

This transform arises as an iterated sigmoid transform in a stick-breaking construction of the Dirichlet distribution: the first logit is transformed via sigmoid to the first probability and the probability of everything else, and then the process recurses.

This is bijective and appropriate for use in HMC; however it mixes coordinates together and is less appropriate for optimization.

class torch.distributions.transforms.Transform(cache_size=0)[source][source]

Abstract class for invertible transformations with computable log det jacobians. They are primarily used in torch.distributions.TransformedDistribution.

Caching is useful for transforms whose inverses are either expensive or numerically unstable. Note that care must be taken with memoized values since the autograd graph may be reversed. For example while the following works with or without caching:

y = t(x)
t.log_abs_det_jacobian(x, y).backward()  # x will receive gradients.

However the following will error when caching due to dependency reversal:

y = t(x)
z = t.inv(y)
grad(z.sum(), [y])  # error because z is x

Derived classes should implement one or both of _call() or _inverse(). Derived classes that set bijective=True should also implement log_abs_det_jacobian().
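A minimal, hypothetical sketch of such a subclass (the transform y = exp(x) + 1 is chosen only for illustration and is not part of torch.distributions):

import torch
from torch.distributions import constraints
from torch.distributions.transforms import Transform

class ExpPlusOneTransform(Transform):
    # Hypothetical transform y = exp(x) + 1, strictly increasing on the reals.
    domain = constraints.real
    codomain = constraints.greater_than(1.0)
    bijective = True
    sign = +1

    def _call(self, x):
        return x.exp() + 1.0

    def _inverse(self, y):
        return (y - 1.0).log()

    def log_abs_det_jacobian(self, x, y):
        # dy/dx = exp(x), so log|dy/dx| = x
        return x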

Parameters

cache_size (int) – Size of cache. If zero, no caching is done. If one, the latest single value is cached. Only 0 and 1 are supported.

Variables

property inv: Transform

Returns the inverse Transform of this transform. This should satisfy t.inv.inv is t.

property sign: int

Returns the sign of the determinant of the Jacobian, if applicable. In general this only makes sense for bijective transforms.

log_abs_det_jacobian(x, y)[source][source]

Computes the log det jacobian log |dy/dx| given input and output.

forward_shape(shape)[source][source]

Infers the shape of the forward computation, given the input shape. Defaults to preserving shape.

inverse_shape(shape)[source][source]

Infers the shapes of the inverse computation, given the output shape. Defaults to preserving shape.

Constraints

class torch.distributions.constraints.Constraint[source][source]

Abstract base class for constraints.

A constraint object represents a region over which a variable is valid, e.g. within which a variable can be optimized.

Variables

check(value)[source][source]

Returns a byte tensor of sample_shape + batch_shape indicating whether each event in value satisfies this constraint.
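For example (an illustrative sketch using built-in constraints), check() evaluates the constraint element-wise:

import torch
from torch.distributions import constraints

values = torch.tensor([-1.0, 0.0, 2.0])
print(constraints.positive.check(values))            # False, False, True
print(constraints.interval(0.0, 1.0).check(values))  # False, True, False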

torch.distributions.constraints.cat[source]

alias of _Cat

torch.distributions.constraints.dependent_property[source]

alias of _DependentProperty

torch.distributions.constraints.greater_than[source]

alias of _GreaterThan

torch.distributions.constraints.greater_than_eq[source]

alias of _GreaterThanEq

torch.distributions.constraints.independent[source]

alias of _IndependentConstraint

torch.distributions.constraints.integer_interval[source]

alias of _IntegerInterval

torch.distributions.constraints.interval[source]

alias of _Interval

torch.distributions.constraints.half_open_interval[source]

alias of _HalfOpenInterval

torch.distributions.constraints.is_dependent(constraint)[source][source]

Checks if constraint is a _Dependent object.

Parameters

constraint – A Constraint object.

Returns

True if constraint can be refined to the type _Dependent, False otherwise.

Return type

bool

Examples

import torch
from torch.distributions import Bernoulli
from torch.distributions.constraints import is_dependent

dist = Bernoulli(probs=torch.tensor([0.6], requires_grad=True))
constraint1 = dist.arg_constraints["probs"]
constraint2 = dist.arg_constraints["logits"]

for constraint in [constraint1, constraint2]:
    if is_dependent(constraint):
        continue

torch.distributions.constraints.less_than[source]

alias of _LessThan

torch.distributions.constraints.multinomial[source]

alias of _Multinomial

torch.distributions.constraints.stack[source]

alias of _Stack

Constraint Registry

PyTorch provides two global ConstraintRegistry objects that link Constraint objects to Transform objects. Both objects accept constraints as input and return transforms, but they differ in their guarantees on bijectivity.

  1. biject_to(constraint) looks up a bijective Transform from constraints.real to the given constraint. The returned transform is guaranteed to have .bijective = True and should implement .log_abs_det_jacobian().
  2. transform_to(constraint) looks up a not-necessarily bijective Transform from constraints.real to the given constraint. The returned transform is not guaranteed to implement .log_abs_det_jacobian().

The transform_to() registry is useful for performing unconstrained optimization on constrained parameters of probability distributions, which are indicated by each distribution’s .arg_constraints dict. These transforms often overparameterize a space in order to avoid rotation; they are thus more suitable for coordinate-wise optimization algorithms like Adam:

loc = torch.zeros(100, requires_grad=True)
unconstrained = torch.zeros(100, requires_grad=True)
scale = transform_to(Normal.arg_constraints["scale"])(unconstrained)
loss = -Normal(loc, scale).log_prob(data).sum()

The biject_to() registry is useful for Hamiltonian Monte Carlo, where samples from a probability distribution with constrained .support are propagated in an unconstrained space, and algorithms are typically rotation invariant:

dist = Exponential(rate)
unconstrained = torch.zeros(100, requires_grad=True)
sample = biject_to(dist.support)(unconstrained)
potential_energy = -dist.log_prob(sample).sum()

Note

An example where transform_to and biject_to differ is constraints.simplex: transform_to(constraints.simplex) returns a SoftmaxTransform that simply exponentiates and normalizes its inputs; this is a cheap and mostly coordinate-wise operation appropriate for algorithms like SVI. In contrast, biject_to(constraints.simplex) returns a StickBreakingTransform that bijects its input down to a one-fewer-dimensional space; this is a more expensive and less numerically stable transform, but it is needed for algorithms like HMC.
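The shape difference can be seen in a small sketch (illustrative, with an arbitrary input size):

import torch
from torch.distributions import biject_to, transform_to, constraints

x = torch.randn(5)

y1 = transform_to(constraints.simplex)(x)  # SoftmaxTransform: shape (5,), sums to 1
y2 = biject_to(constraints.simplex)(x)     # StickBreakingTransform: shape (6,), sums to 1
ldj = biject_to(constraints.simplex).log_abs_det_jacobian(x, y2)  # defined, since bijective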

The biject_to and transform_to objects can be extended by user-defined constraints and transforms using their .register() method either as a function on singleton constraints:

transform_to.register(my_constraint, my_transform)

or as a decorator on parameterized constraints:

@transform_to.register(MyConstraintClass)
def my_factory(constraint):
    assert isinstance(constraint, MyConstraintClass)
    return MyTransform(constraint.param1, constraint.param2)

You can create your own registry by creating a new ConstraintRegistry object.

class torch.distributions.constraint_registry.ConstraintRegistry[source][source]

Registry to link constraints to transforms.

register(constraint, factory=None)[source][source]

Registers a Constraint subclass in this registry. Usage:

@my_registry.register(MyConstraintClass)
def construct_transform(constraint):
    assert isinstance(constraint, MyConstraintClass)
    return MyTransform(constraint.arg_constraints)

Parameters