torch.func.vmap — PyTorch 2.7 documentation

torch.func.vmap(func, in_dims=0, out_dims=0, randomness='error', *, chunk_size=None)[source]

vmap is the vectorizing map; vmap(func) returns a new function that maps func over some dimension of the inputs. Semantically, vmap pushes the map into PyTorch operations called by func, effectively vectorizing those operations.
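For intuition, the semantics can be pictured as a stacked loop. The sketch below (the function f and the shapes are made up for illustration) shows that vmap(f)(xs) matches an explicit per-example loop whose results are stacked back together:

import torch

def f(x):
    return x.sin() + x

xs = torch.randn(4, 3)

# vmap maps f over dim 0 of xs ...
out_vmap = torch.vmap(f)(xs)
# ... which matches looping over the examples and stacking the results
out_loop = torch.stack([f(x) for x in xs.unbind(0)])
assert torch.allclose(out_vmap, out_loop)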

vmap is useful for handling batch dimensions: one can write a function func that runs on examples and then lift it to a function that can take batches of examples with vmap(func). vmap can also be used to compute batched gradients when composed with autograd.
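As a sketch of the batched-gradient use case, vmap() can be composed with torch.func.grad to compute per-sample gradients; the loss function and shapes below are illustrative, not part of the API:

import torch
from torch.func import grad, vmap

def loss(weights, example):
    # scalar loss for a single example
    return (example @ weights).sin().sum()

weights = torch.randn(5)
examples = torch.randn(8, 5)

# grad(loss) differentiates the loss of one example w.r.t. weights;
# vmap maps it over the batch of examples (weights are not batched).
per_sample_grads = vmap(grad(loss), in_dims=(None, 0))(weights, examples)
print(per_sample_grads.shape)  # torch.Size([8, 5])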

Note

torch.vmap() is aliased to torch.func.vmap() for convenience. Use whichever one you’d like.

Parameters

func (function) – A Python function that takes one or more arguments. Must return one or more Tensors.
in_dims (int or nested structure) – Specifies which dimension of the inputs should be mapped over. in_dims should have a structure like the inputs. If the in_dim for a particular input is None, then that indicates the input is not batched. Default: 0.
out_dims (int or Tuple[int]) – Specifies where the mapped dimension should appear in the outputs. If out_dims is a Tuple, then it should have one element per output. Default: 0.
randomness (str) – Specifies whether the randomness in this vmap should be the same or different across batches. If 'different', the randomness for each batch will be different. If 'same', the randomness will be the same across batches. If 'error', any calls to random functions will error. Default: 'error'.
chunk_size (None or int) – If None (default), apply a single vmap over the inputs. If not None, compute the vmap chunk_size samples at a time; chunk_size=1 is equivalent to computing the vmap with a for-loop. A non-None chunk_size can reduce peak memory usage.

Returns

Returns a new “batched” function. It takes the same inputs as func, except each input has an extra dimension at the index specified by in_dims. It returns the same outputs as func, except each output has an extra dimension at the index specified by out_dims.

Return type

Callable
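The randomness and chunk_size arguments are easy to overlook; here is a hedged sketch of both, assuming a made-up noisy_scale function and the behaviors described in the parameter list above:

import torch

def noisy_scale(x):
    # calls a random op inside the mapped function
    return x * torch.rand(())

xs = torch.randn(4, 3)

# With the default randomness='error', vmap raises on the torch.rand call.
# 'different' draws fresh noise for each batch element:
out = torch.func.vmap(noisy_scale, randomness='different')(xs)

# chunk_size trades speed for peak memory: here the batch of 4 is
# processed 2 samples at a time.
out_chunked = torch.func.vmap(noisy_scale, randomness='different', chunk_size=2)(xs)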

One example of using vmap() is to compute batched dot products. PyTorch doesn’t provide a batched torch.dot API; instead of unsuccessfully rummaging through docs, use vmap() to construct a new function.

torch.dot                                 # [D], [D] -> []
batched_dot = torch.func.vmap(torch.dot)  # [N, D], [N, D] -> [N]
x, y = torch.randn(2, 5), torch.randn(2, 5)
batched_dot(x, y)

vmap() can be helpful in hiding batch dimensions, leading to a simpler model authoring experience.

batch_size, feature_size = 3, 5
weights = torch.randn(feature_size, requires_grad=True)

def model(feature_vec):
    # Very simple linear model with activation
    return feature_vec.dot(weights).relu()

examples = torch.randn(batch_size, feature_size)
result = torch.vmap(model)(examples)

vmap() can also help vectorize computations that were previously difficult or impossible to batch. One example is higher-order gradient computation. The PyTorch autograd engine computes vjps (vector-Jacobian products). Computing a full Jacobian matrix for some function f: R^N -> R^N usually requires N calls to autograd.grad, one per Jacobian row. Using vmap(), we can vectorize the whole computation, computing the Jacobian in a single call to autograd.grad.

Setup

N = 5
f = lambda x: x ** 2
x = torch.randn(N, requires_grad=True)
y = f(x)
I_N = torch.eye(N)

Sequential approach

jacobian_rows = [torch.autograd.grad(y, x, v, retain_graph=True)[0]
                 for v in I_N.unbind()]
jacobian = torch.stack(jacobian_rows)

Vectorized gradient computation

def get_vjp(v):
    return torch.autograd.grad(y, x, v)

jacobian = torch.vmap(get_vjp)(I_N)
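As an optional sanity check (a sketch, assuming autograd.grad returns a one-element tuple here, so the vmapped result is a tuple containing an [N, N] tensor): for f(x) = x ** 2 the Jacobian is diagonal with entries 2 * x, and both approaches should agree.

# jacobian from vmap(get_vjp) is a tuple holding one [N, N] tensor
vectorized = jacobian[0]
sequential = torch.stack(jacobian_rows)
assert torch.allclose(vectorized, sequential)
assert torch.allclose(vectorized, torch.diag(2 * x.detach()))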

vmap() can also be nested, producing an output with multiple batched dimensions:

torch.dot                                        # [D], [D] -> []
batched_dot = torch.vmap(torch.vmap(torch.dot))  # [N1, N0, D], [N1, N0, D] -> [N1, N0]
x, y = torch.randn(2, 3, 5), torch.randn(2, 3, 5)
batched_dot(x, y)  # tensor of size [2, 3]

If the inputs are not batched along the first dimension, in_dims specifies the dimension along which each input is batched:

torch.dot                                       # [N], [N] -> []
batched_dot = torch.vmap(torch.dot, in_dims=1)  # [N, D], [N, D] -> [D]
x, y = torch.randn(2, 5), torch.randn(2, 5)
batched_dot(x, y)  # output is [5] instead of [2] if batched along the 0th dimension

If there are multiple inputs, each batched along a different dimension, in_dims must be a tuple with the batch dimension for each input:

torch.dot                                               # [D], [D] -> []
batched_dot = torch.vmap(torch.dot, in_dims=(0, None))  # [N, D], [D] -> [N]
x, y = torch.randn(2, 5), torch.randn(5)
batched_dot(x, y)  # second arg doesn't have a batch dim because in_dims[1] was None

If the input is a Python struct, in_dims must be a tuple containing a struct matching the shape of the input:

f = lambda dict: torch.dot(dict['x'], dict['y'])
x, y = torch.randn(2, 5), torch.randn(5)
input = {'x': x, 'y': y}
batched_dot = torch.vmap(f, in_dims=({'x': 0, 'y': None},))
batched_dot(input)

By default, the output is batched along the first dimension. However, it can be batched along any dimension by using out_dims:

f = lambda x: x ** 2
x = torch.randn(2, 5)
batched_pow = torch.vmap(f, out_dims=1)
batched_pow(x)  # [5, 2]

For any function that uses kwargs, the returned function will not batch the kwargs but will still accept them:

x = torch.randn([2, 5])

def fn(x, scale=4.):
    return x * scale

batched_pow = torch.vmap(fn)
assert torch.allclose(batched_pow(x), x * 4)
batched_pow(x, scale=x)  # scale is not batched, output has shape [2, 2, 5]

Note

vmap does not provide general autobatching or handle variable-length sequences out of the box.
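For example (a hedged sketch of a common workaround, not a built-in feature): ragged sequences have to be padded to a common length before they can be stacked and mapped over, and handling the padded positions is left to the caller.

import torch

seqs = [torch.randn(3), torch.randn(5), torch.randn(2)]  # variable lengths

# torch.stack(seqs) would fail, so pad to a common length first
padded = torch.nn.utils.rnn.pad_sequence(seqs, batch_first=True)  # shape [3, 5]

summed = torch.vmap(torch.sum)(padded)  # padding zeros contribute nothing to a sum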