torch_frame.data.MultiNestedTensor — pytorch-frame documentation

class MultiNestedTensor(num_rows: int, num_cols: int, values: Tensor, offset: Tensor)

Bases: _MultiTensor

A read-only PyTorch tensor-based data structure that stores data of shape [num_rows, num_cols, *], where the size of the last dimension can differ for each (row, column) cell. Internally, the object is stored in an efficient flattened format (values, offset), where the PyTorch Tensor at (i, j) is accessed by values[offset[i*num_cols+j]:offset[i*num_cols+j+1]]. It supports advanced indexing, including slicing and list indexing along both rows and columns.

Parameters:

num_rows (int) – Number of rows.
num_cols (int) – Number of columns.
values (torch.Tensor) – The values torch.Tensor of size [numel,].
offset (torch.Tensor) – The offset torch.Tensor of size [num_rows*num_cols+1,].
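The flattened layout described above can be illustrated with a minimal sketch. It is not the library's implementation: plain Python lists stand in for torch.Tensor so the snippet runs without dependencies, and get_cell is a hypothetical helper name. The data is a 3x2 layout with variable-length cells.

```python
from typing import List

# Assumption: plain lists stand in for the `values` and `offset` tensors.
num_rows, num_cols = 3, 2
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # all cells, concatenated row-major
offset = [0, 2, 3, 4, 7, 9, 10]            # length num_rows * num_cols + 1


def get_cell(i: int, j: int) -> List[int]:
    """Hypothetical helper: return the entry stored at row i, column j."""
    k = i * num_cols + j                    # row-major position of the cell
    return values[offset[k]:offset[k + 1]]  # slice bounded by adjacent offsets


cell_00 = get_cell(0, 0)
cell_11 = get_cell(1, 1)
print(cell_00, cell_11)
```

Because offset has one more entry than there are cells, the length of cell k is always offset[k + 1] - offset[k], and no per-cell length needs to be stored separately.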

Example

>>> import torch
>>> from torch_frame.data import MultiNestedTensor
>>> tensor_mat = [
...     [torch.tensor([1, 2]), torch.tensor([3])],
...     [torch.tensor([4]), torch.tensor([5, 6, 7])],
...     [torch.tensor([8, 9]), torch.tensor([10])],
... ]
>>> mnt = MultiNestedTensor.from_tensor_mat(tensor_mat)
>>> mnt
MultiNestedTensor(num_rows=3, num_cols=2, device='cpu')
>>> mnt.values
tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> mnt.offset
tensor([ 0,  2,  3,  4,  7,  9, 10])
>>> mnt[0, 0]
tensor([1, 2])
>>> mnt[1, 1]
tensor([5, 6, 7])
>>> mnt[0]  # Row integer indexing
MultiNestedTensor(num_rows=1, num_cols=2, device='cpu')
>>> mnt[:, 0]  # Column integer indexing
MultiNestedTensor(num_rows=3, num_cols=1, device='cpu')
>>> mnt[:2]  # Row integer slicing
MultiNestedTensor(num_rows=2, num_cols=2, device='cpu')
>>> mnt[[2, 1, 2, 0]]  # Row list indexing
MultiNestedTensor(num_rows=4, num_cols=2, device='cpu')
>>> mnt.to_dense(fill_value=-1)  # Map to a dense matrix with padding
tensor([[[ 1,  2, -1],
         [ 3, -1, -1]],
        [[ 4, -1, -1],
         [ 5,  6,  7]],
        [[ 8,  9, -1],
         [10, -1, -1]]])

classmethod from_tensor_mat(tensor_mat: list[list[torch.Tensor]]) → MultiNestedTensor

Construct a MultiNestedTensor object from tensor_mat.

Parameters:

tensor_mat (List[List[Tensor]]) – A matrix of torch.Tensor objects. tensor_mat[i][j] contains the 1-dim torch.Tensor at the i-th row and j-th column; the tensors may vary in size.

Returns:

A MultiNestedTensor instance.

Return type:

MultiNestedTensor
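As a rough illustration of how a matrix of variable-length entries maps onto the (values, offset) pair, the sketch below flattens nested lists in row-major order. It is an assumption-laden stand-in, not the library's code: flatten_mat is a hypothetical helper, and plain lists replace 1-dim tensors.

```python
from itertools import accumulate
from typing import List, Tuple


def flatten_mat(mat: List[List[List[int]]]) -> Tuple[int, int, List[int], List[int]]:
    """Hypothetical helper: flatten a rows-by-columns matrix of 1-dim lists."""
    num_rows, num_cols = len(mat), len(mat[0])
    cells = [cell for row in mat for cell in row]    # row-major cell order
    values = [v for cell in cells for v in cell]     # concatenate all cells
    # Running total of cell lengths, with a leading 0.
    offset = [0] + list(accumulate(len(cell) for cell in cells))
    return num_rows, num_cols, values, offset


mat = [[[1, 2], [3]], [[4], [5, 6, 7]], [[8, 9], [10]]]
num_rows, num_cols, values, offset = flatten_mat(mat)
print(values)
print(offset)
```

The resulting offset list has num_rows * num_cols + 1 entries, matching the size stated for the offset parameter.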

fillna_col(col_index: int, fill_value: int | float | Tensor) → None

Fill the col_index-th column in MultiTensor with fill_value in-place.

Parameters:

col_index (int) – A column index of the tensor to select.
fill_value (Union[int, float, Tensor]) – Scalar value used to replace NaNs.
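One way to picture an in-place column fill on the flattened layout is sketched below. This is only an illustration under stated assumptions: fillna_col_sketch is a hypothetical helper, plain lists stand in for tensors, and only floating-point NaN is treated as missing.

```python
import math


def fillna_col_sketch(values, offset, num_rows, num_cols, col_index, fill_value):
    """Hypothetical helper: replace NaNs in one column of the flat layout, in place."""
    for i in range(num_rows):
        k = i * num_cols + col_index              # cell (i, col_index)
        for p in range(offset[k], offset[k + 1]):
            if math.isnan(values[p]):
                values[p] = fill_value            # mutate the shared values list


nan = float("nan")
# 2 rows x 2 columns; cells in row-major order: [1.0, nan], [nan], [nan], [4.0, 5.0]
values = [1.0, nan, nan, nan, 4.0, 5.0]
offset = [0, 2, 3, 4, 6]
fillna_col_sketch(values, offset, num_rows=2, num_cols=2, col_index=0, fill_value=0.0)
print(values)
```

Only the cells of column 0 are touched; the NaN stored in column 1 of the first row is left unchanged.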

to_dense(fill_value: int | float) → Tensor

Map the MultiNestedTensor into a dense Tensor representation with padding.

Parameters:

fill_value (Union[int, float]) – Fill value used for padding.

Returns:

Padded PyTorch Tensor object with shape (num_rows, num_cols, max_length).

Return type:

Tensor
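The padding step can be sketched on the flattened layout as follows. This is a hedged illustration rather than the library's implementation: to_dense_sketch is a hypothetical helper, nested lists stand in for the returned dense tensor, and max_length is assumed to be the longest cell length.

```python
def to_dense_sketch(values, offset, num_rows, num_cols, fill_value):
    """Hypothetical helper: pad every cell to the longest cell length."""
    num_cells = num_rows * num_cols
    lengths = [offset[k + 1] - offset[k] for k in range(num_cells)]
    max_length = max(lengths, default=0)
    dense = []
    for i in range(num_rows):
        row = []
        for j in range(num_cols):
            k = i * num_cols + j
            cell = values[offset[k]:offset[k + 1]]
            # Right-pad the cell with fill_value up to max_length.
            row.append(cell + [fill_value] * (max_length - len(cell)))
        dense.append(row)
    return dense


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
offset = [0, 2, 3, 4, 7, 9, 10]
dense = to_dense_sketch(values, offset, num_rows=3, num_cols=2, fill_value=-1)
print(dense)
```

The nested result has the shape (num_rows, num_cols, max_length), here (3, 2, 3).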

static cat(xs: Sequence[MultiNestedTensor], dim: int = 0) → MultiNestedTensor

Concatenates a sequence of MultiNestedTensor objects along the specified dimension.

Parameters:

xs (Sequence[MultiNestedTensor]) – A sequence of MultiNestedTensor to be concatenated.
dim (int) – The dimension to concatenate along.

Returns:

Concatenated multi nested tensor.

Return type:

MultiNestedTensor
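For intuition, row-wise concatenation (dim=0) on the flattened layout amounts to appending the values and shifting each later offset by the number of values already stored. The sketch below assumes that every input has the same number of columns and that each offset list starts at 0. cat_rows_sketch and the (num_rows, num_cols, values, offset) tuples are hypothetical stand-ins, not the library's API.

```python
def cat_rows_sketch(xs):
    """Hypothetical helper: concatenate flat layouts along rows (dim=0).

    Each element of xs is a (num_rows, num_cols, values, offset) tuple of
    plain Python objects standing in for a MultiNestedTensor.
    """
    num_cols = xs[0][1]
    if any(x[1] != num_cols for x in xs):
        raise ValueError("all inputs must have the same number of columns")
    num_rows, values, offset = 0, [], [0]
    for n, _, v, o in xs:
        shift = offset[-1]                        # values stored so far
        offset.extend(shift + p for p in o[1:])   # skip the leading 0
        values.extend(v)
        num_rows += n
    return num_rows, num_cols, values, offset


a = (1, 2, [1, 2, 3], [0, 2, 3])
b = (2, 2, [4, 5, 6, 7, 8, 9, 10], [0, 1, 4, 6, 7])
num_rows, num_cols, values, offset = cat_rows_sketch([a, b])
print(num_rows, num_cols, values, offset)
```

Concatenating along columns would be less direct in this row-major layout, since the cells of the inputs would have to be interleaved row by row rather than simply appended.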