# 0-dimensional tensors result in aliasing errors
## 🐛 Bug

When a zero-dimensional tensor is created, its XLA device_data comes out of a cache and is supposed to be read-only. However, the read-only bit is dropped across mark_step(), after which buffer aliasing (donation) can corrupt the cached value.
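To make the suspected mechanism concrete, here is a pure-Python model of the cache interaction. The class and function names below are invented for illustration and are not torch_xla internals; the point is only that once the read-only flag is lost, an in-place write lands in the shared cached storage:

```python
class DeviceData:
    """Models a device buffer with a read-only (non-donatable) flag."""
    def __init__(self, value, read_only):
        self.value = value
        self.read_only = read_only


class DataCache:
    """Models the per-device cache of scalar constants."""
    def __init__(self):
        self._cache = {}

    def get(self, scalar):
        if scalar not in self._cache:
            # Cached constants are created read-only so they are never donated.
            self._cache[scalar] = DeviceData(scalar, read_only=True)
        return self._cache[scalar]


def mark_step(data):
    # Models the bug: the read-only bit is dropped on the cached buffer.
    data.read_only = False


def add_(data, n):
    if data.read_only:
        # Correct path: copy instead of writing into the protected buffer.
        return DeviceData(data.value + n, read_only=False)
    # Aliased write: corrupts the storage shared with the cache.
    data.value += n
    return data


cache = DataCache()
t0 = cache.get(42)     # cached, read-only
mark_step(t0)          # bug: clears the read-only bit
t0 = add_(t0, 1)       # writes through: cached 42 becomes 43
t1 = cache.get(42)     # hands back the corrupted buffer
print(t1.value)        # 43, not 42
```

Without the `mark_step` call the read-only path copies and the cache entry stays intact, which matches the expected behavior of the real repro below.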
## To Reproduce

Small test case:
```python
import torch
import torch_xla.core.xla_model as xm

def main():
    """
    Test that device data in DataCache are not aliased.
    """
    xla_device = xm.xla_device()
    t0 = torch.tensor(42, device=xla_device)
    # drops the read-only bit on t0's device_data
    xm.mark_step()
    # cached value of 42 is corrupted
    t0.add_(1)
    xm.mark_step()
    # t1 gets the cached device_data, which is corrupted
    t1 = torch.tensor(42, device=xla_device)
    xm.mark_step()
    t1.add_(1)
    # XLA crashes here because the parameter is a donated buffer...
    xm.mark_step()
    # If it doesn't crash, the value here would be 44.
    assert t1.item() == 43

if __name__ == '__main__':
    main()
```
## Expected behavior

The cached device_data for the scalar 42 stays read-only, so `t1` starts at 42 and the final assertion `t1.item() == 43` passes without a crash.
## Environment

- Reproducible on XLA backend [CPU/TPU/CUDA]: neuron trn1
- torch_xla version: 2.5.1