# 0-dimensional tensors result in aliasing errors
## 🐛 Bug

When a zero-dimensional tensor is created, its XLA device_data comes out of a cache and is supposed to be read-only. However, the read-only bit is dropped across mark_step(), after which buffer aliasing (donation) can corrupt the cached value.
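To make the suspected mechanism concrete, here is a pure-Python model of the cache interaction. The class and function names below are invented for illustration and are not torch_xla internals; the point is only that once the read-only flag is lost, an in-place write lands in the shared cached storage:

```python
class DeviceData:
    """Models a device buffer with a read-only (non-donatable) flag."""
    def __init__(self, value, read_only):
        self.value = value
        self.read_only = read_only


class DataCache:
    """Models the per-device cache of scalar constants."""
    def __init__(self):
        self._cache = {}

    def get(self, scalar):
        if scalar not in self._cache:
            # Cached constants are created read-only so they are never donated.
            self._cache[scalar] = DeviceData(scalar, read_only=True)
        return self._cache[scalar]


def mark_step(data):
    # Models the bug: the read-only bit is dropped on the cached buffer.
    data.read_only = False


def add_(data, n):
    if data.read_only:
        # Correct path: copy instead of writing into the protected buffer.
        return DeviceData(data.value + n, read_only=False)
    # Aliased write: corrupts the storage shared with the cache.
    data.value += n
    return data


cache = DataCache()
t0 = cache.get(42)     # cached, read-only
mark_step(t0)          # bug: clears the read-only bit
t0 = add_(t0, 1)       # writes through: cached 42 becomes 43
t1 = cache.get(42)     # hands back the corrupted buffer
print(t1.value)        # 43, not 42
```

Without the `mark_step` call the read-only path copies and the cache entry stays intact, which matches the expected behavior of the real repro below.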
## To Reproduce

Small test case:
```python
import torch
import torch_xla.core.xla_model as xm

def main():
    """
    Test that device data in DataCache are not aliased.
    """
    xla_device = xm.xla_device()
    t0 = torch.tensor(42, device=xla_device)
    # drops the read-only bit on t0's device_data
    xm.mark_step()
    # cached value of 42 is corrupted
    t0.add_(1)
    xm.mark_step()
    # t1 gets the cached device_data, which is corrupted
    t1 = torch.tensor(42, device=xla_device)
    xm.mark_step()
    t1.add_(1)
    # XLA crashes here because the parameter is a donated buffer...
    xm.mark_step()
    # If it doesn't crash, the value here would be 44.
    assert t1.item() == 43

if __name__ == '__main__':
    main()
```
## Expected behavior

The cached device_data for the scalar 42 stays read-only, so `t1` starts at 42 and the final assertion `t1.item() == 43` passes without a crash.
## Environment

- Reproducible on XLA backend [CPU/TPU/CUDA]: neuron trn1
- torch_xla version: 2.5.1