fix: Automatically send truncated long ints to cuda at shape analysis time by gs-olive · Pull Request #1541 · pytorch/TensorRT
Description
- Augment `aten::to` operator insertion at shape analysis time to insert the target device
- Add functionality to the `PartitioningInfo` struct to store device information and produce a cuda device string
- Make the cuda device string function in `LowerInfo` and `PartitioningInfo` `const` to avoid altering struct fields (a minimal sketch follows this list)
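For illustration, here is a minimal sketch of what a `const` device-string helper on `PartitioningInfo` might look like; the field and method names here are assumptions for the example, not necessarily the actual Torch-TensorRT API:

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for the PartitioningInfo struct
struct PartitioningInfo {
  int64_t gpu_id = 0;  // assumed field storing the target device ordinal

  // Declared const so producing the device string cannot alter struct fields
  std::string getGPUDeviceString() const {
    return "cuda:" + std::to_string(gpu_id);
  }
};
```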
Uses the schema:
`Tensor.to(device : Device, dtype : int, non_blocking : bool=False, copy : bool=False, memory_format : Optional[int]) -> Tensor`
instead of:
`Tensor.to(dtype : int, non_blocking : bool=False, copy : bool=False, memory_format : Optional[int]) -> Tensor`
This switch was made to ensure that the device for truncated objects is the GPU, regardless of their origin, which avoids adding another lowering pass for this case. Since an `aten::to` operation is already being inserted, that insertion is the natural opportunity to set the correct tensor device (GPU), as in the sketch below.
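The difference between the two schemas can be seen at the tensor level with libtorch. This sketch assumes a CUDA device is available and is illustrative only, not the PR's actual graph rewrite:

```cpp
#include <iostream>
#include <torch/torch.h>

int main() {
  // An int64 ("long") tensor that originates on the CPU
  auto x = torch::randint(0, 10, {4}, torch::kLong);

  // dtype-only schema: truncates long -> int but leaves the tensor on CPU
  auto y_cpu = x.to(torch::kInt);

  // device + dtype schema (the one this PR switches to):
  // truncates and moves the tensor to the GPU in a single aten::to call
  auto y_gpu = x.to(torch::Device("cuda:0"), torch::kInt);

  std::cout << y_cpu.device() << " " << y_gpu.device() << std::endl;
  return 0;
}
```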
Type of change
- Bug fix (non-breaking change which fixes an issue)
Checklist:
- [x] My code follows the style guidelines of this project (You can use the linters)
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas and hacks
- [x] I have made corresponding changes to the documentation
- [x] I have added tests to verify my fix or my feature
  - Tested on a few models, sample scripts, and the existing fallback test suite
- [x] New and existing unit tests pass locally with my changes
- [x] I have added the relevant labels to my PR so that the relevant reviewers are notified