fix: Automatically send truncated long ints to cuda at shape analysis time by gs-olive · Pull Request #1541 · pytorch/TensorRT
Description
- Augment `aten::to` operator insertion at shape analysis time to insert the target device
- Add functionality to the `PartitioningInfo` struct to store device information and produce a cuda device string
- Make the cuda device string function in `LowerInfo` and `PartitioningInfo` `const` to avoid altering struct fields (a minimal sketch follows this list)
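For illustration, here is a minimal sketch of what a `const` device-string helper on `PartitioningInfo` might look like; the field and method names here are assumptions for the example, not necessarily the actual Torch-TensorRT API:

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for the PartitioningInfo struct
struct PartitioningInfo {
  int64_t gpu_id = 0;  // assumed field storing the target device ordinal

  // Declared const so producing the device string cannot alter struct fields
  std::string getGPUDeviceString() const {
    return "cuda:" + std::to_string(gpu_id);
  }
};
```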
Uses the schema:
`Tensor.to(device : Device, dtype : int, non_blocking : bool=False, copy : bool=False, memory_format : Optional[int]) -> Tensor`
instead of:
`Tensor.to(dtype : int, non_blocking : bool=False, copy : bool=False, memory_format : Optional[int]) -> Tensor`
This switch was made to ensure that the device for truncated objects is the GPU, regardless of their origin, which avoids adding another lowering pass for this case. Since an `aten::to` operation is already being inserted, that insertion is the natural opportunity to set the correct tensor device (GPU), as in the sketch below.
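The difference between the two schemas can be seen at the tensor level with libtorch. This sketch assumes a CUDA device is available and is illustrative only, not the PR's actual graph rewrite:

```cpp
#include <iostream>
#include <torch/torch.h>

int main() {
  // An int64 ("long") tensor that originates on the CPU
  auto x = torch::randint(0, 10, {4}, torch::kLong);

  // dtype-only schema: truncates long -> int but leaves the tensor on CPU
  auto y_cpu = x.to(torch::kInt);

  // device + dtype schema (the one this PR switches to):
  // truncates and moves the tensor to the GPU in a single aten::to call
  auto y_gpu = x.to(torch::Device("cuda:0"), torch::kInt);

  std::cout << y_cpu.device() << " " << y_gpu.device() << std::endl;
  return 0;
}
```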
Type of change
- Bug fix (non-breaking change which fixes an issue)
Checklist:
- [x] My code follows the style guidelines of this project (You can use the linters)
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas and hacks
- [x] I have made corresponding changes to the documentation
- [x] I have added tests to verify my fix or my feature
  - Tested on a few models, sample scripts, and the existing fallback test suite
- [x] New and existing unit tests pass locally with my changes
- [x] I have added the relevant labels to my PR so that the relevant reviewers are notified