fix: Issue in non-Tensor Input Resolution by gs-olive · Pull Request #1617 · pytorch/TensorRT (original) (raw)
Conversation
Description
- In certain TensorRT-executed blocks, multiple non-Tensor inputs originate from the same parent Tensor in another Torch block, so a single pass of `resolveTRTNonTensorInputs` does not resolve all non-Tensor inputs, causing failures at compile time
- The issue was traced to the use of `prim::Loop` in the TensorRT block and is fixed in PR fix: fix the prim::Loop fallback issue #1691
- Add a function to `SegmentedBlock` to get the ID of a block, so re-inserted blocks retain the same ID as the block being replaced
- Add a regression test case to elicit the behavior
Failure Case
Before Non-Tensor Input Resolution
```
DEBUG: [Torch-TensorRT - Debug Build] - Finalizing in progress TensorRT block
DEBUG: [Torch-TensorRT - Debug Build] - Segment Block @2: Target: TensorRT
Graph: graph(%1 : int,
      %9 : int[],
      %12 : Tensor):
```
After 1 Round of Non-Tensor Input Resolution
```
GRAPH: [Torch-TensorRT - Debug Build] - Running shape analysis on block Segment Block @2: Target: TensorRT
Graph: graph(%1 : int[],
      %3 : Tensor):
```
After 2 Rounds of Non-Tensor Input Resolution
```
INFO: [Torch-TensorRT - Debug Build] - Segment Block @2: Target: TensorRT
Graph: graph(%1 : Tensor):
```
Type of change
- Bug fix (non-breaking change which fixes an issue)
Checklist:
- [x] My code follows the style guidelines of this project (You can use the linters)
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas and hacks
- [x] I have made corresponding changes to the documentation
- [x] I have added tests to verify my fix or my feature
- [x] New and existing unit tests pass locally with my changes
- [x] I have added the relevant labels to my PR so that relevant reviewers are notified
Code conforms to C++ style guidelines
Code conforms to Python style guidelines
Expected this PR to make it into the 23.03 release
Actually, this does not make sense.
First, even if two non-Tensor inputs come from the same node, that node will be added to the segmented block; there is no need to run multiple passes.
Second, I suspect a single pass fails because there is a Loop block in the TensorRT segment, which could introduce undefined behavior. The real issue here is the Loop block; it shouldn't be in the TensorRT segment at all. In our last refactoring of the partitioning code, we aggressively set all nodes to run in TensorRT, so LOOP nodes default to running in TensorRT, which can cause issues like this one.
- In certain TensorRT-executed blocks, multiple non-Tensor inputs originate from the same parent Tensor in another Torch block, so a single pass of `resolveTRTNonTensorInputs` does not resolve all non-Tensor inputs, causing failures at compile time
- Add a do-while loop to ensure all input dependencies are resolved by running the dependency-resolution algorithm multiple times, if necessary
- Add a function to `SegmentedBlock` to get the ID of a block, so re-inserted blocks retain the same ID as the block being replaced
- Added a test case in partitioning to elicit the bug arising from multiple non-Tensor inputs to a TensorRT block
gs-olive changed the base branch from main to fix_loop_fallback
Comment on lines +289 to +290
```cpp
cur_partitioned_block[i] =
    SegmentedBlock(cur_partitioned_block[i].get_id(), SegmentedBlock::kTensorRT, dependency_nodes);
```
Added to ensure the block IDs of segmented blocks stay in order despite resolution of non-Tensor inputs.
LGTM, please squash your commits then merge.