Onnx2trt --fp16 error

April 24, 2025, 11:46am

Description

A large multi-task model is being converted from ONNX to a TensorRT engine on an A100 with TensorRT 10.7. FP32 conversion works fine, but converting to FP16 fails with:

```
Detected 17 inputs and 36 output network tensors.
Internal Error: MyelinCheckException: bb.cpp:44: CHECK(append_to->parent() == this) failed.
[04/23/2025-05:49:46][E] Error[1]: IBuilder::buildSerializedNetwork: Error Code 1: Myelin ([myelin_graph.h:attachExceptionMsgToGraph:840] MyelinCheckException: bb.cpp:44: CHECK(append_to->parent() == this) failed. )
```

The ONNX model has 13,929 nodes and is 560 MB. After splitting it into two parts (183 MB and 370 MB), both parts convert to FP16 engines successfully in the same A100 / TensorRT 10.7 environment.
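
For reference, here is a minimal Python sketch of the failing build step, assuming the standard TensorRT Python API; the model path and output name are placeholders, since the original conversion script is not shown:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)  # verbose logs help localize Myelin failures
builder = trt.Builder(logger)
network = builder.create_network(0)      # TensorRT 10: networks are always explicit-batch
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:      # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)    # FP32 builds succeed; this flag triggers the error

engine = builder.build_serialized_network(network, config)
if engine is None:
    raise SystemExit("engine build failed")
with open("model_fp16.engine", "wb") as f:
    f.write(engine)
```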

Environment

TensorRT Version: 10.7
GPU Type: A100
Nvidia Driver Version:
CUDA Version: 12.4
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.4.1+cu124
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce


The error you encountered during the ONNX to TensorRT conversion process occurs specifically when attempting to convert a multi-task model to FP16, while FP32 conversion works without issues. The model in question has 13,929 nodes and is 560 MB in size. Interestingly, it successfully converts to FP16 when split into two smaller models of 183 MB and 370 MB each.

Here are some recommended solutions to address this issue:

  1. Memory Limitations: The error may be due to insufficient GPU memory while building the FP16 engine for the full 560 MB model; splitting the model reduces the memory footprint, making the build feasible. Ensure the builder has an adequate workspace budget (see the first sketch after this list).
  2. Model Optimization: The large graph may be complicating FP16 tactic selection. It is advisable to simplify the ONNX model before conversion, e.g. by constant folding and removing redundant nodes with a tool such as onnx-simplifier (see the second sketch after this list).
  3. TensorRT Version: The issue might stem from a bug or limitation in TensorRT 10.7's Myelin backend. Check the release notes and consider upgrading to the latest TensorRT version, which may already fix this FP16 build error.
  4. Model Partitioning: As you discovered, splitting the model into smaller segments allows the FP16 conversion to succeed. Dividing the model into sub-models and chaining the resulting engines at inference time is a practical workaround (see the third sketch after this list).
  5. Node Analysis: It may also be worth identifying the specific nodes that trigger the error. Certain operations might not be fully compatible with FP16 in TensorRT; pinning those layers to FP32 while keeping the rest in FP16 can yield a targeted fix (see the final sketch after this list).
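
For point 1, a sketch of raising the builder's workspace budget, continuing from the build script above (the 8 GiB figure is an arbitrary example, not a known fix):

```python
# Allow the tactic workspace pool up to 8 GiB (example value; tune to your GPU).
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 8 << 30)
```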
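For point 2, a sketch using onnx-simplifier (the onnxsim package); this assumes the simplifier can process a 560 MB graph in available host memory:

```python
import onnx
from onnxsim import simplify

model = onnx.load("model.onnx")   # placeholder path
simplified, ok = simplify(model)  # constant folding + dead-node removal
if not ok:
    raise SystemExit("simplified model failed validation")
onnx.save(simplified, "model_simplified.onnx")
```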
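For point 4, ONNX ships a helper for extracting a sub-model between named tensors. All tensor names below are hypothetical; inspect your graph (e.g. with Netron) to choose real split points:

```python
import onnx.utils

# First half: graph input up to a hypothetical intermediate tensor.
onnx.utils.extract_model(
    "model.onnx", "part1.onnx",
    input_names=["input_image"],         # hypothetical graph input
    output_names=["backbone_features"],  # hypothetical split tensor
)
# Second half: from the split tensor to the task heads.
onnx.utils.extract_model(
    "model.onnx", "part2.onnx",
    input_names=["backbone_features"],
    output_names=["task1_logits", "task2_logits"],  # hypothetical outputs
)
```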
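For point 5, once suspect layers are found (for example by bisecting the graph with sub-model builds), they can be pinned to FP32 while the rest of the network stays in FP16. The layer names here are placeholders, and this reuses the network and config objects from the build script above:

```python
# Force suspect layers to FP32; everything else may still run in FP16.
suspect_layers = {"Softmax_123", "LayerNorm_456"}  # placeholder names

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name in suspect_layers:
        layer.precision = trt.DataType.FLOAT
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.DataType.FLOAT)

# Make the builder honor the per-layer precision requests.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
```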