TensorRT engine fails to deserialize, despite being built with the same version on the same machine

Description

I converted my YOLO11 model from PyTorch format to TensorRT engine format using the Ultralytics library. When I tried to use one of the resulting engines in my DeepStream pipeline, deserialization failed with a “magic tag” error. Here is my conversion script:

from ultralytics import YOLO
import os

model = YOLO("best.pt")

variants = {
    "best_b4_gpu0_fp32.engine": {
        "format": "engine",
        "dynamic": True,
        # "simplify": True,
        "nms": True,
        "batch": 4,
        "device": 0
    },
    "best_b4_gpu0_fp16.engine": {
        "format": "engine",
        "dynamic": True,
        # "simplify": True,
        "nms": True,
        "half": True,
        "batch": 4,
        "device": 0
    },
    # Need to copy the dataset first for INT8
    "best_b4_gpu0_int8.engine": {
        "format": "engine",
        "dynamic": True,
        # "simplify": True,
        # "nms": True,
        "int8": True,
        "batch": 4,
        "data": "data.yaml",
        "device": 0
    },
    "best_b4_dla0_fp16.engine": {
        "format": "engine",
        "dynamic": False,
        "simplify": True,
        "nms": False,
        "half": True,
        "batch": 4,
        "device": "dla:0"
    },
    # Need to copy the dataset first for INT8 DLA
    "best_b4_dla0_int8.engine": {
        "format": "engine",
        # "dynamic": True,
        # "simplify": True,
        # "nms": True,
        "batch": 4,
        "device": "dla:0",
        "int8": True,
        "data": "data.yaml"
    }
}

# Export each missing variant. Ultralytics always writes "best.engine",
# so rename the output to the variant-specific name afterwards.
for name, config in variants.items():
    if not os.path.isfile(name):
        print(f"Creating {name}...")
        model.export(**config)
        os.rename("best.engine", name)

And here is the error that my DeepStream pipeline produces when it loads one of the converted engines:

Apr 30 10:08:14 ubuntu start_pipeline.sh[11579]: Setting min object dimensions as 16x16 instead of 1x1 to support VIC compute mode.
Apr 30 10:08:14 ubuntu start_pipeline.sh[11579]: ERROR: [TRT]: IRuntime::deserializeCudaEngine: Error Code 1: Serialization (Serialization assertion plan->header.magicTag == rt::kPLAN_MAGIC_TAG failed.Trying to load an engine created with incompatible serialization version. Check that the engine was not created using safety runtime, same OS was used and version compatibility parameters were set accordingly.)
Apr 30 10:08:14 ubuntu start_pipeline.sh[11579]: ERROR: Deserialize engine failed from file: /var/data/models/best_b4_gpu0_fp16.engine
Apr 30 10:08:14 ubuntu start_pipeline.sh[11579]: 0:00:00.209163011 11579 0xffff7448be30 WARN                 nvinfer gstnvinfer.cpp:681:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2080> [UID = 1]: deserialize engine from file :/var/data/models/best_b4_gpu0_fp16.engine failed
Apr 30 10:08:14 ubuntu start_pipeline.sh[11579]: 0:00:00.209205060 11579 0xffff7448be30 WARN                 nvinfer gstnvinfer.cpp:681:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2185> [UID = 1]: deserialize backend context from engine from file :/var/data/models/best_b4_gpu0_fp16.engine failed, try rebuild
Apr 30 10:08:14 ubuntu start_pipeline.sh[11579]: 0:00:00.209220324 11579 0xffff7448be30 INFO                 nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2106> [UID = 1]: Trying to create engine from model files
Apr 30 10:08:16 ubuntu start_pipeline.sh[11579]: WARNING: [TRT]: DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
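
To help isolate whether this is a DeepStream problem or a plain TensorRT mismatch, I assume a minimal script like the one below exercises the same deserialization path (the path is taken from the log above). Running it once with my venv's wheel and once with the system Python should show whether the two TensorRT installs disagree:

import tensorrt as trt

# Attempt to deserialize the engine with whatever TensorRT this
# interpreter imports; a serialization-version mismatch should
# reproduce the magic-tag error outside DeepStream.
logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)
with open("/var/data/models/best_b4_gpu0_fp16.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
print("deserialized OK:", engine is not None)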

Environment

TensorRT Version: 10.3.0, installed in a Python virtual environment from Nvidia's pre-built developer wheels. I believe this is exactly the version installed in the DeepStream environment that runs the pipeline, but I'm not sure how to check which version the pipeline actually uses (see the version-check sketch after this list)
GPU Type: Jetson Orin NX 16GB
Nvidia Driver Version: 540.4.0
CUDA Version: 12.3
CUDNN Version: the version that comes with JetPack 6.2
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): Python 3.10.12 in virtual environment
TensorFlow Version (if applicable): n/a
PyTorch Version (if applicable): 2.5.0a0+872d972e41.nv24.08
Baremetal or Container (if container which image + tag): Bare metal
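
Since I don't know how to check the TensorRT version the pipeline itself uses, this is the comparison I would try, as a sketch; it assumes the JetPack runtime package is named libnvinfer10, which may differ on other releases:

import subprocess
import tensorrt as trt

# Version of the pip wheel in whichever environment runs this script
# (run it inside the export venv).
print("venv TensorRT:", trt.__version__)

# Version of the system TensorRT runtime that DeepStream's nvinfer
# plugin links against (package name libnvinfer10 is an assumption).
result = subprocess.run(
    ["dpkg-query", "--show", "--showformat=${Version}\n", "libnvinfer10"],
    capture_output=True, text=True, check=True,
)
print("system libnvinfer:", result.stdout.strip())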

Relevant Files

I can’t attach the exact code or the model here, but I can provide them to an Nvidia engineer privately.

Steps To Reproduce

  1. Run the conversion script above with the dataset and model file, in the environment described above.
  2. Copy the model files to the folder the pipeline loads them from (see the config sketch after this list).
  3. Run the pipeline.
  4. Observe that the pipeline fails to deserialize the model with a magic tag error and falls back to rebuilding the engine from the ONNX file locally instead.
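
For context, the relevant part of an nvinfer config for this setup would look roughly like the sketch below (values assumed for illustration; mine is equivalent). The onnx-file line is what makes the local rebuild fallback in step 4 possible:

[property]
# Serialized engine the pipeline tries to deserialize first
model-engine-file=/var/data/models/best_b4_gpu0_fp16.engine
# Fallback model that nvinfer rebuilds the engine from on failure
onnx-file=/var/data/models/best.onnx
batch-size=4
# 0=FP32, 1=INT8, 2=FP16
network-mode=2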

I’m sure that I’m doing something wrong, but I don’t know what the issue is.
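
One workaround I'm considering, though I haven't verified it yet: rebuild the engines with the TensorRT that ships with JetPack instead of the pip wheel, so the serialized plan matches whatever libnvinfer DeepStream loads. A sketch, assuming the stock /usr/src/tensorrt/bin/trtexec location and the intermediate best.onnx that Ultralytics writes during export:

import subprocess

# Untested sketch: build the FP16 engine with the system trtexec so it
# is serialized by the same TensorRT build that DeepStream deserializes
# with.
subprocess.run([
    "/usr/src/tensorrt/bin/trtexec",
    "--onnx=best.onnx",
    "--saveEngine=best_b4_gpu0_fp16.engine",
    "--fp16",
], check=True)

A dynamic-batch build would additionally need the --minShapes/--optShapes/--maxShapes flags, which I've left out here.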