A vanilla PyTorch model runs at 8–12 FPS on Jetson. The same model exported to TensorRT FP16 runs at 50+ FPS. This tutorial shows you exactly how to convert any PyTorch model to a TensorRT engine — not just YOLO models, but any custom network you have trained.
What you will learn
- How TensorRT optimisation works (layer fusion, precision calibration)
- How to export YOLO models to TensorRT in one line
- How to export any custom PyTorch model via ONNX → TensorRT
- FP32 vs FP16 vs INT8 — when to use each precision mode
- How to validate accuracy after export
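Before choosing a precision mode, it helps to see what FP16 and INT8 actually do to trained weights. Here is a quick numpy illustration with synthetic weights; the naive symmetric quantization below is a simplified stand-in for TensorRT's real INT8 calibration, which picks scale factors per tensor from representative data:

```python
import numpy as np

# Synthetic FP32 "weights" standing in for a trained layer
rng = np.random.default_rng(0)
w32 = rng.standard_normal(1000).astype(np.float32)

# FP16 keeps ~3 decimal digits of precision -- this is why FP16 usually
# costs well under 1% accuracy on vision models.
w16 = w32.astype(np.float16).astype(np.float32)
print("FP16 max abs error:", np.abs(w32 - w16).max())

# INT8 has only 256 levels, so a scale factor must be chosen per tensor.
# Choosing that scale well is exactly what calibration does.
scale = np.abs(w32).max() / 127.0
w8 = np.clip(np.round(w32 / scale), -127, 127).astype(np.int8)
w8_dequant = w8.astype(np.float32) * scale
print("INT8 max abs error:", np.abs(w32 - w8_dequant).max())
```

The FP16 error is an order of magnitude smaller than the INT8 error, which is why FP16 rarely needs calibration while INT8 always does.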
Step 1 — Export a YOLO model (easiest)
from ultralytics import YOLO
model = YOLO("yolov8s.pt") # or yolo11s.pt, or your custom best.pt
model.export(
    format="engine",  # TensorRT engine
    device=0,         # GPU index
    half=True,        # FP16 — best speed/accuracy balance
    imgsz=640
)
# Output: yolov8s.engine — ready to use on Jetson
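To confirm the speedup, time the exported engine against the original PyTorch model. Ultralytics loads `.engine` files directly with `YOLO("yolov8s.engine")`, so the same predict API works for both backends. A minimal timing helper (the warmup/run counts and the commented usage lines are illustrative):

```python
import time

def measure_fps(infer_fn, n_warmup=10, n_runs=100):
    """Return average FPS of a zero-argument inference callable."""
    for _ in range(n_warmup):   # warm up caches and CUDA kernels
        infer_fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

# Usage on the Jetson (assumes the engine exported above):
#   from ultralytics import YOLO
#   trt_model = YOLO("yolov8s.engine")
#   fps = measure_fps(lambda: trt_model.predict("bus.jpg", verbose=False))
#   print(f"TensorRT FP16: {fps:.1f} FPS")
```

Always warm up before timing: the first few CUDA calls include kernel compilation and memory allocation and would skew the average.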
Step 2 — Export any custom PyTorch model via ONNX
import torch
# Step A: Export PyTorch → ONNX
model = MyCustomModel()
model.load_state_dict(torch.load("my_model.pth"))
model.eval()
dummy = torch.randn(1, 3, 640, 640)
torch.onnx.export(
    model, dummy, "my_model.onnx",
    opset_version=11,
    input_names=["input"],
    output_names=["output"]
)
# Step B: Convert ONNX → TensorRT engine on Jetson
trtexec --onnx=my_model.onnx --saveEngine=my_model.engine --fp16 --workspace=2048
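The trtexec flags select the precision mode. A few common variants (INT8 additionally needs a calibration cache built from representative images; `calibration.cache` below is a placeholder name):

```shell
# FP32 baseline — no precision flag
trtexec --onnx=my_model.onnx --saveEngine=model_fp32.engine

# FP16 — usually the best speed/accuracy trade-off on Jetson
trtexec --onnx=my_model.onnx --saveEngine=model_fp16.engine --fp16

# INT8 — fastest, but requires calibration data to pick scale factors
trtexec --onnx=my_model.onnx --saveEngine=model_int8.engine --int8 \
        --calib=calibration.cache
```

Note that on TensorRT 8.4 and newer, `--workspace` is deprecated in favor of `--memPoolSize=workspace:2048M`.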
Step 3 — Run inference with the TensorRT engine
import tensorrt as trt
import pycuda.driver as cuda
import numpy as np
# Helper script included with your kit
from trt_infer import TRTInferencer
engine = TRTInferencer("my_model.engine")
output = engine.infer(input_image)
print(f"Inference result shape: {output.shape}")
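Finally, validate that the engine's outputs still match the original PyTorch model on the same input. A minimal comparison helper (the tolerance and the commented usage are illustrative; `torch_model` and `engine` refer to the objects from the steps above):

```python
import numpy as np

def compare_outputs(ref, test, atol=1e-2):
    """Compare a reference (PyTorch) output against a TensorRT output."""
    ref = np.asarray(ref, dtype=np.float32)
    test = np.asarray(test, dtype=np.float32)
    max_err = float(np.abs(ref - test).max())
    cos = float(np.dot(ref.ravel(), test.ravel()) /
                (np.linalg.norm(ref) * np.linalg.norm(test)))
    return {"max_abs_error": max_err,
            "cosine_similarity": cos,
            "ok": max_err < atol and cos > 0.999}

# Usage (run the same preprocessed image through both backends):
#   ref = torch_model(tensor).detach().numpy()
#   out = engine.infer(input_image)
#   print(compare_outputs(ref, out))
```

FP16 engines typically agree with FP32 PyTorch to within ~1e-2; INT8 engines drift further, so judge them on task metrics (mAP, accuracy) rather than raw output differences.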
✅ Next: Tutorial 16 — DeepStream Multi-Camera | Back to Jetson Kit