A vanilla PyTorch model runs at 8–12 FPS on Jetson. The same model exported to TensorRT FP16 runs at 50+ FPS. This tutorial shows you exactly how to convert any PyTorch model to a TensorRT engine — not just YOLO models, but any custom network you have trained.
What you will learn
- How TensorRT optimisation works (layer fusion, precision calibration)
- How to export YOLO models to TensorRT in one line
- How to export any custom PyTorch model via ONNX → TensorRT
- FP32 vs FP16 vs INT8 — when to use each precision mode
- How to validate accuracy after export
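Before choosing a precision mode, it helps to see what FP16 and INT8 actually do to trained weights. Here is a quick numpy illustration with synthetic weights; the naive symmetric quantization below is a simplified stand-in for TensorRT's real INT8 calibration, which picks scale factors per tensor from representative data:

```python
import numpy as np

# Synthetic FP32 "weights" standing in for a trained layer
rng = np.random.default_rng(0)
w32 = rng.standard_normal(1000).astype(np.float32)

# FP16 keeps ~3 decimal digits of precision -- this is why FP16 usually
# costs well under 1% accuracy on vision models.
w16 = w32.astype(np.float16).astype(np.float32)
print("FP16 max abs error:", np.abs(w32 - w16).max())

# INT8 has only 256 levels, so a scale factor must be chosen per tensor.
# Choosing that scale well is exactly what calibration does.
scale = np.abs(w32).max() / 127.0
w8 = np.clip(np.round(w32 / scale), -127, 127).astype(np.int8)
w8_dequant = w8.astype(np.float32) * scale
print("INT8 max abs error:", np.abs(w32 - w8_dequant).max())
```

The FP16 error is an order of magnitude smaller than the INT8 error, which is why FP16 rarely needs calibration while INT8 always does.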
Step 1 — Export a YOLO model (easiest)
from ultralytics import YOLO
model = YOLO("yolov8s.pt") # or yolo11s.pt, or your custom best.pt
model.export(
    format="engine",  # TensorRT engine
    device=0,         # GPU index
    half=True,        # FP16 — best speed/accuracy balance
    imgsz=640
)
# Output: yolov8s.engine — ready to use on Jetson
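To confirm the speedup, time the exported engine against the original PyTorch model. Ultralytics loads `.engine` files directly with `YOLO("yolov8s.engine")`, so the same predict API works for both backends. A minimal timing helper (the warmup/run counts and the commented usage lines are illustrative):

```python
import time

def measure_fps(infer_fn, n_warmup=10, n_runs=100):
    """Return average FPS of a zero-argument inference callable."""
    for _ in range(n_warmup):   # warm up caches and CUDA kernels
        infer_fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

# Usage on the Jetson (assumes the engine exported above):
#   from ultralytics import YOLO
#   trt_model = YOLO("yolov8s.engine")
#   fps = measure_fps(lambda: trt_model.predict("bus.jpg", verbose=False))
#   print(f"TensorRT FP16: {fps:.1f} FPS")
```

Always warm up before timing: the first few CUDA calls include kernel compilation and memory allocation and would skew the average.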
Step 2 — Export any custom PyTorch model via ONNX
import torch
# Step A: Export PyTorch → ONNX
model = MyCustomModel()
model.load_state_dict(torch.load("my_model.pth"))
model.eval()
dummy = torch.randn(1, 3, 640, 640)
torch.onnx.export(
    model, dummy, "my_model.onnx",
    opset_version=11,
    input_names=["input"],
    output_names=["output"]
)
# Step B: Convert ONNX → TensorRT engine on Jetson
trtexec --onnx=my_model.onnx --saveEngine=my_model.engine --fp16 --workspace=2048
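The trtexec flags select the precision mode. A few common variants (INT8 additionally needs a calibration cache built from representative images; `calibration.cache` below is a placeholder name):

```shell
# FP32 baseline — no precision flag
trtexec --onnx=my_model.onnx --saveEngine=model_fp32.engine

# FP16 — usually the best speed/accuracy trade-off on Jetson
trtexec --onnx=my_model.onnx --saveEngine=model_fp16.engine --fp16

# INT8 — fastest, but requires calibration data to pick scale factors
trtexec --onnx=my_model.onnx --saveEngine=model_int8.engine --int8 \
        --calib=calibration.cache
```

Note that on TensorRT 8.4 and newer, `--workspace` is deprecated in favor of `--memPoolSize=workspace:2048M`.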
Step 3 — Run inference with the TensorRT engine
import tensorrt as trt
import pycuda.driver as cuda
import numpy as np
# Helper script included with your kit
from trt_infer import TRTInferencer
engine = TRTInferencer("my_model.engine")
output = engine.infer(input_image)
print(f"Inference result shape: {output.shape}")
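Finally, validate that the engine's outputs still match the original PyTorch model on the same input. A minimal comparison helper (the tolerance and the commented usage are illustrative; `torch_model` and `engine` refer to the objects from the steps above):

```python
import numpy as np

def compare_outputs(ref, test, atol=1e-2):
    """Compare a reference (PyTorch) output against a TensorRT output."""
    ref = np.asarray(ref, dtype=np.float32)
    test = np.asarray(test, dtype=np.float32)
    max_err = float(np.abs(ref - test).max())
    cos = float(np.dot(ref.ravel(), test.ravel()) /
                (np.linalg.norm(ref) * np.linalg.norm(test)))
    return {"max_abs_error": max_err,
            "cosine_similarity": cos,
            "ok": max_err < atol and cos > 0.999}

# Usage (run the same preprocessed image through both backends):
#   ref = torch_model(tensor).detach().numpy()
#   out = engine.infer(input_image)
#   print(compare_outputs(ref, out))
```

FP16 engines typically agree with FP32 PyTorch to within ~1e-2; INT8 engines drift further, so judge them on task metrics (mAP, accuracy) rather than raw output differences.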
✅ Next: Tutorial 16 — DeepStream Multi-Camera | Back to Jetson Kit