I am working with a modified YOLO model that has additional output channels. To accommodate this, I am no longer using a YoloDetectionNetwork node; instead I use a NeuralNetwork node and create custom ImgDetections from the NNData it outputs. However, I noticed that even after performing non-max suppression on the output, the bounding box values do not fall into the expected range. This happens with both the modified and the unmodified YOLO models. I am confused about how the YoloDetectionNetwork node is able to extract meaningful bounding box values from NNData, and I hope that understanding this will help me do the same in my custom decoding function.
YoloDetectionNetwork node NNdata decoding
- Best Answer set by erik
Hi @dmn225 , I don't think we have any special 'sauce' in our FW, just standard yolo output decoding. Perhaps you could share what you are doing and the results you are getting?
Here's also an example that does on-host decoding:
luxonis/depthai-experiments/blob/master/gen2-yolo/host-decoding/main.py
Thanks, Erik
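For reference, the "standard yolo output decoding" mentioned above maps each raw cell/anchor prediction to a normalized box via sigmoid, grid offsets, and anchor priors. Here is a minimal sketch of that math for a single prediction; the function name, anchor values, and the v3-style exp() size term are illustrative assumptions, not the exact firmware code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_cell(raw_xywh, cx, cy, anchor_w, anchor_h, grid_size, input_size):
    """Decode one raw YOLO box prediction to normalized xyxy (0..1).

    raw_xywh: the 4 raw network outputs (tx, ty, tw, th) for this cell/anchor.
    cx, cy: integer grid-cell indices; anchor_w/h: anchor prior in pixels.
    """
    tx, ty, tw, th = raw_xywh
    stride = input_size / grid_size
    # Centre: sigmoid confines the offset to the cell, then the cell index is added.
    bx = (sigmoid(tx) + cx) * stride
    by = (sigmoid(ty) + cy) * stride
    # Size: exponential scales the anchor prior (YOLOv3-style head).
    bw = anchor_w * np.exp(tw)
    bh = anchor_h * np.exp(th)
    # Pixel-space centre/size -> normalized corner coordinates.
    x_min = (bx - bw / 2) / input_size
    y_min = (by - bh / 2) / input_size
    x_max = (bx + bw / 2) / input_size
    y_max = (by + bh / 2) / input_size
    return x_min, y_min, x_max, y_max
```

The key point is that the raw outputs only become 0..1 coordinates after this per-cell transform; if a decoder skips it (or the exported model already bakes it in and the decoder applies it again), the box ranges come out wrong.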
I am using this custom decode function on the NNData coming from the NeuralNetwork node. The other values seem to come out correctly, but the bounding box values range from small negative numbers to around 16. When these are normalized later in the pipeline, the bounding boxes end up outside the image bounds, since the pipeline expects values between 0 and 1. The YoloDetectionNetwork node seemed to interpret the bounding boxes correctly, so I assumed the issue was with my decoding function rather than the model itself. I am using the non_max_suppression method from Ultralytics. Below the function are some of the box coordinates that were assigned to the detections' xmin, ymin, xmax, ymax fields.
def decode(nn_data: dai.NNData):
    # Flatten the raw FP16 output and reshape to (batch, channels, predictions);
    # 14 channels here for this modified head.
    layer = nn_data.getFirstLayerFp16()
    res_np = np.array(layer).reshape((1, 14, -1))
    res = torch.tensor(res_np)
    results = non_max_suppression(
        res,
        conf_thres=0.25,
        iou_thres=0.5,
        classes=None,
        agnostic=False,
        multi_label=False,
        labels=(),
        max_det=300,
        nc=9,
        max_time_img=0.05,
        max_nms=30000,
        max_wh=640,
        in_place=True,
        rotated=False,
    )
    dets = Detections(nn_data)
    r = results[0]  # detections for the single image: (n, 6) -> xyxy, conf, cls
    if r.numel() > 0:
        for result in r:
            x_min = result[0].item()
            y_min = result[1].item()
            x_max = result[2].item()
            y_max = result[3].item()
            conf = result[4].item()
            label = int(result[5].item())
            det = Detection(None, label, conf, x_min, y_min, x_max, y_max)
            dets.detections.append(det)
    return dets
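One note on the ranges: Ultralytics' non_max_suppression returns xyxy boxes in input-image pixel coordinates when the model's output is in pixels, while the downstream consumer here expects values between 0 and 1. A small helper sketch for that conversion (the 640x640 input size and the function name are assumptions for illustration):

```python
import numpy as np

INPUT_W, INPUT_H = 640, 640  # assumed network input resolution

def normalize_xyxy(box, w=INPUT_W, h=INPUT_H):
    """Convert a pixel-space [x1, y1, x2, y2] box from NMS to the 0..1
    range expected later in the pipeline, clamping any overflow."""
    x1, y1, x2, y2 = box
    return (
        float(np.clip(x1 / w, 0.0, 1.0)),
        float(np.clip(y1 / h, 0.0, 1.0)),
        float(np.clip(x2 / w, 0.0, 1.0)),
        float(np.clip(y2 / h, 0.0, 1.0)),
    )
```

If the raw values were already in pixels, this division alone would fix the range; values topping out around 16 instead of the input size suggest the model output was not in pixel space to begin with, which points back at the export.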
[0.49, 1.82, 3.34, 3.10]
[2.07, 1.92, 3.85, 3.17]
[0.46, 0.32, 3.35, 2.65]
[2.02, 0.42, 3.82, 2.69]
[2.05, -1.13, 3.84, 2.14]
[3.47, 0.43, 4.37, 2.77]
[3.50, 2.11, 4.37, 3.38]
I believe the issue was with the ONNX file that was being used. Generating one through YOLO (Ultralytics) rather than the Luxonis tooling aligned the model output with the postprocessing I was trying to do, and now the boxes are coming out correctly. Thanks for the help!