• DepthAI-v2
  • Use custom detection NN and Object Tracking node

Hello everyone,

I would like to use a custom NN model for detection purposes, post-process the results if necessary, and then use the OAK's Object Tracking node for my project. However, I've encountered some restrictions when applying the script node to post-process the detection results. Does anyone have any ideas on how to successfully or better implement this? Any help would be greatly appreciated. Thanks!

Hi @mzbt
The script node is very limited in the modules and libraries it supports (https://docs.luxonis.com/projects/api/en/latest/components/nodes/script/#available-modules-and-libraries), and it is also slow at arithmetic because it runs on a slow CPU.

I suggest switching to YOLO so you can feed the object tracker directly with the already-decoded output.

If the decoding is simple enough to run in the script node, do that: create a new ImgDetections message each time, populate it with your results, and pass it directly into the tracker node (sketched below).
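A minimal sketch of that approach, assuming a NeuralNetwork -> Script -> ObjectTracker pipeline and that ImgDetections/ImgDetection can be constructed inside the script node (the blob path, stream names, and the hard-coded box are placeholders for your own decoder):

    import depthai as dai

    pipeline = dai.Pipeline()

    cam = pipeline.create(dai.node.ColorCamera)
    cam.setPreviewSize(416, 416)
    cam.setInterleaved(False)

    nn = pipeline.create(dai.node.NeuralNetwork)
    nn.setBlobPath("custom_model.blob")  # hypothetical path
    cam.preview.link(nn.input)

    # Script node decodes the raw NN output and emits ImgDetections
    script = pipeline.create(dai.node.Script)
    nn.out.link(script.inputs["nn_in"])
    script.setScript("""
    while True:
        nn_data = node.io['nn_in'].get()
        # ... decode nn_data.getLayerFp16(...) into boxes here ...
        dets = ImgDetections()
        det = ImgDetection()
        det.label = 0  # placeholder values from your decoder
        det.confidence = 0.9
        det.xmin, det.ymin, det.xmax, det.ymax = 0.1, 0.1, 0.5, 0.5
        dets.detections = [det]
        node.io['dets_out'].send(dets)
    """)

    tracker = pipeline.create(dai.node.ObjectTracker)
    script.outputs["dets_out"].link(tracker.inputDetections)
    nn.passthrough.link(tracker.inputDetectionFrame)
    nn.passthrough.link(tracker.inputTrackerFrame)

    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("tracklets")
    tracker.out.link(xout.input)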

If not, you will need to send the frames to the host and perform decoding and tracking there.
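A sketch of that host-side route: stream the NN passthrough frame and the raw output to the host, decode there, and run whatever tracker you prefer (the blob path and queue names are placeholders):

    import depthai as dai

    pipeline = dai.Pipeline()

    cam = pipeline.create(dai.node.ColorCamera)
    cam.setPreviewSize(416, 416)
    cam.setInterleaved(False)

    nn = pipeline.create(dai.node.NeuralNetwork)
    nn.setBlobPath("custom_model.blob")  # hypothetical path
    cam.preview.link(nn.input)

    xout_frame = pipeline.create(dai.node.XLinkOut)
    xout_frame.setStreamName("frame")
    nn.passthrough.link(xout_frame.input)  # the frame the NN actually ran on

    xout_nn = pipeline.create(dai.node.XLinkOut)
    xout_nn.setStreamName("nn")
    nn.out.link(xout_nn.input)

    with dai.Device(pipeline) as device:
        q_frame = device.getOutputQueue("frame", maxSize=4, blocking=False)
        q_nn = device.getOutputQueue("nn", maxSize=4, blocking=False)
        while True:
            frame = q_frame.get().getCvFrame()
            raw = q_nn.get().getFirstLayerFp16()
            # model-specific decoding of `raw` goes here, followed by a
            # host-side tracker of your choice (e.g. a simple IoU tracker)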

Thanks,
Jaka

    jakaskerl

    Hi Jaka,

    Thank you for the reply.

    I am actually interested in using the YOLOv8-Pose model, and it seems that this model cannot be decoded using the built-in YOLO node. Is that true?

    Additionally, you mentioned performing decoding and tracking on the host side. Is it possible to create an additional pipeline on the host to use the Object Tracking node from OAK?

      Hi @mzbt
      Only detection is currently supported for YOLOv8 models, so on-device decoding for pose estimation is not possible.

      mzbt Additionally, you mentioned performing decoding and tracking on the host side. Is it possible to create an additional pipeline on the host to use the Object Tracking node from OAK?

      It's not possible to utilize OAK HW blocks on the host side. What you could do is send the frames back to the OAK via an XLinkIn node and perform object tracking on top of them, but this would introduce additional overhead and hinder performance.
      I suggest running the tracking on the host as well if you can.
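
      For completeness, a rough sketch of that (not recommended) round-trip, assuming frames and decoded detections already exist on the host (stream names, sizes, and the fixed box are placeholders):

          import depthai as dai
          import numpy as np

          pipeline = dai.Pipeline()

          # Host -> device entry points for frames and detections
          xin_frame = pipeline.create(dai.node.XLinkIn)
          xin_frame.setStreamName("frame_in")
          xin_dets = pipeline.create(dai.node.XLinkIn)
          xin_dets.setStreamName("dets_in")

          tracker = pipeline.create(dai.node.ObjectTracker)
          tracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)
          xin_frame.out.link(tracker.inputTrackerFrame)
          xin_frame.out.link(tracker.inputDetectionFrame)
          xin_dets.out.link(tracker.inputDetections)

          xout = pipeline.create(dai.node.XLinkOut)
          xout.setStreamName("tracklets")
          tracker.out.link(xout.input)

          with dai.Device(pipeline) as device:
              q_frame = device.getInputQueue("frame_in")
              q_dets = device.getInputQueue("dets_in")
              q_track = device.getOutputQueue("tracklets")

              bgr = np.zeros((300, 300, 3), np.uint8)  # stand-in for a real frame
              img = dai.ImgFrame()
              img.setType(dai.ImgFrame.Type.BGR888p)
              img.setWidth(300)
              img.setHeight(300)
              img.setData(bgr.transpose(2, 0, 1).flatten())
              q_frame.send(img)

              dets = dai.ImgDetections()
              det = dai.ImgDetection()
              det.label = 0
              det.confidence = 0.9
              det.xmin, det.ymin, det.xmax, det.ymax = 0.1, 0.1, 0.5, 0.5
              dets.detections = [det]
              q_dets.send(dets)

              tracklets = q_track.get().tracklets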

      Thanks,
      Jaka

        jakaskerl

        Hi Jaka,

        Thanks again for the reply.

        I think I have managed to output the detection results, build the ImgDetections messages, and feed them back into the Object Tracking node. However, I have noticed that the bounding boxes from the detection output are not consistent with the "source" detection bounding boxes of the corresponding tracklets (accessed through the srcImgDetection attribute of the Tracklet class) in each frame.

        The problem persists even when I use the original YOLO detection node followed by the Object Tracking node, which surprised me. Specifically, the latter sometimes appears to be delayed by 1 or 2 frames compared with the raw detection output. Is my pipeline working properly here? How can I avoid this problem so that both types of bounding boxes are consistent in each frame?

        Hi @mzbt
        It depends on how you are linking the inputs to the object tracker:

        If the input detections and the input tracker frame have different dimensions, the bounding boxes will be scaled (which is usually a good thing, since it lets you view the tracking on the bigger frame), and this scaling might be causing the inconsistency here.

        The same goes for the delay: if you are passing the detections and the frames separately, there will likely be a lag between them.
        You can avoid that by sending the detection frame and the detections at the same time, using detection.passthrough for the frame and detection.out for the detections.
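
        In pipeline terms, that synchronized linking looks roughly like this (assuming a YoloDetectionNetwork; the blob path is a placeholder):

            import depthai as dai

            pipeline = dai.Pipeline()

            cam = pipeline.create(dai.node.ColorCamera)
            cam.setPreviewSize(416, 416)
            cam.setInterleaved(False)

            nn = pipeline.create(dai.node.YoloDetectionNetwork)
            nn.setBlobPath("yolo.blob")  # hypothetical path
            nn.setConfidenceThreshold(0.5)
            cam.preview.link(nn.input)

            tracker = pipeline.create(dai.node.ObjectTracker)
            # passthrough is the exact frame the detections were computed on,
            # so frames and detections stay in step and no rescaling occurs
            nn.passthrough.link(tracker.inputDetectionFrame)
            nn.passthrough.link(tracker.inputTrackerFrame)
            nn.out.link(tracker.inputDetections)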

        Thanks,
        Jaka