• synchronize setCropRect

I want to apply yolo to native-resolution tiles across the 4k camera input. My first stab was to create multiple yolo nodes, each simultaneously processing its own little tile, but that takes too much memory. So for my second approach I create a single ImageManip node that feeds a YoloDetectionNetwork, and I adjust the crop window of the ImageManip on the host side by setting up a stream and sending an ImageManipConfig message using setCropRect. It appears to work, except that I cannot for the life of me figure out which yolo detections correlate to which crop rect. They are horribly desynchronized. So I added a synchronization DataInputQueue linked directly to a DataOutputQueue that just goes round trip from host to camera and back, figuring that would act as a sequence number I could use to associate the incoming detections with the cropped region. However this does not work; I don't know why, but maybe because my DIY sequence queue is effectively independent of the yolo detections. Is there any way I can determine which setCropRect matches which detections coming back from the camera? This is on a POE OAK-D Pro. please help! thanks.
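To make the desync concrete, here is a small pure-Python simulation (not DepthAI code, all names hypothetical) of what goes wrong when crop rects are paired with detections purely by arrival order: one silently dropped frame shifts every later pairing.

```python
from collections import deque

sent_rects = deque()          # rects in the order the crop configs were sent

def send_crop(rect):
    sent_rects.append(rect)

def naive_pair(detection_batches):
    """Pair each arriving detection batch with the oldest unmatched rect."""
    return [(sent_rects.popleft(), det) for det in detection_batches]

# Send four crops, but the device silently drops the frame for rect "B".
for rect in ["A", "B", "C", "D"]:
    send_crop(rect)
arrived = ["det_A", "det_C", "det_D"]   # no output for the dropped frame

pairs = naive_pair(arrived)
# "det_C" is now attributed to rect "B", and everything after is off by one.
print(pairs)
```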

  • erik replied to this.

    Hi dhaanpaa ,
    One option would be to use a 16:9 input size for YOLO, such as a few models here: depthai-model-zoo (see the 640x352 models at the bottom). IMO this would be the best solution, and the one that would provide the best throughput and use the least memory.

    The other option would be to use the Script node to do what you are describing above. You could set the camera FPS to some low value so the NN can still process all frames without blocking, or perhaps keep track of all images sent to the NN and all NN outputs (linked back to the Script node) to measure throughput, so you don't send too many images. Then you could set the sequence number of each frame sent to the NN (imgFrame.setSequenceNum(123)) so you can later sync NN results with the original high-res image.
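The host-side half of that bookkeeping can be sketched in plain Python (names hypothetical): record each crop rect under the sequence number you stamp on the frame, then look it up when the NN result arrives. A dropped frame only leaves its own entry pending instead of shifting every later pairing.

```python
pending = {}     # seq -> crop rect still awaiting a NN result
next_seq = 0

def register_crop(rect):
    """Call when sending a crop config; returns the seq to stamp on the frame."""
    global next_seq
    seq = next_seq
    next_seq += 1
    pending[seq] = rect
    return seq

def match_detections(seq, detections):
    """Call when a NN result arrives; seq would come from getSequenceNum()."""
    rect = pending.pop(seq)   # a KeyError here would indicate a bookkeeping bug
    return rect, detections

# Even if the frame for "B" is dropped, "C" still matches correctly.
s0 = register_crop("A")
s1 = register_crop("B")
s2 = register_crop("C")
result = match_detections(s2, "det_C")
print(result)
```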
    Thoughts?

    I wasn't aware of the non-square models, but now that you've shown me the light, I agree that's probably my best bet. I might be able to fully tile the 4k input with simultaneous detectors (which I can already do on HD input).
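As a rough sketch of what tiling the 4k input with 640x352 crops might look like (my own assumptions: a 3840x2160 frame, tiles spread evenly with a little overlap when the tile size doesn't divide the frame exactly), the normalized rects that setCropRect expects could be computed like this:

```python
import math

def tile_rects(frame_w, frame_h, tile_w, tile_h):
    """Return normalized (xmin, ymin, xmax, ymax) rects covering the frame."""
    cols = math.ceil(frame_w / tile_w)
    rows = math.ceil(frame_h / tile_h)
    rects = []
    for r in range(rows):
        # Distribute tile origins evenly so the last tile ends at the edge.
        y = 0 if rows == 1 else r * (frame_h - tile_h) / (rows - 1)
        for c in range(cols):
            x = 0 if cols == 1 else c * (frame_w - tile_w) / (cols - 1)
            rects.append((x / frame_w, y / frame_h,
                          (x + tile_w) / frame_w, (y + tile_h) / frame_h))
    return rects

rects = tile_rects(3840, 2160, 640, 352)
print(len(rects))   # 6 columns x 7 rows = 42 native-resolution tiles
```

Note that 2160 is not a multiple of 352, so the 7 rows overlap slightly; whether that overlap is acceptable depends on the model and the objects being detected.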

    I did look into doing script stuff and fully intend to play with it in the future, just because it looks like it opens up a lot of possibilities/customization. (baby steps; I'm new to this pipeline programming) I also came across setSequenceNum and figured it might be applicable, but I wasn't able to find documentation about it beyond "Retrieves image sequence number" (which looks like a copy-paste of the getSequenceNum description). But if the wide-aspect model doesn't work out, I'll go down the road you describe.

    thanks for the help!

    5 months later

    Hi @dhaanpaa ,

    I would like to implement a very similar approach for segmentation purposes. Would it be possible for you to share some code, please?

    Thanks