Perform detection in entire frame

Lucas

Hi everyone.

Is it possible to perform detections in the entire frame?
In testing, I'm feeding the standard 300x300 preview to mobilenet_ssd neural network and it works well.
But is it possible to feed a 16x9 frame or even THE_12_MP frame to the NN to perform the detection everywhere?
Do you have to convert the frame with image.manip? Does this squish the bigger image? And if that's the case, is the accuracy as good as when feeding a 1x1 image?
I hope my questions make sense.
Thank you very much in advance.
Cheers!

Brandon

Yes, that does make sense. And this is possible. I need to look what config is required for this. I think our Gen2 depthai_demo.py does this with -ff, so worst-case we can dig into that.

But hopefully I'll just see it in the API documentation... looking now.

Brandon

I didn't immediately see this in the docs. We will make a FAQ for this. CC: @Luxonis-Karolina

But here it is from python3 depthai_demo.py -ff from the depthai repo (which is a demo repo):
https://github.com/luxonis/depthai

Notice that it crams the full aspect ratio of the color camera into the 1:1 aspect ratio of the neural model (which is 300x300 pixel resolution for this MobileNetSSDv2 example).

Looking for the config for this.

Brandon

Found it.

The ImageManip implementation for it should be:
https://github.com/luxonis/depthai-python/blob/54b842e/examples/26_2_spatial_mobilenet_mono.py#L55

manip.setKeepAspectRatio(not nnFullCameraFov)

And similarly for preview : colorCam.setPreviewKeepAspectRatio

Brandon

What I found I think is an old way of doing it. Not sure. Asking team.

Brandon

OK, team helped. Here's how:

After line 55, here add the following line:
camRgb.setPreviewKeepAspectRatio(False)

Notice that as hoped, the full field of view of the RGB camera is crammed into the 300x300 neural network.

Lucas

Wow Brandon, that's awesome, thank you very much for looking into it, and so quickly!
I'll have a look tonight to see if I can implement it.
Would you say the accuracy is the same when cramming the bigger aspect ratio into 1x1?

Thank you again!

Brandon

It really depends on the model. There will be some accuracy loss though.

A way to do no accuracy loss would be to pad the top and bottom of the wide aspect ratio and feed that into 1:1.

Lucas

Hi Brandon,

I tested it and it works well.

What I also managed to do was use a 4x3 aspect ratio for the preview and then convert it to 1x1 for the neural network. In that way the preview looks fine.

manip = pipeline.createImageManip() manip.initialConfig.setResize(300, 300) camera.preview.link(manip.inputImage) manip.out.link(neuralNetwork.input)

I had thought about the padding as well. Do you know if it's possible doing it with ImageManip?
Or do I have to do it with cv2?

Thank you again for all your help.
Cheers!

erik

Hello Lucas ,
I believe padding isn't (yet) available in the ImageManip node, but you can do it on the host (with cv2 as you mentioned). And feeding padded frame into NN isn't very efficient either, since pixel count increases inference time🙂
Thanks, Erik

Lucas

Hi Erik,

Thank you for your help with this.
So to use cv2 I would have to first capture the frames, process them on the host for adding padding and then feeding them back to the NN?
I'm not too sure how to go about that, since at the moment, following the tutorials, I first create all of the nodes and links on the pipeline, and then start the main loop where I already have everything from the camera.
How could I pass cv2 modified frames from the host back to the NN node on the camera?

Do you think feeding padded frames but keeping the size at 300x300 would increase inference time?

What i'd like to achieve is to perform is a person detection throughout the entire sensor.

Not keeping the preview aspect ratio as Brandon showed above, seems to work, but I was wondering if by doing so i'm reducing the accuracy of the detections.

Thank you very much once again.

Cheers

GergelySzabolcs

Lucas

Padding in ImageManip is implemented.
https://docs.luxonis.com/projects/api/en/latest/references/python/?highlight=setResizeThumbnail#depthai.ImageManip.setResizeThumbnail

Example of usage:

https://github.com/luxonis/depthai-python/blob/main/examples/object_tracker_video.py

erik

Hello Lucas,
my bad for inaccurate information, I wasn't aware of this functionality. So I would try to just use the preview and if you want to use the whole frame (without cropping to keep the aspect ratio), you can do colorCam.setPreviewKeepAspectRatio(bool). I'm not sure how well the model will perform on such a "squished" frame though. As you mentioned, I would also try the img padding with ImageManip node that Szabi linked.
Thanks and sorry again! Erik

AdamPolak

GergelySzabolcs @erik

It seems like setResizeThumbnail crops before resizing? I don't see any mention of padding in it.

Lucas

Wow, i'm blown away more and more with every response!

Not only with the amazing gear and technology, but also how awesome, helpful and friendly everyone in the community is.

I'll give try the padding feature to see how it goes.

Thank you very much again everyone!