Does NN examine the entire hi-res frame?

Russ76

https://github.com/luxonis/depthai-python/blob/main/examples/VideoEncoder/rgb_full_resolution_saver.py

Question about the link above. Is the AI really examining all the frame or just a 300 x 300 patch? My plan is to use a 1920 wide and 300 height image and examine that. I want to get the detection output to guide the robot for the width. That's the X dimension, no?

erik

Hi Russ76 ,
The AI model (e.g. MobileNet here) requires a 300x300 input, so it uses the 300x300 preview frame, which is downscaled and cropped from the 1080P video frame (itself cropped from the full-FOV ISP frame) - more details here. To run on 1920x300 frames, you could train your AI model with those dimensions - our YoloV5 training/deployment notebook lets you specify any size. Thoughts?
Thanks, Erik
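
To make the cropping concrete, here is a minimal sketch of the arithmetic only. It assumes the preview is an aspect-preserving center crop of the 1920x1080 video frame followed by a downscale; the helper name `center_crop_region` is hypothetical and not part of the DepthAI API.

```python
def center_crop_region(src_w, src_h, dst_w, dst_h):
    """Return (x0, y0, w, h): the largest centered region of the source
    frame that has the same aspect ratio as the destination size."""
    src_aspect = src_w / src_h
    dst_aspect = dst_w / dst_h
    if src_aspect > dst_aspect:
        # Source is relatively wider: keep full height, trim the sides.
        h = src_h
        w = round(src_h * dst_aspect)
    else:
        # Source is relatively taller: keep full width, trim top and bottom.
        w = src_w
        h = round(src_w / dst_aspect)
    x0 = (src_w - w) // 2
    y0 = (src_h - h) // 2
    return x0, y0, w, h


# Assumed sizes: 1080P video frame, 300x300 network input.
x0, y0, w, h = center_crop_region(1920, 1080, 300, 300)
print(x0, y0, w, h)   # 420 0 1080 1080
print(w / 1920)       # 0.5625
```

Under that assumption a 300x300 model would see only the central 1080 pixels of the 1920-pixel width, which is why a wide input such as 1920x300 matters when the whole width has to be examined.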

Russ76

Ah, so I'd have to train the AI with that same image size for it to work right? That's probably beyond the capability of Tf-Lite. My best accuracy would be to include the depth, I bet. That would shrink the width. Is there a tutorial on how to do that, with a 5 layer input? Thanks!

erik

Hi Russ76 ,
Actually, with YoloV5 I linked above, you don't need to train it on the same image size as you want the network to be. Regarding the 5-channel/layer input - there are some very bleeding edge architectures that would allow you this, but we haven't gotten it to work with the OAK. You would also need depth frames/maps when training the model, so the whole training pipeline would need to be specific for such architecture.
Thanks, Erik

Russ76

OK, I will try preparing a dataset of 1920 x 300 images. If Yolo5 can train with that, we'll do it! The reason for this is that the timing is critical. The machine moves forward and we need to examine the entire width in one instance. If it works, I'll share the video of the operation.

erik

Hi Russ76 ,
I don't even think you need 1920x300 images - you can use standard images and then just resize the model to the desired resolution (1920x300) 🙂
Thanks, Erik

Russ76

Yes, I'm finding Yolov5 (Ultralytics) quite flexible about image size. The combination of Roboflow for working with images and Yolov5 training on Google Colab works GREAT! We are making progress...

Russ76

So far I have the Rgb camera putting out video at 1920 x 320, and the AI model takes in 1280 x 224, which is the preview size. Seems to work well, and it is fast.

erik

Awesome, thanks for the update! It's great to see such short development cycles🙂

Russ76

Say, how do I determine if the 320 height strip is lower in the frame or higher? I mean, which part of the frame is the camera disregarding? Can I instruct it in this regard?

erik

Russ76 By default (using rgbCam.setPreviewSize()) it will do center cropping. You could use ImageManip for cropping and use ImageManipConfig message & specify the setCropRect to get eg. top/bottom part of the frame instead of central. Thoughts?
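
As a rough sketch of the numbers involved, the helper below computes a normalized crop rectangle for a full-width strip at the top, center, or bottom of the frame. The function name `strip_crop_rect` and the 1080/320 sizes are assumptions for illustration; the result is meant to be passed to the ImageManip crop-rect setter mentioned above, whose exact signature should be checked against the DepthAI docs.

```python
def strip_crop_rect(frame_h, strip_h, position="center"):
    """Return normalized (xmin, ymin, xmax, ymax) for a full-width strip.

    frame_h  -- height of the source frame in pixels
    strip_h  -- height of the strip to keep in pixels
    position -- "top", "center" or "bottom"
    """
    if strip_h > frame_h:
        raise ValueError("strip is taller than the frame")
    if position == "top":
        y0 = 0
    elif position == "center":
        y0 = (frame_h - strip_h) / 2
    elif position == "bottom":
        y0 = frame_h - strip_h
    else:
        raise ValueError(f"unknown position: {position!r}")
    # Normalize to the 0..1 range; the strip always spans the full width.
    return 0.0, y0 / frame_h, 1.0, (y0 + strip_h) / frame_h


# Assumed sizes: a 320-pixel strip out of a 1080-pixel-high frame.
for pos in ("top", "center", "bottom"):
    print(pos, strip_crop_rect(1080, 320, pos))
```

Switching `position` per frame (or interpolating `y0` from machine speed) would be one way to choose which part of the frame the detector sees.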

Russ76

That sounds great! I will couple image selection with machine speed so that it is more accurate in spraying. For this, the robot will need ROS so that encoders can be included. At present, motors are RC managed.

Russ76

In the page:
https://docs.luxonis.com/projects/api/en/latest/samples/ImageManip/image_manip_tiling/
line 12 of the Python code says:
maxFrameSize = camRgb.getPreviewHeight() * camRgb.getPreviewHeight() * 3
Should one of those be getPreviewWidth?
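
For reference, a small sketch of why it matters: an interleaved 3-channel frame needs width x height x 3 bytes, so using the height twice gives the wrong buffer size for any non-square preview. The 1000x500 preview size below is only an assumed example, and `max_frame_size` is a hypothetical helper, not a DepthAI call.

```python
def max_frame_size(width, height, channels=3):
    """Bytes needed for one frame with one byte per channel per pixel."""
    return width * height * channels


# Assumed preview size for illustration.
preview_w, preview_h = 1000, 500

as_written = preview_h * preview_h * 3               # height used twice
corrected = max_frame_size(preview_w, preview_h)     # width * height * 3

print(as_written)   # 750000
print(corrected)    # 1500000
```

With a square preview the two expressions agree, which is probably how the typo went unnoticed.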

erik

Hi Russ76 , thanks for reporting! Just created PR here: https://github.com/luxonis/depthai-python/pull/639

Russ76

erik

Awesome project, thanks for sharing Russ76 !! Are there any planned improvements you have in mind🙂?

Russ76

Yes, of course! Here's a more complete video.

Some improvement ideas I mentioned previously. I have herbicide marker dye coming today.
This machine also has mower, materiel transporter, and snowplow attachments.