Hi,

I am working with YOLOv8 these days. I saw that the input size is 640x352 and that you chose it. Could you explain why you chose 640x352? Also, we train our model with 512x512, so do I need to change it to 640x352?

<What I checked in your documents>

https://docs.luxonis.com/software/ai-inference/integrations/yolo

Open the tools in a browser of your choice. Then upload the downloaded yolov6n.pt weights and set the Input image shape to 640 352 (we choose this input image shape as the aspect ratio is close to 16:9 and throughput and latency are still decent). The rest of the options are left as they are.

Best regards,

Ryan.


    RyanLee
    That's what the model was trained on. It was likely not set by us. AFAIK, for YOLOv8 models you can set the resolution in the tools and it will be applied to the model during conversion.

    Thanks,
    Jaka

    @jakaskerl

    Thank you for your feedback. I still have a question about it.

    !yolo train model=yolov8n.pt data=VOC.yaml epochs=2 imgsz=640 batch=32 device=0

    As you can see, there is imgsz=640, which means the input image is 640x640. But in the tools (https://tools.luxonis.com/), the input shape can be set to 640x352 for a 16:9-ratio image like 1920x1080.

    So again, my first question is: how do you define the 352 height?
    Second, even though the height is reduced from 640, is there really no performance degradation?
    Third, when the height is reduced, are you cropping or resizing?

    I am not sure whether this is connected to your device, so feel free to let me know as much as you can. Thanks,

    Best regards,
    Ryan.

    Hi @RyanLee,

    YOLO models are robust to input size changes due to their fully convolutional design. This allows the model to process input images of various sizes as long as the dimensions are divisible by the stride of the network's layers (typically powers of 2, e.g., 32 for most common YOLO versions). So even though a YOLO-based model was trained with, let's say, a 640x640 input image shape, you can export it with 640x352.
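    For illustration, here is a minimal sketch (assuming the ultralytics Python package) of exporting a model trained at 640x640 with a 640x352 input shape; note that imgsz is given as (height, width):

        from ultralytics import YOLO

        model = YOLO("yolov8n.pt")  # weights trained at 640x640
        # imgsz is (height, width); 352 and 640 are both multiples of the stride 32
        model.export(format="onnx", imgsz=(352, 640))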

    Regarding the performance degradation, I have personally never measured it, but I have never noticed any big performance gap.

    The reason why we're sometimes reducing the height during export is that having images with an aspect ratio of 16:9 is more realistic for our cameras than 1:1. Furthermore, when switching the input image shape from 640x640 to 640x352, the model's latency improves as it has fewer pixels to process, which is crucial in edge AI.
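    To illustrate where a number like 352 can come from (this is a sketch of the reasoning, not an official formula): at a width of 640, a true 16:9 frame would need a height of 640 * 9/16 = 360, which is then rounded to the nearest multiple of the stride (32):

        STRIDE = 32
        width = 640
        ideal_height = width * 9 / 16                   # 360.0 for a true 16:9 frame
        height = round(ideal_height / STRIDE) * STRIDE  # 11 * 32 = 352
        print(width, height)                            # 640 352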

    I hope this addresses all your questions! Please feel free to reach out if anything remains unclear or if you have additional queries.

    Wishing you a Merry Christmas and a wonderful holiday season!

    Kind regards,
    Jan

    @JanCuhel Thank you so much for the great and clear explanation. It makes sense to me now.

    Happy Christmas!!

    Best regards,
    Ryan.