What does "Input image shape" exactly do?
Say my model is 640x640, and on the tools page I specify 640x352, what does happen? Also, any impact on model performance?

Hi @Thor
Haven't tested it, but I'd assume you would get dimension mismatch errors when running the frames through the model. And if you set the input shape to 640x352 anyway, the inference won't work and will probably just crash.

Thanks,
Jaka

@jakaskerl
Thank you, but that is not what happens. It seems to work; however, I'm not sure about the tradeoffs.
This is why I was asking.
Can you maybe check with someone who has direct knowledge of the matter?
Thanks

Hi @Thor,

The input image shape argument in tools specifies the resolution of the images accepted by a specific YOLO model that is exported for our devices.

The reason you can infer a YOLO model with different image resolutions is that YOLO models are fully convolutional. Unlike fully connected layers, where we need to specify the exact number of neurons in each layer, for convolutional layers we only specify the size and number of the kernel filters, the stride, and so on. This makes fully convolutional neural networks largely independent of the input image resolution (though you can't run inference at just any resolution; for YOLO models the image dimensions usually need to be divisible by 32).
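
To illustrate the idea, here is a toy sketch in plain PyTorch (not the actual YOLO architecture): the same convolutional weights can be applied to inputs of different resolutions, as long as the dimensions respect the downsampling constraint.

```python
# Toy sketch: a purely convolutional stack accepts any input resolution,
# as long as the spatial dimensions survive the downsampling.
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
)  # total stride of 8 here; real YOLO backbones typically downsample by 32

for h, w in [(640, 640), (352, 640)]:
    assert h % 32 == 0 and w % 32 == 0, "YOLO-style models expect dims divisible by 32"
    out = backbone(torch.zeros(1, 3, h, w))
    print((h, w), "->", tuple(out.shape))  # spatial size scales with the input
```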

In terms of performance, a larger input resolution carries more visual information (a higher level of detail) that the model can use during inference, so the model should perform slightly better on higher-resolution images, where fine details remain sharp rather than getting blurred away. On the other hand, a higher input resolution means slower inference, since the model has more data to process.

I hope that this answers your question. If you don't understand something or want to ask additional questions, please don't hesitate to do so!

Best,
Jan

That's great to hear! Happy to help!

@JanCuhel
I apologize if this is a stupid question, but I couldn't find an answer anywhere.
The end goal is to have a blob file that, when loaded into an OAK pipeline, lets me use a frame size different from the image size used for YOLOv8 training.
Doing this starting with a .pt file is pretty straightforward, e.g.:
1) train a model using an image size of 640x640
2) use tools.luxonis.com to convert to blob, specifying input shape 640 352 on the command line
3) feed the blob and 640x352 camera frames to the YoloDetectionNetwork node (see the sketch below)
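
For step 3, a minimal sketch of the pipeline I use (classic depthai v2 API; the blob filename and class count are placeholders):

```python
# Minimal sketch: run a 640x352 blob on 640x352 camera frames.
import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(640, 352)  # width x height, must match the blob's input shape
cam.setInterleaved(False)
cam.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

nn = pipeline.create(dai.node.YoloDetectionNetwork)
nn.setBlobPath("yolov8n_640x352.blob")  # placeholder filename
nn.setNumClasses(80)                    # placeholder class count
nn.setCoordinateSize(4)
nn.setConfidenceThreshold(0.5)
nn.setIouThreshold(0.5)
# anchors / anchor masks are not needed for anchor-free YOLOv8

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("det")

cam.preview.link(nn.input)
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("det", maxSize=4, blocking=False)
    while True:
        for d in q.get().detections:
            print(d.label, d.confidence, d.xmin, d.ymin, d.xmax, d.ymax)
```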

However, I could not find a way to do the same starting from an ONNX file, assuming the .pt file is no longer available but I still have the onnx, xml, and bin files that were exported from the .pt file at some point in time.
I thought I could use blobconverter.luxonis.com, but I'm not sure which parameters I should specify on the model optimizer command line to generate a blob that will work with a 640x352 frame.
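
For concreteness, this is roughly the kind of call I was experimenting with (using the blobconverter Python package; the model-optimizer flags below are guesses on my part, which is exactly the part I'm unsure about):

```python
# Rough sketch of what I tried. The --input_shape value and the preprocessing
# flags are guesses, not known-good settings for this model.
import blobconverter

blob_path = blobconverter.from_onnx(
    model="yolov8_640x640.onnx",        # the ONNX file I still have
    data_type="FP16",
    shaves=6,
    optimizer_params=[
        "--input_shape=[1,3,352,640]",  # NCHW: is this enough to retarget to 640x352?
        "--reverse_input_channels",
        "--scale_values=[255,255,255]",
    ],
)
print(blob_path)
```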

Can you help?

Hi @Thor,

It's actually a very good question! The answer is a bit tricky. When exporting a YOLO model from .pt weights, we first load the model and then replace the original detection head with our custom detection head, in which we prune the bounding-box decoding part, since the decoding is already included in our DepthAI library. Because of this, and because converting YOLO models is such a common use case, we created our tools, a simple service for converting YOLO models. The pruning happens on the .pt weights before the modified model is exported to ONNX. You would need to do the same pruning to export your models to a blob that is compatible with the YoloDetectionNetwork node in DepthAI.
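
For context, a plain Ultralytics re-export at the new resolution would look like the sketch below (assuming the original .pt were still at hand; the checkpoint name and the (height, width) order of imgsz are my assumptions). It keeps the stock detection head, so the result would still need the pruning described above before it works with the YoloDetectionNetwork node.

```python
# Plain Ultralytics export sketch; this does NOT perform the Luxonis head
# pruning, so the resulting ONNX is not directly usable with YoloDetectionNetwork.
from ultralytics import YOLO

model = YOLO("yolov8n_640x640.pt")                       # hypothetical checkpoint name
model.export(format="onnx", imgsz=(352, 640), opset=12)  # imgsz assumed to be (height, width)
```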

That being said, if you share your model with us, I'd be happy to export it for you and write up instructions to follow so that you can do it yourself in the future if needed.

Best,
Jan

@JanCuhel
Thank you, Jan.
I'll be happy to share the model with you; we only ask that you keep it confidential. Do you have an email address I can send a Dropbox link to?