Hi everyone,
Following the steps in the tutorial 'Training a Tiny YOLOv4 Object Detector with Your Own Data', I trained my own Tiny YOLOv4 to detect a few different objects and successfully ran it on the OAK-D.
Unfortunately, some of the objects I need to detect are very small in the resized image, so the performance on them is poor. To overcome this, I increased the input resolution from 416x416 to 608x608, which improved the accuracy on small objects. However, if I try to run this model on the OAK-D I get this warning and no predictions:
[warning] Input image (608x608) does not match NN (416x416).
It seems that the generated blob does not have the correct input dimensions, but I cannot find anything to change in order to solve this problem.
In the Colab notebook, I changed the width and height in the yolov4-tiny.cfg file, and the trained model seems to be correct: in the output I see that the first layer is 608x608, and the accuracy increased.
Following the answer of @GergelySzabolcs in this discussion, I also added the size argument (python3 convert_weights_pb.py --size 608) and changed the height and width in OPENVINO-YOLOV4/cfg/yolov4-tiny.cfg to 608.
With these changes, the generated XML file shows that the first layer is indeed 608x608.
Nevertheless, after the conversion to .blob the model doesn't work and gives the warning above.
Does anyone have the same issue?
Thanks
Davide

PS: to run the model on the OAK-D I'm using the depthai_demo.py file.

    Hello, I'm asking Matija, our lead AI engineer, to answer this. He's in Europe though, so he will probably reply when he wakes up.
    Thanks, Erik

      Hi tode09,

      If you changed the input shape in the cfg before training and provided --size 608 when calling convert_weights_pb.py, it should convert as expected. A good indicator of that is that the correct input shape also appears in the XML. You can also confirm this by looking at the <meta_data> tags in the last few lines of the XML; there should be a line similar to <input_shape value="[1,3,608,608]"/>.
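
      For reference, the end of a typical IR XML produced by the Model Optimizer looks roughly like the sketch below; the exact contents of <meta_data> vary between OpenVINO versions:

          <meta_data>
              <MO_version value="..."/>
              <cli_parameters>
                  <input_shape value="[1,3,608,608]"/>
                  ...
              </cli_parameters>
          </meta_data>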

      If all of this is correct, then maybe you are setting up the pipeline wrongly. When calling depthai_demo.py you should provide the correct JSON configuration for your model, with values that match the values in your CFG. As a starting point, I'd suggest taking this template and editing it. By default, the anchors should be the same. You would have to change input_size, classes, and labels.

      Also, be careful when setting anchor masks. The anchor masks in the JSON should match the anchor masks in the config. Since you changed the input shape, you will also have to rename "side26" to "side38" and "side13" to "side19"; you get these numbers by dividing the input shape by 16 and 32 respectively (608/16 = 38, 608/32 = 19).
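
      For illustration, a minimal JSON for a 608x608 custom Tiny YOLOv4 could look roughly like the sketch below, following the layout of the template above; the class count and labels are placeholders you would replace with your own values, and the anchors shown are the default yolov4-tiny ones:

          {
              "nn_config": {
                  "output_format": "detection",
                  "NN_family": "YOLO",
                  "input_size": "608x608",
                  "NN_specific_metadata": {
                      "classes": 3,
                      "coordinates": 4,
                      "anchors": [10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319],
                      "anchor_masks": {
                          "side38": [1, 2, 3],
                          "side19": [3, 4, 5]
                      },
                      "iou_threshold": 0.5,
                      "confidence_threshold": 0.5
                  }
              },
              "mappings": {
                  "labels": ["class1", "class2", "class3"]
              }
          }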

      You can also try adapting this example from our docs, where you would set these values directly in Python code.
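
      As a rough sketch of that adaptation (the blob path, class count, anchors, and masks below are placeholders for your own model, and the exact node-creation calls may differ slightly between DepthAI versions):

          import depthai as dai

          pipeline = dai.Pipeline()

          # Color camera; the preview size must match the NN input size
          cam = pipeline.create(dai.node.ColorCamera)
          cam.setPreviewSize(608, 608)
          cam.setInterleaved(False)
          cam.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

          # YOLO detection network; values must match the CFG/JSON
          nn = pipeline.create(dai.node.YoloDetectionNetwork)
          nn.setBlobPath("custom_yolov4_tiny_608.blob")  # placeholder path
          nn.setNumClasses(3)                            # placeholder class count
          nn.setCoordinateSize(4)
          nn.setAnchors([10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319])
          nn.setAnchorMasks({"side38": [1, 2, 3], "side19": [3, 4, 5]})
          nn.setIouThreshold(0.5)
          nn.setConfidenceThreshold(0.5)
          cam.preview.link(nn.input)

          # Stream the detections back to the host
          xout = pipeline.create(dai.node.XLinkOut)
          xout.setStreamName("detections")
          nn.out.link(xout.input)

          with dai.Device(pipeline) as device:
              q = device.getOutputQueue("detections", maxSize=4, blocking=False)
              while True:
                  detections = q.get().detections
                  print([(d.label, d.confidence) for d in detections])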

      One last note on custom objects: I would kindly refer you to this page, which describes how to generate anchors for your own dataset; this might improve the performance in your case. If you generate your own anchors, you would then have to add them to the CFG, to the JSON used during conversion, and to the JSON in depthai_demo.py.
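
      Assuming you train with AlexeyAB's darknet (as in the Colab tutorial), generating the anchors typically looks something like the command below, with data/obj.data being a placeholder for your own data file and 6 clusters because the tiny model uses 6 anchors:

          ./darknet detector calc_anchors data/obj.data -num_of_clusters 6 -width 608 -height 608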

        Hi Matija, erik, thank you for your replies.
        Following what you said, I noticed that in the XML file the input_shape in the <meta_data> is unset.

        At the top of the file, however, the dimensions of the first layer are correct.

        I cannot figure out the cause of that.
        Regarding the anchor masks, I left them at the defaults, and indeed I hadn't changed side26 and side13
        to the correct values (only in the JSON used by depthai_demo.py, correct?).
        Unfortunately, even after changing these values, the problem remains.
        Thank you,
        Davide

          Hey tode09,

          I quickly tried converting a Tiny YOLOv4 model and found that for some reason there's no input shape in the meta data; it looks like this is due to the conversion script. Nevertheless, it should still work. Have you tried testing the model before the conversion, and can you successfully detect the objects?

          Are you still getting the incorrect input shape problem? If so, then I believe there must be some problem with depthai_demo.py (CC @erik). Could you try editing the example from our docs and running the model that way? Be sure to set the preview size to your input shape.

          If the same error persists even with the edited docs example, would you mind sharing the files (weights, xml, bin, obj.names, CFG, and the names used during the conversion)?

          Thanks, Matija

          Hi @Matija,
          I finally solved it by changing the output path of the OpenVINO conversion. I was generating the bin, xml, and mapping files in a completely new folder, whereas now their parent directory is the one with my custom .cfg, .data, .names, and so on.
          I actually don't know why it works this way, but maybe the .cfg is used in the subsequent conversion to blob, while in the previous setup a default one was used.

          As a recap for anyone who runs into the same need: to change the YOLO network size, starting from the Colab notebook, it is sufficient to:

          • change the height and width in the .cfg file (see the sketch after this list);
          • add the argument --size followed by the desired size to the convert_weights_pb.py call;
          • change the width and height in the JSON file used by the OAK-D for inference;
          • in the same file, also change side26 and side13 to sideX and sideY, where X and Y are obtained by dividing the network size by 16 and 32 (e.g. side38 and side19 for a 608x608 network);
          • I don't know if this is actually a requirement for the blob conversion, but in my case, for it to work, the xml, mapping, and bin files generated by the OpenVINO conversion must be in a folder whose parent directory contains my custom .cfg, .names, .data, and so on (maybe only the .cfg is relevant). This is the same structure as the mask example, but it is important to keep it as it is.
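
          For the first step, the relevant lines are in the [net] section of the .cfg (everything else in that section stays as it was):

              [net]
              width=608
              height=608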

          @Matija thank you for your help, and sorry for taking up your time.