JanCuhel I did manage to train a YOLOv7 network and it works pretty well: excellent FPS and reasonably good detection. The NN input is [416,416], so the RGB preview is also that size; I also tried a larger preview linked to the host plus an ImageManip outputting [416,416] to the NN, and that appears to be equivalent. My question is about getting a wider aspect ratio to make better use of the OAK's FOV. What I get now is cropping to force the square NN dimensions, which leaves only about 2/3 of the FOV.
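For reference, the larger-preview variant I tried looks roughly like this (a minimal sketch using the standard depthai Python API; the blob path is a placeholder):

```python
import depthai as dai

pipeline = dai.Pipeline()

# Wide preview, closer to the sensor's full FOV
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(1280, 800)
cam.setInterleaved(False)

# Resize down to the square NN input
manip = pipeline.create(dai.node.ImageManip)
manip.initialConfig.setResize(416, 416)
manip.setMaxOutputFrameSize(416 * 416 * 3)
cam.preview.link(manip.inputImage)

# Feed the resized frames to the YOLO detection network
nn = pipeline.create(dai.node.YoloDetectionNetwork)
nn.setBlobPath("yolov7_416x416.blob")  # placeholder path
manip.out.link(nn.input)
```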
It may be possible to train a network that takes an RGB preview of [1280,800] scaled down to, say, [416,256], but how do I train for that? I did try `--rect` with `--img 416 256`: after 100 epochs the F1, R, and P graphs were terrible. Some searching around suggests that `--rect` plus `--img 416` alone might be the correct way, but how could that work? After all, the NN will expect a specific input image size. Is that simply specified when doing the blob conversion? By the way, the training images appear to be 480x360, in case that matters.
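If the input shape really is fixed at conversion time, I'm guessing it would look something like this (just my sketch with the blobconverter package; the ONNX file name and the [416,256] shape are placeholders, and I'm assuming the shape can be passed through to the OpenVINO model optimizer):

```python
import blobconverter

# Sketch: convert an ONNX export of the trained model, forcing a
# rectangular input shape at conversion time (NCHW: 256 high x 416 wide)
blob_path = blobconverter.from_onnx(
    model="yolov7_tiny.onnx",  # placeholder exported model
    data_type="FP16",
    shaves=6,
    optimizer_params=[
        "--input_shape=[1,3,256,416]",
    ],
)
print(blob_path)
```

Does something along those lines make sense, or does the rectangular shape need to be baked in at training/export time instead?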