I have built a custom object detection model using YOLOv5s. The model was converted to ONNX and then to a .blob, and I have set it up to run on an OAK-D Lite camera. The dataset was very small (200 images), as this was a proof of concept.
The results were good: it was able to detect some of the objects with 60% confidence. But two things are causing issues when reading the data.
After turning the data into a readable format, I can see that each detection has seven values, and my print results read like this:
Raw Detection Data: [5.44000000e+02 5.65000000e+02 8.25000000e+01 8.63125000e+01
6.06445312e-01 1.05102539e-01 8.80371094e-01]
Processed Data: Xcenter[0] 544.0 Ycenter[1] 565.0 Width[2] 82.5 Height[3] 86.3125 Confidence[4] 0.6064453125 Label[5]*Maybe* 0 unkown[6] 0.88037109375
This is not the data format I expected, and it took some trial and error to work out what the values were. By drawing the bounding boxes on the screen I confirmed that it is in fact picking up the objects as intended, with up to 70% confidence. The limited confidence is likely due to a few factors in the dataset and testing environment.
So my main issue is that the object it is finding is actually class 1, but the sixth value always comes out as class 0 unless I am supposed to round it up, which seems unlikely.
Further, I don't know what the seventh value is. Unless it's actually the class, and in that case I don't know what the sixth value is.
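For reference, here is one possible reading of the seven values. It is a sketch that assumes the export kept the standard raw YOLOv5 row layout of [x_center, y_center, width, height, objectness, score_class_0, score_class_1], i.e. 5 + number of classes. Under that assumption values 6 and 7 are per-class scores rather than a label, and the class is whichever score is larger. The helper name parse_row is hypothetical.

```python
def parse_row(row):
    """Split one raw row into box, objectness and best class.

    Assumed layout (standard YOLOv5 raw output, not confirmed for this blob):
    [x_center, y_center, width, height, objectness, score_class_0, score_class_1, ...]
    """
    xc, yc, w, h, objectness = row[:5]
    class_scores = list(row[5:])
    # The predicted class is the index of the largest class score (argmax).
    class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
    return {
        "box": (xc, yc, w, h),
        "objectness": objectness,
        "class_id": class_id,
        "class_score": class_scores[class_id],
        # YOLOv5 conventionally reports objectness * class score as the confidence.
        "confidence": objectness * class_scores[class_id],
    }


# The row printed above, copied by hand.
row = [544.0, 565.0, 82.5, 86.3125, 0.6064453125, 0.105102539, 0.880371094]
det = parse_row(row)
print(det["class_id"], det["confidence"])
```

With this reading the example row comes out as class 1, which matches the object that was actually in front of the camera.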
Currently the model fails to pick up any items from class 0 and only returns class 1. But the output of getFirstLayerFp16() is always full, at 25200 detections, and the average confidence across all 25200 is very stable at around 0.2%. Even with no objects in the frame I get 25200 results, but with an average confidence of 1.6e-7%.
Is it common to have so many results, many of which overlap very closely?
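As a sanity check on the 25200 figure, the arithmetic below assumes a 640x640 input and the default YOLOv5 head (three output scales at strides 8, 16 and 32, with three anchors per grid cell). Those values are assumptions about the export, not something read from the blob. Under them, the network emits one row per anchor per cell on every frame, whether or not anything is in view.

```python
def yolov5_row_count(input_size=640, strides=(8, 16, 32), anchors_per_cell=3):
    """Number of raw rows a YOLOv5 head emits for a square input.

    Every grid cell at every scale produces `anchors_per_cell` candidate boxes,
    regardless of what is in the image.
    """
    return sum(anchors_per_cell * (input_size // s) ** 2 for s in strides)


rows = yolov5_row_count()
print(rows)  # 3 * (80*80 + 40*40 + 20*20)
```

If that matches, a flat output of 25200 * 7 values would be expected for a two-class model, and the low average confidence simply reflects that almost all rows are background.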
Any resources on parsing this unusual data, or advice on reducing the incoming results to a more reasonable number, would be appreciated.
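One common way to bring the count down is a confidence threshold followed by non-maximum suppression (NMS). The sketch below is plain Python and assumes the same seven-value row layout as the printed detection, with the output arriving as one flat list. The function names and both threshold values are illustrative choices, not taken from DepthAI or YOLOv5.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(flat, conf_threshold=0.5, iou_threshold=0.45, values_per_row=7):
    """Threshold raw rows, then apply per-class NMS.

    Returns a list of (confidence, class_id, (x1, y1, x2, y2)) tuples.
    """
    candidates = []
    for i in range(0, len(flat), values_per_row):
        row = flat[i:i + values_per_row]
        xc, yc, w, h, objectness = row[:5]
        if objectness < conf_threshold:
            continue  # cheap early reject; this removes most background rows
        scores = row[5:]
        class_id = max(range(len(scores)), key=scores.__getitem__)
        conf = objectness * scores[class_id]
        if conf < conf_threshold:
            continue
        # Convert centre/size to corner coordinates for the IoU test.
        box = (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
        candidates.append((conf, class_id, box))

    # Greedy NMS: keep the best box, drop same-class boxes that overlap it too much.
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept = []
    for cand in candidates:
        if all(cand[1] != k[1] or iou(cand[2], k[2]) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


# Three made-up rows: a detection, a near-duplicate of it, and a background row.
flat = [
    544.0, 565.0, 82.5, 86.3125, 0.61, 0.11, 0.88,
    546.0, 566.0, 80.0, 85.0, 0.58, 0.10, 0.90,
    100.0, 100.0, 50.0, 50.0, 0.001, 0.5, 0.5,
]
kept = filter_detections(flat)
print(len(kept), kept[0][1])
```

On the full 25200 rows the same loop works, although a vectorised version (for example with NumPy) would be faster. The thresholds usually need tuning for a small dataset like this one.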
I could update with the code but I am not sure that would be of any consequence.