Hi everyone,

I trained an instance segmentation model with YOLOv5 to detect an excavator bucket's different actions at every moment from its features. The custom dataset was exported from Roboflow, and the model file was converted with DepthAI Tools.

It does run on my OAK-D camera; however, it generates far too many detections on each frame:

I would like to know what's going on and how I can fix this problem, please.

Cheers,

Austin

    Hi YWei
    Did you change the anchors and masks to match what is set in your model's json, which you received from tools.luxonis?
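
    If it helps, here is a minimal sketch of reading those values straight from the exported json instead of hard-coding them (the "best.json" filename here is hypothetical -- point it at the json that came with your blob):

    import json

    import depthai as dai

    # Parse the json that tools.luxonis exported alongside the blob
    # (filename is hypothetical -- use your own).
    with open("best.json") as f:
        config = json.load(f)
    meta = config["nn_config"]["NN_specific_metadata"]

    pipeline = dai.Pipeline()
    nn_det = pipeline.create(dai.node.YoloDetectionNetwork)
    nn_det.setNumClasses(meta["classes"])
    nn_det.setCoordinateSize(meta["coordinates"])
    nn_det.setAnchors(meta["anchors"])
    nn_det.setAnchorMasks(meta["anchor_masks"])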

    Thanks,
    Jaka


      Hi jakaskerl

      Yes, I believe so.

      This is my script set-up:

      import depthai as dai

      pipeline = dai.Pipeline()

      #### det NN ###
      nn_det = pipeline.create(dai.node.YoloDetectionNetwork)
      nn_det.setConfidenceThreshold(0.4)
      nn_det.setBlobPath('/Users/cloudscapespare/Documents/yolov5s_seg/224_x/best_openvino_2022.1_8shave.blob')
      nn_det.input.setBlocking(True)
      nn_det.setNumClasses(6)
      nn_det.setCoordinateSize(4)
      nn_det.setAnchors([10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0])
      nn_det.setAnchorMasks({"side28": [0, 1, 2], "side14": [3, 4, 5], "side7": [6, 7, 8]})

      And this is what's in my JSON file:

      "nn_config": {
              "output_format": "detection",
              "NN_family": "YOLO",
              "input_size": "224x224",
              "NN_specific_metadata": {
                  "classes": 6,
                  "coordinates": 4,
                  "anchors": [
                      10.0,
                      13.0,
                      16.0,
                      30.0,
                      33.0,
                      23.0,
                      30.0,
                      61.0,
                      62.0,
                      45.0,
                      59.0,
                      119.0,
                      116.0,
                      90.0,
                      156.0,
                      198.0,
                      373.0,
                      326.0
                  ],
                  "anchor_masks": {
                      "side28": [
                          0,
                          1,
                          2
                      ],
                      "side14": [
                          3,
                          4,
                          5
                      ],
                      "side7": [
                          6,
                          7,
                          8
                      ]
                  },
                  "iou_threshold": 0.5,
                  "confidence_threshold": 0.5
              }

      I don't think I need to change anything in my script, right?

      Or might it be an issue with optimisation?

      Regards,

      Austin

        Hi YWei
        Didn't realize this until now: the json also has confidence_threshold and iou_threshold, both of which you can also set inside the yolo node config.

        nn_det.setConfidenceThreshold(0.5)
        nn_det.setIouThreshold(0.5)

        Check if it makes a difference. Visually, it looks like both the confidence threshold and IoU threshold are too low.
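
        If you want to keep these in sync with the json automatically, you can read them from the same metadata block (a sketch, assuming config is the parsed json from my earlier snippet):

        meta = config["nn_config"]["NN_specific_metadata"]
        nn_det.setConfidenceThreshold(meta["confidence_threshold"])
        nn_det.setIouThreshold(meta["iou_threshold"])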

        Thanks,
        Jaka


          Hi jakaskerl

          There's much less noise now after resetting the confidence threshold and IoU. I still see a few mis-detections, but I believe that's manageable.

          However, there is a new finding I want to discuss with you: I trained a YOLOv8 model and found it performed much better -- it always showed just one bounding box, with high accuracy. The odd thing is what I found when I checked its JSON file:

          "nn_config": {
                  "output_format": "detection",
                  "NN_family": "YOLO",
                  "input_size": "224x224",
                  "NN_specific_metadata": {
                      "classes": 6,
                      "coordinates": 4,
                      "anchors": [],
                      "anchor_masks": {},
                      "iou_threshold": 0.5,
                      "confidence_threshold": 0.5
                  }
              },
              "mappings": {
                  "labels": [
                      "Class_0",
                      "Class_1",
                      "Class_2",
                      "Class_3",
                      "Class_4",
                      "Class_5"
                  ]
              }

          There are no anchors, anchor_masks, or real labels defined. So I wonder how my script can successfully run it with no such configuration in the JSON file? And if it can, can I do the same on an OAK PoE device in the future?

          Also, could you please explain why YOLOv8 outperforms YOLOv5 by so much in my case? Is that because of the optimisation, or because of the model architecture itself?

          Cheers,

          Austin


            Hey,

            Also, could you please explain why YOLOv8 outperforms YOLOv5 by so much in my case? Is that because of the optimisation, or because of the model architecture itself?

            There could be many reasons why YoloV8 is outperforming YoloV5 here.

            First, you said you used an instance segmentation model for V5, which is not yet officially supported on our devices (and tools). This means there might have been some issues during conversion. The model might also have learned less because it had multiple tasks to perform (object detection and mask prediction), and the training might not have been optimised for that.

            Second, it depends on how you trained the models. Did you train them yourself, or did you use a third-party tool? I am not sure how Roboflow sets up the training parameters. Whichever tool you used, the difference might come down to how the training itself was set up.

            Third, based on the metrics its authors report, YoloV8 should be able to perform better than V5, so it makes sense that V8 performs better for you as well.

            Based on the image you shared, I would say it is likely a combination of the above, as there seem to be just too many predictions. I know there was a bug in how we compute the confidence, so you can also try upgrading DepthAI to the latest version to see if that resolves it.

            There are no anchors, anchor_masks, or real labels defined. So I wonder how my script can successfully run it with no such configuration in the JSON file? And if it can, can I do the same on an OAK PoE device in the future?

            Regarding the anchors -- the prediction head of YoloV5 is similar to those of V3 and V4, which use anchors. This is why you need to provide them to DepthAI, so we know how to decode your predictions. V8 builds on top of V6 and its predecessors and uses an anchor-free approach, which is why anchors are not required. If you use V6 or V8, you should be able to do this on all our devices, including OAK PoE.
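
            For comparison, here is a minimal sketch of configuring the same node for an anchor-free model like your V8 export (the blob filename is hypothetical):

            nn_det = pipeline.create(dai.node.YoloDetectionNetwork)
            nn_det.setBlobPath("yolov8_224_6shave.blob")  # hypothetical filename
            nn_det.setNumClasses(6)
            nn_det.setCoordinateSize(4)
            nn_det.setConfidenceThreshold(0.5)
            nn_det.setIouThreshold(0.5)
            # No setAnchors() / setAnchorMasks() calls are needed here -- the
            # empty "anchors" and "anchor_masks" in your json reflect the
            # anchor-free head.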


              Thank you, Matija. I believe your explanation will be really helpful for my work!


                Glad to hear it. Let me know if you have any further questions 🙂