EwoudPool

  • 5 days ago
  • Joined May 3, 2023
  • A new version of YOLO, YOLOv10, has just been released. Does Luxonis plan to support this new version on their cameras?

  • Hey erik,

    I'm not really sure; I don't know whether I can force IE (the OpenVINO Inference Engine) to run on the CPU with FP16. However, I'm using a model created with mo (the Model Optimizer) with the flag --data_type FP16, so I assume the CPU runs it in FP16?
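One way to confirm what `mo --data_type FP16` actually wrote is to count the element types of the constant (weight) layers in the IR's .xml file. A minimal sketch, assuming the IR v10/v11 layout in which each `Const` layer has a `<data element_type="f16" .../>` child. Note that this only shows how the weights are stored, not which precision a given device plugin computes in.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def const_precisions(ir_xml_text: str) -> Counter:
    """Count element types of Const layers in an OpenVINO IR .xml string.

    Assumes the IR v10/v11 layout: <net><layers><layer type="Const">
    <data element_type="f16" .../></layer>...</layers></net>.
    """
    root = ET.fromstring(ir_xml_text)
    counts = Counter()
    for layer in root.iter("layer"):
        if layer.get("type") != "Const":
            continue  # only constants hold the stored weights
        data = layer.find("data")
        if data is not None and data.get("element_type"):
            counts[data.get("element_type")] += 1
    return counts


if __name__ == "__main__":
    # Tiny hand-written stand-in for a real IR file (illustrative only)
    sample = """<net name="toy" version="11"><layers>
      <layer id="0" name="w0" type="Const" version="opset1">
        <data element_type="f16" shape="16,3,3,3" offset="0" size="864"/>
      </layer>
      <layer id="1" name="b0" type="Const" version="opset1">
        <data element_type="f16" shape="16" offset="864" size="32"/>
      </layer>
      <layer id="2" name="input" type="Parameter" version="opset1">
        <data element_type="f32" shape="1,3,448,448"/>
      </layer>
    </layers></net>"""
    print(const_precisions(sample))
```

For a real model you would pass the file contents instead, e.g. `const_precisions(open("model.xml").read())`, where `model.xml` is a placeholder path.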

    • Hey erik,

      Do I then understand correctly that there is a difference between running a model at FP32@CPU and FP16@CPU, as well as a difference between running a model at FP16@CPU and FP16@VPU?
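A host-only illustration of why identical FP16 weights can give different results depending on where they are computed: the sketch below sums the same values with a float32 running total and with a float16 running total. It is only an illustration under the assumption that one execution path accumulates in FP32 and another in FP16; it is not a statement about how any specific plugin is implemented.

```python
import numpy as np


def accumulate(values: np.ndarray, acc_dtype) -> float:
    """Sum values sequentially, rounding the running total to acc_dtype after every add."""
    total = acc_dtype(0)
    for v in values.astype(acc_dtype):
        total = acc_dtype(total + v)
    return float(total)


if __name__ == "__main__":
    # 10,000 copies of 0.1; the exact sum is 1000
    vals = np.full(10_000, 0.1, dtype=np.float32)
    print("fp32 accumulator:", accumulate(vals, np.float32))
    print("fp16 accumulator:", accumulate(vals, np.float16))
```

The float32 total ends up close to 1000, while the float16 total stalls far below it: once the spacing between representable float16 values grows past roughly twice the increment, adding 0.1 no longer changes the total.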

      • Hey erik,

        Thank you for the two links you shared; they have clarified quite a bit! Using my own trained model, I have followed roughly the steps outlined in https://docs.luxonis.com/en/latest/pages/tutorials/deploying-custom-model/#deploying-custom-models, and verified that the scale and mean are correct. The effect of the Model Optimizer's FP16 conversion (i.e. running mo with or without --data_type FP16) was also minimal.
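To double-check the mean/scale settings against the PyTorch side, one option is to reproduce on the host what the converted model does to its input and compare it with YOLOv5-style preprocessing. A minimal sketch, assuming the Model Optimizer semantics in which `--reverse_input_channels` is applied to the user's input first and `--mean_values`/`--scale_values` are then applied per channel as `(x - mean) / scale`. YOLOv5's own preprocessing converts BGR to RGB and divides by 255 with no mean subtraction; check the values against your own conversion command.

```python
import numpy as np


def preprocess_like_mo(img_hwc: np.ndarray, mean, scale, reverse_channels: bool) -> np.ndarray:
    """Float32 NCHW tensor mimicking reverse/mean/scale baked into a converted model.

    img_hwc: uint8 image in the channel order sent to the device (e.g. BGR from cv2).
    """
    x = img_hwc.astype(np.float32)
    if reverse_channels:
        x = x[..., ::-1]  # BGR <-> RGB, applied before mean/scale
    x = (x - np.asarray(mean, dtype=np.float32)) / np.asarray(scale, dtype=np.float32)
    return np.ascontiguousarray(x.transpose(2, 0, 1))[None]  # HWC -> 1xCxHxW


def preprocess_like_yolov5(img_bgr: np.ndarray) -> np.ndarray:
    """YOLOv5-style input for an already letterboxed image: BGR -> RGB, /255, NCHW."""
    x = img_bgr[..., ::-1].astype(np.float32) / 255.0
    return np.ascontiguousarray(x.transpose(2, 0, 1))[None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(448, 448, 3), dtype=np.uint8)  # stand-in for cv2.imread output
    a = preprocess_like_mo(frame, mean=[0, 0, 0], scale=[255, 255, 255], reverse_channels=True)
    b = preprocess_like_yolov5(frame)
    print("max abs diff:", np.abs(a - b).max())
```

If the two tensors differ for your actual settings, the mismatch is in preprocessing rather than precision.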

        The last step on that page, https://docs.luxonis.com/en/latest/pages/tutorials/deploying-custom-model/#testing-accuracy-degradation-due-to-fp16-quantization, seems to point to the culprit. Changing exec_net = ie.load_network(network=net, device_name='CPU') to exec_net = ie.load_network(network=net, device_name='MYRIAD') changes the output significantly. Comparing the bounding-box confidences on CPU vs. MYRIAD, I see some boxes whose confidence is 0.3 lower and others whose confidence is 0.6 higher.
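To quantify the CPU-vs-MYRIAD difference per detection rather than by eye, one option is to pair boxes from the two runs by IoU and report the confidence change of each pair. A sketch assuming boxes are available as `(xmin, ymin, xmax, ymax, confidence)` tuples after NMS; the greedy matching and the 0.5 IoU cut-off are arbitrary choices, not part of any library.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # xmin, ymin, xmax, ymax, confidence


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def confidence_deltas(ref: List[Box], test: List[Box], min_iou: float = 0.5):
    """Greedily match each reference box to the best unused test box.

    Returns (matches, unmatched_ref, unmatched_test); each match is
    (ref_index, test_index, iou, test_conf - ref_conf).
    """
    used = set()
    matches, unmatched_ref = [], []
    for i, r in enumerate(ref):
        best_j, best_iou = -1, min_iou
        for j, t in enumerate(test):
            if j in used:
                continue
            score = iou(r, t)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j < 0:
            unmatched_ref.append(i)
        else:
            used.add(best_j)
            matches.append((i, best_j, best_iou, test[best_j][4] - ref[i][4]))
    unmatched_test = [j for j in range(len(test)) if j not in used]
    return matches, unmatched_ref, unmatched_test
```

Usage would look like `confidence_deltas(cpu_boxes, myriad_boxes)`, where both lists are hypothetical names for the decoded outputs of the two runs. Unmatched boxes on either side are often as informative as the confidence deltas.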

        Given that the issue seems to be the quantization, how would you suggest I proceed? Note that although it is my own trained model, it is a YOLOv5 model, and YOLOv5 models do not (necessarily) seem to be affected by the quantization.

        • Hey erik,

          These resources sound exactly like what I was looking for! I'll dive into them and will report back on success or failure.

          Regards,
          Ewoud

        • Hey erik,

          That could very well be the cause, but how can I verify that it's the sole cause? In short: I have a trained neural network that works very well on my computer and significantly worse on the OAK-D. I could settle for a hand-wavy explanation and attribute it all to the precision conversion, but that gives me no clear way to improve my results, other than swapping out my OAK-D for something else to do the NN computation. My hope is to truly isolate and reproduce the effect, so that I can find a way to improve performance on the OAK-D directly.
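One way to reproduce the precision effect on the host, without the OAK-D, is to run the same computation twice, once in float32 and once with weights and activations rounded to float16 at every layer, and measure how far apart the results end up. The sketch below uses a small random stack of linear + ReLU layers as a stand-in for the real network (layer sizes and weights are hypothetical). Applying the idea to the real model would mean rounding its weights and intermediate outputs in the same way, for example via PyTorch forward hooks, which is not shown here.

```python
import numpy as np


def forward(x: np.ndarray, weights, dtype) -> np.ndarray:
    """Run linear+ReLU layers, casting weights and activations to dtype at every step."""
    h = x.astype(dtype)
    for w in weights:
        h = np.maximum(h @ w.astype(dtype), 0).astype(dtype)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    sizes = [64, 128, 128, 16]  # hypothetical layer widths
    weights = [rng.normal(0, 1 / np.sqrt(m), size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    x = rng.normal(size=(8, sizes[0]))

    ref = forward(x, weights, np.float32).astype(np.float64)
    half = forward(x, weights, np.float16).astype(np.float64)
    rel = np.abs(half - ref).max() / np.abs(ref).max()
    print(f"max relative deviation of the FP16 run: {rel:.2e}")
```

If a host-side FP16 simulation of the real network already reproduces the OAK-D degradation, precision is the likely cause; if it does not, the difference probably lies elsewhere (preprocessing, decoding, or conversion flags).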

          To isolate the issue, I have tried keeping everything in FP32 (are you sure about INT32?) by following the "manual" conversion steps in the Colab notebook and switching both conversions from FP16 to FP32, but there is still a discrepancy between the PyTorch output and the OAK-D output in my bus example. This suggests that something else is causing the degradation (could it be the --reverse_input_channels argument that's passed to mo? Or perhaps an argument is missing?). Where would you suggest I look next?
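On the `--reverse_input_channels` question, it can help to track which channel order actually reaches the first layer. The sketch below uses hypothetical helpers and assumes the blob reverses channels exactly when the flag was passed to mo. It shows that converting BGR to RGB on the host *and* compiling with the flag hands the network BGR again, which would silently swap red and blue relative to the PyTorch model (which expects RGB).

```python
import numpy as np


def order_seen_by_network(host_frame_order: str, blob_reverses: bool) -> str:
    """Channel order that reaches the first layer, given what the host sends."""
    assert host_frame_order in ("BGR", "RGB")
    if not blob_reverses:
        return host_frame_order
    return "RGB" if host_frame_order == "BGR" else "BGR"


def effective_input(frame_bgr: np.ndarray, host_converts_to_rgb: bool, blob_reverses: bool) -> np.ndarray:
    """HWC array the network effectively sees after host-side and blob-side swaps."""
    x = frame_bgr[..., ::-1] if host_converts_to_rgb else frame_bgr
    return x[..., ::-1] if blob_reverses else x


if __name__ == "__main__":
    for host_rgb in (False, True):
        for rev in (False, True):
            sent = "RGB" if host_rgb else "BGR"
            print(f"host sends {sent}, blob reverses={rev}: network sees {order_seen_by_network(sent, rev)}")
```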

          Regards,
          Ewoud

          • I've been training my own YOLOv5 model, which gives quite nice results when I run inference with the yolov5 repo, but the performance is significantly different (and worse) when I run it on an OAK-D. I've been trying to figure out the culprit, but I'm unsure how to debug this properly.

            My current focus is to get the same inference result in PyTorch as on the camera. Right now I'm using the pretrained yolov5s model from the yolov5 repo to make sure that my own model is not the problem. I've resized one of their example images to 448x448, my target size (see below).
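For reference, YOLOv5's `detect.py` letterboxes its input (aspect-preserving resize plus padding) rather than stretching it. A simplified sketch of that geometry, which ignores the stride-based rounding of the padding that YOLOv5 also applies, shows that an image that is already 448x448 and run with `--img 448` needs no rescaling or padding, so the PyTorch and OAK-D inputs should line up geometrically:

```python
def letterbox_geometry(h: int, w: int, new_size: int = 448):
    """Scale factor, resized (w, h) and per-side padding (dw, dh) for a square letterbox.

    Simplified: the padding is not rounded to a multiple of the model stride.
    """
    r = min(new_size / h, new_size / w)
    new_w, new_h = int(round(w * r)), int(round(h * r))
    dw, dh = (new_size - new_w) / 2, (new_size - new_h) / 2
    return r, (new_w, new_h), (dw, dh)


if __name__ == "__main__":
    print(letterbox_geometry(448, 448))   # already square at the target size
    print(letterbox_geometry(1080, 810))  # hypothetical 810x1080 (w x h) portrait photo
```

For a non-square original, the padded borders mean that a plain `cv2.resize` to 448x448 does not produce the same pixels the PyTorch pipeline sees.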

            In the yolov5 (PyTorch) repo, I can get an output image with the following command:
            python detect.py --weights yolov5s.pt --source bus2.jpg --img 448

            This gives me the result below. I've removed the label names so that the confidence values get printed.

            Then I use the code below to run the same image through the OAK-D, with the .blob generated by the Luxonis tools (specify the location of the yolov5s .blob, .json and image yourself):

            
            import json
            from pathlib import Path

            import cv2
            import depthai as dai
            import matplotlib.pyplot as plt
            import numpy as np


            # Load the model config (.json) and locate the matching .blob
            model_name = "yolov5s"
            model_dir = Path("yolov5s")
            model_config_path = model_dir / (model_name + ".json")
            with open(model_config_path) as fp:
                config = json.load(fp)

            model_blob_path = model_dir / (model_name + ".blob")
            model_config = config["nn_config"]
            labels = config["mappings"]["labels"]
            metadata = model_config["NN_specific_metadata"]
            coordinate_size = metadata["coordinates"]
            anchors = metadata["anchors"]
            anchor_masks = metadata["anchor_masks"]
            iou_threshold = metadata["iou_threshold"]
            confidence_threshold = metadata["confidence_threshold"]


            # Build the pipeline: host frame -> YOLO detection network -> host
            pipeline = dai.Pipeline()
            detection_nn = pipeline.create(dai.node.YoloDetectionNetwork)
            detection_nn.setBlobPath(str(model_blob_path))
            detection_nn.setAnchors(anchors)
            detection_nn.setAnchorMasks(anchor_masks)
            detection_nn.setConfidenceThreshold(confidence_threshold)
            detection_nn.setNumClasses(len(labels))
            detection_nn.setCoordinateSize(coordinate_size)
            detection_nn.setIouThreshold(iou_threshold)
            detection_nn.setNumInferenceThreads(2)
            detection_nn.input.setBlocking(False)
            detection_nn.input.setQueueSize(1)

            xin = pipeline.create(dai.node.XLinkIn)
            xin.setStreamName("frameIn")
            xin.out.link(detection_nn.input)

            detection_out = pipeline.create(dai.node.XLinkOut)
            detection_out.setStreamName("detectionOut")
            detection_nn.out.link(detection_out.input)

            # Create the ImgFrame message: resize to the network input size and
            # convert interleaved HWC BGR into a flat planar (CHW) buffer
            imsize = 448
            image = cv2.resize(cv2.imread("bus2.jpg"), (imsize, imsize))
            img = dai.ImgFrame()
            img.setType(dai.ImgFrame.Type.BGR888p)
            img.setData(image.transpose(2, 0, 1).flatten())
            img.setWidth(imsize)
            img.setHeight(imsize)

            with dai.Device(pipeline) as device:
                qIn = device.getInputQueue("frameIn")
                qOut = device.getOutputQueue("detectionOut", maxSize=10, blocking=False)
                qIn.send(img)
                frame_out = qOut.get()  # blocks until the detections arrive

            fig, ax = plt.subplots()
            ax.imshow(image[:, :, [2, 1, 0]])  # BGR -> RGB for matplotlib

            # Loop variable renamed so it no longer shadows the detection network node
            for det in frame_out.detections:
                # Coordinates are normalized to [0, 1]; scale them to pixels
                xpos = np.array([det.xmin, det.xmax, det.xmax, det.xmin, det.xmin]) * imsize
                ypos = np.array([det.ymax, det.ymax, det.ymin, det.ymin, det.ymax]) * imsize
                print(det.confidence)
                ax.plot(xpos, ypos)

            plt.show()
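
An `ImgFrame` of type `BGR888p` carries planar data: all blue bytes first, then all green, then all red. A numpy-only sketch of the interleaved-to-planar conversion (the helper name `to_planar` is borrowed from similar helpers in DepthAI's examples but re-implemented here), with a check that the first `H*W` bytes really are the blue plane:

```python
import numpy as np


def to_planar(frame_hwc: np.ndarray) -> np.ndarray:
    """Interleaved HxWx3 uint8 -> flat planar buffer (all ch0, then ch1, then ch2)."""
    return frame_hwc.transpose(2, 0, 1).flatten()


if __name__ == "__main__":
    h, w = 2, 3
    frame = np.zeros((h, w, 3), dtype=np.uint8)
    frame[..., 0], frame[..., 1], frame[..., 2] = 10, 20, 30  # B, G, R planes
    buf = to_planar(frame)
    print(buf[: h * w], buf[h * w : 2 * h * w], buf[2 * h * w :])
```

Sending the interleaved array (or a transposed but not flattened one) is an easy way to get plausible-looking but wrong detections, so this is worth ruling out before blaming precision.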

            This gives me roughly the same bounding boxes as the PyTorch implementation, but not exactly the same. The detection confidences are also a bit off (compare the values below with those in the image):

            0.887598991394043
            0.8628273010253906
            0.8618507385253906
            0.7907819747924805
            0.4408433437347412

            Unfortunately, the results are significantly off when I do the above steps with my own network on an image of my own. And the fact that there is still a difference with the general yolov5s model makes me believe that something in the conversion from .pt to .blob is messing it up for me.

            How would you suggest I debug this further? Is it reasonable to believe that something is happening in the conversion from .pt to .blob, and could I counteract it? If you want, I can also send you my own trained model and an example image, but I'd rather not share those publicly.
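To narrow down where the .pt to .blob chain diverges, one approach is to dump the raw (pre-NMS) output for the same preprocessed input at each stage, for example PyTorch, the exported ONNX model, the OpenVINO IR on CPU and the blob on the device, and find the first stage that departs from the reference. The sketch below covers only the comparison step. The stage names and the tolerance are hypothetical, and collecting the arrays is left to whichever runtime each stage uses.

```python
from typing import Dict, Optional

import numpy as np


def first_divergence(stages: Dict[str, np.ndarray], rtol: float = 1e-2) -> Optional[str]:
    """Compare every stage against the first one (the reference, e.g. PyTorch).

    Returns the name of the first stage whose max abs deviation exceeds
    rtol * max|reference|, or None if all stages agree within tolerance.
    """
    names = list(stages)  # dicts keep insertion order, so list stages in pipeline order
    ref = np.asarray(stages[names[0]], dtype=np.float64)
    limit = rtol * np.abs(ref).max()
    for name in names[1:]:
        out = np.asarray(stages[name], dtype=np.float64)
        if out.shape != ref.shape:
            return name  # a shape mismatch is a divergence in its own right
        dev = np.abs(out - ref).max()
        print(f"{name:>10}: max abs deviation {dev:.4g}")
        if dev > limit:
            return name
    return None
```

A call might look like `first_divergence({"pytorch": pt_out, "ir_cpu": cpu_out, "myriad": vpu_out})`, with hypothetical variable names. If the CPU IR already diverges, the problem is in the conversion; if only the device diverges, precision or on-device preprocessing is the more likely cause.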
