A new version of yolo has just been released, yolov10. Does luxonis plan to support this new version on their cameras?
EwoudPool
- 5 days ago
Hey erik,
I'm not really sure; I don't know if I can force IE to run on the CPU with FP16. However, I'm using a model created with mo with the flag --data_type FP16, so I assume the CPU runs it with FP16?
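(For reference, a conversion of the kind described above would look roughly like the line below. This is only a sketch: the input model name is illustrative, and of the flags only --data_type FP16, and --reverse_input_channels mentioned elsewhere in this thread, come from these posts.)
mo --input_model yolov5s.onnx --data_type FP16 --reverse_input_channels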
Hey erik,
Do I then understand correctly that there is a difference between running a model at FP32@CPU and FP16@CPU, as well as a difference between running a model at FP16@CPU and FP16@VPU?
Hey erik,
Thank you for the two links you shared, they have clarified quite a bit! Using my own trained model, I have followed some form of the steps outlined in https://docs.luxonis.com/en/latest/pages/tutorials/deploying-custom-model/#deploying-custom-models, and verified the scale and mean to be correct. The effect of the model optimizer converting to FP16 (i.e. running mo with or without --data_type FP16) was also minimal.
The last step on that page, https://docs.luxonis.com/en/latest/pages/tutorials/deploying-custom-model/#testing-accuracy-degradation-due-to-fp16-quantization, seems to be the culprit. Changing
exec_net = ie.load_network(network=net, device_name='CPU')
to
exec_net = ie.load_network(network=net, device_name='MYRIAD')
significantly changes the output. Comparing the confidence of the bounding boxes on CPU vs. MYRIAD, I see some bounding boxes with a confidence either 0.3 lower or 0.6 higher.
Given that the issue seems to be the quantization, how would you suggest I proceed? Note that while it is my own trained model, it is a yolov5 model, which does not (necessarily) seem to be affected by the quantization.
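(A minimal sketch of this kind of CPU-vs-MYRIAD comparison, using the legacy OpenVINO IECore API from the linked tutorial; the IR file names, the preprocessing, and the raw-output comparison are illustrative assumptions rather than the exact script used above.)

from openvino.inference_engine import IECore
import numpy as np
import cv2

ie = IECore()
# read the FP16 IR produced by mo (file names assumed for illustration)
net = ie.read_network(model="yolov5s.xml", weights="yolov5s.bin")
input_name = next(iter(net.input_info))
_, _, h, w = net.input_info[input_name].input_data.shape

# identical preprocessed image for both devices
image = cv2.imread("bus2.jpg")
blob = cv2.resize(image, (w, h)).transpose(2, 0, 1)[np.newaxis, ...]

outputs = {}
for device in ("CPU", "MYRIAD"):
    exec_net = ie.load_network(network=net, device_name=device)
    outputs[device] = exec_net.infer(inputs={input_name: blob})

# compare the raw output tensors before any decoding/NMS
for name in outputs["CPU"]:
    diff = np.abs(outputs["CPU"][name].astype(np.float32)
                  - outputs["MYRIAD"][name].astype(np.float32))
    print(name, "max abs diff:", diff.max(), "mean abs diff:", diff.mean())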
Hey erik,
These resources sound exactly like what I was looking for! I'll dive into them and will report back on success or failure.
Regards,
Ewoud
Hey erik,
That could very well be the cause, but how can I verify that that's the sole cause? The condensed issue is, I have a trained neural network that works very well on my computer and significantly worse on the oak-d. I could use a hand-wavy explanation and attribute it all to the precision conversion, but that gives me no clear way to improve my results, other than swapping out my oak-d for something else to do the NN computation. My hope is that I can truly isolate and reproduce the effect, so that I can hopefully find a way to improve my performance on the oak-d directly.
To isolate the issue, I have tried keeping everything FP32 (are you sure about INT32?) by following the "manual" conversion steps in the colab notebook and switching the two conversions from FP16 to FP32, but I still have a discrepancy between the pytorch output and the oak-d output in my bus example. This could point to something else causing the degrading effect (could it be the --reverse_input_channels argument that's passed on to mo? Or perhaps an argument is missing?). Where would you suggest I look next?
Regards,
Ewoud
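(One way to narrow this down is sketched below: feed the exact same tensor to the ONNX export and to the FP32 IR on the CPU, and compare the raw outputs before any decoding. If they already diverge here, the FP16 quantization is not the (only) cause. The file names, the single-output assumption, and the use of onnxruntime as the reference are illustrative assumptions, not details from this thread.)

import numpy as np
import onnxruntime as ort
from openvino.inference_engine import IECore

# one fixed NCHW input tensor, shared by both runtimes
x = np.random.rand(1, 3, 448, 448).astype(np.float32)

# reference: the ONNX export (file name assumed)
sess = ort.InferenceSession("yolov5s.onnx")
onnx_out = sess.run(None, {sess.get_inputs()[0].name: x})[0]

# candidate: the FP32 IR produced by mo (file names assumed)
# note: if the IR was built with --reverse_input_channels, flip the channel
# axis of x for one of the two runs before comparing
ie = IECore()
net = ie.read_network(model="yolov5s_fp32.xml", weights="yolov5s_fp32.bin")
input_name = next(iter(net.input_info))
exec_net = ie.load_network(network=net, device_name="CPU")
ir_out = next(iter(exec_net.infer(inputs={input_name: x}).values()))

print("max abs diff:", np.abs(onnx_out - ir_out).max())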
I've been training my own yolov5 model, which shows quite a nice result when I run inference using the yolov5 repo, but the performance is significantly different (and worse) when I run it on an oak-D. I've been trying to figure out what the culprit is, but I am unsure how I could properly debug this.
My current focus is to get the same result from inference in pytorch as on the camera, and right now I'm using the yolov5s pretrained model from the yolov5 repo to make sure that my own model is not the problem. I've resized one of their example images to 448x448, my target size, see below.
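(For reference, that resize step can be done with OpenCV as sketched below; the name of the original yolov5 sample image is an assumption here.)

import cv2
# resize the yolov5 sample image to the 448x448 target size
img = cv2.imread("bus.jpg")
cv2.imwrite("bus2.jpg", cv2.resize(img, (448, 448)))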
In the pytorch repo, I can get an output image with the following command:
python detect.py --weights yolov5s.pt --source bus2.jpg --img 448
Which gives me the result below. I've removed the label name so I can get those confidence values printed out.
Then, I use the code below to run the same image through the oak-d, with the .blob generated by the luxonis tool. (specify location of yolov5s .blob, .json and image yourself)
from pathlib import Path
import json
import depthai as dai
import matplotlib.pyplot as plt
import numpy as np
import cv2

# load model json and blob
model_name = "yolov5s"
model_dir = Path("yolov5s")
model_config_path = model_dir / (model_name + '.json')
with open(model_config_path) as fp:
    config = json.load(fp)
model_blob_path = model_dir / (model_name + '.blob')

model_config = config['nn_config']
labels = config['mappings']['labels']
metadata = model_config['NN_specific_metadata']
coordinate_size = metadata['coordinates']
anchors = metadata['anchors']
anchor_masks = metadata['anchor_masks']
iou_threshold = metadata['iou_threshold']
confidence_threshold = metadata['confidence_threshold']

# build pipeline: XLinkIn -> YoloDetectionNetwork -> XLinkOut
pipeline = dai.Pipeline()
detection = pipeline.create(dai.node.YoloDetectionNetwork)
detection.setBlobPath(model_blob_path)
detection.setAnchors(anchors)
detection.setAnchorMasks(anchor_masks)
detection.setConfidenceThreshold(confidence_threshold)
detection.setNumClasses(len(labels))
detection.setCoordinateSize(coordinate_size)
detection.setIouThreshold(iou_threshold)
detection.setNumInferenceThreads(2)
detection.input.setBlocking(False)
detection.input.setQueueSize(1)

xin = pipeline.create(dai.node.XLinkIn)
xin.setStreamName("frameIn")
xin.out.link(detection.input)

detection_out = pipeline.create(dai.node.XLinkOut)
detection_out.setStreamName("detectionOut")
detection.out.link(detection_out.input)

device = dai.Device(pipeline)
qIn = device.getInputQueue("frameIn")
qOut = device.getOutputQueue("detectionOut", maxSize=10, blocking=False)

# create the ImgFrame message (planar CHW data, as the NN node expects)
image = cv2.imread("bus2.jpg")
imsize = 448
img = dai.ImgFrame()
img.setData(image.transpose(2, 0, 1))
img.setWidth(imsize)
img.setHeight(imsize)
qIn.send(img)

frame_out = qOut.get()

# plot the detections on top of the input image
fig, ax = plt.subplots()
ax.imshow(image[:, :, [2, 1, 0]])
for detection in frame_out.detections:
    xmin = detection.xmin
    xmax = detection.xmax
    ymin = detection.ymin
    ymax = detection.ymax
    xpos = np.array([xmin, xmax, xmax, xmin, xmin]) * imsize
    ypos = np.array([ymax, ymax, ymin, ymin, ymax]) * imsize
    print(detection.confidence)
    ax.plot(xpos, ypos)
plt.show()
Which gives me roughly the same bounding boxes, but not exactly the same as the pytorch implementation. The detection confidences are also a bit off (compare below with those in the image):
0.887598991394043
0.8628273010253906
0.8618507385253906
0.7907819747924805
0.4408433437347412
Unfortunately for me, the results are significantly off when I do the above steps with my own network, on an image of my own. And the fact that there is still a difference when I do it with the general yolov5 model makes me believe that something in the conversion from .pt to .blob is messing it up for me.
How would you suggest I further debug this? Is it reasonable to believe something is happening in the conversion from .pt to .blob, and could I counteract it? If you want I can also send you my own trained model and example image, but I'd rather not share that publicly.
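(To make the comparison above quantitative rather than visual, a small helper along the following lines could match detections between the two runs and report the confidence drift. The box format, the greedy IoU matching, and the helper itself are illustrative, not part of the original post.)

import numpy as np

def iou(a, b):
    # boxes as [xmin, ymin, xmax, ymax] in normalized coordinates
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def compare(pytorch_dets, oak_dets, iou_thr=0.5):
    # each detection is a (box, confidence) pair; match greedily by IoU
    for box_pt, conf_pt in pytorch_dets:
        matches = [(iou(box_pt, box_oak), conf_oak) for box_oak, conf_oak in oak_dets]
        best_iou, conf_oak = max(matches, key=lambda m: m[0], default=(0.0, None))
        if best_iou >= iou_thr:
            print(f"IoU {best_iou:.2f}: pytorch {conf_pt:.3f} vs oak {conf_oak:.3f} "
                  f"(diff {conf_oak - conf_pt:+.3f})")
        else:
            print(f"pytorch box with conf {conf_pt:.3f} has no match on the oak side")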