• DepthAI-v2
  • Compatibility of YoloSpatialDetectionNetwork with Yolov8n Models

I'm currently exploring the use of YoloSpatialDetectionNetwork, which I understand supports up to Tiny-Yolo-v4 for on-device decoding of detections and spatial coordinates together. I'm keen on implementing a Yolov8n model, trained with the Ultralytics framework, for higher accuracy and speed.

However, I've encountered an issue where the Yolov8n model's blob file seems incompatible with the YoloSpatialDetectionNetwork node, as direct integration does not function as expected. I was wondering if there are any plans to update the YoloSpatialDetectionNetwork node to support the more advanced Yolov8n models?

If not, has anyone in the community successfully integrated Yolov8n with the node or found a workaround for this compatibility issue? I was able to run the Yolov8n blob file manually with the NeuralNetwork node, but I'm unable to do on-device decoding and fetching of spatial coordinates, which slows down the pipeline.

For the time being, I plan to use the less accurate Tiny-Yolov4 model, but ideally, I'd like to utilize the capabilities of Yolov8 models for improved performance and accuracy. I appreciate any insights or suggestions. Thanks in advance for your help!

    Hi suhailnajeeb
    The V8 detection models are supported by the YoloDetectionNetwork.

    suhailnajeeb but I'm unable to do on-device decoding and fetching of spatial co-ordinates which slows down the pipeline.

    Have you tried only using the YoloDetectionNetwork? How did you convert the model? Make sure you use https://tools.luxonis.com/.
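    A minimal wiring along these lines, as a sketch (the blob path, preview size, and thresholds below are placeholders and must match the model you actually export from tools.luxonis.com; running it additionally requires a connected OAK device):

    ```python
    import depthai as dai

    # Sketch: on-device YOLO decoding with YoloDetectionNetwork (DepthAI v2).
    pipeline = dai.Pipeline()

    cam = pipeline.create(dai.node.ColorCamera)
    cam.setPreviewSize(416, 416)      # must equal the model's input resolution
    cam.setInterleaved(False)

    nn = pipeline.create(dai.node.YoloDetectionNetwork)
    nn.setBlobPath("yolov8n.blob")    # placeholder path to the converted blob
    nn.setNumClasses(80)              # set to your model's class count
    nn.setCoordinateSize(4)
    nn.setConfidenceThreshold(0.5)
    nn.setIouThreshold(0.5)
    cam.preview.link(nn.input)

    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("detections")
    nn.out.link(xout.input)
    ```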

    Thanks
    Jaka


    Hi @jakaskerl

    Thanks for sharing the link to the blob converter. I was previously using the blobconverter Python module, which might have been causing some issues. After a successful conversion with the blob converter link you provided, I am able to run inference with my custom YOLO detector. However, one issue persists.

    My custom object detector uses only 2 classes and was trained and configured with a custom dataset/training pipeline, where I used autodistill for both the dataset and the training. After conversion with the given blob converter, the bounding boxes are decoded properly; however, the detection labels and confidence scores are erroneous. Here is part of my modified detection code and a sample output for reference:

    Changes for the class labels:

    # Custom label texts / class maps
    labelMap = [
        "yellow cone", "blue cone"
    ]

    Custom displayFrame function:

        def displayFrame(name, frame):
            color = (255, 0, 0)
            for detection in detections:
                bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                # cv2.putText(frame, labelMap[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)                   # Commented out: raises an error (detection.label exceeds labelMap)
                # cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)       # Commented out: raises an error for the same reason
                cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                print(f"Detection Label: {detection.label}, Confidence Threshold: {detection.confidence*100} %")
    
            # Show the frame
            cv2.imshow(name, frame)

    Full code here
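    As an aside, the commented-out cv2.putText calls above fail because detection.label can exceed the range of labelMap when decoding is misconfigured. A defensive lookup avoids the crash while still surfacing the bogus indices (a sketch, using the labelMap defined above):

    ```python
    labelMap = ["yellow cone", "blue cone"]

    def label_text(label, label_map=labelMap):
        # Out-of-range indices (a symptom of wrong decoder settings)
        # fall back to the raw index instead of raising IndexError.
        if 0 <= label < len(label_map):
            return label_map[label]
        return str(label)

    # label_text(0)  -> "yellow cone"
    # label_text(77) -> "77" (no crash)
    ```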

    Output Preview:

    Erroneous output:

    (env) ai4r@ai4r:~/cone-localizer$ python rgb_yolo_simple.py
    Detection Label: 2, Confidence Threshold: 552.734375 %
    Detection Label: 2, Confidence Threshold: 543.75 %
    qt.qpa.plugin: Could not find the Qt platform plugin "wayland" in "/home/ai4r/cone-localizer/env/lib/python3.11/site-packages/cv2/qt/plugins"
    Detection Label: 2, Confidence Threshold: 596.09375 %
    Detection Label: 2, Confidence Threshold: 533.203125 %
    Detection Label: 2, Confidence Threshold: 605.078125 %
    Detection Label: 2, Confidence Threshold: 519.921875 %
    Detection Label: 2, Confidence Threshold: 603.90625 %
    Detection Label: 2, Confidence Threshold: 521.484375 %
    Detection Label: 2, Confidence Threshold: 607.03125 %
    Detection Label: 2, Confidence Threshold: 528.90625 %
    Detection Label: 77, Confidence Threshold: 181100.0 %
    Detection Label: 78, Confidence Threshold: 4162.5 %
    Detection Label: 52, Confidence Threshold: 3366400.0 %
    Detection Label: 34, Confidence Threshold: 5008000.0 %
    Detection Label: 57, Confidence Threshold: 13362.5 %
    Detection Label: 40, Confidence Threshold: 816.40625 %
    Detection Label: 15, Confidence Threshold: 777.734375 %
    Detection Label: 32, Confidence Threshold: 892.1875 %
    Detection Label: 7, Confidence Threshold: 821.09375 %
    Detection Label: 32, Confidence Threshold: 890.625 %
    Detection Label: 32, Confidence Threshold: 849.21875 %
    Detection Label: 32, Confidence Threshold: 897.65625 %
    Detection Label: 32, Confidence Threshold: 855.46875 %
    Detection Label: 68, Confidence Threshold: 1005.46875 %
    Detection Label: 32, Confidence Threshold: 850.78125 %
    Detection Label: 32, Confidence Threshold: 1046.875 %
    Detection Label: 32, Confidence Threshold: 1030.46875 %
    Detection Label: 50, Confidence Threshold: 5017600.0 %
    Detection Label: 32, Confidence Threshold: 973.4375 %
    Detection Label: 68, Confidence Threshold: 5840000.0 %

    I was expecting the detection label to be 0 or 1 and the confidence to be in a more reasonable range. The model was detecting properly when I used the NeuralNetwork node with the blob converted by blobconverter, plus decoding on the host computer.

    However, on-device decoding seems to cause these issues when I use the blob from the Luxonis blob converter.

    I have also tried training a yolov8n with the official Ultralytics framework, but that yielded exactly the same results.

    Can you confirm whether the Luxonis blob converter supports custom models with a different number of classes? Do you have any clue why this might be happening, and/or is there a workaround?

    Thanks in advance!

    Hi @suhailnajeeb

    detectionNetwork.setNumClasses(80)
    detectionNetwork.setCoordinateSize(4)
    detectionNetwork.setAnchors([10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319])
    detectionNetwork.setAnchorMasks({"side26": [1, 2, 3], "side13": [3, 4, 5]})

    You don't need the bottom three for v8, and make sure to set numClasses to 2.
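    In other words, the corrected settings for the 2-class model above would reduce to something like this (a sketch; `detectionNetwork` is the YoloDetectionNetwork node from your pipeline):

    ```python
    detectionNetwork.setNumClasses(2)  # must match the custom model's class count
    # setCoordinateSize / setAnchors / setAnchorMasks can be dropped for YOLOv8.
    ```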

    Thanks,
    Jaka