• DepthAI-v2
  • How to use YOLOv5n-seg for object segmentation with OAK1/OAK-D?

Hi, I successfully converted a COCO checkpoint of YOLOv5n-seg using http://tools.luxonis.com/. Then I wrote code to run the converted .blob file on the OAK, following this tutorial/example. To parse the YOLO model, I am using the NeuralNetwork node from DepthAI. The code seems fine, but I can't obtain the segmentation masks from the model's inference.

After running getAllLayerNames(), I get three layers named "output1_yolov5", "output2_yolov5" and "output3_yolov5". When I call getLayerFp16("output3_yolov5"), for instance, I obtain a list of 140400 elements, and I have no clue what they mean.
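(For what it's worth, those sizes do have a plausible decomposition. Assuming the standard 640×640 COCO export of YOLOv5-seg, each head predicts 3 anchors per grid cell, and each prediction carries 4 box values + 1 objectness + 80 class scores + 32 mask coefficients = 117 values. A quick sanity-check sketch, with a zero array standing in for the real getLayerFp16() data:)

```python
import numpy as np

# Assumption: 640x640 YOLOv5n-seg COCO export, so each flat output layer
# holds 3 anchors * grid * grid * (5 + 80 + 32) values.
VALUES_PER_ANCHOR = 5 + 80 + 32  # box+objectness, COCO classes, mask coeffs

for grid, size in [(80, 2246400), (40, 561600), (20, 140400)]:
    assert 3 * grid * grid * VALUES_PER_ANCHOR == size

# So the flat 140400-element layer can be viewed as a (3, 20, 20, 117) tensor:
flat = np.zeros(140400, dtype=np.float16)  # stand-in for getLayerFp16(...)
head = flat.reshape(3, 20, 20, VALUES_PER_ANCHOR)
print(head.shape)  # (3, 20, 20, 117)
```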

So my question is: how can we use a YOLOv5-seg model on an OAK device and get the masks out of it? Is this even possible with YOLO models? I saw that someone did this with DeepLabV3+, but I'm really interested in using YOLOv5/YOLOv8 models.

Thank you in advance.

    6 days later

Hi @jakaskerl , I've tried to follow your suggestion but got nothing new. The three output layers I mentioned before have the shapes (2246400,), (561600,) and (140400,), respectively. When I try to reshape them as shown in the code you mentioned:
    detections = output.reshape((num_detections, num_values_per_detection))

    I got the error "ValueError: cannot reshape array of size 2246400 into shape (26428,85)".

    I believe the reason behind this issue is that the example you provided is for object detection, while what I'm trying to do is object segmentation with YOLOv5-seg. In fact, this hypothesis was confirmed when I tested the same code with a YOLOv5 object detection model and it worked.
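(That size mismatch is consistent with the -seg head layout: a detection-only export has 85 values per prediction, while the -seg export appends 32 mask coefficients, giving 117. A hedged sketch, with zero arrays standing in for the real layer data, showing why `reshape(-1, 85)` fails but `reshape(-1, 117)` works:)

```python
import numpy as np

VALUES_PER_DET = 5 + 80              # detection-only export: box+obj + classes
VALUES_PER_SEG = VALUES_PER_DET + 32 # -seg export adds 32 mask coefficients

# Stand-ins for the three flattened output layers reported above
outputs = [np.zeros(n, dtype=np.float16) for n in (2246400, 561600, 140400)]

# 2246400 / 85 is not an integer (hence the ValueError), but / 117 is:
preds = np.concatenate([o.reshape(-1, VALUES_PER_SEG) for o in outputs])
print(preds.shape)  # (25200, 117): 19200 + 4800 + 1200 anchor predictions

coeffs = preds[:, VALUES_PER_DET:]  # last 32 columns: per-box mask coefficients
```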

    Do you have any idea how object segmentation with YOLOv5-seg can be integrated and run successfully in OAK?

    Thanks for the help.

    Hi @daniqsilva
    I'm not really too familiar with decoding but try this (looks sensible to me):

    import numpy as np
    import cv2
    
    def upsample_and_combine(outputs, img_width, img_height, num_classes):
        # Upsample each output to the original image size
        upsampled_outputs = [cv2.resize(output, (img_width, img_height), interpolation=cv2.INTER_NEAREST) for output in outputs]
        
        # Combine the upsampled outputs
        # This example simply averages the outputs, but other techniques could be applied
        combined_output = np.mean(upsampled_outputs, axis=0)
        
        return combined_output
    
    def decode_segmentation(output, img_width, img_height, num_classes):
        class_predictions = np.argmax(output, axis=-1)
        segmentation_map = np.zeros((output.shape[0], output.shape[1], 3), dtype=np.uint8)
        colors = np.random.randint(0, 255, size=(num_classes, 3), dtype=np.uint8)
        for class_id in range(num_classes):
            segmentation_map[class_predictions == class_id] = colors[class_id]
        segmentation_map_resized = cv2.resize(segmentation_map, (img_width, img_height), interpolation=cv2.INTER_NEAREST)
        return segmentation_map_resized
    
    # Assuming outputs is a list of the three outputs from YOLOv5-seg, each being a (H, W, C) tensor
    outputs = [output1, output2, output3]  # Placeholder for actual model outputs
    num_classes = 20  # Example number of classes
    img_width, img_height = 1280, 720  # Example dimensions
    
    # Process the outputs
    combined_output = upsample_and_combine(outputs, img_width, img_height, num_classes)
    segmentation_map = decode_segmentation(combined_output, img_width, img_height, num_classes)
    
    # segmentation_map now holds the final segmentation result

    Thanks,
    Jaka

    Hi @jakaskerl

    The output generated by the code sample you provided (segmentation_map) holds an OpenCV Mat of a single color (a mask covering the entire frame), instead of the individual segmentation masks around the objects in the scene.

    5 days later

    Hi @daniqsilva,

    apologies for the delay in our response. Currently, our tools only support the conversion of object detection models. We are actively working on supporting instance-segmentation YOLO models as well, but we don't have any ETA yet. Could you please elaborate on how you converted the model using our tools? Did you edit the model in some way?

    Best,
    Jan

    Hi @JanCuhel,

    I converted the model using http://tools.luxonis.com. I didn't edit the model in any way. I assumed that segmentation was supported at least for YOLOv5 since, in your conversion tool's "Yolo version" list, YOLOv5 appears without the "detection only" label that YOLOv7 and YOLOv8 have.

    Best regards,

    Daniel

    Hi @daniqsilva,

    I see; it's interesting that the conversion didn't fail.

    I apologize, but as I said, currently http://tools.luxonis.com doesn't support the conversion of segmentation models for any Yolo version. We are actively working on supporting it, and as soon as we add it, we'll let you know. Until then, unfortunately, you will have to do part of the conversion manually with the help of our blobconverter, or use a different segmentation model.

    I am very sorry for the unclear description of the conversion options and that I can't help you more now.

    Best,
    Jan

    Hi @JanCuhel,

    Thank you for your feedback!

    I will be waiting for an update on this matter. In the meantime, I will try other approaches.

    Best regards,

    Daniel

    a month later

    Hi @daniqsilva @JanCuhel ,

    Did you find an alternative approach to run instance segmentation models on OAK devices? Is there any other model architecture supported by OAK that can be loaded with a model trained on custom data? I also want to get the depth (Z-axis) value along with the center (X, Y) points and the segmentation mask array. When might on-device support for YOLO segmentation models be added, @JanCuhel?
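(On the depth part of the question: once a boolean instance mask and an aligned depth frame are available on the host, the (X, Y, Z) values can be extracted with plain numpy. A minimal sketch with toy data, assuming the mask has already been resized to the depth frame's resolution; the hypothetical values here are for illustration only:)

```python
import numpy as np

# Toy aligned depth frame (in mm) and one boolean instance mask
depth = np.full((400, 640), 1500, dtype=np.uint16)
mask = np.zeros((400, 640), dtype=bool)
mask[100:200, 200:300] = True
depth[mask] = 820  # pretend the object sits ~0.82 m away

ys, xs = np.nonzero(mask)
cx, cy = xs.mean(), ys.mean()  # mask centroid (X, Y) in pixels
z = np.median(depth[mask])     # median depth inside the mask is robust to holes
print(cx, cy, z)  # 249.5 149.5 820.0
```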

    Hi @u111s ,

    So far I have made no progress running inference with YOLO-based instance segmentation models on OAK devices. The only approach that I've found was this one.

      Hi!

      Maybe this will help:

      "The only approach that I've found was this one." >>> That one uses TensorFlow 1.15, and versions below 2 are no longer supported in Google Colab. With the latest TF version (2.16) it is not possible to convert with OpenVINO and blobconverter. It's an endless labyrinth.

      PyTorch is the only option for MobileNet and DepthAI.

      The ideal is YOLOv5: it works better than MobileNet, and it is quick to train and convert with tools.luxonis.com.

        EmilioMachado

        Thank you for the video suggestion!

        Yeah, YOLOv5 or YOLOv8 are preferable for the reasons you mentioned, but as @JanCuhel said earlier in this thread, so far tools.luxonis.com is unable to convert YOLO segmentation models for OAK devices.

        7 months later