• DepthAI-v2
  • How to use YOLOv5n-seg for object segmentation with OAK1/OAK-D?

Hi, I successfully converted a COCO checkpoint of YOLOv5n-seg using http://tools.luxonis.com/. Then I wrote code to run the converted .blob file on the OAK, following this tutorial/example. To parse the YOLO model, I am using the NeuralNetwork node from DepthAI. The code seems fine, but I can't obtain the segmentation masks from the model's inference.

After running getAllLayerNames(), I get three layers named "output1_yolov5", "output2_yolov5" and "output3_yolov5". When I call getLayerFp16("output3_yolov5"), for instance, I obtain a list of 140400 elements, and I have no clue what they mean.
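(For what it's worth, those sizes do have a plausible decomposition. Assuming the standard 640×640 COCO export of YOLOv5-seg, each head predicts 3 anchors per grid cell, and each prediction carries 4 box values + 1 objectness + 80 class scores + 32 mask coefficients = 117 values. A quick sanity-check sketch, with a zero array standing in for the real getLayerFp16() data:)

```python
import numpy as np

# Assumption: 640x640 YOLOv5n-seg COCO export, so each flat output layer
# holds 3 anchors * grid * grid * (5 + 80 + 32) values.
VALUES_PER_ANCHOR = 5 + 80 + 32  # box+objectness, COCO classes, mask coeffs

for grid, size in [(80, 2246400), (40, 561600), (20, 140400)]:
    assert 3 * grid * grid * VALUES_PER_ANCHOR == size

# So the flat 140400-element layer can be viewed as a (3, 20, 20, 117) tensor:
flat = np.zeros(140400, dtype=np.float16)  # stand-in for getLayerFp16(...)
head = flat.reshape(3, 20, 20, VALUES_PER_ANCHOR)
print(head.shape)  # (3, 20, 20, 117)
```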

So my question is: how can we use a YOLOv5-seg model on an OAK device and get the masks out of it? Is this even possible with YOLO models? I saw that someone did this with DeepLabV3+, but I'm really interested in using YOLOv5/YOLOv8 models.

Thank you in advance.

    6 days later

Hi @jakaskerl , I've tried to follow your suggestion but got nothing new. The three output layers I mentioned before have the shapes (2246400,), (561600,) and (140400,), respectively. When I try to reshape them as shown in the code you mentioned:
    detections = output.reshape((num_detections, num_values_per_detection))

    I got the error "ValueError: cannot reshape array of size 2246400 into shape (26428,85)".

    I believe the reason behind this issue is that the example you provided is for object detection, while what I'm trying to do is object segmentation with YOLOv5-seg. In fact, this hypothesis was confirmed when I tested the same code with a YOLOv5 object detection model and it worked.
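(That size mismatch is consistent with the -seg head layout: a detection-only export has 85 values per prediction, while the -seg export appends 32 mask coefficients, giving 117. A hedged sketch, with zero arrays standing in for the real layer data, showing why `reshape(-1, 85)` fails but `reshape(-1, 117)` works:)

```python
import numpy as np

VALUES_PER_DET = 5 + 80              # detection-only export: box+obj + classes
VALUES_PER_SEG = VALUES_PER_DET + 32 # -seg export adds 32 mask coefficients

# Stand-ins for the three flattened output layers reported above
outputs = [np.zeros(n, dtype=np.float16) for n in (2246400, 561600, 140400)]

# 2246400 / 85 is not an integer (hence the ValueError), but / 117 is:
preds = np.concatenate([o.reshape(-1, VALUES_PER_SEG) for o in outputs])
print(preds.shape)  # (25200, 117): 19200 + 4800 + 1200 anchor predictions

coeffs = preds[:, VALUES_PER_DET:]  # last 32 columns: per-box mask coefficients
```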

    Do you have any idea how object segmentation with YOLOv5-seg can be integrated and run successfully in OAK?

    Thanks for the help.

    Hi @daniqsilva
    I'm not really too familiar with decoding but try this (looks sensible to me):

    import numpy as np
    import cv2
    
    def upsample_and_combine(outputs, img_width, img_height, num_classes):
        # Upsample each output to the original image size
        upsampled_outputs = [cv2.resize(output, (img_width, img_height), interpolation=cv2.INTER_NEAREST) for output in outputs]
        
        # Combine the upsampled outputs
        # This example simply averages the outputs, but other techniques could be applied
        combined_output = np.mean(upsampled_outputs, axis=0)
        
        return combined_output
    
    def decode_segmentation(output, img_width, img_height, num_classes):
        class_predictions = np.argmax(output, axis=-1)
        segmentation_map = np.zeros((output.shape[0], output.shape[1], 3), dtype=np.uint8)
        colors = np.random.randint(0, 255, size=(num_classes, 3), dtype=np.uint8)
        for class_id in range(num_classes):
            segmentation_map[class_predictions == class_id] = colors[class_id]
        segmentation_map_resized = cv2.resize(segmentation_map, (img_width, img_height), interpolation=cv2.INTER_NEAREST)
        return segmentation_map_resized
    
    # Assuming outputs is a list of the three outputs from YOLOv5-seg, each being a (H, W, C) tensor
    outputs = [output1, output2, output3]  # Placeholder for actual model outputs
    num_classes = 20  # Example number of classes
    img_width, img_height = 1280, 720  # Example dimensions
    
    # Process the outputs
    combined_output = upsample_and_combine(outputs, img_width, img_height, num_classes)
    segmentation_map = decode_segmentation(combined_output, img_width, img_height, num_classes)
    
    # segmentation_map now holds the final segmentation result

    Thanks,
    Jaka

    Hi @jakaskerl

    The output generated by the code sample you provided (segmentation_map) holds an OpenCV Mat of a single color (a mask covering the entire frame), instead of the individual segmentation masks around the objects in the scene.

    5 days later

    Hi @daniqsilva,

    apologies for the delay in our response. Currently, our tools only support the conversion of object detection models. We are actively working on supporting instance-segmentation YOLO models as well, but we don't have any ETA yet. Could you please elaborate on how you converted the model using our tools? Did you edit the model in some way?

    Best,
    Jan

    Hi @JanCuhel,

    I converted the model using http://tools.luxonis.com. I didn't edit the model in any way. I assumed that segmentation was supported at least for YOLOv5 since, in your conversion tool's "Yolo version" list, YOLOv5 appears without the "detection only" label that YOLOv7 and YOLOv8 have.

    Best regards,

    Daniel

    Hi @daniqsilva,

    I see; it's interesting that the conversion didn't fail.

    I apologize, but as I said, currently http://tools.luxonis.com doesn't support the conversion of segmentation models for any Yolo version. We are actively working on supporting it, and as soon as we add it, we'll let you know. Until then, unfortunately, you will have to do part of the conversion manually with the help of our blobconverter, or use a different segmentation model.

    I am very sorry for the unclear description of the conversion options and that I can't help you more now.

    Best,
    Jan

    Hi @JanCuhel,

    Thank you for your feedback!

    I will be waiting for an update on this matter. In the meantime, I will try other approaches.

    Best regards,

    Daniel

    a month later

    Hi @daniqsilva @JanCuhel ,

    Did you find an alternative approach to run instance segmentation models on OAK devices? Is there any other model architecture supported by OAK that can be loaded with a model trained on custom data? I also want to get the depth (Z-axis) value along with the center (X, Y) points and the segmentation mask array. When might on-device support for YOLO segmentation models be added, @JanCuhel?
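(On the depth part of the question: once a boolean instance mask and an aligned depth frame are available on the host, the (X, Y, Z) values can be extracted with plain numpy. A minimal sketch with toy data, assuming the mask has already been resized to the depth frame's resolution; the hypothetical values here are for illustration only:)

```python
import numpy as np

# Toy aligned depth frame (in mm) and one boolean instance mask
depth = np.full((400, 640), 1500, dtype=np.uint16)
mask = np.zeros((400, 640), dtype=bool)
mask[100:200, 200:300] = True
depth[mask] = 820  # pretend the object sits ~0.82 m away

ys, xs = np.nonzero(mask)
cx, cy = xs.mean(), ys.mean()  # mask centroid (X, Y) in pixels
z = np.median(depth[mask])     # median depth inside the mask is robust to holes
print(cx, cy, z)  # 249.5 149.5 820.0
```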

    Hi @u111s ,

    So far I have made no progress running inference with YOLO-based instance segmentation models on OAK devices. The only approach that I've found was this one.

      Hi!

      Maybe this will help:

      "The only approach that I've found was this one." >>> That one uses TensorFlow 1.15, and versions below 2 are no longer supported in Google Colab. With the latest TF version (2.16) it is not possible to convert with OpenVINO and blobconverter. It's an endless labyrinth.

      PyTorch is the only option for MobileNet and DepthAI.

      The ideal is YOLOv5: it works better than MobileNet, and it is quick to train and convert with tools.luxonis.com.

        EmilioMachado

        Thank you for the video suggestion!

        Yeah, YOLOv5 or YOLOv8 are preferable for the reasons you mentioned, but as @JanCuhel said earlier in this thread, so far tools.luxonis.com is unable to convert YOLO segmentation models for OAK devices.

        7 months later