• DepthAI
  • Support for yolov8 instance segmentation models along with depth information.

Greetings @erik ,

I have an OAK-D Pro PoE device. I want to run my YOLOv8 instance segmentation model trained on my custom dataset, and at the same time get the depth (distance) of each detected object.

I followed the steps from the discussion (link) and was able to run the segmentation model and get output, but the inference is slow. Also, there doesn't seem to be a spatial detection node to get the depth.

I would like to know whether depthai-sdk supports any other instance segmentation model that also returns depth.

Note: I would like to run inference on a model trained with a custom dataset.

  • @jakaskerl @pedro-UCA

    Thanks for your support. I have successfully merged all your code and now I can retrieve masks and also get depth in the specified region.

    I have put the working code in the following repository. Feel free to point others who need this to the repo. Thanks!

    tirandazi/depthai-yolov8-segment

    Hi @u111s

    u111s I followed the steps from the discussion (link) and was able to run the segmentation model and get output, but the inference is slow.

    That is because the decoding (segmentation) part runs on the host computer and is notoriously expensive, as opposed to standard detection decoding.

    As Erik said in the post, the idea is to combine depth and segmentation on the host after decoding is done. If the depth is aligned to color, you should have no trouble overlaying the segmentation results (image) over the depth image. It should also not impact performance much, since the depth algorithms run on-device.
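
    For example, here is a minimal sketch of that combination step, assuming you already have a boolean per-object mask from the host-side decoder and a depth frame aligned to color (the names mask and depth_frame are placeholders for whatever your script produces):

    import cv2
    import numpy as np

    def depth_inside_mask(mask, depth_frame):
        """Median depth (mm) of the pixels covered by one segmentation mask."""
        # Resize the mask to the depth resolution (nearest-neighbour keeps it binary)
        mask_resized = cv2.resize(
            mask.astype(np.uint8),
            (depth_frame.shape[1], depth_frame.shape[0]),
            interpolation=cv2.INTER_NEAREST,
        )
        # Ignore invalid depth readings (0 means "no measurement")
        valid = (mask_resized > 0) & (depth_frame > 0)
        if not valid.any():
            return 0.0
        return float(np.median(depth_frame[valid]))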

    Thanks,
    Jaka

      jakaskerl

      Thanks for the response.

      Could you please provide a sample snippet to combine and retrieve segmentation with depth?

      Hi @u111s

      import cv2
      import numpy as np
      import depthai as dai
      import time
      from YOLOSeg import YOLOSeg
      
      pathYoloBlob = "./yolov8n-seg.blob"
      
      # Create OAK-D pipeline
      pipeline = dai.Pipeline()
      
      # Setup color camera
      cam_rgb = pipeline.createColorCamera()
      cam_rgb.setPreviewSize(640, 640)
      cam_rgb.setInterleaved(False)
      
      # Setup depth
      stereo = pipeline.createStereoDepth()
      left = pipeline.createMonoCamera()
      right = pipeline.createMonoCamera()
      
      left.setBoardSocket(dai.CameraBoardSocket.LEFT)
      right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
      stereo.setConfidenceThreshold(255)
      # Align depth output to the color camera so depth pixels match the RGB preview
      stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
      
      left.out.link(stereo.left)
      right.out.link(stereo.right)
      
      # Setup neural network
      nn = pipeline.createNeuralNetwork()
      nn.setBlobPath(pathYoloBlob)
      cam_rgb.preview.link(nn.input)
      
      # Setup output streams
      xout_rgb = pipeline.createXLinkOut()
      xout_rgb.setStreamName("rgb")
      cam_rgb.preview.link(xout_rgb.input)
      
      xout_nn_yolo = pipeline.createXLinkOut()
      xout_nn_yolo.setStreamName("nn_yolo")
      nn.out.link(xout_nn_yolo.input)
      
      xout_depth = pipeline.createXLinkOut()
      xout_depth.setStreamName("depth")
      stereo.depth.link(xout_depth.input)
      
      # Start application
      with dai.Device(pipeline) as device:
      
          q_rgb = device.getOutputQueue("rgb")
          q_nn_yolo = device.getOutputQueue("nn_yolo")
          q_depth = device.getOutputQueue("depth", maxSize=4, blocking=False)
      
          while True:
              in_rgb = q_rgb.tryGet()
              in_nn_yolo = q_nn_yolo.tryGet()
              in_depth = q_depth.tryGet()
      
              if in_rgb is not None:
                  frame = in_rgb.getCvFrame()
                  depth_frame = in_depth.getFrame() if in_depth is not None else None
      
                  if in_nn_yolo is not None:
                      # Assuming you have the segmented output and depth frame
                      # You can now overlay segmentation mask on the depth frame or calculate depth for segmented objects
      
                      # Placeholder for YOLOSeg processing
                      # (Your existing code to obtain combined_img and detected_objects)
                      combined_img = frame       # replace with the YOLOSeg overlay image
                      detected_objects = []      # replace with the YOLOSeg detections

                      if depth_frame is not None:
                          # Assuming the depth map and color frames are aligned
                          # (see setDepthAlign above); if the depth resolution differs
                          # from the 640x640 preview, map the coordinates first.
                          # Example: fetching depth at the center of a detected object.
                          for obj in detected_objects:  # detections obtained from YOLOSeg
                              x_center = int(obj["x_center"])
                              y_center = int(obj["y_center"])
                              depth = depth_frame[y_center, x_center]
                              print(f"Depth at center of object: {depth} mm")
      
                      cv2.imshow("Output", combined_img)
                      
                  else:
                      print("in_nn_yolo EMPTY")
      
              else:
                  print("in_rgb EMPTY")
      
              # Exit logic
              if cv2.waitKey(1) == ord('q'):
                  break
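
      If it helps with the "Placeholder for YOLOSeg processing" part above: the raw tensors can be pulled from the NNData packet and reshaped before they are handed to the host-side decoder. This is only a sketch; it assumes the blob keeps the usual Ultralytics output names (output0, output1) and the default yolov8n-seg shapes for a 640x640 input, so check getAllLayerNames() on your own blob first.

      import numpy as np

      def get_yolov8_seg_outputs(in_nn_yolo):
          """Fetch and reshape the two YOLOv8-seg output tensors from an NNData packet."""
          # Uncomment once if you are unsure how the layers are named in your blob:
          # print(in_nn_yolo.getAllLayerNames())
          output0 = np.array(in_nn_yolo.getLayerFp16("output0"), dtype=np.float32)
          output1 = np.array(in_nn_yolo.getLayerFp16("output1"), dtype=np.float32)
          # Shapes below assume the stock 80-class yolov8n-seg export at 640x640;
          # a custom 2-class model would have 4 + 2 + 32 = 38 rows instead of 116.
          output0 = output0.reshape(1, 116, 8400)     # boxes + class scores + mask coefficients
          output1 = output1.reshape(1, 32, 160, 160)  # mask prototypes
          return output0, output1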

        Hi! I am trying to run my model trained on custom data with two classes, but the segmentation is terrible compared to the original YOLOv8 output before converting the .pt file to .blob. I have followed the steps discussed in this blog and tried different image sizes.

        Any thoughts? thanks.

        @jakaskerl

        I mean, the model basically doesn't segment the objects accurately (it barely detects them). I have tried the same model, without the OAK, in .onnx format and it works correctly, so maybe the problem is when I convert it to .blob.

        However, I have followed the same steps with the stock "yolov8n-seg.pt" model and its segmentation gives me no problems.
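
        For reference, a typical .pt → .onnx → .blob path looks roughly like this (just a sketch, the exact flags may differ from the guide; best.pt and best.onnx are placeholder paths):

        from ultralytics import YOLO
        import blobconverter

        # Export the custom segmentation model to ONNX at the same input size used on the device
        model = YOLO("best.pt")
        model.export(format="onnx", imgsz=640, opset=12)

        # Convert the ONNX to a MyriadX blob (FP16) for the OAK
        blob_path = blobconverter.from_onnx(
            model="best.onnx",
            data_type="FP16",
            shaves=6,
        )
        print(blob_path)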

          DavidMeiraPliego

          We are looking to add native support for instance segmentation to DepthAI, so we will be able to take a better look at the issue then.

          In the meantime:

          1. Do you follow the same steps to create the blob, including passing exactly the same flags?
           2. If yes, there are a lot of reasons something could go wrong. The first thing I would do is take a look at the confidence thresholds in PyTorch and in the script you use with the camera. Do they match, or is one higher than the other? (See the sketch below.)
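
          For instance, something along these lines, so the threshold you use when testing the .pt model is the same value the host-side decoder uses (the YOLOSeg constructor arguments are an assumption; adjust them to whatever your camera script actually passes):

          from ultralytics import YOLO
          from YOLOSeg import YOLOSeg  # host-side decoder used in the camera script

          CONF_THRESHOLD = 0.5  # keep this identical in both places

          # PyTorch (.pt) inference for comparison
          results = YOLO("best.pt").predict("test.jpg", conf=CONF_THRESHOLD, imgsz=640)

          # Host-side decoding used with the OAK (assumed constructor signature)
          yoloseg = YOLOSeg("best.onnx", conf_thres=CONF_THRESHOLD, iou_thres=0.5)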

            Matija

            Yes, I followed the same steps to create the .blob, and the thresholds match. I've been trying to modify them in the script, but the result is the same.

            Do you have an approximate date for the implementation of instance segmentation in DepthAI?

              Have you exported it with the same input shape? Does it help if you reduce the thresholds?
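
              One quick way to double-check the exported input shape (a sketch using the onnx package; best.onnx is a placeholder path):

              import onnx

              model = onnx.load("best.onnx")
              dims = model.graph.input[0].type.tensor_type.shape.dim
              print([d.dim_value for d in dims])  # e.g. [1, 3, 640, 640]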

              DavidMeiraPliego