Hi @EmilioMachado @u111s @jakaskerl @JanCuhel I've managed to decode the segmentation outputs of YOLOv5, YOLOv8, YOLOv9 and YOLO11 on the host side using OAK devices. You can check out my PR here.
Hi @u111s
import cv2
import numpy as np
import depthai as dai
import time
from YOLOSeg import YOLOSeg

pathYoloBlob = "./yolov8n-seg.blob"

# Create OAK-D pipeline
pipeline = dai.Pipeline()

# Setup color camera
cam_rgb = pipeline.createColorCamera()
cam_rgb.setPreviewSize(640, 640)
cam_rgb.setInterleaved(False)

# Setup depth
stereo = pipeline.createStereoDepth()
left = pipeline.createMonoCamera()
right = pipeline.createMonoCamera()
left.setBoardSocket(dai.CameraBoardSocket.LEFT)
right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
stereo.setConfidenceThreshold(255)
left.out.link(stereo.left)
right.out.link(stereo.right)

# Setup neural network
nn = pipeline.createNeuralNetwork()
nn.setBlobPath(pathYoloBlob)
cam_rgb.preview.link(nn.input)

# Setup output streams
xout_rgb = pipeline.createXLinkOut()
xout_rgb.setStreamName("rgb")
cam_rgb.preview.link(xout_rgb.input)

xout_nn_yolo = pipeline.createXLinkOut()
xout_nn_yolo.setStreamName("nn_yolo")
nn.out.link(xout_nn_yolo.input)

xout_depth = pipeline.createXLinkOut()
xout_depth.setStreamName("depth")
stereo.depth.link(xout_depth.input)

# Start application
with dai.Device(pipeline) as device:
    q_rgb = device.getOutputQueue("rgb")
    q_nn_yolo = device.getOutputQueue("nn_yolo")
    q_depth = device.getOutputQueue("depth", maxSize=4, blocking=False)

    while True:
        in_rgb = q_rgb.tryGet()
        in_nn_yolo = q_nn_yolo.tryGet()
        in_depth = q_depth.tryGet()

        if in_rgb is not None:
            frame = in_rgb.getCvFrame()
            depth_frame = in_depth.getFrame() if in_depth is not None else None

            if in_nn_yolo is not None:
                # Assuming you have the segmented output and depth frame,
                # you can now overlay the segmentation mask on the depth frame
                # or calculate depth for segmented objects.

                # Placeholder for YOLOSeg processing
                # (Your existing code to obtain combined_img)

                if depth_frame is not None:
                    # Assuming the depth map and color frames are aligned,
                    # you can fetch depth for specific objects here.
                    # For example, fetching depth at the center of an object detected by YOLO:
                    for obj in detected_objects:  # Assuming detected_objects are obtained from YOLOSeg
                        x_center = obj["x_center"]
                        y_center = obj["y_center"]
                        depth = depth_frame[y_center, x_center]
                        print(f"Depth at center of object: {depth} mm")

                cv2.imshow("Output", combined_img)
            else:
                print("in_nn_yolo EMPTY")
        else:
            print("in_rgb EMPTY")

        # Exit logic
        if cv2.waitKey(1) == ord('q'):
            break
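For completeness, the YOLOSeg placeholder above could be filled in roughly like this. Treat it as a sketch only: the output layer names and tensor shapes assume a standard 640x640 YOLOv8n-seg export, and the YOLOSeg calls (segment_objects_from_oakd, draw_masks) are hypothetical names for whatever your local YOLOSeg helper actually exposes. Only getAllLayerNames()/getLayerFp16() are standard DepthAI NNData methods. Also note that the object-center coordinates will need scaling if the depth map resolution differs from the 640x640 preview.

# Sketch of the missing YOLOSeg step (hypothetical helper API, adjust to your YOLOSeg class).
# yoloseg = YOLOSeg(...) would be constructed once before the loop; constructor args depend on your helper.
layer_names = in_nn_yolo.getAllLayerNames()                   # standard DepthAI NNData call
output0 = np.array(in_nn_yolo.getLayerFp16(layer_names[0]))   # detections + mask coefficients
output1 = np.array(in_nn_yolo.getLayerFp16(layer_names[1]))   # mask prototypes

# Shapes assumed for a 640x640 YOLOv8n-seg export: (1, 116, 8400) and (1, 32, 160, 160).
output0 = output0.reshape(1, 116, 8400)
output1 = output1.reshape(1, 32, 160, 160)

# Hypothetical YOLOSeg methods -- replace with the ones your helper really provides.
boxes, scores, class_ids, masks = yoloseg.segment_objects_from_oakd(output0, output1)
combined_img = yoloseg.draw_masks(frame)

# Build the detected_objects list used for the depth lookup above.
detected_objects = [
    {"x_center": int((x1 + x2) / 2), "y_center": int((y1 + y2) / 2)}
    for (x1, y1, x2, y2) in boxes
]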
Hi @u111s
That is because the decoding (segmentation) part runs on the host computer and is notoriously expensive to run, as opposed to standard detection decoding.
As Erik said in the post, the idea is to combine depth and segmentation on the host after decoding is done. If the depth is aligned to color, you should have no trouble overlaying the segmentation results (image) over the depth image. It should also not impact performance much, since the depth algorithms run on-device.
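For illustration, a minimal sketch of such an overlay (assuming depth_frame is the uint16 depth map aligned to color and seg_mask is a binary mask from the host-side decoding; both names are placeholders, only standard OpenCV calls are used):

import cv2
import numpy as np

def overlay_mask_on_depth(depth_frame, seg_mask):
    # Normalize the 16-bit depth (millimetres) to 8 bit and colorize it for display.
    depth_8u = cv2.normalize(depth_frame, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    depth_color = cv2.applyColorMap(depth_8u, cv2.COLORMAP_JET)

    # Resize the mask to the depth resolution in case the two streams differ.
    mask = cv2.resize(seg_mask.astype(np.uint8),
                      (depth_frame.shape[1], depth_frame.shape[0]),
                      interpolation=cv2.INTER_NEAREST)

    # Paint masked pixels green and blend them with the colorized depth.
    overlay = depth_color.copy()
    overlay[mask > 0] = (0, 255, 0)
    return cv2.addWeighted(overlay, 0.4, depth_color, 0.6, 0)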
Thanks,
Jaka
Hi @u111s,
If you have depth aligned to the color stream and do segmentation on the color stream, you could overlay the segmentation results on the depth stream. If you do that, you have a mask and depth info; by combining them you'd get only the depth points of the segmented class. Then you could take e.g. the median depth pixel (or some smarter approach) to get the Z of the segmented class.
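A minimal sketch of that masking idea, assuming seg_mask is a binary mask for the class of interest, already resized to the depth frame's resolution (names are placeholders):

import numpy as np

def median_depth_of_mask(depth_frame, seg_mask):
    # Keep depth values that fall inside the mask; 0 depth means "no measurement" on OAK devices.
    values = depth_frame[(seg_mask > 0) & (depth_frame > 0)]
    if values.size == 0:
        return None
    return float(np.median(values))  # Z of the segmented class, in millimetres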
Thoughts?
Thanks, Erik