Hi @EmilioMachado @u111s @jakaskerl @JanCuhel I've managed to decode the segmentation outputs of YOLOv5, YOLOv8, YOLOv9 and YOLO11 on the host side using OAK devices. You can check out my PR here.
Hi @u111s
import cv2
import numpy as np
import depthai as dai
import time
from YOLOSeg import YOLOSeg

pathYoloBlob = "./yolov8n-seg.blob"

# Create OAK-D pipeline
pipeline = dai.Pipeline()

# Setup color camera
cam_rgb = pipeline.createColorCamera()
cam_rgb.setPreviewSize(640, 640)
cam_rgb.setInterleaved(False)

# Setup depth
stereo = pipeline.createStereoDepth()
left = pipeline.createMonoCamera()
right = pipeline.createMonoCamera()
left.setBoardSocket(dai.CameraBoardSocket.LEFT)
right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
stereo.setConfidenceThreshold(255)
left.out.link(stereo.left)
right.out.link(stereo.right)

# Setup neural network
nn = pipeline.createNeuralNetwork()
nn.setBlobPath(pathYoloBlob)
cam_rgb.preview.link(nn.input)

# Setup output streams
xout_rgb = pipeline.createXLinkOut()
xout_rgb.setStreamName("rgb")
cam_rgb.preview.link(xout_rgb.input)

xout_nn_yolo = pipeline.createXLinkOut()
xout_nn_yolo.setStreamName("nn_yolo")
nn.out.link(xout_nn_yolo.input)

xout_depth = pipeline.createXLinkOut()
xout_depth.setStreamName("depth")
stereo.depth.link(xout_depth.input)

# Start application
with dai.Device(pipeline) as device:
    q_rgb = device.getOutputQueue("rgb")
    q_nn_yolo = device.getOutputQueue("nn_yolo")
    q_depth = device.getOutputQueue("depth", maxSize=4, blocking=False)

    while True:
        in_rgb = q_rgb.tryGet()
        in_nn_yolo = q_nn_yolo.tryGet()
        in_depth = q_depth.tryGet()

        if in_rgb is not None:
            frame = in_rgb.getCvFrame()
            depth_frame = in_depth.getFrame() if in_depth is not None else None

            if in_nn_yolo is not None:
                # Assuming you have the segmented output and depth frame,
                # you can now overlay the segmentation mask on the depth frame
                # or calculate depth for segmented objects.

                # Placeholder for YOLOSeg processing
                # (Your existing code to obtain combined_img)

                if depth_frame is not None:
                    # Assuming the depth map and color frames are aligned,
                    # you can fetch depth for specific objects here.
                    # For example, fetching depth at the center of an object detected by YOLO:
                    for obj in detected_objects:  # Assuming detected_objects are obtained from YOLOSeg
                        x_center = obj["x_center"]
                        y_center = obj["y_center"]
                        depth = depth_frame[y_center, x_center]
                        print(f"Depth at center of object: {depth} mm")

                cv2.imshow("Output", combined_img)
            else:
                print("in_nn_yolo EMPTY")
        else:
            print("in_rgb EMPTY")

        # Exit logic
        if cv2.waitKey(1) == ord('q'):
            break
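For completeness, the YOLOSeg placeholder above could be filled in roughly like this. Treat it as a sketch only: the output layer names and tensor shapes assume a standard 640x640 YOLOv8n-seg export, and the YOLOSeg calls (segment_objects_from_oakd, draw_masks) are hypothetical names for whatever your local YOLOSeg helper actually exposes. Only getAllLayerNames()/getLayerFp16() are standard DepthAI NNData methods. Also note that the object-center coordinates will need scaling if the depth map resolution differs from the 640x640 preview.

# Sketch of the missing YOLOSeg step (hypothetical helper API, adjust to your YOLOSeg class).
# yoloseg = YOLOSeg(...) would be constructed once before the loop; constructor args depend on your helper.
layer_names = in_nn_yolo.getAllLayerNames()                   # standard DepthAI NNData call
output0 = np.array(in_nn_yolo.getLayerFp16(layer_names[0]))   # detections + mask coefficients
output1 = np.array(in_nn_yolo.getLayerFp16(layer_names[1]))   # mask prototypes

# Shapes assumed for a 640x640 YOLOv8n-seg export: (1, 116, 8400) and (1, 32, 160, 160).
output0 = output0.reshape(1, 116, 8400)
output1 = output1.reshape(1, 32, 160, 160)

# Hypothetical YOLOSeg methods -- replace with the ones your helper really provides.
boxes, scores, class_ids, masks = yoloseg.segment_objects_from_oakd(output0, output1)
combined_img = yoloseg.draw_masks(frame)

# Build the detected_objects list used for the depth lookup above.
detected_objects = [
    {"x_center": int((x1 + x2) / 2), "y_center": int((y1 + y2) / 2)}
    for (x1, y1, x2, y2) in boxes
]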
Hi @u111s
That is because the decoding (segmentation) part runs on the host computer and is notoriously expensive to run, as opposed to standard detection decoding.
As Erik said in the post, the idea is to combine depth and segmentation on the host after decoding is done. If the depth is aligned to color, you should have no trouble overlaying the segmentation results (image) over the depth image. It should also not impact performance much, since the depth algorithms run on-device.
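For illustration, a minimal sketch of such an overlay (assuming depth_frame is the uint16 depth map aligned to color and seg_mask is a binary mask from the host-side decoding; both names are placeholders, only standard OpenCV calls are used):

import cv2
import numpy as np

def overlay_mask_on_depth(depth_frame, seg_mask):
    # Normalize the 16-bit depth (millimetres) to 8 bit and colorize it for display.
    depth_8u = cv2.normalize(depth_frame, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    depth_color = cv2.applyColorMap(depth_8u, cv2.COLORMAP_JET)

    # Resize the mask to the depth resolution in case the two streams differ.
    mask = cv2.resize(seg_mask.astype(np.uint8),
                      (depth_frame.shape[1], depth_frame.shape[0]),
                      interpolation=cv2.INTER_NEAREST)

    # Paint masked pixels green and blend them with the colorized depth.
    overlay = depth_color.copy()
    overlay[mask > 0] = (0, 255, 0)
    return cv2.addWeighted(overlay, 0.4, depth_color, 0.6, 0)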
Thanks,
Jaka
Hi @u111s,
If you have depth aligned to the color stream and do segmentation on the color stream, you could overlay the segmentation results on the depth stream. If you do that, you have a mask and depth info; by combining them you'd get only the depth points of the segmented class. Then you could take e.g. the median depth pixel (or some smarter approach) to get the Z of the segmented class.
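A minimal sketch of that masking idea, assuming seg_mask is a binary mask for the class of interest, already resized to the depth frame's resolution (names are placeholders):

import numpy as np

def median_depth_of_mask(depth_frame, seg_mask):
    # Keep depth values that fall inside the mask; 0 depth means "no measurement" on OAK devices.
    values = depth_frame[(seg_mask > 0) & (depth_frame > 0)]
    if values.size == 0:
        return None
    return float(np.median(values))  # Z of the segmented class, in millimetres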
Thoughts?
Thanks, Erik