  • Hi @u111s, here's an example pipeline that adds a depth output stream alongside the YOLO segmentation network, so you can read depth for the segmented objects:

    import cv2
    import numpy as np
    import depthai as dai
    import time
    from YOLOSeg import YOLOSeg
    
    pathYoloBlob = "./yolov8n-seg.blob"
    
    # Create OAK-D pipeline
    pipeline = dai.Pipeline()
    
    # Setup color camera
    cam_rgb = pipeline.createColorCamera()
    cam_rgb.setPreviewSize(640, 640)
    cam_rgb.setInterleaved(False)
    
    # Setup depth
    stereo = pipeline.createStereoDepth()
    left = pipeline.createMonoCamera()
    right = pipeline.createMonoCamera()
    
    left.setBoardSocket(dai.CameraBoardSocket.LEFT)
    right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
    stereo.setConfidenceThreshold(255)
    # Align depth to the RGB camera and match the 640x640 preview so depth
    # pixels line up with the color/segmentation pixels
    stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
    stereo.setOutputSize(640, 640)
    
    left.out.link(stereo.left)
    right.out.link(stereo.right)
    
    # Setup neural network
    nn = pipeline.createNeuralNetwork()
    nn.setBlobPath(pathYoloBlob)
    cam_rgb.preview.link(nn.input)
    
    # Setup output streams
    xout_rgb = pipeline.createXLinkOut()
    xout_rgb.setStreamName("rgb")
    cam_rgb.preview.link(xout_rgb.input)
    
    xout_nn_yolo = pipeline.createXLinkOut()
    xout_nn_yolo.setStreamName("nn_yolo")
    nn.out.link(xout_nn_yolo.input)
    
    xout_depth = pipeline.createXLinkOut()
    xout_depth.setStreamName("depth")
    stereo.depth.link(xout_depth.input)
    
    # Start application
    with dai.Device(pipeline) as device:
    
        q_rgb = device.getOutputQueue("rgb")
        q_nn_yolo = device.getOutputQueue("nn_yolo")
        q_depth = device.getOutputQueue("depth", maxSize=4, blocking=False)
    
        while True:
            in_rgb = q_rgb.tryGet()
            in_nn_yolo = q_nn_yolo.tryGet()
            in_depth = q_depth.tryGet()
    
            if in_rgb is not None:
                frame = in_rgb.getCvFrame()
                depth_frame = in_depth.getFrame() if in_depth is not None else None
    
                if in_nn_yolo is not None:
                    # Assuming you have the segmented output and the depth frame,
                    # you can overlay the segmentation mask on the depth frame or
                    # compute depth for the segmented objects.

                    # Placeholder for YOLOSeg processing: run your existing decoding
                    # here to obtain detected_objects and the visualization combined_img
                    detected_objects = []   # fill from the YOLOSeg results
                    combined_img = frame    # replace with the frame + mask overlay
    
                    if depth_frame is not None:
                        # Assuming the depth map and color frames are aligned
                        # You can fetch depth for specific objects here
                        # For example, fetching depth at the center of an object detected by YOLO:
                        for obj in detected_objects:  # detected_objects come from the YOLOSeg results
                            x_center = int(obj["x_center"])
                            y_center = int(obj["y_center"])
                            depth = depth_frame[y_center, x_center]  # stereo depth is in millimeters
                            print(f"Depth at center of object: {depth} mm")
    
                    cv2.imshow("Output", combined_img)
                    
                else:
                    print("in_nn_yolo EMPTY")
    
            else:
                print("in_rgb EMPTY")
    
            # Exit logic
            if cv2.waitKey(1) == ord('q'):
                break
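
    The YOLOSeg decoding above is only a placeholder. As a rough sketch of the first step (the layer names "output0"/"output1" are assumptions from a typical yolov8n-seg export and may differ for your blob), the raw tensors can be read from the NNData message inside the loop like this:

    # Inside the loop, once in_nn_yolo is not None:
    layer_names = in_nn_yolo.getAllLayerNames()
    print("NN output layers:", layer_names)

    # Detections + mask coefficients (assumed layer name "output0")
    dets = np.array(in_nn_yolo.getLayerFp16("output0"), dtype=np.float32)
    # Mask prototypes used to build per-instance masks (assumed layer name "output1")
    protos = np.array(in_nn_yolo.getLayerFp16("output1"), dtype=np.float32)
    # These arrays are what the host-side YOLOSeg decoding would consume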
    • Hi @u111s

      u111s I followed the steps from the discussion (link) and was able to run the segmentation model and get output, but the inference is slow.

      That is because the decoding (segmentation) part runs on the host computer and is notoriously expensive, as opposed to standard detection decoding.

      As Erik said in the post, the idea is to combine depth and segmentation on the host after decoding is done. If the depth is aligned to color, you should have no trouble overlaying the segmentation results (image) over the depth image. It should also not impact performance much, since the depth algorithms run on-device; a rough sketch of such an overlay is below.

      Thanks,
      Jaka
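
      For illustration, a minimal sketch of that overlay on the host, assuming the depth is aligned to color at 640x640 and mask is a boolean HxW array produced by the YOLOSeg decoding (depth_frame and mask are placeholders carried over from the code above, not a fixed API):

      # Colorize the aligned depth frame (values in millimeters) for visualization
      depth_vis = cv2.normalize(depth_frame, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
      depth_vis = cv2.applyColorMap(depth_vis, cv2.COLORMAP_JET)

      # Blend color and colorized depth only where the segmentation mask is set
      overlay = frame.copy()
      overlay[mask] = cv2.addWeighted(frame, 0.4, depth_vis, 0.6, 0)[mask]
      cv2.imshow("Segmentation over depth", overlay)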

      • Hi @u111s ,
        If you have depth aligned to the color stream, and do segmentation on the color stream, you could overlay the segmentation results on the depth stream. If you do that, you have a mask and depth info; by combining them you get only the depth points of the segmented class. Then you could take e.g. the median depth pixel (or some smarter approach) to get the Z of the segmented class, as in the sketch below.
        Thoughts?
        Thanks, Erik
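
        A minimal sketch of that median-depth idea, assuming depth_frame is aligned to color and mask is a boolean HxW array for one segmented instance (both names are placeholders carried over from the code above):

        # Keep only the depth pixels that belong to the segmented instance
        obj_depths = depth_frame[mask]
        # Drop invalid (zero) readings, then take the median as the object's Z
        obj_depths = obj_depths[obj_depths > 0]
        if obj_depths.size > 0:
            z_mm = int(np.median(obj_depths))
            print(f"Estimated Z of segmented object: {z_mm} mm")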

      • Hi @u111s ,

        So far I have no updates regarding running inference with YOLO-based instance segmentation models on OAK devices. The only approach I've found is this one.