• DepthAI
  • Support for yolov8 instance segmentation models along with depth information.

Greetings @erik ,

I have an OAK-D Pro PoE device. I want to run my YOLOv8 instance segmentation model trained on my custom dataset, and at the same time get the depth (distance) of each detected object.

I followed the steps from the discussion (link) and was able to run the segmentation model and get output, but the inference is slow. Also, there doesn't seem to be a spatial detection node to get the depth.

I would like to know whether depthai-sdk supports any other instance segmentation model that also returns depth.

Note: I would like to run inference on a model trained with a custom dataset.

  • @jakaskerl @pedro-UCA

    Thanks for your support. I have successfully merged all your code and now I can retrieve masks and also get depth in the specified region.

    I have put the working code in the following repository. Feel free to point others who need this to the repo. Thanks!

    tirandazi/depthai-yolov8-segment

    Hi @u111s

    u111s I followed the steps from the discussion (link) and was able to run the segmentation model and get output, but the inference is slow.

    That is because the decoding (segmentation) part runs on the host computer and is notoriously expensive, as opposed to standard detection decoding.

    As Erik said in the post, the idea is to combine depth and segmentation on the host after decoding is done. If the depth is aligned to color, you should have no trouble overlaying the segmentation results (image) over the depth image. It should also not impact performance much, since the depth algorithms run on-device.
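
    For example, here is a minimal sketch of that combination step, assuming you already have a boolean per-object mask from the host-side decoder and a depth frame aligned to color (the names mask and depth_frame are placeholders for whatever your script produces):

    import cv2
    import numpy as np

    def depth_inside_mask(mask, depth_frame):
        """Median depth (mm) of the pixels covered by one segmentation mask."""
        # Resize the mask to the depth resolution (nearest-neighbour keeps it binary)
        mask_resized = cv2.resize(
            mask.astype(np.uint8),
            (depth_frame.shape[1], depth_frame.shape[0]),
            interpolation=cv2.INTER_NEAREST,
        )
        # Ignore invalid depth readings (0 means "no measurement")
        valid = (mask_resized > 0) & (depth_frame > 0)
        if not valid.any():
            return 0.0
        return float(np.median(depth_frame[valid]))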

    Thanks,
    Jaka

      jakaskerl

      Thanks for the response.

      Could you please provide a sample snippet to combine and retrieve segmentation with depth?

      Hi @u111s

      import cv2
      import numpy as np
      import depthai as dai
      import time
      from YOLOSeg import YOLOSeg
      
      pathYoloBlob = "./yolov8n-seg.blob"
      
      # Create OAK-D pipeline
      pipeline = dai.Pipeline()
      
      # Setup color camera
      cam_rgb = pipeline.createColorCamera()
      cam_rgb.setPreviewSize(640, 640)
      cam_rgb.setInterleaved(False)
      
      # Setup depth
      stereo = pipeline.createStereoDepth()
      left = pipeline.createMonoCamera()
      right = pipeline.createMonoCamera()
      
      left.setBoardSocket(dai.CameraBoardSocket.LEFT)
      right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
      stereo.setConfidenceThreshold(255)
      # Align depth output to the color camera so depth pixels match the RGB preview
      stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
      
      left.out.link(stereo.left)
      right.out.link(stereo.right)
      
      # Setup neural network
      nn = pipeline.createNeuralNetwork()
      nn.setBlobPath(pathYoloBlob)
      cam_rgb.preview.link(nn.input)
      
      # Setup output streams
      xout_rgb = pipeline.createXLinkOut()
      xout_rgb.setStreamName("rgb")
      cam_rgb.preview.link(xout_rgb.input)
      
      xout_nn_yolo = pipeline.createXLinkOut()
      xout_nn_yolo.setStreamName("nn_yolo")
      nn.out.link(xout_nn_yolo.input)
      
      xout_depth = pipeline.createXLinkOut()
      xout_depth.setStreamName("depth")
      stereo.depth.link(xout_depth.input)
      
      # Start application
      with dai.Device(pipeline) as device:
      
          q_rgb = device.getOutputQueue("rgb")
          q_nn_yolo = device.getOutputQueue("nn_yolo")
          q_depth = device.getOutputQueue("depth", maxSize=4, blocking=False)
      
          while True:
              in_rgb = q_rgb.tryGet()
              in_nn_yolo = q_nn_yolo.tryGet()
              in_depth = q_depth.tryGet()
      
              if in_rgb is not None:
                  frame = in_rgb.getCvFrame()
                  depth_frame = in_depth.getFrame() if in_depth is not None else None
      
                  if in_nn_yolo is not None:
                      # Assuming you have the segmented output and depth frame
                      # You can now overlay segmentation mask on the depth frame or calculate depth for segmented objects
      
                      # Placeholder for YOLOSeg processing
                      # (Your existing code to obtain combined_img and detected_objects)
                      combined_img = frame       # replace with the YOLOSeg overlay image
                      detected_objects = []      # replace with the YOLOSeg detections

                      if depth_frame is not None:
                          # Assuming the depth map and color frames are aligned
                          # (see setDepthAlign above); if the depth resolution differs
                          # from the 640x640 preview, map the coordinates first.
                          # Example: fetching depth at the center of a detected object.
                          for obj in detected_objects:  # detections obtained from YOLOSeg
                              x_center = int(obj["x_center"])
                              y_center = int(obj["y_center"])
                              depth = depth_frame[y_center, x_center]
                              print(f"Depth at center of object: {depth} mm")
      
                      cv2.imshow("Output", combined_img)
                      
                  else:
                      print("in_nn_yolo EMPTY")
      
              else:
                  print("in_rgb EMPTY")
      
              # Exit logic
              if cv2.waitKey(1) == ord('q'):
                  break
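
      If it helps with the "Placeholder for YOLOSeg processing" part above: the raw tensors can be pulled from the NNData packet and reshaped before they are handed to the host-side decoder. This is only a sketch; it assumes the blob keeps the usual Ultralytics output names (output0, output1) and the default yolov8n-seg shapes for a 640x640 input, so check getAllLayerNames() on your own blob first.

      import numpy as np

      def get_yolov8_seg_outputs(in_nn_yolo):
          """Fetch and reshape the two YOLOv8-seg output tensors from an NNData packet."""
          # Uncomment once if you are unsure how the layers are named in your blob:
          # print(in_nn_yolo.getAllLayerNames())
          output0 = np.array(in_nn_yolo.getLayerFp16("output0"), dtype=np.float32)
          output1 = np.array(in_nn_yolo.getLayerFp16("output1"), dtype=np.float32)
          # Shapes below assume the stock 80-class yolov8n-seg export at 640x640;
          # a custom 2-class model would have 4 + 2 + 32 = 38 rows instead of 116.
          output0 = output0.reshape(1, 116, 8400)     # boxes + class scores + mask coefficients
          output1 = output1.reshape(1, 32, 160, 160)  # mask prototypes
          return output0, output1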

        Hi! I am trying to run my model trained on custom data with two classes, but the segmentation is terrible compared to the original YOLOv8 output before converting the .pt file to .blob. I have followed the steps discussed in this blog and tried different image sizes.

        Any thoughts? thanks.

        @jakaskerl

        I mean, the model basically doesn't segment the objects accurately (it barely detects them). I have tried the same model, without the OAK, in .onnx format and it works correctly, so maybe the problem is when I convert it to .blob.

        However, I have followed the same steps with the stock "yolov8n-seg.pt" model and its segmentation gives me no problems.
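
        For reference, a typical .pt → .onnx → .blob path looks roughly like this (just a sketch, the exact flags may differ from the guide; best.pt and best.onnx are placeholder paths):

        from ultralytics import YOLO
        import blobconverter

        # Export the custom segmentation model to ONNX at the same input size used on the device
        model = YOLO("best.pt")
        model.export(format="onnx", imgsz=640, opset=12)

        # Convert the ONNX to a MyriadX blob (FP16) for the OAK
        blob_path = blobconverter.from_onnx(
            model="best.onnx",
            data_type="FP16",
            shaves=6,
        )
        print(blob_path)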

          DavidMeiraPliego

          We are looking to add native support for instance segmentation to DepthAI, so we will be able to take a better look at the issue then.

          In the meantime:

          1. Do you follow the same steps to create the blob, including passing exactly the same flags?
           2. If yes, there are a lot of reasons something could go wrong. The first thing I would do is take a look at the confidence thresholds in PyTorch and in the script you use with the camera. Do they match, or is one higher than the other? (See the sketch below.)
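
          For instance, something along these lines, so the threshold you use when testing the .pt model is the same value the host-side decoder uses (the YOLOSeg constructor arguments are an assumption; adjust them to whatever your camera script actually passes):

          from ultralytics import YOLO
          from YOLOSeg import YOLOSeg  # host-side decoder used in the camera script

          CONF_THRESHOLD = 0.5  # keep this identical in both places

          # PyTorch (.pt) inference for comparison
          results = YOLO("best.pt").predict("test.jpg", conf=CONF_THRESHOLD, imgsz=640)

          # Host-side decoding used with the OAK (assumed constructor signature)
          yoloseg = YOLOSeg("best.onnx", conf_thres=CONF_THRESHOLD, iou_thres=0.5)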

            Matija

            Yes, I followed the same steps to create the .blob, and the thresholds match. I've been trying to modify them in the script, but the result is the same.

            Do you have an approximate date for the implementation of instance segmentation in DepthAI?

              Have you exported it with the same input shape? Does it help if you reduce the thresholds?
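
              One quick way to double-check the exported input shape (a sketch using the onnx package; best.onnx is a placeholder path):

              import onnx

              model = onnx.load("best.onnx")
              dims = model.graph.input[0].type.tensor_type.shape.dim
              print([d.dim_value for d in dims])  # e.g. [1, 3, 640, 640]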

              DavidMeiraPliego