• DepthAI
  • OAK-D Absolute Depth Precision

Hello,

I am working on a robotic application that uses the OAK-D camera to detect circles in a scene and calculate their distance from the camera. I have tried several setups, but I keep getting a large error in the absolute distance.

In red you can see the object whose distance I want to find. The object is at 92 cm, but from the disparity algorithm I get 97 cm. I obtain this value by first finding the coordinates of the center of the red blob, then reading the corresponding disparity value in pixels from the disparity array, and finally calculating the distance using the formula you provided in the FAQs. The error is not consistent across distances (the closer the object, the lower the error, and vice versa).

I also attach the disparity map.

It looks like the camera is not able to tell the roll of tape apart from the white box in the background, and this might be causing the large error. Is this an intrinsic limitation of the setup or of the camera, or am I doing something wrong? I want to use the camera for a robotic application in which I have to determine the distance of metal pipe edges from at least 1.5 meters away. If that won't be possible with the OAK-D, I will consider more powerful means (e.g. a lidar setup).

Thanks a lot for your help in advance!

  • erik replied to this.

    Hello FS93 ,
    I believe this should be possible. First thing - regarding the disparity checking, did you perform RGB-depth alignment? By default, the color and depth streams aren't aligned, so coordinates on the RGB stream don't correspond to coordinates on the depth stream.
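    For reference, depth-to-color alignment can be enabled on the StereoDepth node. A minimal sketch (the node setup below is just a generic example, not your exact pipeline):

        import depthai as dai

        pipeline = dai.Pipeline()
        monoLeft = pipeline.create(dai.node.MonoCamera)
        monoRight = pipeline.create(dai.node.MonoCamera)
        monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
        monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)

        stereo = pipeline.create(dai.node.StereoDepth)
        monoLeft.out.link(stereo.left)
        monoRight.out.link(stereo.right)

        # Align the depth output to the RGB camera, so that pixel (x, y) on the
        # color frame corresponds to pixel (x, y) on the depth frame
        stereo.setDepthAlign(dai.CameraBoardSocket.RGB)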

    To determine whether this is possible, you could also first record some depth and replay it in the RealSense viewer, as it has nice visualization. If the pipe edges are clearly visible in the depth recording, you can be sure that with appropriate CV techniques you will be able to detect the metal pipe edges.
    Thoughts?
    Thanks, Erik

    • FS93 replied to this.
      2 months later

      Dear erik,

      Thanks a lot for your reply, and sorry for the delay; the project has been on hold for a few weeks.

      As you suggested, I tried recording a bag file and replaying it with the RealSense viewer. The results are satisfactory, as I can identify the correct coordinates of the extremities of the pipe, as shown:

      My problem now is: given that I can correctly identify the circles with a Hough detection algorithm, how do I extract the corresponding values from the depth array? My naive implementation gives me wrong depth results.

      That's what I tried:

      import cv2
      import numpy as np
      import depthai as dai


      def getFrame(queue):
          # Get frame from queue
          frame = queue.get()
          # Convert frame to OpenCV format and return
          return frame.getCvFrame()


      def getMonoCamera(pipeline, isLeft):
          # Configure mono camera
          mono = pipeline.createMonoCamera()
          # Set camera resolution
          mono.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
          if isLeft:
              # Get left camera
              mono.setBoardSocket(dai.CameraBoardSocket.LEFT)
          else:
              # Get right camera
              mono.setBoardSocket(dai.CameraBoardSocket.RIGHT)
          return mono
      
      def getStereoPair(pipeline, monoLeft, monoRight):
          # Configure stereo pair for depth estimation
          stereo = pipeline.createStereoDepth()
          # Checks occluded pixels and marks them as invalid
          stereo.setLeftRightCheck(True)
          # Configure left and right cameras to work as a stereo pair
          monoLeft.out.link(stereo.left)
          monoRight.out.link(stereo.right)
          return stereo

      if __name__ == '__main__':
          # Start defining a pipeline
          pipeline = dai.Pipeline()
          # Set up left and right cameras
          monoLeft = getMonoCamera(pipeline, isLeft = True)
          monoRight = getMonoCamera(pipeline, isLeft = False)
          # Combine left and right cameras to form a stereo pair
          stereo = getStereoPair(pipeline, monoLeft, monoRight)
          stereo.setLeftRightCheck(True)
          camRgb = pipeline.create(dai.node.ColorCamera)
          camRgb.setIspScale(1,3)
          camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
          xoutRgb = pipeline.create(dai.node.XLinkOut)
          xoutRgb.setStreamName("video")
          camRgb.video.link(xoutRgb.input)
          # Set XlinkOut for disparity, rectifiedLeft, and rectifiedRight
          xoutDisp = pipeline.createXLinkOut()
          xoutDisp.setStreamName("disparity")
          
          xoutRectifiedLeft = pipeline.createXLinkOut()
          xoutRectifiedLeft.setStreamName("rectifiedLeft")
      
          xoutRectifiedRight = pipeline.createXLinkOut()
          xoutRectifiedRight.setStreamName("rectifiedRight")
      
          stereo.disparity.link(xoutDisp.input)
          
          stereo.rectifiedLeft.link(xoutRectifiedLeft.input)
          stereo.rectifiedRight.link(xoutRectifiedRight.input)
          
          # Pipeline is defined, now we can connect to the device
      
          with dai.Device(pipeline) as device:
             
              # Output queues will be used to get the disparity and rectified frames from the outputs defined above
              disparityQueue = device.getOutputQueue(name="disparity", maxSize=1, blocking=False)
              rectifiedLeftQueue = device.getOutputQueue(name="rectifiedLeft", maxSize=1, blocking=False)
              rectifiedRightQueue = device.getOutputQueue(name="rectifiedRight", maxSize=1, blocking=False)
              # Calculate a multiplier for colormapping disparity map
              disparityMultiplier = 255 / stereo.getMaxDisparity()
              cv2.namedWindow("Stereo Pair")
              
              # Variable used to toggle between side-by-side view and single-frame view.
              sideBySide = False
              def rescale_frame(frame, percent=75):
                  # Rescale the frame to the given percentage of its original size
                  width = int(frame.shape[1] * percent / 100)
                  height = int(frame.shape[0] * percent / 100)
                  return cv2.resize(frame, (width, height), interpolation=cv2.INTER_AREA)
      
              while True:
      
                  # Get disparity map
                  disparity = getFrame(disparityQueue)
                  # depth [cm] = focal length [px] * baseline [cm] / disparity [px];
                  # suppress the divide-by-zero warning for invalid (zero) disparity pixels
                  with np.errstate(divide='ignore'):
                      depth = 441.25 * 7.5 / disparity
                  
                  # Colormap disparity for display
                  disparity_multiplied = (disparity * disparityMultiplier).astype(np.uint8)
                  disparity_multiplied = cv2.applyColorMap(disparity_multiplied, cv2.COLORMAP_JET)
                  
                  # Get left and right rectified frame
                  leftFrame = getFrame(rectifiedLeftQueue)
                  rightFrame = getFrame(rectifiedRightQueue)
      
                  blurredLeft = cv2.GaussianBlur(leftFrame, (5, 5), 0)
                  blurredRight = cv2.GaussianBlur(rightFrame, (5, 5), 0)
      
                  imLeft = cv2.adaptiveThreshold(blurredLeft, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
                  imRight = cv2.adaptiveThreshold(blurredRight, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
                  # Detect circles on the blurred right rectified frame
                  circles = cv2.HoughCircles(blurredRight,
                                             cv2.HOUGH_GRADIENT_ALT,
                                             dp=1.6,
                                             minDist=15,
                                             param1=300,
                                             param2=0.9,
                                             minRadius=0,
                                             maxRadius=-1)
                  if circles is not None:
                      circles = np.uint16(np.around(circles))
                      for i in circles[0, :]:
                          # Draw the outer circle
                          cv2.circle(rightFrame, (i[0], i[1]), i[2], (255, 0, 255), 1)
                          cv2.circle(disparity_multiplied, (i[0], i[1]), i[2], (255, 0, 255), 1)
                          # Draw the center of the circle
                          cv2.circle(rightFrame, (i[0], i[1]), 1, (255, 0, 255), 1)
                          cv2.circle(disparity_multiplied, (i[0], i[1]), 1, (255, 0, 255), 1)
                          # Bounding box of the circle on the depth map
                          xmin_depth = i[0] - i[2]
                          xmax_depth = i[0] + i[2]
                          ymin_depth = i[1] - i[2]
                          ymax_depth = i[1] + i[2]
                          # depth_cut = depth[ymin_depth:ymax_depth, xmin_depth:xmax_depth].astype(float)
                          # Depth value (cm) at the circle center
                          dist = depth[i[1], i[0]]
                          print("Detected {0:.2f} centimeters away.".format(dist))

                  # Show the annotated frames; press 'q' to quit
                  cv2.imshow("Stereo Pair", rightFrame)
                  cv2.imshow("Disparity", disparity_multiplied)
                  if cv2.waitKey(1) == ord('q'):
                      break

      Also, even if this worked for the depth estimate (which it does not), I am still not sure how to obtain the X and Y coordinates in the same way.

      Any help or clarification would be greatly appreciated.

      • erik replied to this.

        Hello FS93 ,
        After getting the circle on the depth frame, I would take all of those depth points, average them, and calculate spatial coordinates from the centroid of the circle (using the averaged depth). That's essentially what this demo does, except it uses a square ROI instead of a circle, but I assume it shouldn't be too complex to adapt that logic.
        Thanks, Erik
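        For reference, a minimal sketch of that idea, assuming the depth frame is a 2D array in millimeters; the helper name and the default intrinsics (fx, fy, cx, cy) below are placeholders for illustration, not values from the demo:

            import numpy as np

            def spatial_from_circle(depth_frame, cx_px, cy_px, radius_px,
                                    fx=441.25, fy=441.25, cx=320.0, cy=200.0):
                # Build a circular mask around the detected circle
                h, w = depth_frame.shape
                ys, xs = np.ogrid[:h, :w]
                mask = (xs - cx_px) ** 2 + (ys - cy_px) ** 2 <= radius_px ** 2
                # Keep only valid (non-zero) depth values inside the circle
                values = depth_frame[mask]
                values = values[values > 0]
                if values.size == 0:
                    return None
                z = float(np.mean(values))
                # Back-project the circle centroid with the pinhole camera model
                x = (cx_px - cx) * z / fx
                y = (cy_px - cy) * z / fy
                return x, y, z

        The returned X/Y/Z are in the camera's coordinate frame and in the same units as the depth frame.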

          Thanks a lot erik ,

          I was able to get some results by adapting the demo you proposed.

          I still have a couple of doubts:

          1. I calculated the Hough circle pixel coordinates from the right image output of the camera; does this mean that the coordinates are expressed in the reference frame of the right camera sensor?

          2. Is the Z coordinate the length of the segment orthogonal to the image plane, starting at the (X, Y) point and reaching the object, or is it the length of the segment from the origin (0, 0) to the object? I assume the first option, but I am asking just to be sure.

          • erik replied to this.

            Hello FS93 ,
            By default, the coordinates are taken from the rectified right frame. You can also align the depth with other streams, e.g. color; example here.
            It's the Z coordinate of the object, not the distance to the object. For the actual distance to the object, you would need to combine X/Y/Z using the Pythagorean formula.
            Thanks, Erik
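            As a quick illustration (assuming X, Y, and Z are in the same units):

                import math

                # Euclidean distance from the camera origin to the point,
                # as opposed to the Z (depth) coordinate alone
                def euclidean_distance(x, y, z):
                    return math.sqrt(x ** 2 + y ** 2 + z ** 2)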

            4 months later

            Hello,
            I found the following in the code given above:
            depth = 441.25 * 7.5 / disparity
            where 441.25 is the focal length (f), as I understand it. I know that the effective focal length of the RGB camera is 3.37 mm and of the stereo cameras is 1.3 mm. I was wondering how you found this value of 441.25?

            Thanks,
            Afsana

            • erik replied to this.
              21 days later

              Hi @FS93, I'm working on a similar application to yours. After implementing Erik's tips, what was the new error between the real Z coordinate and the one measured by the OAK-D camera?

              • erik replied to this.
                6 days later

                I have a question about the OAK camera: can we capture a 7-segment display and convert it to a number using the OAK camera?

                a month later

                erik What model of the OAK-D PoE camera provides an error of less than 1.5% at 1 m ground-truth distance?

                5 months later

                erik Can you please answer Afsana's question here? I am also trying to convert the disparity into depth, and I cannot find in the link you provided the basis for assuming 7.5 as an effective focal length.

                Kindly help as soon as you are able to.

                • erik replied to this.

                  erik Thank you!!!

                  I just wanted to follow up on this. I am using the code given here
                  (RGB, DEPTH, CONFIDENCE aligned.py):
                  https://github.com/luxonis/depthai-python/blob/main/examples/StereoDepth/rgb_depth_aligned.py

                  Essentially, I would need to take the disparity output and apply the formula
                  depth = 441.25 * 7.5 / disparity [pixels]
                  to calculate the depth?

                  Please answer this, as I have been stuck on it for the past 3 days and my head has been spinning every day.

                  THANK YOU IN ADVANCE!!!!

                  Regards,
                  Mamoon

                  • erik replied to this.

                    Hi MamoonIsmailKhalid ,
                    If you have a 7.5 cm baseline distance and a 441-pixel focal length, then yes, that's the correct formula 🙂 Otherwise, I would strongly suggest using the depth output instead of calculating it yourself on the host.
                    Thanks, Erik
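                    For context, the focal length in pixels can be derived from the horizontal field of view and the image width. The 71.9° HFOV below is the approximate mono-camera HFOV and the values are given for illustration only:

                        import math

                        # f [px] = (image width / 2) / tan(HFOV / 2)
                        width_px = 640      # 400P mono frame width
                        hfov_deg = 71.9     # approximate mono camera HFOV
                        focal_px = width_px * 0.5 / math.tan(math.radians(hfov_deg) * 0.5)
                        print(round(focal_px, 2))   # ~441.25

                        # Disparity-to-depth conversion with a 7.5 cm baseline
                        baseline_cm = 7.5
                        disparity_px = 36.0         # example disparity value
                        depth_cm = focal_px * baseline_cm / disparity_px
                        print(round(depth_cm, 1))   # ~91.9 cm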

                    How would one extract the depth data directly? Are there any code examples you can refer me to? I am using the depth data to overlay depth information on the output of 2D pose estimation (Google MediaPipe) and reconstruct a 3D pose from the extracted key points.
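                    For reference, a minimal sketch of streaming the depth output (uint16 frames in millimeters) directly from the StereoDepth node; the stream name "depth" is just a placeholder:

                        import depthai as dai

                        pipeline = dai.Pipeline()
                        monoLeft = pipeline.create(dai.node.MonoCamera)
                        monoRight = pipeline.create(dai.node.MonoCamera)
                        monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
                        monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)

                        stereo = pipeline.create(dai.node.StereoDepth)
                        monoLeft.out.link(stereo.left)
                        monoRight.out.link(stereo.right)

                        # Send the depth output to the host
                        xoutDepth = pipeline.create(dai.node.XLinkOut)
                        xoutDepth.setStreamName("depth")
                        stereo.depth.link(xoutDepth.input)

                        with dai.Device(pipeline) as device:
                            depthQueue = device.getOutputQueue(name="depth", maxSize=4, blocking=False)
                            while True:
                                # uint16 numpy array, depth in millimeters
                                depthFrame = depthQueue.get().getFrame()
                                # e.g. depth at pixel (x, y): depthFrame[y, x]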