Dear erik,
Thanks a lot for your reply, and sorry for the delay; the project has been on hold for a few weeks.
As you suggested, I tried recording a bag file and replaying it in the RealSense Viewer. The results are satisfactory: I can pick out the correct coordinates of the extremities of the pipe, as shown in the attached screenshot.
My problem now is: given that I can correctly detect the circles with a Hough circle detection algorithm, how do I extract their coordinates from the depth array? My naive implementation gives me wrong depth values.
Here is what I tried:
import cv2
import depthai as dai
import numpy as np

def getFrame(queue):
    # Get frame from queue
    frame = queue.get()
    # Convert frame to OpenCV format and return
    return frame.getCvFrame()
def getMonoCamera(pipeline, isLeft):
    # Configure mono camera
    mono = pipeline.createMonoCamera()
    # Set camera resolution
    mono.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    if isLeft:
        # Get left camera
        mono.setBoardSocket(dai.CameraBoardSocket.LEFT)
    else:
        # Get right camera
        mono.setBoardSocket(dai.CameraBoardSocket.RIGHT)
    return mono
def getStereoPair(pipeline, monoLeft, monoRight):
    # Configure stereo pair for depth estimation
    stereo = pipeline.createStereoDepth()
    # Check occluded pixels and mark them as invalid
    stereo.setLeftRightCheck(True)
    # Configure left and right cameras to work as a stereo pair
    monoLeft.out.link(stereo.left)
    monoRight.out.link(stereo.right)
    return stereo
if __name__ == '__main__':
    # Start defining a pipeline
    pipeline = dai.Pipeline()

    # Set up left and right cameras
    monoLeft = getMonoCamera(pipeline, isLeft=True)
    monoRight = getMonoCamera(pipeline, isLeft=False)

    # Combine left and right cameras to form a stereo pair
    # (left-right check is already enabled inside getStereoPair)
    stereo = getStereoPair(pipeline, monoLeft, monoRight)

    camRgb = pipeline.create(dai.node.ColorCamera)
    camRgb.setIspScale(1, 3)
    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
    xoutRgb = pipeline.create(dai.node.XLinkOut)
    xoutRgb.setStreamName("video")
    camRgb.video.link(xoutRgb.input)

    # Set XLinkOut for disparity, rectifiedLeft, and rectifiedRight
    xoutDisp = pipeline.createXLinkOut()
    xoutDisp.setStreamName("disparity")
    xoutRectifiedLeft = pipeline.createXLinkOut()
    xoutRectifiedLeft.setStreamName("rectifiedLeft")
    xoutRectifiedRight = pipeline.createXLinkOut()
    xoutRectifiedRight.setStreamName("rectifiedRight")
    stereo.disparity.link(xoutDisp.input)
    stereo.rectifiedLeft.link(xoutRectifiedLeft.input)
    stereo.rectifiedRight.link(xoutRectifiedRight.input)
    # Pipeline is defined, now we can connect to the device
    with dai.Device(pipeline) as device:
        # Output queues will be used to get the frames from the outputs defined above
        disparityQueue = device.getOutputQueue(name="disparity", maxSize=1, blocking=False)
        rectifiedLeftQueue = device.getOutputQueue(name="rectifiedLeft", maxSize=1, blocking=False)
        rectifiedRightQueue = device.getOutputQueue(name="rectifiedRight", maxSize=1, blocking=False)

        # Calculate a multiplier for colormapping the disparity map
        disparityMultiplier = 255 / stereo.getMaxDisparity()

        cv2.namedWindow("Stereo Pair")
        # Variable used to toggle between side-by-side view and one-frame view
        sideBySide = False

        def rescale_frame(frame, percent=75):
            # Scale both dimensions by the given percentage
            width = int(frame.shape[1] * percent / 100)
            height = int(frame.shape[0] * percent / 100)
            return cv2.resize(frame, (width, height), interpolation=cv2.INTER_AREA)
        while True:
            # Get disparity map (uint8, 0..95 px at 400P by default)
            disparity = getFrame(disparityQueue)

            # Depth (cm) = focal length (px) * baseline (cm) / disparity (px).
            # Guard against division by zero: invalid pixels have disparity 0,
            # so they come out as inf here.
            with np.errstate(divide='ignore'):
                depth = 441.25 * 7.5 / disparity.astype(np.float32)

            # Colormap disparity for display
            disparity_multiplied = (disparity * disparityMultiplier).astype(np.uint8)
            disparity_multiplied = cv2.applyColorMap(disparity_multiplied, cv2.COLORMAP_JET)

            # Get left and right rectified frames
            leftFrame = getFrame(rectifiedLeftQueue)
            rightFrame = getFrame(rectifiedRightQueue)
            blurredLeft = cv2.GaussianBlur(leftFrame, (5, 5), 0)
            blurredRight = cv2.GaussianBlur(rightFrame, (5, 5), 0)
            imLeft = cv2.adaptiveThreshold(blurredLeft, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                           cv2.THRESH_BINARY_INV, 11, 2)
            imRight = cv2.adaptiveThreshold(blurredRight, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                            cv2.THRESH_BINARY_INV, 11, 2)

            circles = cv2.HoughCircles(blurredRight,
                                       cv2.HOUGH_GRADIENT_ALT,
                                       minDist=15,
                                       dp=1.6,
                                       param1=300,
                                       param2=0.9,
                                       minRadius=0,
                                       maxRadius=-1)
            # HoughCircles returns None when nothing is detected
            if circles is not None:
                circles = np.uint16(np.around(circles))
                for i in circles[0, :]:
                    # Draw the outer circle
                    cv2.circle(rightFrame, (i[0], i[1]), i[2], (255, 0, 255), 1)
                    cv2.circle(disparity_multiplied, (i[0], i[1]), i[2], (255, 0, 255), 1)
                    # Draw the center of the circle
                    cv2.circle(rightFrame, (i[0], i[1]), 1, (255, 0, 255), 1)
                    cv2.circle(disparity_multiplied, (i[0], i[1]), 1, (255, 0, 255), 1)
                    # Bounding box of the circle in the depth map
                    xmin_depth = i[0] - i[2]
                    xmax_depth = i[0] + i[2]
                    ymin_depth = i[1] - i[2]
                    ymax_depth = i[1] + i[2]
                    # print(i[0], i[1], i[2])
                    # print(xmin_depth, xmax_depth, ymin_depth, ymax_depth)
                    # print(depth[i[1], i[0]])
                    # depth_cut = depth[ymin_depth:ymax_depth, xmin_depth:xmax_depth].astype(float)
                    # print(depth_cut)
                    # Depth at the circle center
                    dist = depth[i[1], i[0]]
                    print("Detected {0:.2f} centimeters away.".format(dist))

            # Display the annotated disparity map; quit on 'q'
            cv2.imshow("Stereo Pair", disparity_multiplied)
            if cv2.waitKey(1) == ord('q'):
                break
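I also suspect that sampling a single pixel is fragile, since the disparity is noisy and exactly 0 (invalid) on edges. Taking the median over the valid depth values inside the circle's bounding box is what I had in mind with the commented-out depth_cut line; here is a minimal sketch of that idea (untested, and it assumes the depth array from above, where invalid pixels come out as inf):

import numpy as np

def circle_depth_cm(depth, cx_px, cy_px, r_px):
    # Median depth (cm) over the circle's bounding box,
    # ignoring invalid pixels (inf where disparity was 0).
    cx_px, cy_px, r_px = int(cx_px), int(cy_px), int(r_px)
    h, w = depth.shape
    x0, x1 = max(cx_px - r_px, 0), min(cx_px + r_px, w)
    y0, y1 = max(cy_px - r_px, 0), min(cy_px + r_px, h)
    patch = depth[y0:y1, x0:x1]
    valid = patch[np.isfinite(patch)]
    return float(np.median(valid)) if valid.size else float('nan')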
Also, even if this approach worked for the depth estimate, which it currently does not, I am still unsure how to obtain the X and Y coordinates in the same way.
Any help or clarification would be greatly appreciated.
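From what I understand, X and Y would come from back-projecting the pixel through the pinhole model using the camera intrinsics. Something like the sketch below is what I have in mind, but it is untested: it assumes the disparity map is aligned to the rectified right camera at 640x400 (my pipeline's default), and that readCalibration() gives me the correct intrinsics for that resolution:

import numpy as np
import depthai as dai

def pixel_to_xyz(device, u, v, z_cm, width=640, height=400):
    # Back-project pixel (u, v) with depth z_cm into camera-space
    # X, Y, Z (cm) via the pinhole model, using the right mono
    # camera's intrinsics (assumption: disparity is aligned to it).
    calib = device.readCalibration()
    K = np.array(calib.getCameraIntrinsics(dai.CameraBoardSocket.RIGHT, width, height))
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * z_cm / fx
    y = (v - cy) * z_cm / fy
    return x, y, z_cm

Inside the detection loop I would then call pixel_to_xyz(device, int(i[0]), int(i[1]), dist) for each circle center. Is that the right approach?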