Hi, I am using XLink to capture video from the OAK-D cameras, and I am running a detection model on the host device. I now want to get the real-world position of a detected object, but I am confused about where to start. How can I use the stereo camera properties to go from an image position to a real-world position?
How to get world position from Oak D camera
Hi SadiaC
By "the host device", do you mean a PC or an RPi?
To calculate the XYZ coordinates from a depth image, you can use the SpatialLocationCalculator node (executed directly on the OAK), or you can use the host-side calculation (available here).
Essentially, both of these give you an approximate 3D position for any pixel in a depth map. Using the sensor's FOV and the depth-map resolution, the angular offset of the pixel from the image center is calculated; since the Z component (the depth value) is already known, only some trigonometry is needed to solve for the other two components.
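Roughly, a minimal host-side sketch of that idea (the function name and the 71.9° HFOV below are placeholders I chose for illustration; read the actual FOV and resolution from the device calibration, and see the linked example for the full implementation):

import math

def pixel_to_xyz(u, v, z_mm, width, height, hfov_deg):
    # Focal length in pixels derived from the horizontal FOV and image width
    focal_px = width / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))

    # Pixel offsets from the optical center (assumed to be the image center)
    dx = u - width / 2.0
    dy = v - height / 2.0

    # Z is known from the depth map; similar triangles give X and Y
    x_mm = z_mm * dx / focal_px
    y_mm = z_mm * dy / focal_px
    return x_mm, y_mm, z_mm

# e.g. pixel_to_xyz(400, 300, 1500, 640, 400, 71.9) for a 640x400 depth map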
Thanks,
Jaka
Hi, you mentioned that the Z component is known; is that because the stereo cameras give us Z? Even without using the depth map, we would still know Z from the stereo cameras, right? So, given a bounding box of an object, can we correctly get the distance of that object from that Z?
SadiaC Hi, you mentioned that the Z component is known; is that because the stereo cameras give us Z?
Correct.
SadiaC Even without using the depth map, we would still know Z from the stereo cameras, right? So, given a bounding box of an object, can we correctly get the distance of that object from that Z?
Not sure I understand what you mean. Stereo cameras give us the depth map by calculating the disparity between the left and right images.
You could calculate the depth without it if you knew the exact dimensions of the object and made sure the bounding box fits the object tightly, which is very difficult and inefficient. Otherwise, you need the depth map.
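As a rough illustration of that known-dimensions approach (all numbers below are made-up placeholders), the distance follows from similar triangles in the pinhole model:

# Sketch: depth from a known object width and its bounding-box width in pixels
focal_length_px = 860.0   # fx from the camera intrinsics, in pixels
real_width_m = 0.25       # known physical width of the object, in meters
bbox_width_px = 120.0     # width of the (tightly fitting) bounding box, in pixels

# Similar triangles: real_width / Z = bbox_width / focal_length
distance_m = focal_length_px * real_width_m / bbox_width_px
print(f"Estimated distance: {distance_m:.2f} m")  # ~1.79 m with these numbers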
Thanks,
Jaka
Thank you for getting back to me. For now I only want to use the RGB cameras and not the depth map from the stereo cameras. Using only two RGB cameras (two OAK-D Lites), if I calibrate them and compute all the intrinsic and extrinsic parameters using this: luxonis/depthai-experiments/tree/master/gen2-multiple-devices/multi-cam-calibration, will I be able to get the distance of an object? I want to use the depth map and stereo cameras later on; for now I want to focus only on the RGB cameras.
Hi, thank you for the reply. I followed the steps in the multi-cam calibration example to get the extrinsic parameters. I then followed this https://docs.luxonis.com/projects/api/en/latest/samples/calibration/calibration_reader/#calibration-reader to get the intrinsic parameters. I have then followed the code that you wrote under another post:
import numpy as np
# Given point in image coordinates (in pixels)
image_point = np.array([x_pixel, y_pixel, 1]) # Replace with your actual x, y pixel coordinates
# Given intrinsic matrix K
K = np.array([
[fx, 0, cx],
[0, fy, cy],
[0, 0, 1]
]) # Replace fx, fy, cx, cy with your actual intrinsic parameters
# or you can get it using device.readCalibration().getCameraIntrinsics()
# Invert the intrinsic matrix
K_inv = np.linalg.inv(K)
# Convert to normalized camera coordinates
normalized_camera_point = K_inv @ image_point
# Given CAM_to_WORLD transformation matrix
CAM_to_WORLD = np.array([
[a, b, c, d],
[e, f, g, h],
[i, j, k, l],
[m, n, o, p]
]) # Replace with your actual transformation matrix
# Transform to world coordinates
homogeneous_world_point = CAM_to_WORLD @ np.append(normalized_camera_point, 1)
# Convert to Cartesian coordinates
world_point = homogeneous_world_point[:3] / homogeneous_world_point[3]
print("World Point:", world_point)
So would the world point now be able to provide the distance, now that the camera has been calibrated? I may be totally wrong, but I am trying to understand what the world point gives us; is it just a relative position?
Hi SadiaC
The example you are referencing is about getting the approximate position of a point when using a calibration board. It should not be used for depth sensing, as it is very inaccurate. The distances are relative to the camera, yes.
For proper depth sensing you would need to use cv2 functions for stereo matching.
GPT guide:
Stereo Calibration and Rectification
Before calculating depth, perform stereo rectification to align the images from both cameras. This uses the intrinsic and extrinsic parameters of the cameras.
- Load Calibration Data: Load the intrinsic parameters (cameraMatrix1, cameraMatrix2) and distortion coefficients (distCoeffs1, distCoeffs2) for both cameras, along with the extrinsic parameters (rotation R and translation T matrices).
- Stereo Rectification: Use cv2.stereoRectify() to compute rectification transforms for each camera. This outputs rectification transforms (R1, R2), projection matrices (P1, P2), and the disparity-to-depth mapping matrix (Q).
Image Preprocessing
- Capture Images: Capture synchronized images from both cameras.
- Undistort and Rectify Images: Use cv2.initUndistortRectifyMap() together with cv2.remap() to undistort and rectify the images.
Disparity Map Calculation
- Configure Stereo Matcher: Create a stereo matcher using cv2.StereoBM_create() or cv2.StereoSGBM_create(). Tune parameters as needed.
- Compute Disparity Map: Use the stereo matcher to compute the disparity map from the rectified images.
Depth Calculation
- Calculate Depth: Use the disparity map and the Q matrix to calculate depth. Depth can be calculated as depth = focal_length * baseline / disparity, with focal_length and baseline derived from P1, P2, and Q (a short sketch of this follows below).
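A minimal sketch of that last step, assuming P1 and P2 come from cv2.stereoRectify() above and disparity is the raw output of StereoBM/StereoSGBM (which is fixed-point, scaled by 16):

import numpy as np

# Derive focal length and baseline from the rectified projection matrices,
# then apply depth = focal_length * baseline / disparity.
focal_length = P1[0, 0]              # fx of the rectified left camera, in pixels
baseline = -P2[0, 3] / P2[0, 0]      # P2[0, 3] encodes -fx * baseline

disp = disparity.astype(np.float32) / 16.0  # StereoBM/SGBM output is scaled by 16
valid = disp > 0                            # non-positive disparity means "no match"

depth = np.zeros_like(disp)
depth[valid] = focal_length * baseline / disp[valid]  # same units as the baseline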
Example Code Skeleton
import numpy as np
import cv2

# Load camera parameters (placeholder function returning intrinsics, distortion, extrinsics)
cameraMatrix1, cameraMatrix2, distCoeffs1, distCoeffs2, R, T = load_calibration_data()
imageSize = (1280, 720)  # placeholder: (width, height) of the calibrated image resolution

# Stereo rectification
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(cameraMatrix1, distCoeffs1, cameraMatrix2, distCoeffs2, imageSize, R, T)

# Capture synchronized images (placeholder function)
imgL, imgR = capture_stereo_images()

# Undistort and rectify images
map1x, map1y = cv2.initUndistortRectifyMap(cameraMatrix1, distCoeffs1, R1, P1, imageSize, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(cameraMatrix2, distCoeffs2, R2, P2, imageSize, cv2.CV_32FC1)
rectifiedL = cv2.remap(imgL, map1x, map1y, cv2.INTER_LINEAR)
rectifiedR = cv2.remap(imgR, map2x, map2y, cv2.INTER_LINEAR)

# Stereo matcher and disparity (StereoBM expects 8-bit grayscale images)
grayL = cv2.cvtColor(rectifiedL, cv2.COLOR_BGR2GRAY)  # skip if frames are already grayscale
grayR = cv2.cvtColor(rectifiedR, cv2.COLOR_BGR2GRAY)
stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(grayL, grayR).astype(np.float32) / 16.0  # output is scaled by 16

# Reproject disparity to 3D; the Z channel of the result is the depth map
points_3d = cv2.reprojectImageTo3D(disparity, Q)
depth = points_3d[:, :, 2]
Notes
- Parameter Tuning: Quality of depth estimation depends on correct calibration and parameter tuning.
- Handling Disparities and Depths: The disparity map may contain invalid values. Handle these in the depth map.
- Optimization: Real-time depth calculation can be intensive. Optimizations may be necessary.
Thanks,
Jaka
Thank you for your detailed reply!