Hi everyone,
I’m trying to calibrate the world coordinate system (charuco board) to the camera coordinate system (OAK-D Pro) using solvePnP.
Goal: Find the camera extrinsics to transform points from the camera frame (DepthAI stereo-derived XYZ) to a charuco board coordinate system.
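For reference, this is how I apply the solvePnP result to move a camera-frame point into the board frame (a minimal sketch; solvePnP gives the board-to-camera transform, so the inverse is applied here):

import cv2
import numpy as np

def camera_to_board(point_cam, rvec, tvec):
    # solvePnP returns the board-to-camera transform: X_cam = R @ X_board + t,
    # so a camera-frame point maps back into the board frame as R.T @ (X_cam - t).
    R, _ = cv2.Rodrigues(rvec)
    t = np.asarray(tvec, dtype=float).reshape(3)
    return R.T @ (np.asarray(point_cam, dtype=float) - t)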
What I did:
- Detected 2D charuco corners in the RGB image (1080p from CAM_A).
- Used the known charuco corner world coordinates, the RGB camera intrinsics (read from the device), and solvePnP to get rvec and tvec (see the sketch after this list).
- Converted (u, v, Z) from the depth map to XYZ using the intrinsics from getCameraIntrinsics() for CAM_A.
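For context, the detection and solvePnP step looks roughly like this (a sketch against the pre-4.7 cv2.aruco API; the dictionary and board dimensions are placeholders, not my actual board):

import cv2
import numpy as np

# Placeholder board geometry -- replace with the real dictionary and sizes (metres)
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_100)
board = cv2.aruco.CharucoBoard_create(7, 5, 0.04, 0.03, aruco_dict)

K = np.asarray(np.load('./CalibrationOutput/camera_intrinsics.npy'), dtype=np.float64)
dist = np.zeros(5)  # simplification; in practice use calib_data.getDistortionCoefficients() for CAM_A

gray = cv2.cvtColor(cv2.imread('./CalibrationOutput/test_image.png'), cv2.COLOR_BGR2GRAY)
corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
_, ch_corners, ch_ids = cv2.aruco.interpolateCornersCharuco(corners, ids, gray, board)

# Board-frame 3D coordinates of the detected chessboard corners (Z = 0 on the pattern plane)
obj_pts = board.chessboardCorners[ch_ids.flatten()]
ok, rvec, tvec = cv2.solvePnP(obj_pts, ch_corners, K, dist)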
Problem: When I transform the 3D points (XYZ from DepthAI stereo) using the solved extrinsics, the points look correct in X and Y but are clearly wrong in Z (an offset and/or scale along the Z axis). Note: the board (= world) coordinate system has X and Y in the pattern plane, so Z points up out of the board in this setup.
I tried to debug by comparing the translation vector from solvePnP with the XYZ from the depth map; they do not match as expected.
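Concretely, this is the kind of check I ran (a sketch, continuing from the solvePnP sketch above and using uvz_to_xyz from the snippet further down; it assumes the depth frame has the same resolution as the 1080p RGB it is aligned to, that the board is defined in metres, and that the depth values are the usual uint16 millimetres):

# tvec is the board origin expressed in camera coordinates, so projecting the
# origin to a pixel, reading the aligned depth there, and back-projecting it
# should reproduce tvec.
origin_px, _ = cv2.projectPoints(np.zeros((1, 3)), rvec, tvec, K, dist)
u, v = origin_px.ravel().astype(int)

depth_frame = np.load('./CalibrationOutput/test_depth.npy')
Z = depth_frame[v, u] / 1000.0  # mm -> m, to match the board units
xyz_from_depth = uvz_to_xyz(u, v, Z, K)

print("tvec (solvePnP):    ", tvec.ravel())
print("XYZ from depth map: ", xyz_from_depth)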
Suspicions:
- I might be using incorrect camera intrinsics or the wrong projection model (a reprojection-error check for this is sketched after this list).
- The DepthAI point cloud might use a different reference frame or camera model than the standard OpenCV pinhole model (K).
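A reprojection-error check along these lines (continuing from the solvePnP sketch above) should at least rule the 2D side in or out: a large pixel error would point at wrong intrinsics or distortion, while a small one makes the solvePnP side less suspect.

# Reproject the board corners with the solved pose and compare against the
# detected 2D corners.
proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist)
err = np.linalg.norm(proj.reshape(-1, 2) - ch_corners.reshape(-1, 2), axis=1)
print("reprojection error [px]: mean %.2f, max %.2f" % (err.mean(), err.max()))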
Questions:
- Is the camera model behind the DepthAI stereo XYZ points the same as the OpenCV pinhole model?
- Does getCameraIntrinsics() for CAM_A (1080p) match the internal projection model used to generate the XYZ points?
- Any known pitfalls when aligning depth-derived XYZ points with RGB-based solvePnP results?
Code snippets for context
Pipeline setup
import depthai as dai


def configure_pipeline(pipeline: dai.Pipeline) -> dai.Pipeline:
    # RGB camera (CAM_A): full-resolution ISP stream for calibration, small preview for display
    cam_rgb = pipeline.create(dai.node.ColorCamera)
    cam_rgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
    cam_rgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
    cam_rgb.setInterleaved(False)
    cam_rgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
    cam_rgb.setPreviewSize(640, 400)
    cam_rgb.setFps(5)
    cam_rgb.initialControl.setManualFocus(125)

    xout_rgb_preview = pipeline.create(dai.node.XLinkOut)
    xout_rgb_preview.setStreamName("rgb_preview")
    cam_rgb.preview.link(xout_rgb_preview.input)

    xout_rgb = pipeline.create(dai.node.XLinkOut)
    xout_rgb.setStreamName("rgb")
    cam_rgb.isp.link(xout_rgb.input)

    # Mono cameras (CAM_B / CAM_C) feeding stereo depth
    cam_left = pipeline.create(dai.node.MonoCamera)
    cam_right = pipeline.create(dai.node.MonoCamera)
    cam_left.setBoardSocket(dai.CameraBoardSocket.CAM_B)
    cam_right.setBoardSocket(dai.CameraBoardSocket.CAM_C)
    cam_left.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
    cam_right.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
    cam_left.setFps(5)
    cam_right.setFps(5)

    # Stereo depth, aligned to the RGB camera
    stereo = pipeline.create(dai.node.StereoDepth)
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DETAIL)
    stereo.setDepthAlign(dai.CameraBoardSocket.CAM_A)
    # Post-processing is set through the raw config object (initialConfig.get()/set())
    stereo_cfg = stereo.initialConfig.get()
    stereo_cfg.postProcessing.temporalFilter.enable = True
    stereo.initialConfig.set(stereo_cfg)
    cam_left.out.link(stereo.left)
    cam_right.out.link(stereo.right)

    stereo_xout = pipeline.create(dai.node.XLinkOut)
    stereo_xout.setStreamName("depth")
    stereo.depth.link(stereo_xout.input)
    return pipeline
Convert pixel + depth to XYZ
def uvz_to_xyz(u, v, Z, K):
    """Back-project pixel (u, v) with depth Z to camera-frame XYZ using the
    standard pinhole model. Z stays in whatever unit the depth map uses
    (DepthAI depth frames are uint16 millimetres)."""
    fx = K[0][0]
    fy = K[1][1]
    cx = K[0][2]
    cy = K[1][2]
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return (X, Y, Z)
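A vectorised variant of the same mapping, in case it is useful (it assumes the depth frame is uint16 millimetres at the same resolution as K, and returns metres):

import numpy as np

def depth_to_xyz(depth_frame, K):
    # Back-project every pixel of the depth map; result is H x W x 3 in metres.
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    Z = depth_frame.astype(np.float32) / 1000.0  # mm -> m
    v, u = np.indices(depth_frame.shape)
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return np.dstack((X, Y, Z))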
Calibration + depth capture snippet
import time

import cv2
import numpy as np

oPipeline = configure_pipeline(dai.Pipeline())
with dai.Device(oPipeline, dai.UsbSpeed.HIGH) as device:
    rgb_queue = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
    depth_queue = device.getOutputQueue(name="depth", maxSize=4, blocking=False)
    rgb_preview_queue = device.getOutputQueue(name="rgb_preview", maxSize=4, blocking=False)

    # Intrinsics for CAM_A, scaled to the 1080p ISP output
    calib_data = device.readCalibration()
    K = calib_data.getCameraIntrinsics(
        cameraId=dai.CameraBoardSocket.CAM_A,
        resizeWidth=1920,
        resizeHeight=1080,
        keepAspectRatio=True
    )
    np.save('./CalibrationOutput/camera_intrinsics.npy', K)

    # Grab frames for ~10 s, keeping the latest RGB and depth frame
    rgb_frame = None
    depth_frame = None
    rgb_preview_frame = None
    time_start = time.time()
    while (time.time() - time_start < 10):
        if rgb_queue.has():
            rgb_frame = rgb_queue.get().getCvFrame()
        if depth_queue.has():
            depth_frame = depth_queue.get().getFrame()
        if rgb_preview_queue.has():
            rgb_preview_frame = rgb_preview_queue.get().getCvFrame()
        if rgb_preview_frame is not None:
            cv2.imshow("RGB and Depth", rgb_preview_frame)
        key = cv2.waitKey(1)
    cv2.destroyAllWindows()

    if rgb_frame is not None and depth_frame is not None:
        cv2.imwrite('./CalibrationOutput/test_image.png', rgb_frame)
        np.save('./CalibrationOutput/test_depth.npy', depth_frame)
    else:
        raise Exception("No RGB or Depth frame received.")
Additional context
I had an earlier issue when using the factory calibration: fitting an affine transform (cv2.estimateAffine3D) to the world frame required an extra scale factor of about 0.9, which hinted at calibration inaccuracies. After re-calibration this factor is close to 1.0, but I still see the Z offset problem.
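For reference, the scale check looked roughly like this (xyz_cam / xyz_board are placeholder names for the depth-derived corner positions and the corresponding board-frame corners, both Nx3):

# Fit an affine map from camera-frame to board-frame points and inspect the
# column norms of its 3x3 part; for a pure rigid transform they should be ~1.0.
retval, A, inliers = cv2.estimateAffine3D(np.asarray(xyz_cam, dtype=np.float64),
                                          np.asarray(xyz_board, dtype=np.float64))
scales = np.linalg.norm(A[:, :3], axis=0)
print("per-axis scale of fitted affine transform:", scales)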
Request
Any ideas on what I might be missing? In particular: are DepthAI's stereo-derived XYZ points and the OpenCV camera model (with K) directly compatible, or is a correction step needed?
Thank you!
(This post was written with the help of an LLM to better formulate my question; it was used only for structuring and wording.)