Hi everyone,
I’m trying to calibrate the world coordinate system (charuco board) to the camera coordinate system (OAK-D Pro) using solvePnP.
Goal: Find the camera extrinsics to transform points from the camera frame (DepthAI stereo-derived XYZ) to a charuco board coordinate system.
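For reference, this is how I apply the solvePnP result to move a camera-frame point into the board frame (a minimal sketch; solvePnP gives the board-to-camera transform, so the inverse is applied here):

import cv2
import numpy as np

def camera_to_board(point_cam, rvec, tvec):
    # solvePnP returns the board-to-camera transform: X_cam = R @ X_board + t,
    # so a camera-frame point maps back into the board frame as R.T @ (X_cam - t).
    R, _ = cv2.Rodrigues(rvec)
    t = np.asarray(tvec, dtype=float).reshape(3)
    return R.T @ (np.asarray(point_cam, dtype=float) - t)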
What I did:
- Detected 2D charuco corners in the RGB image (1080p from CAM_A).
- Used the known charuco corner world coordinates, the RGB camera intrinsics (read from the device), and solvePnP to get rvec and tvec (see the sketch after this list).
- Converted (u, v, Z) from the depth map to XYZ using the intrinsics from getCameraIntrinsics() for CAM_A.
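For context, the detection and solvePnP step looks roughly like this (a sketch against the pre-4.7 cv2.aruco API; the dictionary and board dimensions are placeholders, not my actual board):

import cv2
import numpy as np

# Placeholder board geometry -- replace with the real dictionary and sizes (metres)
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_100)
board = cv2.aruco.CharucoBoard_create(7, 5, 0.04, 0.03, aruco_dict)

K = np.asarray(np.load('./CalibrationOutput/camera_intrinsics.npy'), dtype=np.float64)
dist = np.zeros(5)  # simplification; in practice use calib_data.getDistortionCoefficients() for CAM_A

gray = cv2.cvtColor(cv2.imread('./CalibrationOutput/test_image.png'), cv2.COLOR_BGR2GRAY)
corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
_, ch_corners, ch_ids = cv2.aruco.interpolateCornersCharuco(corners, ids, gray, board)

# Board-frame 3D coordinates of the detected chessboard corners (Z = 0 on the pattern plane)
obj_pts = board.chessboardCorners[ch_ids.flatten()]
ok, rvec, tvec = cv2.solvePnP(obj_pts, ch_corners, K, dist)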
Problem: When I transform the 3D points (XYZ from DepthAI stereo) using the solved extrinsics, the points look correct in X and Y but are clearly wrong in Z (an offset and/or scale along the Z axis). Note: the board (= world) coordinate system has X and Y in the pattern plane, so Z points up out of the board in this setup.
I tried to debug by comparing the translation vector from solvePnP with the XYZ from the depth map; they do not match as expected.
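Concretely, this is the kind of check I ran (a sketch, continuing from the solvePnP sketch above and using uvz_to_xyz from the snippet further down; it assumes the depth frame has the same resolution as the 1080p RGB it is aligned to, that the board is defined in metres, and that the depth values are the usual uint16 millimetres):

# tvec is the board origin expressed in camera coordinates, so projecting the
# origin to a pixel, reading the aligned depth there, and back-projecting it
# should reproduce tvec.
origin_px, _ = cv2.projectPoints(np.zeros((1, 3)), rvec, tvec, K, dist)
u, v = origin_px.ravel().astype(int)

depth_frame = np.load('./CalibrationOutput/test_depth.npy')
Z = depth_frame[v, u] / 1000.0  # mm -> m, to match the board units
xyz_from_depth = uvz_to_xyz(u, v, Z, K)

print("tvec (solvePnP):    ", tvec.ravel())
print("XYZ from depth map: ", xyz_from_depth)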
Suspicions:
- I might be using incorrect camera intrinsics or the wrong projection model (a reprojection-error check for this is sketched after this list).
- The DepthAI point cloud might use a different reference frame or camera model than the standard OpenCV pinhole model (K).
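A reprojection-error check along these lines (continuing from the solvePnP sketch above) should at least rule the 2D side in or out: a large pixel error would point at wrong intrinsics or distortion, while a small one makes the solvePnP side less suspect.

# Reproject the board corners with the solved pose and compare against the
# detected 2D corners.
proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist)
err = np.linalg.norm(proj.reshape(-1, 2) - ch_corners.reshape(-1, 2), axis=1)
print("reprojection error [px]: mean %.2f, max %.2f" % (err.mean(), err.max()))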
Questions:
- Is the camera model behind the DepthAI stereo XYZ points the same as the OpenCV pinhole model?
- Does getCameraIntrinsics() for CAM_A (1080p) match the internal projection model used to generate the XYZ points?
- Any known pitfalls when aligning depth-derived XYZ points with RGB-based solvePnP results?
Code snippets for context
Pipeline setup
import depthai as dai


def configure_pipeline(pipeline: dai.Pipeline) -> dai.Pipeline:
    # RGB camera (CAM_A): full-resolution ISP stream for calibration, small preview for display
    cam_rgb = pipeline.create(dai.node.ColorCamera)
    cam_rgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
    cam_rgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
    cam_rgb.setInterleaved(False)
    cam_rgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
    cam_rgb.setPreviewSize(640, 400)
    cam_rgb.setFps(5)
    cam_rgb.initialControl.setManualFocus(125)

    xout_rgb_preview = pipeline.create(dai.node.XLinkOut)
    xout_rgb_preview.setStreamName("rgb_preview")
    cam_rgb.preview.link(xout_rgb_preview.input)

    xout_rgb = pipeline.create(dai.node.XLinkOut)
    xout_rgb.setStreamName("rgb")
    cam_rgb.isp.link(xout_rgb.input)

    # Mono cameras (CAM_B / CAM_C) feeding stereo depth
    cam_left = pipeline.create(dai.node.MonoCamera)
    cam_right = pipeline.create(dai.node.MonoCamera)
    cam_left.setBoardSocket(dai.CameraBoardSocket.CAM_B)
    cam_right.setBoardSocket(dai.CameraBoardSocket.CAM_C)
    cam_left.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
    cam_right.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
    cam_left.setFps(5)
    cam_right.setFps(5)

    # Stereo depth, aligned to the RGB camera
    stereo = pipeline.create(dai.node.StereoDepth)
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DETAIL)
    stereo.setDepthAlign(dai.CameraBoardSocket.CAM_A)
    # Post-processing is set through the raw config object (initialConfig.get()/set())
    stereo_cfg = stereo.initialConfig.get()
    stereo_cfg.postProcessing.temporalFilter.enable = True
    stereo.initialConfig.set(stereo_cfg)
    cam_left.out.link(stereo.left)
    cam_right.out.link(stereo.right)

    stereo_xout = pipeline.create(dai.node.XLinkOut)
    stereo_xout.setStreamName("depth")
    stereo.depth.link(stereo_xout.input)
    return pipeline
Convert pixel + depth to XYZ
def uvz_to_xyz(u, v, Z, K):
    """Back-project pixel (u, v) with depth Z to camera-frame XYZ using the
    standard pinhole model. Z stays in whatever unit the depth map uses
    (DepthAI depth frames are uint16 millimetres)."""
    fx = K[0][0]
    fy = K[1][1]
    cx = K[0][2]
    cy = K[1][2]
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return (X, Y, Z)
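A vectorised variant of the same mapping, in case it is useful (it assumes the depth frame is uint16 millimetres at the same resolution as K, and returns metres):

import numpy as np

def depth_to_xyz(depth_frame, K):
    # Back-project every pixel of the depth map; result is H x W x 3 in metres.
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    Z = depth_frame.astype(np.float32) / 1000.0  # mm -> m
    v, u = np.indices(depth_frame.shape)
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return np.dstack((X, Y, Z))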
Calibration + depth capture snippet
import time

import cv2
import numpy as np

oPipeline = configure_pipeline(dai.Pipeline())
with dai.Device(oPipeline, dai.UsbSpeed.HIGH) as device:
    rgb_queue = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
    depth_queue = device.getOutputQueue(name="depth", maxSize=4, blocking=False)
    rgb_preview_queue = device.getOutputQueue(name="rgb_preview", maxSize=4, blocking=False)

    # Intrinsics for CAM_A, scaled to the 1080p ISP output
    calib_data = device.readCalibration()
    K = calib_data.getCameraIntrinsics(
        cameraId=dai.CameraBoardSocket.CAM_A,
        resizeWidth=1920,
        resizeHeight=1080,
        keepAspectRatio=True
    )
    np.save('./CalibrationOutput/camera_intrinsics.npy', K)

    # Grab frames for ~10 s, keeping the latest RGB and depth frame
    rgb_frame = None
    depth_frame = None
    rgb_preview_frame = None
    time_start = time.time()
    while (time.time() - time_start < 10):
        if rgb_queue.has():
            rgb_frame = rgb_queue.get().getCvFrame()
        if depth_queue.has():
            depth_frame = depth_queue.get().getFrame()
        if rgb_preview_queue.has():
            rgb_preview_frame = rgb_preview_queue.get().getCvFrame()
        if rgb_preview_frame is not None:
            cv2.imshow("RGB and Depth", rgb_preview_frame)
        key = cv2.waitKey(1)
    cv2.destroyAllWindows()

    if rgb_frame is not None and depth_frame is not None:
        cv2.imwrite('./CalibrationOutput/test_image.png', rgb_frame)
        np.save('./CalibrationOutput/test_depth.npy', depth_frame)
    else:
        raise Exception("No RGB or Depth frame received.")
Additional context
I had an earlier issue when using the factory calibration: fitting an affine transform (cv2.estimateAffine3D) to the world frame required an extra scale factor of about 0.9, which hinted at calibration inaccuracies. After re-calibration this factor is close to 1.0, but I still see the Z offset problem.
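For reference, the scale check looked roughly like this (xyz_cam / xyz_board are placeholder names for the depth-derived corner positions and the corresponding board-frame corners, both Nx3):

# Fit an affine map from camera-frame to board-frame points and inspect the
# column norms of its 3x3 part; for a pure rigid transform they should be ~1.0.
retval, A, inliers = cv2.estimateAffine3D(np.asarray(xyz_cam, dtype=np.float64),
                                          np.asarray(xyz_board, dtype=np.float64))
scales = np.linalg.norm(A[:, :3], axis=0)
print("per-axis scale of fitted affine transform:", scales)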
Request
Any ideas on what I might be missing? In particular: are DepthAI's stereo-derived XYZ points and the OpenCV camera model (with K) directly compatible, or is a correction step needed?
Thank you!
(This post was written with the help of an LLM to better formulate my question; it was used only for structuring and wording.)