Hello,
My team and I are working on a grasping application in which we want to grasp objects whose geometry is inferred from point clouds captured with a depth camera. We bought an OAK-D Pro, which we are comparing to a RealSense D435i depth camera.
We were initially happy with the depth maps produced by the RealSense camera (with unaltered, default settings) but unhappy with its color camera, which is why we switched over to the OAK-D Pro. However, after trying out various settings, we find that the 3D reconstructions we build with the OAK-D Pro are insufficient. The code we use to produce the depth and color images is:
import os
import json
import numpy as np
import depthai as dai
import matplotlib.pyplot as plt
cwd = os.path.join(os.path.dirname(os.path.realpath(__file__)), "images")
# Create pipeline
FPS = 1
pipeline = dai.Pipeline()
camRgb = pipeline.create(dai.node.ColorCamera)
monoLeft = pipeline.create(dai.node.MonoCamera)
monoRight = pipeline.create(dai.node.MonoCamera)
depth = pipeline.create(dai.node.StereoDepth)
# pointcloud = pipeline.create(dai.node.PointCloud)
sync = pipeline.create(dai.node.Sync)
xOut = pipeline.create(dai.node.XLinkOut)
xOut.input.setBlocking(False)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
camRgb.setIspScale(1, 3)
camRgb.setFps(FPS)
monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
monoLeft.setCamera("left")
monoLeft.setFps(FPS)
monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
monoRight.setCamera("right")
monoRight.setFps(FPS)
config = depth.initialConfig.get()
config.postProcessing.speckleFilter.enable = True
config.postProcessing.speckleFilter.speckleRange = 60
config.postProcessing.temporalFilter.enable = True
config.postProcessing.decimationFilter.decimationFactor = 1
config.postProcessing.spatialFilter.holeFillingRadius = 2
config.postProcessing.spatialFilter.numIterations = 1
config.postProcessing.thresholdFilter.minRange = 900 # mm
config.postProcessing.thresholdFilter.maxRange = 1500 # mm
config.censusTransform.enableMeanMode = True
config.costMatching.linearEquationParameters.alpha = 0
config.costMatching.linearEquationParameters.beta = 2
depth.initialConfig.set(config)
depth.initialConfig.setConfidenceThreshold(200)
depth.setLeftRightCheck(True)
depth.setExtendedDisparity(False)
depth.setSubpixel(True)
depth.setSubpixelFractionalBits(4)
depth.setRectifyEdgeFillColor(0) # Black, to better see the cutout
# configure color/depth image outputs
monoLeft.out.link(depth.left)
monoRight.out.link(depth.right)
camRgb.isp.link(sync.inputs["rgb"])
depth.depth.link(sync.inputs["dep"])
sync.out.link(xOut.input)
xOut.setStreamName("out")
# Add outputs for mono frames
xOutLeft = pipeline.create(dai.node.XLinkOut)
xOutLeft.setStreamName("mono_left")
monoLeft.out.link(xOutLeft.input)
xOutRight = pipeline.create(dai.node.XLinkOut)
xOutRight.setStreamName("mono_right")
monoRight.out.link(xOutRight.input)
# Connect to device and start pipeline
with dai.Device(pipeline) as device:
print("Connected to device")
q = device.getOutputQueue(name="out", maxSize=4, blocking=False)
qLeft = device.getOutputQueue(name="mono_left", maxSize=4, blocking=False)
qRight = device.getOutputQueue(name="mono_right", maxSize=4, blocking=False)
# Obtain intrinsic parameters from the calibration data
calib_data = device.readCalibration()
K_color = calib_data.getCameraIntrinsics(dai.CameraBoardSocket.CAM_A, dai.Size2f(1920, 1080)) dist_color = calib_data.getDistortionCoefficients(dai.CameraBoardSocket.CAM_A)
K_depth = calib_data.getCameraIntrinsics(dai.CameraBoardSocket.CAM_C, dai.Size2f(640, 400))
# find extrinsics
extrinsics_color = np.eye(4)
color_to_depth = calib_data.getCameraExtrinsics(dai.CameraBoardSocket.CAM_A, dai.CameraBoardSocket.CAM_C) extrinsics_depth = np.array(color_to_depth)
i = 0
while True:
i += 1 # Get frames
message = q.get()
color_frame = message["rgb"].getCvFrame()
depth_frame = message["dep"].getCvFrame().astype(np.float32)
mono_left_frame = qLeft.get().getCvFrame()
mono_right_frame = qRight.get().getCvFrame()
depth_array = np.array(depth_frame)
color_array = np.array(color_frame)[:,:,::-1]
mask = np.ones_like(depth_array)
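The script is cut off here; the part that follows builds the actual reconstruction. Conceptually it comes down to back-projecting the depth map through the depth intrinsics. The snippet below is only a simplified illustration of that step, not the exact code we run, and it assumes K_depth was queried at the same resolution as depth_array:

# Simplified sketch of the back-projection we rely on (illustration only);
# assumes K_depth is a 3x3 intrinsic matrix matching the depth resolution.
K = np.array(K_depth)
fx, fy = K[0, 0], K[1, 1]
cx, cy = K[0, 2], K[1, 2]
h, w = depth_array.shape
u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
z = depth_array / 1000.0                         # StereoDepth outputs millimetres
x = (u - cx) * z / fx
y = (v - cy) * z / fy
points = np.stack((x, y, z), axis=-1).reshape(-1, 3)
points = points[points[:, 2] > 0]                # discard pixels without valid depth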
Here is a link to some image material:
https://drive.google.com/drive/folders/1eog_DYnHkH_4OMERsfkqux5-aag8-8qh?usp=sharing
It contains a reconstruction using RealSense data, and two images of a reconstruction done with the OAK-D Pro, along with the depth map used to produce that reconstruction.
The questions are as follows:
- Are we using suboptimal settings for our purpose? We would like the geometry to be well-defined to the point where antipodal grasps can be proposed using the depth map alone (see the flatness-check sketch below this list for how we judge that).
- Are we using a suboptimal choice of device? The OAK-D Pro has a baseline of 75 mm and is intended for distances of 0.7 m to 12 m, but we realistically operate in the range 0.6-1.2 m. The RealSense D435i has a baseline of 50 mm and an intended range of 0.3-3 m. Could it be that we should select a camera from your line-up with a shorter baseline but the same good color camera?
- We expect all objects of interest to move at a constant velocity (constant speed and direction). What pre/post-processing would mitigate noise related to this motion?
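To make "well-defined" in the first question a bit more concrete: our rough yardstick is the plane-fit residual on a patch we know to be flat (e.g. the table surface). The following is only an illustrative sketch, assuming an Nx3 array points_patch in metres that we cut out of the cloud by hand; it is not part of the pipeline above.

import numpy as np

def plane_fit_rms(points):
    # Fit a plane through Nx3 points with an SVD and return the RMS
    # point-to-plane distance (same unit as the input, here metres).
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]  # right singular vector with the smallest singular value
    return float(np.sqrt(np.mean((centered @ normal) ** 2)))

# e.g. on a hand-picked, nominally flat patch of the cloud:
# print(f"plane-fit RMS: {plane_fit_rms(points_patch) * 1000:.1f} mm")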
I appreciate your response,
Enodo-Oscar