Hello,
My team and I are working on a grasping application in which we want to grasp objects whose geometry is inferred from point clouds captured with a depth camera. We bought an OAK-D Pro, which we are comparing to a RealSense D435i depth camera.
We were initially happy with the depth maps produced by the RealSense camera (with unaltered, default settings) but unhappy with its color camera, which is why we switched over to the OAK-D Pro. However, after trying out various settings, we find that the 3D reconstructions we build with the OAK-D Pro are insufficient. The code we use to produce the depth and color images is:
import os
import json
import numpy as np
import depthai as dai
import matplotlib.pyplot as plt
cwd = os.path.join(os.path.dirname(os.path.realpath(__file__)), "images")
# Create pipeline
FPS = 1
pipeline = dai.Pipeline()
camRgb = pipeline.create(dai.node.ColorCamera)
monoLeft = pipeline.create(dai.node.MonoCamera)
monoRight = pipeline.create(dai.node.MonoCamera)
depth = pipeline.create(dai.node.StereoDepth)
# pointcloud = pipeline.create(dai.node.PointCloud)
sync = pipeline.create(dai.node.Sync)
xOut = pipeline.create(dai.node.XLinkOut)
xOut.input.setBlocking(False)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
camRgb.setIspScale(1, 3)
camRgb.setFps(FPS)
monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
monoLeft.setCamera("left")
monoLeft.setFps(FPS)
monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
monoRight.setCamera("right")
monoRight.setFps(FPS)
config = depth.initialConfig.get()
config.postProcessing.speckleFilter.enable = True
config.postProcessing.speckleFilter.speckleRange = 60
config.postProcessing.temporalFilter.enable = True
config.postProcessing.decimationFilter.decimationFactor = 1
config.postProcessing.spatialFilter.holeFillingRadius = 2
config.postProcessing.spatialFilter.numIterations = 1
config.postProcessing.thresholdFilter.minRange = 900 # mm
config.postProcessing.thresholdFilter.maxRange = 1500 # mm
config.censusTransform.enableMeanMode = True
config.costMatching.linearEquationParameters.alpha = 0
config.costMatching.linearEquationParameters.beta = 2
depth.initialConfig.set(config)
depth.initialConfig.setConfidenceThreshold(200)
depth.setLeftRightCheck(True)
depth.setExtendedDisparity(False)
depth.setSubpixel(True)
depth.setSubpixelFractionalBits(4)
depth.setRectifyEdgeFillColor(0) # Black, to better see the cutout
# configure color/depth image outputs
monoLeft.out.link(depth.left)
monoRight.out.link(depth.right)
camRgb.isp.link(sync.inputs["rgb"])
depth.depth.link(sync.inputs["dep"])
sync.out.link(xOut.input)
xOut.setStreamName("out")
# Add outputs for mono frames
xOutLeft = pipeline.create(dai.node.XLinkOut)
xOutLeft.setStreamName("mono_left")
monoLeft.out.link(xOutLeft.input)
xOutRight = pipeline.create(dai.node.XLinkOut)
xOutRight.setStreamName("mono_right")
monoRight.out.link(xOutRight.input)
# Connect to device and start pipeline
with dai.Device(pipeline) as device:
print("Connected to device")
q = device.getOutputQueue(name="out", maxSize=4, blocking=False)
qLeft = device.getOutputQueue(name="mono_left", maxSize=4, blocking=False)
qRight = device.getOutputQueue(name="mono_right", maxSize=4, blocking=False)
# Obtain intrinsic parameters from the calibration data
calib_data = device.readCalibration()
K_color = calib_data.getCameraIntrinsics(dai.CameraBoardSocket.CAM_A, dai.Size2f(1920, 1080)) dist_color = calib_data.getDistortionCoefficients(dai.CameraBoardSocket.CAM_A)
K_depth = calib_data.getCameraIntrinsics(dai.CameraBoardSocket.CAM_C, dai.Size2f(640, 400))
# find extrinsics
extrinsics_color = np.eye(4)
color_to_depth = calib_data.getCameraExtrinsics(dai.CameraBoardSocket.CAM_A, dai.CameraBoardSocket.CAM_C) extrinsics_depth = np.array(color_to_depth)
i = 0
while True:
i += 1 # Get frames
message = q.get()
color_frame = message["rgb"].getCvFrame()
depth_frame = message["dep"].getCvFrame().astype(np.float32)
mono_left_frame = qLeft.get().getCvFrame()
mono_right_frame = qRight.get().getCvFrame()
depth_array = np.array(depth_frame)
color_array = np.array(color_frame)[:,:,::-1]
mask = np.ones_like(depth_array)
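The script is cut off here; the part that follows builds the actual reconstruction. Conceptually it comes down to back-projecting the depth map through the depth intrinsics. The snippet below is only a simplified illustration of that step, not the exact code we run, and it assumes K_depth was queried at the same resolution as depth_array:

# Simplified sketch of the back-projection we rely on (illustration only);
# assumes K_depth is a 3x3 intrinsic matrix matching the depth resolution.
K = np.array(K_depth)
fx, fy = K[0, 0], K[1, 1]
cx, cy = K[0, 2], K[1, 2]
h, w = depth_array.shape
u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
z = depth_array / 1000.0                         # StereoDepth outputs millimetres
x = (u - cx) * z / fx
y = (v - cy) * z / fy
points = np.stack((x, y, z), axis=-1).reshape(-1, 3)
points = points[points[:, 2] > 0]                # discard pixels without valid depth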
Here is a link to some image material:
https://drive.google.com/drive/folders/1eog_DYnHkH_4OMERsfkqux5-aag8-8qh?usp=sharing
It contains a reconstruction using RealSense data, and two images of a reconstruction done with the OAK-D Pro, along with the depth map used to produce that reconstruction.
The questions are as follows:
- Are we using suboptimal settings for our purpose? We would like the geometry to be well-defined to the point where antipodal grasps can be proposed using the depth map alone (see the flatness-check sketch below this list for how we judge that).
- Are we using a suboptimal choice of device? The OAK-D Pro has a baseline of 75 mm and is intended for distances of 0.7 m to 12 m, but we realistically operate in the range 0.6-1.2 m. The RealSense D435i has a baseline of 50 mm and an intended range of 0.3-3 m. Could it be that we should select a camera from your line-up with a shorter baseline but the same good color camera?
- We expect all objects of interest to move at a constant velocity (constant speed and direction). What pre/post-processing would mitigate noise related to this motion?
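To make "well-defined" in the first question a bit more concrete: our rough yardstick is the plane-fit residual on a patch we know to be flat (e.g. the table surface). The following is only an illustrative sketch, assuming an Nx3 array points_patch in metres that we cut out of the cloud by hand; it is not part of the pipeline above.

import numpy as np

def plane_fit_rms(points):
    # Fit a plane through Nx3 points with an SVD and return the RMS
    # point-to-plane distance (same unit as the input, here metres).
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]  # right singular vector with the smallest singular value
    return float(np.sqrt(np.mean((centered @ normal) ** 2)))

# e.g. on a hand-picked, nominally flat patch of the cloud:
# print(f"plane-fit RMS: {plane_fit_rms(points_patch) * 1000:.1f} mm")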
I appreciate your response,
Enodo-Oscar