• DepthAI-v2
  • Get 3D location from RGB detections with OAK-D

Hi!

I'm trying to use an OAK-D Pro to detect and locate objects. So far I've managed to get the detection part working properly, but I'm struggling to get the distance from the stereo depth. This is due to the offset between the RGB sensor and the stereo pair. I've started using spatialCalcConfigInQueue = device.getInputQueue("spatialCalcConfig"), but the lack of alignment between both sources of data causes a massive error, making the location pretty much useless.

In the image below you can see my detections:

As you can see, the direct translation to the depth is quite far from the real detection:

I started to test the alignment, but after setting the depth alignment with self.stereo.setDepthAlign(depthai.CameraBoardSocket.RGB) nothing changed. I understand that there is an additional step afterwards that performs a resize and ultimately gives pretty good alignment, but since that step is outside the pipeline I cannot see how to get the spatial calculation for the aligned stereo data. Is there any straightforward way to do this and get an accurate spatial location?
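
To make it concrete, this is the kind of host-side resize I mean (a minimal sketch, assuming OpenCV and hypothetical queue names; nearest-neighbor interpolation so no fake depth values are invented):

    import cv2

    # Sketch: resize the depth frame on the host to match the RGB preview
    depth_frame = depthQueue.get().getFrame()  # uint16 depth map from the "depth" stream
    rgb_frame = rgbQueue.get().getCvFrame()    # BGR preview frame from the "rgb" stream
    depth_resized = cv2.resize(depth_frame,
                               (rgb_frame.shape[1], rgb_frame.shape[0]),
                               interpolation=cv2.INTER_NEAREST)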

I thought about doing the opposite, resizing the RGB image and calculating the locations with the "aligned" RGB image. I'm going to try this next, but I'm afraid that the resizing will negatively impact the accuracy of my detections.

I found some examples that use a NN and output spatial locations, but since the calculation happens inside the neural network node I could not find how it is done.
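
For reference, these are the kind of examples I mean (a minimal sketch of a spatial detection network node, with hypothetical variable names and blob path; the ROI depth averaging happens inside the node, which is why the logic isn't visible):

    # Sketch: a spatial detection network attaches X/Y/Z to each detection internally
    nn = pipeline.create(depthai.node.MobileNetSpatialDetectionNetwork)
    nn.setBlobPath("model.blob")         # hypothetical model path
    nn.setConfidenceThreshold(0.5)
    nn.setBoundingBoxScaleFactor(0.5)    # shrink the ROI to avoid background pixels
    nn.setDepthLowerThreshold(100)       # mm
    nn.setDepthUpperThreshold(5000)      # mm
    cam_rgb.preview.link(nn.input)
    stereo.depth.link(nn.inputDepth)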

Thanks in advance!

Hi @RocaPiedra
self.stereo.setDepthAlign(depthai.CameraBoardSocket.RGB) should do it. Perhaps there is a FOV mismatch between RGB and depth, so only the display is wrong?
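
For completeness, the alignment setup would look something like this (a minimal sketch, assuming the depth output should match a 1280x720 RGB preview):

    stereo.setDepthAlign(depthai.CameraBoardSocket.RGB)  # align the depth map to the RGB sensor
    stereo.setOutputSize(1280, 720)                      # match the RGB preview resolution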

Thanks,
Jaka

    jakaskerl
    Hi Jaka,

    The measured values are wrong. The real distance in Z is about 0.5 m; the displayed distance is the distance to the ground (it matches a real measurement taken at the displayed location in the image). What exactly happens when you perform setDepthAlign? Can we expect pixel-to-pixel correspondence between RGB and depth after matching the resolutions of both images? If so, something is definitely not working properly.

    Some additional information, I'm handling everything within a class that stores all the device configurations, maybe there is something wrong with my setup:

    def __init__(self,
                 rgb_res: depthai.ColorCameraProperties.SensorResolution = depthai.ColorCameraProperties.SensorResolution.THE_12_MP,
                 rgb_preview_res: Resolution = Resolution(1280, 720),
                 depth_res: depthai.MonoCameraProperties.SensorResolution = depthai.MonoCameraProperties.SensorResolution.THE_400_P,
                 stereoProfile: depthai.node.StereoDepth.PresetMode = depthai.node.StereoDepth.PresetMode.HIGH_ACCURACY,
                 use_alignment: bool = False,
                 calibrate: bool = True):
        self.pipeline = depthai.Pipeline()
        self.cam_rgb = self.pipeline.create(depthai.node.ColorCamera)
        self.monoLeft = self.pipeline.create(depthai.node.MonoCamera)
        self.monoRight = self.pipeline.create(depthai.node.MonoCamera)
        self.stereo = self.pipeline.create(depthai.node.StereoDepth)
        self.spatialLocationCalculator = self.pipeline.create(depthai.node.SpatialLocationCalculator)
        self.prepareRGBPipeline(rgb_res, rgb_preview_res)
        self.prepareSpatialLocationCalculator(depth_res, stereoProfile)
        if use_alignment: self.prepareAlignment()
        self.device = depthai.Device(self.pipeline)
        self.RGBQueue = self.device.getOutputQueue("rgb", maxSize=1, blocking=False) # set the max size to stop processing old images
        self.depthQueue = self.device.getOutputQueue(name="depth", maxSize=1, blocking=False) # set the max size to stop processing old images
        self.spatialCalcQueue = self.device.getOutputQueue(name="spatialData", maxSize=1, blocking=False)
        self.spatialCalcConfigInQueue = self.device.getInputQueue("spatialCalcConfig")
        self.device.setIrLaserDotProjectorBrightness(750) # in mA, 0..1200 - over 750 saturates (Docs)
        self.device.setIrFloodLightBrightness(0)
        if calibrate: self.calibrateRGB()

    def prepareRGBPipeline(self, res: depthai.ColorCameraProperties.SensorResolution, preview_res: Resolution):
        self.cam_rgb.setResolution(res)
        self.cam_rgb.setPreviewSize(preview_res.width, preview_res.height)
        self.cam_rgb.setInterleaved(False)
        self.xout_rgb = self.pipeline.create(depthai.node.XLinkOut)
        self.xout_rgb.setStreamName("rgb")
        self.cam_rgb.preview.link(self.xout_rgb.input)
        return True

    def prepareAlignment(self):
        self.stereo.setDepthAlign(depthai.CameraBoardSocket.RGB)
        return True

    def prepareSpatialLocationCalculator(self,
                                         monoRes: depthai.MonoCameraProperties.SensorResolution = depthai.MonoCameraProperties.SensorResolution.THE_400_P,
                                         stereoProfile: depthai.node.StereoDepth.PresetMode = depthai.node.StereoDepth.PresetMode.HIGH_ACCURACY):
        # Define sources and outputs
        self.xOutDepth = self.pipeline.create(depthai.node.XLinkOut)
        self.xoutSpatialData = self.pipeline.create(depthai.node.XLinkOut)
        self.xinSpatialCalcConfig = self.pipeline.create(depthai.node.XLinkIn)
        self.xOutDepth.setStreamName("depth")
        self.xoutSpatialData.setStreamName("spatialData")
        self.xinSpatialCalcConfig.setStreamName("spatialCalcConfig")
        # Properties
        self.monoLeft.setResolution(monoRes)
        self.monoLeft.setCamera("left")
        self.monoRight.setResolution(monoRes)
        self.monoRight.setCamera("right")
        self.stereo.setDefaultProfilePreset(stereoProfile)
        self.stereo.setLeftRightCheck(True)
        self.stereo.setSubpixel(True)
        # Config
        self.initROIConfig()
        # Linking
        self.monoLeft.out.link(self.stereo.left)
        self.monoRight.out.link(self.stereo.right)
        self.spatialLocationCalculator.passthroughDepth.link(self.xOutDepth.input)
        self.stereo.depth.link(self.spatialLocationCalculator.inputDepth)
        self.spatialLocationCalculator.out.link(self.xoutSpatialData.input)
        self.xinSpatialCalcConfig.out.link(self.spatialLocationCalculator.inputConfig)
        return True

    Thank you for your time.

    Hi Jaka!

    I'm not sure how I solved it, but the current alignment is noticeably better:

    Still, as you can see in the image above, I haven't been able to reduce the error completely.

    To give you some context: I'm trying to reach the detected item with a robot arm. I expect some error in the depth calculation, but I've noticed it is pretty much consistent with the alignment error; if I solve that, I should be able to give the robot accurate enough directions. Below you can see what happens when I try to align the RGB camera with the detection: the target is below and to the right of the center of the image, which is quite close to the actual centroid of the bounding box:

    You can see it better here:

    I'm currently using autofocus, and it is probably the main source of error, but the application I've devised has multiple stages and I need proper focus in each of them to get accurate detections. I would like to switch between autofocus and fixed focus to guarantee that the camera is properly focused each time I move the robot to another stage. Is there any way to perform the alignment every time I change location? I'm not sure what I can and cannot change on the fly without having to regenerate the full pipeline. I understand that I can modify capture parameters (brightness and such), focus, and filters. I've read somewhere that the stereo configuration can be updated too; does that include the alignment?
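
    For reference, this is how I plan to switch focus at runtime through the control queue (a sketch based on the "RGBControlIn" stream from my pipeline, shown further below):

        # Sketch: toggle between a fixed lens position and autofocus at runtime
        ctrl = depthai.CameraControl()
        ctrl.setManualFocus(129)      # lens position, 0..255
        self.controlQueue.send(ctrl)

        ctrl = depthai.CameraControl()
        ctrl.setAutoFocusMode(depthai.CameraControl.AutoFocusMode.CONTINUOUS_VIDEO)
        self.controlQueue.send(ctrl)  # back to autofocus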

    If you still need the MRE to solve my questions let me know and I'll get it ready; currently my code contains classes with information I cannot disclose.

    Thanks!

      RocaPiedra I'm not sure what I can and cannot change on the fly without having to regenerate the full pipeline. I understand that I can modify capture parameters (brightness and such), focus, and filters. I've read somewhere that the stereo configuration can be updated too; does that include the alignment?

      The stereo parameters can't be changed during runtime (efforts are being made to support this). Currently you need to modify the intrinsics each time the lens moves during AF.
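
      For reference, the intrinsics I mean can be read from the device calibration like this (a minimal sketch, assuming a 1280x720 target resolution):

          # Sketch: read the RGB intrinsic matrix stored in the device calibration
          calibData = device.readCalibration()
          M = calibData.getCameraIntrinsics(depthai.CameraBoardSocket.RGB, 1280, 720)
          fx, fy = M[0][0], M[1][1]  # focal lengths in pixels
          cx, cy = M[0][2], M[1][2]  # principal point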

      RocaPiedra If you still need the MRE to solve my questions let me know and I'll get it ready

      Is the alignment good when a proper AF value is selected?

      Thanks,
      Jaka

        jakaskerl

        jakaskerl The stereo parameters can't be changed during runtime (efforts are being made to support this). Currently you need to modify the intrinsics each time the lens moves during AF.

        Can you elaborate on this? I'm not sure what you mean by intrinsics; I'm pretty new to DepthAI and I don't really know what I can and cannot do to my pipeline and device objects during runtime. I've been playing with the control and config inputs; what else can I do? Is it possible to change either the pipeline or the device nodes during execution?

        The thing is, if I have to reconnect to the camera every time I change position to get proper alignment, it will increase my operation time a lot, as it takes a few seconds to run my constructor class where I generate the pipeline and configure the device. Increasing the processing time of the location algorithm 5 or 10 times is not an option.

        Yeah, I've tried it with manual focus and nothing changed. This is what I'm doing (I get an error due to calling the device before it is created, but the default value is the lens position I want - messy, but I haven't discovered how to perform the setDepthAlign after device creation without breaking the code):

        def calibrateRGB(self):
            try:
                lensPosition = self.getLensPosition()
            except Exception as e:
                lensPosition = 129
                print("Error [OAKCam.calibrateRGB]: SETTING LENS POSITION TO DEFAULT VALUE " + str(lensPosition) + " - " + str(e))
            if lensPosition:
                self.cam_rgb.initialControl.setAutoFocusMode(depthai.CameraControl.AutoFocusMode.OFF)
                self.cam_rgb.initialControl.setManualFocus(lensPosition)
                self.current_lens_position = lensPosition

        def getLensPosition(self):
            calibData = self.device.readCalibration2()
            return calibData.getLensPosition(depthai.CameraBoardSocket.RGB)
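
        As an aside, one way around the ordering problem might be to connect to the device first and start the pipeline afterwards (a sketch, assuming the two-step Device API works as I expect; I haven't tried this yet):

            # Sketch: connect without a pipeline, read calibration, then start the pipeline
            device = depthai.Device()
            calibData = device.readCalibration2()
            lensPosition = calibData.getLensPosition(depthai.CameraBoardSocket.RGB)
            # ... build the pipeline, using lensPosition in cam_rgb.initialControl ...
            device.startPipeline(pipeline)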

        Here you can see the results:

        And here you can see the camera position after alignment:

        Pretty much the same error as before, if not exactly the same; it seems focus was not the issue here. BTW, the error is mostly the same wherever the detection is, which surprised me; I thought it would be related to the angle between the detection and the lenses or something:

        It seems quite consistent, which made me think it might be an issue with my bounding box definition. This is how I generate it from the detection contour:

        # The ROI uses OpenCV image conventions: (0,0) is the top-left corner, values are positive
        @staticmethod
        def generateROI(contour) -> depthai.Rect:
            if contour is not None:
                x, y, w, h = cv2.boundingRect(contour)
                topLeft = depthai.Point2f(x, y)
                bottomRight = depthai.Point2f(x + w, y + h)
                return depthai.Rect(topLeft, bottomRight)
            else:
                raise Exception("[generateROI]: Contour is not ready")

        Then I send it to the queue like this:

        def SetSpatialCalcROI(self, ROI: depthai.Rect, RGBRes: Resolution, DepthRes: Resolution):
            self.spatialCalcConfigInQueue.send(self.setROIConfig(ROI, RGBRes, DepthRes))

        def setROIConfig(self, new_ROI: depthai.Rect, resRGB: Resolution, resDepth: Resolution,
                         algorithm: depthai.SpatialLocationCalculatorAlgorithm = depthai.SpatialLocationCalculatorAlgorithm.MEDIAN):
            self.config.roi = self.scaleROI(new_ROI, resRGB, resDepth)
            self.config.calculationAlgorithm = algorithm
            cfg = depthai.SpatialLocationCalculatorConfig()
            cfg.addROI(self.config)
            return cfg
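
        For completeness, scaleROI simply maps the rectangle proportionally between the two resolutions, along these lines (a simplified sketch; the real helper is in the code I can't share):

            # Simplified sketch of scaleROI: proportional mapping from RGB preview pixels to depth pixels
            @staticmethod
            def scaleROI(roi: depthai.Rect, resRGB: Resolution, resDepth: Resolution) -> depthai.Rect:
                sx = resDepth.width / resRGB.width
                sy = resDepth.height / resRGB.height
                topLeft = depthai.Point2f(roi.topLeft().x * sx, roi.topLeft().y * sy)
                bottomRight = depthai.Point2f(roi.bottomRight().x * sx, roi.bottomRight().y * sy)
                return depthai.Rect(topLeft, bottomRight)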

        And then I just paint it on the image.

        I haven't had time to prepare the MRE; in any case, here is the class constructor where I define everything related to the camera, in case it is relevant:

        class OakCAM():       
            
            def __init__(self, 
                         rgb_res:depthai.ColorCameraProperties.SensorResolution = depthai.ColorCameraProperties.SensorResolution.THE_12_MP, 
                         rgb_preview_res: Resolution = Resolution(1280,720), 
                         depth_res: depthai.MonoCameraProperties.SensorResolution = depthai.MonoCameraProperties.SensorResolution.THE_400_P,
                         stereoProfile:depthai.node.StereoDepth.PresetMode = depthai.node.StereoDepth.PresetMode.HIGH_ACCURACY,
                         use_alignment: bool = False,
                         calibrate: bool = True):        
                self.pipeline = depthai.Pipeline()
                # define sources of data
                self.cam_rgb = self.pipeline.create(depthai.node.ColorCamera)
                self.monoLeft = self.pipeline.create(depthai.node.MonoCamera)
                self.monoRight = self.pipeline.create(depthai.node.MonoCamera)
                self.stereo = self.pipeline.create(depthai.node.StereoDepth)
                self.spatialLocationCalculator = self.pipeline.create(depthai.node.SpatialLocationCalculator)
        
                self.use_alignment = use_alignment
                
                self.prepareRGBPipeline(rgb_res, rgb_preview_res)
                if calibrate: self.calibrateRGB()
                self.prepareSpatialLocationCalculator(depth_res, stereoProfile)
                if use_alignment: self.prepareAlignment()
        
                self.device = depthai.Device(self.pipeline)
                self.RGBQueue = self.device.getOutputQueue("rgb", maxSize=1, blocking=False) # set the max size to stop processing old images
                self.depthQueue = self.device.getOutputQueue(name="depth", maxSize=1, blocking=False) # set the max size to stop processing old images
                self.spatialCalcQueue = self.device.getOutputQueue(name="spatialData", maxSize=1, blocking=False)
                self.spatialCalcConfigInQueue = self.device.getInputQueue("spatialCalcConfig")
                self.controlQueue = self.device.getInputQueue("RGBControlIn")
                self.configQueue = self.device.getInputQueue("RGBConfigIn")
        
                self.device.setIrLaserDotProjectorBrightness(750) # in mA, 0..1200 - over 750 saturates (Docs)
                self.device.setIrFloodLightBrightness(0)
                
                self.current_lens_position = self.getLensPosition()
                        
            def prepareRGBPipeline(self, res: depthai.ColorCameraProperties.SensorResolution, preview_res: Resolution):    
                # set initial RGB config
                self.cam_rgb.setResolution(res)
                self.cam_rgb.setPreviewSize(preview_res.width, preview_res.height)
                self.cam_rgb.setInterleaved(False)
                # create links for inputs and outputs
                self.rgb_control_in = self.pipeline.create(depthai.node.XLinkIn)
                self.rgb_config_in = self.pipeline.create(depthai.node.XLinkIn)
                self.xout_rgb = self.pipeline.create(depthai.node.XLinkOut)
                # configure stream names
                self.rgb_control_in.setStreamName("RGBControlIn")
                self.rgb_config_in.setStreamName("RGBConfigIn")
                self.xout_rgb.setStreamName("rgb")
                # link preview with output
                self.rgb_control_in.out.link(self.cam_rgb.inputControl)
                self.rgb_config_in.out.link(self.cam_rgb.inputConfig)
                self.cam_rgb.preview.link(self.xout_rgb.input) # could be swapped for raw, isp, video, still or preview, depending on the desired output
                return True
        
            def prepareAlignment(self):
                self.stereo.setDepthAlign(depthai.CameraBoardSocket.RGB)
                return True
            
            def prepareSpatialLocationCalculator(self, 
                                                 monoRes:depthai.MonoCameraProperties.SensorResolution = depthai.MonoCameraProperties.SensorResolution.THE_400_P,
                                                 stereoProfile:depthai.node.StereoDepth.PresetMode = depthai.node.StereoDepth.PresetMode.HIGH_ACCURACY):
                    # Define sources and outputs
                self.xOutDepth = self.pipeline.create(depthai.node.XLinkOut)
                self.xoutSpatialData = self.pipeline.create(depthai.node.XLinkOut)
                self.xinSpatialCalcConfig = self.pipeline.create(depthai.node.XLinkIn)
        
                self.xOutDepth.setStreamName("depth")
                self.xoutSpatialData.setStreamName("spatialData")
                self.xinSpatialCalcConfig.setStreamName("spatialCalcConfig")
        
                # Properties
                self.monoLeft.setResolution(monoRes)
                self.monoLeft.setCamera("left")
                self.monoRight.setResolution(monoRes)
                self.monoRight.setCamera("right")
        
                self.stereo.setDefaultProfilePreset(stereoProfile)
                self.stereo.setLeftRightCheck(True)
                self.stereo.setSubpixel(True)
        
                # Config        
                self.initROIConfig()
                
                # Linking
                self.monoLeft.out.link(self.stereo.left)
                self.monoRight.out.link(self.stereo.right)
        
                self.spatialLocationCalculator.passthroughDepth.link(self.xOutDepth.input)
                self.stereo.depth.link(self.spatialLocationCalculator.inputDepth)
        
                self.spatialLocationCalculator.out.link(self.xoutSpatialData.input)
                self.xinSpatialCalcConfig.out.link(self.spatialLocationCalculator.inputConfig)
                return True

        Thank you for your time Jaka, I'll try to prepare the MRE tomorrow.

          RocaPiedra I'm not sure what you mean by intrinsics,

          https://www.baeldung.com/cs/focal-length-intrinsic-camera-parameters
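
          In short: the intrinsics (focal lengths fx, fy and principal point cx, cy) are what turn a pixel plus a depth value into a 3D point, and they change when the lens moves. A minimal sketch of the math, assuming a simple pinhole model:

              # Sketch: back-project a pixel (u, v) with depth Z using the pinhole model
              def pixel_to_3d(u, v, Z, fx, fy, cx, cy):
                  X = (u - cx) * Z / fx
                  Y = (v - cy) * Z / fy
                  return X, Y, Z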

          RocaPiedra I've been playing with the control and config inputs, what else can I do? Is it possible to change either the pipeline or the device nodes during execution?

          Generally no, but you can send frames selectively using a Script node. That would not apply to your use case, though.

          RocaPiedra The thing is, if I have to reconnect to the camera every time I change position to get proper alignment, it will increase my operation time a lot as it takes some seconds to run my constructor class where I generate the pipeline and configure the device. Increasing the processing time of the location algorithm 5 or 10 times is not an option.

          That is a valid concern. If it turns out to be an issue, there are some changes in the works that will allow changing the calibration during runtime.

          RocaPiedra Thank you for your time Jaka, I'll try to prepare the MRE tomorrow.

          Please make it as minimal as you possibly can.

          Thanks,
          Jaka