I'm also doing spatial object detection with the RGB camera set to its maximum FoV and "squeezing" the RGB images into the detector. More specifically, my ColorCamera
node is set with:
setResolution(dai::ColorCameraProperties::SensorResolution::THE_12_MP)
setIspScale(1, 3)
setPreviewKeepAspectRatio(false)
In contrast, the monochrome cameras are set to the lowest available resolution, dai::MonoCameraProperties::SensorResolution::THE_400_P.
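Put together, the relevant pipeline wiring looks roughly like the sketch below (simplified: the preview size, the choice of a YOLO spatial detection network, and the omitted blob/anchor and output configuration are illustrative, not my exact setup):

```cpp
#include <depthai/depthai.hpp>

dai::Pipeline buildPipeline() {
    dai::Pipeline pipeline;

    // RGB camera at maximum FoV: full 12 MP sensor, downscaled by the ISP,
    // with the preview "squeezed" (not cropped) to the detector input size.
    auto camRgb = pipeline.create<dai::node::ColorCamera>();
    camRgb->setResolution(dai::ColorCameraProperties::SensorResolution::THE_12_MP);
    camRgb->setIspScale(1, 3);
    camRgb->setPreviewKeepAspectRatio(false);
    camRgb->setPreviewSize(416, 416);  // illustrative detector input size

    // Mono cameras at the lowest available resolution.
    auto monoLeft  = pipeline.create<dai::node::MonoCamera>();
    auto monoRight = pipeline.create<dai::node::MonoCamera>();
    monoLeft->setResolution(dai::MonoCameraProperties::SensorResolution::THE_400_P);
    monoRight->setResolution(dai::MonoCameraProperties::SensorResolution::THE_400_P);
    monoLeft->setBoardSocket(dai::CameraBoardSocket::LEFT);
    monoRight->setBoardSocket(dai::CameraBoardSocket::RIGHT);

    // Stereo depth, aligned to the RGB camera so detections and depth share a frame.
    auto stereo = pipeline.create<dai::node::StereoDepth>();
    stereo->setDepthAlign(dai::CameraBoardSocket::RGB);
    monoLeft->out.link(stereo->left);
    monoRight->out.link(stereo->right);

    // Spatial detection network (blob path, anchors and XLink outputs omitted).
    auto detector = pipeline.create<dai::node::YoloSpatialDetectionNetwork>();
    camRgb->preview.link(detector->input);
    stereo->depth.link(detector->inputDepth);

    return pipeline;
}
```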
When I inspect the RGB and depth images and project the detection coordinates onto a plane, the spatial detector does appear to be doing the "right thing": it correctly aligns and scales the data so that they match as closely as possible, which it should be able to do by computing projections from the intrinsic/extrinsic calibration data available on the OAK camera itself.
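The plane projection I mention is little more than dropping the height component of the spatial coordinates the detector reports; a minimal sketch (the struct and helper names are mine, not part of the DepthAI API):

```cpp
#include <depthai/depthai.hpp>

// Top-down (bird's-eye) position of a detection, in millimetres.
struct PlanePoint {
    float x;
    float z;
};

// Project a spatial detection onto the horizontal X-Z plane by dropping
// the Y (height) component of its spatial coordinates.
PlanePoint projectToGroundPlane(const dai::SpatialImgDetection& det) {
    return PlanePoint{det.spatialCoordinates.x, det.spatialCoordinates.z};
}
```

A few examples are shown below: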
The one problem I have is that, as can be seen above, a relatively large region at the borders of the RGB image has no depth data available. When an object is detected in one of these regions, the position estimate can be wildly wrong, as seen below:
It would be nice if the DepthAI API offered some mechanism to guard against such cases, for example an option to ignore any detection whose bounds extend beyond the region covered by the depth map.
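In the meantime, a host-side workaround seems straightforward: drop any detection whose normalized bounding box reaches into a configurable border margin. A rough sketch; the 10% default margin is an assumption that would need tuning against the actual RGB/stereo FoV difference:

```cpp
#include <depthai/depthai.hpp>
#include <vector>

// Keep only detections whose bounding box stays inside the central part of
// the frame that the depth map actually covers. Bounding-box coordinates
// are normalized to [0, 1]; the 10% default margin is a guess, to be tuned
// against the actual RGB/stereo FoV difference.
std::vector<dai::SpatialImgDetection> filterByDepthCoverage(
        const std::vector<dai::SpatialImgDetection>& detections,
        float margin = 0.1f) {
    std::vector<dai::SpatialImgDetection> kept;
    for(const auto& det : detections) {
        if(det.xmin >= margin && det.xmax <= 1.0f - margin &&
           det.ymin >= margin && det.ymax <= 1.0f - margin) {
            kept.push_back(det);
        }
    }
    return kept;
}
```

Ideally, though, the margin (or the exact depth-valid region) would be derived from the device's own calibration rather than a hand-tuned constant, which is why built-in support would be preferable.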