ImageManip node causes depth ROI error in MobileNetSpatialDetectionNetwork

FrancisTse

I have been having difficulty getting consistent depth readings from MobileNetSpatialDetectionNetwork using an OAK-D-Lite running from a Raspberry Pi 4. The inconsistency appears to be caused by misalignment of the depth ROI that MobileNetSpatialDetectionNetwork uses to compute the spatial coordinates.

As a baseline with the ROI properly aligned to the face detection bounding box, I used the example code in spatial_mobilenet.py, modified to use the face-detection-retail-004 model from the model zoo. Here is the spatial_mobilenet_modified.py code that I ended up with.

When I ran the code with the OAK-D-Lite pointing at a face target, the face was detected and a bounding box was drawn around it on the preview image. On the depth image, there was a clear silhouette of the face target, and the ROI box was within the face silhouette as expected. I was able to move the target left, right, up, down, closer, and further away, and the ROI box always stayed within the face target silhouette. I captured screenshots of the preview and depth windows for illustration.

I then modified the code by adding an ImageManip node between the camRgb.preview output and the MobileNetSpatialDetectionNetwork input. The preview size was left at 300x300, and the ImageManip setResize was set to 300x300 as well. Here is the spatial_mobilenet_modified_with_imageManip.py code that I ended up with.
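In outline, the change described above is a rewiring like the following. This is only a minimal sketch against the DepthAI v2 Python API, not the attached script: the function name build_rgb_to_nn and the use_manip flag are made up for illustration, and the model blob, the StereoDepth node, and the XLink outputs from spatial_mobilenet.py are left out.

```python
def build_rgb_to_nn(use_manip: bool):
    """Wire camRgb.preview to the spatial network, optionally through ImageManip.

    Hypothetical helper for illustration only; it returns an incomplete pipeline.
    """
    # Imported inside the function so the sketch can be loaded without the library.
    import depthai as dai

    pipeline = dai.Pipeline()

    camRgb = pipeline.create(dai.node.ColorCamera)
    camRgb.setPreviewSize(300, 300)
    camRgb.setInterleaved(False)

    nn = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
    # Omitted here: blob path, StereoDepth linked to nn.inputDepth, XLinkOut nodes.

    if use_manip:
        # Case that shows the shifted ROI: same 300x300 size, but routed via ImageManip.
        manip = pipeline.create(dai.node.ImageManip)
        manip.initialConfig.setResize(300, 300)
        camRgb.preview.link(manip.inputImage)
        manip.out.link(nn.input)
    else:
        # Baseline case: preview feeds the network directly.
        camRgb.preview.link(nn.input)

    return pipeline
```

The only difference between the two cases is the extra node on the path to the network input; the image size reaching the network is 300x300 either way.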

When I ran the new spatial_mobilenet_modified_with_imageManip.py code, the ROI box on the depth image got smaller and moved toward the upper left. I captured screenshots of the preview_with_ImageManip and depth_with_ImageManip windows for illustration. Please look for the smaller white box in the upper left corner of the depth_with_ImageManip.png image. One interesting observation is that as I increase the FRAME_SIZE, e.g. to FRAME_SIZE(800, 800), the ROI box size seems to increase proportionally. The larger the FRAME_SIZE, the larger the ROI box.
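The pattern above (a small box near the upper left that grows with FRAME_SIZE) is what one would see if a normalized bounding box were converted to pixels against the preview size and then drawn on a larger frame. That is only a guess at the mechanism, not something confirmed here. The sketch below just works through the arithmetic; norm_to_pixels and the sample bounding box are hypothetical and not part of the DepthAI API.

```python
def norm_to_pixels(bbox, width, height):
    """Scale a normalized (xmin, ymin, xmax, ymax) box to pixel coordinates."""
    xmin, ymin, xmax, ymax = bbox
    return (round(xmin * width), round(ymin * height),
            round(xmax * width), round(ymax * height))


def box_area(rect):
    """Area in pixels of an (xmin, ymin, xmax, ymax) rectangle."""
    xmin, ymin, xmax, ymax = rect
    return (xmax - xmin) * (ymax - ymin)


# Hypothetical detection centered in the image, in normalized coordinates.
bbox = (0.25, 0.25, 0.75, 0.75)

# Same normalized box scaled against three different reference sizes.
roi_300 = norm_to_pixels(bbox, 300, 300)
roi_800 = norm_to_pixels(bbox, 800, 800)
roi_1080p = norm_to_pixels(bbox, 1920, 1080)

for name, roi in (("300x300", roi_300), ("800x800", roi_800), ("1920x1080", roi_1080p)):
    print(name, roi, "area:", box_area(roi))
```

If all three rectangles were drawn on the same large frame, the 300x300 one would sit closest to the upper left corner and be the smallest, and the box would grow as the reference size grows.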

I do not know whether I am using the ImageManip node incorrectly or whether this is expected behavior. Has anyone else seen the same behavior? It would be really nice to be able to insert ImageManip nodes to build more complex pipelines and keep the depth detections consistent.

Would appreciate hearing from others who have experimented with building complex pipelines for spatial detections using ImageManip nodes and are getting consistent coordinate results.

Thanks,
Francis.

erik

Hi FrancisTse,
Thanks for the report! Changing the preview size to:

camRgb.setPreviewSize(1920,1080)

With that change the ROI seems to line up as expected. But the behavior at smaller preview sizes does not look right to me, so I will report this to the FW team.
Thanks, Erik

FrancisTse

Hello Erik,
The ROI size seems to increase as I increase the camRgb.setPreviewSize() values. When I increase it to 1920x1080, the ROI size for the case with the ImageManip node starts to match the ROI size with the preview set directly to 300x300.
However, I am hoping to be able to set the preview to 1072x1072 as in the gen2-face-recognition experiment. I want to add spatial detection to the face recognition pipeline so I can tell how far away the recognized face is.
Hope the FW team can help with this.
Thanks,
Francis.