I have been having some difficulty getting consistent depth readings from MobileNetSpatialDetectionNetwork on an OAK-D-Lite connected to a Raspberry Pi 4. The inconsistency appears to be caused by misalignment of the depth ROI that MobileNetSpatialDetectionNetwork uses to compute the spatial coordinates.
To illustrate the issue, I started from the example code in spatial_mobilenet.py, which gives perfect alignment of the ROI with the detection bounding box, and modified it to use the face-detection-retail-0004 model from the model zoo. Here is the spatial_mobilenet_modified.py code that I ended up with.
When I ran the code with the OAK-D-Lite pointed at a face target, the face was detected and a bounding box was drawn around it in the preview image. In the depth image there is a clear silhouette of the face target, and the ROI box sits within that silhouette as expected. BTW, I could move the target left, right, up, down, closer, and farther away, and the ROI box always stayed within the face silhouette. I captured screenshots of one instance of the preview and depth windows for illustration.
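For reference when reading the depth window, the expected ROI can be worked out by hand. The sketch below assumes (this is not confirmed anywhere in this post) that the spatial network shrinks the normalized detection box about its centre by the bounding-box scale factor passed to setBoundingBoxScaleFactor, then maps it onto the depth frame. The helper name expected_depth_roi and the 640x400 depth size in the usage line are hypothetical, chosen only for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # normalized (xmin, ymin, xmax, ymax)


def expected_depth_roi(bbox: Box, scale: float, depth_w: int, depth_h: int) -> Tuple[int, int, int, int]:
    """Hypothetical helper: shrink a normalized bbox about its centre by `scale`,
    then convert it to depth-frame pixel coordinates (x0, y0, x1, y1).

    Assumption: the spatial ROI is the detection box scaled about its centre.
    """
    xmin, ymin, xmax, ymax = bbox
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    half_w = (xmax - xmin) * scale / 2
    half_h = (ymax - ymin) * scale / 2
    # Clamp to [0, 1] so the ROI never extends past the frame edges.
    rx0, ry0 = max(0.0, cx - half_w), max(0.0, cy - half_h)
    rx1, ry1 = min(1.0, cx + half_w), min(1.0, cy + half_h)
    return (round(rx0 * depth_w), round(ry0 * depth_h),
            round(rx1 * depth_w), round(ry1 * depth_h))


# Example values only: a face box in the middle of the frame, scale factor 0.5.
print(expected_depth_roi((0.4, 0.3, 0.6, 0.7), 0.5, 640, 400))
```

With scale=1.0 the ROI equals the detection box; smaller values keep it centred on the detection and inside it, which is consistent with the ROI staying within the face silhouette in the unmodified pipeline.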
I then modified the code to add an ImageManip node between the camRgb.preview output and the MobileNetSpatialDetectionNetwork input. The preview size was left at 300x300, and the ImageManip setResize was also set to 300x300. Here is the spatial_mobilenet_modified_with_imageManip.py code that I ended up with.
When I ran spatial_mobilenet_modified_with_imageManip.py, the ROI box on the depth image became smaller and moved toward the upper left. I captured screenshots of one instance of the preview_with_ImageManip and depth_with_ImageManip windows for illustration; look for the small white box in the upper-left corner of depth_with_ImageManip.png. Interestingly, as I increase FRAME_SIZE (e.g. to 800x800), the ROI box grows proportionally: the larger the FRAME_SIZE, the larger the ROI box.
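To describe the "smaller and moved to the upper left" symptom with numbers rather than screenshots, one option is to compare the ROI actually drawn on the depth frame (in spatial_mobilenet.py it is drawn from detection.boundingBoxMapping) with the ROI expected from the detection box. The sketch below is a hypothetical helper, roi_discrepancy, written for illustration; it is not part of DepthAI, and the pixel boxes in the usage line are made-up values, not measurements from this setup.

```python
from typing import Tuple

PixelBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in depth-frame pixels


def roi_discrepancy(expected: PixelBox, observed: PixelBox) -> Tuple[float, float, float, float]:
    """Hypothetical helper: return (width ratio, height ratio, centre dx, centre dy)
    of the observed ROI relative to the expected ROI.

    Ratios below 1 mean the observed ROI is smaller; negative dx / dy mean its
    centre lies to the left of / above the expected centre.
    """
    ex0, ey0, ex1, ey1 = expected
    ox0, oy0, ox1, oy1 = observed
    w_ratio = (ox1 - ox0) / (ex1 - ex0)
    h_ratio = (oy1 - oy0) / (ey1 - ey0)
    dx = (ox0 + ox1) / 2 - (ex0 + ex1) / 2
    dy = (oy0 + oy1) / 2 - (ey0 + ey1) / 2
    return w_ratio, h_ratio, dx, dy


# Made-up example: the observed box is the expected box with every coordinate halved.
print(roi_discrepancy((288, 160, 352, 240), (144, 80, 176, 120)))
```

Purely as geometry: multiplying every coordinate by a factor below 1 shrinks a box and pulls it toward the (0, 0) corner at the same time, so ratios below 1 together with negative offsets would match the pattern described above. Whether that is what actually happens in the ImageManip pipeline is not established here.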
I don't know whether I am using the ImageManip node incorrectly or whether this is expected behavior. Has anyone else seen the same thing? It would be really nice to be able to insert ImageManip nodes to build more complex pipelines while keeping the depth detections consistent.
I would appreciate hearing from others who have built more complex spatial-detection pipelines with ImageManip nodes and are getting consistent coordinate results.
Thanks,
Francis.