I have been having difficulty getting consistent depth readings from MobileNetSpatialDetectionNetwork using an OAK-D-Lite running from a Raspberry Pi 4. The inconsistency appears to be caused by misalignment of the depth ROI that MobileNetSpatialDetectionNetwork uses to compute the spatial coordinates.
To illustrate the issue, I first used the example code in spatial_mobilenet.py to get a baseline in which the ROI aligns perfectly with the face detection bounding box. I modified spatial_mobilenet.py to use the face-detection-retail-004 model from the model zoo. Here is the spatial_mobilenet_modified.py code that I ended up with.
When I ran the code with the OAK-D-Lite pointing at a face target, the face was detected and a bounding box was drawn around it on the preview image. On the depth image, there is a clear silhouette of the face target, and the ROI box is within the face silhouette as expected. I was able to move the target left, right, up, down, closer, and further away, and the ROI box always stayed within the face target silhouette. I captured screenshots of one instance of the preview and depth windows for illustration.
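For anyone reproducing this, the behavior I would expect in the working case can be sketched as below. This is only a minimal sketch, not the library's actual implementation: the helper name roi_in_depth_pixels is hypothetical, and I am assuming the detection box is normalized to 0..1 and that the bounding box scale factor shrinks the box around its center.

```python
def roi_in_depth_pixels(bbox_norm, depth_w, depth_h, scale=0.5):
    """Map a normalized detection box to a pixel ROI on the depth frame.

    bbox_norm is (xmin, ymin, xmax, ymax) in the 0..1 range.
    scale shrinks the box around its center (assumed behavior, see lead-in).
    """
    xmin, ymin, xmax, ymax = bbox_norm

    # Center and size of the detection box in normalized units.
    cx = (xmin + xmax) / 2
    cy = (ymin + ymax) / 2
    half_w = (xmax - xmin) * scale / 2
    half_h = (ymax - ymin) * scale / 2

    # Denormalize against the depth frame size, not the preview size.
    x0 = int((cx - half_w) * depth_w)
    y0 = int((cy - half_h) * depth_h)
    x1 = int((cx + half_w) * depth_w)
    y1 = int((cy + half_h) * depth_h)
    return x0, y0, x1, y1


# A centered box covering half the frame, drawn on a 640x400 depth frame.
roi = roi_in_depth_pixels((0.25, 0.25, 0.75, 0.75), 640, 400)
print(roi)
```

With a mapping like this, the ROI stays centered on the detection no matter where the target moves, which matches what I saw without ImageManip.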
I then modified the code by adding an ImageManip node between the camRgb.preview output and the MobileNetSpatialDetectionNetwork input. The preview size was left at 300x300, and the imageManip.setResize was set to 300x300 as well. Here is the spatial_mobilenet_modified_with_imageManip.py code that I ended up with.
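In case it helps to see the wiring without opening the script, the change amounts to roughly the following. This is a stripped-down sketch rather than the exact script: it assumes the depthai 2.x Python API, sets the resize through initialConfig, and leaves out the stereo depth nodes, the model blob path, and the output queues. The function name build_pipeline and the 0.5 scale factor are placeholders for illustration.

```python
def build_pipeline():
    """Wire ColorCamera -> ImageManip -> MobileNetSpatialDetectionNetwork."""
    # Imported inside the function so this file loads without depthai installed.
    import depthai as dai

    pipeline = dai.Pipeline()

    # Color camera producing the 300x300 preview.
    cam_rgb = pipeline.create(dai.node.ColorCamera)
    cam_rgb.setPreviewSize(300, 300)
    cam_rgb.setInterleaved(False)

    # The added node: resize to the same 300x300, so in principle a no-op.
    manip = pipeline.create(dai.node.ImageManip)
    manip.initialConfig.setResize(300, 300)

    # Spatial detection network (blob path and depth input omitted here).
    nn = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
    nn.setBoundingBoxScaleFactor(0.5)  # placeholder value

    # Previously: cam_rgb.preview.link(nn.input)
    cam_rgb.preview.link(manip.inputImage)
    manip.out.link(nn.input)
    return pipeline
```

The only difference from the working version is that the preview now passes through ImageManip before reaching the network input.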
When I ran the new spatial_mobilenet_modified_with_imageManip.py code, the ROI box on the depth image got smaller and moved toward the upper left. I captured screenshots of one instance of the preview_with_ImageManip and depth_with_ImageManip windows for illustration. Please look for the smaller white box in the upper left corner of the depth_with_ImageManip.png image. One interesting observation is that as I increase FRAME_SIZE, e.g. to (800, 800), the ROI box size seems to increase proportionally: the larger the FRAME_SIZE, the larger the ROI box.
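To put numbers on this rather than eyeballing screenshots, something like the helper below could be logged for each FRAME_SIZE. It is a hypothetical diagnostic of my own, not part of depthai, and it assumes both boxes are given as (xmin, ymin, xmax, ymax) in the same normalized 0..1 space.

```python
def box_scale_and_offset(expected, observed):
    """Compare two boxes given as (xmin, ymin, xmax, ymax).

    Returns (sx, sy, dx, dy): the observed/expected size ratio per axis
    and the shift of the observed box center relative to the expected one.
    """
    ex0, ey0, ex1, ey1 = expected
    ox0, oy0, ox1, oy1 = observed

    # Size ratios: 1.0 means the ROI has the expected size.
    sx = (ox1 - ox0) / (ex1 - ex0)
    sy = (oy1 - oy0) / (ey1 - ey0)

    # Center shift: negative values mean the ROI moved up and to the left.
    dx = (ox0 + ox1) / 2 - (ex0 + ex1) / 2
    dy = (oy0 + oy1) / 2 - (ey0 + ey1) / 2
    return sx, sy, dx, dy


# Made-up example values: a ROI half the expected size, pushed to the corner.
result = box_scale_and_offset((0.25, 0.25, 0.75, 0.75), (0.0, 0.0, 0.25, 0.25))
print(result)
```

If the size ratio tracks FRAME_SIZE across runs, that would suggest the ROI is being scaled against the wrong frame dimensions somewhere, but I have not confirmed that this is the cause.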
I do not know whether I am using the ImageManip node incorrectly or whether this is the expected behavior. Has anyone else seen the same behavior? It would be really nice to be able to insert ImageManip nodes to build more complex pipelines while keeping the depth detections consistent.
I would appreciate hearing from others who have built complex pipelines for spatial detections using ImageManip nodes and are getting consistent coordinate results.
Thanks,
Francis.