• DepthAI-v2
  • Accuracy of spatial detection and possible improvements

Hello there, I'm building object detection software that involves calculating the area of a tile on a surface with the OAK-D camera. I detect the 4 corners of the object and calculate the area from their (x, y, z) coordinates using the Euclidean distance formula, viewed from an arbitrary angle. It works quite well at short distances (around 70-100 cm), but the accuracy of my Z axis is not reliable enough to get an accurate result at the distance I need (around 2-3 meters). From what I've been reading, there should be about a 10% loss in accuracy proportional to the distance, so if my object is 1000 mm away there should be a 100 mm variance, i.e. a reading anywhere from 900 to 1100 mm. Is that correct?
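For reference, the calculation is roughly along these lines (a simplified sketch rather than my exact code: the four corners are assumed to arrive as (x, y, z) points in millimetres, ordered around the tile, and the quad is split into two triangles):

    import numpy as np

    def quad_area_mm2(corners):
        """Area of a (roughly planar) quadrilateral given its 4 corners as (x, y, z) in mm."""
        # Split the quad into triangles (p0, p1, p2) and (p0, p2, p3);
        # each triangle's area is half the norm of the cross product of two of its edges.
        p0, p1, p2, p3 = (np.asarray(c, dtype=float) for c in corners)
        area1 = 0.5 * np.linalg.norm(np.cross(p1 - p0, p2 - p0))
        area2 = 0.5 * np.linalg.norm(np.cross(p2 - p0, p3 - p0))
        return area1 + area2

    # Example: a 700 x 400 mm tile facing the camera at 2 m
    corners = [(0, 0, 2000), (700, 0, 2000), (700, 400, 2000), (0, 400, 2000)]
    print(quad_area_mm2(corners) / 100.0, "cm^2")   # -> 2800.0 cm^2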
From what I tested in other examples, subpixel would give me a better result at a larger distance, but since it is not currently supported in the MobileNet spatial detection network, is there any other way I could get better accuracy in my depth readings?
I was also looking into the OAK-D with an infrared camera, and was wondering if that could potentially improve my accuracy by providing extra contrast between the floor and my object, maybe with an LED or external lighting.

  • erik replied to this.

    Hello luiz_sutil,
    depth has about 3% variance, so 3 cm at 1 m, and that matches our own testing as well. Subpixel should work better; since it's not supported by the MobileNet spatial detection network, you could also calculate the depth on the host (like here). I would also suggest using the latest develop library (python3 -m pip install --user --extra-index-url https://artifacts.luxonis.com/artifactory/luxonis-python-snapshot-local "depthai==2.10.0.0.dev+388dc8f7178cbfe1e24fc03d5f190ea9671ce0aa"), as there are additional stereo depth fixes on that branch. For stereo depth you need good texture on both the floor and the tile to get the best results, so I'm not sure how an IR LED/camera would help. Or are you thinking of the color camera, where you are running the NN?
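    To give an idea, the host-side calculation could look roughly like this (a minimal sketch, not the exact linked example: it takes the depth frame as a numpy array in millimetres, averages the valid values inside a scaled-down bounding box, and converts the result to X/Y/Z using the horizontal FOV; the 71.9 degree HFOV and the function name are just illustrative):

        import math
        import numpy as np

        def spatials_from_roi(depth_frame, bbox, scale=0.3, hfov_deg=71.9):
            # depth_frame: uint16 depth in mm; bbox: (xmin, ymin, xmax, ymax) in pixels.
            xmin, ymin, xmax, ymax = bbox
            # Shrink the bounding box around its centre, like the on-device calculator does.
            cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
            w, h = (xmax - xmin) * scale, (ymax - ymin) * scale
            roi = depth_frame[int(cy - h / 2):int(cy + h / 2), int(cx - w / 2):int(cx + w / 2)]
            valid = roi[roi > 0]                 # 0 means "no depth measured"
            if valid.size == 0:
                return None
            z = float(np.median(valid))          # median is more robust to outliers than the mean
            # Convert pixel offsets from the image centre to metric X/Y via the focal length.
            focal_px = depth_frame.shape[1] / (2 * math.tan(math.radians(hfov_deg) / 2))
            x = (cx - depth_frame.shape[1] / 2) * z / focal_px
            y = (cy - depth_frame.shape[0] / 2) * z / focal_px
            return x, y, z                       # all in mm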
    Thanks, Erik

    I used the function to calculate on the host, and it gave me a better result! However, my biggest problem seems to be the variance in depth: for example, in the palm detection program the distance calculated from the camera to the palm varies very little, if at all, while in my program I get a large amount of variance. I am using the same camera settings, so I'm not sure why that would be; does the way the palm gets detected make any difference? For a 70x40 tile, I'm getting results ranging from 65-80 x 35-50.
    Thanks for the quick responses, Luiz.

    • erik replied to this.

      Hello luiz_sutil,
      I doubt it, since the bounding box (of the palm) is scaled down to 0.3 of its size and the depth is averaged over that region. I would look into the min/max depth thresholds; that is the key element here for reducing noise and therefore getting more accurate spatial coordinates.
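      For example, on the spatial detection network node these are set roughly like this (a sketch against the DepthAI v2 Python API; the actual values are placeholders to tune for your scene):

          import depthai as dai

          pipeline = dai.Pipeline()
          sdn = pipeline.createMobileNetSpatialDetectionNetwork()
          sdn.setBoundingBoxScaleFactor(0.3)   # average depth only over the central 30% of the bbox
          sdn.setDepthLowerThreshold(100)      # ignore depth readings closer than 10 cm
          sdn.setDepthUpperThreshold(3500)     # ignore depth readings farther than 3.5 m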
      Thanks, Erik

      Would the size of the bounding box affect the precision/variance of the depth? I added my own model to the palm detection program, and while the detection of the tile gives me a very stable measurement, the detection of the corners still varies quite a lot (it jumps from 1900 to 2300 mm), whereas measuring my hand at the same distance gives me a stable reading as well. I tried increasing the bounding box size for the corners, but it didn't make much difference. Any tips on how I should proceed?

      The (downscaled) bounding box is the area where the algorithm takes all the depth points and averages them out. So if the depth map is noisy, you might get noisy spatial coordinates. This can be mitigated with appropriate min/max thresholds and an adequate ROI area.
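      If you compute the corner spatials yourself, the same idea applies to a SpatialLocationCalculator ROI (again just a sketch; the ROI size and thresholds are placeholders you would tune per corner):

          import depthai as dai

          pipeline = dai.Pipeline()
          spatialCalc = pipeline.createSpatialLocationCalculator()

          config = dai.SpatialLocationCalculatorConfigData()
          config.depthThresholds.lowerThreshold = 500    # mm; drop readings closer than 0.5 m
          config.depthThresholds.upperThreshold = 3000   # mm; drop readings farther than 3 m
          # Small ROI (normalized 0..1 coordinates) centred on one detected corner
          config.roi = dai.Rect(dai.Point2f(0.48, 0.48), dai.Point2f(0.52, 0.52))
          spatialCalc.initialConfig.addROI(config)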
      Thanks, Erik

      a year later

      I use the OAK-D PoE to measure the difference in depth (distance in mm from the camera) between the left eye and the nose tip of a human face. The difference in depth between the eye and the nose tip might be 0.5 cm to 2 cm, but I don't get the correct depth: the 3% variance (or 1.8% at 1 m) makes the measured depth of the eye and the nose tip come out almost the same.
      What should I do to get the difference in depth between the eye and the nose tip? Is there an OAK-D PoE camera that can generate such accurate depth?

      • erik replied to this.

        Hi asif, I'd first suggest placing the OAK closer to the face, about 80 cm (the minimum depth is 70 cm by default), and using 800P depth resolution together with subpixel mode enabled. The error rate should be below 1% at that distance. Otherwise you might want to look at our upcoming OAK-D-SR, which will be more accurate at short distances. Thoughts?
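        Roughly, that setup would look like this (a sketch against the DepthAI v2 Python API; outputs and the rest of the pipeline are omitted):

            import depthai as dai

            pipeline = dai.Pipeline()

            monoLeft = pipeline.createMonoCamera()
            monoRight = pipeline.createMonoCamera()
            monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
            monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)
            monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
            monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)

            stereo = pipeline.createStereoDepth()
            stereo.setSubpixel(True)   # fractional disparity -> finer depth steps at longer range
            monoLeft.out.link(stereo.left)
            monoRight.out.link(stereo.right)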
        Thanks, Erik