Is it possible to detect the depth of a person after I have trained YOLO? I have an OAK-FFC-4P with a monocular IMX378 camera setup. I am not expecting high accuracy, just the best result possible short of that.

    Correct, with a mono setup your ideal pipeline would be to run YOLO and also run something like MiDaS (as @jakaskerl already shared). You could do some experimenting, though, to see whether you get better depth predictions by running mono-depth just on the crop of the person or on the whole frame. Currently we have MiDaS exported for more horizontal input shapes, so running it just on a crop (which would presumably be more vertical) might not yield the best results. So it is definitely worth playing around with a bit.
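
    For the whole-frame variant, the usual post-processing is to run mono-depth once per frame and then summarize the relative depth inside each YOLO box. A minimal sketch (the function name and the (xmin, ymin, xmax, ymax) box layout are just illustrative):

        import numpy as np

        def person_relative_depth(depth_map: np.ndarray, bbox: tuple) -> float:
            """Summarize the relative depth of one detected person.

            depth_map: MiDaS-style relative (inverse) depth map, shape (H, W).
            bbox: (xmin, ymin, xmax, ymax) from YOLO, already scaled to
            depth-map pixel coordinates.
            """
            xmin, ymin, xmax, ymax = bbox
            crop = depth_map[ymin:ymax, xmin:xmax]
            # Median is more robust than mean against the background pixels
            # that inevitably fall inside the box.
            return float(np.median(crop))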

    Sure, I will look into that. I have been trying to convert the luxonis/depth-anything-v2:vit-s-mde-indoors-336x252 ONNX model to a blob file, but the converter fails.

    Should I go this way, or try MiDaS?

    Yeah, DepthAnythingV2 is not supported on RVC2 - some of its operations fail to convert, and the model itself is too big/complex. If you check the official models on the ZOO and see that we didn't export one for a specific architecture, it usually means it is not supported or we don't recommend using it on that platform.
    So yeah, I would suggest you go with MiDaS, and you won't need to re-export it since it is already in our ZOO - but you need to use DepthAIv3 to use the model directly, since it is a .superblob file packaged in an NNArchive. You can refer to the documentation here on how to set up inference.
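
    In DepthAIv3 the setup follows roughly the pattern below; treat it as a sketch - the MiDaS model slug is a placeholder (check the exact name and input shape in the ZOO), and the tensor-reading call may differ from what the linked documentation shows:

        import depthai as dai
        import numpy as np

        # Placeholder slug - look up the exact MiDaS entry in the Luxonis ZOO.
        MIDAS_SLUG = "luxonis/midas-v2-1:small-192x256"

        with dai.Pipeline() as pipeline:
            camera = pipeline.create(dai.node.Camera).build()
            # DepthAIv3 fetches the NNArchive (with the .superblob inside)
            # from the ZOO and wires the camera output to the model input.
            nn = pipeline.create(dai.node.NeuralNetwork).build(
                camera, dai.NNModelDescription(MIDAS_SLUG)
            )
            depth_queue = nn.out.createOutputQueue()

            pipeline.start()
            while pipeline.isRunning():
                nn_data = depth_queue.get()
                relative_depth = nn_data.getFirstTensor()  # relative (inverse) depth
                print(relative_depth.shape, float(np.median(relative_depth)))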


    Thanks! I'm not sure how, but I was able to get it to work. The results aren't quite what I was looking for, since they're only relative depth, but it does work.
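
    If rough metric numbers are ever needed from that relative output, one common trick is to rescale it against a single known distance in the scene. A sketch, under the strong assumption that the model's unknown shift term is negligible (MiDaS-style models predict inverse depth up to both a scale and a shift):

        import numpy as np

        def approx_metric_depth(relative: np.ndarray,
                                ref_value: float,
                                ref_meters: float) -> np.ndarray:
            """Rescale a relative (inverse) depth map to approximate meters.

            ref_value:  model output at a pixel whose true distance is known
                        (e.g. measured once with a tape measure).
            ref_meters: that known true distance.
            Assumes depth ~ scale / output, i.e. the shift term is ignored.
            """
            scale = ref_meters * ref_value
            return scale / np.clip(relative, 1e-6, None)  # avoid divide-by-zero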