• DepthAI
  • MobileNetSpatialDetectionNetwork with portrait cam?

hello!

using an OAK-D installed in portrait orientation. got the code working for the RGB and mono cameras, including the 2-stage ImageManip: a rotate at a resolution that's a multiple of 16, then a resize (without preserving aspect ratio) to 300x300 for MobileNet. using C++.

everything "works" (it compiles and runs), and the NN recognizes and frames things, but the depth values and tracking are wonky. i now assume the stereo code is thrown off by the cameras' locations (above/below instead of left/right).

is there a way to compensate for that?

or should i decouple the depth and the NN: use a non-spatial MobileNetDetectionNetwork, iterate the tracklets, and call the spatial calculator on the depth image with rotated coords?
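for reference, the decoupled variant i'm considering would look roughly like this (a sketch against the depthai-core C++ API; the blob path is a placeholder and exact node/stream wiring may differ between releases):

```cpp
// Sketch of a decoupled pipeline: non-spatial detection network plus a
// SpatialLocationCalculator whose ROIs are pushed from the host.
dai::Pipeline pipeline;

auto camRgb = pipeline.create<dai::node::ColorCamera>();
auto monoL  = pipeline.create<dai::node::MonoCamera>();
auto monoR  = pipeline.create<dai::node::MonoCamera>();
auto stereo = pipeline.create<dai::node::StereoDepth>();
auto nn     = pipeline.create<dai::node::MobileNetDetectionNetwork>();  // non-spatial
auto slc    = pipeline.create<dai::node::SpatialLocationCalculator>();
auto cfgIn  = pipeline.create<dai::node::XLinkIn>();
auto xoutNN = pipeline.create<dai::node::XLinkOut>();
auto xoutSp = pipeline.create<dai::node::XLinkOut>();

camRgb->setPreviewSize(300, 300);
camRgb->setPreviewKeepAspectRatio(false);
monoL->setBoardSocket(dai::CameraBoardSocket::LEFT);
monoR->setBoardSocket(dai::CameraBoardSocket::RIGHT);
nn->setBlobPath("mobilenet-ssd.blob");  // placeholder path

monoL->out.link(stereo->left);
monoR->out.link(stereo->right);
camRgb->preview.link(nn->input);

// Depth goes straight to the calculator; the host rotates each detection's
// coords into the (unrotated) depth frame and sends them as ROI configs.
stereo->depth.link(slc->inputDepth);
cfgIn->setStreamName("spatialCalcConfig");
cfgIn->out.link(slc->inputConfig);

xoutNN->setStreamName("detections");
nn->out.link(xoutNN->input);
xoutSp->setStreamName("spatialData");
slc->out.link(xoutSp->input);
```

this is just the pipeline wiring; the host loop would read "detections", rotate the bboxes, and feed `dai::SpatialLocationCalculatorConfigData` ROIs into "spatialCalcConfig".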

in other words, is the MobileNetSpatialDetectionNetwork a simple "convenience" one-stop solution that correlates the NN coords with the depth frame and slaps the coordinates on the blobs, or is there more to it than that? and does the ObjectTracker care about depth data?

the endgame is to get the equivalent of the data produced by the spatial_object_tracker example, but in portrait orientation.

thanks!
alex.


    artificiel sorry for the delay. I assume your suggestion should work. The main cause of the issue is that the color frames are rotated while the depth frames aren't. You could also try rotating the depth frames (not sure if it would work, but worth a try). As you mentioned, it's a one-stop solution; it doesn't (currently) take into account any potential rotations.
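An untested sketch of what rotating the depth frames could look like, following the RotatedRect approach from the RGB rotation examples (whether ImageManip handles 16-bit depth frames this way may vary between releases, so treat this as an assumption to verify):

```cpp
// Untested sketch: rotate the depth output 90 degrees so it matches the
// rotated color frames. Field names follow the RGB rotation examples;
// 16-bit (RAW16) depth support in ImageManip is the uncertain part.
const float depthWidth = 640.0f, depthHeight = 400.0f;  // assumed THE_400_P output

auto manipDepth = pipeline.create<dai::node::ImageManip>();

dai::RotatedRect rr;
rr.center.x = depthWidth / 2.0f;
rr.center.y = depthHeight / 2.0f;
rr.size.width  = depthHeight;  // width/height swapped: output is portrait
rr.size.height = depthWidth;
rr.angle = 90;

manipDepth->initialConfig.setCropRotatedRect(rr, false);
stereo->depth.link(manipDepth->inputImage);
// manipDepth->out would then feed whatever consumes the depth frames.
```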
    Thanks, Erik

    hello Erik! thanks for the follow-up.

    maybe I was not clear, but yes, I do rotate the mono cams, and I can confirm the depth values are "wrong". I assume StereoDepth expects left-right disparity. it would be great to have a way to indicate the camera orientation to StereoDepth.

    i have a follow-up question: I'm passing the RGB data to MobileNet 300x300 with camRgb.setPreviewKeepAspectRatio(False), so the whole RGB frame is considered (it works well with our conditions and targets). the ImgDetections coordinates are returned in the <0..1> range. is that <0..1> range calibrated to the <0..1> range of a SpatialLocationCalculator fed with StereoDepth, or is something else required to correlate the RGB and depth data correctly?

    (of course, in my sideways case I will need to rotate the ImgDetection coordinates -90° before feeding them to the SpatialLocationCalculator.)

    thanks,
    alex.


      Hello artificiel,
      could you share minimal reproducible code for the detection-after-rotating issue? Another option would be to rotate after the object detection/tracking, so you don't run into the issues you are facing. Thoughts?
      Regarding the second question: yes, it's the 0..1 range of the RGB frame, since inference is done on the RGB frame.
      Thanks, Erik