Hello, I am working on a project using an OAK-D Pro in which I want to track the position of people within a room and eventually trigger events based on that information. I have a lot of it figured out: I know that the upper-left corner of the frame is (0, 0) and that the MobileNet SSD spatial coordinates use the center of the frame as (0, 0, 0). I haven't included calibration in my code just yet, but I know I can use it to transform the detections' spatial coordinates into world coordinates relative to the calibration image's location.
What I haven't been able to get a handle on is how to pull the min and max x,y,z measurements from the depth stream so that I can eventually divide the room into regions for triggers.
Say I have a 10x10 room that I want divided into 4 regions (let's ignore how resolution cropping would change how much of the room the camera can see). I would need at least the minimum and maximum x and z measurements (in camera space), or x and y measurements (in world space), to compare against the coordinates of the tracked people, wouldn't I?
I'm sure I could hard-code the room size and do something like setting a trigger to True if a1 <= detection.spatialCoordinates.x <= a2 and b1 <= detection.spatialCoordinates.z <= b2.
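Here's roughly what I mean, as a minimal sketch (the region names, the bounds, region_for itself, and the assumption that the camera sits centered on one wall looking down the long axis are all placeholders I made up; units are millimeters, since that's what spatialCoordinates uses):

```python
# Hard-coded 4-region split of a hypothetical 10 m room, camera centered
# on one wall. All bounds are placeholder values in millimeters.
REGIONS = {
    "front_left":  {"x": (-5000, 0),  "z": (0, 5000)},
    "front_right": {"x": (0, 5000),   "z": (0, 5000)},
    "back_left":   {"x": (-5000, 0),  "z": (5000, 10000)},
    "back_right":  {"x": (0, 5000),   "z": (5000, 10000)},
}

def region_for(detection):
    # detection.spatialCoordinates comes from the spatial detection network
    x = detection.spatialCoordinates.x
    z = detection.spatialCoordinates.z
    for name, bounds in REGIONS.items():
        (x1, x2), (z1, z2) = bounds["x"], bounds["z"]
        if x1 <= x <= x2 and z1 <= z <= z2:
            return name
    return None  # detection falls outside every region
```

I'd then call region_for(det) on each detection coming out of the spatial detection network and fire the corresponding trigger.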
Ideally, though, the bounds would be determined either on the fly or during the calibration setup, so I could, for example, move the camera into an 8x12 room without rewriting my code. I'm pretty flexible about how I implement this, and I would have access to the rooms ahead of time if your suggestion involves putting something physical in the room.
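The only direction I've thought of so far is to back-project every valid depth pixel through the pinhole model during a setup step and take the min/max of the result. Something like this rough sketch, where depth_extents is just a name I made up, the depth frame is assumed to already be a numpy array in millimeters, and fx/fy/cx/cy would come from the device's readCalibration() data:

```python
import numpy as np

def depth_extents(depth_frame, fx, fy, cx, cy):
    """Back-project every valid depth pixel through the pinhole model and
    return the (min, max) of x, y, z in camera space, in the same mm units
    as the depth frame."""
    h, w = depth_frame.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth_frame.astype(np.float32)
    valid = z > 0            # 0 means "no depth data" in the depth stream
    z = z[valid]
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    return (x.min(), x.max()), (y.min(), y.max()), (z.min(), z.max())
```

Is something like that a sensible way to get the extents, or is there a more direct way to pull them from the depth stream?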
Thanks!