Hello
I'd like to translate pixel coordinates in the camera frame (Oak-D-Pro) into coordinates in my room. I searched the web for inspiration and learned that quaternions might be the way to go, but I'm not quite there when it comes to putting the pieces together. I saw procedures where markers were placed at specific locations in the room to create a correspondence with the pixels in the camera frame. But since the Oak-D-Pro has an IMU, a sense of depth, and a laser dot projector, I wonder whether placing markers in the room can be avoided. I imagine that the only coordinates I need to know are the x-y-z position of the camera in the room, and that the remaining parameters for the calculation could be obtained from the IMU, the FOV, the depth map, etc.
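To make my mental model concrete, here is a rough sketch of what I imagine the math looks like: back-project a pixel plus its depth value into a 3D point in the camera frame using the pinhole model, then rotate and translate that point into room coordinates using the camera's orientation (from the IMU, or a quaternion derived from it) and its known position in the room. The intrinsics fx, fy, cx, cy and the rotation/translation values below are placeholders I'd still need to obtain (I believe the intrinsics come from the device's factory calibration), so please treat this as an assumption to be corrected, not a working solution:

```python
import numpy as np

# Placeholder pinhole intrinsics -- in reality these should come from the
# Oak-D-Pro's calibration data, not be hard-coded like this.
fx, fy = 800.0, 800.0
cx, cy = 640.0, 360.0

def pixel_to_camera(u, v, depth_m):
    """Back-project pixel (u, v) with depth (metres) to a 3D point
    in the camera's own coordinate frame (pinhole model)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    z = depth_m
    return np.array([x, y, z])

def camera_to_room(point_cam, R_room_from_cam, t_cam_in_room):
    """Transform a camera-frame point into room coordinates, given the
    camera's orientation as a rotation matrix (e.g. converted from an
    IMU quaternion) and the camera's position in the room."""
    return R_room_from_cam @ point_cam + t_cam_in_room

# Hypothetical example: camera mounted level (identity rotation),
# 1.5 m above the room origin.
R = np.eye(3)                   # placeholder orientation
t = np.array([0.0, 0.0, 1.5])   # placeholder camera position in the room
p_cam = pixel_to_camera(700, 400, 2.0)
p_room = camera_to_room(p_cam, R, t)
print(p_room)
```

Is this roughly the right structure, and is the markerless approach feasible with just the IMU and depth, or do I still need some in-room reference to pin down the orientation reliably?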
It would be very kind if some bright minds could walk me (and other seekers) through the process of calibration and calculation. At some point in my journey I'd like to track objects in the camera frame and assign them real-world coordinates.