The camera is streaming a depth map that gives me the depth in metres, i.e. the z value for each pixel. I want to recover the real-world x and y values for each pixel as well; in other words, I want to convert the depth map into a real-world point cloud. How can I do this?
I want a vectorised implementation with no loops, so that it can run efficiently on the GPU.
I have read that this can be done if the camera intrinsics are known, but I am unsure what these values would be for a stereo pair of cameras.
Could someone please help out with the implementation?
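For reference, here is a minimal sketch of the kind of back-projection I have in mind, assuming a pinhole model with known intrinsics fx, fy, cx, cy (the function name and all parameter values below are just illustrative) and using PyTorch so the same element-wise ops can run on the GPU without any Python loops. I am not certain this is the right formulation, especially for a stereo rig:

```python
import torch

def depth_to_pointcloud(depth: torch.Tensor, fx: float, fy: float,
                        cx: float, cy: float) -> torch.Tensor:
    """Back-project an (H, W) depth map in metres to an (H, W, 3) point cloud.

    fx, fy, cx, cy are pinhole intrinsics in pixels. Every operation is an
    element-wise tensor op, so the function runs on whatever device `depth`
    lives on (CPU or GPU) with no loops.
    """
    H, W = depth.shape
    # Pixel coordinate grids: v indexes rows, u indexes columns.
    v, u = torch.meshgrid(
        torch.arange(H, device=depth.device, dtype=depth.dtype),
        torch.arange(W, device=depth.device, dtype=depth.dtype),
        indexing="ij",
    )
    # Pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return torch.stack((x, y, depth), dim=-1)
```

I would then call it with something like `cloud = depth_to_pointcloud(depth.cuda(), fx=525.0, fy=525.0, cx=319.5, cy=239.5)`, where those intrinsic values are placeholders for whatever my calibration actually gives me.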