I'm using Oak cameras as part of a robotic rover system and I'm trying to get a better understanding of the techniques for mapping. I know that the Oak pipeline has a point cloud node that will output point cloud data from a single input depth frame, but that is not what I'm after. I would like to know if there's a good method for taking a series of data, either from the point cloud node or from depth data that has been converted to coordinates, and putting it all together to create a map that is larger than a single depth frame's worth of data. If the accuracy of the disparity/depth images were perfect, this question wouldn't exist: you could just continually accumulate the coordinate data as it streams in to build the world model. Unfortunately, there are a lot of factors which make that approach infeasible in the real world.
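For context, here is roughly what I mean by "depth data converted to coordinates". This is just my own back-projection sketch using the standard pinhole model, not depthai's point cloud node (the intrinsics would come from the camera calibration):

```python
import numpy as np

def depth_to_points(depth_mm, fx, fy, cx, cy):
    """Back-project a depth image (millimeters) into an (N, 3) array of
    camera-frame points in meters using the pinhole model."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0  # mm -> m
    valid = z > 0                             # drop invalid/zero-depth pixels
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=-1)
```

Each frame gives one of these point arrays; the question is how to merge many of them into a map.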
I want to simplify the setup as much as possible because I'm only interested in solving the specific problem of merging the not-100%-accurate depth data coming from the Oak camera. These are the constraints:
- The system I have uses a high frequency RTK receiver, so I can get very accurate camera poses using the RTK stream for positioning and the IMU on the camera for orientation. The RTK data typically arrives sooner than the RGB/depth data from the camera anyway, so I don't have to worry about timing issues. This completely eliminates the need for any kind of SLAM algorithm, so the positioning problem doesn't need to be considered here.
- I don't need millimeter-perfect accuracy. I'm using a voxel grid where the voxel size is anywhere from 2 to 5 cm, so very minute position inaccuracies are not a concern.
- The rover is generally always in motion, with a max speed around 5 m/s. I have tried the temporal filter, and while it does a very good job eliminating much of the random noise in the depth values, it looks really poor when the camera is moving.
- Most importantly, these calculations will be happening on a somewhat powerful host. I've been using an 80+ watt TDP laptop with a discrete GPU so the compute capability is not a concern. I have zero expectation of a mapping procedure like this being able to run on the Oak device.
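Given those constraints, the pose side of the problem reduces to a rigid transform per frame. As a minimal sketch (function and variable names are my own; I'm assuming the RTK fix and IMU orientation have already been fused into a rotation matrix and a translation):

```python
import numpy as np

def camera_to_world(points_cam, R_wc, t_wc):
    """Transform (N, 3) camera-frame points into the world frame.
    R_wc: 3x3 rotation (camera -> world), e.g. built from the IMU quaternion.
    t_wc: world-frame camera position from the RTK fix, in meters."""
    return points_cam @ R_wc.T + t_wc
```

Every incoming frame's points get pushed through this with that frame's pose before any accumulation happens.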
I have come up with some ideas for how to possibly get around this, but they are very rudimentary. The easiest solution I have tried so far is simple averaging. I take a batch of coordinate data, for example 10 depth frames' worth, and bin all the coordinates into their respective voxel locations, so that each voxel in the grid has a population indicating how many points from the coordinate/point cloud data fell within it. Then I set a threshold: a voxel has to exceed a certain population in order to be added to the world model. This method works okay, but there are limitations. If the batch of depth data is stationary or moves along a relatively straight path, all is well. However, if the camera rotates quickly, like when the rover is making a sharp turn, there can be issues. The rotation could start halfway through a batch of 10 frames, putting the beginning of the turn in one batch and its completion in the next. The areas the camera panned through typically don't do very well, because those voxels don't get imaged much, so their populations are naturally low.
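To make the batching approach concrete, this is roughly how my population-threshold step works (my own sketch; the voxel size and threshold values are arbitrary here):

```python
import numpy as np

def voxel_filter(points_world, voxel_size=0.05, min_population=5):
    """Bin world-frame points (N, 3) into a voxel grid and keep only
    voxels whose population meets a threshold. Returns the center
    coordinates of the surviving voxels."""
    idx = np.floor(points_world / voxel_size).astype(np.int64)
    # Count how many points landed in each distinct voxel
    uniq, counts = np.unique(idx, axis=0, return_counts=True)
    keep = uniq[counts >= min_population]
    return (keep + 0.5) * voxel_size  # voxel centers in meters
```

The failure mode described above shows up here directly: during a fast pan, the points for a given voxel get split across two batches, so neither batch's count clears `min_population` even though the surface was genuinely observed.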
It is a bit unfortunate that the depthai library does not offer any kind of mapping, meshing, or occupancy gridding functionality similar to something like Stereolabs. Although it isn't something provided to users in the API, I would be shocked if Luxonis hasn't researched this topic. I remember reading something about SLAM being supported on the RAE device, so I'm sure this is something that has had to be addressed before. I imagine there is research on characterizing stereo depth noise and ways of mitigating it, so even if the specifics are for internal use only, it would be great to get some basic information to steer me in the right direction. I would be extremely appreciative of any insight on this topic, even if it's as simple as research papers or flow charts for procedures.