I'm curious about the algorithm that performs RGB-depth alignment in StereoDepth. Would you please describe the algorithm in detail?

If I had to guess, the algorithm would look roughly like this (sketched in code after the list):

  1. Compute the depth map for the left (or right) mono image according to the normal stereo matching process
  2. For each pixel in the depth map, compute the disparity relative to the RGB frame using the depth at that pixel and the baseline to the RGB camera: disparity = f_x * baseline_to_RGB / depth. Note that the baseline in this equation is different from the one between the two mono cameras used during stereo matching, and it may need to be negative.
  3. Use the computed disparity to find the location in the RGB depth map corresponding to the current pixel in the mono depth map
  4. Put the depth at the current pixel of the mono depth map into the corresponding pixel of the RGB depth map
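
In rough Python, I imagine something like this (just a sketch of my guess, not actual DepthAI code; it assumes the RGB camera is displaced purely horizontally from the mono camera and ignores occlusions and resolution differences):

```python
import numpy as np

def align_depth_to_rgb(depth, f_x, baseline_to_rgb):
    """Shift each depth pixel by the disparity it would have relative
    to the RGB camera. Assumes the RGB camera is displaced purely
    horizontally from the mono camera and shares its intrinsics and
    resolution; occlusions are ignored."""
    h, w = depth.shape
    rgb_depth = np.zeros_like(depth)
    ys, xs = np.nonzero(depth)  # skip invalid (zero-depth) pixels
    # disparity = f_x * baseline_to_RGB / depth; the sign of the
    # baseline depends on which side of the mono camera the RGB sits
    disp = np.round(f_x * baseline_to_rgb / depth[ys, xs]).astype(int)
    xs_rgb = xs + disp
    inside = (xs_rgb >= 0) & (xs_rgb < w)
    rgb_depth[ys[inside], xs_rgb[inside]] = depth[ys[inside], xs[inside]]
    return rgb_depth
```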

Is that the idea? Can you provide more detail? Does image rectification for the RGB image play a role and if so, would the final depth map produced by StereoDepth only be valid for the rectified RGB image?

Hi @asperry
Yes, that's what the algorithm does. It first gets the depth map from the mono cameras, then makes a 3D reprojection of the RGB image on the depth map, and after that it reprojects it back into the RGB coordinate system. Because of the projection in the 3D world, the RGB image must be rectified to achieve the best alignment results. The script will still work with an unrectified RGB image, but the misalignment at the edges will be visible (since the lines at the edges of the frames will no longer be straight).

Thanks,
Jaka

Thanks for your reply @jakaskerl! However, I think I am now more confused. Could you be more specific about what is meant by "makes a 3D reprojection of the RGB image on the depth map, and after that it reprojects it back into the RGB coordinate system"? That does not sound like what I described in my original post, but you said "Yes, that's what the algorithm does" in reply to it. My original description does not involve any 3D points or reprojection at all. Is the algorithm closed source? Could I read it somewhere?

Hi @asperry
Sorry, I misunderstood which script you meant. If you are referring to this one, then yes, there is an error in the code, since the RGB image would need to be rectified for proper alignment. And yes, the algorithms are closed source for now.

Thanks,
Jaka

2 months later

Hi @jakaskerl,

I am confused now too. The example code you are referring to does apply RGB image undistortion by setting the mesh source in this line of code. Do you mean something other than image undistortion by the term 'rectified images'?

Hi @florm
'Rectified images' refers to the output of both undistortion and rectification.
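
In OpenCV terms, the difference is just the rotation passed to the remap (a sketch with placeholder calibration values):

```python
import cv2
import numpy as np

# Placeholder calibration values; in practice read them from the device.
w, h = 1280, 800
K = np.array([[800.0, 0, w / 2], [0, 800.0, h / 2], [0, 0, 1]])
dist = np.zeros(5)   # k1, k2, p1, p2, k3
R_rect = np.eye(3)   # rectifying rotation, e.g. R1/R2 from cv2.stereoRectify
img = np.zeros((h, w, 3), np.uint8)

# Undistortion only: identity rotation, so lens distortion is removed
# but the image plane stays where it is.
m1, m2 = cv2.initUndistortRectifyMap(K, dist, np.eye(3), K, (w, h), cv2.CV_32FC1)
undistorted = cv2.remap(img, m1, m2, cv2.INTER_LINEAR)

# Undistortion + rectification: the extra rotation re-renders the image
# as if the camera's image plane were coplanar with the other camera's.
m1, m2 = cv2.initUndistortRectifyMap(K, dist, R_rect, K, (w, h), cv2.CV_32FC1)
rectified = cv2.remap(img, m1, m2, cv2.INTER_LINEAR)
```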

Thanks,
Jaka

8 days later

Hi @jakaskerl

I'm still confused by your use of the term "rectified" for the RGB image. Rectification typically refers to computing a 3x3 rotation matrix that brings the image planes of two or more cameras onto a common plane. The word "rectified" therefore implies another camera to rectify against.

The RGB camera, however, does not take part in the stereo matching process, and there does not seem to be a rectification matrix for the RGB camera in the calibration on the EEPROM, which indicates that rectification for the RGB image is not available natively from the camera calibration.

So are you saying that we need to compute a rectification matrix ourselves for the RGB image? Or are you using the term "rectified" loosely here to just mean "undistorted" for the RGB camera?
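
For reference, this is what I understand the calibration exposes (using the depthai `CalibrationHandler` API as I understand it; please correct me if these calls are wrong):

```python
import depthai as dai

with dai.Device() as device:
    calib = device.readCalibration()
    # Rectification rotations exist only for the stereo mono pair:
    R1 = calib.getStereoLeftRectificationRotation()
    R2 = calib.getStereoRightRectificationRotation()
    # For the RGB camera the EEPROM stores intrinsics, distortion and
    # extrinsics, but no rectification rotation:
    K_rgb = calib.getCameraIntrinsics(dai.CameraBoardSocket.RGB)
    d_rgb = calib.getDistortionCoefficients(dai.CameraBoardSocket.RGB)
    ext = calib.getCameraExtrinsics(dai.CameraBoardSocket.RIGHT,
                                    dai.CameraBoardSocket.RGB)
```

As far as I can tell, there are rectification rotations only for the mono pair, which is what prompted my question.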

Hi @asperry
I believe I was using the term correctly; correct me if I've made a mistake. The idea was that the RGB camera would have to be rectified in order to have perfect alignment with the mono cameras. That is not the case in depthai, since we don't rectify the RGB camera: we only rectify the mono cameras, and the RGB image is only undistorted. That is good enough for most cases.

Thanks,
Jaka

Hi @jakaskerl

So you do indeed mean "rectify" in the sense that I understand it. But that brings me back to my original question about how the RGB alignment is actually computed. If the algorithm is as you said, that is, if 3D points are being projected onto the RGB image, then you wouldn't need to rectify the RGB image to do that.
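
To be concrete, by "projecting 3D points onto the RGB image" I mean something like this (a sketch only, assuming `K_depth` and `K_rgb` are the calibration intrinsics, `R` and `t` the depth-to-RGB extrinsics, and that the RGB image has been undistorted):

```python
import numpy as np

def reproject_depth_to_rgb(depth, K_depth, K_rgb, R, t):
    """Back-project each depth pixel to a 3D point, transform it into
    the RGB camera frame, and project it onto the RGB image plane.
    Assumes the RGB image is undistorted (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    # X = Z * K_depth^-1 * [u, v, 1]^T
    pts = np.linalg.inv(K_depth) @ pix * depth.reshape(1, -1)
    pts_rgb = R @ pts + t.reshape(3, 1)   # into the RGB camera frame
    proj = K_rgb @ pts_rgb                # project onto the RGB plane
    z = proj[2]
    valid = z > 0
    u_rgb = np.round(proj[0, valid] / z[valid]).astype(int)
    v_rgb = np.round(proj[1, valid] / z[valid]).astype(int)
    out = np.zeros_like(depth)
    inside = (u_rgb >= 0) & (u_rgb < w) & (v_rgb >= 0) & (v_rgb < h)
    out[v_rgb[inside], u_rgb[inside]] = z[valid][inside]
    return out
```

Nothing in that projection rotates the RGB image plane, which is why I don't see where rectification of the RGB image would come in.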

Do we really need to rectify the RGB image? I'm asking because if we do, we will have to compute the rectification manually, since it is not part of the camera calibration on the EEPROM.

10 days later

Hi @asperry
Yes, the computation needs to be done manually with OpenCV for the RGB image, since it was not done in the previous script. Or, for alignment, you can try the improved script, which will also be deployed later: link
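
Roughly, the manual computation would look like this (a sketch with placeholder calibration values; in practice, read the intrinsics and extrinsics from the device calibration):

```python
import cv2
import numpy as np

# Placeholder calibration values; read the real ones from the device
# EEPROM (e.g. via depthai's CalibrationHandler).
w, h = 1280, 800
K_right = np.array([[790.0, 0, w / 2], [0, 790.0, h / 2], [0, 0, 1]])
K_rgb = np.array([[1500.0, 0, w / 2], [0, 1500.0, h / 2], [0, 0, 1]])
d_right = np.zeros(5)
d_rgb = np.zeros(5)
R = np.eye(3)                      # rotation: right mono -> RGB
T = np.array([-3.75, 0.0, 0.0])    # translation (example value, cm)

# stereoRectify yields rectifying rotations for both cameras; R2 is the
# one the RGB image needs, which is exactly what the EEPROM lacks.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(
    K_right, d_right, K_rgb, d_rgb, (w, h), R, T)
m1, m2 = cv2.initUndistortRectifyMap(K_rgb, d_rgb, R2, P2, (w, h), cv2.CV_32FC1)
# rectified_rgb = cv2.remap(rgb_frame, m1, m2, cv2.INTER_LINEAR)
```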

Thanks,
Jaka