Hey Jaka,
Working with rsantan here, we've verified that the normalized bounding boxes our model outputs hold the same values right up until we send the ROI configuration. My suspicion is that because the color frame is captured from the perspective of the left lens only, while the depth frame is fused from the left and right lenses, the ROI gets scaled incorrectly when it's mapped onto the depth frame.
We de-normalize the boxes and convert from x,y,w,h to x1,y1,x2,y2.
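For reference, the conversion step looks roughly like this (a minimal sketch; the function name and `frame_w`/`frame_h` parameters are placeholders, not our exact code):

```python
def denormalize_xywh_to_xyxy(box, frame_w, frame_h):
    """Convert a normalized (x, y, w, h) box, all values in [0, 1],
    into pixel-space (x1, y1, x2, y2) corners for the given frame size."""
    x, y, w, h = box
    x1 = int(x * frame_w)
    y1 = int(y * frame_h)
    x2 = int((x + w) * frame_w)
    y2 = int((y + h) * frame_h)
    return x1, y1, x2, y2

# Example: a centered half-size box on a 640x400 frame
print(denormalize_xywh_to_xyxy((0.25, 0.25, 0.5, 0.5), 640, 400))
# (160, 100, 480, 300)
```

If the ROI is de-normalized against the color frame's dimensions but applied to a depth frame with a different resolution or alignment, that would produce exactly the kind of scaling mismatch described above.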
Do you have an email we can forward the code to?