• 3D object detection using objectron

Hi All,

  1. I have run into an issue while trying to detect an object and get the distance from the camera to the object. According to the documentation (https://github.com/google/mediapipe/blob/master/docs/solutions/objectron.md#camera-coordinate), the rotation and translation are given with respect to the camera coordinate frame (see Figure 1), but when I physically measure the distance between the camera and the object, the distance/vector does not add up.

For example, if the camera is mounted at some distance from the object as shown in Figure 2, I would like to know the rotation and translation of the center of the bounding box relative to the camera frame. As per the documentation and my understanding, the rotation and translation of the object are given in camera coordinates. But when I compute the magnitude of this translation vector, sqrt(tx² + ty² + tz²), I do not get the distance marked in black in my sketch. Could you please point out what I am missing?
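To make the computation concrete, here is a minimal sketch of what I am doing; the translation values below are placeholders for illustration, not real Objectron output:

```python
import math

# Placeholder translation (tx, ty, tz) of the box center in camera
# coordinates; the values here are made up for illustration.
tx, ty, tz = 0.1, -0.2, 0.5

# Straight-line distance from the camera origin to the box center.
distance = math.sqrt(tx**2 + ty**2 + tz**2)
print(distance)
```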

Figure 1.

Figure 2.

  2. Lastly, can you point me in the right direction for annotating and training 3D object bounding boxes?

    Hi walelignmessele

    1. Could you also post the results you are getting (landmarks, rotation, translation and scale) for a given image?

    2. I don't think MediaPipe has, or will have, a training pipeline for Objectron (source). However, someone suggested this repo, which does have training capabilities and should work similarly.

    Hope this helps,
    Jaka

    Hi Jaka,

    Here's what I am getting …

    landmark_3d:
    landmark { x: -0.23063945770263672 y: 0.24567705392837524 z: -0.3632972240447998 }
    landmark { x: -0.23018594086170197 y: 0.24805420637130737 z: -0.3413083851337433 }
    landmark { x: -0.2664649486541748 y: 0.27159667015075684 z: -0.4176005721092224 }
    landmark { x: -0.20126153528690338 y: 0.20943409204483032 z: -0.30709484219551086 }
    landmark { x: -0.23754052817821503 y: 0.232976496219635 z: -0.38338702917099 }
    landmark { x: -0.2237383872270584 y: 0.2583776116371155 z: -0.34320738911628723 }
    landmark { x: -0.26001739501953125 y: 0.28192001581192017 z: -0.41949963569641113 }
    landmark { x: -0.19481398165225983 y: 0.21975743770599365 z: -0.3089938461780548 }
    landmark { x: -0.23109297454357147 y: 0.24329984188079834 z: -0.3852860927581787 }

    Rotation Matrix:
    [[ 0.52339786  0.48900229 -0.41368178]
     [ 0.8380273  -0.65292066  0.26844922]
     [-0.15415953  0.57842147 -0.86994398]]

    Translation Vector: [-0.23063946  0.24567705 -0.36329722]

    Scale: [0.01231865 0.05914985 0.08769785]

    The nine 3D box landmarks correspond to the corners shown in the following picture.
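For what it's worth, the straight-line distance implied by that translation vector (if, as I read the docs, it is expressed in metres) works out to roughly half a metre, which is what does not match my physical measurement:

```python
import math

# Translation of the box center in camera coordinates, copied from the
# output above; units assumed (not confirmed) to be metres.
tx, ty, tz = -0.23063946, 0.24567705, -0.36329722

# Camera-to-box-center distance implied by the translation.
distance = math.sqrt(tx**2 + ty**2 + tz**2)
print(f"{distance:.4f}")  # ≈ 0.4955
```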

    Hi walelignmessele, this looks like a question about the Objectron model itself, so it might be best to ask on their GitHub directly. Unfortunately, I am not sure how you would get the full metric 3D coordinates (in mm, not normalized) from these results.