• 3D object detection using objectron

Hi All,

  1. I have run into an issue while trying to detect an object and get the distance from the camera to the object. According to the documentation (https://github.com/google/mediapipe/blob/master/docs/solutions/objectron.md#camera-coordinate), the rotation and translation are given with respect to the camera coordinate frame (see Figure 1), but when I physically measure the distance between the camera and the object, the distance/vector does not add up.

For example, if the camera is mounted at some distance from the object as shown in Figure 2, I would like to know the rotation and translation of the center of the bounding box relative to the camera frame. As per the documentation and my understanding, the rotation and translation of the object are given in camera coordinates. But when I compute the magnitude of this translation vector, sqrt(tx² + ty² + tz²), I do not get the distance marked in black in my sketch. Could you please point out what I am missing?
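To make the computation concrete, here is a minimal sketch of what I am doing; the translation values below are placeholders for illustration, not real Objectron output:

```python
import math

# Placeholder translation (tx, ty, tz) of the box center in camera
# coordinates; the values here are made up for illustration.
tx, ty, tz = 0.1, -0.2, 0.5

# Straight-line distance from the camera origin to the box center.
distance = math.sqrt(tx**2 + ty**2 + tz**2)
print(distance)
```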

Figure 1.

Figure 2.

  2. Lastly, can you point me in the right direction for annotating and training 3D object bounding boxes?

    Hi walelignmessele

    1. Could you also post the results you are getting (landmarks, rotation, translation and scale) for a given image?

    2. I don't think MediaPipe has, or will have, a training pipeline for Objectron (source). However, someone suggested this repo, which does have training capabilities and should work similarly.

    Hope this helps,
    Jaka

    Hi Jaka,

    Here's what I am getting …

    landmark_3d:
    landmark { x: -0.23063945770263672 y: 0.24567705392837524 z: -0.3632972240447998 }
    landmark { x: -0.23018594086170197 y: 0.24805420637130737 z: -0.3413083851337433 }
    landmark { x: -0.2664649486541748 y: 0.27159667015075684 z: -0.4176005721092224 }
    landmark { x: -0.20126153528690338 y: 0.20943409204483032 z: -0.30709484219551086 }
    landmark { x: -0.23754052817821503 y: 0.232976496219635 z: -0.38338702917099 }
    landmark { x: -0.2237383872270584 y: 0.2583776116371155 z: -0.34320738911628723 }
    landmark { x: -0.26001739501953125 y: 0.28192001581192017 z: -0.41949963569641113 }
    landmark { x: -0.19481398165225983 y: 0.21975743770599365 z: -0.3089938461780548 }
    landmark { x: -0.23109297454357147 y: 0.24329984188079834 z: -0.3852860927581787 }

    Rotation Matrix:
    [[ 0.52339786  0.48900229 -0.41368178]
     [ 0.8380273  -0.65292066  0.26844922]
     [-0.15415953  0.57842147 -0.86994398]]

    Translation Vector: [-0.23063946  0.24567705 -0.36329722]

    Scale: [0.01231865 0.05914985 0.08769785]

    The nine 3D box landmarks correspond to the corners shown in the following picture.
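For what it's worth, the straight-line distance implied by that translation vector (if, as I read the docs, it is expressed in metres) works out to roughly half a metre, which is what does not match my physical measurement:

```python
import math

# Translation of the box center in camera coordinates, copied from the
# output above; units assumed (not confirmed) to be metres.
tx, ty, tz = -0.23063946, 0.24567705, -0.36329722

# Camera-to-box-center distance implied by the translation.
distance = math.sqrt(tx**2 + ty**2 + tz**2)
print(f"{distance:.4f}")  # ≈ 0.4955
```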

    Hi walelignmessele, this looks like a question about the Objectron model itself, so it might be best to ask on their GitHub directly. Unfortunately, I am not sure how you would get the full metric 3D coordinates (in mm, not normalized) from these results.