Aingu7ae

  • Mar 27, 2023
  • Joined May 22, 2022
  • 0 best answers
  • Hi erik,

    so far I found two approaches, Direct Linear Transform (DLT) and the Projective 3 Point Algorithm (P3P).

    With DLT you can approximate the intrinsic and extrinsic camera matrices, given at least six points in the picture / video and their corresponding world coordinates. That's the method used in the tutorial I referenced a few posts ago.

    With the P3P algorithm you approximate the extrinsic camera matrix, given three points and their corresponding world coordinates and the intrinsic camera matrix. Since we know our intrinsic matrix this would probably be the way to go.

    Here are 5 minute intro videos about DLT and P3P.
    And these are the extended video lectures about DLT and P3P.
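
    To make the P3P route a bit more concrete, this is roughly how I picture the OpenCV call (just a sketch: the point correspondences are made up, and as far as I understand OpenCV's P3P solver wants a fourth point to pick the right one of the up to four solutions):

    import cv2
    import numpy as np

    # Intrinsic matrix of the color camera (values are placeholders).
    K = np.array([[1546.19,    0.0, 962.36],
                  [   0.0, 1546.19, 531.37],
                  [   0.0,    0.0,    1.0]])
    dist = np.zeros(5)  # assuming rectified / undistorted frames

    # 3D points measured in the room (metres) and their pixel positions
    # in the frame (all values here are made up for illustration).
    obj_pts = np.array([[0.0, 0.0, 0.0],
                        [2.0, 0.0, 0.0],
                        [2.0, 0.0, 3.0],
                        [0.0, 1.0, 3.0]], dtype=np.float64)
    img_pts = np.array([[512.0, 300.0],
                        [845.0, 310.0],
                        [830.0, 640.0],
                        [505.0, 630.0]], dtype=np.float64)

    # SOLVEPNP_P3P needs exactly 4 correspondences; with more points the
    # default iterative solver could be used instead.
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist,
                                  flags=cv2.SOLVEPNP_P3P)
    R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation, i.e. the extrinsic rotation
    print(R, tvec)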

    Harnessing the capabilities of the IMU would be a blast! Then the only missing piece is the position of the camera, which we could measure ourselves. That could be a big timesaver, plus we wouldn't need to carry around (huge?) checkerboard patterns 😉.

    I assume that for the scene depicted in my initial posting, the checkerboard needs to be quite large for the calibration process to recognize the pattern?! Do you know a formula to estimate the minimum size of the checkerboard squares, given the camera's intrinsic matrix and the distance from the camera?
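
    My own back-of-the-envelope attempt at such a formula, assuming a simple pinhole model and that the detector needs roughly 20 pixels per square (that threshold is just a guess on my part):

    # Pinhole model: a square of side s at distance Z projects to roughly
    # fx * s / Z pixels, so s_min ≈ pixels_needed * Z / fx.
    fx = 1546.19          # focal length in pixels (from the intrinsic matrix)
    Z = 10.0              # distance camera -> checkerboard in metres
    pixels_needed = 20    # assumed minimum pixels per square for detection

    s_min = pixels_needed * Z / fx
    print(f"minimum square size ≈ {s_min:.2f} m")   # ≈ 0.13 m per square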

    I asked ChatGPT:

    Q: How to get the camera translation matrix from the camera intrinsic matrix and the camera rotation matrix, without knowing the translation vector 't'?

    Without knowing the translation vector t, it is not possible to compute the full camera matrix.

    However, if you have some information about the scene or the camera setup, you may be able to estimate the translation vector t using techniques such as visual odometry or structure from motion. Once you have an estimate for t, you can compute the full camera matrix.

    Maybe it would suffice to use the pictures from the stereo cameras to approximate the position of the camera in relation to features? But I haven't gone down this rabbit hole (yet).

    Kind regards

    • Hi jakaskerl

      Trying to do the chessboard calibration afterwards is not optimal in my situation, unfortunately. I would have to reattach the camera and try to align it with the original position in the video footage, and this will most certainly introduce a calibration error, since I'm never going to manage to align the camera in exactly the same way. Besides, getting access to the location and a big enough time slot to do so poses an even bigger challenge.

      This is why I'm looking for a method to get the extrinsics from the video material. Sorry, I just realized that the prior sentence would have been a better title for my question.

      It would probably suffice to just reconstruct the rotation matrix of the camera, since all the other variables, like the intrinsic matrix and the translation vector, are known (more or less 😉).

      What about the calib.json file from the gen2-record-replay scripts I used for the recording? Is there something in there that we could use to reconstruct the rotation matrix?
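
      For reference, this is how I've been poking at that file so far, assuming it is the same kind of EEPROM dump that DepthAI's CalibrationHandler reads; as far as I can tell it only holds the intrinsics and the camera-to-camera extrinsics on the board, not a camera-to-world rotation:

      import depthai as dai

      # calib.json as written by the gen2-record-replay scripts (assuming it
      # is a device EEPROM dump readable by CalibrationHandler).
      calib = dai.CalibrationHandler('calib.json')

      # Intrinsics of the RGB camera at the recorded resolution.
      K = calib.getCameraIntrinsics(dai.CameraBoardSocket.RGB, 1920, 1080)

      # Extrinsics between the on-board cameras (left -> right), i.e. the
      # stereo baseline transform, not the camera-to-world pose.
      left_to_right = calib.getCameraExtrinsics(dai.CameraBoardSocket.LEFT,
                                                dai.CameraBoardSocket.RIGHT)
      print(K, left_to_right)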

      This afternoon I came across another interesting approach called vanishing point camera calibration. A drawback of that method seems to be that we can compute yaw and pitch from the image, but not the roll... so this approach might not be so useful after all.
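
      For completeness, this is how I understand the yaw/pitch part of the vanishing point idea (a sketch only, with a made-up vanishing point pixel):

      import numpy as np

      K = np.array([[1546.19,    0.0, 962.36],
                    [   0.0, 1546.19, 531.37],
                    [   0.0,    0.0,    1.0]])

      # Vanishing point of lines that are horizontal and point "forward"
      # in the room (pixel coordinates are made up).
      u, v = 1105.0, 610.0

      # Back-project the pixel to a viewing direction in the camera frame.
      d = np.linalg.inv(K) @ np.array([u, v, 1.0])
      d /= np.linalg.norm(d)

      yaw = np.degrees(np.arctan2(d[0], d[2]))     # rotation around the vertical axis
      pitch = np.degrees(np.arctan2(-d[1], np.hypot(d[0], d[2])))
      print(yaw, pitch)   # roll stays unknown, as noted above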

      If you or anybody else reading this knows other methods worth exploring I'd be happy to try!

      • erik replied to this.
      • Hi jakaskerl, thank you for your explanation, I really appreciate that.

        Here is what I have trouble wrapping my mind around. I did the chessboard calibration afterwards, at a different location and with a completely different camera pose. Will the process you describe (if I understand you correctly: multiplying the points/vectors of the pointcloud with the cam-to-world matrix from the chessboard calibration) still yield the real-world positions of the pointcloud points in relation to the (first) camera at the ceiling, which recorded the original video material and had a different pose than the calibration camera?
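
        Just to make sure I'm picturing the multiplication correctly, here is a sketch with the cam_to_world matrix from my calibration and a made-up pointcloud point:

        import numpy as np

        # cam_to_world from the chessboard calibration (4x4 homogeneous matrix).
        cam_to_world = np.array([[ 0.61411884, -0.00386166,  0.78920412, -0.6965064 ],
                                 [ 0.7890858 ,  0.02099733, -0.61392403,  0.80952764],
                                 [-0.01420041,  0.99977207,  0.01594204, -0.33416839],
                                 [ 0.        ,  0.        ,  0.        ,  1.        ]])

        # A made-up pointcloud point in the camera frame, in metres.
        p_cam = np.array([0.5, -0.2, 2.0, 1.0])   # homogeneous coordinates

        # This yields coordinates in the chessboard/world frame of the
        # calibration pose -- which is exactly why I wonder whether it still
        # applies to the camera pose used for the original recording.
        p_world = cam_to_world @ p_cam
        print(p_world[:3])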

        • Hello erik,

          thank you for taking an interest 🙂 Yeah, I did the multi-cam calibration, which gave me a few matrices:

          world_to_cam
          [[ 0.61411884 0.7890858 -0.01420041 -0.21579439]
          [-0.00386166 0.02099733 0.99977207 0.31440464]
          [ 0.78920412 -0.61392403 0.01594204 1.05200151]
          [ 0. 0. 0. 1. ]]

          cam_to_world
          [[ 0.61411884 -0.00386166 0.78920412 -0.6965064 ]
          [ 0.7890858 0.02099733 -0.61392403 0.80952764]
          [-0.01420041 0.99977207 0.01594204 -0.33416839]
          [ 0. 0. 0. 1. ]]

          trans_vec
          [[-0.21579439]
          [ 0.31440464]
          [ 1.05200151]]

          rot_vec
          [[-1.43083528]
          [-0.71236433]
          [-0.70309224]]

          intrinsics_mat
          [[1546.18994140625, 0.0, 962.3573608398438]
          [0.0, 1546.18994140625, 531.37255859375]
          [0.0, 0.0, 1.0]]

          I guess first I have to manage to create the pointcloud from the video material. I recorded the scene with gen2-record-replay. Is gen2-pointcloud a good starting place to do so?

          What is the workflow to transform the pointcloud to world coordinates once I have acquired it?
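
          What I picture so far, assuming the pointcloud ends up as an Nx3 numpy array in the camera frame (the file names are just placeholders for however the data gets saved):

          import numpy as np

          # cam_to_world: the 4x4 matrix from the calibration I pasted above
          # (loading it from a .npy file is just a placeholder).
          cam_to_world = np.load("cam_to_world.npy")

          # points_cam: Nx3 array of pointcloud points in the camera frame,
          # e.g. exported from the gen2-pointcloud demo (metres).
          points_cam = np.load("pointcloud.npy")

          # Make the points homogeneous (Nx4) and apply the transform.
          ones = np.ones((points_cam.shape[0], 1))
          points_world = (cam_to_world @ np.hstack([points_cam, ones]).T).T[:, :3]
          print(points_world.shape)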

          I found this tutorial about geometric camera calibration (Camera Calibration with Example in Python). I presume this is another approach? The math looks a little scary. Given 6 points in the image plane and their corresponding world coordinates, it should be possible to reconstruct the extrinsic matrix. The points have to be independent, but I don't know what that means in this context. I can get the world coordinates for 6 points in the image, but how do I check for independence? I thought there are no more than 3 independent vectors in 3D space?! I'm feeling a little lost here and don't know what to learn first to fill in my blanks. If you or other readers have a pointer, that would be nice!
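
          In case it helps, this is how I currently read the DLT step from that tutorial (a sketch only; the six correspondences are made up, and my understanding is that "independent" roughly means the 3D points should be in general position, i.e. not all on one line or plane, but please correct me if that's wrong):

          import numpy as np

          # 6 made-up correspondences: world points (metres) and pixels.
          X = np.array([[0, 0, 0], [2, 0, 0], [0, 3, 0],
                        [0, 0, 1], [2, 3, 1], [1, 1, 2]], dtype=float)
          x = np.array([[500, 300], [900, 310], [520, 700],
                        [495, 260], [880, 650], [700, 480]], dtype=float)

          # Build the 2N x 12 DLT matrix for P, the 3x4 projection matrix.
          A = []
          for (Xw, Yw, Zw), (u, v) in zip(X, x):
              A.append([Xw, Yw, Zw, 1, 0, 0, 0, 0, -u*Xw, -u*Yw, -u*Zw, -u])
              A.append([0, 0, 0, 0, Xw, Yw, Zw, 1, -v*Xw, -v*Yw, -v*Zw, -v])
          A = np.array(A)

          # The solution is the right singular vector belonging to the smallest
          # singular value; reshape it into the 3x4 projection matrix.
          P = np.linalg.svd(A)[2][-1].reshape(3, 4)

          # With K known, the extrinsics would follow from K^-1 @ P (up to scale).
          print(P)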

          • Hi, does anybody know a tool or have code to get the extrinsics afterwards?

            • erik replied to this.
            • Would I notice if frames drop during the recording process? I looked at record.py, but I had the impression that this might happen silently.

              Yesterday I tested the script but was a little unprepared for the amount of data the recording creates, so I had to record with the LOW quality setting. But not today! Today is a fine day for a big enough NVMe 🙂

              Do you think the berry can handle the HIGH and BEST quality settings?

            • Hello!

              In the days ahead I'll get the opportunity to place an OAK-D Pro in a theatre and record contemporary dance. What recording option(s) would you recommend that enable the most options for learning the ropes, building models, experimenting and having fun afterwards? The OAK is connected to a Raspberry Pi.

              Thanks for your suggestions & kind regards!

              • erik replied to this.
              • Thank you, erik! This looks pretty much like what I was looking for, I will explore both paths!

              • Hello!

                I'd like to use video recordings to test/evaluate the performance of the OAK-D Pro and the loaded model. Is there a way to feed h.265/h.264 streams (like the ones created in the VideoEncoder examples) into the chip in the camera? And if so, what's a good approach?
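
                One idea I had, though I'm not sure it's the intended way: decode the stream on the host and push the raw frames into the pipeline through an XLinkIn node, roughly like this (file name, resolution and the pass-through output are placeholders):

                import cv2
                import depthai as dai

                pipeline = dai.Pipeline()

                # Host -> device entry point; in a real setup xin.out would be
                # linked to the node under test, e.g. a NeuralNetwork input.
                xin = pipeline.create(dai.node.XLinkIn)
                xin.setStreamName("frames")

                xout = pipeline.create(dai.node.XLinkOut)   # pass-through only
                xout.setStreamName("passthrough")
                xin.out.link(xout.input)

                with dai.Device(pipeline) as device:
                    q = device.getInputQueue("frames")
                    cap = cv2.VideoCapture("recording.mp4")  # placeholder file
                    while True:
                        ok, frame = cap.read()
                        if not ok:
                            break
                        frame = cv2.resize(frame, (300, 300))  # model input size
                        img = dai.ImgFrame()
                        img.setType(dai.ImgFrame.Type.BGR888p)
                        img.setWidth(300)
                        img.setHeight(300)
                        img.setData(frame.transpose(2, 0, 1).flatten())
                        q.send(img)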

                Kind regards

                • erik replied to this.
                • Hello erik & lss5fc0d, thanks for your feedback! I gotta work through your suggestions and the examples first, and I have a hunch that I'll come back with more specific questions. Kind regards

                • Hello

                  I'd like to translate pixel coordinates in the camera frame (OAK-D Pro) to coordinates in my room. I searched the web for inspiration and learned that quaternions might be the way to go, but I'm not quite there yet when it comes to putting the pieces together. I saw procedures where markers were placed at specific locations in the room to create a correspondence with the pixels in the camera frame. But since the OAK-D Pro has an IMU, a sense of depth and a laser dot projector, I wonder if placing markers in the room can be avoided. I imagine that the only coordinates I need to know are the x-y-z position of the camera in the room, and that the remaining parameters for the calculations could be obtained from the IMU, FOV, depth, etc.
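
                  To make my question more concrete, this is the computation I imagine (just a sketch; the rotation, camera position and the pixel/depth values are all placeholders):

                  import numpy as np

                  # Intrinsics of the camera (placeholder values).
                  K = np.array([[1546.19,    0.0, 962.36],
                                [   0.0, 1546.19, 531.37],
                                [   0.0,    0.0,    1.0]])

                  # Pose of the camera in the room: rotation (e.g. derived from
                  # the IMU) and the measured x-y-z position, both placeholders.
                  R_cam_to_room = np.eye(3)
                  cam_position = np.array([0.0, 2.5, 0.0])   # metres

                  # A pixel and its depth value from the stereo pair.
                  u, v, depth = 960.0, 540.0, 3.2            # depth in metres

                  # Back-project the pixel into the camera frame ...
                  p_cam = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])

                  # ... then rotate and translate it into room coordinates.
                  p_room = R_cam_to_room @ p_cam + cam_position
                  print(p_room)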

                  It would be super nice and cool and generous if some bright minds could walk me, and other potential seekers, through the process of calibration and calculation. At some point in my journey I'd like to track objects in the camera frame and assign them real-world coordinates.

                  • erik replied to this.