On the OAK-D, given an x,y pixel in e.g. a 1024x1024 color video frame (setVideoSize), how do I find the corresponding x,y in the stereo frame (same field of view)?
Is there an easy way to do this, or does it have to be done from scratch by factoring in the color camera cropping plus the difference in field of view?

Hi dexter, you are right, RGB-depth alignment would be the way to go 🙂
Thanks, Erik
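For reference, RGB-depth alignment is configured on the StereoDepth node; a minimal pipeline sketch, assuming the depthai Python API (node names and the 1024x1024 size are taken from this thread, the rest is illustrative):

```python
# Sketch: align the depth output to the RGB camera so an (x, y) pixel in
# the color frame maps directly to the same (x, y) in the depth frame.
import depthai as dai

pipeline = dai.Pipeline()

cam_rgb = pipeline.create(dai.node.ColorCamera)
cam_rgb.setVideoSize(1024, 1024)

mono_left = pipeline.create(dai.node.MonoCamera)
mono_left.setBoardSocket(dai.CameraBoardSocket.LEFT)
mono_right = pipeline.create(dai.node.MonoCamera)
mono_right.setBoardSocket(dai.CameraBoardSocket.RIGHT)

stereo = pipeline.create(dai.node.StereoDepth)
mono_left.out.link(stereo.left)
mono_right.out.link(stereo.right)

# The key call: warp the depth map into the RGB camera's frame of
# reference, so no manual cropping/FOV math is needed on the host.
stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
```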

    Thanks erik
    How do I know when a config update with new ROIs has made it through to the output?
    The SpatialLocationCalculatorConfig doesn't seem to take any id/time info, or does it?

      Hi dexter ,
      I am not entirely sure what you mean by "a config update", but the SpatialLocationCalculatorData object (the output of SpatialLocationCalculator) does have getSequenceNum() and getTimestamp() methods. Would that work?
      Thanks, Erik

        Hi erik
        I assume it takes a little time before an ROI update sent through SpatialLocationCalculatorConfig makes it to the calculator output. I read in one post that a way to "id" the config is to make minor changes to depthThresholds and check for the matching values on the output.
        I'm doing a detection step first on an RGB frame and then using the result as ROIs in the spatial calculator. That introduces a delay: by the time the config is updated, the left/right frames corresponding to the RGB frame used for detection are already gone, so the spatial info is based on newer frames. Is there a way to compensate for that so the "frame age" matches?
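The depthThresholds trick mentioned above could be sketched as follows; the helper names and the millimetre constants are my own illustration, not depthai API. The idea is simply to fold a small, unique offset into the lower depth threshold of each config, then read that threshold back from the ROI config echoed in the calculator's output:

```python
# Hypothetical helpers illustrating the "id via depthThresholds" trick.
BASE_LOWER_MM = 100  # nominal lower depth threshold in millimetres
ID_SLOTS = 50        # number of distinct ids before wrapping

def encode_config_id(config_id: int) -> int:
    """Fold a small id into the lower threshold (in mm)."""
    return BASE_LOWER_MM + (config_id % ID_SLOTS)

def decode_config_id(lower_threshold_mm: int) -> int:
    """Recover the id from a threshold read back from the output."""
    return lower_threshold_mm - BASE_LOWER_MM
```

The offsets are small enough (sub-5 cm here) that they should not materially change which depth pixels are accepted.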

          Hi dexter ,
          Since you are doing detection + spatial calc (instead of the spatial detection node), I assume your model isn't Yolo/SSD and you have to do decoding on the host before you can calculate coordinates? Another option would be to calculate spatial coordinates on the host (demo here). Would that work?
          Thanks, Erik
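Host-side spatial calculation, as in the linked demo, boils down to averaging valid depth inside the ROI and back-projecting through the camera intrinsics. A minimal sketch, assuming a depth frame already aligned to the RGB camera (the function name and median averaging are my choices, not necessarily the demo's):

```python
import numpy as np

def spatials_from_roi(depth_mm, roi, fx, fy, cx, cy):
    """Compute (X, Y, Z) in mm for a rectangular ROI of a depth frame.

    depth_mm: 2D array of depth values in millimetres (0 = invalid).
    roi:      (xmin, ymin, xmax, ymax) pixel coordinates.
    fx, fy, cx, cy: intrinsics of the camera the depth is aligned to.
    """
    xmin, ymin, xmax, ymax = roi
    patch = depth_mm[ymin:ymax, xmin:xmax]
    valid = patch[patch > 0]
    if valid.size == 0:
        return None  # no usable depth in this ROI
    z = float(np.median(valid))  # robust "average" depth of the ROI
    u = (xmin + xmax) / 2.0      # ROI centre in pixels
    v = (ymin + ymax) / 2.0
    # Pinhole back-projection of the ROI centre at depth z.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z
```

Porting this to C++ is mostly a matter of replacing the NumPy slicing with loops over the `cv::Mat` depth frame.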

            Hi erik
            I'm doing text detection, and I think the output decoding is probably too complex to be done in a Python node(?)
            I'm using a node for spatial calculations now, but I could try doing it on the host side. Is that spatials-on-host demo code available in C++ somewhere?

              Hi dexter ,
              A team member was just looking into this (porting the gen2-ocr demo), but we decided we will be decoding it on the host for now. There's no C++ demo available, but porting it from Python should be quite simple, at least from the DepthAI perspective, as the API is 1:1 between Python and C++. Thoughts?
              Thanks, Erik

                Thanks erik
                I'll try porting it, just checking to save me some time... (it took a while to port the detection and OCR parts)
                Will the sequence numbers be the same for the color and spatial frames, or do they need to be synced some other way?
                Perhaps I could even warp the perspective on the spatial frame to get values from rotated rects...

                  dexter By default they won't be; color frames will arrive much earlier than spatial detections (since those require NN inference combined with depth, which takes some time).

                    erik
                    I'm already buffering color frames to look up the ones used by detection (as in the Python demo).
                    I was thinking I could buffer the spatial frames as well to try and find the best match for the color frame used. Detection uses the sequence number for this, so the question is how to best find a spatial detection that matches the time of the color frame.

                    I tried buffering the stereo frames and mapping them using timestamps. The time difference seems to be around 20 ms or better. This works if the StereoDepth node's output timestamp comes from its input frames' timestamps; does it? (I couldn't find it in the docs.)
                    Since I already have the code for using the SpatialLocationCalculator I will try to feed these buffered frames to it before I convert the code for doing the calculation on the host.

                      Hi dexter ,

                      StereoDepth node's output timestamp comes from its input frames' timestamps, does it?

                      Yes, that's correct. I would suggest checking the SW message syncing docs as well; they should be quite helpful in your situation.

                      Thanks, Erik
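Host-side message syncing of the kind the docs describe typically keeps a per-stream dictionary keyed by sequence number and emits a pair once both streams have seen the same number. A minimal sketch (class and stream names are mine, not from the docs):

```python
class SeqSync:
    """Pair up messages from two streams by sequence number."""

    def __init__(self):
        self._msgs = {"color": {}, "depth": {}}

    def add(self, stream, seq, msg):
        """Store a message; return a (color, depth) pair once both
        streams have a message with this sequence number, else None."""
        self._msgs[stream][seq] = msg
        if all(seq in q for q in self._msgs.values()):
            pair = (self._msgs["color"].pop(seq),
                    self._msgs["depth"].pop(seq))
            # Drop anything older than the emitted pair; it can
            # never be matched any more.
            for q in self._msgs.values():
                for old in [s for s in q if s < seq]:
                    del q[old]
            return pair
        return None
```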

                        Thanks erik
                        The link helped. I was missing setting the same FPS for all cameras. Now the frame sequence numbers and timestamps line up perfectly.
                        The only problem now is that the text recognition network is so slow...
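For anyone landing here later, the FPS fix above is just a matter of giving every camera node the same rate (a sketch assuming the depthai Python API; 30 FPS is an arbitrary choice):

```python
import depthai as dai

FPS = 30  # arbitrary; the point is that all cameras share the same rate

pipeline = dai.Pipeline()

cam_rgb = pipeline.create(dai.node.ColorCamera)
cam_rgb.setFps(FPS)

mono_left = pipeline.create(dai.node.MonoCamera)
mono_left.setFps(FPS)
mono_right = pipeline.create(dai.node.MonoCamera)
mono_right.setFps(FPS)
```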