• DepthAI
  • Full FOV + Depth in Detection Network

If I squeeze or letterbox my image (as described here) and then feed it to YoloSpatialDetectionNetwork, should I expect the depth to work? It seems like there's no way for it to know how to align the disparity map. I've tried to figure this out experimentally, but the results have been ambiguous. Is there a correct way to get the full FOV and the correct depth out of a Detection Network?


    Hello lss5fc0d,
    I would suggest also displaying the passthrough depth frame and the depth bounding boxes, as shown in this example (on the left frame of the video). That way, it will be clear whether the mapping is done correctly or not.
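
    For reference, a minimal pipeline fragment along those lines, in Python, assuming a YoloSpatialDetectionNetwork node named spatialDetectionNetwork (the stream names are illustrative):

    # Stream out the depth frame the spatial detector actually used, plus the
    # detections, so the host can draw the depth bounding boxes on top of it.
    xoutDepth = pipeline.create(dai.node.XLinkOut)
    xoutDepth.setStreamName("depth")
    spatialDetectionNetwork.passthroughDepth.link(xoutDepth.input)

    xoutNN = pipeline.create(dai.node.XLinkOut)
    xoutNN.setStreamName("detections")
    spatialDetectionNetwork.out.link(xoutNN.input)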
    Thanks, Erik

    2 months later

    I'm trying to do the same thing, and have displayed the depth frame and depth bounding boxes as shown in the example provided. The bounding boxes do not seem to match the bounding boxes from the stretched camera preview. What suggestions would you have for fixing this? I'm thinking I could split the YoloSpatialDetectionNetwork into a YoloDetectionNetwork and a SpatialLocationCalculator, and warp the bounding box that is output from the YoloDetectionNetwork according to how the feed was stretched, and then feed that into the SpatialLocationCalculator. Does this make sense, and/or are there other possible solutions?
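
    A rough host-side sketch of that split, assuming the preview is stretched with setPreviewKeepAspectRatio(False) so the normalized detection coordinates map directly onto an RGB-aligned depth frame (queue names such as detQueue, spatialCalcConfigInQueue and spatialCalcQueue are illustrative; a letterboxed preview would additionally need the boxes un-letterboxed first):

    import depthai as dai

    # Host-side loop: convert each YoloDetectionNetwork bounding box into an
    # ROI for a SpatialLocationCalculator node and read back the 3D positions.
    while True:
        inDet = detQueue.get()  # dai.ImgDetections from the detection network
        if not inDet.detections:
            continue
        cfg = dai.SpatialLocationCalculatorConfig()
        for det in inDet.detections:
            roi = dai.SpatialLocationCalculatorConfigData()
            # Detection coordinates are normalized to the stretched NN input;
            # with a uniform stretch they address the full-FoV depth frame too.
            roi.roi = dai.Rect(dai.Point2f(det.xmin, det.ymin),
                               dai.Point2f(det.xmax, det.ymax))
            roi.depthThresholds.lowerThreshold = 100    # mm
            roi.depthThresholds.upperThreshold = 10000  # mm
            roi.calculationAlgorithm = dai.SpatialLocationCalculatorAlgorithm.MIN
            cfg.addROI(roi)
        spatialCalcConfigInQueue.send(cfg)
        spatialData = spatialCalcQueue.get().getSpatialLocations()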

      2 years later

      I'm also doing spatial object detection with the RGB camera set to its maximum FoV and "squeezing" the RGB images into the detector. More specifically, my ColorCamera node is set with:

      setResolution(dai::ColorCameraProperties::SensorResolution::THE_12_MP)
      setIspScale(1, 3)
      setPreviewKeepAspectRatio(false)

      In contrast, the resolution of the monochrome cameras is set to the lowest possible configuration, that is dai::MonoCameraProperties::SensorResolution::THE_400_P.
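
      For comparison, a minimal Python sketch of that kind of setup; the socket names, preview size and the depth alignment to the RGB camera are assumptions about the rest of my pipeline:

      import depthai as dai

      pipeline = dai.Pipeline()

      # Full-FoV RGB: 12 MP sensor, ISP downscaled 1/3, preview stretched (not cropped).
      camRgb = pipeline.create(dai.node.ColorCamera)
      camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
      camRgb.setIspScale(1, 3)
      camRgb.setPreviewKeepAspectRatio(False)
      camRgb.setPreviewSize(416, 416)  # illustrative NN input size

      # Lowest-resolution mono cameras feeding stereo depth.
      monoLeft = pipeline.create(dai.node.MonoCamera)
      monoRight = pipeline.create(dai.node.MonoCamera)
      monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
      monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
      monoLeft.setBoardSocket(dai.CameraBoardSocket.CAM_B)
      monoRight.setBoardSocket(dai.CameraBoardSocket.CAM_C)

      # Align depth to the RGB camera so detection coordinates and depth match up.
      stereo = pipeline.create(dai.node.StereoDepth)
      stereo.setDepthAlign(dai.CameraBoardSocket.CAM_A)
      monoLeft.out.link(stereo.left)
      monoRight.out.link(stereo.right)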

      Looking at the RGB and depth images and projecting the detection coordinates to a plane, it does look like the spatial detector is doing the "right thing" and correctly aligning / scaling the data so that they match as well as possible, which it should be able to do by computing projections from the intrinsic / extrinsic data available in the OAK camera itself. See a few examples below:

      The one problem I have is that, as can be seen above, there is a relatively large region at the borders of the RGB image for which no depth data is available. When an object is detected in these regions, the position estimates can get wildly wrong, as seen below:

      It would be nice if there were some mechanism in the DepthAI API to guard against such cases — for example, an option to ignore any detection with bounds beyond the region covered by the depth map.
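
      As a host-side workaround, one could approximate such a guard by rejecting detections whose ROI contains too few valid depth pixels. A rough sketch, where depthFrame is the RGB-aligned depth image as a NumPy array (zero meaning no data) and det is a detection with normalized coordinates:

      import numpy as np

      def has_enough_depth(depthFrame, det, min_valid_fraction=0.5):
          # Reject detections whose ROI is mostly outside the depth coverage.
          h, w = depthFrame.shape[:2]
          xmin = max(int(det.xmin * w), 0)
          ymin = max(int(det.ymin * h), 0)
          xmax = min(int(det.xmax * w), w)
          ymax = min(int(det.ymax * h), h)
          roi = depthFrame[ymin:ymax, xmin:xmax]
          if roi.size == 0:
              return False
          return np.count_nonzero(roi) / roi.size >= min_valid_fraction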

        Hi xperroni
        Could you check where the spatial bounding box is located? I think it stays within the bounds of the depth frame, which means it does not lie on top of the detected object. That results in bad depth readings. I think you could change this by using a different averaging method (setSpatialCalculationAlgorithm()) and lowering setBoundingBoxScaleFactor().
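
        For reference, both settings go on the spatial detection network node itself; a sketch with illustrative values:

        # Shrink the ROI used for the depth calculation and change the reduction method.
        spatialDetectionNetwork.setBoundingBoxScaleFactor(0.3)
        spatialDetectionNetwork.setSpatialCalculationAlgorithm(
            dai.SpatialLocationCalculatorAlgorithm.MEDIAN)
        spatialDetectionNetwork.setDepthLowerThreshold(100)    # mm
        spatialDetectionNetwork.setDepthUpperThreshold(10000)  # mm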

        Thanks,
        Jaka

          Hi jakaskerl

          jakaskerl Could you check where the spatial bounding box is located?

          How can I do that?

          jakaskerl I think it stays within the bounds of the depth frame, which means it does not lie on top of the detected object.

          That would explain the results I'm getting at the borders, but wouldn't change the underlying issue — that objects detected close to the borders of the RGB image lack depth data.

          jakaskerl I think you could change this by using a different averaging method (setSpatialCalculationAlgorithm()) and lowering setBoundingBoxScaleFactor().

          The bounding box scale factor is set to 0.5 and the object depth is computed by using the MIN algorithm. But I don't see how changing either would fix the issue with objects at the borders. If the object is outside the bounds of the depth map, whatever algorithm I choose will still be working on incorrect data; nor will changing the bounding box scale factor help if there is no "correct" ROI to narrow down to.

          In the end it still seems that the only practical solution is to discard objects detected too close to the borders.

            xperroni

            # Draws each detection's spatial ROI (detection.boundingBoxMapping) on the
            # depth frame; 'detections', 'depthFrameColor' and 'color' are assumed to
            # come from the surrounding host-side loop, as in the spatial detection example.
            for detection in detections:
                roiData = detection.boundingBoxMapping
                roi = roiData.roi
                roi = roi.denormalize(depthFrameColor.shape[1], depthFrameColor.shape[0])
                topLeft = roi.topLeft()
                bottomRight = roi.bottomRight()
                xmin = int(topLeft.x)
                ymin = int(topLeft.y)
                xmax = int(bottomRight.x)
                ymax = int(bottomRight.y)
                cv2.rectangle(depthFrameColor, (xmin, ymin), (xmax, ymax), color, 1)

            xperroni In the end it still seems that the only practical solution is to discard objects detected too close to the borders.

            Likely yes, afaik there isn't a way to do that in the API.

            Thanks,
            Jaka