Hi RyanLee
I'm not sure what you mean by "id does not match x, y value for me". Could you please elaborate?
The demo works by calculating approximate spatial coordinates for a given region of interest. I'll use this demo to explain.
We run a pipeline and acquire a depth frame, which stores depth data (distance from the camera along the Z axis) for each pixel inside the frame. This depth frame is then passed to the HostSpatialsCalc class, which can infer the 3D position of any pixel in the frame.
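For context, here is a rough sketch of how such a pipeline might be wired up. The node setup, the "depth" stream name and the calc import path are my assumptions modeled on the calc-spatials-on-host demo, not verbatim demo code:

    import depthai as dai
    from calc import HostSpatialsCalc  # assumed: the class from the demo's calc.py

    pipeline = dai.Pipeline()

    # Two mono cameras feeding a StereoDepth node
    monoLeft = pipeline.create(dai.node.MonoCamera)
    monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
    monoRight = pipeline.create(dai.node.MonoCamera)
    monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)

    stereo = pipeline.create(dai.node.StereoDepth)
    monoLeft.out.link(stereo.left)
    monoRight.out.link(stereo.right)

    # Stream the depth frames to the host
    xoutDepth = pipeline.create(dai.node.XLinkOut)
    xoutDepth.setStreamName("depth")
    stereo.depth.link(xoutDepth.input)

    with dai.Device(pipeline) as device:
        depthQueue = device.getOutputQueue(name="depth")
        hostSpatials = HostSpatialsCalc(device)  # needs the device to read calibration data

        while True:
            depthData = depthQueue.get()  # depth frame: per-pixel distance along Z, in mm
            # Spatial coordinates of a small ROI around pixel (300, 200)
            spatials, centroid = hostSpatials.calc_spatials(depthData, (300, 200))
            print(spatials)  # {'z': ..., 'x': ..., 'y': ...}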
First, it sets the center of the frame as the origin of the coordinate system, so the center pixel gets the values (x=0, y=0, z=depth). The x and y values of the other pixels are then calculated from the camera's FOV, their pixel distance from the center and their depth, using basic trigonometry (since x and y depend on depth).
In short: we calculate the angle from the center to the desired pixel as seen by the camera (taking the FOV into account) and multiply the tangent of that angle by the depth.
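As a standalone illustration with made-up numbers (a 640 px wide frame, a ~72° horizontal FOV, and a pixel 100 px right of center with a measured depth of 1500 mm), the math looks like this:

    import math

    # Illustrative values, not taken from any real device
    frame_width = 640
    HFOV = math.radians(72)
    offset = 100   # pixels to the right of the image center
    depth = 1500   # mm, along the Z axis

    # Angle between the optical axis and the ray through this pixel
    angle = math.atan(math.tan(HFOV / 2.0) * offset / (frame_width / 2.0))

    x = depth * math.tan(angle)  # lateral distance in mm
    print(round(x))  # ~341 mm to the right of the camera center

Note that since tan(atan(u)) = u, this works out to x = depth * tan(HFOV/2) * offset / (width/2), i.e. for a fixed depth, x scales linearly with the pixel offset.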
Source code:
import math
import numpy as np
import depthai as dai

# Both methods below belong to the HostSpatialsCalc class,
# so `self` refers to a HostSpatialsCalc instance.

def _calc_angle(self, frame, offset, HFOV):
    # Angle between the optical axis and the ray through the given pixel offset
    return math.atan(math.tan(HFOV / 2.0) * offset / (frame.shape[1] / 2.0))

def calc_spatials(self, depthData, roi, averaging_method=np.mean):
    depthFrame = depthData.getFrame()

    roi = self._check_input(roi, depthFrame) # If point was passed, convert it to ROI
    xmin, ymin, xmax, ymax = roi

    # Calculate the average depth in the ROI.
    depthROI = depthFrame[ymin:ymax, xmin:xmax]
    inRange = (self.THRESH_LOW <= depthROI) & (depthROI <= self.THRESH_HIGH)

    # Required information for calculating spatial coordinates on the host
    HFOV = np.deg2rad(self.calibData.getFov(dai.CameraBoardSocket(depthData.getInstanceNum())))

    averageDepth = averaging_method(depthROI[inRange])

    centroid = { # Get centroid of the ROI
        'x': int((xmax + xmin) / 2),
        'y': int((ymax + ymin) / 2)
    }

    midW = int(depthFrame.shape[1] / 2) # middle of the depth img width
    midH = int(depthFrame.shape[0] / 2) # middle of the depth img height
    bb_x_pos = centroid['x'] - midW
    bb_y_pos = centroid['y'] - midH

    angle_x = self._calc_angle(depthFrame, bb_x_pos, HFOV)
    angle_y = self._calc_angle(depthFrame, bb_y_pos, HFOV)

    spatials = {
        'z': averageDepth,
        'x': averageDepth * math.tan(angle_x),
        'y': -averageDepth * math.tan(angle_y)
    }
    return spatials, centroid
Averaging (e.g. np.mean()) is used because it gives a more stable result. The depth of a single pixel is often very noisy and tends to jump around a lot, but this can be avoided (or at least reduced) by averaging out the depth of the pixels inside the region we are interested in.
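As a quick illustration with synthetic numbers (not demo code):

    import numpy as np

    # Fake 10x10 ROI of depth values around 1000 mm with noise,
    # plus a few invalid zero readings mixed in
    rng = np.random.default_rng(0)
    depthROI = rng.normal(1000, 30, (10, 10)).astype(np.int32)
    depthROI[0, :3] = 0  # simulate missing depth pixels

    # Keep only values inside a plausible range, as the demo does
    # with THRESH_LOW / THRESH_HIGH, then average
    inRange = (200 <= depthROI) & (depthROI <= 30000)
    print(np.mean(depthROI[inRange]))    # stable averaged depth
    print(np.median(depthROI[inRange]))  # median is even more robust to outliers

Note that calc_spatials also takes an averaging_method argument, so you can pass e.g. np.median instead of the default np.mean if your ROI contains outliers.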
Hope this clears things up a bit, let me know if you have any more questions.
Thanks,
Jaka