jakaskerl
Did you miss spatial_calc.initialConfig.addROI(config)
in your code?
No, I already have that line in my code.
'''import cv2
import depthai as dai
import socket
import json
import errno
from ultralytics import YOLO
import numpy as np
stepSize = 0.05
newConfig = False
# Create pipelinepipeline = dai.Pipeline()
# Define sources and outputsmonoLeft = pipeline.create(dai.node.MonoCamera)
monoRight = pipeline.create(dai.node.MonoCamera)
stereo = pipeline.create(dai.node.StereoDepth)
spatialLocationCalculator = pipeline.create(dai.node.SpatialLocationCalculator)
xoutDepth = pipeline.create(dai.node.XLinkOut)
xoutSpatialData = pipeline.create(dai.node.XLinkOut)
xinSpatialCalcConfig = pipeline.create(dai.node.XLinkIn)
xoutDepth.setStreamName("depth")
xoutSpatialData.setStreamName("spatialData")
xinSpatialCalcConfig.setStreamName("spatialCalcConfig")
# Properties
monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
monoLeft.setCamera("left")
monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
monoRight.setCamera("right")
stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
stereo.setLeftRightCheck(True)stereo.setSubpixel(True)
# Create YOLO
modelmodel = YOLO('yolov8n.pt')
# Define source and outputcamRgb = pipeline.create(dai.node.ColorCamera)
xoutVideo = pipeline.create(dai.node.XLinkOut)
xoutVideo.setStreamName("video")
# Properties
camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setVideoSize(640,400)
xoutVideo.input.setBlocking(False)
xoutVideo.input.setQueueSize(1)
# LinkingcamRgb.video.link(xoutVideo.input) ###for RGB
monoLeft.out.link(stereo.left)
monoRight.out.link(stereo.right)
spatialLocationCalculator.passthroughDepth.link(xoutDepth.input)
stereo.depth.link(spatialLocationCalculator.inputDepth)
spatialLocationCalculator.out.link(xoutSpatialData.input)
xinSpatialCalcConfig.out.link(spatialLocationCalculator.inputConfig)
# Connect to device and start pipeline
while True:
     Â
       videoIn = video.get()
       # Get BGR frame from NV12 encoded video frame to show with OpenCV
       frame = videoIn.getCvFrame()
       # Perform YOLOv8 inference on the frame
       results = model(frame)
       # Get annotated frame with YOLOv8 predictions
       annotated_frame = results[0].plot()
       # All data needed to be transferred
       result=results[0]
       output=[]
       # roi
       inDepth = depthQueue.get() # Blocking call, will wait until a new data has arrived
       depthFrame = inDepth.getFrame() # depthFrame values are in millimeters
       depth_downscaled = depthFrame[::4]
       if np.all(depth_downscaled == 0):
           min_depth = 0 # Set a default minimum depth value when all elements are zero
       else:
           min_depth = np.percentile(depth_downscaled[depth_downscaled != 0], 1)
       max_depth = np.percentile(depth_downscaled, 99)
       depthFrameColor = np.interp(depthFrame, (min_depth, max_depth), (0, 255)).astype(np.uint8)
       depthFrameColor = cv2.applyColorMap(depthFrameColor, cv2.COLORMAP_HOT)
       spatialData = spatialCalcQueue.get().getSpatialLocations()
      # YOLO
       for box in result.boxes:
           # coordinates of the bounding box
           x1,y1,x2,y2=[round(x) for x in box.xyxy[0].tolist()]
           # obj id
           class_id=box.cls[0].item()
           id=result.names[class_id]
           output.append([x1,y1,x2,y2, id])
           print(output)
         Â
           #### x1,y1=topLeft, x2,y2=bottomRight
           # config
           # size of the box
           '''topLeft = dai.Point2f(x1/depthFrameColor.shape[1], y1/depthFrameColor.shape[0])
           bottomRight = dai.Point2f(x2/depthFrameColor.shape[1], y2/depthFrameColor.shape[0])'''
           topLeft = dai.Point2f(x1/630, y1/390)
           bottomRight = dai.Point2f(x2/630, y2/390)
           config = dai.SpatialLocationCalculatorConfigData()
           config.depthThresholds.lowerThreshold = 100
           config.depthThresholds.upperThreshold = 10000
           calculationAlgorithm = dai.SpatialLocationCalculatorAlgorithm.MEDIAN
           ##  size is using from here
           config.roi = dai.Rect(topLeft, bottomRight)
           spatialLocationCalculator.inputConfig.setWaitForMessage(False)
           spatialLocationCalculator.initialConfig.addROI(config)
           for depthData in spatialData:
               roi = depthData.config.roi
               roi = roi.denormalize(width=depthFrameColor.shape[1], height=depthFrameColor.shape[0])
               xmin = int(roi.topLeft().x)
               ymin = int(roi.topLeft().y)
               xmax = int(roi.bottomRight().x)
               ymax = int(roi.bottomRight().y)
               depthMin = depthData.depthMin
               depthMax = depthData.depthMax
               fontType = cv2.FONT_HERSHEY_TRIPLEX
               cv2.rectangle(depthFrameColor, (xmin, ymin), (xmax, ymax), color, 1)
               cv2.putText(depthFrameColor, f"X: {int(depthData.spatialCoordinates.x)} mm", (xmin + 10, ymin + 20), fontType, 0.5, color)
               cv2.putText(depthFrameColor, f"Y: {int(depthData.spatialCoordinates.y)} mm", (xmin + 10, ymin + 35), fontType, 0.5, color)
               cv2.putText(depthFrameColor, f"Z: {int(depthData.spatialCoordinates.z)} mm", (xmin + 10, ymin + 50), fontType, 0.5, color)
         Â
       # Display the annotated frame
       cv2.imshow("YOLOv8 Inference", annotated_frame)
       cv2.imshow("depth", depthFrameColor)
       if cv2.waitKey(1) == ord('q'):
           break
cv2.destroyAllWindows()'''