Hi glitchyordis,

  1. You could, but depth would then be upscaled to the RGB resolution, in this case 12MP, which would cause issues, so it's better to set the size beforehand. You can't change this config dynamically/at runtime.
  2. Yep, correct. For BBs, you can keep them normalized (0..1) instead of pixel coordinates, which will help you map object detection results to the depth map. Useful demo here if you want to do this on the host.
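For illustration, a minimal sketch of mapping a normalized bbox onto the depth frame (bbox_to_depth_roi is a hypothetical helper; the random array just stands in for a real depth frame):

  import numpy as np

  def bbox_to_depth_roi(depth_frame: np.ndarray, bbox):
      """Map a normalized (0..1) bbox onto a depth frame of any resolution."""
      h, w = depth_frame.shape  # depth is single-channel, uint16, in mm
      xmin, ymin, xmax, ymax = bbox
      return depth_frame[int(ymin * h):int(ymax * h), int(xmin * w):int(xmax * w)]

  # usage: average valid depth (ignoring 0 = invalid) inside a detection
  depth_frame = np.random.randint(0, 5000, (599, 800), dtype=np.uint16)
  roi = bbox_to_depth_roi(depth_frame, (0.4, 0.3, 0.6, 0.7))
  print(np.mean(roi[roi > 0]))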

Thoughts?
Thanks, Erik

    Thanks erik. The following applies to OAK-D Pro AF cameras.

    1. Is there a way to turn off the stereo stream on the fly without affecting the RGB stream? I'm only aware of the setStopStreaming approach, but that stops every stream. I'm trying to improve FPS, since we require 12MP RGB to be sent to the host.

    2. Can we change the filter configurations and the IR projector on the fly?

    3. As a reference, Intel RealSense's guide on post-processing depth is very similar to that of the OAK-D Pro; can the filter approach described there be applied to the OAK-D Pro?

      Hi glitchyordis,

      1. You could use a Script node as a "pipeline manager" to either forward or discard frames - example here.
      2. Yep🙂 Filters can be changed at runtime with a StereoDepthConfig message - example here, and a sketch below. For the IR dot projector you can see the example here.
      3. I believe it's very similar, but I will confirm with the team and we will update the docs.🙂
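      A minimal sketch of both, assuming an XLinkIn stream named "stereo_cfg" (the name and the filter values are just examples):

        import depthai as dai

        pipeline = dai.Pipeline()
        monoLeft = pipeline.create(dai.node.MonoCamera)
        monoRight = pipeline.create(dai.node.MonoCamera)
        stereo = pipeline.create(dai.node.StereoDepth)
        monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
        monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)
        monoLeft.out.link(stereo.left)
        monoRight.out.link(stereo.right)

        xoutDepth = pipeline.create(dai.node.XLinkOut)
        xoutDepth.setStreamName("depth")
        stereo.depth.link(xoutDepth.input)

        # host -> device stream for runtime stereo re-configuration
        configIn = pipeline.create(dai.node.XLinkIn)
        configIn.setStreamName("stereo_cfg")
        configIn.out.link(stereo.inputConfig)

        with dai.Device(pipeline) as device:
            cfgQ = device.getInputQueue("stereo_cfg")

            # change filters on the fly via a StereoDepthConfig message
            cfg = dai.StereoDepthConfig()
            cfg.setMedianFilter(dai.MedianFilter.KERNEL_5x5)
            cfg.setConfidenceThreshold(200)
            cfgQ.send(cfg)  # takes effect on subsequent frames

            # IR dot projector (OAK Pro series); brightness in mA, 0..1200
            device.setIrLaserDotProjectorBrightness(800)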

      Thoughts?
      Thanks, Erik

        9 days later

        erik

        Hi.

        1. Just wanted to double-check: does rgb_depth_alignment at 12MP sync the RGB and depth frames?

        2. I've noticed the following lines in the rgb-depth-alignment example. What is the effect on the depth calculation if the OAK-D Pro took a picture at a different lensPosition and we then perform depth alignment?

          # For now, RGB needs fixed focus to properly align with depth.
          # This value was used during calibration
          try:
              calibData = device.readCalibration2()
              lensPosition = calibData.getLensPosition(dai.CameraBoardSocket.RGB)
              if lensPosition:
                  camRgb.initialControl.setManualFocus(lensPosition)
          except:
              raise

          Hi erik, just double-checking on Question 2.

          So there is an effect on the RGB image taken from the same position when changing focus, and this affects the depth-RGB alignment. In my application using an OAK-D Pro AF:

          • the device sends an ISP 12MP image (let's call this Image A) to the host; this may be at an arbitrary focus
          • the host performs obj det -> we have bbox coordinates

          Does this mean that, to get a more accurate spatial calculation (x, y, z) after taking Image A, I should

          • manually set the camera focus (see quote below)
          • only then get the depth data
          • and calculate the spatials?

          glitchyordis:

          # For now, RGB needs fixed focus to properly align with depth.
          # This value was used during calibration
          try:
              calibData = device.readCalibration2()
              lensPosition = calibData.getLensPosition(dai.CameraBoardSocket.RGB)
              if lensPosition:
                  camRgb.initialControl.setManualFocus(lensPosition)
          except:
              raise
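          In case it's useful, here is a minimal sketch of setting that same focus at runtime instead of via initialControl (assuming an XLinkIn node with stream name "control" linked to camRgb.inputControl at pipeline build time):

            # assumed at pipeline build time:
            #   controlIn = pipeline.create(dai.node.XLinkIn)
            #   controlIn.setStreamName("control")
            #   controlIn.out.link(camRgb.inputControl)
            controlQueue = device.getInputQueue("control")

            ctrl = dai.CameraControl()
            ctrl.setManualFocus(lensPosition)  # 0..255; the calibrated lens position
            controlQueue.send(ctrl)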


            Hi glitchyordis,
            To get perfectly aligned depth and RGB, you'd want to manually set the lens position, as you do above. If you have an object close (e.g. below 50cm), and especially very close (e.g. below 20cm), using AF will cause RGB and depth to become misaligned. Not having perfectly aligned depth and RGB could lead to incorrect spatial detections, but usually it won't, as there's bounding box scaling applied anyway.
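            For illustration, a minimal sketch of such scaling (scale_bbox is a hypothetical helper; spatial detection nodes expose a similar bounding box scale factor):

              def scale_bbox(xmin, ymin, xmax, ymax, factor=0.5):
                  # shrink the bbox around its center before sampling depth,
                  # so slight rgb-depth misalignment has less effect on spatials
                  cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
                  hw, hh = (xmax - xmin) / 2 * factor, (ymax - ymin) / 2 * factor
                  return cx - hw, cy - hh, cx + hw, cy + hh

              print(scale_bbox(100, 100, 300, 200))  # (150.0, 125.0, 250.0, 175.0)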
            Thanks, Erik

              Hi erik,

              1. Just to double-check, do you mean:

                • get the obj det bbox
                • scale the bbox width/height by a bbox scale factor -> scaled bbox
                • get depth within the scaled bbox instead of the obj det bbox?
              2. Please help me take a look at the following code for aligning RGB to depth and syncing; I was only able to get around 12 FPS (printed by the depthai_sdk.fps FPSHandler). Any suggestions on how to improve the FPS? It drops even lower when I calculate spatials from the received depth frame and the bbox from the host.

              from pathlib import Path
              import sys
              import cv2
              import depthai as dai
              import numpy as np
              import time 
              import matplotlib.pyplot as plt
              import math
              import PySimpleGUI as gui
              from depthai_sdk.fps import FPSHandler
              
               class HostSync:
                   # from https://github.com/luxonis/depthai-experiments/blob/master/gen2-depth-driven-focus/main.py
                  def __init__(self):
                      self.arrays = {}
              
                  def add_msg(self, name, msg):
                      if not name in self.arrays:
                          self.arrays[name] = []
                      # Add msg to array
                      self.arrays[name].append({"msg": msg, "seq": msg.getSequenceNum()})
              
                      synced = {}
                      for name, arr in self.arrays.items():
                          for i, obj in enumerate(arr):
                              if msg.getSequenceNum() == obj["seq"]:
                                  synced[name] = obj["msg"]
                                  break
                      # If there are 5 (all) synced msgs, remove all old msgs
                      # and return synced msgs
                      if len(synced) == (2):  # Color, Depth
                          # Remove old msgs
                          for name, arr in self.arrays.items():
                              for i, obj in enumerate(arr):
                                  if obj["seq"] < msg.getSequenceNum():
                                      arr.remove(obj)
                                  else:
                                      break
                          return synced
                      return False
              
               def printSystemInformation(info):
                   # from https://docs.luxonis.com/projects/api/en/latest/samples/SystemLogger/system_information/#system-information
                  m = 1024 * 1024 # MiB
                  print(f"Ddr used / total - {info.ddrMemoryUsage.used / m:.2f} / {info.ddrMemoryUsage.total / m:.2f} MiB")
                  print(f"Cmx used / total - {info.cmxMemoryUsage.used / m:.2f} / {info.cmxMemoryUsage.total / m:.2f} MiB")
                  print(f"LeonCss heap used / total - {info.leonCssMemoryUsage.used / m:.2f} / {info.leonCssMemoryUsage.total / m:.2f} MiB")
                  print(f"LeonMss heap used / total - {info.leonMssMemoryUsage.used / m:.2f} / {info.leonMssMemoryUsage.total / m:.2f} MiB")
                  t = info.chipTemperature
                  print(f"Chip temperature - average: {t.average:.2f}, css: {t.css:.2f}, mss: {t.mss:.2f}, upa: {t.upa:.2f}, dss: {t.dss:.2f}")
                  print(f"Cpu usage - Leon CSS: {info.leonCssCpuUsage.average * 100:.2f}%, Leon MSS: {info.leonMssCpuUsage.average * 100:.2f} %")
                  print("----------------------------------------")
               
              
              # pipeline ---------------------------------------------------------------------------------------
              device = dai.Device("184430102145421300", maxUsbSpeed=dai.UsbSpeed.SUPER_PLUS)
              
              # Create pipeline
              pipeline = dai.Pipeline()
              pipeline.setXLinkChunkSize(0)
              # Define sources and outputs
              camRgb = pipeline.create(dai.node.ColorCamera)
              monoLeft = pipeline.create(dai.node.MonoCamera)
              monoRight = pipeline.create(dai.node.MonoCamera)
              stereo = pipeline.create(dai.node.StereoDepth)
              
              xoutRgb = pipeline.create(dai.node.XLinkOut)
              xoutDepth = pipeline.create(dai.node.XLinkOut) 
              
              xoutRgb.setStreamName("rgb") 
              xoutDepth.setStreamName("depth")
              
              # Properties
              set_fps = 30
              camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
              camRgb.setFps(set_fps)
              # camRgb.setInterleaved(False) 
              try:
                  calibData = device.readCalibration2()
                  lensPosition = calibData.getLensPosition(dai.CameraBoardSocket.RGB)
                  if lensPosition:
                      camRgb.initialControl.setManualFocus(lensPosition)
                  else:
                      gui.popup("No Calib data for lensposition")
              except:
                  raise
              
              monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
              monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
              monoLeft.setFps(set_fps)
              monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
              monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)
              monoRight.setFps(set_fps)
              
              # Setting node configs
              stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
               # the preset is similar to setting stereo.initialConfig.setConfidenceThreshold(threshold)
               # Median filter options: MEDIAN_OFF, KERNEL_3x3, KERNEL_5x5, KERNEL_7x7 (default)
              
              stereo.initialConfig.setMedianFilter(dai.MedianFilter.KERNEL_7x7)
              
              # stereoConfig
              stereoConfig = stereo.initialConfig.get()
              stereoConfig.postProcessing.speckleFilter.enable = True
              stereoConfig.postProcessing.speckleFilter.speckleRange = 50
              # stereoConfig.postProcessing.temporalFilter.enable = True
              stereoConfig.postProcessing.spatialFilter.enable = True
              stereoConfig.postProcessing.spatialFilter.holeFillingRadius = 2
              stereoConfig.postProcessing.spatialFilter.numIterations = 1
              stereoConfig.postProcessing.thresholdFilter.minRange = 150
              stereoConfig.postProcessing.thresholdFilter.maxRange = 1000
              stereoConfig.postProcessing.decimationFilter.decimationFactor = 2
              stereo.initialConfig.set(stereoConfig)
              
              # Align depth map to the perspective of RGB camera, on which inference is done
              stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
              stereo.setLeftRightCheck(True)
              stereo.setExtendedDisparity(True)
              stereo.setOutputSize(800, 599)
               
              # Linking
              monoLeft.out.link(stereo.left)
              monoRight.out.link(stereo.right)
               
              camRgb.isp.link(xoutRgb.input)
              stereo.depth.link(xoutDepth.input) 
              
              sysLog = pipeline.create(dai.node.SystemLogger)
              linkOut = pipeline.create(dai.node.XLinkOut)
              linkOut.setStreamName("sysinfo")
              sysLog.setRate(0.2)   
              sysLog.out.link(linkOut.input)
              
               # show depth at the mouse position when hovering over the cv2 window
              point = (0, 0)
              def show_distance(event, x,y, args, params):
                  global point
                  point = (x,y)
                  print(f"point: {point}")
              
              cv2.namedWindow("12mp")
              cv2.setMouseCallback("12mp", show_distance)
               
              # main ----------------------------------------------------------------------------------------
              device.startPipeline(pipeline)
              device.setLogLevel(dai.LogLevel.INFO)
              device.setLogOutputLevel(dai.LogLevel.INFO) 
              
              calibData = device.readCalibration() 
              
              outputs = ['rgb', 'depth'] 
              
              queues  = [device.getOutputQueue(name, 4, False) for name in outputs]
              qSysInfo = device.getOutputQueue(name="sysinfo", maxSize=4, blocking=False)
              sync = HostSync()
              
              color = (220, 220, 220)
              fps = FPSHandler()
              
              while True: 
                  sysInfo = qSysInfo.tryGet()
                  if sysInfo is not None:
                      printSystemInformation(sysInfo)
              
                  for q in queues:
                      if q.has():
                          synced_msgs = sync.add_msg(q.getName(), q.get())
                          if synced_msgs:
                              fps.nextIter()
                              print('FPS', fps.fps())
                              frame = synced_msgs["rgb"].getCvFrame()
              
                              depthFrame = None # If debug
                              if 'depth' in synced_msgs:
                                  depthFrame: dai.ImgFrame = synced_msgs["depth"].getFrame()
                                  fov = synced_msgs['depth'].getInstanceNum()
              
              
                              frame_resized = cv2.resize(frame,(800,599), interpolation=cv2.INTER_NEAREST)
              
                              # get frame sizes
                              imgH, imgW, _ = frame.shape
                              resizedh, resizedw, _ = frame_resized.shape 
              
                               #### placeholder (disabled): host obj det on frame,
                               #### get bbox, calc spatials using depthFrame
              
                              cv2.circle(frame_resized, point, 10, (0,0,255))
                              
                               distance = depthFrame[point[1]][point[0]]  # depth is in mm
                               cv2.putText(frame_resized, "{:.1f}cm".format(distance / 10), (point[0], point[1]), cv2.FONT_HERSHEY_PLAIN, 2, (255,255,255), 2)
                              cv2.imshow("12mp", frame_resized)
                  
                  if cv2.waitKey(1) == ord('q'):
                      break
              
              device.close()

              Example FPS printed (no object detection on host/device) by the depthai_sdk.fps FPSHandler:

              FPS 14.071720381301297
              FPS 13.953488372094823
              Ddr used / total - 108.02 / 340.42 MiB
              Cmx used / total - 2.09 / 2.50 MiB
              LeonCss heap used / total - 47.33 / 77.32 MiB
              LeonMss heap used / total - 4.59 / 41.23 MiB
              Chip temperature - average: 46.39, css: 47.35, mss: 46.45, upa: 46.68, dss: 45.09
              Cpu usage - Leon CSS: 48.04%, Leon MSS: 19.31 %

              Example FPS printed (object detection on host, spatials calculated on host from the device's depth) by the depthai_sdk.fps FPSHandler:

              FPS 6.550218340610157
              FPS 6.56959024720649
              Ddr used / total - 108.02 / 340.42 MiB
              Cmx used / total - 2.09 / 2.50 MiB
              LeonCss heap used / total - 47.33 / 77.32 MiB
              LeonMss heap used / total - 4.59 / 41.23 MiB
              Chip temperature - average: 46.45, css: 47.35, mss: 46.45, upa: 46.00, dss: 46.00
              Cpu usage - Leon CSS: 47.33%, Leon MSS: 20.71 %

                Hi erik, here you go.

                device: OAK-D Pro AF
                latest versions of depthai and depthai-sdk

                Note: device = dai.Device("111111111111111111", maxUsbSpeed=dai.UsbSpeed.SUPER_PLUS) - please change it to your own device MxID.

                # %%
                from pathlib import Path
                import sys
                import cv2
                import depthai as dai
                import numpy as np
                import time 
                import matplotlib.pyplot as plt
                import math
                import PySimpleGUI as gui
                from depthai_sdk.fps import FPSHandler
                import torch
                
                # %%
                class HostSync:
                    # adapted from https://github.com/luxonis/depthai-experiments/blob/master/gen2-depth-driven-focus/main.py
                    
                    def __init__(self):
                        self.arrays = {}
                
                    def add_msg(self, name, msg):
                        if not name in self.arrays:
                            self.arrays[name] = []
                        # Add msg to array
                        self.arrays[name].append({"msg": msg, "seq": msg.getSequenceNum()})
                
                        synced = {}
                        for name, arr in self.arrays.items():
                            for i, obj in enumerate(arr):
                                if msg.getSequenceNum() == obj["seq"]:
                                    synced[name] = obj["msg"] 
                
                        if len(synced) == (2):  # Color, Depth
                            # Remove old msgs
                            for name, arr in self.arrays.items():
                                for i, obj in enumerate(arr):
                                    if obj["seq"] < msg.getSequenceNum():
                                        arr.remove(obj)
                                    else:
                                        break
                            return synced
                        return False
                    
                class TextHelper:
                    # from: https://github.com/luxonis/depthai-experiments/blob/master/gen2-calc-spatials-on-host/utility.py
                    def __init__(self) -> None:
                        self.bg_color = (0, 0, 0)
                        self.color = (255, 255, 255)
                        self.text_type = cv2.FONT_HERSHEY_SIMPLEX
                        self.line_type = cv2.LINE_AA
                    def putText(self, frame, text, coords):
                        cv2.putText(frame, text, coords, self.text_type, 0.5, self.bg_color, 3, self.line_type)
                        cv2.putText(frame, text, coords, self.text_type, 0.5, self.color, 1, self.line_type)
                    def rectangle(self, frame, p1, p2):
                        cv2.rectangle(frame, p1, p2, self.bg_color, 3)
                        cv2.rectangle(frame, p1, p2, self.color, 1)
                
                # %%
                def printSystemInformation(info):
                    # from https://docs.luxonis.com/projects/api/en/latest/samples/SystemLogger/system_information/#system-information
                    
                    m = 1024 * 1024 # MiB
                    print(f"Ddr used / total - {info.ddrMemoryUsage.used / m:.2f} / {info.ddrMemoryUsage.total / m:.2f} MiB")
                    print(f"Cmx used / total - {info.cmxMemoryUsage.used / m:.2f} / {info.cmxMemoryUsage.total / m:.2f} MiB")
                    print(f"LeonCss heap used / total - {info.leonCssMemoryUsage.used / m:.2f} / {info.leonCssMemoryUsage.total / m:.2f} MiB")
                    print(f"LeonMss heap used / total - {info.leonMssMemoryUsage.used / m:.2f} / {info.leonMssMemoryUsage.total / m:.2f} MiB")
                    t = info.chipTemperature
                    print(f"Chip temperature - average: {t.average:.2f}, css: {t.css:.2f}, mss: {t.mss:.2f}, upa: {t.upa:.2f}, dss: {t.dss:.2f}")
                    print(f"Cpu usage - Leon CSS: {info.leonCssCpuUsage.average * 100:.2f}%, Leon MSS: {info.leonMssCpuUsage.average * 100:.2f} %")
                    print("----------------------------------------")
                
                # %%
                def calc_spatials(frame, frame_resized, result, depthFrame, calibData, depthData):
                    # adapted from : https://github.com/luxonis/depthai-experiments/blob/master/gen2-calc-spatials-on-host/calc.py
                
                    def calc_angle(frame, offset, HFOV):
                        return math.atan(math.tan(HFOV / 2.0) * offset / (frame.shape[1] / 2.0))
                
                    imgH, imgW, _ = frame.shape
                    resizedh, resizedw, _ = frame_resized.shape
                
                    # only process the first detection
                    scaled_bb = {
                        "xmin": int(result.xmin[0]/imgW*resizedw),
                        "ymin": int(result.ymin[0]/imgH*resizedh),
                        "xmax": int(result.xmax[0]/imgW*resizedw),
                        "ymax": int(result.ymax[0]/imgH*resizedh),
                        } 
                
                    depthROI = depthFrame[scaled_bb["ymin"]:scaled_bb["ymax"], scaled_bb["xmin"]:scaled_bb["xmax"]]
                     inRange = (150 <= depthROI) & (depthROI <= 1000) # threshold
                
                    averageDepth = np.mean(depthROI[inRange])
                
                    centroid = { # Get centroid of the ROI
                        'x': int((scaled_bb["xmax"] + scaled_bb["xmin"]) / 2),
                        'y': int((scaled_bb["ymax"] + scaled_bb["ymin"]) / 2)
                        }
                    
                    midW = int(depthFrame.shape[1] / 2) # middle of the depth img width
                    midH = int(depthFrame.shape[0] / 2) # middle of the depth img height
                    bb_x_pos = centroid['x'] - midW
                    bb_y_pos = centroid['y'] - midH
                
                    HFOV = np.deg2rad(calibData.getFov(dai.CameraBoardSocket(depthData.getInstanceNum())))
                
                    angle_x = calc_angle(depthFrame, bb_x_pos, HFOV)
                    angle_y = calc_angle(depthFrame, bb_y_pos, HFOV)
                
                    spatials = {
                                'z': averageDepth,
                                'x': averageDepth * math.tan(angle_x),
                                'y': -averageDepth * math.tan(angle_y)
                            } 
                
                    return spatials, scaled_bb, centroid
                
                # %%
                device = dai.Device("184430102145421300", maxUsbSpeed=dai.UsbSpeed.SUPER_PLUS)
                
                # %%
                # Create pipeline
                pipeline = dai.Pipeline()
                pipeline.setXLinkChunkSize(0)
                
                # Define sources and outputs
                camRgb = pipeline.create(dai.node.ColorCamera)
                monoLeft = pipeline.create(dai.node.MonoCamera)
                monoRight = pipeline.create(dai.node.MonoCamera)
                stereo = pipeline.create(dai.node.StereoDepth)
                
                xoutRgb = pipeline.create(dai.node.XLinkOut)
                xoutDepth = pipeline.create(dai.node.XLinkOut) 
                
                xoutRgb.setStreamName("rgb") 
                xoutDepth.setStreamName("depth")
                
                # mono and rgb properties
                set_fps = 30
                camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
                camRgb.setFps(set_fps)
                # camRgb.setInterleaved(False) 
                try:
                    calibData = device.readCalibration2()
                    lensPosition = calibData.getLensPosition(dai.CameraBoardSocket.RGB)
                    if lensPosition:
                        camRgb.initialControl.setManualFocus(lensPosition)
                    else:
                        gui.popup("No Calib data for lensposition")
                except:
                    raise
                
                monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
                monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
                monoLeft.setFps(set_fps)
                monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
                monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)
                monoRight.setFps(set_fps)
                
                # stereoConfig
                stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY) 
                
                stereo.initialConfig.setMedianFilter(dai.MedianFilter.KERNEL_7x7)
                 
                stereoConfig = stereo.initialConfig.get()
                stereoConfig.postProcessing.speckleFilter.enable = True
                stereoConfig.postProcessing.speckleFilter.speckleRange = 50 
                stereoConfig.postProcessing.spatialFilter.enable = True
                stereoConfig.postProcessing.spatialFilter.holeFillingRadius = 2
                stereoConfig.postProcessing.spatialFilter.numIterations = 1
                stereoConfig.postProcessing.thresholdFilter.minRange = 150
                stereoConfig.postProcessing.thresholdFilter.maxRange = 1000
                stereoConfig.postProcessing.decimationFilter.decimationFactor = 2
                stereo.initialConfig.set(stereoConfig)
                
                stereo.setLeftRightCheck(True)
                stereo.setExtendedDisparity(True)
                
                # Align depth map to rgb
                stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
                stereo.setOutputSize(800, 599)
                 
                # Linking
                monoLeft.out.link(stereo.left)
                monoRight.out.link(stereo.right)
                 
                camRgb.isp.link(xoutRgb.input)
                stereo.depth.link(xoutDepth.input) 
                
                # logger
                sysLog = pipeline.create(dai.node.SystemLogger)
                linkOut = pipeline.create(dai.node.XLinkOut)
                linkOut.setStreamName("sysinfo")
                sysLog.setRate(0.5)   
                sysLog.out.link(linkOut.input)
                
                 # get the mouse coordinates (pixels) when hovering over the cv2 window
                point = (0, 0)
                def show_distance(event, x,y, args, params):
                    global point
                    point = (x,y)
                    # print(f"point: {point}")
                
                cv2.namedWindow("12mp")
                cv2.setMouseCallback("12mp", show_distance)
                 
                # %%
                # obj detection
                model = torch.hub.load('ultralytics/yolov5', "yolov5s") 
                 model.classes = [0] # detect the `person` class only (COCO class 0)
                
                # %%
                device.startPipeline(pipeline)
                device.setLogLevel(dai.LogLevel.INFO)
                device.setLogOutputLevel(dai.LogLevel.INFO) 
                
                calibData = device.readCalibration() 
                
                outputs = ['rgb', 'depth'] 
                queues  = [device.getOutputQueue(name, 4, False) for name in outputs]
                qSysInfo = device.getOutputQueue(name="sysinfo", maxSize=4, blocking=False)
                
                sync = HostSync()
                fps = FPSHandler()
                text = TextHelper()
                
                synced_msgs = None
                while True: 
                    sysInfo = qSysInfo.tryGet()
                    if sysInfo is not None:
                        printSystemInformation(sysInfo)
                
                    for q in queues:
                        if q.has():
                            synced_msgs = sync.add_msg(q.getName(), q.get())
                
                            if synced_msgs:
                                fps.nextIter()
                                print('FPS', fps.fps())
                
                                frame = synced_msgs["rgb"].getCvFrame()
                
                                if 'depth' in synced_msgs:
                                    depthFrame: dai.ImgFrame = synced_msgs["depth"].getFrame()
                                    fov = synced_msgs['depth'].getInstanceNum()
                
                                frame_resized = cv2.resize(frame,(800,599), interpolation=cv2.INTER_NEAREST)
                 
                                # object det yolov5 on host
                                result = model(frame[:,:,::-1],size=840)
                                result = result.pandas().xyxy[0]
                                
                                if len(result)>0:
                                    spatials, scaled_bb, centroid =  calc_spatials(frame, frame_resized, result, depthFrame, calibData, synced_msgs["depth"])
                                    text.rectangle(frame_resized, (scaled_bb["xmin"], scaled_bb["ymin"]), (scaled_bb["xmax"], scaled_bb["ymax"]))
                                    text.putText(frame_resized, "X: " + ("{:.1f}cm".format(spatials['x']/10) if not math.isnan(spatials['x']) else "--"), (centroid["x"] + 10, centroid["y"] + 20))
                                    text.putText(frame_resized, "Y: " + ("{:.1f}cm".format(spatials['y']/10) if not math.isnan(spatials['y']) else "--"), (centroid["x"] + 10, centroid["y"] + 35))
                                    text.putText(frame_resized, "Z: " + ("{:.1f}cm".format(spatials['z']/10) if not math.isnan(spatials['z']) else "--"), (centroid["x"] + 10, centroid["y"] + 50))
                
                                 distance = depthFrame[point[1]][point[0]]  # depth is in mm
                                 cv2.circle(frame_resized, point, 10, (0,0,255))
                                 cv2.putText(frame_resized, "{:.1f}cm".format(distance / 10), (point[0], point[1]), cv2.FONT_HERSHEY_PLAIN, 2, (255,255,255), 2)
                                cv2.imshow("12mp", frame_resized)
                
                    if cv2.waitKey(1) == ord('q'):
                        break
                
                device.close()

                  Hi erik, thanks for the feedback. Is the following alright?

                  import cv2
                  import depthai as dai
                  import numpy as np 
                  import matplotlib.pyplot as plt  
                  from depthai_sdk.fps import FPSHandler
                  import torch
                  
                  # %%
                  class HostSync:
                      # adapted from https://github.com/luxonis/depthai-experiments/blob/master/gen2-depth-driven-focus/main.py
                      
                      def __init__(self):
                          self.arrays = {}
                  
                      def add_msg(self, name, msg):
                          if not name in self.arrays:
                              self.arrays[name] = []
                          # Add msg to array
                          self.arrays[name].append({"msg": msg, "seq": msg.getSequenceNum()})
                  
                          synced = {}
                          for name, arr in self.arrays.items():
                              for i, obj in enumerate(arr):
                                  if msg.getSequenceNum() == obj["seq"]:
                                      synced[name] = obj["msg"] 
                  
                          if len(synced) == (2):  # Color, Depth
                              # Remove old msgs
                              for name, arr in self.arrays.items():
                                  for i, obj in enumerate(arr):
                                      if obj["seq"] < msg.getSequenceNum():
                                          arr.remove(obj)
                                      else:
                                          break
                              return synced
                          return False
                  
                  # %%
                  device = dai.Device("184430102145421300", maxUsbSpeed=dai.UsbSpeed.SUPER_PLUS)
                  
                  # %%
                  # Create pipeline
                  pipeline = dai.Pipeline()
                  pipeline.setXLinkChunkSize(0)
                  
                  # Define sources and outputs
                  camRgb = pipeline.create(dai.node.ColorCamera)
                  monoLeft = pipeline.create(dai.node.MonoCamera)
                  monoRight = pipeline.create(dai.node.MonoCamera)
                  stereo = pipeline.create(dai.node.StereoDepth)
                  
                  xoutRgb = pipeline.create(dai.node.XLinkOut)
                  xoutDepth = pipeline.create(dai.node.XLinkOut) 
                  
                  xoutRgb.setStreamName("rgb") 
                  xoutDepth.setStreamName("depth")
                  
                  # mono and rgb properties
                  set_fps = 30
                  camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
                  camRgb.setFps(set_fps)
                  # camRgb.setInterleaved(False) 
                  try:
                      calibData = device.readCalibration2()
                      lensPosition = calibData.getLensPosition(dai.CameraBoardSocket.RGB)
                      if lensPosition:
                          camRgb.initialControl.setManualFocus(lensPosition) 
                  except:
                      raise
                  
                  monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
                  monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
                  monoLeft.setFps(set_fps)
                  monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
                  monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)
                  monoRight.setFps(set_fps)
                  
                  # stereoConfig
                  stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY) 
                  
                  stereo.initialConfig.setMedianFilter(dai.MedianFilter.KERNEL_7x7)
                   
                  stereoConfig = stereo.initialConfig.get()
                  stereoConfig.postProcessing.speckleFilter.enable = True
                  stereoConfig.postProcessing.speckleFilter.speckleRange = 50 
                  stereoConfig.postProcessing.spatialFilter.enable = True
                  stereoConfig.postProcessing.spatialFilter.holeFillingRadius = 2
                  stereoConfig.postProcessing.spatialFilter.numIterations = 1
                  stereoConfig.postProcessing.thresholdFilter.minRange = 150
                  stereoConfig.postProcessing.thresholdFilter.maxRange = 1000
                  stereoConfig.postProcessing.decimationFilter.decimationFactor = 2
                  stereo.initialConfig.set(stereoConfig)
                  
                  stereo.setLeftRightCheck(True)
                  stereo.setExtendedDisparity(True)
                  
                  # Align depth map to rgb
                  stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
                  stereo.setOutputSize(800, 599)
                   
                  # Linking
                  monoLeft.out.link(stereo.left)
                  monoRight.out.link(stereo.right)
                   
                  camRgb.isp.link(xoutRgb.input)
                  stereo.depth.link(xoutDepth.input) 
                  
                  point = (0, 0)
                  def show_distance(event, x,y, args, params):
                      global point
                      point = (x,y)
                   
                  cv2.namedWindow("12mp")
                  cv2.setMouseCallback("12mp", show_distance)
                  
                  # %%
                  # obj detection
                  model = torch.hub.load('ultralytics/yolov5', "yolov5s") 
                   model.classes = [0] # detect the `person` class only (COCO class 0)
                  
                  # %%
                  device.startPipeline(pipeline) 
                  
                  calibData = device.readCalibration() 
                  
                  outputs = ['rgb', 'depth'] 
                  queues  = [device.getOutputQueue(name, 4, False) for name in outputs]
                  
                  sync = HostSync()
                  fps = FPSHandler() 
                  
                  synced_msgs = None
                  while True:  
                  
                      for q in queues:
                          if q.has():
                              synced_msgs = sync.add_msg(q.getName(), q.get())
                  
                              if synced_msgs:
                                  fps.nextIter()
                                  print('FPS', fps.fps())
                  
                                  frame = synced_msgs["rgb"].getCvFrame()
                  
                                  if 'depth' in synced_msgs:
                                      depthFrame: dai.ImgFrame = synced_msgs["depth"].getFrame()
                                      fov = synced_msgs['depth'].getInstanceNum()
                  
                                  frame_resized = cv2.resize(frame,(800,599), interpolation=cv2.INTER_NEAREST)
                   
                                  # object det yolov5 on host
                                  result = model(frame[:,:,::-1],size=840)
                                  result = result.pandas().xyxy[0]
                   
                                   distance = depthFrame[point[1]][point[0]]  # depth is in mm
                                   cv2.circle(frame_resized, point, 10, (0,0,255))
                                   cv2.putText(frame_resized, "{:.1f}cm".format(distance / 10), (point[0], point[1]), cv2.FONT_HERSHEY_PLAIN, 2, (255,255,255), 2)
                                  cv2.imshow("12mp", frame_resized)
                  
                      if cv2.waitKey(1) == ord('q'):
                          break
                  
                  device.close() 

                    Hi glitchyordis,
                    After removing a bunch of unneeded code, I get to about 17 FPS for both depth and RGB. A single 12MP frame (isp, meaning YUV420, 1.5 bytes/pixel) is 18MB (144Mbits), so at 17 FPS that would be 2.5Gbps. Depth isn't nearly as large.
                    Looking at the OAK bandwidth test where I got a max of 2273Mbps for the downlink, 2.5Gbps (+depth) is actually quite good.
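                    As a quick sanity check of the arithmetic (4056x3040 is the assumed 12MP ISP resolution):

                      w, h = 4056, 3040                        # 12MP ISP output (assumed)
                      frame_bytes = w * h * 1.5                # YUV420, 1.5 bytes/pixel -> ~18.5 MB
                      mbit_per_frame = frame_bytes * 8 / 1e6   # ~148 Mbit per frame
                      print(mbit_per_frame * 17 / 1000)        # ~2.5 Gbps at 17 FPS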

                    So overall, the bandwidth is the bottleneck here. I don't have 10Gbps support on my laptop, so I can't select SUPER_PLUS speeds. Tested code:

                    Thanks, Erik

                      7 days later

                      Hi glitchyordis,

                      1. They are relative to the RGB camera, as depth was aligned to the RGB camera.
                      2. Usually that's not needed - it was developed for wide-FOV cameras (OAK-D W*), where images are very distorted.

                      Thanks, Erik

                      7 months later

                      Hey erik, I have the following setup.

                      a) I am running object detection using YOLO and finding the depth:
                      stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
                      spatialDetectionNetwork = pipeline.createYoloSpatialDetectionNetwork()

                      In this setup the depth output seems to be w.r.t. the RGB camera.

                      b) I am performing ArUco detection without depth align and using
                      hostSpatials = HostSpatialsCalc(device)
                      In this the depth output is w.r.t. the right camera.

                      I want a consistent location w.r.t. the RGB camera, and
                      I should be able to change the camera reference whenever I want.

                      Can you tell me how to achieve this?

                        jakaskerl I am using depth align only, but I don't know how to verify that… and the location values are very high.

                        ManiRadhakrishnan I am performing the ArUco detection without depth align and using
                        hostSpatials = HostSpatialsCalc(device)
                        In this the depth output is w.r.t. the right camera.

                        If you align the depth to the RGB camera, then that depth map will be aligned to that camera. What do you mean the values are high?

                        ManiRadhakrishnan I should be able to change the camera reference whenever I want

                        You should be able to change the alignment by altering the stereo config during runtime (not completely sure).
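                        A sketch of what that could look like (not verified; assumes an XLinkIn node linked to stereo.inputConfig, and that your depthai version exposes algorithmControl.depthAlign on the raw config):

                          cfgQ = device.getInputQueue("stereo_cfg")  # XLinkIn stream linked to stereo.inputConfig

                          cfg = dai.StereoDepthConfig()
                          raw = cfg.get()  # RawStereoDepthConfig
                          # switch which view the depth map is aligned to, at runtime
                          raw.algorithmControl.depthAlign = dai.RawStereoDepthConfig.AlgorithmControl.DepthAlign.RECTIFIED_RIGHT
                          cfg.set(raw)
                          cfgQ.send(cfg)

                        Aligning to the RGB socket specifically may still need stereo.setDepthAlign(dai.CameraBoardSocket.RGB) at pipeline build time.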

                        Thanks,
                        Jaka

                        The depth values coming out are 1 metre, 2 metres. Before using the align, it was perfect with 2mm accuracy.