Hi @jakaskerl

I am currently using a custom Yolo model for the YoloSpatialDetectionNetwork node with the RGB resolution set to 12MP and mono resolution for both left and right cameras set to 400P.

I am encountering strange behavior where the frames are delayed relative to the detections. The camera does detect an object, but the detections are not aligned with the frames: a detection arrives in the detection queue before its corresponding frame arrives in the frame queue. My guess is that the frames are somehow getting delayed, which is why I am seeing this lag. The calculated NN FPS is around 2 FPS, which I know is slow, but we've had a similar FPS before and haven't faced such a lag.

I see similar behavior with 1080P RGB resolution as well.

Please find the example pics below for your reference. In one of them (frame number 37) you will see a bounding box on the right for a person. However, the frame that actually contains the person inside that bounding box (frame number 40) gets displayed/written to disk late and does not contain that detection.

Your insights in this regard would be highly helpful.

Thanks
Yishu

    Hi yishu_corpex
    You are probably sending a separate RGB stream to the host, which is not synced with the NN passthrough.

    I suggest you either use the Sync node and link the NN output and the RGB output to it, if you need full-resolution frames on the host side.
    The other way would be to send the NN passthrough to the host, for which you don't need any syncing, but the frames will be smaller (NN input size).
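
    Something along these lines (a minimal sketch showing only the linking; blob path, camera configuration, etc. are omitted, and the stream names are just illustrative):

    import depthai as dai
    from datetime import timedelta

    pipeline = dai.Pipeline()
    camRgb = pipeline.create(dai.node.ColorCamera)
    nn = pipeline.create(dai.node.YoloSpatialDetectionNetwork)

    # Option 1: Sync node - pairs full-resolution ISP frames with NN results by timestamp
    sync = pipeline.create(dai.node.Sync)
    sync.setSyncThreshold(timedelta(milliseconds=50))
    camRgb.isp.link(sync.inputs["rgb"])
    nn.out.link(sync.inputs["nn"])
    xoutSync = pipeline.create(dai.node.XLinkOut)
    xoutSync.setStreamName("sync")
    sync.out.link(xoutSync.input)  # host reads a dai.MessageGroup from the "sync" queue

    # Option 2: NN passthrough - already aligned with the detections, but only NN-input sized
    xoutPass = pipeline.create(dai.node.XLinkOut)
    xoutPass.setStreamName("passthrough")
    nn.passthrough.link(xoutPass.input)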

    Thanks,
    Jaka

    4 days later

    Hi @jakaskerl

    Yes, you are correct. I'm sending the ISP stream to the host, which doesn't seem to be synced with the NN passthrough.

    I did try using the Sync node and linking the NN and ISP outputs as you suggested. However, the Sync node output queue gets stuck right after the first frame. It freezes, which I guess is a known behavior when frames aren't disposed of, as per other posts like frame syncing hangs.

    I also experimented with setIspNumFramesPool(...) to increase the pool size, but saw the same behavior: the Sync node output queue freezes and the script is stuck/hung on it.

    Let me know how this can be resolved, and if you want I can share the MRE for this issue so you can test it at your end as well.

    Thanks
    Yishu

      Hi @yishu_corpex

      yishu_corpex Let me know how this can be resolved, and if you want I can share the MRE for this issue so you can test it at your end as well.

      Yes, this would be best so I can quickly just alter the code.

      Thanks,
      Jaka

      Hi @jakaskerl

      Please find the MRE below.

      So I am using our own custom-trained YOLOv6n model. I have shared the model blob and json file with you over email.

      You will require the following libraries to run the script:

      1. OpenCV Python (opencv-python)

      2. DepthAI

      3. blobconverter

      4. numpy

      5. datetime

      You will see a lot of print statements in the code; they are just there to help debug where exactly the script gets stuck. Also, set the model and json file paths as the default paths in the argument parser lines. The script will pick up those defaults when you run it from VS Code.

      import cv2
      import depthai as dai
      import numpy as np
      import argparse
      import json
      import time
      import sys
      from pathlib import Path
      import blobconverter
      from datetime import timedelta
      
      
      def getDisparityFrame(frame, cvColorMap):
          maxDisp = stereo.initialConfig.getMaxDisparity()
          disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
          disp = cv2.applyColorMap(disp, cvColorMap)
      
          return disp
      
      # parse arguments 
      parser = argparse.ArgumentParser()
      parser.add_argument("-m", "--model", help="Provide model name or model path for inference",
                          default='models/best_ckpt_openvino_2022.1_6shave.blob', type=str)
      parser.add_argument("-c", "--config", help="Provide config path for inference",
                          default='models/best_ckpt.json', type=str)
      args = parser.parse_args()
      
      
      # parse config
      configPath = Path(args.config)
      if not configPath.exists():
          raise ValueError("Path {} does not exist!".format(configPath))
      
      with configPath.open() as f:
          config = json.load(f)
      nnConfig = config.get("nn_config", {})
      
      # parse input shape
      if "input_size" in nnConfig:
          W, H = tuple(map(int, nnConfig.get("input_size").split('x')))
      
      # extract metadata
      metadata = nnConfig.get("NN_specific_metadata", {})
      classes = metadata.get("classes", {})
      coordinates = metadata.get("coordinates", {})
      anchors = metadata.get("anchors", {})
      anchorMasks = metadata.get("anchor_masks", {})
      iouThreshold = metadata.get("iou_threshold", {})
      confidenceThreshold = metadata.get("confidence_threshold", {})
      
      #print(metadata)
      
      # parse labels
      nnMappings = config.get("mappings", {})
      labels = nnMappings.get("labels", {})
      #print("Labels: ", labels)
      
      # get model path
      nnPath = args.model
      if not Path(nnPath).exists():
          print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
          nnPath = str(blobconverter.from_zoo(args.model, shaves = 6, zoo_type = "depthai", use_cache=True))
      
      # sync outputs
      syncNN = True
      
      # Depth and Disparity
      outRectified = False  # Output and display rectified streams
      lrcheck = True  # Better handling for occlusions
      extended = False  # Closer-in minimum depth, disparity range is doubled
      subpixel = True  # Better accuracy for longer distance, fractional disparity 32-levels
      depth = True  # Display depth frames
      swap_left_right = False #Swap left right frames
      alpha = None
      resolution = dai.MonoCameraProperties.SensorResolution.THE_400_P #THE_800_P
      
      median = dai.StereoDepthProperties.MedianFilter.KERNEL_7x7 # 7x7
      
      # Create pipeline
      pipeline = dai.Pipeline()
      
      # Define sources and outputs
      camRgb = pipeline.create(dai.node.ColorCamera)
      spatialdetectionNetwork = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
      #xoutRgb = pipeline.create(dai.node.XLinkOut)
      nnOut = pipeline.create(dai.node.XLinkOut)
      
      
      # By Yishu
      xoutISP = pipeline.create(dai.node.XLinkOut)
      manip = pipeline.create(dai.node.ImageManip)
      xoutManip = pipeline.create(dai.node.XLinkOut)
      
      # Depth and Disparity
      camLeft = pipeline.create(dai.node.MonoCamera)
      camRight = pipeline.create(dai.node.MonoCamera)
      stereo = pipeline.create(dai.node.StereoDepth)
      xoutLeft = pipeline.create(dai.node.XLinkOut)
      xoutRight = pipeline.create(dai.node.XLinkOut)
      xoutDisparity = pipeline.create(dai.node.XLinkOut)
      xoutDepth = pipeline.create(dai.node.XLinkOut)
      xoutRectifLeft = pipeline.create(dai.node.XLinkOut)
      xoutRectifRight = pipeline.create(dai.node.XLinkOut)
      
      # Sync node
      sync = pipeline.create(dai.node.Sync)
      xoutGrp = pipeline.create(dai.node.XLinkOut)
      xoutGrp.setStreamName("xout")
      
      #xoutRgb.setStreamName("rgb")
      xoutISP.setStreamName("ISP")
      nnOut.setStreamName("nn")
      xoutManip.setStreamName("Manip")
      
      xoutLeft.setStreamName("left")
      xoutRight.setStreamName("right")
      xoutDisparity.setStreamName("disparity")
      xoutDepth.setStreamName("depth")
      xoutRectifLeft.setStreamName("rectifiedLeft")
      xoutRectifRight.setStreamName("rectifiedRight")
      
      # Properties
      camRgb.setPreviewSize(W, H)
      
      camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP) #THE_1080_P
      camRgb.setInterleaved(False)
      camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR) #BGR color channels rather than RGB, experimented works best less false positives. In accorfdance with model training.
      camRgb.setFps(25) #40
      
      camRgb.setIspNumFramesPool(10)
      
      # By Yishu
      manip.initialConfig.setKeepAspectRatio(False) #True
      manip.initialConfig.setResize(W, H)
      
      # Change to RGB image than BGR - Yishu
      manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) #dai.ImgFrame.Type.RGB888p
      
      # setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
      manip.setMaxOutputFrameSize(1228800)
      
      # Sync node properties
      sync.setSyncThreshold(timedelta(milliseconds=50))
      
      # By Yishu
      nnOut.input.setBlocking(False)
      xoutISP.input.setBlocking(False)
      xoutManip.input.setBlocking(False)
      
      # By Yishu
      nnOut.input.setQueueSize(10)
      xoutISP.input.setQueueSize(10)
      xoutManip.input.setQueueSize(10)
      spatialdetectionNetwork.input.setQueueSize(10)
      
      # Network specific settings
      spatialdetectionNetwork.setConfidenceThreshold(confidenceThreshold)
      spatialdetectionNetwork.setNumClasses(classes)
      spatialdetectionNetwork.setCoordinateSize(coordinates)
      spatialdetectionNetwork.setAnchors(anchors)
      spatialdetectionNetwork.setAnchorMasks(anchorMasks)
      spatialdetectionNetwork.setIouThreshold(iouThreshold)
      spatialdetectionNetwork.setBlobPath(nnPath)
      spatialdetectionNetwork.setNumInferenceThreads(2)
      spatialdetectionNetwork.input.setBlocking(False)
      
      # SpatialYolo specific parameters
      spatialdetectionNetwork.setBoundingBoxScaleFactor(0.5)
      spatialdetectionNetwork.setDepthLowerThreshold(100)
      spatialdetectionNetwork.setDepthUpperThreshold(5000)
      
      # Depth and Disparity Properties
      if swap_left_right:
          camLeft.setCamera("right")
          camRight.setCamera("left")
      else:
          camLeft.setCamera("left")
          camRight.setCamera("right")
      
      for monoCam in (camLeft, camRight):  # Common config
          monoCam.setResolution(resolution)
          monoCam.setFps(2)
      
      stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
      stereo.initialConfig.setMedianFilter(median)  # KERNEL_7x7 default
      stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
      stereo.setLeftRightCheck(lrcheck)
      stereo.setExtendedDisparity(extended)
      stereo.setSubpixel(subpixel)
      if alpha is not None:
          stereo.setAlphaScaling(alpha)
          config = stereo.initialConfig.get()
          config.postProcessing.brightnessFilter.minBrightness = 0
          stereo.initialConfig.set(config)
      
      # Alignment with RGB - Ahmad
      stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
      
      # # 4056x3040
      stereo.setOutputSize(1248, 936)
      
      # Linking
      camRgb.isp.link(manip.inputImage)
      manip.out.link(spatialdetectionNetwork.input)
      
      # By Yishu
      manip.out.link(xoutManip.input)
      
      # Linking stereo depth as an input to the spatial detection network
      stereo.depth.link(spatialdetectionNetwork.inputDepth)
      
      spatialdetectionNetwork.out.link(nnOut.input)
      camRgb.isp.link(xoutISP.input)
      
      # Syncing NN and ISP
      spatialdetectionNetwork.out.link(sync.inputs["NN_Sync"])
      camRgb.isp.link(sync.inputs["ISP_Sync"])
      sync.out.link(xoutGrp.input)
      
      # Depth and Disparity Linking
      camLeft.out.link(stereo.left)
      camRight.out.link(stereo.right)
      stereo.syncedLeft.link(xoutLeft.input)
      stereo.syncedRight.link(xoutRight.input)
      stereo.disparity.link(xoutDisparity.input)
      if depth:
          stereo.depth.link(xoutDepth.input)
      if outRectified:
          stereo.rectifiedLeft.link(xoutRectifLeft.input)
          stereo.rectifiedRight.link(xoutRectifRight.input)
      
      
      streams = ["left", "right", "ISP","nn"] #Change by ahmad
      if outRectified:
          streams.extend(["rectifiedLeft", "rectifiedRight"])
      streams.append("disparity")
      if depth:
          streams.append("depth")
      
      cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
      cvColorMap[0] = [0, 0, 0]
      
      # TakeUp Camera Static IP address
      device_info = dai.DeviceInfo("192.168.220.10")
      
      try:
          # Connect to device and start pipeline
          with dai.Device(pipeline, device_info) as device:
              # Output queues will be used to get the rgb frames and nn data from the outputs defined above
              qDet = device.getOutputQueue("nn", 4, blocking=False)
              qISP = device.getOutputQueue("ISP", 4, blocking=False)
              qManip = device.getOutputQueue("Manip", 4, blocking=False)
      
              qDisparity = device.getOutputQueue("disparity", 1, blocking=False)
              qDepth = device.getOutputQueue("depth", 1, blocking=False)
              qLeft = device.getOutputQueue("left", 1, blocking=False)
              qRight = device.getOutputQueue("right", 1, blocking=False)
      
              qSync = device.getOutputQueue("xout", 4, False)
              msgGrp = None
      
              frame = None
              m_frame = None
      
              Disparityframe = None
              Depthframe = None
              LeftFrame = None
              RightFrame = None
      
              detections = []
              startTime = time.monotonic()
              counter = 0
              color2 = (255, 255, 255)
      
              # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
              def frameNorm(frame, bbox):
                  normVals = np.full(len(bbox), frame.shape[0])
                  normVals[::2] = frame.shape[1]
                  print(np.clip(np.array(bbox), 0, 1))
                  print(normVals)
                  print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
                  return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
      
              def displayFrame(name, frame, detections, i):
                  color_spool = (255, 0, 0)
                  color_person = (0, 255, 0)
                  color = ''
                  color_blank_img = (255, 255, 255)
                  blank_image = np.zeros((3040,4056,1), np.uint8)
                  text_output_path = "D:\\Cameras_Live\\TakeUp\\" + "Label{}".format(j) + ".txt"
                  #print("Detections: ", [[d.label, d.confidence *100, d.xmin, d.ymin, d.xmax, d.ymax] for d in detections])
                  #text_file = open(text_output_path, 'w')
                  for detection in detections:
                      bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                      if labels[detection.label] == 'person':
                          color = color_person
                      else:
                          color = color_spool
                      cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                      cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                      cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                      
                  
                  cv2.imwrite("E:\\Cameras_Live\\Test_writing_imgs\\Image{}.bmp".format(j), frame)
                  cv2.namedWindow(name, cv2.WINDOW_NORMAL)
                  cv2.imshow(name, frame)
                  print("show complete")
                  
      
              
              j=1
              ###Create the folder that will contain capturing
              folder_name = "D:\\Cameras_Live\\TakeUp"
              path = Path(folder_name)
      
              syncframe = None
              sync_detections = None
      
              while True:
      
                  ###Create the folder that will contain capturing
                  path.mkdir(parents=True, exist_ok=True)
      
                  # Delay to match DA reading fps
                  #time.sleep(4)
                  print("0")
                  msgGrp = qSync.get()
                  print("1")
                  # By Yishu
                  inISP = qISP.get()
                  print("2: ", inISP)
                  inManip = qManip.get()
                  print("3: ", inManip)
                  inDet = qDet.get()
                  print("4: ", inDet)
                  inLeft = qLeft.get()
                  print("5")
                  inRight = qRight.get()
                  print("6")
                  inDepth = qDepth.get()
                  print("7")
                  inDisparity = qDisparity.get()
                  print("8")
                  
      
                  if msgGrp is not None:
                      print('msgGrp: ', msgGrp)
                      for name, msg in msgGrp:
                          if name == "ISP_Sync":
                              syncframe = msg.getCvFrame()
                              print("syncframe: ")
                          if name == "NN_Sync":
                              sync_detections = msg.detections
                              print("sync_detections: ")
                          if syncframe is not None and sync_detections is not None:
                              print("Both sync not none")
                              displayFrame("sync_isp", syncframe, sync_detections, j)
                              print("display_complete")
                          
                  if inManip is not None:
                      m_frame = inManip.getCvFrame()
                  
                  print("9")
                  
                  if inLeft is not None:
                      LeftFrame = inLeft.getCvFrame()
                      #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Left{}.bmp".format(j),LeftFrame)
                  
                  print("10")
                  if inRight is not None:
                      RightFrame = inRight.getCvFrame()
                      #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Right{}.bmp".format(j),RightFrame)
      
                  print("11")
                  if inDepth is not None:
                      Depthframe = inDepth.getCvFrame()
                      Depthframe = Depthframe.astype(np.uint16)
                      #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Depth{}.png".format(j),Depthframe)
      
                  print("12")
                  if inDisparity is not None:
                      Disparityframe = inDisparity.getCvFrame()
                      Disparityframe = getDisparityFrame(Disparityframe, cvColorMap)
                      #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Disparity{}.bmp".format(j),Disparityframe)
                  
                  print("12")
                  if cv2.waitKey(1) == ord('q'):
                      break
      
                  j=j+1
                  '''if j==5:
                      j=1'''
              
      except Exception as e:
              print(f"An error occurred: {e}")
              sys.exit(0)

      Thanks
      Yishu

      Hi @yishu_corpex
      I removed the Sync node since it was causing issues; I didn't change anything else, but the RGB and NN seem to be in sync:

      
      import cv2
      import depthai as dai
      import numpy as np
      import argparse
      import json
      import time
      import sys
      from pathlib import Path
      import blobconverter
      from datetime import timedelta
      
      
      
      
      def getDisparityFrame(frame, cvColorMap):
          maxDisp = stereo.initialConfig.getMaxDisparity()
          disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
          disp = cv2.applyColorMap(disp, cvColorMap)
      
          return disp
      
      # parse arguments 
      parser = argparse.ArgumentParser()
      parser.add_argument("-m", "--model", help="Provide model name or model path for inference",
                          default='models/best_ckpt_openvino_2022.1_6shave.blob', type=str)
      parser.add_argument("-c", "--config", help="Provide config path for inference",
                          default='models/best_ckpt.json', type=str)
      args = parser.parse_args()
      
      
      # parse config
      configPath = Path(args.config)
      if not configPath.exists():
          raise ValueError("Path {} does not exist!".format(configPath))
      
      with configPath.open() as f:
          config = json.load(f)
      nnConfig = config.get("nn_config", {})
      
      # parse input shape
      if "input_size" in nnConfig:
          W, H = tuple(map(int, nnConfig.get("input_size").split('x')))
      
      # extract metadata
      metadata = nnConfig.get("NN_specific_metadata", {})
      classes = metadata.get("classes", {})
      coordinates = metadata.get("coordinates", {})
      anchors = metadata.get("anchors", {})
      anchorMasks = metadata.get("anchor_masks", {})
      iouThreshold = metadata.get("iou_threshold", {})
      confidenceThreshold = metadata.get("confidence_threshold", {})
      
      #print(metadata)
      
      # parse labels
      nnMappings = config.get("mappings", {})
      labels = nnMappings.get("labels", {})
      #print("Labels: ", labels)
      
      # get model path
      nnPath = args.model
      if not Path(nnPath).exists():
          print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
          nnPath = str(blobconverter.from_zoo(args.model, shaves = 6, zoo_type = "depthai", use_cache=True))
      
      # sync outputs
      syncNN = True
      
      # Depth and Disparity
      outRectified = False  # Output and display rectified streams
      lrcheck = True  # Better handling for occlusions
      extended = False  # Closer-in minimum depth, disparity range is doubled
      subpixel = True  # Better accuracy for longer distance, fractional disparity 32-levels
      depth = True  # Display depth frames
      swap_left_right = False #Swap left right frames
      alpha = None
      resolution = dai.MonoCameraProperties.SensorResolution.THE_400_P #THE_800_P
      
      median = dai.StereoDepthProperties.MedianFilter.KERNEL_7x7 # 7x7
      
      # Create pipeline
      pipeline = dai.Pipeline()
      
      # Define sources and outputs
      camRgb = pipeline.create(dai.node.ColorCamera)
      spatialdetectionNetwork = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
      #xoutRgb = pipeline.create(dai.node.XLinkOut)
      nnOut = pipeline.create(dai.node.XLinkOut)
      
      
      # By Yishu
      xoutISP = pipeline.create(dai.node.XLinkOut)
      manip = pipeline.create(dai.node.ImageManip)
      xoutManip = pipeline.create(dai.node.XLinkOut)
      
      # Depth and Disparity
      camLeft = pipeline.create(dai.node.MonoCamera)
      camRight = pipeline.create(dai.node.MonoCamera)
      stereo = pipeline.create(dai.node.StereoDepth)
      xoutLeft = pipeline.create(dai.node.XLinkOut)
      xoutRight = pipeline.create(dai.node.XLinkOut)
      xoutDepth = pipeline.create(dai.node.XLinkOut)
      xoutRectifLeft = pipeline.create(dai.node.XLinkOut)
      xoutRectifRight = pipeline.create(dai.node.XLinkOut)
      
      # Sync node
      
      
      #xoutRgb.setStreamName("rgb")
      xoutISP.setStreamName("ISP")
      nnOut.setStreamName("nn")
      xoutManip.setStreamName("Manip")
      
      xoutLeft.setStreamName("left")
      xoutRight.setStreamName("right")
      xoutDepth.setStreamName("depth")
      xoutRectifLeft.setStreamName("rectifiedLeft")
      xoutRectifRight.setStreamName("rectifiedRight")
      
      # Properties
      camRgb.setPreviewSize(W, H)
      
      camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P) #THE_1080_P
      camRgb.setInterleaved(False)
      camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR) #BGR color channels rather than RGB, experimented works best less false positives. In accorfdance with model training.
      camRgb.setFps(30) #40
      
      camRgb.setIspNumFramesPool(10)
      
      # By Yishu
      manip.initialConfig.setKeepAspectRatio(False) #True
      manip.initialConfig.setResize(W, H)
      
      # Change to RGB image than BGR - Yishu
      manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) #dai.ImgFrame.Type.RGB888p
      
      # setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
      manip.setMaxOutputFrameSize(1228800)
      
      
      # By Yishu
      nnOut.input.setBlocking(False)
      xoutISP.input.setBlocking(False)
      xoutManip.input.setBlocking(False)
      
      # By Yishu
      nnOut.input.setQueueSize(10)
      xoutISP.input.setQueueSize(10)
      xoutManip.input.setQueueSize(10)
      spatialdetectionNetwork.input.setQueueSize(10)
      
      # Network specific settings
      spatialdetectionNetwork.setConfidenceThreshold(confidenceThreshold)
      spatialdetectionNetwork.setNumClasses(classes)
      spatialdetectionNetwork.setCoordinateSize(coordinates)
      spatialdetectionNetwork.setAnchors(anchors)
      spatialdetectionNetwork.setAnchorMasks(anchorMasks)
      spatialdetectionNetwork.setIouThreshold(iouThreshold)
      spatialdetectionNetwork.setBlobPath(nnPath)
      spatialdetectionNetwork.setNumInferenceThreads(2)
      spatialdetectionNetwork.input.setBlocking(False)
      
      # SpatialYolo specific parameters
      spatialdetectionNetwork.setBoundingBoxScaleFactor(0.5)
      spatialdetectionNetwork.setDepthLowerThreshold(100)
      spatialdetectionNetwork.setDepthUpperThreshold(5000)
      
      # Depth and Disparity Properties
      if swap_left_right:
          camLeft.setCamera("right")
          camRight.setCamera("left")
      else:
          camLeft.setCamera("left")
          camRight.setCamera("right")
      
      for monoCam in (camLeft, camRight):  # Common config
          monoCam.setResolution(resolution)
          monoCam.setFps(2)
      
      stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
      stereo.initialConfig.setMedianFilter(median)  # KERNEL_7x7 default
      stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
      stereo.setLeftRightCheck(lrcheck)
      stereo.setExtendedDisparity(extended)
      stereo.setSubpixel(subpixel)
      if alpha is not None:
          stereo.setAlphaScaling(alpha)
          config = stereo.initialConfig.get()
          config.postProcessing.brightnessFilter.minBrightness = 0
          stereo.initialConfig.set(config)
      
      # Alignment with RGB - Ahmad
      stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
      
      # # 4056x3040
      stereo.setOutputSize(1248, 936)
      
      # Linking
      camRgb.video.link(manip.inputImage)
      manip.out.link(spatialdetectionNetwork.input)
      
      # By Yishu
      manip.out.link(xoutManip.input)
      
      # Linking stereo depth as an input to the spatial detection network
      stereo.depth.link(spatialdetectionNetwork.inputDepth)
      
      spatialdetectionNetwork.out.link(nnOut.input)
      camRgb.isp.link(xoutISP.input)
      
      # Depth and Disparity Linking
      camLeft.out.link(stereo.left)
      camRight.out.link(stereo.right)
      stereo.syncedLeft.link(xoutLeft.input)
      stereo.syncedRight.link(xoutRight.input)
      if depth:
          stereo.depth.link(xoutDepth.input)
      if outRectified:
          stereo.rectifiedLeft.link(xoutRectifLeft.input)
          stereo.rectifiedRight.link(xoutRectifRight.input)
      
      # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
      def frameNorm(frame, bbox):
          normVals = np.full(len(bbox), frame.shape[0])
          normVals[::2] = frame.shape[1]
          print(np.clip(np.array(bbox), 0, 1))
          print(normVals)
          print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
          return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
      
      def displayFrame(name, frame, detections, i):
          color_spool = (255, 0, 0)
          color_person = (0, 255, 0)
          color = ''
          
          #print("Detections: ", [[d.label, d.confidence *100, d.xmin, d.ymin, d.xmax, d.ymax] for d in detections])
          #text_file = open(text_output_path, 'w')
          for detection in detections:
              bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
              if labels[detection.label] == 'person':
                  color = color_person
              else:
                  color = color_spool
              cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
              cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
              cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
              
          
          cv2.imwrite("E:\\Cameras_Live\\Test_writing_imgs\\Image{}.bmp".format(j), frame)
          cv2.namedWindow(name, cv2.WINDOW_NORMAL)
          cv2.imshow(name, frame)
          print("show complete")
      
      
      streams = ["left", "right", "ISP","nn"] #Change by ahmad
      if outRectified:
          streams.extend(["rectifiedLeft", "rectifiedRight"])
      if depth:
          streams.append("depth")
      
      cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
      cvColorMap[0] = [0, 0, 0]
      
      # TakeUp Camera Static IP address
      device_info = dai.DeviceInfo()
      
      try:
          # Connect to device and start pipeline
          with dai.Device(pipeline, device_info) as device:
              # Output queues will be used to get the rgb frames and nn data from the outputs defined above
              qDet = device.getOutputQueue("nn", 4, blocking=False)
              qISP = device.getOutputQueue("ISP", 4, blocking=False)
              qManip = device.getOutputQueue("Manip", 4, blocking=False)
      
              qDepth = device.getOutputQueue("depth", 1, blocking=False)
              qLeft = device.getOutputQueue("left", 1, blocking=False)
              qRight = device.getOutputQueue("right", 1, blocking=False)
      
              frame = None
              m_frame = None
              Depthframe = None
              LeftFrame = None
              RightFrame = None
      
              detections = []
              startTime = time.monotonic()
              counter = 0
              color2 = (255, 255, 255)
      
              # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
              def frameNorm(frame, bbox):
                  normVals = np.full(len(bbox), frame.shape[0])
                  normVals[::2] = frame.shape[1]
                  print(np.clip(np.array(bbox), 0, 1))
                  print(normVals)
                  print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
                  return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
      
              def displayFrame(name, frame, detections, i):
                  color_spool = (255, 0, 0)
                  color_person = (0, 255, 0)
                  color = ''
                  color_blank_img = (255, 255, 255)
                  blank_image = np.zeros((3040,4056,1), np.uint8)
                  text_output_path = "D:\\Cameras_Live\\TakeUp\\" + "Label{}".format(j) + ".txt"
                  #print("Detections: ", [[d.label, d.confidence *100, d.xmin, d.ymin, d.xmax, d.ymax] for d in detections])
                  #text_file = open(text_output_path, 'w')
                  for detection in detections:
                      bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                      if labels[detection.label] == 'person':
                          color = color_person
                      else:
                          color = color_spool
                      cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                      cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                      cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                      
                  
                  cv2.imwrite("E:\\Cameras_Live\\Test_writing_imgs\\Image{}.bmp".format(j), frame)
                  cv2.namedWindow(name, cv2.WINDOW_NORMAL)
                  cv2.imshow(name, frame)
                  print("show complete")
                  
      
              
              j=1
              ###Create the folder that will contain capturing
              folder_name = "D:\\Cameras_Live\\TakeUp"
              path = Path(folder_name)
      
              syncframe = None
              sync_detections = None
      
              while True:
      
                  ###Create the folder that will contain capturing
                  
      
                  # Delay to match DA reading fps
                  #time.sleep(4)
                  import time
      
                  start_time = time.time()
      
                  # Print and reset timing
                  end_time = time.time()
                  print("1 - Time taken: {:.6f} seconds".format(end_time - start_time))
                  start_time = end_time
      
                  # By Yishu
                  inISP = qISP.get()
      
                  end_time = time.time()
                  print("2: ", inISP, " - Time taken: {:.6f} seconds".format(end_time - start_time))
                  start_time = end_time
      
                  inManip = qManip.get()
      
                  end_time = time.time()
                  print("3: ", inManip, " - Time taken: {:.6f} seconds".format(end_time - start_time))
                  start_time = end_time
      
                  inDet = qDet.get()
      
                  end_time = time.time()
                  print("4: ", inDet, " - Time taken: {:.6f} seconds".format(end_time - start_time))
                  start_time = end_time
      
                  inLeft = qLeft.get()
      
                  end_time = time.time()
                  print("5 - Time taken: {:.6f} seconds".format(end_time - start_time))
                  start_time = end_time
      
                  inRight = qRight.get()
      
                  end_time = time.time()
                  print("6 - Time taken: {:.6f} seconds".format(end_time - start_time))
                  start_time = end_time
      
                  inDepth = qDepth.get()
      
                  end_time = time.time()
                  print("7 - Time taken: {:.6f} seconds".format(end_time - start_time))
                  start_time = end_time
      
                  displayFrame("Color", inISP.getCvFrame(), inDet.detections, j)
      
          
      
                  end_time = time.time()
                  print("8 - Time taken: {:.6f} seconds".format(end_time - start_time))
      
                  
                  if inManip is not None:
                      m_frame = inManip.getCvFrame()
                  
                  print("9")
                  
                  if inLeft is not None:
                      LeftFrame = inLeft.getCvFrame()
                      #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Left{}.bmp".format(j),LeftFrame)
                  
                  print("10")
                  if inRight is not None:
                      RightFrame = inRight.getCvFrame()
                      #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Right{}.bmp".format(j),RightFrame)
      
                  print("11")
                  if inDepth is not None:
                      Depthframe = inDepth.getCvFrame()
                      Depthframe = Depthframe.astype(np.uint16)
                      #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Depth{}.png".format(j),Depthframe)
      
                  print("12")
                  if cv2.waitKey(1) == ord('q'):
                      break
      
                  j=j+1
                  '''if j==5:
                      j=1'''
              
      except Exception as e:
              print(f"An error occurred: {e}")
              sys.exit(0)

      Hi @jakaskerl

      I would say the RGB (ISP frame) and NN are almost in sync. I do see the output bounding boxes slightly out of sync in the case of moving objects or people, though I'm not sure why that is happening.

      Also, may I know what issues the Sync node causes, and why?

      Also, when I calculate the NN FPS using the following lines of code (similar to other DepthAI examples), I see it maxes out at 1.9 FPS.

      # startTime_nn is initialized before the "while True" loop; counter counts NN outputs
      
      if inDet is not None:
          counter += 1
      
      displayFrame("Color", inISP.getCvFrame(), inDet.detections, j)
      nn_fps = counter / (time.monotonic() - startTime_nn)
      print("nn_fps: ", nn_fps)

      I wanted to ask if you could look at the script and suggest any changes to improve on that, given that we are using the 12MP resolution. I know that when I reduce the resolution to 1080P, the NN FPS bumps up to around 8-10 FPS.

      Thanks
      Yishu

      4 days later

      Hi @yishu_corpex
      Seems like the XLink was stressed out, so I removed the unsynced outputs and also changed all inputs to non-blocking. I can get about 10 FPS overall.

      PRO-TIP: use the pipeline_graph tool; it comes in handy in case of pipeline failures.

      
      import cv2
      import depthai as dai
      import numpy as np
      import argparse
      import json
      import time
      import sys
      from pathlib import Path
      import blobconverter
      from datetime import timedelta
      
      
      def getDisparityFrame(frame, cvColorMap):
          maxDisp = stereo.initialConfig.getMaxDisparity()
          disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
          disp = cv2.applyColorMap(disp, cvColorMap)
      
          return disp
      
      # parse arguments 
      parser = argparse.ArgumentParser()
      parser.add_argument("-m", "--model", help="Provide model name or model path for inference",
                          default='models/best_ckpt_openvino_2022.1_6shave.blob', type=str)
      parser.add_argument("-c", "--config", help="Provide config path for inference",
                          default='models/best_ckpt.json', type=str)
      args = parser.parse_args()
      
      
      # parse config
      configPath = Path(args.config)
      if not configPath.exists():
          raise ValueError("Path {} does not exist!".format(configPath))
      
      with configPath.open() as f:
          config = json.load(f)
      nnConfig = config.get("nn_config", {})
      
      # parse input shape
      if "input_size" in nnConfig:
          W, H = tuple(map(int, nnConfig.get("input_size").split('x')))
      
      # extract metadata
      metadata = nnConfig.get("NN_specific_metadata", {})
      classes = metadata.get("classes", {})
      coordinates = metadata.get("coordinates", {})
      anchors = metadata.get("anchors", {})
      anchorMasks = metadata.get("anchor_masks", {})
      iouThreshold = metadata.get("iou_threshold", {})
      confidenceThreshold = metadata.get("confidence_threshold", {})
      
      #print(metadata)
      
      # parse labels
      nnMappings = config.get("mappings", {})
      labels = nnMappings.get("labels", {})
      #print("Labels: ", labels)
      
      # get model path
      nnPath = args.model
      if not Path(nnPath).exists():
          print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
          nnPath = str(blobconverter.from_zoo(args.model, shaves = 6, zoo_type = "depthai", use_cache=True))
      
      # sync outputs
      syncNN = True
      
      # Depth and Disparity
      outRectified = False  # Output and display rectified streams
      lrcheck = True  # Better handling for occlusions
      extended = False  # Closer-in minimum depth, disparity range is doubled
      subpixel = True  # Better accuracy for longer distance, fractional disparity 32-levels
      depth = True  # Display depth frames
      swap_left_right = False #Swap left right frames
      alpha = None
      resolution = dai.MonoCameraProperties.SensorResolution.THE_400_P #THE_800_P
      
      median = dai.StereoDepthProperties.MedianFilter.KERNEL_7x7 # 7x7
      
      # Create pipeline
      pipeline = dai.Pipeline()
      
      # Define sources and outputs
      camRgb = pipeline.create(dai.node.ColorCamera)
      spatialdetectionNetwork = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
      #xoutRgb = pipeline.create(dai.node.XLinkOut)
      nnOut = pipeline.create(dai.node.XLinkOut)
      
      
      
      manip = pipeline.create(dai.node.ImageManip)
      
      # Depth and Disparity
      camLeft = pipeline.create(dai.node.MonoCamera)
      camRight = pipeline.create(dai.node.MonoCamera)
      stereo = pipeline.create(dai.node.StereoDepth)
      
      
      
      # Sync node
      sync = pipeline.create(dai.node.Sync)
      xoutGrp = pipeline.create(dai.node.XLinkOut)
      xoutGrp.setStreamName("xout")
      
      #xoutRgb.setStreamName("rgb")
      
      nnOut.setStreamName("nn")
      
      # Properties
      camRgb.setPreviewSize(W, H)
      
      camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_4_K) #THE_1080_P
      camRgb.setInterleaved(False)
      camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR) #BGR color channels rather than RGB, experimented works best less false positives. In accorfdance with model training.
      camRgb.setFps(15) #40
      
      
      # By Yishu
      manip.initialConfig.setKeepAspectRatio(False) #True
      manip.initialConfig.setResize(W, H)
      
      # Change to RGB image than BGR - Yishu
      manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) #dai.ImgFrame.Type.RGB888p
      
      # setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
      manip.setMaxOutputFrameSize(1228800)
      
      # Sync node properties
      sync.setSyncThreshold(timedelta(milliseconds=100))
      
      # By Yishu
      nnOut.input.setBlocking(False)
      
      
      # Network specific settings
      spatialdetectionNetwork.setConfidenceThreshold(confidenceThreshold)
      spatialdetectionNetwork.setNumClasses(classes)
      spatialdetectionNetwork.setCoordinateSize(coordinates)
      spatialdetectionNetwork.setAnchors(anchors)
      spatialdetectionNetwork.setAnchorMasks(anchorMasks)
      spatialdetectionNetwork.setIouThreshold(iouThreshold)
      spatialdetectionNetwork.setBlobPath(nnPath)
      spatialdetectionNetwork.setNumInferenceThreads(2)
      spatialdetectionNetwork.input.setBlocking(False)
      
      # SpatialYolo specific parameters
      spatialdetectionNetwork.setBoundingBoxScaleFactor(0.5)
      spatialdetectionNetwork.setDepthLowerThreshold(100)
      spatialdetectionNetwork.setDepthUpperThreshold(5000)
      
      # Depth and Disparity Properties
      if swap_left_right:
          camLeft.setCamera("right")
          camRight.setCamera("left")
      else:
          camLeft.setCamera("left")
          camRight.setCamera("right")
      
      for monoCam in (camLeft, camRight):  # Common config
          monoCam.setResolution(resolution)
          monoCam.setFps(15)
      
      stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
      stereo.initialConfig.setMedianFilter(median)  # KERNEL_7x7 default
      stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
      stereo.setLeftRightCheck(lrcheck)
      stereo.setExtendedDisparity(extended)
      stereo.setSubpixel(subpixel)
      if alpha is not None:
          stereo.setAlphaScaling(alpha)
          config = stereo.initialConfig.get()
          config.postProcessing.brightnessFilter.minBrightness = 0
          stereo.initialConfig.set(config)
      
      # Alignment with RGB - Ahmad
      stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
      
      # # 4056x3040
      stereo.setOutputSize(1248, 936)
      
      # Linking
      camRgb.isp.link(manip.inputImage)
      manip.inputImage.setBlocking(False)
      manip.inputImage.setQueueSize(10)
      manip.out.link(spatialdetectionNetwork.input)
      
      # By Yishu
      
      # Linking stereo depth as an input to the spatial detection network
      stereo.depth.link(spatialdetectionNetwork.inputDepth)
      
      spatialdetectionNetwork.out.link(nnOut.input)
      
      # Syncing NN and ISP
      spatialdetectionNetwork.out.link(sync.inputs["NN_Sync"])
      camRgb.isp.link(sync.inputs["ISP_Sync"])
      
      sync.inputs["NN_Sync"].setBlocking(False)
      sync.inputs["ISP_Sync"].setBlocking(False)
      
      sync.out.link(xoutGrp.input)
      xoutGrp.input.setBlocking(False)
      
      # Depth and Disparity Linking
      camLeft.out.link(stereo.left)
      camRight.out.link(stereo.right)
      
      
      
      
      streams = ["left", "right", "ISP","nn"] #Change by ahmad
      if outRectified:
          streams.extend(["rectifiedLeft", "rectifiedRight"])
      streams.append("disparity")
      if depth:
          streams.append("depth")
      
      cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
      cvColorMap[0] = [0, 0, 0]
      
      # TakeUp Camera Static IP address
      device_info = dai.DeviceInfo("")
      
      try:
          # Connect to device and start pipeline
          with dai.Device(pipeline, device_info) as device:
              # Output queues will be used to get the rgb frames and nn data from the outputs defined above
              qDet = device.getOutputQueue("nn", 4, blocking=False)
      
              qSync = device.getOutputQueue("xout", 4, False)
              msgGrp = None
      
              frame = None
              m_frame = None
      
              Disparityframe = None
              Depthframe = None
              LeftFrame = None
              RightFrame = None
      
              detections = []
              startTime = time.monotonic()
              counter = 0
              color2 = (255, 255, 255)
      
              # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
              def frameNorm(frame, bbox):
                  normVals = np.full(len(bbox), frame.shape[0])
                  normVals[::2] = frame.shape[1]
                  print(np.clip(np.array(bbox), 0, 1))
                  print(normVals)
                  print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
                  return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
      
              def displayFrame(name, frame, detections, i):
                  color_spool = (255, 0, 0)
                  color_person = (0, 255, 0)
                  color = ''
                  
                  for detection in detections:
                      bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                      if labels[detection.label] == 'person':
                          color = color_person
                      else:
                          color = color_spool
                      cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                      cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                      cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                      
                  cv2.namedWindow(name, cv2.WINDOW_NORMAL)
                  cv2.imshow(name, frame)
                  print("show complete")
                  
      
              
              j=1
              ###Create the folder that will contain capturing
      
              syncframe = None
              sync_detections = None
      
              while True:
                  # Delay to match DA reading fps
                  #time.sleep(4)
                  print("0")
                  msgGrp = qSync.get()
                  print("1")
                  inDet = qDet.get()
                  print("4: ", inDet)
                  
      
                  if msgGrp is not None:
                      print('msgGrp: ', msgGrp)
                      for name, msg in msgGrp:
                          if name == "ISP_Sync":
                              syncframe = msg.getCvFrame()
                              print("syncframe: ")
                          if name == "NN_Sync":
                              sync_detections = msg.detections
                              print("sync_detections: ")
                          if syncframe is not None and sync_detections is not None:
                              print("Both sync not none")
                              displayFrame("sync_isp", syncframe, sync_detections, j)
                              print("display_complete")
                          
                  
                  print("9")
                  print("12")
                  if cv2.waitKey(1) == ord('q'):
                      break
      
                  j=j+1
                  '''if j==5:
                      j=1'''
              
      except Exception as e:
              print(f"An error occurred: {e}")
              sys.exit(0)

      Thanks,
      Jaka

      Hi @jakaskerl

      Thanks for your response. I did use the pipeline graph earlier; however, I faced this pipeline_graph issue and wasn't sure whether it had been resolved yet.

      I did try your updated script. It seems to work well at 4K resolution; however, it seems to get stuck after a few frames at the 12MP resolution we have been using lately. Is that because the 12MP resolution uses a lot of resources on board?

      Thanks
      Yishu

      Hi @yishu_corpex
      I tried with 12MP and the pipeline doesn't crash. Are you using a PoE device by any chance (or a low-bandwidth USB connection)? You can see in the pipeline graph how the FPS slowly starts to decline because the XLink nodes can't send frames fast enough.

      Thanks,
      Jaka

      Hi @jakaskerl

      I tried with a 3D USB camera and the pipeline does not crash.

      However, for our deployment we will be using a 3D PoE device, and when I test on the PoE device the pipeline crashes. I can see the FPS slowly declining and eventually dropping to 0 across the nodes, as shown below.

      So does this mean that, because we are using a PoE device, the XLink nodes can't send frames fast enough due to the PoE bandwidth bottleneck, and that is why the pipeline eventually crashes?

      Also, I see the FPS eventually dropping to 0 across all the nodes. Why does that happen?

      Thanks
      Yishu

        yishu_corpex So does this mean that, because we are using a PoE device, the XLink nodes can't send frames fast enough due to the PoE bandwidth bottleneck, and that is why the pipeline eventually crashes?

        Yes, you are sending 12MP video to the host, which is too much for that 1 Gbps bandwidth to handle. Any chance you can lower the resolution? If FOV is a concern, there might be some workarounds.
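
        For a rough sense of the numbers, and one possible FOV-preserving workaround (a sketch, assuming the 12MP IMX378 sensor and NV12 ISP output; setIspScale downscales on-device without cropping):

        import depthai as dai

        pipeline = dai.Pipeline()
        camRgb = pipeline.create(dai.node.ColorCamera)
        camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)

        # Full-resolution ISP stream in NV12 is ~1.5 bytes/pixel:
        #   4056 x 3040 x 1.5 B ~= 18.5 MB/frame, i.e. ~462 MB/s (~3.7 Gbit/s) at 25 FPS,
        #   well above what a 1 Gbps PoE link can carry.
        # Halving the ISP output keeps the full FOV at 2028x1520 (~4.6 MB/frame):
        camRgb.setIspScale(1, 2)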

        yishu_corpex Also, I see the FPS eventually dropping to 0 across all the nodes. Why does that happen?

        The queues slowly fill up because they are not read quickly enough, so the entire pipeline back-pressures and blocks.
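
        If you want the pipeline to keep running (dropping stale frames) instead of stalling when the host or link falls behind, you can make the device-side inputs and host-side queues non-blocking; a minimal sketch (the "isp" stream name is just illustrative, and note this only avoids the stall, it does not add bandwidth):

        import depthai as dai

        pipeline = dai.Pipeline()
        camRgb = pipeline.create(dai.node.ColorCamera)
        xoutIsp = pipeline.create(dai.node.XLinkOut)
        xoutIsp.setStreamName("isp")

        # Drop old frames on the device instead of back-pressuring upstream nodes
        xoutIsp.input.setBlocking(False)
        xoutIsp.input.setQueueSize(2)
        camRgb.isp.link(xoutIsp.input)

        with dai.Device(pipeline) as device:
            # Non-blocking host queue: keeps only the newest frames, never stalls the device
            qIsp = device.getOutputQueue("isp", maxSize=4, blocking=False)
            frame = qIsp.tryGet()  # returns None if nothing is available yet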

        Thanks,
        Jaka