Hi lincolnxlw
This is very strange, could you send a minimal reproducible example (MRE) so I can try to debug it? Are you using any multiprocessing/threading that could cause a race condition or something similar? Perhaps there is a bug somewhere on our side.

Thanks,
Jaka

    Hi jakaskerl

    The link should contain the MRE for this issue.

    Hardware

    • OAK FFC-3P with 2 IMX477 modules running in 1080P in CAM B and CAM C sockets

    The MRE is a minimal ROS package that depends on depthai-ros (Noetic).

    root@83d31607bd1d:/workspaces/depthai_catkin_ws/src/minimum_image_publisher# tree . -h

    .

    ├── [1.2K] CMakeLists.txt

    ├── [4.0K] config

    │ └── [1.4K] yolov6n.yaml

    ├── [4.0K] include

    │ └── [4.0K] minimum_image_publisher

    ├── [4.0K] models

    │ └── [8.9M] yolov6n_openvino_2022.1_6shave.blob

    ├── [2.0K] package.xml

    └── [4.0K] src

    └── [ 11K] image_publisher.cpp

    5 directories, 5 files

    There are also instructions inside the link for how to reproduce the problem.

    Attached below are two screenshots: one where the pipeline is running correctly, and one where the left camera's path is totally blocked.

    Let me know what you think.

    Thanks

    Lincoln

      @jakaskerl

      Just to follow up: do you have any questions about the MRE I provided?

      Also, I created the exact same pipeline in Python, running with the depthai-library Docker image, and it ran 10 times without any problem. The Python script is attached below.

      #!/usr/bin/env python3
      
      from pathlib import Path
      import cv2
      import depthai as dai
      import numpy as np
      import argparse
      import time
      import errno
      import os
      import sys
      import json
      
      parser = argparse.ArgumentParser()
      parser.add_argument('-nn', '--nn_model', help='select model path for inference',
                          default='/workspaces/yolov6n_openvino_2022.1_6shave.blob', type=str)
      parser.add_argument('-c', '--config', help='Provide config path for inference',
                          default='/workspaces/yolov6n.json', type=str)
      parser.add_argument('-fps', '--fps', type=float, help='Frame rate of camera capturing', default=15)
      
      args = parser.parse_args()
      nn_path = Path(args.nn_model)
      config_path = Path(args.config)
      config_fps = args.fps
      
      if not nn_path.is_file():
          sys.exit('NN not found!')
      if not config_path.is_file():
          sys.exit('Config not found!')
      
      with config_path.open() as f:
          config = json.load(f)
      
      nn_config = config.get("nn_config", {})
      if "input_size" in nn_config:
          nn_width, nn_height = tuple(map(int, nn_config.get("input_size").split('x')))
      metadata = nn_config.get("NN_specific_metadata", {})
      classes = metadata.get("classes", {})
      coordinates = metadata.get("coordinates", {})
      anchors = metadata.get("anchors", {})
      anchor_masks = metadata.get("anchor_masks", {})
      iou_threshold = metadata.get("iou_threshold", {})
      confidence_threshold = metadata.get("confidence_threshold", {})
      
      print(metadata)
      nn_mappings = config.get("mappings", {})
      labels = nn_mappings.get("labels", {})
      
      # pipeline
      pipeline = dai.Pipeline()
      
      camera_l = pipeline.create(dai.node.ColorCamera)
      detection_nn_l = pipeline.create(dai.node.YoloDetectionNetwork)
      xout_rgb_l = pipeline.create(dai.node.XLinkOut)
      xout_nn_l = pipeline.create(dai.node.XLinkOut)
      
      camera_r = pipeline.create(dai.node.ColorCamera)
      detection_nn_r = pipeline.create(dai.node.YoloDetectionNetwork)
      xout_rgb_r = pipeline.create(dai.node.XLinkOut)
      xout_nn_r = pipeline.create(dai.node.XLinkOut)
      
      xout_rgb_l.setStreamName("rgb_l")
      xout_nn_l.setStreamName("nn_l")
      xout_rgb_r.setStreamName("rgb_r")
      xout_nn_r.setStreamName("nn_r")
      
      camera_l.setPreviewSize(nn_width, nn_height)
      camera_l.setBoardSocket(dai.CameraBoardSocket.CAM_B)
      camera_l.setPreviewKeepAspectRatio(False)
      camera_l.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
      camera_l.setInterleaved(False)
      camera_l.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
      camera_l.setFps(config_fps)
      
      camera_r.setPreviewSize(nn_width, nn_height)
      camera_r.setBoardSocket(dai.CameraBoardSocket.CAM_C)
      camera_r.setPreviewKeepAspectRatio(False)
      camera_r.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
      camera_r.setInterleaved(False)
      camera_r.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
      camera_r.setFps(config_fps)
      
      detection_nn_l.setConfidenceThreshold(confidence_threshold)
      detection_nn_l.setNumClasses(classes)
      detection_nn_l.setCoordinateSize(coordinates)
      detection_nn_l.setAnchors(anchors)
      detection_nn_l.setAnchorMasks(anchor_masks)
      detection_nn_l.setIouThreshold(iou_threshold)
      detection_nn_l.setBlobPath(str(nn_path))
      detection_nn_l.setNumInferenceThreads(1)
      detection_nn_l.setNumNCEPerInferenceThread(1)
      detection_nn_l.input.setBlocking(False)
      
      detection_nn_r.setConfidenceThreshold(confidence_threshold)
      detection_nn_r.setNumClasses(classes)
      detection_nn_r.setCoordinateSize(coordinates)
      detection_nn_r.setAnchors(anchors)
      detection_nn_r.setAnchorMasks(anchor_masks)
      detection_nn_r.setIouThreshold(iou_threshold)
      detection_nn_r.setBlobPath(str(nn_path))
      detection_nn_r.setNumInferenceThreads(1)
      detection_nn_r.setNumNCEPerInferenceThread(1)
      detection_nn_r.input.setBlocking(False)
      
      # linking
      camera_l.preview.link(xout_rgb_l.input)
      camera_l.preview.link(detection_nn_l.input)
      detection_nn_l.out.link(xout_nn_l.input)
      
      camera_r.preview.link(xout_rgb_r.input)
      camera_r.preview.link(detection_nn_r.input)
      detection_nn_r.out.link(xout_nn_r.input)
      
      with dai.Device(pipeline) as device:
          print('Device name:', device.getDeviceName())
          if device.getBootloaderVersion() is not None:
              print('Bootloader version:', device.getBootloaderVersion())
          print('Usb speed:', device.getUsbSpeed().name)
          print('Connected cameras:', device.getConnectedCameraFeatures())
      
          device_info = device.getDeviceInfo()
          print('Device mixid:', device_info.getMxId())
      
          q_rgb_l = device.getOutputQueue(name="rgb_l", maxSize=4, blocking=False)
          q_nn_l = device.getOutputQueue(name="nn_l", maxSize=4, blocking=False)
          q_rgb_r = device.getOutputQueue(name="rgb_r", maxSize=4, blocking=False)
          q_nn_r = device.getOutputQueue(name="nn_r", maxSize=4, blocking=False)
      
          start_time = time.monotonic()
      
          counter_l = 0
          detections_l = []
          fps_l = 0
          frame_l = None
      
          counter_r = 0
          detections_r = []
          fps_r = 0
          frame_r = None
      
          text_color = (0, 0, 255)
      
          def to_planar(arr: np.ndarray, shape: tuple) -> np.ndarray:
              return cv2.resize(arr, shape).transpose(2, 0, 1).flatten()
      
          def frameNorm(frame, bbox):
              normVals = np.full(len(bbox), frame.shape[0])
              normVals[::2] = frame.shape[1]
              return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
      
          def displayFrame(name, frame, detections):
              color = (0, 0, 255)
              for detection in detections:
                  bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                  cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                  cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                  cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
              cv2.imshow(name, frame)
      
          while True:
              in_nn_l = q_nn_l.tryGet()
              in_rgb_l = q_rgb_l.tryGet()
              in_nn_r = q_nn_r.tryGet()
              in_rgb_r = q_rgb_r.tryGet()
      
              if in_rgb_l is not None:
                  frame_l = in_rgb_l.getCvFrame()
                  fps_l = counter_l / (time.monotonic() - start_time)
                  cv2.putText(frame_l, "NN fps: {:.2f}".format(fps_l),
                              (2, frame_l.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.5, text_color)
              if in_rgb_r is not None:
                  frame_r = in_rgb_r.getCvFrame()
                  fps_r = counter_r / (time.monotonic() - start_time)
                  cv2.putText(frame_r, "NN fps: {:.2f}".format(fps_r),
                              (2, frame_r.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.5, text_color)
      
              if in_nn_l is not None:
                  detections_l = in_nn_l.detections
                  counter_l += 1
      
              if in_nn_r is not None:
                  detections_r = in_nn_r.detections
                  counter_r += 1
      
              if frame_l is not None:
                  displayFrame("left_camera", frame_l, detections_l)
      
              if frame_r is not None:
                  displayFrame("right_camera", frame_r, detections_r)
      
              if cv2.waitKey(1) == ord('q'):
                  break
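
      For anyone trying to run the script without the original config file, here is a sketch of the JSON structure it expects, inferred purely from the fields the script reads; every value below is an illustrative placeholder, not the real yolov6n config.

```python
# Hypothetical example of the structure yolov6n.json is expected to have,
# inferred from the fields the script above reads. All values here are
# illustrative placeholders.
example_config = {
    "nn_config": {
        "input_size": "416x416",
        "NN_specific_metadata": {
            "classes": 80,
            "coordinates": 4,
            "anchors": [10.0, 13.0, 16.0, 30.0],
            "anchor_masks": {"side52": [0, 1]},
            "iou_threshold": 0.5,
            "confidence_threshold": 0.5,
        },
    },
    "mappings": {"labels": ["person", "bicycle", "car"]},
}

# The same parsing logic the script uses:
nn_config = example_config.get("nn_config", {})
nn_width, nn_height = tuple(map(int, nn_config["input_size"].split("x")))
metadata = nn_config.get("NN_specific_metadata", {})
labels = example_config.get("mappings", {}).get("labels", [])

print(nn_width, nn_height, metadata["classes"], labels[0])  # 416 416 80 person
```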

      The pipeline graph for the Python version is the same.

      This link contains the code and model to run this Python MRE.

      Thanks

      Lincoln

        Hi @lincolnxlw ,
        Thank you for the MRE. Just to confirm, are you using the latest depthai version (2.24)?

          Hi erik

          For the Python example, I am using this Docker image that was released yesterday: https://hub.docker.com/layers/luxonis/depthai-library/latest/images/sha256-be59799f694ef7cc5c60519bd462ac2daa6bef27bd2462e60ed37791121d3c29?context=explore

          It looks like it is using '2.23.0.0.dev0+dev'.

          For the ROS example, I rebuilt it with this Dockerfile yesterday and am still having the issue.

          Let me know what you think

          Lincoln

            Hi @lincolnxlw
            I tried to reproduce with a normal OAK-D (so 2x OV9282), but it worked as expected every time (20 out of 20 runs) on the latest depthai. @jakaskerl , could you perhaps try to repro with an FFC-3P and 2x IMX477 camera modules? I'd be surprised if that made a difference though. Perhaps the camera that fails doesn't have a perfect connection, so it sometimes doesn't get enumerated and doesn't stream?

            Hi lincolnxlw
            Tried with an FFC-3P (r3) and two IMX477 connected to cam_L and cam_R.

            lincolnxlw This link contains the code and model to run this Python MRE

            This link/code works without issues on my machine. Versions: depthai==2.24.0.0. Tried running 20 times, no errors/crashes.

            Couldn't set up the ROS version due to USB passthrough issues on M1 Macs.

            @Luxonis-Adam Can you check with ROS Noetic please, so we can pinpoint the source of the issue?
            link:

            lincolnxlw The link should contain the MRE for this issue.

            Thanks,
            Jaka

            Hi @jakaskerl and @erik

            Just want to make sure we are on the same page. I don't have any issue running the Python MRE, as I mentioned earlier:

            Also, I created the exact same pipeline in Python, running with the depthai-library Docker image, and it ran 10 times without any problem. The Python script is attached below.

            The one with the blocking issue is the ROS C++ MRE. It is weird because the pipeline is pretty much the same.

            Thanks

            Lincoln

            Hi, regarding ROS Noetic on Mac, I don't think it is officially supported, but you could use Docker images to get the latest version running. In general, if it's possible to connect to the camera with the basic DepthAI libraries, then it should also work with ROS. Could you share your ROS setup?

              Hi Luxonis-Adam

              As I mentioned in the MRE link, ROS is set up using a depthai-ros Docker image built from the latest Dockerfile provided in the depthai-ros repo. Is there anything more specific you want me to provide?

              Thanks

              Lincoln

              18 days later

              Hi Luxonis-Adam

              Just want to follow up on this. Were you able to reproduce the pipeline blocking issue with the provided MRE?

              Thanks

              Lincoln

              Hi @Luxonis-Adam, @jakaskerl, @erik,

              I added another file image_publisher_ros.cpp in the MRE folder for comparison.

              root@0b6870f3b150:/workspaces/depthai_catkin_ws/src/station_depthai/minimum_image_publisher# tree .
              .
              ├── CMakeLists.txt
              ├── README.md
              ├── config
              │   └── yolov6n.yaml
              ├── include
              │   └── minimum_image_publisher
              ├── models
              │   └── yolov6n_openvino_2022.1_6shave.blob
              ├── package.xml
              └── src
                  ├── image_publisher.cpp
                  └── image_publisher_ros.cpp

              image_publisher_ros.cpp doesn't try to get the images from the queues directly; instead it uses dai::rosBridge::BridgePublisher to publish the image data. The blocking issue is much worse than with image_publisher.cpp: almost half of the runs fail. I attached here a screen recording of 10 of my runs.

              The depthai library I am using is 2.24.0, which is what the latest depthai-ros Noetic release is using. Let me know what else I can provide to help figure out the issue.

              Library information - version: 2.24.0, commit: 6628488ef8956f73f1c7bf4c8f1da218ad327a6f from 2023-12-13 14:45:09 +0100, build: 2023-12-13 23:26:25 +0000, libusb enabled: true

              Thanks!

              Lincoln

              Hi @lincolnxlw, so far I've been unable to replicate the issue. Could you try setting export DEPTHAI_DEBUG=1 before running the example? Also, does this also happen if you run only one detection network, or none?

                Hi Luxonis-Adam

                Thanks for responding. I set DEPTHAI_DEBUG=1 and here are my test videos.

                With image_publisher_ros, it failed 2 times out of 5 runs. Here is the video.

                I created a node with only one color camera and a detection network. It worked in 10 out of 10 runs. Here is the video.

                Also, when I run the code without pipeline_graph consuming the terminal output, I notice some red debug messages. But those messages show up regardless of success or failure, and regardless of whether there are two detection networks or just one.

                == FSYNC enabled for cam mask 0x0
                CAM ID: 1, width: 1920, height: 1080, orientation: 0
                CAM ID: 2, width: 1920, height: 1080, orientation: 0
                == SW-SYNC: 0, cam mask 0x6
                !!! Master Slave config is: single_master_slave !!!
                Starting camera 1
                [E] app_guzzi_command_callback():173: command->id:1
                [E] app_guzzi_command_callback():193: command "1 1" sent
                
                [18443010D1F3F40800] [3.1] [2.609] [system] [warning] PRINT:LeonCss: [E] iq_debug_create():161: iq_debug address 0x88837680
                [E] hai_cm_driver_load_dtp():852: Features for camera IMX214R0 (imx214) are received
                [E] set_dtp_ids():396: //VIV HAL: Undefined VCM DTP ID 0
                [E] set_dtp_ids():405: //VIV HAL: Undefined NVM DTP ID 0
                [E] set_dtp_ids():414: //VIV HAL: Undefined lights DTP ID 0
                [18443010D1F3F40800] [3.1] [2.619] [system] [warning] PRINT:LeonCss: [E] camera_control_start():347: Camera_id = 1 started.
                
                [E] hai_cm_sensor_select_mode():164: No suitable sensor mode. Selecting default one - 0 for start 1920x1080 at 0x0 fps min 0.000000 max 30.000000
                [E] hai_cm_sensor_select_mode():164: No suitable sensor mode. Selecting default one - 0 f
                [18443010D1F3F40800] [3.1] [2.581] [DetectionNetwork(2)] [info] Inference thread count: 1, number of shaves allocated per thread: 6, number of Neural Compute Engines (NCE) allocated per thread: 1
                [18443010D1F3F40800] [3.1] [2.630] [system] [warning] PRINT:LeonCss: or start 1920x1080 at 0x0 fps min 0.000000 max 30.000000
                [18443010D1F3F40800] [3.1] [2.653] [system] [warning] PRINT:LeonCss: [E] vpipe_conv_config():1465: Exit Ok
                [E] callback():123: Camera CB START_DONE event.
                Starting camera 2
                [E] app_guzzi_command_callback():173: command->id:1
                [E] app_guzzi_command_callback():193: command "1 2" sent
                
                [18443010D1F3F40800] [3.1] [2.675] [system] [warning] PRINT:LeonCss: [E] iq_debug_create():161: iq_debug address 0x88375cc0
                [18443010D1F3F40800] [3.1] [2.686] [system] [warning] PRINT:LeonCss: [E] hai_cm_driver_load_dtp():852: Features for camera IMX214R0 (imx214) are received
                [E] set_dtp_ids():396: //VIV HAL: Undefined VCM DTP ID 0
                [E] set_dtp_ids():405: //VIV HAL: Undefined NVM DTP ID 0
                [E] set_dtp_ids():414: //VIV HAL: Undefined lights DTP ID 0
                [E] camera_control_start():347: Camera_id = 2 started.
                
                [18443010D1F3F40800] [3.1] [2.696] [system] [warning] PRINT:LeonCss: [E] hai_cm_sensor_select_mode():164: No suitable sensor mode. Selecting default one - 0 for start 1920x1080 at 0x0 fps min 0.000000 max 30.000000
                [E] hai_cm_sensor_select_mode():164: No suitable sensor mode. Selecting default one - 0 for start 1920x1080 at 0x0 fps min 0.000000 max 30.000000
                inc_camera_process set exposure and gain
                [E] vpipe_conv_config():1465: Exit Ok
                [18443010D1F3F40800] [3.1] [2.707] [system] [warning] PRINT:LeonCss: [E] guzzi_event_send():324: Send: Event ID=20003 no registered recipient
                [E] guzzi_event_send():324: Send: Event ID=20004 no registered recipient
                [E] guzzi_event_send():324: Send: Event ID=20005 no registered recipient
                osDrvImx214Control:514: Start stream
                [E] callback():123: Camera CB START_DONE event.
                inc_camera_process set exposure and gain
                osDrvImx214Control:514: Start stream
                [18443010D1F3F40800] [3.1] [2.718] [system] [warning] PRINT:LeonCss: AF_TRIGGER on camera 1
                [E] app_guzzi_command_callback():173: command->id:5
                [E] camera_control_focus_trigger():591: Focus trigger succeeded camera_id = 1.
                
                [E] app_guzzi_command_callback():218: command "5 1" sent
                
                AF_TRIGGER on camera 2
                [E] app_guzzi_command_callback():173: command->id:5
                [E] camera_control_focus_trigger():591: Focus trigger succeeded camera_id = 2.
                
                [E] app_guzzi_command_callback():218: command "5 2" sent
                
                Starting Guzzi command handling loop...

                Let me know what you think

                Thanks

                Lincoln

                5 days later

                Hi @lincolnxlw, thanks for the information. To narrow it down further, could you try replicating this setup without ROS, using the bare depthai-core library?

                  Hi Luxonis-Adam

                  I switched both cameras from IMX477 to IMX214 and ran the same MRE extensively, and I didn't see any blocking issue. I have 10 of the IMX477 modules, and no matter how I pair them up (switching cables as well), I can always reproduce the issue. So I don't think it is an individual camera problem. What do you think? Is this something on the firmware side or the hardware side? And if it is on the hardware side, is it from Luxonis or ArduCam?

                  @erik, @jakaskerl feel free to let me know what you think as well. This issue has been blocking us from releasing our depthai-powered product.

                  Thanks

                  Lincoln

                  Hi Lincoln,
                  So to confirm: with any type of pipeline (even just streaming the color stream to XLinkOut) using 2x FFC-IMX477 on an FFC-3P with the latest depthai version, there's a chance one camera will block at the start and won't recover? I'm just trying to understand the smallest repro setup, so the engineering team can fix the issue.
                  Thanks, Erik

                    erik

                    Yes, one camera will be blocked and won't recover. The tricky part is that the issue does not appear in every pipeline with the IMX477. By comparison, with the IMX214 there is NO issue in any pipeline.

                    In the latest C++ MRE, I have 3 different nodes:

                    • image_publisher_node: color camera nodes and detection nodes for CAM_B and CAM_C, no ROS.
                    • image_publisher_ros_node: color camera nodes and detection nodes for CAM_B and CAM_C, also ROS bridges to publish image and detection to ROS.
                    • image_publisher_ros_single_detection_node: color camera nodes for both cameras, but only one camera has a detection node. ROS publishes images from both cameras and detections from one camera.

                    In terms of severity of the issue:

                    For image_publisher_node, the blocking issue can be seen directly on the screen; it happens in around 1 out of 10 runs.

                    For image_publisher_ros_node, we need pipeline_graph to see the FPS. It happens in around 2 out of 5 runs.

                    For image_publisher_ros_single_detection_node, we need pipeline_graph to see the FPS. I have NOT seen the blocking happen yet.

                    Also, as I mentioned a while ago, the issue rarely happens with the Python MRE, even with the IMX477 (maybe 1 out of 30 runs?), but I did see it happen before.

                    So, in summary:

                    • The inconsistency of the issue with IMX477 modules across different pipelines makes it look like it is more on the firmware side.
                    • But having absolutely no issue with the IMX214 in any pipeline makes it look like the IMX477 hardware is to blame.

                    Sorry for the mess. Let me know what you think.

                    Thanks

                    Lincoln