Hi lincolnxlw
This is very strange, could you send a minimal reproducible example (MRE) so I can try to debug it? Are you using any multiprocessing/threading that could cause a race condition or something similar? Perhaps there is a bug somewhere on our side.

Thanks,
Jaka

    Hi jakaskerl

    The link should contain the MRE for this issue.

    Hardware

    • OAK FFC-3P with 2 IMX477 modules running in 1080P in CAM B and CAM C sockets

    The MRE is a minimal ROS package that depends on depthai-ros (Noetic).

    root@83d31607bd1d:/workspaces/depthai_catkin_ws/src/minimum_image_publisher# tree . -h

    .

    ├── [1.2K] CMakeLists.txt

    ├── [4.0K] config

    │ └── [1.4K] yolov6n.yaml

    ├── [4.0K] include

    │ └── [4.0K] minimum_image_publisher

    ├── [4.0K] models

    │ └── [8.9M] yolov6n_openvino_2022.1_6shave.blob

    ├── [2.0K] package.xml

    └── [4.0K] src

    └── [ 11K] image_publisher.cpp

    5 directories, 5 files

    There are also instructions inside the link for how to reproduce the problem.

    Attached below are two screenshots: one where the pipeline is running correctly, and one where the left camera's path is totally blocked.

    Let me know what you think.

    Thanks

    Lincoln

      @jakaskerl

      Just to follow up: do you have any questions about the MRE I provided?

      Also, I created the exact same pipeline in Python, running with the depthai-library Docker image, and it ran 10 times without any problem. The Python script is attached below.

      #!/usr/bin/env python3
      
      from pathlib import Path
      import cv2
      import depthai as dai
      import numpy as np
      import argparse
      import time
      import errno
      import os
      import sys
      import json
      
      parser = argparse.ArgumentParser()
      parser.add_argument('-nn', '--nn_model', help='select model path for inference',
                          default='/workspaces/yolov6n_openvino_2022.1_6shave.blob', type=str)
      parser.add_argument('-c', '--config', help='Provide config path for inference',
                          default='/workspaces/yolov6n.json', type=str)
      parser.add_argument('-fps', '--fps', type=float, help='Frame rate of camera capturing', default=15)
      
      args = parser.parse_args()
      nn_path = Path(args.nn_model)
      config_path = Path(args.config)
      config_fps = args.fps
      
      if not nn_path.is_file():
          sys.exit('NN not found!')
      if not config_path.is_file():
          sys.exit('Config not found!')
      
      with config_path.open() as f:
          config = json.load(f)
      
      nn_config = config.get("nn_config", {})
      if "input_size" in nn_config:
          nn_width, nn_height = tuple(map(int, nn_config.get("input_size").split('x')))
      metadata = nn_config.get("NN_specific_metadata", {})
      classes = metadata.get("classes", {})
      coordinates = metadata.get("coordinates", {})
      anchors = metadata.get("anchors", {})
      anchor_masks = metadata.get("anchor_masks", {})
      iou_threshold = metadata.get("iou_threshold", {})
      confidence_threshold = metadata.get("confidence_threshold", {})
      
      print(metadata)
      nn_mappings = config.get("mappings", {})
      labels = nn_mappings.get("labels", {})
      
      # pipeline
      pipeline = dai.Pipeline()
      
      camera_l = pipeline.create(dai.node.ColorCamera)
      detection_nn_l = pipeline.create(dai.node.YoloDetectionNetwork)
      xout_rgb_l = pipeline.create(dai.node.XLinkOut)
      xout_nn_l = pipeline.create(dai.node.XLinkOut)
      
      camera_r = pipeline.create(dai.node.ColorCamera)
      detection_nn_r = pipeline.create(dai.node.YoloDetectionNetwork)
      xout_rgb_r = pipeline.create(dai.node.XLinkOut)
      xout_nn_r = pipeline.create(dai.node.XLinkOut)
      
      xout_rgb_l.setStreamName("rgb_l")
      xout_nn_l.setStreamName("nn_l")
      xout_rgb_r.setStreamName("rgb_r")
      xout_nn_r.setStreamName("nn_r")
      
      camera_l.setPreviewSize(nn_width, nn_height)
      camera_l.setBoardSocket(dai.CameraBoardSocket.CAM_B)
      camera_l.setPreviewKeepAspectRatio(False)
      camera_l.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
      camera_l.setInterleaved(False)
      camera_l.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
      camera_l.setFps(config_fps)
      
      camera_r.setPreviewSize(nn_width, nn_height)
      camera_r.setBoardSocket(dai.CameraBoardSocket.CAM_C)
      camera_r.setPreviewKeepAspectRatio(False)
      camera_r.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
      camera_r.setInterleaved(False)
      camera_r.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
      camera_r.setFps(config_fps)
      
      detection_nn_l.setConfidenceThreshold(confidence_threshold)
      detection_nn_l.setNumClasses(classes)
      detection_nn_l.setCoordinateSize(coordinates)
      detection_nn_l.setAnchors(anchors)
      detection_nn_l.setAnchorMasks(anchor_masks)
      detection_nn_l.setIouThreshold(iou_threshold)
      detection_nn_l.setBlobPath(str(nn_path))
      detection_nn_l.setNumInferenceThreads(1)
      detection_nn_l.setNumNCEPerInferenceThread(1)
      detection_nn_l.input.setBlocking(False)
      
      detection_nn_r.setConfidenceThreshold(confidence_threshold)
      detection_nn_r.setNumClasses(classes)
      detection_nn_r.setCoordinateSize(coordinates)
      detection_nn_r.setAnchors(anchors)
      detection_nn_r.setAnchorMasks(anchor_masks)
      detection_nn_r.setIouThreshold(iou_threshold)
      detection_nn_r.setBlobPath(str(nn_path))
      detection_nn_r.setNumInferenceThreads(1)
      detection_nn_r.setNumNCEPerInferenceThread(1)
      detection_nn_r.input.setBlocking(False)
      
      # linking
      camera_l.preview.link(xout_rgb_l.input)
      camera_l.preview.link(detection_nn_l.input)
      detection_nn_l.out.link(xout_nn_l.input)
      
      camera_r.preview.link(xout_rgb_r.input)
      camera_r.preview.link(detection_nn_r.input)
      detection_nn_r.out.link(xout_nn_r.input)
      
      with dai.Device(pipeline) as device:
          print('Device name:', device.getDeviceName())
          if device.getBootloaderVersion() is not None:
              print('Bootloader version:', device.getBootloaderVersion())
          print('Usb speed:', device.getUsbSpeed().name)
          print('Connected cameras:', device.getConnectedCameraFeatures())
      
          device_info = device.getDeviceInfo()
          print('Device mixid:', device_info.getMxId())
      
          q_rgb_l = device.getOutputQueue(name="rgb_l", maxSize=4, blocking=False)
          q_nn_l = device.getOutputQueue(name="nn_l", maxSize=4, blocking=False)
          q_rgb_r = device.getOutputQueue(name="rgb_r", maxSize=4, blocking=False)
          q_nn_r = device.getOutputQueue(name="nn_r", maxSize=4, blocking=False)
      
          start_time = time.monotonic()
      
          counter_l = 0
          detections_l = []
          fps_l = 0
          frame_l = None
      
          counter_r = 0
          detections_r = []
          fps_r = 0
          frame_r = None
      
          text_color = (0, 0, 255)
      
          def to_planar(arr: np.ndarray, shape: tuple) -> np.ndarray:
              return cv2.resize(arr, shape).transpose(2, 0, 1).flatten()
      
          def frameNorm(frame, bbox):
              normVals = np.full(len(bbox), frame.shape[0])
              normVals[::2] = frame.shape[1]
              return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
      
          def displayFrame(name, frame, detections):
              color = (0, 0, 255)
              for detection in detections:
                  bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                  cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                  cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                  cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
              cv2.imshow(name, frame)
      
          while True:
              in_nn_l = q_nn_l.tryGet()
              in_rgb_l = q_rgb_l.tryGet()
              in_nn_r = q_nn_r.tryGet()
              in_rgb_r = q_rgb_r.tryGet()
      
              if in_rgb_l is not None:
                  frame_l = in_rgb_l.getCvFrame()
                  fps_l = counter_l / (time.monotonic() - start_time)
                  cv2.putText(frame_l, "NN fps: {:.2f}".format(fps_l),
                              (2, frame_l.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.5, text_color)
              if in_rgb_r is not None:
                  frame_r = in_rgb_r.getCvFrame()
                  fps_r = counter_r / (time.monotonic() - start_time)
                  cv2.putText(frame_r, "NN fps: {:.2f}".format(fps_r),
                              (2, frame_r.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.5, text_color)
      
              if in_nn_l is not None:
                  detections_l = in_nn_l.detections
                  counter_l += 1
      
              if in_nn_r is not None:
                  detections_r = in_nn_r.detections
                  counter_r += 1
      
              if frame_l is not None:
                  displayFrame("left_camera", frame_l, detections_l)
      
              if frame_r is not None:
                  displayFrame("right_camera", frame_r, detections_r)
      
              if cv2.waitKey(1) == ord('q'):
                  break
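
      For anyone trying to run the script without the original config file, here is a sketch of the JSON structure it expects, inferred purely from the fields the script reads; every value below is an illustrative placeholder, not the real yolov6n config.

```python
# Hypothetical example of the structure yolov6n.json is expected to have,
# inferred from the fields the script above reads. All values here are
# illustrative placeholders.
example_config = {
    "nn_config": {
        "input_size": "416x416",
        "NN_specific_metadata": {
            "classes": 80,
            "coordinates": 4,
            "anchors": [10.0, 13.0, 16.0, 30.0],
            "anchor_masks": {"side52": [0, 1]},
            "iou_threshold": 0.5,
            "confidence_threshold": 0.5,
        },
    },
    "mappings": {"labels": ["person", "bicycle", "car"]},
}

# The same parsing logic the script uses:
nn_config = example_config.get("nn_config", {})
nn_width, nn_height = tuple(map(int, nn_config["input_size"].split("x")))
metadata = nn_config.get("NN_specific_metadata", {})
labels = example_config.get("mappings", {}).get("labels", [])

print(nn_width, nn_height, metadata["classes"], labels[0])  # 416 416 80 person
```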

      The pipeline graph for the Python version is the same.

      This link contains the code and model to run this Python MRE.

      Thanks

      Lincoln

        Hi @lincolnxlw ,
        Thank you for the MRE. Just to confirm, are you using the latest depthai version (2.24)?

          Hi erik

          For the Python example, I am using this Docker image that was released yesterday: https://hub.docker.com/layers/luxonis/depthai-library/latest/images/sha256-be59799f694ef7cc5c60519bd462ac2daa6bef27bd2462e60ed37791121d3c29?context=explore

          It looks like it is using '2.23.0.0.dev0+dev'.

          For the ROS example, I rebuilt it with this Dockerfile yesterday and am still having the issue.

          Let me know what you think

          Lincoln

            Hi @lincolnxlw
            I tried to reproduce with a normal OAK-D (so 2x OV9282), but it worked as expected every time (20 out of 20 runs) on the latest depthai. @jakaskerl , could you perhaps try to repro with an FFC-3P and 2x IMX477 camera modules? I'd be surprised if that made a difference though. Perhaps the camera that fails doesn't have a perfect connection, so it sometimes doesn't get enumerated and doesn't stream?

            Hi lincolnxlw
            Tried with an FFC-3P (r3) and two IMX477 connected to cam_L and cam_R.

            lincolnxlw This link contains the code and model to run this Python MRE

            This link/code works without issues on my machine. Versions: depthai==2.24.0.0. Tried running 20 times, no errors/crashes.

            Couldn't set up the ROS version due to USB passthrough issues on M1 Macs.

            @Luxonis-Adam Can you check with ROS Noetic please, so we can pinpoint the source of the issue?
            link:

            lincolnxlw The link should contain the MRE for this issue.

            Thanks,
            Jaka

            Hi @jakaskerl and @erik

            Just want to make sure we are on the same page. I don't have any issue running the Python MRE, as I mentioned earlier:

            Also, I created the exact same pipeline in Python, running with the depthai-library Docker image, and it ran 10 times without any problem. The Python script is attached below.

            The one with the blocking issue is the ROS C++ MRE. It is weird because the pipeline is pretty much the same.

            Thanks

            Lincoln

            Hi, regarding ROS Noetic on Mac, I don't think it is officially supported, but you could use Docker images to get the latest version running. In general, if it's possible to connect to the camera with the basic DepthAI libraries, then it should also work with ROS. Could you share your ROS setup?

              Hi Luxonis-Adam

              As I mentioned in the MRE link, ROS is set up using a depthai-ros Docker image built from the latest Dockerfile provided in the depthai-ros repo. Is there anything more specific you want me to provide?

              Thanks

              Lincoln

              18 days later

              Hi Luxonis-Adam

              Just want to follow up on this. Were you able to reproduce the pipeline blocking issue with the provided MRE?

              Thanks

              Lincoln

              Hi @Luxonis-Adam, @jakaskerl, @erik,

              I added another file image_publisher_ros.cpp in the MRE folder for comparison.

              root@0b6870f3b150:/workspaces/depthai_catkin_ws/src/station_depthai/minimum_image_publisher# tree .
              .
              ├── CMakeLists.txt
              ├── README.md
              ├── config
              │   └── yolov6n.yaml
              ├── include
              │   └── minimum_image_publisher
              ├── models
              │   └── yolov6n_openvino_2022.1_6shave.blob
              ├── package.xml
              └── src
                  ├── image_publisher.cpp
                  └── image_publisher_ros.cpp

              image_publisher_ros.cpp doesn't try to get the images from the queues directly; instead it uses dai::rosBridge::BridgePublisher to publish the image data. The blocking issue is much worse than with image_publisher.cpp: almost half of the runs fail. I attached here a screen recording of 10 of my runs.

              The depthai library I am using is 2.24.0, which is what the latest depthai-ros Noetic release is using. Let me know what else I can provide to help figure out the issue.

              Library information - version: 2.24.0, commit: 6628488ef8956f73f1c7bf4c8f1da218ad327a6f from 2023-12-13 14:45:09 +0100, build: 2023-12-13 23:26:25 +0000, libusb enabled: true

              Thanks!

              Lincoln

              Hi @lincolnxlw, so far I've been unable to replicate the issue. Could you try setting export DEPTHAI_DEBUG=1 before running the example? Also, does this also happen if you run only one detection network, or none?

                Hi Luxonis-Adam

                Thanks for responding. I set DEPTHAI_DEBUG=1 and here are my test videos.

                With image_publisher_ros, it failed 2 times out of 5 runs. Here is the video.

                I created a node with only one color camera and a detection network. It worked in 10 out of 10 runs. Here is the video.

                Also, when I run the code without pipeline_graph consuming the terminal output, I notice some red debug messages. But those messages show up regardless of success or failure, and regardless of whether there are two detection networks or just one.

                == FSYNC enabled for cam mask 0x0
                CAM ID: 1, width: 1920, height: 1080, orientation: 0
                CAM ID: 2, width: 1920, height: 1080, orientation: 0
                == SW-SYNC: 0, cam mask 0x6
                !!! Master Slave config is: single_master_slave !!!
                Starting camera 1
                [E] app_guzzi_command_callback():173: command->id:1
                [E] app_guzzi_command_callback():193: command "1 1" sent
                
                [18443010D1F3F40800] [3.1] [2.609] [system] [warning] PRINT:LeonCss: [E] iq_debug_create():161: iq_debug address 0x88837680
                [E] hai_cm_driver_load_dtp():852: Features for camera IMX214R0 (imx214) are received
                [E] set_dtp_ids():396: //VIV HAL: Undefined VCM DTP ID 0
                [E] set_dtp_ids():405: //VIV HAL: Undefined NVM DTP ID 0
                [E] set_dtp_ids():414: //VIV HAL: Undefined lights DTP ID 0
                [18443010D1F3F40800] [3.1] [2.619] [system] [warning] PRINT:LeonCss: [E] camera_control_start():347: Camera_id = 1 started.
                
                [E] hai_cm_sensor_select_mode():164: No suitable sensor mode. Selecting default one - 0 for start 1920x1080 at 0x0 fps min 0.000000 max 30.000000
                [E] hai_cm_sensor_select_mode():164: No suitable sensor mode. Selecting default one - 0 f
                [18443010D1F3F40800] [3.1] [2.581] [DetectionNetwork(2)] [info] Inference thread count: 1, number of shaves allocated per thread: 6, number of Neural Compute Engines (NCE) allocated per thread: 1
                [18443010D1F3F40800] [3.1] [2.630] [system] [warning] PRINT:LeonCss: or start 1920x1080 at 0x0 fps min 0.000000 max 30.000000
                [18443010D1F3F40800] [3.1] [2.653] [system] [warning] PRINT:LeonCss: [E] vpipe_conv_config():1465: Exit Ok
                [E] callback():123: Camera CB START_DONE event.
                Starting camera 2
                [E] app_guzzi_command_callback():173: command->id:1
                [E] app_guzzi_command_callback():193: command "1 2" sent
                
                [18443010D1F3F40800] [3.1] [2.675] [system] [warning] PRINT:LeonCss: [E] iq_debug_create():161: iq_debug address 0x88375cc0
                [18443010D1F3F40800] [3.1] [2.686] [system] [warning] PRINT:LeonCss: [E] hai_cm_driver_load_dtp():852: Features for camera IMX214R0 (imx214) are received
                [E] set_dtp_ids():396: //VIV HAL: Undefined VCM DTP ID 0
                [E] set_dtp_ids():405: //VIV HAL: Undefined NVM DTP ID 0
                [E] set_dtp_ids():414: //VIV HAL: Undefined lights DTP ID 0
                [E] camera_control_start():347: Camera_id = 2 started.
                
                [18443010D1F3F40800] [3.1] [2.696] [system] [warning] PRINT:LeonCss: [E] hai_cm_sensor_select_mode():164: No suitable sensor mode. Selecting default one - 0 for start 1920x1080 at 0x0 fps min 0.000000 max 30.000000
                [E] hai_cm_sensor_select_mode():164: No suitable sensor mode. Selecting default one - 0 for start 1920x1080 at 0x0 fps min 0.000000 max 30.000000
                inc_camera_process set exposure and gain
                [E] vpipe_conv_config():1465: Exit Ok
                [18443010D1F3F40800] [3.1] [2.707] [system] [warning] PRINT:LeonCss: [E] guzzi_event_send():324: Send: Event ID=20003 no registered recipient
                [E] guzzi_event_send():324: Send: Event ID=20004 no registered recipient
                [E] guzzi_event_send():324: Send: Event ID=20005 no registered recipient
                osDrvImx214Control:514: Start stream
                [E] callback():123: Camera CB START_DONE event.
                inc_camera_process set exposure and gain
                osDrvImx214Control:514: Start stream
                [18443010D1F3F40800] [3.1] [2.718] [system] [warning] PRINT:LeonCss: AF_TRIGGER on camera 1
                [E] app_guzzi_command_callback():173: command->id:5
                [E] camera_control_focus_trigger():591: Focus trigger succeeded camera_id = 1.
                
                [E] app_guzzi_command_callback():218: command "5 1" sent
                
                AF_TRIGGER on camera 2
                [E] app_guzzi_command_callback():173: command->id:5
                [E] camera_control_focus_trigger():591: Focus trigger succeeded camera_id = 2.
                
                [E] app_guzzi_command_callback():218: command "5 2" sent
                
                Starting Guzzi command handling loop...

                Let me know what you think

                Thanks

                Lincoln

                5 days later

                Hi @lincolnxlw, thanks for the information. To narrow it down further, could you try replicating this setup without ROS, using the bare depthai-core library?

                  Hi Luxonis-Adam

                  I switched both cameras from IMX477 to IMX214 and ran the same MRE extensively, and I didn't see any blocking issue. I have 10 of the IMX477 modules, and no matter how I pair them up (switching cables as well), I can always reproduce the issue. So I don't think it is an individual camera problem. What do you think? Is this something on the firmware side or the hardware side? And if it is on the hardware side, is it from Luxonis or ArduCam?

                  @erik, @jakaskerl feel free to let me know what you think as well. This issue has been blocking us from releasing our depthai-powered product.

                  Thanks

                  Lincoln

                  Hi Lincoln,
                  So to confirm: with any type of pipeline (even just streaming the color stream to XLinkOut) using 2x FFC-IMX477 on an FFC-3P with the latest depthai version, there's a chance one camera will block at the start and won't recover? I'm just trying to understand the smallest repro setup, so the engineering team can fix the issue.
                  Thanks, Erik

                    erik

                    Yes, one camera will be blocked and won't recover. The tricky part is that the issue does not appear in every pipeline with the IMX477. By comparison, with the IMX214 there is NO issue in any pipeline.

                    In the latest C++ MRE, I have 3 different nodes:

                    • image_publisher_node: color camera nodes and detection nodes for CAM_B and CAM_C, no ROS.
                    • image_publisher_ros_node: color camera nodes and detection nodes for CAM_B and CAM_C, also ROS bridges to publish image and detection to ROS.
                    • image_publisher_ros_single_detection_node: color camera nodes for both cameras, but only one camera has a detection node. ROS publishes images from both cameras and detections from one camera.

                    In terms of severity of the issue:

                    For image_publisher_node, the blocking issue can be seen directly on the screen; it happens in around 1 out of 10 runs.

                    For image_publisher_ros_node, we need pipeline_graph to see the FPS. It happens in around 2 out of 5 runs.

                    For image_publisher_ros_single_detection_node, we need pipeline_graph to see the FPS. I have NOT seen the blocking happen yet.

                    Also, as I mentioned a while ago, the issue rarely happens with the Python MRE, even with the IMX477 (maybe 1 out of 30 runs?), but I did see it happen before.

                    So, in summary:

                    • The inconsistency of the issue with IMX477 modules across different pipelines makes it look like it is more on the firmware side.
                    • But having absolutely no issue with the IMX214 in any pipeline makes it look like the IMX477 hardware is to blame.

                    Sorry for the mess. Let me know what you think.

                    Thanks

                    Lincoln