Hello @jakaskerl,

Earlier, when I ran the DepthAI deployment script for the OAK-D camera, it ran smoothly at every resolution I used (both 12MP and 1080P). However, now when I run the same script at 12MP resolution, it gets stuck trying to get detections from the queue. Please find below screenshots of my main code loop.

To debug this, I visualized the pipeline using luxonis/depthai_pipeline_graph. When I run the script at 1080P resolution it runs without any issue, and the graph shows the frame rate and frames being passed to each node, as shown below.

However, when I run the script at 12MP resolution it gets stuck trying to get detections from the queue. The pipeline graph for this shows a really low frame rate (1 fps), and the ImageManip node shows 0 fps, which I'm guessing means it isn't receiving any frames. This leaves the script stuck trying to get data from the empty detection queue.
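The blocking read is essentially this (the full loop appears further down in this thread):

inDet = qDet.get()  # returns promptly at 1080P, but blocks indefinitely at 12MP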

Can you help me debug why this issue happens only when using 12MP resolution?

Thanks & Regards

Yishu

Hi @yishu_corpex
My guess is that due to the much higher requirements of a 12MP image, the pipeline is unable to process the stream fast enough, causing it to crash. I don't see the linking part; are you sending 12MP streams to the host side, perhaps?

12MP is pretty demanding for both XLink and the host.
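If you don't strictly need the full 12MP frames on the host, one option is to downscale on-device, for example (a sketch, assuming your ColorCamera node is named camRgb):

camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
camRgb.setIspScale(1, 4)  # 4056x3040 -> 1014x760, far lighter on XLink and the host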

Thanks,
Jaka

Hi @jakaskerl

I understand 12MP has much higher requirements. However, it had been working for many months for us until recently, when we started getting this issue. Also, please find the linking part below, as requested.

Let me know if you find something off about the linking part, but like I said, it's strange that what has been working for months suddenly stopped working. Just curious about it.

Thanks & Regards

Yishu

Hi @yishu_corpex
This looks fine at first glance. Could you maybe view the passthrough output of the NN node, to make sure the images are coming in correctly? It looks like there might be a problem with how the manip node processes the ISP image, which would result in no detections being returned.
Also, if that is your custom model, try with a different (perhaps stock) one.
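Viewing the passthrough would look something like this (a sketch; the stream name is arbitrary):

passXout = pipeline.create(dai.node.XLinkOut)
passXout.setStreamName("pass")
detectionNetwork.passthrough.link(passXout.input)
# host side, after connecting to the device:
qPass = device.getOutputQueue("pass", 4, blocking=False)
inPass = qPass.tryGet()  # ImgFrame or None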

PS: you can attach code by enclosing it in three backticks (```) on each side.

Thanks,
Jaka

Hi @jakaskerl

So when I set the resolution to 12MP and try to access the passthrough output of the NN node, it gets stuck right there. I then printed the output of the ISP and Manip queues, and both showed None before getting stuck on the passthrough.

I tried the same at 1080P resolution. The very first outputs of the ISP and Manip queues also show None, but the passthrough shows an actual DepthAI image frame, and in the following iterations the ISP, Manip, and passthrough outputs all show DepthAI image frames. I'm still trying to figure out what the issue might be.

Also, thanks for the tip on enclosing code. I am enclosing the entire script here again for your reference.

with configPath.open() as f:
    config = json.load(f)
nnConfig = config.get("nn_config", {})

# parse input shape
if "input_size" in nnConfig:
    W, H = tuple(map(int, nnConfig.get("input_size").split('x')))

# extract metadata
metadata = nnConfig.get("NN_specific_metadata", {})
classes = metadata.get("classes", {})
coordinates = metadata.get("coordinates", {})
anchors = metadata.get("anchors", {})
anchorMasks = metadata.get("anchor_masks", {})
iouThreshold = metadata.get("iou_threshold", {})
confidenceThreshold = metadata.get("confidence_threshold", {})

print(metadata)

# parse labels
nnMappings = config.get("mappings", {})
labels = nnMappings.get("labels", {})
print("Labels: ", labels)

# get model path
nnPath = args.model
if not Path(nnPath).exists():
    print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
    nnPath = str(blobconverter.from_zoo(args.model, shaves=6, zoo_type="depthai", use_cache=True))
# sync outputs
syncNN = True

# Create pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
camRgb = pipeline.create(dai.node.ColorCamera)
detectionNetwork = pipeline.create(dai.node.YoloDetectionNetwork)
#xoutRgb = pipeline.create(dai.node.XLinkOut)
nnOut = pipeline.create(dai.node.XLinkOut)


# By Yishu
xoutISP = pipeline.create(dai.node.XLinkOut)
manip = pipeline.create(dai.node.ImageManip)
xoutManip = pipeline.create(dai.node.XLinkOut)

# Passthrough to debug
# Send passthrough frames to the host, so frames are in sync with bounding boxes
passthroughOut = pipeline.create(dai.node.XLinkOut)
passthroughOut.setStreamName("pass")
detectionNetwork.passthrough.link(passthroughOut.input)

#xoutRgb.setStreamName("rgb")
xoutISP.setStreamName("ISP")
nnOut.setStreamName("nn")
xoutManip.setStreamName("Manip")

# Properties
camRgb.setPreviewSize(W, H)

camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P) #THE_1080_P
camRgb.setInterleaved(False)
camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.RGB)
camRgb.setFps(25) #40

# By Yishu
manip.initialConfig.setKeepAspectRatio(False) #True
manip.initialConfig.setResize(W, H)

# Change to RGB image than BGR - Yishu
#manip.initialConfig.setFrameType(dai.ImgFrame.Type.RGB888p) #dai.ImgFrame.Type.BGR888p
manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) #dai.ImgFrame.Type.RGB888p

# setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
manip.setMaxOutputFrameSize(1228800)  # 1228800 = 640*640*3, i.e. W*H*3 bytes for a BGR888p frame (assumes a 640x640 NN input)

# By Yishu
nnOut.input.setBlocking(False)
xoutISP.input.setBlocking(False)
xoutManip.input.setBlocking(False)

# By Yishu
nnOut.input.setQueueSize(10)
xoutISP.input.setQueueSize(10)
xoutManip.input.setQueueSize(10)
detectionNetwork.input.setQueueSize(10)

# Network specific settings
detectionNetwork.setConfidenceThreshold(confidenceThreshold)
detectionNetwork.setNumClasses(classes)
detectionNetwork.setCoordinateSize(coordinates)
detectionNetwork.setAnchors(anchors)
detectionNetwork.setAnchorMasks(anchorMasks)
detectionNetwork.setIouThreshold(iouThreshold)
detectionNetwork.setBlobPath(nnPath)
detectionNetwork.setNumInferenceThreads(2)
detectionNetwork.input.setBlocking(False)

# Linking
#camRgb.preview.link(detectionNetwork.input)
#detectionNetwork.passthrough.link(xoutRgb.input)
#detectionNetwork.out.link(nnOut.input)
camRgb.isp.link(manip.inputImage)
#camRgb.still.link(manip.inputImage)
manip.out.link(detectionNetwork.input)

# By Yishu
manip.out.link(xoutManip.input)

#detectionNetwork.passthrough.link(xoutISP.input)
detectionNetwork.out.link(nnOut.input)
camRgb.isp.link(xoutISP.input)

device_info = dai.DeviceInfo("192.168.220.10")

# Connect to device and start pipeline
with dai.Device(pipeline, device_info) as device:
    print("1")
    # Output queues will be used to get the rgb frames and nn data from the outputs defined above
    #qRgb = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
    #qDet = device.getOutputQueue(name="nn", maxSize=4, blocking=False)
    qDet = device.getOutputQueue("nn", 4, blocking=False) #device.getOutputQueue("nn", 1, blocking=False)
    qISP = device.getOutputQueue("ISP", 4, blocking=False)
    qManip = device.getOutputQueue("Manip", 4, blocking=False)

    # Passthrough to debug
    qPass = device.getOutputQueue(name="pass")

    frame = None
    detections = []
    startTime = time.monotonic()
    counter = 0
    color2 = (255, 255, 255)

    # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
    def frameNorm(frame, bbox):
        normVals = np.full(len(bbox), frame.shape[0])
        normVals[::2] = frame.shape[1]
        return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)

    def displayFrame(name, frame, detections, j):
        color_spool = (255, 0, 0)
        color_person = (0, 255, 0)
        color = ''
        text_output_path = "D:\\Cameras_Live\\PayOff\\" + "label_{}".format(j) + ".txt"
        print("Detections: ", [[d.label, d.confidence *100, d.xmin, d.ymin, d.xmax, d.ymax] for d in detections])
        text_file = open(text_output_path, 'w')
        for detection in detections:
            bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
            if labels[detection.label] == 'person':
                color = color_person
            else:
                color = color_spool
            cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
            cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
            cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
            # Write detections to a text file
            text_file.write(f'{labels[detection.label]} {bbox[0]} {bbox[1]} {bbox[2]} {bbox[3]} {detection.confidence}\n')  # one detection per line

        text_file.close()
        # Show the frame
        #cv2.namedWindow("Model Inference", cv2.WND_PROP_FULLSCREEN)
        cv2.namedWindow(name, cv2.WINDOW_NORMAL)
        #cv2.setWindowProperty(name, cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
        cv2.imshow(name, frame)

        
    print("2")
    j=1
    ###Create the folder that will contain capturing
    folder_name = "D:\\Cameras_Live\\PayOff"
    path = Path(folder_name)


    while True:

        ###Create the folder that will contain capturing
        path.mkdir(parents=True, exist_ok=True)

        print("3")
        # By Yishu
        inISP = qISP.tryGet()
        print("4")
        inManip = qManip.tryGet()
        print("Manip: ", inManip)
        print("5")
        frame_pass = qPass.get() #.getCvFrame()
        print('passthrough: ', frame_pass)
        inDet = qDet.get()
        print("6")
        if inISP is not None:
            print("inISP not None")
            frame = inISP.getCvFrame()
            cv2.imwrite("D:\\Cameras_Live\\PayOff\\RGB_Manip_{}.png".format(j), frame)
            nn_fps = counter / (time.monotonic() - startTime)
            print("nn_fps: ", nn_fps)
            cv2.putText(frame, "NN fps: {:.2f}".format(nn_fps),
                        (2, frame.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.4, color2)
        print("7")
        if inDet is not None:
            detections = inDet.detections
            counter += 1
        print("8")
        if frame is not None:
            print("Frame not None")
            displayFrame("manip", frame, detections, j)
        print("9")
        if cv2.waitKey(1) == ord('q'):
            break

        j=j+1
        if j==5:
            j=1
        
        print("10")

Thanks,
Yishu

Hi @jakaskerl

I carried out further experiments with this code. When I set the resolution to 12MP, the output of the ISP and Manip node queues is always None; probably that is why it gets stuck waiting on the NN output. But it's strange that just by changing the resolution I get no frames from the camera at all.

I tested it on another OAK-D camera of the same model, and I see the same behavior there as well. I'm sure something is going on that I am not able to understand and debug. I have other scripts using the same NN model and 12MP resolution that work fine without this issue, but I want to understand what is going on in this script and why we are having this issue.
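To surface any on-device errors, I might also raise the device log level next (a sketch; these calls go right after opening the device):

device.setLogLevel(dai.LogLevel.DEBUG)
device.setLogOutputLevel(dai.LogLevel.DEBUG)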

Hi @yishu_corpex
Then it's certainly a pipeline issue. I'll run it tomorrow to see what the issue is.

Thanks,
Jaka

Hi @jakaskerl

Okay, sounds good. Thanks for your help. I look forward to hearing back from you regarding this pipeline issue.

Thanks,
Yishu

Hi @jakaskerl

Please find the MRE below.

So, we are using our own custom-trained YOLOv6n model. If you can share your official email ID with me, I can give you access to the model blob and JSON file to reproduce the issue.

I tried to reproduce it using the pretrained yolov4_tiny model for car detection provided by DepthAI under depthai-experiments; however, I don't see any issue when using that model.

You will require the following libraries to run the script (see the install command after the list):

  1. OpenCV Python
  2. DepthAI
  3. blobconverter
  4. numpy
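These can be installed with pip (assuming the standard PyPI package names):

pip install opencv-python depthai blobconverter numpy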

Also, set the model and JSON file paths as the defaults in the argument parser lines; the script will pick up those default paths when you run it from VS Code.

from pathlib import Path
import cv2
import depthai as dai
import numpy as np
import time
import argparse
import json
import blobconverter

# parse arguments 
parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", help="Provide model name or model path for inference",
                    default='models/6_shaves_model/best_ckpt_openvino_2022.1_6shave.blob', type=str)
parser.add_argument("-c", "--config", help="Provide config path for inference",
                    default='models/6_shaves_model/best_ckpt.json', type=str)
args = parser.parse_args()

# parse config
configPath = Path(args.config)
if not configPath.exists():
    raise ValueError("Path {} does not exist!".format(configPath))

with configPath.open() as f:
    config = json.load(f)
nnConfig = config.get("nn_config", {})

# parse input shape
if "input_size" in nnConfig:
    W, H = tuple(map(int, nnConfig.get("input_size").split('x')))

# extract metadata
metadata = nnConfig.get("NN_specific_metadata", {})
classes = metadata.get("classes", {})
coordinates = metadata.get("coordinates", {})
anchors = metadata.get("anchors", {})
anchorMasks = metadata.get("anchor_masks", {})
iouThreshold = metadata.get("iou_threshold", {})
confidenceThreshold = metadata.get("confidence_threshold", {})

print(metadata)

# parse labels
nnMappings = config.get("mappings", {})
labels = nnMappings.get("labels", {})
print("Labels: ", labels)

# get model path
nnPath = args.model
if not Path(nnPath).exists():
    print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
    nnPath = str(blobconverter.from_zoo(args.model, shaves=6, zoo_type="depthai", use_cache=True))
# sync outputs
syncNN = True

# Create pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
camRgb = pipeline.create(dai.node.ColorCamera)
detectionNetwork = pipeline.create(dai.node.YoloDetectionNetwork)
#xoutRgb = pipeline.create(dai.node.XLinkOut)
nnOut = pipeline.create(dai.node.XLinkOut)


# By Yishu
xoutISP = pipeline.create(dai.node.XLinkOut)
manip = pipeline.create(dai.node.ImageManip)
xoutManip = pipeline.create(dai.node.XLinkOut)

# Passthrough to debug
# Send passthrough frames to the host, so frames are in sync with bounding boxes
passthroughOut = pipeline.create(dai.node.XLinkOut)
passthroughOut.setStreamName("pass")
detectionNetwork.passthrough.link(passthroughOut.input)

#xoutRgb.setStreamName("rgb")
xoutISP.setStreamName("ISP")
nnOut.setStreamName("nn")
xoutManip.setStreamName("Manip")

# Properties
camRgb.setPreviewSize(W, H)

camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP) #THE_1080_P
camRgb.setInterleaved(False)
camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.RGB)
camRgb.setFps(25) #40

# By Yishu
manip.initialConfig.setKeepAspectRatio(False) #True
manip.initialConfig.setResize(W, H)

# Change to RGB image than BGR - Yishu
manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) 

# setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
manip.setMaxOutputFrameSize(1228800)  # 1228800 = 640*640*3, i.e. W*H*3 bytes for a BGR888p frame (assumes a 640x640 NN input)

# By Yishu
nnOut.input.setBlocking(False)
xoutISP.input.setBlocking(False)
xoutManip.input.setBlocking(False)

# By Yishu
nnOut.input.setQueueSize(10)
xoutISP.input.setQueueSize(10)
xoutManip.input.setQueueSize(10)
detectionNetwork.input.setQueueSize(10)

# Network specific settings
detectionNetwork.setConfidenceThreshold(confidenceThreshold)
detectionNetwork.setNumClasses(classes)
detectionNetwork.setCoordinateSize(coordinates)
detectionNetwork.setAnchors(anchors)
detectionNetwork.setAnchorMasks(anchorMasks)
detectionNetwork.setIouThreshold(iouThreshold)
detectionNetwork.setBlobPath(nnPath)
detectionNetwork.setNumInferenceThreads(2)
detectionNetwork.input.setBlocking(False)


camRgb.isp.link(manip.inputImage)
manip.out.link(detectionNetwork.input)

# By Yishu
manip.out.link(xoutManip.input)

detectionNetwork.out.link(nnOut.input)
camRgb.isp.link(xoutISP.input)

device_info = dai.DeviceInfo("192.168.220.10")

# Connect to device and start pipeline
with dai.Device(pipeline, device_info) as device:
    # Output queues will be used to get the rgb frames and nn data from the outputs defined above
    qDet = device.getOutputQueue("nn", 4, blocking=False) #device.getOutputQueue("nn", 1, blocking=False)
    qISP = device.getOutputQueue("ISP", 4, blocking=False)
    qManip = device.getOutputQueue("Manip", 4, blocking=False)

    # Passthrough to debug
    qPass = device.getOutputQueue(name="pass")

    frame = None
    detections = []
    startTime = time.monotonic()
    counter = 0
    color2 = (255, 255, 255)
    
    j=1
    ###Create the folder that will contain capturing
    folder_name = "D:\\Cameras_Live\\PayOff"
    path = Path(folder_name)


    while True:

        ###Create the folder that will contain capturing
        path.mkdir(parents=True, exist_ok=True)

        # By Yishu
        inISP = qISP.tryGet() # ISP QUEUE IS ALWAYS NONE WHEN SETTING THE RESOLUTION TO 12MP
        
        inManip = qManip.tryGet()
        print('1')
        frame_pass = qPass.get() #.getCvFrame()
        print('2')
        inDet = qDet.get()
        print('3')

        if inISP is not None:
            
            frame = inISP.getCvFrame()
            nn_fps = counter / (time.monotonic() - startTime)
            print("nn_fps: ", nn_fps)
            cv2.putText(frame, "NN fps: {:.2f}".format(nn_fps),
                        (2, frame.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.4, color2)
        
        if inDet is not None:
            detections = inDet.detections
            counter += 1
        
        if cv2.waitKey(1) == ord('q'):
            break

        j=j+1
        if j==5:
            j=1

Thanks,
Yishu

Hi @yishu_corpex
It seems to be a DepthAI FW issue: the manip can't process 12MP frames in certain W/H configurations. I have forwarded it to the dev team.

Thanks,
Jaka

Hi @jakaskerl

I'm curious why this firmware issue only comes up with this script; I have other scripts where I use the same model and the same configurations with the Manip node.

Also, how can we track progress on getting this issue resolved?

Thanks,
Yishu

    Hi @yishu_corpex

    yishu_corpex I have other scripts where I use the same model and same configurations with the Manip node.

    Seems like a scaling issue from 12MP to W/H when using keepAspectRatio(False).
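    As a possible workaround until a fix lands, you could let the ISP do part of the downscale before the manip, so it never sees a full 12MP frame (a sketch, untested):

    camRgb.setIspScale(1, 2)           # 4056x3040 -> 2028x1520 on-device
    camRgb.isp.link(manip.inputImage)  # manip then only resizes 2028x1520 -> WxH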

    yishu_corpex Also, how can we track the progress of the resolution of this issue?

    It's internal, so you can't track it, but you can check the bugfixes section under new releases: luxonis/depthai-python/releases/tag/v2.25.1.0

    Thanks,
    Jaka

    a month later

    Hi @jakaskerl

    I wanted to check with you if you can provide any update on the resolution of this issue.

    I am again getting the same issue if I try to make use of the YoloSpatialDetectionNetwork node instead of the YoloDetectionNetwork node.

    This time I'm getting the issue even when using 1080P for the RGB camera and 400P for the left and right mono cameras.

    Please let me know as it is becoming a major bottleneck for many of our projects.

    Thanks & Regards
    Yishu

      Hi @yishu_corpex

      yishu_corpex I wanted to check with you if you can provide any update on the resolution of this issue.

      Can't say for certain; our FW devs are working on it.

      yishu_corpex Please let me know as it is becoming a major bottleneck for many of our projects.

      Why not just use the preview? Do you absolutely need the full FOV? You can do something like this to still preserve the FOV:

      # Properties
      camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
      camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
      camRgb.setIspScale(1, 4)
      
      w, h = camRgb.getIspSize()
      camRgb.setPreviewSize(640, 400)
      camRgb.setPreviewKeepAspectRatio(False)
      
      xoutVideo = pipeline.create(dai.node.XLinkOut)  # node creation was missing above
      xoutVideo.setStreamName("video")                # stream name assumed
      xoutVideo.input.setBlocking(False)
      xoutVideo.input.setQueueSize(1)
      
      # Linking
      camRgb.preview.link(xoutVideo.input)
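      On the host you would then read it with something like this (continuing the snippet above, with cv2 and depthai imported as in your script):

      with dai.Device(pipeline) as device:
          qVideo = device.getOutputQueue("video", maxSize=1, blocking=False)
          while True:
              frame = qVideo.get().getCvFrame()
              cv2.imshow("preview", frame)
              if cv2.waitKey(1) == ord('q'):
                  break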

      Thanks,
      Jaka

      Hi @jakaskerl

      This does preserve the FOV; however, I see more pixel blur around the objects in this case compared to using the ISP output and then resizing it to the NN input size with the ImageManip node.

      This causes a decrease in NN accuracy that is substantial enough that we can't ignore it.

      I'll look into making changes to the pipeline and see if I can avoid the 12MP scaling FW issue. If not, then I think the best way for now would be to use the preview as the input to the NN and then recalculate the bounding boxes to display them on the 12MP ISP image, as sketched below.
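      Roughly, since the detections are normalized to <0..1>, remapping should just be a multiply by the ISP frame size (a sketch, assuming keepAspectRatio stays False so the preview and the ISP frame cover the same FOV):

      def denormToIsp(ispFrame, d):
          # d.xmin..d.ymax are normalized <0..1>; scale by the 12MP frame size
          h, w = ispFrame.shape[:2]
          return (int(d.xmin * w), int(d.ymin * h), int(d.xmax * w), int(d.ymax * h))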

      Thanks
      Yishu