Hi yishu_corpex
You are probably sending a separate RGB stream to the host, which is not synced with the NN passthrough.

If you need full-resolution frames on the host side, I suggest you use the Sync node and link the NN output and the RGB output to it.
The other option would be to send the NN passthrough to the host, which doesn't need any syncing, but those frames will be smaller.
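
Roughly, the Sync-based linking could look like the sketch below. This is a minimal, hedged example rather than your exact pipeline: the blob path, preview size, and stream names ("rgb", "nn", "sync_out") are placeholders, and the Yolo-specific settings still have to be filled in from your own config.

    import depthai as dai
    from datetime import timedelta

    pipeline = dai.Pipeline()

    camRgb = pipeline.create(dai.node.ColorCamera)
    nn = pipeline.create(dai.node.YoloDetectionNetwork)
    sync = pipeline.create(dai.node.Sync)
    xoutGrp = pipeline.create(dai.node.XLinkOut)
    xoutGrp.setStreamName("sync_out")

    camRgb.setPreviewSize(640, 640)       # must match the model input size
    camRgb.setInterleaved(False)
    nn.setBlobPath("model.blob")          # placeholder; also set classes/anchors/iou from your JSON
    nn.setConfidenceThreshold(0.5)

    # Messages whose timestamps fall within this window are grouped together
    sync.setSyncThreshold(timedelta(milliseconds=50))

    # Both the NN output and the full-resolution ISP output feed the Sync node
    camRgb.preview.link(nn.input)
    camRgb.isp.link(sync.inputs["rgb"])
    nn.out.link(sync.inputs["nn"])
    sync.out.link(xoutGrp.input)

    with dai.Device(pipeline) as device:
        q = device.getOutputQueue("sync_out", maxSize=4, blocking=False)
        while True:
            msgGrp = q.get()              # dai.MessageGroup with one frame and one detections message
            for name, msg in msgGrp:
                if name == "rgb":
                    frame = msg.getCvFrame()
                elif name == "nn":
                    detections = msg.detections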

Thanks,
Jaka

4 days later

Hi @jakaskerl

Yes, you are correct. I'm sending the ISP stream to the host, which doesn't seem to be synced with the NN passthrough.

I did try using the Sync node and linking the NN and ISP outputs as you suggested. However, the Sync node's output queue gets stuck right after the first frame. It freezes, which I guess is known behavior when frames aren't disposed of, as described in other posts such as frame syncing hangs.

I did experiment with setIspNumFramesPool(...) to increase the pool size, but saw the same behavior: the Sync node's output queue freezes and the script hangs on it.

Let me know how this can be resolved, and if you want, I can share the MRE for this issue with you so you can test it at your end as well.

Thanks
Yishu

    Hi @yishu_corpex

    yishu_corpex Let me know how this can be resolved, and if you want, I can share the MRE for this issue with you so you can test it at your end as well.

    Yes, this would be best so I can quickly just alter the code.

    Thanks,
    Jaka

    Hi @jakaskerl

    Please find the MRE below.

    So I am using our own custom-trained YOLOv6n model. I have shared the model blob and JSON file with you over email.

    You will require the following libraries to run the script:

    1. OpenCV Python (opencv-python)

    2. DepthAI

    3. blobconverter

    4. numpy

    5. datetime (Python standard library)

    You will see a lot of print statements in the code; they are just to help debug where exactly the script gets stuck. Also, set the model and JSON file paths as the defaults in the argument parser lines; the script will pick up those defaults when you run it from VS Code.

    import cv2
    import depthai as dai
    import numpy as np
    import argparse
    import json
    import time
    import sys
    from pathlib import Path
    import blobconverter
    from datetime import timedelta
    
    
    def getDisparityFrame(frame, cvColorMap):
        maxDisp = stereo.initialConfig.getMaxDisparity()
        disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
        disp = cv2.applyColorMap(disp, cvColorMap)
    
        return disp
    
    # parse arguments 
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", help="Provide model name or model path for inference",
                        default='models/best_ckpt_openvino_2022.1_6shave.blob', type=str)
    parser.add_argument("-c", "--config", help="Provide config path for inference",
                        default='models/best_ckpt.json', type=str)
    args = parser.parse_args()
    
    
    # parse config
    configPath = Path(args.config)
    if not configPath.exists():
        raise ValueError("Path {} does not exist!".format(configPath))
    
    with configPath.open() as f:
        config = json.load(f)
    nnConfig = config.get("nn_config", {})
    
    # parse input shape
    if "input_size" in nnConfig:
        W, H = tuple(map(int, nnConfig.get("input_size").split('x')))
    
    # extract metadata
    metadata = nnConfig.get("NN_specific_metadata", {})
    classes = metadata.get("classes", {})
    coordinates = metadata.get("coordinates", {})
    anchors = metadata.get("anchors", {})
    anchorMasks = metadata.get("anchor_masks", {})
    iouThreshold = metadata.get("iou_threshold", {})
    confidenceThreshold = metadata.get("confidence_threshold", {})
    
    #print(metadata)
    
    # parse labels
    nnMappings = config.get("mappings", {})
    labels = nnMappings.get("labels", {})
    #print("Labels: ", labels)
    
    # get model path
    nnPath = args.model
    if not Path(nnPath).exists():
        print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
        nnPath = str(blobconverter.from_zoo(args.model, shaves = 6, zoo_type = "depthai", use_cache=True))
    
    # sync outputs
    syncNN = True
    
    # Depth and Disparity
    outRectified = False  # Output and display rectified streams
    lrcheck = True  # Better handling for occlusions
    extended = False  # Closer-in minimum depth, disparity range is doubled
    subpixel = True  # Better accuracy for longer distance, fractional disparity 32-levels
    depth = True  # Display depth frames
    swap_left_right = False #Swap left right frames
    alpha = None
    resolution = dai.MonoCameraProperties.SensorResolution.THE_400_P #THE_800_P
    
    median = dai.StereoDepthProperties.MedianFilter.KERNEL_7x7 # 7x7
    
    # Create pipeline
    pipeline = dai.Pipeline()
    
    # Define sources and outputs
    camRgb = pipeline.create(dai.node.ColorCamera)
    spatialdetectionNetwork = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
    #xoutRgb = pipeline.create(dai.node.XLinkOut)
    nnOut = pipeline.create(dai.node.XLinkOut)
    
    
    # By Yishu
    xoutISP = pipeline.create(dai.node.XLinkOut)
    manip = pipeline.create(dai.node.ImageManip)
    xoutManip = pipeline.create(dai.node.XLinkOut)
    
    # Depth and Disparity
    camLeft = pipeline.create(dai.node.MonoCamera)
    camRight = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    xoutLeft = pipeline.create(dai.node.XLinkOut)
    xoutRight = pipeline.create(dai.node.XLinkOut)
    xoutDisparity = pipeline.create(dai.node.XLinkOut)
    xoutDepth = pipeline.create(dai.node.XLinkOut)
    xoutRectifLeft = pipeline.create(dai.node.XLinkOut)
    xoutRectifRight = pipeline.create(dai.node.XLinkOut)
    
    # Sync node
    sync = pipeline.create(dai.node.Sync)
    xoutGrp = pipeline.create(dai.node.XLinkOut)
    xoutGrp.setStreamName("xout")
    
    #xoutRgb.setStreamName("rgb")
    xoutISP.setStreamName("ISP")
    nnOut.setStreamName("nn")
    xoutManip.setStreamName("Manip")
    
    xoutLeft.setStreamName("left")
    xoutRight.setStreamName("right")
    xoutDisparity.setStreamName("disparity")
    xoutDepth.setStreamName("depth")
    xoutRectifLeft.setStreamName("rectifiedLeft")
    xoutRectifRight.setStreamName("rectifiedRight")
    
    # Properties
    camRgb.setPreviewSize(W, H)
    
    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP) #THE_1080_P
    camRgb.setInterleaved(False)
    camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR) # BGR color channels rather than RGB; experimentally gives fewer false positives. In accordance with model training.
    camRgb.setFps(25) #40
    
    camRgb.setIspNumFramesPool(10)
    
    # By Yishu
    manip.initialConfig.setKeepAspectRatio(False) #True
    manip.initialConfig.setResize(W, H)
    
    # Frame type: BGR rather than RGB - Yishu
    manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) #dai.ImgFrame.Type.RGB888p
    
    # setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
    manip.setMaxOutputFrameSize(1228800)
    
    # Sync node properties
    sync.setSyncThreshold(timedelta(milliseconds=50))
    
    # By Yishu
    nnOut.input.setBlocking(False)
    xoutISP.input.setBlocking(False)
    xoutManip.input.setBlocking(False)
    
    # By Yishu
    nnOut.input.setQueueSize(10)
    xoutISP.input.setQueueSize(10)
    xoutManip.input.setQueueSize(10)
    spatialdetectionNetwork.input.setQueueSize(10)
    
    # Network specific settings
    spatialdetectionNetwork.setConfidenceThreshold(confidenceThreshold)
    spatialdetectionNetwork.setNumClasses(classes)
    spatialdetectionNetwork.setCoordinateSize(coordinates)
    spatialdetectionNetwork.setAnchors(anchors)
    spatialdetectionNetwork.setAnchorMasks(anchorMasks)
    spatialdetectionNetwork.setIouThreshold(iouThreshold)
    spatialdetectionNetwork.setBlobPath(nnPath)
    spatialdetectionNetwork.setNumInferenceThreads(2)
    spatialdetectionNetwork.input.setBlocking(False)
    
    # SpatialYolo specific parameters
    spatialdetectionNetwork.setBoundingBoxScaleFactor(0.5)
    spatialdetectionNetwork.setDepthLowerThreshold(100)
    spatialdetectionNetwork.setDepthUpperThreshold(5000)
    
    # Depth and Disparity Properties
    if swap_left_right:
        camLeft.setCamera("right")
        camRight.setCamera("left")
    else:
        camLeft.setCamera("left")
        camRight.setCamera("right")
    
    for monoCam in (camLeft, camRight):  # Common config
        monoCam.setResolution(resolution)
        monoCam.setFps(2)
    
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    stereo.initialConfig.setMedianFilter(median)  # KERNEL_7x7 default
    stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
    stereo.setLeftRightCheck(lrcheck)
    stereo.setExtendedDisparity(extended)
    stereo.setSubpixel(subpixel)
    if alpha is not None:
        stereo.setAlphaScaling(alpha)
        config = stereo.initialConfig.get()
        config.postProcessing.brightnessFilter.minBrightness = 0
        stereo.initialConfig.set(config)
    
    # Alignment with RGB - Ahmad
    stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
    
    # # 4056x3040
    stereo.setOutputSize(1248, 936)
    
    # Linking
    camRgb.isp.link(manip.inputImage)
    manip.out.link(spatialdetectionNetwork.input)
    
    # By Yishu
    manip.out.link(xoutManip.input)
    
    # Linking stereo depth as an input to the spatial detection network
    stereo.depth.link(spatialdetectionNetwork.inputDepth)
    
    spatialdetectionNetwork.out.link(nnOut.input)
    camRgb.isp.link(xoutISP.input)
    
    # Syncing NN and ISP
    spatialdetectionNetwork.out.link(sync.inputs["NN_Sync"])
    camRgb.isp.link(sync.inputs["ISP_Sync"])
    sync.out.link(xoutGrp.input)
    
    # Depth and Disparity Linking
    camLeft.out.link(stereo.left)
    camRight.out.link(stereo.right)
    stereo.syncedLeft.link(xoutLeft.input)
    stereo.syncedRight.link(xoutRight.input)
    stereo.disparity.link(xoutDisparity.input)
    if depth:
        stereo.depth.link(xoutDepth.input)
    if outRectified:
        stereo.rectifiedLeft.link(xoutRectifLeft.input)
        stereo.rectifiedRight.link(xoutRectifRight.input)
    
    
    streams = ["left", "right", "ISP","nn"] #Change by ahmad
    if outRectified:
        streams.extend(["rectifiedLeft", "rectifiedRight"])
    streams.append("disparity")
    if depth:
        streams.append("depth")
    
    cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
    cvColorMap[0] = [0, 0, 0]
    
    # TakeUp Camera Static IP address
    device_info = dai.DeviceInfo("192.168.220.10")
    
    try:
        # Connect to device and start pipeline
        with dai.Device(pipeline, device_info) as device:
            # Output queues will be used to get the rgb frames and nn data from the outputs defined above
            qDet = device.getOutputQueue("nn", 4, blocking=False)
            qISP = device.getOutputQueue("ISP", 4, blocking=False)
            qManip = device.getOutputQueue("Manip", 4, blocking=False)
    
            qDisparity = device.getOutputQueue("disparity", 1, blocking=False)
            qDepth = device.getOutputQueue("depth", 1, blocking=False)
            qLeft = device.getOutputQueue("left", 1, blocking=False)
            qRight = device.getOutputQueue("right", 1, blocking=False)
    
            qSync = device.getOutputQueue("xout", 4, False)
            msgGrp = None
    
            frame = None
            m_frame = None
    
            Disparityframe = None
            Depthframe = None
            LeftFrame = None
            RightFrame = None
    
            detections = []
            startTime = time.monotonic()
            counter = 0
            color2 = (255, 255, 255)
    
            # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
            def frameNorm(frame, bbox):
                normVals = np.full(len(bbox), frame.shape[0])
                normVals[::2] = frame.shape[1]
                print(np.clip(np.array(bbox), 0, 1))
                print(normVals)
                print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
                return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
    
            def displayFrame(name, frame, detections, i):
                color_spool = (255, 0, 0)
                color_person = (0, 255, 0)
                color = ''
                color_blank_img = (255, 255, 255)
                blank_image = np.zeros((3040,4056,1), np.uint8)
                text_output_path = "D:\\Cameras_Live\\TakeUp\\" + "Label{}".format(j) + ".txt"
                #print("Detections: ", [[d.label, d.confidence *100, d.xmin, d.ymin, d.xmax, d.ymax] for d in detections])
                #text_file = open(text_output_path, 'w')
                for detection in detections:
                    bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                    if labels[detection.label] == 'person':
                        color = color_person
                    else:
                        color = color_spool
                    cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                    
                
                cv2.imwrite("E:\\Cameras_Live\\Test_writing_imgs\\Image{}.bmp".format(j), frame)
                cv2.namedWindow(name, cv2.WINDOW_NORMAL)
                cv2.imshow(name, frame)
                print("show complete")
                
    
            
            j=1
            ###Create the folder that will contain capturing
            folder_name = "D:\\Cameras_Live\\TakeUp"
            path = Path(folder_name)
    
            syncframe = None
            sync_detections = None
    
            while True:
    
                ###Create the folder that will contain capturing
                path.mkdir(parents=True, exist_ok=True)
    
                # Delay to match DA reading fps
                #time.sleep(4)
                print("0")
                msgGrp = qSync.get()
                print("1")
                # By Yishu
                inISP = qISP.get()
                print("2: ", inISP)
                inManip = qManip.get()
                print("3: ", inManip)
                inDet = qDet.get()
                print("4: ", inDet)
                inLeft = qLeft.get()
                print("5")
                inRight = qRight.get()
                print("6")
                inDepth = qDepth.get()
                print("7")
                inDisparity = qDisparity.get()
                print("8")
                
    
                if msgGrp is not None:
                    print('msgGrp: ', msgGrp)
                    for name, msg in msgGrp:
                        if name == "ISP_Sync":
                            syncframe = msg.getCvFrame()
                            print("syncframe: ")
                        if name == "NN_Sync":
                            sync_detections = msg.detections
                            print("sync_detections: ")
                        if syncframe is not None and sync_detections is not None:
                            print("Both sync not none")
                            displayFrame("sync_isp", syncframe, sync_detections, j)
                            print("display_complete")
                        
                if inManip is not None:
                    m_frame = inManip.getCvFrame()
                
                print("9")
                
                if inLeft is not None:
                    LeftFrame = inLeft.getCvFrame()
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Left{}.bmp".format(j),LeftFrame)
                
                print("10")
                if inRight is not None:
                    RightFrame = inRight.getCvFrame()
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Right{}.bmp".format(j),RightFrame)
    
                print("11")
                if inDepth is not None:
                    Depthframe = inDepth.getCvFrame()
                    Depthframe = Depthframe.astype(np.uint16)
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Depth{}.png".format(j),Depthframe)
    
                print("12")
                if inDisparity is not None:
                    Disparityframe = inDisparity.getCvFrame()
                    Disparityframe = getDisparityFrame(Disparityframe, cvColorMap)
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Disparity{}.bmp".format(j),Disparityframe)
                
                print("12")
                if cv2.waitKey(1) == ord('q'):
                    break
    
                j=j+1
                '''if j==5:
                    j=1'''
            
    except Exception as e:
            print(f"An error occurred: {e}")
            sys.exit(0)

    Thanks
    Yishu

    Hi @yishu_corpex
    I removed the Sync node since it was causing issues; I didn't change anything else, but the RGB and NN seem to be in sync:

    
    import cv2
    import depthai as dai
    import numpy as np
    import argparse
    import json
    import time
    import sys
    from pathlib import Path
    import blobconverter
    from datetime import timedelta
    
    
    
    
    def getDisparityFrame(frame, cvColorMap):
        maxDisp = stereo.initialConfig.getMaxDisparity()
        disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
        disp = cv2.applyColorMap(disp, cvColorMap)
    
        return disp
    
    # parse arguments 
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", help="Provide model name or model path for inference",
                        default='models/best_ckpt_openvino_2022.1_6shave.blob', type=str)
    parser.add_argument("-c", "--config", help="Provide config path for inference",
                        default='models/best_ckpt.json', type=str)
    args = parser.parse_args()
    
    
    # parse config
    configPath = Path(args.config)
    if not configPath.exists():
        raise ValueError("Path {} does not exist!".format(configPath))
    
    with configPath.open() as f:
        config = json.load(f)
    nnConfig = config.get("nn_config", {})
    
    # parse input shape
    if "input_size" in nnConfig:
        W, H = tuple(map(int, nnConfig.get("input_size").split('x')))
    
    # extract metadata
    metadata = nnConfig.get("NN_specific_metadata", {})
    classes = metadata.get("classes", {})
    coordinates = metadata.get("coordinates", {})
    anchors = metadata.get("anchors", {})
    anchorMasks = metadata.get("anchor_masks", {})
    iouThreshold = metadata.get("iou_threshold", {})
    confidenceThreshold = metadata.get("confidence_threshold", {})
    
    #print(metadata)
    
    # parse labels
    nnMappings = config.get("mappings", {})
    labels = nnMappings.get("labels", {})
    #print("Labels: ", labels)
    
    # get model path
    nnPath = args.model
    if not Path(nnPath).exists():
        print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
        nnPath = str(blobconverter.from_zoo(args.model, shaves = 6, zoo_type = "depthai", use_cache=True))
    
    # sync outputs
    syncNN = True
    
    # Depth and Disparity
    outRectified = False  # Output and display rectified streams
    lrcheck = True  # Better handling for occlusions
    extended = False  # Closer-in minimum depth, disparity range is doubled
    subpixel = True  # Better accuracy for longer distance, fractional disparity 32-levels
    depth = True  # Display depth frames
    swap_left_right = False #Swap left right frames
    alpha = None
    resolution = dai.MonoCameraProperties.SensorResolution.THE_400_P #THE_800_P
    
    median = dai.StereoDepthProperties.MedianFilter.KERNEL_7x7 # 7x7
    
    # Create pipeline
    pipeline = dai.Pipeline()
    
    # Define sources and outputs
    camRgb = pipeline.create(dai.node.ColorCamera)
    spatialdetectionNetwork = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
    #xoutRgb = pipeline.create(dai.node.XLinkOut)
    nnOut = pipeline.create(dai.node.XLinkOut)
    
    
    # By Yishu
    xoutISP = pipeline.create(dai.node.XLinkOut)
    manip = pipeline.create(dai.node.ImageManip)
    xoutManip = pipeline.create(dai.node.XLinkOut)
    
    # Depth and Disparity
    camLeft = pipeline.create(dai.node.MonoCamera)
    camRight = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    xoutLeft = pipeline.create(dai.node.XLinkOut)
    xoutRight = pipeline.create(dai.node.XLinkOut)
    xoutDepth = pipeline.create(dai.node.XLinkOut)
    xoutRectifLeft = pipeline.create(dai.node.XLinkOut)
    xoutRectifRight = pipeline.create(dai.node.XLinkOut)
    
    # Sync node
    
    
    #xoutRgb.setStreamName("rgb")
    xoutISP.setStreamName("ISP")
    nnOut.setStreamName("nn")
    xoutManip.setStreamName("Manip")
    
    xoutLeft.setStreamName("left")
    xoutRight.setStreamName("right")
    xoutDepth.setStreamName("depth")
    xoutRectifLeft.setStreamName("rectifiedLeft")
    xoutRectifRight.setStreamName("rectifiedRight")
    
    # Properties
    camRgb.setPreviewSize(W, H)
    
    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P) #THE_1080_P
    camRgb.setInterleaved(False)
    camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR) # BGR color channels rather than RGB; experimentally gives fewer false positives. In accordance with model training.
    camRgb.setFps(30) #40
    
    camRgb.setIspNumFramesPool(10)
    
    # By Yishu
    manip.initialConfig.setKeepAspectRatio(False) #True
    manip.initialConfig.setResize(W, H)
    
    # Frame type: BGR rather than RGB - Yishu
    manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) #dai.ImgFrame.Type.RGB888p
    
    # setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
    manip.setMaxOutputFrameSize(1228800)
    
    
    # By Yishu
    nnOut.input.setBlocking(False)
    xoutISP.input.setBlocking(False)
    xoutManip.input.setBlocking(False)
    
    # By Yishu
    nnOut.input.setQueueSize(10)
    xoutISP.input.setQueueSize(10)
    xoutManip.input.setQueueSize(10)
    spatialdetectionNetwork.input.setQueueSize(10)
    
    # Network specific settings
    spatialdetectionNetwork.setConfidenceThreshold(confidenceThreshold)
    spatialdetectionNetwork.setNumClasses(classes)
    spatialdetectionNetwork.setCoordinateSize(coordinates)
    spatialdetectionNetwork.setAnchors(anchors)
    spatialdetectionNetwork.setAnchorMasks(anchorMasks)
    spatialdetectionNetwork.setIouThreshold(iouThreshold)
    spatialdetectionNetwork.setBlobPath(nnPath)
    spatialdetectionNetwork.setNumInferenceThreads(2)
    spatialdetectionNetwork.input.setBlocking(False)
    
    # SpatialYolo specific parameters
    spatialdetectionNetwork.setBoundingBoxScaleFactor(0.5)
    spatialdetectionNetwork.setDepthLowerThreshold(100)
    spatialdetectionNetwork.setDepthUpperThreshold(5000)
    
    # Depth and Disparity Properties
    if swap_left_right:
        camLeft.setCamera("right")
        camRight.setCamera("left")
    else:
        camLeft.setCamera("left")
        camRight.setCamera("right")
    
    for monoCam in (camLeft, camRight):  # Common config
        monoCam.setResolution(resolution)
        monoCam.setFps(2)
    
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    stereo.initialConfig.setMedianFilter(median)  # KERNEL_7x7 default
    stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
    stereo.setLeftRightCheck(lrcheck)
    stereo.setExtendedDisparity(extended)
    stereo.setSubpixel(subpixel)
    if alpha is not None:
        stereo.setAlphaScaling(alpha)
        config = stereo.initialConfig.get()
        config.postProcessing.brightnessFilter.minBrightness = 0
        stereo.initialConfig.set(config)
    
    # Alignment with RGB - Ahmad
    stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
    
    # # 4056x3040
    stereo.setOutputSize(1248, 936)
    
    # Linking
    camRgb.video.link(manip.inputImage)
    manip.out.link(spatialdetectionNetwork.input)
    
    # By Yishu
    manip.out.link(xoutManip.input)
    
    # Linking stereo depth as an input to the spatial detection network
    stereo.depth.link(spatialdetectionNetwork.inputDepth)
    
    spatialdetectionNetwork.out.link(nnOut.input)
    camRgb.isp.link(xoutISP.input)
    
    # Depth and Disparity Linking
    camLeft.out.link(stereo.left)
    camRight.out.link(stereo.right)
    stereo.syncedLeft.link(xoutLeft.input)
    stereo.syncedRight.link(xoutRight.input)
    if depth:
        stereo.depth.link(xoutDepth.input)
    if outRectified:
        stereo.rectifiedLeft.link(xoutRectifLeft.input)
        stereo.rectifiedRight.link(xoutRectifRight.input)
    
    # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
    def frameNorm(frame, bbox):
        normVals = np.full(len(bbox), frame.shape[0])
        normVals[::2] = frame.shape[1]
        print(np.clip(np.array(bbox), 0, 1))
        print(normVals)
        print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
        return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
    
    def displayFrame(name, frame, detections, i):
        color_spool = (255, 0, 0)
        color_person = (0, 255, 0)
        color = ''
        
        #print("Detections: ", [[d.label, d.confidence *100, d.xmin, d.ymin, d.xmax, d.ymax] for d in detections])
        #text_file = open(text_output_path, 'w')
        for detection in detections:
            bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
            if labels[detection.label] == 'person':
                color = color_person
            else:
                color = color_spool
            cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
            cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
            cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
            
        
        cv2.imwrite("E:\\Cameras_Live\\Test_writing_imgs\\Image{}.bmp".format(j), frame)
        cv2.namedWindow(name, cv2.WINDOW_NORMAL)
        cv2.imshow(name, frame)
        print("show complete")
    
    
    streams = ["left", "right", "ISP","nn"] #Change by ahmad
    if outRectified:
        streams.extend(["rectifiedLeft", "rectifiedRight"])
    if depth:
        streams.append("depth")
    
    cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
    cvColorMap[0] = [0, 0, 0]
    
    # TakeUp Camera Static IP address
    device_info = dai.DeviceInfo()
    
    try:
        # Connect to device and start pipeline
        with dai.Device(pipeline, device_info) as device:
            # Output queues will be used to get the rgb frames and nn data from the outputs defined above
            qDet = device.getOutputQueue("nn", 4, blocking=False)
            qISP = device.getOutputQueue("ISP", 4, blocking=False)
            qManip = device.getOutputQueue("Manip", 4, blocking=False)
    
            qDepth = device.getOutputQueue("depth", 1, blocking=False)
            qLeft = device.getOutputQueue("left", 1, blocking=False)
            qRight = device.getOutputQueue("right", 1, blocking=False)
    
            frame = None
            m_frame = None
            Depthframe = None
            LeftFrame = None
            RightFrame = None
    
            detections = []
            startTime = time.monotonic()
            counter = 0
            color2 = (255, 255, 255)
    
            # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
            def frameNorm(frame, bbox):
                normVals = np.full(len(bbox), frame.shape[0])
                normVals[::2] = frame.shape[1]
                print(np.clip(np.array(bbox), 0, 1))
                print(normVals)
                print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
                return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
    
            def displayFrame(name, frame, detections, i):
                color_spool = (255, 0, 0)
                color_person = (0, 255, 0)
                color = ''
                color_blank_img = (255, 255, 255)
                blank_image = np.zeros((3040,4056,1), np.uint8)
                text_output_path = "D:\\Cameras_Live\\TakeUp\\" + "Label{}".format(j) + ".txt"
                #print("Detections: ", [[d.label, d.confidence *100, d.xmin, d.ymin, d.xmax, d.ymax] for d in detections])
                #text_file = open(text_output_path, 'w')
                for detection in detections:
                    bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                    if labels[detection.label] == 'person':
                        color = color_person
                    else:
                        color = color_spool
                    cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                    
                
                cv2.imwrite("E:\\Cameras_Live\\Test_writing_imgs\\Image{}.bmp".format(j), frame)
                cv2.namedWindow(name, cv2.WINDOW_NORMAL)
                cv2.imshow(name, frame)
                print("show complete")
                
    
            
            j=1
            ###Create the folder that will contain capturing
            folder_name = "D:\\Cameras_Live\\TakeUp"
            path = Path(folder_name)
    
            syncframe = None
            sync_detections = None
    
            while True:
    
                ###Create the folder that will contain capturing
                
    
                # Delay to match DA reading fps
                #time.sleep(4)
    
                start_time = time.time()
    
                # Print and reset timing
                end_time = time.time()
                print("1 - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                # By Yishu
                inISP = qISP.get()
    
                end_time = time.time()
                print("2: ", inISP, " - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                inManip = qManip.get()
    
                end_time = time.time()
                print("3: ", inManip, " - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                inDet = qDet.get()
    
                end_time = time.time()
                print("4: ", inDet, " - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                inLeft = qLeft.get()
    
                end_time = time.time()
                print("5 - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                inRight = qRight.get()
    
                end_time = time.time()
                print("6 - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                inDepth = qDepth.get()
    
                end_time = time.time()
                print("7 - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                displayFrame("Color", inISP.getCvFrame(), inDet.detections, j)
    
        
    
                end_time = time.time()
                print("8 - Time taken: {:.6f} seconds".format(end_time - start_time))
    
                
                if inManip is not None:
                    m_frame = inManip.getCvFrame()
                
                print("9")
                
                if inLeft is not None:
                    LeftFrame = inLeft.getCvFrame()
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Left{}.bmp".format(j),LeftFrame)
                
                print("10")
                if inRight is not None:
                    RightFrame = inRight.getCvFrame()
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Right{}.bmp".format(j),RightFrame)
    
                print("11")
                if inDepth is not None:
                    Depthframe = inDepth.getCvFrame()
                    Depthframe = Depthframe.astype(np.uint16)
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Depth{}.png".format(j),Depthframe)
    
                print("12")
                if cv2.waitKey(1) == ord('q'):
                    break
    
                j=j+1
                '''if j==5:
                    j=1'''
            
    except Exception as e:
            print(f"An error occurred: {e}")
            sys.exit(0)

    Hi @jakaskerl

    I would say the RGB (ISP frame) and NN are almost in sync. I do see the bounding boxes slightly out of sync in the case of moving objects or people. I'm not sure why that is happening, though.

    Also, may I know what issues the Sync node was causing, and why?

    Also, when I calculate the NN FPS using the following lines of code (similar to other DepthAI examples), I see it maxes out at 1.9 FPS.

    # startTime_nn is initialized before the while True loop
    
    if inDet is not None:
        counter += 1
    
    displayFrame("Color", inISP.getCvFrame(), inDet.detections, j)
    nn_fps = counter / (time.monotonic() - startTime_nn)
    print("nn_fps: ", nn_fps)

    I wanted to ask if you could look into the script and suggest any changes to improve on that, given that we are using the 12MP resolution. I know that when I reduce the resolution to 1080p, the NN FPS does bump up to around 8-10 FPS.

    Thanks
    Yishu

    4 days later

    Hi @yishu_corpex
    Seems like XLink was stressed out, so I removed the unsynced outputs and changed all inputs to non-blocking. I can get about 10 FPS overall.

    PRO-TIP: use the pipeline_graph tool; it comes in handy in case of pipeline failures.

    
    import cv2
    import depthai as dai
    import numpy as np
    import argparse
    import json
    import time
    import sys
    from pathlib import Path
    import blobconverter
    from datetime import timedelta
    
    
    def getDisparityFrame(frame, cvColorMap):
        maxDisp = stereo.initialConfig.getMaxDisparity()
        disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
        disp = cv2.applyColorMap(disp, cvColorMap)
    
        return disp
    
    # parse arguments 
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", help="Provide model name or model path for inference",
                        default='models/best_ckpt_openvino_2022.1_6shave.blob', type=str)
    parser.add_argument("-c", "--config", help="Provide config path for inference",
                        default='models/best_ckpt.json', type=str)
    args = parser.parse_args()
    
    
    # parse config
    configPath = Path(args.config)
    if not configPath.exists():
        raise ValueError("Path {} does not exist!".format(configPath))
    
    with configPath.open() as f:
        config = json.load(f)
    nnConfig = config.get("nn_config", {})
    
    # parse input shape
    if "input_size" in nnConfig:
        W, H = tuple(map(int, nnConfig.get("input_size").split('x')))
    
    # extract metadata
    metadata = nnConfig.get("NN_specific_metadata", {})
    classes = metadata.get("classes", {})
    coordinates = metadata.get("coordinates", {})
    anchors = metadata.get("anchors", {})
    anchorMasks = metadata.get("anchor_masks", {})
    iouThreshold = metadata.get("iou_threshold", {})
    confidenceThreshold = metadata.get("confidence_threshold", {})
    
    #print(metadata)
    
    # parse labels
    nnMappings = config.get("mappings", {})
    labels = nnMappings.get("labels", {})
    #print("Labels: ", labels)
    
    # get model path
    nnPath = args.model
    if not Path(nnPath).exists():
        print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
        nnPath = str(blobconverter.from_zoo(args.model, shaves = 6, zoo_type = "depthai", use_cache=True))
    
    # sync outputs
    syncNN = True
    
    # Depth and Disparity
    outRectified = False  # Output and display rectified streams
    lrcheck = True  # Better handling for occlusions
    extended = False  # Closer-in minimum depth, disparity range is doubled
    subpixel = True  # Better accuracy for longer distance, fractional disparity 32-levels
    depth = True  # Display depth frames
    swap_left_right = False #Swap left right frames
    alpha = None
    resolution = dai.MonoCameraProperties.SensorResolution.THE_400_P #THE_800_P
    
    median = dai.StereoDepthProperties.MedianFilter.KERNEL_7x7 # 7x7
    
    # Create pipeline
    pipeline = dai.Pipeline()
    
    # Define sources and outputs
    camRgb = pipeline.create(dai.node.ColorCamera)
    spatialdetectionNetwork = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
    #xoutRgb = pipeline.create(dai.node.XLinkOut)
    nnOut = pipeline.create(dai.node.XLinkOut)
    
    
    
    manip = pipeline.create(dai.node.ImageManip)
    
    # Depth and Disparity
    camLeft = pipeline.create(dai.node.MonoCamera)
    camRight = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    
    
    
    # Sync node
    sync = pipeline.create(dai.node.Sync)
    xoutGrp = pipeline.create(dai.node.XLinkOut)
    xoutGrp.setStreamName("xout")
    
    #xoutRgb.setStreamName("rgb")
    
    nnOut.setStreamName("nn")
    
    # Properties
    camRgb.setPreviewSize(W, H)
    
    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_4_K) #THE_1080_P
    camRgb.setInterleaved(False)
    camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR) # BGR color channels rather than RGB; experimentally gives fewer false positives. In accordance with model training.
    camRgb.setFps(15) #40
    
    
    # By Yishu
    manip.initialConfig.setKeepAspectRatio(False) #True
    manip.initialConfig.setResize(W, H)
    
    # Frame type: BGR rather than RGB - Yishu
    manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) #dai.ImgFrame.Type.RGB888p
    
    # setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
    manip.setMaxOutputFrameSize(1228800)
    
    # Sync node properties
    sync.setSyncThreshold(timedelta(milliseconds=100))
    
    # By Yishu
    nnOut.input.setBlocking(False)
    
    
    # Network specific settings
    spatialdetectionNetwork.setConfidenceThreshold(confidenceThreshold)
    spatialdetectionNetwork.setNumClasses(classes)
    spatialdetectionNetwork.setCoordinateSize(coordinates)
    spatialdetectionNetwork.setAnchors(anchors)
    spatialdetectionNetwork.setAnchorMasks(anchorMasks)
    spatialdetectionNetwork.setIouThreshold(iouThreshold)
    spatialdetectionNetwork.setBlobPath(nnPath)
    spatialdetectionNetwork.setNumInferenceThreads(2)
    spatialdetectionNetwork.input.setBlocking(False)
    
    # SpatialYolo specific parameters
    spatialdetectionNetwork.setBoundingBoxScaleFactor(0.5)
    spatialdetectionNetwork.setDepthLowerThreshold(100)
    spatialdetectionNetwork.setDepthUpperThreshold(5000)
    
    # Depth and Disparity Properties
    if swap_left_right:
        camLeft.setCamera("right")
        camRight.setCamera("left")
    else:
        camLeft.setCamera("left")
        camRight.setCamera("right")
    
    for monoCam in (camLeft, camRight):  # Common config
        monoCam.setResolution(resolution)
        monoCam.setFps(15)
    
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    stereo.initialConfig.setMedianFilter(median)  # KERNEL_7x7 default
    stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
    stereo.setLeftRightCheck(lrcheck)
    stereo.setExtendedDisparity(extended)
    stereo.setSubpixel(subpixel)
    if alpha is not None:
        stereo.setAlphaScaling(alpha)
        config = stereo.initialConfig.get()
        config.postProcessing.brightnessFilter.minBrightness = 0
        stereo.initialConfig.set(config)
    
    # Alignment with RGB - Ahmad
    stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
    
    # # 4056x3040
    stereo.setOutputSize(1248, 936)
    
    # Linking
    camRgb.isp.link(manip.inputImage)
    manip.inputImage.setBlocking(False)
    manip.inputImage.setQueueSize(10)
    manip.out.link(spatialdetectionNetwork.input)
    
    # By Yishu
    
    # Linking stereo depth as an input to the spatial detection network
    stereo.depth.link(spatialdetectionNetwork.inputDepth)
    
    spatialdetectionNetwork.out.link(nnOut.input)
    
    # Syncing NN and ISP
    spatialdetectionNetwork.out.link(sync.inputs["NN_Sync"])
    camRgb.isp.link(sync.inputs["ISP_Sync"])
    
    sync.inputs["NN_Sync"].setBlocking(False)
    sync.inputs["ISP_Sync"].setBlocking(False)
    
    sync.out.link(xoutGrp.input)
    xoutGrp.input.setBlocking(False)
    
    # Depth and Disparity Linking
    camLeft.out.link(stereo.left)
    camRight.out.link(stereo.right)
    
    
    
    
    streams = ["left", "right", "ISP","nn"] #Change by ahmad
    if outRectified:
        streams.extend(["rectifiedLeft", "rectifiedRight"])
    streams.append("disparity")
    if depth:
        streams.append("depth")
    
    cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
    cvColorMap[0] = [0, 0, 0]
    
    # TakeUp Camera Static IP address
    device_info = dai.DeviceInfo("")
    
    try:
        # Connect to device and start pipeline
        with dai.Device(pipeline, device_info) as device:
            # Output queues will be used to get the rgb frames and nn data from the outputs defined above
            qDet = device.getOutputQueue("nn", 4, blocking=False)
    
            qSync = device.getOutputQueue("xout", 4, False)
            msgGrp = None
    
            frame = None
            m_frame = None
    
            Disparityframe = None
            Depthframe = None
            LeftFrame = None
            RightFrame = None
    
            detections = []
            startTime = time.monotonic()
            counter = 0
            color2 = (255, 255, 255)
    
            # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
            def frameNorm(frame, bbox):
                normVals = np.full(len(bbox), frame.shape[0])
                normVals[::2] = frame.shape[1]
                print(np.clip(np.array(bbox), 0, 1))
                print(normVals)
                print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
                return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
    
            def displayFrame(name, frame, detections, i):
                color_spool = (255, 0, 0)
                color_person = (0, 255, 0)
                color = ''
                
                for detection in detections:
                    bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                    if labels[detection.label] == 'person':
                        color = color_person
                    else:
                        color = color_spool
                    cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                    
                cv2.namedWindow(name, cv2.WINDOW_NORMAL)
                cv2.imshow(name, frame)
                print("show complete")
                
    
            
            j=1
            ###Create the folder that will contain capturing
    
            syncframe = None
            sync_detections = None
    
            while True:
                # Delay to match DA reading fps
                #time.sleep(4)
                print("0")
                msgGrp = qSync.get()
                print("1")
                inDet = qDet.get()
                print("4: ", inDet)
                
    
                if msgGrp is not None:
                    print('msgGrp: ', msgGrp)
                    for name, msg in msgGrp:
                        if name == "ISP_Sync":
                            syncframe = msg.getCvFrame()
                            print("syncframe: ")
                        if name == "NN_Sync":
                            sync_detections = msg.detections
                            print("sync_detections: ")
                        if syncframe is not None and sync_detections is not None:
                            print("Both sync not none")
                            displayFrame("sync_isp", syncframe, sync_detections, j)
                            print("display_complete")
                        
                
                print("9")
                print("12")
                if cv2.waitKey(1) == ord('q'):
                    break
    
                j=j+1
                '''if j==5:
                    j=1'''
            
    except Exception as e:
            print(f"An error occurred: {e}")
            sys.exit(0)

    Thanks,
    Jaka

    Hi @jakaskerl

    Thanks for your response. I did use the pipeline graph earlier; however, I ran into this pipeline_graph issue and wasn't sure whether it has been resolved yet.

    I did try your updated script; it seems to work well at 4K resolution, but it gets stuck after a few frames at 12MP resolution, which we have been using lately. Is that because the 12MP resolution uses a lot of resources on board?

    Thanks
    Yishu

    Hi @yishu_corpex
    I tried with 12MP and the pipeline doesn't crash. Are you using a PoE device by any chance (or a low-bandwidth USB connection)? You can check in the pipeline graph how the FPS slowly starts to decline because the XLink nodes can't send frames fast enough.

    Thanks,
    Jaka

    Hi @jakaskerl

    I tried with a 3D USB camera and the pipeline does not crash.

    However, for our deployment we will be using a 3D PoE device and when I test on the PoE device, the pipeline crashes. I can see the FPS slowly declining and eventually becoming 0 across nodes as shown below.

    So does it mean that, because we are using a PoE device, the XLink nodes can't send frames fast enough due to the PoE bandwidth bottleneck, and thus the pipeline eventually crashes?

    Also, I see the FPS eventually dropping to 0 across all the nodes. Why does that happen?

    Thanks
    Yishu

      yishu_corpex So does it mean that, because we are using a PoE device, the XLink nodes can't send frames fast enough due to the PoE bandwidth bottleneck, and thus the pipeline eventually crashes?

      Yes, you are sending 12MP video to the host, which is too much for the 1 Gbps bandwidth to handle. Any chance you can lower the resolution? If FOV is a concern, there might be some workarounds, for example the sketch below.
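
      One such workaround is to downscale the ISP output on-device with setIspScale(), which keeps the full sensor FOV while sending far fewer bytes over the PoE link. A minimal sketch (hedged: it assumes the 12MP RGB sensor from the scripts above, and the exact scale ratio has to be one your sensor/ISP supports):

      import depthai as dai

      pipeline = dai.Pipeline()
      camRgb = pipeline.create(dai.node.ColorCamera)
      xoutISP = pipeline.create(dai.node.XLinkOut)
      xoutISP.setStreamName("ISP")

      camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
      # Scale the ISP output by 1/2 (numerator, denominator): 4056x3040 -> roughly 2028x1520.
      # Full FOV is preserved, but only a quarter of the pixels cross the link.
      camRgb.setIspScale(1, 2)
      camRgb.isp.link(xoutISP.input)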

      yishu_corpex Also, I see the FPS eventually dropping to 0 across all the nodes. Why does that happen?

      The queues slowly fill up because they are not read quickly enough, so the entire pipeline backs up and blocks. To avoid that, keep every hop non-blocking with small queue sizes, as in the sketch below, so stale frames are dropped instead of stalling the upstream nodes.
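
      A minimal sketch of that idea (hedged: it reuses xoutISP, pipeline and device_info from the scripts above rather than being a standalone program):

      # Device side: non-blocking XLinkOut input with a small queue, so old frames
      # are dropped on the device instead of backing up the whole pipeline.
      xoutISP.input.setBlocking(False)
      xoutISP.input.setQueueSize(2)

      with dai.Device(pipeline, device_info) as device:
          # Host side: small, non-blocking queue as well.
          qISP = device.getOutputQueue("ISP", maxSize=2, blocking=False)
          while True:
              inISP = qISP.tryGet()   # non-blocking read; returns None if nothing is ready yet
              if inISP is not None:
                  frame = inISP.getCvFrame()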

      Thanks,
      Jaka