Hi yishu_corpex
You are probably sending a separate RGB stream to the host, which is not synced with the NN passthrough.

If you need full-resolution frames on the host side, I suggest you use the Sync node and link the NN output and the RGB output to it.
The other option would be to send the NN passthrough to the host, which doesn't need any syncing, but those frames will be smaller.
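
Roughly, the Sync-based linking could look like the sketch below. This is a minimal, hedged example rather than your exact pipeline: the blob path, preview size, and stream names ("rgb", "nn", "sync_out") are placeholders, and the Yolo-specific settings still have to be filled in from your own config.

    import depthai as dai
    from datetime import timedelta

    pipeline = dai.Pipeline()

    camRgb = pipeline.create(dai.node.ColorCamera)
    nn = pipeline.create(dai.node.YoloDetectionNetwork)
    sync = pipeline.create(dai.node.Sync)
    xoutGrp = pipeline.create(dai.node.XLinkOut)
    xoutGrp.setStreamName("sync_out")

    camRgb.setPreviewSize(640, 640)       # must match the model input size
    camRgb.setInterleaved(False)
    nn.setBlobPath("model.blob")          # placeholder; also set classes/anchors/iou from your JSON
    nn.setConfidenceThreshold(0.5)

    # Messages whose timestamps fall within this window are grouped together
    sync.setSyncThreshold(timedelta(milliseconds=50))

    # Both the NN output and the full-resolution ISP output feed the Sync node
    camRgb.preview.link(nn.input)
    camRgb.isp.link(sync.inputs["rgb"])
    nn.out.link(sync.inputs["nn"])
    sync.out.link(xoutGrp.input)

    with dai.Device(pipeline) as device:
        q = device.getOutputQueue("sync_out", maxSize=4, blocking=False)
        while True:
            msgGrp = q.get()              # dai.MessageGroup with one frame and one detections message
            for name, msg in msgGrp:
                if name == "rgb":
                    frame = msg.getCvFrame()
                elif name == "nn":
                    detections = msg.detections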

Thanks,
Jaka

4 days later

Hi @jakaskerl

Yes, you are correct. I'm sending the ISP stream to the host, which doesn't seem to be synced with the NN passthrough.

I did try using the Sync node and linking the NN and ISP outputs as you suggested. However, the Sync node's output queue gets stuck right after the first frame. It freezes, which I guess is known behavior when frames aren't disposed of, as described in other posts such as frame syncing hangs.

I did experiment with setIspNumFramesPool(...) to increase the pool size, but saw the same behavior: the Sync node's output queue freezes and the script hangs on it.

Let me know how this can be resolved, and if you want, I can share the MRE for this issue with you so you can test it at your end as well.

Thanks
Yishu

    Hi @yishu_corpex

    yishu_corpex Let me know how this can be resolved, and if you want, I can share the MRE for this issue with you so you can test it at your end as well.

    Yes, this would be best so I can quickly just alter the code.

    Thanks,
    Jaka

    Hi @jakaskerl

    Please find the MRE below.

    So I am using our own custom-trained YOLOv6n model. I have shared the model blob and JSON file with you over email.

    You will require the following libraries to run the script:

    1. OpenCV Python (opencv-python)

    2. DepthAI

    3. blobconverter

    4. numpy

    5. datetime (Python standard library)

    You will see a lot of print statements in the code; they are just to help debug where exactly the script gets stuck. Also, set the model and JSON file paths as the defaults in the argument parser lines; the script will pick up those defaults when you run it from VS Code.

    import cv2
    import depthai as dai
    import numpy as np
    import argparse
    import json
    import time
    import sys
    from pathlib import Path
    import blobconverter
    from datetime import timedelta
    
    
    def getDisparityFrame(frame, cvColorMap):
        maxDisp = stereo.initialConfig.getMaxDisparity()
        disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
        disp = cv2.applyColorMap(disp, cvColorMap)
    
        return disp
    
    # parse arguments 
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", help="Provide model name or model path for inference",
                        default='models/best_ckpt_openvino_2022.1_6shave.blob', type=str)
    parser.add_argument("-c", "--config", help="Provide config path for inference",
                        default='models/best_ckpt.json', type=str)
    args = parser.parse_args()
    
    
    # parse config
    configPath = Path(args.config)
    if not configPath.exists():
        raise ValueError("Path {} does not exist!".format(configPath))
    
    with configPath.open() as f:
        config = json.load(f)
    nnConfig = config.get("nn_config", {})
    
    # parse input shape
    if "input_size" in nnConfig:
        W, H = tuple(map(int, nnConfig.get("input_size").split('x')))
    
    # extract metadata
    metadata = nnConfig.get("NN_specific_metadata", {})
    classes = metadata.get("classes", {})
    coordinates = metadata.get("coordinates", {})
    anchors = metadata.get("anchors", {})
    anchorMasks = metadata.get("anchor_masks", {})
    iouThreshold = metadata.get("iou_threshold", {})
    confidenceThreshold = metadata.get("confidence_threshold", {})
    
    #print(metadata)
    
    # parse labels
    nnMappings = config.get("mappings", {})
    labels = nnMappings.get("labels", {})
    #print("Labels: ", labels)
    
    # get model path
    nnPath = args.model
    if not Path(nnPath).exists():
        print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
        nnPath = str(blobconverter.from_zoo(args.model, shaves = 6, zoo_type = "depthai", use_cache=True))
    
    # sync outputs
    syncNN = True
    
    # Depth and Disparity
    outRectified = False  # Output and display rectified streams
    lrcheck = True  # Better handling for occlusions
    extended = False  # Closer-in minimum depth, disparity range is doubled
    subpixel = True  # Better accuracy for longer distance, fractional disparity 32-levels
    depth = True  # Display depth frames
    swap_left_right = False #Swap left right frames
    alpha = None
    resolution = dai.MonoCameraProperties.SensorResolution.THE_400_P #THE_800_P
    
    median = dai.StereoDepthProperties.MedianFilter.KERNEL_7x7 # 7x7
    
    # Create pipeline
    pipeline = dai.Pipeline()
    
    # Define sources and outputs
    camRgb = pipeline.create(dai.node.ColorCamera)
    spatialdetectionNetwork = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
    #xoutRgb = pipeline.create(dai.node.XLinkOut)
    nnOut = pipeline.create(dai.node.XLinkOut)
    
    
    # By Yishu
    xoutISP = pipeline.create(dai.node.XLinkOut)
    manip = pipeline.create(dai.node.ImageManip)
    xoutManip = pipeline.create(dai.node.XLinkOut)
    
    # Depth and Disparity
    camLeft = pipeline.create(dai.node.MonoCamera)
    camRight = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    xoutLeft = pipeline.create(dai.node.XLinkOut)
    xoutRight = pipeline.create(dai.node.XLinkOut)
    xoutDisparity = pipeline.create(dai.node.XLinkOut)
    xoutDepth = pipeline.create(dai.node.XLinkOut)
    xoutRectifLeft = pipeline.create(dai.node.XLinkOut)
    xoutRectifRight = pipeline.create(dai.node.XLinkOut)
    
    # Sync node
    sync = pipeline.create(dai.node.Sync)
    xoutGrp = pipeline.create(dai.node.XLinkOut)
    xoutGrp.setStreamName("xout")
    
    #xoutRgb.setStreamName("rgb")
    xoutISP.setStreamName("ISP")
    nnOut.setStreamName("nn")
    xoutManip.setStreamName("Manip")
    
    xoutLeft.setStreamName("left")
    xoutRight.setStreamName("right")
    xoutDisparity.setStreamName("disparity")
    xoutDepth.setStreamName("depth")
    xoutRectifLeft.setStreamName("rectifiedLeft")
    xoutRectifRight.setStreamName("rectifiedRight")
    
    # Properties
    camRgb.setPreviewSize(W, H)
    
    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP) #THE_1080_P
    camRgb.setInterleaved(False)
    camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR) # BGR color channels rather than RGB; experimentally gives fewer false positives. In accordance with model training.
    camRgb.setFps(25) #40
    
    camRgb.setIspNumFramesPool(10)
    
    # By Yishu
    manip.initialConfig.setKeepAspectRatio(False) #True
    manip.initialConfig.setResize(W, H)
    
    # Frame type: BGR rather than RGB - Yishu
    manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) #dai.ImgFrame.Type.RGB888p
    
    # setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
    manip.setMaxOutputFrameSize(1228800)
    
    # Sync node properties
    sync.setSyncThreshold(timedelta(milliseconds=50))
    
    # By Yishu
    nnOut.input.setBlocking(False)
    xoutISP.input.setBlocking(False)
    xoutManip.input.setBlocking(False)
    
    # By Yishu
    nnOut.input.setQueueSize(10)
    xoutISP.input.setQueueSize(10)
    xoutManip.input.setQueueSize(10)
    spatialdetectionNetwork.input.setQueueSize(10)
    
    # Network specific settings
    spatialdetectionNetwork.setConfidenceThreshold(confidenceThreshold)
    spatialdetectionNetwork.setNumClasses(classes)
    spatialdetectionNetwork.setCoordinateSize(coordinates)
    spatialdetectionNetwork.setAnchors(anchors)
    spatialdetectionNetwork.setAnchorMasks(anchorMasks)
    spatialdetectionNetwork.setIouThreshold(iouThreshold)
    spatialdetectionNetwork.setBlobPath(nnPath)
    spatialdetectionNetwork.setNumInferenceThreads(2)
    spatialdetectionNetwork.input.setBlocking(False)
    
    # SpatialYolo specific parameters
    spatialdetectionNetwork.setBoundingBoxScaleFactor(0.5)
    spatialdetectionNetwork.setDepthLowerThreshold(100)
    spatialdetectionNetwork.setDepthUpperThreshold(5000)
    
    # Depth and Disparity Properties
    if swap_left_right:
        camLeft.setCamera("right")
        camRight.setCamera("left")
    else:
        camLeft.setCamera("left")
        camRight.setCamera("right")
    
    for monoCam in (camLeft, camRight):  # Common config
        monoCam.setResolution(resolution)
        monoCam.setFps(2)
    
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    stereo.initialConfig.setMedianFilter(median)  # KERNEL_7x7 default
    stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
    stereo.setLeftRightCheck(lrcheck)
    stereo.setExtendedDisparity(extended)
    stereo.setSubpixel(subpixel)
    if alpha is not None:
        stereo.setAlphaScaling(alpha)
        config = stereo.initialConfig.get()
        config.postProcessing.brightnessFilter.minBrightness = 0
        stereo.initialConfig.set(config)
    
    # Alignment with RGB - Ahmad
    stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
    
    # # 4056x3040
    stereo.setOutputSize(1248, 936)
    
    # Linking
    camRgb.isp.link(manip.inputImage)
    manip.out.link(spatialdetectionNetwork.input)
    
    # By Yishu
    manip.out.link(xoutManip.input)
    
    # Linking stereo depth as an input to the spatial detection network
    stereo.depth.link(spatialdetectionNetwork.inputDepth)
    
    spatialdetectionNetwork.out.link(nnOut.input)
    camRgb.isp.link(xoutISP.input)
    
    # Syncing NN and ISP
    spatialdetectionNetwork.out.link(sync.inputs["NN_Sync"])
    camRgb.isp.link(sync.inputs["ISP_Sync"])
    sync.out.link(xoutGrp.input)
    
    # Depth and Disparity Linking
    camLeft.out.link(stereo.left)
    camRight.out.link(stereo.right)
    stereo.syncedLeft.link(xoutLeft.input)
    stereo.syncedRight.link(xoutRight.input)
    stereo.disparity.link(xoutDisparity.input)
    if depth:
        stereo.depth.link(xoutDepth.input)
    if outRectified:
        stereo.rectifiedLeft.link(xoutRectifLeft.input)
        stereo.rectifiedRight.link(xoutRectifRight.input)
    
    
    streams = ["left", "right", "ISP","nn"] #Change by ahmad
    if outRectified:
        streams.extend(["rectifiedLeft", "rectifiedRight"])
    streams.append("disparity")
    if depth:
        streams.append("depth")
    
    cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
    cvColorMap[0] = [0, 0, 0]
    
    # TakeUp Camera Static IP address
    device_info = dai.DeviceInfo("192.168.220.10")
    
    try:
        # Connect to device and start pipeline
        with dai.Device(pipeline, device_info) as device:
            # Output queues will be used to get the rgb frames and nn data from the outputs defined above
            qDet = device.getOutputQueue("nn", 4, blocking=False)
            qISP = device.getOutputQueue("ISP", 4, blocking=False)
            qManip = device.getOutputQueue("Manip", 4, blocking=False)
    
            qDisparity = device.getOutputQueue("disparity", 1, blocking=False)
            qDepth = device.getOutputQueue("depth", 1, blocking=False)
            qLeft = device.getOutputQueue("left", 1, blocking=False)
            qRight = device.getOutputQueue("right", 1, blocking=False)
    
            qSync = device.getOutputQueue("xout", 4, False)
            msgGrp = None
    
            frame = None
            m_frame = None
    
            Disparityframe = None
            Depthframe = None
            LeftFrame = None
            RightFrame = None
    
            detections = []
            startTime = time.monotonic()
            counter = 0
            color2 = (255, 255, 255)
    
            # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
            def frameNorm(frame, bbox):
                normVals = np.full(len(bbox), frame.shape[0])
                normVals[::2] = frame.shape[1]
                print(np.clip(np.array(bbox), 0, 1))
                print(normVals)
                print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
                return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
    
            def displayFrame(name, frame, detections, i):
                color_spool = (255, 0, 0)
                color_person = (0, 255, 0)
                color = ''
                color_blank_img = (255, 255, 255)
                blank_image = np.zeros((3040,4056,1), np.uint8)
                text_output_path = "D:\\Cameras_Live\\TakeUp\\" + "Label{}".format(j) + ".txt"
                #print("Detections: ", [[d.label, d.confidence *100, d.xmin, d.ymin, d.xmax, d.ymax] for d in detections])
                #text_file = open(text_output_path, 'w')
                for detection in detections:
                    bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                    if labels[detection.label] == 'person':
                        color = color_person
                    else:
                        color = color_spool
                    cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                    
                
                cv2.imwrite("E:\\Cameras_Live\\Test_writing_imgs\\Image{}.bmp".format(j), frame)
                cv2.namedWindow(name, cv2.WINDOW_NORMAL)
                cv2.imshow(name, frame)
                print("show complete")
                
    
            
            j=1
            ###Create the folder that will contain capturing
            folder_name = "D:\\Cameras_Live\\TakeUp"
            path = Path(folder_name)
    
            syncframe = None
            sync_detections = None
    
            while True:
    
                ###Create the folder that will contain capturing
                path.mkdir(parents=True, exist_ok=True)
    
                # Delay to match DA reading fps
                #time.sleep(4)
                print("0")
                msgGrp = qSync.get()
                print("1")
                # By Yishu
                inISP = qISP.get()
                print("2: ", inISP)
                inManip = qManip.get()
                print("3: ", inManip)
                inDet = qDet.get()
                print("4: ", inDet)
                inLeft = qLeft.get()
                print("5")
                inRight = qRight.get()
                print("6")
                inDepth = qDepth.get()
                print("7")
                inDisparity = qDisparity.get()
                print("8")
                
    
                if msgGrp is not None:
                    print('msgGrp: ', msgGrp)
                    for name, msg in msgGrp:
                        if name == "ISP_Sync":
                            syncframe = msg.getCvFrame()
                            print("syncframe: ")
                        if name == "NN_Sync":
                            sync_detections = msg.detections
                            print("sync_detections: ")
                        if syncframe is not None and sync_detections is not None:
                            print("Both sync not none")
                            displayFrame("sync_isp", syncframe, sync_detections, j)
                            print("display_complete")
                        
                if inManip is not None:
                    m_frame = inManip.getCvFrame()
                
                print("9")
                
                if inLeft is not None:
                    LeftFrame = inLeft.getCvFrame()
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Left{}.bmp".format(j),LeftFrame)
                
                print("10")
                if inRight is not None:
                    RightFrame = inRight.getCvFrame()
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Right{}.bmp".format(j),RightFrame)
    
                print("11")
                if inDepth is not None:
                    Depthframe = inDepth.getCvFrame()
                    Depthframe = Depthframe.astype(np.uint16)
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Depth{}.png".format(j),Depthframe)
    
                print("12")
                if inDisparity is not None:
                    Disparityframe = inDisparity.getCvFrame()
                    Disparityframe = getDisparityFrame(Disparityframe, cvColorMap)
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Disparity{}.bmp".format(j),Disparityframe)
                
                print("12")
                if cv2.waitKey(1) == ord('q'):
                    break
    
                j=j+1
                '''if j==5:
                    j=1'''
            
    except Exception as e:
            print(f"An error occurred: {e}")
            sys.exit(0)

    Thanks
    Yishu

    Hi @yishu_corpex
    I removed the Sync node since it was causing issues; I didn't change anything else, but the RGB and NN seem to be in sync:

    
    import cv2
    import depthai as dai
    import numpy as np
    import argparse
    import json
    import time
    import sys
    from pathlib import Path
    import blobconverter
    from datetime import timedelta
    
    
    
    
    def getDisparityFrame(frame, cvColorMap):
        maxDisp = stereo.initialConfig.getMaxDisparity()
        disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
        disp = cv2.applyColorMap(disp, cvColorMap)
    
        return disp
    
    # parse arguments 
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", help="Provide model name or model path for inference",
                        default='models/best_ckpt_openvino_2022.1_6shave.blob', type=str)
    parser.add_argument("-c", "--config", help="Provide config path for inference",
                        default='models/best_ckpt.json', type=str)
    args = parser.parse_args()
    
    
    # parse config
    configPath = Path(args.config)
    if not configPath.exists():
        raise ValueError("Path {} does not exist!".format(configPath))
    
    with configPath.open() as f:
        config = json.load(f)
    nnConfig = config.get("nn_config", {})
    
    # parse input shape
    if "input_size" in nnConfig:
        W, H = tuple(map(int, nnConfig.get("input_size").split('x')))
    
    # extract metadata
    metadata = nnConfig.get("NN_specific_metadata", {})
    classes = metadata.get("classes", {})
    coordinates = metadata.get("coordinates", {})
    anchors = metadata.get("anchors", {})
    anchorMasks = metadata.get("anchor_masks", {})
    iouThreshold = metadata.get("iou_threshold", {})
    confidenceThreshold = metadata.get("confidence_threshold", {})
    
    #print(metadata)
    
    # parse labels
    nnMappings = config.get("mappings", {})
    labels = nnMappings.get("labels", {})
    #print("Labels: ", labels)
    
    # get model path
    nnPath = args.model
    if not Path(nnPath).exists():
        print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
        nnPath = str(blobconverter.from_zoo(args.model, shaves = 6, zoo_type = "depthai", use_cache=True))
    
    # sync outputs
    syncNN = True
    
    # Depth and Disparity
    outRectified = False  # Output and display rectified streams
    lrcheck = True  # Better handling for occlusions
    extended = False  # Closer-in minimum depth, disparity range is doubled
    subpixel = True  # Better accuracy for longer distance, fractional disparity 32-levels
    depth = True  # Display depth frames
    swap_left_right = False #Swap left right frames
    alpha = None
    resolution = dai.MonoCameraProperties.SensorResolution.THE_400_P #THE_800_P
    
    median = dai.StereoDepthProperties.MedianFilter.KERNEL_7x7 # 7x7
    
    # Create pipeline
    pipeline = dai.Pipeline()
    
    # Define sources and outputs
    camRgb = pipeline.create(dai.node.ColorCamera)
    spatialdetectionNetwork = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
    #xoutRgb = pipeline.create(dai.node.XLinkOut)
    nnOut = pipeline.create(dai.node.XLinkOut)
    
    
    # By Yishu
    xoutISP = pipeline.create(dai.node.XLinkOut)
    manip = pipeline.create(dai.node.ImageManip)
    xoutManip = pipeline.create(dai.node.XLinkOut)
    
    # Depth and Disparity
    camLeft = pipeline.create(dai.node.MonoCamera)
    camRight = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    xoutLeft = pipeline.create(dai.node.XLinkOut)
    xoutRight = pipeline.create(dai.node.XLinkOut)
    xoutDepth = pipeline.create(dai.node.XLinkOut)
    xoutRectifLeft = pipeline.create(dai.node.XLinkOut)
    xoutRectifRight = pipeline.create(dai.node.XLinkOut)
    
    # Sync node
    
    
    #xoutRgb.setStreamName("rgb")
    xoutISP.setStreamName("ISP")
    nnOut.setStreamName("nn")
    xoutManip.setStreamName("Manip")
    
    xoutLeft.setStreamName("left")
    xoutRight.setStreamName("right")
    xoutDepth.setStreamName("depth")
    xoutRectifLeft.setStreamName("rectifiedLeft")
    xoutRectifRight.setStreamName("rectifiedRight")
    
    # Properties
    camRgb.setPreviewSize(W, H)
    
    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P) #THE_1080_P
    camRgb.setInterleaved(False)
    camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR) # BGR color channels rather than RGB; experimentally gives fewer false positives. In accordance with model training.
    camRgb.setFps(30) #40
    
    camRgb.setIspNumFramesPool(10)
    
    # By Yishu
    manip.initialConfig.setKeepAspectRatio(False) #True
    manip.initialConfig.setResize(W, H)
    
    # Frame type: BGR rather than RGB - Yishu
    manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) #dai.ImgFrame.Type.RGB888p
    
    # setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
    manip.setMaxOutputFrameSize(1228800)
    
    
    # By Yishu
    nnOut.input.setBlocking(False)
    xoutISP.input.setBlocking(False)
    xoutManip.input.setBlocking(False)
    
    # By Yishu
    nnOut.input.setQueueSize(10)
    xoutISP.input.setQueueSize(10)
    xoutManip.input.setQueueSize(10)
    spatialdetectionNetwork.input.setQueueSize(10)
    
    # Network specific settings
    spatialdetectionNetwork.setConfidenceThreshold(confidenceThreshold)
    spatialdetectionNetwork.setNumClasses(classes)
    spatialdetectionNetwork.setCoordinateSize(coordinates)
    spatialdetectionNetwork.setAnchors(anchors)
    spatialdetectionNetwork.setAnchorMasks(anchorMasks)
    spatialdetectionNetwork.setIouThreshold(iouThreshold)
    spatialdetectionNetwork.setBlobPath(nnPath)
    spatialdetectionNetwork.setNumInferenceThreads(2)
    spatialdetectionNetwork.input.setBlocking(False)
    
    # SpatialYolo specific parameters
    spatialdetectionNetwork.setBoundingBoxScaleFactor(0.5)
    spatialdetectionNetwork.setDepthLowerThreshold(100)
    spatialdetectionNetwork.setDepthUpperThreshold(5000)
    
    # Depth and Disparity Properties
    if swap_left_right:
        camLeft.setCamera("right")
        camRight.setCamera("left")
    else:
        camLeft.setCamera("left")
        camRight.setCamera("right")
    
    for monoCam in (camLeft, camRight):  # Common config
        monoCam.setResolution(resolution)
        monoCam.setFps(2)
    
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    stereo.initialConfig.setMedianFilter(median)  # KERNEL_7x7 default
    stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
    stereo.setLeftRightCheck(lrcheck)
    stereo.setExtendedDisparity(extended)
    stereo.setSubpixel(subpixel)
    if alpha is not None:
        stereo.setAlphaScaling(alpha)
        config = stereo.initialConfig.get()
        config.postProcessing.brightnessFilter.minBrightness = 0
        stereo.initialConfig.set(config)
    
    # Alignment with RGB - Ahmad
    stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
    
    # # 4056x3040
    stereo.setOutputSize(1248, 936)
    
    # Linking
    camRgb.video.link(manip.inputImage)
    manip.out.link(spatialdetectionNetwork.input)
    
    # By Yishu
    manip.out.link(xoutManip.input)
    
    # Linking stereo depth as an input to the spatial detection network
    stereo.depth.link(spatialdetectionNetwork.inputDepth)
    
    spatialdetectionNetwork.out.link(nnOut.input)
    camRgb.isp.link(xoutISP.input)
    
    # Depth and Disparity Linking
    camLeft.out.link(stereo.left)
    camRight.out.link(stereo.right)
    stereo.syncedLeft.link(xoutLeft.input)
    stereo.syncedRight.link(xoutRight.input)
    if depth:
        stereo.depth.link(xoutDepth.input)
    if outRectified:
        stereo.rectifiedLeft.link(xoutRectifLeft.input)
        stereo.rectifiedRight.link(xoutRectifRight.input)
    
    # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
    def frameNorm(frame, bbox):
        normVals = np.full(len(bbox), frame.shape[0])
        normVals[::2] = frame.shape[1]
        print(np.clip(np.array(bbox), 0, 1))
        print(normVals)
        print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
        return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
    
    def displayFrame(name, frame, detections, i):
        color_spool = (255, 0, 0)
        color_person = (0, 255, 0)
        color = ''
        
        #print("Detections: ", [[d.label, d.confidence *100, d.xmin, d.ymin, d.xmax, d.ymax] for d in detections])
        #text_file = open(text_output_path, 'w')
        for detection in detections:
            bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
            if labels[detection.label] == 'person':
                color = color_person
            else:
                color = color_spool
            cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
            cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
            cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
            
        
        cv2.imwrite("E:\\Cameras_Live\\Test_writing_imgs\\Image{}.bmp".format(j), frame)
        cv2.namedWindow(name, cv2.WINDOW_NORMAL)
        cv2.imshow(name, frame)
        print("show complete")
    
    
    streams = ["left", "right", "ISP","nn"] #Change by ahmad
    if outRectified:
        streams.extend(["rectifiedLeft", "rectifiedRight"])
    if depth:
        streams.append("depth")
    
    cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
    cvColorMap[0] = [0, 0, 0]
    
    # TakeUp Camera Static IP address
    device_info = dai.DeviceInfo()
    
    try:
        # Connect to device and start pipeline
        with dai.Device(pipeline, device_info) as device:
            # Output queues will be used to get the rgb frames and nn data from the outputs defined above
            qDet = device.getOutputQueue("nn", 4, blocking=False)
            qISP = device.getOutputQueue("ISP", 4, blocking=False)
            qManip = device.getOutputQueue("Manip", 4, blocking=False)
    
            qDepth = device.getOutputQueue("depth", 1, blocking=False)
            qLeft = device.getOutputQueue("left", 1, blocking=False)
            qRight = device.getOutputQueue("right", 1, blocking=False)
    
            frame = None
            m_frame = None
            Depthframe = None
            LeftFrame = None
            RightFrame = None
    
            detections = []
            startTime = time.monotonic()
            counter = 0
            color2 = (255, 255, 255)
    
            # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
            def frameNorm(frame, bbox):
                normVals = np.full(len(bbox), frame.shape[0])
                normVals[::2] = frame.shape[1]
                print(np.clip(np.array(bbox), 0, 1))
                print(normVals)
                print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
                return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
    
            def displayFrame(name, frame, detections, i):
                color_spool = (255, 0, 0)
                color_person = (0, 255, 0)
                color = ''
                color_blank_img = (255, 255, 255)
                blank_image = np.zeros((3040,4056,1), np.uint8)
                text_output_path = "D:\\Cameras_Live\\TakeUp\\" + "Label{}".format(j) + ".txt"
                #print("Detections: ", [[d.label, d.confidence *100, d.xmin, d.ymin, d.xmax, d.ymax] for d in detections])
                #text_file = open(text_output_path, 'w')
                for detection in detections:
                    bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                    if labels[detection.label] == 'person':
                        color = color_person
                    else:
                        color = color_spool
                    cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                    
                
                cv2.imwrite("E:\\Cameras_Live\\Test_writing_imgs\\Image{}.bmp".format(j), frame)
                cv2.namedWindow(name, cv2.WINDOW_NORMAL)
                cv2.imshow(name, frame)
                print("show complete")
                
    
            
            j=1
            ###Create the folder that will contain capturing
            folder_name = "D:\\Cameras_Live\\TakeUp"
            path = Path(folder_name)
    
            syncframe = None
            sync_detections = None
    
            while True:
    
                ###Create the folder that will contain capturing
                
    
                # Delay to match DA reading fps
                #time.sleep(4)
    
                start_time = time.time()
    
                # Print and reset timing
                end_time = time.time()
                print("1 - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                # By Yishu
                inISP = qISP.get()
    
                end_time = time.time()
                print("2: ", inISP, " - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                inManip = qManip.get()
    
                end_time = time.time()
                print("3: ", inManip, " - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                inDet = qDet.get()
    
                end_time = time.time()
                print("4: ", inDet, " - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                inLeft = qLeft.get()
    
                end_time = time.time()
                print("5 - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                inRight = qRight.get()
    
                end_time = time.time()
                print("6 - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                inDepth = qDepth.get()
    
                end_time = time.time()
                print("7 - Time taken: {:.6f} seconds".format(end_time - start_time))
                start_time = end_time
    
                displayFrame("Color", inISP.getCvFrame(), inDet.detections, j)
    
        
    
                end_time = time.time()
                print("8 - Time taken: {:.6f} seconds".format(end_time - start_time))
    
                
                if inManip is not None:
                    m_frame = inManip.getCvFrame()
                
                print("9")
                
                if inLeft is not None:
                    LeftFrame = inLeft.getCvFrame()
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Left{}.bmp".format(j),LeftFrame)
                
                print("10")
                if inRight is not None:
                    RightFrame = inRight.getCvFrame()
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Right{}.bmp".format(j),RightFrame)
    
                print("11")
                if inDepth is not None:
                    Depthframe = inDepth.getCvFrame()
                    Depthframe = Depthframe.astype(np.uint16)
                    #cv2.imwrite("D:\\Cameras_Live\\TakeUp\\Depth{}.png".format(j),Depthframe)
    
                print("12")
                if cv2.waitKey(1) == ord('q'):
                    break
    
                j=j+1
                '''if j==5:
                    j=1'''
            
    except Exception as e:
            print(f"An error occurred: {e}")
            sys.exit(0)

    Hi @jakaskerl

    I would say the RGB (ISP frame) and NN are almost in sync. I do see the bounding boxes slightly out of sync in the case of moving objects or people. I'm not sure why that is happening, though.

    Also, may I know what issues the Sync node was causing, and why?

    Also, when I calculate the NN FPS using the following lines of code (similar to other DepthAI examples), I see it maxes out at 1.9 FPS.

    # startTime_nn is initialized before the while True loop
    
    if inDet is not None:
        counter += 1
    
    displayFrame("Color", inISP.getCvFrame(), inDet.detections, j)
    nn_fps = counter / (time.monotonic() - startTime_nn)
    print("nn_fps: ", nn_fps)

    I wanted to ask if you could look into the script and suggest any changes to improve on that, given that we are using the 12MP resolution. I know that when I reduce the resolution to 1080p, the NN FPS does bump up to around 8-10 FPS.

    Thanks
    Yishu

    4 days later

    Hi @yishu_corpex
    Seems like XLink was stressed out, so I removed the unsynced outputs and changed all inputs to non-blocking. I can get about 10 FPS overall.

    PRO-TIP: use the pipeline_graph tool; it comes in handy in case of pipeline failures.

    
    import cv2
    import depthai as dai
    import numpy as np
    import argparse
    import json
    import time
    import sys
    from pathlib import Path
    import blobconverter
    from datetime import timedelta
    
    
    def getDisparityFrame(frame, cvColorMap):
        maxDisp = stereo.initialConfig.getMaxDisparity()
        disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
        disp = cv2.applyColorMap(disp, cvColorMap)
    
        return disp
    
    # parse arguments 
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", help="Provide model name or model path for inference",
                        default='models/best_ckpt_openvino_2022.1_6shave.blob', type=str)
    parser.add_argument("-c", "--config", help="Provide config path for inference",
                        default='models/best_ckpt.json', type=str)
    args = parser.parse_args()
    
    
    # parse config
    configPath = Path(args.config)
    if not configPath.exists():
        raise ValueError("Path {} does not exist!".format(configPath))
    
    with configPath.open() as f:
        config = json.load(f)
    nnConfig = config.get("nn_config", {})
    
    # parse input shape
    if "input_size" in nnConfig:
        W, H = tuple(map(int, nnConfig.get("input_size").split('x')))
    
    # extract metadata
    metadata = nnConfig.get("NN_specific_metadata", {})
    classes = metadata.get("classes", {})
    coordinates = metadata.get("coordinates", {})
    anchors = metadata.get("anchors", {})
    anchorMasks = metadata.get("anchor_masks", {})
    iouThreshold = metadata.get("iou_threshold", {})
    confidenceThreshold = metadata.get("confidence_threshold", {})
    
    #print(metadata)
    
    # parse labels
    nnMappings = config.get("mappings", {})
    labels = nnMappings.get("labels", {})
    #print("Labels: ", labels)
    
    # get model path
    nnPath = args.model
    if not Path(nnPath).exists():
        print("No blob found at {}. Looking into DepthAI model zoo.".format(nnPath))
        nnPath = str(blobconverter.from_zoo(args.model, shaves = 6, zoo_type = "depthai", use_cache=True))
    
    # sync outputs
    syncNN = True
    
    # Depth and Disparity
    outRectified = False  # Output and display rectified streams
    lrcheck = True  # Better handling for occlusions
    extended = False  # Closer-in minimum depth, disparity range is doubled
    subpixel = True  # Better accuracy for longer distance, fractional disparity 32-levels
    depth = True  # Display depth frames
    swap_left_right = False #Swap left right frames
    alpha = None
    resolution = dai.MonoCameraProperties.SensorResolution.THE_400_P #THE_800_P
    
    median = dai.StereoDepthProperties.MedianFilter.KERNEL_7x7 # 7x7
    
    # Create pipeline
    pipeline = dai.Pipeline()
    
    # Define sources and outputs
    camRgb = pipeline.create(dai.node.ColorCamera)
    spatialdetectionNetwork = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
    #xoutRgb = pipeline.create(dai.node.XLinkOut)
    nnOut = pipeline.create(dai.node.XLinkOut)
    
    
    
    manip = pipeline.create(dai.node.ImageManip)
    
    # Depth and Disparity
    camLeft = pipeline.create(dai.node.MonoCamera)
    camRight = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    
    
    
    # Sync node
    sync = pipeline.create(dai.node.Sync)
    xoutGrp = pipeline.create(dai.node.XLinkOut)
    xoutGrp.setStreamName("xout")
    
    #xoutRgb.setStreamName("rgb")
    
    nnOut.setStreamName("nn")
    
    # Properties
    camRgb.setPreviewSize(W, H)
    
    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_4_K) #THE_1080_P
    camRgb.setInterleaved(False)
    camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR) # BGR color channels rather than RGB; experimentally gives fewer false positives. In accordance with model training.
    camRgb.setFps(15) #40
    
    
    # By Yishu
    manip.initialConfig.setKeepAspectRatio(False) #True
    manip.initialConfig.setResize(W, H)
    
    # Frame type: BGR rather than RGB - Yishu
    manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p) #dai.ImgFrame.Type.RGB888p
    
    # setMaxOutputFrameSize to avoid image bigger than max frame size error - Yishu
    manip.setMaxOutputFrameSize(1228800)
    
    # Sync node properties
    sync.setSyncThreshold(timedelta(milliseconds=100))
    
    # By Yishu
    nnOut.input.setBlocking(False)
    
    
    # Network specific settings
    spatialdetectionNetwork.setConfidenceThreshold(confidenceThreshold)
    spatialdetectionNetwork.setNumClasses(classes)
    spatialdetectionNetwork.setCoordinateSize(coordinates)
    spatialdetectionNetwork.setAnchors(anchors)
    spatialdetectionNetwork.setAnchorMasks(anchorMasks)
    spatialdetectionNetwork.setIouThreshold(iouThreshold)
    spatialdetectionNetwork.setBlobPath(nnPath)
    spatialdetectionNetwork.setNumInferenceThreads(2)
    spatialdetectionNetwork.input.setBlocking(False)
    
    # SpatialYolo specific parameters
    spatialdetectionNetwork.setBoundingBoxScaleFactor(0.5)
    spatialdetectionNetwork.setDepthLowerThreshold(100)
    spatialdetectionNetwork.setDepthUpperThreshold(5000)
    
    # Depth and Disparity Properties
    if swap_left_right:
        camLeft.setCamera("right")
        camRight.setCamera("left")
    else:
        camLeft.setCamera("left")
        camRight.setCamera("right")
    
    for monoCam in (camLeft, camRight):  # Common config
        monoCam.setResolution(resolution)
        monoCam.setFps(15)
    
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    stereo.initialConfig.setMedianFilter(median)  # KERNEL_7x7 default
    stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
    stereo.setLeftRightCheck(lrcheck)
    stereo.setExtendedDisparity(extended)
    stereo.setSubpixel(subpixel)
    if alpha is not None:
        stereo.setAlphaScaling(alpha)
        config = stereo.initialConfig.get()
        config.postProcessing.brightnessFilter.minBrightness = 0
        stereo.initialConfig.set(config)
    
    # Alignment with RGB - Ahmad
    stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
    
    # # 4056x3040
    stereo.setOutputSize(1248, 936)
    
    # Linking
    camRgb.isp.link(manip.inputImage)
    manip.inputImage.setBlocking(False)
    manip.inputImage.setQueueSize(10)
    manip.out.link(spatialdetectionNetwork.input)
    
    # By Yishu
    
    # Linking stereo depth as an input to the spatial detection network
    stereo.depth.link(spatialdetectionNetwork.inputDepth)
    
    spatialdetectionNetwork.out.link(nnOut.input)
    
    # Syncing NN and ISP
    spatialdetectionNetwork.out.link(sync.inputs["NN_Sync"])
    camRgb.isp.link(sync.inputs["ISP_Sync"])
    
    sync.inputs["NN_Sync"].setBlocking(False)
    sync.inputs["ISP_Sync"].setBlocking(False)
    
    sync.out.link(xoutGrp.input)
    xoutGrp.input.setBlocking(False)
    
    # Depth and Disparity Linking
    camLeft.out.link(stereo.left)
    camRight.out.link(stereo.right)
    
    
    
    
    streams = ["left", "right", "ISP","nn"] #Change by ahmad
    if outRectified:
        streams.extend(["rectifiedLeft", "rectifiedRight"])
    streams.append("disparity")
    if depth:
        streams.append("depth")
    
    cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
    cvColorMap[0] = [0, 0, 0]
    
    # TakeUp Camera Static IP address
    device_info = dai.DeviceInfo("")
    
    try:
        # Connect to device and start pipeline
        with dai.Device(pipeline, device_info) as device:
            # Output queues will be used to get the rgb frames and nn data from the outputs defined above
            qDet = device.getOutputQueue("nn", 4, blocking=False)
    
            qSync = device.getOutputQueue("xout", 4, False)
            msgGrp = None
    
            frame = None
            m_frame = None
    
            Disparityframe = None
            Depthframe = None
            LeftFrame = None
            RightFrame = None
    
            detections = []
            startTime = time.monotonic()
            counter = 0
            color2 = (255, 255, 255)
    
            # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
            def frameNorm(frame, bbox):
                normVals = np.full(len(bbox), frame.shape[0])
                normVals[::2] = frame.shape[1]
                print(np.clip(np.array(bbox), 0, 1))
                print(normVals)
                print((np.clip(np.array(bbox), 0, 1) * normVals).astype(int))
                return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
    
            def displayFrame(name, frame, detections, i):
                color_spool = (255, 0, 0)
                color_person = (0, 255, 0)
                color = ''
                
                for detection in detections:
                    bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                    if labels[detection.label] == 'person':
                        color = color_person
                    else:
                        color = color_spool
                    cv2.putText(frame, labels[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                    cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                    
                cv2.namedWindow(name, cv2.WINDOW_NORMAL)
                cv2.imshow(name, frame)
                print("show complete")
                
    
            
            j=1
            ###Create the folder that will contain capturing
    
            syncframe = None
            sync_detections = None
    
            while True:
                # Delay to match DA reading fps
                #time.sleep(4)
                print("0")
                msgGrp = qSync.get()
                print("1")
                inDet = qDet.get()
                print("4: ", inDet)
                
    
                if msgGrp is not None:
                    print('msgGrp: ', msgGrp)
                    for name, msg in msgGrp:
                        if name == "ISP_Sync":
                            syncframe = msg.getCvFrame()
                            print("syncframe: ")
                        if name == "NN_Sync":
                            sync_detections = msg.detections
                            print("sync_detections: ")
                        if syncframe is not None and sync_detections is not None:
                            print("Both sync not none")
                            displayFrame("sync_isp", syncframe, sync_detections, j)
                            print("display_complete")
                        
                
                print("9")
                print("12")
                if cv2.waitKey(1) == ord('q'):
                    break
    
                j=j+1
                '''if j==5:
                    j=1'''
            
    except Exception as e:
            print(f"An error occurred: {e}")
            sys.exit(0)

    Thanks,
    Jaka

    Hi @jakaskerl

    Thanks for your response. I did use the pipeline graph earlier; however, I ran into this pipeline_graph issue and wasn't sure whether it has been resolved yet.

    I did try your updated script; it seems to work well at 4K resolution, but it gets stuck after a few frames at 12MP resolution, which we have been using lately. Is that because the 12MP resolution uses a lot of resources on board?

    Thanks
    Yishu

    Hi @yishu_corpex
    I tried with 12MP and the pipeline doesn't crash. Are you using a PoE device by any chance (or a low-bandwidth USB connection)? You can check in the pipeline graph how the FPS slowly starts to decline because the XLink nodes can't send frames fast enough.

    Thanks,
    Jaka

    Hi @jakaskerl

    I tried with a 3D USB camera and the pipeline does not crash.

    However, for our deployment we will be using a 3D PoE device and when I test on the PoE device, the pipeline crashes. I can see the FPS slowly declining and eventually becoming 0 across nodes as shown below.

    So does it mean that, because we are using a PoE device, the XLink nodes can't send frames fast enough due to the PoE bandwidth bottleneck, and thus the pipeline eventually crashes?

    Also, I see the FPS eventually dropping to 0 across all the nodes. Why does that happen?

    Thanks
    Yishu

      yishu_corpex So does it mean that, because we are using a PoE device, the XLink nodes can't send frames fast enough due to the PoE bandwidth bottleneck, and thus the pipeline eventually crashes?

      Yes, you are sending 12MP video to the host, which is too much for the 1 Gbps bandwidth to handle. Any chance you can lower the resolution? If FOV is a concern, there might be some workarounds, for example the sketch below.
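
      One such workaround is to downscale the ISP output on-device with setIspScale(), which keeps the full sensor FOV while sending far fewer bytes over the PoE link. A minimal sketch (hedged: it assumes the 12MP RGB sensor from the scripts above, and the exact scale ratio has to be one your sensor/ISP supports):

      import depthai as dai

      pipeline = dai.Pipeline()
      camRgb = pipeline.create(dai.node.ColorCamera)
      xoutISP = pipeline.create(dai.node.XLinkOut)
      xoutISP.setStreamName("ISP")

      camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
      # Scale the ISP output by 1/2 (numerator, denominator): 4056x3040 -> roughly 2028x1520.
      # Full FOV is preserved, but only a quarter of the pixels cross the link.
      camRgb.setIspScale(1, 2)
      camRgb.isp.link(xoutISP.input)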

      yishu_corpex Also, I see the FPS eventually dropping to 0 across all the nodes. Why does that happen?

      The queues slowly fill up because they are not read quickly enough, so the entire pipeline backs up and blocks. To avoid that, keep every hop non-blocking with small queue sizes, as in the sketch below, so stale frames are dropped instead of stalling the upstream nodes.
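
      A minimal sketch of that idea (hedged: it reuses xoutISP, pipeline and device_info from the scripts above rather than being a standalone program):

      # Device side: non-blocking XLinkOut input with a small queue, so old frames
      # are dropped on the device instead of backing up the whole pipeline.
      xoutISP.input.setBlocking(False)
      xoutISP.input.setQueueSize(2)

      with dai.Device(pipeline, device_info) as device:
          # Host side: small, non-blocking queue as well.
          qISP = device.getOutputQueue("ISP", maxSize=2, blocking=False)
          while True:
              inISP = qISP.tryGet()   # non-blocking read; returns None if nothing is ready yet
              if inISP is not None:
                  frame = inISP.getCvFrame()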

      Thanks,
      Jaka