Detection Network not working with Camera node instead of ColorCamera node

lincolnxlw

Using Camera node instead of ColorCamera node has the benefit of automatic image undistortion for ``still``, ``video`` and ``preview`` streams. But when I modify a simple YOLOv3 detection example code to use Camera node, it returns runtime error

Here is the original pipeline for using ColorCamera

# Create pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
camRgb = pipeline.create(dai.node.ColorCamera)

detectionNetwork = pipeline.create(dai.node.YoloDetectionNetwork)

xoutRgb = pipeline.create(dai.node.XLinkOut)
nnOut = pipeline.create(dai.node.XLinkOut)

xoutRgb.setStreamName("rgb")
nnOut.setStreamName("nn")

# Properties
camRgb.setPreviewSize(416, 416)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_800_P)
camRgb.setInterleaved(False)
camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
camRgb.setFps(40)
camRgb.setBoardSocket(dai.CameraBoardSocket.RGB)

# Network specific settings
detectionNetwork.setConfidenceThreshold(0.8)
detectionNetwork.setNumClasses(80)
detectionNetwork.setCoordinateSize(4)
detectionNetwork.setAnchors([10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319])
detectionNetwork.setAnchorMasks({"side26": [1, 2, 3], "side13": [3, 4, 5]})
detectionNetwork.setIouThreshold(0.5)
detectionNetwork.setBlobPath(blobconverter.from_zoo(name="yolo-v3-tiny-tf", shaves=6))
detectionNetwork.setNumInferenceThreads(2)
detectionNetwork.input.setBlocking(False)

# Linking
camRgb.preview.link(detectionNetwork.input)
if syncNN:
    detectionNetwork.passthrough.link(xoutRgb.input)
else:
    camRgb.preview.link(xoutRgb.input)

detectionNetwork.out.link(nnOut.input)

Here is the modified pipeline with using Camera node

# Create pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
camRgb: dai.node.Camera = pipeline.create(dai.node.Camera)

detectionNetwork = pipeline.create(dai.node.YoloDetectionNetwork)

xoutRgb = pipeline.create(dai.node.XLinkOut)
nnOut = pipeline.create(dai.node.XLinkOut)

xoutRgb.setStreamName("rgb")
nnOut.setStreamName("nn")

# Properties
camRgb.setSize((1280, 800))
camRgb.setPreviewSize(416, 416)
camRgb.setFps(40)
camRgb.setBoardSocket(dai.CameraBoardSocket.RGB)

# Network specific settings
detectionNetwork.setConfidenceThreshold(0.8)
detectionNetwork.setNumClasses(80)
detectionNetwork.setCoordinateSize(4)
detectionNetwork.setAnchors([10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319])
detectionNetwork.setAnchorMasks({"side26": [1, 2, 3], "side13": [3, 4, 5]})
detectionNetwork.setIouThreshold(0.5)
detectionNetwork.setBlobPath(blobconverter.from_zoo(name="yolo-v3-tiny-tf", shaves=6))
detectionNetwork.setNumInferenceThreads(2)
detectionNetwork.input.setBlocking(False)

# Linking
camRgb.preview.link(detectionNetwork.input)
if syncNN:
    detectionNetwork.passthrough.link(xoutRgb.input)
else:
    camRgb.preview.link(xoutRgb.input)

detectionNetwork.out.link(nnOut.input)

What did I miss that causes the error?

jakaskerl

Hi lincolnxlw
Camera node works a bit differently to the ColorCamera node. I suggest you look at the examples we have:
https://github.com/luxonis/depthai-python/tree/main/examples/Camera
It's also worth noting that the node is experimental and some features may not yet work/work well.

Thanks,
Jaka

lincolnxlw

Hi jakaskerl

Thanks for the reply. And thanks for sharing the examples.

For the yolo code using Camera node, I didn't have issue showing the preview as long as I took out the XLinkOut for the NN. So I think I created the preview correctly. But once I create the stream for NN, it shows the misconfiguration error. Even when I took out the queue for the NN, it still shows the error. It looks like somehow linking the preview from Camera node to detection network causes the problem.

It will be very helpful if there is a working YOLO object detection example using Camera node.

I also attached below the complete script and I hope it can help pinpoint the issue.

from pathlib import Path
import sys
import cv2
import depthai as dai
import numpy as np
import time
import blobconverter

# tiny yolo v4 label texts
labelMap = [
    "person",         "bicycle",    "car",           "motorbike",     "aeroplane",   "bus",           "train",
    "truck",          "boat",       "traffic light", "fire hydrant",  "stop sign",   "parking meter", "bench",
    "bird",           "cat",        "dog",           "horse",         "sheep",       "cow",           "elephant",
    "bear",           "zebra",      "giraffe",       "backpack",      "umbrella",    "handbag",       "tie",
    "suitcase",       "frisbee",    "skis",          "snowboard",     "sports ball", "kite",          "baseball bat",
    "baseball glove", "skateboard", "surfboard",     "tennis racket", "bottle",      "wine glass",    "cup",
    "fork",           "knife",      "spoon",         "bowl",          "banana",      "apple",         "sandwich",
    "orange",         "broccoli",   "carrot",        "hot dog",       "pizza",       "donut",         "cake",
    "chair",          "sofa",       "pottedplant",   "bed",           "diningtable", "toilet",        "tvmonitor",
    "laptop",         "mouse",      "remote",        "keyboard",      "cell phone",  "microwave",     "oven",
    "toaster",        "sink",       "refrigerator",  "book",          "clock",       "vase",          "scissors",
    "teddy bear",     "hair drier", "toothbrush"
]

# Create pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
camRgb: dai.node.Camera = pipeline.create(dai.node.Camera)
detectionNetwork = pipeline.create(dai.node.YoloDetectionNetwork)

xoutRgb = pipeline.create(dai.node.XLinkOut)
nnOut = pipeline.create(dai.node.XLinkOut)

xoutRgb.setStreamName("rgb")
nnOut.setStreamName("nn")

# Properties
camRgb.setSize((1280, 800))
camRgb.setPreviewSize(416, 416)
camRgb.setFps(40)
camRgb.setBoardSocket(dai.CameraBoardSocket.RGB)

# Network specific settings
detectionNetwork.setConfidenceThreshold(0.8)
detectionNetwork.setNumClasses(80)
detectionNetwork.setCoordinateSize(4)
detectionNetwork.setAnchors([10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319])
detectionNetwork.setAnchorMasks({"side26": [1, 2, 3], "side13": [3, 4, 5]})
detectionNetwork.setIouThreshold(0.5)
detectionNetwork.setBlobPath(blobconverter.from_zoo(name="yolo-v3-tiny-tf", shaves=6))
detectionNetwork.setNumInferenceThreads(2)
detectionNetwork.input.setBlocking(False)

# Linking
camRgb.preview.link(detectionNetwork.input)
camRgb.preview.link(xoutRgb.input)
detectionNetwork.out.link(nnOut.input)

# Connect to device and start pipeline
with dai.Device(pipeline) as device:
    # Output queues will be used to get the rgb frames and nn data from the outputs defined above
    qRgb = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
    qDet = device.getOutputQueue(name="nn", maxSize=4, blocking=False)

    frame = None
    detections = []
    startTime = time.monotonic()
    counter = 0
    color2 = (255, 255, 255)

    # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
    def frameNorm(frame, bbox):
        normVals = np.full(len(bbox), frame.shape[0])
        normVals[::2] = frame.shape[1]
        return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)

    def displayFrame(name, frame):
        color = (255, 0, 0)
        for detection in detections:
            bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
            cv2.putText(frame, labelMap[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
        # Show the frame
        cv2.imshow(name, frame)

    while True:
        inRgb = qRgb.tryGet()
        inDet = qDet.tryGet()

        if inRgb is not None:
            frame = inRgb.getCvFrame()
            cv2.putText(frame, "NN fps: {:.2f}".format(counter / (time.monotonic() - startTime)),
                        (2, frame.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.4, color2)

        if inDet is not None:
            detections = inDet.detections
            counter += 1

        if frame is not None:
            displayFrame("rgb", frame)

        if cv2.waitKey(1) == ord('q'):
            break

Thanks

Lincoln

jakaskerl

Hi lincolnxlw
Though one, but I managed to get it to work by disabling the Warp - the reason you are using the Camera node in the first place I assume.

c.setMeshSource(dai.CameraProperties.WarpMeshSource.NONE)

Try playing around by setting different property configurations for warp (more info under function definition).

Thanks,
Jaka

lincolnxlw

Hi jakaskerl

You are right. I am using Camera node to get the undistorted image automatically. I will take a deeper look and see what I can find. Does this consider a bug? Do we need to open a PR on the github repo?

I also tried to use ColorCamera to manually undistort the image, but there is a different behavior comparing to the undistortion from the Camera node. Can you take a look of this post and let me know if there is something obvious I can do?

Thanks

Lincoln

jakaskerl

Hi lincolnxlw
Yes, I think it's best to report the issue to github repo.
Regarding the post, I tried changing settings to "zoom out" the image, but to no avail. I think it's best to wait for erik, he has more experience in this field.

Thanks,
Jaka