Hello, I’m working with the OAK-4 S and performing object detection.
My device is running Luxonis OS version 1.8.0, and I’m using DepthAI v3.
I’ve customized the detection_network.py
script from the following example:
https://docs.luxonis.com/software-v3/depthai/examples/detection_network/detection_network/
As described in the documentation, I’m using a lower camera resolution due to the resolution limitation of the YOLO model.
However, I’d like to apply “Resolution Techniques for NNs” to obtain full FOV at 1920x1080 resolution. Unfortunately, the demo code below seems to be written for DepthAI v2, which causes compatibility issues for me.
https://github.com/luxonis/oak-examples/blob/master/gen2-display-detections/2-crop_highres.py
Could you provide an updated version of this script that works with DepthAI v3 and allows me to use the full 1920x1080 camera resolution?
#!/usr/bin/env python3
from pathlib import Path
import cv2
import depthai as dai
import numpy as np
import time
# Create pipeline
with dai.Pipeline() as pipeline:
# NOTE(review): indentation was lost in this paste — in runnable code every
# statement from here to the end of the script belongs inside this `with` block.
cameraNode = pipeline.create(dai.node.Camera).build()  # camera node built with defaults — TODO confirm socket/resolution on device
detectionNetwork = pipeline.create(dai.node.DetectionNetwork).build(cameraNode, dai.NNModelDescription("yolov6-nano"))  # NN node fed by the camera; model resolved from the descriptor
labelMap = detectionNetwork.getClasses()  # class-id -> label strings; indexed by detection.label below
qRgb = detectionNetwork.passthrough.createOutputQueue()  # frames as they were fed to the NN (used for display)
qDet = detectionNetwork.out.createOutputQueue()  # detection results matching those frames
pipeline.start()
# Host-side state shared with the helper functions below via module-level names.
frame = None
detections = []
startTime = time.monotonic()  # reference point for the FPS overlay/print
counter = 0  # incremented once per received detection message
color2 = (255, 255, 255)  # white text for the FPS overlay
def frameNorm(frame, bbox):
    """Map normalized <0..1> bbox coordinates to pixel coordinates of *frame*.

    NN detections arrive in the 0..1 range; even indices (x values) are
    scaled by the frame width, odd indices (y values) by the frame height.
    Values are clipped to the valid range before scaling.
    """
    scale = np.array(
        [frame.shape[1] if i % 2 == 0 else frame.shape[0] for i in range(len(bbox))]
    )
    return (np.clip(np.array(bbox), 0, 1) * scale).astype(int)
def displayFrame(name, frame):
    """Overlay the current detections on *frame* and show it in window *name*.

    Reads the module-level `detections` and `labelMap`; draws the class
    label, confidence percentage, and bounding box for each detection.
    """
    box_color = (255, 0, 0)
    for det in detections:
        x1, y1, x2, y2 = frameNorm(
            frame, (det.xmin, det.ymin, det.xmax, det.ymax)
        )
        # Text first, then the rectangle — same draw order as before.
        cv2.putText(
            frame,
            labelMap[det.label],
            (x1 + 10, y1 + 20),
            cv2.FONT_HERSHEY_TRIPLEX,
            0.5,
            255,
        )
        cv2.putText(
            frame,
            f"{int(det.confidence * 100)}%",
            (x1 + 10, y1 + 40),
            cv2.FONT_HERSHEY_TRIPLEX,
            0.5,
            255,
        )
        cv2.rectangle(frame, (x1, y1), (x2, y2), box_color, 2)
    # Show the annotated frame
    cv2.imshow(name, frame)
while pipeline.isRunning():
# Blocking reads: one passthrough frame plus its matching detections per iteration.
inRgb: dai.ImgFrame = qRgb.get()
inDet: dai.ImgDetections = qDet.get()
if inRgb is not None:
frame = inRgb.getCvFrame()  # ImgFrame -> numpy array usable by OpenCV
cv2.putText(
frame,
"NN fps: {:.2f}".format(counter / (time.monotonic() - startTime)),
(2, frame.shape[0] - 4),  # bottom-left corner of the frame
cv2.FONT_HERSHEY_TRIPLEX,
0.4,
color2,
)
if inDet is not None:
detections = inDet.detections  # consumed by displayFrame() via the module-level name
counter += 1
if frame is not None:
displayFrame("rgb", frame)
# NOTE(review): prints on every loop iteration — consider throttling.
print("FPS: {:.2f}".format(counter / (time.monotonic() - startTime)))
if cv2.waitKey(1) == ord("q"):
pipeline.stop()
break
Thanks in advance!