erik Happy to supply the code. I derived my version from your GitHub repository here. My code is below (I can't find a way to attach a file). I think I'm running DepthAI version 2.13.3 with some patches (I've forgotten how to get the version programmatically).
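For reference, a minimal way to check the version programmatically, assuming the depthai Python package exposes a __version__ attribute:
import depthai as dai
print(dai.__version__) # prints the installed DepthAI library version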
Here are the changes between the original on GitHub and my code (I inserted additional comments marking each change):
- Since I don't use the blob converter, I commented out its import.
- Since I need to build a different path for the blob I used, I import Path from pathlib.
- For the NN blob path, I set it to the path of the existing blob file.
#!/usr/bin/env python3
import cv2
import depthai as dai
# import blobconverter # <-- REMOVED
import numpy as np
from pathlib import Path # <-- ADDED
# MobilenetSSD label texts
labelMap = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
"diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
nnBlobPath = str((Path(__file__).parent / Path('examples/models/mobilenet-ssd_openvino_2021.4_5shave.blob')).resolve().absolute()) # <-- added
# Create pipeline
pipeline = dai.Pipeline()
# Define source and output
camRgb = pipeline.create(dai.node.ColorCamera)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
camRgb.setInterleaved(False)
camRgb.setIspScale(1, 5) # 4056x3040 -> 812x608
xoutIsp = pipeline.create(dai.node.XLinkOut)
xoutIsp.setStreamName("isp")
camRgb.isp.link(xoutIsp.input)
# Use ImageManip to resize to 300x300 and convert YUV420 -> RGB
manip = pipeline.create(dai.node.ImageManip)
manip.setMaxOutputFrameSize(270000) # 300x300x3
manip.initialConfig.setResizeThumbnail(300, 300)
manip.initialConfig.setFrameType(dai.RawImgFrame.Type.RGB888p) # needed for NN
camRgb.isp.link(manip.inputImage)
# NN to demonstrate how to run inference on full FOV frames
nn = pipeline.create(dai.node.MobileNetDetectionNetwork)
nn.setConfidenceThreshold(0.5)
# nn.setBlobPath(str(blobconverter.from_zoo(name="mobilenet-ssd", shaves=6))) # <-- REMOVED
nn.setBlobPath(nnBlobPath) # <-- ADDED
manip.out.link(nn.input)
xoutNn = pipeline.create(dai.node.XLinkOut)
xoutNn.setStreamName("nn")
nn.out.link(xoutNn.input)
xoutRgb = pipeline.create(dai.node.XLinkOut)
xoutRgb.setStreamName("rgb")
nn.passthrough.link(xoutRgb.input)
with dai.Device(pipeline) as device:
    qRgb = device.getOutputQueue(name='rgb')
    qNn = device.getOutputQueue(name='nn')
    qIsp = device.getOutputQueue(name='isp')

    def frameNorm(frame, bbox):
        normVals = np.full(len(bbox), frame.shape[0])
        normVals[::2] = frame.shape[1]
        return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)

    def displayFrame(name, frame, detections):
        color = (255, 0, 0)
        for detection in detections:
            bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
            cv2.putText(frame, labelMap[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
            cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
            cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
        cv2.imshow(name, frame)

    while True:
        if qNn.has():
            dets = qNn.get().detections
            frame = qRgb.get()
            f = frame.getCvFrame()
            displayFrame("rgb", f, dets)
        if qIsp.has():
            frame = qIsp.get()
            f = frame.getCvFrame()
            cv2.putText(f, str(f.shape), (20, 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255, 255, 255))
            cv2.imshow("isp", f)
        if cv2.waitKey(1) == ord('q'):
            break