• DepthAI-v2
  • Question regarding ColorCamera configuration/control

Some context: I hope to use a MobileNetSpatialDetectionNetwork node to recognize a set of inanimate and immobile objects in my environment to assist in localization. I also want to use the node to recognize faces and eventually specific individuals. The inanimate objects are all on or close to the floor of the environment. The faces I expect to be over 1 m from the floor. At this time I do not have my OAK-D mounted on a pan/tilt mechanism and hope to avoid doing so. As a result I need to get the maximum VFOV from the color camera.

I found that the example script spatial_mobilenet.py uses the sensor resolution 1080_P. Further, the camera preview is center cropped by default, so the cropped ROI is 1080x1080. That ROI is then scaled to 300x300 as specified. The MSDN node therefore sees 100% of the sensor image height but only 56% of its width, which means the things I want to detect have to be roughly centered with respect to the camera to be seen and detected.
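
For concreteness, here is my understanding of what that example configures (a sketch based on my reading of spatial_mobilenet.py; I may be off on details):

import depthai as dai

pipeline = dai.Pipeline()

camRgb = pipeline.create(dai.node.ColorCamera)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setInterleaved(False)
# The preview keeps the aspect ratio by default, so the 300x300 preview is a
# center crop: a 1080x1080 ROI out of the 1920x1080 frame, scaled down.
camRgb.setPreviewSize(300, 300)
camRgb.setPreviewKeepAspectRatio(True)  # the default, shown explicitly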

That got me wondering about moving the ROI, so that over some period of time I could change the ROI and electronically "scan" the entire HFOV if necessary. I found rgb_camera_control.py, which allows you to change the exposure and the crop ROI (plus other things). When changing the exposure region, it was pretty obvious (visually) that the camera takes about 5 seconds to settle to a new autoexposure setting.
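
From my reading of that example, the runtime changes go in as messages to the camera's control and config inputs; roughly like this, continuing the sketch above (assuming I have the message types right):

# Pipeline side: XLinkIn nodes feed the camera's control and config inputs
controlIn = pipeline.create(dai.node.XLinkIn)
controlIn.setStreamName("control")
controlIn.out.link(camRgb.inputControl)

configIn = pipeline.create(dai.node.XLinkIn)
configIn.setStreamName("config")
configIn.out.link(camRgb.inputConfig)

# Host side, inside `with dai.Device(pipeline) as device:`
# Move the crop ROI (normalized top-left corner; size comes from the camera config)
cfg = dai.ImageManipConfig()
cfg.setCropRect(0.2, 0.0, 0, 0)
device.getInputQueue("config").send(cfg)

# Re-enable autoexposure after a manual exposure change
ctrl = dai.CameraControl()
ctrl.setAutoExposureEnable()
device.getInputQueue("control").send(ctrl)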

Changing the crop ROI appears (visually) to work immediately. But I'm not sure that is true; it would not be surprising if it took a frame or two, or maybe more.

These configuration and control changes make me wonder: is there some status setting, or anything else, that indicates a steady state where the configuration and control settings are stable and one can trust the information flowing out of the pipeline? More specifically, is there a way to know when the autoexposure has settled, and is there a way to know that a new crop ROI is engaged? Or do I just need to take a best guess at some number of frames?
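
Failing that, the best workaround I can think of is a heuristic: watch the per-frame exposure metadata until it stops changing (a sketch only, assuming ImgFrame exposes getExposureTime(), which I believe it does):

def exposure_settled(recent_frames, tolerance_us=50, window=5):
    """Crude heuristic: treat autoexposure as settled once the exposure time
    of the last `window` frames varies by no more than `tolerance_us`."""
    if len(recent_frames) < window:
        return False
    exposures_us = [f.getExposureTime().total_seconds() * 1e6
                    for f in recent_frames[-window:]]
    return max(exposures_us) - min(exposures_us) <= tolerance_us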

  • erik replied to this.

    Hello gregflurry, first, sorry for the delay.
    I have just recently created a demo that maximizes VFOV, link here.
    Regarding other questions, I hope my readme in that PR will answer them:

    This demo shows how you can run NN inferencing on full FOV frames. The color camera's sensor has 4032x3040 resolution, which can only be obtained from the isp and still outputs of the ColorCamera node. In order to run NN inference on a frame, the frame must be in RGB format and needs to be a specific size. We use an ImageManip node to convert YUV420 (isp) to RGB and to resize the frame to 300x300 (required by the MobileNet NN that we use).

    Thanks, Erik

      @erik Thanks for the pointer to the demo. I injected the approach into my use of the MobileNetSpatialDetectionNetwork node and it worked nicely, at least as far as I could tell. The total loading on the LeonOS and LeonRT was about the same as my original approach, but with reduced load on OS and more on RT. Not sure if that matters.
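
      Roughly, the wiring I ended up with looks like this (a from-memory sketch, not my exact code; the spatial network still needs a blob path to actually run):

      import depthai as dai

      pipeline = dai.Pipeline()

      camRgb = pipeline.create(dai.node.ColorCamera)
      camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
      camRgb.setInterleaved(False)
      camRgb.setIspScale(1, 5)  # 4032x3040 -> 812x608, full FOV

      # Full-FOV isp frame -> letterboxed 300x300 RGB for the spatial detection network
      manip = pipeline.create(dai.node.ImageManip)
      manip.setMaxOutputFrameSize(270000)  # 300x300x3
      manip.initialConfig.setResizeThumbnail(300, 300)
      manip.initialConfig.setFrameType(dai.RawImgFrame.Type.RGB888p)
      camRgb.isp.link(manip.inputImage)

      # Stereo depth from the mono cameras, aligned to the color camera
      monoLeft = pipeline.create(dai.node.MonoCamera)
      monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
      monoRight = pipeline.create(dai.node.MonoCamera)
      monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)
      stereo = pipeline.create(dai.node.StereoDepth)
      stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
      monoLeft.out.link(stereo.left)
      monoRight.out.link(stereo.right)

      sdn = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
      # sdn.setBlobPath(...)  # path to the MobileNet blob goes here
      sdn.setConfidenceThreshold(0.5)
      sdn.setBoundingBoxScaleFactor(0.5)
      manip.out.link(sdn.input)
      stereo.depth.link(sdn.inputDepth)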

      There is one thing that bothers me a bit. The demo approach in effect uses "letterboxing", which reduces the size of any feature/object that might be detected. I don't know enough to know whether that makes a difference for the accuracy of the NN detection. Does it?

      • erik replied to this.

        erik I recreated the demo you mentioned in my environment, with the exception of using a different blob. There was actually a difference between what is shown in the readme and what I see. The passthrough is in grayscale, not color as shown in the readme. Am I doing something wrong?

        Hello gregflurry, so letterboxing is just one option to increase the (horizontal) FOV; another option is to not keep the aspect ratio. Docs on that here: https://docs.luxonis.com/projects/api/en/latest/tutorials/maximize_fov/
        We haven't done comparison tests to know how much it affects accuracy, but I believe you are right that smaller features can reduce detection accuracy. Have you used mono cameras instead of the color camera? Otherwise the passthrough just passes through the input frame.
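
        For illustration, the two options would look roughly like this with ImageManip (a sketch, not copied from the linked docs):

        import depthai as dai

        pipeline = dai.Pipeline()
        camRgb = pipeline.create(dai.node.ColorCamera)
        camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
        camRgb.setIspScale(1, 5)  # full-FOV isp frames, scaled down

        # Option 1: letterbox - keep the aspect ratio, pad top/bottom with bars
        manipLetterbox = pipeline.create(dai.node.ImageManip)
        manipLetterbox.setMaxOutputFrameSize(270000)  # 300x300x3
        manipLetterbox.initialConfig.setResizeThumbnail(300, 300)
        manipLetterbox.initialConfig.setFrameType(dai.RawImgFrame.Type.RGB888p)
        camRgb.isp.link(manipLetterbox.inputImage)

        # Option 2: stretch - drop the aspect ratio, squeeze the full FOV into 300x300
        manipStretch = pipeline.create(dai.node.ImageManip)
        manipStretch.setMaxOutputFrameSize(270000)
        manipStretch.initialConfig.setResize(300, 300)
        manipStretch.initialConfig.setKeepAspectRatio(False)
        manipStretch.initialConfig.setFrameType(dai.RawImgFrame.Type.RGB888p)
        camRgb.isp.link(manipStretch.inputImage)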

          erik I realized I was not clear in my previous post. I copied the code from the demo you mentioned and changed exactly two lines of code to use a different blob (though apparently, based on the labels, it is the same model, already converted) for the NN. There are no mono camera nodes created. So in theory I should see a color image for the passthrough. But I do not.

          I added some code to show the ImageManip.out and it too displays in grayscale. I checked the type and it was RGB888p.

          I am confused and apparently have reached the limit of my knowledge about DepthAI and OpenCV. I am not sure what else to check.

          • erik replied to this.

            gregflurry could you share the minimal reproducible code for that? We would love to check it out.

              erik Happy to supply the code. I derived my version from your GitHub repository here. Below find my code (I cannot find a way to attach a file). I think I'm running DepthAI version 2.13.3 with some patches (I've forgotten how to get the version programmatically).
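
              Possibly the version is just dai.__version__, but that is a guess I haven't verified against my patched build:

              import depthai as dai
              print(dai.__version__)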

              Here are the changes between the original in GitHub and my code (I inserted additional comments to show my changes):

              • since I don't use the blob converter, I commented out the import
              • since I need to establish a different path for the blob I used, I import from pathlib
              • for the NN blob path, I set the blob path to point at the already-converted blob under examples/models
              #!/usr/bin/env python3
              
              import cv2
              import depthai as dai
              # import blobconverter  # <-- REMOVED
              import numpy as np
              from pathlib import Path  # <-- ADDED
              
              # MobilenetSSD label texts
              
              labelMap = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
                          "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
              nnBlobPath = str((Path(__file__).parent / Path('examples/models/mobilenet-ssd_openvino_2021.4_5shave.blob')).resolve().absolute())  # <-- ADDED
              
              # Create pipeline
              pipeline = dai.Pipeline()
              
              # Define source and output
              camRgb = pipeline.create(dai.node.ColorCamera)
              camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
              camRgb.setInterleaved(False)
              camRgb.setIspScale(1,5) # 4032x3040 -> 812x608
              
              xoutIsp = pipeline.create(dai.node.XLinkOut)
              xoutIsp.setStreamName("isp")
              camRgb.isp.link(xoutIsp.input)
              
              # Use ImageManip to resize to 300x300 and convert YUV420 -> RGB
              manip = pipeline.create(dai.node.ImageManip)
              manip.setMaxOutputFrameSize(270000) # 300x300x3
              manip.initialConfig.setResizeThumbnail(300, 300)
              manip.initialConfig.setFrameType(dai.RawImgFrame.Type.RGB888p) # needed for NN
              camRgb.isp.link(manip.inputImage)
              
              # NN to demonstrate how to run inference on full FOV frames
              nn = pipeline.create(dai.node.MobileNetDetectionNetwork)
              nn.setConfidenceThreshold(0.5)
              # nn.setBlobPath(str(blobconverter.from_zoo(name="mobilenet-ssd", shaves=6)))  # <-- REMOVED
              nn.setBlobPath(nnBlobPath)  # <-- ADDED
              manip.out.link(nn.input)
              
              xoutNn = pipeline.create(dai.node.XLinkOut)
              xoutNn.setStreamName("nn")
              nn.out.link(xoutNn.input)
              
              xoutRgb = pipeline.create(dai.node.XLinkOut)
              xoutRgb.setStreamName("rgb")
              nn.passthrough.link(xoutRgb.input)
              
              with dai.Device(pipeline) as device:
                  qRgb = device.getOutputQueue(name='rgb')
                  qNn = device.getOutputQueue(name='nn')
                  qIsp = device.getOutputQueue(name='isp')
              
                  def frameNorm(frame, bbox):
                      normVals = np.full(len(bbox), frame.shape[0])
                      normVals[::2] = frame.shape[1]
                      return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
              
                  def displayFrame(name, frame, detections):
                      color = (255, 0, 0)
                      for detection in detections:
                          bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                          cv2.putText(frame, labelMap[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                          cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                          cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
                      cv2.imshow(name, frame)
              
                  while True:
                      if qNn.has():
                          dets = qNn.get().detections
                          frame = qRgb.get()
                          f = frame.getCvFrame()
                          displayFrame("rgb", f, dets)
                      if qIsp.has():
                          frame = qIsp.get()
                          f = frame.getCvFrame()
                          cv2.putText(f, str(f.shape), (20, 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, (255,255,255))
                          cv2.imshow("isp", f)
              
                      if cv2.waitKey(1) == ord('q'):
                          break
              • erik replied to this.

                Hello gregflurry, I just checked the full-fov-nn experiment with the latest version (2.14) and it works as expected. I didn't check your code, since I don't have the blob there.

                  erik Thanks for checking. I upgraded my PyCharm environment to 2.14.0. I still get the grayscale result. I decided to remove the NN, and thus the blob dependency, from the script in my previous post. The resulting code, included below, simply shows the ISP output and the output of the ImageManip node, so it should be easy to run. In my environment, it still produces a grayscale image. I remain puzzled. I hope it is not too much of an imposition to ask that you run it. I suspect you will see a color image, but at least that would confirm I've got something wrong in my environment, though I have no idea what.

                  #!/usr/bin/env python3
                  
                  import cv2
                  import depthai as dai
                  
                  # Create pipeline
                  pipeline = dai.Pipeline()
                  
                  # Define source and output
                  camRgb = pipeline.create(dai.node.ColorCamera)
                  camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
                  camRgb.setInterleaved(False)
                  camRgb.setIspScale(1,5) # 4032x3040 -> 812x608
                  
                  xoutIsp = pipeline.create(dai.node.XLinkOut)
                  xoutIsp.setStreamName("isp")
                  camRgb.isp.link(xoutIsp.input)
                  
                  # Use ImageManip to resize to 300x300 and convert YUV420 -> RGB
                  manip = pipeline.create(dai.node.ImageManip)
                  manip.setMaxOutputFrameSize(270000) # 300x300x3
                  manip.initialConfig.setResizeThumbnail(300, 300)
                  manip.initialConfig.setFrameType(dai.RawImgFrame.Type.RGB888p) # needed for NN
                  camRgb.isp.link(manip.inputImage)
                  
                  xoutRgb = pipeline.create(dai.node.XLinkOut)
                  xoutRgb.setStreamName("rgb")
                  manip.out.link(xoutRgb.input)
                  
                  with dai.Device(pipeline) as device:
                      qRgb = device.getOutputQueue(name='rgb')
                      qIsp = device.getOutputQueue(name='isp')
                  
                      while True:
                          if qRgb.has():
                              frame = qRgb.get()
                              f = frame.getCvFrame()
                              cv2.imshow("rgb", f)
                  
                          if qIsp.has():
                              frame = qIsp.get()
                              f = frame.getCvFrame()
                              cv2.imshow("isp", f)
                  
                          if cv2.waitKey(1) == ord('q'):
                              break

                  Thanks.

                  2 months later

                  erik
                  Hey Erik,
                  I want to do the exact same thing, but for the STILL output of the ColorCamera node. In other words, I want to resize a 1080p STILL frame to 300x300 and feed it to the MobileNet NN.
                  I tried the following code, but it generated grayscale images.

                  # Creating manip
                  manip = pipeline.create(dai.node.ImageManip)
                  manip.initialConfig.setResize(300,300)
                  manip.initialConfig.setFrameType(dai.ImgFrame.Type.RGB888p)
                  
                  # Linking manip to camera STILL output
                  camRGB.still.link(manip.inputImage)
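
                  (For reference, the STILL output only emits a frame when a capture is requested; below is a minimal sketch of that request pattern, assuming a controlIn XLinkIn node linked to camRGB.inputControl — my actual code may differ slightly.)

                  # Assumed wiring: an XLinkIn node feeding the camera's control input
                  controlIn = pipeline.create(dai.node.XLinkIn)
                  controlIn.setStreamName("control")
                  controlIn.out.link(camRGB.inputControl)

                  # At runtime, request a still capture; camRGB.still then emits one frame
                  ctrl = dai.CameraControl()
                  ctrl.setCaptureStill(True)
                  device.getInputQueue("control").send(ctrl)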

                  Any thoughts on how to solve the issue?

                  • erik replied to this.

                    Hello hussain_allawati, I am not sure if ImageManip supports NV12 frames by default. Could you try moving to the image_manip_refactor branch of depthai-python, running python3 examples/install_requirements.py (to get the latest version of depthai), and retrying the same script?
                    Thanks, Erik

                      erik
                      Erik, I will try your suggestion tomorrow. However, according to this issue, the ImageManip node should be able to support NV12 frames by now. Could you confirm?

                      If it still doesn't support NV12, then I have implemented a pipeline to resize images on the host; however, I am having issues with it. Everything is described in this discussion.

                      Thanks,
                      Hussain

                      • erik replied to this.

                        Hello hussain_allawati ,
                        I don't think it's supported yet; you would need to use the image_manip_refactor branch of depthai-python, where it is supported. To do that: check out that branch, run python examples/install_requirements.py to install that version, and try using ImageManip to manipulate NV12 frames.
                        Thanks, Erik