• DepthAI
  • How to increase ImageManip maximum frame size

I am trying to test out different person-detection models from the model zoo. I am using the spacial_face_det.py program https://github.com/spmallick/learnopencv/blob/master/OAK-Object-Detection-with-Depth/spacial_face_det.py and changing the DET_INPUT_SIZE and model_name variables to the different models in the model zoo https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/. All was fine when I tried person-detection-0200 through 0202, which all have smaller DET_INPUT_SIZE values. But when I got to person-detection-0203 https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/person-detection-0203, which has DET_INPUT_SIZE = (864, 480), I got an error: [ImageManip(5)] [error] Output image is bigger (1244160B) than maximum frame size specified in properties (1048576B) - skipping frame. The size checks out: 864 x 480 x 3 bytes per pixel = 1,244,160 bytes, which exceeds the 1,048,576-byte default. How can I increase the ImageManip maximum frame size?

I tried adding the line face_det_manip.initialConfig.setMaxOutputFrameSize(1244160), but I got an error: AttributeError: 'depthai.ImageManipConfig' object has no attribute 'setMaxOutputFrameSize'.


    Hi FrancisTse ,
    Could you try face_det_manip.setMaxOutputFrameSize(1244160)?
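    For context, setMaxOutputFrameSize() is a method on the ImageManip node itself, not on its ImageManipConfig - that is why the initialConfig call raised the AttributeError. A minimal sketch of where the call goes when building the pipeline (the node name follows spacial_face_det.py, and 864 x 480 x 3 bytes matches person-detection-0203):

    import depthai as dai

    pipeline = dai.Pipeline()

    # ImageManip node that resizes camera frames for the detection network.
    face_det_manip = pipeline.create(dai.node.ImageManip)
    face_det_manip.initialConfig.setResize(864, 480)

    # Set on the node, not on initialConfig:
    # 864 * 480 * 3 channels = 1244160 bytes.
    face_det_manip.setMaxOutputFrameSize(864 * 480 * 3)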
    Thanks, Erik

    Hello Erik, that worked. Thanks for the quick reply!

    5 days later

    Hello Erik, sorry I did not have a chance to try out the changes extensively until now. After I increased the ImageManip maximum frame size, I got another error. The program would keep running and displaying the output windows until a person walked into the FOV, at which point it would stop with this error message:
    [18443010F1AECE1200] [1056.112] [SpatialDetectionNetwork(4)] [critical] Fatal error in openvino '2021.4'. Likely because the model was compiled for different openvino version. If you want to select an explicit openvino version use: setOpenVINOVersion while creating pipeline. If error persists please report to developers. Log: 'Gather' '217'
    [18443010F1AECE1200] [1059.100] [system] [critical] Fatal error. Please report to developers. Log: 'Fatal error on MSS CPU: trap: 00, address: 00000000' '0'
    Traceback (most recent call last):
    File "/home/francis/Desktop/learningOAK-D-Lite/spacial_face_det copy.py", line 161, in <module>
    disp_frame = in_disp.getCvFrame()
    AttributeError: 'depthai.ADatatype' object has no attribute 'getCvFrame'
    Stack trace (most recent call last):
    #14 Object "/bin/python", at 0x587533, in
    #13 Object "/lib/aarch64-linux-gnu/libc.so.6", at 0x7f9be56217, in __libc_start_main
    #12 Object "/bin/python", at 0x587637, in Py_BytesMain
    #11 Object "/bin/python", at 0x5b79eb, in Py_RunMain
    #10 Object "/bin/python", at 0x5c958f, in Py_FinalizeEx
    #9 Object "/bin/python", at 0x5cdde3, in
    #8 Object "/bin/python", at 0x5ce40f, in _PyGC_CollectNoFail
    #7 Object "/bin/python", at 0x485b1b, in
    #6 Object "/bin/python", at 0x5bdabf, in
    #5 Object "/bin/python", at 0x525723, in PyDict_Clear
    #4 Object "/home/francis/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f8972f06f, in
    #3 Object "/home/francis/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f897eea77, in
    #2 Object "/home/francis/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f89965a03, in dai::DataOutputQueue::~DataOutputQueue()
    #1 Object "/home/francis/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f89962c97, in
    #0 Object "/home/francis/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f89a6199c, in
    Segmentation fault (Address not mapped to object [0x39303100000018])
    Segmentation fault

    I went back and reviewed all the models I tried from the same model zoo. These worked and provided the detected bounding box: face-detection-adas-0001, face-detection-retail-0004, person-detection-0200, person-detection-0201 and person-detection-0202. One thing I noticed is that all the models that worked have the same output blob shape and format: a blob with shape 1, 1, 200, 7 in the format 1, 1, N, 7. The person-detection-0203 model has an output blob with shape 100, 5 in the format N, 5. Do I have to add additional parameters in my program to account for the difference in the model output blob shape and format, or do I have to select a different OpenVINO version, as the error message suggests? Thanks, Francis.


      Hi FrancisTse ,
      I believe you are exactly right - the MobileNetDetectionNetwork node can't parse the results because they aren't in the 1,N,7 format (the standard SSD output format) but in N,5, with a separate output for the labels (model desc here). I assume it would be possible to edit the model (e.g. with ONNX tools) and combine these two outputs into a single 1,N,7 output, as SSD model results have. Thoughts?
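      Something along these lines could do the surgery - a rough, untested sketch assuming an ONNX export of the model with opset >= 13, static shapes, and output tensors named "boxes" (N,5: x_min, y_min, x_max, y_max, confidence) and "labels" (N); verify the actual names and layouts in Netron first:

      import numpy as np
      import onnx
      import onnx_graphsurgeon as gs

      graph = gs.import_onnx(onnx.load("person-detection-0203.onnx"))

      # Assumed output names/shapes; check with Netron before running.
      boxes = next(o for o in graph.outputs if o.name == "boxes")    # (N, 5)
      labels = next(o for o in graph.outputs if o.name == "labels")  # (N,)
      n = boxes.shape[0]  # number of detections, e.g. 100

      # Split boxes into coordinates (N, 4) and confidence (N, 1).
      coords = gs.Variable("coords", dtype=np.float32, shape=(n, 4))
      conf = gs.Variable("conf", dtype=np.float32, shape=(n, 1))
      split_sizes = gs.Constant("split_sizes", np.array([4, 1], dtype=np.int64))
      split = gs.Node(op="Split", attrs={"axis": 1}, inputs=[boxes, split_sizes], outputs=[coords, conf])

      # Cast integer labels to float and reshape to (N, 1).
      labels_f = gs.Variable("labels_f", dtype=np.float32, shape=(n,))
      cast = gs.Node(op="Cast", attrs={"to": onnx.TensorProto.FLOAT}, inputs=[labels], outputs=[labels_f])
      axes = gs.Constant("axes", np.array([1], dtype=np.int64))
      labels_2d = gs.Variable("labels_2d", dtype=np.float32, shape=(n, 1))
      unsq = gs.Node(op="Unsqueeze", inputs=[labels_f, axes], outputs=[labels_2d])

      # The image_id column is always 0 for single-image inference.
      image_id = gs.Constant("image_id", np.zeros((n, 1), dtype=np.float32))

      # SSD row layout: image_id, label, confidence, x_min, y_min, x_max, y_max.
      flat = gs.Variable("flat", dtype=np.float32, shape=(n, 7))
      concat = gs.Node(op="Concat", attrs={"axis": 1}, inputs=[image_id, labels_2d, conf, coords], outputs=[flat])
      out_shape = gs.Constant("out_shape", np.array([1, 1, n, 7], dtype=np.int64))
      merged = gs.Variable("detection_out", dtype=np.float32, shape=(1, 1, n, 7))
      reshape = gs.Node(op="Reshape", inputs=[flat, out_shape], outputs=[merged])

      graph.nodes.extend([split, cast, unsq, concat, reshape])
      graph.outputs = [merged]
      onnx.save(gs.export_onnx(graph.cleanup().toposort()), "person-detection-0203-ssd.onnx")

      The merged model would then need to be recompiled to a blob (e.g. with blobconverter) before the OAK can run it.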
      Thanks, Erik

      Hello Erik, thanks for confirming the issue. This gives me a better understanding of what the MobileNetDetectionNetwork documentation says: "MobileNet detection network node is very similar to NeuralNetwork (in fact it extends it). The only difference is that this node is specifically for the MobileNet NN and it decodes the result of the NN on device. This means that the output of this node is not a byte array but ImgDetections that can easily be used in your code." I will evaluate the pre-compiled models with compatible outputs for now and experiment with editing and compiling models later. Thanks for the link to the ONNX tools. It will take me some time to learn more about NN models and how to use the tools. Best regards, Francis.
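      For anyone following along, a rough sketch of the host-side route with the plain NeuralNetwork node - the output layer names ("boxes", "labels") and their types are assumptions to check against the compiled blob:

      import depthai as dai
      import numpy as np

      pipeline = dai.Pipeline()

      cam = pipeline.create(dai.node.ColorCamera)
      cam.setPreviewSize(864, 480)  # person-detection-0203 input size
      cam.setInterleaved(False)     # planar frames, as the model expects

      # Generic NeuralNetwork node: its output is raw tensors (dai.NNData),
      # not decoded ImgDetections, so parsing happens on the host.
      nn = pipeline.create(dai.node.NeuralNetwork)
      nn.setBlobPath("person-detection-0203.blob")  # path is an assumption
      cam.preview.link(nn.input)

      xout = pipeline.create(dai.node.XLinkOut)
      xout.setStreamName("nn")
      nn.out.link(xout.input)

      with dai.Device(pipeline) as device:
          q_nn = device.getOutputQueue(name="nn", maxSize=4, blocking=False)
          while True:
              in_nn = q_nn.get()
              # Layer names are assumptions; list them with in_nn.getAllLayerNames().
              boxes = np.array(in_nn.getLayerFp16("boxes")).reshape(-1, 5)
              labels = np.array(in_nn.getLayerInt32("labels"))  # use getLayerFp16 if stored as fp16
              for (xmin, ymin, xmax, ymax, conf), label in zip(boxes, labels):
                  if conf > 0.5:
                      print(f"label={label} conf={conf:.2f} box=({xmin:.0f},{ymin:.0f},{xmax:.0f},{ymax:.0f})")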

      2 years later

      hi @erik
      a yolov5 model trained on image size 416x416 and converted to a blob gives 18 FPS, while a yolov5 model trained on image size 640x640 and converted to a blob gives only 6 to 7 FPS.

      The FPS of the model trained at 640 falls off much more than that of the one trained at 416. Is this normal, or am I missing something?

      @erik For the above question: the custom models are trained on yolov5s.pt.

      Hi @Unknown, we don't have comparisons for yolov5s directly at these sizes, but we do have them for yolov6, and it's similar - 640x640 is about 2x slower than 416x416 (see the FPS table here: https://docs.luxonis.com/hardware/platform/rvc/rvc2/).

      So yes, I'd say it's normal - 640x640 has 409,600 pixels versus 173,056 for 416x416, about 2.4x the work per inference, so a roughly 2x FPS drop is in the expected range. Perhaps go with a smaller model, or with a smaller architecture (nano instead of small).

      hi @erik, could implementing a frame-skipping mechanism, where the system skips a number of frames before processing a frame with the blob model for detection, help increase the effective frames per second (FPS) of the object detection process?
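      For reference, a minimal sketch of that idea: the host feeds only every Nth preview frame back into the detector through an XLinkIn, so the display loop runs at camera rate while detections update less often. Note the detection rate itself goes down, not up - each inference still takes the same time; only the display is decoupled from the NN. Stream names, the blob path, and the Yolo decoding parameters are assumptions:

      import depthai as dai

      pipeline = dai.Pipeline()

      cam = pipeline.create(dai.node.ColorCamera)
      cam.setPreviewSize(640, 640)
      cam.setInterleaved(False)

      xout_rgb = pipeline.create(dai.node.XLinkOut)
      xout_rgb.setStreamName("rgb")
      cam.preview.link(xout_rgb.input)

      # The host pushes only the frames it wants detected back to the device.
      xin_nn = pipeline.create(dai.node.XLinkIn)
      xin_nn.setStreamName("nn_in")

      nn = pipeline.create(dai.node.YoloDetectionNetwork)
      nn.setBlobPath("yolov5s_640.blob")  # blob path is an assumption
      nn.setConfidenceThreshold(0.5)
      # Class count, anchors and masks omitted; set them to match your export.
      xin_nn.out.link(nn.input)

      xout_nn = pipeline.create(dai.node.XLinkOut)
      xout_nn.setStreamName("nn")
      nn.out.link(xout_nn.input)

      SKIP = 3  # run detection on every 3rd frame; tune as needed

      with dai.Device(pipeline) as device:
          q_rgb = device.getOutputQueue("rgb", maxSize=4, blocking=False)
          q_in = device.getInputQueue("nn_in")
          q_nn = device.getOutputQueue("nn", maxSize=4, blocking=False)
          i = 0
          while True:
              frame = q_rgb.get()       # arrives at full camera FPS
              if i % SKIP == 0:
                  q_in.send(frame)      # only every SKIP-th frame reaches the detector
              i += 1
              det = q_nn.tryGet()       # non-blocking; None between results
              # ... display frame.getCvFrame(), drawing det.detections when present ...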