I tried using the model zoo's text-detection-0004 after running it through the blob converter (default parameters).
I started by just trying to get the model set up (no other nodes), and already at that point I get an error about memory usage.
I read about reducing memory usage, but I can't seem to get below this. Are there more things to tweak, or is the model simply too big?

[2.14] [105.592] [NeuralNetwork(3)] [error] Tried to allocate '178972736'B out of '104154623'B available.
[2.14] [105.592] [NeuralNetwork(3)] [error] Neural network executor '0' out of '2' error: OUT_OF_MEMORY
[2.14] [106.246] [system] [info] Memory Usage - DDR: 241.60 / 340.93 MiB, CMX: 2.18 / 2.50 MiB, LeonOS Heap: 22.20 / 77.58 MiB, LeonRT Heap: 6.84 / 41.37 MiB
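
(The numbers above come from the device log; to double-check them from the host I also print the counters with something like the sketch below. It assumes dai::Device exposes getDdrMemoryUsage() / getCmxMemoryUsage() the way the public examples use them, so treat it as a sketch only.)

    #include <cstdint>
    #include <iostream>
    #include "depthai/depthai.hpp"

    // Sketch: print roughly the same DDR/CMX counters as the device log,
    // queried from the host while the pipeline is running.
    void printMemoryUsage(dai::Device& device) {
        auto toMiB = [](std::int64_t bytes) { return bytes / (1024.0 * 1024.0); };
        auto ddr = device.getDdrMemoryUsage();
        auto cmx = device.getCmxMemoryUsage();
        std::cout << "DDR: " << toMiB(ddr.used) << " / " << toMiB(ddr.total) << " MiB, "
                  << "CMX: " << toMiB(cmx.used) << " / " << toMiB(cmx.total) << " MiB"
                  << std::endl;
    }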

Snippet for the camera setup (OAK-D):

    // 1280x768 planar BGR preview, stretched to the full 1080p FOV
    // (aspect ratio not preserved) so it can feed the NN directly
    colorCam->setPreviewSize(1280, 768);
    colorCam->setResolution(dai::ColorCameraProperties::SensorResolution::THE_1080_P);
    colorCam->setInterleaved(false);
    colorCam->setBoardSocket(dai::CameraBoardSocket::RGB);
    colorCam->setPreviewKeepAspectRatio(false);
    colorCam->setColorOrder(dai::ColorCameraProperties::ColorOrder::BGR);
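
The NN node itself is set up roughly like this (pipeline creation omitted; the blob path is just a placeholder for the converted text-detection-0004 blob):

    // Sketch of the NN part of the pipeline; "pipeline" and "colorCam" come
    // from the setup above, the blob path is a placeholder.
    auto nn = pipeline.create<dai::node::NeuralNetwork>();
    nn->setBlobPath("text-detection-0004_fp16.blob");

    // Feed the 1280x768 preview directly into the network.
    colorCam->preview.link(nn->input);

    // Send raw NN results back to the host.
    auto nnOut = pipeline.create<dai::node::XLinkOut>();
    nnOut->setStreamName("nn");
    nn->out.link(nnOut->input);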

Adding this

    nn->setNumPoolFrames(1);

replaces the error with a warning

[2.14] [159.765] [NeuralNetwork(3)] [warning] Number of pool frames 1 is less than number of executors 2, will likely yield in worse performance

The documentation doesn't say much about that setting. How does it affect the network? Is there another setting to limit the number of executors so the warning goes away (if that even makes sense; it doesn't seem obvious)?
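
For reference, the only other related knob I've spotted is setNumInferenceThreads; pairing it with the pool-frames setting is what I would try next, though whether that is the intended way to get rid of the warning is just my guess:

    // Guess: match the number of pool frames to a single inference thread so
    // the "pool frames < executors" warning no longer applies.
    nn->setNumInferenceThreads(1);
    nn->setNumPoolFrames(1);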


    Hi dexter ,
    You are running out of RAM (docs here) by a huge margin. By default, you have 2 inference threads running in parallel for the NN, but even the first executor (0 out of 2) couldn't run because there isn't enough RAM for inference. Even with the RAM optimizations (described in the docs) I don't think you would be able to run this model (even at a very slow FPS), so you might want to look at smaller AI models. Thoughts?
    Thanks, Erik

      Hi erik ,
      I read that doc and tried what I could. The only thing that helped was setNumPoolFrames, which isn't mentioned in that doc, and its own documentation doesn't really explain the consequences of using it.
      I'm also not sure it actually works yet, since I haven't handled the output (it's different from the EAST model's), but it appears to run at a little over 1 fps.
      I haven't found much else for text detection besides EAST, this model, and one similar to TD0004 that only handles horizontal text. I want to try something bigger than EAST's 256x256 input.
      Before trying TD0004 I tried just changing the color camera settings to map the preview to a larger part of the frame (cropping to 1024x1024 really limits the FOV). That also ran out of RAM with the EAST model.

        Hi dexter ,
        Interesting, good to know. Let us know if you have any issues handling the output of TD0004; we can provide some pointers.
        Thanks, Erik

          Hi erik ,
          I'm planning to look into the text_detection_demo for code to handle the TD0004 output, but if you have something better for C++ I'm happy to try it.
          Do you have any more info on how setNumPoolFrames really affects the model?
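
          (For context, this is the kind of host-side code I had in mind for grabbing the raw TD0004 outputs; the layer names are copied from the model zoo description of text-detection-0004 and I haven't checked them against the converted blob, so it's only a sketch.)

              // Sketch: read the two raw output tensors of text-detection-0004.
              // Layer names come from the model zoo docs and may differ after
              // conversion; they can be checked with inDet->getAllLayerNames().
              auto nnQueue = device.getOutputQueue("nn", 4, false);
              auto inDet = nnQueue->get<dai::NNData>();

              // Per-pixel text/background logits, 1x2x(H/4)x(W/4).
              auto segm = inDet->getLayerFp16("model/segm_logits/add");
              // Per-pixel link logits used by PixelLink-style post-processing.
              auto link = inDet->getLayerFp16("model/link_logits_/add");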

          I got everything working again after replacing EAST with TD0004, so it now runs detection, recognition, depth, and spatial as on-device nodes. (It fits after reducing the pool frames/threads for detection and spatial to 1.)
          The benefit is a 1280x768 detection input rather than 256x256, and thus also a wider FOV.
          The downside is higher latency (detection takes a bit less than 1 s), but since recognition is pretty slow anyway, the fps stays just above 1 for up to some 6-8 detected texts when it runs in parallel.

            6 days later

            Hi erik ,
            Now I just wish I had something like the "OAK-D LR" to get more depth. Are beta devices usually more expensive, or will the price be about the same once it comes out of beta?
            Is RVC3 going to "run" at the same speed as RVC2, or is it faster? E.g., would an identical network run faster, or would it need to use more threads or something to gain speed?


              Hi dexter ,
              Sorry about the delay. Beta devices are usually a bit more expensive at first, just to test the market; we will decrease the price a bit once we have stock in storage and ready to ship.

              Unfortunately there won't be much AI performance improvement (at least there's no indication there will be; it really depends on OpenVINO). So the main advantage of RVC3 is the on-board host computing that runs Yocto Linux.

              That said, we will start evaluating candidates for RVC4 very soon, and we do target much better AI performance (think 5x), but it will take some time until we get there (likely initial prototypes in Q4 this year, device launch around Q2 next year).

              Thoughts?
              Thanks, Erik

                Hi erik ,
                Thanks for the info. It's a bit hard to tell from the specs whether it's just "bigger" or also "faster".
                It will be interesting to see what makes RVC4 5x faster, but it's a long wait.