• DepthAI-v2
  • Cannot keep FPS at 60 on OAK-D-PRO-POE-AF

Hi!

I am very new to computer vision and am currently testing an OAK-D-PRO-POE-AF I got a few weeks ago, trying to use it for image analysis and capture.

In my use case, I don't need high resolution, but I do need the highest possible fps, so I set the RGB camera to THE_1080_P (which, to my knowledge, is the lowest resolution for the OAK-D-PRO-POE-AF).

However, I tried several pipelines and found that the fps is not always 60.

Could you help me check whether this is expected, or whether I did something wrong?

You can find each scenario’s test code here:

  • Scenario 1: cam_rgb.video.link(xout_rgb.input), using the video output; the fps is only around 30. 🥲

  • Scenario 2: cam_rgb.preview.link(xout_rgb.input), a very simple RGB preview, fps is around 60. ✅

  • Scenario 3: save the encoded video stream into an mp4 container, basically following the example here; the resulting video.mp4 is 60 fps. ✅ (I verified with ffprobe -v error -select_streams v:0 -count_packets -show_entries stream=avg_frame_rate -of default=noprint_wrappers=1:nokey=1 ./video.mp4, which reports 250000000/4163437 ≈ 60 fps.)

  • Scenario 4: similar to scenario 3 in that I still save the encoded video stream into an mp4 container, but I add the left and right mono cameras and create two XLinkOuts to preview their streams. The video.mp4 is only around 35 fps 🥲, and in the left mono camera preview window I can see the fps is around 30. 🥲

  • Scenario 5: similar to scenario 4, but I add StereoDepth after the left and right mono cameras. The video.mp4 is only around 40 fps 🥲, and in the depth preview window the fps is around 30. 🥲

My desired use case:

  • Higher fps, resolution doesn't need to be too high (I know that OAK-D Pro PoE OV9782 should better fit my use case 😆)

  • Ability to save video only when an object is detected; for example, save the 5 seconds before AND after a person is detected walking by

  • Need to determine the depth of this object
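
The pre/post-trigger saving in the second point can be handled host-side with a rolling buffer. Below is a minimal pure-Python sketch of the idea; the frame source, the trigger flag, and the 5-second windows are placeholders for the real output-queue reads, not DepthAI API:

```python
import collections

FPS = 60           # assumed camera frame rate
PRE_SECONDS = 5    # seconds to keep before the trigger
POST_SECONDS = 5   # seconds to keep recording after the trigger

# Rolling buffer that always holds the last PRE_SECONDS of frames.
pre_buffer = collections.deque(maxlen=FPS * PRE_SECONDS)

def record_event(frame_source):
    """Collect PRE_SECONDS before and POST_SECONDS after a detection.

    frame_source yields (frame, person_detected) pairs, e.g. frames pulled
    from the device's output queue (placeholder here).
    """
    clip = []
    post_remaining = 0
    for frame, person_detected in frame_source:
        if post_remaining > 0:
            clip.append(frame)
            post_remaining -= 1
            if post_remaining == 0:
                return clip              # hand the finished clip to a muxer
        elif person_detected:
            clip = list(pre_buffer)      # the PRE_SECONDS *before* the trigger
            clip.append(frame)
            post_remaining = FPS * POST_SECONDS - 1
        else:
            pre_buffer.append(frame)
    return clip
```

On trigger, the returned clip (pre-roll plus post-roll) could then be written out to a container in one go.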

If you have any better suggestions for pipeline design, please let me know!

    jakaskerl
    I have lowered the image size to 1080p on the RGB camera, but the question is: if I add a StereoDepth node to the pipeline, the fps of the H.265-encoded color video drops from 60 fps to 30 fps.

    Is it expected?

      YunyaHsu
      The issue is that on a 1 Gbps Ethernet connection you have about 900 Mbps of usable bandwidth.

      1080P NV12/YUV420 frames: 1920 * 1080 * 1.5 * 30fps * 8bits = 747 Mbps (when encoded, this is about half)
      400P depth frames: 640 * 400 * 2 * 30fps * 8bits = 123 Mbps

      If you want higher FPS, you essentially need more bandwidth or smaller frames.
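
      As a sanity check, the arithmetic above can be reproduced in a few lines of Python (the 0.5 encoder factor is just the rough "about half" estimate from above, not a guarantee):

```python
def stream_mbps(width, height, bytes_per_pixel, fps, factor=1.0):
    """Raw stream bandwidth in Mbps (1 Mbps = 1e6 bits/s)."""
    return width * height * bytes_per_pixel * fps * 8 * factor / 1e6

# 1080P NV12/YUV420 at 30 fps: 1.5 bytes per pixel
rgb = stream_mbps(1920, 1080, 1.5, 30)    # ~746.5 Mbps raw
# 400P 16-bit depth at 30 fps: 2 bytes per pixel
depth = stream_mbps(640, 400, 2, 30)      # ~122.9 Mbps

print(f"RGB raw: {rgb:.0f} Mbps, encoded: ~{rgb * 0.5:.0f} Mbps, depth: {depth:.0f} Mbps")
```

      The same helper with a factor of 0.5 and fps of 40 gives the ~498 Mbps figure discussed further down.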

      YunyaHsu Ability to save video only when an object is detected, for example: save 5 seconds before AND after when a person is detected walking by

      Don't send the streams back to host (or only send back small preview frame). This will enable you to run the pipeline at higher FPS.
      Then use YoloSpatialDetectionNetwork to send back the depth of the object only when it is detected (once the recording is triggered).

      Thanks,
      Jaka

      @jakaskerl

      Thanks, I removed all detection networks and updated the pipeline as follows:

      RGB camera (1080P) ---video---> video encoder (h.265) ---bitstream---> xout
      Mono Left & Mono Right camera (both 400P) ---out---> StereoDepth ---depth---> xout
      • with fps = 35 on both the mono and RGB cameras, it works as expected
      • with fps = 40, the real fps is limited to only 30

      If my understanding is correct, at 40 fps the total bandwidth should be around 662 Mbps, so I would like to know why I cannot receive the encoded frames and depth frames as expected:

      1080P NV12/YUV420 frames: 1920 * 1080 * 1.5 * 40fps * 8bits * 0.5 (when encoded this is about half) = 498 Mbps
      400P depth frames: 640 * 400 * 2 * 40fps * 8bits = 164 Mbps

      Additionally, during testing there was a phenomenon I didn't understand. I removed the RGB camera and video encoder nodes and wanted to test the highest achievable fps with only the StereoDepth node.
      The pipeline is:

      Mono Left & Mono Right camera (both 400P) ---out---> StereoDepth ---depth---> xout
      • with fps = 30, 40, or 50, it works as expected
      • with fps = 60, the real fps drops to around 50, even though the bandwidth in this case should be only 246 Mbps.

        YunyaHsu
        Run the scripts with the DEPTHAI_DEBUG=1 environment variable set. Check whether CPU usage is very high (98%+). This could prevent packets from being sent as fast as possible.

        Thanks,
        Jaka

        @jakaskerl
        When running the RGB camera + video encoder only with fps set to 60, CPU usage is around 68%, which is good, and the received fps is also around 60 👍🏻.

        If I add the stereo depth node, i.e. running RGB camera + video encoder + 2 mono cameras + stereo depth with fps set to 40, CPU usage is pretty high, around 94-95%, so I think that's the root cause.
        However, according to the specifications, the OAK-D-PRO-POE-AF's RGB camera should be capable of 60 fps, and the mono cameras can even reach up to 120 fps. If CPU limitations are restricting us to only 35 fps, the observed performance doesn't quite align with the cameras' specified capabilities, IMHO.
        Any suggestion to increase the fps?

        [18443010D15F9D0F00] [192.168.0.137] [29.597] [system] [info] Memory Usage - DDR: 63.01 / 333.28 MiB, CMX: 2.41 / 2.50 MiB, LeonOS Heap: 64.56 / 81.76 MiB, LeonRT Heap: 4.93 / 39.90 MiB / NOC ddr: 1408 MB/s
        [18443010D15F9D0F00] [192.168.0.137] [29.597] [system] [info] Temperatures - Average: 50.69C, CSS: 52.46C, MSS 49.59C, UPA: 49.81C, DSS: 50.92C
        [18443010D15F9D0F00] [192.168.0.137] [29.597] [system] [info] Cpu Usage - LeonOS 94.33%, LeonRT: 23.64%
        [18443010D15F9D0F00] [192.168.0.137] [30.598] [system] [info] Memory Usage - DDR: 63.01 / 333.28 MiB, CMX: 2.41 / 2.50 MiB, LeonOS Heap: 64.56 / 81.76 MiB, LeonRT Heap: 4.93 / 39.90 MiB / NOC ddr: 1413 MB/s
        [18443010D15F9D0F00] [192.168.0.137] [30.598] [system] [info] Temperatures - Average: 50.30C, CSS: 52.90C, MSS 48.92C, UPA: 49.14C, DSS: 50.25C
        [18443010D15F9D0F00] [192.168.0.137] [30.598] [system] [info] Cpu Usage - LeonOS 94.97%, LeonRT: 23.55%
        [18443010D15F9D0F00] [192.168.0.137] [31.600] [system] [info] Memory Usage - DDR: 63.01 / 333.28 MiB, CMX: 2.41 / 2.50 MiB, LeonOS Heap: 64.56 / 81.76 MiB, LeonRT Heap: 4.93 / 39.90 MiB / NOC ddr: 1420 MB/s
        [18443010D15F9D0F00] [192.168.0.137] [31.600] [system] [info] Temperatures - Average: 50.20C, CSS: 52.02C, MSS 49.14C, UPA: 49.37C, DSS: 50.25C
        [18443010D15F9D0F00] [192.168.0.137] [31.600] [system] [info] Cpu Usage - LeonOS 94.03%, LeonRT: 24.51%

        In another test with the two mono cameras + stereo depth and fps set to 60, CPU usage is only around 80-85%, but the received fps is still lower than 60. Do you know why?

          YunyaHsu
          It's generally either:

          • ISP limitation (500MP/s)
          • Node processing
          • Bandwidth (especially on POE)
          • Host-side loop (writing the stream to an mp4 file is costly)

          I'd say it's best to try running the pipeline with DEPTHAI_LEVEL=TRACE; this will tell you how long each operation takes. Then you can compare against the FPS you are getting and see if it makes sense. Stereo and encoding will probably take the most time, then image acquisition if a high resolution is used.

          I doubt the host side is the problem, but to check, you can simply time the while-true loop.
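
          Timing the while-true loop can look like the sketch below (pure Python; get_frame stands in for the real queue .get() call, and the dummy source is only for illustration):

```python
import time

def measure_loop_fps(get_frame, n_frames=100):
    """Time a receive loop and return its effective FPS."""
    start = time.monotonic()
    for _ in range(n_frames):
        frame = get_frame()   # e.g. q.get() on the XLink output queue
        # ... per-frame host work (imshow, mp4 writing, ...) would go here
    return n_frames / (time.monotonic() - start)

# Dummy frame source that takes ~1 ms per "frame"
loop_fps = measure_loop_fps(lambda: time.sleep(0.001))
print(f"host loop sustains ~{loop_fps:.0f} fps")
```

          If this number is well above the FPS observed from the device, the host loop is not the bottleneck.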

          Thanks,
          Jaka

          @jakaskerl
          I suspect the stereo depth node is our primary bottleneck.
          My pipeline consists of "2 mono cameras (400P resolution) + stereo depth". I tested it at 20, 40, 80, and 120 fps, running each configuration for approximately 30 seconds.
          The logs (including schema dump if relevant) suggest:

          1. The Mono ISP struggles to maintain the expected fps if I set it higher than 40 fps.
          2. Frame loss occurs in the 'Stereo rectification' stage at higher frame rates.

          Set fps at 20: received about 620 frames, the real fps meets expectations, CPU usage around 50-53%
          [MonoCamera(0)] [trace] Mono ISP took xxx: 623 times
          [MonoCamera(1)] [trace] Mono ISP took xxx: 623 times
          Stereo rectification took xxx: 623 times
          Stereo took xxx: 622 times
          'Median+Disparity to depth' pipeline took xxx: 622 times
          Stereo post processing xxx: 622 times
          Received message from device (depth): 623 times

          Set fps at 40: received only about 860 frames, so the real fps is only about 27-28, CPU usage around 85%.
          [MonoCamera(0)] [trace] Mono ISP took xxx: 922 times <--- shouldn't be 1200?
          [MonoCamera(1)] [trace] Mono ISP took xxx: 926 times
          Stereo rectification took xxx: 872 times <--- decrease a bit
          Stereo took xxx: 869 times
          'Median+Disparity to depth' pipeline took xxx: 869 times
          Stereo post processing xxx: 869 times
          Received message from device (depth): 865 times

          Set fps at 80: received only about 586 frames, so the real fps is about 19-20 only, CPU usage around 91 - 95%
          [MonoCamera(0)] [trace] Mono ISP took xxx: 1,162 times <--- shouldn't be 2400?
          [MonoCamera(1)] [trace] Mono ISP took xxx: 1,084 times
          Stereo rectification took xxx: 597 times <--- decrease dramatically?
          Stereo took xxx: 594 times
          'Median+Disparity to depth' pipeline took xxx: 594 times
          Stereo post processing xxx: 594 times
          Received message from device (depth): 591 times

          When I set fps to 120 and let it run for about 30 seconds, I received only about 432 frames; the real fps is around 14, CPU usage around 97-99%, pretty high.
          [MonoCamera(0)] [trace] Mono ISP took xxx: 1,578 times <--- shouldn't be 3600?
          [MonoCamera(1)] [trace] Mono ISP took xxx: 1,265 times
          Stereo rectification took xxx: 444 times <--- decrease dramatically as well?
          Stereo took xxx: 442 times
          'Median+Disparity to depth' pipeline took xxx: 442 times
          Stereo post processing xxx: 442 times
          Received message from device (depth): 437 times
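
          For reference, the effective fps is just frames received divided by run time, so the four roughly-30-second runs above work out as follows (exact run lengths varied slightly, hence the small discrepancies):

```python
def effective_fps(frames_received, seconds):
    """Effective frame rate over a measured run."""
    return frames_received / seconds

# Depth messages received in each ~30 s run, keyed by the target fps setting
runs = {20: 623, 40: 865, 80: 591, 120: 437}
for target, received in runs.items():
    print(f"set {target:3d} fps -> ~{effective_fps(received, 30):.1f} fps effective")
```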

            YunyaHsu
            If you check the operation times, you'll see the maximum processing time is taken by stereo (about 7.5 ms), which is fine even for 100 FPS if you want. I tested the code over USB and got 100 FPS with no problem.

            import depthai as dai
            import time

            fps = 100

            pipeline = dai.Pipeline()
            pipeline.setXLinkChunkSize(0)

            mono_left = pipeline.create(dai.node.MonoCamera)
            mono_left.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
            mono_left.setBoardSocket(dai.CameraBoardSocket.LEFT)
            mono_right = pipeline.create(dai.node.MonoCamera)
            mono_right.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
            mono_right.setBoardSocket(dai.CameraBoardSocket.RIGHT)

            mono_left.setFps(fps)
            mono_right.setFps(fps)

            stereo_depth = pipeline.create(dai.node.StereoDepth)
            stereo_depth.setMedianFilter(dai.StereoDepthProperties.MedianFilter.KERNEL_7x7)

            mono_left.out.link(stereo_depth.left)
            mono_right.out.link(stereo_depth.right)

            # Measure FPS on-device with a Script node, so host-side receiving
            # cannot be the bottleneck
            script = pipeline.create(dai.node.Script)
            script.setScript("""
            import time

            queue = node.io['depth']
            fps = 0
            start = time.monotonic()
            while True:
                msg = queue.get()
                fps += 1
                if fps % 100 == 0:
                    node.warn(f"{fps / (time.monotonic() - start)} fps")
                    start = time.monotonic()
                    fps = 0
            """)
            stereo_depth.depth.link(script.inputs['depth'])

            with dai.Device(pipeline) as device:
                while device.isPipelineRunning():
                    pass
            In NETWORK bootloader mode the network stack is initialized on the same CPU as the rest of the pipeline. The hit to processing is significant enough to drop the framerate from 100 FPS to about 60 FPS without doing anything at all.
            This will have to be debugged in the FW (this resource hog is far too much).

            Thanks,
            Jaka

            @jakaskerl
            Thank you for the test code. I confirmed that when running the same script on my OAK-D-PRO-POE-AF, I also only got around 60 fps 😥.

            Could you provide an estimated timeframe for a follow-up response or a solution to this issue?
            As we need higher fps on the RGB camera, our team is considering purchasing the OAK-D Pro PoE OV9782 for the project. However, our decision heavily depends on the resolution of this NETWORK bootloader mode CPU consumption issue.
            Can you confirm whether this problem will be addressed, and if so, by when? This information is crucial for our purchase decision.

            Thank you in advance.

              YunyaHsu
              It's a difficult issue to solve. We would need to allocate our best engineers to the task, and they are currently focused on RVC4 devices, which are the priority right now.

              So no ETA, unfortunately. Do you absolutely need a PoE device? Would USB not work?

              Thanks,
              Jaka

              @jakaskerl
              It's sad to hear that there's no clear timeline for resolving this issue.
              However, a USB device is not our first choice for this application. Our use case is outdoors, where the distance between the camera and the host computer is likely to exceed 5 meters, and may be much greater.
              Given these circumstances, PoE devices remain our preferred option: the extended range and single-cable solution for both power and data make PoE cameras much more suitable for our outdoor deployment needs.

                YunyaHsu
                There is the option of using the CM4 PoE device, which has the network stack on the RPi CM4, while the OAK SoM functions only as a USB device.

                Thanks,
                Jaka

                @jakaskerl
                Thank you for providing the information. After checking, unfortunately the CM4 PoE device does not meet our requirements, as we need a global shutter to avoid the jello effect that can occur in high-speed photography.