• DepthAI-v2
  • Cannot keep FPS at 60 on OAK-D-PRO-POE-AF

Hi!

I am very new to computer vision and am currently testing an OAK-D-PRO-POE-AF I got a few weeks ago, trying to use it for image analysis and capture.

In my use case, I don't need high resolution, but I do need the highest possible fps, so I set the RGB camera to THE_1080_P (which, to my knowledge, is the lowest resolution for the OAK-D-PRO-POE-AF).

However, I tried several pipelines and found that the fps is not always 60.

Could you help me check whether this is expected, or whether I did something wrong?

You can find each scenario’s test code here:

  • Scenario 1: cam_rgb.video.link(xout_rgb.input), using the video output; the fps is only around 30. 🥲

  • Scenario 2: cam_rgb.preview.link(xout_rgb.input), a very simple RGB preview, fps is around 60. ✅

  • Scenario 3: save the encoded video stream into an mp4 container, basically following the example here; the resulting video.mp4 is 60 fps. ✅ (I verified with ffprobe -v error -select_streams v:0 -count_packets -show_entries stream=avg_frame_rate -of default=noprint_wrappers=1:nokey=1 ./video.mp4, which reports 250000000/4163437 ≈ 60 fps.)

  • Scenario 4: similar to scenario 3 in that I still save the encoded video stream into an mp4 container, but I add the left and right mono cameras and create two XLinkOuts to preview their streams. The video.mp4 is only around 35 fps 🥲, and in the left mono camera preview window I can see the fps is around 30. 🥲

  • Scenario 5: similar to scenario 4, but I add StereoDepth after the left and right mono cameras. The video.mp4 is only around 40 fps 🥲, and in the depth preview window the fps is around 30. 🥲

My desired use case:

  • Higher fps, resolution doesn't need to be too high (I know that OAK-D Pro PoE OV9782 should better fit my use case 😆)

  • Ability to save video only when an object is detected; for example, save the 5 seconds before AND after a person is detected walking by

  • Need to determine the depth of this object
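
The pre/post-trigger saving in the second point can be handled host-side with a rolling buffer. Below is a minimal pure-Python sketch of the idea; the frame source, the trigger flag, and the 5-second windows are placeholders for the real output-queue reads, not DepthAI API:

```python
import collections

FPS = 60           # assumed camera frame rate
PRE_SECONDS = 5    # seconds to keep before the trigger
POST_SECONDS = 5   # seconds to keep recording after the trigger

# Rolling buffer that always holds the last PRE_SECONDS of frames.
pre_buffer = collections.deque(maxlen=FPS * PRE_SECONDS)

def record_event(frame_source):
    """Collect PRE_SECONDS before and POST_SECONDS after a detection.

    frame_source yields (frame, person_detected) pairs, e.g. frames pulled
    from the device's output queue (placeholder here).
    """
    clip = []
    post_remaining = 0
    for frame, person_detected in frame_source:
        if post_remaining > 0:
            clip.append(frame)
            post_remaining -= 1
            if post_remaining == 0:
                return clip              # hand the finished clip to a muxer
        elif person_detected:
            clip = list(pre_buffer)      # the PRE_SECONDS *before* the trigger
            clip.append(frame)
            post_remaining = FPS * POST_SECONDS - 1
        else:
            pre_buffer.append(frame)
    return clip
```

On trigger, the returned clip (pre-roll plus post-roll) could then be written out to a container in one go.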

If you have any better suggestions for pipeline design, please let me know!

    jakaskerl
    I have lowered the image size to 1080p on the RGB camera, but the question is: if I add a StereoDepth node to the pipeline, the fps of the H.265-encoded color video drops from 60 fps to 30 fps.

    Is it expected?

      YunyaHsu
      The issue is that on a 1 Gbps Ethernet connection you have about 900 Mbps of usable bandwidth.

      1080P NV12/YUV420 frames: 1920 * 1080 * 1.5 * 30fps * 8bits = 747 Mbps (when encoded, this is about half)
      400P depth frames: 640 * 400 * 2 * 30fps * 8bits = 123 Mbps

      If you want higher FPS, you essentially need more bandwidth or smaller frames.
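
      As a sanity check, the arithmetic above can be reproduced in a few lines of Python (the 0.5 encoder factor is just the rough "about half" estimate from above, not a guarantee):

```python
def stream_mbps(width, height, bytes_per_pixel, fps, factor=1.0):
    """Raw stream bandwidth in Mbps (1 Mbps = 1e6 bits/s)."""
    return width * height * bytes_per_pixel * fps * 8 * factor / 1e6

# 1080P NV12/YUV420 at 30 fps: 1.5 bytes per pixel
rgb = stream_mbps(1920, 1080, 1.5, 30)    # ~746.5 Mbps raw
# 400P 16-bit depth at 30 fps: 2 bytes per pixel
depth = stream_mbps(640, 400, 2, 30)      # ~122.9 Mbps

print(f"RGB raw: {rgb:.0f} Mbps, encoded: ~{rgb * 0.5:.0f} Mbps, depth: {depth:.0f} Mbps")
```

      The same helper with a factor of 0.5 and fps of 40 gives the ~498 Mbps figure discussed further down.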

      YunyaHsu Ability to save video only when an object is detected, for example: save 5 seconds before AND after when a person is detected walking by

      Don't send the streams back to host (or only send back small preview frame). This will enable you to run the pipeline at higher FPS.
      Then use YoloSpatialDetectionNetwork to send back the depth of the object only when it is detected (once the recording is triggered).

      Thanks,
      Jaka

      @jakaskerl

      Thanks, I removed all detection networks and updated the pipeline as follows:

      RGB camera (1080P) ---video---> video encoder (h.265) ---bitstream---> xout
      Mono Left & Mono Right camera (both 400P) ---out---> StereoDepth ---depth---> xout
      • with fps = 35 on both the mono and RGB cameras, it works as expected
      • with fps = 40, the real fps is limited to only 30

      If my understanding is correct, at 40 fps the total bandwidth should be around 662 Mbps, so I would like to know why I cannot receive the encoded frames and depth frames as expected:

      1080P NV12/YUV420 frames: 1920 * 1080 * 1.5 * 40fps * 8bits * 0.5 (when encoded this is about half) = 498 Mbps
      400P depth frames: 640 * 400 * 2 * 40fps * 8bits = 164 Mbps

      Additionally, during testing there was a phenomenon I didn't understand. I removed the RGB camera and video encoder nodes and wanted to test the highest achievable fps with only the StereoDepth node.
      The pipeline is:

      Mono Left & Mono Right camera (both 400P) ---out---> StereoDepth ---depth---> xout
      • with fps = 30, 40, or 50, it works as expected
      • with fps = 60, the real fps drops to around 50, even though the bandwidth in this case should be only 246 Mbps.

        YunyaHsu
        Run the scripts with the DEPTHAI_DEBUG=1 environment variable set. Check whether CPU usage is very high (98%+). This could prevent packets from being sent as fast as possible.

        Thanks,
        Jaka

        @jakaskerl
        When running the RGB camera + video encoder only with fps set to 60, CPU usage is around 68%, which is good, and the received fps is also around 60 👍🏻.

        If I add the stereo depth node, i.e. running RGB camera + video encoder + 2 mono cameras + stereo depth with fps set to 40, CPU usage is pretty high, around 94-95%, so I think that's the root cause.
        However, according to the specifications, the OAK-D-PRO-POE-AF's RGB camera should be capable of 60 fps, and the mono cameras can even reach up to 120 fps. If CPU limitations are restricting us to only 35 fps, the observed performance doesn't quite align with the cameras' specified capabilities, IMHO.
        Any suggestion to increase the fps?

        [18443010D15F9D0F00] [192.168.0.137] [29.597] [system] [info] Memory Usage - DDR: 63.01 / 333.28 MiB, CMX: 2.41 / 2.50 MiB, LeonOS Heap: 64.56 / 81.76 MiB, LeonRT Heap: 4.93 / 39.90 MiB / NOC ddr: 1408 MB/s
        [18443010D15F9D0F00] [192.168.0.137] [29.597] [system] [info] Temperatures - Average: 50.69C, CSS: 52.46C, MSS 49.59C, UPA: 49.81C, DSS: 50.92C
        [18443010D15F9D0F00] [192.168.0.137] [29.597] [system] [info] Cpu Usage - LeonOS 94.33%, LeonRT: 23.64%
        [18443010D15F9D0F00] [192.168.0.137] [30.598] [system] [info] Memory Usage - DDR: 63.01 / 333.28 MiB, CMX: 2.41 / 2.50 MiB, LeonOS Heap: 64.56 / 81.76 MiB, LeonRT Heap: 4.93 / 39.90 MiB / NOC ddr: 1413 MB/s
        [18443010D15F9D0F00] [192.168.0.137] [30.598] [system] [info] Temperatures - Average: 50.30C, CSS: 52.90C, MSS 48.92C, UPA: 49.14C, DSS: 50.25C
        [18443010D15F9D0F00] [192.168.0.137] [30.598] [system] [info] Cpu Usage - LeonOS 94.97%, LeonRT: 23.55%
        [18443010D15F9D0F00] [192.168.0.137] [31.600] [system] [info] Memory Usage - DDR: 63.01 / 333.28 MiB, CMX: 2.41 / 2.50 MiB, LeonOS Heap: 64.56 / 81.76 MiB, LeonRT Heap: 4.93 / 39.90 MiB / NOC ddr: 1420 MB/s
        [18443010D15F9D0F00] [192.168.0.137] [31.600] [system] [info] Temperatures - Average: 50.20C, CSS: 52.02C, MSS 49.14C, UPA: 49.37C, DSS: 50.25C
        [18443010D15F9D0F00] [192.168.0.137] [31.600] [system] [info] Cpu Usage - LeonOS 94.03%, LeonRT: 24.51%

        In another test with the two mono cameras + stereo depth and fps set to 60, CPU usage is only around 80-85%, but the received fps is still lower than 60. Do you know why?

          YunyaHsu
          It's generally either:

          • ISP limitation (500MP/s)
          • Node processing
          • Bandwidth (especially on POE)
          • Host-side loop (writing the stream to an mp4 file is costly)

          I'd say it's best to try running the pipeline with DEPTHAI_LEVEL=TRACE; this will tell you how long each operation takes. Then you can compare against the FPS you are getting and see if it makes sense. Stereo and encoding will probably take the most time, then image acquisition if a high resolution is used.

          I doubt the host side is the problem, but to check, you can simply time the while-true loop.
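
          Timing the while-true loop can look like the sketch below (pure Python; get_frame stands in for the real queue .get() call, and the dummy source is only for illustration):

```python
import time

def measure_loop_fps(get_frame, n_frames=100):
    """Time a receive loop and return its effective FPS."""
    start = time.monotonic()
    for _ in range(n_frames):
        frame = get_frame()   # e.g. q.get() on the XLink output queue
        # ... per-frame host work (imshow, mp4 writing, ...) would go here
    return n_frames / (time.monotonic() - start)

# Dummy frame source that takes ~1 ms per "frame"
loop_fps = measure_loop_fps(lambda: time.sleep(0.001))
print(f"host loop sustains ~{loop_fps:.0f} fps")
```

          If this number is well above the FPS observed from the device, the host loop is not the bottleneck.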

          Thanks,
          Jaka

          @jakaskerl
          I suspect the stereo depth node is our primary bottleneck.
          My pipeline consists of "2 mono cameras (400P resolution) + stereo depth". I tested it at 20, 40, 80, and 120 fps, running each configuration for approximately 30 seconds.
          The logs (including schema dump if relevant) suggest:

          1. The Mono ISP struggles to maintain the expected fps if I set it higher than 40 fps.
          2. Frame loss occurs in the 'Stereo rectification' stage at higher frame rates.

          Set fps at 20: received about 620 frames, the real fps meets expectations, CPU usage around 50-53%
          [MonoCamera(0)] [trace] Mono ISP took xxx: 623 times
          [MonoCamera(1)] [trace] Mono ISP took xxx: 623 times
          Stereo rectification took xxx: 623 times
          Stereo took xxx: 622 times
          'Median+Disparity to depth' pipeline took xxx: 622 times
          Stereo post processing xxx: 622 times
          Received message from device (depth): 623 times

          Set fps at 40: received only about 860 frames, so the real fps is only about 27-28, CPU usage around 85%.
          [MonoCamera(0)] [trace] Mono ISP took xxx: 922 times <--- shouldn't be 1200?
          [MonoCamera(1)] [trace] Mono ISP took xxx: 926 times
          Stereo rectification took xxx: 872 times <--- decrease a bit
          Stereo took xxx: 869 times
          'Median+Disparity to depth' pipeline took xxx: 869 times
          Stereo post processing xxx: 869 times
          Received message from device (depth): 865 times

          Set fps at 80: received only about 586 frames, so the real fps is about 19-20 only, CPU usage around 91 - 95%
          [MonoCamera(0)] [trace] Mono ISP took xxx: 1,162 times <--- shouldn't be 2400?
          [MonoCamera(1)] [trace] Mono ISP took xxx: 1,084 times
          Stereo rectification took xxx: 597 times <--- decrease dramatically?
          Stereo took xxx: 594 times
          'Median+Disparity to depth' pipeline took xxx: 594 times
          Stereo post processing xxx: 594 times
          Received message from device (depth): 591 times

          When I set fps to 120 and let it run for about 30 seconds, I received only about 432 frames; the real fps is around 14, CPU usage around 97-99%, pretty high.
          [MonoCamera(0)] [trace] Mono ISP took xxx: 1,578 times <--- shouldn't be 3600?
          [MonoCamera(1)] [trace] Mono ISP took xxx: 1,265 times
          Stereo rectification took xxx: 444 times <--- decrease dramatically as well?
          Stereo took xxx: 442 times
          'Median+Disparity to depth' pipeline took xxx: 442 times
          Stereo post processing xxx: 442 times
          Received message from device (depth): 437 times
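
          For reference, the effective fps is just frames received divided by run time, so the four roughly-30-second runs above work out as follows (exact run lengths varied slightly, hence the small discrepancies):

```python
def effective_fps(frames_received, seconds):
    """Effective frame rate over a measured run."""
    return frames_received / seconds

# Depth messages received in each ~30 s run, keyed by the target fps setting
runs = {20: 623, 40: 865, 80: 591, 120: 437}
for target, received in runs.items():
    print(f"set {target:3d} fps -> ~{effective_fps(received, 30):.1f} fps effective")
```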

            YunyaHsu
            If you check the operation times, you'll see the maximum processing time is taken by stereo (about 7.5 ms), which is fine even for 100 FPS if you want. I tested the code over USB and got 100 FPS with no problem.

            import depthai as dai
            import time

            fps = 100

            pipeline = dai.Pipeline()
            pipeline.setXLinkChunkSize(0)

            mono_left = pipeline.create(dai.node.MonoCamera)
            mono_left.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
            mono_left.setBoardSocket(dai.CameraBoardSocket.LEFT)
            mono_right = pipeline.create(dai.node.MonoCamera)
            mono_right.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
            mono_right.setBoardSocket(dai.CameraBoardSocket.RIGHT)

            mono_left.setFps(fps)
            mono_right.setFps(fps)

            stereo_depth = pipeline.create(dai.node.StereoDepth)
            stereo_depth.setMedianFilter(dai.StereoDepthProperties.MedianFilter.KERNEL_7x7)

            mono_left.out.link(stereo_depth.left)
            mono_right.out.link(stereo_depth.right)

            # Measure FPS on-device with a Script node, so host-side receiving
            # cannot be the bottleneck
            script = pipeline.create(dai.node.Script)
            script.setScript("""
            import time

            queue = node.io['depth']
            fps = 0
            start = time.monotonic()
            while True:
                msg = queue.get()
                fps += 1
                if fps % 100 == 0:
                    node.warn(f"{fps / (time.monotonic() - start)} fps")
                    start = time.monotonic()
                    fps = 0
            """)
            stereo_depth.depth.link(script.inputs['depth'])

            with dai.Device(pipeline) as device:
                while device.isPipelineRunning():
                    pass
            In NETWORK bootloader mode the network stack is initialized on the same CPU as the rest of the pipeline. The hit to processing is significant enough to drop the framerate from 100 FPS to about 60 FPS without doing anything at all.
            This will have to be debugged in the FW (this resource hog is far too much).

            Thanks,
            Jaka

            @jakaskerl
            Thank you for the test code. I confirmed that when running the same script on my OAK-D-PRO-POE-AF, I also only got around 60 fps 😥.

            Could you provide an estimated timeframe for a follow-up response or a solution to this issue?
            As we need higher fps on the RGB camera, our team is considering purchasing the OAK-D Pro PoE OV9782 for the project. However, our decision heavily depends on the resolution of this NETWORK bootloader mode CPU consumption issue.
            Can you confirm whether this problem will be addressed, and if so, by when? This information is crucial for our purchase decision.

            Thank you in advance.

              YunyaHsu
              It's a difficult issue to solve. We would need to allocate our best engineers to the task, and they are currently focused on RVC4 devices, which are the priority right now.

              So no ETA, unfortunately. Do you absolutely need a PoE device? Would USB not work?

              Thanks,
              Jaka

              @jakaskerl
              It's sad to hear that there's no clear timeline for resolving this issue.
              However, a USB device is not our first choice for this application. Our use case is outdoors, where the distance between the camera and the host computer is likely to exceed 5 meters, and may be much greater.
              Given these circumstances, PoE devices remain our preferred option: the extended range and single-cable solution for both power and data make PoE cameras much more suitable for our outdoor deployment needs.

                YunyaHsu
                There is the option of using the CM4 PoE device, which has the network stack on the RPi CM4, while the OAK SoM functions only as a USB device.

                Thanks,
                Jaka

                @jakaskerl
                Thank you for providing the information. After checking, unfortunately the CM4 PoE device does not meet our requirements, as we need a global shutter to avoid the jello effect that can occur in high-speed photography.