Hello!

I am currently developing an application with an OAK-D ProW PoE camera but I am experiencing bandwidth and latency issues. My end goal is to use 3 cameras at the same time.

I define my pipeline using

import depthai as dai
from typing import Optional

# RES_MAP_DEPTH, RES_MAP_RGB and MEDIAN_MAP are lookup tables defined elsewhere
# in my module (string key -> depthai resolution / median-filter enum).

def create_luxonis_pipeline(
    depth_res: str = "800",
    rgb_res: str = "800",
    median_kernel: str = "5x5",
    depth_alignment: bool = True,
    alpha: Optional[float] = None,
) -> dai.Pipeline:
    """Create a Luxonis OAK-D pipeline with an RGB stream and a stereo depth stream."""
    # Validate depth_res
    if depth_res not in RES_MAP_DEPTH:
        raise ValueError(
            f"Invalid depth_res value: {depth_res}. Valid values are {', '.join(RES_MAP_DEPTH.keys())}"
        )

    # Validate rgb_res
    if rgb_res not in RES_MAP_RGB:
        raise ValueError(f"Invalid rgb_res value: {rgb_res}. Valid values are {', '.join(RES_MAP_RGB.keys())}")

    # Validate median_kernel
    if median_kernel not in MEDIAN_MAP:
        raise ValueError(
            f"Invalid median_kernel value: {median_kernel}. Valid values are {', '.join(MEDIAN_MAP.keys())}"
        )

    resolution_depth = RES_MAP_DEPTH[depth_res]
    resolution_rgb = RES_MAP_RGB[rgb_res]
    median = MEDIAN_MAP[median_kernel]

    pipeline = dai.Pipeline()
    
    camRgb = pipeline.create(dai.node.ColorCamera)
    camLeft = pipeline.create(dai.node.MonoCamera)
    camRight = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    
    camRgb.setFps(20)    
    camLeft.setFps(20)
    camRight.setFps(20)

    xoutRgb = pipeline.create(dai.node.XLinkOut)
    xoutRgb.input.setBlocking(False)
    xoutRgb.input.setQueueSize(1)

    xoutDepth = pipeline.create(dai.node.XLinkOut)
    xoutDepth.input.setBlocking(False)
    xoutDepth.input.setQueueSize(1)

    for monoCam in (camLeft, camRight):
        monoCam.setResolution(resolution_depth["res"])

    camRgb.setResolution(resolution_rgb["res"])
    camRgb.setPreviewSize(resolution_rgb["w"], resolution_rgb["h"])

    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    stereo.initialConfig.setMedianFilter(median)

    stereo.setRectifyEdgeFillColor(0)
    stereo.setLeftRightCheck(False)
    stereo.setExtendedDisparity(True)
    stereo.setSubpixel(False)
    
    if depth_alignment:
        stereo.setLeftRightCheck(True)
        stereo.setDepthAlign(dai.CameraBoardSocket.CAM_A)

    if alpha is not None:
        stereo.setAlphaScaling(alpha)
        config = stereo.initialConfig.get()
        config.postProcessing.brightnessFilter.minBrightness = 0
        stereo.initialConfig.set(config)

    xoutRgb.setStreamName("rgb")
    xoutDepth.setStreamName("depth")

    camRgb.preview.link(xoutRgb.input)
    camLeft.out.link(stereo.left)
    camRight.out.link(stereo.right)
    stereo.depth.link(xoutDepth.input)
    # pipeline.setXLinkChunkSize(0)

    return pipeline
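
For reference, the lookup maps used by this function look roughly like this (an illustrative sketch only; the exact entries in my module may differ, but the shape is the same):

RES_MAP_DEPTH = {
    # string key -> MonoCamera sensor resolution
    "400": {"res": dai.MonoCameraProperties.SensorResolution.THE_400_P},
    "800": {"res": dai.MonoCameraProperties.SensorResolution.THE_800_P},
}
RES_MAP_RGB = {
    # string key -> ColorCamera sensor resolution plus preview size
    "800": {"res": dai.ColorCameraProperties.SensorResolution.THE_800_P, "w": 1200, "h": 800},
    "1080": {"res": dai.ColorCameraProperties.SensorResolution.THE_1080_P, "w": 1920, "h": 1080},
}
MEDIAN_MAP = {
    # string key -> stereo median filter kernel
    "3x3": dai.MedianFilter.KERNEL_3x3,
    "5x5": dai.MedianFilter.KERNEL_5x5,
    "7x7": dai.MedianFilter.KERNEL_7x7,
}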

Setup 1

The camera connected to the switch (https://www.tp-link.com/baltic/business-networking/omada-sdn-switch/tl-sg105pe/) returns ~2 FPS or less in the depthai_demo.py script. When executing poe_test_script.py, all tests return OK.

I captured the connection with Wireshark, and interestingly the frame length was constantly increasing (see image), while the received output had a very low frame rate and was delayed by ~13 seconds.

I tried to set the frame length manually by doing

    # dev_info, stack, dic and mxid come from the surrounding multi-camera script
    # (device discovery loop, contextlib.ExitStack and a dict of output queues).
    openvino_version = dai.OpenVINO.Version.VERSION_2021_4
    config = dai.Device.Config()
    config.board.network.mtu = 9000
    config.board.sysctl.append("net.inet.tcp.path_mtu_discovery=0")
    config.board.sysctl.append("net.inet.tcp.rfc1323=0")
    config.board.network.xlinkTcpNoDelay = False  # default is True
    config.board.sysctl.append("net.inet.tcp.delayed_ack=1")  # 0 by default
    config.version = openvino_version
    dev = dai.Device(config, dev_info)

    device: dai.Device = stack.enter_context(dev)

    device.startPipeline(create_luxonis_pipeline())
    dic["rgb-" + mxid] = device.getOutputQueue(name="rgb", maxSize=1, blocking=False)
    dic["depth-" + mxid] = device.getOutputQueue(name="depth", maxSize=1, blocking=False)

But I still received frames like those in the image above, not following the values specified in the config.
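
For completeness, the surrounding multi-camera script opens the devices roughly like this (a simplified sketch of my loop; error handling and the frame-reading loop are omitted):

    import contextlib
    import depthai as dai

    dic = {}  # output queues keyed by "rgb-<mxid>" / "depth-<mxid>"

    with contextlib.ExitStack() as stack:
        for dev_info in dai.Device.getAllAvailableDevices():
            mxid = dev_info.getMxId()

            config = dai.Device.Config()
            # ... network/board settings as in the snippet above ...

            device: dai.Device = stack.enter_context(dai.Device(config, dev_info))
            device.startPipeline(create_luxonis_pipeline())

            dic["rgb-" + mxid] = device.getOutputQueue(name="rgb", maxSize=1, blocking=False)
            dic["depth-" + mxid] = device.getOutputQueue(name="depth", maxSize=1, blocking=False)

        # ... main loop polling dic[...].tryGet() goes here ...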

Setup 2
The camera is connected to a Ubiquiti Gigabit PoE injector, with the pipeline created the same way as before but without adjusting config.board. In Wireshark I can see the same length for every frame (image below), the camera streams at 20 FPS in the demo code, and my scripts run smoothly with acceptable latency.

To be completely honest, I don't have any further debugging ideas. Could you please point out what I did incorrectly? Is my switch the limitation (according to the documentation, its switching capacity should be more than sufficient)? Why are the manually configured values not being honored?

Best,

Hubert

    Hi hubbla
    What guide did you use to configure the board settings? I assume https://docs.luxonis.com/projects/hardware/en/latest/pages/guides/getting-started-with-poe/#advance-network-settings?

    I see that you have set the MTU to 9000, which is supported by the switch, but this has to be set on the host as well, otherwise it won't work. Just confirming.
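
    On the Linux side, that can be done with something like this (a minimal sketch; the interface name "eth0" is just an example, and the change does not persist across reboots):

        # Hypothetical host-side helper: set the NIC's MTU to match the camera's
        # 9000-byte setting (Linux, requires root; interface name is an assumption).
        import subprocess

        subprocess.run(["ip", "link", "set", "dev", "eth0", "mtu", "9000"], check=True)
        # Verify the new MTU:
        subprocess.run(["ip", "link", "show", "eth0"], check=True)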

    Setting config.board.network.xlinkTcpNoDelay to False (according to GPT) means the TCP Nagle algorithm is enabled, which can add latency in real-time applications because it tries to bundle small packets into larger ones. I'm not sure how much of an effect that has, as I'm not knowledgeable enough about this matter, but it should be kept in mind just in case.

    Some debugging ideas:

    • First, revert your MTU and other sysctl settings to their defaults. Simplify the setup to eliminate potential causes.
    • Use a tool like iperf to test the raw network throughput and latency between your Raspberry Pi (or whichever host you're using) and the OAK-D ProW PoE; a minimal way to script this is sketched just after this list. This will give you a baseline network performance without DepthAI in the picture.
    • Ensure that your switch's firmware is updated. Sometimes, firmware updates can fix performance or compatibility issues.
    • If possible, test with a different PoE switch.
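
    For the iperf idea, something like this could work (sketch only; it assumes an iperf3 server is reachable on the far side of the switch, e.g. on another host, and the IP below is a placeholder):

        # Run an iperf3 client against a server at PEER_IP and report the measured
        # TCP throughput. "192.168.1.50" is a placeholder, not a real address.
        import json
        import subprocess

        PEER_IP = "192.168.1.50"

        result = subprocess.run(
            ["iperf3", "-c", PEER_IP, "-t", "10", "-J"],  # -J: machine-readable JSON output
            capture_output=True, text=True, check=True,
        )
        report = json.loads(result.stdout)
        bps = report["end"]["sum_received"]["bits_per_second"]
        print(f"Measured throughput: {bps / 1e6:.1f} Mbit/s")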

    Hope this helps in any way,
    Jaka

      Dear jakaskerl ,

      The network looks ok.

      Indeed, the MTU settings are problematic, but this does not fix the problem. The camera does not work with values different from the defaults.

      I also tried this switch https://store.ui.com/us/en/products/us-xg-6poe, but there is no difference.

      Interestingly with the PoE injector, it works.

      Best,

      Hubert

        Hi hubbla
        That is very strange indeed. Does it make any difference connecting the switch to a DHCP server (like a router)?


        The packet size setting is being recognized by the OAK. You can see it in the size of those packets: your per-packet overhead is 65 or 66 bytes, and in fact, when you set it to 9000, it's probably just setting a toggle for "jumbo frames" vs "standard frames", because you're pushing closer to 15,000 bytes per packet. Even when you have it set to standard, you're pushing 2114 bytes per packet - 66 bytes of overhead = exactly 2 KB of payload per packet.
        What's happening after that, as a result of the higher packet size, I am not sure, but this DOES seem like a classic case of Bill the Cat! I'm hoping I'm not the only one old enough to get that reference… but what I mean is, ACK!
        This is quite typically the response you see when using jumbo packets in a highly dynamic traffic environment, rather than in a singularly purposed data-center file-transfer server. In fact, I would bet that everything slows down for you, and this forum page gets especially annoying as you're waiting for responses to clear and elements to load.
        Nagle himself expressed frustration that delayed ACKs and jumbo packets were often combined, usually with disastrous results… but don't trust anything I say; take it from the man himself in the quote below. I would first try disabling the jumbo packets (back to 1500) and leave the delayed ACK on. If it doesn't get better, switch them. There may also be delayed-ACK or no_delay options to play with if other elements on your network require the jumbo packets… but usually they don't.

        The idea of the Nagle algorithm was to prevent more than one undersized packet from being in transit at a time. The idea of delayed ACKs (which came from Berkeley) was to avoid sending a lone ACK for each character received when typing over a Telnet connection with remote echo, by waiting a fixed period for traffic in the reverse direction upon which the ACK could be piggybacked.

        The interaction of the two algorithms is awful. If you do big send, big send, big send, that works fine. If you do send, get reply, send, get reply, that works fine. If you do small send, small send, get reply, there will be a brief stall. This is because the second small send is delayed by the Nagle algorithm until an ACK comes back, and the delayed ACK algorithm adds 0.5 second or so before that happens.

        A delayed ACK is a bet. The TCP implementation is betting that data will be sent shortly and will make it unnecessary to send a lone ACK. Every time a delayed ACK is actually sent, that bet was lost. The TCP spec allows an implementation to lose that bet every time without turning off delayed ACKs. Properly, delayed ACKs should only turn on when a few unnecessary ACKs that could have been piggybacked have been sent in a row, and any time a delayed ACK is actually sent, delayed ACKs should be turned off again. There should have been a counter for this.

        Unfortunately, delayed ACKs went in after I got out of networking in 1986, and this was never fixed. Now it's too late.

        John Nagle (Stack Overflow answer, May 21, 2013)

        And looking at your elapsed times for each frame (I should clarify that I'm referring to a "packet frame", not an image frame here… HUGE difference), when you're running with a 1500 packet size it's 49 ms, but when you go up to 9K (actually ~15K in your circumstance), you're seeing 325 ms each. At least you're only trying to push 1200x800, right? I'm assuming 800 signifies 800p? That's 960k pixels, and if you're using the preview output for BGR-formatted frames to use with imshow, for instance, then we're talking about 24 bits per pixel, so about 2.9 MB per frame, x 3 cameras = roughly 8.7 MB per frame set, x 20 fps = 174 MB/s one way. That's 1.4 Gbps right there. If you don't have a 10GBase-T uplink connection to the host, there's another source of delay.
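
        Spelling that arithmetic out (rough numbers, assuming 1200x800 BGR preview frames from 3 cameras at 20 fps):

            # Back-of-the-envelope bandwidth for three cameras streaming BGR previews.
            width, height = 1200, 800          # preview size ("800" -> 800p, assumed)
            bytes_per_pixel = 3                # BGR, 24 bits per pixel
            cameras, fps = 3, 20

            frame_bytes = width * height * bytes_per_pixel      # ~2.88 MB per frame
            throughput_bps = frame_bytes * cameras * fps * 8    # bits per second, one way
            print(f"{frame_bytes / 1e6:.1f} MB/frame, {throughput_bps / 1e9:.2f} Gbps total")
            # -> roughly 2.9 MB per frame and ~1.4 Gbps, before any protocol overhead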
        Now, it would definitely seem attractive to split this up into 14 kB chunks rather than 2 kB chunks, but the problem comes from the TCP protocol's insistence on perfectly adhering to ACK requirements… to keep perspective, we're generally talking about between a quarter and half a second of delay for every misaligned ACK. There's simply no possible way to recover from that.
        Take my situation, for instance. I'm streaming out full-frame images at 12.3 MP, but I'm compressing to 12 bits per pixel (NV12), and then I'm going further and compressing the Y and U channels to be semi-planar for transport purposes, which brings that down to around maybe 9 bits per pixel on average (I shouldn't even do this, because YUV throughput is heavily dependent on the brightness of the pixels in the image… at night, it's a rocket). But 12.3 MP times 9 bits per pixel is 110 Mb per frame, or 13.9 MB per frame. Now, if I'm using a switch with a 10 Gb uplink and 1 Gb camera ports, 10 fps would put me over the "theoretical" maximum, let alone the practical one. So I have to employ even further specialized compression techniques, or lower my frame-rate expectations, or lower my resolution expectations, or break out my wallet in a pretty substantial way.
        Even if we were only talking about theoretical maximums, the physics doesn't work out. And even if your network was the perfectly ideal candidate for jumbo frames, it's still only about a 6% increase in throughput…but if you're even possibly not perfectly ideal for the Nagle algorithm, the performance hit is unrecoverable. Let that sink in.

        This is a good article that gives even more granularity on your packet overhead, what it's composed of, and why it shows up as 66 bytes in your Wireshark capture.
        https://www.cablefree.net/wireless-technology/maximum-throughput-gigabit-ethernet/

        Now, if you can handle some noise in your system and can manage your own QA of each frame, you can try transmitting over UDP instead of TCP. In that case, you can forget anything else I said about latency, because UDP requires NO ACKs at all. Jumbo frames FLY with UDP. But you have no way of knowing whether your frames will come through or not, and if you start missing packet frames, your software may very well end up stitching packet frames from one image frame together with packet frames from another image frame. I know a little about this… see attached image, lol