• DepthAI-v2
  • Decoding a DepthAI H.265 stream needs 3 frames using GStreamer

Hi there dear Luxonis Community!

I'm transferring a video stream via TCP from one PC to another, using GStreamer 1.24. The stream is encoded to H.265/HEVC by the camera hardware itself (RGB Encoding), which makes it quite easy on compute resources.

My goal is to decode that stream AS FAST as possible with the LOWEST possible latency.

Here is my decoding pipeline:

tcpclientsrc host=robot-car port=8554 timeout=10
  ! application/x-rtp-stream
  ! rtpstreamdepay
  ! application/x-rtp,media=video,clock-rate=90000,encoding-name=H265
  ! rtph265depay
  ! queue
  ! h265parse config-interval=-1
  ! video/x-h265,stream-format=hvc1,alignment=au
  ! queue
  ! vah265dec qos=false
  ! video/x-raw(memory:DMABuf),drm-format=NV12
  ! queue max-size-buffers=1 leaky=1
  ! glimagesink sync=false qos=false

I analysed the GStreamer latency using these environment variables:

GST_TRACERS="latency(flags=pipeline+element+reported)"
GST_DEBUG_FILE=trace.log
GST_DEBUG="1,GST_TRACER:7"
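As a rough sketch, the same tracer settings can be applied from Python before launching a pipeline. The helper names `tracer_env` and `run_traced` are made up for illustration; only the environment variables and the `gst-launch-1.0` / `gst-stats-1.0` tools come from this post.

```python
import os
import subprocess


def tracer_env(log_file="trace.log", base=None):
    """Return a copy of the environment with the latency tracer enabled."""
    env = dict(os.environ if base is None else base)
    env["GST_TRACERS"] = "latency(flags=pipeline+element+reported)"
    env["GST_DEBUG_FILE"] = log_file
    env["GST_DEBUG"] = "1,GST_TRACER:7"
    return env


def run_traced(pipeline_args, log_file="trace.log"):
    """Run gst-launch-1.0 with tracing, then return the gst-stats-1.0 summary.

    pipeline_args is a list of tokens, e.g. "videotestsrc ! fakesink".split().
    """
    subprocess.run(
        ["gst-launch-1.0", *pipeline_args],
        env=tracer_env(log_file),
        check=True,
    )
    stats = subprocess.run(
        ["gst-stats-1.0", log_file],
        capture_output=True,
        text=True,
        check=True,
    )
    return stats.stdout
```

Passing the pipeline as a token list avoids shell quoting problems with caps such as `video/x-raw(memory:DMABuf)`.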

The trace shows that vah265dec both reports and actually adds 100 ms of latency at a framerate of 30/1. That is exactly 3 frames at 33.3 ms each, which suggests to me that the 3-frame caching is happening by design.
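The link between 100 ms and 3 frames is plain arithmetic: one frame at 30/1 lasts 1/30 s, about 33.3 ms. A minimal sketch, where `latency_in_frames` is a hypothetical helper and not part of GStreamer:

```python
NS_PER_SECOND = 1_000_000_000


def latency_in_frames(latency_ns, fps_n, fps_d=1):
    """Express a latency in nanoseconds as a number of frame durations."""
    return latency_ns * fps_n / (fps_d * NS_PER_SECOND)


# 100 ms reported by vah265dec at 30/1
print(latency_in_frames(100_000_000, 30, 1))
# measured mean of vah265dec0.src from the trace (0:00:00.099842792)
print(latency_in_frames(99_842_792, 30, 1))
```

Both the reported value and the measured mean land at roughly three frame durations.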

Time: 0:00:09.549951361

Latency Statistics:

        0x55c61a9124d0.tcpclientsrc0.src|0x55c61a9023b0.glimagesinkbin0.sink: mean=0:00:00.100234675 min=0:00:00.050495205 max=0:00:00.121120488
        0x55c61a9124d0.tcpclientsrc0.src|0x55c61aaf81d0.sink.sink: mean=0:00:00.100317872 min=0:00:00.050545080 max=0:00:00.121196562

Element Latency Statistics:
        0x55c61aab37f0.capsfilter0.src: mean=0:00:00.000006225 min=0:00:00.000003737 max=0:00:00.000566180
        0x55c61aa24230.rtpstreamdepay0.src: mean=0:00:00.000018388 min=0:00:00.000011602 max=0:00:00.000102474
        0x55c61ab05ea0.capsfilter1.src: mean=0:00:00.000009136 min=0:00:00.000002785 max=0:00:00.000039395
        0x55c61a80ebf0.rtph265depay0.src: mean=0:00:00.000035421 min=0:00:00.000019036 max=0:00:00.001843301
        0x55c61a90bbd0.queue0.src: mean=0:00:00.000013551 min=0:00:00.000008907 max=0:00:00.000028313
        0x55c61aa337f0.h265parse0.src: mean=0:00:00.000118532 min=0:00:00.000057499 max=0:00:00.001058128
        0x55c61ab02100.capsfilter2.src: mean=0:00:00.000013271 min=0:00:00.000008676 max=0:00:00.000036559
        0x55c61a880640.queue1.src: mean=0:00:00.000020159 min=0:00:00.000008676 max=0:00:00.001672268
        0x55c61ab7d2e0.vah265dec0.src: mean=0:00:00.099842792 min=0:00:00.049800864 max=0:00:00.120715323
        0x55c61a6ff020.capsfilter3.src: mean=0:00:00.000017780 min=0:00:00.000007975 max=0:00:00.000036318
        0x55c61ab6b040.queue2.src: mean=0:00:00.000084689 min=0:00:00.000015750 max=0:00:00.000448497
        0x55c61a910a00.gluploadelement0.src: mean=0:00:00.000059767 min=0:00:00.000031881 max=0:00:00.000378325
        0x55c61a884b60.glcolorconvertelement0.src: mean=0:00:00.000011727 min=0:00:00.000008746 max=0:00:00.000036179
        0x55c61aae5390.glcolorbalance0.src: mean=0:00:00.000011701 min=0:00:00.000008977 max=0:00:00.000027973

Element Reported Latency:
        0x55c61a9124d0.tcpclientsrc0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.451577889
        0x55c61aab37f0.capsfilter0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.451589761
        0x55c61aa24230.rtpstreamdepay0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.451596213
        0x55c61ab05ea0.capsfilter1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.451604469
        0x55c61a80ebf0.rtph265depay0: min=0:00:00.000000000 max=99:99:99.999999999 ts=0:00:00.451613756
        0x55c61a90bbd0.queue0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.451619778
        0x55c61aa337f0.h265parse0: min=0:00:00.033333333 max=0:00:00.000000000 ts=0:00:00.451625208
        0x55c61a9124d0.tcpclientsrc0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.452703745
        0x55c61aab37f0.capsfilter0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.452719985
        0x55c61aa24230.rtpstreamdepay0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.452726798
        0x55c61ab05ea0.capsfilter1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.452732198
        0x55c61a80ebf0.rtph265depay0: min=0:00:00.000000000 max=99:99:99.999999999 ts=0:00:00.452737779
        0x55c61a90bbd0.queue0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.452743360
        0x55c61aa337f0.h265parse0: min=0:00:00.033333333 max=0:00:00.000000000 ts=0:00:00.452748509
        0x55c61ab02100.capsfilter2: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.452753499
        0x55c61a880640.queue1: min=5124095:34:33.676218283 max=0:00:00.000000000 ts=0:00:00.452758789
        0x55c61a9124d0.tcpclientsrc0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.453004593
        0x55c61aab37f0.capsfilter0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.453018700
        0x55c61aa24230.rtpstreamdepay0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.453027326
        0x55c61ab05ea0.capsfilter1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.453034800
        0x55c61a80ebf0.rtph265depay0: min=0:00:00.000000000 max=99:99:99.999999999 ts=0:00:00.453055419
        0x55c61a90bbd0.queue0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.453063274
        0x55c61aa337f0.h265parse0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.453078442
        0x55c61a9124d0.tcpclientsrc0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.454061529
        0x55c61aab37f0.capsfilter0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.454071748
        0x55c61aa24230.rtpstreamdepay0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.454077459
        0x55c61ab05ea0.capsfilter1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.454082459
        0x55c61a80ebf0.rtph265depay0: min=0:00:00.000000000 max=99:99:99.999999999 ts=0:00:00.454087879
        0x55c61a90bbd0.queue0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.454097597
        0x55c61aa337f0.h265parse0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.454106734
        0x55c61ab02100.capsfilter2: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.454113537
        0x55c61a880640.queue1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.454120901
        0x55c61ab7d2e0.vah265dec0: min=0:00:00.100000000 max=0:00:00.000000000 ts=0:00:00.454126211
        0x55c61a9124d0.tcpclientsrc0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525850890
        0x55c61aab37f0.capsfilter0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525866730
        0x55c61aa24230.rtpstreamdepay0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525873834
        0x55c61ab05ea0.capsfilter1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525879454
        0x55c61a80ebf0.rtph265depay0: min=0:00:00.000000000 max=99:99:99.999999999 ts=0:00:00.525885065
        0x55c61a90bbd0.queue0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525894312
        0x55c61aa337f0.h265parse0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525904942
        0x55c61ab02100.capsfilter2: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525912256
        0x55c61a9124d0.tcpclientsrc0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525902348
        0x55c61a880640.queue1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525918067
        0x55c61ab7d2e0.vah265dec0: min=0:00:00.100000000 max=0:00:00.000000000 ts=0:00:00.525935610
        0x55c61aab37f0.capsfilter0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525935650
        0x55c61a6ff020.capsfilter3: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525947172
        0x55c61aa24230.rtpstreamdepay0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525956400
        0x55c61ab6b040.queue2: min=5124095:34:33.609551616 max=0:00:00.000000000 ts=0:00:00.525961940
        0x55c61ab05ea0.capsfilter1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525968623
        0x55c61a9023b0.glimagesinkbin0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525975436
        0x55c61a80ebf0.rtph265depay0: min=0:00:00.000000000 max=99:99:99.999999999 ts=0:00:00.525983551
        0x55c61a910a00.gluploadelement0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525988240
        0x55c61a884b60.glcolorconvertelement0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525995904
        0x55c61a90bbd0.queue0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.525999291
        0x55c61aa337f0.h265parse0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526016032
        0x55c61aae5390.glcolorbalance0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526003128
        0x55c61ab02100.capsfilter2: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526024538
        0x55c61a880640.queue1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526037162
        0x55c61ab7d2e0.vah265dec0: min=0:00:00.100000000 max=0:00:00.000000000 ts=0:00:00.526045558
        0x55c61a6ff020.capsfilter3: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526053062
        0x55c61ab6b040.queue2: min=5124095:34:33.609551616 max=0:00:00.000000000 ts=0:00:00.526060767
        0x55c61a9023b0.glimagesinkbin0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526068592
        0x55c61a9124d0.tcpclientsrc0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526069313
        0x55c61aab37f0.capsfilter0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526082778
        0x55c61a910a00.gluploadelement0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526076697
        0x55c61aa24230.rtpstreamdepay0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526089962
        0x55c61ab05ea0.capsfilter1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526096905
        0x55c61a884b60.glcolorconvertelement0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526099730
        0x55c61a80ebf0.rtph265depay0: min=0:00:00.000000000 max=99:99:99.999999999 ts=0:00:00.526102175
        0x55c61aae5390.glcolorbalance0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526108878
        0x55c61a90bbd0.queue0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526117704
        0x55c61aa337f0.h265parse0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526127473
        0x55c61ab02100.capsfilter2: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526132993
        0x55c61a880640.queue1: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526138263
        0x55c61ab7d2e0.vah265dec0: min=0:00:00.100000000 max=0:00:00.000000000 ts=0:00:00.526143343
        0x55c61a6ff020.capsfilter3: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526148242
        0x55c61ab6b040.queue2: min=5124095:34:33.609551616 max=0:00:00.000000000 ts=0:00:00.526153342
        0x55c61a9023b0.glimagesinkbin0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526158511
        0x55c61a910a00.gluploadelement0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526164553
        0x55c61a884b60.glcolorconvertelement0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526171486
        0x55c61aae5390.glcolorbalance0: min=0:00:00.000000000 max=0:00:00.000000000 ts=0:00:00.526177688
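For longer traces, the statistics above can be scanned with a small script instead of by eye. This is only a sketch: the function names are made up, and the regular expression is fitted to the line format shown in this post. Two details of the time format are worth noting. `99:99:99.999999999` is how GStreamer prints an unset clock time. Values such as `5124095:34:33.676218283` equal 2^64 ns minus 33,333,333 ns, so they look like a negative value wrapped around in an unsigned 64-bit integer; treating them as negative is an assumption.

```python
import re

NS_PER_SECOND = 1_000_000_000
UINT64_WRAP = 1 << 64

# Matches per-element lines such as
# "0x55c61ab7d2e0.vah265dec0.src: mean=... min=... max=..."
# Pipeline lines ("src|sink: ...") contain "|" and do not match.
STAT_LINE = re.compile(r"\s*0x[0-9a-f]+\.([\w.]+): mean=(\S+) min=(\S+) max=(\S+)")


def parse_gst_time(text):
    """Convert 'H:MM:SS.nnnnnnnnn' to signed nanoseconds, or None if unset."""
    if text == "99:99:99.999999999":
        return None  # printed form of an unset clock time
    hours, minutes, seconds = text.split(":")
    whole, _, frac = seconds.partition(".")
    total_s = int(hours) * 3600 + int(minutes) * 60 + int(whole)
    ns = total_s * NS_PER_SECOND + int(frac.ljust(9, "0")[:9])
    # Assumption: values in the upper half of the uint64 range are
    # wrapped-around negative numbers.
    if ns >= UINT64_WRAP // 2:
        ns -= UINT64_WRAP
    return ns


def slowest_element(lines):
    """Return (element_pad, mean_ns) for the largest mean latency, or None."""
    best = None
    for line in lines:
        match = STAT_LINE.match(line)
        if not match:
            continue
        name, mean = match.group(1), parse_gst_time(match.group(2))
        if mean is not None and (best is None or mean > best[1]):
            best = (name, mean)
    return best
```

Run over the "Element Latency Statistics" block in this post, `slowest_element` picks out `vah265dec0.src`, and `parse_gst_time("5124095:34:33.609551616")` comes out as -100,000,000 ns.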

I'm currently trying very hard to get that 100 ms of latency introduced by the decoder down to ~0 ms.

Does anyone have any idea WHY vah265dec is doing this? I also tried:

- avdec_h265

- libde265dec

- vaapih265dec

all with the same result.

I also tried an all-in-one pipeline, with no DepthAI involved:

#!/usr/bin/env bash

set -e

(
  export GST_DEBUG="GST_TRACER:7"
  export GST_TRACERS='latency(flags=pipeline+element)'
  # export GST_TRACERS='latency(flags=pipeline+element)'
  export GST_DEBUG_FILE=trace.log
  # export GST_DEBUG="3"
  export GST_DEBUG_DUMP_DOT_DIR="$(pwd)"

  rm trace.log || true

  gst-launch-1.0 \
    videotestsrc num-buffers=30 is-live=true \
    ! video/x-raw,width=1280,height=720,framerate=10/1 \
    ! vah265enc b-frames=0 \
    ! h265parse \
    ! rtph265pay aggregate-mode=zero-latency config-interval=-1 pt=96 \
    ! rtpstreampay \
    ! application/x-rtp-stream \
    ! rtpstreamdepay \
    ! application/x-rtp,media=video,clock-rate=90000,encoding-name=H265 \
    ! rtph265depay \
    ! h265parse config-interval=1 \
    ! video/x-h265,stream-format=hvc1,alignment=au \
    ! vah265dec qos=false \
    ! fakesink sync=false qos=false
)

gst-stats-1.0 trace.log

This does NOT introduce any latencies.

So because ALL decoders introduce that latency of 3 frames, and other encoders DO NOT introduce that latency, I'm beginning to blame the H.265 stream 😆

Here's my DepthAI code:

    import depthai as dai

    pipeline = dai.Pipeline()

    # Define sources and output
    camRgb = pipeline.create(dai.node.ColorCamera)
    videoEnc = pipeline.create(dai.node.VideoEncoder)
    xout = pipeline.create(dai.node.XLinkOut)

    xout.setStreamName("h265")

    # Properties
    camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_720_P)
    camRgb.setFps(30)
    videoEnc.setDefaultProfilePreset(
        camRgb.getFps(), dai.VideoEncoderProperties.Profile.H265_MAIN
    )
    videoEnc.setNumBFrames(0)
    videoEnc.setBitrateKbps(10000)  # 10 Mbps
    videoEnc.setKeyframeFrequency(240)

    # Linking
    camRgb.video.link(videoEnc.input)
    videoEnc.bitstream.link(xout.input)

I disabled B-frames, which was the only possible cause of latency I could think of.

Does anyone have any other idea why the decoder needs to cache 3 frames/100 ms to decode the DepthAI H.265 stream?

The camera is an OAK-D S2.

Any help is greatly appreciated, thanks for your time! 😊

Another thing: I'm having a very hard time finding any information on B-frames. To my knowledge, B-frames are bi-directional frames, where a frame can be predicted from both the previous and the next frame. But what does, for example, videoEnc.setNumBFrames(3) actually mean? I believe the max value is 3; when you set it to -1 it becomes a bigger error.
Any information on what B-frames the encoder supports?

Thanks for your time 🙂

    Markus

    Markus Any information on what B-frames the encoder supports?

    Setting 3 B-frames tells the encoder to use a structure where, after a P-frame, it will insert up to 3 B-frames before the next reference frame. This can increase compression efficiency because B-frames generally take up fewer bits, but they require reordering at the decoder side.

    You can test this yourself:

    videoEnc.out.link(xout.input)
    videoEnc.setNumBFrames(3)
    
    # Connect to device and start pipeline
    with dai.Device(pipeline) as device:
    
        # Output queue will be used to get the encoded data from the output defined above
        q = device.getOutputQueue(name="h265", maxSize=30, blocking=True)
        
        while True: 
            print(q.get().getFrameType())
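
To illustrate the reordering described above, here is a simplified model of display order versus decode order. It assumes a fixed pattern of `num_b` B-frames between reference frames and is not a description of what the OAK encoder actually emits; the function names are made up for illustration.

```python
def display_order(num_frames, num_b):
    """Frame labels in display order: an I-frame, then num_b B-frames per P-frame."""
    labels = []
    for i in range(num_frames):
        if i == 0:
            kind = "I"
        elif i % (num_b + 1) == 0:
            kind = "P"
        else:
            kind = "B"
        labels.append(f"{kind}{i}")
    return labels


def decode_order(frames):
    """Reorder so each B-frame follows the later reference frame it depends on."""
    out, pending_b = [], []
    for frame in frames:
        if frame.startswith("B"):
            pending_b.append(frame)  # must wait for the next reference frame
        else:
            out.append(frame)        # reference frame is sent first
            out.extend(pending_b)    # then the B-frames displayed before it
            pending_b = []
    # Trailing B-frames with no later reference are simply flushed here;
    # a real encoder would handle the end of the stream differently.
    out.extend(pending_b)
    return out


print(decode_order(display_order(9, 3)))
print(decode_order(display_order(5, 0)))
```

With 3 B-frames, P4 has to be decoded before B1, B2 and B3 can be shown, so the decoder must hold frames back. With 0 B-frames the two orders are identical in this model.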

    I've forwarded the latency to the team as it is a bit out of my scope.

    Thanks,
    Jaka

    a month later

    Thank you very much for the response. It was quite helpful!

    I've forwarded the latency to the team as it is a bit out of my scope.

    Any news from the team? 😃

    Thanks again!

      Markus
      RVC4 is high priority at the moment. I don't think anyone had time to debug this further...