• How to decode H.265 files or NumPy arrays generated by depthai using ffmpeg

I tried to decode H.265 files and NumPy array data produced by a depthai camera using ffmpeg, but decoding failed. If there is a method recommended by depthai, or a known way to decode this data with ffmpeg, please let me know.


First, here is what I tried. The script is based on the depthai example.

depthai -> extract h265 file

#!/usr/bin/env python3

import depthai as dai

# Create pipeline
pipeline = dai.Pipeline()

# Define sources and output
camRgb = pipeline.create(dai.node.ColorCamera)
videoEnc = pipeline.create(dai.node.VideoEncoder)
xout = pipeline.create(dai.node.XLinkOut)

xout.setStreamName('h265')

# Properties
camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_4_K)
videoEnc.setDefaultProfilePreset(30, dai.VideoEncoderProperties.Profile.H265_MAIN)

# Linking
camRgb.video.link(videoEnc.input)
videoEnc.bitstream.link(xout.input)

# Connect to device and start pipeline
with dai.Device(pipeline) as device:

    # Output queue will be used to get the encoded data from the output defined above
    q = device.getOutputQueue(name="h265", maxSize=30, blocking=True)
    count = 0
    # The .h265 file is a raw stream file (not playable yet)
    with open('video.h265', 'wb') as videoFile:
        print("Press Ctrl+C to stop encoding...")
        try:
            while True:
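                h265Packet = q.get()  # Blocking call, will wait until new data has arrived
                frame = h265Packet.getData()  # NumPy uint8 array holding one encoded packet
                frame.tofile(videoFile)  # tofile() returns None, so call it separately
                # Also save each packet as its own .hevc file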
                with open(f'output{count}.hevc', 'wb') as video_file:
                    video_file.write(frame)
                    count+=1
        except KeyboardInterrupt:
            # Keyboard interrupt (Ctrl + C) detected
            pass

    print("To view the encoded data, convert the stream file (.h265) into a video file (.mp4) using a command below:")
    print("ffmpeg -framerate 30 -i video.h265 -c copy video.mp4")
  • save each packet as a separate .hevc file
# save hevc files
with open(f'output{count}.hevc', 'wb') as video_file:
    video_file.write(frame)
    count+=1

python ffmpeg

I then tried to decode each saved file with ffmpeg from Python. However, only the I-frames were decoded.

import numpy as np
import subprocess as sp
import time
import sys

a = []

for i in range(177):
    input_file = f'images/output{i}.hevc'

    ffmpeg_cmd = [
            'ffmpeg',
            '-y',
            '-i', input_file,
            '-c:v', 'hevc',
            '-pix_fmt', 'yuv420p',
            '-f', 'rawvideo',
            '-analyzeduration', '100M',
            '-probesize', '100M',
            '-'
    ]

    process = sp.Popen(ffmpeg_cmd, stdout=sp.PIPE)
    raw_data, _ = process.communicate()
    if len(raw_data) > 0:
        a.append(len(raw_data))
print(a)

decode result

  • decoding an I-frame (succeeds)
ffmpeg version n5.0.2 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --enable-nonfree --enable-cuda-nvcc --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --disable-static --enable-gpl --enable-libx265 --enable-shared
  libavutil      57. 17.100 / 57. 17.100
  libavcodec     59. 18.100 / 59. 18.100
  libavformat    59. 16.100 / 59. 16.100
  libavdevice    59.  4.100 / 59.  4.100
  libavfilter     8. 24.100 /  8. 24.100
  libswscale      6.  4.100 /  6.  4.100
  libswresample   4.  3.100 /  4.  3.100
  libpostproc    56.  3.100 / 56.  3.100
Input #0, hevc, from 'images/output150.hevc':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: hevc (Main), yuv420p(tv, bt470bg), 960x520, 30 tbr, 1200k tbn
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> hevc (libx265))
Press [q] to stop, [?] for help
x265 [info]: HEVC encoder version 3.2.1+1-b5c86a64bbbe
x265 [info]: build info [Linux][GCC 9.3.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-3 (Main tier)
x265 [info]: Thread pool created using 64 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 5 / wpp(9 rows)
x265 [warning]: Source height < 720p; disabling lookahead-slices
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing deblock sao
Output #0, rawvideo, to 'pipe:':
  Metadata:
    encoder         : Lavf59.16.100
  Stream #0:0: Video: hevc, yuv420p(tv, bt470bg, progressive), 960x520, q=2-31, 30 fps, 30 tbn
    Metadata:
      encoder         : Lavc59.18.100 libx265
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame=    1 fps=0.0 q=31.6 Lsize=       8kB time=00:00:00.06 bitrate= 956.8kbits/s speed=0.606x
video:8kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
x265 [info]: frame I:      1, Avg QP:31.56  kb/s: 1401.60
x265 [info]: consecutive B-frames: 100.0% 0.0% 0.0% 0.0% 0.0%

encoded 1 frames in 0.07s (14.47 fps), 1401.60 kb/s, Avg QP:31.56
  • decoding P- and B-frames (fails)
ffmpeg version n5.0.2 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --enable-nonfree --enable-cuda-nvcc --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --disable-static --enable-gpl --enable-libx265 --enable-shared
  libavutil      57. 17.100 / 57. 17.100
  libavcodec     59. 18.100 / 59. 18.100
  libavformat    59. 16.100 / 59. 16.100
  libavdevice    59.  4.100 / 59.  4.100
  libavfilter     8. 24.100 /  8. 24.100
  libswscale      6.  4.100 /  6.  4.100
  libswresample   4.  3.100 /  4.  3.100
  libpostproc    56.  3.100 / 56.  3.100
[hevc @ 0x56192fa4eac0] Format hevc detected only with low score of 1, misdetection possible!
[hevc @ 0x56192fa4fe80] PPS id out of range: 0
    Last message repeated 1 times
[hevc @ 0x56192fa4fe80] Error parsing NAL unit #1.
[hevc @ 0x56192fa4eac0] Could not find codec parameters for stream 0 (Video: hevc, none): unspecified size
Consider increasing the value for the 'analyzeduration' (0) and 'probesize' (5000000) options
Input #0, hevc, from 'images/output174.hevc':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: hevc, none, 25 tbr, 1200k tbn
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> hevc (libx265))
Press [q] to stop, [?] for help
[hevc @ 0x56192fa55b40] PPS id out of range: 0
[hevc @ 0x56192fa55b40] Error parsing NAL unit #1.
Error while decoding stream #0:0: Invalid data found when processing input
Cannot determine format of input stream 0:0 after EOF
Error marking filters as finished

etc…
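
The `PPS id out of range` error suggests that the failing files contain slice data but not the parameter sets (VPS, SPS, PPS) that the decoder needs. One way to check this is to list the NAL unit types in each per-packet file. The sketch below assumes the files are Annex B byte streams with `00 00 01` start codes; `nal_unit_types` and `has_parameter_sets` are hypothetical helper names, not part of depthai or ffmpeg.

```python
def nal_unit_types(data):
    """Return the HEVC nal_unit_type of every NAL unit in an Annex B byte string."""
    types = []
    start_code = b'\x00\x00\x01'  # also matches the tail of a 4-byte start code
    pos = data.find(start_code)
    while pos != -1 and pos + 3 < len(data):
        header = data[pos + 3]              # first byte of the two-byte NAL header
        types.append((header >> 1) & 0x3F)  # nal_unit_type occupies bits 1..6
        pos = data.find(start_code, pos + 3)
    return types


def has_parameter_sets(data):
    """True if the data carries VPS (32), SPS (33) and PPS (34) NAL units."""
    return {32, 33, 34} <= set(nal_unit_types(data))
```

If `has_parameter_sets` returns `True` for the bytes of `images/output150.hevc` but `False` for `images/output174.hevc`, the P- and B-frame files cannot be decoded on their own and need the preceding keyframe in the same stream.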

Strangely, the same data decoded normally with python av (PyAV).

I would like to decode it with ffmpeg if possible, so that I can use NVIDIA CUDA.

This may be helpful:

from depthai_sdk import OakCamera, RecordType

with OakCamera() as oak:
    color = oak.create_camera('color', resolution='1080P', fps=20, encode='H265')


    # Synchronize & save all (encoded) streams
    oak.record([color.out.encoded], './', RecordType.VIDEO)
    # Show color stream
    oak.visualize([color.out.camera], scale=2/3, fps=True)

    oak.start(blocking=True)
ffmpeg started on 2023-07-03 at 14:45:19
Report written to "ffmpeg-20230703-144519.log"
Log level: 48
Command line:
"C:\\Users\\User\\anaconda3\\envs\\depthai\\Library\\bin\\ffmpeg.exe" -i color.mp4 -report
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with clang version 16.0.6
  configuration: --prefix=/d/bld/ffmpeg_1687155081971/_h_env/Library --cc=clang.exe --cxx=clang++.exe --nm=llvm-nm --ar=llvm-ar --disable-doc --disable-openssl --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libfontconfig --enable-libopenh264 --enable-libdav1d --ld=lld-link --target-os=win64 --enable-cross-compile --toolchain=msvc --host-cc=clang.exe --extra-libs=ucrt.lib --extra-libs=vcruntime.lib --extra-libs=oldnames.lib --strip=llvm-strip --disable-stripping --host-extralibs= --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libopus --pkg-config=/d/bld/ffmpeg_1687155081971/_build_env/Library/bin/pkg-config
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
Splitting the commandline.
Reading option '-i' ... matched as input url with argument 'color.mp4'.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option report (generate a report) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url color.mp4.
Successfully parsed a group of options.
Opening an input file: color.mp4.
[NULL @ 00000168E36FDCC0] Opening 'color.mp4' for reading
[file @ 00000168E36CF780] Setting default whitelist 'file,crypto,data'
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] ISO: File Type Major Brand: isom
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] Processing st: 0, edit list 0 - media time: 0, duration: 8714000
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] rfps: 19.916667 0.005299
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] rfps: 19.916667 0.005299
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] rfps: 20.000000 0.002049
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] rfps: 20.000000 0.002049
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] rfps: 40.000000 0.008195
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] rfps: 40.000000 0.008195
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] rfps: 60.000000 0.018439
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] rfps: 60.000000 0.018439
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] rfps: 59.940060 0.002588
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] rfps: 59.940060 0.002588
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] Before avformat_find_stream_info() pos: 6167768 bytes read:37523 seeks:1 nb_streams:1
[hevc @ 00000168E370FA80] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 00000168E370FA80] Decoding VPS
[hevc @ 00000168E370FA80] Main profile bitstream
[hevc @ 00000168E370FA80] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 00000168E370FA80] Decoding SPS
[hevc @ 00000168E370FA80] Main profile bitstream
[hevc @ 00000168E370FA80] Decoding VUI
[hevc @ 00000168E370FA80] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 00000168E370FA80] Decoding PPS
[hevc @ 00000168E370FA80] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 00000168E370FA80] Decoding SEI
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000168E36FDCC0] After avformat_find_stream_info() pos: 9854 bytes read:70291 seeks:2 frames:1
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'color.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf60.3.100
  Duration: 00:00:08.71, start: 0.000000, bitrate: 5662 kb/s
  Stream #0:0[0x1](und), 1, 1/1000000: Video: hevc (Main) (hev1 / 0x31766568), yuv420p(tv), 640x480, 5658 kb/s, 19.97 fps, 20 tbr, 1000k tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Successfully opened the file.
At least one output file must be specified
[AVIOContext @ 00000168E36BF440] Statistics: 70291 bytes read, 2 seeks
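
For the per-packet files in the question, one possible approach is to decode them as a single continuous stream in one ffmpeg process, since P- and B-frames reference earlier frames and rely on parameter sets sent with the keyframes. The sketch below is an example under stated assumptions, not an official depthai method: `build_decode_cmd`, `join_packets` and `decode_stream` are hypothetical helpers, the width and height must match the encoded stream (the log in the question reports 960x520), and `-hwaccel cuda` only works with an ffmpeg build that includes CUDA/NVDEC support. No `-c:v` is given after `-i`, because in that position it selects an encoder, which is why the logs in the question show `hevc (native) -> hevc (libx265)`.

```python
import subprocess as sp
from pathlib import Path


def build_decode_cmd(use_cuda=False):
    """Build an ffmpeg command that decodes raw HEVC from stdin to yuv420p on stdout."""
    cmd = ['ffmpeg', '-loglevel', 'error']
    if use_cuda:
        # Input option, so it must come before -i (assumes a CUDA-enabled build).
        cmd += ['-hwaccel', 'cuda']
    cmd += [
        '-f', 'hevc',          # input: raw Annex B HEVC elementary stream
        '-i', 'pipe:0',
        '-f', 'rawvideo',      # output: raw decoded frames, no re-encoding
        '-pix_fmt', 'yuv420p',
        'pipe:1',
    ]
    return cmd


def join_packets(folder, count):
    """Concatenate output0.hevc .. output{count-1}.hevc in capture order."""
    return b''.join((Path(folder) / f'output{i}.hevc').read_bytes() for i in range(count))


def decode_stream(data, width, height, use_cuda=False):
    """Decode the whole stream in one ffmpeg process and split it into frames."""
    result = sp.run(build_decode_cmd(use_cuda), input=data, stdout=sp.PIPE, check=True)
    frame_size = width * height * 3 // 2  # yuv420p stores 12 bits per pixel
    raw = result.stdout
    return [raw[i:i + frame_size] for i in range(0, len(raw) - frame_size + 1, frame_size)]
```

With the files from the question, this could be called as `frames = decode_stream(join_packets('images', 177), 960, 520, use_cuda=True)`. Each entry in `frames` is one decoded frame as raw yuv420p bytes.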