Running NN causes 4k encoding to drop frames

Hi! I posted about an issue I'm seeing with encoding 4K@30fps on an OAK-1 PoE: the encoder can normally sustain about 28 fps, but it drops to about 20 fps once I start running a neural net for object detection. From what I understand, the NN should not be draining resources from the encoder, so perhaps I'm doing something wrong.

I put together some simplified code to show the problem:

#!/usr/bin/env python3

from pathlib import Path
import sys
import time
from datetime import datetime

import depthai as dai

# Locate the MobileNet-SSD model blob
nnPath = str((Path(__file__).parent / Path('depthai-python/examples/models/mobilenet-ssd_openvino_2021.4_5shave.blob')).resolve().absolute())

if not Path(nnPath).exists():
    raise FileNotFoundError(f'Required file/s not found, please run "{sys.executable} install_requirements.py"')

use_nn = True  # set to False to see the encoder run at full rate
preview_size = (300, 300)
sensor_resolution = dai.ColorCameraProperties.SensorResolution.THE_4_K
codec = dai.VideoEncoderProperties.Profile.H265_MAIN
file_extension = 'h265'
fps = 30

pipeline = dai.Pipeline()

camRgb = pipeline.create(dai.node.ColorCamera)
videoEncoder = pipeline.create(dai.node.VideoEncoder)
nn = None
if use_nn:
    nn = pipeline.create(dai.node.MobileNetDetectionNetwork)

videoOut = pipeline.create(dai.node.XLinkOut)
nnOut = None
if use_nn:
    nnOut = pipeline.create(dai.node.XLinkOut)

videoOut.setStreamName("h265")
if use_nn:
    nnOut.setStreamName("nn")

# Properties
camRgb.setBoardSocket(dai.CameraBoardSocket.RGB)
camRgb.setResolution(sensor_resolution)
camRgb.setPreviewSize(preview_size)
camRgb.setInterleaved(False)

videoEncoder.setDefaultProfilePreset(fps, codec)

if use_nn:
    nn.setConfidenceThreshold(0.5)
    nn.setBlobPath(nnPath)
    nn.setNumInferenceThreads(2)
    nn.input.setBlocking(False)

# Linking
camRgb.video.link(videoEncoder.input)
videoEncoder.bitstream.link(videoOut.input)
if use_nn:
    nn.out.link(nnOut.input)
    camRgb.preview.link(nn.input)

# Connect to device and start pipeline
print(datetime.now().strftime('%H:%M.%S.%f: Starting device'))
with dai.Device(pipeline) as device:

    print(datetime.now().strftime('%H:%M.%S.%f: Device started'))
    # Set debugging level
    #device.setLogLevel(dai.LogLevel.DEBUG)
    #device.setLogOutputLevel(dai.LogLevel.DEBUG)

    # Queues
    queue_size = 8
    qDet = None
    if use_nn:
        qDet = device.getOutputQueue("nn", queue_size)
    qRgbEnc = device.getOutputQueue('h265', maxSize=30, blocking=True)

    frameCount = 0
    frameStart = None

    while True:
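        # Drain any NN detections; their contents aren't used in this repro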
        inDet = None
        if qDet:
            inDet = qDet.tryGet()

        while qRgbEnc.has():
            encFrame = qRgbEnc.get()
            # Ordinarily we might write this to a file
            frameCount += 1

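        # Count encoded frames over 10-second windows to gauge encoder throughput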
        if not frameStart:
            frameStart = datetime.now()
            frameCount = 0
        if (datetime.now() - frameStart).total_seconds() > 10:
            print("Saw %s frames in 10 seconds" % frameCount)
            frameStart = datetime.now()
            frameCount = 0

        time.sleep(.01)

You can set use_nn = False to see the higher framerate when the neural net is disabled.

  • erik replied to this.

    Hi Nathan,
    I added an ImageManip node between the camera and the NN node - I believe the NN input was the bottleneck for the VideoEncoder. Could you try with the code below? The key change is the ImageManip section - roughly this sketch (exact config may differ):
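
    # Sketch: a small non-blocking queue in front of the NN, so a busy NN
    # can't stall the camera's preview output (and with it the video output)
    manip = pipeline.create(dai.node.ImageManip)
    manip.initialConfig.setResize(300, 300)  # match the NN's 300x300 input
    manip.inputImage.setBlocking(False)      # drop frames rather than block the camera
    manip.inputImage.setQueueSize(2)         # keep only the newest frames
    camRgb.preview.link(manip.inputImage)
    manip.out.link(nn.input)

    I get this output: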

    14:20.35.434798: Device started
    Saw 281 frames in 10 seconds
    Saw 285 frames in 10 seconds
    Saw 287 frames in 10 seconds
    Saw 287 frames in 10 seconds
    Saw 287 frames in 10 seconds
    Saw 287 frames in 10 seconds
    Saw 287 frames in 10 seconds
    Saw 287 frames in 10 seconds
    Saw 288 frames in 10 seconds
    Saw 287 frames in 10 seconds

    Thoughts?
    Thanks, Erik

    Thank you Erik, that helps a ton! I'm seeing about 27 FPS with the ImageManip change:

    07:54.27.914642: Device started
    Saw 258 frames in 10 seconds
    Saw 266 frames in 10 seconds
    Saw 271 frames in 10 seconds
    Saw 270 frames in 10 seconds
    Saw 269 frames in 10 seconds

    I'll try your blobconverter usage (and maybe updating depthai) to see if it helps close the remaining gap.

    Could you help me understand why this helps? In my mental model there were two streams coming out of the camera - a preview stream at 300x300 and a main video stream at 4k30 - and I thought the two would be independent. Why would the NN be a bottleneck for the other branch?

    Also, I actually had an ImageManip node between the camera and the NN in the code I simplified to show you! I'll tinker with it to see if I can figure out why it wasn't having this same effect of removing the bottleneck.

    • erik replied to this.

      It looks like these two lines make the difference:

      manip.inputImage.setBlocking(False)
      manip.inputImage.setQueueSize(2)

      So perhaps the NN input was a blocking queue, but even then it's surprising to me that the camera wouldn't keep filling the other queues while the NN's preview queue was full.

      Edit: but the NN had

      nn.input.setBlocking(False)

      I would have thought this would prevent blocking on this path.

      Edit 2: on further testing, it's very specifically the queue depth of 2 that solves the problem. Any idea why this might be?

      Hi Nathan,
      It's because the preview is created from the video frame (see here), so if the ColorCamera can't produce new preview frames (it's blocked), it also can't produce new video frames.
      Thanks, Erik

      Ok, so that explains why I'd need manip.inputImage.setBlocking(False), but I still don't understand why manip.inputImage.setQueueSize(2) is necessary. Does the queue continue blocking until I specify its size?


      • erik replied to this.

        Hi Nathan, I believe it's because only 2 preview frames are "accepted" by ImageManip at a time, which means video frames can flow normally and be sent to the VideoEncoder.
        Thanks, Erik
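
        For reference, here are the queue settings from this thread in one place (a sketch reusing the names from the snippets above):

        # Decouple the NN branch so it can never stall the camera (and thus the encoder)
        nn.input.setBlocking(False)           # NN input drops old frames instead of blocking
        manip.inputImage.setBlocking(False)   # ImageManip input never blocks the camera
        manip.inputImage.setQueueSize(2)      # hold only the 2 newest preview frames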