Hey @jakaskerl, thank you for your help.
Let me give you some more context:
- I need two (ideally 1080p) RGB images because I am using the camera for VR teleoperation. Concatenating the two images simplifies synchronization: I encode a single stream containing both images, send it over the network, and split it back apart on the receiving side (that's the idea anyway, assuming the images are hardware-synchronized). A sketch of the host-side split follows this list.
- I am using an OAK-FFC-4P, currently with two IMX378 modules (we may want to add more in the future).
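Concretely, the split I have in mind is nothing more than slicing the side-by-side frame in half; a minimal sketch, assuming the decoded frame arrives as an H×W×3 uint8 array (the split_sbs helper is just illustrative):

import numpy as np

def split_sbs(frame: np.ndarray):
    # Split a side-by-side stereo frame into (left, right) halves
    h, w, _ = frame.shape
    half = w // 2
    return frame[:, :half], frame[:, half:]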
Using preview instead of ISP, and adding a few things from the script you sent last, I still get an out-of-memory error:
Traceback (most recent call last):
  File "<...>/test_concat_nn.py", line 41, in <module>
    with dai.Device(pipeline) as device:
RuntimeError: NeuralNetwork(2) - Out of memory while creating pool for resulting tensor. Number of frames: 8 each with size: 49766400B
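If my arithmetic is right, the reported per-frame size is exactly twice what I would expect for two side-by-side 1080p FP16 images:

# Expected: two 1920x1080 BGR frames side by side, FP16 (2 bytes per element)
print(3 * 1080 * (1920 * 2) * 2)        # 24883200
# Reported by the error: an extra factor of two somewhere
print(3 * (1920 * 2) * (1080 * 2) * 2)  # 49766400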
Here is the current state of the script:
import cv2
import depthai as dai
import numpy as np

pipeline = dai.Pipeline()

camLeft = pipeline.create(dai.node.ColorCamera)
camLeft.setBoardSocket(dai.CameraBoardSocket.CAM_A)
camLeft.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camLeft.setFps(30)
camLeft.setPreviewSize(1920, 1080)
camLeft.setInterleaved(False)
camLeft.setImageOrientation(dai.CameraImageOrientation.ROTATE_180_DEG)
camLeft.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

camRight = pipeline.create(dai.node.ColorCamera)
camRight.setBoardSocket(dai.CameraBoardSocket.CAM_D)
camRight.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRight.setFps(30)
camRight.setPreviewSize(1920, 1080)
camRight.setInterleaved(False)
camRight.setImageOrientation(dai.CameraImageOrientation.ROTATE_180_DEG)
camRight.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

# Concatenation NN
nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("../utils/generate_models/models/concat_openvino_2022.1_6shave.blob")
nn.setNumInferenceThreads(2)
camLeft.preview.link(nn.inputs['left'])
camRight.preview.link(nn.inputs['right'])

nn_xout = pipeline.create(dai.node.XLinkOut)
nn_xout.setStreamName("concat")
nn.out.link(nn_xout.input)

# Pipeline is defined, now we can connect to the device
with dai.Device(pipeline) as device:
    qNn = device.getOutputQueue(name="concat", maxSize=4, blocking=False)
    shape = (3, 1080, 1920 * 2)  # CHW, both previews side by side
    while True:
        inNn = np.array(qNn.get().getData())
        print(inNn.shape)
        # NN output is FP16: reinterpret the raw bytes, reshape to CHW, convert to HWC uint8
        frame = inNn.view(np.float16).reshape(shape).transpose(1, 2, 0).astype(np.uint8).copy()
        cv2.imshow("Concat", frame)
        if cv2.waitKey(1) == ord('q'):
            break
Maybe the issue comes from the way I built the NN? This is the script I used:
#!/usr/bin/env python3
from pathlib import Path

import blobconverter
import torch
from torch import nn


class CatImgs(nn.Module):
    def forward(self, left, right):
        # Concatenate the two inputs along dimension 3
        return torch.cat((left, right), 3)


# Define the expected input shape (dummy input)
shape = (1, 3, 1920 * 2, 1080)
X = torch.ones(shape, dtype=torch.float32)

path = Path("out/")
path.mkdir(parents=True, exist_ok=True)
onnx_file = "out/concat.onnx"
print(f"Writing to {onnx_file}")

torch.onnx.export(
    CatImgs(),
    (X, X),
    onnx_file,
    opset_version=12,
    do_constant_folding=True,
    input_names=["left", "right"],  # Optional
    output_names=["output"],  # Optional
)

# No need for onnx-simplifier here
# Use blobconverter to convert onnx -> IR -> blob
blobconverter.from_onnx(
    model=onnx_file,
    data_type="FP16",
    shaves=6,
    use_cache=False,
    output_dir="./models",
    optimizer_params=[],
)
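To double-check what the exported model actually produces, here is a quick shape check that can be run locally; a minimal sketch, assuming onnxruntime is installed (it is not part of the original script):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("out/concat.onnx")
x = np.ones((1, 3, 1920 * 2, 1080), dtype=np.float32)
(out,) = sess.run(None, {"left": x, "right": x})
print(out.shape)  # concat along dim 3 doubles the last axis: (1, 3, 3840, 2160)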
I get the following graph:

[image: ONNX graph of the concat model]
Thank you very much