Hi,
I believe I’ve successfully combined the object detection pipeline (using the .tar.xz model archive) with PoE-TCP streaming. When I run: `python3 host.py client <ip_address>`
I do receive the camera stream on the host side, so it seems the code is running correctly. However, I’m not seeing any bounding boxes overlaid on the video—even though object detection appears to be active on the device.
Could this be a directory or model linking issue? Or is there anything else I might have missed in the pipeline setup?
Any suggestions or debugging tips would be appreciated!
Thanks!
from pathlib import Path
import depthai as dai
from utils.oak_arguments import initialize_argparser
from utils.scripts import get_client_script, get_server_script
from utils.annotation_node import AnnotationNode
# On-device spatial-detection + PoE-TCP streaming pipeline.
# Parses CLI args, builds a DepthAI pipeline (camera or video replay source,
# stereo depth, YOLOv6 spatial detection, MJPEG encoder, TCP client/server
# Script node), then runs it until interrupted.
_, args = initialize_argparser()

# Open a specific device when --device was given, otherwise the first one found.
device = dai.Device(dai.DeviceInfo(args.device)) if args.device else dai.Device()

with dai.Pipeline(device) as pipeline:
    print("Creating pipeline...")
    platform = pipeline.getDefaultDevice().getPlatformAsString()

    # NOTE(review): model path is hard-coded to an RVC4 archive
    # (yolov6n-...rvc4.tar.xz) at an absolute /app/... path. If the file is
    # missing or the device is RVC2, loading will likely fail — verify this
    # path exists in your deployment and matches the device platform.
    nn_archive = dai.NNArchive("/app/utils/yolov6n-r2-288x512.rvc4.tar.xz")

    if args.media_path:
        # Replay a video file instead of the color camera.
        replay = pipeline.create(dai.node.ReplayVideo)
        replay.setReplayVideoFile(Path(args.media_path))
        replay.setOutFrameType(
            dai.ImgFrame.Type.BGR888p if platform == "RVC4" else dai.ImgFrame.Type.BGR888i
        )
        replay.setLoop(True)
        replay.setFps(args.fps_limit)
        replay.setSize(1920, 1440)
        cam_out = replay.out
    else:
        # Live color camera path; needs color + left + right for stereo depth.
        available = device.getConnectedCameras()
        if len(available) < 3:
            raise ValueError(
                "Device must have 3 cameras (color, left and right) in order to run this experiment."
            )
        cam = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_A)
        cam_out = cam.requestOutput((1920, 1440), fps=args.fps_limit)

    # NOTE(review): the left/right cameras and stereo node are created even in
    # replay mode (the camera-count check above is skipped in that branch) —
    # confirm a physical device with stereo cameras is always attached.
    left_cam = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_B)
    right_cam = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_C)

    # Stereo depth at the NN input resolution, aligned to the color camera.
    stereo = pipeline.create(dai.node.StereoDepth).build(
        left=left_cam.requestOutput(nn_archive.getInputSize()),
        right=right_cam.requestOutput(nn_archive.getInputSize()),
        presetMode=dai.node.StereoDepth.PresetMode.HIGH_DETAIL,
    )
    stereo.setDepthAlign(dai.CameraBoardSocket.CAM_A)
    if platform == "RVC2":
        stereo.setOutputSize(*nn_archive.getInputSize())
    stereo.setLeftRightCheck(True)
    stereo.setRectification(True)

    # Spatial detection: fuses NN detections with stereo depth.
    # NOTE(review): `cam` is only defined in the live-camera branch above, so
    # running with --media_path will raise NameError here. The replay source
    # (or its output) presumably needs to be passed instead — TODO confirm
    # against the SpatialDetectionNetwork.build() signature.
    nn = pipeline.create(dai.node.SpatialDetectionNetwork).build(
        input=cam,
        stereo=stereo,
        nnArchive=nn_archive,
        fps=float(args.fps_limit),
    )
    if platform == "RVC2":
        nn.setNNArchive(nn_archive, numShaves=7)
    nn.setBoundingBoxScaleFactor(0.7)

    # Wraps detections with class labels for annotation.
    # NOTE(review): annotation_node's OUTPUT is never linked to anything below.
    # The encoder (next) consumes the RAW `cam_out` frames, so the annotations
    # (bounding boxes) never reach the encoded stream and are never sent over
    # TCP — this is consistent with "stream arrives but no boxes are visible".
    # Either the annotated detections must also be sent to the host (e.g. via a
    # second Script input) and drawn host-side, or the frames must be annotated
    # before encoding. As written, this node's result is discarded.
    annotation_node = pipeline.create(AnnotationNode).build(
        input_detections=nn.out,
        labels=nn_archive.getConfig().model.heads[0].metadata.classes,
    )

    # MJPEG-encode the (unannotated) source frames for TCP transport.
    video_enc = pipeline.create(dai.node.VideoEncoder).build(
        cam_out,
        frameRate=args.fps_limit,
        profile=dai.VideoEncoderProperties.Profile.MJPEG,
    )

    # On-device Script node implements the TCP client/server that ships the
    # encoded frames off the device.
    script = pipeline.create(dai.node.Script)
    script.setProcessor(dai.ProcessorType.LEON_CSS)

    video_enc.bitstream.link(script.inputs["frame"])
    # Drop frames rather than stall the pipeline if the TCP link is slow.
    script.inputs["frame"].setBlocking(False)
    script.inputs["frame"].setMaxSize(1)

    if args.mode == "client":
        script.setScript(get_client_script(args.address))
    else:
        script.setScript(get_server_script())

    # Camera control messages from the host only make sense with a live camera.
    if not args.media_path:
        script.outputs["control"].link(cam.inputControl)

    print("Pipeline created.")
    pipeline.run()