Best way to stream stereo feed from OAK-D Pro PoE to host (Unity / XR use case)

Albertrk

Hi all,

Total noob here. I'm trying to vibe code my way through this. I'm working on a research project where we need to stream a live stereo feed (left + right) from an OAK-D Pro PoE into a host application (Unity, for XR development).

Hardware setup:

- 2 x OAK-D Pro PoE
- PoE switch
- Raspberry Pi 5 (Ethernet to camera via switch, Wi-Fi to laptop)
- Laptop running Unity (VR)

Architecture goal:

- Use the Raspberry Pi as a bridge (my uni laptop is giving me some WSL/user-rights issues).
- Capture the stereo streams (left + right) from the OAK-D Pro PoE.
- Forward them to the host machine in a format suitable for real-time consumption in Unity.
- No heavy on-device NN required, just reliable stereo video delivery.
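Because TCP delivers a plain byte stream, the forwarding step from the Pi to the laptop needs some framing so the receiver knows where each left or right frame starts and ends. A minimal sketch, assuming both streams share one TCP connection; the 13-byte header (stream id, sequence number, payload length) and the names `pack_packet` and `PacketReader` are hypothetical, not a DepthAI or Unity format. The payload can be whatever the Pi gets from the camera, e.g. raw frames or H264/H265 packets from VideoEncoder.

```python
import struct
from typing import List, Tuple

# Hypothetical wire header (not a DepthAI or Unity format), network byte order:
#   stream id (uint8: 0 = left, 1 = right), sequence number (uint64), payload length (uint32)
HEADER = struct.Struct("!BQI")
LEFT, RIGHT = 0, 1


def pack_packet(stream_id: int, seq: int, payload: bytes) -> bytes:
    """Prefix one frame's bytes with the header so the receiver can find its boundaries."""
    return HEADER.pack(stream_id, seq, len(payload)) + payload


class PacketReader:
    """Reassembles (stream_id, seq, payload) tuples from arbitrary TCP read chunks."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[Tuple[int, int, bytes]]:
        self._buf.extend(chunk)
        packets = []
        while len(self._buf) >= HEADER.size:
            stream_id, seq, length = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # payload not fully received yet; wait for the next chunk
            packets.append((stream_id, seq, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]  # drop the consumed packet from the buffer
        return packets


# Usage: a chunk boundary that splits a header is handled transparently.
wire = pack_packet(LEFT, 7, b"abc") + pack_packet(RIGHT, 7, b"defg")
reader = PacketReader()
print(reader.feed(wire[:5]) + reader.feed(wire[5:]))  # [(0, 7, b'abc'), (1, 7, b'defg')]
```

On the Pi you would send `pack_packet(...)` for every frame; on the laptop you would pass each received chunk to `PacketReader.feed()`. A Unity client would need the same parsing written in C#.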

What I’m trying to understand:

1. What is the recommended DepthAI pipeline for streaming a stereo pair (left + right) from an RVC2 PoE device to host?
2. Is it better to use:
   - MonoCamera nodes for left/right?
   - StereoDepth node outputs?
   - ISP/video/preview paths for stereo RGB?
3. What transport format is recommended for real-time host consumption (XLinkOut raw frames, H264/H265 via VideoEncoder, etc.)?
4. Any best practices for low-latency stereo streaming from PoE devices?

The end goal is a stable, low-latency stereo feed suitable for XR visualization and research experimentation.

Any architectural guidance or example pipelines would be greatly appreciated.

Thanks in advance.

OskarSonc

Hey Albertrk, first you have to decide whether to use DepthAI v2 or v3. We have premade examples for both. I would suggest going with v3 if possible, since it's newer and will receive more updates going forward.

For a live stereo pair, use the left/right mono cameras and stream them directly; only use StereoDepth if you need rectified frames or disparity/depth. For Unity/XR over a Pi bridge, on-device H.264 encoding is the most reliable low-latency transport, because raw frames are too heavy for Wi-Fi. Start with 720p at 30 FPS and keep the pipeline minimal.
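The arithmetic behind "raw frames are too heavy for Wi-Fi" is quick to check. A minimal sketch, assuming 8-bit grayscale frames at 1280x720 and 30 FPS from both mono cameras; the 4 Mbit/s per encoded stream is a hypothetical placeholder for whatever bitrate you configure on VideoEncoder, not a DepthAI default.

```python
def raw_mono_mbps(width: int, height: int, fps: int,
                  cameras: int = 2, bits_per_pixel: int = 8) -> float:
    """Uncompressed bitrate in Mbit/s for `cameras` mono streams (8-bit grayscale by default)."""
    return width * height * bits_per_pixel * fps * cameras / 1e6


# Hypothetical encoder target per stream; tune this to your own VideoEncoder settings.
H264_MBPS_PER_STREAM = 4.0

raw = raw_mono_mbps(1280, 720, 30)   # 442.368 Mbit/s for the raw left + right pair
encoded = 2 * H264_MBPS_PER_STREAM   # 8.0 Mbit/s for two encoded streams
print(f"raw: {raw:.1f} Mbit/s, encoded: {encoded:.1f} Mbit/s, ratio ~{raw / encoded:.0f}x")
```

The raw figure is what the Wi-Fi hop from the Pi to the laptop would have to carry continuously, which is why encoding on the device before that hop matters.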

I would suggest giving Codex/Claude the links to our premade examples and our docs, and starting from there:
https://docs.luxonis.com/software-v3/depthai/examples/stereo_depth/stereo_depth/
https://docs.luxonis.com/software-v3/depthai/examples/video_encoder/video_encode/

You can also check our GitHub, which has more examples:
https://github.com/luxonis/oak-examples/tree/main/tutorials/camera-stereo-depth
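Whichever example you start from, left and right arrive as two separate streams, so the Unity side has to re-pair them before showing one image per eye. A minimal sketch, assuming each frame carries a sequence number (DepthAI frames expose one via `getSequenceNum()`) and that equal numbers on the two streams mark the same capture, which is worth verifying on your own device. `StereoPairer` is a hypothetical helper; run it after decoding, since an H.264 decoder needs every packet of its stream and dropping packets before decoding would corrupt the video.

```python
from typing import Any, Dict, Optional, Tuple


class StereoPairer:
    """Hypothetical helper: pairs decoded left/right frames that share a sequence number.

    Assumes each stream delivers its frames in order and that equal sequence
    numbers on both streams belong to the same capture (verify on your device).
    """

    SIDES = ("left", "right")

    def __init__(self, max_pending: int = 8) -> None:
        self.max_pending = max_pending
        self._pending: Dict[int, Dict[str, Any]] = {}

    def __len__(self) -> int:
        return len(self._pending)

    def add(self, side: str, seq: int, frame: Any) -> Optional[Tuple[int, Any, Any]]:
        """Store one frame; return (seq, left, right) once both halves have arrived."""
        if side not in self.SIDES:
            raise ValueError(f"side must be 'left' or 'right', got {side!r}")
        slot = self._pending.setdefault(seq, {})
        slot[side] = frame
        if len(slot) == 2:
            del self._pending[seq]
            # With in-order streams, anything older can no longer be completed.
            for stale in [s for s in self._pending if s < seq]:
                del self._pending[stale]
            return seq, slot["left"], slot["right"]
        # Bound memory if one stream stalls: evict the oldest unmatched entries.
        while len(self._pending) > self.max_pending:
            del self._pending[min(self._pending)]
        return None


# Usage: left frame 1 never gets a partner, so it is discarded once pair 2 completes.
pairer = StereoPairer()
pairer.add("left", 1, b"L1")
pairer.add("left", 2, b"L2")
print(pairer.add("right", 2, b"R2"))  # (2, b'L2', b'R2')
print(len(pairer))                    # 0
```

Dropping an incomplete pair instead of waiting for it keeps latency bounded, which suits XR display better than showing every frame late.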

Thanks,
Oskar