When one camera isn’t enough, stitching multiple views into a single, seamless scene can dramatically improve situational awareness. Whether for robotics, surveillance, or wide-area monitoring, combining feeds from multiple cameras allows you to perceive more of the world in real time, with neural network inference running on top.
In this post, we’ll explore how the multiple-device-stitch-nn package lets you connect multiple cameras, align their views, and run on-device object detection across the stitched image.
Note that this example is intended as a conceptual demonstration rather than a production-ready implementation. It provides a foundation for users to extend and refine as needed.

From Multiple Cameras to One Unified View
The multiple-device-stitch-nn package automatically discovers and connects to multiple RVC2 or RVC4 devices on the same network (mixing the two device types isn’t supported). Once connected, it calculates a homography, a mathematical transformation that maps points from one camera’s image onto another’s, allowing the system to align and blend images from all cameras into a single coherent view.
This homography is computed only once, at startup. From that point forward, every live frame is warped with the established transformation. Because the system assumes the cameras remain static, recalculation isn’t normally needed; if a camera does move, pressing "r" in the browser visualizer triggers a homography recalculation.
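The compute-once, warp-every-frame pattern boils down to applying a fixed 3×3 matrix to pixel coordinates. The sketch below is illustrative, not the package's actual code: in practice the matrix would be estimated from matched features between the camera views (e.g. with OpenCV's `cv2.findHomography`), and here we simply show how an already-computed homography maps a point via projective division.

```python
import numpy as np

def apply_homography(H, pts):
    """Map N x 2 pixel coordinates through a 3x3 homography.

    Points are lifted to homogeneous coordinates, multiplied by H,
    then divided by the resulting w component (projective division).
    """
    pts = np.asarray(pts, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    homo = np.hstack([pts, ones])          # N x 3 homogeneous points
    mapped = homo @ H.T                    # N x 3 after the transform
    return mapped[:, :2] / mapped[:, 2:3]  # divide by w

# A homography computed once at startup; for illustration, a pure
# translation placing the second camera's view 640 px to the right.
H = np.array([[1.0, 0.0, 640.0],
              [0.0, 1.0,   0.0],
              [0.0, 0.0,   1.0]])

# Every subsequent frame reuses the same H: a pixel at (10, 20) in
# the second camera lands at (650, 20) on the stitched canvas.
stitched_pt = apply_homography(H, [[10.0, 20.0]])
```

A general homography also encodes rotation and perspective, which is why a single matrix is enough to align static cameras; as the post notes, it only has to be re-estimated if a camera physically moves.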
The result is a smooth, stitched panorama that combines multiple DepthAI camera feeds into one continuous stream.
Intelligent Detection Across the Stitched Image
Once the panoramic image is built, the package runs a YOLOv6-nano model on the stitched stream for real-time object detection.
Large stitched images can easily exceed the input resolution a detection model handles efficiently in a single pass. To solve this, the pipeline automatically tiles the image into smaller overlapping sections, performs inference on each tile independently, and then merges all detections back into the stitched coordinate space. The browser-based visualizer displays these detections live, overlaid on the combined camera view, with the ability to trigger recalibration instantly.
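The bookkeeping behind tile-and-merge can be sketched in a few lines. The helper names, tile sizes, and overlap values below are illustrative assumptions, not the package's actual API: tiles are laid out with a fixed overlap (so objects on a seam appear whole in at least one tile), and each tile-local detection box is shifted by its tile's origin to land in stitched-image coordinates.

```python
def tile_origins(size, tile, overlap):
    """1-D tile start positions covering `size` pixels with
    `tile`-wide windows overlapping by `overlap` pixels.
    The last tile is clamped to end exactly at the border."""
    step = tile - overlap
    origins = list(range(0, max(size - tile, 0) + 1, step))
    if origins[-1] + tile < size:
        origins.append(size - tile)
    return origins

def tiles(width, height, tile_w, tile_h, overlap):
    """(x, y) origins of overlapping tiles covering the image."""
    return [(x, y)
            for y in tile_origins(height, tile_h, overlap)
            for x in tile_origins(width, tile_w, overlap)]

def to_global(box, origin):
    """Shift a tile-local detection box (x1, y1, x2, y2) into
    stitched-image coordinates by adding the tile's origin."""
    x1, y1, x2, y2 = box
    ox, oy = origin
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)
```

Because the tiles overlap, the same object can be detected twice near a seam; after mapping boxes to global coordinates, a deduplication step such as non-maximum suppression merges those duplicates.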
This design balances efficiency, accuracy, and usability, allowing multiple cameras to act as one intelligent vision system.
Practical Notes and Limitations
To get the best performance, keep in mind a few practical guidelines:
Cameras should be vertically aligned and have good field-of-view overlap to ensure reliable stitching.
The image order matters: specify which camera feed appears first, second, and so on, from left to right.
For consistent results, use identical, well-calibrated cameras. Small differences in lens distortion or exposure can lead to noticeable misalignment in the stitched output.
These constraints ensure that the homography remains stable and that the final stitched image is geometrically and visually coherent.
How to Run
Get this example running in your setup by following the README instructions on the GitHub example page.