Hi SadiaC
Yes, that's entirely possible, though the stacking itself is done on the host rather than in depthai.
You would basically retrieve the two frames, stack them together on the host using OpenCV, and then run detection on the combined image.
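A rough sketch of the stacking step, assuming both frames already arrive on the host as arrays of the same height and dtype. OpenCV images in Python are NumPy arrays, so this uses `np.hstack`; `cv2.hconcat([left, right])` should give the same result. The helper name `stack_side_by_side` and the dummy frames are just for illustration.

```python
import numpy as np

def stack_side_by_side(left, right):
    """Place two frames next to each other as one wider image."""
    # Heights, channel layout and dtype must match for a horizontal stack.
    if left.shape[0] != right.shape[0] or left.shape[2:] != right.shape[2:]:
        raise ValueError("frames must share height and channel count")
    if left.dtype != right.dtype:
        raise ValueError("frames must share dtype")
    return np.hstack((left, right))

# Dummy frames standing in for the two camera images (assumed 400x300 BGR).
left = np.zeros((300, 400, 3), dtype=np.uint8)
right = np.full((300, 400, 3), 255, dtype=np.uint8)

combined = stack_side_by_side(left, right)
print(combined.shape)  # (300, 800, 3)
```

The combined array is what you would then pass to your detector on the host.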
If you wish to speed up the process, you could instead run the detection on the device for each frame separately and then stitch the detections together on the host. The drawback is that the seam between the two frames would not get reliable detections: an object centered between the two frames would be split across them.
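For the on-device variant, the host-side stitching is mostly a coordinate shift. Here is a minimal sketch, assuming each frame's detections come back as boxes normalized to 0..1 and that both frames have the same size. `Box` and `to_stacked_coords` are hypothetical names for this example, not depthai types.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Hypothetical detection record; coordinates are normalized to 0..1."""
    label: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def to_stacked_coords(boxes, width, height, x_offset=0):
    """Convert normalized per-frame boxes to pixel boxes in the stacked image."""
    out = []
    for b in boxes:
        out.append((
            b.label,
            int(round(b.xmin * width)) + x_offset,
            int(round(b.ymin * height)),
            int(round(b.xmax * width)) + x_offset,
            int(round(b.ymax * height)),
        ))
    return out

W, H = 400, 300  # assumed size of each individual frame
left_boxes = [Box(0, 0.25, 0.5, 0.5, 1.0)]
right_boxes = [Box(1, 0.0, 0.0, 0.25, 0.5)]

# Right-frame boxes are shifted by the width of the left frame.
merged = to_stacked_coords(left_boxes, W, H) + to_stacked_coords(right_boxes, W, H, x_offset=W)
print(merged)  # [(0, 100, 150, 200, 300), (1, 400, 0, 500, 150)]
```

An object straddling the seam would show up as two partial boxes (or not at all), which is the limitation mentioned above.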
For the syncing part: https://docs.luxonis.com/hardware/platform/deploy/frame-sync/
Thanks,
Jaka