Hi Aingu7ae,
For AI-only evaluation, it might be easiest to use OpenVINO's Inference Engine (example here).
You could also stream video to the device and run inference on the frames using depthai (example here). Note that the video needs to be decoded on the host (I think OpenCV's VideoCapture already handles that). Thoughts?
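To make the second option a bit more concrete, here is a rough sketch of what I have in mind. It assumes the depthai 2.x (Gen2) Python API, a compiled model blob, and a 300x300 BGR network input; the stream names, `run_video_inference`, and the paths you pass in are just placeholders I made up for illustration, so adjust them to your model and depthai version.

```python
import numpy as np


def to_planar(frame: np.ndarray) -> np.ndarray:
    """Flatten an interleaved HxWx3 frame (OpenCV layout) into planar CHW order."""
    return np.ascontiguousarray(frame.transpose(2, 0, 1)).reshape(-1)


def run_video_inference(video_path, blob_path, size=(300, 300)):
    """Decode a video on the host, send each frame to the device, yield NN results.

    Hypothetical helper: `size` is an assumed network input size (width, height).
    """
    # Imported lazily so to_planar() can be used without these packages or a device.
    import cv2
    import depthai as dai

    pipeline = dai.Pipeline()

    # Host -> device stream carrying the decoded frames.
    xin = pipeline.create(dai.node.XLinkIn)
    xin.setStreamName("frames_in")

    # Neural network node fed directly from the host stream.
    nn = pipeline.create(dai.node.NeuralNetwork)
    nn.setBlobPath(blob_path)
    xin.out.link(nn.input)

    # Device -> host stream carrying the raw inference results.
    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("nn_out")
    nn.out.link(xout.input)

    with dai.Device(pipeline) as device:
        q_in = device.getInputQueue("frames_in")
        q_out = device.getOutputQueue("nn_out", maxSize=4, blocking=True)

        cap = cv2.VideoCapture(video_path)  # decoding happens here, on the host
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break  # end of video or read error
                resized = cv2.resize(frame, size)

                img = dai.ImgFrame()
                img.setType(dai.ImgFrame.Type.BGR888p)  # planar BGR
                img.setWidth(size[0])
                img.setHeight(size[1])
                img.setData(to_planar(resized))
                q_in.send(img)

                yield q_out.get()  # dai.NNData; decoding depends on your model
        finally:
            cap.release()
```

You would then loop over `run_video_inference(...)` and decode each result according to your network's output layout. I have not run this against your setup, so treat it as a starting point rather than a drop-in script.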
Thanks, Erik