Hi, I'm working on a problem that requires bidirectional comms between a host (an Android-based application) and a Luxonis camera over PoE. The application first needs to identify a user and then run a pose-estimation-based CVML application continuously on the image frames. We've been working with the RVC2-based cameras (OAK-D PoE W).

@erik suggested a possible way to do 2-way comms over TCP that would allow model switching in standalone mode. However, I was looking into the OAK-D CM4 PoE and the OAK 4 D cameras and thought of an alternative solution architecture that I'd love to get the community's feedback on:

Hardware Layer:

  • Android Tablet/App to OAK-D CM4 PoE via RJ45 Ethernet.

  • OAK-D CM4 - image processing and Neural Network execution.

Software Architecture:

Android Tablet Side:

  • Android App: The main application running on the tablet

  • HTTP Client: Handles REST API calls for command and control operations

  • WebSocket Client: Manages real-time streaming and status updates

Camera Side:

  • CVML Python Application: Core CVML application that processes requests and controls the camera

  • Flask/FastAPI REST Server: Handles HTTP requests for camera operations

  • WebSocket Server: Manages real-time communication channel

  • DepthAI Library: Library that interfaces directly with the camera and NN hardware

Communication Flows:

  1. REST API (HTTP/JSON):

    • Used for discrete operations like capturing images or changing settings
    • Example endpoints: /camera/capture, /camera/id_object, /camera/id_face
  2. WebSocket Protocol:

    • Used for streaming preview images from the camera to the Android application

    • Provides real-time video stream

    • Maintains a persistent connection for lower latency
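To make the REST half of this concrete, here is a rough sketch of how the camera-side command endpoints could be routed. It uses only the Python standard library so it runs anywhere; in practice Flask or FastAPI would fill the same role. The handler functions (`capture`, `id_object`, `id_face`), their JSON responses, and `start_server` are placeholders I made up for illustration; real handlers would call into the DepthAI pipeline.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical handlers: real ones would trigger the DepthAI pipeline.
def capture():
    return {"status": "ok", "action": "capture"}

def id_object():
    return {"status": "ok", "action": "id_object"}

def id_face():
    return {"status": "ok", "action": "id_face"}

# Endpoint paths taken from the list above.
ROUTES = {
    "/camera/capture": capture,
    "/camera/id_object": id_object,
    "/camera/id_face": id_face,
}

class CameraHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain any request body before replying.
        length = int(self.headers.get("Content-Length", 0))
        if length:
            self.rfile.read(length)
        handler = ROUTES.get(self.path)
        if handler is None:
            self._send(404, {"error": "unknown endpoint"})
        else:
            self._send(200, handler())

    def _send(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging

def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), CameraHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The Android HTTP client would then POST to one of these paths and parse the JSON reply, while the preview stream stays on the separate WebSocket channel.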

Hi @RakshithSingh,
So I think both the OAK-D-PoE-W and the CM4-POE/OAK4 would work here; you'd just be limited to the Script node with the PoE-W (as it's only RVC2, with no RPi host / RVC4).
The example below uses a single TCP connection for 2-way comms, and you could add some logic on top so that a specific message tells the Script node to switch which NN it forwards frames to (similar to how it's done here), from the face recognition NN to the mediapipe NN.
luxonis/depthai-experiments/blob/master/gen2-poe-tcp-streaming/poe-host-config-focus/oak.py#L33
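As a rough illustration of the "logic on top" of a single TCP connection, the sketch below frames every message with a type byte and a length so that frames and commands can share one socket. The wire format, the `MSG_*` constants, the model names, and `handle_command` are assumptions made up for this example and are not taken from the linked code. It is written as plain Python; whether every module used here is available inside the RVC2 Script node would need to be checked.

```python
import socket
import struct

# Hypothetical wire format: 1-byte message type, 4-byte big-endian length, payload.
HEADER = struct.Struct(">BI")
MSG_FRAME = 0         # camera -> host: encoded image frame
MSG_SWITCH_MODEL = 1  # host -> camera: payload is the model name (UTF-8)

def send_msg(sock, msg_type, payload=b""):
    """Send one framed message."""
    sock.sendall(HEADER.pack(msg_type, len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock):
    """Receive one framed message as (msg_type, payload)."""
    msg_type, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return msg_type, recv_exact(sock, length)

def handle_command(active_model, msg_type, payload,
                   known_models=("face_recognition", "mediapipe")):
    """Return the model that should receive frames after this message."""
    if msg_type == MSG_SWITCH_MODEL:
        name = payload.decode("utf-8")
        if name in known_models:
            return name
    return active_model  # ignore unknown messages and unknown model names
```

On the camera side, the loop would call `recv_msg`, pass the result through `handle_command`, and forward frames to whichever NN is currently active; the Android app would send `MSG_SWITCH_MODEL` once the user has been identified.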

So something like this:

I'd break down functionality into multiple Script nodes to reduce complexity inside one.

Thanks, @erik, for your feedback. I will try to follow your approach for deploying in standalone mode with the RVC2-based PoE cameras. Does the solution architecture I described make sense for deploying a similar solution using the OAK 4 or OAK-D CM4 PoE cameras? Also, since these cameras can also be set up to launch Python programs on startup (something like system daemons?), we won't need to flash the cameras, correct?

Hi @RakshithSingh, yep, it makes sense, and it would work in a similar manner on the CM4-POE / OAK4 cameras as well. It just wouldn't require so many Script nodes, and you could run standard Python on Linux (with all the useful libraries).