Zero worries of course. So likely the API would not work in a usable way over BT/WiFi, as the BT/WiFi interface on the ESP32 is not meant for substantial throughput. The whole ESP32 chip is architected more for IoT applications - where most of the information that is sent/received is metadata in nature (so JSON files at most, or more typically MQTT). Probably the highest throughput the OAK-D-IoT can support over WiFi is about 8 Mbps or so - so attempting to run the API over that would likely just never work right.
We don't currently have a high-throughput WiFi DepthAI device, although after learning the hard lesson of the mis-naming here, we have contemplated making one - as we have full TCP/IP support already working natively for DepthAI.
If we did make that, it would likely still be relatively slow - around 150 Mbps or so - which means nearly all streams would have to be compressed.
So the "why" of OAK-D-IoT was actually this:
https://github.com/luxonis/depthai-hardware/issues/9
The idea is more this IoT application, but in this case allowing a visually impaired person to get feedback via audio or some other channel (haptic, for example) about their surroundings - what is where, trajectories, etc. So the idea is that all the processing and API back-and-forth happens between DepthAI and the ESP32, and then just results like "door is at 3 o'clock, 5 meters" or <activate wrist buzz pattern 5> are the outputs from the device.
So as you say - not for interacting with a computer in a high-data-rate way - but rather running totally standalone, generating insights as a result of spatial AI + CV, and then taking action on those insights (by driving an actuator, making noise, or sending some insight out over BT/WiFi).
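To make that concrete, here's a rough sketch of the "insights only" pattern using the standard DepthAI Python API on a host PC (on the OAK-D-IoT the same kind of pipeline runs on-device, but the ESP32 would be on the receiving end of the results instead of a PC). The blob path and model are placeholders; the point is just that only tiny detection/coordinate messages ever leave the camera, never image streams:

```python
import depthai as dai

# Build a pipeline that runs entirely on the device: color camera + stereo depth
# feeding a spatial detection network. Only the detection results come out.
pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(300, 300)
cam.setInterleaved(False)

mono_left = pipeline.create(dai.node.MonoCamera)
mono_right = pipeline.create(dai.node.MonoCamera)
mono_left.setBoardSocket(dai.CameraBoardSocket.LEFT)
mono_right.setBoardSocket(dai.CameraBoardSocket.RIGHT)

stereo = pipeline.create(dai.node.StereoDepth)
stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
mono_left.out.link(stereo.left)
mono_right.out.link(stereo.right)

nn = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
nn.setBlobPath("mobilenet-ssd.blob")  # placeholder model blob
nn.setConfidenceThreshold(0.5)
cam.preview.link(nn.input)
stereo.depth.link(nn.inputDepth)

# Only metadata (label + XYZ in mm) crosses the link - a few bytes per object.
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("detections")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("detections", maxSize=4, blocking=False)
    while True:
        for det in q.get().detections:
            # e.g. turn this into "door is at 3 o'clock, 5 meters",
            # an audio cue, or a haptic buzz pattern
            print(det.label,
                  det.spatialCoordinates.x,
                  det.spatialCoordinates.y,
                  det.spatialCoordinates.z)
```

That per-object result is the kind of payload that fits comfortably in the ESP32's metadata-scale bandwidth (or an MQTT publish), whereas the raw streams behind it never need to leave the device.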
So yes, a better name than "OAK-D-WiFi" would have been "OAK-D-Embedded" or "OAK-D-AIoT" or "OAK-D-IoT", as the whole purpose is a relatively small, completely self-contained spatial AI device.
Thoughts?
Thanks,
Brandon