Around a year ago, I posted a question about using the SDK for human pose estimation with multiple persons. At the time, the answer was that multipose wasn't feasible. Since then, there have been some new developments that give me hope, at least for our modest application. Here are some experiences from revisiting the topic.
- The Pi 5 is significantly faster than the Pi 4 I'd used previously. It may be possible to use the OAK to detect persons and get their distances, while feeding a subset of the same images to a neural net on the Pi 5 and merging the results on the host (yes, of course the timing mismatch will degrade the fusion). A rough sketch of this hybrid setup is in the first code block after this list.
- Ultralytics has YOLO pose estimation models that handle multiple persons. The quality isn't as good as the previous models I tried with DepthAI, but it might be good enough. I experimented with the YOLO11 pose model on the Pi 5, and preprocessing plus inference took between a third and a half of a second of CPU time per frame. Not great, but perhaps my application can live with that coarse sampling. I also tried their YOLOv8 pose model - it should be able to run on the OAK, yes? The model/net input size appears to be 384x640. Update: I downsized the model to 192x320 and got around 9 fps, with almost the same quality of pose estimation, and still with multipose; the export call is roughly the second sketch after this list.
- OpenVINO has a new API (2.0), and I couldn't get things working on the Pi 5; at least it's not obvious how to make the Zoo models work with the new API. The way the Pi 5 is set up, it would be difficult to retrofit an earlier level of OpenVINO support. I did try geaxgx/openvino_movenet_multipose again, and after some fiddling to swap in the 2.0 API calls (the third sketch after this list shows the general call pattern), it ran on my Intel desktop; however it failed on the Pi 5 - no crash, it just silently produced no detections. I have no idea how to debug this. I think the OpenVINO multipose is better than YOLO, if I could get it working.
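
For the first bullet, here's a rough sketch of the hybrid idea: the OAK streams preview frames, and the Pi 5 runs an Ultralytics pose model on every Nth frame. Treat the stream name, preview size, model file, and the every-5th-frame sampling as placeholders I made up for illustration, not a tested setup.

```python
# Rough sketch of the hybrid idea: the OAK streams preview frames, and the
# Pi 5 runs an Ultralytics pose model on every Nth frame. Stream name,
# preview size, model file, and sampling rate are all placeholders.
import depthai as dai
from ultralytics import YOLO

POSE_EVERY_N = 5                      # coarse sampling to keep the Pi 5 CPU usable
pose_model = YOLO("yolo11n-pose.pt")  # any Ultralytics pose checkpoint

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(640, 360)          # size the host-side model will see
cam.setInterleaved(False)
cam.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("preview")
cam.preview.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("preview", maxSize=4, blocking=False)
    frame_idx = 0
    while True:
        frame = q.get().getCvFrame()  # BGR numpy array
        if frame_idx % POSE_EVERY_N == 0:
            results = pose_model(frame, verbose=False)
            keypoints = results[0].keypoints  # one set of keypoints per person
            # ...merge here with the OAK's person detections / spatial data
        frame_idx += 1
```

The merge with the OAK's own person detections would happen where the comment indicates; that's the part I haven't worked out yet.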
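
The downsizing in the second bullet was along these lines. The checkpoint name is the stock Ultralytics one and the test image is a placeholder; the ONNX export is only the first step if you then convert to a blob for the OAK, so take this as an approximation of my setup rather than an exact recipe.

```python
# Sketch of downsizing an Ultralytics pose model to 192x320 (height x width)
# and timing preprocess + inference on the CPU. "test.jpg" is a placeholder.
import time
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")

# Rectangular export; imgsz is (height, width). The resulting ONNX file can
# then go through the usual OpenVINO/blob conversion for the OAK.
model.export(format="onnx", imgsz=(192, 320))

# Quick timing check of the PyTorch model at the reduced input size.
img = cv2.imread("test.jpg")
t0 = time.perf_counter()
results = model(img, imgsz=(192, 320), verbose=False)
print(f"preprocess + inference: {time.perf_counter() - t0:.3f} s")
print("persons detected:", results[0].keypoints.xy.shape[0])
```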
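
And for the OpenVINO point, this is the general shape of the API 2.0 calls I ended up swapping in. The model path, device name, and input shape below are placeholders, not the actual movenet_multipose files.

```python
# Minimal OpenVINO API 2.0 inference sketch - just the general call pattern.
# Model path, device name, and input shape are placeholders.
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")                 # IR model (.xml + .bin)
compiled = core.compile_model(model, device_name="CPU")

# Dummy input matching whatever shape the compiled model expects.
input_shape = list(compiled.input(0).shape)          # e.g. [1, 3, 256, 320]
dummy = np.zeros(input_shape, dtype=np.float32)

results = compiled([dummy])                          # dict keyed by output ports
output = results[compiled.output(0)]
print(output.shape)
```

This pattern runs fine on my Intel desktop; on the Pi 5 the same calls return without error but the outputs never contain any detections, which is the silent failure I mentioned above.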