Hello,
I'm using an OAK-D Lite connected to an Odroid C4 to send UDP commands to a video player.
Basically I want the user to choose between two videos: Video A starts when the user makes pose A, and Video B starts when the user makes a different pose, pose B.
I'm setting everything up to work with a user standing 2.5 to 3 meters from the camera, and to ensure that pose estimation runs only on the person within this range.
In my situation there will be more than one person in the room, and two or more people could be behind and close to the target user (the person spotted within the distance range).
At the moment my code does the following:
- runs the MobileNet SSD model on a 300x300 camera preview to detect people in the room
- when a person stands between 2.5 and 3 meters away, draws the bounding box and displays spatial information (using the MobileNetSpatialDetectionNetwork node, just like the demo in the DepthAI Python API samples: https://docs.luxonis.com/projects/api/en/latest/samples/SpatialDetection/spatial_mobilenet/)
- auto-crops the frame based on the bounding box of the spotted person (like the code example at https://docs.luxonis.com/projects/api/en/latest/samples/StereoDepth/depth_crop_control/) and saves the frame to a JPG file
- opens the JPG file containing the cropped frame, runs the MediaPipe pose landmarker task model on it, and displays the annotated image result (like the MediaPipe Google demo: https://github.com/googlesamples/mediapipe/blob/main/examples/pose_landmarker/python/%5BMediaPipe_Python_Tasks%5D_Pose_Landmarker.ipynb)
- calculates the angle between each arm (right and left) and the vertical to detect a specific pose (based on geaxgx's semaphore alphabet example: https://github.com/geaxgx/depthai_blazepose/tree/main/examples/semaphore_alphabet)
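For reference, the angle computation in the last step is roughly this (a sketch; the function name and the normalized (x, y) landmark tuples are my own naming, not from the MediaPipe or geaxgx code):

```python
import math

def arm_vertical_angle(shoulder, wrist):
    """Angle in degrees between the shoulder->wrist segment and the
    downward vertical, using normalized (x, y) landmark coordinates.
    Image y grows downward, so an arm hanging straight down gives 0."""
    dx = wrist[0] - shoulder[0]
    dy = wrist[1] - shoulder[1]
    # atan2 of the horizontal offset against the downward vertical component
    return math.degrees(math.atan2(dx, dy))

# Arm hanging straight down -> about 0 degrees
print(arm_vertical_angle((0.5, 0.3), (0.5, 0.6)))
# Arm raised sideways to shoulder height -> about 90 degrees
print(arm_vertical_angle((0.5, 0.3), (0.8, 0.3)))
```

I then compare the left/right angles against the semaphore positions with some tolerance.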
This works even when there are two or more people in the cropped frame (as long as the target user is in front of the others), but at times, with that many people, the pose detector doesn't perform well.
I know this isn't a good solution, because I'm not using DepthAI to run the MediaPipe pose model and I'm saving and reading images in a loop.
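The in-memory version I have in mind would look roughly like this (a sketch under my assumptions: the preview frame is the HxWx3 numpy array that `getCvFrame()` returns, and the detection bbox has already been converted to pixel coordinates):

```python
import numpy as np

def crop_person(frame, bbox):
    """Crop the detected person from the frame without touching the disk.
    frame: HxWx3 uint8 numpy array; bbox: (xmin, ymin, xmax, ymax) in pixels."""
    xmin, ymin, xmax, ymax = bbox
    h, w = frame.shape[:2]
    # Clamp to the frame so a box touching the border doesn't produce
    # an empty or out-of-range slice
    xmin, xmax = max(0, xmin), min(w, xmax)
    ymin, ymax = max(0, ymin), min(h, ymax)
    return frame[ymin:ymax, xmin:xmax]
```

The cropped array could then be wrapped in an mp.Image and fed straight to the pose landmarker, skipping the JPG write/read entirely.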
If needed, I can mark the specific spot where the target user should stand (by drawing footprints on the floor) and ask other people to stay behind it.
- What can I do to improve my application (i.e., make it run faster and more reliably)?
- Is there a way to recognize human gestures without calculating angles?
- Can I run the MediaPipe pose detector model directly on the OAK device? If so, how?
- What would you suggest to improve the project overall? It works now, but I'd like to refine it and maybe one day add many gestures, not based on angles, to do more than switch between two videos.
Thank you very much for your support and help.