Hello,
I'm using an OAK-D Lite connected to an Odroid C4 to send UDP commands to a video player.
Basically I want the user to choose between two videos: Video A starts when the user makes pose A, and Video B starts when the user makes a different pose, pose B.
I'm setting everything up to work with a user standing 2.5 to 3 meters from the camera, and to ensure that pose estimation runs only on the person within this range.
In my situation there will be more than one person in the room, and two or more people could be behind and close to the target user (the person spotted within the distance range).
At the moment my code does the following:
- runs the MobileNet SSD model on a 300x300 camera preview to detect people in the room
- when a person stands between 2.5 and 3 meters away, draws the bounding box and displays spatial information (using the MobileNetSpatialDetectionNetwork node, just like the demo in the DepthAI Python API samples: https://docs.luxonis.com/projects/api/en/latest/samples/SpatialDetection/spatial_mobilenet/)
- auto-crops the frame based on the bounding box of the spotted person (like the code example at https://docs.luxonis.com/projects/api/en/latest/samples/StereoDepth/depth_crop_control/) and saves the frame to a JPG file
- opens the JPG file containing the cropped frame, runs the MediaPipe pose landmarker task model on it, and displays the annotated image result (like the MediaPipe Google demo: https://github.com/googlesamples/mediapipe/blob/main/examples/pose_landmarker/python/%5BMediaPipe_Python_Tasks%5D_Pose_Landmarker.ipynb)
- calculates the angle between each arm (right and left) and the vertical to detect a specific pose (based on geaxgx's semaphore alphabet example: https://github.com/geaxgx/depthai_blazepose/tree/main/examples/semaphore_alphabet)
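For reference, the angle computation in the last step is roughly this (a sketch; the function name and the normalized (x, y) landmark tuples are my own naming, not from the MediaPipe or geaxgx code):

```python
import math

def arm_vertical_angle(shoulder, wrist):
    """Angle in degrees between the shoulder->wrist segment and the
    downward vertical, using normalized (x, y) landmark coordinates.
    Image y grows downward, so an arm hanging straight down gives 0."""
    dx = wrist[0] - shoulder[0]
    dy = wrist[1] - shoulder[1]
    # atan2 of the horizontal offset against the downward vertical component
    return math.degrees(math.atan2(dx, dy))

# Arm hanging straight down -> about 0 degrees
print(arm_vertical_angle((0.5, 0.3), (0.5, 0.6)))
# Arm raised sideways to shoulder height -> about 90 degrees
print(arm_vertical_angle((0.5, 0.3), (0.8, 0.3)))
```

I then compare the left/right angles against the semaphore positions with some tolerance.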
This works even when there are two or more people in the cropped frame (as long as the target user is in front of the others), but at times, with that many people, the pose detector doesn't perform well.
I know this isn't a good solution, because I'm not using DepthAI to run the MediaPipe pose model and I'm saving and reading images in a loop.
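The in-memory version I have in mind would look roughly like this (a sketch under my assumptions: the preview frame is the HxWx3 numpy array that `getCvFrame()` returns, and the detection bbox has already been converted to pixel coordinates):

```python
import numpy as np

def crop_person(frame, bbox):
    """Crop the detected person from the frame without touching the disk.
    frame: HxWx3 uint8 numpy array; bbox: (xmin, ymin, xmax, ymax) in pixels."""
    xmin, ymin, xmax, ymax = bbox
    h, w = frame.shape[:2]
    # Clamp to the frame so a box touching the border doesn't produce
    # an empty or out-of-range slice
    xmin, xmax = max(0, xmin), min(w, xmax)
    ymin, ymax = max(0, ymin), min(h, ymax)
    return frame[ymin:ymax, xmin:xmax]
```

The cropped array could then be wrapped in an mp.Image and fed straight to the pose landmarker, skipping the JPG write/read entirely.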
If needed, I can mark the specific spot where the target user should stand (by drawing footprints on the floor) and ask other people to stay behind it.
- What can I do to improve my application (i.e., make it run faster and more reliably)?
- Is there a way to recognize human gestures without calculating angles?
- Can I run the MediaPipe pose detector model directly on the OAK device? If so, how?
- What would you suggest to improve the project overall? It works now, but I'd like to refine it and maybe one day add many gestures, not based on angles, to do more than switch between two videos.
Thank you very much for your support and help.