
DepthAI SDK Human Pose Estimation

We are excited to announce a new code example showcasing the incredible capabilities of the DepthAI SDK library. With just 7 lines of source code, you can now accurately estimate and track human poses in real time!

Our latest code example uses machine learning techniques to identify and track key points on the human body, enabling you to build applications and systems that can detect, analyze, and respond to human movement.

from depthai_sdk import OakCamera

with OakCamera() as oak:
    color = oak.create_camera('color')
    human_pose_nn = oak.create_nn('human-pose-estimation-0001', color)
    oak.visualize(human_pose_nn)
    oak.start(blocking=True)

Check out our demo video on YouTube, which showcases the power of this technology in action. You'll see how easy it is to get started with Human Pose estimation, and how quickly you can integrate this technology into your own projects.

To get started, first install the depthai-sdk:

pip install depthai-sdk -U

Then run the code shared above!
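If you want to confirm which version of the SDK was installed before running the example, a quick check (assuming pip is on your PATH) is:

pip show depthai-sdk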

Comments (18)


2 months later

When running the script above on a Raspberry Pi (just as is), the frame rate I am getting is very slow (1-2 FPS). However, when I run rgb_mobilenet.py I get 35 FPS.

I am using an OAK-1 and a Raspberry Pi 3B+.

Any idea why that is?


    Hi nsdo,
    I'm fairly sure it's a problem with the RPi - it's not fast enough to visualize the frames. The NN (human-pose-estimation-0001) should run at about 8 FPS on RVC2.

    Perhaps try visualizing the passthrough instead of the whole frames, which should reduce the computational load on the RPi:

    from depthai_sdk import OakCamera
    
    with OakCamera() as oak:
        color = oak.create_camera('color')
        human_pose_nn = oak.create_nn('human-pose-estimation-0001', color)
        oak.visualize(human_pose_nn.out.passthrough, fps=True)
        oak.start(blocking=True)

    4 months later

    Hi, I tried the 7-line demo suggested, and it works (even on my RPi4) using an OAK-D. But as said above, it lags considerably with respect to real time (maybe two or more seconds). I'm wondering how I can get just the passthrough, that is, only the landmarks and not the image (whole-frame) data.

    Looking at https://github.com/luxonis/depthai-experiments/blob/master/gen2-human-pose/main.py we see that as the queue is processed, each item has a list of up to 18 landmarks (?) containing the (x, y) of each landmark. It's unclear to me how this works. I'd also like to obtain depth, as well as the (x, y) and a label, for each landmark.

    However, looking deeper, there is code for heatmaps, and then

    probMap = cv2.resize(probMap, nm.inputSize)  # (456, 256)
    keypoints = getKeypoints(probMap, 0.3)
    new_keypoints_list = np.vstack([new_keypoints_list, *keypoints])

    so it looks like maybe it is not as simple, or as purely bandwidth-limited, as I imagined (otherwise, why the cv2 call on the host?).

    Is this something that depthai_sdk can make easy, or should I just try working with gen2?
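    A possible SDK-side route (a minimal sketch, not tested here, and assuming your depthai-sdk version provides oak.callback): register a host callback on the NN component instead of calling oak.visualize, so only the decoded results are handled on the host rather than rendered frames. The exact attributes the packet exposes for this model are not shown in this thread, so the print below is just a way to discover them.

    from depthai_sdk import OakCamera

    def on_pose(packet):
        # Inspect what the SDK decoder delivers for this model; the attribute
        # holding the keypoints is not documented in this thread.
        print(type(packet), getattr(packet, '__dict__', {}).keys())

    with OakCamera() as oak:
        color = oak.create_camera('color')
        human_pose_nn = oak.create_nn('human-pose-estimation-0001', color)
        # Callback instead of oak.visualize(), so no frames are drawn on the host
        oak.callback(human_pose_nn, callback=on_pose)
        oak.start(blocking=True)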

    • erik replied to this.

      Hi @erik, thanks for the pointers.

      By the way, when I run human-pose-estimation-0001 using depthai_demo.py, it runs at 4.4 FPS or better, and the depth disparity color stream is at 22 FPS. In principle, extracting the data I need is feasible; it's a matter of coaxing it out of the software.
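      (For reference, the depthai_demo.py run above can be reproduced with something like the command below; the -cnn flag is the demo's model selector, but check --help on your version.)

      python3 depthai_demo.py -cnn human-pose-estimation-0001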

      Hi @erik: following up on your suggestion to use the API instead of just the SDK, I tried running gen2-human-pose but ran into a dependency conflict when installing requirements.txt. Specifically, it says depthai-sdk 1.0.1 (which seems to be what these older gen2 demos want) specifies blobconverter 1.0.0. However, another line in requirements.txt says blobconverter should be 1.3.0.

      I'm wondering what to tinker with: the requirements.txt line for blobconverter 1.3.0, or the implicit dependencies of depthai-sdk 1.0.1?

      I tried changing it from 1.3.0 to 1.0.0 in requirements.txt, and the demo does run for a few seconds, but then it segfaults in depthai.cpython-39-aarch64-linux-gnu.so.

      Is there a better way of using the API than the way these older gen2 examples are coded?


        Hi TedHerman,
        Could you share the full error? It seems like it's not related to blobconverter (it shouldn't be, and the version shouldn't matter either).
        That particular demo was developed with the "old SDK", which is very out of date now, so I would rather look at geaxgx's demos (linked above), as they are developed with the API and are better (faster/more accurate) as well.

        Hi @erik I'll work on recreating the problem and saving the entire error message. I looked at the repo examples you reference, and they appear to be single person/pose. I would hope to have a multipose model, like https://github.com/geaxgx/openvino_movenet_multipose (which does not have depthai in the name). That's why I went back to gen2. But yes, faster/more accurate is preferable!

        Hello @erik,

        I was unable to recreate the exact error on gen2-human-pose, but instead got two other errors; see below. The problem is thus nondeterministic, perhaps some version problem that manifests as an overrun or a bad pointer. I only got this on the RPi; I briefly tried it on my desktop system and didn't encounter the problem. I doubt it's worth your time to investigate. My time would be better spent adapting depthai_blazepose (which worked brilliantly on the RPi) to the multipose model. I'll likely open another thread if I get stuck on that.

        Error 1:

        pi@raspberrypi:/media/pi/lux/depthai/depthai-experiments/gen2-human-pose $ python main.py -cam
        Available devices:
        [0] 14442C10B16CC9D200 [X_LINK_UNBOOTED]
        Starting pipeline...
        Stack trace (most recent call last) in thread 3546:
        #19 Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in
        #18 Object "/lib/aarch64-linux-gnu/libc.so.6", at 0x7fa6ee1c1b, in
        #17 Object "/lib/aarch64-linux-gnu/libpthread.so.0", at 0x7fa70c0647, in
        #16 Object "python", at 0x620baf, in
        #15 Object "python", at 0x620f3f, in
        #14 Object "python", at 0x4c6b4f, in
        #13 Object "python", at 0x4b1a47, in _PyFunction_Vectorcall
        #12 Object "python", at 0x498217, in _PyEval_EvalFrameDefault
        #11 Object "python", at 0x4b1a47, in _PyFunction_Vectorcall
        #10 Object "python", at 0x498217, in _PyEval_EvalFrameDefault
        #9 Object "python", at 0x4b1a47, in _PyFunction_Vectorcall
        #8 Object "python", at 0x4998b3, in _PyEval_EvalFrameDefault
        #7 Object "python", at 0x4b1a47, in _PyFunction_Vectorcall
        #6 Object "python", at 0x49c257, in _PyEval_EvalFrameDefault
        #5 Object "python", at 0x4c6cc7, in
        #4 Object "python", at 0x4a52ff, in _PyObject_MakeTpCall
        #3 Object "python", at 0x4cac53, in
        #2 Object "/home/pi/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f94d80b43, in
        #1 Object "/home/pi/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f94e5760b, in
        #0 Object "/home/pi/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f950ff60c, in
        Bus error (Invalid address alignment [0xe9])
        Bus error

        Error 2:

        pi@raspberrypi:/media/pi/lux/depthai/depthai-experiments/gen2-human-pose $ python main.py -cam
        Available devices:
        [0] 14442C10B16CC9D200 [X_LINK_UNBOOTED]
        Starting pipeline...
        Traceback (most recent call last):
          File "/media/pi/lux/depthai/depthai-experiments/gen2-human-pose/main.py", line 147, in <module>
            show(frame)
          File "/media/pi/lux/depthai/depthai-experiments/gen2-human-pose/main.py", line 121, in show
            A = np.int32(keypoints_list[index.astype(int), 1])
        IndexError: index 15 is out of bounds for axis 0 with size 15
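        The IndexError above just means a keypoint-pair index of 15 is being used against a keypoints_list with only 15 rows (valid indices 0-14). A minimal, self-contained sketch of guarding that lookup, with the variable names taken from the traceback and the shapes assumed:

        import numpy as np

        keypoints_list = np.zeros((15, 3))   # 15 detected keypoints, as in the error
        index = np.array([15.0, 2.0])        # a pair index that is out of range

        idx = index.astype(int)
        if np.all(idx < keypoints_list.shape[0]):
            A = np.int32(keypoints_list[idx, 1])
        else:
            print("skipping keypoint pair with out-of-range index:", idx)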

        20 days later

        erik

        Per the documentation at https://github.com/geaxgx/depthai_blazepose, the lite model runs at 22 FPS with xyz enabled. With that limitation in mind, two questions for you:

        1. If I set up another, parallel pipeline that does nothing but forward frames to the host - running side by side with what's there currently, with no NN processing on that lane - would I be able to get a higher frame rate to the host from the new pipeline?
        2. Would the additional pipeline have a negative effect on the current NN and reduce its frame rate?

        I think you may be able to see where I am trying to go with this. If we are limited to 22 FPS, I am wondering if I can use the landmarks to estimate where things are located on the raw source. I would be thrilled if I could get 40 FPS on the raw pipeline with the landmarks running at 22-ish, leaving just one frame I would have to estimate on (assuming I can correlate the images).

        Is that possible?

        --TY


          Hi Finpush,

          1. You can only run one pipeline, so you'd need to implement this "frame streaming to host" inside the blazepose demo. Using queue/pool sizes and blocking behaviours (see the DepthAI documentation on queues) you can make it stream frames at 30 FPS while the NN runs at 20 FPS; a rough sketch follows after this list.
          2. No, the difference would be negligible.
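          A rough sketch of that idea with the plain depthai API (the blob path and preview size are assumptions, and the actual blazepose demo is structured differently): the camera preview is linked both to an XLinkOut for full-rate frames and to the NN, and the NN input is made non-blocking with a small queue so a slow network never stalls the frame stream.

          import depthai as dai

          pipeline = dai.Pipeline()

          cam = pipeline.create(dai.node.ColorCamera)
          cam.setPreviewSize(456, 256)   # assumed NN input size
          cam.setFps(30)

          # Branch 1: stream preview frames straight to the host at camera rate
          xout_frames = pipeline.create(dai.node.XLinkOut)
          xout_frames.setStreamName("frames")
          cam.preview.link(xout_frames.input)

          # Branch 2: the same frames go to the NN, but through a non-blocking,
          # size-1 input queue so the frame branch is never stalled
          nn = pipeline.create(dai.node.NeuralNetwork)
          nn.setBlobPath("human-pose-estimation-0001.blob")  # path is an assumption
          nn.input.setBlocking(False)
          nn.input.setQueueSize(1)
          cam.preview.link(nn.input)

          xout_nn = pipeline.create(dai.node.XLinkOut)
          xout_nn.setStreamName("nn")
          nn.out.link(xout_nn.input)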

          Thanks, Erik

            erik

            If I understand you correctly, it sounds like I should have a blocking queue before the NN and forward the frame to the host if the NN is busy processing. I assume the logic for this will have to be in the script unless there is a native mechanism to do this. Rather than pepper you with questions, do you have an example you could point me to? Either way, I appreciate your response.

            Hi @Finpush
            See lines 30-37, and try removing the pool-size increase to see the difference.

            Hello @erik

            I need to use multipose, so blazepose is not an option (by the way, I could not get blazepose's edge mode to work, though host mode works fine). Reading up on geaxgx/openvino_movenet_multipose, it seems there isn't much hope of that multipose model working on OAKs because of an fp32/fp16 issue.

            Hence, I followed your suggestion and got gen2 human pose working using the current API. This does work; however, it seems rather inelegant to copy its landmark/draw code. Instead, my code imports two functions from depthai/resources/nn/human-pose-estimation-0001/handler.py, namely decode() and draw() -- both require a mock nnManager object. All of this works, but the setup is brittle, and in fact there are two different handler.py versions for human-pose-estimation-0001, one of them in the SDK. Was there some other way you had in mind?

            Hi @TedHerman,
            I think you'd be much better off using geaxgx's demos (blazepose/movenet) instead of the human-pose-estimation model supported in the SDK, both for accuracy (a better model) and speed (not running detection + pose estimation on every frame).

              erik Agree completely that geaxgx's demos run faster! Unfortunately, the only geaxgx demo with multipose is the one that does not run on the OAK. Sure, one could record video and then run movenet_multipose later, on the desktop, against the recorded video. However, I'm under a restriction that we can't record video (it's an IRB privacy thing). So I'm stuck with the model supported by the SDK.

              4 months later

              Hi, can anyone advise me, please?

              I have a Pi 3B running this DepthAI SDK human pose estimation example:

              from depthai_sdk import OakCamera

              with OakCamera() as oak:
                  color = oak.create_camera('color')
                  human_pose_nn = oak.create_nn('human-pose-estimation-0001', color)
                  oak.visualize(human_pose_nn)
                  oak.start(blocking=True)

              My question is: my Pi freezes and it runs very slowly. Any idea how I can make real-time processing faster?

              Hi @MackyleNaidoo
              The human pose estimation models are known to take a toll on the hardware since they require additional decoding and data handling in order to display results.
              I assume the RPi is incapable of decoding the results fast enough.

              Here is the handler function: luxonis/depthai/blob/main/depthai_sdk/src/depthai_sdk/nn_models/human-pose-estimation-0001/handler.py

              To confirm, try running a different model to see if the FPS improves. If that is the case, your host is just not powerful enough.
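              For example, swapping only the model name in the same 7-line pattern and watching the FPS overlay makes the comparison quick (a sketch; 'mobilenet-ssd' is assumed to be available in the SDK model zoo, as the rgb_mobilenet numbers earlier in this thread suggest):

              from depthai_sdk import OakCamera

              with OakCamera() as oak:
                  color = oak.create_camera('color')
                  # Same pattern as above, but a model with much lighter host-side decoding
                  nn = oak.create_nn('mobilenet-ssd', color)
                  oak.visualize(nn, fps=True)
                  oak.start(blocking=True)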

              Thanks
              Jaka