TedHerman

  • 7 Jan
  • Joined Mar 13, 2021
  • 1 best answer
  • Thanks for the pointer @jakaskerl ! It wasn't clear to me from the script example whether or not the blob could be changed (reloaded). I did some experiments to learn more about how things work with DepthAI in my app.

    1. I replaced "with dai.Device(pipeline) as device:" by two lines, "device = dai.Device(pipeline)" followed by "device.startPipeline()", which does work.

    2. Then later, I tried a few other things:

      print("is pipeline running =", device.isPipelineRunning())
      device.close()
      print("closed =", device.isClosed())

      This sequence also works as expected.

    3. Then I tried print(pipeline.serializeToJson()), which generates huge output, but it was interesting to see all the parameters that might be useful in future debugging/design.

    4. Calling pipeline.serializeToJson() after device.close() generates a segfault, which I guess means the pipeline is destroyed once the device is closed.

    5. Based on these experiments, a different plan (which I will test at some point) is to close the device, then rebuild a new pipeline with a different blob, and start that. If this works, there will likely be a couple second delay for the time to upload the new blob.
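
    A rough sketch of what that blob-swap plan might look like (the blob paths, preview size, and stream name are placeholders, not actual values from my app):

      import depthai as dai

      def build_pipeline(blob_path):
          """Minimal pipeline: camera preview feeding a single NeuralNetwork node."""
          pipeline = dai.Pipeline()

          cam = pipeline.create(dai.node.ColorCamera)
          cam.setPreviewSize(320, 320)   # must match the blob's expected input
          cam.setInterleaved(False)

          nn = pipeline.create(dai.node.NeuralNetwork)
          nn.setBlobPath(blob_path)
          cam.preview.link(nn.input)

          xout = pipeline.create(dai.node.XLinkOut)
          xout.setStreamName("nn")
          nn.out.link(xout.input)
          return pipeline

      # Run with blob A for a while, then tear down and restart with blob B.
      device = dai.Device(build_pipeline("modelA.blob"))
      # ... consume device.getOutputQueue("nn") for a while ...
      device.close()
      device = dai.Device(build_pipeline("modelB.blob"))   # expect a couple-second pause for the upload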


    • How can I switch between two different models? That is, an application runs for some time using blob A, then at some point, it decides to switch to blob B. The crudest method is to reboot. Short of that, there could be some Linux command to disconnect and reconnect the USB hosting the OAK? Or within DepthAI perhaps use unlink() and remove() to depopulate the pipeline, and then repopulate and use setBlobPath() with a different model?

      I feel like this has already been answered in another thread, but I couldn't find it.

      • jakaskerl GPT didn't really help solve my openvino multipose problem. I did get the Yolo multipose running on the OAK. Some observations from that:

        1. With the Ultralytics yolo tool, I scaled the NN to 320, but was unable to use other tools to convert to a blob (due to version/distro conflicts). Thanks Luxonis for your online converter - that did the trick.
        2. As you suggested, I had to do all the decoding myself: the blob is not loaded as Yolo, nor do I get the benefit of automated post-processing, so my code has to filter through around 2K bounding boxes (including some hand-crafted NMS-like stuff). Decoding keypoints was easy.
        3. My first version just read an mp4, then fed frames via XLinkIn to the NN. This had the advantage of making it easy to scale each frame to a square 320x320 shape, which is what the NN requires. After decoding, it's simple to do reverse scaling and draw on the frame for display. The Yolo model is still not as good as the geaxgx openvino multipose, though perhaps good enough (fingers crossed).
        4. My second version (still a work in progress) got images from the OAK, sent them to the NN, then read both NN output and preview output for decoding, drawing, and display. Here there are some design choices.
        5. One way is to setPreviewSize to the square shape for the NN. This works pretty well, and the ideas from the first version carry over. The downside is that the OAK is using about half its sensor area for pose estimation: the display window is square.
        6. Another way is to setPreviewSize to something like the aspect ratio of the ISP (wide view) and wire the output to ImageManip, which changes the shape to square. This did work; the NN was estimating poses over the whole rectangular display. But rescaling bounding boxes and keypoints is a problem! We can't see what ImageManip is doing. Ultimately I did get something working, but I don't really understand why it works (a wiring sketch follows this list).
        7. The result runs around 10fps, which is decent enough for what we need.
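
        A minimal sketch of the wiring described in item 6, letterboxing the wide preview into the square NN input with ImageManip (using setResizeThumbnail, per the update further down; the preview size, blob path, and stream names are assumptions, not the exact values used):

          import depthai as dai

          pipeline = dai.Pipeline()

          # Wide preview roughly matching the ISP aspect ratio (placeholder size).
          cam = pipeline.create(dai.node.ColorCamera)
          cam.setPreviewSize(1280, 720)
          cam.setInterleaved(False)

          # Letterbox the wide preview into the square shape the NN expects (pads instead of cropping).
          manip = pipeline.create(dai.node.ImageManip)
          manip.initialConfig.setResizeThumbnail(320, 320)
          manip.setMaxOutputFrameSize(320 * 320 * 3)
          cam.preview.link(manip.inputImage)

          nn = pipeline.create(dai.node.NeuralNetwork)
          nn.setBlobPath("yolo_pose_320.blob")   # placeholder blob path
          manip.out.link(nn.input)

          # Send both the NN result and the wide preview to the host for decoding/drawing.
          xout_nn = pipeline.create(dai.node.XLinkOut)
          xout_nn.setStreamName("nn")
          nn.out.link(xout_nn.input)

          xout_rgb = pipeline.create(dai.node.XLinkOut)
          xout_rgb.setStreamName("rgb")
          cam.preview.link(xout_rgb.input)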
      • Around a year ago, I posted a question about using the SDK for human pose estimation (multiple persons). At that time, the answer was that multipose is not feasible. Since then, there are some new developments that give me hope, at least for our modest application. Here are some experiences from revisiting the topic.

        1. The Pi 5 is significantly faster than the Pi 4, which I'd used previously. It may be possible to use the OAK to detect persons and get distances, plus feed a subset of the same images to a neural net on the Pi 5 and somehow merge the results (yes, of course timing will degrade the fusion).
        2. Ultralytics has a Yolo pose estimation, which works for multiple persons. The quality isn't as good as the previous models I tried with DepthAI, but it might be good enough. I experimented with the Yolo 11 model on the Pi 5, and preprocessing plus inference consumed between a third and a half of a second of CPU time. Not great, but perhaps my application can still use this coarse sampling. I also tried their Yolo v8 pose model - it should be able to run on the OAK, yes? The model/net size appears to be 348x640. Update: I downsized the model to 192x320 and got around 9 fps, with almost the same quality of pose estimation, and with multipose (a minimal usage sketch follows this list).
        3. OpenVINO has a new API (2.0), and I couldn't get things working on the Pi 5; at least it's not obvious how to make Zoo models work with the new API. The way the Pi 5 is set up, it would be difficult to retrofit to a previous level of support. I did try geaxgx/openvino_movenet_multipose again, and after some fiddling to get the new 2.0 API calls in place, it ran on my Intel desktop; however, it failed to run on the Pi 5 (no crash, just silently not classifying images). I don't have any idea how to debug this. I think the openvino multipose is better than Yolo, if it could work.
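
        For item 2, a minimal sketch of the kind of Ultralytics pose inference I mean (the model name and frame source are placeholders):

          import cv2
          from ultralytics import YOLO

          # Placeholder checkpoint; any of the Ultralytics *-pose models behave similarly.
          model = YOLO("yolov8n-pose.pt")

          cap = cv2.VideoCapture(0)   # or a video file path
          ok, frame = cap.read()
          if ok:
              # imgsz sets the inference resolution; smaller sizes trade accuracy for speed on the Pi.
              results = model(frame, imgsz=320)
              keypoints = results[0].keypoints   # per-person (x, y) keypoints plus confidences
              print(keypoints.xy.shape)          # e.g. (num_persons, 17, 2)
          cap.release()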
        • jakaskerl replied to this.
        • TedHerman Update: I used setResizeThumbnail() instead of setResize() on the ImageManip, following a suggestion by @JanCuhel in another topic. It had the effect of getting detections in the full FOV, though the detection bounding boxes were off on the display window. To fix that problem, the code now scales according to the RGB preview width (1280). Weirdly, this same scaling had to be used on both x and y axes. Not sure I understand why this works.
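
          My best guess at why a single factor works: setResizeThumbnail() scales the image uniformly (preserving aspect ratio) and fills the leftover area with padding, so the factor from NN coordinates back to the preview is the same on both axes; the only extra term would be the vertical padding offset. A sketch of that arithmetic, assuming a 1280x720 preview and a 320x320 letterboxed NN input (whether the padding is centered or one-sided depends on how ImageManip places it):

            # Hypothetical mapping from letterboxed 320x320 NN coordinates back to a 1280x720 preview.
            PREV_W, PREV_H = 1280, 720
            NN_SIZE = 320

            scale = PREV_W / NN_SIZE                  # uniform: same factor for x and y (here 4.0)
            pad_y = (NN_SIZE - PREV_H / scale) / 2    # vertical letterbox border inside the 320x320 frame

            def nn_to_preview(x_nn, y_nn, centered_padding=True):
                """Map a keypoint from NN input coordinates to preview pixel coordinates."""
                x = x_nn * scale
                y = (y_nn - pad_y) * scale if centered_padding else y_nn * scale
                return x, y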

          • JanCuhel I never managed to get yolo2openvino working, even when I tried on the desktop. Creating an environment with Python 3.6 and installing all the requirements got stuck on a large compile/build step. Having already sunk some hours into this, I realized that finding a ready-made Docker container is likely a better option, but even that would be speculative as to whether it would work.

          • Matija The answer turned out to be that the last step, after going through all the epochs, does an optimization pass which reduces the size of the .pt file. The intermediate .pt files are all significantly larger.
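
            Presumably this is the usual optimizer-state stripping: intermediate checkpoints carry the optimizer state (and often an EMA copy of the weights) alongside the model, and the final step drops those and halves the weights. A sketch of the idea (the key names are assumptions about the yolov7 checkpoint layout, and loading requires the yolov7 repo on the Python path):

              import torch

              ckpt = torch.load("last.pt", map_location="cpu")   # hypothetical checkpoint path

              for key in ("optimizer", "ema", "updates", "training_results"):
                  if key in ckpt:
                      ckpt[key] = None            # drop training-only state

              if ckpt.get("model") is not None:
                  ckpt["model"].half()            # FP16 weights, as the final step does

              torch.save(ckpt, "last_stripped.pt")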

          • JanCuhel I did manage to train a YoloV7 network and it works pretty well: excellent FPS and reasonably good detection. The NN is [416,416], so the RGB preview is also that size; I tried a larger preview size linked to the host + an ImageManip outputting [416,416] to the NN, and this appears to be equivalent. My question is about getting a wider aspect ratio to exploit more of the OAK's FOV. What I get is cropping to force the square NN dimension, which amounts to around 2/3 of the FOV.

            It may be possible to train a network that would harness an RGB preview size of [1280,800] and scale that to [416,256], for example, but how do I train for that? I did try using --rect and --img 416 256: the results after 100 epochs, as seen in the F1, R, and P graphs, were terrible. Some searching around makes me guess that just using --rect plus --img 416 might be the correct way, but how could this work? After all, the NN will expect a particular size of image for input. Is that just specified when doing blob conversion? By the way, the training images appear to be 480x360, if that matters.
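
            My guess is that the input resolution is fixed when the model is exported and compiled, not by --img alone; maybe something like this with the blobconverter package (a sketch only, with a placeholder ONNX path; --input_shape is an OpenVINO Model Optimizer option in N,C,H,W order, and I haven't verified the parameters):

              import blobconverter

              blob_path = blobconverter.from_onnx(
                  model="yolov7-tiny-hands.onnx",                      # placeholder export
                  data_type="FP16",
                  shaves=6,
                  optimizer_params=["--input_shape=[1,3,256,416]"],    # 416 wide, 256 high
              )
              print("compiled blob:", blob_path)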

            • JanCuhel I'm trying to get a blob for hands-only detection. There are numerous projects for this.

              For yolov4, I found cansik/yolo-hand-detection/tree/master, which would have been promising had I been able to convert it using yolo2openvino. Maybe @Matija has a good suggestion; perhaps just making a separate environment for TF1 would work (my desktop system is somewhat limited; no GPU).

              Another idea would be to build a yolov7 model. I have started trying this, but ran into a few obstacles/questions.

              1. First, the dataset for the project mentioned above has some weird Matlab format, so it's probably better not to use that as a starting point.
              2. Other hand detection projects use MS COCO's hand collection, which is straightforward. I don't have a sense of how many images and what parameters to use (in spite of the paper/repo WongKinYiu/yolov7). Still, it seems like a path to try, and in general yolov7 would have clear advantages.
              3. I started some training specifying yolov7-tiny.pt as the starting weight file, which is around 12MB. After some epochs, the model in last.pt is 284MB. Is that normal?
              4. For my application, I will need the detection to work for gloved and non-gloved hands. Maybe that will just work; I haven't looked closely enough at the input images. If it fails to detect gloved hands, then probably more images will have to be included, right?
            • jakaskerl I came across a specific model (from a couple of years back) that happens to be yolo 4 and was hoping to try it.

              I did try to fix up the yolo2openvino programs using the TF v1-to-v2 converter, but it couldn't handle the decorator "@tf.contrib.framework.add_arg_scope" in its conversion. I tried commenting that out, but the conversion later failed with another decorator problem inside tf.

              • Hi,

                I have a yolov4-tiny model that I'd like to try. First I need to convert it to a blob. There is a luxonis notebook showing how to do this, but it runs into trouble because the TensorFlow version has changed from 1.x to version 2. I'm wondering if someone has a new version of the code in luxonis/yolo2openvino that has been updated to use TF version 2?

                • erik Agree completely that geaxgx's demos run faster! Unfortunately, the only geaxgx demo with multipose is the one which does not run on the OAK. Sure, one can record video and then run movenet_multipose later, on the desktop, with the recorded video. However, I'm under a restriction that we can't record video (it's a privacy IRB thing). So I'm stuck with the one supported by the SDK.

                • Hello @erik

                  I need to use multipose so blazepose is not an option (by the way, I could not get blazepose's edge mode to work, though host mode works fine). Reading up about geaxgx/openvino_movenet_multipose it seems there isn't much hope of that multipose model working on OAKs because of an fp32/fp16 issue.

                  Hence, I followed your suggestion to get the gen2 human pose working using the current API. This does work; however, it seems rather inelegant to copy its landmark/draw code. Instead, my code imports two functions from depthai/resources/nn/human-pose-estimation-0001/handler.py, namely decode() and draw() -- both require a mock nnManager object. All of this works, but the setup is brittle, and in fact there are two different handler.py versions for human-pose-estimation-0001, one of them in the sdk. Was there some other way you had in mind?

                • Hello @erik,

                  I was unable to recreate the exact error on gen2 human pose, but instead got two other errors: see below. The problem is thus nondeterministic, perhaps some version problem that manifests as an overrun or bad pointer. I only got this on the RPi; I briefly tried it on my desktop system and didn't encounter the problem. I doubt it's worth your time to investigate. My time would be better spent adapting depthai_blazepose (which worked brilliantly on the RPi) to the multipose model. I'll likely open another thread if I get stuck on that.

                  Error 1:

                  pi@raspberrypi:/media/pi/lux/depthai/depthai-experiments/gen2-human-pose $ python main.py -cam
                  Available devices:
                  [0] 14442C10B16CC9D200 [X_LINK_UNBOOTED]
                  Starting pipeline...
                  Stack trace (most recent call last) in thread 3546:
                  #19 Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in
                  #18 Object "/lib/aarch64-linux-gnu/libc.so.6", at 0x7fa6ee1c1b, in
                  #17 Object "/lib/aarch64-linux-gnu/libpthread.so.0", at 0x7fa70c0647, in
                  #16 Object "python", at 0x620baf, in
                  #15 Object "python", at 0x620f3f, in
                  #14 Object "python", at 0x4c6b4f, in
                  #13 Object "python", at 0x4b1a47, in _PyFunction_Vectorcall
                  #12 Object "python", at 0x498217, in _PyEval_EvalFrameDefault
                  #11 Object "python", at 0x4b1a47, in _PyFunction_Vectorcall
                  #10 Object "python", at 0x498217, in _PyEval_EvalFrameDefault
                  #9 Object "python", at 0x4b1a47, in _PyFunction_Vectorcall
                  #8 Object "python", at 0x4998b3, in _PyEval_EvalFrameDefault
                  #7 Object "python", at 0x4b1a47, in _PyFunction_Vectorcall
                  #6 Object "python", at 0x49c257, in _PyEval_EvalFrameDefault
                  #5 Object "python", at 0x4c6cc7, in
                  #4 Object "python", at 0x4a52ff, in _PyObject_MakeTpCall
                  #3 Object "python", at 0x4cac53, in
                  #2 Object "/home/pi/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f94d80b43, in
                  #1 Object "/home/pi/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f94e5760b, in
                  #0 Object "/home/pi/.local/lib/python3.9/site-packages/depthai.cpython-39-aarch64-linux-gnu.so", at 0x7f950ff60c, in
                  Bus error (Invalid address alignment [0xe9])
                  Bus error

                  Error 2:

                  pi@raspberrypi:/media/pi/lux/depthai/depthai-experiments/gen2-human-pose $ python main.py -cam
                  Available devices:
                  [0] 14442C10B16CC9D200 [X_LINK_UNBOOTED]
                  Starting pipeline...
                  Traceback (most recent call last):
                    File "/media/pi/lux/depthai/depthai-experiments/gen2-human-pose/main.py", line 147, in <module>
                      show(frame)
                    File "/media/pi/lux/depthai/depthai-experiments/gen2-human-pose/main.py", line 121, in show
                      A = np.int32(keypoints_list[index.astype(int), 1])
                  IndexError: index 15 is out of bounds for axis 0 with size 15

                • Hi @erik I'll work on recreating the problem and saving the entire error message. I looked at the repo examples you reference, and they appear to be single person/pose. I would hope to have a multipose model, like https://github.com/geaxgx/openvino_movenet_multipose (which does not have depthai in the name). That's why I went back to gen2. But yes, faster/more accurate is preferable!

                • Hi @erik: following up on your suggestion to use the API instead of just the SDK, I tried running gen2-human-pose but ran into a dependency conflict when installing requirements.txt. Specifically, it says depthai-sdk 1.0.1 (which seems to be what these older gen2 demos want) specifies blobconverter 1.0.0, while another line in requirements.txt says blobconverter should be 1.3.0.

                  I'm wondering what to tinker with: the requirements.txt line for blobconverter 1.3.0, or the implicit dependencies of depthai-sdk 1.0.1?

                  I tried changing it from 1.3.0 to 1.0.0 in requirements.txt, and the demo does run for a few seconds, but then gets a segfault in depthai.cpython-39-aarch64-linux-gnu.so.

                  Is there a better way of using the API than the way these older gen2 examples are coded?

                  • erik replied to this.
                  • Hi @erik Thanks for the pointers.

                    By the way, when I run human-pose-estimation-0001 using depthai_demo.py, it runs at 4.4 FPS or better, and the depth disparity color is 22 FPS. In principle, extracting the data I need is feasible; it's a matter of coaxing it out of the software.

                  • Hi, I tried the 7-line demo suggested, and it works (even on my RPi4) using an OAK-D. But as said above, it lags considerably with respect to real time (maybe two or more seconds). I'm wondering how I can get just the passthrough, that is, only the landmarks and not the image (whole frame) data.

                    Looking at https://github.com/luxonis/depthai-experiments/blob/master/gen2-human-pose/main.py we see that as the queue is processed, each item has a list of potentially 18 landmarks (?) containing the (x,y) of each landmark. It's unclear to me how this works. I'd also like to obtain depth as well as the (x,y) and a label of the landmark for each item (a rough depth-lookup sketch is at the end of this post).

                    However, looking deeper, there is code for heatmaps, and then

                    probMap = cv2.resize(probMap, nm.inputSize) # (456, 256)

                    keypoints = getKeypoints(probMap, 0.3)

                    new_keypoints_list = np.vstack([new_keypoints_list, *keypoints])

                    so it looks like maybe it is not as simple and bandwidth-limited as I imagined (otherwise, why the cv2 call?).

                    Is this something that depthai_sdk can make easy, or should I just try working with gen2?
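
                    For the depth part specifically, one rough approach (a sketch, assuming a StereoDepth node aligned to the RGB camera feeds a depth output queue; the queue name and window size are placeholders) would be to look up the depth frame around each decoded keypoint:

                      import numpy as np

                      # Hypothetical: depth_queue is an output queue fed by a StereoDepth node configured
                      # with setDepthAlign(dai.CameraBoardSocket.RGB), so its pixels line up with the RGB
                      # frame the keypoints were decoded on.
                      depth = depth_queue.get().getFrame()   # uint16 depth in millimeters

                      def keypoint_depth(x, y, radius=3):
                          """Median depth (mm) in a small window around keypoint (x, y), ignoring zeros."""
                          h, w = depth.shape
                          y0, y1 = max(0, y - radius), min(h, y + radius + 1)
                          x0, x1 = max(0, x - radius), min(w, x + radius + 1)
                          window = depth[y0:y1, x0:x1]
                          valid = window[window > 0]          # 0 means no depth measured at that pixel
                          return int(np.median(valid)) if valid.size else None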

                    • erik replied to this.
                    • On the landing page for luxonis.com, under the Artificial Intelligence section, then the tab for Landmark Detection, there is a kind of pose estimation shown with a woman dancing. It's not the standard human pose estimation: the wire skeleton doesn't include the head and there are two verticals for the torso. What model was used to do this, and can I try it on my OAK-D? I searched the zoo for landmarks but didn't find a description that I thought was the one used.

                      Note: the TF models here: https://www.tensorflow.org/lite/examples/pose_estimation/overview look similar, though the facial landmarks are missing. Is it because she wears a mask?

                      • Thank you, that got the PointcloudComponent demo working. The IMUComponent fails, probably because my OAK-D is too old (firmware complaint). Not a problem, since I wasn't planning on using IMU functions.