jakaskerl
I have tried many person-only detection models like pedestrian-detection-adas-0002 and person-detection-0203,
but they can't detect people far away very well.
YOLO was one of the better ones, which is why I tried it.
Are there any other models you can recommend?
I will check out the Jetson series as well.
Running heavier models on OAK-D, OAK-D CM4 PoE
Perhaps something like https://docs.openvino.ai/2022.3/omz_models_model_person_detection_retail_0002.html would perform better at longer distances, but I feel it's going to be even slower.
Thanks,
Jaka
jakaskerl
I have tried both person-detection-retail-0002 and person-detection-retail-0013 as well.
Even though they give good results when people are close to the camera,
they struggle with far-away detections of people.
Do you think I should custom-train any of these models?
I have custom-trained only YOLOv7 and YOLOv8 models.
Any suggestions on which Jetson models would be good enough for the heavier models?
I'm thinking of the Jetson Orin Nano 8GB module.
The custom-trained v8 model (with only the person class) doesn't give any detections.
I have attached the files here; this was trained with around 5000 images using the YOLOv8 custom training notebook.
The quick fix that was suggested for the masks and anchor masks worked for the 80-class weights,
that is:
anchors: [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0],
anchor_masks: {"side52": [0, 1, 2], "side26": [3, 4, 5], "side13": [6, 7, 8]},
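As a side note on where those `"sideNN"` keys come from: for YOLOv3/v5-style heads they are just the output grid sizes, i.e. the input resolution divided by each head's stride. A minimal sketch (the strides here are the usual 8/16/32 of those heads, which is an assumption about your model, not something from the export itself):

```python
def side_names(input_size, strides=(8, 16, 32)):
    """The "sideNN" mask keys come from the output grid size: input_size / stride."""
    return ["side%d" % (input_size // s) for s in strides]

# For a 416x416 input this yields exactly the keys used above:
# side_names(416) -> ['side52', 'side26', 'side13']
```

So a mask dict with `side52`/`side26`/`side13` only matches a model exported at a 416x416 input; a different input size (or a different head layout) produces different side names, which is one reason fixed values can work for one set of weights and not another.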
but the above values don't work when I use them with the custom model:
only the video shows up, there are no detections, and the detections value is also empty.
Hi krishnashravan
You could probably apply transfer learning to add detections of distant people on top of the already existing (and optimized) detector.
krishnashravan any suggestions on which jetson models would be good enough for the heavier models?
I'm not sure I know enough to give a definitive answer to this. The Orin Nano features 40 TOPS of AI performance (vs. 3 TOPS on the Luxonis RVC3), so on paper it's much faster; I'm just not knowledgeable enough to know what this means for model performance. Perhaps anyone else reading this has a better understanding.
krishnashravan the quick fix that was suggested for the masks and anchor masks was solved for the 80 classes weights
cc @Matija; AFAIK, the JSON obtained from tools.luxonis.com should contain the anchors and anchor masks values.
Thanks,
Jaka
You mean apply transfer learning to existing person detection models?
The files that I posted were the contents of the ZIP file you get after using the tools.
The masks and anchor masks were empty; I filled them with the suggested values, and it worked for the normal nano and small weights but not for the custom one.
Do you have some example images about detection of people in the distance? Mainly to see what you're targeting.
While heavier models could help, I believe the problem here is that the resolution is not big enough, so the distant people are just too small for the model to detect them well. There are two ways that could solve that. One is a very high input resolution, which is slow. I am not sure how fast it would be on a Jetson -- my assumption is it should be faster since it has more TOPS, but since you increase the amount of operations by a lot, it's hard to say what the FPS would be.
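To put a rough number on "increase the amount of operations by a lot": for a convolutional detector, compute grows approximately with the pixel count, i.e. quadratically in the input side length. This is a back-of-the-envelope rule of thumb, not a benchmark:

```python
def relative_cost(new_res, base_res=640):
    """Approximate compute relative to a base input resolution.

    Conv-net FLOPs scale roughly with H*W, so doubling the input side
    roughly quadruples the work (rule of thumb, not a measurement).
    """
    return (new_res / base_res) ** 2

# Going from 640x640 to 1280x1280 is ~4x the compute:
# relative_cost(1280) -> 4.0
```

So even a device with ~13x the raw TOPS can lose much of that headroom to a resolution bump alone.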
The other option is, instead of using the default yolov8.yaml when fine-tuning/transfer learning, to use yolov8-p2.yaml. You can see in that config (line 40) it says xsmall P2, and then that P2 is passed to the Detect head in the last line. This means the model will be trained to detect much smaller objects, while the throughput should remain more or less the same. So you can try fine-tuning with that head.
@JanCuhel, can we check why the converted model linked above is not detecting anything unless the mentioned fix is applied? And can we try exporting a model trained with the yolov8-p2.yaml config using tools.luxonis.com?
@krishnashravan What DepthAI version are you using?
Matija
So are you saying that when training a custom model on YOLOv8 with
yolo task=detect mode=train model=yolov8n.pt imgsz=640 data=yolov8.yaml epochs=100 name=yolov8n
my yaml file looks like this:
train: C:\computervision\yolov8_custom\train
val: C:\computervision\yolov8_custom\val
nc: 1
names: ["person"]
can I add all the information about the layers that was in the p2 file here and then train,
or do I have to do it some other way?
I'm new to fine-tuning, so any help would be great.
How would I train a custom model using the new p2 file?
I'm not sure how to do that.
No, the training yaml should be kept the same. Instead of model=… you should use model=yolov8-p2.yaml when you call `yolo train`. That's according to this: https://docs.ultralytics.com/usage/cfg/#train. If this doesn't work, you will need to open an issue on the Ultralytics page about how to train with it.
In Colab, does it detect people on the images that you shared? They seem to be very different than the training dataset.
The masks and anchors should be empty since it's an anchorless model. The issue is not in the missing masks but, at least for me, in the following error:
[184430105154631200] [3.4] [1.642] [DetectionNetwork(1)] [critical] Fatal error in openvino 'universal'. Likely because the model was compiled for different openvino version. If you want to select an explicit openvino version use: setOpenVINOVersion while creating pipeline. If error persists please report to developers. Log: 'softMaxNClasses' '157' 'CMX memory is not enough!'.
This means that there is not enough memory on the device. It has to do with how we modify the model for decoding to work on OAK. We will look and see if it's possible to somehow optimize this consumption.
In the meantime you can try exporting the model using a lower resolution. I have tried setting it to `512 288` at the export time and I can successfully run the model and see the detections.
Matija
I exported at 512 as you said,
but it keeps asking for anchor masks.
Is there any setting I have to turn off to make that work?
The error looks like this:
[1844301051D1420E00] [1.4] [90.287] [DetectionNetwork(3)] [error] Mask is not defined for output layer with width '21760'. Define at pipeline build time using:
'setAnchorMasks' for 'side21760'.
Can you export the model with tools and then use the main_api.py from here? You can specify the model and config with the -m and -c flags. The model should be the blob and the config the JSON from that ZIP file.
Try also updating depthai to the latest version with pip3 install depthai --upgrade.
I have tried converting the .pt weights from the repository that you shared and running them with the approach I described above, and it works without problems.
If it doesn't, please paste the full log from the console.
Matija
yolo task=detect mode=export model=yolopeople.pt format=onnx imgsz=512
I used this to convert to ONNX, and then used
http://blobconverter.luxonis.com/
to convert from ONNX to blob.
I used the JSON I made with the .pt in the tools and changed the input size in the JSON to 512x512.
I'm getting this:
[1844301051D1420E00] [1.4] [4.514] [DetectionNetwork(1)] [error] Mask is not defined for output layer with width '21760'. Define at pipeline build time using: 'setAnchorMasks' for 'side21760'.
[1844301051D1420E00] [1.4] [4.518] [DetectionNetwork(1)] [error] Mask is not defined for output layer with width '21760'. Define at pipeline build time using: 'setAnchorMasks' for 'side21760'.
Am I going wrong somewhere with what I'm doing?
As mentioned in a reply above (Matija), you need to use https://tools.luxonis.com/ to make it work with YoloDetectionNetwork.
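For what it's worth, the `21760` in that error is consistent with this reading (my interpretation, not an official explanation): a plain ONNX-to-blob export leaves all detection heads concatenated into one flat output, so DepthAI sees a single "side" whose width is the total number of prediction cells. A quick check:

```python
def total_cells(input_size, strides):
    """Total prediction cells across the detection heads: sum of (size/stride)^2.

    A plain (non-tools) export concatenates the heads into one flat tensor,
    which DepthAI then reports as a single 'side' of this width.
    """
    return sum((input_size // s) ** 2 for s in strides)

# For a 512x512 YOLOv8 model with a P2 head (strides 4, 8, 16, 32):
# 128^2 + 64^2 + 32^2 + 16^2 = 21760, matching the 'side21760' in the error.
```

That's why the tools.luxonis.com export path is needed: it restructures the outputs so YoloDetectionNetwork can decode them per head.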
Matija
I tried what you said; it's working both in my code and in the API-call code that you provided,
but the performance is still the same.
There are times it detects everyone,
and sometimes it doesn't.
This is the correct way to export the model. Here are some options you can try:
- Use "512 288" for shape to keep the 16:9 aspect ratio.
- Lower the confidence threshold.
If this doesn't help, then the performance is likely limited by the quality of the neural network, meaning the detection performance in Colab or elsewhere for that particular image with the same input shape would very likely be the same. In the provided images, people seem really small. So as an alternative, you could try splitting the image into 4 parts using ImageManip and feeding it the crops.
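A minimal sketch of the tiling idea: compute normalized crop rectangles for a 2x2 grid (the 0..1 coordinates that ImageManip-style crop configs expect), run detection on each crop, and map each detection's bounding box back into full-frame coordinates. The `det` bbox format `(xmin, ymin, xmax, ymax)` is an assumption; adapt it to your pipeline's detection type:

```python
def grid_crops(rows=2, cols=2):
    """Normalized (xmin, ymin, xmax, ymax) rects that evenly tile a frame."""
    rects = []
    for r in range(rows):
        for c in range(cols):
            rects.append((c / cols, r / rows, (c + 1) / cols, (r + 1) / rows))
    return rects

def to_full_frame(det, rect):
    """Map a detection bbox (normalized within one crop) back to full-frame coords."""
    xmin, ymin, xmax, ymax = rect
    w, h = xmax - xmin, ymax - ymin
    return (xmin + det[0] * w, ymin + det[1] * h,
            xmin + det[2] * w, ymin + det[3] * h)
```

Each crop keeps the full sensor resolution of its quadrant, so distant people occupy roughly twice as many pixels per side in the network input, at the cost of 4 inferences per frame (plus de-duplicating detections that straddle tile borders).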
@krishnashravan we looked into the memory issue when exporting the YOLOv8 with the P2 head and found that the issue is due to the softmax in the model. Unfortunately, there isn't anything we can do about this. So I'd suggest you try the options that @Matija suggested with the YOLOv8 P2 head. Alternatively, you can also try YOLOv6 with a higher resolution (but it has just 3 detection heads) or YOLOv5 with a P2 head. Here's the link to the config file. Be aware that this model is relatively big, so inference will be very slow. To train this model you need to specify the config file using the --cfg flag, e.g. to start the training/fine-tuning you can use this command: python train.py --data coco128.yaml --weights yolov5n.pt --cfg yolov5-p2.yaml --device 0. More info can be found here.
JanCuhel
Real-time inference is the main focus.
I'm running inference on two videos besides live inference.
I made it work on the camera; the far-away video runs at normal speed with blocking=False,
and if it's True, the video is slow.
With the close-up video (like a CCTV video) and blocking=True, the video is at a normal pace with good detections 80-90% of the time.
With blocking=False, the video is fast.
Is there any way to make the video run at a fixed FPS? Anything I can change?
if video:
    # Output queue for processed frames; input queue for feeding video frames to the device
    q_rgb = device.getOutputQueue("manip")
    q_in = device.getInputQueue(name="inFrame", maxSize=4, blocking=True)
    # Resolve the video path relative to this script and open it with OpenCV
    videoPath = str((parentDir / Path('../../data/' + video_source)).resolve().absolute())
    cap = cv2.VideoCapture(videoPath)
    inputFrameShape = (sizeX, sizeY)
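One common way to get a fixed playback rate with blocking=False is to pace the send loop yourself: read the source FPS from the capture and sleep so frames are pushed at that rate, regardless of how fast the device consumes them. A sketch (the loop wiring in the comment assumes the `cap`/`q_in` objects from the snippet above and is untested on a device):

```python
import time

def frame_period(cap_fps, fallback=30.0):
    """Seconds per frame for the source video; fall back if FPS is unknown (0)."""
    fps = cap_fps if cap_fps and cap_fps > 0 else fallback
    return 1.0 / fps

# Hypothetical paced send loop using the queues from the snippet above:
#
# period = frame_period(cap.get(cv2.CAP_PROP_FPS))
# next_t = time.monotonic()
# while cap.isOpened():
#     ok, frame = cap.read()
#     if not ok:
#         break
#     # ...resize frame to inputFrameShape, send via q_in, drain q_rgb...
#     next_t += period
#     time.sleep(max(0.0, next_t - time.monotonic()))
```

Accumulating `next_t` (instead of sleeping a fixed amount each iteration) keeps the long-run rate accurate even when individual iterations take variable time.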