jakaskerl
I have tried many person-only detection models like pedestrian-detection-adas-0002 and person-detection-0203,
but they can't detect people far away very well.
YOLO was one of the better ones, which is why I tried it.
Are there any other models you can recommend?
I will check out the Jetson series as well.

    jakaskerl
    I have tried both person-detection-retail-0002 and person-detection-retail-0013 as well.
    Even though they give good results when people are close to the camera,
    they struggle with far-away detections of people.
    Do you think I should custom-train any of these models?
    I have custom-trained only YOLOv7 and YOLOv8 models.
    Any suggestions on which Jetson models would be good enough for the heavier models?
    I'm thinking of the Jetson Orin Nano 8GB Module.

    The custom-trained v8 model (with the person class only) doesn't give any detections.
    I have attached the files here; this was trained on around 5000 images using the YOLOv8 custom training notebook.
    The quick fix that was suggested for the anchors and anchor masks worked for the 80-class weights,
    that is:
     anchors: [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0],
     anchor_masks: {"side52": [0, 1, 2], "side26": [3, 4, 5], "side13": [6, 7, 8]},
    but the above values don't work when I use them on the custom model:
    only the video shows up, with no detections, and the detections value is also empty.
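    For reference, the anchor_masks above simply index into the flat anchors list: each output grid ("side") decodes with three (width, height) anchor pairs. A minimal sketch of that mapping (the helper name is mine, not from DepthAI):

```python
def anchors_for_side(anchors, anchor_masks, side):
    """Return the (w, h) anchor pairs that the given output grid decodes with.
    The mask holds indices into the list of anchor *pairs*, so pair i sits at
    anchors[2*i] and anchors[2*i + 1]."""
    return [(anchors[2 * i], anchors[2 * i + 1]) for i in anchor_masks[side]]

anchors = [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0,
           59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0]
anchor_masks = {"side52": [0, 1, 2], "side26": [3, 4, 5], "side13": [6, 7, 8]}

# side13 (the coarsest grid) gets the three largest anchors:
# [(116.0, 90.0), (156.0, 198.0), (373.0, 326.0)]
```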

      Hi krishnashravan
      You could probably apply transfer learning to add detection of distant people on top of the already existing (and optimized) detector.

      krishnashravan any suggestions on which jetson models would be good enough for the heavier models?

      I'm not sure I know enough to give a definitive answer to this. The Orin Nano features 40 TOPS of AI performance (versus 3 TOPS on the Luxonis RVC3), so on paper it's much faster; I'm just not knowledgeable enough to know what this means for model performance. Perhaps someone else reading this has a better understanding.

      krishnashravan the quick fix that was suggested for the masks and anchor masks was solved for the 80 classes weights

      cc @Matija; AFAIK, the JSON obtained from tools.luxonis.com should contain the anchors and anchor masks values.

      Thanks,
      Jaka

        jakaskerl

        You mean add transfer learning on top of existing person detection models?

        The files that I posted were the contents of the zip file you get after using the tools.

        The anchors and anchor masks were empty; I filled them with the suggested values. That worked for the normal nano and small weights, but not for the custom one.

        Do you have some example images of the distant people you want detected? Mainly to see what you're targeting.

        While heavier models could help, I believe the problem here is that the resolution is not big enough, so the distant people are simply too small for the model to detect well. There are two ways to solve that. The first is a very high input resolution, which is slow. I am not sure how fast it would be on a Jetson -- my assumption is it should be faster since it has more TOPS, but since you increase the number of operations by a lot, it's hard to say what the FPS would be.

        The other option is that instead of using the default yolov8.yaml when finetuning/transfer learning, you use yolov8-p2.yaml. You can see in the config at L40 it says xsmall P2, and that P2 is then passed to the Detect head in the last line. This means the model will be trained to detect much smaller objects, while the throughput should remain more or less the same. So you can try fine-tuning with that head.

        @JanCuhel, can we check why the converted model linked above is not detecting anything unless the mentioned fix is applied? And can we try exporting a model trained with the yolov8-p2.yaml config using tools.luxonis.com?
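        To make the resolution point concrete, here is a rough back-of-the-envelope sketch (the numbers are illustrative and the helper is mine, not from any library):

```python
def person_pixels(src_height_px, person_px, net_height_px):
    """Apparent height, in network-input pixels, of a person who is
    person_px tall in the source frame, after downscaling the frame
    to the network input height."""
    return person_px * net_height_px / src_height_px

# A person ~60 px tall in a 1080p frame shrinks to about
# person_pixels(1080, 60, 640) ~= 35.6 px at a 640 input, and
# person_pixels(1080, 60, 416) ~= 23.1 px at a 416 input --
# close to the lower limit general-purpose detectors handle reliably.
```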

        @krishnashravan What DepthAI version are you using?

          Matija
          The DepthAI version is 2.21.2.0.
          Here I have added the .pt file that I used for conversion in the tools.
          I trained using the images from here.

          Matija
          So are you saying that when training a custom model on YOLOv8 with
          yolo task=detect mode=train model=yolov8n.pt imgsz=640 data=yolov8.yaml epochs=100 name=yolov8n
          where my yaml file looks like this:
          train: C:\computervision\yolov8_custom\train

          val: C:\computervision\yolov8_custom\val

          nc: 1
          names: ["person"]

          I can add all the information about the layers from the p2 file there and then train?
          Or do I have to do it some other way?
          I'm new to fine-tuning, so any help would be great.
          How would I train a custom model using the new p2 file?
          I'm not sure how to do that.

            krishnashravan

            No, the training yaml should be kept the same. Instead of model=… you should use model=yolov8-p2.yaml when you call `yolo train`. That's according to this: https://docs.ultralytics.com/usage/cfg/#train. If this doesn't work, you will need to open an issue on the Ultralytics page about how to train with it.

            In Colab, does it detect people in the images that you shared? They seem very different from the training dataset.

              Matija
              I have trained using the p2 yaml, and it's here.
              I did inference with it;
              it's working, though there are times it doesn't detect.
              I tried converting using the tools;
              the anchors and anchor masks are still empty.
              I have uploaded the .pt file as well,
              in the same directory.
              Any help would be appreciated.

                krishnashravan

                The masks and anchors should be empty since it's an anchorless model. The issue is not the missing masks; at least for me it's the following error:

                [184430105154631200] [3.4] [1.642] [DetectionNetwork(1)] [critical] Fatal error in openvino 'universal'. Likely because the model was compiled for different openvino version. If you want to select an explicit openvino version use: setOpenVINOVersion while creating pipeline. If error persists please report to developers. Log: 'softMaxNClasses' '157' 'CMX memory is not enough!'.

                This means that there is not enough memory on the device. It has to do with how we modify the model so that decoding works on OAK. We will look into whether it's possible to optimize this memory consumption.

                In the meantime you can try exporting the model at a lower resolution. I have tried setting it to `512 288` at export time and I can successfully run the model and see detections.
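                As a side note, all the decoding parameters the device needs live in the JSON from the tools export. A minimal sketch of reading them, assuming the standard DepthAI YOLO JSON layout (`nn_config` / `NN_specific_metadata` keys), where empty anchors are the expected case for anchorless models:

```python
import json

def load_yolo_config(path):
    """Read the config JSON from a tools.luxonis.com export ZIP and return
    the decoding parameters for a YoloDetectionNetwork node. For anchorless
    models (YOLOv6/v8) the anchors and anchor_masks are legitimately empty."""
    with open(path) as f:
        meta = json.load(f)["nn_config"]["NN_specific_metadata"]
    return {
        "classes": meta["classes"],                     # nn.setNumClasses(...)
        "coordinates": meta["coordinates"],             # nn.setCoordinateSize(...)
        "anchors": meta.get("anchors") or [],           # nn.setAnchors(...)
        "anchor_masks": meta.get("anchor_masks") or {}, # nn.setAnchorMasks(...)
        "iou_threshold": meta["iou_threshold"],         # nn.setIouThreshold(...)
        "confidence_threshold": meta["confidence_threshold"],
    }
```

                The comments show which `YoloDetectionNetwork` setter each value maps to when building the pipeline manually instead of using main_api.py.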

                  5 days later

                  Matija
                  I exported at 512 as you said,
                  but it keeps asking for anchor masks.
                  Is there any setting I have to turn off to make that work?
                  The error looks like this:
                  [1844301051D1420E00] [1.4] [90.287] [DetectionNetwork(3)] [error] Mask is not defined for output layer with width '21760'. Define at pipeline build time using:
                  'setAnchorMasks' for 'side21760'.

                    krishnashravan

                    Can you export the model with the tools and then use the main_api.py from here? You can specify the model and config with the flags -m and -c. The model should be the blob and the config the json from that ZIP file.

                    Also try updating depthai to the latest version with pip3 install depthai --upgrade.

                    I have tried converting the .pt weights from the repository that you shared and running them with the approach I described above, and it works without problems.

                    If it doesn't, please paste the full log from the console.

                      Matija
                      yolo task=detect mode=export model=yolopeople.pt format=onnx imgsz=512
                      I used this to convert to ONNX, then used
                      http://blobconverter.luxonis.com/
                      to convert from ONNX to blob,
                      and I used the json I made from the .pt in the tools, with the input size in the json changed to 512x512.
                      I'm getting this:
                      [1844301051D1420E00] [1.4] [4.514] [DetectionNetwork(1)] [error] Mask is not defined for output layer with width '21760'. Define at pipeline build time using: 'setAnchorMasks' for 'side21760'.
                      [1844301051D1420E00] [1.4] [4.518] [DetectionNetwork(1)] [error] Mask is not defined for output layer with width '21760'. Define at pipeline build time using: 'setAnchorMasks' for 'side21760'.

                      Am I going wrong somewhere with what I'm doing?

                        Matija
                        I tried what you said and it's working, both in my code and in the API call code that you provided,
                        but the performance is still the same:
                        there are times it detects everyone,
                        and sometimes it doesn't.

                          krishnashravan

                          This is the correct way to export the model. Here are some options you can try:

                          • Use "512 288" for the shape to keep the 16:9 aspect ratio.
                          • Lower the confidence threshold.

                          If this doesn't help, then performance is likely limited by the quality of the neural network, meaning the detection performance in Colab or elsewhere for that particular image at the same input shape would very likely be the same. In the provided images, people seem really small. So, as an alternative, you could try splitting the image into 4 parts using ImageManip and feeding it the crops.
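                          A minimal sketch of that tiling idea (the helper name and the overlap handling are mine, not from the DepthAI examples): compute normalized crop rectangles that can each be handed to an ImageManip node via `initialConfig.setCropRect(*rect)`; detections then need to be remapped from tile coordinates back to the full frame.

```python
def quadrant_crops(overlap=0.0):
    """Normalized (xmin, ymin, xmax, ymax) rectangles that split a frame into
    four tiles. A small overlap keeps people on tile borders from being cut
    in half (at the cost of possible duplicate detections near the seams)."""
    hi = 0.5 + overlap / 2  # right/bottom edge of the first tile
    lo = 0.5 - overlap / 2  # left/top edge of the second tile
    return [
        (0.0, 0.0, hi, hi),  # top-left
        (lo, 0.0, 1.0, hi),  # top-right
        (0.0, lo, hi, 1.0),  # bottom-left
        (lo, lo, 1.0, 1.0),  # bottom-right
    ]
```

                          Each tile keeps the full network input resolution, which is what makes the small, distant people effectively larger to the detector.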

                          @krishnashravan we looked into the memory issue when exporting YOLOv8 with the P2 head and found that it is due to the softmax in the model. Unfortunately, there isn't anything we can do about this. So, I'd suggest you try the options that @Matija has suggested with the YOLOv8 P2 head. Alternatively, you can also try YOLOv6 with a higher resolution (but it has just 3 detection heads) or YOLOv5 with a P2 head. Here's the link to the config file. Be aware that this model is relatively big, so inference will be very slow. To train this model you need to specify the config file using the --cfg flag; e.g., to start the training/finetuning you can use the command python train.py --data coco128.yaml --weights yolov5n.pt --cfg yolov5-p2.yaml --device 0. More info can be found here.

                            JanCuhel
                            Real-time inference is the main focus.
                            I'm doing inference on two videos besides live inference.
                            I made it work on the camera. The far-away video plays at normal speed with blocking=False;
                            if it's True, the video is slow.
                            With the close-up video (like a CCTV video) and blocking=True, the video plays at a normal pace, with good detections 80-90% of the time.
                            With blocking=False the video runs too fast.
                            Is there any way to make the video run at a fixed FPS? Anything I can change?
                            if video:
                                # queues for sending video frames to the device and reading results back
                                q_rgb = device.getOutputQueue("manip")
                                q_in = device.getInputQueue(name="inFrame", maxSize=4, blocking=True)
                                videoPath = str((parentDir / Path('../../data/' + video_source)).resolve().absolute())
                                cap = cv2.VideoCapture(videoPath)
                                inputFrameShape = (sizeX, sizeY)
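                            One way to get a fixed playback rate with blocking=False is to pace the send loop yourself instead of relying on queue back-pressure. A minimal sketch (the class name is mine; the clock/sleep parameters exist only to make it testable):

```python
import time

class FramePacer:
    """Caps a frame-feeding loop at a fixed FPS: call wait() once per frame
    before sending it to the device, and the loop will never run faster
    than the target rate (it can still run slower if inference lags)."""

    def __init__(self, fps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / fps
        self.clock = clock
        self.sleep = sleep
        self.next_t = None  # deadline for the next frame

    def wait(self):
        now = self.clock()
        if self.next_t is None:
            self.next_t = now  # first frame goes out immediately
        delay = self.next_t - now
        if delay > 0:
            self.sleep(delay)
        self.next_t += self.interval
```

                            In the send loop this would look like pacer = FramePacer(cap.get(cv2.CAP_PROP_FPS) or 30), then pacer.wait() right before each q_in.send(...).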