I'm developing a retail people-counting system. The camera is mounted above the shop entrance (at a height of 3 meters) and looks almost vertically down at the floor. I need to count people entering and leaving the area.
I know there are examples of this use case, but my results are not very good. I've built my implementation based on this example: https://docs.luxonis.com/projects/api/en/latest/samples/ObjectTracker/object_tracker/
I made the following changes:
- Based on the "maximize FOV" page, I've used letterboxing (so there is a black stripe at the top and bottom of the frame).
- I've used yolov6t_coco_416x416_openvino_2021.4_5shave for detections.
- I tried to tweak confidence_threshold, but saw no significant improvement.
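
For reference, here is a minimal sketch of the letterbox geometry for a 416x416 network input. It assumes a 16:9 source frame of 1920x1080, which is an illustrative value and not stated above; `letterbox_geometry` is a hypothetical helper, not part of the DepthAI API.

```python
def letterbox_geometry(src_w, src_h, dst_w, dst_h):
    """Return (scaled_w, scaled_h, pad_x, pad_y) for an aspect-preserving resize."""
    # Use the smaller scale factor so the whole source frame fits inside the target.
    scale = min(dst_w / src_w, dst_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    # The remaining space is split evenly into black bars on both sides.
    pad_x = (dst_w - scaled_w) // 2
    pad_y = (dst_h - scaled_h) // 2
    return scaled_w, scaled_h, pad_x, pad_y


# Assumed 16:9 source frame; the real preview resolution may differ.
w, h, pad_x, pad_y = letterbox_geometry(1920, 1080, 416, 416)
used_fraction = (w * h) / (416 * 416)
print(w, h, pad_x, pad_y, used_fraction)
```

Under that assumption the image occupies 416x234 pixels with 91-pixel bars at the top and bottom, so only about 56% of the network input carries image data.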
As a result, the system runs at 14 FPS.
The problem is that there are a lot of tracking IDs that have only one detection, and some tracks are discontinuous (they start in the middle of the screen and are lost again soon after).
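
One way to quantify this would be to log the track IDs seen in each frame and count how many frames each ID survives. This is a minimal sketch, assuming a hypothetical per-frame log of IDs; the `frames` data and the `min_frames` threshold are illustrative and not taken from the tracker API.

```python
from collections import Counter


def track_lengths(frames):
    """Count in how many frames each track ID appears."""
    counts = Counter()
    for ids in frames:
        # set() guards against an ID being listed twice in one frame.
        counts.update(set(ids))
    return counts


def split_tracks(frames, min_frames=5):
    """Split IDs into stable tracks and short-lived (likely spurious) ones."""
    counts = track_lengths(frames)
    stable = sorted(i for i, n in counts.items() if n >= min_frames)
    spurious = sorted(i for i, n in counts.items() if n < min_frames)
    return stable, spurious


# Hypothetical log: ID 1 is a real person, IDs 2 and 3 appear for one frame only.
frames = [[1], [1, 2], [1], [1, 3], [1], [1]]
stable, spurious = split_tracks(frames, min_frames=5)
print("stable:", stable, "spurious:", spurious)
```

At 14 FPS, a threshold of 5 frames corresponds to roughly a third of a second, so the right value depends on how quickly people cross the field of view.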
I was thinking that this approach might not be optimal:
- The detection network accepts a square input (416x416), so not the whole NN input is used (because of letterboxing). When I instead stretch the image vertically to fill the square, detections are affected. Are there NN models whose input is not square?
- Detecting people seems like overkill for this application (everyone who enters the shop is a human, adult or kid :-) ). What about using edge detection or depth data only and trying to detect and track moving objects? Are there any ready-to-go NNs that can be used to detect moving "blobs" from depth data?
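
To make the depth-only idea concrete, here is a rough sketch of blob detection on a single depth frame. It is not a NN, just thresholding plus connected components in plain Python. The assumptions are: depth values are in millimeters, 0 marks an invalid pixel, the floor is about 3000 mm from the camera, and anything at least 1000 mm above the floor counts as a person candidate. `find_blobs` and all of its parameters are hypothetical names.

```python
from collections import deque


def find_blobs(depth_mm, floor_mm=3000, min_height_mm=1000, min_area=4):
    """Return area and centroid of connected regions that rise above the floor."""
    rows, cols = len(depth_mm), len(depth_mm[0])
    # Foreground: valid (non-zero) depth that is closer than floor minus min height.
    limit = floor_mm - min_height_mm
    fg = [[0 < d <= limit for d in row] for row in depth_mm]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not fg[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected foreground pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and fg[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small regions are treated as depth noise and dropped.
            if len(pixels) >= min_area:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                blobs.append({"area": len(pixels), "centroid": (cy, cx)})
    return blobs


# Tiny synthetic frame: floor at 3000 mm, a 2x2 "head" at 1300 mm, one noisy pixel.
frame = [[3000] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = 1300
frame[4][4] = 1300
blobs = find_blobs(frame)
print(blobs)
```

The centroids from consecutive frames could then be matched and tested against a virtual line across the entrance to decide between entering and leaving, but that part is not shown here.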
Thank you for your thoughts!