Hi!
We need to count every moving object with the highest possible accuracy. An object is counted when its centroid crosses a line, e.g.:
```python
# naive check: increments on every frame in which the centroid is past the line
if centroid_x > line_x:
    count += 1
```
(I know the code above doesn't handle hysteresis, but it's just a simple example.)
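For reference, a line-crossing check with hysteresis could look like the minimal sketch below. It assumes each object keeps a stable track ID from the tracker and that people cross a vertical line at `line_x` from left to right (right-to-left crossings are not counted). `LineCounter` and `band` are hypothetical names used only for illustration, not part of any SDK.

```python
from typing import Dict


class LineCounter:
    """Counts left-to-right crossings of a vertical line at x = line_x.

    A track is counted only after it has been seen clearly on the left
    (x < line_x - band) and later clearly on the right (x > line_x + band),
    so a centroid jittering around the line is not counted repeatedly.
    """

    def __init__(self, line_x: float, band: float) -> None:
        self.line_x = line_x
        self.band = band                # half-width of the dead band around the line
        self.side: Dict[int, str] = {}  # last confirmed side per track ID
        self.count = 0

    def update(self, track_id: int, centroid_x: float) -> None:
        if centroid_x < self.line_x - self.band:
            new_side = "left"
        elif centroid_x > self.line_x + self.band:
            new_side = "right"
        else:
            return  # inside the dead band: keep the previous state
        if self.side.get(track_id) == "left" and new_side == "right":
            self.count += 1
        self.side[track_id] = new_side


counter = LineCounter(line_x=320, band=15)
# Track 1 jitters around the line before crossing; track 2 never crosses.
for tid, x in [(1, 280), (1, 318), (1, 325), (1, 316), (1, 360), (2, 100), (2, 200)]:
    counter.update(tid, x)
print(counter.count)  # prints 1
```

The dead band trades a little latency for robustness: a wider `band` suppresses more jitter but requires the centroid to travel further past the line before the crossing registers.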
We do not need classification, i.e., we do not need to know whether the object is a person, bike, car, or anything else.
Specifically, we are counting people of all ages, sizes, and ethnic backgrounds: babies in strollers, children, elderly people, etc. However, YoloV8n with the ObjectTracker is missing many people: they are neither detected nor tracked. Since we know for certain that any moving object in the camera view is a person (this is indoors, and there are no cars, bikes, animals, etc.), I'm wondering whether skipping classification would improve our count. The system counts about 1000 people per day, while the ground truth is 5000-8000 people per day.

When watching the Visualizer, I noticed that if a person is detected as a dog or cat and crosses the line, they are still counted. That is fine and is the desired behavior; this is what I mean by "classification is not needed". However, if a moving person is not detected at all, neither as a person nor as any other object, then they are not tracked and we lose the count.
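Since the label is irrelevant here, one related option is class-agnostic non-maximum suppression (a generic technique named here for illustration, not something this setup is confirmed to use): overlapping boxes are merged regardless of label, so if the model ever reports the same person as both "person" and "dog" at the same spot, only one detection, and therefore one track, survives. A minimal pure-Python sketch, with boxes as hypothetical `(x1, y1, x2, y2, score, label)` tuples in normalized coordinates and COCO label indices (0 = person, 16 = dog):

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float, int]  # x1, y1, x2, y2, score, label


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(boxes: List[Box], iou_thresh: float = 0.5) -> List[Box]:
    """Keep the highest-scoring box in each overlapping group, ignoring labels."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept


detections = [
    (0.10, 0.10, 0.30, 0.50, 0.62, 0),   # "person"
    (0.11, 0.12, 0.31, 0.52, 0.40, 16),  # same person, labeled "dog"
    (0.60, 0.20, 0.80, 0.60, 0.55, 0),   # a second person
]
print(len(class_agnostic_nms(detections)))  # prints 2
```

The IoU threshold is a trade-off: a lower value suppresses more aggressively, which can also merge neighbouring people whose boxes overlap in a crowd.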
Suppose we use something like this:
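```python
from depthai_sdk import OakCamera

with OakCamera() as oak:
    cam = oak.create_camera('color')
    nn = oak.create_nn('yolov8n_coco_640x352', cam, tracker=True)
    nn.config_nn(resize_mode='stretch')
    oak.visualize(nn.out.tracker)  # show tracked objects and their IDs
    oak.start(blocking=True)
```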
- Which tracker type is best for this use case (SHORT_TERM_KCF, SHORT_TERM_IMAGELESS, ZERO_TERM_COLOR_HISTOGRAM, or ZERO_TERM_IMAGELESS)? I've read the docs and all the linked pages, but I still don't fully understand how to choose this parameter.
- Are there any other parameters of YoloV8n or ObjectTracker that can improve the accuracy?
- Where can I find sample videos with lots of people from different demographics? (Googling only turns up the same popular videos that everyone in CV/AI seems to use, which show only a handful of people in an office or a demographically uniform crowd, mostly adults, in a retail mall.)
- In order to reduce the load on the OAK-D PoE device, is YoloV8n the best model for this use case?
- Does it matter whether the camera is directly above the people pointing straight down, or should we mount it at a height of about 6 to 8 feet (roughly 1.8 to 2.4 meters) and aim it horizontally at the people?
- When the camera is above people's heads pointing straight down and people are close together (shoulders touching), 2 or 3 people are detected and tracked as a single person. Is there a way to improve this so that each of them is detected and tracked?
- There are about 10 people in the FOV at a time, with a maximum of 20. Is the OAK-D PoE Series 1 powerful enough to handle this load (given the limitation explained here)?
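On the tracker question, it may help to see what purely geometric ("imageless") association looks like. The sketch below is a generic greedy IoU matcher written for illustration only; it is not DepthAI's implementation, and `GreedyIouTracker`, `iou_thresh`, and `max_missed` are hypothetical names. It matches boxes between frames by overlap alone, without looking at pixel content, and it makes one limitation explicit: a person who is never detected never receives an ID.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # x1, y1, x2, y2 (normalized)


def iou(a: Rect, b: Rect) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Keeps IDs stable by matching each new box to the most-overlapping live track."""

    def __init__(self, iou_thresh: float = 0.3, max_missed: int = 5) -> None:
        self.iou_thresh = iou_thresh  # minimum overlap to continue a track
        self.max_missed = max_missed  # frames a track may go undetected before it is dropped
        self.tracks: Dict[int, Rect] = {}  # last known box per track ID
        self.missed: Dict[int, int] = {}   # consecutive frames without a match
        self.next_id = 0

    def update(self, boxes: List[Rect]) -> Dict[int, Rect]:
        """Match this frame's detections to tracks; return {track_id: box}."""
        unmatched = set(self.tracks)
        assigned: Dict[int, Rect] = {}
        for box in boxes:
            best_id: Optional[int] = None
            best_iou = self.iou_thresh
            for tid in unmatched:
                score = iou(box, self.tracks[tid])
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:  # overlaps no live track: start a new ID
                best_id = self.next_id
                self.next_id += 1
            else:
                unmatched.discard(best_id)
            assigned[best_id] = box
        for tid in unmatched:  # live tracks that got no detection this frame
            self.missed[tid] += 1
            if self.missed[tid] > self.max_missed:
                del self.tracks[tid]
                del self.missed[tid]
        for tid, box in assigned.items():
            self.tracks[tid] = box
            self.missed[tid] = 0
        return assigned


tracker = GreedyIouTracker()
print(sorted(tracker.update([(0.10, 0.1, 0.30, 0.5)])))  # prints [0]
print(sorted(tracker.update([(0.12, 0.1, 0.32, 0.5)])))  # prints [0]: same person, moved slightly
print(sorted(tracker.update([])))                        # prints []: missed frame, ID 0 kept in memory
print(sorted(tracker.update([(0.14, 0.1, 0.34, 0.5), (0.60, 0.2, 0.80, 0.6)])))  # prints [0, 1]
```

In a scheme like this, tracker settings can only preserve IDs for objects the detector reports at least once, which matches the observation above that undetected people are never counted.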
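On camera placement: for a camera pointing straight down, the strip it sees at a given plane has width w = 2 · d · tan(HFOV / 2), where d is the distance from the lens to that plane. Heads are closer to the lens than the floor, so the usable strip at head height is much narrower than at floor level. The sketch below works through this; the FOV, mounting height, and head height are illustrative assumptions (check them against your camera's datasheet and your site), and `coverage_width` is a hypothetical helper.

```python
import math


def coverage_width(distance_m: float, hfov_deg: float) -> float:
    """Width (m) of the strip seen at `distance_m` from a camera facing the plane head-on.

    Pinhole model, ignoring lens distortion: w = 2 * d * tan(HFOV / 2).
    """
    return 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)


# Illustrative assumptions only: check them against your camera datasheet and site.
HFOV_DEG = 69.0          # assumed horizontal FOV of the color camera
MOUNT_HEIGHT_M = 3.0     # camera height above the floor
HEAD_HEIGHT_M = 1.7      # typical adult head height; children's heads are lower
NN_INPUT_WIDTH_PX = 640  # width of the 'yolov8n_coco_640x352' input

w_floor = coverage_width(MOUNT_HEIGHT_M, HFOV_DEG)
w_head = coverage_width(MOUNT_HEIGHT_M - HEAD_HEIGHT_M, HFOV_DEG)
print(f"coverage width at floor level: {w_floor:.2f} m")
print(f"coverage width at head level:  {w_head:.2f} m")
# With resize_mode='stretch' the full frame width maps onto the NN input width.
print(f"NN input pixels per meter at head level: {NN_INPUT_WIDTH_PX / w_head:.0f}")
```

The head-level width is what matters for a top-down mount: the closer heads are to the lens, the narrower the strip in which a person is visible and the fewer frames the tracker has to follow them across the line.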
Thanks!