I am working on people tracking using OpenVINO models. There are many sensors covering a large indoor area with different hallways/rooms/etc.
I am using the following models to help track people:
1. Using person detection to detect a person
2. Using person re-identification to re-identify a person across sensors
3. Using face detection to detect a person (person detection doesn't work well for people who are sitting or have their hands in the air)
These models alone are still not enough for confident person tracking. The person detection misses people who are sitting down or have their hands in the air, and the face detection fails when people's heads are tilted down looking at their phones. So the system is still very leaky for knowing when people are in a room.
Each sensor is looking at a static scene, so it can have a "control" depth reading of the environment, captured when no one is in it. When someone moves through that environment, there is obviously a change in the depth reading.
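To be concrete about what I mean by a "control" frame, here is roughly how I would build it on the host with NumPy: a per-pixel median over a short burst of frames captured while the room is empty. The function name and the assumption that depth arrives as uint16 millimetres (0 = invalid) are just placeholders of mine:

```python
import numpy as np

def build_control_frame(depth_frames):
    """Per-pixel median over a burst of depth frames from the empty scene.

    depth_frames: iterable of 2D uint16 arrays (depth in mm, 0 = invalid).
    Zero readings are ignored so occasional stereo dropouts don't corrupt
    the control frame.
    """
    stack = np.stack(depth_frames).astype(np.float32)
    stack[stack == 0] = np.nan                # treat 0 as "no data"
    control = np.nanmedian(stack, axis=0)     # median ignoring NaNs
    return np.nan_to_num(control, nan=0.0)    # never-valid pixels stay 0
```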
I want to store that "control" depth frame on the device (in RAM) and then compare each incoming depth frame against it to see what the diff is between the two.
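The diff itself would be something like the sketch below, again host-side NumPy just to illustrate the per-frame logic I'd like to run on the device (the 300 mm threshold is an arbitrary placeholder):

```python
import numpy as np

def depth_diff_mask(control, current, min_change_mm=300):
    """Boolean mask of pixels whose depth changed by more than min_change_mm.

    Pixels that are invalid (0) in either the control or the current frame
    are excluded, since a missing stereo match isn't evidence of a person.
    """
    control = control.astype(np.float32)
    current = current.astype(np.float32)
    valid = (control > 0) & (current > 0)
    return valid & (np.abs(current - control) > min_change_mm)
```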
If a person detection "stops" (because the person sat down, raised their arms, or is looking down at their phone), I want to check against the "control" depth to see if there is still a person there: if there is a big difference from the "control" frame in the region where the person was last detected, that would mean they are still there but may have sat down.
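The check I have in mind for a "lost" detection is roughly this, using the bounding box from the last detection as the ROI (`min_fraction` is a made-up tuning knob, not anything from a library):

```python
def roi_still_occupied(diff_mask, bbox, min_fraction=0.2):
    """Decide whether the last-known person region still looks occupied.

    diff_mask: boolean mask from the control-frame comparison.
    bbox: (x1, y1, x2, y2) of the last person detection, in depth-frame pixels.
    Returns True if at least min_fraction of the ROI differs from the control
    frame, which I'd read as "someone is probably still there".
    """
    x1, y1, x2, y2 = bbox
    roi = diff_mask[y1:y2, x1:x2]
    if roi.size == 0:
        return False
    return roi.mean() >= min_fraction
```

The thresholds would obviously need tuning per sensor, and I'd probably also need some noise filtering on the mask, but that is the gist of the per-frame work I'd like to push onto the device.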
Is there a "best" approach to diffing depth frames on the device? I don't want to use OpenCV on the host because that would be very slow, and I was wondering if there is any hardware-accelerated approach that could be used instead. I know that calculating disparity already involves a lot of per-pixel analysis between the mono cameras, so is that something I can tap into to diff a depth frame stored in memory against a new incoming depth frame?