This problem involves avoiding objects, both those seen before and those never before seen. The approach Luxonis prefers for such tasks is to use at least semantic depth, usually in combination with known-object detection, depending on the needs of a given application.
Semantic Depth for Unknown Unknown Object Detection and Avoidance
One of the classic problems in autonomous robotic navigation or actuation is avoiding impact with both known and unknown objects. Known objects are things known a priori to be encountered at the installation - such as tools, other machines, workers, equipment, and facilities. Unknown objects are things that may not be anticipated - or may even be completely unknowable or never before seen.
For known objects, training an object detector is sufficient as this is a “positive” form of object detection: “Cat in the path, stop.” “Soccer ball in the path, stop.” etc.
But the most important category in object avoidance is actually the unknown unknowns.
To make up an example, imagine a person occluded in some unforeseen way, so that only part of a limb is visible, and they are wearing clothing on which a “flying taco squirrel” is the only portion visible to the perception system. A “flying taco squirrel” is unknown (as of this writing no such thing exists - but it could in the future), and it is the only visible portion of the human, so there is no possible way a “positive” form of object detection can detect it. A “positive” system requires training on the class of object - or at least on a set of things similar enough that a class-agnostic object detector can be used - and neither is possible in this case. Since we have no idea in the slightest what a “flying taco squirrel” would look like, we cannot guarantee any semblance of similarity. And worse, this example is still a “known unknown”; the problem we want to solve is the “unknown unknown.”
This is where a “negative” object detection system is required for such generic obstacle avoidance scenarios. A very effective technique is semantic segmentation of RGB, depth, or RGB+depth.
In such a “negative” system, the semantic segmentation network is trained on all the surfaces that are not objects. Anything that is not one of those surfaces is considered an object - allowing the navigation system to know its location and take commensurate action (stop, go around, turn around, etc.).
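The “negative” decision above can be sketched in a few lines. This is a minimal illustration, not Luxonis code: the class ids and the assumption that the network emits a per-pixel class map are hypothetical placeholders.

```python
import numpy as np

# Hypothetical class ids for illustration: the network labels each pixel
# as one of a small set of "safe" surfaces; everything else is an obstacle.
TRAVERSABLE = 1
SKY = 2

def obstacle_mask(seg: np.ndarray) -> np.ndarray:
    """Return True wherever the pixel is NOT a known-safe surface.

    This is the "negative" detection: the network is never trained on
    objects, only on the surfaces that are safe, so any pixel outside
    those classes is treated as an object.
    """
    return ~np.isin(seg, (TRAVERSABLE, SKY))

# Toy 3x4 segmentation map: an unlabeled blob (class 0) sitting on a
# traversable surface under sky.
seg = np.array([
    [2, 2, 2, 2],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
])
print(obstacle_mask(seg))  # True only at the two class-0 pixels
```

Note that the blob is detected without the system ever having seen that class of object - it is simply “not traversable, not sky.”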
Luxonis uses simulation here as well to train this semantic-depth-based “negative” object detection system. Luxonis has used this technique successfully in many object avoidance applications, including significantly unstructured environments such as public parks in the presence of the public.
Some public portions of that work are shared here, along with examples of the simulation environment; an example from that public talk is reproduced below:
It is worth noting that this is real-world testing of a semantic depth system which was:
- Trained only in simulation and tested on a real-world autonomous vehicle using OAK-D.
- Trained on only 80 images (intentionally, to see how quickly the network converged).
- Based on an internal semantic segmentation architecture we developed for this purpose.
As one can see, several objects that are VERY hard for traditional depth systems to perceive are picked up and properly labeled here at 10+ FPS, including (red = object, green = traversable, blue = sky):
- The chainlink fence.
a. The entire fence is properly segmented as a non-traversable object. Chainlink fences are a canonical problem for every mechanism of depth sensing (stereo, ToF, LiDAR, structured light, etc.) but are easily perceived by this semantic depth system.
- The repeating pattern of the warehouse.
a. This is a canonical problem for stereo systems.
b. And much work has gone into trying to solve it (e.g. here).
c. Despite this, with only 80 synthetic images, this semantic depth is already identifying a large portion of the warehouse correctly.
- The root beds around the trees.
a. Running over roots is one of the pernicious problems in this industry.
b. Semantic depth quickly converged to properly labeling them as objects, despite only 80 training images from simulation.
So for the unknown unknown, this sort of “negative” object detection is extremely valuable: the system never needs to have seen an object before, it only needs to know the object is not one of the safe things to drive over (or fly through, or swim through, etc.), and can thereby avoid it or stop.
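Combining that “not a safe surface” mask with per-pixel depth yields the avoid-or-stop decision. The sketch below is illustrative only: the stop threshold and the boolean-mask/depth-map interface are assumptions, not a documented Luxonis API.

```python
import numpy as np

STOP_DISTANCE_M = 1.5  # illustrative threshold, not a Luxonis parameter

def should_stop(obstacle: np.ndarray, depth_m: np.ndarray) -> bool:
    """Stop if any obstacle pixel is closer than the threshold.

    `obstacle` is a boolean mask (True = not a known-safe surface),
    `depth_m` is the per-pixel depth in meters from the depth sensor.
    Zero-depth pixels (invalid stereo matches) are ignored.
    """
    dists = depth_m[obstacle & (depth_m > 0)]
    return dists.size > 0 and float(dists.min()) < STOP_DISTANCE_M

obstacle = np.array([[False, True], [False, False]])
depth_m = np.array([[3.0, 1.2], [4.0, 5.0]])
print(should_stop(obstacle, depth_m))  # True: the 1.2 m obstacle pixel triggers a stop
```

A real system would likely also filter speckle noise and reason over regions rather than single pixels, but the core logic is this simple.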
[Known] Object Detection
Better still, semantic depth for unknown-unknown object detection can be combined with standard object detection of known objects, so that known objects can have pre-programmed behavior - e.g., as below, detecting a person and then following commands from that person:
And in parallel, the robotic system can avoid running into things it doesn't understand or has never seen before.
Together, semantic depth + object detection, when run with DepthAI, provide unknown-unknown object detection/avoidance and known-object detection (and control) - both with 3D results - so that unknown-unknown objects and known objects alike have locations in physical space, which is incredibly important for safe robotic operation.
3D hand perception, shown below, is another example of known-object detection in 3D space: