Will do. Pinging team.
Brandon

For anyone needing to maximize the FOV, see here:
We recently wrote up this detailed how-to.
- In Deeplab C++
Yes, it means that M1 will just install w/out any extra work. It will be equivalent to x86 Mac, x86 Linux, and all the standard builds we have now.
So this should be fully out this week when we do another release with this system.
This is not going to be automatically built:
https://github.com/luxonis/depthai-python/pull/604

And sorry for missing your comment @gregflurry. Fortunately this will now be done as soon as this finishes building (i.e. tomorrow).
Looks doable. Investigating how hard.
Thanks. Fixed it above. And for the questions, tagging @Luxonis-Alex .
What's going to win long-term in autonomous driving? Tesla? Waymo?
My view on this subject is largely not shared by the industry. The industry tends to fall into 2 camps:
- LiDAR. "No one ever got fired for buying LiDAR." The Waymo camp.
- Monocular depth. "LiDAR is a crutch - solving vision is what really matters." The Tesla camp.
And I actually subscribe to a secret option 3, which says that 2 is mostly right, but forgets how valuable more information is - and that 1 is a crutch which becomes insufficient once AI/CV has matured.
Why 2 is mostly right:
Solving vision is where all the value lies. That's where the real context is. That's where the additionally monetizable data lies (e.g. it's hard to pick up child-traffickers from LiDAR data).
With monocular, information is fundamentally missing. And so to make up for the missing information, the time-domain is used instead. This makes monocular slower, worse-performing, higher-latency, and gives it awful corner cases where it doesn't work at all. The idea is that, for lack of the alternate views of an object (which provide the neural network with the information necessary to know depth), a priori knowledge of similar scenes and/or the time-domain is used to stand in for the alternate views... but in some cases the time-domain will not have the requisite information for that. Or it will just result in worse depth or bad latency.
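(To make the "more information" point concrete: with a second view, depth falls out of a single disparity measurement via the standard stereo relation - no time-domain and no learned prior needed. A minimal sketch; the 7.5 cm baseline matches OAK-D, the focal length is just an illustrative value:)

```python
def stereo_depth_m(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Classic stereo relation Z = f * B / d: two views of the same point give
    depth directly from how far that point shifts between the views."""
    return focal_length_px * baseline_m / disparity_px

# Illustrative numbers (assumed, not measured): f ~= 800 px, B = 0.075 m (OAK-D baseline), d = 2 px
print(stereo_depth_m(800, 0.075, 2))  # -> 30.0 meters
```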
Why 1 is a distraction when playing the long-game:
Shortcuts are great for winning the short game. LiDAR is a shortcut. In college back in 2006 there was a robotics competition where you had to navigate a maze, avoid obstacles while doing so, pick up an object, and then repeat all of that to return it back to where the robot started. I competed against people way smarter than me, with way more experience.
I knew I couldn't compete with those guys. I knew the proper navigation planning required was way more than I could accomplish. So I looked for a shortcut that let me avoid it entirely. I experimented with how accurate the motor encoders could be. And since it was a controlled/indoor environment, they turned out to be SUPER accurate. After tweaking and some trial/error to get an idea of when they had issues, and by how much, my team and I were able to literally solve the whole problem hard-coded. We literally hard-coded all the steps required, and the motor-controllers/encoders/wheels/arm/etc. were good enough to do it.
So our robot looked FREAKING AWESOME and did the whole thing on the first try, completing every challenge (you got points along the way for each thing you passed, as a tie-break in case no one completed the whole thing), compared to the best competitor, which got at most 50% of the solution.
Now this was cool, and we won and had prizes and stuff - but it was freaking useless. It was a shortcut to make something impressive, fast. It was a crutch. And if you wanted to build off of this, you couldn't. You'd have to just start over.
I view LiDAR the same way. Since you get accurate sparse-point measurements short-range and long, you can make something that drives well in many conditions, easily/quickly. It's like the hard-coding. The trouble is that LiDAR is sparse in comparison to CV. It gives enough information to "demo". But when you go from winning this short-term competition of who can look like they're further along, faster (just like I did with hard-coding) to actually trying to make a full-production solution that matters to the world, and is scalable - LiDAR doesn't have the requisite information. Vision does.
And don't get me wrong, CV + LiDAR is great for super safety-critical stuff. But CV is where the real value is. LiDAR is then an idiot-check hard-stop backup system, just like most life-critical systems have.
But that LiDAR backup system is still missing a lot of information. So ultimately I think a redundant CV system will win. As then you have 2x systems with sufficient information to "really understand".
And this brings up another point: any LiDAR-based solution that wants to "get serious" also will need CV, as LiDAR doesn't have enough information. So eventually, LiDAR-heavy teams end up having to solve CV to win/scale.
And in the long run, ignoring investor optics, the demands to show progress, etc. - tech-stack-wise, LiDAR is actually a distraction, as LiDAR solutions can't truly operate robustly without CV. And so the more time put into it, the less time goes into solving CV.
That said, for a startup in autonomous driving trying to wow investors, LiDAR is absolutely the right choice. Just like in that robotics competition, using the shortcut produces a huge WOW effect. And that's super useful for closing funding rounds etc. It's just a distraction to the tech-stack development. But if you close a $1 billion funding round because of it - it's what enabled building the right tech stack.
And this is why I actually think right now, for any autonomous driving company in startup war-mode, LiDAR is the right choice. But they need to keep their eye on the long term by using LiDAR to catapult their finances and then pivot to CV.
Note that the above is purely an analysis WRT 75 mph+ autonomous driving for moving people (e.g. Tesla, Waymo, etc.). For autonomous mobile robots (AMRs; forklifts, food delivery, etc.) there are similar trades, but vision becomes even more of a "no brainer". In people-moving the speeds are 75 mph+, which requires depth vision to 350 meters and beyond, which prior to DepthAI and OAK was "hard". Whereas for autonomous mobile robots (typically <<75 mph), the depth-sensing needs are "not hard". So LiDAR is an even worse choice for such AMRs: the transition to vision will happen sooner there, so the risk/failure-probability of investing in LiDAR is significantly higher - and the "WOW factor" is largely non-existent - while conversely there are insane "WOW factor" capabilities from DepthAI/OAK-based vision on such platforms that are nearly impossible to pull off with LiDAR. Particularly when factoring in that on AMRs cost is a lot more sensitive, so the LiDARs used have to be sparser and have even worse performance than vision.
And likely if this is read, the point will be made that "LiDAR isn't sparse", followed by a response (by me) showing that you can build a 360° stereo-depth CV solution with 36.8 million depth points and 300+ meter range for <$900. And you actually just can't build that in LiDAR. No company can do so. And anything coming close is $100,000.
So not only does vision provide the long-term value, it's also orders of magnitude less expensive.
Hi Brenden,
The last communication from your employer to Luxonis was a threat of legal action against Luxonis.
As such, we cannot engage with you and/or your employer. We ask that you do not use Luxonis' solutions in any way, and that you refrain from contacting or communicating with Luxonis staff in any way.
Thank you,
Brandon
CEO \ Luxonis
Hi Brenden,
Please shoot me an email at brandon at luxonis dot com and we can discuss there.
EDIT: Please do not email us. We did not realize who your employer was. More in the next post.
Thanks,
Brandon

Oh sorry, we missed this. You can do other focus distances, including autofocus. The RGB-depth alignment may just not be quite as accurate.
I'd recommend giving this a shot and see how it works out. I suspect it will be fine. If it isn't fine, then I'd recommend re-calibrating at the focal distance of interest, using a large Charuco board.
But I think it will just work.
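For reference, a quick sketch of what trying this could look like in depthai-python (the lens position here is just an example value, not a recommendation):

```python
import depthai as dai

pipeline = dai.Pipeline()
camRgb = pipeline.create(dai.node.ColorCamera)

# Option A: lock the lens at a chosen position (0..255) and check RGB-depth alignment there
camRgb.initialControl.setManualFocus(120)  # example value only

# Option B: leave continuous autofocus on and see how well the alignment holds up
# camRgb.initialControl.setAutoFocusMode(dai.CameraControl.AutoFocusMode.CONTINUOUS_VIDEO)
```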
Thoughts?
Thanks,
Brandon

Object Avoidance
This problem involves avoiding both objects seen before and objects never seen before. The approach Luxonis likes to take for such tasks is to use at least semantic depth, usually in addition to known-object detection, depending on the needs of a given application.
Semantic Depth for Unknown Unknown Object Detection and Avoidance
One of the classic problems in autonomous robotic navigation or actuation is to not impact both known and unknown objects. Known objects are things that are known a priori to be encountered at the installation - such as tools, other machines, workers, equipment, and facilities. Unknown objects are things that may not be anticipated - or even things that are completely unknowable or never-before-seen.
For known objects, training an object detector is sufficient as this is a “positive” form of object detection: “Cat in the path, stop.” “Soccer ball in the path, stop.” etc.
But the most important thing in object avoidance is actually unknown unknown items.
To make up an example, imagine a person in some unknown form of occlusion where only part of a limb is visible, and they are wearing clothing with a "flying taco squirrel" on it, which is the only portion visible to the perception system. Given that a "flying taco squirrel" is both unknown (as of this writing no such thing exists - but it could in the future) and the only visible portion of the human, there is no possible way that a "positive" form of object detection will be able to detect such an object. A "positive" system requires being trained on the class of object - or at least a set of things that are similar enough that a class-agnostic object detector can be used - neither of which are possible in this case. (And since we have no idea in the slightest what a "flying taco squirrel" would look like, we cannot guarantee any semblance of similarity. And worse, this is a "known unknown". The problem we want to be able to solve is the "unknown unknown".)
And this is where a “negative” object detection system is required in such generic obstacle avoidance scenarios. And a very effective technique is to use semantic segmentation of RGB, Depth, or RGB+Depth.
And in such a "negative" system, the semantic segmentation system is trained on all the surfaces that are not objects. So anything that is not one of those surfaces is considered an object - allowing the navigation to know its location and take commensurate action (stop, go around, turn around, etc.).
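As a rough illustration of the "negative" logic (not Luxonis' production code - the class id, region of interest, and threshold below are made up for the example), the decision reduces to: anything in the robot's path that is not labeled as a known-safe surface is treated as an obstacle:

```python
import numpy as np

TRAVERSABLE = 1  # hypothetical class id for the trained "safe surface" class

def obstacle_ahead(class_map: np.ndarray, stop_ratio: float = 0.02) -> bool:
    """class_map: HxW per-pixel class ids from the semantic segmentation network.
    Looks at the lower-middle region of the image (the robot's immediate path) and
    flags anything not labeled as a safe, traversable surface - which automatically
    includes objects the network has never seen before."""
    h, w = class_map.shape
    roi = class_map[h // 2:, w // 3: 2 * w // 3]  # bottom half, middle third
    return float((roi != TRAVERSABLE).mean()) > stop_ratio

# Usage: seg = class map decoded from the NN output; stop or re-plan if obstacle_ahead(seg).
```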
Luxonis will use simulation here as well to train this semantic-depth-based "negative" object detection system. Luxonis has used this technique with success in many object avoidance applications, including in significantly non-structured environments such as public parks in the presence of the public.
Some public portions of that work, along with examples of the simulation environment, are shared here; an example from that public talk is reproduced below:
It is worth noting that this is real-world testing of a semantic depth system which was:
- Trained only in simulation and tested on a real-world autonomous vehicle using OAK-D.
- Trained only on 80 images (intentionally, to see how quickly the network converged)
- Based on an internal semantic architecture which we developed for this purpose
As one can see, several objects that are VERY hard for traditional depth systems to pick up properly are picked up here, and properly labeled at 10+ FPS, including (red = object, green = traversable, blue = sky):
- The chainlink fence.
  a. The entire fence is properly segmented as an object that is not traversable. Chainlink fences are a canonical problem for every mechanism of depth sensing (stereo, ToF, LiDAR, structured light, etc.), but are easily perceived by this semantic depth system.
- The repeating pattern of the warehouse.
  a. This is a canonical problem for stereo systems.
  b. And much work has gone into trying to solve it (e.g. here).
  c. Despite this, with only 80 synthetic images, this semantic depth system is already identifying a large portion of the warehouse correctly.
- The root beds around the trees.
  a. Running over roots is one of the pernicious problems in this industry.
  b. And semantic depth quickly converged to properly labeling them as objects, despite only 80 training images from simulation.
So for unknown-unknown, this sort of "negative" object detection is extremely valuable. As you don't need to have ever seen it before, you can just know it's not one of the safe things to drive over (or fly through, or swim through, etc.) and thereby avoid it or stop.
[Known] Object Detection
And best of all, the semantic-depth unknown-unknown object detection can be combined with standard object detection of known objects, so that known objects can have pre-programmed behavior. E.g. like below, detecting a person and then following commands from that person:
Source: https://github.com/geaxgx/depthai_hand_tracker

And in parallel, the robotic system can then not run into things that it doesn't understand or has never seen before.
Summary
Together, semantic depth + object detection, when run with DepthAI, can give unknown-unknown object detection/avoidance and known-object detection (and control) - with both giving 3D results - so that the unknown-unknown objects and the known objects have locations in physical space, which is incredibly important/necessary for safe robotic operation.
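For the known-object half, a minimal depthai-python sketch of getting detections with 3D coordinates computed on-device is below. This is an illustration, not the exact pipeline from the examples above; the blob path and threshold are placeholders:

```python
import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(300, 300)
cam.setInterleaved(False)

monoL = pipeline.create(dai.node.MonoCamera)
monoR = pipeline.create(dai.node.MonoCamera)
monoL.setBoardSocket(dai.CameraBoardSocket.LEFT)
monoR.setBoardSocket(dai.CameraBoardSocket.RIGHT)

stereo = pipeline.create(dai.node.StereoDepth)
stereo.setDepthAlign(dai.CameraBoardSocket.RGB)  # align depth to the RGB detections
monoL.out.link(stereo.left)
monoR.out.link(stereo.right)

nn = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
nn.setBlobPath("mobilenet-ssd.blob")  # placeholder blob path
nn.setConfidenceThreshold(0.5)
cam.preview.link(nn.input)
stereo.depth.link(nn.inputDepth)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("detections")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("detections", maxSize=4, blocking=False)
    while True:
        for det in q.get().detections:
            # spatialCoordinates are millimeters relative to the camera
            print(det.label, det.spatialCoordinates.x, det.spatialCoordinates.y, det.spatialCoordinates.z)
```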
3D hand perception shown below as another known object detection in 3D space:
Source: https://github.com/geaxgx/depthai_hand_tracker/tree/main/examples/3d_visualization#3d-visualization-and-smoothing-filter

Our lowest-cost model launched late last year and has been shipping for a while now.
Our next gen is also releasing now. OAK-D Series 2.
And these are available for early-adopters on our Beta store:
https://shop.luxonis.com/collections/beta-store

Not immediately sure what is going on here, but I'm thinking trying to run w/out the QT GUI will likely work around it for now. Will ask how to do so.
Thanks,
Brandon

As a quick update, the load is because we're encoding all the frames on OAK-D-Lite. And so the Pi is having to decode them, which it is struggling with. We didn't realize it was this slow at decoding, at least with OpenCV. And this is what's causing maxed-out CPU use.
So we're evaluating:
- A faster way to decode on the Pi
- Not encoding at all on OAK-D-Lite (so that the Pi doesn't have to decode) - roughly sketched below
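That second option might look roughly like this on the host side (a sketch, not the final fix): stream a small unencoded preview so the Pi never has to touch a video codec:

```python
import cv2
import depthai as dai

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(640, 360)  # keep the preview small so the host has little to move/draw

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("preview")
cam.preview.link(xout.input)  # raw BGR frames: no H.264/MJPEG decode on the Pi

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("preview", maxSize=4, blocking=False)
    while True:
        cv2.imshow("preview", q.get().getCvFrame())
        if cv2.waitKey(1) == ord("q"):
            break
```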
Thanks,
Brandon

Oh good idea! Will discuss with team!
Thanks for the report. I think the heavy load is from getting the uncompressed video streams back. So if you just take metadata back, the load should drop close to zero.
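For reference, "metadata back" here means linking only the detection output over XLink, with no image stream at all; a rough sketch (the blob path is a placeholder):

```python
import depthai as dai

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(300, 300)
cam.setInterleaved(False)

nn = pipeline.create(dai.node.MobileNetDetectionNetwork)
nn.setBlobPath("mobilenet-ssd.blob")  # placeholder blob path
nn.setConfidenceThreshold(0.5)
cam.preview.link(nn.input)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("nn")
nn.out.link(xout.input)  # only detection metadata goes back to the host - no frames

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("nn", maxSize=4, blocking=False)
    while True:
        for det in q.get().detections:
            print(det.label, round(det.confidence, 2))
```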
We will also see about doing lower-resolution previews on the Pi to limit CPU use.
On the exception - we’ll investigate, thanks.
And on the difference in display - this is because the Pi doesn't work well with heavier GUIs/etc., so we disable it on the Pi to save CPU and installation complexity.
And we’ll likely further optimize the Pi version to save CPU.
Thanks,
Brandon

Thanks. I know there's a solution to this but I can't quite remember it. Asking the team.
Thanks again,
Brandon

Also, in exciting news, we're finally on the Commute Guardian itself!
Yes, let me ping Erik and he'll suggest the best thing to leverage here.