Hi folks,
So in down-selecting our architecture we did a TON of work surveying what's available out there to do embedded machine learning. We figured we'd share this here, in case it's helpful to others who are trying to do embedded machine vision. All of the following is public information - it's just sometimes hard to find in the fray.
The highlights are the Kendryte K210 on the lowest end, the Gyrfalcon 2803 in terms of highest performance/W, and the Intel Myriad X, which takes the cake in terms of raw video-processing power (it can do 3 pairs of stereo cameras doing stereo depth simultaneously, in addition to doing MobileNet-SSD at a nice framerate).
This GitHub seems to be the goldmine of everything that's going on, here
Neural Processors:
- RockChip RK3399Pro, which is finally out, here!
- SOPHON BM1880, here, eval board order here ($129), documentation here
- Renesas, here
- Edge TPU, here, 4 TOPS
- NVIDIA Jetson Nano here
- 0.472 TOPS but leveraging a very efficient software/firmware stack which makes this 'go a lot further'
- Think the Apple days pre-Intel when they talked about how their CPUs do more per MHz. I.e. NVIDIA has done a lot of other optimization which makes the 0.472 TOPS quite effective/usable.
- Qualcomm 2 TOPS, Development Kit for sale here
- Reference design kits for sale here
- Samsung - Exynos wiki
- Exynos 9810 - has "deep learning" capability - supposedly available (Q1, 2018). Used in Samsung Galaxy S9/S9+
- Exynos 9820 - has "Neural Processing Unit" - not out until Q1, 2019?
- HiSilicon HI3559A, here
- It's in this tracking camera, here
- It seems to be the only thing on par with Apple's A12 processor, with both of them at 5 TOPS, making it the fastest thing you can actually buy for this application.
- Looks to have built-in stereo-depth hardware
- MediaTek P90 here
- One of the top performers in the Android space, but isn't fully out yet.
- Gyrfalcon 2803, here
- 24 TOPS at 1W.
- Definitely the best TOPS/W of anything available.
- The only downside is that it can only run specific types of inference because of its novel/low-power inference architecture
- Inuitive NU4000AI, here
- 2 TOPS neural, 0.5 TOPS vision processing.
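To make the numbers above easier to compare, here is a minimal Python sketch that collects the peak TOPS figures quoted in this list and sorts them. The tuple layout and the helper names (`tops_per_watt`, `ranked_by_tops`) are just for illustration. Power is only stated here for the Gyrfalcon 2803, so TOPS/W is left as "n/a" for everything else rather than guessed. Peak TOPS is also a rough yardstick at best, as the Jetson Nano note above points out.

```python
# Peak-throughput figures exactly as quoted in the list above (TOPS).
# Power is only stated for the Gyrfalcon 2803, so every other entry is None.
chips = [
    # (name, quoted TOPS, quoted watts or None)
    ("Edge TPU", 4.0, None),
    ("NVIDIA Jetson Nano", 0.472, None),
    ("Qualcomm", 2.0, None),
    ("HiSilicon HI3559A", 5.0, None),
    ("Gyrfalcon 2803", 24.0, 1.0),
    ("Inuitive NU4000AI (neural only)", 2.0, None),
]


def tops_per_watt(tops, watts):
    """Return TOPS/W, or None when no power figure is available."""
    return None if watts is None else tops / watts


def ranked_by_tops(entries):
    """Sort entries from highest to lowest quoted TOPS."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


if __name__ == "__main__":
    for name, tops, watts in ranked_by_tops(chips):
        efficiency = tops_per_watt(tops, watts)
        efficiency_text = "n/a" if efficiency is None else f"{efficiency:g} TOPS/W"
        print(f"{name:34s} {tops:6.3f} TOPS  {efficiency_text}")
```

Filling in the `None` entries with measured or datasheet power numbers would turn this into a proper TOPS/W comparison.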
New Special Category: Fully Embedded
- Apollo3 from Ambiq, here, which is in the SparkFun Edge (here) and can run voice recognition with TensorFlow Lite (see Pete Warden's talk, here)
- Kendryte K210 here
- PCB solutions start at $9, here
- Lowest-end and incredible value/price IF it works well
- But sketchy as to what is actually supported
- The GreenWave GAP8 solution here looks really neat and is super low power. Can do low-res video at low power and presumably audio/sensor processing at low power. Check out their store here
- They have an example doing video-based object avoidance for a drone. Really cool!
- It's the PULP-DroneNet, here and it's so clever and effective. I bet a ton of tiny drones will be using this for automatic and cheap obstacle avoidance.
- Check out their videos here and here
Hosts:
- Allwinner solutions - super inexpensive (see Hackaday article here), but the downside is that they're relatively power hungry and a bit physically large (which is great for hobbyists, as they're easy to hand-solder).
- NXP i.MX 7 series, here - great/low power use, really small physical size. More on the expensive side (presumably better if/when in volume).
- Nationalchip GX6605S SoC, devboard here, Linux images here
Resources:
- An Android AI chip benchmarking site, here
- Plot comparison in terms of TOPS/W vs. W here
- en.wikichip.org - Cool resource with all kinds of semiconductor mfr/device information
Comments
So the only thing that really 'competes' with the Myriad X currently is the HI3559A (EDIT: and the NU4000AI), but as far as I can tell the HI3559A won't be applicable to US markets because of legal restrictions. Does anyone know more about this?
It sure would be awesome for Intel to release a Myriad X with say a built-in RISC-V or similar, so that say Debian can be run on it directly, and it could then act as its own host. It's the most capable of the bunch, but it sure would be nice to not have to have a host processor alongside it in order to run Linux/etc.
The BM1880 is a version of this, but doesn't have the video processing hardware capabilities of the Myriad X. The RK3399Pro is another version of this, but it doesn't seem to be out yet.
The integrated host solution, particularly if it could run Debian, would dominate the market of embedded computer vision/machine learning, I think. Thoughts on that, everyone?
On the low end, the Kendryte K210 seems like it's the winner. Its 'big' development board is $50 (here), including screen and camera. The smaller development board with camera and display is $20. And a PCB module is $9, including WiFi, here. It's got a pretty fast-growing community and set of hardware modules available for it.