Custom Model Training

erik

What’s one of the things that makes our camera systems so powerful? Deep learning. What does that mean? Instead of having to program computers or devices to do things, you can train them. And that's huge.

Just like you can train a dog, you can train any of our powerful computer vision devices. Don’t worry, we’ll show you how.

No Training, No Problem

First off, we want to note that you don't have to train anything to get up and running with an OAK camera. There are all sorts of pre-trained models you can run right away. To name just a few sources:

Intel/OpenVINO (here)
OpenCV.ai's modelplace.ai (here)
PINTO0309's (here)
Roboflow Universe (here)

And here’s a glimpse at models you can run right away from Roboflow (a fantastic place for dataset development, labeling, and management):

There are lots more too, including reference applications you can run just by downloading and hitting run (e.g. here). But you'll want to train things, trust us.

Some History

Geoffrey Hinton (often called the godfather of AI) helped both invent and kill Deep Learning back in 1986, before it was even called that. How?

Well, he proved that it was theoretically possible to train electronics (the invention) through a method called “backpropagation,” while also making clear how much computation this would take to be useful (what killed it). In 1986, when his paper was released, it would have taken all of the world's computers longer than the rest of humanity's existence to train a single neural network. That plunged Deep Learning, and really the whole Artificial Intelligence space, into an "AI Winter". Research effectively stopped.

This all changed around 2009-2012, when folks realized that, well, all of the world's computers from 1986 were WAY less capable than a single modern GPU.

In 2009 ImageNet was released, which provided the first usable dataset (more on the importance of datasets later), and by 2012 Andrew Ng had trained a network to detect cats on YouTube. And Boom! The Deep Learning Boom was on: Siri, Alexa, Cortana, Google Photos. A whole slew of AI-based things you've heard of, and even more you haven't, all started taking over markets. Overnight at Google, hundreds of man-years of work on what had been the best algorithms in the world were outperformed and replaced by machine learning models.

This is the power of being able to train.

The world's best algorithms can be outperformed. But more importantly, all sorts of previously intractable problems are now readily solvable.

OK, I'm Convinced. Training is Cool. But How Do I Do It?

Fortunately, we have prepared open source training scripts which allow you to get to work training right away. And you can even train for free.

Below is the result of our YOLOv5 tutorial, trained on some grocery-store items. You can follow along with the tutorial that trained this yourself, here. And there are a bunch more training tutorials here covering YOLOv3, YOLOv4, MobileNet SSD, and even DeepLabv3+ for semantic segmentation.

Can't get to the store to test? No problem.

When training a model, you can test how it performs on our cameras by feeding in images or videos from your computer over USB or Ethernet. Don't have the thing you're trying to detect near you? We have you covered there too:

Feed Video (or stills) into a Luxonis camera from your computer, here
You can even stream YouTube directly into a Luxonis camera, here with `--video https://youtu.be/9rlI3Xg9g_A` (to test your custom Johnny 5 detector).

That covers how you train! It's actually not that hard these days. Machine learning sure has come a long way.

AJ_Chizu

Dear Erik,
My apologies for the bother.
I recently started training a model with the intent of using it on an OAK-D camera.
For this, I referred to https://colab.research.google.com/github/luxonis/depthai-ml-training/blob/master/colab-notebooks/Easy_TinyYOLOv4_Object_Detector_Training_on_Custom_Data.ipynb#scrollTo=FcGqlLT3un1O

I successfully trained the custom model and was able to make predictions with it on an image.
However, I then continued with the steps to convert the model so it can be used on DepthAI, as described in the article:

This requires three steps: 1. Convert model to Tensorflow frozen model. 2. Convert Tf model to OpenVINO IR files .xml and .bin. 3. Compile a blob from the IR files. The blob can be used for inference on DepthAI modules.

I converted my model's best weights to a TensorFlow frozen model and have the .pb file.

I also carried out steps 2 and 3 and have the blob file.

However, when I tried to use my model on the camera, I got these error messages:

Input tensor 'input_1' (0) exceeds available data range. Data size (519168B), tensor offset (0), size (1108992B) - skipping inference

Mask is not defined for output layer with width '1000'. Define at pipeline build time using: 'setAnchorMasks' for 'side1000'.

Input image (608x608) does not match NN (3x608)
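
One thing that can be checked from the first message alone is the arithmetic behind the two byte counts. The sketch below is only a rough check under stated assumptions: it assumes a square frame, 3 colour channels, and 1 byte per value, none of which is stated in the error text. The helper names `frame_bytes` and `square_side_for` are made up for this illustration and are not part of DepthAI.

```python
import math


def frame_bytes(width, height, channels=3, bytes_per_value=1):
    """Size in bytes of one frame (assumed layout: 3 channels, 1 byte each)."""
    return width * height * channels * bytes_per_value


def square_side_for(num_bytes, channels=3, bytes_per_value=1):
    """Return the side length of a square frame with this byte size, or None."""
    pixels, remainder = divmod(num_bytes, channels * bytes_per_value)
    if remainder:
        return None
    side = math.isqrt(pixels)
    return side if side * side == pixels else None


# Byte counts copied from the first error message.
data_size = 519168      # "Data size (519168B)"
tensor_size = 1108992   # "size (1108992B)"

print(square_side_for(data_size))    # 416
print(square_side_for(tensor_size))  # 608
```

Under those assumptions, the data being sent matches a 416x416 frame while the 'input_1' tensor is sized for 608x608. That hints at a mismatch between the frame size fed to the network and the input size the blob was compiled with, though it does not by itself show which side should change.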

I am in doubt of my file conversion procedure so I have attached a link to my model weights, .pb, .cfg and error messages.
https://drive.google.com/drive/folders/1aADq61VgaTaG86dpY-x1t9iPNzTLJQVe?usp=sharing

Kindly assist me in resolving this challenge.

Thanks

jakaskerl

AJ_Chizu Answered here.