eugenek it usually takes some time, did you give it a few minutes? Anyway, it's great that you got 8083 to work. Also, when compiling for an OpenVINO version that's not the latest one (used by the depthai lib), you have to specify the version in your Python script, e.g. pipeline.setOpenVINOVersion(version=dai.OpenVINO.Version.VERSION_2021_1).
So it could be that your model actually outputs more "labels" than you populated in labelMap. Printing the output from the NN node can provide info about it. If you specified 224x224 as the input, you should set the camera preview size to 224x224 (otherwise you will get an error every time the NN node tries to run an inference).
OpenVino'ing trained MobileNet to run on OAK-D
erik Yep, I gave 8084 some time.
So, I specified the OpenVino version and printed out some stuff from the NN. Indeed, it sometimes outputs an absolutely tremendous number of labels, so I guess I got it all wrong while converting the TF2 model to a blob.
As I mentioned in my previous post, I think I'm failing to export frozen graph.
I found a few ways to do that online, all quite different, tried them all with no luck.
I used basic TF2 stuff and bare MobileNet v2 model in my project.
The vast majority of examples that eventually work with DepthAI rely heavily on the TF API and on pretrained models from https://github.com/tensorflow/models.git. But they all require a labelled data set, with XML files sitting next to each and every image. There's no example showing how to make it work with unlabeled data sorted by folders (like I described in my first post in this thread, or as shown here: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image_dataset_from_directory). Is that possible? How can I make it work?
Thanks,
Eugene.
Anyone? Anything?
Hello,
a high number of labels/bounding boxes can also be caused by a low (or non-existent) confidence threshold.
So to get a blob, you first have to convert from TF2 to OpenVINO IR. Intel has many resources on how to do this, e.g. https://github.com/openvinotoolkit/openvino_notebooks. After getting the IR (.bin and .xml) you can compile it to a .blob.
To label a dataset there are many tools to help you with that, for example Roboflow and SuperAnnotate. I believe you can then export the labels in the desired format.
Thanks, Erik
erik
Thanks for that.
This is exactly the problem. I have tried mo_tf.py before, and it yells that:
Cannot load input model: TensorFlow cannot read the model file: "saved_model.pb" is incorrect TensorFlow model file.
The file should contain one of the following TensorFlow graphs:
1. frozen graph in text or binary format
2. inference graph for freezing with checkpoint (--input_checkpoint) in text or binary format
3. meta graph
I am looking for a reliable guide to creating a frozen graph from a TF2 model. Do you think opening a new thread titled "Converting TF2 model to frozen graph" would help draw the attention of someone who knows?
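For what it's worth, the recipe that usually works for freezing a TF2/Keras model is convert_variables_to_constants_v2 (a sketch, using a stock MobileNetV2 as a stand-in for your trained model):

```python
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2,
)

model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

# Wrap the model in a concrete function and freeze its variables into constants
full_model = tf.function(lambda x: model(x))
concrete = full_model.get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype)
)
frozen = convert_variables_to_constants_v2(concrete)

# Write the frozen GraphDef that mo_tf.py expects as --input_model
tf.io.write_graph(frozen.graph, ".", "frozen_graph.pb", as_text=False)
```

The resulting frozen_graph.pb should then be accepted by the model optimizer as a "frozen graph in binary format".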
While I'm trying to make that example work with my set of images, I cannot ignore that training MobileNetV2 from scratch in TF2 takes roughly 7 times less time (on the same GPU) than following the steps from the example above (i.e. using a pre-trained model), and the end result (detection confidence) is higher as well.
I must mention that I run stuff on my local PC, as the data I'm using is classified company information, so I have no permission to upload it.
Additionally, I fully acknowledge my limited experience with AI stuff, and I simply can't find good info online. That's why I need some help over here.
Thanks,
Eugene.
Ok,
there was some progress in my attempts to get my stuff running.
Basically, I have bumped into this tutorial - great stuff!
There are two interesting sections in it. Here model predictions are checked, and here Openvino IR is validated on test images.
I modified both validation routines so they run inference on a BIG bunch of test images, and the results are good.
Next, I converted the IR to a blob (tried both locally and using http://69.164.214.171:8080/) and ran it on the OAK-D, as shown here.
And the classification is all over the place. Total mess.
I have even created a video from my test images (which were successfully classified before blob creation) and fed it directly to the OAK-D brains, to bypass potential "shaky hands" and "not too good viewing angles", and yet it fails to classify properly almost all the time.
What can cause that?
SOS!
Thanks,
Eugene.
Hi eugenek ,
Sorry about the trouble. Would you be able to send the model and/or dataset being trained to support@luxonis.com so we can try it out and see what is happening?
Thanks,
Brandon
You mentioned:
Additionally, when I run inference on the PC, I scale the colours (dividing by 255). Should I do something like this in the camera script? I can't find anything resembling colour scaling in the Python API.
OpenVino supports preprocessing of the input image.
The most common source of error is not specifying the scale, mean values, or reversing the channel order (E.g. OpenVino expects that the input image is BGR but the model requires RGB).
Here are all the command line arguments for OpenVino's model optimizer:
https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html
Therefore, when converting from TF to OpenVino you should specify --scale_values [255,255,255]; the model optimizer will add additional preprocessing layers at the beginning of the model (you can check the generated .xml).
Additionally, specifying -ip U8 at the blob conversion step tells the blob compiler that the input image is U8; in the first layer of the model it will be promoted to FP16, after which preprocessing will happen (scale, mean, reverse input channels, if defined).
Additionally, here is a very useful FAQ section for OpenVino's model optimizer:
https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html
Note that these settings depend on the input image format defined in the depthai pipeline settings (BGR or RGB, U8 or FP16).
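To make the arithmetic concrete, here is what those injected preprocessing layers effectively do to a U8 input (a pure-numpy sketch; the pixel values are illustrative):

```python
import numpy as np

# A U8 BGR pixel as it comes out of the camera node
bgr_u8 = np.array([10, 20, 200], dtype=np.uint8)

# 1. Promote to FP16 (what -ip U8 implies for the first layer)
x = bgr_u8.astype(np.float16)

# 2. Mean/scale layers added by --mean_values / --scale_values
mean = np.float16(0.0)
scale = np.float16(255.0)
x = (x - mean) / scale  # now in [0, 1], like dividing by 255 on the PC

# 3. Optional channel reversal (--reverse_input_channels): BGR -> RGB
x_rgb = x[::-1]

print(x_rgb)  # the scaled RGB values the model actually sees
```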
GergelySzabolcs thanks a lot. I kinda figured out most of this stuff by experimenting. Great insight and explanations. Thanks again!
Brandon, I believe that if that stuff (which GergelySzabolcs described above) were covered in the help and how-to manuals in the Docs section, my life, and possibly the lives of other beginners, would have been significantly easier. Yes, all the info is available online, but it is scattered, and none of these systems (TF, OpenVINO) is newbie-friendly. It actually took me an awful lot of time to realise OpenVINO can add pre-processing layers, although now it sounds so logical and makes so much sense...
Thank you for this info! As eugenek mentioned (well, I agree with everything), it's hard to find info when you're a rookie, as it's scattered and often at a much more advanced level; it can be hard to connect the dots and see how everything hangs together.
By the way, the "Tutorial - SSD MobileNetv2 training with custom data" is made for TensorFlow 1. Would you recommend going with TF1 for the OAK, or is it better to run TF2?
Best,
Martin
By the way, the "Tutorial - SSD MobileNetv2 training with custom data" is made for TensorFlow 1. Would you recommend going with TF1 for the OAK, or is it better to run TF2?
--> TF1 or TF2 will not make any difference in performance unless there is a difference in the model implementation you are using; if you use the vanilla deeplab from TF1 or the vanilla deeplab2 from TF2 there will be no meaningful difference (AFAIK the publishers did not change the actual underlying architecture between the two, at least in the default settings; I think there are some wider atrous convolutions available in deeplabv2 if you go with some of the more exotic backbones, maybe...?).
Where it does make a difference: if you have a newer GPU which only supports CUDA 11 or newer, you will need to set up your project in TF2, or use the very handy tutorial from Puget to set up TF 1.15 to work with your GPU: https://www.pugetsystems.com/labs/hpc/How-To-Install-TensorFlow-1-15-for-NVIDIA-RTX30-GPUs-without-docker-or-CUDA-install-2005/
Hi!
Thank you for your answer. I'm trying to make things work with ssd_mobilenet_v2_320x320_coco17_tpu-8, and as I understand it this is better done with TF2. I'm also focusing more on TF2 as it feels like the right long-term choice.
The reason I'm going for the SSD MobileNet is really more a coincidence than a well-thought-out choice.
I'm looking for a model where accuracy is more important than speed, but as the objects to be detected are large, slow-moving and stand out a bit from the environment, I think all models should be able to handle the task. The goal is to have a model that does real-time object detection on an OAK-D PoE.
So, what I'm really looking for is a good "newbie model" that makes it as easy as possible to create a functioning application to start with. Is there any model that is "easier" to work with than others?
I have a bunch of images and XMLs that I would like to try, and as this is just something to learn more about the overall method, the final performance is not that important.
Do you have any suggestions, or is the choice of model less relevant when it comes to the complexity of development/procedure?
thanks again,
Martin
MartinL Do you have any suggestions, or is the choice of model less relevant when it comes to the complexity of development/procedure?
MobileNetV2 is going to be a great choice for running your model in realtime on the OAK-D, so that's a good one to continue with. There's really no right or wrong answer there, though: you might find a YOLO model easier to start with if you are just starting with DNNs, because you can train the model with Darknet and don't need to learn the TensorFlow API, which is somewhat messy to be honest (although it can do a lot more than Darknet once you get the hang of it). It's also a lot easier to set up your data for ingestion into Darknet compared to TensorFlow in most cases.
This tutorial from Luxonis is excellent if you just want to get started and run a model on the OAK:
https://colab.research.google.com/github/luxonis/depthai-ml-training/blob/master/colab-notebooks/YoloV3_V4_tiny_training.ipynb