• DepthAI-v2
  • Converting a custom ssdlite_mobilenet_v2

Hi all, I'm using the OpenVINO toolkit to convert my custom ssdlite_mobilenet_v2 ONNX file for use with my Luxonis DepthAI camera in a modified version of the hello_world.py example. Here are the steps I've taken:

1) Convert onnx to IR format (.xml and .bin):
python3 mo.py --input_model mb2-ssd-lite.onnx --output_dir <output_dir> --input_shape [1,3,300,300] --data_type FP16 --reverse_input_channels --framework onnx

2) Then convert IR to blob format:
./myriad_compile -m mb2-ssd-lite.xml -o mb2-ssd-lite.blob -ip U8 -VPU_NUMBER_OF_SHAVES 6 -VPU_NUMBER_OF_CMX_SLICES 6

I then modified the hello_world.py example to use the resulting mb2-ssd-lite.blob model. Everything runs fine; however, when I parse the returned packet it does not have the format specified in the OpenVINO documentation.
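For reference, the relevant part of my modified hello_world.py looks roughly like this (just a sketch from memory; the stream and variable names are whatever I use locally):

import depthai as dai

pipeline = dai.Pipeline()

# Color camera producing 300x300 BGR preview frames for the network
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(300, 300)
cam.setInterleaved(False)

# Generic NeuralNetwork node that loads the compiled blob
nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("mb2-ssd-lite.blob")
cam.preview.link(nn.input)

# Stream the inference results back to the host
xout_nn = pipeline.create(dai.node.XLinkOut)
xout_nn.setStreamName("nn")
nn.out.link(xout_nn.input)

with dai.Device(pipeline) as device:
    q_nn = device.getOutputQueue("nn", maxSize=4, blocking=False)
    in_nn = q_nn.get()  # the NNData packet I then try to parse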

We expect this:

In [1]: in_nn.getAllLayerNames()
Out[1]: ['detection_out']

But get this instead:

In [1]: in_nn.getAllLayerNames()
Out[1]: ['boxes', 'scores']

Here 'boxes' and 'scores' are the bounding boxes and confidences, but that's all I receive in the packet, unlike 'detection_out', which contains [image_id, label, conf, x_min, y_min, x_max, y_max].
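For context, my parsing assumes a single SSD DetectionOutput-style layer, along the lines of this sketch (the layer name 'detection_out' and the 0.5 threshold are just placeholders):

import numpy as np

# in_nn is the NNData packet obtained from the queue in the snippet above.
# A detection_out layer has shape [1, 1, N, 7], one row per detection:
# [image_id, label, conf, x_min, y_min, x_max, y_max]
dets = np.array(in_nn.getLayerFp16("detection_out")).reshape(-1, 7)
for image_id, label, conf, x_min, y_min, x_max, y_max in dets:
    if conf > 0.5:
        print(int(label), conf, (x_min, y_min), (x_max, y_max))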

**Could it be that I'm doing my conversion incorrectly in steps 1 and 2 above?**

Sorry for the long post! Thanks in advance!
Tim

Hi Tim,

Since your network is MobileNet-SSD based, you can try our MobileNetDetectionNetwork node - you can start from this example where it's used and modify it further.
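Not the linked example itself, but roughly what swapping that node in looks like (the blob path and 300x300 input size are assumptions based on your post, and it assumes the blob ends in the standard single SSD DetectionOutput layer):

import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(300, 300)
cam.setInterleaved(False)

# MobileNetDetectionNetwork decodes the SSD output on-device, so the host
# receives ready-made ImgDetections (label, confidence, normalized bbox)
nn = pipeline.create(dai.node.MobileNetDetectionNetwork)
nn.setBlobPath("mb2-ssd-lite.blob")
nn.setConfidenceThreshold(0.5)
cam.preview.link(nn.input)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("detections")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("detections")
    for det in q.get().detections:
        print(det.label, det.confidence, det.xmin, det.ymin, det.xmax, det.ymax)

Custom classes aren't a problem for the node itself; det.label is just an index that you map to your own label list on the host.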

Hi, thanks for your response. Unfortunately, there are two issues with using your model:

1) Mine is custom trained on different classes
2) I am using the lite version of ssd mobilenet v2

Can anyone advise what may be wrong with the packet my custom model provides compared with the OpenVINO model? Please read above for details. And thanks again!


    Hello Tim,
    I assume that's just how your ONNX model architecture is. You can check it with https://netron.app/; I would assume you will see two outputs. That's not a problem though, since our API lets you read multiple outputs - there's an example here (that one actually has three outputs: bounding boxes, confidences, and labels).
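    Something along these lines (just a sketch; the layer names are the ones from your getAllLayerNames() output, and the shapes are guesses you would have to check against your model):

    import numpy as np

    # in_nn is the NNData packet from the NeuralNetwork output queue
    boxes = np.array(in_nn.getLayerFp16("boxes"))
    scores = np.array(in_nn.getLayerFp16("scores"))
    # Reshape to whatever your model actually emits; an SSD-lite head typically
    # gives [num_priors, 4] boxes and [num_priors, num_classes] scores
    boxes = boxes.reshape(-1, 4)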
    Thanks, Erik

    Hi Erik thanks for your input.

    I am using the model elsewhere, in a different Xavier NX environment, and before conversion with the OpenVINO toolkit it does conform to the output specification given here:

    Output Original Model:

    1. Classifier, name: detection_classes. Contains predicted bounding-boxes classes in a range [1, 91]. The model was trained on Microsoft* COCO dataset version with 91 categories of object, 0 class is for background. Mapping to class names provided in <omz_dir>/data/dataset_classes/coco_91cl_bkgr.txt file.
    2. Probability, name: detection_scores. Contains probability of detected bounding boxes.
    3. Detection box, name: detection_boxes. Contains detection boxes coordinates in format [y_min, x_min, y_max, x_max], where (x_min, y_min) are coordinates of the top left corner, (x_max, y_max) are coordinates of the right bottom corner. Coordinates are rescaled to input image size.
    4. Detections number, name: num_detections. Contains the number of predicted detection boxes.

    I would like to use this same model on the OAK-D. So after conversion with the toolkit, I expect to get:

    Converted Model
    The array of summary detection information, name: DetectionOutput, shape: [1, 1, N, 7], where N is the number of detected bounding boxes. For each detection, the description has the format: [image_id, label, conf, x_min, y_min, x_max, y_max].

    However, I'm only receiving bounding boxes and confidences in the packet when using the OAK-D API.

    Thoughts?


      Hello Tim, I doubt the OAK/DepthAI API has anything to do with this; more likely something went wrong during the conversion of the model. You can check your model in IR form (.xml and .bin) by uploading the XML to https://netron.app/ - I would expect you to see the two outputs you are describing. I would also compare your model against the one from the OpenVINO Model Zoo to see whether the architecture is the same. If you provide some screenshots (or even the XML) we might be able to help you debug.
      Thanks, Erik