Hi!

I trained a custom grayscale YOLOv8 model using the ultralytics library and I want to use it on an OAK device. The only difference in the model is that it was modified to accept 1-channel images instead of 3-channel images.

When I try to convert it from .pt to .blob using https://www.tools.luxonis.com/ it returns "Error while converting to onnx".

I also tried to:

  1. export the .onnx and .xml/.bin files using the ultralytics library

  2. convert them to .blob using https://blobconverter.luxonis.com/

  3. insert the blob into the pipeline as

    detection_nn = pipeline.create(dai.node.YoloDetectionNetwork)
    detection_nn.setBlobPath(PATH)
    detection_nn.setNumClasses(8)          # classes in the custom model
    detection_nn.setCoordinateSize(4)      # x, y, w, h
    detection_nn.setAnchors([])            # YOLOv8 is anchor-free
    detection_nn.setAnchorMasks({})        # so no anchor masks either
    detection_nn.setIouThreshold(0.5)
    detection_nn.setNumInferenceThreads(2)

but when I run the pipeline it returns this error: [DetectionNetwork(3)] [error] Mask is not defined for output layer with width '3549'. Define at pipeline build time using: 'setAnchorMasks' for 'side3549'. This happens even though YOLOv8 is anchor-free and uses no masks.

Has anyone had this problem and managed to solve it?

To deploy your custom grayscale YOLOv8 model on the OAK device, you need to convert it into the MyriadX blob format. This process involves two steps:

  1. Convert the model into OpenVINO's Intermediate Representation (IR) format using the Model Optimizer.

  2. Compile the IR into a .blob file using OpenVINO's compile_tool.
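If you prefer to script this rather than use the web tools, step 1 can be done from Python. A minimal sketch, assuming the openvino-dev (2022.x) package is installed and "best.onnx" stands in for your exported model:

    from openvino.tools.mo import convert_model  # openvino-dev package
    from openvino.runtime import serialize

    # Step 1: ONNX -> OpenVINO IR; the MyriadX runs FP16, so compress weights
    ov_model = convert_model("best.onnx", compress_to_fp16=True)
    serialize(ov_model, "best.xml", "best.bin")

    # Step 2: compile the IR into a blob, e.g. with OpenVINO's compile_tool:
    #   compile_tool -m best.xml -d MYRIAD -o best.blob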

However, since you're encountering an error while converting the model to ONNX, it's possible that there's an unsupported layer or connection between two layers in your model. You can use the Netron app to visualize your model and identify any unsupported layers or connections.
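For reference, the ONNX file for Netron can be exported directly with the ultralytics library. A minimal sketch, where "best.pt" and the 416 input size are placeholders for your own checkpoint and training resolution:

    from ultralytics import YOLO

    model = YOLO("best.pt")  # the custom 1-channel checkpoint
    # Export to ONNX so the graph can be inspected in Netron
    model.export(format="onnx", opset=12, imgsz=416, simplify=True)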

If you're still encountering issues, you can try using the Blobconverter tool, which allows you to convert and compile the model from various formats, including TensorFlow, Caffe, ONNX, and OpenVINO IR.
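BlobConverter is also available as a pip package, so the same conversion can be scripted. A minimal sketch, where the file name and SHAVE count are placeholders:

    import blobconverter  # pip install blobconverter

    blob_path = blobconverter.from_onnx(
        model="best.onnx",  # path to the exported ONNX file
        data_type="FP16",   # the MyriadX runs FP16
        shaves=6,           # number of SHAVE cores to compile for
    )
    print(blob_path)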

Once you have the blob file, you can deploy it onto the Myriad X processor within an OAK device for inference.
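A minimal host-side sketch, reusing the pipeline and detection_nn objects from the snippet above and assuming the detections are routed to an XLinkOut stream named "nn":

    import depthai as dai

    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("nn")
    detection_nn.out.link(xout.input)

    with dai.Device(pipeline) as device:
        q = device.getOutputQueue("nn", maxSize=4, blocking=False)
        while True:
            in_det = q.get()  # blocks until a detection result arrives
            for det in in_det.detections:
                print(det.label, det.confidence, det.xmin, det.ymin, det.xmax, det.ymax)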


Hi @andrea
Hey, the issue is that the tools expect models with an RGB input, i.e. 3 channels instead of the 1 channel of a grayscale image. The YoloDetectionNetwork also expects three channels (it can be the same mono channel repeated 3 times, though).

Could you try converting it for RGB? You can still feed it mono images; just make sure the input has 3 channels.
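A minimal sketch of one way to do that in a DepthAI pipeline: a MonoCamera feeding an ImageManip node that converts the single-channel frame to planar BGR before the detection network (the 416x416 size and the blob path are placeholders):

    import depthai as dai

    pipeline = dai.Pipeline()

    mono = pipeline.create(dai.node.MonoCamera)
    mono.setBoardSocket(dai.CameraBoardSocket.LEFT)

    # Convert the 1-channel mono stream into a planar 3-channel frame
    manip = pipeline.create(dai.node.ImageManip)
    manip.initialConfig.setResize(416, 416)
    manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p)
    mono.out.link(manip.inputImage)

    detection_nn = pipeline.create(dai.node.YoloDetectionNetwork)
    detection_nn.setBlobPath("model_rgb.blob")  # the 3-channel blob
    manip.out.link(detection_nn.input)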

Thanks,
Jaka

Hi @jakaskerl and @BradleyDillon and thanks for the replies.

I already have a working RGB model. I was just experimenting with grayscale images in order to decrease the computational cost on the device.

Do you think you will implement a grayscale Yolo conversion in the future?

Thanks for your work,

Andrea

    Hi @andrea

    andrea Do you think you will implement a grayscale Yolo conversion in the future?

    Very likely if possible.

    Thanks,
    Jaka

    3 months later

    Dear @jakaskerl

    After some more tests, here is what I found:

    I started by analyzing the code of the online tool Luxonis Tools, which expects .pt files with 3-channel inputs. The main change I had to make is at line 58 in export_yolov8.py, where the dummy export input hardcodes 3 channels:
    im = torch.zeros(1, 3, *self.imgsz[::-1])#.to(device)  # image size(1,3,320,192) BCHW iDetection
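    For a 1-channel model, the dummy input presumably just changes the channel count (a sketch, assuming nothing else in the line needs to change):
    im = torch.zeros(1, 1, *self.imgsz[::-1])  # single-channel dummy input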

    And at line 67 in exporter.py, the Model Optimizer flag below (it only applies to 3-channel inputs, so it has to be dropped for grayscale):
    '--reverse_input_channels '

    I then checked the input and output sizes of the exported IR file using Netron. Next, I compiled the XML into a .blob file using the OpenVINO compile_tool.exe. By debugging the compile_tool code, I confirmed that the compiled blob has the correct input and output layers and sizes.
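    As a side note, the same check can be scripted with the DepthAI Python API. A minimal sketch, with "best.blob" standing in for the compiled file:

        import depthai as dai

        blob = dai.OpenVINO.Blob("best.blob")
        for name, tensor in blob.networkInputs.items():
            print("input:", name, tensor.dims)
        for name, tensor in blob.networkOutputs.items():
            print("output:", name, tensor.dims)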

    Finally, I tested adding the blob in the pipeline with different DepthAI nodes:

    • Using a MobileNetDetectionNetwork node instead of a YoloDetectionNetwork, the pipeline does not crash, but the frame rate drops to around 3-4 FPS and the bounding boxes seem random.

    • Using a NeuralNetwork node instead of a YoloDetectionNetwork, the pipeline does not crash, but it gets stuck on getFirstLayerFp16() (see the debugging sketch below).
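    To narrow down where it hangs, here is a minimal debugging sketch that polls the output queue non-blocking instead of calling getFirstLayerFp16() on a blocking get (it assumes the NeuralNetwork output is linked to an XLinkOut stream named "nn"; if no message ever arrives, the device side is not producing output at all):

        import time
        import depthai as dai

        with dai.Device(pipeline) as device:
            q = device.getOutputQueue("nn", maxSize=4, blocking=False)
            while True:
                msg = q.tryGet()  # non-blocking; returns None if nothing arrived
                if msg is None:
                    time.sleep(0.05)
                    continue
                print("layers:", msg.getAllLayerNames())
                print("first layer length:", len(msg.getFirstLayerFp16()))
                break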

    I then analyzed the DepthAI C++ source code, but since it's a runtime error, I believe I need to check the code that runs after the RPC call in DeviceBase.cpp. However, I don't think that code is publicly available, so I've run out of ideas.

    I also believe it's not possible to send the neural network a fake 3-channel image that is just a view of the same grayscale channel repeated three times (e.g., using np.broadcast_to(grayscale_image[..., None], (*grayscale_image.shape, 3)), which occupies only the memory of the grayscale image and builds a 3-channel view; note that np.stack((grayscale_image,) * 3, axis=-1) would actually copy the data).
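    To illustrate why: when frames are sent from the host over XLink, the buffer is serialized, so the three channels have to physically exist in memory anyway. A minimal sketch of feeding a replicated grayscale frame to an XLinkIn-fed network (the 416x416 size and the queue are placeholders):

        import numpy as np
        import depthai as dai

        gray = np.zeros((416, 416), dtype=np.uint8)      # stand-in for a real mono frame
        # A real copy is unavoidable here: the XLink buffer must be contiguous
        planar = np.repeat(gray[None, :, :], 3, axis=0)  # (3, 416, 416), planar layout

        frame = dai.ImgFrame()
        frame.setType(dai.ImgFrame.Type.BGR888p)
        frame.setWidth(416)
        frame.setHeight(416)
        frame.setData(planar.flatten())
        # in_q.send(frame)  # in_q: the input queue of an XLinkIn node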

    The overall changes in the blob creation seem to be very small (basically removing the hardcoded parts where 3 channels are expected). Do you think the changes in the device-side code could be equally minimal and make it into one of the next releases?

    Thank you again for your help!
    Best regards

    Hi @andrea
    Thanks for the deep dive. I think it should be possible, but we might need to alter the firmware of the YOLO and MBNET nodes as well for this to work.

    Best to create an issue on the tools repo; Jan Cuhel will know how to properly convert the model.

    Thanks again,
    Jaka