Dear @jakaskerl ,
After some more tests, here is what I found:
I started by analyzing the code of the online tool Luxonis Tools, which expects .pt files with 3-channel inputs. The main changes I had to make are at line 58 in export_yolov8.py:
im = torch.zeros(1, 3, *self.imgsz[::-1])#.to(device) # image size(1,3,320,192) BCHW iDetection
And at line 67 in exporter.py:
'--reverse_input_channels '
I then checked the input and output sizes of the exported IR file with Netron. Next, I compiled the XML into a .blob file using OpenVINO's compile_tool.exe, and by stepping through the compile_tool code in a debugger I confirmed that the compiled blob has the correct input and output layers and sizes.
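For anyone who wants to double-check the IR shapes without opening Netron, the input shape can also be read straight from the exported .xml. A minimal sketch (the IR snippet below is illustrative, not my actual model; real files have the same `Parameter` layer structure):

```python
import xml.etree.ElementTree as ET

# Illustrative OpenVINO IR fragment; a real exported .xml carries the
# input shape in the <data shape="..."> of the layer with type="Parameter".
ir_xml = """<?xml version="1.0"?>
<net name="yolov8n_gray" version="11">
  <layers>
    <layer id="0" name="images" type="Parameter" version="opset1">
      <data shape="1,1,416,416" element_type="f32"/>
    </layer>
    <layer id="1" name="output0" type="Result" version="opset1"/>
  </layers>
</net>
"""

root = ET.fromstring(ir_xml)
param = root.find(".//layer[@type='Parameter']")
input_shape = [int(d) for d in param.find("data").get("shape").split(",")]
print(input_shape)  # → [1, 1, 416, 416] for a 1-channel model
```

This is how I sanity-checked that the single-channel input actually made it through the export before compiling the blob.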
Finally, I tested adding the blob in the pipeline with different DepthAI nodes:
Using a MobileNetDetectionNetwork node instead of a YoloDetectionNetwork, the pipeline does not crash, but the frame rate drops to around 3-4 FPS and the bounding boxes seem random.
Using a NeuralNetwork node instead of a YoloDetectionNetwork, the pipeline does not crash, but it blocks indefinitely on getFirstLayerFp16().
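For context, here is what I would do with the output if getFirstLayerFp16() ever returned: it gives a flat list of floats that has to be reshaped on the host. A minimal sketch, assuming the stock YOLOv8 head layout of (1, 84, 8400) — my model's actual output shape may differ, so this should be checked against the Result layer in the IR:

```python
import numpy as np

# Hypothetical shapes: (1, 84, 8400) is the stock YOLOv8 head layout
# (4 box coordinates + 80 class scores per anchor). Check the Result
# layer of your own IR in Netron before relying on these numbers.
n_ch, n_anchors = 84, 8400
flat = [0.0] * (n_ch * n_anchors)       # stand-in for getFirstLayerFp16()

out = np.asarray(flat, dtype=np.float32).reshape(1, n_ch, n_anchors)
boxes = out[0, :4, :]                   # cx, cy, w, h per anchor
scores = out[0, 4:, :]                  # per-class confidences per anchor
```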
I then went through the C++ source code, but since this is a runtime error, I believe the relevant logic runs after the RPC call in DeviceBase.cpp, on the device side. That code doesn't seem to be publicly available, so I've run out of ideas.
I also believe it's not possible to work around this by feeding the network a fake 3-channel image built by repeating the grayscale channel three times on the host: np.stack((grayscale_image,) * 3, axis=-1) actually materializes a full 3-channel copy (a true zero-copy view would require np.broadcast_to), and as far as I can tell the frame is serialized into a contiguous buffer before being sent to the device in any case.
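A quick check of the memory behaviour (the 416x416 size here is just an example): np.stack materializes a copy, while np.broadcast_to yields a genuine zero-copy view — but since the frame data has to be serialized contiguously before going over XLink anyway, the view saves nothing on the device side:

```python
import numpy as np

gray = np.zeros((416, 416), dtype=np.uint8)   # stand-in grayscale frame

# np.stack materializes a full 3-channel copy (3x the memory):
stacked = np.stack((gray,) * 3, axis=-1)

# np.broadcast_to builds a read-only, zero-copy view instead:
# the last stride is 0, so all three channels alias the same bytes.
view = np.broadcast_to(gray[:, :, None], (gray.shape[0], gray.shape[1], 3))
```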
The overall changes to the blob creation turned out to be very small (basically removing the hardcoded parts that expected 3 channels). Do you think the changes on the sensor side could be equally minimal, and could they be done in one of the next releases?
Thank you again for your help!
Best regards