I haven't worked much with the onboard processing capabilities of the OAK-D, so I'm trying to get a grasp on what to expect from the hardware on the device.
My plan was to do the neural network inference on the host and use a laptop or something with a GPU, like a Jetson. However, if the OAK-D can handle the segmentation task itself, then I could use much less power-hungry hardware for the rest of the system and free up a lot of cooling capacity. This will eventually go on a mobile autonomous vehicle, so unfortunately I don't have the luxury of a machine with a lot of compute.
What I have in mind is a U-Net with 3 levels on the encoder side and 3 on the decoder side. The input (the RGB image from the OAK-D) would be 600x400, and the output image is the same size. All I really need is about 10 fps for this to be acceptable. Does anybody know of an example of something similar being done? Also, is there a way to estimate how much compute a model would require so that I can compare it against what's available on the OAK-D?
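To partially answer my own second question, here's the back-of-the-envelope estimate I've been playing with: a conv layer costs roughly 2 × k² × C_in × C_out × H_out × W_out FLOPs (2 per multiply-accumulate), so summing that over the layers gives a per-frame figure to multiply by the target frame rate. The channel widths below are just assumptions for illustration, not a settled design:

```python
# Rough FLOPs estimate for a 3-level U-Net at 600x400 input.
# Channel widths are assumptions -- adjust to the actual architecture.

def conv_flops(k, c_in, c_out, h, w):
    """FLOPs for one k x k conv (2 FLOPs per multiply-accumulate;
    ignores biases and activations, which are negligible next to the convs)."""
    return 2 * k * k * c_in * c_out * h * w

W, H = 600, 400
widths = [16, 32, 64]          # assumed channels per encoder level
total = 0

# Encoder: two 3x3 convs per level, each level at half the previous resolution
c_in = 3
for level, c in enumerate(widths):
    h, w = H >> level, W >> level
    total += conv_flops(3, c_in, c, h, w) + conv_flops(3, c, c, h, w)
    c_in = c

# Decoder: upsample, then two 3x3 convs per level
# (skip connections add widths[level] channels to the first conv's input)
for level in reversed(range(len(widths) - 1)):
    c_out = widths[level]
    h, w = H >> level, W >> level
    total += conv_flops(3, c_in + c_out, c_out, h, w) + conv_flops(3, c_out, c_out, h, w)
    c_in = c_out

# Final 1x1 conv to class logits (assume 2 classes)
total += conv_flops(1, c_in, 2, H, W)

print(f"~{total / 1e9:.1f} GFLOPs per frame, "
      f"~{10 * total / 1e9:.0f} GFLOPs/s at 10 fps")
```

With those assumed widths it works out to roughly 13-14 GFLOPs per frame, so about 135 GFLOPs/s at 10 fps. For comparison, Luxonis lists the Myriad X in the OAK-D at 4 TOPS total with about 1.4 TOPS available for AI, though I assume real-world throughput is only a fraction of the theoretical number, so I'd still like a sanity check from someone who has measured it.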
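For what it's worth, this is the shape of the on-device pipeline I'm picturing, based on my reading of the DepthAI v2 API (a minimal, untested sketch; "unet.blob" is a placeholder, and my understanding is the model first has to be exported, e.g. ONNX → OpenVINO IR → MyriadX .blob, before the device can run it):

```python
import depthai as dai

pipeline = dai.Pipeline()

# RGB camera, with the preview sized to match the network input
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(600, 400)
cam.setInterleaved(False)

# Neural network node running the compiled model on-device
nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("unet.blob")    # placeholder: your compiled .blob
cam.preview.link(nn.input)

# Stream the raw network output back to the host
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("seg")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("seg", maxSize=4, blocking=False)
    while True:
        msg = q.get()                    # NNData
        mask = msg.getFirstLayerFp16()   # flat FP16 list; reshape to (400, 600)
        # ... post-process / visualize here
```

If anyone has run a model of this size at this resolution on the device, I'd be curious what frame rate you actually got.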