Hi JoeJonshon,
What do you mean by DepthAI's mobilenet? If you mean the model in the DepthAI demo, it is pre-trained on the PASCAL VOC dataset. Since the model is lightweight, it can learn the features of fewer classes better and may therefore be more confident on those classes than mobilenet_coco, which is trained on 80 classes from the COCO dataset. Different training techniques and other parameters used during training could also affect the final predictions.
Regarding the FPS, there are several factors that can impact that as well. A larger input shape means more operations, which decreases the FPS. More classes also result in more parameters and more operations, which similarly lowers the FPS. Furthermore, the final FPS also depends on your pipeline. If you use multiple nodes and stereo in your pipeline, some of the shaves will be used by those nodes, so fewer shaves are available to the NN, which can further decrease the FPS. If you use exactly the same pipeline and have only changed the model, I'd have to dig a bit deeper into the models to see why one might be slower. But if the FPS difference between the two is relatively small, it is most likely explained by the input shape and the number of classes, as described at the start of this paragraph.
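To give a rough feel for how input shape and class count drive the amount of work, here is a small back-of-the-envelope sketch in plain Python. It is not taken from either model: the layer sizes, the 300 vs. 512 input shapes, the 6 anchors per cell, and the helper names `conv_macs` and `ssd_head_channels` are all assumptions chosen for illustration, and real FPS also depends on the pipeline and the shaves available.

```python
def conv_macs(height, width, in_ch, out_ch, kernel=3, stride=1):
    """Multiply-accumulates for one 'same'-padded convolution layer."""
    out_h = -(-height // stride)  # ceiling division
    out_w = -(-width // stride)
    return out_h * out_w * out_ch * in_ch * kernel * kernel


def ssd_head_channels(num_classes, anchors_per_cell=6):
    """Output channels of a hypothetical SSD-style head:
    per anchor, (num_classes + background) scores plus 4 box offsets."""
    return anchors_per_cell * (num_classes + 1) + anchors_per_cell * 4


# Effect of input shape: same first layer, two assumed resolutions.
small_input = conv_macs(300, 300, in_ch=3, out_ch=32, stride=2)
large_input = conv_macs(512, 512, in_ch=3, out_ch=32, stride=2)

# Effect of class count: same assumed 19x19x512 feature map, two heads.
voc_head = conv_macs(19, 19, in_ch=512, out_ch=ssd_head_channels(20))
coco_head = conv_macs(19, 19, in_ch=512, out_ch=ssd_head_channels(80))

print(f"first layer, 300x300: {small_input:,} MACs")
print(f"first layer, 512x512: {large_input:,} MACs")
print(f"head, 20 classes: {voc_head:,} MACs")
print(f"head, 80 classes: {coco_head:,} MACs")
```

The work in a convolution grows with the output area, so going from 300x300 to 512x512 costs close to three times as many operations in that layer, and a head that predicts 80 classes is several times heavier than one that predicts 20. These are only operation counts, so treat them as a hint about the trend rather than a prediction of the actual FPS on the device.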
Best,
Matija