OK, we ran an experiment with the confidence threshold set to 0.01. The camera produced between 80 and 160 detections per frame, and the depth and feature outputs stopped after 12 hours. Can you confirm this with an experiment on your side too?
We also ran an experiment where we omitted the downsampling node for the hi-res images. With the Ethernet camera we had to limit the number of hi-res images, but over USB3 we can transfer all of them. We're not entirely sure yet, but it looks like omitting this downsampling node helps as well.
(We still keep the node that downsamples from 10 Hz to 5 Hz, because 10 Hz is too much for the network.)
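For anyone following along: going from 10 Hz to 5 Hz amounts to forwarding every other frame. Here's a minimal, framework-agnostic sketch, assuming a simple keep-every-Nth decimator. The `Decimator` class, its `on_message` callback and the `publish` function are hypothetical names for illustration, not the node we actually run.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class Decimator(Generic[T]):
    """Hypothetical keep-every-Nth decimator (not our actual node).

    With factor=2, a 10 Hz input stream becomes a 5 Hz output stream.
    """

    def __init__(self, factor: int, publish: Callable[[T], None]) -> None:
        if factor < 1:
            raise ValueError("factor must be >= 1")
        self.factor = factor
        self.publish = publish  # called for each message that is kept
        self._count = 0

    def on_message(self, msg: T) -> None:
        # Forward the first message of each group of `factor` messages,
        # drop the others.
        if self._count % self.factor == 0:
            self.publish(msg)
        self._count += 1


# Usage: simulate one second of a 10 Hz stream (frame indices 0..9).
forwarded = []
dec = Decimator(2, forwarded.append)
for frame in range(10):
    dec.on_message(frame)
print(forwarded)  # [0, 2, 4, 6, 8]
```

A counter-based decimator like this is deterministic and ignores timestamp jitter. A time-based throttle (forward only if enough time has passed since the last forwarded message) behaves better when the input rate isn't steady, but without some tolerance it can drop more frames than intended when arrivals come slightly early.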