Hi Russ76,
Actually, with the YoloV5 I linked above, you don't need to train at the same image size you want to run the network at. Regarding the 5-channel/layer input: there are some very bleeding-edge architectures that would allow this, but we haven't gotten them to work with the OAK. You would also need depth frames/maps when training the model, so the whole training pipeline would need to be specific to such an architecture.
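To illustrate the image-size point: YoloV5 is fully convolutional, so as far as I know the input size only has to be a multiple of the largest stride (32 for the standard models with strides 8, 16 and 32). Here is a rough sketch of how the detection grids scale with the input size. The helper names are made up for this example and are not part of YoloV5 or the DepthAI API.

```python
# Hypothetical helpers for illustration only (not YoloV5 or DepthAI functions).
# Assumption: standard YoloV5 models with output strides 8, 16 and 32.
STRIDES = (8, 16, 32)


def round_to_stride(size, stride=max(STRIDES)):
    """Round size up to the nearest multiple of stride."""
    return -(-size // stride) * stride  # ceiling division, then scale back


def grid_sizes(width, height):
    """Return the (cols, rows) of each detection grid for a given input size."""
    w, h = round_to_stride(width), round_to_stride(height)
    return [(w // s, h // s) for s in STRIDES]


print(grid_sizes(640, 640))  # e.g. a training size
print(grid_sizes(416, 416))  # e.g. a smaller size for running on the device
```

The weights are the same in both cases, only the grid sizes change. Accuracy can still suffer if the two sizes are very different, so it is worth checking on your own data.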
Thanks, Erik