Hi dexter,
The benefit of doing it on the device is that less computation is needed on your host, and there is also less delay, since you don't need to transfer the depth frame, only the spatial coordinates. And yes, I think the particular NN model used only recognizes English characters.
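To put the transfer saving in rough numbers, here is a minimal sketch. The 1280x720 resolution, 16-bit depth values, and three 32-bit floats per coordinate set are assumptions for illustration, not values from your setup, and the function names are hypothetical.

```python
import struct

# Assumed values for illustration only, not taken from any specific device.
DEPTH_WIDTH = 1280
DEPTH_HEIGHT = 720
BYTES_PER_DEPTH_PIXEL = 2  # assuming 16-bit depth values


def depth_frame_bytes(width, height, bytes_per_pixel=BYTES_PER_DEPTH_PIXEL):
    """Size in bytes of one uncompressed depth frame."""
    return width * height * bytes_per_pixel


def spatial_coords_bytes(num_detections=1):
    """Size in bytes of (x, y, z) stored as three 32-bit floats per detection."""
    return num_detections * struct.calcsize("<3f")


frame = depth_frame_bytes(DEPTH_WIDTH, DEPTH_HEIGHT)
coords = spatial_coords_bytes(1)
print(f"depth frame: {frame} bytes, spatial coordinates: {coords} bytes")
```

Under these assumptions, each depth frame is about 1.8 MB, while one set of spatial coordinates is 12 bytes, so the host receives far less data per result.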
Thanks, Erik