Optimal way to stream video from OAK-D S2 PoE -> PoE Switch -> Jetson Orin Nano

Heya folks!

Looking to figure out how to do this dance of streaming video from the OAK-D S2 PoE camera via a PoE Switch directly to the Nvidia Jetson Orin Nano (DevKit).

I've been researching which codecs/protocols are the fastest, but so far I've only learned that…well…they're kinda ancient.

What gives the best 1080p or at least 720p performance? Inference would mostly be run on the Orin Nano, with the possible exception of hand-gesture recognition on the camera itself.

I do plan to save the video for later training as well; the Orin has a fresh new Samsung 2TB 990 Pro as its main (OS) drive.

How did you folks set up your cameras?

Edit: So far in my codec research, H.265 seems best for higher-resolution (4K) video…but it does come at a cost. H.264 is "faster", but with the trade-off of more disk space used.

Is there a spreadsheet to compare performance and benchmark it against other types/modes of streaming, e.g. the Streaming API for ZED cameras?

    Hi The_Real_Enrico_Pallazzo

    Codec: H.265 is more efficient in terms of storage as it can compress the video more than H.264. However, it requires more processing power to encode and decode. H.264 is less efficient in terms of storage but requires less processing power. Given that you plan to save the video for later training and you have a 2TB SSD, it might be better to use H.264 to reduce the processing load on your devices. This will also ensure better real-time performance as H.264 has lower latency compared to H.265.
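
    For reference, a minimal sketch of how the on-device H.264 encoder could be set up with the DepthAI Python API (v2-style; exact method signatures can differ between versions, so check the docs for yours):

        import depthai as dai

        pipeline = dai.Pipeline()

        # Color camera at 1080p (see the resolution note below for 720p)
        cam = pipeline.create(dai.node.ColorCamera)
        cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
        cam.setFps(30)

        # On-device H.264 encoder -- the camera does the encoding work
        venc = pipeline.create(dai.node.VideoEncoder)
        venc.setDefaultProfilePreset(30, dai.VideoEncoderProperties.Profile.H264_MAIN)
        cam.video.link(venc.input)

        # Ship the encoded bitstream to the host (the Orin Nano)
        xout = pipeline.create(dai.node.XLinkOut)
        xout.setStreamName("h264")
        venc.bitstream.link(xout.input)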

    Resolution: The higher the resolution, the more data needs to be transferred and processed. If real-time performance is crucial, you might want to start with 720p and see if the performance is acceptable. If yes, then you can try increasing the resolution to 1080p.
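
    Dropping to 720p can be done on the camera itself via ISP scaling, e.g. with the cam node from the sketch above (the factors below assume a 1080p sensor output):

        # Scale the 1080p ISP output down to 720p on-device
        cam.setIspScale(2, 3)   # 1920x1080 * 2/3 -> 1280x720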

    Saving the Video: Writing video to disk can be IO-intensive, especially at higher resolutions and frame rates. Make sure your storage can handle the IO requirements. Alternatively, you can employ multithreading if you find you cannot write video fast enough.
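
    A sketch of that multithreading idea (plain Python, nothing OAK-specific): push encoded chunks onto a queue and let a background thread do the blocking disk writes.

        import queue
        import threading

        chunks = queue.Queue(maxsize=256)     # encoded H.264 chunks

        def writer(path):
            # Blocking file IO lives on its own thread, so the
            # receive loop never stalls on a slow disk.
            with open(path, "wb") as f:
                while True:
                    data = chunks.get()
                    if data is None:          # sentinel -> stop
                        break
                    f.write(data)

        t = threading.Thread(target=writer, args=("video.h264",), daemon=True)
        t.start()
        # in the receive loop: chunks.put(packet_bytes)
        # on shutdown:         chunks.put(None); t.join()

    The raw .h264 file can later be muxed into a container without re-encoding, e.g. ffmpeg -framerate 30 -i video.h264 -c copy video.mp4.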

    Latency: Minimize the number of intermediate steps and devices between the camera and the Jetson Orin Nano to reduce latency.

    A possible setup:

    Connect the OAK-D S2 PoE camera and the Nvidia Jetson Orin Nano to the PoE switch.
    Configure the OAK-D camera to stream video using the H.264 codec at 720p or 1080p resolution.
    Run the hand gesture recognition on the OAK-D camera itself.
    On the Nvidia Jetson Orin Nano, run a program that receives the video stream, performs the required inference, and saves the video to disk (see the sketch below).
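
    Roughly, the host-side loop could look like this (using the pipeline from the encoder sketch above; the inference call is just a hypothetical placeholder for whatever model you run on the Orin):

        import depthai as dai

        # `pipeline` as defined in the encoder sketch above
        with dai.Device(pipeline) as device, open("video.h264", "wb") as f:
            q = device.getOutputQueue(name="h264", maxSize=30, blocking=True)
            while True:
                packet = q.get()              # encoded H.264 packet
                packet.getData().tofile(f)    # save for later training
                # run_inference(packet)       # hypothetical: decode + infer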

    Note: I am not aware of a specific data spreadsheet that compares performance and benchmarks against other types/modes of streaming. You might have to do some testing yourself to see what works best for your setup.

    Thanks,
    Jaka

      jakaskerl Thank you for your input. As I am still waiting for the 2x OAK-D S2 PoE cameras to arrive, I'm not sure what exactly the "stream" of said camera consists of.

      Is the stream from a single camera combined on the camera (on the fly) from the 3 different lenses…into one? Or does it stream 3 perspectives simultaneously? Or is it the 2 stereo lenses combined and the color lens separate --> 2 videos?

      Will have to do some calculations after your input…since I'm starting to think the 2 cameras might overwhelm the little Orin Nano… (if it's 6 total videos…or 4 total…or just 2 videos)

        Hi The_Real_Enrico_Pallazzo
        It's available as separate streams (separate FPS/resolutions/...). The streams are combined in the case of a stereo setup when viewing depth frames, but the mono cameras can be viewed separately as well if you wish.

        All of these streams can theoretically be streamed at once if you wish.

        But you choose what you want to stream. If you think you might overwhelm your Orin Nano, you can use lower-resolution images, lower FPS, or just skip the streams you don't need.
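
        To give an idea, each stream is its own node/output in the pipeline, so you opt in per stream. A rough sketch (the stream names are arbitrary):

            import depthai as dai

            pipeline = dai.Pipeline()

            # Color camera -- one stream
            color = pipeline.create(dai.node.ColorCamera)

            # Left/right mono cameras -- two more streams, only if wanted
            left = pipeline.create(dai.node.MonoCamera)
            left.setBoardSocket(dai.CameraBoardSocket.LEFT)
            right = pipeline.create(dai.node.MonoCamera)
            right.setBoardSocket(dai.CameraBoardSocket.RIGHT)

            # One XLinkOut per stream; skip any you don't need
            for name, out in (("rgb", color.video),
                              ("left", left.out),
                              ("right", right.out)):
                xout = pipeline.create(dai.node.XLinkOut)
                xout.setStreamName(name)
                out.link(xout.input)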

        Thanks,
        Jaka

          jakaskerl

          If I understood everything correctly, the camera itself does the encoding of the multiple raw video streams for us, and they then only need to be decoded at the recipient (in my case the Orin Nano)…also, the encoding will depend on the flags we use…or, well, we can probably access it with a custom script to instruct it on which of the 3-lens videos to encode and how. Correct?

          One last thing: will the AV1 codec be supported on the cameras in the future?

          Edit: Apparently some hardware designs have been released (according to Quora):
          The AOM has developed a reference design for AV1 hardware codecs that can be used by manufacturers to develop their own AV1-compatible hardware.

          So I'm guessing it's more of a when, not an if…although it's still very fresh…alongside the other 2 codec standards, EVC and VVC.