Support for V10

jakaskerl

Hi @rmaxwell
I doubt we can make it open source (I'm not sure about the contracts). I have passed the feature request on to our ML team.

Thanks,
Jaka

Matija

Hey,

Thanks for the thread, everyone! We are actively working on adding YOLOv10 support to tools.luxonis.com and to the device firmware (FW). We can't commit to an ETA, but it should be ready in the next few weeks. Unfortunately, it is not easy for us to open source the firmware, so I'll have to ask for some patience.

We are aware of the constraints this creates, so we are looking to simplify the process with RVC4. While I doubt we could expose the FW directly, we are working on the concept of HostNodes. These can run on your host in the case of RVC2 (which might be inefficient if you are sending data back to the device) or directly on the device in the case of RVC4.

ChrisCoutureDelValle

Hi all,

Looking forward to the v10 release. Will it be available from the conversion API with version: Literal["v10"] = "v10", and at the URL "https://tools.luxonis.com/yolov10"?
Thanks,
Chris

Matija

ChrisCoutureDelValle

Likely yes for the version; we can't guarantee the URL yet.

NikitaSokovnin

Hi all,

A small update on adding YOLOv10 support:

We are waiting for YOLOv10 support in the Ultralytics library before we can add it to tools.luxonis.com. Otherwise, it's ready in our development environment.
We are working on adding YOLOv10 post-processing support to the FW.

ChrisCoutureDelValle

NikitaSokovnin Awesome, thanks for the update.

ChrisCoutureDelValle

Hello all,

Just wondering about the progress on v10 availability.

Thanks,

Chris

jakaskerl

Hi @ChrisCoutureDelValle
It will be ready by EOW (end of week), but we are still waiting for Ultralytics to add support.

Thanks,
Jaka

ChrisCoutureDelValle

jakaskerl Hi, I know it was mentioned last week that v10 would be ready by EOW. Any updates?

jakaskerl

Hi @ChrisCoutureDelValle
It's implemented in depthai and should work. The tools conversion has been merged into the master branch of luxonis/tools, but it has not been deployed yet because we are trying to optimize the IOU part. In theory, though, it should work.
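The thread does not say exactly what "the IOU part" of the conversion contains. For readers unfamiliar with the term, IoU (intersection over union) measures how much two bounding boxes overlap and is the basic quantity used in box post-processing such as non-maximum suppression. A minimal Python sketch, assuming boxes in (x1, y1, x2, y2) corner format; the function name is hypothetical and this is not the Luxonis implementation:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (it may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp negative extents to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7, so 1/7 (about 0.1429)
```

Identical boxes give 1.0 and disjoint boxes give 0.0; detection post-processing typically compares this value against a threshold to decide whether two boxes describe the same object.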

Thanks,
Jaka

ChrisCoutureDelValle

jakaskerl Thanks for the response, makes sense. Is the IOU part suboptimal for deployment, or should I just wait for the official release on tools.luxonis.com?

jakaskerl

Hi @ChrisCoutureDelValle
Yes, it's best to wait. It turns out that the new architecture actually runs slower on the device, due to the operations it uses, which can't be optimized for the MX (the Myriad X chip in RVC2).
So V10n is about 25% slower than V8n at the same resolution.

Thanks,
Jaka

SamiUddin

@jakaskerl Please have a look at

Kind Regards!

jakaskerl

Hi @SamiUddin
What am I looking at exactly?

Thanks,
Jaka

alexandrebenoit

Hi guys,

Well, the 25% speed loss of YOLOv10 vs. v8 on RVC2 is surprising.

Regarding the statement that it "runs slower on device, due to the operations used, which can't be optimized for the MX":

could you explain and give details on which operations are problematic? Maybe one could try to find alternatives.

Thanks

jakaskerl

Hi @alexandrebenoit
I wasn't directly involved with the deployment of YOLOv10 on RVC2, but as far as I understand:
The entire architecture (both the head and the middle part of the network) uses element-wise operations, which can't be optimized for RVC2. No significant reparametrization tricks (like .fuse()) are available to mitigate the issue.
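For context on what a reparametrization trick like .fuse() buys: at inference time a BatchNorm that follows a convolution is a fixed per-channel affine transform, so it can be folded into the convolution's weights and bias, removing a layer without changing the output. The NumPy sketch below shows this for a 1x1 convolution; all function names are hypothetical, and it illustrates the general technique rather than the Ultralytics or Luxonis code. An element-wise product of two activation tensors has no fixed weights to fold into, which is why such operations remain as separate layers.

```python
import numpy as np


def conv1x1(x, w, b):
    """1x1 convolution on x of shape (C_in, H, W) with weights w (C_out, C_in) and bias b (C_out,)."""
    return np.einsum("oi,ihw->ohw", w, x) + b[:, None, None]


def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm over the channel axis of x (C, H, W)."""
    scale = gamma / np.sqrt(var + eps)
    return (x - mean[:, None, None]) * scale[:, None, None] + beta[:, None, None]


def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN statistics into the conv so that conv followed by BN becomes one conv."""
    scale = gamma / np.sqrt(var + eps)
    w_fused = w * scale[:, None]         # scale each output channel's weights
    b_fused = (b - mean) * scale + beta  # shift the bias to absorb mean and beta
    return w_fused, b_fused


rng = np.random.default_rng(0)
c_in, c_out = 4, 8
x = rng.standard_normal((c_in, 5, 5))
w = rng.standard_normal((c_out, c_in))
b = rng.standard_normal(c_out)
gamma, beta = rng.standard_normal(c_out), rng.standard_normal(c_out)
mean, var = rng.standard_normal(c_out), rng.random(c_out) + 0.1  # variances kept positive

reference = batchnorm(conv1x1(x, w, b), gamma, beta, mean, var)
w_f, b_f = fuse_conv_bn(w, b, gamma, beta, mean, var)
fused = conv1x1(x, w_f, b_f)
print(np.allclose(reference, fused))
```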

Numbers:
416x416: 24 FPS, 41 ms latency
640x640: 12 FPS, 86 ms latency
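A quick sanity check on these two data points: at both resolutions the reported latency is close to 1000/FPS, and going from 416x416 to 640x640 multiplies the pixel count by about 2.37 while the frame time roughly doubles. The Python below only redoes that arithmetic on the posted numbers; any conclusion drawn from it (for example, that consecutive inferences overlap little) is an inference, not something stated in the thread.

```python
# Reported YOLOv10 figures on RVC2 from the post above: (side length, FPS, latency in ms).
measurements = [(416, 24, 41), (640, 12, 86)]

for side, fps, latency_ms in measurements:
    frame_time_ms = 1000 / fps  # time per frame implied by the throughput
    print(f"{side}x{side}: 1000/FPS = {frame_time_ms:.1f} ms vs. reported latency {latency_ms} ms")

(s_small, fps_small, lat_small), (s_large, fps_large, lat_large) = measurements
pixel_ratio = (s_large / s_small) ** 2  # how many more pixels the larger input has
print(f"pixel ratio {pixel_ratio:.2f}, FPS ratio {fps_small / fps_large:.2f}, "
      f"latency ratio {lat_large / lat_small:.2f}")
```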

Thanks,
Jaka

alexandrebenoit

Well, it would be nice if you could share:

the list of problematic YOLOv10 element-wise operations that are not supported by RVC2
the list of all operations the RVC2 can perform (a summary datasheet)

For example, if the Pixel-Adaptive Convolution (PAC) operator is supported on RVC2, element-wise products could be replaced by PAC with a specific setup, and so on. This would not be as efficient as a plain element-wise product on standard processors, but it could pay off in some setups, possibly on RVC2.

Alex

Matija

alexandrebenoit

Hey, we didn't do a deep dive into the slow operations to pinpoint exactly where the issue lies. Based on experience, I would say it's slow because of:

A lot of splitting, slicing, and concatenations.
SiLU: you can see there are a lot of "branch-outs" due to the SiLU activation. Compare this with YOLOv6, which uses ReLU and a reparametrization trick like the one in RepVGG, and you can see that V8 and similar models are slower.
The MHSA (multi-head self-attention) module definitely doesn't help and is likely the "cherry on top".
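To make the contrast with YOLOv6 concrete: a RepVGG-style block is trained with parallel 3x3, 1x1 and identity branches, and because all three are linear, they can be folded into a single 3x3 convolution for inference, so the deployed graph has no branch-outs. The NumPy sketch below checks that equivalence. It assumes BatchNorm has already been folded into each branch and that input and output channel counts are equal; the function names are hypothetical, and it illustrates the technique rather than reproducing YOLOv6's code. A SiLU branch-out (x * sigmoid(x)) cannot be removed this way, because it is a non-linear product of the activation with itself.

```python
import numpy as np


def conv2d(x, w, b):
    """Naive stride-1 'same' convolution. x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H and W unchanged
    h, wd = x.shape[1:]
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window centred on (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


def merge_repvgg_branches(w3, b3, w1, b1):
    """Fold a 3x3 branch, a 1x1 branch and an identity branch into one 3x3 conv (C_in == C_out)."""
    c = w3.shape[0]
    w1_as_3x3 = np.zeros_like(w3)
    w1_as_3x3[:, :, 1, 1] = w1[:, :, 0, 0]  # a 1x1 kernel is a 3x3 kernel with only the centre set
    identity = np.zeros_like(w3)
    identity[np.arange(c), np.arange(c), 1, 1] = 1.0  # each channel copies itself via the centre tap
    return w3 + w1_as_3x3 + identity, b3 + b1


rng = np.random.default_rng(0)
c = 3
x = rng.standard_normal((c, 6, 6))
w3, b3 = rng.standard_normal((c, c, 3, 3)), rng.standard_normal(c)
w1, b1 = rng.standard_normal((c, c, 1, 1)), rng.standard_normal(c)

# Training-time block (before its ReLU): three parallel branches summed.
multi_branch = conv2d(x, w3, b3) + conv2d(x, w1, b1) + x
# Inference-time block: one 3x3 convolution with merged weights.
w_m, b_m = merge_repvgg_branches(w3, b3, w1, b1)
single_branch = conv2d(x, w_m, b_m)
print(np.allclose(multi_branch, single_branch))
```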

You can see the ONNX file we use here for reference. Feel free to compile it to a blob and benchmark it. If you want to dive into optimization yourself, you can use it as the baseline.

If you want per-op performance, OpenVINO also provides a benchmark app that can report per-layer latencies. Note that we are more focused on releasing this than on optimizing it, given that the gain for the nano version is 1% mAP compared to V6.

Matija

To add, this seems to hold on other HW as well (see the relevant issue here). While the paper optimizes for computational cost and parameter count, those do not always correlate strongly with throughput and latency: certain well-known operations may have more ops/params but still execute faster.
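One common way to see why FLOP or parameter counts can mislead is arithmetic intensity (FLOPs per byte moved, as in the roofline model). An element-wise operation does about one FLOP per element but must read and write every element, so on hardware with limited memory bandwidth it is bound by data movement, while a dense convolution reuses each loaded value many times. The Python sketch below estimates this for an element-wise add and a 3x3 convolution on the same hypothetical 64x80x80 feature map. The tensor sizes and the FP16 assumption are illustrative, and it ignores caching, so treat it as a back-of-the-envelope estimate rather than an RVC2 measurement.

```python
BYTES_PER_VALUE = 2  # assumption: FP16 activations and weights


def elementwise_add(c, h, w):
    """FLOPs and ideal bytes moved for out = a + b on (C, H, W) tensors."""
    n = c * h * w
    flops = n                              # one add per element
    bytes_moved = 3 * n * BYTES_PER_VALUE  # read a, read b, write out
    return flops, bytes_moved


def conv3x3(c_in, c_out, h, w):
    """FLOPs and ideal bytes moved for a stride-1 'same' 3x3 convolution."""
    flops = 2 * 9 * c_in * c_out * h * w  # one multiply and one add per weight per output pixel
    bytes_moved = (c_in * h * w + c_out * h * w + 9 * c_in * c_out) * BYTES_PER_VALUE
    return flops, bytes_moved


for name, (flops, nbytes) in [("add", elementwise_add(64, 80, 80)),
                              ("conv3x3", conv3x3(64, 64, 80, 80))]:
    print(f"{name:8s} {flops / 1e6:8.2f} MFLOPs  {nbytes / 1e6:6.2f} MB  "
          f"{flops / nbytes:7.2f} FLOPs/byte")
```

The convolution does far more arithmetic in absolute terms, yet its FLOPs-per-byte figure is orders of magnitude higher, which is the regime where accelerators reach their peak; the add spends almost all its time moving data.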

alexandrebenoit

Yes, indeed, this is then mostly related to data-exchange bottlenecks and some complex functions such as SiLU.
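For reference, the difference between the two activations discussed in this thread: ReLU is a single element-wise max, while SiLU multiplies the activation by its own sigmoid, so the input tensor is consumed twice and an extra element-wise product is introduced. A minimal NumPy sketch of both definitions; the comment on how SiLU appears in an exported graph describes common practice, not the specific YOLOv10 ONNX file.

```python
import numpy as np


def relu(x):
    # A single element-wise max per value: one node in the graph.
    return np.maximum(x, 0.0)


def silu(x):
    # SiLU(x) = x * sigmoid(x). In exported graphs this is commonly expressed as a
    # Sigmoid node followed by a Mul with the original input, so the input tensor
    # "branches out" to two consumers and is read again by an element-wise product.
    sigmoid = 1.0 / (1.0 + np.exp(-x))
    return x * sigmoid


x = np.linspace(-4.0, 4.0, 9)
print(np.round(relu(x), 3))
print(np.round(silu(x), 3))
```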

Then, the next steps could be engineering YOLOv10 to adapt it to the hardware or, on the research side, looking toward v11 ;o)

In this community, who would be interested in collaborating on either direction?

Alex