jakaskerl Thanks for the response, makes sense. Is the IOU suboptimal for deployment/should I just wait for the official release on tools.luxonis.com?

Hi @ChrisCoutureDelValle
Yes, best to wait. It turns out that the new architecture actually runs slower on device, due to the operations used, which can't be optimized for the MX.
So V10n is about 25% slower than V8n at the same resolution.

Thanks,
Jaka

Hi guys,

well, the 25% speed loss of YOLOv10 vs. YOLOv8 on RVC2 is surprising.

When saying "runs slower on device, due to the operations used, which can't be optimized for the MX."

=> could you explain and provide details on the operations that are problematic? Maybe one can try to find alternatives.

Thanks

Hi @alexandrebenoit
I wasn't directly involved with the deployment of YOLOv10 on RVC2, but as far as I understand:
The entire architecture (both the head and the middle) uses element-wise operations, which can't be optimized for the RVC2. No significant reparametrization tricks (like .fuse()) are available to mitigate the issue.
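For reference, the .fuse() reparametrization mentioned above folds BatchNorm into the preceding convolution, so the BN op disappears entirely at inference time. A minimal sketch of the arithmetic (illustrative names, not the Ultralytics API), treating the conv as a per-output-channel linear map:

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    # BN(Wx + b) == W'x + b' with a per-output-channel rescaling,
    # so the BatchNorm layer can be folded away at inference time.
    scale = gamma / np.sqrt(var + eps)
    w_fused = w * scale[:, None]          # fold scale into conv weights
    b_fused = beta + (b - mean) * scale   # fold shift into conv bias
    return w_fused, b_fused
```

This kind of fusion only removes ops that are linear in the input; the element-wise operations in V10's blocks don't reduce this way, which is the point being made above.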

numbers:
416x416 - 24FPS, 41ms latency
640x640 - 12FPS, 86ms latency
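(As a sanity check, FPS is roughly the inverse of the per-frame latency, which matches the numbers above:)

```python
# FPS should roughly equal the inverse of the per-frame latency (ms)
for fps, latency_ms in [(24, 41), (12, 86)]:
    est = 1000 / latency_ms
    print(f"{fps} FPS reported, {est:.1f} FPS implied by latency")
```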

Thanks,
Jaka

Well, it would be nice if you could share :

  • the list of problematic yolov10 element-wise operations that are not supported by RVC2
  • the list of all available ops that the RVC2 can perform (a synthetic datasheet)

For example, if the Pixel-Adaptive Convolution (PAC) operator is supported on RVC2, then element-wise products could be replaced by PAC with a specific setup, and so on. That is less efficient than a plain element-wise product on standard processors, but it might pay off in some setups, maybe on RVC2.

Alex

    alexandrebenoit

    Hey, we didn't do a deep dive into the slow operations to pinpoint exactly where the issue lies. Based on experience, I would say it's slow because of:

    • A lot of splitting, slicing, and concatenations.
    • SiLU - you can see there are a lot of "branch-outs" due to the SiLU activation. Compared with YOLOv6, which uses ReLU and a RepVGG-style reparametrization trick, V8 and similar models are slower.
    • MHSA module definitely doesn't help and is likely the "cherry on top".
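To illustrate the "branch-out": SiLU is x * sigmoid(x), so the input tensor feeds two element-wise ops (a sigmoid and a multiply), whereas ReLU is a single thresholding pass. A plain-NumPy sketch:

```python
import numpy as np

def silu(x):
    # x * sigmoid(x): the input feeds both the sigmoid and the multiply,
    # i.e. an extra element-wise branch compared to ReLU.
    return x * (1.0 / (1.0 + np.exp(-x)))

def relu(x):
    # a single element-wise thresholding pass
    return np.maximum(x, 0.0)
```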

    You can see the ONNX file we use here for reference. Feel free to compile this to blob and benchmark it. If you want to dive into optimization a bit yourself you can use this as the baseline.

    If you want per-op performance, OpenVINO also provides a benchmark app that can return per-layer latencies. Note that we are focused more on releasing this than on optimizing it, given that the gain for the nano version is 1% mAP compared to V6.
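For per-layer numbers, a typical invocation looks like this (a sketch; the model path is illustrative, and flag names vary between OpenVINO releases, so check `benchmark_app --help` for your version):

```shell
# Run the exported IR and dump per-layer performance counters.
benchmark_app -m yolov10n/openvino/model.xml \
              -report_type detailed_counters \
              -report_folder ./perf_report
```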

    To add, this seems to hold on other HW as well (see the relevant issue here). While the paper optimizes for computational cost and parameter count, those do not always correlate strongly with throughput and latency - certain well-known operations may have more ops/params, yet execute faster.

    Yes, indeed, this is then mostly related to data-exchange bottlenecks and some complex functions such as SiLU.

    Then, the next steps could be engineering YOLOv10 to fit the hardware, or, on the research side, looking out for a v11 ;o)

    Who in this community would be interested in collaborating on one of these directions?

    Alex

    Hi,

    thanks !

    Well, regarding the provided online tools, is there an up-to-date, standardized performance comparison table across one or more Luxonis products?

    I saw some tables in the docs, but it would be great if the last-update date and maybe the model version were provided.

    Alex

      7 days later

      Hi Team,

      Just a quick follow-up: what do I need to change here? I last used this for v7.

      Code:
      _URL = "https://tools.luxonis.com"  # "http://tools.luxonis.com/upload"
      _OUTPUT_FILE_NAME = "output.zip"
      _FRACTIONS = {
          "none": 0, "read": 0.1, "initialized": 0.3, "onnx": 0.5,
          "openvino": 0.65, "blob": 0.8, "json": 0.9, "zip": 1,
      }

      def convert_yolo(file_path: str, shape: Union[int, Tuple[int, int]] = 416, version: Literal["v10"] = "v10"):
          files = {'file': open(file_path, 'rb')}
          values = {
              'inputshape': shape if isinstance(shape, int) else " ".join(map(str, shape)),
              'version': version,
              'id': uuid4()
          }
          file_name = _OUTPUT_FILE_NAME
          url = f"{_URL}/upload"
          print(url)

          # progress bar
          proc = multiprocessing.Process(target=get_progress, args=(str(values["id"]),))
          proc.start()

          # upload files
          session = requests.Session()
          with session.post(url, files=files, data=values, stream=True) as r:
              r.raise_for_status()
              proc.terminate()
              print("Conversion complete. Downloading...")

              with open(file_name, 'wb') as f:
                  for chunk in r.iter_content(chunk_size=8192):
                      # If you have a chunk-encoded response, uncomment the
                      # `if` below and set chunk_size to None.
                      # if chunk:
                      f.write(chunk)
          return file_name

      Output:
      https://tools.luxonis.com/upload

      Progress

      HTTP error occurred: 520 Server Error: UNKNOWN for url: https://tools.luxonis.com/upload
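      For what it's worth, a 520 is a Cloudflare-style transient server error, so wrapping the upload in a small retry helper sometimes gets past it. A sketch, not part of the tools.luxonis.com API - `do_request` here is any callable returning a response-like object:

```python
import time

def retry_on_5xx(do_request, retries=3, backoff=0.5):
    # Retry transient 5xx responses (e.g. a Cloudflare 520) with
    # exponential backoff; return the last response either way.
    for attempt in range(retries):
        r = do_request()
        if r.status_code < 500:
            return r
        time.sleep(backoff * (2 ** attempt))
    return r
```

      For example, `retry_on_5xx(lambda: session.post(url, files=files, data=values, stream=True))` in place of the bare `session.post` call.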

      Hi @ChrisCoutureDelValle,

      you can use this script:

      import requests
      import multiprocessing
      from typing import Union, Tuple, Literal
      from uuid import uuid4
      import argparse
      
      
      _URL = "https://tools.luxonis.com"
      _OUTPUT_FILE_NAME = "output.zip"
      
      
      def get_progress(id: str):
          while True:
              try:
                  r = requests.get(f"{_URL}/progress/{id}")
                  r.raise_for_status()
                  data = r.json()
                  print(f"Progress: {data['progress']}")
                  if data["progress"] == 1:
                      break
              except Exception as e:
                  print(f"Error: {e}")
                  break
      
      
      def convert_yolo(file_path: str, shape: Union[int, Tuple[int, int]] = 416, version: Literal["v10"] = "v10"):
          files = {'file': open(file_path, 'rb')}
          values = {
              'inputshape': shape if isinstance(shape, int) else " ".join(map(str, shape)),
              'version': version,
              'id': uuid4()
          }
          file_name = _OUTPUT_FILE_NAME
          url = f"{_URL}/upload"
          print(url)
      
          # progress bar
          proc = multiprocessing.Process(target=get_progress, args=(str(values["id"]),))
          proc.start()
      
          # upload files
          session = requests.Session()
          with session.post(url, files=files, data=values, stream=True) as r:
              r.raise_for_status()
              proc.terminate()
              print(f"Conversion complete. Downloading...")
      
              with open(file_name, 'wb') as f:
                  for chunk in r.iter_content(chunk_size=8192):
                      # If you have chunk encoded response uncomment if
                      # and set chunk_size parameter to None.
                      # if chunk:
                      f.write(chunk)
          return file_name
      
      
      def main():
          parser = argparse.ArgumentParser(description="Convert YOLO models")
          parser.add_argument("path", type=str, help="Path to the model's weights")
          args = parser.parse_args()
          convert_yolo(args.path)
      
      
      if __name__ == "__main__":
          main()

      I tested it with YOLOv10 nano from Ultralytics and it worked. Btw, if you're looking for inspiration on how to make API calls to our tools, you can check out this script.

      Best,
      Jan

        JanCuhel

        Hi Jan,

        I used the script that you sent over to try and convert a custom yolov10n model that I trained and got the same error as Chris. Do you know why I could still be getting this error?

        Thanks,

        Arnav