dave

  • Hi All,

    I'm trying to test a model's inference speed. I'm getting some unexpected results and was wondering if you could help.

    I'm using the following script to test it. It consists of a Script node that creates dummy NNData input for the model, a NeuralNetwork node loaded with the model blob, and another Script node that reads the model's output and prints a warning when it receives data.

    import depthai as dai
    
    dummyInputScript = '''
    # Build two fixed dummy "points" tensors and stream them to the
    # NeuralNetwork node as fast as possible.
    dataSize = 200
    bufSize = 2 * dataSize * 3  # buffer size in bytes (2 bytes per FP16 value)
    list1 = [1.0, 1.0, 1.0]
    list2 = [1.0, 1.0, 1.1]
    dummy1 = [list1[i % 3] for i in range(dataSize * 3)]
    dummy2 = [list2[i % 3] for i in range(dataSize * 3)]
    buf1 = NNData(bufSize)
    buf1.setLayer("points", dummy1)
    buf2 = NNData(bufSize)
    buf2.setLayer("points", dummy2)

    while True:
        node.io["A"].send(buf1)
        node.io["B"].send(buf2)
    '''
    
    dummyOutputScript = '''
    # Forward each NN result to the host and log that an inference finished
    while True:
        data = node.io["in"].get()
        node.io["out"].send(data)
        node.warn("Inference Complete")
    '''
    
    if __name__ == "__main__":
        pipeline = dai.Pipeline()
    
        dummyInput = pipeline.create(dai.node.Script)
        dummyInput.setScript(dummyInputScript)
    
        nnNode = pipeline.create(dai.node.NeuralNetwork)
        nnNode.setBlobPath("pathToModel.blob")
        nnNode.input.setBlocking(True)
        nnNode.input.setQueueSize(1)
    
        dummyInput.outputs["A"].link(nnNode.inputs["A"])
        dummyInput.outputs["B"].link(nnNode.inputs["B"])
    
        dummyOutput = pipeline.create(dai.node.Script)
        dummyOutput.setScript(dummyOutputScript)
    
        nnNode.out.link(dummyOutput.inputs["in"])
    
        xout = pipeline.create(dai.node.XLinkOut)
        xout.setStreamName('host')
    
        dummyOutput.outputs["out"].link(xout.input)
    
        with dai.Device(pipeline) as device:
            device.setLogLevel(dai.LogLevel.WARN)
            device.setLogOutputLevel(dai.LogLevel.WARN)
    
            # Create the output queue once, outside the loop, then drain it
            hostQueue = device.getOutputQueue("host", maxSize=1, blocking=False)
            while True:
                data = hostQueue.get()

    When run with the model I'm testing, I get output such as:

    [1944301081FBF81200] [2.2] [1.892] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [2.033] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [2.251] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [2.392] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [2.640] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [2.793] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [3.043] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [3.188] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [3.431] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [3.559] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [3.799] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [3.880] [Script(2)] [warning] Inference Complete
    [1944301081FBF81200] [2.2] [4.149] [Script(2)] [warning] Inference Complete

    For the blob I am testing, the time intervals between consecutive outputs (which I'm using as a proxy for inference time) produce graphs like the image below.

    The time between print outputs seems to alternate repeatedly between a short and a long interval. I was wondering if you had any comments on this result or on my method?

    Naively, I would expect the model to have a more regular inference time, assuming there isn't a bottleneck of some sort somewhere else.
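
    For context, the interval measurement on the host can be done along these lines; this is only a minimal sketch that reuses the pipeline built above, and time.monotonic() is just one way to timestamp arrivals:

    import time
    import depthai as dai

    # Minimal sketch: reuse the pipeline built above, timestamp each packet as
    # it arrives, and treat the inter-arrival time as a proxy for inference time.
    with dai.Device(pipeline) as device:
        queue = device.getOutputQueue("host", maxSize=1, blocking=False)
        prev = None
        while True:
            data = queue.get()  # blocks until the Script node forwards the next result
            now = time.monotonic()
            if prev is not None:
                print(f"interval: {(now - prev) * 1000:.1f} ms")
            prev = now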

    Thanks

    • Thanks both for your insight and suggestion on a workaround. I'll see if I can make that work.

    • Hi, thanks for your reply. The first dimension of A and B will always have the same length, but that length is variable (they vary together). I'm not sure at the moment if there is another way to make the length of A and B variable.

      Do you know if "dynamic_axes" are generally not supported when generating a blob?
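
      In case dynamic axes turn out not to be supported for MyriadX blobs, one possible workaround is to export with a fixed maximum size and zero-pad shorter inputs on the host before sending them. A rough sketch of that idea (MAX_N and pad_to_max are purely illustrative, and the model class is the one from the script further down this thread):

      import torch

      MAX_N = 400  # illustrative fixed upper bound on the first dimension

      def pad_to_max(x: torch.Tensor, max_n: int = MAX_N) -> torch.Tensor:
          # Zero-pad the first dimension up to max_n; the rows of the output
          # that correspond to padding can simply be ignored on the host.
          padded = torch.zeros((max_n, x.shape[1]), dtype=x.dtype)
          padded[: x.shape[0]] = x
          return padded

      # Export with static shapes (no dynamic_axes)
      torch.onnx.export(
          model(),
          (torch.ones((MAX_N, 3)), torch.ones((MAX_N, 3))),
          "./model_static.onnx",
          opset_version=12,
          input_names=['A', 'B'],
          output_names=['out'],
      )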

      Thanks

    • Hi, I'm wondering if I could get some help. I'm trying to export a model to a MyriadX blob using the script at the end of this post, but I am getting the following error message:

      {
          "exit_code": -11,
          "message": "Command failed with exit code -11, command: /opt/intel/openvino2022_1/tools/compile_tool/compile_tool -m /tmp/blobconverter/12373da38d034f99b4b4cbb60d0158b1/modelSimplified/FP16/modelSimplified.xml -o /tmp/blobconverter/12373da38d034f99b4b4cbb60d0158b1/modelSimplified/FP16/modelSimplified.blob -c /tmp/blobconverter/12373da38d034f99b4b4cbb60d0158b1/myriad_compile_config.txt -d MYRIAD ",
          "stderr": "",
          "stdout": "OpenVINO Runtime version ......... 2022.1.0
          Build ........... 2022.1.0-7019-cdb9bec7210-releases/2022/1
          Network inputs:
              A : f16 / [...]
              B : f16 / [...]
          Network outputs:
              out/sink_port_0 : f16 / [...]"
      }

      I have gone over the operations used (viewing the model with Netron) against the ONNX Support Layers list and I think they are all supported. Is there something I've missed? (A small sketch for listing the op types programmatically is included after the script below.)

      The script to reproduce the error is below:

      from torch import nn
      import torch, onnx, blobconverter
      from onnxsim import simplify

      class model(nn.Module):
          def forward(self, A, B):
              # Row-wise cross product of the two (n, 3) inputs
              cross = torch.cross(A, B, dim=1)
              # Note: cross_mean / cross_norm are computed but only cross is returned
              cross_mean = torch.mean(cross, dim=0)
              cross_norm = torch.norm(cross_mean)
              cross_mean /= cross_norm
              return cross

      # Dummy inputs used only to trace the model for ONNX export
      X1 = torch.ones((400, 3), dtype=torch.float32)
      X2 = torch.ones((400, 3), dtype=torch.float32)

      # Export to ONNX with the first dimension of A and B marked as dynamic
      torch.onnx.export(
          model(),
          (X1, X2),
          "./model.onnx",
          opset_version=12,
          do_constant_folding=True,
          input_names=['A', 'B'],
          output_names=['out'],
          dynamic_axes={'A': {0: 'n'}, 'B': {0: 'n'}}
      )

      # Simplify the exported graph before conversion
      onnx_model = onnx.load("./model.onnx")
      model_simplified, check = simplify(onnx_model)
      assert check, "onnx-simplifier could not validate the simplified model"
      onnx.save(model_simplified, "./modelSimplified.onnx")

      # Convert the simplified ONNX model to a MyriadX blob via blobconverter
      blobconverter.from_onnx(
          model="./modelSimplified.onnx",
          output_dir="./model",
          data_type="FP16",
          shaves=1,
          use_cache=False,
          compile_params=[],
          optimizer_params=[]
      )
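
      In case it helps with the layer check mentioned above, the op types present in the simplified model can also be listed programmatically rather than read off Netron; a small sketch, assuming the modelSimplified.onnx produced by the script:

      import onnx

      # Print the distinct ONNX op types in the simplified graph, to compare
      # against the supported-layers documentation.
      m = onnx.load("./modelSimplified.onnx")
      print(sorted({node.op_type for node in m.graph.node}))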

      Thanks