I am having trouble visualizing the output and I don't know how to fix it.

import torch
import segmentation_models_pytorch as smp

model_smp_mv3 = smp.DeepLabV3Plus(encoder_name='timm-mobilenetv3_large_minimal_100', encoder_weights='imagenet')
dummy_input = torch.randn(1, 3, 512, 512)  # matches the 1x3x512x512 input shape used for conversion below
torch.onnx.export(model_smp_mv3, dummy_input, save_filepath, opset_version=12, do_constant_folding=False)

import blobconverter

blob_path = blobconverter.from_onnx(
    model="smp_mv3.onnx",
    optimizer_params=["--data_type=FP16", "--input_shape=[1,3,512,512]"],
    output_dir="oakd_models/",
    shaves=10,
)

with dai.Device(pipeline) as device:
    print("Connected to the device.")
    dev_in = device.getInputQueue("input")
    dev_out = device.getOutputQueue("output", maxSize=1, blocking=False)

    frame = dai.ImgFrame()
    frame.setData(to_planar(img2, (nn_shape, nn_shape)))
    frame.setType(dai.RawImgFrame.Type.RGB888p)
    frame.setWidth(nn_shape)
    frame.setHeight(nn_shape)

    dev_in.send(frame)
    print(frame.getHeight(), frame.getWidth())

    data_o = dev_out.get()

    layers = data_o.getAllLayers()
    for layer_nr, layer in enumerate(layers):
        print(f"Layer {layer_nr}")
        print(f"Name: {layer.name}")
        print(f"Order: {layer.order}")
        print(f"dataType: {layer.dataType}")
        dims = layer.dims  # reverse dimensions
        print(f"dims: {dims}")

    layer1 = data_o.getLayerFp16(layers[0].name)
    layer1 = np.array(layer1).reshape(dims)
    seg_output = layer1.squeeze(0)
    seg_output = np.transpose(seg_output, (1, 2, 0))

Could you tell me how to get the correct visualization? Should I add scaling? Or convert to BGR?
Thank you

    Tanisha

    Hey, it looks like you are sending the image from the host to the model itself. Because you didn't provide any mean or scale flags during the conversion process, you need to preprocess the image in the same way your PyTorch model expects it. The most common scenarios are a simple division by 255, or normalization with the ImageNet mean and scale.
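
    For example, something along these lines on the host (just a sketch of the normalization itself; whether you only divide by 255 or also apply the ImageNet statistics depends on how your model was trained, and rgb_image is a placeholder name):

    import numpy as np

    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = rgb_image.astype(np.float32) / 255.0   # rgb_image: HxWx3 RGB, 0-255 (placeholder)
    img = (img - mean) / std                     # ImageNet normalization
    chw = img.transpose(2, 0, 1)                 # HWC -> CHW before sending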

    If you expect the model to run on camera feed directly, it is recommended you perform the same normalization in the model itself by passing mean and scale flags during the conversion process. You need to transform BGR 0-255 image to the input values expected by the PyTorch model.

    Decoding looks mostly correct; if you want to get classes, you would have to argmax over the channel dimension and then apply some color map to it.
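
    Roughly like this (a sketch; it reuses the HxWxC seg_output from your snippet above, and the colormap choice is arbitrary):

    import cv2
    import numpy as np

    classes = np.argmax(seg_output, axis=2)                              # HxW class indices
    num_classes = seg_output.shape[2]
    vis = (classes * (255 // max(num_classes - 1, 1))).astype(np.uint8)  # spread indices over 0-255
    vis = cv2.applyColorMap(vis, cv2.COLORMAP_JET)                       # BGR image for cv2.imshow
    cv2.imshow("segmentation", vis)
    cv2.waitKey(0)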

      Matija I've tried that, but I'm still not getting a correct visualization of the predicted mask.
      Could I email you the model and code so you could look at it? I've tried everything.

        Tanisha

        Hey, can you explain your progress in detail? We can solve this over email later, but I would prefer if all the steps are here, as it can help others resolve their issues as well.

          Matija

          So, I'm trying to run my DeepLabV3Plus model using an image from the host. I am using the DepthAI OAK only as a processing unit here.
          I read the image with image = cv2.imread(img_path2), convert it with image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB), and normalize it by dividing by 255 followed by normalized_image = (image - mean) / std.
          However, I'm still not getting the correct output visualization.

          I have used both the TensorFlow and PyTorch DeepLabV3Plus models, and I have confirmed they show correct results on my host device using both the ONNX and PyTorch models. The TensorFlow model works on images from the ColorCamera, so I think it is something to do with the way I send my image to the OAK-D.

          Tanisha
          frame = dai.ImgFrame()
          frame.setData(to_planar(img2, (nn_shape, nn_shape)))
          frame.setType(dai.RawImgFrame.Type.RGB888p)
          frame.setWidth(nn_shape)
          frame.setHeight(nn_shape)

          Try using RGB888i. I think the pipeline expects interleaved input and converts it to planar, which messes up the image.
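
          Something like this (just a sketch, assuming img2 is your resized HxWx3 uint8 RGB image, i.e. no to_planar transpose):

          frame = dai.ImgFrame()
          frame.setType(dai.RawImgFrame.Type.RGB888i)   # interleaved HWC layout
          frame.setWidth(nn_shape)
          frame.setHeight(nn_shape)
          frame.setData(img2.flatten())                 # raw interleaved bytes
          dev_in.send(frame)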

          Thanks,
          Jaka

          Tried that, it didn't give a correct visualization. I changed the code a bit and I'm able to see the outlines now, but I'm still not getting the correct output.

          pipeline = depthai.Pipeline()
          neural_network = pipeline.create(depthai.node.NeuralNetwork)
          neural_network.setBlobPath(nn_path)

          xin_nn = pipeline.create(depthai.node.XLinkIn)
          xin_nn.out.link(neural_network.input)
          xin_nn.setStreamName("nn_in")

          xout_nn = pipeline.create(depthai.node.XLinkOut)
          xout_nn.setStreamName("nn_out")
          neural_network.out.link(xout_nn.input)

          image = cv2.imread(img_path2)
          image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
          img2 = cv2.resize(image, dsize=(nn_shape, nn_shape))
          img2 = img2.transpose((2, 0, 1))

          with depthai.Device(pipeline) as device:
              q_nn = device.getOutputQueue("nn_out")
              q_nn_in = device.getInputQueue("nn_in")

              nn_data = depthai.NNData()
              nn_data.setLayer("input_layer_name", img2)
              q_nn_in.send(nn_data)
              oot = q_nn.get()
              layers = oot.getAllLayers()
              for layer_nr, layer in enumerate(layers):
                  print(f"Layer {layer_nr}")
                  print(f"Name: {layer.name}")
                  print(f"Order: {layer.order}")
                  print(f"dataType: {layer.dataType}")
                  dims = layer.dims  # reverse dimensions
                  print(f"dims: {dims}")

              out1 = np.array(oot.getLayerFp16(layers[0].name))
              output = out1.reshape(5, nn_shape, nn_shape).astype(np.uint8)
              output = np.transpose(output, (1, 2, 0))
              test_pred = np.argmax(output, axis=2)

          I added the astype(np.uint16) cast and used dai.NNData() instead of dai.ImgFrame().
          I think it's the way I'm quantizing?

          I can see outlines of the leaves ^

            Tanisha

            Can you try adding the following flags to the model optimizer:

            --mean_values [123.675,116.28,103.53] \
            --scale_values [58.395,57.12,57.375]

            This is the ImageNet mean and scale multiplied by 255. If you also add --reverse_input_channels, the model should expect BGR 0-255 images.
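
            With blobconverter, the same flags would go into optimizer_params, roughly like this (a sketch based on your earlier call; the shape and shave count are just carried over from it):

            blob_path = blobconverter.from_onnx(
                model="smp_mv3.onnx",
                optimizer_params=[
                    "--data_type=FP16",
                    "--input_shape=[1,3,512,512]",
                    "--mean_values=[123.675,116.28,103.53]",
                    "--scale_values=[58.395,57.12,57.375]",
                    "--reverse_input_channels",
                ],
                output_dir="oakd_models/",
                shaves=10,
            )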

            To make sure the normalization is correct, you can try installing openvino-dev==2022.3 first and calling the model optimizer yourself, like:

            mo.py \
            --input_model model.onnx \
            --model_name segmentation_model \
            --data_type FP16 \
            --output_dir output_dir \
            --input_shape [1,3,1088,1088] \
            --mean_values [123.675,116.28,103.53] \
            --scale_values [58.395,57.12,57.375] \
            --reverse_input_channels

            This will produce the OpenVINO IR (intermediate representation), which you can then run on your CPU with the Inference Engine:

            import cv2
            import numpy as np
            from openvino.inference_engine import IECore

            ie = IECore()

            net = ie.read_network(model=model_xml, weights=model_bin)  # the .xml and .bin produced by mo.py
            input_blob = next(iter(net.input_info))
            exec_net = ie.load_network(network=net, device_name='CPU')

            img = cv2.imread("img.png")  # make sure the image is of the correct shape
            image = img.astype(np.float32)
            image = np.expand_dims(image, axis=0)  # HWC -> NHWC
            image = np.moveaxis(image, 3, -3)      # NHWC -> NCHW

            output = exec_net.infer(inputs={input_blob: image})

            You can then find the output and post-process it in the same manner as you would otherwise. This basically takes the exported model in .xml and .bin (the intermediate representation before the blob) and runs it on your CPU.
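
            For example (a sketch; it reuses net and output from above and assumes a single 1 x num_classes x H x W output):

            out_blob = next(iter(net.outputs))  # name of the (only) output layer
            seg = output[out_blob]              # 1 x num_classes x H x W
            pred = np.argmax(seg[0], axis=0)    # HxW class indices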

            Can you let me know what is the result of the above?

              6 days later

              Matija
              Hi Matija,
              I did as you said, converted the ONNX model to OpenVINO, and here is the OpenVINO output from the script:

              This is similar to the input image, which is this one:

              and to the torch and onnx outputs as well:
              torch-

              onnx-

              Converting to a blob gives me this output:

              Hi @Tanisha
              It looks good, the texture is right, but it looks like there is a problem with the visualization (possible overflows on each side: yellow --> blue and blue --> yellow).

              Could you recheck?
              Thanks,
              Jaka

                jakaskerl
                No, the output is entirely wrong: there are only 2 classes, whereas, as you can see in the previous outputs, there should be 4 classes.
                It didn't visualize the clover; it only visualized the grass.

                What preprocessing or post processing should I do in this case?

                  Tanisha

                  Hey, how do you pre-process images before passing them to ONNX or OpenVINO? How do you decode them?

                  We can then share what flags to use with mo.py and how to post-process results.

                    Matija
                    This is how I preprocess:

                    mean = np.array([0.485, 0.456, 0.406])
                    scale = np.array([0.229, 0.224, 0.225])
                    
                    img = cv2.imread(img_path)
                    img = cv2.resize(img, dsize=(1008, 1008), interpolation=cv2.INTER_CUBIC)
                    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
                    img = img.astype(np.float32) / 255.0
                    normalized_image = (img - mean) / scale
                    normalized_image2 = np.expand_dims(normalized_image, axis=0)
                    normalized_image2 = np.moveaxis(normalized_image2, 3, -3)

                    Code for inference on the OAK-D:

                    pipeline = depthai.Pipeline()
                    pipeline.setOpenVINOVersion(version=depthai.OpenVINO.VERSION_2022_1)
                    neural_network = pipeline.create(depthai.node.NeuralNetwork)
                    neural_network.setBlobPath(nn_path)
                    
                    xin_nn = pipeline.create(depthai.node.XLinkIn)
                    xin_nn.out.link(neural_network.input)
                    xin_nn.setStreamName("nn_in")
                    
                    xout_nn = pipeline.create(depthai.node.XLinkOut)
                    xout_nn.setStreamName("nn_out")
                    neural_network.out.link(xout_nn.input)
                    
                    
                    with depthai.Device(pipeline) as device:
                        q_nn = device.getOutputQueue("nn_out")
                        q_nn_in = device.getInputQueue("nn_in")
                    
                        nn_data = depthai.NNData()
                        nn_data.setLayer("input_layer_name", normalized_image2)
                        q_nn_in.send(nn_data)
                        oot = q_nn.get()
                        layers = oot.getAllLayers()
                        for layer_nr, layer in enumerate(layers):
                            print(f"Layer {layer_nr}")
                            print(f"Name: {layer.name}")
                            print(f"Order: {layer.order}")
                            print(f"dataType: {layer.dataType}")
                            dims = layer.dims # reverse dimensions
                            print(f"dims: {dims}")

                    This is my post-processing:

                    out1 = (np.array(oot.getLayerFp16(layers[0].name)))
                    output = out1.reshape(1, 5, nn_shape, nn_shape)
                    output = output.squeeze(0)
                    output = np.transpose(output, (1,2,0))
                    test_pred = np.argmax(output, axis=2)

                    But it still gives me this output: