Tanisha
I'll split my answer in two sections.
Input preprocessing:
Check how the inputs are preprocessed on your local computer. You are not applying any optimization parameters during conversion, which means you have to feed images that are already preprocessed.
For example, if during training you take a BGR image and scale it to the 0-1 range by dividing by 255, then with your current conversion flags you would have to feed images scaled the same way over XLinkIn. Alternatively, you could add "--scale 255" to optimizer_params; then you could feed images in the 0-255 range, as the network will first divide them by 255 before running the inference. Note this is just an example, and it depends on what your preprocessing actually looks like.
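To make the two options concrete, here is a minimal sketch of the host-side difference (the frame here is a dummy array standing in for your actual capture, and the 0-1 BGR scaling is just the assumed example from above):

```python
import numpy as np

# Dummy BGR frame standing in for a real capture (values 0-255).
frame = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)

# Option 1: no --scale flag at conversion time, so you scale on the
# host yourself before sending the frame over XLinkIn.
preprocessed = frame.astype(np.float32) / 255.0

# Option 2: convert with "--scale 255" in optimizer_params and send
# the raw 0-255 frame instead; the network divides by 255 internally.
```

Either way, what the network sees at inference time must match what it saw during training.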
Results:
It looks like you are running softmax directly on the raw output that you get from the network. This is wrong. You first need to take the output and reshape it to the same shape you use locally. I'd modify the post-processing like this:
import numpy as np

output = np.array(output)
# Restore the (channels, height, width) shape the network produced
output = output.reshape((5, nn_shape, nn_shape))
# Move channels last: (height, width, channels)
output = np.transpose(output, (1, 2, 0))
# Predicted class per pixel
test_pred = np.argmax(output, axis=2)
# Per-class probabilities; softmax must run over the class axis (axis=2)
test_prob = softmax(output)
Not sure what your softmax looks like, but you need to make sure it operates along axis=2. For the predicted classes, argmax is enough; softmax is only needed if you also want the probability of each class.
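In case it helps, a numerically stable softmax that takes an axis argument can be sketched like this (the (nn_shape, nn_shape, 5) shape assumes the transposed output from the snippet above):

```python
import numpy as np

def softmax(x, axis=2):
    # Subtract the per-slice max for numerical stability, then normalize.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

# Example: a (height, width, classes) output with 5 classes.
output = np.random.rand(224, 224, 5).astype(np.float32)
probs = softmax(output, axis=2)
# Along the class axis, the probabilities for each pixel sum to 1.
```

With axis=2 the normalization happens across the 5 class channels for every pixel, which is what you want for per-pixel class probabilities.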