I am trying to integrate a custom neural net (an FCN built atop MobileNetV2). I converted it with blobconverter, using FP16, a mean of 0, and a scale of 255. The following code works correctly and produces the expected mask output, even though I am not doing any pre-processing of the input myself; the only pre-processing the model needs is a division by 255. Is that happening automatically?
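For reference, this is roughly how I ran the conversion (the ONNX path is a placeholder; I used the Python blobconverter package). As I understand it, mean_values/scale_values are handed through to the OpenVINO model optimizer, so the (x - mean) / scale step gets baked into the blob and runs on-device:

import blobconverter

# Hypothetical conversion call matching the settings described above.
# mean_values/scale_values are passed through to the OpenVINO model
# optimizer, so the division by 255 should happen inside the blob itself.
blob_path = blobconverter.from_onnx(
    model="theneuralnet.onnx",  # placeholder path
    data_type="FP16",
    shaves=6,
    optimizer_params=[
        "--mean_values=[0,0,0]",
        "--scale_values=[255,255,255]",
    ],
)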
import cv2
import numpy as np
import depthai

pipeline = depthai.Pipeline()

neural_network = pipeline.create(depthai.node.NeuralNetwork)
neural_network.setBlobPath("theneuralnet.blob")

# Host -> device input stream
xin_nn = pipeline.create(depthai.node.XLinkIn)
xin_nn.setStreamName("nn_in")
xin_nn.out.link(neural_network.input)

# Device -> host output stream
xout_nn = pipeline.create(depthai.node.XLinkOut)
xout_nn.setStreamName("nn_out")
neural_network.out.link(xout_nn.input)

im = cv2.resize(cv2.imread("/tmp/tags_image.jpg"), (640, 480)).astype("float16")
im = im.transpose((2, 0, 1))  # the model expects CHW

with depthai.Device(pipeline) as device:
    q_nn = device.getOutputQueue("nn_out")
    q_nn_in = device.getInputQueue("nn_in")

    nn_data = depthai.NNData()
    nn_data.setLayer("input_layer_name", im)
    q_nn_in.send(nn_data)

    oot = q_nn.get()
    out1 = (np.array(oot.getLayerFp16("output_name1")).reshape((480, 640, 1))[:, :, 0] * 255).astype("uint8")
    out2 = (np.array(oot.getLayerFp16("output_name2")).reshape((480, 640, 4))[:, :, 0] * 255).astype("uint8")
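As a sanity check on the layout, I believe depthai can read the declared input tensors back out of the blob; an untested sketch along these lines should print each input's name, dims, storage order, and data type, which would confirm whether the (3, 480, 640) CHW array I'm sending matches what the network expects:

import depthai

blob = depthai.OpenVINO.Blob("theneuralnet.blob")
for name, tensor in blob.networkInputs.items():
    # Print whatever the blob declares; compare against the array sent above.
    print(name, tensor.dims, tensor.order, tensor.dataType)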
Now I'm trying to run this model on the color camera of an OAK-D Pro PoE, but the output looks incorrect. Here's the code for that part.
import cv2
import numpy as np
import depthai

pipeline = depthai.Pipeline()

cam_rgb = pipeline.create(depthai.node.ColorCamera)
cam_rgb.setResolution(depthai.ColorCameraProperties.SensorResolution.THE_1080_P)
cam_rgb.setPreviewSize(640, 480)
cam_rgb.setInterleaved(False)  # planar (CHW) preview frames
cam_rgb.setFp16(True)          # preview data as FP16, values still 0-255

neural_network = pipeline.create(depthai.node.NeuralNetwork)
neural_network.setBlobPath("theneuralnet.blob")
cam_rgb.preview.link(neural_network.input)

xout_nn = pipeline.create(depthai.node.XLinkOut)
xout_nn.setStreamName("nn_out")
neural_network.out.link(xout_nn.input)

# XLinkOut for the passthrough frames, so the "rgb" queue below exists
xout_rgb = pipeline.create(depthai.node.XLinkOut)
xout_rgb.setStreamName("rgb")
neural_network.passthrough.link(xout_rgb.input)

with depthai.Device(pipeline) as device:
    q_rgb = device.getOutputQueue("rgb")
    q_nn = device.getOutputQueue("nn_out")

    while True:
        oot = q_nn.tryGet()
        if oot:
            out1 = (np.array(oot.getLayerFp16("output_name1")).reshape((1, 480, 640, 1))[0, :, :, 0] * 255).astype("uint8")
            out2 = (np.array(oot.getLayerFp16("output_name2")).reshape((1, 480, 640, 4))[0, :, :, 0] * 255).astype("uint8")
            cv2.imshow("f1", np.vstack((out1, out2)))
        cv2.waitKey(1)
I suspect the frames are not making it into the neural net in the format it expects. Maybe there's a discrepancy between the camera's frame layout (channel order, planar vs. interleaved, value range) and the model's expected input order? How can I go about investigating this further?
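One check I'm considering (an untested sketch, reusing the "rgb" passthrough queue from the script above, and assuming the FP16 planar frame can be reinterpreted from the raw buffer): pull a passthrough frame back to the host and inspect its planes and value range. If red and blue look swapped relative to the scene, the camera preview's channel order differs from the cv2.imread (BGR) data the model was validated with; if the range isn't 0-255, the scaling assumption is off.

# Inside the device context, after the queues are created:
pass_frame = q_rgb.get()  # ImgFrame passed through the NeuralNetwork node
planar = np.array(pass_frame.getData()).view(np.float16).reshape((3, 480, 640))
print("value range:", float(planar.min()), float(planar.max()))
# Stack the three planes vertically to eyeball the channel order.
cv2.imshow("planes", np.vstack([p.astype("uint8") for p in planar]))
cv2.waitKey(0)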