Hello,

I am trying to implement a pipeline that uses the Places365 pretrained models for scene classification.

Since the model requires the image to be pre-processed, converted to a tensor, and normalized (code for inference is given here), I decided to do this pre-processing on the host. Also, I will be using STILL images captured from camRGB. My pipeline looks as follows:
XlinkIn (to get capture command) --> camRGB --> xlinkOut (send frames to host)
Then, pre-processing is done on host, and the tensor is sent back to device
XlinkIn (to get tensors) --> NN--> xlinkOut (send NN predictions to host)
Finally, the results are decoded in the host

The problem is that I get the following error and have not been able to figure out how to solve it:
[14442C1001AB47D700] [20.463] [NeuralNetwork(4)] [error] Input tensor 'data' (0) exceeds available data range. Data size (0B), tensor offset (0), size (150528B) - skipping inference
It seems to me there is an issue with either the pre-processing or with sending the tensors back to the device.
I would be glad if anyone could help me with this.

The code is given below:

import torch
from torch.autograd import Variable as V
import torchvision.models as models
from torchvision import transforms as trn
from torch.nn import functional as F
from PIL import Image
import cv2
import depthai as dai
import numpy as np

# labels path
path = ".............../categories_places365.txt"
with open(path, "r") as file:
    fileData = file.read().splitlines()

# nn path
nnPath = ".............../googlenet_places365.blob"

# dummy image
x = cv2.imread("blue.png")

# Create the pipeline
pipeline = dai.Pipeline()

# 1 Create input control node to acquire capture command
xinCaptureCommand = pipeline.create(dai.node.XLinkIn)
xinCaptureCommand.setStreamName("capture")

# 2 Create Camera node and give its properties
camRGB = pipeline.create(dai.node.ColorCamera)
camRGB.setResolution(dai.ColorCameraProperties.SensorResolution.THE_4_K)
camRGB.setStillSize(1080, 1080)
camRGB.setPreviewSize(1080, 1080)
camRGB.setVideoSize(1080, 1080)
# camRGB.setInterleaved(False)
camRGB.setColorOrder(dai.ColorCameraProperties.ColorOrder.RGB)

# 3 Create output node for still images
outStillRGB = pipeline.create(dai.node.XLinkOut)
outStillRGB.setStreamName("rgbStill")

# 4 Create input control node to acquire modified images
inDataToNN = pipeline.create(dai.node.XLinkIn)
inDataToNN.setStreamName("modifiedImages")

# 5 Create NN nodes
nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath(nnPath)
nn.setNumInferenceThreads(2)
nn.input.setBlocking(False)

# 6 Create output node for predictions
outPredictions = pipeline.create(dai.node.XLinkOut)
outPredictions.setStreamName("predictions")

# Linking
# Link output of xinCaptureCommand to camera input control
xinCaptureCommand.out.link(camRGB.inputControl)
# Link output of camRGB to input of outStillRGB
camRGB.still.link(outStillRGB.input)
# Link output of inDataToNN to nn
inDataToNN.out.link(nn.input)
# Link output of nn to input of outPredictions
nn.out.link(outPredictions.input)

#######################################################################################
# Connect to device and start the pipeline
with dai.Device(pipeline) as device:

    # load the image transformer (standard Places365 preprocessing:
    # resize to 256x256, centre-crop to 224x224, scale to [0,1],
    # then normalize with the ImageNet mean/std)
    centre_crop = trn.Compose([
        trn.Resize((256,256)),
        trn.CenterCrop(224),
        trn.ToTensor(),
        trn.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])

    # Create input queue to device, that receives capture command
    captureInputQueue = device.getInputQueue("capture")
    # Create output queue that will get RGB frame (Output from device, and input to host)
    stillQueue = device.getOutputQueue(name="rgbStill")
    # Create input queue to device, that receives modified frames
    modifiedImagesQueue = device.getInputQueue("modifiedImages")    
    # Create output queue that will get predictions (Output from device, and input to host)
    qDet = device.getOutputQueue(name="predictions")
   
    frame = None
    detections = []
    cv2.imshow("x",x)
    
    while True:
        # try to get a frame from the device
        stillFrame = stillQueue.tryGet()
        if stillFrame is not None:
           
            # B (i) - host gets the frame and modifies it
            frame = stillFrame.getCvFrame()
            #frame = cv2.imdecode(stillFrame.getData(), cv2.IMREAD_UNCHANGED)
            cv2.imshow("frame before modyfying it", frame)
           
            # B (ii) modify frame
            modifiedFrame1 = Image.fromarray(frame)
            modifiedFrame2 = V(centre_crop(modifiedFrame1).unsqueeze(0))
            print("frame modified successfuly", modifiedFrame2.size)

            # C - host sends modified frame to device
            nnMsg = dai.NNData()
            nnMsg.setData(modifiedFrame2)
            modifiedImagesQueue.send(nnMsg)
           
        # D - host gets predictions from device
        inDet = qDet.tryGet()
        if inDet is not None:
            detections = inDet.detections
               
            h_x = F.softmax(detections, 1).data.squeeze()
            probs, idx = h_x.sort(0, True)
            print('{} prediction on {}'.format(arch,img_name))
            # output the prediction
            for i in range(0, 5):
                print('{:.3f} -> {}'.format(probs[i], classes[idx[i]]))

        # A - host sends capture command to device
        key = cv2.waitKey(1)
        if key == ord("q"):
            break
           
        elif key == ord('c'):
            ctrl = dai.CameraControl()
            ctrl.setCaptureStill(True)
            captureInputQueue.send(ctrl)
            #print("captured")

Note that the code related to pre-processing the image and decoding the results is taken from here.


    Hello hussain_allawati,
    It looks like your model is expecting 150528 bytes, but you sent 0 bytes to it. Could you double-check what you are actually sending (the nnMsg.setData(modifiedFrame2) line)? Regarding this issue, see the ticket here.
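
    As a quick sanity check (a sketch; it assumes modifiedFrame2 is the preprocessed tensor from your code above), you could print how many bytes you are actually about to send:

    arr = modifiedFrame2.numpy()  # torch.Tensor -> np.ndarray, shape (1, 3, 224, 224)
    print(arr.dtype, arr.nbytes)  # a float32 tensor of that shape holds 602112 bytes
    # note: 150528 = 3 * 224 * 224, i.e. one byte per value
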
    Thanks, Erik

      Hey Erik,

      I double-checked the code, and everything seems fine to me.
      In the nnMsg.setData(modifiedFrame2) line, I am sending a tensor to the NN.

      Also, I tried printing the modifiedFrame2 variable right after it is created on the host (just before sending it to the device). Everything seems fine, and it has the proper size (150528).
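
      Specifically, the check was roughly this (a sketch):

      print(modifiedFrame2.shape)            # torch.Size([1, 3, 224, 224])
      print(modifiedFrame2.flatten().shape)  # 150528 values in total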

      I expect that the issue might be in either:
      1) the linking
      2) sending the message
      3) message properties and type (the same line you pointed to)

      A) Could you please advise me on what to do?

      Regarding the issue you pointed to, it has been closed. However, I found something interesting there saying "provide the amount of bytes that the NN model expects for the inference". B) What property of the NNData message is used to specify its size? In the NNData documentation, I did not find anything related to setting the message size.


        Hello hussain_allawati,
        A) After further review, it looks like you are using NNData.setData(), whereas every other depthai demo uses NNData.setLayer(layerName, data), e.g.

        nn_data = dai.NNData()
        nn_data.setLayer("input", to_planar(frame, (W, H)))  # (W, H): the model's input width and height
        detection_in.send(nn_data)

        From here. This could actually be the reason that it doesn't "find" the bytes for inference.
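
        In your case that would be roughly the following (a sketch; it flattens your preprocessed tensor into a plain list of floats for the blob's "data" input layer):

        nnMsg = dai.NNData()
        nnMsg.setLayer("data", modifiedFrame2.flatten().tolist())
        modifiedImagesQueue.send(nnMsg)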

        B) You don't need to specify the size, as the inference is done on the bytes. You could also create an ImgFrame (not NNData) and set its size, e.g.

        img = dai.ImgFrame()
        img.setData(data)
        img.setType(dai.ImgFrame.Type.BGR888p)
        img.setWidth(1280)
        img.setHeight(720)
        q.send(img)
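
        In your case the sizes would be 224x224 (a sketch; note that this sends raw uint8 pixels, so the torch normalization from your transform would not be applied):

        planar = cv2.resize(frame, (224, 224)).transpose(2, 0, 1)  # HWC (interleaved) -> CHW (planar)
        img = dai.ImgFrame()
        img.setData(planar.flatten())
        img.setType(dai.ImgFrame.Type.BGR888p)
        img.setWidth(224)
        img.setHeight(224)
        modifiedImagesQueue.send(img)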

        Thanks, Erik

          Erik, thank you for your response. Upon using NNData.setLayer, the code runs without errors; however, the inference results are wrong 🙁

          The message I am trying to send to the NN is basically a tensor of size [1, 3, 224, 224].
          Upon reading the documentation of NNData, I found that it accepts a List[float] as input, and hence I converted the tensor into a list of shape (1, 3, 224, 224).

          When using either the tensor or the list as input, inference executed with no errors in both cases, but with wrong results (for any input image, the same classes were generated). Note that to acquire the inference results, I used detections = inDet.getFirstLayerFp16()

          After hours of troubleshooting, I concluded that either:
          1) I have a problem in using the NNData Message

                      nnMsg = dai.NNData()
                      nnMsg.setLayer("data", modifiedFrame2)
                      modifiedImagesQueue.send(nnMsg)
                      # where modifiedFrame2 is either a tensor or list of size 1, 3, 224, 224

          Q1) Here, should each of the three channels be mapped to a different layer?


          2) Problem in extracting the inference results
          detections = inDet.getFirstLayerFp16()
          Q2) I am not sure whether using getFirstLayerFp16() is correct. There are lots of available properties. Which one should I use?


          3) Problem in converting the original Caffe model to blob format

          I think I am having this problem because I don't understand exactly what a "layer" is and how it works (in the above lines of code).

          Q3) Could you please guide me what to do?


            Hello hussain_allawati,
            1) The setLayer function accepts a few things (see the image below). My initial guess would be that either preprocessing is missing or that the model expects a planar image but gets an interleaved one. I would suggest adding a to_planar() conversion before setting the layer:

            # resize, convert HWC (interleaved) to CHW (planar), then flatten to 1D
            def to_planar(arr: np.ndarray, shape: tuple) -> list:
                return cv2.resize(arr, shape).transpose(2, 0, 1).flatten()
            
            nnMsg.setLayer("data", to_planar(modifiedFrame2, (224, 224)))

            2) It depends on what datatype the model is outputting. You can see that with this:
            [print(f"Layer name: {l.name}, Type: {l.dataType}, Dimensions: {l.dims}") for l in inDet.getAllLayers()]

            3) I would suggest first converting to OpenVINO IR and running OpenVINO's inference engine to test whether your model works as expected (and whether the Model Optimizer arguments were set correctly). You can also use the OAK as an NCS2, so inference will happen on the OAK; see the example here.
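
            A minimal sketch of that check with OpenVINO's Python API (it assumes the IR files googlenet_places365.xml/.bin and the 2021.x inference engine API; the output name "prob" is the usual one for the Caffe GoogLeNet):

            from openvino.inference_engine import IECore
            import numpy as np

            ie = IECore()
            net = ie.read_network(model="googlenet_places365.xml", weights="googlenet_places365.bin")
            exec_net = ie.load_network(network=net, device_name="CPU")  # use "MYRIAD" to run it on the OAK
            input_name = next(iter(net.input_info))
            arr = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for your preprocessed tensor
            res = exec_net.infer({input_name: arr})
            print(res["prob"].squeeze().argsort()[::-1][:5])  # top-5 class indices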

            Thanks, Erik

              Erik,

              1) The pre-processing is fine and has no issues. I compared the pre-processing output of the placesCNN Demo with that of my code, and it is exactly the same in terms of values, type, and size.

              Upon using to_planar, I got this error:
              TypeError: Expected Ptr<cv::UMat> for argument 'src'
              The src here refers to the first argument passed to cv2.resize (which expects a numpy array, whereas modifiedFrame2 is a torch tensor), so I think that wouldn't solve the issue.

              2) The output of the command was
              Layer name: prob, Type: DataType.FP16, Dimensions: [1, 365]
              However, when checking the placesCNN Demo, both the input to the NN and the output had a datatype of torch.float32. Could this be the cause of the problem?

              Currently, I believe that the pre-processing is correct, so the issue might be due to either:
              (i) Sending the tensor to NN
              (ii) Converting the model

              To troubleshoot sending the tensor to the NN, I have no clue what to do (whether to use NNData, ImgFrame, Buffer, or some other type of message, and with what properties). Given this scenario, could you please suggest what to use?

              Regarding the model conversion, I will try the example you suggested. Also, I believe I might have missed some Model Optimizer parameters. Could you please point me to a tutorial/example that demonstrates this?

              Another thing: would it be easier for me to train a new classification model from scratch rather than keep troubleshooting?
              Of course, if I go that way, the new model would be trained on a small subset of the original dataset with fewer classes (just as a proof of concept for the project I am working on).

              Thanks,


                Hello hussain_allawati,
                2) So detections = inDet.getFirstLayerFp16() should be correct. What do you receive from it (print(detections))? Looking at the original script, inDet.detections would not be correct, as NNData doesn't have detections. Regarding the float32: the Myriad X (VPU) only supports FP16, so the compile_tool (converting from OpenVINO's IR to .blob) does the FP32->FP16 conversion. You can use NNData/ImgFrame/Buffer/etc., as the NeuralNetwork node just takes bytes and runs the inference.
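
                For reference, decoding on the host could look roughly like this (a sketch; fileData is the label list from your script, and "prob" is the output layer from your printout, which in the Caffe GoogLeNet is typically already a softmax output):

                inDet = qDet.get()                             # NNData message from the device
                scores = np.array(inDet.getFirstLayerFp16())   # 365 values from the 'prob' layer
                topIdx = scores.argsort()[::-1][:5]            # indices of the top-5 scores
                for i in topIdx:
                    print('{:.3f} -> {}'.format(scores[i], fileData[i]))
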
                Regarding model conversion, see docs here.
                I am not sure about the classification model, as we only have a training+conversion+deployment notebook (here) for object detection and semantic segmentation models.
                Thanks, Erik

                  6 days later

                  Hey Erik,
                  I tried several tricks, but none worked. Unfortunately, the project I am working on is running out of time, so I will come back to the Places365 NN at the end if my team has time.

                  Thanks,