Is CNN-LSTM a compatible architecture?
We have managed to compile a CNN-LSTM architecture before, where we used GAP (global average pooling) to obtain a vector (as for an FC/linear layer), which was then fed to the LSTM. I'd have to see your model or architecture to be able to say more about the error.
This is the overall model. I tried modifying it along the way with all kinds of changes, but the error was always the same: as far as I understand, it points to a shape problem in the reshape node just after the CNN. Ignore some of the comments; they might not be relevant to the code as implemented.
I will also share the second model architecture, which uses GAP as you mentioned.
I implemented the model in a different way hoping to change the result, but the same error came up again.
I visualized the model using Netron and checked the error node. It identifies the reshape node after the ReLU6, which is the output of the MobileNetV2 transfer-learning backbone. From there, the reshape for some reason adds another dimension and then removes it afterwards; I don't know why it does that. You can see the global average pooling right after it, before feeding into the LSTM. On other occasions I found this reshape node after the global average pooling layer you mentioned, and it had the same weird problem. No matter how I change the model, the problem persists. That was model version 24, so I tried many approaches before this, all with the same reshape issue. I hope you can give me a clear resolution to this problem.
input shape:
error node:
Interesting. Can you please share the ONNX file of the untrained model with me? I'll try to investigate locally.
Hi Matija, good news: I saved the ONNX file from the TensorFlow model and actually managed to get it converted into a .blob.
So far I have been testing two ways to convert the model into a .blob. The first was to save my model as a TensorFlow model and then use OpenVINO to create the IR files. Converting the IR files into a .blob gave me the error mentioned in the previous message.
In the second method I saved the model as an ONNX file, but the error I received was different: it mentioned incompatible batches. When saving the model I gave the shape as 8,15,224,224,3, but I then realised that I cannot use a batch of 8. Using 8 or more is fine during training, but for a file that should work directly on the camera the batch should be 1, because the camera will provide one batch of 15 frames at a time, not 8 at once. This is my understanding of why I was not able to convert from the ONNX file. Please let me know if you agree with it.
Regarding the IR files: the online converter uses OpenVINO 2022.1, while I'm using a later version of OpenVINO. I don't know if this was the reason for the error in the first place, with something not being recognised by the earlier OpenVINO version. I'm not sure, because the model architecture was the same when saving both the ONNX and the IR files.
I will now have to double check if I can make the file work properly without errors. I'll keep you updated in case other errors arise.
When my project is done I'll share the findings on GitHub so that other people can benefit in case they encounter similar errors. This will include all the steps, from creating the model to deploying a custom model on these cameras.
Yeah, at inference time the expected batch size will be 1. As mentioned, you can specify the input shape with --input_shape. For standard 4-dimensional inputs and ONNX input, resizing the batch dimension automatically typically works. OpenVINO might have some trouble inferring that for 5-dimensional inputs, though, so it's better if your ONNX already expects batch size 1. That could have resulted in the error, yes.
Regarding the OpenVINO version, I'd stick with 2022.1 or 2022.3. Later versions are not supported and might result in unrelated errors.
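A minimal sketch of the batch-size point above, in plain Python: it takes the training shape from the thread, swaps the batch dimension for 1, and formats it as a value for Model Optimizer's --input_shape. The file name model.onnx and both helper names are hypothetical, and the command is only printed, not executed.

```python
def inference_shape(training_shape, batch=1):
    """Return training_shape with its leading (batch) dimension replaced."""
    if len(training_shape) < 2:
        raise ValueError("expected a batch dimension plus at least one data dimension")
    return (batch,) + tuple(training_shape[1:])


def mo_input_shape_arg(shape):
    """Format a shape as a bracketed, comma-separated list, e.g. [1,3,224,224]."""
    return "[" + ",".join(str(dim) for dim in shape) + "]"


# Shape used when saving the model in the thread:
# batch, frames, height, width, channels
training_shape = (8, 15, 224, 224, 3)

shape = inference_shape(training_shape)

# "model.onnx" is a placeholder file name, not the poster's actual file.
command = ["mo", "--input_model", "model.onnx",
           "--input_shape", mo_input_shape_arg(shape)]
print(" ".join(command))
```

Exporting the ONNX file with batch size 1 in the first place, as suggested above, avoids relying on the converter to rewrite the batch dimension of a 5-dimensional input.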
And sounds good on the GitHub issue! Looking forward to it. We can also update our documentation for models with 5 dimensional inputs. Out of curiosity, what problem are you tackling so that you are using LSTM?
Hi Matija, yes, I'll do my best to cover the 5-dimensional input topic to the best of my knowledge and understanding.
Regarding the LSTM, I'm working on my final-year project and developing a model for detecting violent situations in video surveillance. I'm using a CNN for the spatial features, wrapped inside a time-distributed layer so that the CNN is applied to each time step in the sequence. Its output is then fed into the LSTM, which captures the temporal features.
I will look further into the OpenVINO fault, and if I find a solution I'll update you.
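To make the data flow described above concrete, here is a rough shape trace in plain Python, not the actual model. It assumes a MobileNetV2-style backbone (overall stride 32, 1280 output channels), a hypothetical LSTM size of 64 units, and that the time-distributed wrapper is implemented by folding the frame axis into the batch axis and unfolding it again afterwards. That last assumption is only a guess at where the extra reshape nodes seen in Netron might come from.

```python
def trace_shapes(batch=1, frames=15, height=224, width=224, channels=3,
                 cnn_stride=32, cnn_features=1280, lstm_units=64):
    """Return (step name, tensor shape) pairs for a CNN + GAP + LSTM pipeline.

    cnn_stride and cnn_features match a MobileNetV2-style backbone;
    lstm_units is an arbitrary illustrative value.
    """
    merged = batch * frames
    return [
        ("input", (batch, frames, height, width, channels)),
        # Assumed: the per-frame CNN is applied by merging time into batch.
        ("fold time into batch", (merged, height, width, channels)),
        ("CNN feature map", (merged, height // cnn_stride,
                             width // cnn_stride, cnn_features)),
        # GAP averages over the two spatial axes, leaving one vector per frame.
        ("global average pooling", (merged, cnn_features)),
        # Restore the time axis so the LSTM sees a sequence of frame vectors.
        ("unfold time", (batch, frames, cnn_features)),
        # Only the final LSTM state is kept here.
        ("LSTM output", (batch, lstm_units)),
    ]


for name, shape in trace_shapes():
    print(f"{name:24s} {shape}")
```

With batch size 1, the LSTM receives a sequence of 15 vectors of length 1280, one per frame.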
Thanks!
And the project sounds great. Feel free to share some results when you get them!