Hello everyone,
I am new to the DepthAI environment and have just ordered an OAK-1 to use in my Bachelor's Final Year Project along with an RPi4. Can't wait to receive this fabulous piece of hardware and play with it!

I am currently reading up on how to get started with the OAK-1 and DepthAI. I have some newbie questions that might seem trivial to most of you...

1) In order to allow all the processing to happen on the VPU, I need to install the DepthAI API on the RPi.
My question is: how will the code know whether a given part runs on the VPU or on the RPi CPU?
In other words, my pipeline has several functions, such as object recognition, face recognition, text-to-speech, and I/O buttons. How do I determine which parts will run on the CPU and which will run on the VPU?

2) Is it possible to have more than one neural network model on my system? (Each NN will run separately, not simultaneously with the others.) If so, how?

Any help will be appreciated.
Cheers,

    Hello hussain_allawati ,
    awesome that you are working on your Bachelor's project with our cameras! And please share it afterwards if that is possible (open source) 🙂

    1) Everything that's part of the DepthAI pipeline gets executed on the device. So, looking at this example, everything up to code line 77 gets executed on the device, and everything after gets executed on the host CPU (visualization only in this case); a rough sketch follows after point 2.

    2) Yes, you can run as many NNs as you want - parallel or serial. See the demo here, which has 4 NNs running. As for the how, you just add multiple NN nodes and connect them as you see fit.
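
    As a rough illustration of point 1 (this is not the exact example linked above; the blob path and sizes are just placeholders), the pipeline-definition part that ends up running on the device could look like this:

    import depthai as dai

    # Everything attached to the pipeline below is later serialized and executed on the OAK's VPU
    pipeline = dai.Pipeline()

    cam = pipeline.create(dai.node.ColorCamera)               # on-device camera node
    cam.setPreviewSize(300, 300)
    cam.setInterleaved(False)

    nn = pipeline.create(dai.node.MobileNetDetectionNetwork)  # on-device NN node
    nn.setBlobPath("model.blob")                               # placeholder path to a compiled blob
    nn.setConfidenceThreshold(0.5)
    cam.preview.link(nn.input)

    xout = pipeline.create(dai.node.XLinkOut)                  # streams NN results back to the host
    xout.setStreamName("nn")
    nn.out.link(xout.input)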

    Hopefully this helps!
    Thanks, Erik

      erik Erik, thank you for your informative reply! The examples you gave address my issue exactly. I will go through them with my teammates and update you on the results.

      Line 77 of the first example you mentioned is: with dai.Device(pipeline) as device:
      As per my understanding of the code, the part before line 77 will execute on the CPU, and the code after line 77 will execute on the OAK (the opposite of what you mentioned). Is that correct?

      And yeah, we plan to make our project open source once it's done!


        That's awesome to hear hussain_allawati !
        So on line 77 (with dai.Device), the pipeline gets serialized and sent to the OAK camera (USB, Ethernet...), where the pipeline actually gets constructed in the firmware. Then you add your queues that send/receive data to/from the device, and usually a while loop to continuously run the operations (get data, display frames...).
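
        To make that concrete, a minimal host-side sketch (assuming a pipeline like the one sketched earlier, with an XLinkOut stream named "nn" carrying detection results) might look like:

        # Host (RPi) side: upload the pipeline to the device and read results back
        with dai.Device(pipeline) as device:       # here the pipeline is serialized and sent to the OAK
            q = device.getOutputQueue("nn", maxSize=4, blocking=False)
            while True:
                detections = q.get().detections    # blocks until the device sends the next result
                print(detections)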
        Thanks, Erik

          erik Is there documentation or a course that explains the pipeline and nodes in detail? I went through this but still have some doubts regarding the issue.

          Thanks again,


            erik I can't pin down my exact doubts at the moment. I will go through the documentation you suggested, test some code, and keep you updated.

            Thanks again,

              10 days later

              hussain_allawati my pipeline has several functions, such as object recognition, face recognition, text-to-speech, and I/O buttons.

              Your "program" will have all of that, but "pipeline" is the term for what happens inside the OAK device. So your pipeline will consist of camera node(s) connected to a face recognition NN node AND an object recognition NN node, with all of those going to output nodes - roughly like the sketch below.
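
              A minimal sketch of that topology (blob paths, stream names, and input sizes are placeholders; both models are assumed to accept the same preview size):

              import depthai as dai

              pipeline = dai.Pipeline()

              cam = pipeline.create(dai.node.ColorCamera)
              cam.setPreviewSize(300, 300)
              cam.setInterleaved(False)

              # Two NN nodes fed by the same camera preview, running in parallel on the VPU
              face_nn = pipeline.create(dai.node.NeuralNetwork)
              face_nn.setBlobPath("face_recognition.blob")      # placeholder
              cam.preview.link(face_nn.input)

              obj_nn = pipeline.create(dai.node.NeuralNetwork)
              obj_nn.setBlobPath("object_recognition.blob")     # placeholder
              cam.preview.link(obj_nn.input)

              # One output node per NN streams its results back to the host program
              for name, nn in [("face", face_nn), ("object", obj_nn)]:
                  xout = pipeline.create(dai.node.XLinkOut)
                  xout.setStreamName(name)
                  nn.out.link(xout.input)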

              5 days later

              erik

              Erik, I made myself familiar with the API, pipeline, and nodes. I also ran the examples available.

              For now, I would like to use pretrained models to perform object detection and scene classification.

              I am thinking of using the MS COCO dataset for objects and the Places365 dataset for scene classification.
              Several pretrained models exist for these datasets, using various architectures.
              For example, the Places365 dataset has models trained with the VGG-16, GoogLeNet, ResNet, and AlexNet architectures.

              I attempted to download the VGG-16 Places365 model and successfully converted it to .blob using the online converter tool.

              Now my questions are:

              1) How do I use the converted model and decode its output?
              2) Does using a converted model differ depending on where it comes from? In other words, if I convert several models, each originating from a different framework (Caffe, TensorFlow, etc.) or a different architecture (VGG-16, ResNet, etc.), is the way to use them and decode their results the same, or does it differ from one to another?
              3) Which architecture is preferred on OAK devices (i.e. the one the hardware is optimized for)?


                Hello hussain_allawati ,
                1) See the EfficientDet demo here. Decoding really depends on the model; I would reuse the decoding code that is usually present in the model repo (e.g. in the evaluation steps). A minimal classification-decoding sketch follows after point 3.
                2) You can run any AI model on the OAK, as long as all of its layers are supported. Since models have different outputs, decoding also differs.
                3) I don't think the Myriad X VPU was really optimized for any specific model, but you can see performance below (or here).
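
                As a minimal sketch of what decoding a single-output classification blob could look like on the host (assuming a plain NeuralNetwork node whose only output layer is a logits vector, with an XLinkOut stream named "nn" and a device/queue set up as in the earlier examples):

                import numpy as np

                q = device.getOutputQueue("nn", maxSize=4, blocking=False)
                msg = q.get()                                   # NNData message from the device
                logits = np.array(msg.getFirstLayerFp16())      # raw FP16 values of the output layer
                exp = np.exp(logits - logits.max())
                probs = exp / exp.sum()                         # softmax over the class scores
                top5 = probs.argsort()[::-1][:5]                # indices of the 5 most likely classes
                print([(int(i), float(probs[i])) for i in top5])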

                Thanks, Erik

                  erik

                  Thank you for your informative reply.
                  The EfficientDet demo you referred to does object detection.
                  Could you please point me to something similar that does image classification?


                    erik Erik, unfortunately, I am lost!
                    I spent an entire day trying to use the Places365 VGG-16 model (after converting it to a blob).
                    The issue is that I don't have much knowledge about NNs, and hence got stuck on understanding how to use the model. I know this discussion might not be directly related to DepthAI, but I would be glad if you could guide me!

                    Thanks,

                      6 days later

                      erik Erik, I looked at the examples you suggested.

                      Currently, I want to perform the following transformation to match the model requirements:

                      from torchvision import transforms as trn

                      # Preprocessing expected by the Places365 VGG-16 model
                      centre_crop = trn.Compose([
                          trn.Resize((256, 256)),
                          trn.CenterCrop(224),
                          trn.ToTensor(),
                          trn.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
                      ])

                      How can I apply such a transformation on DepthAI?
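
                      My rough, untested guess is something like the sketch below - using an ImageManip node for the resize/crop, and baking the mean/std normalization into the blob at conversion time (e.g. via the OpenVINO model optimizer's --mean_values / --scale_values / --reverse_input_channels options, with the torchvision values scaled by 255) - but I am not sure this is correct:

                      import depthai as dai

                      pipeline = dai.Pipeline()

                      cam = pipeline.create(dai.node.ColorCamera)
                      cam.setPreviewSize(256, 256)                     # roughly trn.Resize((256, 256))
                      cam.setInterleaved(False)

                      manip = pipeline.create(dai.node.ImageManip)
                      manip.initialConfig.setCenterCrop(224 / 256, 1)  # roughly trn.CenterCrop(224) on a 256x256 frame
                      cam.preview.link(manip.inputImage)

                      nn = pipeline.create(dai.node.NeuralNetwork)
                      nn.setBlobPath("places365_vgg16.blob")           # placeholder path
                      manip.out.link(nn.input)

                      # Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) would be handled at
                      # conversion time, e.g.:
                      #   --mean_values [123.675,116.28,103.53] --scale_values [58.395,57.12,57.375] --reverse_input_channels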