That's awesome to hear, hussain_allawati!
So on line 77 (with dai.Device), the pipeline gets serialized and sent to the OAK camera (over USB, Ethernet, etc.), where it is actually constructed in the firmware. Then you add your queues that send/receive data to/from the device, and usually a while loop to continuously run the operations (get data, display frames, etc.).
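A minimal sketch of that flow, roughly following the API examples (the preview size and the "preview" stream name here are just illustrative):

import cv2
import depthai as dai

# Define the pipeline on the host; nothing runs on the device yet
pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(300, 300)
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("preview")
cam.preview.link(xout.input)

# Entering the Device context serializes the pipeline, uploads it to the OAK,
# and starts it in firmware
with dai.Device(pipeline) as device:
    # Host-side queue that receives frames streamed back over XLink
    q = device.getOutputQueue(name="preview", maxSize=4, blocking=False)
    while True:
        frame = q.get().getCvFrame()  # wait for the next frame
        cv2.imshow("preview", frame)
        if cv2.waitKey(1) == ord('q'):
            break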
Thanks, Erik
Newbie Questions - DepthAI API
Hello hussain_allawati, see the overview of the API. After that, you can start exploring the examples/demos and then the docs about specific nodes (if needed). What doubts do you have?
erik Currently I can't pinpoint my exact doubts. I will go through the documentation you suggested, test some code, and keep you updated.
Thanks again,
Sounds great, thanks hussain_allawati!
hussain_allawati my pipeline will have several functions, such as Object Recognition, Face Recognition, Text to Speech, and I/O buttons.
Your "program" will have all of that, but "pipeline" is the term for what happens inside the OAK device, so your pipeline will consist of camera node(s) connected to a face recognition NN node AND an object recognition NN node, with all of those going to output nodes.
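A rough sketch of such a pipeline, assuming both models accept the same preview size (the blob paths and stream names are placeholders; in practice you may need an ImageManip node per network to resize the preview to each model's input size):

import depthai as dai

pipeline = dai.Pipeline()

# One color camera feeding both networks
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(300, 300)   # must match the NN input size
cam.setInterleaved(False)

# Face recognition network (placeholder blob path)
face_nn = pipeline.create(dai.node.NeuralNetwork)
face_nn.setBlobPath("models/face_recognition.blob")
cam.preview.link(face_nn.input)

# Object recognition network (placeholder blob path)
obj_nn = pipeline.create(dai.node.NeuralNetwork)
obj_nn.setBlobPath("models/object_recognition.blob")
cam.preview.link(obj_nn.input)

# Output nodes so the host can read both results
face_out = pipeline.create(dai.node.XLinkOut)
face_out.setStreamName("face_nn")
face_nn.out.link(face_out.input)

obj_out = pipeline.create(dai.node.XLinkOut)
obj_out.setStreamName("obj_nn")
obj_nn.out.link(obj_out.input)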
Erik, I made myself familiar with the API, pipeline, and nodes. I also ran the examples available.
For now, I would like to use pretrained models to perform object detection and scene classification.
I am thinking of using the MS COCO dataset for objects and the Places365 dataset for scene classification.
Several pretrained models exist for these datasets. They use various architectures.
For example, the Places365 dataset has models trained using the VGG-16, GoogLeNet, ResNet, and AlexNet architectures.
I attempted to download the VGG-16 Places365 model and successfully converted it to .blob using the online converter tool.
Now my questions are:
1) How do I use the converted model and decode its output?
2) Does using a converted model differ based on the originating model? In other words, if I convert several models, each originating from a different framework (Caffe, TensorFlow, etc.) or a different architecture (VGG-16, ResNet, etc.), is the way to use them and decode their results the same, or does it differ from one to another?
3) Which architecture is preferred on OAK devices (the one they are optimized for)?
Hello hussain_allawati,
1) See the EfficientDet demo here. Decoding really depends on the model; I would reuse the decoding code that is usually present in the model repo (e.g. in the evaluation step). For a classification model like yours, see the decoding sketch after this list.
2) You can run any AI model on the OAK (as long as all of its layers are supported). Since models have different outputs, decoding also differs.
3) I don't think the Myriad X VPU was really optimized for any specific model, but you can see the performance numbers below (or here).
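For a classification model such as Places365, the decoding usually boils down to softmax + top-k over the single output vector. A minimal sketch, assuming the result comes back as one FP16 output layer (the helper name and the labels list are placeholders):

import numpy as np

def decode_classification(nn_data, labels, top_k=5):
    # nn_data is the dai.NNData message read from the NN output queue;
    # getFirstLayerFp16() returns the raw values of the (only) output layer
    logits = np.array(nn_data.getFirstLayerFp16())
    # Softmax (skip if the model already ends with a softmax layer)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = probs.argsort()[::-1][:top_k]
    return [(labels[i], float(probs[i])) for i in top]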
Thanks, Erik
Thank you for your informative reply.
The EfficientDet demo you referred to does object detection.
Could you please point me to something similar that does image classification?
erik Erik, unfortunately, I am lost!
I spent an entire day trying to use the Places365 VGG-16 model (after converting it to a blob).
The issue is that I don't have much knowledge about NNs, and so I got stuck trying to understand how to use the model. I know this discussion might not be directly related to DepthAI, but I would be glad if you could guide me!
Thanks,
Hello hussain_allawati,
could you share how far you have gotten with the script? I would suggest starting with this script and changing the preview size (to match the model's expected input size), the blob location, and then the decoding logic.
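A rough sketch of how such a script might look after those changes for the Places365 VGG-16 blob, assuming a 224x224 BGR input and a single output layer (the blob filename is a placeholder):

import numpy as np
import cv2
import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(224, 224)   # VGG-16 Places365 expects 224x224 input
cam.setInterleaved(False)
cam.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("vgg16_places365.blob")   # placeholder path to the converted blob
cam.preview.link(nn.input)

nn_out = pipeline.create(dai.node.XLinkOut)
nn_out.setStreamName("nn")
nn.out.link(nn_out.input)

cam_out = pipeline.create(dai.node.XLinkOut)
cam_out.setStreamName("preview")
cam.preview.link(cam_out.input)

with dai.Device(pipeline) as device:
    q_nn = device.getOutputQueue("nn", maxSize=4, blocking=False)
    q_rgb = device.getOutputQueue("preview", maxSize=4, blocking=False)
    while True:
        frame = q_rgb.get().getCvFrame()
        logits = np.array(q_nn.get().getFirstLayerFp16())
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        label = int(probs.argmax())      # index into the Places365 category list
        cv2.putText(frame, f"class {label}: {probs.max():.2f}", (5, 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))
        cv2.imshow("preview", frame)
        if cv2.waitKey(1) == ord('q'):
            break

Note that this sketch does not handle the mean/std normalization the model was trained with; that typically has to be baked into the blob at conversion time (e.g. via the OpenVINO model optimizer's mean/scale options) rather than applied on the device.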
Thanks, Erik
erik Erik, I looked at the examples you suggested.
Currently, I want to perform the following transformation to match the model's requirements:

from torchvision import transforms as trn

centre_crop = trn.Compose([
    trn.Resize((256, 256)),
    trn.CenterCrop(224),
    trn.ToTensor(),
    trn.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
How can I apply such a transformation on DepthAI?