Hi Luxonis Team,

I am seeking assistance with a challenge I am facing in detecting cans as they pass along a conveyor belt. The conveyor moves approximately 3-4 cans per second, and I am using a YOLOv8n model trained on 640x640 images to detect these cans.

The model performs accurately when tested locally on static images. However, when integrated into the Oak camera pipeline, I encounter issues with detection accuracy. The model frequently misses detections and fails to fully capture the cans in the video stream.

I have attached a video demonstrating the problem and included the pipeline definition for reference. I would greatly appreciate any insights or suggestions to improve the performance of the model for this high-speed detection use case.

pipeline = dai.Pipeline()
nn = pipeline.create(dai.node.YoloDetectionNetwork)
nnOut = pipeline.create(dai.node.XLinkOut)
nnOut.setStreamName("nn")

camRgb = pipeline.create(dai.node.ColorCamera)
camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setIspScale(2,3)
camRgb.setInterleaved(False)
camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
camRgb.setPreviewKeepAspectRatio(False)
camRgb.setFps(15)
camRgb.setPreviewSize(640,640)

xoutRgb = pipeline.create(dai.node.XLinkOut)
nnNetworkOut = pipeline.create(dai.node.XLinkOut)
nnNetworkOut.setStreamName("nnNetwork")
xoutRgb.setStreamName("rgb")

# Universal Properties
nn.setConfidenceThreshold(0.30)
nn.setNumClasses(1)
nn.setCoordinateSize(4)
nn.setAnchors([])
nn.setAnchorMasks({})
nn.setIouThreshold(0.50)
nn.setBlobPath(nnPath)
nn.setNumInferenceThreads(2)
nn.input.setBlocking(False)
nn.input.setQueueSize(1)

camRgb.video.link(xoutRgb.input)
camRgb.preview.link(nn.input)
nn.out.link(nnOut.input)
nn.outNetwork.link(nnNetworkOut.input)

Attachment: can-detection-2.mp4 (7 MB)

Thank you in advance for your help!

Best regards,

    GurdeepakSidhu
    The first thing I see is that the boxes don't seem to be synced to the video stream. I guess this is not a problem if you don't intend to ship detections coupled with the actual frame.

    The other issue is camera motion blur - docs here.

    Thanks,
    Jaka

    @jakaskerl

    Thank you for your feedback and appreciate it a lot.

    What would be the best approach for syncing the detections with the video frame? Is it to use the neural network passthrough node? Unfortunately, the issue with this is that I won't be able to display the video frame at a higher resolution.

    In regards to the motion blur issue, I understood from the document that the best approach is to reduce the exposure time and have more light available. Is increasing the sensitivity a good approach as well? Am I understanding this correctly?

    Thank you and looking forward to your feedback.

    Have a great day and best regards!

      GurdeepakSidhu

      GurdeepakSidhu What would be the best approach for syncing the detections with the video frame? Is it to use the neural network passthrough node? Unfortunately, the issue with this is that I won't be able to display the video frame at a higher resolution.

      Generally, if you only want to count the objects and don't need the RGB frame in production, it is not necessary to sync them; syncing is only needed for visualization. If you do need it, you can use a Sync node.

      GurdeepakSidhu Is increasing the sensitivity a good approach as well? Am I understanding this correctly?

      Increasing the sensitivity will only brighten the image. If you wish to unblur the images, you need a lower exposure time. Then increase the ISO to brighten the image without needing to add more lighting. Keep in mind this will increase noise in the image.
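
      As a rough sketch (the exact values are assumptions you would tune for your belt speed and lighting), you can fix the exposure and ISO on the ColorCamera's initial control, something like:

      # Sketch only: 1 ms exposure and ISO 800 are assumed starting points, not tuned values.
      # A shorter exposure reduces motion blur; a higher ISO compensates for the darker image (at the cost of noise).
      camRgb.initialControl.setManualExposure(1000, 800)  # (exposure time in microseconds, ISO sensitivity)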

      Thanks,
      Jaka

      @jakaskerl Thank you for the response and appreciate it.

      Is there any example or document that shows how to use the Sync node for detections and RGB frames?

      If not, it would be helpful if you could add that logic to the code I shared above.

      Thank you and looking forward to your feedback.

      Have a great day and take care!

        GurdeepakSidhu

        import depthai as dai
        from datetime import timedelta
        
        # Create pipeline
        pipeline = dai.Pipeline()
        
        # Create nodes
        nn = pipeline.create(dai.node.YoloDetectionNetwork)
        nnOut = pipeline.create(dai.node.XLinkOut)
        nnOut.setStreamName("nn")
        
        camRgb = pipeline.create(dai.node.ColorCamera)
        camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
        camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
        camRgb.setIspScale(2, 3)
        camRgb.setInterleaved(False)
        camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
        camRgb.setPreviewKeepAspectRatio(False)
        camRgb.setFps(15)
        camRgb.setPreviewSize(640, 640)
        
        xoutRgb = pipeline.create(dai.node.XLinkOut)
        xoutRgb.setStreamName("rgb")
        
        nnNetworkOut = pipeline.create(dai.node.XLinkOut)
        nnNetworkOut.setStreamName("nnNetwork")
        
        # Create sync node
        sync = pipeline.create(dai.node.Sync)
        sync.setSyncThreshold(timedelta(milliseconds=50))  # Adjust sync threshold if needed
        
        # Set NN properties
        nn.setConfidenceThreshold(0.30)
        nn.setNumClasses(1)
        nn.setCoordinateSize(4)
        nn.setAnchors([])  # Add your anchors here
        nn.setAnchorMasks({})  # Add your anchor masks here
        nn.setIouThreshold(0.50)
        nn.setBlobPath(nnPath)
        nn.setNumInferenceThreads(2)
        nn.input.setBlocking(False)
        nn.input.setQueueSize(1)
        
        # Link nodes
        camRgb.video.link(sync.inputs["video"])  # Link the RGB output to the sync node
        camRgb.preview.link(nn.input)  # Link the camera preview to NN input
        nn.out.link(sync.inputs["nn"])  # Link the NN output to the sync node
        
        # Link sync output to XLinkOut
        sync.out.link(xoutRgb.input)
        
        # If you want to also output the NN metadata separately, link as follows:
        nn.outNetwork.link(nnNetworkOut.input)
        
        # Now, when running the pipeline, the RGB frames and NN output will be synchronized.

        Thanks,
        Jaka

        @jakaskerl this is helpful, appreciate it.

        When we retrieve the RGB frames and neural network results, can we just define output queues for rgb and nn, and then retrieve the frames and detections using the (blocking) .get() method? Or do we use the sync output to retrieve the RGB frame and the neural network detections?

        Appreciate it and thank you.

        Have a great day!

          GurdeepakSidhu
          It's easier to use the output of the Sync node (the MessageGroup), since the messages in each iteration of the host-side loop are already synced; pulling them from separate queues is prone to error.

          If you sync and then demux the messages before sending each one separately to the host side, there could be syncing issues because the frames might be read in separate iterations. But I guess .get() would prevent that since it blocks on the queues.
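
          A minimal host-side sketch, assuming the pipeline from my earlier reply (the Sync node output goes to the "rgb" XLinkOut, with inputs named "video" and "nn"):

          import cv2
          import depthai as dai

          with dai.Device(pipeline) as device:
              # The Sync node output is linked to the "rgb" XLinkOut, so this queue yields MessageGroups
              q = device.getOutputQueue("rgb", maxSize=4, blocking=True)
              while True:
                  msgGrp = q.get()  # one synced MessageGroup per iteration
                  frame, detections = None, []
                  for name, msg in msgGrp:
                      if name == "video":
                          frame = msg.getCvFrame()     # full-resolution ImgFrame from camRgb.video
                      elif name == "nn":
                          detections = msg.detections  # ImgDetections from the YOLO node
                  for det in detections:
                      # Detections are normalized [0..1]; scale to the displayed frame size
                      x1, y1 = int(det.xmin * frame.shape[1]), int(det.ymin * frame.shape[0])
                      x2, y2 = int(det.xmax * frame.shape[1]), int(det.ymax * frame.shape[0])
                      cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                  cv2.imshow("rgb", frame)
                  if cv2.waitKey(1) == ord("q"):
                      break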

          Thanks,
          Jaka

          @jakaskerl Thank you for your insight on this, and appreciate it.

          I am working on demuxing the messages before sending each one separately to the host side. I am seeing that at first the objects and detections are in sync, but after some time they are no longer in sync. Any suggestions on how to ensure that the objects and detections stay in sync most of the time?

          Looking forward to hearing from you and grateful for your help.

          Best regards and have a great day!