Hi Luxonis Team,

I am seeking assistance with a challenge I am facing in detecting cans as they pass along a conveyor belt. The conveyor moves approximately 3-4 cans per second, and I am using a YOLOv8n model trained on 640x640 images to detect these cans.

The model performs accurately when tested locally on static images. However, when integrated into the Oak camera pipeline, I encounter issues with detection accuracy. The model frequently misses detections and fails to fully capture the cans in the video stream.

I have attached a video demonstrating the problem and included the pipeline definition for reference. I would greatly appreciate any insights or suggestions to improve the performance of the model for this high-speed detection use case.

pipeline = dai.Pipeline()
nn = pipeline.create(dai.node.YoloDetectionNetwork)
nnOut = pipeline.create(dai.node.XLinkOut)
nnOut.setStreamName("nn")

camRgb = pipeline.create(dai.node.ColorCamera)
camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setIspScale(2,3)
camRgb.setInterleaved(False)
camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
camRgb.setPreviewKeepAspectRatio(False)
camRgb.setFps(15)
camRgb.setPreviewSize(640,640)

xoutRgb = pipeline.create(dai.node.XLinkOut)
nnNetworkOut = pipeline.create(dai.node.XLinkOut)
nnNetworkOut.setStreamName("nnNetwork")
xoutRgb.setStreamName("rgb")

# Universal Properties
nn.setConfidenceThreshold(0.30)
nn.setNumClasses(1)
nn.setCoordinateSize(4)
nn.setAnchors([])
nn.setAnchorMasks({})
nn.setIouThreshold(0.50)
nn.setBlobPath(nnPath)
nn.setNumInferenceThreads(2)
nn.input.setBlocking(False)
nn.input.setQueueSize(1)

camRgb.video.link(xoutRgb.input)
camRgb.preview.link(nn.input)
nn.out.link(nnOut.input)
nn.outNetwork.link(nnNetworkOut.input)

Attachment: can-detection-2.mp4 (7 MB)

Thank you in advance for your help!

Best regards,

    GurdeepakSidhu
    The first thing I see is that the boxes don't seem to be synced to the video stream. I guess this is not a problem if you don't intend to ship detections coupled with the actual frame.

    The other issue is camera motion blur - docs here.

    Thanks,
    Jaka

    @jakaskerl

    Thank you for your feedback and appreciate it a lot.

    What would be the best approach for syncing the detections with the video frame? Is it to use the neural network passthrough node? Unfortunately, the issue with this is that I won't be able to display the video frame at a higher resolution.

    In regards to the motion blur issue, I understood from the document that the best approach is to reduce the exposure time and have more light available. Is increasing the sensitivity a good approach as well? Am I understanding this correctly?

    Thank you and looking forward to your feedback.

    Have a great day and best regards!

      GurdeepakSidhu

      GurdeepakSidhu What would be the best approach for syncing the detections with the video frame? Is it to use the neural network passthrough node? Unfortunately, the issue with this is that I won't be able to display the video frame at a higher resolution.

      Generally, if you only want to count the objects and don't need the RGB frame in production, it is not necessary to sync them; syncing is only needed for visualization. If you do need it, you can use a Sync node.

      GurdeepakSidhu Is increasing the sensitivity a good approach as well? Am I understanding this correctly?

      Increasing the sensitivity will only brighten the image. If you wish to unblur the images, you need a lower exposure time. Then increase the ISO to brighten the image without needing to add more lighting. Keep in mind this will increase noise in the image.
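
      As a rough sketch (the exact values are assumptions you would tune for your belt speed and lighting), you can fix the exposure and ISO on the ColorCamera's initial control, something like:

      # Sketch only: 1 ms exposure and ISO 800 are assumed starting points, not tuned values.
      # A shorter exposure reduces motion blur; a higher ISO compensates for the darker image (at the cost of noise).
      camRgb.initialControl.setManualExposure(1000, 800)  # (exposure time in microseconds, ISO sensitivity)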

      Thanks,
      Jaka

      @jakaskerl Thank you for the response and appreciate it.

      Is there any example or document that shows how to use the Sync node for detections and RGB frames?

      If not, it would be helpful if you could add that logic to the code I shared above.

      Thank you and looking forward to your feedback.

      Have a great day and take care!

        GurdeepakSidhu

        import depthai as dai
        from datetime import timedelta
        
        # Create pipeline
        pipeline = dai.Pipeline()
        
        # Create nodes
        nn = pipeline.create(dai.node.YoloDetectionNetwork)
        nnOut = pipeline.create(dai.node.XLinkOut)
        nnOut.setStreamName("nn")
        
        camRgb = pipeline.create(dai.node.ColorCamera)
        camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
        camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
        camRgb.setIspScale(2, 3)
        camRgb.setInterleaved(False)
        camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
        camRgb.setPreviewKeepAspectRatio(False)
        camRgb.setFps(15)
        camRgb.setPreviewSize(640, 640)
        
        xoutRgb = pipeline.create(dai.node.XLinkOut)
        xoutRgb.setStreamName("rgb")
        
        nnNetworkOut = pipeline.create(dai.node.XLinkOut)
        nnNetworkOut.setStreamName("nnNetwork")
        
        # Create sync node
        sync = pipeline.create(dai.node.Sync)
        sync.setSyncThreshold(timedelta(milliseconds=50))  # Adjust sync threshold if needed
        
        # Set NN properties
        nn.setConfidenceThreshold(0.30)
        nn.setNumClasses(1)
        nn.setCoordinateSize(4)
        nn.setAnchors([])  # Add your anchors here
        nn.setAnchorMasks({})  # Add your anchor masks here
        nn.setIouThreshold(0.50)
        nn.setBlobPath(nnPath)
        nn.setNumInferenceThreads(2)
        nn.input.setBlocking(False)
        nn.input.setQueueSize(1)
        
        # Link nodes
        camRgb.video.link(sync.inputs["video"])  # Link the RGB output to the sync node
        camRgb.preview.link(nn.input)  # Link the camera preview to NN input
        nn.out.link(sync.inputs["nn"])  # Link the NN output to the sync node
        
        # Link sync output to XLinkOut
        sync.out.link(xoutRgb.input)
        
        # If you want to also output the NN metadata separately, link as follows:
        nn.outNetwork.link(nnNetworkOut.input)
        
        # Now, when running the pipeline, the RGB frames and NN output will be synchronized.

        Thanks,
        Jaka

        @jakaskerl this is helpful, appreciate it.

        When we retrieve the RGB frames and neural network results, can we just define output queues for rgb and nn, and then retrieve the frames and detections using the (blocking) .get() method? Or do we use the sync output to retrieve the RGB frame and the neural network detections?

        Appreciate it and thank you.

        Have a great day!

          GurdeepakSidhu
          It's easier to use the output of the Sync node (the MessageGroup), since the messages in each iteration of the host-side loop are already synced; pulling them from separate queues is prone to error.

          If you sync and then demux the messages before sending each one separately to the host side, there could be syncing issues because the frames might be read in separate iterations. But I guess .get() would prevent that since it blocks on the queues.
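
          A minimal host-side sketch, assuming the pipeline from my earlier reply (the Sync node output goes to the "rgb" XLinkOut, with inputs named "video" and "nn"):

          import cv2
          import depthai as dai

          with dai.Device(pipeline) as device:
              # The Sync node output is linked to the "rgb" XLinkOut, so this queue yields MessageGroups
              q = device.getOutputQueue("rgb", maxSize=4, blocking=True)
              while True:
                  msgGrp = q.get()  # one synced MessageGroup per iteration
                  frame, detections = None, []
                  for name, msg in msgGrp:
                      if name == "video":
                          frame = msg.getCvFrame()     # full-resolution ImgFrame from camRgb.video
                      elif name == "nn":
                          detections = msg.detections  # ImgDetections from the YOLO node
                  for det in detections:
                      # Detections are normalized [0..1]; scale to the displayed frame size
                      x1, y1 = int(det.xmin * frame.shape[1]), int(det.ymin * frame.shape[0])
                      x2, y2 = int(det.xmax * frame.shape[1]), int(det.ymax * frame.shape[0])
                      cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                  cv2.imshow("rgb", frame)
                  if cv2.waitKey(1) == ord("q"):
                      break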

          Thanks,
          Jaka

          @jakaskerl Thank you for your insight on this, and appreciate it.

          I am working on demuxing the messages before sending each one separately to the host side. I am seeing that at first the objects and detections are in sync, but after some time they are no longer in sync. Any suggestions on how to ensure that the objects and detections stay in sync most of the time?

          Looking forward to hearing from you and grateful for your help.

          Best regards and have a great day!