• DepthAI-v2
  • What is the meaning of Device vs Host Decoding?

Hi, I am looking into the gen2-yolo examples.

Some of the examples use "device-decoding", while others use "host-decoding". Could anyone explain the difference to me? How do the two compare in terms of inference time?

Also, how does this relate to the network nodes in depthai-core?

I noticed, for example, that the device-decoding example uses dai.node.YoloDetectionNetwork for inference, while the yolox example (host-decoding) uses dai.node.NeuralNetwork. I guess the post-processing of the latter is done in Python and the post-processing of the former is done in C++ on the device, making it faster?

Device decoding means the accelerator chip inside the OAK runs the inference and parses (decodes) the network output itself, without sending the raw tensors back to the host computer. If you set it up properly you can keep the entire pipeline on device (camera -> ISP -> ImageManip -> NN). This is generally a good thing because the OAK, being built as an edge device, is often much faster at inference than the system it is connected to.

Thanks for the response! How do I make sure that the decoding happens on-device? Do I need to include the decoding steps in the model graph itself (e.g. in the ONNX)?

    Hi StefanWerner
    The decoding on device is done when setting the configuration of the NN node:

    detectionNetwork.setConfidenceThreshold(confidenceThreshold)  # minimum score to keep a detection
    detectionNetwork.setNumClasses(classes)                       # number of classes the model predicts
    detectionNetwork.setCoordinateSize(coordinates)               # values per box (4: x, y, w, h)
    detectionNetwork.setAnchors(anchors)                          # YOLO anchor sizes
    detectionNetwork.setAnchorMasks(anchorMasks)                  # anchors assigned to each output scale
    detectionNetwork.setIouThreshold(iouThreshold)                # NMS overlap threshold
    detectionNetwork.setBlobPath(nnPath)                          # compiled .blob model file
    detectionNetwork.setNumInferenceThreads(2)
    detectionNetwork.input.setBlocking(False)

    Here you specify the IoU threshold and other parameters that tell the NN node how to parse the results. You then access the decoded information via the "detections" attribute of the NN result.

    If you wish to do host decoding, you can extract the output layers and do your own parsing, as is done here:
    https://github.com/luxonis/depthai-experiments/blob/0ddbc7232a8b3d3597fb36930ad09c4e25ea0a89/gen2-yolo/host-decoding/main.py#L154C9-L168C60

    YOLO models usually share the same output structure, which allows the NN node to parse the results automatically, without extra work on your side. It's also faster.
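
    To give a feel for what "your own parsing" involves, here is a minimal NumPy sketch of the two core steps of host decoding: turning one raw YOLO output grid into boxes, and running non-maximum suppression. The tensor layout, anchors, strides, and thresholds are illustrative assumptions, not the exact values from the linked example.

    ```python
    import numpy as np


    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))


    def decode_yolo_grid(raw, anchors, stride, num_classes, conf_thresh=0.5):
        """Decode one YOLO output grid of shape (num_anchors, 5 + num_classes, H, W)
        into rows of [x1, y1, x2, y2, score, class] in pixels."""
        na, _, h, w = raw.shape
        boxes = []
        for a in range(na):
            for gy in range(h):
                for gx in range(w):
                    tx, ty, tw, th, obj = raw[a, :5, gy, gx]
                    cls_scores = sigmoid(raw[a, 5:, gy, gx])
                    score = sigmoid(obj) * cls_scores.max()
                    if score < conf_thresh:
                        continue
                    # Cell offset + stride gives the box center in pixels
                    cx = (sigmoid(tx) + gx) * stride
                    cy = (sigmoid(ty) + gy) * stride
                    # Anchor-relative width/height
                    bw = np.exp(tw) * anchors[a][0]
                    bh = np.exp(th) * anchors[a][1]
                    boxes.append([cx - bw / 2, cy - bh / 2,
                                  cx + bw / 2, cy + bh / 2,
                                  score, cls_scores.argmax()])
        return np.array(boxes)


    def nms(boxes, iou_thresh=0.5):
        """Greedy non-maximum suppression on [x1, y1, x2, y2, score, cls] rows."""
        if len(boxes) == 0:
            return boxes
        order = boxes[:, 4].argsort()[::-1]  # highest score first
        keep = []
        while order.size:
            i = order[0]
            keep.append(i)
            # Intersection of the best box with all remaining boxes
            xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_r = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                      (boxes[order[1:], 3] - boxes[order[1:], 1]))
            iou = inter / (area_i + area_r - inter)
            order = order[1:][iou < iou_thresh]  # drop overlapping boxes
        return boxes[keep]
    ```

    With device decoding, exactly this kind of arithmetic runs inside the YoloDetectionNetwork node instead, which is why the raw tensors never need to leave the OAK.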

    Hope this helps,
    Jaka