• DepthAI
  • Pipeline is frozen even when all node input queues are set to non-blocking

I have a pipeline that works great. When I add two new nodes to it, though (highlighted in green below), the whole pipeline freezes: if I request a frame from an output queue, the first frame comes immediately, but any attempt to fetch a second frame hangs indefinitely.

My understanding based on https://discuss.luxonis.com/d/1774-script-node-blocking-behaviour-isnt-documented-or-as-expected is that when I have all input queues set to non-blocking (e.g. node.input.setBlocking(False)), I should not have any pipeline freezing, since no input queues are getting backed up. As can be seen in the above picture, all node inputs have a green circle which means they are all set to non-blocking.
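
For reference, the non-blocking setup looks roughly like this (a simplified sketch with placeholder node and input names, not my actual pipeline):

import depthai as dai

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
nn = pipeline.create(dai.node.NeuralNetwork)  # blob path etc. omitted
script = pipeline.create(dai.node.Script)

# Every node input is set to non-blocking so a full queue drops old
# messages instead of stalling the producer upstream.
nn.input.setBlocking(False)
script.inputs["an_input"].setBlocking(False)

cam.preview.link(nn.input)
nn.out.link(script.inputs["an_input"])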

Another detail which may affect things: every Script node I'm using performs a blocking get (e.g. node.io["an_input"].get()) on each of its inputs on every iteration of its loop, and sends one output message per iteration.

My question is: why is my pipeline freezing when all node input queues are set to non-blocking?

    Hi EvanPeterson
    In addition to queue blocking behaviour, a pipeline can also be stalled by blocking get() calls. Are you receiving any data at the Script node input at all? The model may simply not be producing any NNData (detections).
    Try using tryGet() so you can check whether an input message was actually received.
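
    A minimal sketch of that check inside the Script node (the input name here is just illustrative):

    import time

    while True:
        msg = node.io["an_input"].tryGet()
        if msg is None:
            # No message has arrived yet; don't block, just poll again shortly.
            time.sleep(0.005)
            continue
        node.warn("Received an input message")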

    Thanks,
    Jaka

    5 days later

    Thanks for your reply @jakaskerl. I rolled back my changes and got the pipeline back into a working state. Then I switched my earliest Script node ("Script 1" in the image above) from performing a .get() on its input to performing a .tryGet() instead. What I'm seeing is that the script receives 8 or so NNData frames from its input and then doesn't receive any more. That isn't the case with .get(): when the script uses .get(), it is always able to receive input NNData whenever I request frames from the final output queues. Do you know why that might be?

      Hi EvanPeterson
      I think adding the code you have inside the Script node might shed some more light on the issue. I currently have no idea why the node would block when using tryGet(); there is likely an underlying issue somewhere else in the code.

      Thanks,
      Jaka

      5 days later

      Thanks, here is the code for that Script node ("Script 1"):

      """
      A script which takes face detection bounding boxes produced by a neural network (including spatial
      coordinates), and isolates the one that is closest to the camera.
      """
      import marshal
      import time
      
      def argmin(data):
          """
          Returns the index of the minimum value in `data`. Source:
          https://stackoverflow.com/a/72758707/7159273
          """
          return min(range(len(data)), key=lambda i: data[i])
      
      
      def main():
          while True:
              face_det_out = node.io["face_det_nn"].get()
              user_face = None
      
              # Oak-D outputs `0` if it is not able to determine the spatial depth value `z`.
              faces = [f for f in face_det_out.detections if f.spatialCoordinates.z != 0]
              if len(faces) == 1:
                  user_face = faces[0]
              elif len(faces) > 1:
                  # We select the face closest to the Oak-D camera as the user's face.
                  i_min = argmin([f.spatialCoordinates.z for f in faces])
                  user_face = faces[i_min]
      
              if user_face is None:
                  # Send an empty buffer.
                  msg = Buffer(0)
              else:
                  # DepthAI only supports a limited number of message types being passed from pipeline
                  # node to pipeline node. Since we're not using any of
                  # [their registered types](https://docs.luxonis.com/projects/api/en/latest/components/messages/)
                  # for this message we need to use their base `Buffer` type. Source:
                  # https://docs.luxonis.com/projects/api/en/latest/components/nodes/script/#usage
                  serialized = marshal.dumps(
                      {
                          "xmin": user_face.xmin,
                          "ymin": user_face.ymin,
                          "xmax": user_face.xmax,
                          "ymax": user_face.ymax,
                          "x": user_face.spatialCoordinates.x,
                          "y": user_face.spatialCoordinates.y,
                          "z": user_face.spatialCoordinates.z,
                      }
                  )
                  msg = Buffer(len(serialized))
                  msg.setData(serialized)
      
              node.io["user_face"].send(msg)
      
              time.sleep(0.001)
      
      
      main()
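
      For context, on the host side "Script 1" is wired up roughly like this (a simplified sketch; script_1_source and face_det_nn are stand-ins for my actual code):

      script1 = pipeline.create(dai.node.Script)
      script1.setScript(script_1_source)  # the script shown above, as a string
      script1.inputs["face_det_nn"].setBlocking(False)

      # The spatial face-detection network feeds the script's input.
      face_det_nn.out.link(script1.inputs["face_det_nn"])

      # The selected face is sent back to the host through an XLinkOut.
      xout_user_face = pipeline.create(dai.node.XLinkOut)
      xout_user_face.setStreamName("user_face")
      script1.outputs["user_face"].link(xout_user_face.input)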

      EvanPeterson What I'm seeing is that the script receives 8 or so NNData frames from its input, then doesn't receive any more.

      Do you mean 8 or more, or around 8 frames? Note that tryGet() returns None when no message is available, so it would break the list comprehension with something like "'NoneType' object has no attribute 'detections'". Maybe try raising the sleep time to 0.1 seconds and see if it makes a difference.
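
      If you do stick with tryGet(), the start of the loop body needs a None guard before it touches detections, roughly like this:

      # Sketch of a None guard at the top of the script's main loop.
      face_det_out = node.io["face_det_nn"].tryGet()
      if face_det_out is None:
          # Nothing arrived this iteration; skip it instead of crashing on None.
          time.sleep(0.01)
          continue
      faces = [f for f in face_det_out.detections if f.spatialCoordinates.z != 0]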

      It's difficult to say what the issue is. Is there any way you could attach an MRE (minimal reproducible example)?

      Thanks,
      Jaka

      I've figured out that the freezing issue only occurs when the NoC (network-on-chip) DDR memory speed is low. I've gotten the pipeline to a state where the NoC DDR throughput is sometimes low when I start it up (bottoming out at 138 MB/s within a few seconds) and sometimes high (topping out at 1780 MB/s within a few seconds). When the NoC speed is low, the pipeline freezes on the second requested frame; when it's high, the pipeline never freezes and works great.

      Right now the pipeline is slow and freezes about 40% of the time; before, it froze 100% of the time. I improved it by adding nn.input.setQueueSize(1) to the new NeuralNetwork node in my pipeline, which also has nn.input.setBlocking(False).
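
      Concretely, the change on the new NeuralNetwork node was just this (the variable name is a placeholder for my actual node):

      nn = pipeline.create(dai.node.NeuralNetwork)
      nn.input.setBlocking(False)  # never stall the producer
      nn.input.setQueueSize(1)     # keep only the newest frame, drop the rest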

      @jakaskerl Do you have any other tips for improving NoC DDR speed on a DepthAI pipeline?

      I tried setting nn.input.setQueueSize(1) on the other two NeuralNetwork nodes in my pipeline as well, and that puts the NoC DDR throughput at 960 MB/s every time I start up the pipeline, but with those settings the pipeline also freezes on the second frame request every time. So maybe NoC DDR throughput is not actually the issue.

      Ok, I tried lowering the FPS on the pipeline's ColorCamera and MonoCamera nodes from the default 30 down to 15 using myCameraNode.setFps(15). That brought the average NoC DDR throughput down to 870 MB/s, but now the pipeline is not freezing! I don't have a clear idea of why it's working, but I'm glad it is for now.
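
      For reference, the FPS change was just this (the node variables are placeholders for my actual camera nodes):

      # Drop the camera frame rates from the default 30 FPS down to 15 FPS.
      for cam_node in (color_cam, mono_left, mono_right):
          cam_node.setFps(15)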

      6 days later

      Hi @EvanPeterson
      I think there might be a bug in the FW for blocking/non-blocking links. In theory this should work unless the camera pools are being overflowed, but I don't think that is happening here. There seems to be some blocking despite all links being set to non-blocking, and that needs to be fixed in the FW.

      Thanks,
      Jaka