People tracker crashing when switching model from mobilenet to YOLO

guillaumefHUPI

Hello guys,

I have an issue when i switch my mobilenet model to a tinyyolov4. I used the people-tracker code luxonis/depthai-experimentsblob/master/gen2-people-tracker/people_tracker.py and adapted it to send result to an SP32.
if i run the command python main.py -cam -spi -roi 0.5 i don't have any problem, I see my sp32 logs using termite doing what its supposed to do.
when i run it with -yolo, the window frame showing what the camera is seeing is crashing after a couple detections.
I tried lowering fps, still the same issue.
Any idea why it's crashing ?
Thanks

#main.py
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-f', '--flash', default=False, action='store_true', help='Commande à run pour flasher le code sur la carte')
parser.add_argument('-nn', '--nn', type=str, help=".blob path")
parser.add_argument('-yolo',action='store_true',default=False)
parser.add_argument('-vid', '--video', type=str,
                    help="Path to video file to be used for inference (conflicts with -cam)")
parser.add_argument('-spi', '--spi', action='store_true', default=False,
                    help="Send tracklets to the MCU via SPI")
parser.add_argument('-cam', '--camera', action="store_true", default=False,
                    help="Use DepthAI RGB camera for inference (conflicts with -vid)")
parser.add_argument('-t', '--threshold', default=0.25, type=float,
                    help="Minimum distance the object has to move to be considered a real movement")
parser.add_argument('-roi', '--roi_position', type=float, default=0.5,
                    help='ROI Position (0-1)')
parser.add_argument('-a', '--axis', default=True, action='store_false',
                    help='Axis for cumulative counting (default=x axis)')
args = parser.parse_args()

DEBUG = not args.flash
FLASH = args.flash

if args.yolo:
    res = 416
else:
    res = 300

if DEBUG:
    from pathlib import Path

    import blobconverter
    import cv2
    import depthai as dai
    import numpy as np
    from time import monotonic
    import time as time 

    # Labels
    if args.yolo:
        labelMap = [
        "person",         "bicycle",    "car",           "motorbike",     "aeroplane",   "bus",           "train",
        "truck",          "boat",       "traffic light", "fire hydrant",  "stop sign",   "parking meter", "bench",
        "bird",           "cat",        "dog",           "horse",         "sheep",       "cow",           "elephant",
        "bear",           "zebra",      "giraffe",       "backpack",      "umbrella",    "handbag",       "tie",
        "suitcase",       "frisbee",    "skis",          "snowboard",     "sports ball", "kite",          "baseball bat",
        "baseball glove", "skateboard", "surfboard",     "tennis racket", "bottle",      "wine glass",    "cup",
        "fork",           "knife",      "spoon",         "bowl",          "banana",      "apple",         "sandwich",
        "orange",         "broccoli",   "carrot",        "hot dog",       "pizza",       "donut",         "cake",
        "chair",          "sofa",       "pottedplant",   "bed",           "diningtable", "toilet",        "tvmonitor",
        "laptop",         "mouse",      "remote",        "keyboard",      "cell phone",  "microwave",     "oven",
        "toaster",        "sink",       "refrigerator",  "book",          "clock",       "vase",          "scissors",
        "teddy bear",     "hair drier", "toothbrush"
    ]
    else:
        labelMap = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
                    "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]


    # Get argument first
    parentDir = Path(__file__).parent

    videoPath = args.video or parentDir / Path('demo/youtube_video.mp4')
    
    #YOLO
    if args.yolo:
        print("in yolo")
        nnPath = str((Path('../models/yolo-v4-tiny-tf_openvino_2021.4_6shave.blob')).resolve().absolute())
    #MOBILENET
    else:
        print("in mobile net")
        nnPath = args.nn or blobconverter.from_zoo(name="mobilenet-ssd", shaves=7)



    # Whether we want to use video from host or rgb camera
    VIDEO = not args.camera

    class TextHelper:
        def __init__(self) -> None:
            self.bg_color = (0, 0, 0)
            self.color = (255, 255, 255)
            self.text_type = cv2.FONT_HERSHEY_SIMPLEX
            self.line_type = cv2.LINE_AA
        def putText(self, frame, text, coords):
            cv2.putText(frame, text, coords, self.text_type, 1, self.bg_color, 6, self.line_type)
            cv2.putText(frame, text, coords, self.text_type, 1, self.color, 2, self.line_type)

    # Start defining a pipeline
    pipeline = dai.Pipeline()

    #YOLO
    # Create and configure the detection network
    if args.yolo:
        detectionNetwork = pipeline.create(dai.node.YoloDetectionNetwork)
        detectionNetwork.setBlobPath(nnPath)
        detectionNetwork.setConfidenceThreshold(0.5)
        detectionNetwork.setNumClasses(80)
        detectionNetwork.setCoordinateSize(4)

        # pas besoin pour les full yolo, que pour les tiny
        detectionNetwork.setAnchors([10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319])
        detectionNetwork.setAnchorMasks({"side26": [1, 2, 3], "side13": [3, 4, 5]})

        detectionNetwork.setNumInferenceThreads(1)
        detectionNetwork.input.setBlocking(False)

        # Create and configure the object tracker
        objectTracker = pipeline.create(dai.node.ObjectTracker)
        objectTracker.setDetectionLabelsToTrack([0])  # Object to track
        # possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS, SHORT_TERM_IMAGELESS, SHORT_TERM_KCF
        objectTracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)
        # Take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
        objectTracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.SMALLEST_ID)

        #Create and configure SysLog node
        sysLog = pipeline.create(dai.node.SystemLogger)
        sysLog.setRate(100)

    #MOBILENET
    else:
    # Create and configure the detection network
        detectionNetwork = pipeline.create(dai.node.MobileNetDetectionNetwork)
        detectionNetwork.setBlobPath(str(Path(nnPath).resolve().absolute()))
        detectionNetwork.setConfidenceThreshold(0.5)

        # Create and configure the object tracker
        objectTracker = pipeline.create(dai.node.ObjectTracker)
        objectTracker.setDetectionLabelsToTrack([2, 6, 7, 10, 12, 13, 14, 15, 17])  # Object to track
        # possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS, SHORT_TERM_IMAGELESS, SHORT_TERM_KCF
        objectTracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)
        # Take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
        objectTracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.SMALLEST_ID)

        #Create and configure SysLog node
        sysLog = pipeline.create(dai.node.SystemLogger)
        sysLog.setRate(100)

    # Create and configure the color camera
    colorCam = pipeline.create(dai.node.ColorCamera)
    colorCam.setPreviewSize(res, res)
    colorCam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
    colorCam.setInterleaved(False)
    colorCam.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
    if args.yolo:
        colorCam.setFps(5)
    # Connect RGB preview to the detection network
    colorCam.preview.link(detectionNetwork.input)


    time.sleep(1)
    

    # Link detection networks outputs to the object tracker
    detectionNetwork.passthrough.link(objectTracker.inputTrackerFrame)
    detectionNetwork.passthrough.link(objectTracker.inputDetectionFrame)
    detectionNetwork.out.link(objectTracker.inputDetections)

    spiin_str = pipeline.create(dai.node.SPIIn)
    spiin_str.setStreamName("SPIIn")
    spiin_str.setBusId(1)

    script = pipeline.create(dai.node.Script)
    objectTracker.out.link(script.inputs['tracklets'])
    sysLog.out.link(script.inputs['sysLogOut'])

    with open("script.py", "r") as f:
        s = f.read()
        s = s.replace("THRESH_DIST_DELTA",str(args.threshold))
        s = s.replace("ROI_POSITION", str(args.roi_position))
        s = s.replace("AXIS", str(args.axis))
        s = s.replace("WIDTH", str(res))
        s = s.replace("HEIGHT", str(res))
        script.setScript(s)


    # Send send RGB preview frames to the host
    xlinkOut = pipeline.create(dai.node.XLinkOut)
    xlinkOut.setStreamName("preview")
    objectTracker.passthroughTrackerFrame.link(xlinkOut.input)

    if args.spi:
        # Send tracklets via SPI to the MCU
        spiOut = pipeline.create(dai.node.SPIOut)
        spiOut.setStreamName("out")
        spiOut.setBusId(0)
        spiOut.input.setBlocking(True)
        script.outputs['out'].link(spiOut.input)
    ##tracker out
    trackerOut = pipeline.create(dai.node.XLinkOut)
    trackerOut.setStreamName("tracklets")
    objectTracker.out.link(trackerOut.input)

    # Pipeline defined, now the device is connected to
    with dai.Device(pipeline) as device:
        previewQ = device.getOutputQueue("preview",maxSize=4, blocking=False)
        tracklets = device.getOutputQueue("tracklets", 4, False)

        startTime = time.monotonic()
        counter = 0
        frame = None
        # text = TextHelper()
        #
        def update():
            global counter, frame, text, W, H
            W = res
            H = res
            if previewQ.has():
                frame = previewQ.get().getCvFrame()

            trackletsData = tracklets.get().tracklets
            
            if trackletsData is not None and frame is not None:
                for t in trackletsData:
                    roi = t.roi.denormalize(frame.shape[1], frame.shape[0])
                    x1 = int(roi.topLeft().x)
                    y1 = int(roi.topLeft().y)
                    x2 = int(roi.bottomRight().x)
                    y2 = int(roi.bottomRight().y)

                    try:
                        label = labelMap[t.label]
                    except:
                        label = t.label

                    cv2.putText(frame, str(label), (x1 + 10, y1 + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
                    cv2.putText(frame, f"ID: {[t.id]}", (x1 + 10, y1 + 35), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
                    cv2.putText(frame, t.status.name, (x1 + 10, y1 + 50), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
                    cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), cv2.FONT_HERSHEY_SIMPLEX)

                cv2.putText(frame, "NN fps: {:.2f}".format(counter / (time.monotonic() - startTime)), (2, frame.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.4, (255, 0, 0))

            counter +=1
            # Draw ROI line
            if args.axis:
                cv2.line(frame, (int(args.roi_position * W), 0),
                     (int(args.roi_position * W), H), (0xFF, 0, 0), 5)
            else:
                cv2.line(frame, (0, int(args.roi_position * H)),
                     (W, int(args.roi_position * H)), (0xFF, 0, 0), 5)
            if frame is not None:
                cv2.imshow("frame", frame)



        def to_planar(arr: np.ndarray, shape: tuple) -> np.ndarray:
            return cv2.resize(arr, shape).transpose(2, 0, 1).flatten()

        if VIDEO:
            videoQ = device.getInputQueue("video_in")

            cap = cv2.VideoCapture(str(Path(videoPath).resolve().absolute()))
            while cap.isOpened():
                read_correctly, video_frame = cap.read()
                if not read_correctly:
                    break

                img = dai.ImgFrame()
                img.setData(to_planar(video_frame, (res, res)))
                img.setType(dai.RawImgFrame.Type.BGR888p)
                img.setTimestamp(monotonic())
                img.setWidth(res)
                img.setHeight(res)
                videoQ.send(img)

                update()
                if cv2.waitKey(1) == ord('q'):
                    break
            print("End of the video")
        else:
            while True:
                update()
                if cv2.waitKey(1) == ord('q'):
                    break
#script.py
trackableObjects = {}

import time as time
def labelTracked(argument):
#YOLO
    labelMap = {
    0: "person",
    1: "bicycle",
    2: "car",
    3: "motorbike",
    4: "aeroplane",
    5: "bus",
    6: "train",
    7: "truck",
    8: "boat",
    9: "traffic light",
    10: "fire hydrant",
    11: "stop sign",
    12: "parking meter",
    13: "bench",
    14: "bird",
    15: "cat",
    16: "dog"
}
#MOBILENET
    # labelMap = {
    #     0: "background",
    #     1: "aeroplane",
    #     2: "bicycle", #########################
    #     3: "bird",
    #     4: "boat",
    #     5: "bottle",
    #     6: "bus", ##############################
    #     7: "car", ##############################
    #     8: "cat",
    #     9: "chair",
    #     10: "cow", #############################
    #     11: "diningtable",
    #     12: "dog", #############################
    #     13: "horse",############################
    #     14: "motorbike",########################
    #     15: "person",###########################
    #     16: "pottedplant",
    #     17: "sheep",############################
    #     18: "sofa",
    #     19: "train",
    #     20: "tvmonitor"
    # }
    return labelMap.get(argument, "nothing")

def send_SPI(label, direction, temperature):
    text = "/"
    b2 = Buffer(len(text))
    b2.setData(text.encode('ascii'))
    i = 0
    while i < 9:
        if i < 8:
            node.io['out'].send(b2)
        elif i == 8:
            text = labelTracked(label) + "," + direction+","+str(temperature)
            b = Buffer(len(text))
            b.setData(text.encode('ascii'))
            node.warn("send_spi")
            node.io['out'].send(b)
        i = i + 1


def get_centroid(roi):
    x1 = roi.topLeft().x
    y1 = roi.topLeft().y
    x2 = roi.bottomRight().x
    y2 = roi.bottomRight().y
    return (int((x2-x1)/2+x1), int((y2-y1)/2+y1))


class TrackableObject:
    def __init__(self, objectID, centroid):
        # store the object ID, then initialize a list of centroids
        # using the current centroid
        self.objectID = objectID
        self.centroids = [centroid]

        # initialize a boolean used to indicate if the object has
        # already been counted or not
        self.counted = False

temp1= time.time()

while True:
    tracklets = node.io['tracklets'].get()
    SysInfo = node.io['sysLogOut'].get()
    temp = SysInfo.chipTemperature
    # fpsSC=30
    
    # if temp>60:
    #     node.io['fps'].send(fpsSC)

    
    #time.sleep(1)
    temp2=time.time()
    if temp2-temp1 > 2:
        temp1 = temp2
        send_SPI("temp","T",temp.average)


    for t in tracklets.tracklets:
        to = trackableObjects.get(t.id, None)
        # node.warn(f"{to}")
        roi = t.roi.denormalize(WIDTH, HEIGHT)
        centroid = get_centroid(roi)
        # node.warn(f"{centroid}")
        # node.warn(f"{t.status}")

        # If new tracklet, save its centroid
        if t.status == Tracklet.TrackingStatus.NEW:
            to = TrackableObject(t.id, centroid)
        elif to is not None:
            if AXIS and not to.counted:
                x = [c[0] for c in to.centroids]
                mean = sum(x) / len(x)
                direction = centroid[0] - mean
                if abs(direction) > THRESH_DIST_DELTA:
                    if centroid[0] > ROI_POSITION * WIDTH and direction > 0 and mean < ROI_POSITION * WIDTH:
                        #counter[1]+=1 #Right
                        to.counted = True
                        node.warn(f"{labelTracked(t.label)} went right")
                        send_SPI(t.label, "R", 50)

                    elif centroid[0] < ROI_POSITION * WIDTH and direction < 0 and mean > ROI_POSITION * WIDTH:
                        #counter[0]+=1 #Left
                        to.counted = True
                        node.warn(f"{labelTracked(t.label)} went left")
                        send_SPI(t.label, "L", 50)


            elif not AXIS and not to.counted:
                y = [c[1] for c in to.centroids]
                mean = sum(y) / len(y)
                direction = centroid[1] - mean
                if abs(direction) > THRESH_DIST_DELTA:
                    if centroid[1] > ROI_POSITION * HEIGHT and direction > 0 and mean < ROI_POSITION * HEIGHT:
                        #counter[3]+=1 #Down
                        to.counted = True
                        #"node.warn(f"{labelTracked(t.label)} went down")
                        send_SPI(t.label, "D", 50)

                    elif centroid[1] < ROI_POSITION * HEIGHT and direction < 0 and mean > ROI_POSITION * HEIGHT:
                        #counter[2]+=1 #Up
                        to.counted = True
                        #node.warn(f"{labelTracked(t.label)} went up")
                        send_SPI(t.label, "U", 50)

                to.centroids.append(centroid)

        trackableObjects[t.id] = to

jakaskerl

guillaumefHUPI
Always the same amount of frames regardless of the FPS?

Make the code as minimal as you possibly can, this is unreadable.

Thanks,
Jaka

guillaumefHUPI

jakaskerl the code is almost the same as the one in the default repo luxonis/depthai-experimentstree/master/gen2-people-tracker/api I can try to make it smaller if you want.

Regarding the FPS, lowering doesn't fix the issue. I tried using a YOLOv10 blob with a small resolution 128 128 created from luxonis tool https://tools.luxonis.com/ with only one shave and it works. With a 256 by 256 image it crashes though. I don't see any extreme usage of resources in the logs and no logs at all saying that the pipeline is crashing so I can't find the origin of the problem.

jakaskerl

guillaumefHUPI
Try manually increasing the pool sizes:
camRgb.setNumFramesPool(raw, isp, video, preview, still)

Default:

  int numFramesPoolRaw = 3;
    int numFramesPoolIsp = 3;
    int numFramesPoolVideo = 4;
    int numFramesPoolPreview = 4;
    int numFramesPoolStill = 4;

Thanks,
Jaka

guillaumefHUPI

@jakaskerl ok thank you. I will try this and get back to you

guillaumefHUPI

@jakaskerl i tried setting different values with the code you sent me but still no success. Do you have any idea ? Have you tried my code and if yes is it working for you ?

Thanks

jakaskerl

guillaumefHUPI
Any chance you can send me the MRE and the model (via email or here) so I can check?

Thanks,
Jaka

guillaumefHUPI

jakaskerl yes no problem but i can't find your mail when i click on your profil. What is it ?

jakaskerl

guillaumefHUPI
Send to support@luxonis.com.

Thanks
Jaka

guillaumefHUPI

hi @jakaskerl , did you have time to test the code I sent ? I forgot to precise i am using this cam : https://shop.luxonis.com/products/oak-d-iot-75

jakaskerl

guillaumefHUPI
I didn't see the ticket since the bot responded to it and it automatically assigned as "already answered".
Will take a look at it today.

Thanks
Jaka

jakaskerl

guillaumefHUPI
The model expects 256x256 input, but you are using camRgb.setPreviewSize(416,416).
Model otherwise works, i've tested it.

Thanks,
Jaka

guillaumefHUPI

jakaskerl ok thank you for testing this. I corrected with correct input size but when i run the yolo code, teh pipeline stops, i get like 20 logs of " sending [0] to out pipeline" and then nothing, no crach log. It's not the case in mobilenet. Also, in mobilenet I see logs in termite for checking spi data but not with yolo. Do you know which IA module you have on the camera ? Apparently it can be different on same product.

guillaumefHUPI

@jakaskerl i opened the cam, i have ref for cam : R0M0E0 and for compute module : R2M1E2

jakaskerl

guillaumefHUPI
HW is the same. Which depthai are you using?
If you remove SPI, does the yolo work? Can you send a MRE without the SPI? Does it work on latest depthai?

If you use an old version of depthai, yolov10 is not yet integrated.

Thanks,
Jaka