Oak-d Pro Object tracking

Lli_you_chen · May 2, 2024

I want to use Oak-d Pro to object tracking on IR camera with my model, I used Depthai Tools to change .pt into blob format,The option show that yolov8 is detect only, I am not sure it is the reason why my tracking program can't tracking object correctly, Please told me how to transfer my model into blob also can tracking

This is my tracking code, Can run ,but can't be recognized and tracked correctly

import cv2

import depthai as dai

import numpy as np

import argparse # Import the argparse package for handling command line arguments

from pathlib import Path # Import the Path object for handling paths

import time

# Define object class labels

labelMap = ["egg"]

# Set the path to the MobileNet SSD model

nnPathDefault = str((Path(__file__).parent / Path("C:/egg.blob")).resolve().absolute())

# Parse command line arguments

parser = argparse.ArgumentParser()

parser.add_argument('nnPath', nargs='?', help="Path to mobilenet detection network blob", default=nnPathDefault)

parser.add_argument('-ff', '--full_frame', action="store_true", help="Perform tracking on full RGB frame", default=False)

args = parser.parse_args()

# Check if full frame tracking is enabled

fullFrameTracking = args.full_frame

# Create a pipeline

pipeline = dai.Pipeline()

monoL = pipeline.create(dai.node.MonoCamera)

manip = pipeline.create(dai.node.ImageManip)

manipOut = pipeline.create(dai.node.XLinkOut)

detectionNetwork = pipeline.create(dai.node.MobileNetDetectionNetwork)

objectTracker = pipeline.create(dai.node.ObjectTracker)

trackerOut = pipeline.create(dai.node.XLinkOut)

manipOut.setStreamName('flood-left')

trackerOut.setStreamName("tracklets")

monoL.setNumFramesPool(24)

monoL.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)

script = pipeline.create(dai.node.Script)

script.setProcessor(dai.ProcessorType.LEON_CSS)

script.setScript("""

floodBright = 0.1

LOGGING = False # Set to `True` for latency/timings debugging

node.warn(f'IR drivers detected: {str(Device.getIrDrivers())}')

while True:

# Wait first for a frame event, received at MIPI start-of-frame

event = node.io['event'].get()

if LOGGING: tEvent = Clock.now()

# Set IR flood light intensity

Device.setIrFloodLightIntensity(floodBright)

if LOGGING: tIrSet = Clock.now()

# Wait for the actual frame (after MIPI capture and ISP proc is done)

frameL = node.io['frameL'].get()

if LOGGING: tLeft = Clock.now()

if LOGGING:

latIR = (tIrSet - tEvent ).total_seconds() * 1000

latEv = (tEvent - event.getTimestamp() ).total_seconds() * 1000

latProcL = (tLeft - event.getTimestamp() ).total_seconds() * 1000

node.warn(f'T[ms] latEv:{latEv:5.3f} latIR:{latIR:5.3f} latProcL:{latProcL:6.3f}')

node.io['floodL'].send(frameL)

""")

# Model-specific settings

detectionNetwork.setBlobPath(args.nnPath)

detectionNetwork.setConfidenceThreshold(0.85)

detectionNetwork.input.setBlocking(False)

objectTracker.setDetectionLabelsToTrack([0])

objectTracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)

objectTracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.SMALLEST_ID)

manip.initialConfig.setResize(320, 320)

manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p)

# Linking

monoL.out.link(manip.inputImage)

monoL.frameEvent.link(script.inputs['event'])

monoL.out.link(script.inputs['frameL'])

script.outputs['floodL'].link(manipOut.input)

manip.out.link(detectionNetwork.input)

manip.out.link(objectTracker.inputTrackerFrame)

if fullFrameTracking:

manip.video.link(objectTracker.inputTrackerFrame)

else:

detectionNetwork.passthrough.link(objectTracker.inputTrackerFrame)

detectionNetwork.passthrough.link(objectTracker.inputDetectionFrame)

detectionNetwork.out.link(objectTracker.inputDetections)

objectTracker.out.link(trackerOut.input)

# Connect to device and start pipeline

with dai.Device(pipeline) as device:

preview = device.getOutputQueue("flood-left", 4, False)

tracklets = device.getOutputQueue("tracklets", 4, False)

startTime = time.monotonic()

counter = 0

fps = 0

frame = None

while True:

imgFrame = preview.get()

track = tracklets.get()

counter += 1

current_time = time.monotonic()

if (current_time - startTime) > 1:

fps = counter / (current_time - startTime)

counter = 0

startTime = current_time

color = (255, 0, 0)

frame = imgFrame.getCvFrame()

trackletsData = track.tracklets

for t in trackletsData:

roi = t.roi.denormalize(frame.shape[1], frame.shape[0])

x1 = int(roi.topLeft().x)

y1 = int(roi.topLeft().y)

x2 = int(roi.bottomRight().x)

y2 = int(roi.bottomRight().y)

try:

label = labelMap[t.label]

except:

label = t.label

cv2.putText(frame, str(label), (x1 + 10, y1 + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)

cv2.putText(frame, f"ID: {[t.id]}", (x1 + 10, y1 + 35), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)

cv2.putText(frame, t.status.name, (x1 + 10, y1 + 50), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)

cv2.rectangle(frame, (x1, y1), (x2, y2), color, cv2.FONT_HERSHEY_SIMPLEX)

cv2.putText(frame, "NN fps: {:.2f}".format(fps), (2, frame.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.4, color)

cv2.imshow("tracker", frame)

if cv2.waitKey(1) == ord('q'):

break

thanks,

Li

jakaskerl · May 6, 2024

Hi @li_you_chen
I would suggest trying your model on https://docs.luxonis.com/projects/api/en/latest/samples/Yolo/tiny_yolo/. Check if you get proper detections (bounding boxes are correct - IOU and confidence are correct). The tracking should then work as it only uses the detection results.

Thanks,
Jaka

Lli_you_chen · May 7, 2024

Hi @jakaskerl

I had try my own model on Mono & MobilenetSSD ,but It don't detect correct ,The Screen fall of wrong bounding box, I don't know why it doesn't work, But the same model at the old example is working(the old example is not using MobilenetSSD ,is rather like the bottom of https://www.oakchina.cn/2023/02/24/yolov8-blob/ )

I had modify the Mono & MobilenetSSD is

nnPath = str((Path(__file__).parent / Path('C:/egg.blob')).resolve().absolute())

labelMap = ["egg"]

manip.initialConfig.setResize(320, 320)

There’s nothing wrong with using Depthai Tools to transfer yolov8 model, right?

Thanks,

LI

jakaskerl · May 7, 2024

Hi @li_you_chen
If you train with yolov8, it won't work on mobilenetSSD. Here should be a more up to date yolov8 network notebook which you can use directly.

Thanks,
Jaka

Lli_you_chen · May 9, 2024

Hi @jakaskerl

The link is using the same Depthai Tools to transfer model as I did ,It's just that I'm not using Colab rather and using my laptop,I think there should be no difference,

I refer to the old yoloDetect example and modify it like this

import cv2

import depthai as dai

import argparse

# Specify the path to your own model weights file

model_weights_path = "C:/catdog320.blob"

parser = argparse.ArgumentParser()

parser.add_argument('nnPath', nargs='?', help="Path to mobilenet detection network blob", default=model_weights_path)

parser.add_argument('-ff', '--full_frame', action="store_true", help="Perform tracking on full RGB frame", default=False)

args = parser.parse_args()

fullFrameTracking = args.full_frame

# Load model weights using dai.OpenVINO.Blob

custom_model_blob = dai.OpenVINO.Blob(model_weights_path)

numClasses = 80

dim = next(iter(custom_model_blob.networkInputs.values())).dims

output_name, output_tenser = next(iter(custom_model_blob.networkOutputs.items()))

numClasses = output_tenser.dims[2] - 5

labelMap = ["class_%s" % i

for i in range(numClasses)

]

# Create a pipeline

pipeline = dai.Pipeline()

monoL = pipeline.create(dai.node.MonoCamera)

manip = pipeline.create(dai.node.ImageManip)

detectionNetwork = pipeline.create(dai.node.YoloDetectionNetwork)

objectTracker = pipeline.create(dai.node.ObjectTracker)

manipOut = pipeline.create(dai.node.XLinkOut)

trackerOut = pipeline.create(dai.node.XLinkOut)

manipOut.setStreamName('flood-left')

trackerOut.setStreamName("tracklets")

monoL.setNumFramesPool(24)

monoL.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)

script = pipeline.create(dai.node.Script)

script.setProcessor(dai.ProcessorType.LEON_CSS)

script.setScript("""

floodBright = 0.1

node.warn(f'IR drivers detected: {str(Device.getIrDrivers())}')

while True:

event = node.io['event'].get()

Device.setIrFloodLightIntensity(floodBright)

frameL = node.io['frameL'].get()

node.io['floodL'].send(frameL)

""")

# Model-specific settings

detectionNetwork.setBlob(custom_model_blob)

detectionNetwork.setConfidenceThreshold(0.7)

detectionNetwork.input.setBlocking(False)

objectTracker.setDetectionLabelsToTrack([]) #all

objectTracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)

objectTracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.SMALLEST_ID)

manip.initialConfig.setResize(320, 320)

manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p)

#link

monoL.out.link(manip.inputImage)

manip.out.link(detectionNetwork.input)

monoL.frameEvent.link(script.inputs['event'])

monoL.out.link(script.inputs['frameL'])

script.outputs['floodL'].link(manipOut.input)

if fullFrameTracking:

manip.video.link(objectTracker.inputTrackerFrame)

else:

detectionNetwork.passthrough.link(objectTracker.inputTrackerFrame)

detectionNetwork.passthrough.link(objectTracker.inputDetectionFrame)

detectionNetwork.out.link(objectTracker.inputDetections)

objectTracker.out.link(trackerOut.input)

# Connect to the device and start the Pipeline

with dai.Device(pipeline) as device:

preview = device.getOutputQueue("flood-left", 4, False)

tracklets = device.getOutputQueue(name="tracklets", maxSize=4, blocking=False)

while True:

imgFrame = preview.get()

track = tracklets.get()

print('track',track)

color = (255, 0, 0)

frame = imgFrame.getCvFrame()

trackletsData = track.tracklets

print('trackdata',trackletsData)

for t in trackletsData:

print('t',t)

roi = t.roi.denormalize(frame.shape[1], frame.shape[0])

x1 = int(roi.topLeft().x)

y1 = int(roi.topLeft().y)

x2 = int(roi.bottomRight().x)

y2 = int(roi.bottomRight().y)

try:

label = labelMap[t.label]

except:

label = t.label

cv2.putText(frame, str(label), (x1 + 10, y1 + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)

cv2.putText(frame, f"ID: {[t.id]}", (x1 + 10, y1 + 35), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)

cv2.putText(frame, t.status.name, (x1 + 10, y1 + 50), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)

cv2.rectangle(frame, (x1, y1), (x2, y2), color, cv2.FONT_HERSHEY_SIMPLEX)

cv2.imshow("tracker", frame)

if cv2.waitKey(1) == ord('q'):

break

The program can run but cannot recognize anything, If adding detectionNetwork.out.link(trackerOut.input),

it will show " trackletsData = track.tracklets AttributeError: 'depthai.ImgDetections' object has no attribute 'tracklets' " ,Can you help me debug

jakaskerl · May 10, 2024

Hi @li_you_chen
Hmm, could you send the model over, along with MRE so we can check please?
Try changing the IOU threshold and the confidence to see if you get any improvements in detections.
Also important make sure you update to the latest depthai version (2.25.1).

Thanks,
Jaka

Lli_you_chen · May 11, 2024

Hi @jakaskerl

Raising the confident threshold doesn't change anything, iou is not allow to setting in this example,

This my model(.pt & .blob) There is no problem with these two models in detection, it just can't do tracking.

jakaskerl · May 13, 2024

Hi @li_you_chen
I tested with the model and I am getting no detections (to pictures of cats and dogs). I looks like detection problem.

Lli_you_chen · May 14, 2024

Hi @jakaskerl

I don’t know how you test my model ,but I’m pretty sure it’s work on this Yolodetectionnetwork,The Only wondering me is why the same model in blob format is not work on the mobilnetdetection ,

I had always use Depthai Tools to convert the model from .pt into .blob

Thanks,

Li

jakaskerl · May 14, 2024

Hi @li_you_chen
I tested it on images of cats and dogs, with the swapped model inside this example: https://docs.luxonis.com/projects/api/en/latest/samples/Yolo/tiny_yolo/

Could you post the output of detections:

inDet = qDet.tryGet()

if inDet is not None:
            detections = inDet.detections
            print(detections)

You can't interchange Yolo blobs and Mbnet, they have completely different decoding functions.

Thanks,
Jaka

Lli_you_chen · May 16, 2024

Hi @jakaskerl

After I change model ,it works and print(detections) show [<depthai.ImgDetection object at 0x000002276FD7ECF0>, <depthai.ImgDetection object at 0x000002276FCD8CB0>

it works ,but i have to delete two line(that is not a problem)

#cv2.putText(frame, labelMap[detection.label]….

#cv2.putText(frame, f"{int(detection.confidence * 100)}%…….

I modify this three place

#change my model

nnPathDefault = str((Path(__file__).parent / Path('C:\catdog320.blob')).resolve().absolute())

#MobileNetDetectionNetwork -> YoloDetectionNetwork

detectionNetwork = pipeline.create(dai.node.YoloDetectionNetwork)

detectionNetwork.setBlobPath(nnPathDefault)

it show nothing on the frame

Do you mean this trained model's format is mobilennetssd ?

jakaskerl · May 17, 2024

Hi @li_you_chen
No, it's yolo. I'm just saying you can not use a yolo blob inside MobileNetDetectionNetwork.

#!/usr/bin/env python3

"""
The code is the same as for Tiny Yolo V3 and V4, the only difference is the blob file
- Tiny YOLOv3: https://github.com/david8862/keras-YOLOv3-model-set
- Tiny YOLOv4: https://github.com/TNTWEN/OpenVINO-YOLOV4
"""

from pathlib import Path
import sys
import cv2
import depthai as dai
import numpy as np
import time

# tiny yolo v4 label texts
labelMap = [
    "cat", "dog"
]

syncNN = True

# Create pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
camRgb = pipeline.create(dai.node.ColorCamera)
detectionNetwork = pipeline.create(dai.node.YoloDetectionNetwork)
xoutRgb = pipeline.create(dai.node.XLinkOut)
nnOut = pipeline.create(dai.node.XLinkOut)

xoutRgb.setStreamName("rgb")
nnOut.setStreamName("nn")

# Properties
camRgb.setPreviewSize(320, 320)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setInterleaved(False)
camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.RGB)
camRgb.setFps(40)

# Network specific settings
detectionNetwork.setConfidenceThreshold(0.5)
detectionNetwork.setNumClasses(2)
detectionNetwork.setIouThreshold(.5)
detectionNetwork.setBlobPath("catdog320.blob")
detectionNetwork.setNumInferenceThreads(2)
detectionNetwork.input.setBlocking(False)

# Linking
camRgb.preview.link(detectionNetwork.input)
if syncNN:
    detectionNetwork.passthrough.link(xoutRgb.input)
else:
    camRgb.preview.link(xoutRgb.input)

detectionNetwork.out.link(nnOut.input)

# Connect to device and start pipeline
with dai.Device(pipeline) as device:

    # Output queues will be used to get the rgb frames and nn data from the outputs defined above
    qRgb = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
    qDet = device.getOutputQueue(name="nn", maxSize=4, blocking=False)

    frame = None
    detections = []
    startTime = time.monotonic()
    counter = 0
    color2 = (255, 255, 255)

    # nn data, being the bounding box locations, are in <0..1> range - they need to be normalized with frame width/height
    def frameNorm(frame, bbox):
        normVals = np.full(len(bbox), frame.shape[0])
        normVals[::2] = frame.shape[1]
        return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)

    def displayFrame(name, frame):
        color = (255, 0, 0)
        for detection in detections:
            bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
            cv2.putText(frame, labelMap[detection.label], (bbox[0] + 10, bbox[1] + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, f"{int(detection.confidence * 100)}%", (bbox[0] + 10, bbox[1] + 40), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color, 2)
        # Show the frame
        cv2.imshow(name, frame)

    while True:
        if syncNN:
            inRgb = qRgb.get()
            inDet = qDet.get()
        else:
            inRgb = qRgb.tryGet()
            inDet = qDet.tryGet()

        if inRgb is not None:
            frame = inRgb.getCvFrame()
            cv2.putText(frame, "NN fps: {:.2f}".format(counter / (time.monotonic() - startTime)),
                        (2, frame.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.4, color2)

        if inDet is not None:
            detections = inDet.detections
            counter += 1

        if frame is not None:
            displayFrame("rgb", frame)

        if cv2.waitKey(1) == ord('q'):
            break

Does this work for you? Cause it doesn't for me and I think the model is not created correctly.

Thanks ,
Jaka

Lli_you_chen · May 19, 2024

Hi @jakaskerl

I can't run the code you send ,but i can run https://docs.luxonis.com/projects/api/en/latest/samples/Yolo/tiny_yolo/

jakaskerl · May 19, 2024

Hi @li_you_chen
This is the exact same script as tiny_yolo in the examples. I only changed the model to catdog and the label names respectively. Can you recheck please?

Thanks
Jaka

Lli_you_chen · May 20, 2024

Hi @jakaskerl

it can run ,but can't detect right ,full of wrong boundingbox , even I raise threshold

can i object tracking by yolodetect not mobilenet ssd

jakaskerl · May 20, 2024

Hi @li_you_chen

li_you_chen full of wrong boundingbox , even I raise threshold

This suggest a problem with the model. Did you test it off depthai?

Thanks,
Jaka

Lli_you_chen · May 22, 2024

Hi @jakaskerl

Can i object tracking by yolodetect not mobilenet ssd. Or If I train my own mobilennet ssd model so I can run this

Thanks,

Li

jakaskerl · May 23, 2024

Hi @li_you_chen
Yes, of course you can. The model may be different but the on-device decoding is done in a way that ensures the output structure is the same, so they can both be linked to an object tracker.

If you train your own mobilenet model, the example you sent should be working without any changes (well, except for the model path).

Thanks,
Jaka

Lli_you_chen · May 24, 2024

Hi @jakaskerl

I think I will try training MobileMe Ssd model,Do you have any tutorial can training

jakaskerl · May 24, 2024

Hi @li_you_chen
We have some here but we ditched MBNet a while ago and mostly use YOLO now. If you wish you can train YOLO as well.

Thanks,
Jaka