• DepthAI
  • Request for assistance with OAK-D POE CM4 and Jetson TX2 integration

Hi,

I'm currently working on a project using the OAK-D POE CM4 camera, and I'm encountering latency issues when running the DeepSORT algorithm in real time. The latency is around 9 seconds, which is far too high for my project's needs.

I have an NVIDIA Jetson TX2 platform that I'd like to use to speed up the processing of the algorithm. However, I'm unsure how I can integrate it given that the OAK-D POE CM4 camera is integrated with a Raspberry Pi CM4 board.

Do you have any suggestions on how I could share processing tasks between the Raspberry Pi and the Jetson TX2, or any general recommendations on how I might reduce the latency of my system?

Thanks in advance for your help, and I look forward to your guidance.

Best Regards,

Babacar

    Hi Babacar
    Afaik there really isn't a way of connecting the CM4 PoE VPU to the Jetson, since it's tightly integrated with the RPi. A latency of 9 seconds seems a little extreme. Did you follow a guide for setting up DeepSORT? Where do you see the latency coming from: is the OAK the issue, or is it the RPi processing that's causing it?

    Thanks,
    Jaka

      Hi Babacar
      I tested the script on my machine and got around 1s of latency. This would lead me to believe it's the raspberry pi that is incapable of fast processing. How are you measuring the latency? Where are you viewing the camera feed?

      Thanks,
      Jaka

        jakaskerl

        I am using an Oak-D PoE CM4 camera, so all the processing is done directly on the Raspberry Pi board. To measure the latency, I view the camera feed on the Raspberry Pi by connecting via SSH from my computer.

          Hi Babacar
          Then the problem is that SSH-ing from a PC introduces even more latency into the pipeline, which is not there when running the CM4 on its own (what it's designed for). I suggest you view the latency measurements using getTimestamp() on frames on the RPI and check those values instead of looking at the camera feed, which will, depending on the image size (and the ETH connection), introduce even more latency - possibly the largest portion of it.

          Here are some examples:
          https://docs.luxonis.com/projects/api/en/latest/samples/host_side/latency_measurement/
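
          For reference, a condensed sketch along the lines of that linked example (assuming a ColorCamera ISP stream named "out"; the docs page has the full version):

          import depthai as dai

          # Minimal latency measurement: compare each frame's device timestamp
          # against the host clock when the frame arrives on the host.
          pipeline = dai.Pipeline()
          camRgb = pipeline.create(dai.node.ColorCamera)
          xout = pipeline.create(dai.node.XLinkOut)
          xout.setStreamName("out")
          camRgb.isp.link(xout.input)

          with dai.Device(pipeline) as device:
              q = device.getOutputQueue(name="out")
              while True:
                  imgFrame = q.get()
                  latencyMs = (dai.Clock.now() - imgFrame.getTimestamp()).total_seconds() * 1000
                  print(f"Latency: {latencyMs:.2f} ms")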

          Thanks,
          Jaka

            Hi jakaskerl

            But how can I launch the program without connecting via SSH? By powering on the camera and plugging in an HDMI cable? I tried that, but I'm not receiving any image on my screen.

            jakaskerl

            Here are the results I obtained by running the code on my Raspberry Pi via an SSH connection:

            • Latency: 108.63 ms, Average latency: 103.88 ms, Standard deviation: 10.33

            • Latency: 112.14 ms, Average latency: 103.90 ms, Standard deviation: 10.32

            • Latency: 115.76 ms, Average latency: 103.92 ms, Standard deviation: 10.33

            • Latency: 99.22 ms, Average latency: 103.91 ms, Standard deviation: 10.32

              Hi Babacar
              Ok, this is great and means the OAK is capable of running in real time. Can you also time the loop on the RPI without showing a preview (displaying the preview is usually the most resource-intensive part)?

              Thanks,
              Jaka

                Hi jakaskerl

                Thank you for your previous insights. I want to clarify that the latency measurements I shared with you earlier were taken without showing the preview (I had commented out `cv2.imshow('frame', imgFrame.getCvFrame())`).

                After including the preview display in the computation, here are the new values I obtained:

                Latency: 481.45 ms, Average latency: 527.23 ms, Std: 36.31

                Latency: 492.62 ms, Average latency: 527.20 ms, Std: 36.31

                Latency: 488.96 ms, Average latency: 527.18 ms, Std: 36.31

                Latency: 486.37 ms, Average latency: 527.15 ms, Std: 36.31

                Latency: 496.27 ms, Average latency: 527.13 ms, Std: 36.31

                Latency: 492.14 ms, Average latency: 527.10 ms, Std: 36.31

                Latency: 503.84 ms, Average latency: 527.09 ms, Std: 36.30

                Latency: 515.83 ms, Average latency: 527.08 ms, Std: 36.29

                Latency: 507.38 ms, Average latency: 527.07 ms, Std: 36.28

                Latency: 507.18 ms, Average latency: 527.05 ms, Std: 36.27

                Latency: 498.62 ms, Average latency: 527.03 ms, Std: 36.27

                Latency: 515.08 ms, Average latency: 527.02 ms, Std: 36.25

                As you can see, adding the preview display significantly increases the latency.

                Best,

                Babacar

                  Hi Babacar
                  Yes, I figured. I seem to have phrased my question poorly.

                  Could you try to time the loop (the while True part) of the script ALSO without showing the preview. This is to try to get a sense of how fast the RPI is able to process information (without images).

                  Thanks,
                  Jaka

                    5 days later

                    jakaskerl

                    Hi Jaka,

                    Apologies for the delay in response. I want to confirm whether this is the correct modification to the code that you requested:

                    import depthai as dai
                    import numpy as np
                    import time

                    # Create pipeline
                    pipeline = dai.Pipeline()
                    pipeline.setXLinkChunkSize(0)

                    # Define source and output
                    camRgb = pipeline.create(dai.node.ColorCamera)
                    camRgb.setFps(60)
                    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)

                    xout = pipeline.create(dai.node.XLinkOut)
                    xout.setStreamName("out")
                    camRgb.isp.link(xout.input)

                    # Connect to device and start pipeline
                    with dai.Device(pipeline) as device:
                        print(device.getUsbSpeed())
                        q = device.getOutputQueue(name="out")
                        diffs = np.array([])

                        while True:
                            start_time = time.time()  # Record start time of the loop

                            imgFrame = q.get()
                            latencyMs = (dai.Clock.now() - imgFrame.getTimestamp()).total_seconds() * 1000
                            diffs = np.append(diffs, latencyMs)
                            print('Latency: {:.2f} ms, Average latency: {:.2f} ms, Std: {:.2f}'.format(latencyMs, np.average(diffs), np.std(diffs)))

                            end_time = time.time()  # Record end time of the loop
                            loop_time = (end_time - start_time) * 1000  # Calculate loop time in ms
                            print('Loop time: {:.2f} ms'.format(loop_time))

                    Please let me know if this is correct, or if there are any further changes that I should make.

                    Thanks,
                    Babacar

                      Hi Babacar
                      Edit the code in main.py (for deepsort) to check how much time it takes for one iteration to complete. The point of this is to find out how long the host-side code takes on the RPI (whether it's too resource intensive).
                      EDIT: Of course, without imshow().

                      Thanks,
                      Jaka

                        Hi jakaskerl

                        Here's the code that I implemented:

                        import time

                        # ...

                        while True:
                            # Begin timing
                            start_time = time.time()

                            for name, q in queues.items():
                                # Add all msgs (color frames, object detections and recognitions) to the Sync class.
                                if q.has():
                                    sync.add_msg(q.get(), name)

                            msgs = sync.get_msgs()
                            if msgs is not None:
                                frame = msgs["color"].getCvFrame()
                                detections = msgs["detection"].detections
                                embeddings = msgs["embedding"]

                                # Write raw frame to the raw_output video
                                raw_out.write(frame)

                                # Update the tracker
                                object_tracks = tracker_iter(detections, embeddings, tracker, frame)

                                # For each tracked object
                                for track in object_tracks:
                                    # ... all existing code
                                    pass

                                # Write the frame with annotations to the output video
                                out.write(frame)

                            # End timing and print elapsed time
                            end_time = time.time()
                            elapsed_time = end_time - start_time
                            print(f"Elapsed time for iteration: {elapsed_time} seconds")

                        raw_out.release()
                        out.release()

                        These are the results I got:

                        Elapsed time for iteration: 0.13381719589233398 seconds

                        Elapsed time for iteration: 0.1333160400390625 seconds

                        Elapsed time for iteration: 0.13191676139831543 seconds

                        ...

                        ...

                        Elapsed time for iteration: 0.13199663162231445 seconds

                        Thanks, Jaka, for your input so far. I would appreciate any further suggestions you might have to fix this issue.

                        jakaskerl

                        Following your advice, I've made some further modifications to my code and have also removed the video-writing part. The changes have resulted in considerable performance improvements. However, the time taken per iteration now varies widely. Here's a subset of the results:

                        Elapsed time for iteration: 2.3365020751953125e-05 seconds

                        Elapsed time for iteration: 2.3603439331054688e-05 seconds

                        ...

                        ...

                        Elapsed time for iteration: 2.6702880859375e-05 seconds

                        Elapsed time for iteration: 2.3603439331054688e-05 seconds

                        Elapsed time for iteration: 3.361701965332031e-05 seconds

                        Elapsed time for iteration: 2.4080276489257812e-05 seconds

                        ...

                        ...

                        Elapsed time for iteration: 0.00014281272888183594 seconds

                        Elapsed time for iteration: 0.06066274642944336 seconds

                        Elapsed time for iteration: 0.05930662155151367 seconds

                        Elapsed time for iteration: 0.05977463722229004 seconds

                        Elapsed time for iteration: 0.06491947174072266 seconds
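
                        A sketch that may make these numbers easier to read (reusing the `queues`, `sync`, `tracker` and `tracker_iter` objects from the script above, so only an illustration) is to time just the iterations that actually process a frame, since the near-empty passes where `sync.get_msgs()` returns None are presumably what produce the microsecond-level readings:

                        import time

                        while True:
                            # Drain whatever is waiting on the queues into the Sync helper.
                            for name, q in queues.items():
                                if q.has():
                                    sync.add_msg(q.get(), name)

                            msgs = sync.get_msgs()
                            if msgs is None:
                                continue  # nothing synced yet, so skip timing this pass

                            # Time only the real per-frame work: the tracker update on the synced messages.
                            start_time = time.time()
                            frame = msgs["color"].getCvFrame()
                            detections = msgs["detection"].detections
                            embeddings = msgs["embedding"]
                            object_tracks = tracker_iter(detections, embeddings, tracker, frame)
                            print(f"Per-frame processing time: {(time.time() - start_time) * 1000:.2f} ms")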

                        Hi Babacar
                         I'm not sure, so I asked Bard:

                        Yes, DeepSORT supports YOLOv8. You can import YOLOv8 in JSON format from the DeepSORT_Tracking GitHub repository. Here are the steps on how to do it:

                        Clone the DeepSORT_Tracking GitHub repository.
                        Go to the deepsort/deepsort/detection/ directory.
                        Copy the yolov4.cfg and yolov4.weights files from the tutorial you linked to.
                        Create a new file called yolov8.json.
                        Paste the following code into the yolov8.json file:

                         {
                           "model": "yolov8",
                           "classes": ["person"],
                           "path": "./yolov4.cfg",
                           "weights": "./yolov4.weights"
                         }

                        Save the yolov8.json file.

                        Now you can use DeepSORT to track objects detected by YOLOv8.

                        Here are some additional resources that you may find helpful:

                        DeepSORT documentation: https://github.com/nwojke/deep_sort/blob/master/README.md
                        YOLOv8 tutorial: https://pjreddie.com/darknet/yolo/

                        Hope this helps,
                        Jaka

                        Hi Jaka,

                        I think there might have been a misunderstanding in our last exchange. I intend to train my YOLOv8 model using this code: https://github.com/luxonis/depthai-ml-training/blob/master/colab-notebooks/YoloV8_training.ipynb, and then import it in JSON format, as indicated in the tutorial.

                         I plan on using this specific DeepSORT repository from Luxonis

                         and, instead of launching it with yolov6.json, I would like to launch it with a YOLOv8 model that I have trained on my own dataset.

                        Furthermore, I'm not quite sure about the "deepsort/deepsort/detection" directory you mentioned. I don't see the yolov4.cfg and yolov4.weights files.

                        Could you provide more clarification on this?

                        Best regards,

                        Babacar


                          Hi Babacar,
                          We have just updated the deepsort demo, and you should be able to easily replace the default object detection model with your own YOLOv8. Thoughts?
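
                          For context, a rough sketch of how a custom YOLO model exported to a .blob is typically wired into a DepthAI pipeline; the blob path, input size, class count and thresholds below are placeholders rather than the demo's actual configuration:

                          import depthai as dai

                          pipeline = dai.Pipeline()

                          # Camera preview must match the model's input resolution (placeholder: 640x640).
                          camRgb = pipeline.create(dai.node.ColorCamera)
                          camRgb.setPreviewSize(640, 640)
                          camRgb.setInterleaved(False)

                          # Detection network fed with the blob converted from the trained YOLOv8 model.
                          detNet = pipeline.create(dai.node.YoloDetectionNetwork)
                          detNet.setBlobPath("yolov8_custom.blob")  # placeholder path
                          detNet.setNumClasses(1)                   # placeholder: e.g. a single "person" class
                          detNet.setCoordinateSize(4)
                          detNet.setConfidenceThreshold(0.5)
                          detNet.setIouThreshold(0.5)
                          detNet.setAnchors([])                     # YOLOv8 is anchor-free
                          detNet.setAnchorMasks({})

                          camRgb.preview.link(detNet.input)

                          xoutDet = pipeline.create(dai.node.XLinkOut)
                          xoutDet.setStreamName("detections")
                          detNet.out.link(xoutDet.input)

                          In the demo these values typically come from the exported JSON config rather than being hard-coded.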

                            Hi erik

                            Thank you, I am currently training my model; afterwards I will try it with the DeepSORT demo.