Hi,

I am planning to run 32 OAK-1-PoE cameras. Yep, thirty-two. I am running YOLOv6n object detection, and I need assistance with a few things.

  1. My first question: can I run all of the cameras on one PC?

  2. I have a Python script with some logic. Can I run all the cameras in one Python script? If so, how? Since I am only loading one pipeline, please give me an example, that's all.

  3. When I ran just one Python script for one camera, it took up a good amount of CPU and memory. How can I optimize it, not code-wise but hardware-wise? Can I use a GPU? If so, how exactly? If I run it on Linux, will it get better? If I write it in C++, will it get better? Please explain in detail.

    If there is anything that can help me with processing, please explain, since I do have a good GPU etc. Any alternatives you would suggest?

Thank you.

    ZachJaison My first question: can I run all of the cameras on one PC?

    Yes, the processing is performed on the device. The host is only used for displaying the streams.

    ZachJaison I have a Python script with some logic. Can I run all the cameras in one Python script? If so, how? Since I am only loading one pipeline, please give me an example, that's all.

    You can. Example here.
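
    In case that link goes stale, a minimal sketch of the usual pattern is below - one pipeline per device, each handled in its own thread. Here build_pipeline() and handle_detections() are placeholders for your own pipeline setup and per-camera logic:

    import threading
    import depthai

    def run_camera(device_info):
        pipeline = build_pipeline()  # same ColorCamera + YoloDetectionNetwork setup as for one device
        with depthai.Device(pipeline, device_info) as device:
            q_nn = device.getOutputQueue("nn", maxSize=4, blocking=False)
            while True:
                in_nn = q_nn.tryGet()
                if in_nn is not None:
                    handle_detections(device_info.getMxId(), in_nn.detections)

    # One thread per connected OAK device
    threads = []
    for info in depthai.Device.getAllAvailableDevices():
        t = threading.Thread(target=run_camera, args=(info,), daemon=True)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()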

    ZachJaison When I ran just one Python script for one camera, it took up a good amount of CPU and memory. How can I optimize it, not code-wise but hardware-wise? Can I use a GPU? If so, how exactly? If I run it on Linux, will it get better? If I write it in C++, will it get better? Please explain in detail.

    You don't need a GPU since the processing takes place on the device, which has a VPU and other HW-accelerated blocks for optimized processing.
    Host CPU will likely be utilized for network communication and displaying the frames (if you are doing that). Pure network communication shouldn't take too much of your CPU, and displaying the streams won't really be possible due to ETH bandwidth limitations (32 devices sharing a single 1 Gbps link).
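
    As a rough back-of-the-envelope check (assuming uncompressed 640x640 BGR previews at 30 FPS):

    640 x 640 x 3 bytes ≈ 1.23 MB per frame
    1.23 MB x 30 FPS ≈ 36.9 MB/s ≈ 295 Mbps per camera
    295 Mbps x 32 cameras ≈ 9.4 Gbps - far beyond a single 1 Gbps link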

    Thanks,
    Jaka

      jakaskerl Yes, the processing is performed on the device. The host is only used for displaying the streams.

      Thank you for your response.

      I would like to clarify this part. I am NOT running standalone mode; I am running my Python code on the PC (host) directly. So can you please confirm once again whether the processing is happening on the PC or the camera?

      Can you please share a reference or clarify how the camera is doing the processing if I am running my script from the host and not standalone?

      jakaskerl You don't need a GPU since the processing takes place on the device, which has a VPU and other HW-accelerated blocks for optimized processing

      Well, as mentioned earlier, I am not running it standalone, so will the VPU take care of the processing or my PC?

      jakaskerl displaying the streams won't really be possible due to ETH bandwidth limitations (32 devices sharing a single 1 Gbps link).

      So basically, if I don't display the stream but just run my logic, I can connect 32 devices to my PC (host), again without running the cameras standalone. Correct?

        ZachJaison I would like to clarify this part. I am NOT running standalone mode; I am running my Python code on the PC (host) directly. So can you please confirm once again whether the processing is happening on the PC or the camera?

        The host side merely creates an "instruction set" (the pipeline) which is uploaded to the device and executed there.
        The OAKs are edge devices and use the host only to display data.
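
        Conceptually, the flow is (a sketch, not the exact internals):

        pipeline = depthai.Pipeline()             # built on the host: just a graph description, nothing runs yet
        # ... create nodes and link them ...
        with depthai.Device(pipeline) as device:  # the graph is serialized and uploaded here;
            pass                                  # from this point the camera and NN run on the device's VPU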

        ZachJaison So basically, if I don't display the stream but just run my logic, I can connect 32 devices to my PC (host), again without running the cameras standalone. Correct?

        Correct.

        Thanks,
        Jaka

          jakaskerl

          Thank you for your response. I'm using a mini PC since I wanted to place it in a compact space, so it has a fairly low-end Intel processor. I connected just 2 of the 32 cameras to my mini PC, and according to Task Manager the CPU hits almost 90-100%, memory 85-90%, and network 100%. BUT one thing I noticed is that my program isn't affected at all and everything is working perfectly. By the way, I ran two Python scripts separately, one per camera, since I am yet to implement threading.

          Anyway, how is that possible? How can it use up this much CPU and memory and not have much lag or other issues? If I run it 24/7, will it then have an issue? Or am I reading the wrong % on Task Manager?

          Do you recommend changing the PC to something like an i7?

          I haven't tried connecting all 32 cameras to my mini PC using a switch. As mentioned earlier, given the % usage, will it not work or just crash?

          Is the CPU and memory % going high because I am displaying the frames? If I don't display the frames (the cv2.imshow part), will the CPU and memory % go down?

          What do you think?

          Thank you once again

            ZachJaison Is the CPU and memory % going high because I am displaying the frames? If I don't display the frames (the cv2.imshow part), will the CPU and memory % go down?

            If you are displaying frames, the CPU will work hard, especially if the frames are large. As for memory, I'm not sure really. Disabling the display should lower the usage a lot.

            Thanks,
            Jaka

            @jakaskerl

            Well,

            Here is my code. Please take a look.

            import pathlib
            import cv2
            import depthai
            import numpy as np
            import time
            import collections

            # Initialize the pipeline
            pipeline = depthai.Pipeline()

            # Set up the RGB camera
            cam_rgb = pipeline.createColorCamera()
            cam_rgb.setResolution(depthai.ColorCameraProperties.SensorResolution.THE_1080_P)  # Set camera resolution
            cam_rgb.setPreviewSize(640, 640)  # Adjust as needed; here it's set to 640x640
            cam_rgb.setInterleaved(False)

            # Set up the YOLOv6 detection network
            detection_nn = pipeline.createYoloDetectionNetwork()
            detection_nn.setBlobPath('C:/Users/Administrator/Desktop/best_ckpt_openvino_2022.1_6shave.blob')  # Specify your YOLOv6 blob file path
            detection_nn.setConfidenceThreshold(0.6)
            detection_nn.setNumClasses(5)  # Set to 5 classes as per your model
            detection_nn.setCoordinateSize(4)

            # Define default anchors as a flat list
            default_anchors = [
                10, 13, 16, 30, 33, 23,      # First scale
                30, 61, 62, 45, 59, 119,     # Second scale
                116, 90, 156, 198, 373, 326  # Third scale
            ]

            # Set the anchors and masks for YOLOv6
            detection_nn.setAnchors(default_anchors)
            detection_nn.setAnchorMasks({"side26": [0, 1, 2], "side13": [3, 4, 5]})  # Adjust if your model uses different masks
            detection_nn.setIouThreshold(0.5)

            # Set up XLinkOut for camera and neural network outputs
            xout_rgb = pipeline.createXLinkOut()
            xout_rgb.setStreamName("rgb")
            xout_nn = pipeline.createXLinkOut()
            xout_nn.setStreamName("nn")

            # Link the camera preview to the outputs
            cam_rgb.preview.link(xout_rgb.input)
            cam_rgb.preview.link(detection_nn.input)
            detection_nn.out.link(xout_nn.input)

            # Variables for detection aggregation and analysis
            frame_detection_counts = collections.defaultdict(lambda: collections.defaultdict(int))  # Counts of each type of defect per frame
            detection_active = False  # Flag to indicate an active detection window
            start_time = None         # Marks the start of the detection window
            time_window = 2           # 2-second time window for detection aggregation
            frame_threshold = 50      # Minimum number of frames to consider a defect valid

            # Variable to simulate sensor input
            SensorIn = False

            # Mapping of numerical labels to human-readable defect names
            label_map = {
                0: "x",
                1: "y",
                2: "z",
                3: "q"
            }

            def frameNorm(frame, bbox):
                normVals = np.full(len(bbox), frame.shape[0])
                normVals[::2] = frame.shape[1]
                return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)

            def start_detection():
                global detection_active, start_time
                detection_active = True
                start_time = time.perf_counter()

            def collect_detections(detections):
                global frame_detection_counts
                defect_counter = collections.Counter()  # Counter for the current detection cycle
                # Count detections for each type
                for detection in detections:
                    defect_label = label_map.get(detection.label, "Unknown")
                    defect_counter[defect_label] += 1
                # Update frame detection counts
                for defect, count in defect_counter.items():
                    key = f"{count} {defect}"
                    frame_detection_counts[defect][key] += 1

            def analyze_detections():
                global frame_detection_counts
                final_defects = {}
                # Determine the most frequent count for each defect type
                for defect, counts in frame_detection_counts.items():
                    # Filter out defect counts that don't meet the frame threshold
                    valid_counts = {k: v for k, v in counts.items() if v >= frame_threshold}
                    if valid_counts:
                        max_occurrence = max(valid_counts, key=valid_counts.get)  # Key with the highest count
                        final_defects[defect] = max_occurrence
                return final_defects

            def decide_rejection(final_defects):
                # Apply rejection rules based on the final analysis of defect occurrences
                for defect, occurrence in final_defects.items():
                    count, _ = occurrence.split(' ', 1)  # Split to get the count
                    count = int(count)
                    if defect == "Crushed Tea Bag" and count >= 1:
                        return True
                    if defect == "Tag Out" and count >= 3:
                        return True
                    # Add more rules as needed
                return False

            for device in depthai.Device.getAllAvailableDevices():
                print(f"{device.getMxId()} {device.state}")

            # Start the pipeline with a specific device ID
            # device_id = '14442C1091B39ECF00'  # Replace with your specific device ID
            # device_id = '14442C10E149D6D600'
            device_id = '14442C10B17DEBCF00'
            device_info = depthai.DeviceInfo(device_id)

            # Manually set the device state to BOOTLOADER
            device_info.state = depthai.XLinkDeviceState.X_LINK_BOOTLOADER

            # Start the pipeline
            with depthai.Device(pipeline, device_info) as device:
                # Print device information
                for device_info in depthai.Device.getAllAvailableDevices():
                    print(f"{device_info.getMxId()} {device_info.state}")

                q_rgb = device.getOutputQueue("rgb")
                q_nn = device.getOutputQueue("nn")

                frame = None
                detections = []

                # Variables for FPS calculation
                last_fps_update_time = time.time()
                cam_frame_count = 0
                nn_frame_count = 0
                cam_fps = 0
                nn_fps = 0

                while True:
                    in_rgb = q_rgb.tryGet()
                    in_nn = q_nn.tryGet()
                    current_time = time.time()

                    # if in_rgb is not None:
                    #     frame = in_rgb.getCvFrame()
                    #     cam_frame_count += 1

                    if in_nn is not None:
                        detections = in_nn.detections
                        nn_frame_count += 1

                    # Calculate FPS every second
                    if current_time - last_fps_update_time >= 1.0:
                        cam_fps = cam_frame_count / (current_time - last_fps_update_time)
                        nn_fps = nn_frame_count / (current_time - last_fps_update_time)
                        cam_frame_count = 0
                        nn_frame_count = 0
                        last_fps_update_time = current_time

                    # Simulate sensor input (x is always True here, so SensorIn toggles every iteration)
                    x = True
                    key = cv2.waitKey(1) & 0xFF
                    if x:
                        SensorIn = not SensorIn
                        # if SensorIn:
                        #     print("SensorIn triggered: ON")
                        # else:
                        #     print("SensorIn triggered: OFF")

                    if SensorIn and not detection_active:
                        start_detection()

                    if detection_active:
                        if (time.perf_counter() - start_time) <= time_window:
                            collect_detections(detections)
                        else:
                            final_defects = analyze_detections()
                            print("Final Defects Count:", final_defects)
                            if decide_rejection(final_defects):
                                print("Reject the product.")
                            else:
                                print("Accept the product.")
                            # Reset for the next detection cycle
                            detection_active = False
                            frame_detection_counts.clear()

                    # if frame is not None:
                    #     for detection in detections:
                    #         bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                    #         cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (255, 0, 0), 2)
                    #         cv2.putText(frame, f"ID: {label_map.get(detection.label, 'Unknown')}", (bbox[0], bbox[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
                    #     # Display FPS on the frame
                    #     cv2.putText(frame, f"Camera FPS: {cam_fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
                    #     cv2.putText(frame, f"NN FPS: {nn_fps:.2f}", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
                    #     cv2.imshow("preview", frame)

                    if key == ord('q'):  # Quit the loop
                        break

            cv2.destroyAllWindows()

            As per your previous comments, I commented out:

            # if in_rgb is not None:
            #     frame = in_rgb.getCvFrame()
            #     cam_frame_count += 1

            AND

            # if frame is not None:
            #     for detection in detections:
            #         bbox = frameNorm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
            #         cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (255, 0, 0), 2)
            #         cv2.putText(frame, f"ID: {label_map.get(detection.label, 'Unknown')}", (bbox[0], bbox[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
            #     # Display FPS on the frame
            #     cv2.putText(frame, f"Camera FPS: {cam_fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            #     cv2.putText(frame, f"NN FPS: {nn_fps:.2f}", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            #     cv2.imshow("preview", frame)

            So after commenting those out, I ran three cameras in three separate scripts.

            Here is my task manager:

            Please explain why my network is still receiving data. As per what you have mentioned, the camera processes everything, so I shouldn't be receiving anything, correct? I have commented out the parts shown above that give me the frame. Please look at the screenshot below.

            In my script I also tried commenting out the two lines below:

            q_rgb = device.getOutputQueue("rgb")

            and this line

            in_rgb = q_rgb.tryGet()

            Then my network usage drops to a few kbps, but in this case my detections are not working properly.

            How do I fix this?

            Thank you

              Hi @ZachJaison
              You are still sending the frames; you're just not visualizing them.
              While this does lower the CPU consumption (still at 100% for whatever reason 🤔), it still uses the bandwidth. To stop sending frames, remove the link in the pipeline:

              cam_rgb.preview.link(xout_rgb.input) - and also remove the XLinkOut node and the host side:

              ZachJaison q_rgb = device.getOutputQueue("rgb")
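
              In other words, the trimmed pipeline keeps only the detection output - a sketch using your existing node names:

              # keep: frames go to the NN on-device, so only detection metadata crosses the link
              cam_rgb.preview.link(detection_nn.input)
              detection_nn.out.link(xout_nn.input)

              # remove: the frame stream to the host
              # xout_rgb = pipeline.createXLinkOut()
              # xout_rgb.setStreamName("rgb")
              # cam_rgb.preview.link(xout_rgb.input)
              # q_rgb = device.getOutputQueue("rgb")   (host side)
              # in_rgb = q_rgb.tryGet()                (host side)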

              Thanks,
              Jaka

                jakaskerl

                Thank you, you're right! Amazing, my network is at 0% and CPU usage went down. 🙂

                On to my next question: if I find a detection, is there a way to capture just that specific frame as a picture? It's critical for my application to have an image, not a livestream. So, with all the XLinkOut nodes for the camera removed, is there any way to capture the specific frame of that detection?

                I hope this is possible.

                Thank you Jaka
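
                For reference, one common on-device pattern for this is a Script node that forwards a frame to the host only when it contains detections. A minimal sketch only (not verified against this exact pipeline; the frame/detection pairing here is naive and may need sequence-number matching):

                script = pipeline.create(depthai.node.Script)
                cam_rgb.preview.link(script.inputs['frames'])
                detection_nn.out.link(script.inputs['dets'])
                script.setScript("""
                while True:
                    frame = node.io['frames'].get()
                    dets = node.io['dets'].get()
                    if len(dets.detections) > 0:
                        node.io['capture'].send(frame)  # forward only frames that contain detections
                """)

                xout_cap = pipeline.create(depthai.node.XLinkOut)
                xout_cap.setStreamName("capture")
                script.outputs['capture'].link(xout_cap.input)

                # Host side: save the occasional captured frame instead of consuming a live stream
                with depthai.Device(pipeline) as device:
                    q_cap = device.getOutputQueue("capture", maxSize=4, blocking=False)
                    while True:
                        msg = q_cap.get()  # blocks until a frame with detections arrives
                        cv2.imwrite(f"detection_{msg.getSequenceNum()}.jpg", msg.getCvFrame())

                This keeps the ETH link idle except for the occasional still image.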