Hi Rezahojjatysaeedy
You are only looking at the largest output.
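Understanding the Outputs: YOLOv5 typically produces three outputs, one per detection scale (strides 8, 16, and 32). Each output holds the bounding boxes predicted at that scale. Once decoded and flattened, an output has shape [number_of_boxes, 5 + number_of_classes]: four box values (x_center, y_center, width, height), one objectness confidence, and one score per class. number_of_boxes depends on the scale; for a 640x640 input, the three scales yield 19200, 4800, and 1200 boxes.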
Processing Each Scale: You need to process each of these outputs separately. Each output has its own set of bounding boxes, and the same decoding logic applies to each: convert center coordinates to corner coordinates, apply a confidence threshold, and run NMS.
Combining Results from All Scales: After processing each output, combine the results into a single list and run NMS once more on it. The same object is often detected at more than one scale, so this final pass is what removes the duplicate and overlapping boxes.
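Coordinate Scaling: The box coordinates are relative to the network input, not to your original frame. If they are normalized to [0, 1], multiply them by the original image width and height. If they are in pixels of the network input (for example 640x640), multiply by the ratio between the original size and the input size instead.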
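To make the coordinate scaling step concrete, here is a minimal sketch. The helper name scale_box and its input_size parameter are made up for illustration, and it assumes the frame was plainly resized to the network input (no letterbox padding); with letterboxing you would also have to subtract the padding offsets first.

```python
def scale_box(box, img_width, img_height, input_size=None):
    """Map an (x1, y1, x2, y2) box to pixel coordinates of the original image.

    Hypothetical helper, not part of any library.
    input_size=None   -> box is assumed to be normalized to [0, 1]
    input_size=(w, h) -> box is assumed to be in pixels of the network input
    """
    x1, y1, x2, y2 = box
    if input_size is None:
        sx, sy = img_width, img_height
    else:
        in_w, in_h = input_size
        sx, sy = img_width / in_w, img_height / in_h

    # Clamp so boxes that spill past the border stay inside the image
    return (
        min(max(x1 * sx, 0), img_width),
        min(max(y1 * sy, 0), img_height),
        min(max(x2 * sx, 0), img_width),
        min(max(y2 * sy, 0), img_height),
    )


# Normalized box on a 1920x1080 frame
print(scale_box((0.25, 0.5, 0.75, 1.0), 1920, 1080))
# Same box expressed in pixels of a 640x640 network input
print(scale_box((160, 320, 480, 640), 1920, 1080, input_size=(640, 640)))
```

Both calls describe the same region of the frame, so they should print the same pixel coordinates. If your boxes look shifted or squashed, checking which of the two conventions your model uses is a good first step.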
Here's a more detailed approach:
import numpy as np

def process_output(output, img_width, img_height):
    # labelMap, conf_thresh, iou_thresh and non_max_suppression are assumed
    # to be defined elsewhere in your script.
    num_values_per_detection = 5 + len(labelMap)
    # reshape(-1, ...) works for both flat and multi-dimensional outputs
    detections = np.asarray(output).reshape(-1, num_values_per_detection)
    processed_boxes = []
    for detection in detections:
        x_center, y_center, width, height, confidence = detection[:5]
        if confidence < conf_thresh:
            continue
        class_probs = detection[5:]
        class_id = int(np.argmax(class_probs))
        # Final score = objectness confidence * best class probability
        score = confidence * class_probs[class_id]
        # Convert center/size to corners and scale back to the original
        # image size (assumes coordinates normalized to [0, 1])
        x1 = (x_center - width / 2) * img_width
        y1 = (y_center - height / 2) * img_height
        x2 = (x_center + width / 2) * img_width
        y2 = (y_center + height / 2) * img_height
        processed_boxes.append([x1, y1, x2, y2, score, class_id])
    # Apply Non-Maximum Suppression within this scale
    return non_max_suppression(processed_boxes, iou_thresh)
# Assuming you have three outputs: output1, output2, output3
# And assuming you have the original image dimensions: img_width, img_height
boxes_all_scales = []
for output in [output1, output2, output3]:
    boxes = process_output(output, img_width, img_height)
    boxes_all_scales.extend(boxes)

# Final NMS across all scales
final_boxes = non_max_suppression(boxes_all_scales, iou_thresh)

# Now draw these boxes on the frame
for box in final_boxes:
    frame = draw_boxes(frame, box, len(labelMap))
This code assumes that output1, output2, and output3 are the outputs from the three scales of the YOLOv5 model, and that labelMap, conf_thresh, iou_thresh, non_max_suppression, and draw_boxes come from your existing script. The process_output function decodes one output, scales the coordinates, and applies NMS. The loop then collects the boxes from all three scales, and NMS is applied once more to get the final set of detections.
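In case your script does not have an NMS function yet, here is a rough sketch of greedy, per-class NMS in plain Python. The names iou and nms_sketch are made up for illustration and are not from any library; the sketch assumes each box is a list of [x1, y1, x2, y2, score, class_id], the same layout process_output builds. For real workloads a vectorized or library implementation will be much faster.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2, ...]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, iou_thresh):
    """Greedy NMS: keep the highest-scoring box, drop same-class boxes
    that overlap an already kept box by more than iou_thresh."""
    kept = []
    # Visit boxes from highest to lowest score
    for cand in sorted(boxes, key=lambda b: b[4], reverse=True):
        overlaps_kept = any(
            cand[5] == k[5] and iou(cand, k) > iou_thresh for k in kept
        )
        if not overlaps_kept:
            kept.append(cand)
    return kept


boxes = [
    [0, 0, 10, 10, 0.9, 0],    # best box for class 0
    [1, 1, 11, 11, 0.8, 0],    # near-duplicate of the first, same class
    [1, 1, 11, 11, 0.7, 1],    # same place, different class
    [50, 50, 60, 60, 0.6, 0],  # far away, no overlap
]
for box in nms_sketch(boxes, iou_thresh=0.5):
    print(box)
```

The second box overlaps the first with an IoU of about 0.68 and shares its class, so it is dropped; the other three are kept. Because suppression is per class, two different objects sitting on top of each other are not merged.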
Hope this helps,
Jaka