Hi Rezahojjatysaeedy
Have you tried using the YoloDetectionNetwork node instead? It features on-device decoding.
Example here.
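Roughly, the node is set up like this (a minimal sketch, untested; the blob path is a placeholder and the anchors are the YOLOv5 defaults, so adjust them to your training config):

import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(416, 416)
cam.setInterleaved(False)

# YoloDetectionNetwork runs the model and decodes YOLO boxes on-device
nn = pipeline.create(dai.node.YoloDetectionNetwork)
nn.setBlobPath("model.blob")  # placeholder path to your blob
nn.setNumClasses(1)
nn.setCoordinateSize(4)
nn.setConfidenceThreshold(0.5)
nn.setIouThreshold(0.5)
# Standard YOLOv5 anchors; replace with the ones from your training config
nn.setAnchors([10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326])
nn.setAnchorMasks({"side52": [0, 1, 2], "side26": [3, 4, 5], "side13": [6, 7, 8]})
cam.preview.link(nn.input)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("det")
nn.out.link(xout.input)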
You can provide the blob too if you wish, or maybe some minimal code I can use directly.
Thanks,
Jaka
Hi jakaskerl,
Thanks for the reply. The device side won't work for me: I already have a heavy pipeline and I need at least 20 fps, but with on-device decoding I only get 7 fps. As for the code, I haven't incorporated it into my pipeline yet. All I'm doing is replacing the blob from line 42 here with my trained blob and changing the 80-element labelMap list to a single element, labelMap = ["fissure"], because I only have one class, 'fissure'. I cannot upload the model here (it's probably too large at 14 MB), but here is a link to it:
https://drive.google.com/drive/folders/1pQaj04wSzYs5fZlmfM1ZQYmKzkFnMa20?usp=sharing
Hi Rezahojjatysaeedy
The host decoding currently expects a different model with different dimensions. I tested your model, and upon running
layers = in_nn.getAllLayerNames()
print("Layers: ", layers)
I get ['output1_yolov5', 'output2_yolov5', 'output3_yolov5']. I'm not sure which one to use or what the end resolution should be. Each output here is smaller than the previous one by a factor of 4 (first 115200, then 28800, then 7200 elements).
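You can check the layer sizes yourself with something like this (a quick sketch, assuming in_nn is the NNData message from the output queue):

import numpy as np

# Print the flattened element count of every output layer
for name in in_nn.getAllLayerNames():
    data = np.array(in_nn.getLayerFp16(name))
    print(name, data.size)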
Thanks,
Jaka
Thanks jakaskerl, that was really helpful for debugging. In main.py there is this line:
cols = output.shape[0] // 10647
where, given output.shape[0] = 63888, cols comes out to 6. But all of this looks a bit arbitrary. Can you please elaborate a little on where these numbers come from? Maybe that would help me understand better what's going on. By the way, I have no idea why I have three outputs; in the end it should just detect a box around the eye.
Hi Rezahojjatysaeedy
The number 10647 seems to be specific to the stock model used. It's used to properly parse the results from the model.
When making the model, you specify the output layer size, and that carries over to the .blob file as well. So it will be specific to your model and how you configured the layers. The 6 columns in your case should simply be the 5 box values (x, y, w, h, confidence) plus your 1 class.
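For what it's worth, 10647 is exactly the YOLOv5 box count for a 416x416 input (3 anchors per cell on the stride-8/16/32 grids), so that is most likely the stock model's input size:

# 3 boxes per grid cell on grids at strides 8, 16 and 32
grids = [416 // s for s in (8, 16, 32)]    # [52, 26, 13]
num_boxes = sum(3 * g * g for g in grids)  # 3 * (2704 + 676 + 169)
print(num_boxes)                           # 10647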
Thanks,
Jaka
Thanks jakaskerl
I managed to make it work, but it's very slow, about 7 fps, the same speed I was getting on-device, and unlike the device deployment, now I'm not getting any detections. When I tested the default blob, `yolov5s_sku_openvino_2021.4_6shave.blob`, on the host I was getting 18 fps, and both blobs are about the same size (roughly 14 MB). Do you have any idea what might be causing this?
Rezahojjatysaeedy
Check what the bottleneck is.
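E.g. time the device read and the host decoding separately; a minimal sketch, assuming q_nn is your NN output queue and decode() is your host-side parsing function (both placeholder names):

import time

t0 = time.monotonic()
in_nn = q_nn.get()     # blocks until the device sends an inference result
t1 = time.monotonic()
boxes = decode(in_nn)  # your host-side parsing + NMS (placeholder)
t2 = time.monotonic()
print(f"device+transfer: {(t1 - t0) * 1000:.1f} ms, decode: {(t2 - t1) * 1000:.1f} ms")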
Rezahojjatysaeedy I'm not getting any detection
Can you make sure this is not just a decoding issue? Perhaps something is being decoded incorrectly (consult GPT-4 with the raw output if you can).
Thanks,
Jaka
Hi jakaskerl,
I trained another network with the same input size as your example and I'm getting a similar fps to yours. But I noticed a difference between the host and device visualization. On device you used the frameNorm() function, which normalizes the boxes w.r.t. the frame shape. Such a normalization does not exist in the host decoding, which leaves the box coordinates as small float numbers. Now when I use frameNorm on the host, these are my only boxes:
x1: 0 y1: 0 x2: 208 y2: 208
x1: 208 y1: 208 x2: 416 y2: 416
Playing with iou and conf doesn't make it better. I know the model must work better than this, as it detects correctly on the device side. Do you have any idea what might be going wrong in the host implementation?
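For reference, this is the frameNorm helper from the device-side example that I'm referring to:

import numpy as np

def frameNorm(frame, bbox):
    # Scales normalized [0..1] bbox values to pixel coordinates; note that
    # the input is clipped to [0, 1] first, so coordinates that are not
    # normalized will saturate at the frame edges
    normVals = np.full(len(bbox), frame.shape[0])
    normVals[::2] = frame.shape[1]
    return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)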
Hi Rezahojjatysaeedy
Could you post your current code to the drive? If you have problems with host-side decoding, you can usually consult GPT and there is a high chance it will solve it for you.
Thanks,
Jaka
Hi jakaskerl
I tried ChatGPT. It just gives some general advice about how to debug it and so on. I copied the link to the entire host-decoding folder. Currently I'm using best.blob, which is a detection model trained in YOLOv5. It's supposed to draw bounding boxes around the eye fissure.
https://drive.google.com/drive/folders/1H9SQyroWo9O4fa_6Pe8lPZC8INk4gLPw?usp=sharing
Hi Rezahojjatysaeedy
You are only looking at the largest output.
Understanding the Outputs: YOLOv5 typically gives three outputs corresponding to three different scales. Each output contains a set of bounding boxes predicted at that scale. The shape of these outputs is usually [number_of_boxes, 5 + number_of_classes], where number_of_boxes depends on the scale.
Processing Each Scale: You need to process each of these outputs separately. Each output will have its own set of bounding boxes, and you'll need to apply the same decoding logic (converting center coordinates to corner coordinates, applying confidence threshold, and NMS) to each.
Combining Results from All Scales: After processing each output, you should combine the results to get the final set of detections. This is where NMS is crucial to remove duplicates and overlapping boxes.
Coordinate Scaling: Since YOLOv5 operates on a normalized coordinate system, you might need to scale the bounding box coordinates back to the original image dimensions.
Here's a more detailed approach:
import numpy as np

# labelMap, conf_thresh, iou_thresh, non_max_suppression and draw_boxes
# come from the rest of your script (main.py)

def process_output(output, img_width, img_height):
    num_classes = len(labelMap)
    num_values_per_detection = 5 + num_classes
    num_detections = len(output) // num_values_per_detection
    detections = output.reshape((num_detections, num_values_per_detection))

    processed_boxes = []
    for detection in detections:
        x_center, y_center, width, height, confidence = detection[:5]
        class_probs = detection[5:]
        if confidence < conf_thresh:
            continue
        class_id = np.argmax(class_probs)
        class_confidence = class_probs[class_id]
        # Scale normalized center/size values back to corner coordinates
        # in the original image dimensions
        x1 = (x_center - width / 2) * img_width
        y1 = (y_center - height / 2) * img_height
        x2 = (x_center + width / 2) * img_width
        y2 = (y_center + height / 2) * img_height
        processed_boxes.append([x1, y1, x2, y2, confidence, class_id])

    # Apply Non-Maximum Suppression within this scale
    boxes_nms = non_max_suppression(processed_boxes, iou_thresh)
    return boxes_nms

# Assuming you have three outputs: output1, output2, output3
# and the original image dimensions: img_width, img_height
boxes_all_scales = []
for output in [output1, output2, output3]:
    boxes = process_output(output, img_width, img_height)
    boxes_all_scales.extend(boxes)

# Final NMS across all scales to remove duplicates between scales
final_boxes = non_max_suppression(boxes_all_scales, iou_thresh)

# Now draw these boxes on the frame
for box in final_boxes:
    frame = draw_boxes(frame, box, len(labelMap))
This code assumes that output1, output2, and output3 are the outputs from the three scales of the YOLOv5 model. The process_output function processes each output, scales the coordinates, and applies NMS. Finally, it combines the results from all scales and applies NMS again to get the final set of detections.
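If your script does not already define non_max_suppression, here is a minimal sketch that works on the [x1, y1, x2, y2, confidence, class_id] lists above (a plain-Python version, not necessarily the one from main.py):

def non_max_suppression(boxes, iou_thresh):
    # Greedy NMS: keep the highest-confidence box, drop any box that
    # overlaps a kept box by more than iou_thresh, and repeat
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    for box in boxes:
        x1, y1, x2, y2 = box[:4]
        area = max(0, x2 - x1) * max(0, y2 - y1)
        suppressed = False
        for k in kept:
            ix1, iy1 = max(x1, k[0]), max(y1, k[1])
            ix2, iy2 = min(x2, k[2]), min(y2, k[3])
            inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
            k_area = max(0, k[2] - k[0]) * max(0, k[3] - k[1])
            union = area + k_area - inter
            if union > 0 and inter / union > iou_thresh:
                suppressed = True
                break
        if not suppressed:
            kept.append(box)
    return kept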
Hope this helps,
Jaka