Trouble obtaining expected bounding boxes with custom YoloX model on OAK-D

MTMM

Hello,

I'm currently trying to run a custom YoloX model on the OAK-D using Python. I created the custom model with yolox_nano and yolox_tiny and an input size of 416*416. However, although I have been able to make the program run without errors through various attempts, I'm not getting the expected bounding boxes. I have verified the accuracy of the model itself using YOLOX/tools/demo.py, and it appears to have sufficient accuracy.

I combined the code from depthai-experiments/gen2-yolo/yolox/main.py and luxonis/depthai760 to create the code. Additionally, I modified it to not perform input image standardization using mean and std, following the example of YOLOX/tools/demo.py.

Currently, I suspect that there may be an issue somewhere in the excerpt of the code I've provided below. However, I'm unsure whether I should try to modify the code as it is or if I should recreate the model with different settings.

I would greatly appreciate any advice you can provide.

image, ratio = preproc(frame, (SHAPE, SHAPE)) # image value are as expected(I think)

buff = dai.Buffer()

flat32_in_int8_array = np.array([image], dtype=np.float32).view(np.int8)

buff.setData(flat32_in_int8_array)

qNnInput.send(buff)

in_nn = qNn.get()

if in_nn is not None:

res = toTensorResult(in_nn).get("output") # res value are not as expected(I think)

jakaskerl

Hi MTMM
Are you by any chance using RGB instead of BGR? Also could you put this code into a bit more context (maybe a MRE of some sort).

Thanks,
Jaka

MTMM

jakaskerl

Thank you, Jaka.

Regarding whether I'm using RGB or not, I tried both commenting in and out the line 19 in the preproc() function of the code below(where "padded_img = padded_img[:, :, ::-1]"), which I believe can reverse the channel order. However, it didn't seem to have much impact on the bounding box results.

I have included a shortened version of the code as much as possible below.

I really appreciate your help.

from pathlib import Path
import numpy as np
import cv2
import depthai as dai
from depthai_sdk.utils import toTensorResult


def preproc(image, input_size, mean=None, std=None, swap=(2, 0, 1)):
    padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
    img = np.array(image)
    ratio = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
    resized_img = cv2.resize(
        img,
        (int(img.shape[1] * ratio), int(img.shape[0] * ratio)),
        interpolation=cv2.INTER_LINEAR,
    ).astype(np.float32)
    padded_img[: int(img.shape[0] * ratio), : int(img.shape[1] * ratio)] = resized_img

    padded_img = padded_img[:, :, ::-1]
    padded_img = padded_img.transpose(swap)
    padded_img = np.ascontiguousarray(padded_img, dtype=np.float16)
    return padded_img, ratio


def nms(boxes, scores, nms_thr):
    """Single class NMS implemented in Numpy."""
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        inds = np.where(ovr <= nms_thr)[0]
        order = order[inds + 1]

    return keep


def multiclass_nms(boxes, scores, nms_thr, score_thr):
    """Multiclass NMS implemented in Numpy"""
    final_dets = []
    num_classes = scores.shape[1]
    for cls_ind in range(num_classes):
        cls_scores = scores[:, cls_ind]
        valid_score_mask = cls_scores > score_thr
        if valid_score_mask.sum() == 0:
            continue
        else:
            valid_scores = cls_scores[valid_score_mask]
            valid_boxes = boxes[valid_score_mask]
            keep = nms(valid_boxes, valid_scores, nms_thr)
            if len(keep) > 0:
                cls_inds = np.ones((len(keep), 1)) * cls_ind
                dets = np.concatenate(
                    [valid_boxes[keep], valid_scores[keep, None], cls_inds], 1
                )
                final_dets.append(dets)
    if len(final_dets) == 0:
        return None
    return np.concatenate(final_dets, 0)


def demo_postprocess(outputs, img_size, p6=False):
    grids = []
    expanded_strides = []

    if not p6:
        strides = [8, 16, 32]
    else:
        strides = [8, 16, 32, 64]

    hsizes = [img_size[0] // stride for stride in strides]
    wsizes = [img_size[1] // stride for stride in strides]

    for hsize, wsize, stride in zip(hsizes, wsizes, strides):
        xv, yv = np.meshgrid(np.arange(wsize), np.arange(hsize))
        grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
        grids.append(grid)
        shape = grid.shape[:2]
        expanded_strides.append(np.full((*shape, 1), stride))

    grids = np.concatenate(grids, 1)
    expanded_strides = np.concatenate(expanded_strides, 1)
    outputs[..., :2] = (outputs[..., :2] + grids) * expanded_strides
    outputs[..., 2:4] = np.exp(outputs[..., 2:4]) * expanded_strides

    return outputs


SHAPE = 416
labelMap = [
    "TestClass0",     "TestClass1",       "car",           "motorbike",     "aeroplane",   "bus",           "train",
    "truck",          "boat",       "traffic light", "fire hydrant",  "stop sign",   "parking meter", "bench",
    "bird",           "cat",        "dog",           "horse",         "sheep",       "cow",           "elephant",
    "bear",           "zebra",      "giraffe",       "backpack",      "umbrella",    "handbag",       "tie",
    "suitcase",       "frisbee",    "skis",          "snowboard",     "sports ball", "kite",          "baseball bat",
    "baseball glove", "skateboard", "surfboard",     "tennis racket", "bottle",      "wine glass",    "cup",
    "fork",           "knife",      "spoon",         "bowl",          "banana",      "apple",         "sandwich",
    "orange",         "broccoli",   "carrot",        "hot dog",       "pizza",       "donut",         "cake",
    "chair",          "sofa",       "pottedplant",   "bed",           "diningtable", "toilet",        "tvmonitor",
    "laptop",         "mouse",      "remote",        "keyboard",      "cell phone",  "microwave",     "oven",
    "toaster",        "sink",       "refrigerator",  "book",          "clock",       "vase",          "scissors",
    "teddy bear",     "hair drier", "toothbrush"
]


pipeline = dai.Pipeline()

camera = pipeline.create(dai.node.ColorCamera)
camera.setPreviewSize(SHAPE, SHAPE)
camera.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camera.setInterleaved(False)
camera.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

nn = pipeline.create(dai.node.NeuralNetwork)
# nn.setBlobPath(str(Path("./depthai-experiments/gen2-yolo/yolox/yolox_tiny.blob").resolve().absolute()))
nn.setBlobPath(str(Path("./depthai-experiments/gen2-yolo/yolox/test.blob").resolve().absolute()))
nn.setNumInferenceThreads(2)
nn.input.setBlocking(True)

# Send camera frames to the host
camera_xout = pipeline.create(dai.node.XLinkOut)
camera_xout.setStreamName("camera")
camera.preview.link(camera_xout.input)

# Send converted frames from the host to the NN
nn_xin = pipeline.create(dai.node.XLinkIn)
nn_xin.setStreamName("nnInput")
# nn_xin.setMaxDataSize(80000000)
nn_xin.out.link(nn.input)

# Send bounding boxes from the NN to the host via XLink
nn_xout = pipeline.create(dai.node.XLinkOut)
nn_xout.setStreamName("nn")
nn.out.link(nn_xout.input)


# Pipeline is defined, now we can connect to the device
with dai.Device(pipeline) as device:
    qCamera = device.getOutputQueue(name="camera", maxSize=4, blocking=False)
    qNnInput = device.getInputQueue("nnInput", maxSize=4, blocking=False)
    qNn = device.getOutputQueue(name="nn", maxSize=4, blocking=True)

    while True:
        inRgb = qCamera.get()
        frame = inRgb.getCvFrame()
        # frame = cv2.imread("./test_img.jpg")

        image, ratio = preproc(frame, (SHAPE, SHAPE))

        buff = dai.Buffer()
        flat32_in_int8_array = np.array([image], dtype=np.float32).view(np.int8)
        buff.setData(flat32_in_int8_array)
        qNnInput.send(buff)

        in_nn = qNn.get()

        if in_nn is not None:
            res = toTensorResult(in_nn).get("output")
            data = np.array(res).reshape(1, -1, 85)

            predictions = demo_postprocess(data, (SHAPE, SHAPE), p6=False)[0]
            boxes = predictions[:, :4]
            scores = predictions[:, 4, None] * predictions[:, 5:]

            boxes_xyxy = np.ones_like(boxes)
            boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.
            boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.
            boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.
            boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.
            dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.001, score_thr=0.001)

            if dets is not None:
                final_boxes = dets[:, :4]
                final_scores, final_cls_inds = dets[:, 4], dets[:, 5]

                for i in range(len(final_boxes)):
                    bbox = final_boxes[i]
                    score = final_scores[i]
                    class_name = labelMap[int(final_cls_inds[i])]

                    if score >= 0.001:
                        # Limit the bounding box to 0..SHAPE
                        bbox[bbox > SHAPE - 1] = SHAPE - 1
                        bbox[bbox < 0] = 0
                        xy_min = (int(bbox[0]), int(bbox[1]))
                        xy_max = (int(bbox[2]), int(bbox[3]))
                        # Display detection's BB, label and confidence on the frame
                        cv2.rectangle(frame, xy_min , xy_max, (255, 0, 0), 2)

        cv2.imshow("rgb", frame)
        if cv2.waitKey(1) == ord('q'):
            break

MTMM

I am attaching the current result image for reference.

The bounding boxes are in strange positions and hardly move.

jakaskerl

Hi MTMM
I have used the code you provided, but changed your model to yolox_tiny that comes with the example (depthai-experiments/gen2-yolo/yolox/main.py) . The code works very well and the bounding boxes are correct, meaning the issue is probably in the model you have trained (the structure or the output parsing). Could you recheck the accuracy and results scaling? It looks like that could be the problem.

Thanks,
Jaka

MTMM

jakaskerl

Thank you, Jaka.

I investigated the model and suspected that the issue might lie in the conversion to ONNX and then to a blob, as the original model was successfully detecting targets with over 90% confidence.

Using Luxonis' BlobConverter, I experimented with different arguments and found that changing "Myriad compile params" argument to "-ip FP16" made the abnormal bounding boxes disappear. Although I had previously tried this change without success, it unexpectedly resolved the issue this time, indicating that there might have been unintentional changes or previous mistakes.

Now that the abnormal bounding boxes have disappeared and the model is detecting some targets, I will focus on the model conversion process to address the unusually low confidence levels below 50%.

Thank you so much for your valuable assistance!

AdamJones

What is the cheapest way to print a booklet?

DavidCharles

AdamJones
Hey
When it comes to printing booklets, there are several cost-effective options available. Here are a few suggestions for printing booklets at a lower cost:
Online Printing Services
Local Print Shops
Print on Demand Services
Bulk Printing
Paper Selection