AleksNet
Ok, denormalization is not the issue. Why are you using FP16 streams? When converting to blob, you can specify the input datatype, so you can use plain UINT8 input and skip the per-frame conversion on the host; that is likely a cause of the issue as well. Also, I am not getting any detections on the blob side, only from ONNX (and those are wrong, like yours). Did you configure the scale and offset when converting to blob?

https://docs.luxonis.com/software/ai-inference/conversion/#Conversion-Advanced%20Settings-Model%20Compiler%20Flags
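
For reference, a conversion that bakes UINT8 input into the blob could look roughly like this (a sketch using the blobconverter Python package; the path, shave count, and normalization values are placeholders you would adjust to your model):

import blobconverter

# Sketch: convert an ONNX model to .blob with UINT8 input, so the U8 -> FP16
# conversion happens on-device instead of on the host.
blob_path = blobconverter.from_onnx(
    model="model.onnx",                  # placeholder path to your ONNX export
    data_type="FP16",
    shaves=6,
    optimizer_params=[
        "--mean_values=[0,0,0]",         # keep the raw 0-255 input range
        "--scale_values=[1,1,1]",
    ],
    compile_params=["-ip U8"],           # UINT8 input precision for the compiled blob
)
print(blob_path)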

Thanks,
Jaka

    Hi jakaskerl,
    I am using FP16 because my model originally used FP32; I changed it to FP16 at least. I am converting from the OpenVINO format, and there is no such option there. But when I converted from ONNX to blob, I used these parameters: --data_type=FP16 --mean_values=[0,0,0] --scale_values=[1,1,1] --layout=NCHW --input_shape=[1,3,640,640], so with zero mean and unit scale my data stays in the 0-255 range.

      AleksNet
      That seems OK, but you can make the model input UINT8; then you can omit the setFp16 call. I'm not sure how the FP16 camera output is laid out, though; perhaps it has the wrong endianness...
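
      One way to sanity-check the FP16 preview is to dump the raw frame bytes on the host and reinterpret them as float16 (a minimal sketch, assuming a standalone pipeline built just for this inspection):

      import depthai as dai
      import numpy as np

      # Sketch: grab one FP16 preview frame and inspect its value range / byte layout.
      pipeline = dai.Pipeline()
      cam = pipeline.create(dai.node.ColorCamera)
      cam.setPreviewSize(640, 640)
      cam.setInterleaved(False)
      cam.setFp16(True)  # FP16 preview output

      xout = pipeline.create(dai.node.XLinkOut)
      xout.setStreamName("rgb")
      cam.preview.link(xout.input)

      with dai.Device(pipeline) as device:
          q = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
          raw = q.get().getData()  # raw frame bytes as a uint8 numpy array
          vals = np.frombuffer(raw.tobytes(), dtype=np.float16)
          # With no normalization the values should fall roughly in 0-255;
          # nonsense values here would point at a byte-order / layout problem.
          print(vals.min(), vals.max(), vals[:10])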

      Thanks,
      Jaka

        Hi jakaskerl!
        I have added an input layer that casts from UINT8 to FP16 and removed the setFp16 flag, and now it works only for ONNX; it still does not work for the blob. I converted ONNX to OpenVINO and then to blob (a sketch of the cast is shown after the code below). Here are the code and files:

        #!/usr/bin/env python3

        from pathlib import Path
        import sys
        import cv2
        import depthai as dai
        import numpy as np
        import time
        import tensorflow as tf
        import keras_cv
        import keras
        import onnxruntime

        nnPath = str((Path('./models/YOLO KERAS/model_with_cast_uint8_to_fp16_ov.blob')).resolve().absolute())
        nnPath_onnx = str((Path('./models/YOLO KERAS/model_with_cast_uint8_to_fp16.onnx')).resolve().absolute())

        session = onnxruntime.InferenceSession(nnPath_onnx)
        input_name = session.get_inputs()[0].name
        output_name0 = session.get_outputs()[0].name
        output_name1 = session.get_outputs()[1].name

        image_path = "image.jpg"

        BOX_REGRESSION_CHANNELS = 64

        def decode_regression_to_boxes(preds):
            """Decodes the results of the YOLOV8Detector forward-pass into boxes.

            Returns left / top / right / bottom predictions with respect to anchor
            points.

            Each coordinate is encoded with 16 predicted values. Those predictions are
            softmaxed and multiplied by [0..15] to make predictions. The resulting
            predictions are relative to the stride of an anchor box (and correspondingly
            relative to the scale of the feature map from which the predictions came).
            """
            preds_bbox = keras.layers.Reshape((-1, 4, BOX_REGRESSION_CHANNELS // 4))(preds)
            preds_bbox = tf.nn.softmax(preds_bbox, axis=-1) * tf.range(
                BOX_REGRESSION_CHANNELS // 4, dtype="float32"
            )
            return tf.reduce_sum(preds_bbox, axis=-1)

        def dist2bbox(distance, anchor_points):
            """Decodes distance predictions into xyxy boxes.

            Input left / top / right / bottom predictions are transformed into xyxy box
            predictions based on anchor points.

            The resulting xyxy predictions must be scaled by the stride of their
            corresponding anchor points to yield an absolute xyxy box.
            """
            left_top, right_bottom = tf.split(distance, 2, axis=-1)
            x1y1 = anchor_points - left_top
            x2y2 = anchor_points + right_bottom
            return tf.concat((x1y1, x2y2), axis=-1)  # xyxy bbox

        def get_anchors(
            image_shape,
            strides=[8, 16, 32],
            base_anchors=[0.5, 0.5],
        ):
            """Gets anchor points for YOLOV8.

            YOLOV8 uses anchor points representing the center of proposed boxes, and
            matches ground truth boxes to anchors based on center points.

            Args:
                image_shape: tuple or list of two integers representing the height and
                    width of input images, respectively.
                strides: tuple or list of integers, the size of the strides across the
                    image size that should be used to create anchors.
                base_anchors: tuple or list of two integers representing the offset from
                    (0,0) to start creating the center of anchor boxes, relative to the
                    stride. For example, using the default (0.5, 0.5) creates the first
                    anchor box for each stride such that its center is half of a stride
                    from the edge of the image.

            Returns:
                A tuple of anchor centerpoints and anchor strides. Multiplying the
                two together will yield the centerpoints in absolute x,y format.
            """
            base_anchors = tf.constant(base_anchors, dtype="float32")

            all_anchors = []
            all_strides = []
            for stride in strides:
                hh_centers = tf.range(0, image_shape[0], stride)
                ww_centers = tf.range(0, image_shape[1], stride)
                ww_grid, hh_grid = tf.meshgrid(ww_centers, hh_centers)
                grid = tf.cast(
                    tf.reshape(tf.stack([hh_grid, ww_grid], 2), [-1, 1, 2]),
                    "float32",
                )
                anchors = (
                    tf.expand_dims(
                        base_anchors * tf.constant([stride, stride], "float32"), 0
                    )
                    + grid
                )
                anchors = tf.reshape(anchors, [-1, 2])
                all_anchors.append(anchors)
                all_strides.append(tf.repeat(stride, anchors.shape[0]))

            all_anchors = tf.cast(tf.concat(all_anchors, axis=0), "float32")
            all_strides = tf.cast(tf.concat(all_strides, axis=0), "float32")

            all_anchors = all_anchors / all_strides[:, None]

            # Swap the x and y coordinates of the anchors.
            all_anchors = tf.concat(
                [all_anchors[:, 1, None], all_anchors[:, 0, None]], axis=-1
            )
            return all_anchors, all_strides

        def decode_predictions(boxes_, scores_, images):
            boxes = boxes_
            scores = scores_
            boxes = decode_regression_to_boxes(boxes)
            anchor_points, stride_tensor = get_anchors(image_shape=(640, 640, 3))
            stride_tensor = tf.expand_dims(stride_tensor, axis=-1)
            box_preds = dist2bbox(boxes, anchor_points) * stride_tensor
            prediction_decoder = keras_cv.layers.MultiClassNonMaxSuppression(
                bounding_box_format="xyxy",
                from_logits=False,
                iou_threshold=0.2,
                confidence_threshold=0.2,
            )
            return prediction_decoder(box_preds, scores)

        # Class labels
        labelMap = ["green", "pink", "orange"]

        # Create pipeline
        pipeline = dai.Pipeline()

        camRgb = pipeline.create(dai.node.ColorCamera)
        camRgb.setPreviewSize(640, 640)
        camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
        camRgb.setInterleaved(False)
        camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.RGB)
        # camRgb.setFp16(True)  # Model requires FP16 input

        # NN node running the YOLOv8 blob
        nn = pipeline.create(dai.node.NeuralNetwork)
        nn.setBlobPath(nnPath)
        nn.setNumInferenceThreads(2)
        camRgb.preview.link(nn.input)

        # Send bounding boxes from the NN to the host via XLink
        nn_xout = pipeline.create(dai.node.XLinkOut)
        nn_xout.setStreamName("nn")
        nn.out.link(nn_xout.input)

        # Send rgb frames to the host
        rgb_xout = pipeline.create(dai.node.XLinkOut)
        rgb_xout.setStreamName("rgb")
        nn.passthrough.link(rgb_xout.input)

        # Connect to device and start pipeline
        with dai.Device(pipeline) as device:
            # Output queues will be used to get the rgb frames and nn data from the outputs defined above
            qRgb = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
            qDet = device.getOutputQueue(name="nn", maxSize=4, blocking=False)

            detections = []

            while True:
                inRgb = qRgb.get()
                frame = inRgb.getCvFrame()
                in_nn = qDet.tryGet()

                if in_nn is not None:
                    # [print(f"Layer name: {l.name}, Type: {l.dataType}, Dimensions: {l.dims}") for l in in_nn.getAllLayers()]
                    # Extract the outputs: (batch_size, num_predictions, channels)
                    boxes = np.array(in_nn.getLayerFp16('box')).reshape(1, 8400, 64)
                    classes = np.array(in_nn.getLayerFp16('class')).reshape(1, 8400, 3)
                    detections = []
                    # print(classes)
                    result = decode_predictions(boxes, classes, frame)
                    # print(result)
                    result_boxes = result["boxes"]
                    num_of_dects = result["num_detections"]
                    if num_of_dects[0] > 0:
                        print("num_of_dects")
                        print(num_of_dects)
                        for bbox_data in result_boxes[0]:
                            bbox = [np.int64(bbox_data[0]), np.int64(bbox_data[1]), np.int64(bbox_data[2]), np.int64(bbox_data[3])]
                            frame = cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (0, 255, 0), 2)

                    # Run the same frame through the ONNX model on the host for comparison
                    image = inRgb.getCvFrame()
                    image = np.expand_dims(image, axis=0)
                    image = np.reshape(image, (1, 3, 640, 640))
                    res = session.run(
                        output_names=[output_name0, output_name1],
                        input_feed={input_name: image}
                    )
                    result = decode_predictions(res[0], res[1], np.expand_dims(np.array(frame), axis=0))
                    # print(result)
                    result_boxes = result["boxes"]
                    num_of_dects = result["num_detections"]
                    if num_of_dects[0] > 0:
                        print("num_of_dects onnx")
                        print(num_of_dects)
                        for bbox_data in result_boxes[0]:
                            bbox = [np.int64(bbox_data[0]), np.int64(bbox_data[1]), np.int64(bbox_data[2]), np.int64(bbox_data[3])]
                            frame = cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (0, 0, 255), 2)

                cv2.imshow("rgb", frame)
                if cv2.waitKey(1) == ord('q'):
                    break
                if cv2.waitKey(1) == ord("s"):
                    cv2.imwrite("img.png", frame)
                    print(frame)
                    reshaped_frame = frame.reshape(-1, frame.shape[2])  # Reshape to (height*width, channels)
                    np.savetxt("img.txt", reshaped_frame, fmt="%.6f")
                    break

        Files:
        https://drive.google.com/drive/folders/1cXhwfOF7TG81ZSIZ4NJjKl3dlUGRLctz
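
        A cast like the one described above can be prepended to a Keras model roughly as follows (an illustrative sketch, not necessarily the exact wrapper used; original_model is a placeholder for the trained YOLOv8 Keras model):

        import tensorflow as tf

        # Sketch: wrap an existing Keras model so that it accepts UINT8 input
        # and casts to FP16 internally before running the original graph.
        uint8_input = tf.keras.Input(shape=(640, 640, 3), dtype="uint8")
        x = tf.cast(uint8_input, tf.float16)
        outputs = original_model(x)  # original_model: placeholder for the trained model
        wrapped_model = tf.keras.Model(inputs=uint8_input, outputs=outputs)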

        thanks,
        Aleks

        7 days later

        Hi @AleksNet,

        thank you for the update! I want to update you as well. I have compared the predictions of the ONNX and OpenVINO IR models, and both work. This suggests that the issue lies in the IR -> blob conversion. I have also tried changing the dynamic input shape of the ONNX model to static, but the resulting blob didn't work either. I'm investigating the IR -> blob conversion at the moment and will keep you updated.
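
        A comparison of that kind can be sketched like this (illustrative only; it assumes the onnxruntime and openvino Python packages, placeholder model paths, and that the output ordering of the two models matches):

        import numpy as np
        import onnxruntime
        from openvino.runtime import Core

        # Sketch: feed the same input to the ONNX model and the OpenVINO IR
        # and compare the raw outputs numerically.
        dummy = np.random.randint(0, 256, size=(1, 3, 640, 640)).astype(np.uint8)

        sess = onnxruntime.InferenceSession("model.onnx")
        onnx_outs = sess.run(None, {sess.get_inputs()[0].name: dummy})

        compiled = Core().compile_model("model.xml", "CPU")
        ir_outs = compiled([dummy])  # CompiledModel is callable; returns a dict-like of outputs

        for onnx_out, ir_out in zip(onnx_outs, ir_outs.values()):
            print(np.max(np.abs(onnx_out - np.array(ir_out))))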

        Best,
        Jan

          Hi @JanCuhel
          I have tried the model with an automatic cast to UINT8 by converting ONNX to .blob with these parameters:
          --data_type=FP16 --mean_values=[0,0,0] --scale_values=[1,1,1] --layout=NCHW --input_shape=[1,3,640,640]
          and
          --ipU8
          It also does not work.

          Thanks,
          Aleks

          8 days later

          Hi JanCuhel!
          Are there any updates?

          Thanks,
          AleksNet

          Hey @AleksNet,

          Apologies for the delay in my response. However, I have great news for you! I managed to convert the model so that the blob works! The ONNX model you shared with us contains preprocessing operations; removing them from the graph and applying them during model optimization instead does the trick. I've archived all the necessary scripts, the exported models, and a README.md with instructions on converting the model for you to check here.
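
          For anyone hitting the same issue: the model optimizer applies (pixel - mean_values) / scale_values before the network, so preprocessing stripped from the graph can be reproduced with the equivalent flags. A tiny sketch of that equivalence, with hypothetical values (the exact flags for this model are in the shared README):

          import numpy as np

          # If the removed in-graph step were x / 255.0, the equivalent optimizer flags
          # would be --mean_values=[0,0,0] --scale_values=[255,255,255] (hypothetical values).
          pixel = np.array([0.0, 128.0, 255.0], dtype=np.float32)
          mean, scale = np.zeros(3, dtype=np.float32), np.full(3, 255.0, dtype=np.float32)
          print((pixel - mean) / scale)  # -> [0.0, 0.50196078, 1.0], same as in-graph x / 255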

          Here's an example of detected boxes when running on a device:

          Kind regards,
          Jan

            Hi JanCuhel!
            I have just checked and it works for me too! Looks like I overcomplicated the preprocessing for the model...
            Thank you very much!
            Best regards,
            Aleks

              Hey AleksNet

              no worries! Glad I could help!

              Kind regards,
              Jan