Spatial for custom net - doable?

tarocal

Hello,

I have been using Spatial Detection Network module with pipeline.createMobileNetSpatialDetectionNetwork()

It has been amazing. But I wanted to use the same functionality for my custom nets. The model is something like this. This has been difficult. Is it because it is not yet possible? Or is this not something you will be rolling out?

Something like facenet, which is very similar to mobilenet, is easy to adapt to this form. But using custom nets and finding the location of the detected object's heatmap concentration is not working. Any fixes or examples?

The error seems to be from the ROI.

[14442C10B12849D200] [16.722] [SpatialDetectionNetwork(1)] [error] ROI x:0.7597127 y:0.902489 width:0 height:0 is not a valid rectangle.

... gets posted.

Also, there are only 28 detections always (for Face or Mobile networks), why is that?

P.S: trying with the following:
VM Workstation 15+, Ubuntu 20.04, OAK-D, DepthAI 2.1.0.0, model: posenet_0005

Brandon

Hi tarocal ,

Great question. This is possible and supported. From what I'm seeing, the output format of your network is just different from the input format that the Spatial Location Calculator expects.

So the Spatial Location Calculator needs input in the format defined here:
https://docs.luxonis.com/projects/api/en/latest/components/nodes/spatial_location_calculator/

And it looks like the ROIs returned by your neural network are simply in a different format.

So what is required here is to process the bounding boxes on the host, convert the metadata results to the format that the Spatial Location Calculator requires, and then feed that into the Spatial Location Calculator. I think we have an example of processing custom networks on the host.
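As a rough illustration of that host-side conversion, the sketch below assumes the custom network's output has already been decoded into pixel-space (x1, y1, x2, y2) boxes. The helper name to_unit_rois and the min_size cutoff are made up for this example and are not part of DepthAI. It normalizes each box to the 0..1 range and drops zero-size boxes, which is the case the "width:0 height:0 is not a valid rectangle" error complains about.

```python
def to_unit_rois(boxes_xyxy, frame_w, frame_h, min_size=1e-3):
    """Convert pixel (x1, y1, x2, y2) boxes to normalized ROIs.

    Hypothetical helper: min_size is an assumed cutoff (fraction of the
    frame) below which a box is treated as degenerate and skipped.
    """
    rois = []
    for x1, y1, x2, y2 in boxes_xyxy:
        # Order the corners so (xmin, ymin) is always the top-left one.
        xmin, xmax = sorted((x1 / frame_w, x2 / frame_w))
        ymin, ymax = sorted((y1 / frame_h, y2 / frame_h))
        # Clamp to the normalized 0..1 range.
        xmin, ymin = max(0.0, xmin), max(0.0, ymin)
        xmax, ymax = min(1.0, xmax), min(1.0, ymax)
        # Skip boxes with (near) zero width or height.
        if xmax - xmin < min_size or ymax - ymin < min_size:
            continue
        rois.append((xmin, ymin, xmax, ymax))
    return rois
```

For a keypoint or heatmap model, which outputs points rather than boxes, one would first have to pad a small box around each point before calling a helper like this; how large that box should be is a judgment call.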

I'm pinging Erik offline as he knows more about this than me.

We can probably help write an example of this as well.

Thanks,
Brandon

erik

Hello tarocal ,
As Brandon mentioned, the problem is the output of your network - apparently, it's not the same as the usual MobileNet output.

One option would be to decode the output on the host and send a SpatialLocationCalculatorConfig back to the device.
Another option would be to decode the NN output in a Script node (similar example here), create the SpatialLocationCalculatorConfig on the device, and feed that into SpatialLocationCalculator.

Demo of sending SpatialLocationCalculatorConfig to device here.
Hopefully, this explanation was clear🙂
Thanks, Erik

SamiUddin

I have a bbox array on host...

And here is the code. How can I send the array of bboxes (top-left and bottom-right corners) to config.roi = dai.Rect(topLeft, bottomRight) at once? I mean sending all bounding boxes of an image to the queue at once.

#!/usr/bin/env python3

import cv2
import depthai as dai
import numpy as np

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8s-pose.pt')  # load an official model
stepSize = 0.05

newConfig = False

# Create pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
monoLeft = pipeline.create(dai.node.MonoCamera)
monoRight = pipeline.create(dai.node.MonoCamera)
stereo = pipeline.create(dai.node.StereoDepth)
spatialLocationCalculator = pipeline.create(dai.node.SpatialLocationCalculator)

xoutDepth = pipeline.create(dai.node.XLinkOut)
xoutSpatialData = pipeline.create(dai.node.XLinkOut)
xinSpatialCalcConfig = pipeline.create(dai.node.XLinkIn)

xoutDepth.setStreamName("depth")
xoutSpatialData.setStreamName("spatialData")
xinSpatialCalcConfig.setStreamName("spatialCalcConfig")

# Properties
monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
monoLeft.setCamera("left")
monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
monoRight.setCamera("right")

stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
stereo.setLeftRightCheck(True)
stereo.setSubpixel(True)

# Config
topLeft = dai.Point2f(0.4, 0.4)
bottomRight = dai.Point2f(0.6, 0.6)

config = dai.SpatialLocationCalculatorConfigData()
config.depthThresholds.lowerThreshold = 100
config.depthThresholds.upperThreshold = 10000
calculationAlgorithm = dai.SpatialLocationCalculatorAlgorithm.MEDIAN
config.roi = dai.Rect(topLeft, bottomRight)

spatialLocationCalculator.inputConfig.setWaitForMessage(False)
spatialLocationCalculator.initialConfig.addROI(config)

# Linking
monoLeft.out.link(stereo.left)
monoRight.out.link(stereo.right)

spatialLocationCalculator.passthroughDepth.link(xoutDepth.input)
stereo.depth.link(spatialLocationCalculator.inputDepth)

spatialLocationCalculator.out.link(xoutSpatialData.input)
xinSpatialCalcConfig.out.link(spatialLocationCalculator.inputConfig)

# Connect to device and start pipeline
with dai.Device(pipeline) as device:
    device.setIrLaserDotProjectorBrightness(765) # in mA, 0..1200
    device.setIrFloodLightBrightness(1200) # in mA, 0..1500
    # Output queue will be used to get the depth frames from the outputs defined above
    depthQueue = device.getOutputQueue(name="depth", maxSize=4, blocking=False)
    spatialCalcQueue = device.getOutputQueue(name="spatialData", maxSize=4, blocking=False)
    spatialCalcConfigInQueue = device.getInputQueue("spatialCalcConfig")

    color = (255, 255, 255)

    print("Use WASD keys to move ROI!")

    while True:
        inDepth = depthQueue.get() # Blocking call, will wait until a new data has arrived

        depthFrame = inDepth.getFrame() # depthFrame values are in millimeters

        depth_downscaled = depthFrame[::4]
        if np.all(depth_downscaled == 0):
            min_depth = 0  # Set a default minimum depth value when all elements are zero
        else:
            min_depth = np.percentile(depth_downscaled[depth_downscaled != 0], 1)
        max_depth = np.percentile(depth_downscaled, 99)
        depthFrameColor = np.interp(depthFrame, (min_depth, max_depth), (0, 255)).astype(np.uint8)
        depthFrameColor = cv2.applyColorMap(depthFrameColor, cv2.COLORMAP_HOT)
        results = model("E:/DEPTHAI_FREELANCE/OAKDEV/image.jpg")
        for result in results:
            keypoints = result.keypoints.xy
            boxes = result.boxes.xyxy
            img = result.orig_img
            print(boxes)
        
        #config.roi = dai.Rect(topLeft, bottomRight)
        #config.calculationAlgorithm = calculationAlgorithm
        #cfg = dai.SpatialLocationCalculatorConfig()
        #cfg.addROI(config)
        #spatialCalcConfigInQueue.send(cfg)
        
        spatialData = spatialCalcQueue.get().getSpatialLocations()
        
        for depthData in spatialData:
            roi = depthData.config.roi
            roi = roi.denormalize(width=depthFrameColor.shape[1], height=depthFrameColor.shape[0])
            xmin = int(roi.topLeft().x)
            ymin = int(roi.topLeft().y)
            xmax = int(roi.bottomRight().x)
            ymax = int(roi.bottomRight().y)

            depthMin = depthData.depthMin
            depthMax = depthData.depthMax

            fontType = cv2.FONT_HERSHEY_TRIPLEX
            cv2.rectangle(depthFrameColor, (xmin, ymin), (xmax, ymax), color, 1)
            cv2.putText(depthFrameColor, f"X: {int(depthData.spatialCoordinates.x)} mm", (xmin + 10, ymin + 20), fontType, 0.5, color)
            cv2.putText(depthFrameColor, f"Y: {int(depthData.spatialCoordinates.y)} mm", (xmin + 10, ymin + 35), fontType, 0.5, color)
            cv2.putText(depthFrameColor, f"Z: {int(depthData.spatialCoordinates.z)} mm", (xmin + 10, ymin + 50), fontType, 0.5, color)

Regards!

SamiUddin

@erik Please assist in this.
Regards!

erik

Hi @SamiUddin ,
It will likely be easier to just fetch the depth map along with the color frame, and then do the spatial calculation on the host (instead of sending the ROIs back to the device):
https://github.com/luxonis/depthai-experiments/tree/master/gen2-calc-spatials-on-host
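To give an idea of what a host-side spatial calculation involves, here is a minimal pinhole-camera sketch. It is not the code from the linked experiment. The function name spatials_on_host is hypothetical, hfov_deg is an assumed horizontal field of view that you would take from your own device's calibration, and the 100/10000 mm thresholds simply mirror the values used earlier in this thread. It also assumes the depth frame is in millimeters and that the ROI is given in integer pixel coordinates of that same depth frame.

```python
import math
import numpy as np

def spatials_on_host(depth_mm, roi_xyxy, hfov_deg, lower_mm=100, upper_mm=10000):
    """Estimate (x, y, z) in mm for one ROI of a depth frame (sketch only)."""
    h, w = depth_mm.shape
    x1, y1, x2, y2 = roi_xyxy  # integer pixel coordinates
    patch = depth_mm[y1:y2, x1:x2]
    # Keep only depth values inside the accepted range (0 means "no depth").
    valid = patch[(patch >= lower_mm) & (patch <= upper_mm)]
    if valid.size == 0:
        return None
    z = float(np.median(valid))
    # Focal length in pixels derived from the assumed horizontal FOV.
    focal_px = (w / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Offset of the ROI center from the image center.
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    x = z * (cx - w / 2) / focal_px
    # Image y grows downward; here "up" is treated as positive y.
    y = -z * (cy - h / 2) / focal_px
    return x, y, z
```

With this approach nothing has to be sent back to the device: each box from the host-side model is passed through the function together with the latest depth frame.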

Otherwise, you can create cfg = dai.SpatialLocationCalculatorConfig() once, and then loop over your boxes and do cfg.addROI multiple times.
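A minimal sketch of that loop, using only the DepthAI calls that already appear in the script above. The helper name build_roi_config is made up for this example, and it assumes the boxes have already been normalized to 0..1 (pixel coordinates divided by the width and height of the frame they were detected on, which should line up with the depth frame).

```python
def build_roi_config(unit_rois, lower_mm=100, upper_mm=10000):
    """Build one config message holding an ROI per (xmin, ymin, xmax, ymax) box."""
    # Imported here so this helper can be loaded on machines without depthai.
    import depthai as dai

    cfg = dai.SpatialLocationCalculatorConfig()
    for xmin, ymin, xmax, ymax in unit_rois:
        data = dai.SpatialLocationCalculatorConfigData()
        data.depthThresholds.lowerThreshold = lower_mm
        data.depthThresholds.upperThreshold = upper_mm
        data.calculationAlgorithm = dai.SpatialLocationCalculatorAlgorithm.MEDIAN
        data.roi = dai.Rect(dai.Point2f(xmin, ymin), dai.Point2f(xmax, ymax))
        cfg.addROI(data)
    return cfg

# Usage inside the loop of the script above (boxes in pixels of a w x h frame):
#   rois = [(x1 / w, y1 / h, x2 / w, y2 / h) for x1, y1, x2, y2 in boxes.tolist()]
#   spatialCalcConfigInQueue.send(build_roi_config(rois))
```

After the config is sent, the existing loop over getSpatialLocations() should then yield one result per ROI.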
Thanks, Erik

SamiUddin

Thanks @erik, I will look into it.