• DepthAI
  • Spatial for custom net - doable?

Hello,

I have been using the Spatial Detection Network module with pipeline.createMobileNetSpatialDetectionNetwork().

It has been amazing, but I want to use the same functionality for my custom nets. The model is something like this. Getting it to work has been difficult. Is it not yet possible, or is this not something you will be rolling out?

Something like facenet, which is very similar to mobilenet, is easy to adapt to this form. But with custom nets, finding the location of the detected object's heatmap concentration is not working. Any fixes or examples?

The error seems to come from the ROI. The following gets posted:

[14442C10B12849D200] [16.722] [SpatialDetectionNetwork(1)] [error] ROI x:0.7597127 y:0.902489 width:0 height:0 is not a valid rectangle.


Also, there are always exactly 28 detections (for the face or MobileNet networks). Why is that?

P.S: trying with the following:
VMware Workstation 15+, Ubuntu 20.04, OAK-D, DepthAI 2.1.0.0, model: posenet_0005

    Hi tarocal,

    Great question. This is possible and supported. From what I can see, it's just that the output format of your network is different from the input format that the spatial location calculator expects.

    So the Spatial Location Calculator needs input in the format defined here:
    https://docs.luxonis.com/projects/api/en/latest/components/nodes/spatial_location_calculator/

    And it looks like the ROIs returned by your neural network are simply in a different format.

    So what would be required here is to process the bounding boxes on the host, convert the metadata results to the format that the Spatial Location Calculator requires, and then feed that back into it. I think we have an example of processing custom networks on the host.
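
    For example, a rough host-side helper for that conversion could look like the sketch below (illustrative only; the pixel-xyxy box format and the helper name are assumptions, not something from this thread). Clamping and normalizing the box also avoids degenerate ROIs like the "width:0 height:0" error above.

    import depthai as dai

    def bbox_to_roi_config(x1, y1, x2, y2, frame_w, frame_h):
        """Convert one detection box in pixel coordinates into SpatialLocationCalculatorConfigData."""
        # Normalize to 0..1 relative to the depth frame and clamp to its bounds;
        # a zero-width/height box would trigger the "is not a valid rectangle" error.
        xmin = max(0.0, min(x1, x2) / frame_w)
        ymin = max(0.0, min(y1, y2) / frame_h)
        xmax = min(1.0, max(x1, x2) / frame_w)
        ymax = min(1.0, max(y1, y2) / frame_h)
        cfg = dai.SpatialLocationCalculatorConfigData()
        cfg.roi = dai.Rect(dai.Point2f(xmin, ymin), dai.Point2f(xmax, ymax))
        cfg.depthThresholds.lowerThreshold = 100    # mm
        cfg.depthThresholds.upperThreshold = 10000  # mm
        return cfg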

    I'm pinging Erik offline as he knows more about this than me.

    We can probably help write an example of this as well.

    Thanks,
    Brandon

    Hello tarocal,
    As Brandon mentioned, the problem is the output of your network - apparently, it's not the same as the usual MobileNet output.

    1. One option would be to decode the output on the host and send a SpatialLocationCalculatorConfig back to the device (see the sketch below).
    2. Another option would be to decode the NN output in a Script node (similar example here), create a SpatialLocationCalculatorConfig on the device, and feed that into the SpatialLocationCalculator.

    Demo of sending SpatialLocationCalculatorConfig to device here.
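
    For reference, the moving parts of option 1 look roughly like this (an illustrative sketch, not the linked demo verbatim; the helper names and the "slc_cfg" stream name are made up):

    import depthai as dai

    def link_host_config(pipeline, spatial_calc, stream_name="slc_cfg"):
        """Route a config stream from the host into the SpatialLocationCalculator node."""
        xin = pipeline.create(dai.node.XLinkIn)
        xin.setStreamName(stream_name)
        xin.out.link(spatial_calc.inputConfig)
        spatial_calc.inputConfig.setWaitForMessage(True)  # each depth frame waits for a host config

    def send_roi(config_queue, xmin, ymin, xmax, ymax):
        """Send one normalized (0..1) ROI, decoded on the host, down to the device."""
        data = dai.SpatialLocationCalculatorConfigData()
        data.roi = dai.Rect(dai.Point2f(xmin, ymin), dai.Point2f(xmax, ymax))
        cfg = dai.SpatialLocationCalculatorConfig()
        cfg.addROI(data)
        config_queue.send(cfg)

    Here config_queue would be device.getInputQueue("slc_cfg") on the host.
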
    Hopefully, this explanation was clear 🙂
    Thanks, Erik

    3 years later

    I have a bbox array on the host...

    And here is the code. How can I send the array of bboxes (topLeft and bottomRight) to config.roi = dai.Rect(topLeft, bottomRight) all at once? I mean all bounding boxes of an image at once to the queue.

    #!/usr/bin/env python3
    
    import cv2
    import depthai as dai
    import numpy as np
    
    from ultralytics import YOLO
    
    # Load a model
    model = YOLO('yolov8s-pose.pt')  # load an official model
    stepSize = 0.05
    
    newConfig = False
    
    # Create pipeline
    pipeline = dai.Pipeline()
    
    # Define sources and outputs
    monoLeft = pipeline.create(dai.node.MonoCamera)
    monoRight = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    spatialLocationCalculator = pipeline.create(dai.node.SpatialLocationCalculator)
    
    xoutDepth = pipeline.create(dai.node.XLinkOut)
    xoutSpatialData = pipeline.create(dai.node.XLinkOut)
    xinSpatialCalcConfig = pipeline.create(dai.node.XLinkIn)
    
    xoutDepth.setStreamName("depth")
    xoutSpatialData.setStreamName("spatialData")
    xinSpatialCalcConfig.setStreamName("spatialCalcConfig")
    
    # Properties
    monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    monoLeft.setCamera("left")
    monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    monoRight.setCamera("right")
    
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    stereo.setLeftRightCheck(True)
    stereo.setSubpixel(True)
    
    # Config
    topLeft = dai.Point2f(0.4, 0.4)
    bottomRight = dai.Point2f(0.6, 0.6)
    
    config = dai.SpatialLocationCalculatorConfigData()
    config.depthThresholds.lowerThreshold = 100
    config.depthThresholds.upperThreshold = 10000
    calculationAlgorithm = dai.SpatialLocationCalculatorAlgorithm.MEDIAN
    config.roi = dai.Rect(topLeft, bottomRight)
    
    spatialLocationCalculator.inputConfig.setWaitForMessage(False)
    spatialLocationCalculator.initialConfig.addROI(config)
    
    # Linking
    monoLeft.out.link(stereo.left)
    monoRight.out.link(stereo.right)
    
    spatialLocationCalculator.passthroughDepth.link(xoutDepth.input)
    stereo.depth.link(spatialLocationCalculator.inputDepth)
    
    spatialLocationCalculator.out.link(xoutSpatialData.input)
    xinSpatialCalcConfig.out.link(spatialLocationCalculator.inputConfig)
    
    # Connect to device and start pipeline
    with dai.Device(pipeline) as device:
        device.setIrLaserDotProjectorBrightness(765) # in mA, 0..1200
        device.setIrFloodLightBrightness(1200) # in mA, 0..1500
        # Output queue will be used to get the depth frames from the outputs defined above
        depthQueue = device.getOutputQueue(name="depth", maxSize=4, blocking=False)
        spatialCalcQueue = device.getOutputQueue(name="spatialData", maxSize=4, blocking=False)
        spatialCalcConfigInQueue = device.getInputQueue("spatialCalcConfig")
    
        color = (255, 255, 255)
    
        print("Use WASD keys to move ROI!")
    
        while True:
            inDepth = depthQueue.get() # Blocking call, will wait until a new data has arrived
    
            depthFrame = inDepth.getFrame() # depthFrame values are in millimeters
    
            depth_downscaled = depthFrame[::4]
            if np.all(depth_downscaled == 0):
                min_depth = 0  # Set a default minimum depth value when all elements are zero
            else:
                min_depth = np.percentile(depth_downscaled[depth_downscaled != 0], 1)
            max_depth = np.percentile(depth_downscaled, 99)
            depthFrameColor = np.interp(depthFrame, (min_depth, max_depth), (0, 255)).astype(np.uint8)
            depthFrameColor = cv2.applyColorMap(depthFrameColor, cv2.COLORMAP_HOT)
            results = model("E:/DEPTHAI_FREELANCE/OAKDEV/image.jpg")
            for result in results:
                keypoints = result.keypoints.xy
                boxes = result.boxes.xyxy
                img = result.orig_img
                print(boxes)
            
            #config.roi = dai.Rect(topLeft, bottomRight)
            #config.calculationAlgorithm = calculationAlgorithm
            #cfg = dai.SpatialLocationCalculatorConfig()
            #cfg.addROI(config)
            #spatialCalcConfigInQueue.send(cfg)
            
            spatialData = spatialCalcQueue.get().getSpatialLocations()
            
            for depthData in spatialData:
                roi = depthData.config.roi
                roi = roi.denormalize(width=depthFrameColor.shape[1], height=depthFrameColor.shape[0])
                xmin = int(roi.topLeft().x)
                ymin = int(roi.topLeft().y)
                xmax = int(roi.bottomRight().x)
                ymax = int(roi.bottomRight().y)
    
                depthMin = depthData.depthMin
                depthMax = depthData.depthMax
    
                fontType = cv2.FONT_HERSHEY_TRIPLEX
                cv2.rectangle(depthFrameColor, (xmin, ymin), (xmax, ymax), color, 1)
                cv2.putText(depthFrameColor, f"X: {int(depthData.spatialCoordinates.x)} mm", (xmin + 10, ymin + 20), fontType, 0.5, color)
                cv2.putText(depthFrameColor, f"Y: {int(depthData.spatialCoordinates.y)} mm", (xmin + 10, ymin + 35), fontType, 0.5, color)
                cv2.putText(depthFrameColor, f"Z: {int(depthData.spatialCoordinates.z)} mm", (xmin + 10, ymin + 50), fontType, 0.5, color)

    Regards!
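
    One way to send all of them at once (a sketch under the assumption that the boxes are pixel-space xyxy from YOLO and that the frame size is known; not tested against this exact setup) is to build a single SpatialLocationCalculatorConfig, call addROI() once per box, and send that one message per frame, for example in place of the commented-out block above:

    def send_all_rois(boxes_xyxy, frame_w, frame_h, config_queue):
        """Pack every detection box (pixel xyxy) into one SpatialLocationCalculatorConfig and send it."""
        cfg = dai.SpatialLocationCalculatorConfig()
        for (x1, y1, x2, y2) in boxes_xyxy:
            data = dai.SpatialLocationCalculatorConfigData()
            data.depthThresholds.lowerThreshold = 100
            data.depthThresholds.upperThreshold = 10000
            # ROIs are normalized to 0..1 relative to the depth frame
            data.roi = dai.Rect(dai.Point2f(x1 / frame_w, y1 / frame_h),
                                dai.Point2f(x2 / frame_w, y2 / frame_h))
            cfg.addROI(data)
        config_queue.send(cfg)

    # e.g. inside the loop above, after `boxes = result.boxes.xyxy`:
    # send_all_rois(boxes.cpu().numpy(), img.shape[1], img.shape[0], spatialCalcConfigInQueue)

    Each ROI sent this way comes back as one entry in getSpatialLocations(), so the drawing loop above keeps working. Note that the boxes here are normalized against the YOLO input image, while the calculator runs on the depth frame, so the two need to cover the same field of view (or the boxes need to be remapped first).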