#include <atomic>
#include <chrono>
#include <iomanip>
#include <iostream>
#include <sstream>
#include "utility.hpp"
// Common includes needed for development with the depthai library
#include "depthai/depthai.hpp"
static const std::vector<std::string> labelMap = {
"person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat",
"traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse",
"sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag",
"tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove",
"skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
"bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza",
"donut", "cake", "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor",
"laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
"refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"};
static std::atomic<bool> syncNN{true};
int main(int argc, char** argv) {
using namespace std;
using namespace std::chrono;
std::string nnPath(BLOB_PATH);
// If path to blob specified, use that
if(argc > 1) {
nnPath = std::string(argv[1]);
}
// Print which blob we are using
printf("Using blob at path: %s\n", nnPath.c_str());
// Create pipeline
dai::Pipeline pipeline;
// Define sources and outputs
auto camRgb = pipeline.create<dai::node::ColorCamera>();
auto spatialDetectionNetwork = pipeline.create<dai::node::YoloSpatialDetectionNetwork>();
auto monoLeft = pipeline.create<dai::node::MonoCamera>();
auto monoRight = pipeline.create<dai::node::MonoCamera>();
auto stereo = pipeline.create<dai::node::StereoDepth>();
auto xoutRgb = pipeline.create<dai::node::XLinkOut>();
auto xoutNN = pipeline.create<dai::node::XLinkOut>();
auto xoutDepth = pipeline.create<dai::node::XLinkOut>();
auto nnNetworkOut = pipeline.create<dai::node::XLinkOut>();
xoutRgb->setStreamName("rgb");
xoutNN->setStreamName("detections");
xoutDepth->setStreamName("depth");
nnNetworkOut->setStreamName("nnNetwork");
// Properties
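// The preview size must match the input resolution the blob was compiled for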
camRgb->setPreviewSize(640, 352);
camRgb->setResolution(dai::ColorCameraProperties::SensorResolution::THE_1080_P);
camRgb->setInterleaved(false);
camRgb->setColorOrder(dai::ColorCameraProperties::ColorOrder::BGR);
monoLeft->setResolution(dai::MonoCameraProperties::SensorResolution::THE_400_P);
monoLeft->setCamera("left");
monoRight->setResolution(dai::MonoCameraProperties::SensorResolution::THE_400_P);
monoRight->setCamera("right");
// setting node configs
stereo->setDefaultProfilePreset(dai::node::StereoDepth::PresetMode::HIGH_DENSITY);
// Align depth map to the perspective of RGB camera, on which inference is done
stereo->setDepthAlign(dai::CameraBoardSocket::CAM_A);
stereo->setOutputSize(monoLeft->getResolutionWidth(), monoLeft->getResolutionHeight());
spatialDetectionNetwork->setBlobPath(nnPath);
spatialDetectionNetwork->setConfidenceThreshold(0.5f);
spatialDetectionNetwork->input.setBlocking(false);
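// Shrink the bounding box before averaging depth inside it, and ignore depth values outside 100 mm - 12000 mm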
spatialDetectionNetwork->setBoundingBoxScaleFactor(0.5);
spatialDetectionNetwork->setDepthLowerThreshold(100);
spatialDetectionNetwork->setDepthUpperThreshold(12000);
// yolo specific parameters
spatialDetectionNetwork->setNumClasses(80);
spatialDetectionNetwork->setCoordinateSize(4);
spatialDetectionNetwork->setAnchors({10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319});
spatialDetectionNetwork->setAnchorMasks({{"side26", {1, 2, 3}}, {"side13", {3, 4, 5}}});
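// Anchors and masks are carried over from the tiny-YOLOv3 example; an anchor-free model such as YOLOv8 should not need them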
spatialDetectionNetwork->setIouThreshold(0.5f);
// Linking
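// RGB preview feeds the detection network; the mono cameras feed StereoDepth, whose depth output drives the network's spatial calculations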
monoLeft->out.link(stereo->left);
monoRight->out.link(stereo->right);
camRgb->preview.link(spatialDetectionNetwork->input);
if(syncNN) {
spatialDetectionNetwork->passthrough.link(xoutRgb->input);
} else {
camRgb->preview.link(xoutRgb->input);
}
spatialDetectionNetwork->out.link(xoutNN->input);
stereo->depth.link(spatialDetectionNetwork->inputDepth);
spatialDetectionNetwork->passthroughDepth.link(xoutDepth->input);
spatialDetectionNetwork->outNetwork.link(nnNetworkOut->input);
// Connect to device and start pipeline
dai::Device device(pipeline);
// Output queues will be used to get the rgb frames and nn data from the outputs defined above
auto previewQueue = device.getOutputQueue("rgb", 4, false);
auto detectionNNQueue = device.getOutputQueue("detections", 4, false);
auto depthQueue = device.getOutputQueue("depth", 4, false);
auto networkQueue = device.getOutputQueue("nnNetwork", 4, false);
auto startTime = steady_clock::now();
int counter = 0;
float fps = 0;
auto color = cv::Scalar(255, 255, 255);
bool printOutputLayersOnce = true;
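// Main loop: fetch the queued outputs, overlay detections and their spatial coordinates, and display the RGB and depth views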
while(true) {
auto imgFrame = previewQueue->get<dai::ImgFrame>();
auto inDet = detectionNNQueue->get<dai::SpatialImgDetections>();
auto depth = depthQueue->get<dai::ImgFrame>();
auto inNN = networkQueue->get<dai::NNData>();
if(printOutputLayersOnce && inNN) {
std::cout << "Output layer names: ";
for(const auto& ten : inNN->getAllLayerNames()) {
std::cout << ten << ", ";
}
std::cout << std::endl;
printOutputLayersOnce = false;
}
cv::Mat frame = imgFrame->getCvFrame();
cv::Mat depthFrame = depth->getFrame(); // depthFrame values are in millimeters
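// Normalize, equalize and color-map the depth frame for visualization only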
cv::Mat depthFrameColor;
cv::normalize(depthFrame, depthFrameColor, 255, 0, cv::NORM_INF, CV_8UC1);
cv::equalizeHist(depthFrameColor, depthFrameColor);
cv::applyColorMap(depthFrameColor, depthFrameColor, cv::COLORMAP_HOT);
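// Update the FPS estimate roughly once per second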
counter++;
auto currentTime = steady_clock::now();
auto elapsed = duration_cast<duration<float>>(currentTime - startTime);
if(elapsed > seconds(1)) {
fps = counter / elapsed.count();
counter = 0;
startTime = currentTime;
}
auto detections = inDet->detections;
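// For each detection: draw its depth-averaging ROI on the depth view, and its box, label, confidence and XYZ position (mm) on the RGB frame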
for(const auto& detection : detections) {
auto roiData = detection.boundingBoxMapping;
auto roi = roiData.roi;
roi = roi.denormalize(depthFrameColor.cols, depthFrameColor.rows);
auto topLeft = roi.topLeft();
auto bottomRight = roi.bottomRight();
auto xmin = (int)topLeft.x;
auto ymin = (int)topLeft.y;
auto xmax = (int)bottomRight.x;
auto ymax = (int)bottomRight.y;
cv::rectangle(depthFrameColor, cv::Rect(cv::Point(xmin, ymin), cv::Point(xmax, ymax)), color, 1);
int x1 = detection.xmin * frame.cols;
int y1 = detection.ymin * frame.rows;
int x2 = detection.xmax * frame.cols;
int y2 = detection.ymax * frame.rows;
uint32_t labelIndex = detection.label;
std::string labelStr = to_string(labelIndex);
if(labelIndex < labelMap.size()) {
labelStr = labelMap[labelIndex];
}
cv::putText(frame, labelStr, cv::Point(x1 + 10, y1 + 20), cv::FONT_HERSHEY_TRIPLEX, 0.5, 255);
std::stringstream confStr;
confStr << std::fixed << std::setprecision(2) << detection.confidence * 100;
cv::putText(frame, confStr.str(), cv::Point(x1 + 10, y1 + 35), cv::FONT_HERSHEY_TRIPLEX, 0.5, 255);
std::stringstream depthX;
depthX << "X: " << (int)detection.spatialCoordinates.x << " mm";
cv::putText(frame, depthX.str(), cv::Point(x1 + 10, y1 + 50), cv::FONT_HERSHEY_TRIPLEX, 0.5, 255);
std::stringstream depthY;
depthY << "Y: " << (int)detection.spatialCoordinates.y << " mm";
cv::putText(frame, depthY.str(), cv::Point(x1 + 10, y1 + 65), cv::FONT_HERSHEY_TRIPLEX, 0.5, 255);
std::stringstream depthZ;
depthZ << "Z: " << (int)detection.spatialCoordinates.z << " mm";
cv::putText(frame, depthZ.str(), cv::Point(x1 + 10, y1 + 80), cv::FONT_HERSHEY_TRIPLEX, 0.5, 255);
cv::rectangle(frame, cv::Rect(cv::Point(x1, y1), cv::Point(x2, y2)), color, 1);
}
std::stringstream fpsStr;
fpsStr << std::fixed << std::setprecision(2) << fps;
cv::putText(frame, fpsStr.str(), cv::Point(2, imgFrame->getHeight() - 4), cv::FONT_HERSHEY_TRIPLEX, 0.4, color);
cv::imshow("depth", depthFrameColor);
cv::imshow("rgb", frame);
int key = cv::waitKey(1);
if(key == 'q' || key == 'Q') {
return 0;
}
}
return 0;
}
Here is the code.
It is a modified version of depthai-core/examples/SpatialDetection/spatial_tiny_yolo.cpp.
To use YOLOv8, I modified depthai-core/examples/CMakeLists.txt at lines 367-368 as follows:
target_compile_definitions(spatial_tiny_yolo_v3 PRIVATE BLOB_PATH="${yolov8n_blob}")
target_compile_definitions(spatial_tiny_yolo_v4 PRIVATE BLOB_PATH="${yolov8n_blob}")
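For this to work, ${yolov8n_blob} also has to be defined somewhere in CMakeLists.txt (an undefined CMake variable expands to an empty string). A minimal sketch, assuming the converted blob sits in a local models/ directory (the path and file name are only examples):
# Example only: point yolov8n_blob at a locally compiled YOLOv8n .blob
set(yolov8n_blob "${CMAKE_CURRENT_LIST_DIR}/models/yolov8n_640x352.blob")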