xperroni

  • 21 hours ago
  • Joined Feb 4, 2022
  • 1 best answer
  • izou3 Yes, I also stumbled on this issue a while back: anchor-less YOLO models converted by blobconverter don't work with YoloDetectionNetwork, but they do if you use Luxonis' online model conversion tool.

    As explained here, the online converter "uses a slightly edited Detection head, which removes some of the final processing operations from the original head. That is because these steps are done inside the YoloDetectionNetwork node from DepthAI library, so that is why it didn't work."
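
    For reference, a blob exported that way only needs the decoding parameters set on the node — here is a minimal sketch of the pipeline side, assuming a COCO-trained model and a hypothetical blob file name:

        import depthai as dai

        pipeline = dai.Pipeline()

        camera = pipeline.create(dai.node.ColorCamera)
        camera.setPreviewSize(640, 640)  # must match the input size the model was exported with
        camera.setInterleaved(False)

        detector = pipeline.create(dai.node.YoloDetectionNetwork)
        detector.setBlobPath("yolo11n_openvino_2022.1_6shave.blob")  # hypothetical file name
        detector.setNumClasses(80)  # COCO classes
        detector.setCoordinateSize(4)
        detector.setConfidenceThreshold(0.5)
        detector.setIouThreshold(0.5)
        # No setAnchors() / setAnchorMasks() calls: anchor-free models leave them empty.

        camera.preview.link(detector.input)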

    • erik I was hoping to be able to write custom nodes to run in the camera using OpenCV (or an OpenCV-like API), instead of using the ImageManip and Script nodes, which never seem to quite be enough. I've also grown frustrated with how hard it has become to load the OAKs with any model not already present in the DepthAI Model Zoo. Many recent PyTorch models, when converted to ONNX, use operators that are only supported from OpenVINO 2023.0 onwards — the same release that dropped the Myriad X backend. I suppose it's not impossible to back-port those models, but it's a huge inconvenience, especially compared to other platforms such as TensorRT, where those models most often enjoy out-of-the-box support.

    • I was experimenting with the pre-trained YOLO11n detection model from Ultralytics, and I found that if I convert it into a Blob like this:

      $ python3 -c 'from ultralytics import YOLO ; model = YOLO("yolo11n.pt") ; model.export(format="onnx")'
      $ python3 -m blobconverter --onnx-model yolo11n.onnx --shaves 3

      ...and then try to run the model using the YOLOv8 example code, I get the following error over and over again:

      [1944301041162F1200] [1.3] [3.861] [DetectionNetwork(1)] [error] Mask is not defined for output layer with width '8400'. Define at pipeline build time using: 'setAnchorMasks' for 'side8400'.

      If, however, I use the online conversion tool to convert the weights file yolo11n.pt into a blob, then it works.

      Why is that? Is there some missing blobconverter option that the online tool applies automatically behind the scenes?

    • Having read through the description of the first RVC4 cameras available for pre-order, I must confess to being rather disappointed.

      I was hoping for an upgrade that would support models built with newer ONNX operators and maybe increase computing capacity a bit, while mostly keeping the form factor, connectivity and workflow of the RVC2 devices. My ideal upgrade would have been an OAK-D Pro W that loaded models in ONNX format and exposed an API for writing custom shave kernels.

      Instead, what we got is a much bulkier, costlier, and frankly bloated line of devices that I imagine will require an extra power inlet for most USB-based setups (it certainly will for mine). Having a fully flashed SBC crammed into the camera might seem good on paper, but in practice it will require a lot of architecture changes to make it worth the added cost, or even a viable option in terms of mechanics and power delivery.

      I sure hope that we see a "lite" series of upgraded OAK devices in the near term that is more like the RVC2 line in terms of features and cost. Otherwise I might have no choice but to look for alternatives from other vendors in the next few years — perhaps even going so far as to do without in-device processing.

    • The new RVC4 cameras certainly look powerful, but am I the only one left wanting something more lightweight and economical? Yes, the S2 cameras are still available, but Intel's removal of Movidius support from OpenVINO has limited their ability to run newer AI models. I love the OAK line and want to continue using it in future projects, but these new prices are sure going to sting.

    • I recently started getting the following warning from my OAK-D Pro W:

      [18443010E1217F0E00] [1.2.4] [1.311] [system] [warning] Capping IR dot projector brightness to 1%

      After checking the documentation, I realized that the problem was that I had replaced a call to the deprecated Device method setIrLaserDotProjectorBrightness() with setIrLaserDotProjectorIntensity(), but didn't account for the change in input range — the former accepted a milliampere value between 0 and 1200, while the latter accepts an intensity value between 0 and 1. Updating my code to constrain intensity values to the range [0, 1] silenced the warning.

      Posting here for reference in case someone else stumbles on the same problem.
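
      If you need to migrate old code that still thinks in milliamperes, a simple clamp-and-scale does the job — a minimal sketch, assuming a linear mapping from the old 0-1200 mA range onto the new 0-1 intensity scale:

          def brightness_to_intensity(brightness_ma: float) -> float:
              """Map a legacy projector current (0-1200 mA) to the new intensity scale (0-1)."""
              return max(0.0, min(brightness_ma / 1200.0, 1.0))

          # Usage (device being an already constructed dai.Device):
          # device.setIrLaserDotProjectorIntensity(brightness_to_intensity(765))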

      • I'm working on an autonomous wheelchair system that uses OAK-D Pro W cameras for person tracking and obstacle detection. The camera is mounted at a height of 2 meters, looking down at a 27° angle. The IR laser dot projector is set to 765 mA, and the flood IR LED is off. The following options are set in the dai::node::StereoDepth object:

        // Preset that favours depth accuracy over map density.
        stereo->setDefaultProfilePreset(dai::node::StereoDepth::PresetMode::HIGH_ACCURACY);
        stereo->setLeftRightCheck(true);  // discard pixels that fail the left-right consistency check
        stereo->setRectification(true);
        stereo->setExtendedDisparity(false);
        stereo->setSubpixel(true);  // sub-pixel interpolation for finer depth resolution
        stereo->initialConfig.setMedianFilter(dai::MedianFilter::KERNEL_7x7);

        I have found that for some very specific tiled surfaces, the camera will underestimate the depth of the ridges between tiles, placing them a lot closer to the camera than they actually are. However, this is only a problem when the IR laser dot projector is on — the artifacts disappear when it is turned off. See the pictures below for examples:

        Depth image with IR laser dot turned on — notice the light blue (low depth) artifacts inside the white circle.

        The same surface imaged with the IR laser dot projector off — no artifacts are seen.

        The thing about this issue is that it's extremely hard to reproduce. So far I have been unable to find another environment where it shows up as consistently as in the one above. For example, none of these tiled surfaces show the same pattern:

        On the other hand, this flight of stairs shows depth artifacts even with the laser dot turned off:

        Has anyone seen similar issues? Any suggestions on how to address it?

        • I was able to get rid of the depth artifacts, but it took more than lowering the confidence threshold. Specifically, I'm using the following configuration options:

              import depthai as dai
          
              from depthai import MedianFilter
              DecimationMode = dai.RawStereoDepthConfig.PostProcessing.DecimationFilter.DecimationMode
          
              # ...
          
              message = dai.StereoDepthConfig()
              options = message.get()
          
              options.algorithmControl.leftRightCheckThreshold = 1  # max allowed disparity difference in the left-right check
              options.costMatching.confidenceThreshold = 200  # disparity confidence cutoff (0-255)
              options.costMatching.linearEquationParameters.alpha = 1
              options.costMatching.linearEquationParameters.beta = 1
              options.postProcessing.brightnessFilter.maxBrightness = 250  # invalidate overexposed pixels
              options.postProcessing.decimationFilter.decimationFactor = 2  # halve the depth map resolution
              options.postProcessing.decimationFilter.decimationMode = DecimationMode.NON_ZERO_MEAN
              options.postProcessing.median = MedianFilter.KERNEL_7x7  # strongest median filtering
              options.postProcessing.spatialFilter.enable = True  # edge-preserving smoothing and hole filling
              options.postProcessing.speckleFilter.enable = True  # remove isolated speckle noise
          
              message.set(options)
          
              # ...

          For illustration, in the image below you can see the view from the RGB camera, the depth map with (mostly) default settings, and the depth map with the above settings.
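
          In case it's not obvious from the elided parts, the config message is then sent to the StereoDepth node at runtime — a minimal sketch, assuming an XLinkIn node linked to the stereo node's inputConfig, with pipeline, stereo and device defined elsewhere:

              # Pipeline build time: route a config stream into the StereoDepth node.
              xin_config = pipeline.create(dai.node.XLinkIn)
              xin_config.setStreamName("stereo_config")
              xin_config.out.link(stereo.inputConfig)

              # Runtime: send the message assembled above through the input queue.
              config_queue = device.getInputQueue("stereo_config")
              config_queue.send(message)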

      • I'm working on a mobile robotics project that's inching towards production. Some of our partners have expressed concerns about the long-term viability of the OAK platform: we know the RVC2 cameras will be available until 2030, but there are already some recent AI models we cannot use, due to OpenVINO dropping Myriad X support and the lack of an alternative architecture. This will only get worse as the AI ecosystem moves on.

        I subscribed to the RVC4 newsletter last year, but have yet to receive any issues — if this was simply a delivery problem, it would be nice to have a page where we could read past issues. We would also like to know whether we'll still be using OpenVINO for the AI models (and if not, what will take its place), and whether / when the new RVC4 cameras will be available for pre-order.

      • Hi jakaskerl

        jakaskerl Could you check where the spatial bounding box is located?

        How can I do that?

        jakaskerl I think it stays within the bounds of depth, which makes it not lay on top of the detected object.

        That would explain the results I'm getting at the borders, but wouldn't change the underlying issue — that objects detected close to the borders of the RGB image lack depth data.

        jakaskerl I think you could change this by using a different averaging method (setSpatialCalculationAlgorithm()) and lowering the setBoundingBoxScaleFactor().

        The bounding box scale factor is set to 0.5 and the object depth is computed by using the MIN algorithm. But I don't see how changing either would fix the issue with objects at the borders. If the object is outside the bounds of the depth map, whatever algorithm I choose will still be working on incorrect data; nor will changing the bounding box scale factor help if there is no "correct" ROI to narrow down to.

        In the end it still seems that the only practical solution is to discard objects detected too close to the borders.
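
        For completeness, this is the kind of guard I mean — a minimal sketch on the host side, assuming normalized (0-1) bounding box coordinates from the spatial detection output and a hypothetical margin tuned to how much of the RGB frame lacks depth coverage:

            MARGIN = 0.05  # fraction of the frame treated as a border band without depth data

            def near_border(detection, margin=MARGIN):
                """True if a normalized bounding box touches the border band."""
                return (detection.xmin < margin or detection.ymin < margin or
                        detection.xmax > 1.0 - margin or detection.ymax > 1.0 - margin)

            # detections_msg comes from the spatial detection network's output queue.
            valid = [d for d in detections_msg.detections if not near_border(d)]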

        • I'm also doing spatial object detection with the RGB camera set to its maximum FoV and "squeezing" the RGB images into the detector. More specifically, my ColorCamera node is set with:

          setResolution(dai::ColorCameraProperties::SensorResolution::THE_12_MP)
          setIspScale(1, 3)                 // downscale the full-sensor ISP output to 1/3
          setPreviewKeepAspectRatio(false)  // squeeze the full FoV into the preview instead of cropping it

          In contrast, the resolution of the monochrome cameras is set to the lowest possible configuration, that is dai::MonoCameraProperties::SensorResolution::THE_400_P.

          Looking at the RGB and depth images, then projecting the detection coordinates onto a plane, it does look like the spatial detector is doing the "right thing", correctly aligning and scaling the data so that they match as well as possible — which it should be able to do by computing projections from the intrinsic / extrinsic data available in the OAK camera itself. See a few examples below:

          The one problem I have is that, as can be seen above, there is a relatively large region at the borders of the RGB image for which no depth data is available. When an object is detected in these regions, position estimates can be wildly wrong, as seen below:

          It would be nice if there were some mechanism in the DepthAI API to guard against such cases — for example, an option to ignore any detection with bounds beyond the region covered by the depth map.

          • What are the axes directions for the spatial coordinates in SpatialImgDetection? I was expecting it to follow camera coordinate conventions, that is:

            • X points to the right
            • Y points down
            • Z points forward

            However, in tests I found Y to point up — Y coordinate values are positive above the camera's center, and negative below.

            Is this by design? If yes, what is the rationale?

            • jakaskerl replied to this.
            • Hi xperroni
              OAK cameras use a left-handed (Cartesian) coordinate system for all spatial coordinates.
              This is because the center of the image is the origin, so it makes sense intuitively that X points right, Y up and Z forward (depth).

              Thanks,
              Jaka
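
              If the usual computer-vision convention (X right, Y down, Z forward) is needed on the host, negating Y is enough — a minimal sketch, assuming the spatial coordinates of a SpatialImgDetection in millimetres:

                  def to_camera_frame(detection):
                      """Convert OAK spatial coordinates (Y up) to the usual camera convention (Y down)."""
                      coords = detection.spatialCoordinates
                      return (coords.x, -coords.y, coords.z)  # negate Y so it points down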

            • I'm using the ObjectTracker node in an academic project, and I'd like to include a description of how it works in a report. The problem is, other than KCF, there is really no indication of what algorithms are used.

              For example, what algorithm is used when ZERO_TERM_COLOR_HISTOGRAM is selected? Given the close relationship of DepthAI to OpenCV I would guess it must be some variation of MeanShift, but it would be great to know for sure.

              Also, what is the policy for discarding tracks? ObjectTracker has a threshold for using a detection for tracking, but no attribute for setting a tracking timeout, etc.
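
              For context, these are the only knobs I could find on the node — a minimal sketch of its configuration surface, assuming the tracker is fed by a detection network elsewhere in the pipeline:

                  import depthai as dai

                  pipeline = dai.Pipeline()
                  tracker = pipeline.create(dai.node.ObjectTracker)

                  tracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)
                  tracker.setTrackerThreshold(0.5)        # minimum detection confidence to start tracking
                  tracker.setDetectionLabelsToTrack([0])  # e.g. only track class 0
                  tracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.UNIQUE_ID)
                  # Note: no parameter for a track-loss timeout or for selecting the underlying algorithm.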

            • erik I'm sorry, but other than being in C++, is the code I provided in my original post not an adequate minimum example of the issue? If so, what is it missing?

            • I'm writing a C++ module to interface with OAK-D PoE cameras. One of my requirements is to retrieve and display RGB video at 1080p resolution and 30 FPS. I had the idea of using VideoEncoder to encode frames for faster transfer, then decode them in the host for display.

              This works fine as long as I am polling the data output queue using get(). If, however, I use a callback, the actual frame rate starts at 30 but after some time steadily slows down until it stabilizes at around 9 FPS. What's weirder is that the slowdown seems to be triggered by changes in camera input — in several cases the frame rate remained at 30 until I made a sudden movement, and then quickly dropped to about 9 FPS.

              Any ideas about this?

              See below a simple program I wrote to demonstrate the issue.

              #include <chrono>
              #include <iostream>
              #include <mutex>
              #include <queue>
              #include <set>

              #include <opencv2/opencv.hpp>

              #include "depthai/depthai.hpp"
              
              // Minimal wrapper around a ColorCamera pipeline, optionally routed through an MJPEG VideoEncoder.
              class CameraRGB {
                std::shared_ptr<dai::Device> device_;
              
                dai::Pipeline pipeline_;
              
                std::shared_ptr<dai::node::ColorCamera> camera_;
              
                std::shared_ptr<dai::node::VideoEncoder> encoder_;
              
                std::shared_ptr<dai::node::XLinkOut> output_;
              
                std::shared_ptr<dai::DataOutputQueue> queue_;
              
              public:
                CameraRGB(bool encoded) {
                  camera_ = pipeline_.create<dai::node::ColorCamera>();
                  float fps = camera_->getFps();
              
                  std::cerr << "[INFO] Camera FPS rate: " << fps << std::endl;
              
                  output_ = pipeline_.create<dai::node::XLinkOut>();
                  output_->setStreamName("rgb");
              
                  if (encoded) {
                    encoder_ = pipeline_.create<dai::node::VideoEncoder>();
                    encoder_->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::MJPEG);
              
                    camera_->video.link(encoder_->input);
                    encoder_->bitstream.link(output_->input);
                  }
                  else {
                    camera_->video.link(output_->input);
                  }
              
                  device_ = std::make_shared<dai::Device>(pipeline_);
                  queue_ = device_->getOutputQueue("rgb", 1, false);
                }
              
                cv::Mat poll() {
                  auto frame = queue_->get<dai::ImgFrame>();
                  return getCvFrame(*frame);
                }
              
                cv::Mat getCvFrame(dai::ImgFrame& frame) {
                  if (encoder_) {
                    return cv::imdecode(frame.getData(), cv::IMREAD_UNCHANGED);
                  }
                  else {
                    return frame.getCvFrame();
                  }
                }
              
                std::shared_ptr<dai::DataOutputQueue> getQueue() {
                  return queue_;
                }
              };
              
              // Polling variant: frames are fetched synchronously from the output queue with get().
              void poll(CameraRGB& camera) {
                auto last = std::chrono::steady_clock::now();
                int frame_count = 0;
              
                while (true) {
                  auto image = camera.poll();
              
                  frame_count += 1;
              
                  auto now = std::chrono::steady_clock::now();
                  std::chrono::duration<double> dt = now - last;
                  double dt_seconds = dt.count();
                  if (dt_seconds >= 1.0) {
                    std::cerr << "[INFO] Frame rate: " << (frame_count / dt_seconds) << std::endl;
                    frame_count = 0;
                    last = now;
                  }
              
                  cv::imshow("Image", image);
              
                  int key = cv::waitKey(1);
                  if(key == 'q' || key == 'Q') {
                    return;
                  }
                }
              }
              
              // Callback variant: frames are pushed by the queue callback and displayed in the main loop.
              void callback(CameraRGB& camera) {
                auto last = std::chrono::steady_clock::now();
                int frame_count = 0;
              
                auto queue = std::queue<cv::Mat>();
                std::mutex queueMtx;
              
                auto showFrame = [&](std::shared_ptr<dai::ADatatype> callback) {
                  auto* frame = dynamic_cast<dai::ImgFrame*>(callback.get());
                  if (frame == nullptr) {
                    std::cerr << "[ERROR] Received message is not a message frame" << std::endl;
                    return;
                  }
              
                  {
                    std::unique_lock<std::mutex> lock(queueMtx);
                    auto image = camera.getCvFrame(*frame);
                    queue.push(image);
                  }
              
                  frame_count += 1;
              
                  auto now = std::chrono::steady_clock::now();
                  std::chrono::duration<double> dt = now - last;
                  double dt_seconds = dt.count();
                  if (dt_seconds >= 1.0) {
                    std::cerr << "[INFO] Frame rate: " << (frame_count / dt_seconds) << std::endl;
                    frame_count = 0;
                    last = now;
                  }
                };
              
                camera.getQueue()->addCallback(showFrame);
              
                while (true) {
                  int key = cv::waitKey(1);
                  if(key == 'q' || key == 'Q') {
                    return;
                  }
              
                  cv::Mat image;
                  {
                    // Hold the lock while checking and popping, to avoid racing the callback thread.
                    std::unique_lock<std::mutex> lock(queueMtx);
                    if (queue.empty()) {
                      continue;
                    }

                    image = queue.front();
                    queue.pop();
                  }

                  cv::imshow("Image", image);
                }
              }
              
              int main(int argc, char** argv) {
                std::set<std::string> options;
                for (int i = 1; i < argc; ++i) {
                  options.insert(argv[i]);
                }
              
                bool encoded = (options.count("encoded") > 0);
                bool polling = (options.count("polling") > 0);
              
                CameraRGB camera(encoded);
              
                if (polling) {
                  poll(camera);
                }
                else {
                  callback(camera);
                }
              
                return 0;
              }
              • erik replied to this.