I have a working camera and real-time application.

For my use case the camera stays in a fixed position and only its distance from the observed area varies. This distance is set by the user. When the camera is physically further away, the observed area naturally takes up a smaller pixel area of the resulting preview-size image.

The application has a calibration step that I'd like to update so that when the camera is positioned further away like this, the user can effectively zoom in so that the observed area takes up more pixels in the preview-size image. The goal is for the preview image to contain more "meaningful pixels" at inference time.

This is a real-time application with a small preview size, which is why I'm exploring this idea of more "meaningful pixels".

My questions are:

  1. Does this more "meaningful pixels" idea make sense? Is there a better term I should be using?
  2. What options and APIs do I have to achieve this?
  3. Are there any existing code examples that would help guide me?
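The "meaningful pixels" idea can be made concrete as the fraction of preview pixels that actually land on the observed area; in detection work this often goes by "pixels on target" or ROI (region-of-interest) cropping. A rough sketch of the arithmetic (all numbers here are illustrative assumptions, not values from this thread):

```python
# Sketch: how cropping before downscaling raises the share of "meaningful"
# pixels (pixels that land on the observed area) in a fixed-size preview.
# All numbers are illustrative assumptions.

def meaningful_fraction(area_w, area_h, crop_w, crop_h):
    """Fraction of the output covered by the observed area when the sensor
    image is first center-cropped to (crop_w, crop_h) and then resized.
    Assumes the observed area sits fully inside the crop."""
    covered_w = min(area_w, crop_w)
    covered_h = min(area_h, crop_h)
    return (covered_w * covered_h) / (crop_w * crop_h)

# Camera far away: the area covers 480x270 px of a 1920x1080 frame.
no_crop = meaningful_fraction(480, 270, 1920, 1080)  # 0.0625 (1/16)
cropped = meaningful_fraction(480, 270, 960, 540)    # 0.25   (1/4)

print(f"without crop: {no_crop:.4f}, with 2x center crop: {cropped:.4f}")
```

Since the crop-then-resize happens before the preview is downscaled, the same preview resolution carries four times as many on-target pixels in this example.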

    Thanks jakaskerl, I'll give that a try. I appreciate the pipeline sequence note; that helps set expectations.

    @jakaskerl

    Below is my current implementation. I feel like I'm missing some kind of "apply" API as I see no effect when the cropIn and cropOut calls are made. Do you see what I'm doing wrong?

    This is my pipeline setup.

    // 1. Create pipeline
    dai::Pipeline corePipeline;
    
    // 2. Define sources and outputs
    colorCam = corePipeline.create<dai::node::ColorCamera>();
    auto configIn = corePipeline.create<dai::node::XLinkIn>();
    auto xLinkPreviewOut = corePipeline.create<dai::node::XLinkOut>();
    
    // 3. Properties
    colorCam->setPreviewSize(previewSize.width, previewSize.height);
    colorCam->setResolution(dai::ColorCameraProperties::SensorResolution::THE_720_P);
    
    configIn->setStreamName(STREAM_CONFIG_IN);
    
    xLinkPreviewOut->setStreamName(STREAM_PREVIEW);
    
    // 4. Linking
    colorCam->preview.link(xLinkPreviewOut->input);
    configIn->out.link(colorCam->inputConfig);
    
    // 5. Connecting
    device = std::shared_ptr<dai::Device>(new dai::Device(corePipeline, deviceInfo));
    
    // 6. Queues
    rgbQueue = device->getOutputQueue(STREAM_PREVIEW, 4, false);
    configQueue = device->getInputQueue(STREAM_CONFIG_IN, 4, false);

    The later calls to CropIn and CropOut are made, but I see no effect in the resulting preview image. I've confirmed the methods execute without erroring.

    void CropIn()
    {
        dai::ImageManipConfig cfg;
        cfg.setCenterCrop(.5);
        configQueue->send(cfg);
        std::cout << "crop in" << std::endl;
    }
    
    void CropOut()
    {
        dai::ImageManipConfig cfg;
        cfg.setCenterCrop(1);
        configQueue->send(cfg);
        std::cout << "crop out" << std::endl;
    }

      jakaskerl Thank you. I was under the impression that the ImageManipConfig accounted for the manipulation, but now I see that it's just part of the puzzle: it needs to be fed to the manip node.

      I still need some clarity. Is the crop applied to the downsized preview image, or to the original image before it's downsized to the preview image? Maybe it makes no functional difference from DepthAI's perspective, but from my API-user perspective it's unclear.

      From what I can tell, it's the former (and this isn't what I want), as the preview image is now smaller than the size requested by the colorCam->setPreviewSize(previewSize.width, previewSize.height); call. Maybe this is what you meant when you said "along with setResize()" in your initial comment? If true, the resize would now be stretching the image up and losing quality, correct?

      @jakaskerl My current understanding is:

      1. The crop occurs to the preview image after it's been downsized from the original (which is set via setResolution)
      2. A setResize call is required to get back to the original setPreviewSize
      3. This results in a lossy image due to the upscale

      Is this correct? If not, why?

      If so, how do I achieve a crop of the original image (at the dimensions set via setResolution) before it's downsized to the preview resolution (set via setPreviewSize)?

        Hi luxd

        1. Yes, the preview output will first scale down the video output (16:9, cropped from ISP, but max 4K) and then crop it to the aspect ratio set with setPreviewSize() (illustration)
        2. setResize() is part of the ImageManip node and will resize the image to fit the given W,H. I only mentioned this because sometimes the crop you make will be too small for the NN node's input layer.
        3. You will always lose quality when upscaling.

        luxd The desire for this is so that the preview image has more "meaningful pixels" during inference time.

        By cropping the image you are creating more meaningful pixels (increasing the ratio of those pixels against all pixels in the image).

        luxd If so, how do I achieve a crop of the original image (at the dimensions set via setResolution) before it's downsized to the preview resolution (set via setPreviewSize)?

        setPreviewSize() will change the aspect ratio of the image unless the specified size is 16:9; in that case only a downscale will happen (again, depending on the size).
        You can set your preview size to 1080p, then crop the image using an ImageManip node.
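The scale-then-crop order described in point 1 can be sketched numerically. This is my reading of the explanation, not official DepthAI behavior, so treat it as an assumption:

```python
# Sketch of the preview sizing order as described above: the 16:9 video
# output is first scaled down until one side matches the requested preview
# size, then center-cropped to the preview aspect ratio. This mirrors my
# reading of the thread; consult the DepthAI docs for authoritative behavior.

def preview_geometry(video_w, video_h, prev_w, prev_h):
    """Return (scaled_w, scaled_h, crop_w, crop_h) for the preview output."""
    # Scale so the requested preview fits inside the scaled image.
    scale = max(prev_w / video_w, prev_h / video_h)
    scaled_w, scaled_h = round(video_w * scale), round(video_h * scale)
    # Then center-crop to the requested preview size.
    return scaled_w, scaled_h, prev_w, prev_h

print(preview_geometry(1920, 1080, 300, 300))  # (533, 300, 300, 300)
```

In this example a 300x300 preview from a 1080p video output means scaling to 533x300 first, then cropping 116 columns off each side, which is why non-16:9 preview sizes lose the edges of the frame.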

        Also explained here.

        Hope this helps,
        Jaka


          jakaskerl Thank you, this does help. I'll explore setting a higher preview dimension and then cropping it, as I believe that will give me the result I'm after.


            luxd OK, your suggested approach improved the situation (thanks again), but I have a few questions:

            1. Can setPreviewSize be reset dynamically, or only when a pipeline is configured? I tried, but it seemed to have no effect.
            2. Does ImageManipConfig.setResize use a different algorithm than setPreviewSize when downscaling? It seems like yes.

            I ask both of these questions because setResize (downscaling) resulted in noticeable jagginess/aliasing in the final output image that I feed to my NN. When the downscale happens via setPreviewSize, this jagginess/aliasing is not present.

            For the moment I've found a middle ground: initial values plus runtime-editable, clamped values that let my user crop to the area of interest. This way I have more meaningful pixels that aren't upscaled and lossy, but are instead always either a perfect fit or downscaled with minimal jagginess/aliasing. Ideally, though, if user input could reset setPreviewSize before the setCenterCrop call, I'd get the results I want without the jagginess/aliasing.
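The clamped middle ground described above boils down to never letting the user's crop get smaller than the NN input, so the subsequent resize is always a downscale (or a perfect fit), never a lossy upscale. A sketch of that clamp (the function and the 384x384 NN input are illustrative assumptions, not DepthAI API):

```python
# Sketch of the "clamped crop" middle ground: the user's requested crop is
# clamped between the NN input size (so we never upscale) and the source
# frame size (so we never crop past the frame). Names are illustrative.

def clamp_crop(crop_w, crop_h, nn_w, nn_h, src_w, src_h):
    """Clamp a requested crop between the NN input size and the source size."""
    w = max(nn_w, min(crop_w, src_w))
    h = max(nn_h, min(crop_h, src_h))
    return w, h

# User zooms in too far: 200x200 requested, but the NN expects 384x384.
print(clamp_crop(200, 200, 384, 384, 1920, 1080))   # (384, 384)
# User zooms out past the frame: clamped to the source size.
print(clamp_crop(2500, 1500, 384, 384, 1920, 1080)) # (1920, 1080)
```

Anything inside those bounds resizes down to the NN input, so the result is never stretched up.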

              Hi luxd

              1. To my knowledge, the preview size cannot be changed during runtime.

              2. I'll ask the team, as I'm not completely sure why there's a difference. Were both resizes originally from the same image? I thought it could have something to do with the width being a factor of 32.

              Thanks,
              Jaka


                jakaskerl Thank you.

                1. This is what it looks like to me. Who could confirm this 100%?
                2. Great, I look forward to the response. I perused the source myself (setPreviewSize and setResize), but I only found hard assignments, not the algorithm implementations I expected. Each crop and resize was from a different pipeline setup that simply tested 720, 704, 512, and 384 square resolutions for setPreviewSize.

                It seems that the higher the initial setPreviewSize before a crop and resize, the greater the jagginess/aliasing. If I could set setPreviewSize dynamically, so that I only need setCenterCrop without a setResize, it seems I'd get the best result. If, however, you can confirm that both setPreviewSize and setResize use the same algorithm, then I'll know something else is going on.
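The observation above can be quantified as the downscale factor from each tested preview size to the NN input; with simple nearest/bilinear resizing, aliasing generally worsens as that factor grows, which would match larger initial previews looking more jagged. A sketch (the 384x384 NN input is an assumed example):

```python
# Sketch: downscale factor from each tested square preview size to an
# assumed NN input size. With simple nearest/bilinear resizing, aliasing
# generally worsens as this factor grows, which would match the observation
# that larger initial previews look more jagged after setResize().

NN_INPUT = 384  # assumption for illustration
for preview in (720, 704, 512, 384):
    factor = preview / NN_INPUT
    print(f"setPreviewSize({preview}) -> setResize({NN_INPUT}): "
          f"{factor:.2f}x downscale")
```

A 384 preview needs no resize at all (1.00x), while 720 is nearly a 2x downscale, where single-tap interpolation starts to skip source pixels.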

                  Hi luxd
                  Back with info.
                  Apparently both methods should be identical and should run on the same hardware accelerator.
                  The trick is to first use setResize() and then setCenterCrop(); then the image should look the same as with setPreviewSize().

                  Example from SDK:

                  if self._ar_resize_mode == ResizeMode.CROP:
                      self.image_manip.initialConfig.setResize(self._size)
                      # With .setCenterCrop(1), ImageManip will first crop the frame to the correct aspect ratio,
                      # and then resize it to the NN input size
                      self.image_manip.initialConfig.setCenterCrop(1, self._size[0] / self._size[1])

                  Thanks,
                  Jaka

                  @jakaskerl Regarding the above setResize followed by setCenterCrop calls: if I change setCenterCrop to setCropRect, I get a crash. Through some digging, it looks like I need two ImageManip nodes (at least, so far I don't get the crash with this approach).

                  1. When do you, and when do you not, need multiple ImageManip nodes? This isn't clear unless I'm missing something in the docs.

                  2. Assuming I do in fact need two ImageManip nodes, I'm able to get the initialConfig to work via the below, but when I then want to update the manip nodes based on user input, it doesn't work as expected. Do you see anything I'm missing or doing wrong?

                       manipResize->initialConfig.setResize(resizeDimensionDefault, resizeDimensionDefault);
                       manipResize->setMaxOutputFrameSize(colorCam->getResolutionHeight() * colorCam->getResolutionWidth() * 3);
                       
                       manipCrop->initialConfig.setCropRect(GetCropCoordinates());
                       manipCrop->setMaxOutputFrameSize(colorCam->getResolutionHeight() * colorCam->getResolutionWidth() * 3);

                    Here is the post-initialization code that executes on user input:

                       dai::ImageManipConfig cfgResize;
                       cfgResize.setResize(resizeDimension, resizeDimension);
                       configQueue->send(cfgResize);
                       
                       dai::ImageManipConfig cfgCrop;
                       cfgCrop.setCropRect(GetCropCoordinates());
                       configQueue->send(cfgCrop);
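GetCropCoordinates() is the poster's own helper and its body isn't shown; as a hedged guess at its shape, setCropRect() takes a normalized (xmin, ymin, xmax, ymax) rectangle, so such a helper might compute a centered, clamped rectangle from a zoom factor:

```python
# Hedged sketch of a crop-coordinate helper like the GetCropCoordinates()
# referenced above (that helper is the poster's own; this is a guess at its
# shape). Assumes setCropRect() takes normalized 0..1 coordinates, so this
# returns a centered, clamped rectangle for a given zoom factor.

def center_crop_rect(zoom):
    """zoom=1.0 -> full frame; zoom=2.0 -> middle half of each axis."""
    zoom = max(1.0, zoom)   # clamp: never zoom out past the frame
    half = 0.5 / zoom       # half-extent of the crop on each axis
    return (0.5 - half, 0.5 - half, 0.5 + half, 0.5 + half)

print(center_crop_rect(1.0))  # (0.0, 0.0, 1.0, 1.0)
print(center_crop_rect(2.0))  # (0.25, 0.25, 0.75, 0.75)
```

One thing worth checking in the snippet above, offered as an assumption rather than a diagnosis: both cfgResize and cfgCrop are sent down the single configQueue from the original pipeline. If that XLinkIn stream feeds only one node's inputConfig, the other manip node never receives its update; each ImageManip node that should be reconfigurable at runtime generally needs its own config input linked to it.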