Using Full Sensor Input with DepthAI: Avoiding Cropping for DepthAI and NN

AndreiMunteanu

I’m working on an embedded system using a Luxonis OAK camera and the DepthAI pipeline. The camera is the OAK-D Lite Auto Focus.

One limitation I’ve consistently run into is related to how frames are passed from the camera sensor to the neural network node.

The OAK-D Lite RGB sensor provides a native square-ish / 4:3 field of view, but in practice:

The pipeline commonly outputs a 16:9 preview
That preview is:
- either cropped from the full sensor
- or squished to match NN input

The NN node then resizes further to a fixed input size (e.g. 640×640), sometimes introducing distortion or additional cropping.

So essentially, what I have to work with is a smaller square inside the full sensor frame.

For my use case, this causes problems:

Cropping reduces the field of view, so people near the edges are missed
Resizing with distortion affects accuracy

What I want is simple:
Bypass the 16:9 cropping bottleneck and feed the RGB image (not the depth streams) directly to the NN. Use the full RGB sensor frame and resize it uniformly to the neural network input size (for example 640x640), without cropping or distortion (or at least with a minimal amount of cropping).

Typical configuration I tried:

```python
# camRgb is the pipeline's ColorCamera node
camRgb.setPreviewSize(640, 640)
camRgb.setPreviewKeepAspectRatio(False)
```

This either distorts the image or still results in cropping.

So my question is:

Is it possible on OAK-D Lite to use the full sensor frame (not cropped to 16:9) and pass it to a neural network node with a clean resize?

I am considering using the ISP output instead of preview:

```python
# manip is an ImageManip node fed from the ISP output
camRgb.isp.link(manip.inputImage)
manip.initialConfig.setResize(640, 640)
```

But I am not sure if the ISP output preserves the full field of view or if it is already cropped internally.

Also:

Is there a recommended way to avoid cropping completely in DepthAI?
Is letterboxing supported, or does it need to be implemented manually?

I am looking for the correct pipeline setup that:

uses the full sensor FOV
avoids cropping
avoids distortion
keeps inference and preview consistent
feeds the NN exactly what the camera sensor sees, again, with minimal crop or image stretch (none if possible)

Any guidance or example pipelines would help. Do you think that updating to the latest DepthAI version will solve my problem?

Thanks!

OskarSonc

Hey AndreiMunteanu - yes, this is possible, but not through camRgb.preview.

On DepthAI v2, the preview stream is usually derived from the video stream, and the video stream is cropped from ISP. So when you ask for a square preview, you usually end up with either a crop or a stretched image. If you want the full RGB field of view to go into the neural network, the usual approach is to use the ISP output and resize that with ImageManip.
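
To get a feel for how much of the sensor is lost along that ISP → video → preview chain, here is a rough back-of-the-envelope sketch. It assumes each stage takes the largest centered crop of the requested aspect ratio; the real video crop may be somewhat smaller than the full sensor width, so treat the numbers as an upper bound. The helper name `center_crop_fraction` is made up for illustration.

```python
from fractions import Fraction


def center_crop_fraction(src_aspect, dst_aspect):
    """Return (width_fraction, height_fraction) of the source that survives
    the largest centered crop with aspect ratio dst_aspect."""
    if dst_aspect >= src_aspect:
        # Target is wider than the source: keep full width, lose height.
        return Fraction(1), src_aspect / dst_aspect
    # Target is narrower than the source: keep full height, lose width.
    return dst_aspect / src_aspect, Fraction(1)


sensor = Fraction(4, 3)   # ISP output aspect ratio (4:3)
video = Fraction(16, 9)   # video stream aspect ratio (16:9)
square = Fraction(1, 1)   # square preview / NN input

w1, h1 = center_crop_fraction(sensor, video)   # ISP -> video
w2, h2 = center_crop_fraction(video, square)   # video -> square preview

kept_width = w1 * w2
kept_height = h1 * h2
kept_area = kept_width * kept_height

print(f"width kept:  {float(kept_width):.2%}")
print(f"height kept: {float(kept_height):.2%}")
print(f"area kept:   {float(kept_area):.2%}")
```

Under those assumptions a square preview sees at most 9/16 of the sensor width and 3/4 of its height, which matches the "smaller square inside the full frame" description.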

The catch is that a 4:3 sensor image cannot become a 1:1 NN input without some compromise. You either crop, stretch, or letterbox. If you want to keep the full FOV without distortion, letterboxing is the right option. DepthAI supports that, but it is done through ImageManip rather than setPreviewKeepAspectRatio(False).
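
If you letterbox, detections come back in the padded NN frame and need to be mapped back to the sensor frame to keep preview and inference consistent. Below is a minimal plain-Python sketch of that geometry. It assumes the padding is split evenly on both sides, which may not match exactly how a given resize implementation pads, so verify against real frames. The function names are hypothetical helpers, not DepthAI API.

```python
def letterbox_geometry(src_w, src_h, dst_w, dst_h):
    """Return (new_w, new_h, pad_x, pad_y) for a uniform resize of the
    source into the destination with centered padding."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = (dst_w - new_w) // 2
    pad_y = (dst_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


def unletterbox_point(x_nn, y_nn, src_w, src_h, dst_w, dst_h):
    """Map a normalized (0..1) point in the letterboxed NN frame back to
    normalized coordinates in the original source frame."""
    new_w, new_h, pad_x, pad_y = letterbox_geometry(src_w, src_h, dst_w, dst_h)
    x = (x_nn * dst_w - pad_x) / new_w
    y = (y_nn * dst_h - pad_y) / new_h
    # Clamp points that landed in the padding bars.
    return min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0)


# Example: a 4:3 frame (4000x3000 chosen for round numbers) into 640x640.
geometry = letterbox_geometry(4000, 3000, 640, 640)
center = unletterbox_point(0.5, 0.5, 4000, 3000, 640, 640)
print(geometry, center)
```

For a 4:3 source and a 640x640 input this gives a 640x480 image with 80-pixel bars above and below, so only the vertical coordinate needs rescaling.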

Relevant docs: Resolution Techniques for NNs, Camera node, ImageManip.
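
For reference, a DepthAI v2 pipeline along the lines described above (ISP output, letterboxed by ImageManip, then into the NN) might look roughly like the sketch below. This is untested against your exact setup: `build_letterbox_pipeline` and its parameters are illustrative names, the BGR888p frame type is an assumption about what the model expects, and which sensor resolutions give a 4:3 ISP frame depends on the sensor, so check the Camera node docs for your device.

```python
NN_SIZE = (640, 640)  # NN input size used in this thread


def build_letterbox_pipeline(blob_path, resolution=None, nn_size=NN_SIZE):
    """Sketch: ColorCamera.isp -> ImageManip (letterbox) -> NeuralNetwork."""
    import depthai as dai  # imported lazily; needs the DepthAI v2 package

    width, height = nn_size
    pipeline = dai.Pipeline()

    cam = pipeline.create(dai.node.ColorCamera)
    if resolution is not None:
        # Assumption: the caller picks a sensor mode whose ISP output is 4:3.
        cam.setResolution(resolution)

    manip = pipeline.create(dai.node.ImageManip)
    # Letterbox: scale uniformly and pad the remainder instead of cropping.
    manip.initialConfig.setResizeThumbnail(width, height)
    # Assumption: the model wants planar BGR; change this if it does not.
    manip.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p)
    manip.setMaxOutputFrameSize(width * height * 3)

    nn = pipeline.create(dai.node.NeuralNetwork)
    nn.setBlobPath(blob_path)

    cam.isp.link(manip.inputImage)  # full ISP frame, not preview/video
    manip.out.link(nn.input)
    return pipeline
```

You would still need to add XLinkOut nodes for the NN results (and for `manip.out` if you want to display exactly what the network sees). Large ISP frames can be heavy for ImageManip, so downscaling the ISP output first may be necessary.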

I do, however, suggest moving to DepthAI v3, since it is newer and better documented.

Thanks,
Oskar