Neural Stereo Depth Estimation with LENS

Janez

High-Accuracy, On-Device Depth from Learned Stereo Matching

Depth perception is a core capability for spatial AI. At Luxonis, Neural Stereo complements traditional hand-crafted stereo algorithms with a learned approach that delivers higher accuracy, better robustness, and superior visual quality, all fully on-device.

At the center of this approach is LENS (Luxonis Edge Neural Stereo), our proprietary neural stereo architecture optimized for edge deployment.

Unlike many neural depth approaches that depend on cloud inference or high-power GPUs, LENS is purpose-built for OAK4, enabling real-time neural depth directly on the edge, with a single affordable device.

You can quickly review the quality of neural stereo depth clouds on our OAK4 landing page.

What Is Neural Stereo?

Neural Stereo uses a deep neural network (DNN) to learn feature correspondence between a rectified stereo image pair. Instead of relying on manually tuned heuristics, the model learns complex matching behavior directly from data, which enables strong performance in challenging real-world environments and improves the output disparity. This allows Neural Stereo to handle edge cases that consistently break classical pipelines without sacrificing determinism or physical grounding.

The LENS Architecture

LENS is a Luxonis-developed neural stereo matching model that applies state-of-the-art techniques while remaining tightly grounded in stereo geometry. Among edge-deployable neural stereo solutions, LENS leads in accuracy while running entirely on-device using onboard AI compute.

Inputs

Rectified left image
Rectified right image

Outputs

Subpixel disparity map
Confidence map
Edge detection map
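
For readers new to stereo, the sketch below shows how a disparity map such as the one LENS outputs relates to metric depth through the standard pinhole stereo relation Z = f * B / d. It is a minimal NumPy illustration, not DepthAI API code; the function name, focal length, and baseline are hypothetical values chosen for the example, and real values would come from the device calibration.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert disparity (pixels) to depth (meters) using Z = f * B / d."""
    disparity_px = np.asarray(disparity_px, dtype=np.float32)
    depth_m = np.zeros_like(disparity_px)
    valid = disparity_px > 0  # zero disparity is treated as "no depth" here
    depth_m[valid] = focal_length_px * baseline_m / disparity_px[valid]
    return depth_m

# Hypothetical calibration values, for illustration only.
FOCAL_LENGTH_PX = 800.0
BASELINE_M = 0.075

# A tiny 2x2 "disparity map" with subpixel values.
disparity = np.array([[60.0, 30.0], [7.5, 0.0]], dtype=np.float32)
depth = disparity_to_depth(disparity, FOCAL_LENGTH_PX, BASELINE_M)
print(depth)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs far more depth accuracy at long range than up close, which is one reason subpixel disparity matters.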

Training

The model is trained on 300,000+ stereo image pairs with ground-truth disparity, including extensive data captured from OAK cameras, ensuring strong alignment with deployed use cases.

Resolution Variants

LENS is currently available in four variants, ranging from low-resolution, high-speed models to full-resolution, high-accuracy models. A fifth variant, the 3XL, will be added shortly. This flexibility allows users to choose the best tradeoff between accuracy, latency, and compute.

Strengths

Superior visual quality with smooth, low-artifact depth maps
Excellent object separation, ideal for tracking and interaction
High fill rate, minimizing holes in the depth map
Low overall error across standard benchmarks
Passive stereo support, effective without active illumination
Superior performance in challenging environments like reflective surfaces, low-texture scenes, glare, and low light

Limitations

Reduced detail on small objects when using smaller variants
Overfilling of distant regions, such as sky areas
Higher latency and lower FPS for larger models:
- ~100 ms for LENS Large
- 660 ms+ for 3XL full-resolution
Lower output resolution at high frame rates (240p–360p)
High AI compute usage for M, L, and 3XL variants, impacting parallel workloads

LENS vs. The Field

To understand the value of LENS, it is critical to compare it against the two existing standards in the industry: traditional algorithmic stereo and other neural methods.

1. LENS vs. Traditional Stereo (SGBM)

Traditional algorithms, such as Semi-Global Matching (SGBM), have long been the industry standard for edge devices because they are computationally cheap. However, they match pixels based purely on local brightness patterns.

The Traditional Flaw: Because they rely on local texture, traditional algorithms fail catastrophically on textureless surfaces (white walls), repetitive patterns (fences), or reflective surfaces (shiny floors).
The LENS Advantage: LENS utilizes learned semantic context. It understands that a white wall is a continuous surface even if it lacks texture, allowing it to fill in gaps that leave traditional algorithms blind.
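
The texture dependence described above can be illustrated with a deliberately simplified local matcher. The sketch below is not SGBM (which adds semi-global smoothness aggregation) and not LENS; it is a plain sum-of-absolute-differences (SAD) search along one scanline, with hypothetical function names and synthetic data, showing that a textured row yields a unique best match while a uniform row gives every candidate disparity the same cost.

```python
import numpy as np

def sad_costs(left_row, right_row, x, half_window, max_disparity):
    """SAD cost of matching the left patch at column x against each candidate disparity."""
    patch = left_row[x - half_window : x + half_window + 1]
    costs = []
    for d in range(max_disparity + 1):
        xr = x - d  # a left-image point at column x appears at x - d in the right image
        candidate = right_row[xr - half_window : xr + half_window + 1]
        costs.append(np.abs(patch - candidate).sum())
    return np.array(costs)

rng = np.random.default_rng(0)
true_disparity = 4

# Textured scanline: random intensities, shifted by the true disparity.
textured_right = rng.uniform(0, 255, size=64)
textured_left = np.roll(textured_right, true_disparity)

# Textureless scanline: a uniformly bright wall.
flat_right = np.full(64, 200.0)
flat_left = np.roll(flat_right, true_disparity)

textured = sad_costs(textured_left, textured_right, x=32, half_window=3, max_disparity=8)
flat = sad_costs(flat_left, flat_right, x=32, half_window=3, max_disparity=8)

print("textured best disparity:", int(textured.argmin()))
print("flat cost spread:", flat.max() - flat.min())
```

On the textured row the cost curve has a single zero at the true disparity. On the flat row all costs tie, so a purely local matcher has no basis for choosing one disparity over another.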

2. LENS vs. Other Neural Methods

On the other end of the spectrum are other neural networks (e.g., Retinify, ESS, BANet). While often accurate, they usually suffer from a "resource tax."

The Competitor Flaw: Many competing neural stereo models are either too heavy (requiring desktop-class GPUs) or, if optimized for mobile, suffer significant accuracy drops.
The LENS Advantage: LENS hits the "sweet spot." It is the most performant model for its size, designed specifically to run fully on-device without needing external compute. As shown in our benchmarks, LENS matches or beats models running on unrestricted hardware (FP32) while running efficiently on the OAK edge accelerator.

Given identical stereo inputs, LENS always produces the same output. LENS consistently outperforms classical algorithms and competing neural stereo methods on benchmarks like KITTI and Middlebury, while reducing artifacts in low-texture and difficult regions.

Ideal Applications

Best overall quality: Ideal as the standard setting when robustness and overall accuracy are the priority.
Tracking people/hands: Well-suited for applications involving people and hands.
Challenging environments: Where low texture, reflections, and other adverse conditions for stereo depth are present, e.g. garages or warehouses.
Object dimensioning: Effective for depth estimation of sizable objects, such as boxes.

Configuration Options

Confidence threshold: Filters low-confidence pixels for cleaner depth
Edge threshold: Improves boundary sharpness and segmentation

Take a look at the Neural Depth documentation page to learn how to use neural depth and tweak configuration options.
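
To make the effect of a confidence threshold concrete, here is a minimal NumPy sketch of the general idea. It is not the NeuralDepth node's implementation or API: the function name, the 0 to 1 confidence scale, and the convention that higher values mean more reliable pixels are all assumptions made for illustration.

```python
import numpy as np

def filter_by_confidence(disparity, confidence, threshold):
    """Zero out disparity pixels whose confidence falls below the threshold.

    Assumes higher confidence means a more reliable pixel (hypothetical convention).
    """
    filtered = disparity.astype(np.float32).copy()
    filtered[confidence < threshold] = 0.0  # 0 marks an invalid pixel in this sketch
    return filtered

disparity = np.array([[12.5, 40.0], [3.25, 18.0]], dtype=np.float32)
confidence = np.array([[0.9, 0.2], [0.6, 0.95]], dtype=np.float32)

filtered = filter_by_confidence(disparity, confidence, threshold=0.5)
fill_rate = np.count_nonzero(filtered) / filtered.size

print(filtered)
print("fill rate:", fill_rate)
```

Raising the threshold trades fill rate for cleanliness: more unreliable pixels are removed, but more holes appear in the depth map.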

Robustness, Determinism, and Explainability

While DNNs are often criticized as “black boxes,” Neural Stereo is fundamentally constrained by stereo geometry, which significantly reduces typical neural risks.

Stereo Matching Constraint

Depth is only produced when valid correspondence exists between left and right images.

No hallucination: Without real disparity, the model cannot invent depth
Traceability: Predictions are tied to local feature similarity and cost aggregation
Confidence estimation: Learned aleatoric uncertainty highlights unreliable regions

Adversarial Robustness

Unlike monocular depth models, Neural Stereo cannot be fooled by flat images or perspective illusions. Without real stereo disparity, matching costs remain high, preventing false 3D reconstruction.

Luxonis Edge Neural Stereo is resistant to hallucinations and adversarial data compared to monocular depth estimation models

Geometry-First Learning

LENS focuses on learning:

Robust feature correspondences
Effective cost aggregation
Accurate disparity refinement

It avoids heavy reliance on high-level scene priors, making its behavior more deterministic, verifiable, and physically grounded.

Developer Quick Start: Running Neural Stereo with LENS

Neural Stereo is available now in the latest versions of OAK Viewer. Connect your OAK 4 via your network and try it out by selecting 'Neural 3D' in the left side menu. Developers can also easily try Neural Stereo depth estimation using depthai-core and the provided Python examples. The following steps walk through setting up a local environment and running a basic Neural Depth example on an OAK device.

Prerequisites

Python 3.8+
An OAK 4 D camera connected to your system
macOS or Linux (Windows users can adapt the steps accordingly)

Setup and Run

Clone the DepthAI core repository and set up a virtual environment:

git clone https://github.com/luxonis/depthai-core
cd depthai-core
python3 -m venv venv # macOS / Linux
source venv/bin/activate # macOS / Linux
python examples/python/install_requirements.py

Once dependencies are installed, run the Neural Stereo depth example:

python examples/python/NeuralDepth/neural_depth.py

There are multiple examples at your disposal:

Neural Depth Minimal - Minimal example showing basic NeuralDepth usage with disparity output visualization.
Neural Depth - Demonstrates the NeuralDepth node with runtime configuration of confidence threshold, edge threshold, and temporal filtering.
Neural Depth RGBD - Combines NeuralDepth with the RGBD node to generate a point cloud, viewable via remote connection.
Neural Depth Align - Demonstrates aligning NeuralDepth output to an RGB camera using the ImageAlign node.

Make sure to explore the NeuralDepth documentation pages for a deep dive into details.

Conclusion

Neural Stereo with LENS combines deep learning with strict geometric constraints to deliver exceptional depth quality on edge devices. For applications where accuracy, robustness, and reliability matter most, LENS sets the benchmark.

NickDeboar

Any plans to support this on Oak D Series 2 cameras? Even if it's just running locally on a PC, and not the camera itself?

Janez

NickDeboar
At this time there are no concrete plans to support Neural Depth on OAK-D Series 2. It may be possible to run it locally on a PC, but this would likely require an NDA or separate licensing, and a clearer understanding of the use case. If you are interested, you can reach out via email (support@ or sales@ luxonis.com) to discuss further.