High-Accuracy, On-Device Depth from Learned Stereo Matching
Depth perception is a core capability for spatial AI. At Luxonis, Neural Stereo complements traditional hand-crafted stereo algorithms with a learned approach that delivers higher accuracy, better robustness, and superior visual quality, all fully on-device.
At the center of this approach is LENS (Luxonis Edge Neural Stereo), our proprietary neural stereo architecture optimized for edge deployment.
Unlike many neural depth approaches that depend on cloud inference or high-power GPUs, LENS is purpose-built for OAK4, enabling real-time neural depth directly on the edge, with a single affordable device.
You can quickly review the quality of neural stereo depth clouds on our OAK4 landing page.

What Is Neural Stereo?
Neural Stereo uses a deep neural network (DNN) to learn feature correspondence between the two images of a rectified stereo pair. Instead of relying on manually tuned heuristics, the model learns complex matching behavior directly from data, which improves the output disparity and enables strong performance in challenging real-world environments. This allows Neural Stereo to handle edge cases that consistently break classical pipelines without sacrificing determinism or physical grounding.
The LENS Architecture
LENS is a Luxonis-developed neural stereo matching model that applies state-of-the-art techniques while remaining tightly grounded in stereo geometry. Among edge-deployable neural stereo solutions, LENS leads in accuracy while running entirely on-device using onboard AI compute.
Inputs
- Rectified left image
- Rectified right image
Outputs
- Subpixel disparity map
- Confidence map
- Edge detection map
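
The subpixel disparity output can be turned into metric depth with the standard rectified-stereo relation: depth = focal length × baseline / disparity. The sketch below is illustrative only. The function name and the focal length and baseline values are assumptions made for this example, not OAK4 calibration data; on a real device these values would come from the camera calibration.

```python
def disparity_to_depth_m(disparity_px, focal_px, baseline_m):
    """Convert a disparity in pixels to depth in metres for a rectified pair.

    Returns None for non-positive disparity, which carries no depth information.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


# Hypothetical calibration values, chosen only for illustration.
FOCAL_PX = 800.0
BASELINE_M = 0.075

for disparity in (10.0, 10.25, 40.0):
    depth = disparity_to_depth_m(disparity, FOCAL_PX, BASELINE_M)
    print(f"disparity {disparity:6.2f} px -> depth {depth:.3f} m")
```

Because depth is inversely proportional to disparity, a quarter-pixel change at 10 px shifts the estimate by roughly 15 cm with these values, while the same change at 40 px shifts it by under 1 cm. This is why subpixel precision matters most for distant objects.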


Training
The model is trained on 300,000+ stereo image pairs with ground-truth disparity, including extensive data captured from OAK cameras, ensuring strong alignment with deployed use cases.
Resolution Variants
LENS is currently available in four variants, with a fifth, the 3XL, being added shortly. The variants range from low-resolution, high-speed models to full-resolution, high-accuracy models, so users can choose the best tradeoff between accuracy, latency, and compute.
Strengths
- Superior visual quality with smooth, low-artifact depth maps
- Excellent object separation, ideal for tracking and interaction
- High fill rate, minimizing holes in the depth map
- Low overall error across standard benchmarks
- Passive stereo support, effective without active illumination
- Superior performance in challenging environments like reflective surfaces, low-texture scenes, glare, and low light

Limitations
- Reduced detail on small objects when using smaller variants
- Overfilling of distant regions, such as sky areas
- Higher latency and lower FPS for larger models
- Lower output resolution at high frame rates (240p–360p)
- High AI compute usage for M, L, and 3XL variants, impacting parallel workloads
LENS vs. The Field
To understand the value of LENS, it is critical to compare it against the two existing standards in the industry: traditional algorithmic stereo and other neural methods.
1. LENS vs. Traditional Stereo (SGBM)
Traditional algorithms, such as Semi-Global Matching (SGBM), have long been the industry standard for edge devices because they are computationally cheap. However, they match pixels based purely on local brightness patterns.
- The Traditional Flaw: Because they rely on local texture, traditional algorithms fail catastrophically on textureless surfaces (white walls), repetitive patterns (fences), or reflective surfaces (shiny floors).
- The LENS Advantage: LENS utilizes learned semantic context. It understands that a white wall is a continuous surface even if it lacks texture, allowing it to fill in gaps that leave traditional algorithms blind.
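
To make the texture problem concrete, here is a minimal sketch of the local matching cost that classical pipelines build on. It is not SGBM itself (which adds smoothness penalties aggregated along several paths) and it is not how LENS works; it is a plain sum-of-absolute-differences (SAD) search along one scanline, with hypothetical function names and made-up pixel values.

```python
def sad_costs(left, right, x, half_window, max_disparity):
    """SAD cost of matching the left patch centred at x against the right
    scanline, for every candidate disparity from 0 to max_disparity."""
    costs = []
    for d in range(max_disparity + 1):
        cost = 0
        for k in range(-half_window, half_window + 1):
            # A point at column x in the left image appears at x - d in the right.
            cost += abs(left[x + k] - right[x + k - d])
        costs.append(cost)
    return costs


def best_disparities(costs):
    """Return every disparity that achieves the minimum cost."""
    lowest = min(costs)
    return [d for d, c in enumerate(costs) if c == lowest]


# Textured scanline: the left view is the right view shifted by 2 pixels.
textured_right = [10, 50, 20, 90, 30, 70, 40, 80, 60, 15, 25, 35]
textured_left = [0, 0] + textured_right[:-2]

# Textureless scanline (e.g. a white wall): every pixel looks the same.
flat_right = [128] * 12
flat_left = [128] * 12

print(best_disparities(sad_costs(textured_left, textured_right, 6, 1, 3)))
print(best_disparities(sad_costs(flat_left, flat_right, 6, 1, 3)))
```

On the textured scanline the cost has a single minimum at the true disparity of 2. On the flat scanline every candidate ties at zero cost, so brightness alone cannot pick a winner. That ambiguity is what a learned model has to resolve from wider context.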
2. LENS vs. Other Neural Methods
On the other end of the spectrum are other neural networks (e.g., Retinify, ESS, BANet). While often accurate, they usually suffer from a "resource tax."
- The Competitor Flaw: Many competing neural stereo models are either too heavy (requiring desktop-class GPUs) or, if optimized for mobile, suffer significant accuracy drops.
- The LENS Advantage: LENS hits the "sweet spot." It is the most performant model for its size, designed specifically to run fully on-device without needing external compute. As shown in our benchmarks, LENS matches or beats models running on unrestricted hardware (FP32) while running efficiently on the OAK edge accelerator.
Given identical stereo inputs, LENS always produces the same output. LENS consistently outperforms classical algorithms and competing neural stereo methods on benchmarks like KITTI and Middlebury, while reducing artifacts in low-texture and difficult regions.

Ideal Applications
- Best overall quality: Ideal as the standard setting when robustness and overall accuracy are the priority.
- Tracking people/hands: Well-suited for applications involving people and hands.
- Challenging environments: Suited to scenes with low texture, reflections, and other adverse conditions for stereo depth, e.g. garages or warehouses.
- Object dimensioning: Effective for depth estimation of sizable objects, such as boxes.


Configuration Options
Take a look at the Neural Depth documentation page to learn how to use neural depth and tweak configuration options.
Robustness, Determinism, and Explainability
While DNNs are often criticized as “black boxes,” Neural Stereo is fundamentally constrained by stereo geometry, which significantly reduces typical neural risks.
Stereo Matching Constraint
Depth is only produced when valid correspondence exists between left and right images.
- No hallucination: Without real disparity, the model cannot invent depth
- Traceability: Predictions are tied to local feature similarity and cost aggregation
- Confidence estimation: Learned aleatoric uncertainty highlights unreliable regions
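
A confidence map is typically used to mask out unreliable disparity values before they reach downstream logic. The sketch below shows the idea on tiny hand-written arrays. It assumes confidence values in [0, 1] where higher means more reliable, and uses 0.0 as the "invalid" marker; both are assumptions for illustration, and the convention on the device may differ. The function names are hypothetical.

```python
def filter_by_confidence(disparity, confidence, threshold, invalid=0.0):
    """Keep disparity values whose confidence meets the threshold;
    replace the rest with the invalid marker."""
    return [
        [d if c >= threshold else invalid for d, c in zip(d_row, c_row)]
        for d_row, c_row in zip(disparity, confidence)
    ]


def fill_rate(disparity, invalid=0.0):
    """Fraction of pixels that still hold a valid disparity."""
    values = [d for row in disparity for d in row]
    return sum(1 for d in values if d != invalid) / len(values)


# Made-up 2x2 example: one low-confidence pixel in the bottom-left corner.
disparity = [[12.5, 12.25], [3.0, 40.0]]
confidence = [[0.9, 0.8], [0.2, 0.95]]

filtered = filter_by_confidence(disparity, confidence, threshold=0.5)
print(filtered)
print(fill_rate(filtered))
```

Raising the threshold trades fill rate for reliability: fewer pixels survive, but the ones that do are the ones the model is more certain about.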
Adversarial Robustness
Unlike monocular depth models, Neural Stereo cannot be fooled by flat images or perspective illusions. Without real stereo disparity, matching costs remain high, preventing false 3D reconstruction.
Luxonis Edge Neural Stereo is resistant to hallucinations and adversarial data compared to monocular depth estimation models.

Geometry-First Learning
LENS focuses on learning:
- Robust feature correspondences
- Effective cost aggregation
- Accurate disparity refinement
It avoids heavy reliance on high-level scene priors, making its behavior more deterministic, verifiable, and physically grounded.
Developer Quick Start: Running Neural Stereo with LENS
Neural Stereo is available now in the latest versions of OAK Viewer. Connect your OAK4 over your network and try it out by selecting 'Neural 3D' in the left side menu. Developers can also try Neural Stereo depth estimation using depthai-core and the provided Python examples. The following steps walk through setting up a local environment and running a basic Neural Depth example on an OAK device.
Prerequisites
Setup and Run
Clone the DepthAI core repository and set up a virtual environment:
git clone https://github.com/luxonis/depthai-core
cd depthai-core
python3 -m venv venv # macOS / Linux
source venv/bin/activate # macOS / Linux
python examples/python/install_requirements.py
Once dependencies are installed, run the Neural Stereo depth example:
python examples/python/NeuralDepth/neural_depth.py
There are multiple examples at your disposal:
- Neural Depth Minimal - Minimal example showing basic NeuralDepth usage with disparity output visualization.
- Neural Depth - Demonstrates the NeuralDepth node with runtime configuration of confidence threshold, edge threshold, and temporal filtering.
- Neural Depth RGBD - Combines NeuralDepth with the RGBD node to generate a point cloud, viewable via remote connection.
- Neural Depth Align - Demonstrates aligning NeuralDepth output to an RGB camera using the ImageAlign node.
Make sure to explore the NeuralDepth documentation pages for a deep dive into details.
Conclusion
Neural Stereo with LENS combines deep learning with strict geometric constraints to deliver exceptional depth quality on edge devices. For applications where accuracy, robustness, and reliability matter most, LENS sets the benchmark.