Hi everyone, I am a technician working for a public transport bus company in Greece (Astiko KTEL), and we are developing an automated passenger counting system for our fleet. We have the hardware set up and working, but we need some guidance on the best AI model and pipeline logic for our specific environment. **Our Hardware & OS Setup:** * **Camera:** OAK-D-Lite * **Host:** Raspberry Pi 4 (running standard Debian 12 / Bookworm, not the Luxonis pre-built image). We installed the `depthai` library and dependencies manually and everything communicates perfectly. * **Power Supply:** A robust DC-DC Step-Down Converter (25W, 8-90V to 5V 5A Type-C) connected directly to the bus battery, so we have completely eliminated any voltage drop/brownout issues during AI inference spikes. * **Display:** The RPi communicates via I2C with an LCD 2004 screen on the dashboard to show real-time counts to the driver. **The Environment & Placement:** * The OAK-D-Lite will be mounted directly above the bus doors, facing straight down towards the floor. * The mounting height is exactly **2.10 meters**. * Because of the top-down angle, the camera will primarily see the **top of the head and the shoulders** of the passengers entering or exiting. It will rarely see full faces or full bodies. **Our Goal:** We want to count passengers crossing a virtual line (In / Out). **What we have tried so far (Desk Testing):** 1. `person-detection-retail-0013`: Struggled at close range because it seems to look for full-body silhouettes. 2. `face-detection-retail-0004`: Works flawlessly and locks on instantly, but since passengers won't be looking up at the camera in the bus, we are worried it might miss them if it only sees the top of their hair/heads. 3. `person-detection-asl-0001`: Gave us a lot of false positives (detecting random objects as people) during our indoor tests, though we haven't tested it in the actual bus yet. 4. We recently looked into the DepthAI SDK (Line Counter node), which looks very promising. **Our Questions:** 1. What is the recommended pre-compiled model (blob) specifically for **top-down head/shoulder tracking** from a 2.10m height? Is there a dedicated "head detection" model that works better than the retail person detection for this angle? 2. Would you recommend using the raw DepthAI API with `ObjectTracker` (e.g., ZERO_TERM_COLOR_HISTOGRAM or SHORT_TERM_IMAGELESS), or is it more robust to use the newer **DepthAI SDK** with the `line_counter` feature for a bumpy bus environment? 3. Are there any specific threshold configurations (Confidence, Tracker settings) you recommend to avoid false positives from the floor/shadows when the doors open? Thank you in advance for your time and guidance! Any code snippets or specific node suggestions for the top-down counting pipeline would be greatly appreciated.

Hey @"FotisGkogkas"#p33441 - I’d start with the stock SCRFD person pipeline plus tracking, not face detection. For a 2.10 m top-down bus-door view, the best first references are [people-tracker](https://github.com/luxonis/oak-examples/tree/main/neural-networks/object-tracking/people-tracker), [people-counter](https://github.com/luxonis/oak-examples/tree/main/neural-networks/counting/people-counter), and the lower-level [ObjectTracker example](https://github.com/luxonis/depthai-core/blob/main/examples/python/ObjectTracker/object_tracker.py). I would also test the depth-based approach here: [depth-people-counting](https://github.com/luxonis/oak-examples/tree/main/neural-networks/counting/depth-people-counting). cc @"KlemenSkrlj"#862 Thanks, Oskar

model/pipeline Top-Down Passenger Counting (In/Out) on Buses?

FotisGkogkas

Hi everyone,

I am a technician working for a public transport bus company in Greece (Astiko KTEL), and we are developing an automated passenger counting system for our fleet. We have the hardware set up and working, but we need some guidance on the best AI model and pipeline logic for our specific environment.

Our Hardware & OS Setup:

Camera: OAK-D-Lite
Host: Raspberry Pi 4 (running standard Debian 12 / Bookworm, not the Luxonis pre-built image). We installed the depthai library and dependencies manually and everything communicates perfectly.
Power Supply: A robust DC-DC Step-Down Converter (25W, 8-90V to 5V 5A Type-C) connected directly to the bus battery, so we have completely eliminated any voltage drop/brownout issues during AI inference spikes.
Display: The RPi communicates via I2C with an LCD 2004 screen on the dashboard to show real-time counts to the driver.

The Environment & Placement:

The OAK-D-Lite will be mounted directly above the bus doors, facing straight down towards the floor.
The mounting height is exactly 2.10 meters.
Because of the top-down angle, the camera will primarily see the top of the head and the shoulders of the passengers entering or exiting. It will rarely see full faces or full bodies.

Our Goal: We want to count passengers crossing a virtual line (In / Out).

What we have tried so far (Desk Testing):

person-detection-retail-0013: Struggled at close range because it seems to look for full-body silhouettes.
face-detection-retail-0004: Works flawlessly and locks on instantly, but since passengers won't be looking up at the camera in the bus, we are worried it might miss them if it only sees the top of their hair/heads.
person-detection-asl-0001: Gave us a lot of false positives (detecting random objects as people) during our indoor tests, though we haven't tested it in the actual bus yet.
We recently looked into the DepthAI SDK (Line Counter node), which looks very promising.

Our Questions:

What is the recommended pre-compiled model (blob) specifically for top-down head/shoulder tracking from a 2.10m height? Is there a dedicated "head detection" model that works better than the retail person detection for this angle?
Would you recommend using the raw DepthAI API with ObjectTracker (e.g., ZERO_TERM_COLOR_HISTOGRAM or SHORT_TERM_IMAGELESS), or is it more robust to use the newer DepthAI SDK with the line_counter feature for a bumpy bus environment?
Are there any specific threshold configurations (Confidence, Tracker settings) you recommend to avoid false positives from the floor/shadows when the doors open?

Thank you in advance for your time and guidance! Any code snippets or specific node suggestions for the top-down counting pipeline would be greatly appreciated.

OskarSonc

Hey FotisGkogkas - I’d start with the stock SCRFD person pipeline plus tracking, not face detection. For a 2.10 m top-down bus-door view, the best first references are people-tracker, people-counter, and the lower-level ObjectTracker example.

I would also test the depth-based approach here: depth-people-counting.

cc @KlemenSkrlj
Thanks,
Oskar

FotisGkogkas

OskarSonc

Device: OAK-D-Lite-AF

Mounting: top-down above bus door

Camera height: 2.05 m from the floor

Environment: real bus entrance/exit, with poles/door structures near ROI, fast passenger flow, occasional simultaneous crossings

Goal: robust IN/OUT counting for production bus use

What we tested today (based on your advice)

We switched to depth-based counting (line crossing) instead of relying on RGB-only foreground.

We tested multiple depth windows (near/far thresholds), ROI widths/heights, and line-crossing parameters.

We tested:

pure depth mask contouring

depth mask + background subtraction (MOG2 on depth mask) to remove static structures

We tuned debounce/duplicate suppression, track matching distance, and track persistence.

We verified service-side logs and added runtime snapshots to inspect what the camera sees in real operation.

Current main issue

Even when the camera clearly sees the scene, we still get unstable behavior:

missed OUT events

occasional duplicate IN events

unstable blob behavior depending on thresholds/timing

some “OUT ignored floor@0” events

In the latest runs, the system counts some events, but still has significant misses and duplicates (not reliable enough for production).

Problem summary (critical detail):

The repeated false IN/OUT events are occurring even when only ONE person is moving through the door area (me, during testing).

So the issue is not caused by multiple passengers at once in these test runs.

A single-person crossing still produces duplicate/incorrect event sequences.

What we need from you

Could you please help with one of the following:

Recommended official baseline algorithm/config for top-down bus-door counting at ~2.05 m height (with poles/door edges in view), or

Point out likely mistakes in our approach and parameters, with concrete corrections, especially for:

depth range selection

ROI shape/placement

line-crossing strategy

tracker settings for fast bidirectional crossings

static structure suppression (poles/door frames)

If you have an internal/official “ready” template for this use case (bus doorway passenger flow), we would strongly prefer that.

One more critical question: Would changing the camera mounting from horizontal (landscape) to vertical (portrait) be a key move for this setup? The logic is that by aligning the longer axis of the sensor with the passenger’s path, the system would capture the movement over a longer distance, resulting in more frames per passenger for the tracker to analyze. If you think this is a superior approach, could you please provide a recommended configuration or pipeline for a vertical top-down installation so I can compare and implement it?

Please download the files to see how its the setup until now. i use raspberry pi 4 2gb

https://we.tl/t-x4hOc324VdPc4Au3

OskarSonc

FotisGkogkas
Thanks for sending the files and debug snapshot. After reviewing them, the main thing I noticed is that your current pipeline is already quite far from a stock Luxonis baseline: it is using a custom depth band -> mask -> MOG2/background subtraction -> contour blobs -> custom tracking -> line crossing stack. From the snapshot, the ROI also still contains strong static structures from the doorway/pole area, so the current instability looks more like blob/track fragmentation than a simple threshold problem.

We do not currently have an official production-ready “bus doorway passenger counter” template for this exact setup. The closest premade baseline on our side is the depth-based example here: depth-people-counting. For an RGB/tracker baseline, the closest stock examples are people-tracker, people-counter, and the lower-level ObjectTracker example.

So if you want the most stable reference point, we would suggest comparing your current custom pipeline against depth-people-counting first, with a tighter ROI and less doorway structure inside the active area. On the mounting question: portrait orientation is a reasonable thing to test, because it can give more travel distance in the direction of motion and improve line-crossing stability, but it is not a guaranteed fix by itself.