Hi,
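To get the Z coordinate (depth) of each tracked person, you need to enable stereo depth and read the depth value at the center of each detection's bounding box. Averaging a small window of pixels around the center, rather than reading a single pixel, makes the estimate more robust to noise and holes in the depth map.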
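Here is a minimal sketch of the sampling step in plain Python. It assumes the depth frame is a 2D array indexable as `frame[y][x]` (a list of lists or a NumPy array), that the bounding box is already in depth-frame pixel coordinates, and that 0 marks pixels with no valid depth. The name `depth_at_bbox_center` and its `window` parameter are illustrative and not part of the DepthAI SDK. Instead of a plain average it takes the median of the window, which holds up better when a few background pixels fall inside it.

```python
from statistics import median


def depth_at_bbox_center(depth_frame, bbox, window=5):
    """Return a robust depth estimate (same units as depth_frame) at the
    center of bbox, or None if no valid pixels are found.

    depth_frame: 2D array indexable as depth_frame[y][x]; 0 is assumed
                 to mean "no depth" for that pixel.
    bbox:        (xmin, ymin, xmax, ymax) in depth-frame pixel coordinates.
    window:      side length of the square region sampled around the center.
    """
    height = len(depth_frame)
    width = len(depth_frame[0])
    xmin, ymin, xmax, ymax = bbox
    cx = int((xmin + xmax) / 2)
    cy = int((ymin + ymax) / 2)
    half = window // 2

    samples = []
    # Clamp the sampling window to the frame so boxes near the edge still work.
    for y in range(max(0, cy - half), min(height, cy + half + 1)):
        for x in range(max(0, cx - half), min(width, cx + half + 1)):
            d = int(depth_frame[y][x])  # int() avoids uint16 overflow in median
            if d > 0:  # skip invalid (zero) depth pixels
                samples.append(d)

    if not samples:
        return None
    # Median instead of mean: stereo holes or background pixels inside the
    # window won't drag the estimate away from the person's depth.
    return median(samples)
```

If your detections come with normalized coordinates (0 to 1), multiply them by the depth frame's width and height before calling this.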
Since this example uses the SDK, the simplest approach is to register a callback (e.g. on the tracked detections), fetch the latest depth frame inside it, and sample the depth at the center of each bounding box.
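The callback flow can be sketched the same way. This does not reproduce the SDK's real callback signatures or packet classes: `Tracklet`, `TrackDepthEstimator`, `on_depth` and `on_tracklets` are hypothetical names, and you would call them from whatever callbacks the SDK gives you for the depth stream and the tracker output. For brevity it reads only the single center pixel; swap in a windowed average for more robust values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Tracklet:
    """Hypothetical stand-in for a tracked detection (not the SDK's class)."""
    track_id: int
    bbox: tuple  # (xmin, ymin, xmax, ymax) in depth-frame pixels


class TrackDepthEstimator:
    """Keeps the newest depth frame and attaches a Z value to each track."""

    def __init__(self) -> None:
        self.latest_depth: Optional[Sequence[Sequence[int]]] = None

    def on_depth(self, depth_frame) -> None:
        # Call whenever a new depth frame arrives; only the newest is kept.
        self.latest_depth = depth_frame

    def on_tracklets(self, tracklets: List[Tracklet]) -> Dict[int, Optional[int]]:
        # Call with the tracker output; returns {track_id: depth or None}.
        if self.latest_depth is None:
            return {t.track_id: None for t in tracklets}
        height = len(self.latest_depth)
        width = len(self.latest_depth[0])
        result = {}
        for t in tracklets:
            xmin, ymin, xmax, ymax = t.bbox
            # Center of the box, clamped to valid pixel indices.
            cx = min(max(int((xmin + xmax) / 2), 0), width - 1)
            cy = min(max(int((ymin + ymax) / 2), 0), height - 1)
            d = int(self.latest_depth[cy][cx])
            result[t.track_id] = d if d > 0 else None  # 0 = no depth (assumed)
        return result
```

This assumes the latest depth frame and the tracklets describe the same moment. If they can drift apart, compare timestamps or sequence numbers before pairing them.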
The DepthAI SDK supports this through its trigger/callback system; for reference, see Custom Trigger.
Hope this helps!