I'm trying to do the same thing. I have displayed the depth frame and the depth bounding boxes as shown in the provided example, but those boxes do not match the bounding boxes drawn on the stretched camera preview. What would you suggest for fixing this? One idea is to split the YoloSpatialDetectionNetwork into a YoloDetectionNetwork and a SpatialLocationCalculator, warp each bounding box output by the YoloDetectionNetwork to account for how the feed was stretched, and then feed the warped box into the SpatialLocationCalculator. Does this make sense, and are there other possible solutions?
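To make the warping idea concrete, here is a minimal sketch of the remapping step only, in plain Python with no device code. It assumes the detector returns normalized boxes `(xmin, ymin, xmax, ymax)` in the 0..1 range, and that the stretched preview covers a known normalized rectangle of the depth frame. The names `remap_bbox`, `to_pixels`, and `region` are hypothetical helpers for illustration, not part of the DepthAI API, and the example `region` values are made up.

```python
def remap_bbox(bbox, region=(0.0, 0.0, 1.0, 1.0)):
    """Map a normalized bbox from the stretched preview into the
    normalized coordinates of another frame (e.g. the depth frame).

    bbox:   (xmin, ymin, xmax, ymax), each in 0..1, from the detector.
    region: (x0, y0, x1, y1), the normalized rectangle of the target
            frame that the preview covers (assumption: known up front).
            The default means the preview covers the whole frame, in
            which case a pure stretch leaves normalized boxes unchanged.
    """
    rx0, ry0, rx1, ry1 = region
    rw, rh = rx1 - rx0, ry1 - ry0
    xmin, ymin, xmax, ymax = bbox

    def clamp(v):
        # Keep the result inside the frame.
        return min(1.0, max(0.0, v))

    return (
        clamp(rx0 + xmin * rw),
        clamp(ry0 + ymin * rh),
        clamp(rx0 + xmax * rw),
        clamp(ry0 + ymax * rh),
    )


def to_pixels(bbox, width, height):
    """Convert a normalized bbox to integer pixel coordinates for drawing."""
    xmin, ymin, xmax, ymax = bbox
    return (int(xmin * width), int(ymin * height),
            int(xmax * width), int(ymax * height))


# Example with made-up numbers: the preview covers the middle 75% of the
# depth frame horizontally and all of it vertically.
warped = remap_bbox((0.25, 0.25, 0.75, 0.75), region=(0.125, 0.0, 0.875, 1.0))
pixels = to_pixels(warped, 640, 400)
```

The remapped box would then be what gets passed on to the SpatialLocationCalculator. If the preview really is a pure stretch of the full frame, the default `region` applies and the normalized box is unchanged, which would suggest the mismatch comes from somewhere else, such as a crop or a field-of-view difference between the color and depth frames.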