I want to use an OAK-D Pro PoE and a YOLOv8/YOLOv9 instance segmentation model to detect metal cubes:
1. Detect objects using the ColorCamera.
2. Calculate world coordinates using the depthFrame from the MonoCameras.
To achieve this, I referred to the following documents.
## DepthAI Reference Documents
1. multi-cam calibration
Estimates extrinsics["cam_to_world"] using the ColorCamera intrinsic_mat at (3840, 2160).
**intrinsic_mat depends on the camera resolution.** (See the first sketch after this list.)
2. Calc spatials on host
Extracts the HFOV for dai.CameraBoardSocket.CAM_C.
Estimates [X_cam, Y_cam, Z_cam] using the depthFrame.
**HFOV is independent of the camera resolution.** (See the second sketch after this list.)
3. rgb-depth aligned
Sets the ColorCamera and MonoCameras to the same resolution, 720P; but the ColorCamera does not support 720P, so it was set to 1080P instead.
Stereo depthAlign is set to rgbCamSocket.
In the end, all camera resolutions were set to 1080P. (See the third sketch after this list.)
4. YOLO Segment & Depth | OAK-D Pro PoE
## Development Plan and Questions
1. cameraPose.py generates extrinsics["cam_to_world"] with a checkerboard.
Get and save extrinsics["cam_to_world"].
But which camera's intrinsic_mat should I use, the ColorCamera's or a MonoCamera's?
And at which resolution, 1080P or 720P?
Or should I use the stereo node's rectifiedRight or syncedRight, in the same pipeline as main.py's pipeline? (A sketch of this script follows after this list.)
2. main.py detects cubes and estimates world coordinates.
The host detects cubes using the rgbFrame.
- Then I can use model input images of any size and any resolution.
Get the postprocessed depth with stereo depthAlign set to rgbCamSocket.
But once we have obtained extrinsics["cam_to_world"] for CAM_C,
doesn't stereo depthAlign affect the extrinsics["cam_to_world"] matrix?
If I set depthAlign to CAM_A in stereoDepth, does the depth value become the Z_cam of the ColorCamera?
Estimate camera coordinates using the depthFrame, BBox, and segmentation mask.
Finally, camera coordinates are converted to world coordinates using extrinsics["cam_to_world"]. (A sketch of this conversion also follows after this list.)
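
For step 1, this is the cameraPose.py I have in mind, as a sketch; the checkerboard geometry, the CAM_A socket, and the 1080P resolution are all my assumptions, and the questions above are exactly about whether they are right:

```python
import cv2
import numpy as np
import depthai as dai

CORNERS = (9, 6)       # inner corners of the checkerboard (assumed board)
SQUARE_M = 0.025       # square edge length in meters (assumed board)

# Intrinsics must match the frames used below: assumed ColorCamera at 1080P.
with dai.Device() as device:
    calib = device.readCalibration()
    K = np.array(calib.getCameraIntrinsics(dai.CameraBoardSocket.CAM_A, 1920, 1080))

img = cv2.imread("board_1080p.png")  # hypothetical capture from the pipeline
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(gray, CORNERS)
assert found, "checkerboard not found"

# The board defines the world frame: Z=0 plane, origin at the first corner.
objp = np.zeros((CORNERS[0] * CORNERS[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CORNERS[0], 0:CORNERS[1]].T.reshape(-1, 2) * SQUARE_M

# distCoeffs=None assumes an undistorted frame; pass real coefficients otherwise.
ok, rvec, tvec = cv2.solvePnP(objp, corners, K, None)
R, _ = cv2.Rodrigues(rvec)

world_to_cam = np.eye(4)
world_to_cam[:3, :3] = R
world_to_cam[:3, 3] = tvec.ravel()
cam_to_world = np.linalg.inv(world_to_cam)
np.save("extrinsics_cam_to_world.npy", cam_to_world)
```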
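And for the final conversion in step 2, assuming [X_cam, Y_cam, Z_cam] comes from the host spatials calculation and uses the same units as the extrinsics:

```python
import numpy as np

cam_to_world = np.load("extrinsics_cam_to_world.npy")

def to_world(xyz_cam_m):
    """xyz_cam_m: [X_cam, Y_cam, Z_cam] in meters (convert from mm first)."""
    p = np.append(np.asarray(xyz_cam_m, dtype=float), 1.0)  # homogeneous point
    return (cam_to_world @ p)[:3]
```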
Are there any additional issues that need to be carefully considered in the above plan?