• DepthAI
  • Multi-camera calibration - OAK-D-Pro W faulty translation matrix

Hi there @Nejc-Luxonis ,
I'm trying to run a multi-camera calibration by launching main.py. I first validated the calibration on my OAK-D-Pro camera and confirmed a reasonable result. Later on, I tried the same procedure with an OAK-D-Pro W camera. The rotation vector looks promising, but the translation vector shows unrealistic values. See the comparison of values from both cameras, calibrated with the marker in the same position:

OAK-D-Pro:
Rotation vector:
[[-0.06005594]
[ 0.09223808]
[ 0.0277156 ]]

Translation vector:
[[-0.03522526]
[-0.04736168]
[ 0.70119316]]

OAK-D-Pro W
Rotation vector:
[[0.00374177]
[0.00809129]
[0.02615565]]

Translation vector:
[[-1.55830205]
[-0.91438703]
[ 2.036714 ]]

Any opinions on the potential reason? Thank you.


    Hi marlu , could you elaborate on why those values would be unrealistic?

    Hi @erik , because I physically measured and verified that the value shown by the OAK-D-Pro calibration is correct. But when the OAK-D-Pro W is placed at the same position, the translation vector is apparently wrong.

    @erik @Nejc-Luxonis Is this calibration example also tested with cameras facing different directions? Each camera would find its relative transformation to its own printed pattern, while the transformation between the two patterns is known.


      Hi marlu ,
      It might be that you'd need to undistort the stream; I'm not sure cv2 provides correct extrinsics if the image is warped. Regarding the second question - the current scripts don't support such a use case, but we could add it in the future. So you have 2 calibration boards at a known distance/rotation, multiple OAK cameras looking at them, and you'd want to get the extrinsics of each camera relative to the other cameras?
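      (A minimal host-side sketch of that undistortion idea, not taken from the calibration scripts: the helper name and the obj_pts/img_pts/K/dist inputs are placeholders for the detected board correspondences and the EEPROM intrinsics/distortion. The corners are undistorted first, so solvePnP works on an ideal pinhole model.)

      import cv2
      import numpy as np

      def estimate_board_pose(obj_pts, img_pts, K, dist):
          """Pose of the calibration pattern after removing lens distortion.

          obj_pts: Nx3 float32 board points, img_pts: Nx2 float32 detected corners,
          K / dist: camera intrinsic matrix and distortion coefficients.
          """
          # Undistort the detected 2D corners; P=K keeps them in pixel coordinates.
          img_pts_ud = cv2.undistortPoints(img_pts.reshape(-1, 1, 2), K, dist, P=K)
          # The corners now follow an ideal pinhole model, so pass zero distortion.
          ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts_ud, K, np.zeros(5))
          return rvec, tvec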

      @erik Correct, my calibration setup is visualized in the following scheme. If the CAM1 <> Pattern1 and CAM2 <> Pattern2 transforms are correct, I can easily compute the transformation between the cameras. I swapped the OAK-D-Pro W for a standard OAK-D-Pro, but the computed transform is still quite different from the distances I approximately measured with a measuring tool.
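      (For reference, the chaining described above can be written as a small numpy/OpenCV sketch. The function names are illustrative; T_p1_p2 stands for the physically measured transform that maps Pattern2 coordinates into Pattern1 coordinates.)

      import cv2
      import numpy as np

      def to_homogeneous(rvec, tvec):
          """Build a 4x4 transform (pattern -> camera) from an OpenCV rvec/tvec."""
          T = np.eye(4)
          T[:3, :3], _ = cv2.Rodrigues(np.asarray(rvec, dtype=float))
          T[:3, 3] = np.ravel(tvec)
          return T

      def cam2_in_cam1(rvec1, tvec1, rvec2, tvec2, T_p1_p2):
          """CAM2 pose expressed in CAM1's frame.

          rvec1/tvec1: Pattern1 pose seen from CAM1 (solvePnP output)
          rvec2/tvec2: Pattern2 pose seen from CAM2 (solvePnP output)
          T_p1_p2:     known 4x4 transform taking Pattern2 coords to Pattern1 coords
          """
          T_cam1_p1 = to_homogeneous(rvec1, tvec1)   # Pattern1 -> CAM1
          T_cam2_p2 = to_homogeneous(rvec2, tvec2)   # Pattern2 -> CAM2
          # Chain CAM1 <- Pattern1 <- Pattern2 <- CAM2
          return T_cam1_p1 @ T_p1_p2 @ np.linalg.inv(T_cam2_p2)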

      8 months later

      @marlu @erik do you have any updates on the multi-camera calibration of cameras not facing the same view?
      And did you find a way to undistort the image? I tried to use an OAK-D-W POE and am facing the same issue with wrong vectors from the calibration. (I'm using the depthai-experiments/gen2-multiple-devices examples)

      Hi @pfgnoobi
      Could you add the matrices you received from the multi-cam calibration and the intrinsic matrix/distortion coefficients from the EEPROM?
      Some visual aid and ground truth will help as well.

      Thanks,
      Jaka
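      (For anyone looking for those EEPROM values: they can be dumped on the host roughly like this; the socket and resolution below are illustrative.)

      import numpy as np
      import depthai as dai

      with dai.Device() as device:
          calib = device.readCalibration()
          # Intrinsics are returned scaled to the requested resolution.
          K = np.array(calib.getCameraIntrinsics(dai.CameraBoardSocket.CAM_A, 1920, 1080))
          dist = np.array(calib.getDistortionCoefficients(dai.CameraBoardSocket.CAM_A))
          print("Intrinsic matrix:\n", K)
          print("Distortion coefficients:\n", dist)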

        Hi @pfgnoobi ,
        We haven't pursued it yet, but likely will in the upcoming months. Regarding the undistortion - could you elaborate on the question? Do you mean wide-FOV image undistortion?

        Hi,
        so to give some more context:
        I'm trying to get the position data (floor x/y) of multiple people walking through a light installation built inside a 30' shipping container. My first idea was to just use a single OAK-D-W-POE OV9782, mounted at the middle-top facing straight down. That didn't work, as I would need to train my own person detection model (the available ones don't handle bird's-eye camera data well). So my new test setup would look like this, or maybe just 4x OAK-D, one in each corner?

        Problem with the new setup:
        Multi-cam calibration is not working, as the calibration grid placed in the middle of the container is too small to recognize. It needs to be placed in the middle so all cameras can see it.

        @erik Problem combining the OAK-D-W-POE and OAK-D-lite with depthai-experiments/gen2-multiple-devices/spatial-detection-fusion:
        The OAK-D-W is not lining up with the OAK-D-lite, see the bottom half of the screenshots; the Bird's Eye View only shows the OAK-D-lite position (the OAK-D-W position data is likely somewhere off screen). I have the feeling this is due to the 150° FOV.
        Do I need to add some wide-FOV image undistortion in the spatial-detection or calibration process? If yes, where and what would I need to add?

        *The top two pics are from the multi-cam calibration process, the rest is from running spatial-detection-fusion.

        jakaskerl
        I uploaded the files here; .json was not allowed in the forum*

        https://we.tl/t-h170S5oKo7

        Thanks for your help, I don't really know what I'm doing, just getting started with computer vision. Still a noob.
        Also, please let me know if you know of a different approach to the final person-position goal. Maybe just 2-3 webcams on top facing straight down and some sort of blob tracking?

        Hi @pfgnoobi ,
        Could you share some screenshots by just using depth cameras instead, as mentioned in the email?
        luxonis/depthai-experiments/tree/master/gen2-depth-people-counting#depth-people-counting

        Do I need to add some wide-FOV image undistortion in the spatial-detection or calibration process?

        Only if the NN requires undistorted images; in that case you should use the Camera node and its undistortion:
        https://docs.luxonis.com/projects/api/en/latest/samples/Camera/camera_undistort/#undistort-camera-stream

        The OAK-D-W is not lining up with the OAK-D-lite, see the bottom half of the screenshots

        What would be the point of the multi-cam calibration if that's the case? In the example above, it's used for spatial detection fusion: multiple cameras detect the same object (and its 3D position), and the detections are fused together for better accuracy.
        Thanks, Erik
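        (A toy sketch of that fusion idea, not the example's actual code: each camera's 3D detection is expressed in a shared world frame using that camera's extrinsic from the multi-cam calibration, then the results are averaged.)

        import numpy as np

        def fuse_detections(points_cam, extrinsics):
            """Average one object's 3D detections from several cameras.

            points_cam: list of (x, y, z) tuples, one per camera, in that camera's frame
            extrinsics: list of 4x4 camera->world transforms from the multi-cam calibration
            """
            world_points = []
            for p, T in zip(points_cam, extrinsics):
                p_h = np.array([p[0], p[1], p[2], 1.0])   # homogeneous point
                world_points.append((T @ p_h)[:3])        # express it in the world frame
            return np.mean(world_points, axis=0)          # fused position estimate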

          erik thanks, I did some bird's-eye-view depth testing with the DepthAI viewer. It seems I get visible height changes when moving within a 4.5-4.6 m wide strip of the ground area. This massive reduction in HFOV is due to the need to rectify the mono images for depth/disparity calculation, if I understood correctly.
          So the 127° HFOV (here 10 m) gets rectified to 96° (measured 5.5 m), and the usable depth HFOV is 84° (my measurement, 4.6 m). I also tested an OAK-D mounted the same way, which gave usable height changes within an area of around 70° HFOV.

          So to cover a ground area 12 m wide, I would need 3x OAK-D-W or 4x OAK-D mounted at around 2.5-2.6 m height.
          I will do some further testing and also read through the configuring-stereo-depth docs to hopefully improve the results.
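          (As a quick sanity check of those widths - a back-of-the-envelope sketch, assuming a mounting height of roughly 2.55 m: the floor width covered by a downward-facing camera is 2 * h * tan(HFOV / 2).)

          import math

          def ground_width(height_m, hfov_deg):
              """Floor width covered by a downward-facing camera: 2 * h * tan(HFOV / 2)."""
              return 2 * height_m * math.tan(math.radians(hfov_deg) / 2)

          print(round(ground_width(2.55, 84), 1))   # ~4.6 m, the usable depth width measured above
          print(round(ground_width(2.55, 70), 1))   # ~3.6 m for the narrower OAK-D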

          What are your thoughts after seeing the screenshots?
          Is a top-view depth map the way to proceed?
          Would this use case be better suited to the OAK-D-W-POE?



          3 more from depth-preview (depthai-python example)

          What would be the point of the multi-cam calibration if that's the case?

          Sorry, I meant the OAK-D-W person-tracking position is not lining up with the OAK-D-lite person-tracking position. In the Bird's Eye View only the magenta marker (OAK-D) is visible; the yellow one is somewhere outside the render area. But no need to investigate further.

          big thanks for your support 🙂

          Hi @pfgnoobi ,

          rectify the mono images for depth/disparity calculation, if I understood correctly.

          Yes, that's correct - some FOV is lost due to rectification, but you can configure that with the stereo alpha parameter; example here (--alpha): https://docs.luxonis.com/projects/api/en/latest/samples/StereoDepth/stereo_depth_video/#stereo-depth-video
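          (For reference, the same knob can also be set directly on the StereoDepth node - a minimal sketch, where 0.0 crops to valid rectified pixels only and 1.0 keeps the full original FOV with black borders:)

          import depthai as dai

          pipeline = dai.Pipeline()
          stereo = pipeline.create(dai.node.StereoDepth)

          # 0.0 = crop to valid pixels only, 1.0 = keep the whole (wide) FOV
          # at the cost of black regions and stronger distortion at the edges.
          stereo.setAlphaScaling(0.5)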

          I'd also run detection on depth frames / colorized frames (like blob detection + thresholding) instead of on the pointcloud.
          Thoughts?
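          (A rough sketch of that suggestion, with illustrative thresholds, along the lines of what the depth-people-counting example does:)

          import cv2

          def detect_people_blobs(depth_u8, min_area=760):
              """Find person-sized blobs in an 8-bit depth/disparity frame."""
              # Keep only the disparity band where people are expected.
              _, thresh = cv2.threshold(depth_u8, 20, 255, cv2.THRESH_BINARY)
              # Remove small speckles before extracting contours.
              blob = cv2.morphologyEx(thresh, cv2.MORPH_OPEN,
                                      cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (22, 22)))
              contours, _ = cv2.findContours(blob, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
              return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]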

          Hi @erik,
          thanks, I tried to implement this alpha parameter in the depth-people-counting example (which you suggested earlier).
          It worked, I got much closer to my final needs and also slowly understand a bit more.

          These are my current results, which look quite okay. I will play with some thresholding to run the detection closer to ground level (e.g. person knee level) so I get more realistic x/y coordinates. Or do you have any other ideas on improving the accuracy?

          And to double-check that I got the workings of depth-people-counting correct:

          1. The Replay pipeline loads the video recordings and sends queued-up frames from the host to the device.

          2. The device creates depth frames and sends them to the host.

          3. The host does some depth value stretching? Why do you need to use this disparity multiplier thing? Is it because different devices have different disparity resolutions?

          4. The host creates a copy of depthFrame and colors it with a colormap (rgbDepth).

          5. depthFrame gets cropped to the area of interest, and some OpenCV steps create and detect the blob.

          6. The detections and rgbDepth get sent to the device, into the ObjectTracker node.

          7. The ObjectTracker node creates a Tracklet and sends the data to the host.

          8. The host calculates the center and the moved distance (moved left/right).

            Did I read this example code correctly?


            Can I implement the OpenCV stuff from step 5 on the device, using a Script node?
            This process doesn't seem very efficient for real-time use, sending data back and forth multiple times.
            Over the weekend I will hopefully figure out how to use a live camera in the example, so maybe this back and forth is nothing to worry about.


            big thanks

          Hi @pfgnoobi ,

          On accuracy: I'd apply https://docs.luxonis.com/projects/api/en/latest/tutorials/configuring-stereo-depth/#brightness-filter and then lower the confidence threshold. You will have fewer depth points, but they will be more accurate, which can help with the noise. It's mostly trial & error here.
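          (A sketch of those two knobs on the StereoDepth node; the values are illustrative starting points, not tuned:)

          import depthai as dai

          pipeline = dai.Pipeline()
          stereo = pipeline.create(dai.node.StereoDepth)

          # Lower threshold = stricter matching: fewer disparity values survive,
          # but the remaining ones are more reliable.
          stereo.initialConfig.setConfidenceThreshold(200)

          config = stereo.initialConfig.get()
          # Drop depth computed from very dark image regions (tune per scene).
          config.postProcessing.brightnessFilter.minBrightness = 30
          stereo.initialConfig.set(config)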

          depthFrame = (depthFrame*disparityMultiplier).astype(np.uint8)

          Do you mean this? It's because disparity has values between 0..95 (standard disparity search, or 0..190 with extended, or a lot more if using subpixel mode), and you want to normalize it to 0..255 for nicer visualization.
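          (In code that normalization is just the following, mirroring the example; getMaxDisparity() already accounts for extended/subpixel modes:)

          import numpy as np

          def normalize_disparity(disparity, max_disparity):
              """Scale raw disparity (0..95 standard, 0..190 extended, more with subpixel)
              to 0..255 so it can be colorized as an 8-bit image."""
              return (disparity * (255.0 / max_disparity)).astype(np.uint8)

          # usage: normalize_disparity(frame, stereo.initialConfig.getMaxDisparity())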

          6. The detections and rgbDepth get sent to the device, into the ObjectTracker node.

          Yes, the data flow is quite hacky; we could also do the object tracking on the host, but we wanted to utilize as much computation on the OAK as possible in this example.

          And yes, the whole flow you described is correct.

          Can I implement the OpenCV stuff from step 5 on the device, using a Script node?

          Please have a look here:
          https://docs.luxonis.com/en/latest/pages/tutorials/on-device-programming/
          It would likely be possible to make a custom NN that does the CV stuff, but that would take more time. With the Script node it would be way too slow.

          Thanks, Erik

            11 days later

            Hi @erik,
            thanks a lot, also for pointing me to custom NNs - that seems quite interesting to try after I get these "basics" working.

            Over the last days I did some more testing and tried to get depth-people-counting working with a live camera stream instead of a recorded video. I have a strange error which I'm not able to fix. When I create xinFrame = pipeline.createXLinkIn() in the pipeline, the depth looks like picture one. If I comment out the two lines after #Linking (see code below), the depth looks normal. Why is this strange thing happening?
            (below is only the reduced project code)


            import cv2
            import numpy as np
            import depthai as dai
            
            device = dai.Device()
            print("Creating Stereo Depth pipeline")
            pipeline = dai.Pipeline()
            
            camLeft = pipeline.create(dai.node.MonoCamera)
            camRight = pipeline.create(dai.node.MonoCamera)
            
            camLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
            camRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
            
            ### Define stereoDepth node and create outputs
            #---
            stereo = pipeline.create(dai.node.StereoDepth)
            
            #Set StereoDepth config
            stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
            stereo.initialConfig.setMedianFilter(dai.MedianFilter.KERNEL_7x7)  # KERNEL_7x7 default, 5x5, 3x3, MEDIAN_OFF
            stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
            stereo.setLeftRightCheck(True)
            #stereo.setExtendedDisparity(extended)
            stereo.setSubpixel(True)
            
            #Alpha scaling to use 'full' FOV
            stereo.setAlphaScaling(0.2)
            config = stereo.initialConfig.get()
            config.postProcessing.brightnessFilter.minBrightness = 0
            stereo.initialConfig.set(config)
            
            xoutDepth = pipeline.create(dai.node.XLinkOut)
            xoutDepth.setStreamName("depthOut")
            
            #Define connection between nodes
            #--
            camLeft.out.link(stereo.left)
            camRight.out.link(stereo.right)
            #stereo.syncedLeft.link(xoutLeft.input)
            #stereo.syncedRight.link(xoutRight.input)
            stereo.disparity.link(xoutDepth.input)
            
            #tracking on Device
            objectTracker = pipeline.createObjectTracker()
            objectTracker.inputTrackerFrame.setBlocking(True)
            objectTracker.inputDetectionFrame.setBlocking(True)
            objectTracker.inputDetections.setBlocking(True)
            objectTracker.setDetectionLabelsToTrack([1])  # track only person
            # possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS
            objectTracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)
            # take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
            objectTracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.UNIQUE_ID)
            
            # Linking
            xinFrame = pipeline.createXLinkIn()    #------------------- adding
            xinFrame.setStreamName("frameIn")     #-------------------- this creates ugly depth
            xinFrame.out.link(objectTracker.inputDetectionFrame)
            
            cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
            cvColorMap[0] = [0, 0, 0]
            print("Creating DepthAI device")
            
            with device:
                device.startPipeline(pipeline)
                q = device.getOutputQueue(name="depthOut", maxSize=4, blocking=False)
            
                while True:
            
                    name = q.getName()
                    depthFrame = q.get().getCvFrame()
            
                    depthRgb = getDisparityFrame(depthFrame, cvColorMap)
            
                    cv2.imshow(name, depthRgb)
                    if cv2.waitKey(1) == ord("q"):
                        break

            Hi @pfgnoobi
            Not able to reproduce the issue. The code you have sent does not utilize the object tracker and doesn't have getDisparityFrame defined.

            You are linking an input to the ObjectTracker, which should not have an effect on the disparity output.

            Thanks,
            Jaka

            Hi @jakaskerl,
            sorry, here is the full code. This code results in a strange depth map.
            If you also comment out the Linking part, everything looks fine.
            All the other commented-out parts are just there to isolate the troublemaking code. Thanks for your help.

            import cv2
            import numpy as np
            import depthai as dai
            
            DETECTION_ROI = (20,130,650,240) # x,y,w,h 250,130,450,240
            THRESH_DIST_DELTA = 0.2 # threshold: minimum distance moved to get counted
            
            def getDisparityFrame(frame, cvColorMap):
                maxDisp = stereo.initialConfig.getMaxDisparity()
                disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
                disp = cv2.applyColorMap(disp, cvColorMap)
            
                return disp
            
            class TextHelper:
                def __init__(self) -> None:
                    self.bg_color = (0, 0, 0)
                    self.color = (255, 255, 255)
                    self.text_type = cv2.FONT_HERSHEY_SIMPLEX
                    self.line_type = cv2.LINE_AA
                def putText(self, frame, text, coords):
                    cv2.putText(frame, text, coords, self.text_type, 1.3, self.bg_color, 5, self.line_type)
                    cv2.putText(frame, text, coords, self.text_type, 1.3, self.color, 2, self.line_type)
                    return frame
                def rectangle(self, frame, topLeft,bottomRight, size=1.):
                    cv2.rectangle(frame, topLeft, bottomRight, self.bg_color, int(size*4))
                    cv2.rectangle(frame, topLeft, bottomRight, self.color, int(size))
                    return frame
            
            def to_planar(arr: np.ndarray) -> list:
                return arr.transpose(2, 0, 1).flatten()
            
            
            class PeopleCounter:
                def __init__(self):
                    self.tracking = {}
                    self.lost_cnt = {}
                    self.people_counter = [0,0,0,0] # Up, Down, Left, Right
            
                def __str__(self) -> str:
                    return f"Left: {self.people_counter[2]}, Right: {self.people_counter[3]}"
            
                def tracklet_removed(self, coords1, coords2):
                    deltaX = coords2[0] - coords1[0]
                    print('Delta X', deltaX)
            
                    if THRESH_DIST_DELTA < abs(deltaX):
                        self.people_counter[2 if 0 > deltaX else 3] += 1
                        direction = "left" if 0 > deltaX else "right"
                        print(f"Person moved {direction}")
            
                def get_centroid(self, roi):
                    x1 = roi.topLeft().x
                    y1 = roi.topLeft().y
                    x2 = roi.bottomRight().x
                    y2 = roi.bottomRight().y
                    return ((x2+x1)/2, (y2+y1)/2)
            
                def new_tracklets(self, tracklets):
                    for t in tracklets:
                        # If new tracklet, save its centroid
                        if t.status == dai.Tracklet.TrackingStatus.NEW:
                            self.tracking[str(t.id)] = self.get_centroid(t.roi)
                            self.lost_cnt[str(t.id)] = 0
                        elif t.status == dai.Tracklet.TrackingStatus.TRACKED:
                            self.lost_cnt[str(t.id)] = 0
                        elif t.status == dai.Tracklet.TrackingStatus.LOST:
                            self.lost_cnt[str(t.id)] += 1
                            # Tracklet has been lost for too long
                            if 10 < self.lost_cnt[str(t.id)]:
                                self.lost_cnt[str(t.id)] = -999
                                self.tracklet_removed(self.tracking[str(t.id)], self.get_centroid(t.roi))
                        elif t.status == dai.Tracklet.TrackingStatus.REMOVED:
                            if 0 <= self.lost_cnt[str(t.id)]:
                                self.lost_cnt[str(t.id)] = -999
                                self.tracklet_removed(self.tracking[str(t.id)], self.get_centroid(t.roi))
            
            
            device = dai.Device()
            print("Creating Stereo Depth pipeline")
            pipeline = dai.Pipeline()
            
            camLeft = pipeline.create(dai.node.MonoCamera)
            camRight = pipeline.create(dai.node.MonoCamera)
            
            camLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
            camRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
            
            ### Define stereoDepth node and create outputs
            #---
            stereo = pipeline.create(dai.node.StereoDepth)
            
            #Set StereoDepth config
            stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
            stereo.initialConfig.setMedianFilter(dai.MedianFilter.KERNEL_7x7)  # KERNEL_7x7 default, 5x5, 3x3, MEDIAN_OFF
            stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
            stereo.setLeftRightCheck(True)
            #stereo.setExtendedDisparity(extended)
            stereo.setSubpixel(True)
            
            #Alpha scaling to use 'full' FOV
            stereo.setAlphaScaling(0.2)
            config = stereo.initialConfig.get()
            config.postProcessing.brightnessFilter.minBrightness = 0
            stereo.initialConfig.set(config)
            
            xoutDepth = pipeline.create(dai.node.XLinkOut)
            xoutDepth.setStreamName("depthOut")
            
            #Define connection between nodes
            #--
            camLeft.out.link(stereo.left)
            camRight.out.link(stereo.right)
            #stereo.syncedLeft.link(xoutLeft.input)
            #stereo.syncedRight.link(xoutRight.input)
            stereo.disparity.link(xoutDepth.input)
            
            #tracking on Device
            objectTracker = pipeline.createObjectTracker()
            objectTracker.inputTrackerFrame.setBlocking(True)
            objectTracker.inputDetectionFrame.setBlocking(True)
            objectTracker.inputDetections.setBlocking(True)
            objectTracker.setDetectionLabelsToTrack([1])  # track only person
            # possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS
            objectTracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)
            # take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
            objectTracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.UNIQUE_ID)
            
            # Linking
            xinFrame = pipeline.createXLinkIn()     #------this seems to add ...
            xinFrame.setStreamName("frameIn")       #------- ... strange depth error, comment out and it looks fine
            xinFrame.out.link(objectTracker.inputDetectionFrame)
            '''
            # Maybe we need to send the old frame here, not sure
            xinFrame.out.link(objectTracker.inputTrackerFrame)
            
            xinDet = pipeline.createXLinkIn()
            xinDet.setStreamName("detIn")
            xinDet.out.link(objectTracker.inputDetections)
            
            trackletsOut = pipeline.createXLinkOut()
            trackletsOut.setStreamName("trackletsOut")
            objectTracker.out.link(trackletsOut.input)
            
            '''
            
            cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
            cvColorMap[0] = [0, 0, 0]
            print("Creating DepthAI device")
            
            with device:
                device.startPipeline(pipeline)
                q = device.getOutputQueue(name="depthOut", maxSize=4, blocking=False)
                '''
                trackletsQ = device.getOutputQueue(name="trackletsOut", maxSize=4, blocking=False)
                detInQ = device.getInputQueue("detIn")
                frameInQ = device.getInputQueue("frameIn")
                '''
                disparityMultiplier = 255 / stereo.initialConfig.getMaxDisparity()
            
                text = TextHelper()
                counter = PeopleCounter()
                # up to here
                while True:
            
                    name = q.getName()
                    depthFrame = q.get().getFrame()
                    depthFrame = (depthFrame*disparityMultiplier).astype(np.uint8)    #use for depth
                    depthRgb = getDisparityFrame(depthFrame, cvColorMap)
                    '''
                    trackletsIn = trackletsQ.tryGet()
                    if trackletsIn is not None:
                        counter.new_tracklets(trackletsIn.tracklets)
            
                    # Crop only the corridor:
            
                    cropped = depthFrame[DETECTION_ROI[1]:DETECTION_ROI[3], DETECTION_ROI[0]:DETECTION_ROI[2]]
                    cv2.imshow('Crop', cropped)
            
                    ret, thresh = cv2.threshold(cropped, 20, 145, cv2.THRESH_BINARY)
                    cv2.imshow('thr', thresh)
            
                    blob = cv2.morphologyEx(thresh, cv2.MORPH_OPEN,
                                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (22, 22)))  # editet elipse size 37,37
                    cv2.imshow('blob', blob)
            
                    edged = cv2.Canny(blob, 20, 80)
                    cv2.imshow('Canny', edged)
            
                    contours, hierarchy = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
            
                    dets = dai.ImgDetections()
                    # len(contours) is the count of separate heads/blobs
                    if len(contours) != 0:
                        c = max(contours, key=cv2.contourArea)
                        x, y, w, h = cv2.boundingRect(c)
                        # cv2.imshow('Rect', text.rectangle(blob, (x,y), (x+w, y+h)))
                        x += DETECTION_ROI[0]
                        y += DETECTION_ROI[1]
                        area = w * h
                        # print(len(contours), area)
            
                        if 760 < area:
                            # Send the detection to the device - ObjectTracker node
                            det = dai.ImgDetection()
                            det.label = 1
                            det.confidence = 1.0
                            det.xmin = x
                            det.ymin = y
                            det.xmax = x + w
                            det.ymax = y + h
                            dets.detections = [det]
            
                            # Draw rectangle on the biggest contour
                            text.rectangle(depthRgb, (x, y), (x + w, y + h), size=2.5)
            
                    detInQ.send(dets)
                    imgFrame = dai.ImgFrame()
                    imgFrame.setData(to_planar(depthRgb))
                    imgFrame.setType(dai.RawImgFrame.Type.BGR888p)
                    imgFrame.setWidth(depthRgb.shape[0])
                    imgFrame.setHeight(depthRgb.shape[1])
                    frameInQ.send(imgFrame)
                    '''
                    text.rectangle(depthRgb, (DETECTION_ROI[0], DETECTION_ROI[1]), (DETECTION_ROI[2], DETECTION_ROI[3]))
                    text.putText(depthRgb, str(counter), (20, 40))
            
                    cv2.imshow(name, depthRgb)
                    if cv2.waitKey(1) == ord("q"):
                        break

              Hi pfgnoobi

              Running script as is:

              With commented out lines:

              Are you sure you are not also changing the socket or something? This looks like swapped left and right.

              Thanks,
              Jaka

                jakaskerl
                I did some more testing but I can't figure out where this error comes from.
                I'm running exactly the same code and it gives me an ugly depth; as soon as I comment out the two lines after #Linking, it works fine.

                If I swap the sockets like this and leave the rest of the code the same as in the older post, I get correct depth. So the issue is fixed, thanks.

                #Define connection between nodes
                #--
                camLeft.out.link(stereo.right)
                camRight.out.link(stereo.left)
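                (For reference, an alternative to swapping the links is to pin each MonoCamera to its socket explicitly, so left/right can't end up silently flipped - a minimal sketch:)

                import depthai as dai

                pipeline = dai.Pipeline()

                camLeft = pipeline.create(dai.node.MonoCamera)
                camRight = pipeline.create(dai.node.MonoCamera)

                # On OAK-D style boards CAM_B is typically the left mono sensor and CAM_C
                # the right one; pinning the sockets makes the StereoDepth inputs unambiguous.
                camLeft.setBoardSocket(dai.CameraBoardSocket.CAM_B)
                camRight.setBoardSocket(dai.CameraBoardSocket.CAM_C)

                stereo = pipeline.create(dai.node.StereoDepth)
                camLeft.out.link(stereo.left)
                camRight.out.link(stereo.right)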