• DepthAI
  • Multi-camera calibration - OAK-D-Pro W faulty translation matrix

Hi there @Nejc-Luxonis ,
I'm trying to run a multi-camera calibration by launching main.py. I first validated the calibration on my OAK-D-Pro camera and confirmed a reasonable result. Later on, I tried the same procedure with an OAK-D-Pro W camera. The rotation vector looks promising, but the translation vector shows unrealistic values. See the comparison of values from both cameras, calibrated with the marker in the same position:

OAK-D-Pro:
Rotation vector:
[[-0.06005594]
[ 0.09223808]
[ 0.0277156 ]]

Translation vector:
[[-0.03522526]
[-0.04736168]
[ 0.70119316]]

OAK-D-Pro W
Rotation vector:
[[0.00374177]
[0.00809129]
[0.02615565]]

Translation vector:
[[-1.55830205]
[-0.91438703]
[ 2.036714 ]]

Any opinions on the potential reason? Thank you.


    Hi marlu , could you elaborate on why those values would be unrealistic?

    Hi @erik , because I physically measured and verified that the value shown by the OAK-D-Pro calibration is correct. But when the OAK-D-Pro W is placed at the same position, the translation vector is apparently wrong.

    @erik @Nejc-Luxonis Is this calibration example also tested with cameras facing different directions? Each camera would find its relative transformation to its own printed pattern, while the transformation between the two patterns is known.


      Hi marlu ,
      It might be that you'd need to undistort the stream; I'm not sure cv2 provides correct extrinsics if the image is warped. Regarding the second question - the current scripts don't support such a use case, but we could add it in the future. So you have 2 calibration boards at a known distance/rotation, multiple OAK cameras looking at them, and you'd want to get the extrinsics of each camera relative to the other cameras?
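      (A minimal host-side sketch of that undistortion idea, not taken from the calibration scripts: the helper name and the obj_pts/img_pts/K/dist inputs are placeholders for the detected board correspondences and the EEPROM intrinsics/distortion. The corners are undistorted first, so solvePnP works on an ideal pinhole model.)

      import cv2
      import numpy as np

      def estimate_board_pose(obj_pts, img_pts, K, dist):
          """Pose of the calibration pattern after removing lens distortion.

          obj_pts: Nx3 float32 board points, img_pts: Nx2 float32 detected corners,
          K / dist: camera intrinsic matrix and distortion coefficients.
          """
          # Undistort the detected 2D corners; P=K keeps them in pixel coordinates.
          img_pts_ud = cv2.undistortPoints(img_pts.reshape(-1, 1, 2), K, dist, P=K)
          # The corners now follow an ideal pinhole model, so pass zero distortion.
          ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts_ud, K, np.zeros(5))
          return rvec, tvec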

      @erik Correct, my calibration setup is visualized in the following scheme. If the CAM1 <> Pattern1 and CAM2 <> Pattern2 transforms are correct, I can easily compute the transformation between the cameras. I swapped the OAK-D-Pro W for a standard OAK-D-Pro, but the computed transform is still quite different from the distances I approximately measured with a measuring tool.
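      (For reference, the chaining described above can be written as a small numpy/OpenCV sketch. The function names are illustrative; T_p1_p2 stands for the physically measured transform that maps Pattern2 coordinates into Pattern1 coordinates.)

      import cv2
      import numpy as np

      def to_homogeneous(rvec, tvec):
          """Build a 4x4 transform (pattern -> camera) from an OpenCV rvec/tvec."""
          T = np.eye(4)
          T[:3, :3], _ = cv2.Rodrigues(np.asarray(rvec, dtype=float))
          T[:3, 3] = np.ravel(tvec)
          return T

      def cam2_in_cam1(rvec1, tvec1, rvec2, tvec2, T_p1_p2):
          """CAM2 pose expressed in CAM1's frame.

          rvec1/tvec1: Pattern1 pose seen from CAM1 (solvePnP output)
          rvec2/tvec2: Pattern2 pose seen from CAM2 (solvePnP output)
          T_p1_p2:     known 4x4 transform taking Pattern2 coords to Pattern1 coords
          """
          T_cam1_p1 = to_homogeneous(rvec1, tvec1)   # Pattern1 -> CAM1
          T_cam2_p2 = to_homogeneous(rvec2, tvec2)   # Pattern2 -> CAM2
          # Chain CAM1 <- Pattern1 <- Pattern2 <- CAM2
          return T_cam1_p1 @ T_p1_p2 @ np.linalg.inv(T_cam2_p2)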

      8 months later

      @marlu @erik do you have any updates on the multi-camera calibration of cameras not facing the same view?
      And did you find a way to undistort the image? I tried to use an OAK-D-W POE and am facing the same issue with wrong vectors from the calibration. (I'm using the depthai-experiments/gen2-multiple-devices examples)

      Hi @pfgnoobi
      Could you add the matrices you received from the multi-cam calibration and the intrinsic matrix/distortion coefficients from the EEPROM?
      Some visual aid and ground truth will help as well.

      Thanks,
      Jaka
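      (For anyone looking for those EEPROM values: they can be dumped on the host roughly like this; the socket and resolution below are illustrative.)

      import numpy as np
      import depthai as dai

      with dai.Device() as device:
          calib = device.readCalibration()
          # Intrinsics are returned scaled to the requested resolution.
          K = np.array(calib.getCameraIntrinsics(dai.CameraBoardSocket.CAM_A, 1920, 1080))
          dist = np.array(calib.getDistortionCoefficients(dai.CameraBoardSocket.CAM_A))
          print("Intrinsic matrix:\n", K)
          print("Distortion coefficients:\n", dist)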

        Hi @pfgnoobi ,
        We haven't pursued it yet, but likely will in the upcoming months. Regarding the undistortion - could you elaborate on the question? Do you mean wide-FOV image undistortion?

        Hi,
        so to give some more context:
        I'm trying to get the position data (floor x/y) of multiple people walking through a light installation built inside a 30' shipping container. My first idea was to just use a single OAK-D-W-POE OV9782, mounted at the middle-top facing straight down. That didn't work, as I would need to train my own person detection model (the available ones don't handle bird's-eye camera data well). So my new test setup would look like this, or maybe just 4x OAK-D, one in each corner?

        Problem with the new setup:
        Multi-cam calibration is not working, as the calibration grid placed in the middle of the container is too small to recognize. It needs to be placed in the middle so all cameras can see it.

        @erik Problem combining the OAK-D-W-POE and OAK-D-lite with depthai-experiments/gen2-multiple-devices/spatial-detection-fusion:
        The OAK-D-W is not lining up with the OAK-D-lite, see the bottom half of the screenshots; the Bird's Eye View only shows the OAK-D-lite position (the OAK-D-W position data is likely somewhere off screen). I have the feeling this is due to the 150° FOV.
        Do I need to add some wide-FOV image undistortion in the spatial-detection or calibration process? If yes, where and what would I need to add?

        *The top two pics are from the multi-cam calibration process, the rest is from running spatial-detection-fusion.

        jakaskerl
        I uploaded the files here; .json was not allowed in the forum*

        https://we.tl/t-h170S5oKo7

        Thanks for your help, I don't really know what I'm doing, just getting started with computer vision. Still a noob.
        Also, please let me know if you know of a different approach to the final person-position goal. Maybe just 2-3 webcams on top facing straight down and some sort of blob tracking?

        Hi @pfgnoobi ,
        Could you share some screenshots by just using depth cameras instead, as mentioned in the email?
        luxonis/depthai-experiments/tree/master/gen2-depth-people-counting#depth-people-counting

        Do I need to add some wide-FOV image undistortion in the spatial-detection or calibration process?

        Only if the NN requires undistorted images; in that case you should use the Camera node and its undistortion:
        https://docs.luxonis.com/projects/api/en/latest/samples/Camera/camera_undistort/#undistort-camera-stream

        The OAK-D-W is not lining up with the OAK-D-lite, see the bottom half of the screenshots

        What would be the point of the multi-cam calibration if that's the case? In the example above, it's used for spatial detection fusion: multiple cameras detect the same object (and its 3D position), and the detections are fused together for better accuracy.
        Thanks, Erik
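        (A toy sketch of that fusion idea, not the example's actual code: each camera's 3D detection is expressed in a shared world frame using that camera's extrinsic from the multi-cam calibration, then the results are averaged.)

        import numpy as np

        def fuse_detections(points_cam, extrinsics):
            """Average one object's 3D detections from several cameras.

            points_cam: list of (x, y, z) tuples, one per camera, in that camera's frame
            extrinsics: list of 4x4 camera->world transforms from the multi-cam calibration
            """
            world_points = []
            for p, T in zip(points_cam, extrinsics):
                p_h = np.array([p[0], p[1], p[2], 1.0])   # homogeneous point
                world_points.append((T @ p_h)[:3])        # express it in the world frame
            return np.mean(world_points, axis=0)          # fused position estimate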

          erik thanks, I did some bird's-eye-view depth testing with the DepthAI viewer. It seems I get visible height changes when moving within a 4.5-4.6 m wide strip of the ground area. This massive reduction in HFOV is due to the need to rectify the mono images for depth/disparity calculation, if I understood correctly.
          So the 127° HFOV (here 10 m) gets rectified to 96° (measured 5.5 m), and the usable depth HFOV is 84° (my measurement, 4.6 m). I also tested an OAK-D mounted the same way, which gave usable height changes within an area of around 70° HFOV.

          So to cover a ground area 12 m wide, I would need 3x OAK-D-W or 4x OAK-D mounted at around 2.5-2.6 m height.
          I will do some further testing and also read through the configuring-stereo-depth docs to hopefully improve the results.
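          (As a quick sanity check of those widths - a back-of-the-envelope sketch, assuming a mounting height of roughly 2.55 m: the floor width covered by a downward-facing camera is 2 * h * tan(HFOV / 2).)

          import math

          def ground_width(height_m, hfov_deg):
              """Floor width covered by a downward-facing camera: 2 * h * tan(HFOV / 2)."""
              return 2 * height_m * math.tan(math.radians(hfov_deg) / 2)

          print(round(ground_width(2.55, 84), 1))   # ~4.6 m, the usable depth width measured above
          print(round(ground_width(2.55, 70), 1))   # ~3.6 m for the narrower OAK-D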

          What are your thoughts after seeing the screenshots?
          Is a top-view depth map the way to proceed?
          Would this use case be better suited to the OAK-D-W-POE?



          3 more from depth-preview (depthai-python example)

          What would be the point of the multi-cam calibration if that's the case?

          Sorry, I meant the OAK-D-W person-tracking position is not lining up with the OAK-D-lite person-tracking position. In the Bird's Eye View only the magenta marker (OAK-D) is visible; the yellow one is somewhere outside the render area. But no need to investigate further.

          big thanks for your support 🙂

          Hi @pfgnoobi ,

          rectify the mono images for depth/disparity calculation, if I understood correctly.

          Yes, that's correct - some FOV is lost due to rectification, but you can configure that with the stereo alpha parameter; example here (--alpha): https://docs.luxonis.com/projects/api/en/latest/samples/StereoDepth/stereo_depth_video/#stereo-depth-video
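          (For reference, the same knob can also be set directly on the StereoDepth node - a minimal sketch, where 0.0 crops to valid rectified pixels only and 1.0 keeps the full original FOV with black borders:)

          import depthai as dai

          pipeline = dai.Pipeline()
          stereo = pipeline.create(dai.node.StereoDepth)

          # 0.0 = crop to valid pixels only, 1.0 = keep the whole (wide) FOV
          # at the cost of black regions and stronger distortion at the edges.
          stereo.setAlphaScaling(0.5)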

          I'd also run detection on depth frames / colorized frames (like blob detection + thresholding) instead of on the pointcloud.
          Thoughts?
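          (A rough sketch of that suggestion, with illustrative thresholds, along the lines of what the depth-people-counting example does:)

          import cv2

          def detect_people_blobs(depth_u8, min_area=760):
              """Find person-sized blobs in an 8-bit depth/disparity frame."""
              # Keep only the disparity band where people are expected.
              _, thresh = cv2.threshold(depth_u8, 20, 255, cv2.THRESH_BINARY)
              # Remove small speckles before extracting contours.
              blob = cv2.morphologyEx(thresh, cv2.MORPH_OPEN,
                                      cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (22, 22)))
              contours, _ = cv2.findContours(blob, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
              return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]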

          Hi @erik,
          thanks, I tried to implement this alpha parameter in the depth-people-counting example (which you suggested earlier).
          It worked, I got much closer to my final needs and also slowly understand a bit more.

          These are my current results, which look quite okay. I will play with some thresholding to run the detection closer to ground level (e.g. person knee level) so I get more realistic x/y coordinates. Or do you have any other ideas on improving the accuracy?

          And to double-check that I got the workings of depth-people-counting correct:

          1. The Replay pipeline loads the video recordings and sends queued-up frames from the host to the device.

          2. The device creates depth frames and sends them to the host.

          3. The host does some depth value stretching? Why do you need to use this disparity multiplier thing? Is it because different devices have different disparity resolutions?

          4. The host creates a copy of depthFrame and colors it with a colormap (rgbDepth).

          5. depthFrame gets cropped to the area of interest, and some OpenCV steps create and detect the blob.

          6. The detections and rgbDepth get sent to the device, into the ObjectTracker node.

          7. The ObjectTracker node creates a Tracklet and sends the data to the host.

          8. The host calculates the center and the moved distance (moved left/right).

            Did I read this example code correctly?


            Can I implement the OpenCV stuff from step 5 on the device, using a Script node?
            This process doesn't seem very efficient for real-time use, sending data back and forth multiple times.
            Over the weekend I will hopefully figure out how to use a live camera in the example, so maybe this back and forth is nothing to worry about.


            big thanks

          Hi @pfgnoobi ,

          On accuracy: I'd apply https://docs.luxonis.com/projects/api/en/latest/tutorials/configuring-stereo-depth/#brightness-filter and then lower the confidence threshold. You will have fewer depth points, but they will be more accurate, which can help with the noise. It's mostly trial & error here.
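          (A sketch of those two knobs on the StereoDepth node; the values are illustrative starting points, not tuned:)

          import depthai as dai

          pipeline = dai.Pipeline()
          stereo = pipeline.create(dai.node.StereoDepth)

          # Lower threshold = stricter matching: fewer disparity values survive,
          # but the remaining ones are more reliable.
          stereo.initialConfig.setConfidenceThreshold(200)

          config = stereo.initialConfig.get()
          # Drop depth computed from very dark image regions (tune per scene).
          config.postProcessing.brightnessFilter.minBrightness = 30
          stereo.initialConfig.set(config)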

          depthFrame = (depthFrame*disparityMultiplier).astype(np.uint8)

          Do you mean this? It's because disparity has values between 0..95 (standard disparity search, or 0..190 with extended, or a lot more if using subpixel mode), and you want to normalize it to 0..255 for nicer visualization.
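          (In code that normalization is just the following, mirroring the example; getMaxDisparity() already accounts for extended/subpixel modes:)

          import numpy as np

          def normalize_disparity(disparity, max_disparity):
              """Scale raw disparity (0..95 standard, 0..190 extended, more with subpixel)
              to 0..255 so it can be colorized as an 8-bit image."""
              return (disparity * (255.0 / max_disparity)).astype(np.uint8)

          # usage: normalize_disparity(frame, stereo.initialConfig.getMaxDisparity())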

          6. The detections and rgbDepth get sent to the device, into the ObjectTracker node.

          Yes, the data flow is quite hacky; we could also do the object tracking on the host, but we wanted to utilize as much computation on the OAK as possible in this example.

          And yes, the whole flow you described is correct.

          Can I implement the OpenCV stuff from step 5 on the device, using a Script node?

          Please have a look here:
          https://docs.luxonis.com/en/latest/pages/tutorials/on-device-programming/
          It would likely be possible to make a custom NN that does the CV stuff, but that would take more time. With the Script node it would be way too slow.

          Thanks, Erik

            11 days later

            Hi @erik,
            thanks a lot, also for pointing me to custom NNs - that seems quite interesting to try after I get these "basics" working.

            Over the last days I did some more testing and tried to get depth-people-counting working with a live camera stream instead of a recorded video. I have a strange error which I'm not able to fix. When I create xinFrame = pipeline.createXLinkIn() in the pipeline, the depth looks like picture one. If I comment out the two lines after #Linking (see code below), the depth looks normal. Why is this strange thing happening?
            (below is only the reduced project code)


            import cv2
            import numpy as np
            import depthai as dai
            
            device = dai.Device()
            print("Creating Stereo Depth pipeline")
            pipeline = dai.Pipeline()
            
            camLeft = pipeline.create(dai.node.MonoCamera)
            camRight = pipeline.create(dai.node.MonoCamera)
            
            camLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
            camRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
            
            ### Define stereoDepth node and create outputs
            #---
            stereo = pipeline.create(dai.node.StereoDepth)
            
            #Set StereoDepth config
            stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
            stereo.initialConfig.setMedianFilter(dai.MedianFilter.KERNEL_7x7)  # KERNEL_7x7 default, 5x5, 3x3, MEDIAN_OFF
            stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
            stereo.setLeftRightCheck(True)
            #stereo.setExtendedDisparity(extended)
            stereo.setSubpixel(True)
            
            #Alpha scaling to use 'full' FOV
            stereo.setAlphaScaling(0.2)
            config = stereo.initialConfig.get()
            config.postProcessing.brightnessFilter.minBrightness = 0
            stereo.initialConfig.set(config)
            
            xoutDepth = pipeline.create(dai.node.XLinkOut)
            xoutDepth.setStreamName("depthOut")
            
            #Define connection between nodes
            #--
            camLeft.out.link(stereo.left)
            camRight.out.link(stereo.right)
            #stereo.syncedLeft.link(xoutLeft.input)
            #stereo.syncedRight.link(xoutRight.input)
            stereo.disparity.link(xoutDepth.input)
            
            #tracking on Device
            objectTracker = pipeline.createObjectTracker()
            objectTracker.inputTrackerFrame.setBlocking(True)
            objectTracker.inputDetectionFrame.setBlocking(True)
            objectTracker.inputDetections.setBlocking(True)
            objectTracker.setDetectionLabelsToTrack([1])  # track only person
            # possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS
            objectTracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)
            # take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
            objectTracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.UNIQUE_ID)
            
            # Linking
            xinFrame = pipeline.createXLinkIn()    #------------------- adding
            xinFrame.setStreamName("frameIn")     #-------------------- this creates ugly depth
            xinFrame.out.link(objectTracker.inputDetectionFrame)
            
            cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
            cvColorMap[0] = [0, 0, 0]
            print("Creating DepthAI device")
            
            with device:
                device.startPipeline(pipeline)
                q = device.getOutputQueue(name="depthOut", maxSize=4, blocking=False)
            
                while True:
            
                    name = q.getName()
                    depthFrame = q.get().getCvFrame()
            
                    depthRgb = getDisparityFrame(depthFrame, cvColorMap)
            
                    cv2.imshow(name, depthRgb)
                    if cv2.waitKey(1) == ord("q"):
                        break

            Hi @pfgnoobi
            Not able to reproduce the issue. The code you have sent does not utilize the object tracker and doesn't have getDisparityFrame defined.

            You are linking an input to the ObjectTracker, which should not have an effect on the disparity output.

            Thanks,
            Jaka

            Hi @jakaskerl,
            sorry, here is the full code. This code results in a strange depth map.
            If you also comment out the Linking part, everything looks fine.
            All the other commented-out parts are just there to isolate the troublemaking code. Thanks for your help.

            import cv2
            import numpy as np
            import depthai as dai
            
            DETECTION_ROI = (20,130,650,240) # x,y,w,h 250,130,450,240
            THRESH_DIST_DELTA = 0.2 # threshold: minimum distance moved to get counted
            
            def getDisparityFrame(frame, cvColorMap):
                maxDisp = stereo.initialConfig.getMaxDisparity()
                disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
                disp = cv2.applyColorMap(disp, cvColorMap)
            
                return disp
            
            class TextHelper:
                def __init__(self) -> None:
                    self.bg_color = (0, 0, 0)
                    self.color = (255, 255, 255)
                    self.text_type = cv2.FONT_HERSHEY_SIMPLEX
                    self.line_type = cv2.LINE_AA
                def putText(self, frame, text, coords):
                    cv2.putText(frame, text, coords, self.text_type, 1.3, self.bg_color, 5, self.line_type)
                    cv2.putText(frame, text, coords, self.text_type, 1.3, self.color, 2, self.line_type)
                    return frame
                def rectangle(self, frame, topLeft,bottomRight, size=1.):
                    cv2.rectangle(frame, topLeft, bottomRight, self.bg_color, int(size*4))
                    cv2.rectangle(frame, topLeft, bottomRight, self.color, int(size))
                    return frame
            
            def to_planar(arr: np.ndarray) -> list:
                return arr.transpose(2, 0, 1).flatten()
            
            
            class PeopleCounter:
                def __init__(self):
                    self.tracking = {}
                    self.lost_cnt = {}
                    self.people_counter = [0,0,0,0] # Up, Down, Left, Right
            
                def __str__(self) -> str:
                    return f"Left: {self.people_counter[2]}, Right: {self.people_counter[3]}"
            
                def tracklet_removed(self, coords1, coords2):
                    deltaX = coords2[0] - coords1[0]
                    print('Delta X', deltaX)
            
                    if THRESH_DIST_DELTA < abs(deltaX):
                        self.people_counter[2 if 0 > deltaX else 3] += 1
                        direction = "left" if 0 > deltaX else "right"
                        print(f"Person moved {direction}")
            
                def get_centroid(self, roi):
                    x1 = roi.topLeft().x
                    y1 = roi.topLeft().y
                    x2 = roi.bottomRight().x
                    y2 = roi.bottomRight().y
                    return ((x2+x1)/2, (y2+y1)/2)
            
                def new_tracklets(self, tracklets):
                    for t in tracklets:
                        # If new tracklet, save its centroid
                        if t.status == dai.Tracklet.TrackingStatus.NEW:
                            self.tracking[str(t.id)] = self.get_centroid(t.roi)
                            self.lost_cnt[str(t.id)] = 0
                        elif t.status == dai.Tracklet.TrackingStatus.TRACKED:
                            self.lost_cnt[str(t.id)] = 0
                        elif t.status == dai.Tracklet.TrackingStatus.LOST:
                            self.lost_cnt[str(t.id)] += 1
                            # Tracklet has been lost for too long
                            if 10 < self.lost_cnt[str(t.id)]:
                                self.lost_cnt[str(t.id)] = -999
                                self.tracklet_removed(self.tracking[str(t.id)], self.get_centroid(t.roi))
                        elif t.status == dai.Tracklet.TrackingStatus.REMOVED:
                            if 0 <= self.lost_cnt[str(t.id)]:
                                self.lost_cnt[str(t.id)] = -999
                                self.tracklet_removed(self.tracking[str(t.id)], self.get_centroid(t.roi))
            
            
            device = dai.Device()
            print("Creating Stereo Depth pipeline")
            pipeline = dai.Pipeline()
            
            camLeft = pipeline.create(dai.node.MonoCamera)
            camRight = pipeline.create(dai.node.MonoCamera)
            
            camLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
            camRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_800_P)
            
            ### Define stereoDepth node and create outputs
            #---
            stereo = pipeline.create(dai.node.StereoDepth)
            
            #Set StereoDepth config
            stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
            stereo.initialConfig.setMedianFilter(dai.MedianFilter.KERNEL_7x7)  # KERNEL_7x7 default, 5x5, 3x3, MEDIAN_OFF
            stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout
            stereo.setLeftRightCheck(True)
            #stereo.setExtendedDisparity(extended)
            stereo.setSubpixel(True)
            
            #Alpha scaling to use 'full' FOV
            stereo.setAlphaScaling(0.2)
            config = stereo.initialConfig.get()
            config.postProcessing.brightnessFilter.minBrightness = 0
            stereo.initialConfig.set(config)
            
            xoutDepth = pipeline.create(dai.node.XLinkOut)
            xoutDepth.setStreamName("depthOut")
            
            #Define connection between nodes
            #--
            camLeft.out.link(stereo.left)
            camRight.out.link(stereo.right)
            #stereo.syncedLeft.link(xoutLeft.input)
            #stereo.syncedRight.link(xoutRight.input)
            stereo.disparity.link(xoutDepth.input)
            
            #tracking on Device
            objectTracker = pipeline.createObjectTracker()
            objectTracker.inputTrackerFrame.setBlocking(True)
            objectTracker.inputDetectionFrame.setBlocking(True)
            objectTracker.inputDetections.setBlocking(True)
            objectTracker.setDetectionLabelsToTrack([1])  # track only person
            # possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS
            objectTracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)
            # take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
            objectTracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.UNIQUE_ID)
            
            # Linking
            xinFrame = pipeline.createXLinkIn()     #------this seems to add ...
            xinFrame.setStreamName("frameIn")       #------- ... strange depth error, comment out and it looks fine
            xinFrame.out.link(objectTracker.inputDetectionFrame)
            '''
            # Maybe we need to send the old frame here, not sure
            xinFrame.out.link(objectTracker.inputTrackerFrame)
            
            xinDet = pipeline.createXLinkIn()
            xinDet.setStreamName("detIn")
            xinDet.out.link(objectTracker.inputDetections)
            
            trackletsOut = pipeline.createXLinkOut()
            trackletsOut.setStreamName("trackletsOut")
            objectTracker.out.link(trackletsOut.input)
            
            '''
            
            cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
            cvColorMap[0] = [0, 0, 0]
            print("Creating DepthAI device")
            
            with device:
                device.startPipeline(pipeline)
                q = device.getOutputQueue(name="depthOut", maxSize=4, blocking=False)
                '''
                trackletsQ = device.getOutputQueue(name="trackletsOut", maxSize=4, blocking=False)
                detInQ = device.getInputQueue("detIn")
                frameInQ = device.getInputQueue("frameIn")
                '''
                disparityMultiplier = 255 / stereo.initialConfig.getMaxDisparity()
            
                text = TextHelper()
                counter = PeopleCounter()
                # up to here
                while True:
            
                    name = q.getName()
                    depthFrame = q.get().getFrame()
                    depthFrame = (depthFrame*disparityMultiplier).astype(np.uint8)    #use for depth
                    depthRgb = getDisparityFrame(depthFrame, cvColorMap)
                    '''
                    trackletsIn = trackletsQ.tryGet()
                    if trackletsIn is not None:
                        counter.new_tracklets(trackletsIn.tracklets)
            
                    # Crop only the corridor:
            
                    cropped = depthFrame[DETECTION_ROI[1]:DETECTION_ROI[3], DETECTION_ROI[0]:DETECTION_ROI[2]]
                    cv2.imshow('Crop', cropped)
            
                    ret, thresh = cv2.threshold(cropped, 20, 145, cv2.THRESH_BINARY)
                    cv2.imshow('thr', thresh)
            
                    blob = cv2.morphologyEx(thresh, cv2.MORPH_OPEN,
                                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (22, 22)))  # editet elipse size 37,37
                    cv2.imshow('blob', blob)
            
                    edged = cv2.Canny(blob, 20, 80)
                    cv2.imshow('Canny', edged)
            
                    contours, hierarchy = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
            
                    dets = dai.ImgDetections()
                    # len(contours) is the count of separate heads/blobs
                    if len(contours) != 0:
                        c = max(contours, key=cv2.contourArea)
                        x, y, w, h = cv2.boundingRect(c)
                        # cv2.imshow('Rect', text.rectangle(blob, (x,y), (x+w, y+h)))
                        x += DETECTION_ROI[0]
                        y += DETECTION_ROI[1]
                        area = w * h
                        # print(len(contours), area)
            
                        if 760 < area:
                            # Send the detection to the device - ObjectTracker node
                            det = dai.ImgDetection()
                            det.label = 1
                            det.confidence = 1.0
                            det.xmin = x
                            det.ymin = y
                            det.xmax = x + w
                            det.ymax = y + h
                            dets.detections = [det]
            
                            # Draw rectangle on the biggest contour
                            text.rectangle(depthRgb, (x, y), (x + w, y + h), size=2.5)
            
                    detInQ.send(dets)
                    imgFrame = dai.ImgFrame()
                    imgFrame.setData(to_planar(depthRgb))
                    imgFrame.setType(dai.RawImgFrame.Type.BGR888p)
                    imgFrame.setWidth(depthRgb.shape[0])
                    imgFrame.setHeight(depthRgb.shape[1])
                    frameInQ.send(imgFrame)
                    '''
                    text.rectangle(depthRgb, (DETECTION_ROI[0], DETECTION_ROI[1]), (DETECTION_ROI[2], DETECTION_ROI[3]))
                    text.putText(depthRgb, str(counter), (20, 40))
            
                    cv2.imshow(name, depthRgb)
                    if cv2.waitKey(1) == ord("q"):
                        break

              Hi pfgnoobi

              Running script as is:

              With commented out lines:

              Are you sure you are not also changing the socket or something? This looks like swapped left and right.

              Thanks,
              Jaka

                jakaskerl
                I did some more testing but I can't figure out where this error comes from.
                I'm running exactly the same code and it gives me an ugly depth; as soon as I comment out the two lines after #Linking, it works fine.

                If I swap the sockets like this and leave the rest of the code the same as in the older post, I get correct depth. So the issue is fixed, thanks.

                #Define connection between nodes
                #--
                camLeft.out.link(stereo.right)
                camRight.out.link(stereo.left)
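                (For reference, an alternative to swapping the links is to pin each MonoCamera to its socket explicitly, so left/right can't end up silently flipped - a minimal sketch:)

                import depthai as dai

                pipeline = dai.Pipeline()

                camLeft = pipeline.create(dai.node.MonoCamera)
                camRight = pipeline.create(dai.node.MonoCamera)

                # On OAK-D style boards CAM_B is typically the left mono sensor and CAM_C
                # the right one; pinning the sockets makes the StereoDepth inputs unambiguous.
                camLeft.setBoardSocket(dai.CameraBoardSocket.CAM_B)
                camRight.setBoardSocket(dai.CameraBoardSocket.CAM_C)

                stereo = pipeline.create(dai.node.StereoDepth)
                camLeft.out.link(stereo.left)
                camRight.out.link(stereo.right)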