ToF 12MP Person Segmentation problem

Dear Luxonis Community,

As a generalist I am not a very strong programmer, so I would kindly like to ask for your help.
I want to replace the Kinect Azure and the Kinect V2 and support the really great OAK cameras for low-budget art
purposes.
Unfortunately I am the only person doing all the video design/3D etc. as well as the coding, so my coding knowledge is "ok", but sometimes I have gaps where I don't know how to go further.

I combined my modified ToF PoE sensor with the 12MP camera and the segmentation example, and tried to replace the depth camera with the ToF camera.
My goal is to send the following out of my PoE sensor:
- a color image at the highest possible resolution,
- an alpha matte (a black-and-white cutout of one or multiple person silhouettes),
- and probably a combined cutout (color + alpha) of one or multiple person silhouettes, as clean as possible - or the whole raw ToF depth channel (for further work).
Those 2 or 3 video streams would ideally go out over NDI right from program start, in as high an image quality as possible while still not lagging in FPS over the network.

I more or less managed to start the program, but my knowledge stops here in terms of the next steps, and I somehow did something very wrong.
The image is now smaller (because it had to be cropped for blob tracking??); I wish it could stay as uncropped as possible (to use most of the sensor area). The canvas and the body recognition kind of work (not very well, but they work), and it does not seem to use the ToF depth channel, maybe because I did something wrong with my color channels.
Also my FPS is at around 9, and it does not seem to be a network problem.
I was hoping that combining the ToF and color channels would give me a much cleaner segmentation, also in
low-light or no-light situations.

Here is the code I tried to write/combine:
Link to code ToF_Depth_Segmentation

I would be very, very thankful for any help, and sorry for asking beginner questions; I am really slow
at reading and understanding code snippets.
I hope that if anybody has time to help me, I can implement your corrections without asking too many additional questions.

Thank you again very much for any help.

With kind regards
Bonko

(My video is upside down because I have a custom build of the ToF sensor. I could of course rotate it 180 degrees afterwards or beforehand, but I don't know if this messes up the calibration?? My custom build has a slightly rotated ToF picture.)
Screenshot_Try_ToF_DepthSegmentation

    BonkoKaradjov

    1. Upscale the segmentation results to the ISP image, not the other way around, since you are currently cutting out large portions of the FOV. Since the aspect ratios are different, keep that in mind when scaling (use the height scaling factor for the width as well). The input to the NN should incorporate the whole RGB image FOV for this to work.
    2. Normalize the second image to UINT8 (a rough host-side sketch of both steps follows below).
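
    A minimal host-side sketch of both steps (my interpretation, untested on the device); `mask` and `ispFrame` are placeholder names for the NN segmentation output and the full-FOV ISP frame:

    import cv2
    import numpy as np

    def upscale_mask_to_isp(mask, ispFrame):
        # Scale the NN mask up to the ISP frame, using the height factor for the
        # width as well, since the aspect ratios of the two images differ.
        ispH, ispW = ispFrame.shape[:2]
        scale = ispH / mask.shape[0]
        newW = int(round(mask.shape[1] * scale))
        resized = cv2.resize(mask, (newW, ispH), interpolation=cv2.INTER_NEAREST)
        # Center the resized mask on a canvas matching the ISP frame width.
        canvas = np.zeros((ispH, ispW), dtype=resized.dtype)
        xOff = max((ispW - newW) // 2, 0)
        w = min(newW, ispW)
        canvas[:, xOff:xOff + w] = resized[:, :w]
        return canvas

    def normalize_to_uint8(frame, maxValue=None):
        # Normalize e.g. a 16-bit depth frame into the 0..255 range.
        maxValue = maxValue if maxValue is not None else int(frame.max()) or 1
        return np.interp(frame, (0, maxValue), (0, 255)).astype(np.uint8)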

    Thanks,
    Jaka

    9 days later

    Dear Jaka, thank you very much for your help. Unfortunately I am not very good at coding, or I need more time to work myself into DepthAI to follow your advice. I tried another approach (below).

    But thank you very much for your help and response 🙂

    My goal is to work out two video streams:
    one color and one ToF, sent over the network (NDI) with low latency and synchronized.
    Color should be as high quality as the network allows, and the ToF depth should carry enough information to
    be usable for color-ramping out regions in TouchDesigner.

    I find it comfortable to change the settings via different key presses (later on I will bind this into some kind of Tkinter interface and save the initial data into a JSON file to load saved configurations; a small sketch of that is below).
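
    A small sketch of how that JSON save/load part could look (hypothetical helper names, not part of the script yet):

    import json

    def save_config(path, settings):
        # Persist the keyboard-tuned settings, e.g. {"lensPos": 150, "expTime": 20000}.
        with open(path, "w") as f:
            json.dump(settings, f, indent=2)

    def load_config(path, defaults):
        # Load a saved configuration, falling back to defaults for missing keys/files.
        try:
            with open(path) as f:
                return {**defaults, **json.load(f)}
        except FileNotFoundError:
            return dict(defaults)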

    My code below is asynchronous and not very fast over the network (and I am aware that combining two scripts like this is not ideal), but if there is any step-by-step tutorial out there, or any other help, I would be very thankful.
    I still struggle to read simple things, such as what drives the FPS of the images, where exactly the image creation ends, where to combine the streams, etc.

    I had synchronized approaches, but they had too much delay when going through the network, and for art purposes I need them nearly real-time and in good quality.
    I also know about the video encoder solution, but I do not really understand how it works when only streaming data, not recording, etc.
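
    From the DepthAI examples, as far as I understand it, streaming without recording would look roughly like this (untested; the sizes and FPS below are just guesses): the VideoEncoder node produces a compressed bitstream that is sent over XLink and decoded on the host, which uses far less bandwidth than raw frames.

    import depthai as dai
    import cv2

    pipeline = dai.Pipeline()

    cam = pipeline.create(dai.node.ColorCamera)
    cam.setBoardSocket(dai.CameraBoardSocket.CAM_C)
    cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
    cam.setIspScale(1, 3)
    cam.setVideoSize(1280, 720)  # keep the video output within the scaled ISP size

    # Encode the color stream (MJPEG here; H.264/H.265 are also possible).
    enc = pipeline.create(dai.node.VideoEncoder)
    enc.setDefaultProfilePreset(30, dai.VideoEncoderProperties.Profile.MJPEG)
    cam.video.link(enc.input)

    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("encoded")
    enc.bitstream.link(xout.input)

    with dai.Device(pipeline) as device:
        q = device.getOutputQueue("encoded", maxSize=4, blocking=False)
        while True:
            pkt = q.get()
            # Each packet is one compressed frame; decode it on the host
            # (or forward the bitstream) instead of writing it to a file.
            frame = cv2.imdecode(pkt.getData(), cv2.IMREAD_COLOR)
            cv2.imshow("decoded", frame)
            if cv2.waitKey(1) == ord("q"):
                break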

    Thanks to everyone for any help

    import time
    import depthai as dai
    import cv2
    from itertools import cycle
    import numpy as np

    #Step size ('W','A','S','D' controls)
    STEP_SIZE = 8
    #Manual exposure/focus/white-balance set step
    EXP_STEP = 500 # us
    ISO_STEP = 50
    LENS_STEP = 3
    WB_STEP = 200

    #===> Tof Color Map
    cvColorMap = cv2.applyColorMap(np.arange(256, dtype=np.uint8), cv2.COLORMAP_JET)
    cvColorMap[0] = [0, 0, 0]
    #<===

    def clamp(num, v0, v1):
        return max(v0, min(num, v1))

    #Create pipeline for Rgb and ToF
    pipeline = dai.Pipeline()

    #Define sources and outputs
    camRgb = pipeline.create(dai.node.ColorCamera)
    camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_C)
    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
    camRgb.setImageOrientation(dai.CameraImageOrientation.ROTATE_180_DEG)
    camRgb.setIspScale(1,2) # 12MP (4056x3040) -> 2028x1520
    stillEncoder = pipeline.create(dai.node.VideoEncoder)

    controlIn = pipeline.create(dai.node.XLinkIn)
    configIn = pipeline.create(dai.node.XLinkIn)
    ispOut = pipeline.create(dai.node.XLinkOut)
    videoOut = pipeline.create(dai.node.XLinkOut)
    stillMjpegOut = pipeline.create(dai.node.XLinkOut)

    controlIn.setStreamName('control')
    configIn.setStreamName('config')
    ispOut.setStreamName('isp')
    videoOut.setStreamName('video')
    stillMjpegOut.setStreamName('still')

    #Properties
    #Small Video cropped
    camRgb.setVideoSize(640,360)
    stillEncoder.setDefaultProfilePreset(1, dai.VideoEncoderProperties.Profile.MJPEG)

    #Linking
    camRgb.isp.link(ispOut.input)
    camRgb.still.link(stillEncoder.input)
    camRgb.video.link(videoOut.input)
    controlIn.out.link(camRgb.inputControl)
    configIn.out.link(camRgb.inputConfig)
    stillEncoder.bitstream.link(stillMjpegOut.input)

    #=========> ToF Pipeline Begin

    tof = pipeline.create(dai.node.ToF)

    #Configure the ToF node
    tofConfig = tof.initialConfig.get()

    #Optional. Best accuracy, but adds motion blur.
    #see ToF node docs on how to reduce/eliminate motion blur.
    tofConfig.enableOpticalCorrection = True
    tofConfig.enablePhaseShuffleTemporalFilter = True
    tofConfig.phaseUnwrappingLevel = 1
    tofConfig.phaseUnwrapErrorThreshold = 25

    tofConfig.enableTemperatureCorrection = False # Not yet supported

    xinTofConfig = pipeline.create(dai.node.XLinkIn)
    xinTofConfig.setStreamName("tofConfig")
    xinTofConfig.out.link(tof.inputConfig)

    tof.initialConfig.set(tofConfig)

    cam_tof = pipeline.create(dai.node.Camera)
    cam_tof.setFps(120) # ToF node will produce depth frames at /2 of this rate
    cam_tof.setBoardSocket(dai.CameraBoardSocket.CAM_A)
    cam_tof.raw.link(tof.input)
    cam_tof.setImageOrientation(dai.CameraImageOrientation.ROTATE_180_DEG)

    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("depth")
    tof.depth.link(xout.input)

    tofConfig = tof.initialConfig.get()

    #<================== End ToF Pipeline

    #Connect to device and start pipeline
    with dai.Device(pipeline) as device:

    #Get data queues
    controlQueue = device.getInputQueue('control')
    configQueue = device.getInputQueue('config')
    ispQueue = device.getOutputQueue('isp')
    videoQueue = device.getOutputQueue('video')
    stillQueue = device.getOutputQueue('still')
    
    # Max cropX & cropY
    maxCropX = (camRgb.getIspWidth() - camRgb.getVideoWidth()) / camRgb.getIspWidth()
    maxCropY = (camRgb.getIspHeight() - camRgb.getVideoHeight()) / camRgb.getIspHeight()
    print(maxCropX, maxCropY, camRgb.getIspWidth(), camRgb.getVideoHeight())
    
    # Default crop
    cropX = 0
    cropY = 0
    sendCamConfig = True
    
    # Defaults and limits for manual focus/exposure controls
    lensPos = 150
    expTime = 20000
    sensIso = 800
    wbManual = 4000
    ae_comp = 0
    ae_lock = False
    awb_lock = False
    saturation = 0
    contrast = 0
    brightness = 0
    sharpness = 0
    luma_denoise = 0
    chroma_denoise = 0
    control = 'none'
    show = False
    
    awb_mode = cycle([item for name, item in vars(dai.CameraControl.AutoWhiteBalanceMode).items() if name.isupper()])
    anti_banding_mode = cycle([item for name, item in vars(dai.CameraControl.AntiBandingMode).items() if name.isupper()])
    effect_mode = cycle([item for name, item in vars(dai.CameraControl.EffectMode).items() if name.isupper()])
    
    
    #=====> ToF Data get
    qDepth = device.getOutputQueue(name="depth")
    
    tofConfigInQueue = device.getInputQueue("tofConfig")
    
    counter = 0
    #<===== ToF Data get END
    while True:
    
        #====> ToF Camera Features
        start = time.time()
        key = cv2.waitKey(1)
        if key == ord('f'):
            tofConfig.enableFPPNCorrection = not tofConfig.enableFPPNCorrection
            tofConfigInQueue.send(tofConfig)
        elif key == ord('o'):
            tofConfig.enableOpticalCorrection = not tofConfig.enableOpticalCorrection
            tofConfigInQueue.send(tofConfig)
        elif key == ord('w'):
            tofConfig.enableWiggleCorrection = not tofConfig.enableWiggleCorrection
            tofConfigInQueue.send(tofConfig)
        elif key == ord('t'):
            tofConfig.enableTemperatureCorrection = not tofConfig.enableTemperatureCorrection
            tofConfigInQueue.send(tofConfig)
        elif key == ord('q'):
            break
        elif key == ord('0'):
            tofConfig.enablePhaseUnwrapping = False
            tofConfig.phaseUnwrappingLevel = 0
            tofConfigInQueue.send(tofConfig)
        elif key == ord('1'):
            tofConfig.enablePhaseUnwrapping = True
            tofConfig.phaseUnwrappingLevel = 1
            tofConfigInQueue.send(tofConfig)
        elif key == ord('2'):
            tofConfig.enablePhaseUnwrapping = True
            tofConfig.phaseUnwrappingLevel = 2
            tofConfigInQueue.send(tofConfig)
        elif key == ord('3'):
            tofConfig.enablePhaseUnwrapping = True
            tofConfig.phaseUnwrappingLevel = 3
            tofConfigInQueue.send(tofConfig)
        elif key == ord('4'):
            tofConfig.enablePhaseUnwrapping = True
            tofConfig.phaseUnwrappingLevel = 4
            tofConfigInQueue.send(tofConfig)
        elif key == ord('5'):
            tofConfig.enablePhaseUnwrapping = True
            tofConfig.phaseUnwrappingLevel = 5
            tofConfigInQueue.send(tofConfig)
        elif key == ord('z'):
            medianSettings = [dai.MedianFilter.MEDIAN_OFF, dai.MedianFilter.KERNEL_3x3, dai.MedianFilter.KERNEL_5x5,
                              dai.MedianFilter.KERNEL_7x7]
            currentMedian = tofConfig.median
            nextMedian = medianSettings[(medianSettings.index(currentMedian) + 1) % len(medianSettings)]
            print(f"Changing median to {nextMedian.name} from {currentMedian.name}")
            tofConfig.median = nextMedian
            tofConfigInQueue.send(tofConfig)
    
        imgFrame = qDepth.get()  # blocking call, will wait until a new data has arrived
        depth_map = imgFrame.getFrame()
        max_depth = (tofConfig.phaseUnwrappingLevel + 1) * 1500  # 100MHz modulation freq.
        depth_colorized = np.interp(depth_map, (0, max_depth), (0, 255)).astype(np.uint8)
        depth_colorized = cv2.applyColorMap(depth_colorized, cvColorMap)
    
        cv2.imshow("Colorized depth", depth_colorized)
        counter += 1
        # <==== ToF Camera Features and Show END
    
        vidFrames = videoQueue.tryGetAll()
        for vidFrame in vidFrames:
            # Showing CROPPED Frame Window
            cv2.imshow('video', vidFrame.getCvFrame())
    
        ispFrames = ispQueue.tryGetAll()
        for ispFrame in ispFrames:
            if show:
                txt = f"[{ispFrame.getSequenceNum()}] "
                txt += f"Exposure: {ispFrame.getExposureTime().total_seconds()*1000:.3f} ms, "
                txt += f"ISO: {ispFrame.getSensitivity()}, "
                txt += f"Lens position: {ispFrame.getLensPosition()}, "
                txt += f"Color temp: {ispFrame.getColorTemperature()} K"
                print(txt)
            #Showing original color image
            cv2.imshow('isp', ispFrame.getCvFrame())
    
            # Send new cfg to camera
            if sendCamConfig:
                cfg = dai.ImageManipConfig()
                cfg.setCropRect(cropX, cropY, 0, 0)
                configQueue.send(cfg)
                print('Sending new crop - x: ', cropX, ' y: ', cropY)
                sendCamConfig = False
    
        stillFrames = stillQueue.tryGetAll()
        for stillFrame in stillFrames:
            # Decode JPEG
            frame = cv2.imdecode(stillFrame.getData(), cv2.IMREAD_UNCHANGED)
            # Display
            cv2.imshow('still', frame)
    
        # Update screen (1 ms polling rate)
        key = cv2.waitKey(1)
        if key == ord('q'):
            break
        elif key == ord('/'):
            show = not show
            if not show: print("Printing camera settings: OFF")
        elif key == ord('c'):
            ctrl = dai.CameraControl()
            ctrl.setCaptureStill(True)
            controlQueue.send(ctrl)
        elif key == ord('t'):
            print("Autofocus trigger (and disable continuous)")
            ctrl = dai.CameraControl()
            ctrl.setAutoFocusMode(dai.CameraControl.AutoFocusMode.AUTO)
            ctrl.setAutoFocusTrigger()
            controlQueue.send(ctrl)
        elif key == ord('f'):
            print("Autofocus enable, continuous")
            ctrl = dai.CameraControl()
            ctrl.setAutoFocusMode(dai.CameraControl.AutoFocusMode.CONTINUOUS_VIDEO)
            controlQueue.send(ctrl)
        elif key == ord('e'):
            print("Autoexposure enable")
            ctrl = dai.CameraControl()
            ctrl.setAutoExposureEnable()
            controlQueue.send(ctrl)
        elif key == ord('b'):
            print("Auto white-balance enable")
            ctrl = dai.CameraControl()
            ctrl.setAutoWhiteBalanceMode(dai.CameraControl.AutoWhiteBalanceMode.AUTO)
            controlQueue.send(ctrl)
        elif key in [ord(','), ord('.')]:
            if key == ord(','): lensPos -= LENS_STEP
            if key == ord('.'): lensPos += LENS_STEP
            lensPos = clamp(lensPos, 0, 255)
            print("Setting manual focus, lens position: ", lensPos)
            ctrl = dai.CameraControl()
            ctrl.setManualFocus(lensPos)
            controlQueue.send(ctrl)
        elif key in [ord('i'), ord('o'), ord('k'), ord('l')]:
            if key == ord('i'): expTime -= EXP_STEP
            if key == ord('o'): expTime += EXP_STEP
            if key == ord('k'): sensIso -= ISO_STEP
            if key == ord('l'): sensIso += ISO_STEP
            expTime = clamp(expTime, 1, 33000)
            sensIso = clamp(sensIso, 100, 1600)
            print("Setting manual exposure, time: ", expTime, "iso: ", sensIso)
            ctrl = dai.CameraControl()
            ctrl.setManualExposure(expTime, sensIso)
            controlQueue.send(ctrl)
        elif key in [ord('n'), ord('m')]:
            if key == ord('n'): wbManual -= WB_STEP
            if key == ord('m'): wbManual += WB_STEP
            wbManual = clamp(wbManual, 1000, 12000)
            print("Setting manual white balance, temperature: ", wbManual, "K")
            ctrl = dai.CameraControl()
            ctrl.setManualWhiteBalance(wbManual)
            controlQueue.send(ctrl)
        elif key in [ord('w'), ord('a'), ord('s'), ord('d')]:
            if key == ord('a'):
                cropX = cropX - (maxCropX / camRgb.getResolutionWidth()) * STEP_SIZE
                if cropX < 0: cropX = 0
            elif key == ord('d'):
                cropX = cropX + (maxCropX / camRgb.getResolutionWidth()) * STEP_SIZE
                if cropX > maxCropX: cropX = maxCropX
            elif key == ord('w'):
                cropY = cropY - (maxCropY / camRgb.getResolutionHeight()) * STEP_SIZE
                if cropY < 0: cropY = 0
            elif key == ord('s'):
                cropY = cropY + (maxCropY / camRgb.getResolutionHeight()) * STEP_SIZE
                if cropY > maxCropY: cropY = maxCropY
            sendCamConfig = True
        elif key == ord('1'):
            awb_lock = not awb_lock
            print("Auto white balance lock:", awb_lock)
            ctrl = dai.CameraControl()
            ctrl.setAutoWhiteBalanceLock(awb_lock)
            controlQueue.send(ctrl)
        elif key == ord('2'):
            ae_lock = not ae_lock
            print("Auto exposure lock:", ae_lock)
            ctrl = dai.CameraControl()
            ctrl.setAutoExposureLock(ae_lock)
            controlQueue.send(ctrl)
        elif key >= 0 and chr(key) in '34567890[]':
            if   key == ord('3'): control = 'awb_mode'
            elif key == ord('4'): control = 'ae_comp'
            elif key == ord('5'): control = 'anti_banding_mode'
            elif key == ord('6'): control = 'effect_mode'
            elif key == ord('7'): control = 'brightness'
            elif key == ord('8'): control = 'contrast'
            elif key == ord('9'): control = 'saturation'
            elif key == ord('0'): control = 'sharpness'
            elif key == ord('['): control = 'luma_denoise'
            elif key == ord(']'): control = 'chroma_denoise'
            print("Selected control:", control)
        elif key in [ord('-'), ord('_'), ord('+'), ord('=')]:
            change = 0
            if key in [ord('-'), ord('_')]: change = -1
            if key in [ord('+'), ord('=')]: change = 1
            ctrl = dai.CameraControl()
            if control == 'none':
                print("Please select a control first using keys 3..9 0 [ ]")
            elif control == 'ae_comp':
                ae_comp = clamp(ae_comp + change, -9, 9)
                print("Auto exposure compensation:", ae_comp)
                ctrl.setAutoExposureCompensation(ae_comp)
            elif control == 'anti_banding_mode':
                abm = next(anti_banding_mode)
                print("Anti-banding mode:", abm)
                ctrl.setAntiBandingMode(abm)
            elif control == 'awb_mode':
                awb = next(awb_mode)
                print("Auto white balance mode:", awb)
                ctrl.setAutoWhiteBalanceMode(awb)
            elif control == 'effect_mode':
                eff = next(effect_mode)
                print("Effect mode:", eff)
                ctrl.setEffectMode(eff)
            elif control == 'brightness':
                brightness = clamp(brightness + change, -10, 10)
                print("Brightness:", brightness)
                ctrl.setBrightness(brightness)
            elif control == 'contrast':
                contrast = clamp(contrast + change, -10, 10)
                print("Contrast:", contrast)
                ctrl.setContrast(contrast)
            elif control == 'saturation':
                saturation = clamp(saturation + change, -10, 10)
                print("Saturation:", saturation)
                ctrl.setSaturation(saturation)
            elif control == 'sharpness':
                sharpness = clamp(sharpness + change, 0, 4)
                print("Sharpness:", sharpness)
                ctrl.setSharpness(sharpness)
            elif control == 'luma_denoise':
                luma_denoise = clamp(luma_denoise + change, 0, 4)
                print("Luma denoise:", luma_denoise)
                ctrl.setLumaDenoise(luma_denoise)
            elif control == 'chroma_denoise':
                chroma_denoise = clamp(chroma_denoise + change, 0, 4)
                print("Chroma denoise:", chroma_denoise)
                ctrl.setChromaDenoise(chroma_denoise)
            controlQueue.send(ctrl)
    # ===> ToF Additional device close
    device.close()
    # <=== ToF Additional device close

    @jakaskerl

    Here is my attempt at the code with your changes. It kind of works, but PoE has a big latency, and I completely fail at wiring the video encoder into the process for both the depth map and the RGB video, which promises to be sent much faster through the network.

    import numpy as np
    import cv2
    import depthai as dai
    
    from datetime import timedelta
    from numba import jit, prange
    
    import NDIlib as ndi
    
    ##Globals
    #offsetDepth Values
    
    def offsetImageFind(img,rows,cols):
        print('X Value: ')
        offsetX = int(input())
        print('Y Value: ')
        offsetY = int(input())
    
        M = np.float32([[1, 0, offsetX], [0, 1, offsetY]])
        img = cv2.warpAffine(img, M, (cols, rows))
        return img
    
    def offsetImageDo(img,rows,cols, offsetX,offsetY):
    
        M = np.float32([[1, 0, offsetX], [0, 1, offsetY]])
        img = cv2.warpAffine(img, M, (cols, rows))
        return img
    
    
    
    @jit(nopython=True, parallel=True)
    def reprojection(depth_image, depth_camera_intrinsics, camera_extrinsics, color_camera_intrinsics, depth_image_show = None):
        height = len(depth_image)
        width = len(depth_image[0])
        if depth_image_show is not None:
            image = np.zeros((height, width), np.uint8)
        else:
            image = np.zeros((height, width), np.uint16)
        if(camera_extrinsics[0][3] > 0):
            sign = 1
        else:
            sign = -1
        for i in prange(0, height):
            for j in prange(0, width):
                if sign == 1:
                    #Reverse the order of the pixels
                    j = width - j - 1
                d = depth_image[i][j]
                if(d == 0):
                    continue
                #Convert pixel to 3d point
                x = (j - depth_camera_intrinsics[0][2]) * d / depth_camera_intrinsics[0][0]
                y = (i - depth_camera_intrinsics[1][2]) * d / depth_camera_intrinsics[1][1]
                z = d
    
                #Move the point to the camera frame
                x1 = camera_extrinsics[0][0] * x + camera_extrinsics[0][1] * y + camera_extrinsics[0][2] * z + camera_extrinsics[0][3]
                y1 = camera_extrinsics[1][0] * x + camera_extrinsics[1][1] * y + camera_extrinsics[1][2] * z + camera_extrinsics[1][3]
                z1 = camera_extrinsics[2][0] * x + camera_extrinsics[2][1] * y + camera_extrinsics[2][2] * z + camera_extrinsics[2][3]
    
                u = color_camera_intrinsics[0][0] * (x1  / z1) + color_camera_intrinsics[0][2]
                v = color_camera_intrinsics[1][1] * (y1  / z1) + color_camera_intrinsics[1][2]
                int_u = round(u)
                int_v = round(v)
                #if(int_v != i):
                #    print(f'v -> {v} and i -> {i}') # This should never be printed
                if int_u >= 0 and int_u < (len(image[0]) - 1) and int_v >= 0 and int_v < len(image):
                    if depth_image_show is not None:
                        image[int_v][int_u] = depth_image_show[i][j][0]
                        image[int_v][int_u + sign] = depth_image_show[i][j][0]
                    else:
                        image[int_v][int_u] = z1
                        image[int_v][int_u + sign] = z1
        return image
    FPS = 30
    
    RGB_SOCKET = dai.CameraBoardSocket.CAM_C
    LEFT_SOCKET = dai.CameraBoardSocket.CAM_A
    
    ###NDI Globals
    ndi_name_1 = 'BkNDI_Color'
    ndi_name_2 = 'BkNDI_TOF'
    
    COLOR_RESOLUTION = dai.ColorCameraProperties.SensorResolution.THE_12_MP
    tofSize = (640, 480)
    rgbSize = (1352, 1012)
    
    FPPN_ENABLE = True
    WIGGLE_ENABLE = True
    TEMPERATURE_ENABLE = False
    OPTICAL_ENABLE = True
    
    #Used for colorizing the depth map
    MIN_DEPTH = 500  # mm
    MAX_DEPTH = 10000  # mm
    
    
    device = dai.Device()
    
    try:
        calibData = device.readCalibration2()
        M1 = np.array(calibData.getCameraIntrinsics(LEFT_SOCKET, *tofSize))
        D1 = np.array(calibData.getDistortionCoefficients(LEFT_SOCKET))
        M2 = np.array(calibData.getCameraIntrinsics(RGB_SOCKET, *rgbSize))
        D2 = np.array(calibData.getDistortionCoefficients(RGB_SOCKET))
    
        T = (
            np.array(calibData.getCameraTranslationVector(LEFT_SOCKET, RGB_SOCKET, False))
            * 10
        )  # to mm for matching the depth
        R = np.array(calibData.getCameraExtrinsics(LEFT_SOCKET, RGB_SOCKET, False))[
            0:3, 0:3
        ]
        TARGET_MATRIX = M1
    
        lensPosition = calibData.getLensPosition(RGB_SOCKET)
    except:
        raise
    
    
    pipeline = dai.Pipeline()
    
    #Define sources and outputs
    camRgb = pipeline.create(dai.node.ColorCamera)
    tofIn = pipeline.create(dai.node.Camera)
    tof = pipeline.create(dai.node.ToF)
    sync = pipeline.create(dai.node.Sync)
    out = pipeline.create(dai.node.XLinkOut)
    
    
    tofIn.setBoardSocket(LEFT_SOCKET)
    tofIn.properties.numFramesPoolRaw = 5
    tofIn.setFps(FPS)
    tofIn.setImageOrientation(dai.CameraImageOrientation.ROTATE_180_DEG)
    
    tofConfig = tof.initialConfig.get()
    #tofConfig.depthParams.freqModUsed = dai.RawToFConfig.DepthParams.TypeFMod.MIN
    #tofConfig.depthParams.freqModUsed = dai.RawToFConfig.DepthParams.TypeFMod.MAX
    #tofConfig.depthParams.avgPhaseShuffle = False
    #tofConfig.depthParams.minimumAmplitude = 3.0
    tofConfig.enableFPPNCorrection = FPPN_ENABLE
    tofConfig.enableOpticalCorrection = OPTICAL_ENABLE
    tofConfig.enableWiggleCorrection = WIGGLE_ENABLE
    tofConfig.enableTemperatureCorrection = TEMPERATURE_ENABLE
    #tofConfig.median = dai.MedianFilter.KERNEL_5x5
    
    tofConfig.enablePhaseUnwrapping = True
    tofConfig.phaseUnwrappingLevel = 4
    tof.initialConfig.set(tofConfig)
    
    camRgb.setBoardSocket(RGB_SOCKET)
    camRgb.setResolution(COLOR_RESOLUTION)
    camRgb.setFps(FPS)
    camRgb.setIspScale(1,3)
    #camRgb.setImageOrientation(dai.CameraImageOrientation.ROTATE_180_DEG)
    
    out.setStreamName("out")
    
    sync.setSyncThreshold(timedelta(milliseconds=20))
    
    #Linking
    camRgb.isp.link(sync.inputs["rgb"])
    tofIn.raw.link(tof.input)
    tof.depth.link(sync.inputs["depth"])
    sync.out.link(out.input)
    
    #######################################
    ###NDI Config BEGIN
    #if not ndi.initialize():
    #return 0
    
    #Try NDI Send 1
    send_settings1 = ndi.SendCreate()
    send_settings1.ndi_name = ndi_name_1
    ndi_send_1 = ndi.send_create(send_settings1)
    ndi.SendCreate('Name')
    video_frame_1 = ndi.VideoFrameV2()
    
    #Try NDI Send 2
    send_settings2 = ndi.SendCreate()
    send_settings2.ndi_name = ndi_name_2
    ndi_send_2 = ndi.send_create(send_settings2)
    ndi.SendCreate('Name')
    video_frame_2 = ndi.VideoFrameV2()
    
    ###NDI Config END
    #######################################
    
    
    def colorizeDepth(frameDepth, minDepth=MIN_DEPTH, maxDepth=MAX_DEPTH):
        invalidMask = frameDepth == 0
        depthFrameColor = np.interp(frameDepth, (minDepth, maxDepth), (0, 255)).astype(
            np.uint8
        )
        depthFrameColor = cv2.applyColorMap(depthFrameColor, cv2.COLORMAP_JET)
        #Set invalid depth pixels to black
        depthFrameColor[invalidMask] = 0
        return depthFrameColor
    
    
    def getAlignedDepth(frameDepth):
        R1, R2, _, _, _, _, _ = cv2.stereoRectify(M1, D1, M2, D2, (100, 100), R, T)  # The (100,100) doesn't matter as it is not used for calculating the rotation matrices
        leftMapX, leftMapY = cv2.initUndistortRectifyMap(M1, None, R1, TARGET_MATRIX, tofSize, cv2.CV_32FC1)
        depthRect = cv2.remap(frameDepth, leftMapX, leftMapY, cv2.INTER_NEAREST)
        newR = np.dot(R2, np.dot(R, R1.T))  # Should be very close to identity
        newT = np.dot(R2, T)
        combinedExtrinsics = np.eye(4)
        combinedExtrinsics[0:3, 0:3] = newR
        combinedExtrinsics[0:3, 3] = newT
        depthAligned = reprojection(depthRect, TARGET_MATRIX, combinedExtrinsics, TARGET_MATRIX)
        #Rotate the depth to the RGB frame
        R_back = R2.T
        mapX, mapY = cv2.initUndistortRectifyMap(TARGET_MATRIX, None, R_back, M2, rgbSize, cv2.CV_32FC1)
        outputAligned = cv2.remap(depthAligned, mapX, mapY, cv2.INTER_NEAREST)
        return outputAligned
    
    
    rgbWeight = 0.5
    depthWeight = 0.5
    
    
    def updateBlendWeights(percent_rgb):
        """
        Update the rgb and depth weights used to blend depth/rgb image
        @param[in] percent_rgb The rgb weight expressed as a percentage (0..100)
        """
        global depthWeight
        global rgbWeight
        rgbWeight = float(percent_rgb) / 100.0
        depthWeight = 1.0 - rgbWeight
    
    
    #Connect to device and start pipeline
    with device:
        device.startPipeline(pipeline)
        queue = device.getOutputQueue("out", 8, False)
    
        # Configure windows; trackbar adjusts blending ratio of rgb/depth
        rgb_depth_window_name = "rgb-depth"
    
        cv2.namedWindow(rgb_depth_window_name)
        cv2.createTrackbar(
            "RGB Weight %",
            rgb_depth_window_name,
            int(rgbWeight * 100),
            100,
            updateBlendWeights,
        )
        while True:
            messageGroup: dai.MessageGroup = queue.get()
            frameRgb: dai.ImgFrame = messageGroup["rgb"]
            frameDepth: dai.ImgFrame = messageGroup["depth"]
            # Blend when both received
            if frameRgb is not None and frameDepth is not None:
                frameRgb = frameRgb.getCvFrame()
    
                cv2.imshow("rgb", frameRgb)
                cv2.imshow("depth", colorizeDepth(frameDepth.getCvFrame()))
    
                alignedDepth = getAlignedDepth(frameDepth.getFrame())
                # visualise Point Cloud
                rgb = cv2.cvtColor(frameRgb, cv2.COLOR_BGR2RGB)
    
                # Colorize the aligned depth
                alignedDepth = colorizeDepth(alignedDepth)
    
                # Offset Depth Find Out
                #alignedDepth = offsetImageFind(alignedDepth,1080,1920)
                #alignedDepth = offsetImageDo(alignedDepth,1080,1920,40,-33)
    
    
                # Undistort the RGB frame
                mapX, mapY = cv2.initUndistortRectifyMap(
                    M2, D2, None, M2, rgbSize, cv2.CV_32FC1
                )
                frameRgb = cv2.remap(frameRgb, mapX, mapY, cv2.INTER_LINEAR)
    
                # rotate the RGB frame and downscale both frames before blending
                scaler = 0.7
                frameRgb = cv2.rotate(frameRgb, cv2.ROTATE_180)
                frameRgb = cv2.resize(frameRgb, (0, 0), fx=scaler, fy=scaler)
                alignedDepth = cv2.resize(alignedDepth, (0, 0), fx=scaler, fy=scaler)
                blended = cv2.addWeighted(frameRgb, rgbWeight, alignedDepth, depthWeight, 0)
    
    
                cv2.imshow(rgb_depth_window_name, blended)
    
                ### Ndi Send 1 --> BEGIN
    
                stream_frame_1 = frameRgb
                stream_frame_1 = cv2.cvtColor(stream_frame_1, cv2.COLOR_BGR2BGRA)
    
                video_frame_1.data = stream_frame_1
                video_frame_1.FourCC = ndi.FOURCC_VIDEO_TYPE_BGRX
    
                ndi.send_send_video_v2(ndi_send_1, video_frame_1)
    
                ### NDI Send 1 --> END
    
                ### Ndi Send 2 --> BEGIN
                alignedDepth = cv2.cvtColor(alignedDepth, cv2.COLOR_BGR2BGRA)
    
                video_frame_2.data = alignedDepth
                video_frame_2.FourCC = ndi.FOURCC_VIDEO_TYPE_BGRX
    
                ndi.send_send_video_v2(ndi_send_2, video_frame_2)
    
                ### NDI Send 2 --> END
    
            key = cv2.waitKey(1)
            if key == ord("q"):
                break

    Hi @BonkoKaradjov ,
    Good to hear you had success. Just note that:

    • Video encoding can only be done on the color stream at the moment, as encoding doesn't support depth (it is 16-bit, which the video-encoding hardware doesn't support)
    • A faster switch (e.g. 2.5/5/10 Gbps) won't necessarily help, as the camera chip can only do 1 Gbps, so you'd want to either reduce resolution or FPS to use less bandwidth (see the rough numbers below)
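
    As a rough back-of-the-envelope check (approximate numbers, assuming unencoded NV12 frames at 1.5 bytes per pixel):

    def raw_stream_mbps(width, height, fps, bytes_per_pixel=1.5):
        # Approximate bandwidth of an unencoded NV12 stream in Mbit/s.
        return width * height * bytes_per_pixel * fps * 8 / 1e6

    print(raw_stream_mbps(1352, 1012, 30))  # ~493 Mbit/s - already half of the 1 Gbps link
    print(raw_stream_mbps(4056, 3040, 30))  # ~4439 Mbit/s - far more than the link can carry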

    Thanks, Erik

      Dear Erik, thank you very much for your help 🙂

      Good to know about the switch. Actually my last script with setIspScale(1,3) works pretty OK,
      and I will delete unnecessary code after some fixes I want to add and see how much I can reduce the lag when
      slightly increasing the resolution.

      • Unfortunately THE_1352X1012 does not make the sensor start, so I have to scale down from THE_12_MP instead, but it is still very good 🙂

      • If I understood correctly, video encoding is already running in this script, and I am very happy about the progress and the sensor itself.

      • The next step will be to map the depth values uniformly into one color range (a grayscale ramp) from near to far, like white to black or the opposite (see the sketch below).
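
      A possible sketch for that step (with assumed min/max distances):

      import numpy as np

      def depth_to_gray(depth_mm, min_mm=500, max_mm=10000, invert=False):
          # Map depth uniformly onto a grayscale ramp: near = dark, far = bright;
          # pass invert=True for white (near) to black (far).
          d = np.clip(depth_mm, min_mm, max_mm)
          gray = ((d - min_mm) / (max_mm - min_mm) * 255).astype(np.uint8)
          if invert:
              gray = 255 - gray
          gray[depth_mm == 0] = 0  # keep invalid pixels black
          return gray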

      Thank you again for your help, and sorry for asking so many questions in the past.
      It took me some time to understand DepthAI; now things are starting to get fun.

      16 days later

      Urgent request for help with pixel-perfect alignment.

      I worked out a TouchDesigner-to-OAK workflow for the ToF 12MP (custom
      build from Luxonis) and am still not able to fix the alignment in code between the ToF and the 12MP sensor.

      My goal is still to send out 2 streams:

      • ToF with 640x480
      • RGB with 1035x1024
        but aligned together, in sync and with matching pixels
        (in TouchDesigner I would then upscale the ToF image so they match, and the network video stays nearly real-time without big lags or latency)

      Here is my work in a nutshell until now.
      Link to folder

      My TouchDesigner group says that if the sensor's cameras have been aligned, the raw data should somehow match when it is sent out.
      I am trying to recommend your sensors to the group, where lots of artists are looking for sensors.

      The production in which I would use the sensor starts on January 2nd, 2025. I will for sure have 1 or 2 weeks to work this issue out, but I was hoping to start the artistic process as soon as possible so I don't run out of time.

      I would be very thankful for any help; I am still not that good at coding with DepthAI and need more time to understand how to combine different scripts.

      Thank you very much for any help 🙂

        5 days later

        Hi BonkoKaradjov,
        Apologies for the late response.
        Does the alignment work outside TouchDesigner? Best to keep it Python-only until everything works as expected.
        I see you are performing the alignment on the host. The code looks fine at first glance; perhaps the intrinsics/extrinsics are incorrect?
        Can you try using the ImageAlign node to check the alignment? https://docs.luxonis.com/software/depthai/examples/tof_align/ - example (a rough sketch follows below)
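
        Roughly, following that example (exact node/port names may differ between depthai versions), the device-side alignment would look like this:

        import depthai as dai
        from datetime import timedelta

        pipeline = dai.Pipeline()

        camRgb = pipeline.create(dai.node.ColorCamera)
        camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_C)
        camRgb.setIspScale(1, 2)

        tofCam = pipeline.create(dai.node.Camera)
        tofCam.setBoardSocket(dai.CameraBoardSocket.CAM_A)
        tof = pipeline.create(dai.node.ToF)
        tofCam.raw.link(tof.input)

        # Align the ToF depth to the RGB frame on the device.
        align = pipeline.create(dai.node.ImageAlign)
        tof.depth.link(align.input)
        camRgb.isp.link(align.inputAlignTo)

        # Sync RGB and aligned depth so they arrive as one message group.
        sync = pipeline.create(dai.node.Sync)
        sync.setSyncThreshold(timedelta(milliseconds=20))
        camRgb.isp.link(sync.inputs["rgb"])
        align.outputAligned.link(sync.inputs["depth_aligned"])

        xout = pipeline.create(dai.node.XLinkOut)
        xout.setStreamName("out")
        sync.out.link(xout.input)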

        Thanks,
        Jaka

          Dear jakaskerl,

          thank you very much for your help 🙂

          My approach of trying out the alignment externally in TouchDesigner comes from the fact that I have been trying for 1.5 months to achieve the following:

          (as mentioned above, my ToF sensor is a custom build from Luxonis with a 12MP color camera)

          • RGB camera stream through NDI or Syphon (minimum resolution 1352x1024)

          • ToF camera stream (if possible 32-bit) through NDI or Syphon (minimum resolution 640x480)

          • both streams should be aligned as perfectly as possible and would be layered one over the other in other software (TouchDesigner), only by scaling up the ToF stream

          • color and ToF with as little latency as possible and as high a quality as possible for the color camera 🙂


          I remember that I had many tryouts with the example you sent me.
          I tested it again now and could somehow align the streams with

          #rotate depth
          alignedDepthColorized = cv2.rotate(alignedDepthColorized, cv2.ROTATE_180)
          #transform depth
          M = np.float32([[1, 0, getXOffset], [0, 1, getYOffset]])
          alignedDepthColorized = cv2.warpAffine(alignedDepthColorized, M, (cols, rows))

          but the depth is only aligned in one spot, not when I come closer to the camera or move left or right.
          Here a video showing the alignment:
          https://gyazo.com/5ea2deadf471d4a6fb38e890e562f9b5

          Also, if alignment works with this script, the streams I would get from color and ToF would only be 960x540, so the reason I paid a lot of money, almost double the price, for a ToF custom build with a 12MP camera would not make sense for me.

          Also, I remember that with this alignment script, sending out a higher-resolution color stream somehow introduced unnecessary latency over PoE.

          I was really hoping that I could get an alignment between depth and the 12MP camera like the one the Orbbec Femto Mega has.
          If so, the OAK ToF with the 12MP camera would beat the Femto Mega, because it would also have a much more flexible depth map.

          In my script above, my video streams go through the network pretty fluidly, almost without latency, but I could not manage to send the ToF video as a 32-bit stream through Syphon (only as 16-bit over NDI).
          As in that script, I would also implement camera control over OSC (as already implemented) to handle different situations such as changing focus.

          Here my new script that tries out Syphon:
          https://www.dropbox.com/scl/fi/whzgpadq1mnsb0rf63o3n/RgbTof_newSyphon_WORKING_04_01c.py?rlkey=4n1yvoibmhl0eff9vdigjvt2q&dl=0

          Thank you again very much for your help 🙂