I tried to upload a precomputed 3x3 homography matrix and use the ImageManip node's set3x3TransformMatrix method, but I couldn't figure out how to actually stitch the frames together on the Luxonis, so for now I do all of it (warping, stitching, and blending) on the host, which is an NVIDIA Jetson Orin Nano. Do you think the Luxonis itself can output a stitched image from a pair of color cams, or should I just use CUDA and do it on the host?
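
For context, the warp part on its own looked roughly like this (a sketch from memory; I believe the actual method name is ImageManipConfig.setWarpTransformMatrix3x3, and the identity matrix below is just a placeholder) -- it's the stitching step after it that I never figured out:

import depthai as dai

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
manip = pipeline.create(dai.node.ImageManip)
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("warped")

cam.setPreviewSize(1280, 800)
# homography as a flat, row-major list of 9 floats (identity placeholder here)
manip.initialConfig.setWarpTransformMatrix3x3([1, 0, 0,
                                               0, 1, 0,
                                               0, 0, 1])
manip.setMaxOutputFrameSize(1280 * 800 * 3)

cam.preview.link(manip.inputImage)
manip.out.link(xout.input)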

Alternatively, could we use its on-device calibration parameters together with a warp function from the ImageManip node? That would avoid feature detection to generate and upload a homography matrix: the warp and stitch would happen on device, and the OAK would output a single, larger stitched frame. I could then do the multi-band blending on the host GPU with CUDA and output the final stitched, seam-blended frame directly to the display from the GPU. Is this possible?
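
What I have in mind for getting the homography from the calibration instead of feature matching is roughly this (untested sketch; the socket names, the resolution, and the small-baseline / pure-rotation approximation are my assumptions):

import numpy as np
import depthai as dai

with dai.Device() as device:
    calib = device.readCalibration()
    # intrinsics at the resolution the frames will actually be warped at
    K_l = np.array(calib.getCameraIntrinsics(dai.CameraBoardSocket.CAM_B, 1280, 800))
    K_r = np.array(calib.getCameraIntrinsics(dai.CameraBoardSocket.CAM_C, 1280, 800))
    # 4x4 transform from the left camera to the right camera
    T = np.array(calib.getCameraExtrinsics(dai.CameraBoardSocket.CAM_B, dai.CameraBoardSocket.CAM_C))
    R = T[:3, :3]
    # infinite-homography approximation: ignores the baseline, so it only holds
    # when the scene is far away relative to the camera separation
    H = K_r @ R @ np.linalg.inv(K_l)
    H /= H[2, 2]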

Which way makes the most sense?

    Hi Nearpoint
    You can stitch them using a Script node, but I wouldn't recommend it since it's computationally expensive. You can, however, use the NN node -- example here.
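
    Roughly the idea (a quick sketch, not the exact linked example): you express the stitching as a tiny PyTorch model, export it to ONNX, compile that to a blob, and run it in the NeuralNetwork node -- for example a plain horizontal concat of the two (already warped) inputs:

    import torch

    class Stitch(torch.nn.Module):
        def forward(self, left, right):
            # NCHW tensors in, concatenated along the width dimension
            return torch.cat([left, right], dim=3)

    # export with two named inputs so the NN node gets two input layers
    torch.onnx.export(
        Stitch(),
        (torch.zeros(1, 3, 400, 640), torch.zeros(1, 3, 400, 640)),
        "stitch.onnx",
        input_names=["left", "right"],
        output_names=["out"],
    )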

    Thanks,
    Jaka

    Oh wow, you can run PyTorch code on the Luxonis? That is so wild. Any idea whether I'd be able to run something like the code below, which I use to blend the images?

    Before I run that code, I use cv2.cuda.warpPerspective to warp the frames as needed. Would the following be the equivalent kornia code?

    cv2.cuda.warpPerspective(gpu_frame_left1, homography_matrix1, (h*2, w*2), gpu_warped_left1, borderMode=cv2.BORDER_CONSTANT, stream=stream1)

    def warpPerspectiveKornia(gpu_frame, homography_matrix):
        # kornia expects a (B, C, H, W) float tensor and a (B, 3, 3) batch of homographies;
        # note that kornia's dsize is (height, width), whereas cv2's dsize is (width, height)
        gpu_warped = kornia.geometry.transform.warp_perspective(gpu_frame, homography_matrix, dsize=(h*2, w*2))
        return gpu_warped
    
    def calc_overlap_and_weights(tensor_warped_left1, tensor_warped_right1):
        # assumes (H, W, C) tensors; a pixel is treated as valid only if all 3 channels are non-zero
        mask_left = torch.all(tensor_warped_left1 != torch.tensor([0, 0, 0]).cuda(), dim=-1)
        mask_right = torch.all(tensor_warped_right1 != torch.tensor([0, 0, 0]).cuda(), dim=-1)
        overlap_mask = mask_left & mask_right
        # number of columns where the two warped frames overlap (as a plain int for linspace/slicing)
        overlap_width = int(overlap_mask.any(dim=0).sum().item())
        # linear feathering weights across the overlap
        weight_left = torch.linspace(1, 0, steps=overlap_width).cuda()
        weight_right = torch.linspace(0, 1, steps=overlap_width).cuda()
        canvas1 = torch.zeros([3, w*2 - overlap_width, h*2])  # currently unused
        return overlap_width, weight_left, weight_right
    
    def find_non_black_indices(image):
        # Sum the pixel values along the color channel (assuming the color channel is the last dimension)
        # and then along the vertical axis to create a 1D tensor
        sum_along_axis = image.sum(dim=[0, -1])
    
        # Find the first and last non-black column
        non_black_indices = (sum_along_axis > 0).nonzero(as_tuple=True)[0]
    
        if len(non_black_indices) == 0:
            return None, None
    
        return non_black_indices[0].item(), non_black_indices[-1].item()
    
    def blend_images(frame_left_tensor, frame_right_tensor, weight_left_tensor, weight_right_tensor, overlap_width, output):
        # note: the incoming `output` argument is not actually used; a fresh copy of the left frame is created below
        # Get the indices for the non-black regions
        start_idx_left, end_idx_left = find_non_black_indices(frame_left_tensor)
        start_idx_right, end_idx_right = find_non_black_indices(frame_right_tensor)
       
        if start_idx_left is None or end_idx_left is None or start_idx_right is None or end_idx_right is None:
            print("Error: Unable to find non-black indices.")
            return None  # or handle the error in some other way
    
        # Get the regions to be blended
        blend_region_left = frame_left_tensor[:, end_idx_left - overlap_width:end_idx_left, :]
        blend_region_right = frame_right_tensor[:, start_idx_right - 1:start_idx_right + overlap_width - 1, :]
       
        weight_left_expanded = weight_left_tensor.unsqueeze(0).unsqueeze(2).expand_as(blend_region_left)
        weight_right_expanded = weight_right_tensor.unsqueeze(0).unsqueeze(2).expand_as(blend_region_right)
    
        # Perform the blending
        blend_result = (blend_region_left * weight_left_expanded + blend_region_right * weight_right_expanded) / (weight_left_expanded + weight_right_expanded)
       
        # Create a copy of the left frame on the GPU to hold the output
        output = frame_left_tensor.clone()
       
        # Update the overlap region in the output tensor
        output[:, end_idx_left - overlap_width:end_idx_left, :] = blend_result
       
        return output
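
    And this is roughly how I call those helpers on the Jetson (sketch; kornia returns NCHW while the helpers above index (H, W, C), hence the permutes -- bgr_left/bgr_right are just the two frames as numpy arrays, and homography_matrix2 is my second precomputed matrix):

    import torch

    frame_l = torch.from_numpy(bgr_left).cuda().float().permute(2, 0, 1).unsqueeze(0)
    frame_r = torch.from_numpy(bgr_right).cuda().float().permute(2, 0, 1).unsqueeze(0)
    H1 = torch.as_tensor(homography_matrix1, dtype=torch.float32, device="cuda").unsqueeze(0)
    H2 = torch.as_tensor(homography_matrix2, dtype=torch.float32, device="cuda").unsqueeze(0)

    warped_l = warpPerspectiveKornia(frame_l, H1)
    warped_r = warpPerspectiveKornia(frame_r, H2)

    # back to (H, W, C), which is what the masking/blending helpers expect
    warped_l_hwc = warped_l[0].permute(1, 2, 0)
    warped_r_hwc = warped_r[0].permute(1, 2, 0)

    overlap_width, w_l, w_r = calc_overlap_and_weights(warped_l_hwc, warped_r_hwc)
    blended = blend_images(warped_l_hwc, warped_r_hwc, w_l, w_r, overlap_width, None)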

    Hi @Nearpoint ,
    You could use the Warp node to warp the frames (or use Kornia as you mentioned) before sending them to the custom NN. Let us know if this works, as some operations aren't supported and some are quite slow (anything that isn't vectorized will be very slow).
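
    For the Warp node, something along these lines (untested sketch; the 2x2 mesh values are placeholders -- with a denser mesh you can approximate an arbitrary homography):

    import depthai as dai

    pipeline = dai.Pipeline()
    cam = pipeline.create(dai.node.ColorCamera)
    warp = pipeline.create(dai.node.Warp)
    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("warped")

    cam.setPreviewSize(640, 360)
    # 2x2 mesh: source points that the output corners sample from (placeholders)
    mesh = [dai.Point2f(0, 0),   dai.Point2f(640, 0),
            dai.Point2f(0, 360), dai.Point2f(640, 360)]
    warp.setWarpMesh(mesh, 2, 2)
    warp.setOutputSize((640, 360))
    warp.setMaxOutputFrameSize(640 * 360 * 3)

    cam.preview.link(warp.inputImage)
    warp.out.link(xout.input)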