Hello Luxonis community,
I've been working on a project where I want to run a YOLO segmentation model on the DepthAI camera and extract the masks it predicts. So far, I've used the ultralytics library to run the model on a video input on the host and overlay the predicted masks on the video frames.
Here's a brief overview of the steps I've taken:
1. Loaded the YOLO model from a checkpoint.
2. Opened a video file using OpenCV's VideoCapture.
3. Iterated through the frames, and for each frame:
   - Converted the frame to a PIL image.
   - Predicted objects and masks using the YOLO model.
   - If masks were detected, drew them on the image.
4. Wrote the processed frames to a new video file.
Here's a snippet of the code:
```python
import cv2
import numpy as np
from PIL import Image, ImageDraw
from shapely.geometry import Polygon
from ultralytics import YOLO

# Load the trained segmentation model from a checkpoint
model = YOLO("best.pt")

# Open video capture
cap = cv2.VideoCapture('video.mp4')

# Get video properties
fourcc = cv2.VideoWriter_fourcc(*'XVID')
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Initialize VideoWriter object
out = cv2.VideoWriter('video.avi', fourcc, fps, (width, height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Convert the BGR frame to an RGB PIL Image
    img_pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    results = model.predict(img_pil, conf=0.85)
    result = results[0]
    masks = result.masks

    # Check if we have masks before any conversion or drawing
    if masks is not None:
        # Convert the image to RGBA for transparency support
        img_pil = img_pil.convert("RGBA")
        overlay = Image.new('RGBA', img_pil.size, (255, 255, 255, 0))
        overlay_draw = ImageDraw.Draw(overlay)

        for mask in masks:
            # mask.xy[0] is an (N, 2) array of polygon vertices in pixel
            # coordinates; PIL expects a sequence of (x, y) tuples
            polygon = [tuple(point) for point in mask.xy[0]]
            if len(polygon) >= 3:
                # Semi-transparent green fill for the mask
                overlay_draw.polygon(polygon, outline=(0, 255, 0), fill=(0, 255, 0, 127))

                # Mark the mask centroid with a small red dot
                centroid = Polygon(polygon).centroid
                circle_radius = 5
                left_up_point = (centroid.x - circle_radius, centroid.y - circle_radius)
                right_down_point = (centroid.x + circle_radius, centroid.y + circle_radius)
                overlay_draw.ellipse([left_up_point, right_down_point], fill=(255, 0, 0))

        img_pil = Image.alpha_composite(img_pil, overlay)

    # Drop the alpha channel (if any) and convert back to BGR for OpenCV
    frame = cv2.cvtColor(np.array(img_pil.convert("RGB")), cv2.COLOR_RGB2BGR)
    out.write(frame)

# Release the video objects
cap.release()
out.release()
```
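In case it's useful, here's the quick sanity check I run on the raw mask tensors before doing any polygon drawing. This is a minimal sketch, assuming I'm reading the ultralytics masks.data layout correctly (an N x H x W tensor at the model's input resolution); "frame.jpg" is just a placeholder test image.

```python
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("best.pt")
frame = cv2.imread("frame.jpg")  # placeholder: any single test frame

results = model.predict(frame, conf=0.85)
masks = results[0].masks

if masks is not None:
    # masks.data is a (num_masks, H, W) tensor of per-instance masks;
    # H and W are the model's input resolution, not the frame's
    mask_tensor = masks.data.cpu().numpy()
    print("mask tensor shape:", mask_tensor.shape)

    # Binarize the first mask and resize it to the frame size for a quick overlay
    binary = (mask_tensor[0] > 0.5).astype(np.uint8) * 255
    binary = cv2.resize(binary, (frame.shape[1], frame.shape[0]))
    overlay = frame.copy()
    overlay[binary > 0] = (0, 255, 0)
    blended = cv2.addWeighted(frame, 0.6, overlay, 0.4, 0)
    cv2.imwrite("mask_check.png", blended)
```

The polygon route via mask.xy is what I use in the main loop above; the raw-tensor route might map more directly onto whatever the camera would return.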
I would love to get some insights on:

1. Is it possible to run this directly on the DepthAI camera to take advantage of its processing capabilities?
2. Can the DepthAI SDK handle the mask overlay tasks directly?
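For question 1, here's roughly what I had in mind for the on-device side. This is only a minimal sketch, assuming the model has been compiled to a .blob (the blob path and input size below are placeholders, and I don't yet know in what form the segmentation outputs come back). I'm using a generic NeuralNetwork node because, as far as I can tell, YoloDetectionNetwork only decodes boxes, not masks.

```python
import depthai as dai

# Build a pipeline that runs the compiled model on the camera itself
pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(640, 640)  # must match the blob's input resolution
cam.setInterleaved(False)
cam.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

# Generic NN node; the blob path is a placeholder for my converted model
nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("best_openvino.blob")
cam.preview.link(nn.input)

# Stream the raw network output back to the host
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("nn")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue(name="nn", maxSize=4, blocking=False)
    while True:
        msg = q.get()
        # Raw output layers arrive on the host; presumably the mask
        # decoding and overlay would still happen here in host code?
        print(msg.getAllLayerNames())
```

If the SDK can composite the masks on-device (question 2), I'd happily drop the host-side PIL drawing entirely.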
Looking forward to any advice, suggestions, or relevant experiences you can share!