The new DataDreamer v0.2.0 update expands its capabilities for generating synthetic data and pre-annotating real datasets. This version introduces SlimSAM for instance segmentation. Additionally, the Qwen 2.5 language model is now integrated as a prompt generator, boosting text data generation. With these updates, DataDreamer becomes even better suited for data preparation and annotation.
Data Prepping (Annotation)
Step 1 of training and deploying a custom ML model (e.g. to OAK cameras) is data preparation. This includes collecting, cleaning, and annotating data. DataDreamer simplifies this process by providing tools for auto-annotation of images and for generating synthetic data.
In the video above you can see how to use DataDreamer to auto-annotate a video (a sequence of images). If you took all frames of the video and included them in the dataset, you would have a lot of near-identical images, which isn't good for training a model, as it would lead to overfitting. That's why you'd usually keep only e.g. 1 frame per second, which we can accomplish easily with OpenCV (Gist here). After we have images in the frames/ folder, we can use DataDreamer to auto-annotate them.
I also cropped the video (yellow rectangle) so only the lemons are in the FOV (field of view), which helps with detection accuracy; from my tests, the model doesn't work well with smaller objects.
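The Gist covers the frame extraction; below is a minimal sketch of what that step can look like, including the crop. The input filename, output pattern, and crop rectangle are placeholders (assumptions), so adjust them to your video:
import os
import cv2

os.makedirs("frames", exist_ok=True)

cap = cv2.VideoCapture("lemons.mp4")      # hypothetical input video
fps = cap.get(cv2.CAP_PROP_FPS) or 30     # fall back to 30 if FPS is unknown
step = int(round(fps))                    # keep ~1 frame per second

# Crop rectangle (x, y, width, height) -- placeholder values for the yellow rectangle
x, y, w, h = 300, 100, 800, 800

saved, idx = 0, 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    if idx % step == 0:
        crop = frame[y:y + h, x:x + w]    # keep only the region with the lemons
        cv2.imwrite(f"frames/frame_{saved:04d}.jpg", crop)
        saved += 1
    idx += 1
cap.release()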
DataDreamer
datadreamer --save_dir frames --class_names "lemons" --task instance-segmentation \
--annotator_size large --use_tta --image_annotator owlv2-slimsam \
--conf_threshold 0.25 --annotate_only --annotation_iou_threshold 0.1 --device cuda
Let's break down the command (all parameters are explained here):
--save_dir frames - the folder with the images to annotate.
--class_names "lemons" - the class names; I only want to detect lemons.
--task instance-segmentation - we want to do instance segmentation, which includes bounding boxes (detection) and masks. You could change it to detection if you only want bounding boxes, or classification if you only want to classify images.
--annotator_size large - the size of the annotator. You can choose between base and large; large is more accurate, but slower.
--use_tta - use Test Time Augmentation. This applies rotations/flips/crops/scaling to the image and aggregates the predictions to make the outputs more robust. It provides better results, but is slower.
--image_annotator owlv2-slimsam - currently the only annotator that supports segmentation. owlv2 can be used for detection, and clip for image classification.
--conf_threshold 0.25 - the confidence threshold. If the model is less than 25% sure that an object is a lemon, it won't annotate it.
--annotate_only - just annotate the images, don't generate new/synthetic data.
--annotation_iou_threshold 0.1 - the IoU (Intersection over Union) threshold used to filter out overlapping detections of the same object.
--device cuda - use an NVIDIA GPU for inference. If you don't have a GPU, you can use cpu, but it will be much slower.
After letting it run for a couple of minutes (or hours, depending on the number of images), you'll find the annotations in the frames/annotations.json file.
erik@eriks-MacBook-Pro frames % ls -l
total 140424
-rw-r--r-- 1 erik staff 28571010 Nov 26 15:02 annotations.json
-rw-r--r-- 1 erik staff 135045 Nov 26 14:56 frame_0000.jpg
-rw-r--r-- 1 erik staff 132306 Nov 26 14:56 frame_0001.jpg
-rw-r--r-- 1 erik staff 128611 Nov 26 14:56 frame_0002.jpg
-rw-r--r-- 1 erik staff 130135 Nov 26 14:56 frame_0003.jpg
-rw-r--r-- 1 erik staff 129604 Nov 26 14:56 frame_0004.jpg
...
Annotations
These are in a raw format, with detections (bounding boxes), masks, and labels for each image. One could also include the --dataset_format yolo or --dataset_format coco argument to get the annotations in YOLO or COCO format.
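If you want to peek at the raw format before visualizing it, a few lines of Python will do. The field names below (boxes, masks, labels) are the ones the visualization script in the next section reads; the xyxy box layout is inferred from how the boxes are later passed to sv.Detections:
import json

with open("frames/annotations.json") as f:
    annotations = json.load(f)

# Peek at the first image's entry
name, data = next(iter(annotations.items()))
print(name)               # e.g. frame_0000.jpg
print(list(data.keys()))  # expect boxes, masks, labels
print(data["boxes"][0])   # one [x1, y1, x2, y2] bounding box
print(data["labels"])     # class indices into --class_names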
Visualizing Annotations
To visualize the annotations (draw the mask/detection/label on the image), we'll be using the Supervision library, an open-source CV tool for easy visualization and debugging of computer vision models.
import cv2
import numpy as np
import json
import os
import supervision as sv
from supervision.detection.utils import polygon_to_mask

# Path to your folder containing images and annotation JSON file
image_folder = "frames"
annotation_file = "frames/annotations.json"

# Load annotations
with open(annotation_file, 'r') as f:
    annotations = json.load(f)

# Sort annotations by image name so the frames play back in order
annotations = {k: v for k, v in sorted(annotations.items(), key=lambda item: item[0])}

yellow = sv.Color(255, 240, 0)
label_annotator = sv.LabelAnnotator(color=yellow, text_color=sv.Color(0, 0, 0))
polygon_annotator = sv.PolygonAnnotator(color=yellow, thickness=3)

# Iterate through each image in the annotations
for image_name, data in annotations.items():
    image_path = os.path.join(image_folder, image_name)
    image = cv2.imread(image_path)
    if image is None:
        print(f"Could not read image: {image_name}")
        continue

    # Extract bounding boxes and labels from the image annotation
    boxes = np.array(data["boxes"], dtype=np.float32)     # Bounding boxes (xyxy)
    class_ids = np.array(data["labels"], dtype=np.int32)  # Class IDs

    # Convert mask polygons to boolean masks
    if "masks" in data:
        masks = []
        for polygon in data["masks"]:
            new_mask = polygon_to_mask(
                polygon=np.array(polygon, dtype=np.int32),
                resolution_wh=(image.shape[1], image.shape[0])  # (width, height)
            )
            masks.append(new_mask.astype(bool))
        masks = np.array(masks, dtype=bool)
    else:
        masks = None

    # Create sv.Detections object and draw labels and polygons on the image
    if len(boxes) != 0:
        detections = sv.Detections(
            xyxy=boxes,
            class_id=class_ids,
            mask=masks
        )
        labels = ["Lemon"] * len(data["labels"])  # Only 1 class anyway
        image = label_annotator.annotate(scene=image, detections=detections, labels=labels)
        image = polygon_annotator.annotate(scene=image, detections=detections)

    # Display the annotated image; press 'q' to quit
    cv2.imshow("Annotated Image", image)
    if cv2.waitKey(33) & 0xFF == ord('q'):  # advance every 33 ms (~30 FPS)
        break

cv2.destroyAllWindows()
OpenCV will open a window with the image and annotations, like in the image below. You can press q to close the window.
In the next blog post, we'll cover how to train the model and then deploy it to OAK cameras. Stay tuned!
Let us know if you have any questions/thoughts in the comments🙂
- Erik