
Auto-Annotate Datasets with LVMs using DataDreamer

The new DataDreamer v0.2.0 update expands its capabilities for generating synthetic data and pre-annotating real datasets. This version introduces SlimSAM for instance segmentation. Additionally, the Qwen 2.5 language model is now integrated as a prompt generator, boosting text data generation. With these updates, DataDreamer becomes even better suited for data preparation and annotation.

Data Prepping (Annotation)

Step 1 of training and deploying a custom ML model (e.g., to OAK cameras) is data preparation. This includes collecting, cleaning, and annotating data. DataDreamer simplifies this process by providing tools for auto-annotating images and generating synthetic data.

In the video above you can see how to use DataDreamer to auto-annotate a video (a sequence of images). If you took all frames of the video
and included them in the dataset, you would have many near-identical images, which isn't good for training a model, as it would lead to overfitting.

That's why you'd usually keep only e.g. 1 frame per second, which we can accomplish easily with OpenCV (Gist here). After we have images in the frames/ folder, we can use DataDreamer to auto-annotate them.
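If you'd rather not dig up the Gist, here's a minimal sketch of that frame-extraction step (the input filename lemons.mp4 is just a placeholder; adjust it to your video):

import os
import cv2

video_path = "lemons.mp4"  # placeholder - use your own video file
out_dir = "frames"
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back to 30 if FPS is unknown
step = int(round(fps))  # keep roughly 1 frame per second

frame_idx = saved = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    if frame_idx % step == 0:
        cv2.imwrite(os.path.join(out_dir, f"frame_{saved:04d}.jpg"), frame)
        saved += 1
    frame_idx += 1

cap.release()
print(f"Saved {saved} frames to {out_dir}/")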

I also cropped the video (yellow rectangle) so only lemons are in the FOV (field of view), which helps with detection accuracy; from my tests, the annotator doesn't work well on smaller objects.

DataDreamer

datadreamer --save_dir frames --class_names "lemons" --task instance-segmentation \
--annotator_size large --use_tta --image_annotator owlv2-slimsam \
--conf_threshold 0.25  --annotate_only --annotation_iou_threshold 0.1 --device cuda

Let's break down the command (all parameters are explained here):

  • --save_dir frames - folder with images
  • --class_names "lemons" - class name - I only want to detect lemons
  • --task instance-segmentation - we want to do instance segmentation, which includes bounding boxes (detection) and masks. You could change it to detection if you only want bounding boxes, or classification if you only want to classify images (a detection-only variant is sketched right after this list).
  • --annotator_size large - size of the annotator model. You can choose between base and large; large is more accurate, but slower.
  • --use_tta - use Test Time Augmentation (TTA). This applies rotations/flips/crops/scaling to the image and aggregates the predictions to make the outputs more robust. It gives better results, but is slower.
  • --image_annotator owlv2-slimsam - currently the only annotator that supports segmentation. owlv2 can be used for detection, and clip for image classification.
  • --conf_threshold 0.25 - confidence threshold. If the model is less than 25% sure that it's a lemon, it won't annotate it.
  • --annotate_only - just annotate the images, don't generate new/synthetic data.
  • --annotation_iou_threshold 0.1 - IoU (Intersection over Union) threshold for suppressing overlapping annotations.
  • --device cuda - use an NVIDIA GPU for inference. If you don't have a GPU, you can use cpu, but it will be much slower.
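As a quick sketch (not a command from the original run, just the flags above recombined), a detection-only pass over the same folder could look like this:

datadreamer --save_dir frames --class_names "lemons" --task detection \
--annotator_size large --use_tta --image_annotator owlv2 \
--conf_threshold 0.25 --annotate_only --annotation_iou_threshold 0.1 --device cuda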

After letting it sit for a couple of minutes (or hours, depending on the number of images), you'll find the annotations in the frames/annotations.json file:

erik@eriks-MacBook-Pro frames % ls -l
total 140424
-rw-r--r--  1 erik  staff  28571010 Nov 26 15:02 annotations.json
-rw-r--r--  1 erik  staff    135045 Nov 26 14:56 frame_0000.jpg
-rw-r--r--  1 erik  staff    132306 Nov 26 14:56 frame_0001.jpg
-rw-r--r--  1 erik  staff    128611 Nov 26 14:56 frame_0002.jpg
-rw-r--r--  1 erik  staff    130135 Nov 26 14:56 frame_0003.jpg
-rw-r--r--  1 erik  staff    129604 Nov 26 14:56 frame_0004.jpg
...

Annotations

These are in a raw format, with detections (bounding boxes), masks, and labels for each image. One could also include the --dataset_format yolo or --dataset_format coco argument to get the annotations in YOLO or COCO format.
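If you're curious what the raw format looks like, here's a quick sketch to peek at it (the per-image keys are an assumption based on the visualization code below):

import json

with open("frames/annotations.json") as f:
    annotations = json.load(f)

print(f"{len(annotations)} entries")
# Print the top-level keys of the first few entries to see the structure
for image_name in list(annotations)[:3]:
    entry = annotations[image_name]
    keys = list(entry.keys()) if isinstance(entry, dict) else type(entry).__name__
    print(image_name, keys)  # expecting something like: boxes, masks, labels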

Visualizing Annotations

To visualize the annotations (draw the mask/detection/label on the image), we'll be using the Supervision library, an open-source CV tool for easily visualizing and debugging computer vision model outputs.

import cv2
import numpy as np
import json
import os
import supervision as sv
from supervision.detection.utils import polygon_to_mask

# Path to your folder containing images and annotation JSON file
image_folder = "frames"
annotation_file = "frames/annotations.json"

# Load annotations
with open(annotation_file, 'r') as f:
    annotations = json.load(f)

# Sort annotations by image_name
annotations = {k: v for k, v in sorted(annotations.items(), key=lambda item: item[0])}

yellow = sv.Color(255,240,0)
label_annotator = sv.LabelAnnotator(color=yellow, text_color=sv.Color(0,0,0))
polygon_annotator = sv.PolygonAnnotator(color=yellow, thickness=3)

# Iterate through each image in the annotations
for image_name, data in annotations.items():
    image_path = os.path.join(image_folder, image_name)
    image = cv2.imread(image_path)

    if image is None:
        print(f"Could not read image: {image_name}")
        continue

    # Extract bounding boxes and labels from image annotation
    boxes = np.array(data["boxes"], dtype=np.float32)  # Bounding boxes
    class_ids = np.array(data["labels"], dtype=np.int32)  # Class IDs

    if "masks" in data:
        masks = []
        for polygon in data["masks"]:
            new_mask = polygon_to_mask(
                polygon=np.array(polygon, dtype=np.int32),
                resolution_wh=(image.shape[1], image.shape[0])  # (width, height)
            )
            masks.append(new_mask.astype(bool))
        masks = np.array(masks, dtype=bool)
    else:
        masks = None

    # Create sv.Detections object
    if len(boxes) != 0:
        detections = sv.Detections(
            xyxy=boxes,
            class_id=class_ids,
            mask=masks
        )

        # Draw labels and polygons on the image
        labels = ["Lemon"] * len(data["labels"]) # Only 1 class anyways
        image = label_anotator.annotate(scene=image, detections=detections, labels=labels)
        image = polygon_annotator.annotate(scene=image, detections=detections)

    # Display the annotated image; press 'q' to stop early
    cv2.imshow("Annotated Image", image)
    if cv2.waitKey(33) & 0xFF == ord('q'):  # wait ~33 ms between frames
        break

cv2.destroyAllWindows()

OpenCV will open a window with the image and annotations, like in the image below. You can press q to close the window.


In the next blog post, we'll cover how to train the model and then deploy it to OAK cameras. Stay tuned!

Let us know if you have any questions/thoughts in the comments🙂
- Erik

