Hi @KlemenSkrlj, I think I’ve identified what might be causing the duplicate UUID warnings:
I’m using images labeled in YOLO format rather than the XML format used in the tutorial.
Each image has multiple annotations (i.e., multiple objects), and since my generator yields a separate record for every object, I suspect this is leading to the same UUID being assigned multiple times.
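A quick way to sanity-check this is to count how many records are yielded per file. Minimal sketch: `records_per_file` is a hypothetical helper of mine (not part of luxonis_ml), and the records are made-up examples shaped like the ones my parsing script yields:

```python
from collections import Counter
from typing import Iterable


def records_per_file(records: Iterable[dict]) -> Counter:
    """Count how many annotation records reference each image file."""
    return Counter(record["file"] for record in records)


# Hypothetical records: one image with three objects, one image with a single object
records = [
    {"file": "img_001.jpg", "annotation": {"class": "a"}},
    {"file": "img_001.jpg", "annotation": {"class": "b"}},
    {"file": "img_001.jpg", "annotation": {"class": "a"}},
    {"file": "img_002.jpg", "annotation": {"class": "c"}},
]

counts = records_per_file(records)
print(counts)  # Counter({'img_001.jpg': 3, 'img_002.jpg': 1})
```

Any file with a count above 1 is an image that appears in several records.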
Here's a simplified version of the script I'm using to parse the dataset:
```python
from pathlib import Path

import cv2
from luxonis_ml.data import DatasetIterator

CLASS_NAMES = ["a", "b", "c"]  # Example class names


def process_dir(dir_path: Path) -> tuple[DatasetIterator, list[str]]:
    images = [str(i.absolute().resolve()) for i in dir_path.glob("*.jpg")]

    def generator() -> DatasetIterator:
        for img_path in images:
            img_path = Path(img_path)
            txt_path = img_path.with_suffix(".txt")
            if not txt_path.exists():
                continue
            # cv2.imread returns None (it doesn't raise) for unreadable images, so skip those
            if cv2.imread(str(img_path)) is None:
                continue
            with open(txt_path, "r") as f:
                # One YOLO line per object: class x_center y_center width height (normalized)
                for line in f:
                    parts = line.strip().split()
                    if len(parts) != 5:
                        continue
                    class_id, x_center, y_center, w, h = map(float, parts)
                    class_idx = int(class_id)
                    class_name = CLASS_NAMES[class_idx] if class_idx < len(CLASS_NAMES) else str(class_idx)
                    # Convert the YOLO center coordinates to the top-left corner x, y
                    x = x_center - w / 2
                    y = y_center - h / 2
                    # One record per object, so an image with N objects is yielded N times
                    yield {
                        "file": str(img_path),
                        "annotation": {
                            "class": class_name,
                            "boundingbox": {"x": x, "y": y, "w": w, "h": h},
                        },
                    }

    return generator(), images
```
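For reference, the box conversion inside the loop reduces to the arithmetic below. This is a standalone sketch (the name `yolo_to_xywh` is hypothetical); it assumes the YOLO values are already normalized to [0, 1], as the script above passes them through unchanged. The optional `clip` flag is my own addition for boxes that spill past the image edges:

```python
def yolo_to_xywh(
    x_center: float, y_center: float, w: float, h: float, clip: bool = False
) -> tuple[float, float, float, float]:
    """Convert a normalized YOLO box (center x/y, width, height) to top-left x/y, width, height."""
    x = x_center - w / 2
    y = y_center - h / 2
    if clip:
        # Optional: trim the box so it stays inside the normalized [0, 1] image area
        x0, y0 = max(0.0, x), max(0.0, y)
        x1, y1 = min(1.0, x + w), min(1.0, y + h)
        x, y, w, h = x0, y0, x1 - x0, y1 - y0
    return x, y, w, h


print(yolo_to_xywh(0.5, 0.5, 0.2, 0.4))
```

With `clip=False` this matches what the generator computes for every YOLO line.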
Let me know if this interpretation makes sense. If not, I’m considering cleaning up the dataset by removing the augmentations and retrying with a simpler version to confirm. I’d appreciate your thoughts on how best to handle this.
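Before stripping the augmentations, a cheaper check might be to look for byte-identical images in the directory. Minimal sketch, assuming (I haven't confirmed this in the luxonis_ml source) that the UUID is derived from the image content, so identical copies saved under different names would collide. `find_duplicate_images` is a hypothetical helper and the path is a placeholder:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_images(dir_path: Path, pattern: str = "*.jpg") -> list[list[Path]]:
    """Group image files whose bytes are identical (same MD5 digest)."""
    # Assumption: if UUIDs depend on file content, each group here would share one UUID
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(dir_path.glob(pattern)):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        groups[digest].append(path)
    # Only digests shared by more than one file indicate duplicates
    return [paths for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicate_images(Path("path/to/images")):  # placeholder path
        print("Identical content:", [p.name for p in group])
```

If this returns nothing, the augmentations are at least not exact copies of each other.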