Thank you, will try this today!
Vvedoua
- 2 days ago
Hello,
Does anyone have experience using gaze detection on a recorded video instead of live-streaming from a webcam? Specifically using the gen2-gaze-estimation library.
I have a new use case: use the OAK-D to record raw video (just like a webcam), then in post-processing apply gaze detection and map the gaze onto fixed Areas of Interest, e.g. road, left_mirror, right_mirror, etc.
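For the AOI half, this is the rough post-processing shape I have in mind: bucket the 3D gaze vector (as gaze-estimation-adas-0002 produces) into named zones by angle. A minimal sketch; the AOIS table, the gaze_to_aoi helper, every threshold, and the sign conventions are placeholders that would need calibrating against the actual camera mount:

```python
import numpy as np

# Hypothetical AOI boundaries in degrees (yaw = left/right, pitch = up/down).
# All numbers are placeholders to be calibrated for a camera mounted above
# the steering wheel.
AOIS = {
    "road":         {"yaw": (-15, 15),  "pitch": (-10, 15)},
    "left_mirror":  {"yaw": (-60, -25), "pitch": (-5, 20)},
    "right_mirror": {"yaw": (25, 60),   "pitch": (-5, 20)},
}

def gaze_to_aoi(gaze_vec):
    """Map a 3D gaze vector (x, y, z) to a named Area of Interest."""
    x, y, z = gaze_vec
    # Convert the direction vector to yaw/pitch angles; the sign conventions
    # depend on the model's coordinate frame, so verify them empirically.
    yaw = np.degrees(np.arctan2(x, -z))
    pitch = np.degrees(np.arctan2(y, -z))
    for name, b in AOIS.items():
        if b["yaw"][0] <= yaw <= b["yaw"][1] and b["pitch"][0] <= pitch <= b["pitch"][1]:
            return name
    return "other"

print(gaze_to_aoi(np.array([0.05, 0.02, -0.99])))  # -> "road" with these placeholders
```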
- Edited
- Best Answer set by jakaskerl
Hey Jaka,
Thanks for the reply, but I was able to solve this issue by implementing logic that applies gaze estimation only to the largest face in the frame. Since my use case is to estimate the gaze of a driver WITHOUT estimating the gaze of backseat passengers, with the OAK-D placed above the steering wheel, this approach works.
Hopefully my documentation below can help others encountering similar issues.
Best,
Vlad
For documentation:
My project builds heavily on depthai-experiments/gen2-gaze-estimation. To record the driver's gaze estimation without the program breaking each time the backseat passenger was detected, I made changes to main.py and script.py.
ORIGINAL LOGS (before implementing solution):
```
Process Process-1:
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\srd0157\OneDrive - subarujapan\Desktop\CV_test\oakd_gaze\gazeModule.py", line 242, in run_gaze_detection
    gaze = np.array(msgs["gaze"][i].getFirstLayerFp16())
           ~~~~~~~~~~~~^^^
IndexError: list index out of range
```
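For anyone hitting the same traceback: the failure mode is simply that the synced "gaze" list can come back shorter than the detections list (e.g. a second face was detected but its gaze inference never arrived). A minimal stand-alone illustration with placeholder data:

```python
# Minimal illustration of the failure mode with placeholder data:
# the synced "gaze" list is shorter than the detections list.
detections = ["face_0", "face_1"]    # two faces detected in the frame
gaze_results = ["gaze_for_face_0"]   # only one gaze inference came back

for i, det in enumerate(detections):
    try:
        gaze = gaze_results[i]       # raises IndexError for i == 1
    except IndexError:
        print(f"no gaze result for detection {i}")
```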
SOLUTION:
In main.py, I changed:
```python
msgs = sync.get_msgs()
if msgs is not None:
    frame = msgs["color"].getCvFrame()
    dets = msgs["detection"].detections
    for i, detection in enumerate(dets):
        det = BoundingBox(detection)
        tl, br = det.denormalize(frame.shape)
        cv2.rectangle(frame, tl, br, (10, 245, 10), 1)
```
TO:
```python
msgs = sync.get_msgs()
if msgs is not None:
    frame = msgs["color"].getCvFrame()
    dets = msgs["detection"].detections
    gaze_data = msgs["gaze"]
    for i, detection in enumerate(dets):
        # Skip detections that have no matching gaze result in this
        # synced message group (prevents the IndexError above).
        if i >= len(gaze_data):
            continue
        det = BoundingBox(detection)
        tl, br = det.denormalize(frame.shape)
        cv2.rectangle(frame, tl, br, (10, 245, 10), 1)
```
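With this guard, a detection that has no matching gaze result is simply not annotated for that frame, instead of crashing the drawing loop.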
The other major changes can be seen in script.py further below:
script.py
THE ORIGINAL face_dets SECTION
```python
face_dets = node.io['face_det_in'].tryGet()
if face_dets is not None:
    passthrough = node.io['face_pass'].get()
    seq = passthrough.getSequenceNum()
    # No detections, carry on
    if len(face_dets.detections) == 0:
        del sync[str(seq)]
        continue
    # node.warn(f"New detection {seq}")
    if len(sync) == 0:
        continue
    img = find_in_dict(seq, "frame")
    if img is None:
        continue
    add_to_dict(face_dets.detections[0], seq, "detections")
    for det in face_dets.detections:
        correct_bb(det)
        # To head pose estimation model
        cfg1 = ImageManipConfig()
        cfg1.setCropRect(det.xmin, det.ymin, det.xmax, det.ymax)
        cfg1.setResize(60, 60)
        cfg1.setKeepAspectRatio(False)
        node.io['headpose_cfg'].send(cfg1)
        node.io['headpose_img'].send(img)
        # To face landmark detection model
        cfg2 = ImageManipConfig()
        cfg2.setCropRect(det.xmin, det.ymin, det.xmax, det.ymax)
        cfg2.setResize(48, 48)
        cfg2.setKeepAspectRatio(False)
        node.io['landmark_cfg'].send(cfg2)
        node.io['landmark_img'].send(img)
        break  # Only 1 face at a time currently supported
```
MY MODIFIED face_dets SECTION
```python
face_dets = node.io['face_det_in'].tryGet()
if face_dets is not None:
    passthrough = node.io['face_pass'].get()
    seq = passthrough.getSequenceNum()
    # No detections, carry on
    if len(face_dets.detections) == 0:
        del sync[str(seq)]
        continue
    if len(sync) == 0:
        continue
    img = find_in_dict(seq, "frame")
    if img is None:
        continue
    # Find largest face
    largest_face = max(face_dets.detections,
                       key=lambda det: (det.xmax - det.xmin) * (det.ymax - det.ymin))
    add_to_dict(largest_face, seq, "detections")
    correct_bb(largest_face)
    # Should process only the largest face
    cfg1 = ImageManipConfig()
    cfg1.setCropRect(largest_face.xmin, largest_face.ymin, largest_face.xmax, largest_face.ymax)
    cfg1.setResize(60, 60)
    cfg1.setKeepAspectRatio(False)
    node.io['headpose_cfg'].send(cfg1)
    node.io['headpose_img'].send(img)
    cfg2 = ImageManipConfig()
    cfg2.setCropRect(largest_face.xmin, largest_face.ymin, largest_face.xmax, largest_face.ymax)
    cfg2.setResize(48, 48)
    cfg2.setKeepAspectRatio(False)
    node.io['landmark_cfg'].send(cfg2)
    node.io['landmark_img'].send(img)
```
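The idea is that the driver sits closest to the camera, so the driver's face has the largest bounding-box area, and max() with an area key picks it out deterministically. A stand-alone illustration of just the selection step (the Detection class here is a hypothetical stand-in for the DepthAI detection objects):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Hypothetical stand-in for a DepthAI detection (normalized coords)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

detections = [
    Detection(0.40, 0.30, 0.70, 0.80),  # driver: close to camera, large box
    Detection(0.10, 0.35, 0.22, 0.55),  # backseat passenger: small box
]

largest_face = max(detections, key=lambda d: (d.xmax - d.xmin) * (d.ymax - d.ymin))
print(largest_face)  # -> the driver's bounding box
```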
Hi,
I'm trying to implement this gaze process, but it breaks whenever a face in the background enters the frame.
If I'm correct, the pipeline is:
1. Input image -->
2. Face detection model (I've tried both face-detection-retail-0004 and -0005) -->
3. Cropped face -->
4A. landmarks-regression-retail-0009 &
4B. head-pose-estimation-adas-0001 -->
5. Cropped L+R eyes & head pose angles -->
6. gaze-estimation-adas-0002

What confuses me is that, up until 4A (landmarks-regression-retail-0009), this is a very similar pipeline to gen2-face-recognition, which I've tested and which allows me to have multiple faces in frame simultaneously.
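For reference, here is a rough pure-Python sketch of how I understand the stages chaining together. Everything below is a placeholder stub, not DepthAI code; only the wiring between stages mirrors the pipeline above, and in the real pipeline the eye crops come from the landmarks:

```python
import numpy as np

# Placeholder stubs, one per model stage.
def face_detection(image):                 # 2. face-detection-retail-0004
    return [(0.40, 0.30, 0.70, 0.80)]      # normalized face boxes

def landmarks_regression(face_crop):       # 4A. landmarks-regression-retail-0009
    return np.zeros((5, 2))                # 5 facial landmarks (placeholder)

def head_pose_estimation(face_crop):       # 4B. head-pose-estimation-adas-0001
    return np.array([5.0, -2.0, 0.5])      # yaw, pitch, roll (placeholder)

def gaze_estimation(l_eye, r_eye, pose):   # 6. gaze-estimation-adas-0002
    return np.array([0.05, 0.02, -0.99])   # 3D gaze vector (placeholder)

def crop(image, box):
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    return image[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)]

frame = np.zeros((480, 640, 3), dtype=np.uint8)    # 1. input image (dummy)
for box in face_detection(frame):
    face = crop(frame, box)                        # 3. cropped face
    lms = landmarks_regression(face)               # 4A (would locate the eyes)
    pose = head_pose_estimation(face)              # 4B
    l_eye = crop(face, (0.10, 0.25, 0.40, 0.45))   # 5. eye crops; real boxes
    r_eye = crop(face, (0.60, 0.25, 0.90, 0.45))   #    come from the landmarks
    print(box, gaze_estimation(l_eye, r_eye, pose))
```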