Seeking face (or eye) 3D localisation to integrate into animatronics project

Hi - first time posting here.
I recently purchased the Oak-D cameras and am really loving the performance and functionality.
I'm looking for advice on what model/code to use for lightweight face (or eye) 3D localisation at a >10 Hz refresh rate.
I would like to use a Raspberry Pi as the host, though that's not a hard constraint if it proves too difficult.
For context, I want to integrate the Oak-D into an animatronics project so that the animatronic creature reacts to the movements of people in front of it.
Any suggestions on which models / code / hosts to use would be greatly appreciated!
Thanks, Shannon
Hello @ShannonH,
Awesome that you like the camera! You could actually just use this demo (image below), as it does face detection and displays the 3D coordinates of the face (only the Z coordinate in the demo), plus age/gender estimation (which you can remove for your use case). The RPi shouldn't have a problem handling this pipeline, as everything runs on the device itself, so streaming the 3D coordinates of a face would work at ~30 FPS.
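On the host side, reading those 3D coordinates is just a matter of polling the spatial detection queue. Something along these lines - a minimal sketch, not the demo's exact code; the stream name "detections" and the pipeline object are placeholders you'd adapt to whichever example you use:

import depthai as dai

# Hypothetical host-side loop: poll spatial face detections and print their
# 3D coordinates (millimetres, relative to the camera). Assumes 'pipeline'
# contains a spatial detection network whose output is linked to an
# XLinkOut stream named "detections".
with dai.Device(pipeline) as device:
    q_det = device.getOutputQueue(name="detections", maxSize=4, blocking=False)
    while True:
        msg = q_det.tryGet()
        if msg is None:
            continue
        for det in msg.detections:
            c = det.spatialCoordinates
            print(f"Face at X={c.x:.0f} mm, Y={c.y:.0f} mm, Z={c.z:.0f} mm")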
Thanks, Erik
Thank you very much Erik, you've saved me a bit of time trying to locate an appropriate example - really appreciate your advice. I'll post the results on this thread when we have the project up and running.
We have the camera face tracking working well and talking to the servo motor controller!
Thank you very much for the advice regarding which DepthAI demo would be most suitable.
For this application it would be really neat to have an expanded FOV to enable face tracking over a wider spatial range.
Does the age/gender demo use the maximum available FOV of the Oak-D camera?
If not, are you able to provide some advice on how to increase the FOV?
@ShannonH It does use the largest vertical FOV but not the full horizontal FOV, as it crops frames to a 1:1 aspect ratio. I would suggest looking at this tutorial to get a larger FOV. You can also check Displaying detections in High-Res - specifically options 3/4, as they relate to your question.
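One thing to watch out for: setResizeThumbnail only adds letterbox bars when the input frame's aspect ratio differs from the target, so the frame fed into the ImageManip has to still be the wide 16:9 one rather than an already-square crop. A rough sketch of the letterbox approach - untested, and the node names are just placeholders:

import depthai as dai

pipeline = dai.Pipeline()

# Use a full-width 16:9 preview so the horizontal FOV is preserved.
cam = pipeline.create(dai.node.ColorCamera)
cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
cam.setPreviewSize(1920, 1080)  # wide preview instead of a 1:1 crop
cam.setInterleaved(False)

# Letterbox the wide frame down to the 300x300 NN input; the aspect
# ratio is kept and the top/bottom get padded with black bars.
manip = pipeline.create(dai.node.ImageManip)
manip.setMaxOutputFrameSize(300 * 300 * 3)
manip.initialConfig.setResizeThumbnail(300, 300)
manip.initialConfig.setFrameType(dai.RawImgFrame.Type.RGB888p)
cam.preview.link(manip.inputImage)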
Thanks, Erik
Increasing the horizontal FOV is exactly what we're looking for.
We attempted letterboxing using the code below, unsuccessfully:
we changed the ImageManip 'setResize' call to 'setResizeThumbnail', but it looks like it made no change.
Also, the link to the letterbox code example on this page is broken.
Any suggestions?
import blobconverter
import depthai as dai

def create_pipeline():
    print("Creating pipeline...")
    pipeline = dai.Pipeline()

    print("Creating Color Camera...")
    cam = pipeline.create(dai.node.ColorCamera)
    cam.setPreviewSize(1080, 1080)
    cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
    cam.setInterleaved(False)
    cam.setBoardSocket(dai.CameraBoardSocket.RGB)

    # Stream the preview frames to the host
    cam_xout = pipeline.create(dai.node.XLinkOut)
    cam_xout.setStreamName("cam_out")
    cam.preview.link(cam_xout.input)

    # ImageManip that will resize the frame before sending it to the face detection NN node
    face_det_manip = pipeline.create(dai.node.ImageManip)
    # face_det_manip.initialConfig.setResize(300, 300)
    face_det_manip.initialConfig.setResizeThumbnail(300, 300)
    face_det_manip.initialConfig.setFrameType(dai.RawImgFrame.Type.RGB888p)

    # Mono cameras feeding the stereo depth node
    monoLeft = pipeline.create(dai.node.MonoCamera)
    monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
    monoRight = pipeline.create(dai.node.MonoCamera)
    monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)

    stereo = pipeline.create(dai.node.StereoDepth)
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    monoLeft.out.link(stereo.left)
    monoRight.out.link(stereo.right)

    # Spatial detection network: combines NN face detections with depth
    print("Creating Face Detection Neural Network...")
    face_det_nn = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
    face_det_nn.setConfidenceThreshold(0.5)
    face_det_nn.setBoundingBoxScaleFactor(0.8)
    face_det_nn.setDepthLowerThreshold(100)
    face_det_nn.setDepthUpperThreshold(5000)
    face_det_nn.setBlobPath(blobconverter.from_zoo(name="face-detection-retail-0004", shaves=6))
    cam.preview.link(face_det_manip.inputImage)
    stereo.depth.link(face_det_nn.inputDepth)

    # Link face ImageManip -> face detection NN node
    face_det_manip.out.link(face_det_nn.input)

    # Send face detections to the host (for bounding boxes)
    face_det_xout = pipeline.create(dai.node.XLinkOut)
    face_det_xout.setStreamName("face_det_out")
    face_det_nn.out.link(face_det_xout.input)

    # Script node will take the output from the face detection NN as an input and set ImageManipConfig
    # to the 'age_gender_manip' to crop the initial frame
    image_manip_script = pipeline.create(dai.node.Script)
    face_det_nn.out.link(image_manip_script.inputs['face_det_in'])

    # Only send metadata - we are only interested in the timestamp, so we can sync
    # depth frames with the NN output
    face_det_nn.passthrough.link(image_manip_script.inputs['passthrough'])

    image_manip_script.setScript("""
l = [] # List of images

# So the correct frame will be the first in the list.
# For this experiment this function is redundant, since everything
# runs in blocking mode, so no frames will get lost
def remove_prev_frame(seq):
    for rm, frame in enumerate(l):
        if frame.getSequenceNum() == seq:
            # node.warn(f"List len {len(l)} Frame with same seq num: {rm}, seq {seq}")
            break
    for i in range(rm):
        l.pop(0)

while True:
    preview = node.io['preview'].tryGet()
    if preview is not None:
        # node.warn(f"New frame {preview.getSequenceNum()}")
        l.append(preview)

    face_dets = node.io['face_det_in'].tryGet()
    # node.warn(f"Faces detected: {len(face_dets.detections)}")
    if face_dets is not None:
        passthrough = node.io['passthrough'].get()
        seq = passthrough.getSequenceNum()
        # node.warn(f"New detection {seq}")
        if len(l) == 0:
            continue

        remove_prev_frame(seq)
        img = l[0] # Matching frame is the first in the list
        l.pop(0) # Remove matching frame from the list

        for det in face_dets.detections:
            cfg = ImageManipConfig()
            cfg.setCropRect(det.xmin, det.ymin, det.xmax, det.ymax)
            cfg.setResize(62, 62)
            cfg.setKeepAspectRatio(False)
            node.io['manip_img'].send(img)
            node.io['manip_cfg'].send(cfg)
""")
    cam.preview.link(image_manip_script.inputs['preview'])

    # ImageManip that crops each detected face out of the preview frame for the age/gender NN
    age_gender_manip = pipeline.create(dai.node.ImageManip)
    age_gender_manip.initialConfig.setResize(62, 62)
    age_gender_manip.setWaitForConfigInput(False)
    image_manip_script.outputs['manip_cfg'].link(age_gender_manip.inputConfig)
    image_manip_script.outputs['manip_img'].link(age_gender_manip.inputImage)

    print("Pipeline created.")
    return pipeline