I have a use case for a liveness check (i.e. detecting fake/static faces on camera) and was wondering if the OAK-D can do it using its depth information?

I am not an expert, but I tried and failed:

  • using the depth frame after detecting the face (e.g. checking whether the depth of pixels near the centroid varies, rather than being flat like a screen, but the depth seems to flicker regardless)
  • estimating the height of the subject. I tried the pinhole formula, real height ≈ detection.spatialCoordinates.z × (pixel height) / (focal length in pixels), but it was inaccurate (is the focal length 4.52 mm? that would need converting to pixels). It would filter out small faces in the frame, but an actual-size printed face would defeat it, I guess.
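A sketch of the first idea above, written so the flicker matters less: instead of watching individual pixels, fit a plane to the whole face-ROI depth patch and look at the residual. A screen or printed photo should be nearly planar, while a real face has centimetres of relief. The function name, ROI shape, and threshold are all hypothetical; in practice you would also want to average depth over a few frames.

```python
import numpy as np

def planarity_residual(depth_roi: np.ndarray) -> float:
    """Fit a plane z = a*x + b*y + c to a depth ROI (in mm) and
    return the RMS residual. A small residual suggests a flat
    surface (screen/photo); a larger one suggests 3D relief.
    Invalid (zero) depth pixels are ignored."""
    h, w = depth_roi.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x, y, z = xs.ravel(), ys.ravel(), depth_roi.astype(np.float64).ravel()
    valid = z > 0
    x, y, z = x[valid], y[valid], z[valid]
    A = np.column_stack([x, y, np.ones_like(x, dtype=np.float64)])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    residual = z - A @ coef
    return float(np.sqrt(np.mean(residual ** 2)))

# Toy comparison: a flat "screen" vs. an irregular "face" patch.
rng = np.random.default_rng(0)
flat = np.full((40, 40), 600.0)            # constant-depth plane, 600 mm
bumpy = flat + 20 * rng.random((40, 40))   # ~2 cm of irregular relief
print(planarity_residual(flat))    # ~0
print(planarity_residual(bumpy))   # noticeably larger
```

A real decision rule would threshold this residual (tuned on real captures), since stereo depth noise on a genuinely flat screen is not exactly zero.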

I read about getting the 3D positions of face landmarks and am wondering whether anyone has tried it and has an example to share (though I think a printed face contoured to a person's face would defeat it again)?

Thanks

Is there someone who had a similar use case and found success?

    Hello sky,
    I can't recall anyone doing a similar project, but IMO it would be possible.
    As you have mentioned, there are several ways to perform this. Besides the ones you mentioned, you could also plot all the depth points in the face ROI and bin them; there might be some features by which you could distinguish fake (e.g. printed-on-paper) faces from real 3D ones.
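    The binning idea could be sketched roughly like this (function name, bin width, and the 5% occupancy threshold are all assumptions to be tuned): histogram the valid depth values in the face ROI and count how many bins hold a meaningful share of pixels. A flat print concentrates nearly everything in one bin; a real face, with several centimetres between nose tip and ears, spreads across several.

```python
import numpy as np

def depth_spread_bins(depth_roi: np.ndarray, bin_width_mm: float = 10.0) -> int:
    """Bin the valid depth values (mm) in a face ROI and return the
    number of bins containing more than 5% of the pixels. A planar
    surface occupies ~1 bin; a 3D face occupies several."""
    z = depth_roi[depth_roi > 0].astype(np.float64)
    if z.size == 0:
        return 0
    edges = np.arange(z.min(), z.max() + bin_width_mm, bin_width_mm)
    if edges.size < 2:
        return 1  # all depths fall inside a single bin
    hist, _ = np.histogram(z, bins=edges)
    return int(np.sum(hist > 0.05 * z.size))

rng = np.random.default_rng(0)
flat = np.full((30, 30), 500.0)               # screen-like, zero relief
relief = 500 + 40 * rng.random((30, 30))      # ~4 cm of relief
print(depth_spread_bins(flat))     # 1
print(depth_spread_bins(relief))   # several
```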
    You could use a face landmark model (this experiment uses it) and get the spatial coordinates of the landmarks. Checking whether the nose is closer than the eyes could be one way to check if it's a real face. For getting coordinates from the landmarks, you could either use the SpatialLocationCalculator (SLC): decode the landmarks, get the ROI, and send it to the SLC, either on the host or in a Script node; or you could calculate the spatial coordinates on the host (code snippet here).
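    Calculating spatial coordinates on the host boils down to back-projecting each landmark pixel through the pinhole model with the camera intrinsics. The intrinsics, landmark pixels, and depth values below are made-up placeholders (real ones would come from device calibration and the aligned depth frame); the nose-vs-eyes comparison is only a heuristic liveness hint.

```python
import numpy as np

def pixel_to_xyz(u, v, depth_mm, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth Z (mm) into camera space
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    z = float(depth_mm)
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

# Hypothetical intrinsics (pixels) and landmark measurements:
fx = fy = 860.0
cx, cy = 640.0, 360.0
nose      = pixel_to_xyz(640, 400, 550, fx, fy, cx, cy)
left_eye  = pixel_to_xyz(600, 340, 580, fx, fy, cx, cy)
right_eye = pixel_to_xyz(680, 340, 580, fx, fy, cx, cy)

# On a frontal real face the nose protrudes toward the camera, so its
# Z should be a couple of centimetres smaller than the eyes' mean Z.
eye_z = (left_eye[2] + right_eye[2]) / 2
print("nose protrusion (mm):", eye_z - nose[2])  # 30.0 here
```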
    I am unsure whether getting the size of the head would be sufficient, as an "attacker" could just print a face, at the size of a real face, on paper.
    Thanks, Erik

    Yes, getting the head size is insufficient, and printing an A4 head and bending it to the shape of a face is a possible attack.
    I didn't continue with the nose-closer-than-eyes method, because a face that is slightly tilted toward the camera (e.g. the left eye/spectacles closer to the camera than the nose) can still be detected as a face. Maybe I should treat that case as a fake, or hint to the person to face the camera.

    I will probably try the binning approach, as I am not sure what else to do with the depth so far.

    I guess some dirty coding (mixing methods) will be needed to achieve what I want. Thanks,

    I'm not following super closely, but one idea would be to train a model that takes the left, right, and RGB frames directly as input.
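    This idea amounts to fusing the raw stereo pair with the color frame into one multi-channel input, so the network can learn disparity cues itself instead of relying on the computed depth map. A minimal sketch of just the input preparation (shapes and normalization are assumptions; the training pipeline itself is out of scope):

```python
import numpy as np

def stack_inputs(left: np.ndarray, right: np.ndarray, rgb: np.ndarray) -> np.ndarray:
    """Stack left mono (H, W), right mono (H, W), and RGB (H, W, 3)
    frames into a single (H, W, 5) float32 tensor in [0, 1], the kind
    of fused input a liveness classifier could be trained on."""
    left = left.astype(np.float32)[..., None] / 255.0
    right = right.astype(np.float32)[..., None] / 255.0
    rgb = rgb.astype(np.float32) / 255.0
    return np.concatenate([left, right, rgb], axis=-1)

# Dummy frames standing in for synced camera outputs:
x = stack_inputs(
    np.zeros((64, 64), np.uint8),
    np.zeros((64, 64), np.uint8),
    np.zeros((64, 64, 3), np.uint8),
)
print(x.shape)   # (64, 64, 5)
print(x.dtype)   # float32
```

One practical caveat: the mono and RGB cameras are at different positions, so the frames would need rectifying/aligning before stacking for the channels to correspond spatially.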

    Thoughts?

    Thanks,
    Brandon

    5 days later

    I am currently using the available OpenVINO model for detecting a face. Would training a model mean doing it from scratch, i.e. detecting a face on both the RGB and mono streams and combining the results to determine whether it is real?

      sky Yes, that's right. What I suggested would be the more complex approach, but likely harder to fake.

      We could likely eventually directly help with this but our ML team is a bit underwater right now.
