Hello MichalO,
That could work, though you will need to create a custom CV model (tutorial here) that takes the frame and the face bounding box as inputs and blurs the face. Blurring itself is simple (tutorial), but blurring only the region inside a bounding box could be challenging. Another option would be to blur the frames on the host with OpenCV, which would be easier (and there are tons of tutorials for that as well).

What latency do you require? Expect a few hundred milliseconds, since face detection (NN inference) alone takes 100 ms.

Regarding the fish-eye lens, I am not sure whether ArduCam offers a replacement color camera module (for the OAK-1), but you could attach an external lens (e.g. this).
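For the host-side option, here is a minimal sketch of blurring only the pixels inside a face bounding box. It assumes the frame is a NumPy array of shape (height, width, channels) and that the box has already been converted to pixel coordinates as (xmin, ymin, xmax, ymax). The function names `box_blur` and `blur_bbox` are hypothetical. To keep it dependency-light it uses a plain NumPy mean (box) filter; with OpenCV installed you would typically swap the `box_blur` call for `cv2.GaussianBlur(roi, (k, k), 0)`.

```python
import numpy as np


def box_blur(region: np.ndarray, k: int) -> np.ndarray:
    """Mean filter with a k x k window (k odd), using edge-replicated padding."""
    if k < 1 or k % 2 == 0:
        raise ValueError("kernel size k must be a positive odd integer")
    pad = k // 2
    padded = np.pad(region.astype(np.float32), ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = region.shape[:2]
    acc = np.zeros(region.shape, dtype=np.float32)
    # Sum the k*k shifted copies of the region, then divide to get the mean.
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return np.rint(acc / (k * k)).astype(region.dtype)


def blur_bbox(frame: np.ndarray, bbox: tuple, k: int = 15) -> np.ndarray:
    """Return a copy of `frame` with the (xmin, ymin, xmax, ymax) pixel box blurred."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = bbox
    # Clamp the box to the frame so detections touching the border don't fail.
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    out = frame.copy()
    if x2 <= x1 or y2 <= y1:
        return out  # empty or fully out-of-frame box: nothing to blur
    out[y1:y2, x1:x2] = box_blur(frame[y1:y2, x1:x2], k)
    return out
```

In a real pipeline you would call `blur_bbox` once per detected face on each frame received from the device, before displaying or saving it. A larger `k` hides more detail but costs more CPU time per frame.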
Thanks, Erik